Applies to i2MassChroQ 0.4.61

7 i2MassChroQ and Quantitative Proteomics

This chapter describes in detail how to prepare the work that will be carried out by the MassChroQ module.

7.1 Interface to the MassChroQ Quantitative Proteomics Module

While it is certainly possible to perform thorough analyses by exploring data with peptide identification and protein inference strategies, quantitative proteomics projects require going beyond these strategies. MassChroQ is now integrated in i2MassChroQ, which makes it straightforward to perform quantitative proteomics work right after the identification and protein inference process.

The MassChroQ program is harnessed in i2MassChroQ according to the following outline:

  • Open an i2MassChroQ project or load protein identification results files;

  • Configure all the aspects of the MassChroQ run in a specific MassChroQ configuration window;

  • Use i2MassChroQ to run the external MassChroQ software, or have i2MassChroQ only write the file that MassChroQ will use to perform its quantitative proteomics task at a later stage, outside of i2MassChroQ.

7.1.1 Preparing sample associations for MassChroQ

Quantitative proteomics experiments most often involve comparing samples with one another, which means that multiple samples usually need to be associated into meaningful groups. Before configuring MassChroQ, it is thus necessary to define these sample associations. Since a given sample is actually a given LC-MS run, and since each MS run's data are used to perform protein identifications, the associations are made between MS runs.

To perform a quantitative proteomics experiment, the very first step is to load either the protein identification results (see Section 3.4, “Loading the Protein Identification Results”) or an xpip project file (see Section 3.5, “Loading i2MassChroQ projects”).

Once the protein identification results (or the i2MassChroQ project file) have been loaded, the sample associations (that is, associations between MS run files) are performed by first clicking the View MS identification list button of the main program window (see Section 3.4.2, “Displaying the MS Identifications List”). The MS runs are displayed in a table, and sample associations can be defined by right-clicking the cells of the Alignment groups column, as shown in Figure 7.1, “Defining sample associations for XIC alignments”.

Note

The sample (MS run) associations are critical not only because one wants to compare quantitative data about related samples, but also because of the way MassChroQ quantifies proteomics data. MassChroQ does not use spectral count-based strategies; instead, it measures the area under the curve of mass peaks in the XIC chromatograms computed for these peaks. Even LC-MS runs of an identical sample do not yield identical (m/z, retention time) pairs, yet quantifying on the basis of XIC peak areas requires that the XIC chromatograms of all the associated samples be properly aligned. The associations tell the software which XIC chromatograms to align, an alignment that is essential for the quantification analysis.
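To make the area-under-the-curve idea concrete, here is a minimal Python sketch that integrates an XIC peak with the trapezoidal rule between two retention time boundaries. The function name, the use of the trapezoidal rule and the assumption that the peak boundaries are already known are illustrative choices, not a description of MassChroQ's internal code.

```python
def xic_peak_area(rts, intensities, rt_start, rt_end):
    """Trapezoidal area under an XIC between two retention times (inclusive)."""
    if len(rts) != len(intensities):
        raise ValueError("rts and intensities must have the same length")
    # Keep only the XIC points that fall within the peak boundaries.
    points = [(rt, i) for rt, i in zip(rts, intensities) if rt_start <= rt <= rt_end]
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(points, points[1:]):
        area += (rt1 - rt0) * (i0 + i1) / 2.0  # one trapezoid per pair of points
    return area

# Retention times and intensities in arbitrary units.
print(xic_peak_area([10.0, 10.5, 11.0, 11.5, 12.0], [0, 100, 300, 100, 0], 10.0, 12.0))  # 250.0
```

Because such an area is integrated over retention time, comparing it between runs only makes sense once their retention times have been aligned.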

The associations between samples can be performed in any arbitrary way, according to the user's experimental scheme. Any number of groups can be defined, each containing any number of samples. The process is described in Figure 7.1, “Defining sample associations for XIC alignments” and Figure 7.2, “Sample associations are done by grouping samples into groups”.


By right-clicking the cells of the Alignment groups column, groups can be defined and samples can be associated with them.

Figure 7.1: Defining sample associations for XIC alignments

Three groups have been defined, two groups having two samples each and one group having only one sample.

Figure 7.2: Sample associations are done by grouping samples into groups
Tip: Sample associations with specific sample sets

Sample associations play a critical role when samples (that is, MS runs) have conceptual relationships. For example, let's assume that a project used polyacrylamide gel electrophoresis as a protein separation method. Five related samples (be they biologically relevant variants or technical replicates, for example) have been loaded onto five different lanes of the gel. The migration pattern is very similar across the five lanes, and reproducible bands (albeit with different intensities) can be observed from one lane to the next: say, in sample 1, a band A, below it a band B, and so on. Sample 2 would also show that pattern, with a band A and a band B, and the same goes for the remaining samples (that is, lanes). The bands would be excised and subjected to trypsin digestion, and the peptides would be extracted and analysed by mass spectrometry. Here, the sample associations would typically define groups that associate related horizontal bands on the gel: group A would associate all the A bands from the five samples, group B all the B bands, and so on. The sample associations thus allow the quantification and comparison of corresponding proteins across the various samples.

The XIC chromatograms computed for the samples of a given association group are aligned against one reference sample of that group, and each group must have a reference sample. The reference sample can be defined by the user at this stage (or later, as described in Section 7.1.2.1, “The Sample associations Tab”) by using the context menu shown in Figure 7.3, “Setting the reference sample for the alignment”.
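The association rules stated above (any number of groups, each holding any number of MS runs, and each group needing a reference sample that belongs to it) can be modelled with a small data structure. The following Python sketch is hypothetical: the AlignmentGroup class, the check_groups() function and the run names (built after the gel-band example of the tip above) are illustrative and are not part of i2MassChroQ.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AlignmentGroup:
    """Hypothetical model of an alignment group of MS runs."""
    name: str
    ms_runs: list[str] = field(default_factory=list)
    reference: Optional[str] = None  # must be one of ms_runs

def check_groups(groups: list[AlignmentGroup]) -> list[str]:
    """Return human-readable problems; an empty list means all groups are usable."""
    problems = []
    for group in groups:
        if not group.ms_runs:
            problems.append(f"{group.name}: no MS run")
        if group.reference is None:
            problems.append(f"{group.name}: no reference sample")
        elif group.reference not in group.ms_runs:
            problems.append(f"{group.name}: reference {group.reference} is not in the group")
    return problems

# Gel example: one group per band, one MS run per lane (hypothetical run names).
groups = [
    AlignmentGroup(f"group_{band}", [f"lane{lane}_band{band}" for lane in range(1, 6)])
    for band in ("A", "B")
]
print(check_groups(groups))  # ['group_A: no reference sample', 'group_B: no reference sample']
groups[0].reference = "lane1_bandA"
groups[1].reference = "lane1_bandB"
print(check_groups(groups))  # []
```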


Use the context menu, opened by right-clicking the cells of the Alignment groups column, to set the alignment reference in each group.

Figure 7.3: Setting the reference sample for the alignment
Note

Selecting the proper alignment reference requires some thought, because the reference sample serves as the basis for the alignment of all the samples in the group. The best sample to choose as alignment reference is the one that shares the most precursor ions' m/z values with all the other samples. The choice of the alignment reference sample can also be delegated to i2MassChroQ, as described later.

Now that the sample associations have been performed, the next step is to configure MassChroQ from within i2MassChroQ. This is described in the next sections.

7.1.2 Configuration of MassChroQ

i2MassChroQ provides an interface to MassChroQ, the software that performs XIC extractions for a list of precursor ions' m/z values. That interface is shown by selecting the MassChroQ menu item of the main File menu. The window that opens up is shown in Figure 7.4, “The MassChroQ interface window (Sample associations)”, and is described below.


This window offers an interface to the MassChroQ program. The Sample associations tab allows one to define groups of samples that will be processed together. All the configurations in the tabs are described in the sections below.

Figure 7.4: The MassChroQ interface window (Sample associations)

7.1.2.1 The Sample associations Tab

This tab allows one to configure the sample associations. The window state shown in Figure 7.4, “The MassChroQ interface window (Sample associations)” corresponds to a situation in which the user did not define sample associations as described in Section 7.1.1, “Preparing sample associations for MassChroQ”. In this case, it is assumed that the user wants to treat all the samples as a single group (the All_samples group). To reveal all the samples (that is, MS runs) being handled, check the All_samples check box, which associates all the samples in that single group and displays them in the list widget on the right hand side, as shown in Figure 7.5, “The MassChroQ interface window (Sample associations) - all samples listed”.


By checking the All_samples check box on the left hand side list widget, all the samples in the project are associated in a single All_samples group and displayed in the list widget on the right hand side of the window.

Figure 7.5: The MassChroQ interface window (Sample associations) - all samples listed

If the user has crafted groups of associated samples, as described in Section 7.1.1, “Preparing sample associations for MassChroQ”, the window displays different settings at start (see Figure 7.6, “The MassChroQ interface window (Sample associations) - pre-defined sample associations”).


When the sample associations were defined before opening the MassChroQ interface window (the inset window corresponds to Section 7.1.1, “Preparing sample associations for MassChroQ”), the groups of associated samples are displayed in the list widget on the left hand side of the window. Selecting a group name in that list displays the samples associated in that group. To include a group in the MassChroQ computations, check the corresponding check box widget.

Figure 7.6: The MassChroQ interface window (Sample associations) - pre-defined sample associations

To verify which samples are being associated in a given group, select that group in the list widget on the left hand side of the window.

To make sure a given group is taken into account by i2MassChroQ when it prepares the file listing all the precursor ions' peaks for which MassChroQ will later perform the XIC extractions, check the corresponding check box.

The Check MS run data files button lets the user verify that all the samples associated in the various groups can be found as mass spectrometry data files (mzML or mzXML files). This is a hard requirement: MassChroQ quantifies peptide mass spectrometric signals by extracting the ion current of each peptide's precursor ion (XIC extraction), which requires access to the mass spectrometry data files.

The Reference sample drop-down list widget allows one to select the alignment reference sample for the sample association group currently selected in the left hand side list. The alignment reference sample must be chosen with care, as explained in the note following Figure 7.3, “Setting the reference sample for the alignment”.

Tip

If selecting an alignment reference sample is not possible, the user can ask i2MassChroQ to find one by clicking the Find the best reference sample button. i2MassChroQ looks into all the sample files associated in the current group and searches for the sample that shares the maximum number of precursor ions with all the other samples. The MS run file thus found is then selected in the drop-down list widget.
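The search performed by the Find the best reference sample button can be approximated as follows. This Python sketch assumes that each run's precursor ions are given as hashable keys (here, hypothetical sequence/charge strings) and scores each run by the sum of its overlaps with every other run of the group; i2MassChroQ's actual matching criterion (for example, matching m/z values within a tolerance) may differ.

```python
def best_reference(precursors_by_run: dict[str, set[str]]) -> str:
    """Return the run sharing the most precursor ions with the other runs of a group."""
    if not precursors_by_run:
        raise ValueError("the group contains no MS run")

    def shared_count(run: str) -> int:
        mine = precursors_by_run[run]
        # Sum of pairwise overlaps with every other run in the group.
        return sum(len(mine & other)
                   for name, other in precursors_by_run.items() if name != run)

    # On ties, max() keeps the first run in insertion order.
    return max(precursors_by_run, key=shared_count)

group = {
    "run1": {"LVNELTEFAK/2", "AEFVEVTK/2", "YLYEIAR/2"},
    "run2": {"LVNELTEFAK/2", "AEFVEVTK/2"},
    "run3": {"AEFVEVTK/2", "YLYEIAR/2", "HLVDEPQNLIK/3"},
}
print(best_reference(group))  # run1
```

Here run1 shares 2 precursors with run2 and 2 with run3 (score 4), whereas run2 and run3 both score 3.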

The Results format drop-down list widget allows the user to select the kind of format that the quantification results should be written in. The ODS format is the standard format for the LibreOffice software suite. The TSV format is a tab-separated values text format.

The Compare samples switch indicates whether the results output file should contain a less detailed version of the data, arranged so that the quantification data of the various samples can easily be compared.

7.1.2.2 The Alignment Tab

This tab allows one to configure how the XIC chromatograms obtained for the different associated samples (see Section 7.1.1, “Preparing sample associations for MassChroQ”) are aligned, as shown in Figure 7.7, “The MassChroQ interface window (Alignment)”.


This tab configures how i2MassChroQ performs the XIC chromatogram alignment between the associated samples of the various groups. If the user is interested in the results of the alignment, the XIC retention time corrections can be stored for later scrutiny in the directory specified at Store time corrections in this directory.

Figure 7.7: The MassChroQ interface window (Alignment)

The MS2 alignment parameters group box widget gathers parameters that are critical to the XIC chromatogram alignment algorithm for all the samples associated in a given group, as described below.

  • MS2 tendency: half size of the window used to apply a moving median on the MS/MS retention time deviation curve. Used to create the tendency deviation curve. Of course the appropriate value for this window depends on the number of identified peptides that the two runs (reference run and run being aligned) have in common. Usually a good value is 10. While aligning, MassChroQ outputs the number of peptides in common which can be used to readjust this parameter if necessary.

  • MS2 smoothing: half size of the window used to apply a moving average on the MS/MS retention time deviation curve. Smooths the deviation curve. As for the parameter above, a good value is usually 10.

  • MS1 smoothing: half size of the window used to apply a moving median on the MS retention time corrections curve. This smoothing parameter is optional, and it is not necessary most of the time. It could be used in place of the MS2 smoothing parameter in cases of a small number of shared identified peptides (< 100), in which case a good value is 20.
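The following Python sketch illustrates how the MS2 tendency and MS2 smoothing parameters act on the retention time deviation curve: a moving median of half size MS2 tendency, followed by a moving average of half size MS2 smoothing, with linear interpolation to correct any retention time. It is a simplified model under stated assumptions (deviation defined as run time minus reference time, windows truncated at the ends of the curve, MS1 smoothing omitted, hypothetical function names), not MassChroQ's implementation.

```python
import bisect
from statistics import fmean, median

def moving(values, half, reducer):
    """Apply reducer over a window of 2*half+1 points (truncated at both ends)."""
    return [reducer(values[max(0, i - half): i + half + 1]) for i in range(len(values))]

def ms2_deviation_curve(shared, tendency_half=10, smoothing_half=10):
    """shared: (rt_in_aligned_run, rt_in_reference_run) pairs, one per peptide
    identified by MS/MS in both runs. Returns sorted run times and deviations."""
    pairs = sorted(shared)
    rts = [rt for rt, _ in pairs]
    deviations = [rt - ref_rt for rt, ref_rt in pairs]
    tendency = moving(deviations, tendency_half, median)  # "MS2 tendency": moving median
    smoothed = moving(tendency, smoothing_half, fmean)    # "MS2 smoothing": moving average
    return rts, smoothed

def corrected_rt(rt, rts, deviations):
    """Map a retention time of the aligned run onto the reference time scale."""
    if rt <= rts[0]:
        return rt - deviations[0]
    if rt >= rts[-1]:
        return rt - deviations[-1]
    j = bisect.bisect_right(rts, rt)  # rts[j - 1] <= rt < rts[j]
    x0, x1 = rts[j - 1], rts[j]
    d0, d1 = deviations[j - 1], deviations[j]
    # Linear interpolation of the deviation between the two surrounding points.
    return rt - (d0 + (d1 - d0) * (rt - x0) / (x1 - x0))

# The run elutes 30 s late; one identification is off by 45 s and is ignored by the median.
shared = [(i * 60.0 + (45.0 if i == 10 else 30.0), i * 60.0) for i in range(21)]
rts, devs = ms2_deviation_curve(shared)
print(corrected_rt(530.0, rts, devs))  # 500.0
```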

7.1.2.3 The Peak quantification Tab

This tab allows one to configure how the peaks in the XIC chromatograms are evaluated from a quantification standpoint. These parameters need some testing, as they might depend on the instrument from which the data originated.


This tab configures how i2MassChroQ evaluates the peaks in the XIC chromatograms from a quantification standpoint. The settings in this dialog window might need some tweaking, as they might depend on the instrument from which the data originated.

Figure 7.8: The MassChroQ interface window (peak quantification)
  • XIC extraction parameters: these parameters govern the way the program searches for m/z values in the mass spectral data.

    • XIC range: the m/z width (mass tolerance) for searching m/z values in the mass data during the XIC extraction. Units can be part-per-million (ppm), resolution (res) or Dalton (dalton). The wider the window, the rougher the XIC extraction. This value typically depends on the resolving power of the instrument that acquired the data.

    • Inside the range, take the: once the m/z window has been located in the mass spectral data, it will contain a number of points. This setting determines which signal intensity is computed for the m/z window (that is, what to do with the m/z points it contains). If max is selected, only the most intense point in the m/z window is used as the signal intensity of the window. If sum is selected, the sum of the intensities of all the m/z points in the window is used.

  • Peak detection parameters: these parameters govern the way the program detects peaks.

    • smoothing: number of points on each side of the point being considered in the XIC chromatogram. If set to one, the rolling window contains three points: one before the considered point, one after it and the considered point itself. This setting thus determines the width of the rolling window used to iterate over the XIC chromatogram in search of peaks. Whatever the setting, the window shifts by one point at each iteration.

    • minmax half window: the half window size used to apply the close (min/max) transform on the XIC intensities. This window determines the number of scan points that must separate two peaks for them to be considered separately; closer peaks are merged. A good half window value is usually 3 (which makes a window of 7).

    • maxmin half window: same as above but for the open (max/min) transform. This window determines the minimum peak width (in number of scan points) below which a peak is not detected. A good half window value is usually 2 (which makes a window of 5).

    • minmax threshold: threshold on the closed signal, that is, a minimum intensity value below which peaks are not detected on the closing signal. This threshold is usually two or three times the background noise intensity level, which depends on your mass spectrometer.

    • maxmin threshold: threshold on the open signal, that is, a minimum intensity value below which peaks are not detected. It corresponds to the upper limit of the opening signal and represents the upper level of the background signal. A good value is thus slightly higher than your background noise intensity level.
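The two XIC extraction parameters can be illustrated with a short Python sketch. It assumes a symmetric window of plus or minus the tolerance around the precursor m/z, handles only the ppm and dalton units (the res unit is left out because its exact interpretation is not described here), and uses a hypothetical Spectrum class in place of real mzML reading. For example, at m/z 500 a 10 ppm tolerance gives a window of 500 plus or minus 0.005.

```python
from dataclasses import dataclass

@dataclass
class Spectrum:
    """Hypothetical in-memory MS1 spectrum: retention time plus (m/z, intensity) points."""
    rt: float
    mz: list[float]
    intensity: list[float]

def mz_bounds(target_mz: float, tolerance: float, unit: str) -> tuple[float, float]:
    """Symmetric m/z window around target_mz (assumption: plus or minus the tolerance)."""
    if unit == "ppm":
        delta = target_mz * tolerance / 1e6
    elif unit == "dalton":
        delta = tolerance
    else:  # the "res" unit is deliberately not modelled in this sketch
        raise ValueError(f"unsupported unit: {unit}")
    return target_mz - delta, target_mz + delta

def extract_xic(spectra, target_mz, tolerance, unit="ppm", mode="max"):
    """Return one (rt, intensity) point per spectrum for the given precursor m/z."""
    low, high = mz_bounds(target_mz, tolerance, unit)
    xic = []
    for spectrum in spectra:
        inside = [i for m, i in zip(spectrum.mz, spectrum.intensity) if low <= m <= high]
        if mode == "max":
            value = max(inside, default=0.0)  # most intense point only
        elif mode == "sum":
            value = sum(inside)               # all points in the window
        else:
            raise ValueError(f"unsupported mode: {mode}")
        xic.append((spectrum.rt, value))
    return xic

spectra = [Spectrum(600.0, [500.0, 500.002, 500.1], [100.0, 50.0, 999.0]),
           Spectrum(601.0, [499.9, 500.001], [20.0, 80.0])]
print(extract_xic(spectra, 500.0, 10, "ppm", "max"))  # [(600.0, 100.0), (601.0, 80.0)]
print(extract_xic(spectra, 500.0, 10, "ppm", "sum"))  # [(600.0, 150.0), (601.0, 80.0)]
```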

7.1.2.4 The MassChroQ Tab

This tab allows one to configure the way MassChroQ actually performs the quantification (if using the Run MassChroQ button) or the way i2MassChroQ writes the masschroqml file to be fed to the MassChroQ program.


This tab configures the way either MassChroQ actually performs the quantification or i2MassChroQ writes the masschroqml file that MassChroQ will be fed with to perform the task.

Figure 7.9: The MassChroQ interface window (MassChroQ)
  • Edit MassChroQ execution: activate the check button and use the directory icon to locate the MassChroQ program on disk. The full path to the program is then printed in the line edit widget next to the icon.

  • Run MassChroQ through HTCondor: activate the check button to set the memory requirements for HTCondor.

  • MassChroQ parameters: these settings govern the actual MassChroQ quantification process:

    • Number of CPUs: set the number of central processing units (actually, threads) that MassChroQ is allowed to use.

    • Temporary directory: use the directory icon to select a specific temporary directory where MassChroQ will write processing-related data. By default the directory is /tmp/. The temporary files are deleted when they are no longer needed.

    • Use the temporary directory to store detected peaks: if checked, the detected peaks may be stored in files in the temporary directory described above. This can be construed as a swap area in which to store peak data if the available memory is insufficient.

7.1.2.5 Saving the File and Optionally Running MassChroQ

Once all the configuration has been done, the user can either only save the masschroqml file by clicking on the Save File button or immediately start MassChroQ by clicking on the Run MassChroQ button.

Note

Even if the user chooses the direct Run MassChroQ route, the program asks to save the masschroqml file, because MassChroQ reads that file when i2MassChroQ internally calls it to run the quantification process.
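To run MassChroQ later and outside of i2MassChroQ on a saved masschroqml file, a script might do something like the following Python sketch. It assumes that the MassChroQ executable is called masschroq and accepts the masschroqml file path as a positional argument; the function names are hypothetical, and the MassChroQ documentation should be checked for the actual command line and its options (such as the number of threads or the temporary directory) before relying on it.

```python
import shutil
import subprocess
from pathlib import Path

def masschroq_command(masschroqml: Path, executable: str = "masschroq") -> list[str]:
    """Build the command line (assumption: the file path is the only argument)."""
    # Use the full path if the executable is found on PATH, else keep the name as given.
    return [shutil.which(executable) or executable, str(masschroqml)]

def run_masschroq(masschroqml: Path, executable: str = "masschroq") -> int:
    """Run MassChroQ on a saved masschroqml file and return its exit status."""
    if not masschroqml.is_file():
        raise FileNotFoundError(masschroqml)
    completed = subprocess.run(masschroq_command(masschroqml, executable))
    return completed.returncode
```

Keeping the command construction separate from the call makes it easy to print or log the exact command before launching a long quantification run.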

The masschroqml file describes the proteins and peptides that were retained during the protein identification results analysis session. The contents of the file are shown in Figure 7.10, “Contents of the masschroqml file”.


The masschroqml file contains all the required data and configuration bits to perform the XIC extractions for all the peptidic precursor ions that allowed identifying proteins. This file is read by the MassChroQ program. (In this screen dump, the file contents were obviously redacted for brevity.)

Figure 7.10: Contents of the masschroqml file