The masschroqML format is the input file format of MassChroQ. It is an XML format, whose schema can be found here. In this input file the user specifies the data to analyse, the alignments to perform on them, and the analysis parameters: everything MassChroQ needs to know to analyse the data. On our Download page we provide several masschroqML example files, corresponding to different analysis cases. To run an analysis, take one of these files and edit it to fit your data.

I have LC-MS/MS raw files coming from my spectrometer that I want to process with MassChroQ. How do I proceed?
The classical workflow of LC-MS/MS data processing can be summarized in the following steps:
-
Conversion of data from RAW format to mzXML or mzML file format.
The RAW format is a proprietary, closed binary format that cannot be decoded without the manufacturer's libraries. In contrast, mzXML and mzML are open standard proteomics formats whose content is fully transparent. MassChroQ accepts only the mzXML and mzML data formats. Several free and open source tools exist to convert your raw data to one of these formats. The Seattle Proteome Center lists the alternatives for specific spectrometer types on this webpage.
-
Protein identification and validation
MassChroQ does not perform protein identification and validation, but it facilitates the automatic export of identification data into your masschroqML input file. See the dedicated FAQ question for details.
-
Preparing the masschroqML input file to MassChroQ
If you are not using X!TandemPipeline, which exports your identification results to a ready-to-use masschroqML file, take one of the example masschroqML input files from our Documentation page. Edit this file and modify the following lines:
- the data_file lines should point to your mzXML/mzML data to analyse;
- you should check and modify the groups block in order to form the groups that best fit your analysis (see dedicated question on groups);
- the peptide_files lines should point to your peptide identification result files;
- you should adapt the parameters of the alignment_method and quantification_method lines;
- check that the align lines reference your groups, the desired reference sample in each group and the desired previously defined alignment_method;
- check that the quantify lines reference your groups, the desired quantification mode and the desired previously defined quantification_method;
- choose the desired formats and file names for the results in the quantification_results block;
- choose the desired type of traces and output directory for the XIC traces in the quantification_traces block.
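The first edit in the list above (pointing the data_file lines to your own files) can also be scripted. The following is only a minimal sketch: the `id` and `path` attribute names and the XML fragment are assumptions made for illustration, so check them against the example files and the masschroqML schema before relying on it. Note also that Python's standard parser drops XML comments when the file is rewritten.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def repoint_data_files(xml_text, new_paths):
    """Return xml_text with data_file paths replaced.

    new_paths maps a data_file id to the new file path.
    ASSUMPTION: each data_file element carries 'id' and 'path' attributes.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) == "data_file" and elem.get("id") in new_paths:
            elem.set("path", new_paths[elem.get("id")])
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment, NOT copied from a real masschroqML example file.
EXAMPLE = """<masschroq>
  <rawdata>
    <data_file id="samp0" path="example1.mzXML"/>
    <data_file id="samp1" path="example2.mzXML"/>
  </rawdata>
</masschroq>"""

# Point sample 'samp0' at a (hypothetical) local file; 'samp1' is left untouched.
updated = repoint_data_files(EXAMPLE, {"samp0": "/data/run_A.mzXML"})
print(updated)
```

Keying the replacement on the sample identifier rather than on the old path keeps the group definitions valid, as long as they refer to samples by identifier.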
-
Launch MassChroQ, check execution and results
Launch the masschroq command on the previously edited masschroqML file and check the execution messages on the command line. If an error occurs (“Oops, an error occurred in MassChroQ” message), it is most often caused by an error in the masschroqML file. The error message gives the line where the error is located; check this line carefully and look for possible causes. Most masschroqML errors are due to bad file names (files that do not exist on the computer), bad references (to sample or group names that do not exist), or tag mismatches. A tag mismatch is an XML syntax error: you have probably forgotten to close an XML block or an attribute. In such cases, take a closer look at the masschroqML syntax in the examples or in the masschroqML schema we provide.

If no errors are encountered, when MassChroQ finishes you will find the following spreadsheet files on your system:
- the .time and .trace files produced by the alignment;
- the quantitative results, ready for statistical analysis;
- the XIC traces: one trace file per XIC, containing the XICs before and after filtering, and the detected and matched peaks.
Since these are spreadsheet files, you can run your own scripts or pipelines on them to process the data and check your results. We also provide some utility R and Perl scripts that can help you with this.
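The three common error causes (tag mismatches, missing files, references to unknown samples) can be checked before launching masschroq. The sketch below is not part of MassChroQ. It assumes that data_file elements have `id` and `path` attributes and that group elements list their samples in a space-separated `data_ids` attribute; these names and the sample fragments are assumptions to verify against the masschroqML schema.

```python
import os
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def preflight(xml_text, base_dir="."):
    """Return a list of human-readable problems found in a masschroqML text."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # A tag mismatch ends up here; the parser reports where it stopped.
        line, column = err.position
        return [f"XML syntax error at line {line}, column {column}: {err}"]

    problems = []
    sample_ids = set()
    # ASSUMPTION: data_file elements carry 'id' and 'path' attributes.
    for elem in root.iter():
        if local_name(elem.tag) == "data_file":
            sample_ids.add(elem.get("id"))
            path = elem.get("path", "")
            if not os.path.isfile(os.path.join(base_dir, path)):
                problems.append(
                    f"data_file '{elem.get('id')}': file not found: {path}"
                )
    # ASSUMPTION: group elements list sample ids in a 'data_ids' attribute.
    for elem in root.iter():
        if local_name(elem.tag) == "group":
            for ref in elem.get("data_ids", "").split():
                if ref not in sample_ids:
                    problems.append(
                        f"group '{elem.get('id')}': unknown sample '{ref}'"
                    )
    return problems


# Hypothetical fragments for illustration only.
BROKEN = "<masschroq><rawdata><data_file id='s0' path='a.mzXML'></rawdata></masschroq>"
DANGLING = (
    "<masschroq><rawdata>"
    "<data_file id='s0' path='no_such_file_for_this_sketch.mzXML'/>"
    "</rawdata><groups><group id='G1' data_ids='s0 s9'/></groups></masschroq>"
)

for label, text in (("BROKEN", BROKEN), ("DANGLING", DANGLING)):
    for problem in preflight(text):
        print(f"{label}: {problem}")
```

To check a real file, read its content and pass the file's directory as `base_dir`, so that relative paths are resolved from the location of the masschroqML file. An empty list only means that none of these three checks failed; it does not replace validation against the schema.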