This chapter describes the very few specific procedures to carry out when the proteomics data at hand are from the Bruker timsTOF line of instruments.
The mass spectrometers from the timsTOF line of instruments by Bruker implement ion mobility mass spectrometry by trapping ions and subjecting them to a gas flow that moves them in the trap according to their collisional cross section. Interestingly, the outcome of that operation is the reverse of conventional drift tube-based ion mobility: large ions are released first and smaller ions are released last.
Apart from the observation above, the result is nonetheless that ions entering the instrument are separated according to a new dimension that is orthogonal to the other two retention time and m/z ratio dimensions: the ion mobility dimension. This new dimension inevitably introduces more complexity and greater volume to the mass data.
In a historic move, Bruker has decided to publish the technical details of
their data format. During the acquisition, data are stored in two separate
files located in their data directory that has the .d
extension:
analysis.tdf
: this file is a
SQLite3 relational database that contains
all the metadata about the acquisition. The generic
metadata term defines data that describe the
actual data. So this file contains data that explain how the data are
organized in the actual data file below.
analysis.tdf_bin
: this file is a binary-format
file that holds the data in the form of a succession of numbers packed
according to a specific scheme that Bruker has decided to make public.
The SQLite3 analysis.tdf
file contains a set of tables. Most often, records in one table make
reference (relate) to other record(s) in other
table(s). This is why this database file is said to be a
relational database. A view of the tables making that
relational database file is shown in Figure 8.1, “View of the relational database file”.
When dealing with proteomics projects that have their data originating in timsTOF instruments, some specific steps are to be taken so as to inform i2MassChroQ that specific handling is required. These will be reviewed below in the same succession as they need be implemented when running i2MassChroQ.
It is possible to load Bruker timsTOF data right in the i2MassChroQ program's graphical user interface, as shown in the window pictured in Figure 3.2, “X!Tandem-based identification configuration”. The very first specific step to take in this case is to select the data files by clicking the Add Bruker timsTOF folders.
The Add Bruker timsTOF folders button lets the user
choose the Bruker data directory (.d
extension) and then asks if the data to be
loaded are in the TDF or MGF format (see Figure 8.2, “File format selection dialog for Bruker timsTOF data”).
Indeed, the MGF file generated by the Bruker software is automatically
installed in the .d
extension
data directory.
The DataAnalysis software from Bruker allows one to export proteomics MS/MS data into MGF format files (Mascot generic format). Their native data format, though, is the TDF format. It is important to keep in mind that the MGF format only stores MS/MS spectral data, no MS data. By using this format, i2MassChroQ and MassChroQ won't be able to access MS data, which are required in a number of situations, in particular when extracting ion currents for given m/z ratios, for example, or for area-under-the-curve quantitative proteomics. It is thus always recommended to use the native TDF format whenever available.
When loading Bruker timsTOF data right into i2MassChroQ as described above, the software performs some under-the-hood operations that the user might want to be aware of. The hidden operations are unveiled in the following sections, as they involve command line programs shipped along with i2MassChroQ that might be of interest to the user.
X!Tandem needs the mass spectrometric data that it uses for the database searches to be in the mzXML format. For this very reason, i2MassChroQ cannot work by harnessing the capabilities of X!Tandem starting from Bruker timsTOF data. These data need to be converted to an mzXML file before they can then be fed to X!Tandem.
In order to be able to store a mzXML file on disk, the user may convert timsTOF data files (TDF or MGF) to mzXML using the mzxmlconverter program that is shipped along with i2MassChroQ. This program is a command line program that takes a data file in input and that writes a mzXML file in output. The command line syntax is easy, as picture in Figure 8.3, “Converting mass data files to mzXML data files”.
To obtain help about the program, run the following:
$
[16]
<path_to>/mzxmlconverter --help
RETURN
It is important to grasp that the mzXML file that is generated by mzxmlconverter does not contain all the mass data that are contained in the Bruker's timsTOF data files. When loading the produced mzXML format files in i2MassChroQ, the X!Tandem program will be able to perform peptide and protein identifications. Later, however, when the user will try to activate features in i2MassChroQ that require the original data, the software will not be able to provide the expected results, like XIC reports or ion mobility values, because it won't have access to the original data.
The mzxmlconverter program is practical because it allows storing the mzXML file on disk and loading it in i2MassChroQ for X!Tandem to consume it for the identifications. However, as stated in the warning above, that mzXML file has not a full copy of the data in the original mass spectrometry data file (be that a mzML, or MGF, or TDF file). i2MassChroQ has a solution for this problem: using an integrated workflow to convert the original data file to mzXML, make X!Tandem use it, write out the X!Tandem results file and finally rewrite that file into a new version by adding a connection between the X!Tandem run results and the original mass data file. In this way, when the user will activate features in i2MassChroQ that need accessing the original mass data file, the expected results will be effectively displayed. This process is described in detail in the next section.
In order to seamlessly use Bruker timsTOF data in the context of performing X!Tandem identifications and later MassChroQ-based quantifications, the tandemwrapper program is made available to users who want to perform database searches using the command line interface. The tandemwrapper program performs an under-the-hood file format conversion as described in the previous section before automatically feeding the generated mzXML file to X!Tandem. After X!Tandem has produced its results file, that file is rewritten by tandemwrapper in such a manner that the connection with the original mass spectrometry data files is reinstated for further use in the i2MassChroQ graphical interface.
There is, however, a way to convert Bruker timsTOF data files to mzXML using a standalone program, called tandemwrapper, that is shipped along with i2MassChroQ. The tandemwrapper program is a command line program that takes as input a XML configuration file. The XML file is most similar to the configuration file that X!Tandem uses.
To obtain help about the program, run the following:
$
<path_to>/tandemwrapper --help
RETURN
A typical tandemwrapper input configuration file is shown below:
<?xml version="1.0" encoding="UTF-8"?> <bioml label="example-tandemwrapper-mass-data-file.mzxml"> <note type="heading">Paths</note> <note type="input" label="list path, default parameters">/full_path_to/xtandem-presets-file.xml</note> <note type="input" label="list path, taxonomy information">/full_path_to/database.xml</note> <note type="input" label="spectrum, path">full_path_to/mass-data-file.mzxml</note> <note type="heading">Protein general</note> <note type="input" label="protein, taxon">usedefined</note> <note type="heading">Output</note> <note type="input" label="output, path">/full_path_to/tandemwrapper-output.xml</note> </bioml>
The XML configuration file that is provided to tandemwrapper on the command line is a replicate of the file that X!Tandem itself expects. That file is shown in Figure 8.4, “Configuration file for tandemwrapper”.
The following elements need an explanation:
default parameters: In the example, the
/full_path_to/xtandem-presets-file.xml
file is the X!Tandem
presets file, already discussed in Section 3.3, “Setting the X!Tandem Run Presets”.
taxonomy information: In the example, the
/full_path_to/database.xml
file is the file that
configures the location of the FASTA protein database files that are
searched by X!Tandem. This file is described below.
path: In the example, the
/full_path_to/mass-data-file.mzxml
file is the mass
spectrometry data file in the mzXML format.
The mass spectrometry data file might be of any format that can be handled by ProteoWizard (open data formats only, particularly mzML) and also the Bruker's timsTOF TDF format that is handled by our own code.
output, path: In the example, the
/full_path_to/tandemwrapper-output.xml
file is the
file in which the X!Tandem configuration file is written for immediate
use by X!Tandem.
The configuration file that indicates where the FASTA protein database files are located, that is referenced in the tandemwrapper input configuration file is shown in Figure 8.5, “Configuration file pointing at the FASTA protein databases”.
The tandwrapper program performs the following tasks in sequence:
Convert the input mass data file to the mzXML file format that X!Tandem needs to perform the database searches. This step is only performed if the original mass spectrometry data file has not the mzXML format. X!Tandem produces an identification results file in an XML format;
The mzXML format that is consumed by X!Tandem is a pretty simple format that was not designed to store a large variety of data/metadata, like ion mobility data, for example. For this very reason, tandemwrapper reads the identification results file produced by X!Tandem (that file is also in a specific XML format) and rewrites it to a new analogous file that has all the necessary connections to the original mass data file. In this way, when the new version of the X!Tandem identification results file is loaded in i2MassChroQ all the original mass data can be accessed to provide the user with all the data, like XIC chromatograms, ion mobility data, for example. To load the X!Tandem identification results, see Section 3.4, “Loading the Protein Identification Results”.