Usage
Quickstart
[todo]
Usage modes
MS²PIP has various usage modes that each can be accessed through the command-line interface, or through the Python API.
predict-single
In this mode, a single peptide spectrum is predicted with MS²PIP and optionally plotted with spectrum_utils. For instance,
ms2pip predict-single "PGAQANPYSR/3" --model TMT
results in:
predict-batch
Provide a list of peptidoforms (see Peptides / PSMs) to predict multiple spectra at once. For instance,
ms2pip predict-batch peptides.tsv --model TMT
results in a file test_predictions.csv
with the predicted spectra.
predict-library
Predict spectra for a full peptide search space generated from a protein FASTA file. Various
peptide search space parameters can be configured to control the peptidoforms that are generated.
See ms2pip.search_space
for more information.
This mode was first developed in collaboration with the ProGenTomics group for the MS²PIP for DIA project.
correlate
Predict spectrum intensities for a list of peptides and correlate them with observed intensities from a spectrum file. This mode is useful for evaluating MS²PIP models or for (re)scoring peptide-spectrum matches.
get-training-data
Given a list of peptides and corresponding spectra, generate training data for MS²PIP. This includes observed intensities for the supported ion types and the feature vectors for each ion. For more info, see Training new MS²PIP models.
annotate-spectra
Given a list of peptides annotate the peaks in the corresponding spectra.
Input
Peptides / PSMs
PSM file types
For peptide information input, MS²PIP accepts any file format that is supported by
psm_utils
.See
Supported file formats for
the full list. The simplest format is a tab-separated file with at least the columns
peptidoform
and spectrum_id
present.
peptidoform
is the full ProForma 2.0 notation including amino acid modifications and precursor charge state.spectrum_id
should match theTITLE
ornativeID
field of the related spectrum in the optional MGF or mzML file, if provided. Otherwise, any value is accepted.
For example:
peptidoform spectrum_id
RNVIM[Oxidation]DKVAK/2 1
KHLEQHPK/2 2
...
See psm_utils.io.tsv
for the full specification.
Peptide sequence properties
Peptides must be strictly longer than 2 and shorter than 100 amino acids and cannot contain the following amino acid one-letter codes: B, J, O, U, X or Z. Peptides not fulfilling these requirements will be filtered out and will not be reported in the output.
Amino acid modifications
Amino acid modification labels must be resolvable to a known mass shift. This means that accepted labels are:
A name or accession from an controlled vocabulary, such as Unimod or PSI-MOD. (e.g.,
Oxidation
,U:Oxidation
,U:35
,MOD:00046
…)An elemental formula (e.g,
Formula:C12H20O2
)A mass shift in Da (e.g.,
+15.9949
)
Any unresolvable modification will result in an error. If needed, PSM files can be converted with
psm_utils.io
and modifications can be renamed with the
rename_modifications()
method.
Spectrum file
In the correlate and get-training-data usage modes, an MGF or mzML file with observed
spectra must be provided to MS²PIP. Make sure that the PSM file spectrum_id
matches the MGF
TITLE
field or mzML nativeID
fields. Spectra present in the spectrum file, but missing in
the PSM file (and vice versa) will be skipped.
Output
The predictions are saved in the output file(s) specified command. Note that the normalization of intensities depends on the output file format. In the CSV file output, intensities are log2-transformed. To “unlog” the intensities, use the following formula:
intensity = (2 ** log2_intensity) - 0.001