ms2pip

MS2PIP: Accurate and versatile peptide fragmentation spectrum prediction.

ms2pip.predict_single(peptidoform, model='HCD', model_dir=None)

Predict fragmentation spectrum for a single peptide.

Return type:

ProcessingResult
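A minimal usage sketch follows. It assumes that `peptidoform` may be given as a ProForma-style string with a charge suffix (for example `PEPTIDEK/2`), which is not stated above. The helpers `to_proforma` and `predict_one` are hypothetical names introduced for illustration, and ms2pip is imported lazily so the string helper works without it installed.

```python
def to_proforma(sequence: str, charge: int) -> str:
    """Build a ProForma-style string with a precursor charge suffix."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return f"{sequence}/{charge}"


def predict_one(sequence: str, charge: int, model: str = "HCD"):
    """Predict one spectrum; requires ms2pip to be installed."""
    import ms2pip  # imported lazily so to_proforma works without ms2pip

    # Assumption: predict_single accepts a ProForma string as `peptidoform`.
    return ms2pip.predict_single(to_proforma(sequence, charge), model=model)
```

Calling `predict_one("PEPTIDEK", 2)` would then return a single ProcessingResult, provided the assumption about string input holds.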

ms2pip.predict_batch(psms, add_retention_time=False, psm_filetype=None, model='HCD', model_dir=None, processes=None)

Predict fragmentation spectra for a batch of peptides.

Parameters:
  • psms (PSMList | str | Path) – PSMList or path to PSM file that is supported by psm_utils.

  • psm_filetype (str | None) – Filetype of the PSM file. By default, None. Should be one of the supported psm_utils filetypes. See https://psm-utils.readthedocs.io/en/stable/#supported-file-formats.

  • add_retention_time (bool) – Add retention time predictions with DeepLC (Requires optional DeepLC dependency).

  • model (str | None) – Model to use for prediction. Default: “HCD”.

  • model_dir (str | Path | None) – Directory where XGBoost model files are stored. Default: ~/.ms2pip.

  • processes (int | None) – Number of parallel processes for multiprocessing steps. By default, all available.

Returns:

predictions – Predicted spectra with theoretical m/z and predicted intensity values.

Return type:

List[ProcessingResult]
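The parameters above can be collected as in the following sketch. The helper names `batch_arguments` and `run_batch` and the file name used in the usage note are hypothetical; only the keyword names and defaults come from the signature documented here.

```python
from pathlib import Path
from typing import Optional


def batch_arguments(
    psm_file: str,
    psm_filetype: Optional[str] = None,
    processes: Optional[int] = None,
) -> dict:
    """Collect keyword arguments matching the predict_batch signature."""
    return {
        "psms": Path(psm_file),
        "psm_filetype": psm_filetype,
        "add_retention_time": False,  # True requires the optional DeepLC dependency
        "model": "HCD",
        "model_dir": None,  # None selects the documented default, ~/.ms2pip
        "processes": processes,  # None uses all available processes
    }


def run_batch(psm_file: str, **options):
    """Run a batch prediction; requires ms2pip to be installed."""
    import ms2pip  # imported lazily so batch_arguments works without ms2pip

    return ms2pip.predict_batch(**batch_arguments(psm_file, **options))
```

For example, `run_batch("psms.tsv", psm_filetype="tsv", processes=4)` would return a list of ProcessingResult objects, assuming `psms.tsv` exists and `"tsv"` is among the psm_utils filetypes available in the installed version.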

ms2pip.predict_library()

Predict spectral library from protein FASTA file.

ms2pip.correlate(psms, spectrum_file, psm_filetype=None, spectrum_id_pattern=None, compute_correlations=False, add_retention_time=False, model='HCD', model_dir=None, ms2_tolerance=0.02, processes=None)

Compare predicted and observed intensities and optionally compute correlations.

Parameters:
  • psms (PSMList | str | Path) – PSMList or path to PSM file that is supported by psm_utils.

  • spectrum_file (str | Path) – Path to spectrum file with target intensities.

  • psm_filetype (str | None) – Filetype of the PSM file. By default, None. Should be one of the supported psm_utils filetypes. See https://psm-utils.readthedocs.io/en/stable/#supported-file-formats.

  • spectrum_id_pattern (str | None) – Regular expression pattern to apply to spectrum titles before matching to peptide file spec_id entries.

  • compute_correlations (bool) – Compute correlations between predictions and targets.

  • add_retention_time (bool) – Add retention time predictions with DeepLC (Requires optional DeepLC dependency).

  • model (str | None) – Model to use for prediction. Default: “HCD”.

  • model_dir (str | Path | None) – Directory where XGBoost model files are stored. Default: ~/.ms2pip.

  • ms2_tolerance (float) – MS2 tolerance in Da for observed spectrum peak annotation. By default, 0.02 Da.

  • processes (int | None) – Number of parallel processes for multiprocessing steps. By default, all available.

Returns:

results – Predicted spectra with theoretical m/z and predicted intensity values, and optionally, correlations.

Return type:

List[ProcessingResult]
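To illustrate how a `spectrum_id_pattern` might reduce a spectrum title to an identifier, here is a standalone sketch using only the standard library. It assumes the first capture group of the pattern is taken as the identifier; ms2pip's actual matching rule is not specified above and may differ. The function name `normalize_title` and the example title are hypothetical.

```python
import re
from typing import Optional


def normalize_title(title: str, pattern: Optional[str]) -> str:
    """Return the part of a spectrum title used for matching to a PSM."""
    if pattern is None:
        return title  # no pattern: use the title unchanged
    match = re.match(pattern, title)
    if match is None:
        raise ValueError(f"pattern {pattern!r} does not match title {title!r}")
    # Assumption: the first capture group holds the spectrum identifier.
    return match.group(1)


title = "controllerType=0 controllerNumber=1 scan=1234"
print(normalize_title(title, r".*scan=(\d+)$"))  # prints 1234
```

Testing a pattern this way against a few titles from the spectrum file before passing it to `correlate` can catch identifier mismatches early.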

ms2pip.get_training_data(psms, spectrum_file, psm_filetype=None, spectrum_id_pattern=None, model='HCD', ms2_tolerance=0.02, processes=None)

Extract feature vectors and target intensities from observed spectra for training.

Parameters:
  • psms (PSMList | str | Path) – PSMList or path to PSM file that is supported by psm_utils.

  • spectrum_file (str | Path) – Path to spectrum file with target intensities.

  • psm_filetype (str | None) – Filetype of the PSM file. By default, None. Should be one of the supported psm_utils filetypes. See https://psm-utils.readthedocs.io/en/stable/#supported-file-formats.

  • spectrum_id_pattern (str | None) – Regular expression pattern to apply to spectrum titles before matching to peptide file spec_id entries.

  • model (str | None) – Model to use as reference for the ion types that are extracted from the observed spectra. Default: “HCD”, which results in the extraction of singly charged b- and y-ions.

  • ms2_tolerance (float) – MS2 tolerance in Da for observed spectrum peak annotation. By default, 0.02 Da.

  • processes (int | None) – Number of parallel processes for multiprocessing steps. By default, all available.

Returns:

features – Feature vectors and target intensities.

Return type:

pandas.DataFrame

ms2pip.annotate_spectra(psms, spectrum_file, psm_filetype=None, spectrum_id_pattern=None, model='HCD', ms2_tolerance=0.02, processes=None)

Annotate observed spectra.

Parameters:
  • psms (PSMList | str | Path) – PSMList or path to PSM file that is supported by psm_utils.

  • spectrum_file (str | Path) – Path to spectrum file with target intensities.

  • psm_filetype (str | None) – Filetype of the PSM file. By default, None. Should be one of the supported psm_utils filetypes. See https://psm-utils.readthedocs.io/en/stable/#supported-file-formats.

  • spectrum_id_pattern (str | None) – Regular expression pattern to apply to spectrum titles before matching to peptide file spec_id entries.

  • model (str | None) – Model to use as reference for the ion types that are extracted from the observed spectra. Default: “HCD”, which results in the extraction of singly charged b- and y-ions.

  • ms2_tolerance (float) – MS2 tolerance in Da for observed spectrum peak annotation. By default, 0.02 Da.

  • processes (int | None) – Number of parallel processes for multiprocessing steps. By default, all available.

Returns:

results – List of ProcessingResult objects with theoretical m/z and observed intensity values.

Return type:

List[ProcessingResult]
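The role of `ms2_tolerance` can be illustrated with a small standalone sketch. It assumes that, for each theoretical m/z, the most intense observed peak within the tolerance window (in Da) is selected and that 0.0 is reported when no peak falls inside the window; this is an illustrative rule, not necessarily the one ms2pip implements. The function name `annotate_peaks` is hypothetical.

```python
from typing import List, Sequence


def annotate_peaks(
    theoretical_mz: Sequence[float],
    observed_mz: Sequence[float],
    observed_intensity: Sequence[float],
    ms2_tolerance: float = 0.02,
) -> List[float]:
    """Pick, per theoretical m/z, the most intense peak within the tolerance."""
    annotated = []
    for target in theoretical_mz:
        best = 0.0  # reported when no observed peak lies within the window
        for mz, intensity in zip(observed_mz, observed_intensity):
            if abs(mz - target) <= ms2_tolerance and intensity > best:
                best = intensity
        annotated.append(best)
    return annotated


print(annotate_peaks(
    [100.0, 200.0, 300.0],
    [100.01, 100.015, 200.05, 299.99],
    [10.0, 25.0, 50.0, 5.0],
))  # prints [25.0, 0.0, 5.0]
```

The peak at 200.05 lies 0.05 Da from its target and is therefore ignored at the default tolerance of 0.02 Da.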

ms2pip.download_models(models=None, model_dir=None)

Download all specified models to the specified directory.

Parameters:
  • models (List[str] | None) – List of models to download. If not specified, all models will be downloaded.

  • model_dir (str | Path | None) – Directory where XGBoost model files are to be stored. Default: ~/.ms2pip.
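The sketch below shows one way to resolve the documented default directory before downloading. The helpers `resolve_model_dir` and `fetch_models` are hypothetical; the assumption that `None` maps to `~/.ms2pip` in the user's home directory follows the default stated above, but the library's own path resolution may differ in detail.

```python
from pathlib import Path
from typing import List, Optional, Union


def resolve_model_dir(model_dir: Optional[Union[str, Path]] = None) -> Path:
    """Return the model directory, using ~/.ms2pip when none is given."""
    if model_dir is None:
        return Path.home() / ".ms2pip"
    return Path(model_dir).expanduser()


def fetch_models(
    models: Optional[List[str]] = None,
    model_dir: Optional[Union[str, Path]] = None,
) -> Path:
    """Download models and return their directory; requires ms2pip."""
    import ms2pip  # imported lazily so resolve_model_dir works without ms2pip

    target = resolve_model_dir(model_dir)
    # models=None downloads all models, as documented above.
    ms2pip.download_models(models=models, model_dir=target)
    return target
```

For example, `fetch_models(["HCD"])` would download only the HCD model into the default directory.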