ms2pip
MS2PIP: Accurate and versatile peptide fragmentation spectrum prediction.
- ms2pip.predict_single(peptidoform, model='HCD', model_dir=None)
Predict fragmentation spectrum for a single peptide.
- Parameters:
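peptidoform (Peptidoform | str) – Peptidoform to predict the fragmentation spectrum for.
model (str | None) – Model to use for prediction. Default: “HCD”.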
- Return type:
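A minimal sketch of calling predict_single with a peptidoform given as a string. It assumes the string follows the ProForma convention of sequence, slash, precursor charge (for example ACDEK/2); the helper names to_proforma and predict_spectrum are hypothetical and not part of ms2pip.

```python
def to_proforma(sequence: str, charge: int) -> str:
    """Build a ProForma-style string such as 'ACDEK/2' (sequence/charge)."""
    if charge < 1:
        raise ValueError("precursor charge must be a positive integer")
    return f"{sequence}/{charge}"


def predict_spectrum(sequence: str, charge: int, model: str = "HCD"):
    """Predict one spectrum with ms2pip.predict_single (hypothetical wrapper)."""
    # Imported lazily so that to_proforma stays usable without ms2pip installed.
    import ms2pip

    return ms2pip.predict_single(to_proforma(sequence, charge), model=model)
```

With ms2pip installed, predict_spectrum("ACDEK", 2) would return the prediction for the doubly charged peptide ACDEK using the default HCD model.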
- ms2pip.predict_batch(psms, add_retention_time=False, psm_filetype=None, model='HCD', model_dir=None, processes=None)
Predict fragmentation spectra for a batch of peptides.
- Parameters:
psms (PSMList | str | Path) – PSMList or path to PSM file that is supported by psm_utils.
psm_filetype (str | None) – Filetype of the PSM file. By default, None. Should be one of the supported psm_utils filetypes. See https://psm-utils.readthedocs.io/en/stable/#supported-file-formats.
add_retention_time (bool) – Add retention time predictions with DeepLC (Requires optional DeepLC dependency).
model (str | None) – Model to use for prediction. Default: “HCD”.
model_dir (str | Path | None) – Directory where XGBoost model files are stored. Default: ~/.ms2pip.
processes (int | None) – Number of parallel processes for multiprocessing steps. By default, all available.
- Returns:
predictions – Predicted spectra with theoretical m/z and predicted intensity values.
- Return type:
List[ProcessingResult]
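A minimal sketch of batch prediction from an in-memory PSMList rather than a PSM file. It assumes psm_utils is installed and that peptidoforms are given as ProForma strings; unique_peptidoforms, build_psm_list and predict_spectra are hypothetical helper names, and using list positions as spectrum IDs is an illustrative choice only.

```python
def unique_peptidoforms(peptidoforms):
    """Drop duplicate peptidoform strings while keeping their original order."""
    return list(dict.fromkeys(peptidoforms))


def build_psm_list(peptidoforms):
    """Wrap ProForma strings in a psm_utils PSMList (hypothetical helper)."""
    # Imported lazily so that unique_peptidoforms works without psm_utils.
    from psm_utils import PSM, PSMList

    return PSMList(
        psm_list=[
            PSM(peptidoform=peptidoform, spectrum_id=str(index))
            for index, peptidoform in enumerate(peptidoforms)
        ]
    )


def predict_spectra(peptidoforms, model="HCD", processes=None):
    """Predict spectra for all unique peptidoforms with ms2pip.predict_batch."""
    import ms2pip

    psm_list = build_psm_list(unique_peptidoforms(peptidoforms))
    return ms2pip.predict_batch(psm_list, model=model, processes=processes)
```

Leaving processes as None keeps the documented default of using all available processes; passing a path to a PSM file instead of a PSMList would additionally need psm_filetype when the file type should be stated explicitly.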
- ms2pip.correlate(psms, spectrum_file, psm_filetype=None, spectrum_id_pattern=None, compute_correlations=False, add_retention_time=False, model='HCD', model_dir=None, ms2_tolerance=0.02, processes=None)
Compare predicted and observed intensities and optionally compute correlations.
- Parameters:
psms (PSMList | str | Path) – PSMList or path to PSM file that is supported by psm_utils.
spectrum_file (str | Path) – Path to spectrum file with target intensities.
psm_filetype (str | None) – Filetype of the PSM file. By default, None. Should be one of the supported psm_utils filetypes. See https://psm-utils.readthedocs.io/en/stable/#supported-file-formats.
spectrum_id_pattern (str | None) – Regular expression pattern to apply to spectrum titles before matching to peptide file spec_id entries.
compute_correlations (bool) – Compute correlations between predictions and targets.
add_retention_time (bool) – Add retention time predictions with DeepLC (Requires optional DeepLC dependency).
model (str | None) – Model to use for prediction. Default: “HCD”.
model_dir (str | Path | None) – Directory where XGBoost model files are stored. Default: ~/.ms2pip.
ms2_tolerance (float) – MS2 tolerance in Da for observed spectrum peak annotation. By default, 0.02 Da.
processes (int | None) – Number of parallel processes for multiprocessing steps. By default, all available.
- Returns:
results – Predicted spectra with theoretical m/z and predicted intensity values, and optionally, correlations.
- Return type:
List[ProcessingResult]
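To illustrate what a correlation between predicted and observed intensities measures, here is a plain-Python sketch of the Pearson correlation coefficient. This is an assumption for illustration: the text above does not state which correlation metric or intensity transformation ms2pip applies, and the function name pearson is hypothetical.

```python
import math


def pearson(predicted, observed):
    """Pearson correlation between two equally long intensity sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sums of centered products; the 1/n factors cancel in the ratio below.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        return float("nan")  # undefined when one sequence is constant
    return cov / math.sqrt(var_p * var_o)
```

A value near 1 means the predicted peak intensities rise and fall together with the observed ones; a constant sequence gives an undefined result, returned here as NaN.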
- ms2pip.get_training_data(psms, spectrum_file, psm_filetype=None, spectrum_id_pattern=None, model='HCD', ms2_tolerance=0.02, processes=None)
Extract feature vectors and target intensities from observed spectra for training.
- Parameters:
psms (PSMList | str | Path) – PSMList or path to PSM file that is supported by psm_utils.
spectrum_file (str | Path) – Path to spectrum file with target intensities.
psm_filetype (str | None) – Filetype of the PSM file. By default, None. Should be one of the supported psm_utils filetypes. See https://psm-utils.readthedocs.io/en/stable/#supported-file-formats.
spectrum_id_pattern (str | None) – Regular expression pattern to apply to spectrum titles before matching to peptide file spec_id entries.
model (str | None) – Model to use as reference for the ion types that are extracted from the observed spectra. Default: “HCD”, which results in the extraction of singly charged b- and y-ions.
ms2_tolerance (float) – MS2 tolerance in Da for observed spectrum peak annotation. By default, 0.02 Da.
processes (int | None) – Number of parallel processes for multiprocessing steps. By default, all available.
- Returns:
features – pandas.DataFrame with feature vectors and targets.
- Return type:
pandas.DataFrame
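The spectrum_id_pattern parameter described above can be pictured as follows: a regular expression is applied to each spectrum title, and the extracted part is what gets matched against the spec_id entries of the PSMs. This sketch assumes the first capture group holds the identifier and that a catch-all pattern keeps the title unchanged; both are assumptions, and extract_spec_id is a hypothetical name.

```python
import re


def extract_spec_id(title: str, pattern: str = r"(.*)") -> str:
    """Return the first capture group of `pattern` matched against `title`.

    `pattern` must contain at least one capture group.
    """
    match = re.match(pattern, title)
    if match is None:
        raise ValueError(f"pattern {pattern!r} does not match title {title!r}")
    return match.group(1)
```

For example, with titles such as "controllerType=0 controllerNumber=1 scan=1234", the pattern r".*scan=(\d+)" reduces each title to its scan number so that it can be compared with numeric spec_id values.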
- ms2pip.annotate_spectra(psms, spectrum_file, psm_filetype=None, spectrum_id_pattern=None, model='HCD', ms2_tolerance=0.02, processes=None)
Annotate observed spectra.
- Parameters:
psms (PSMList | str | Path) – PSMList or path to PSM file that is supported by psm_utils.
spectrum_file (str | Path) – Path to spectrum file with target intensities.
psm_filetype (str | None) – Filetype of the PSM file. By default, None. Should be one of the supported psm_utils filetypes. See https://psm-utils.readthedocs.io/en/stable/#supported-file-formats.
spectrum_id_pattern (str | None) – Regular expression pattern to apply to spectrum titles before matching to peptide file spec_id entries.
model (str | None) – Model to use as reference for the ion types that are extracted from the observed spectra. Default: “HCD”, which results in the extraction of singly charged b- and y-ions.
ms2_tolerance (float) – MS2 tolerance in Da for observed spectrum peak annotation. By default, 0.02 Da.
processes (int | None) – Number of parallel processes for multiprocessing steps. By default, all available.
- Returns:
results – List of ProcessingResult objects with theoretical m/z and observed intensity values.
- Return type:
List[ProcessingResult]
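To make the role of ms2_tolerance concrete, the following sketch annotates theoretical fragment m/z values with observed peaks that fall within an absolute tolerance in Da. It is not the ms2pip implementation: picking the most intense peak inside the window and reporting 0.0 for unmatched ions are assumptions, and annotate_peaks is a hypothetical name.

```python
def annotate_peaks(theoretical_mz, observed_peaks, ms2_tolerance=0.02):
    """Assign an observed intensity to each theoretical fragment m/z.

    observed_peaks is a list of (m/z, intensity) tuples. For each theoretical
    m/z, the most intense observed peak within +/- ms2_tolerance Da is used;
    ions without any peak in the window get 0.0.
    """
    annotated = []
    for mz in theoretical_mz:
        in_window = [
            intensity
            for observed_mz, intensity in observed_peaks
            if abs(observed_mz - mz) <= ms2_tolerance
        ]
        annotated.append(max(in_window, default=0.0))
    return annotated
```

A wider tolerance matches more peaks but raises the chance of assigning an unrelated peak to a fragment ion, which is why the value should reflect the mass accuracy of the instrument.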