Context-sensitive Markov Models for Peptide Scoring and Identification from Tandem Mass Spectrometry

Grover, Himanshu (2013) Context-sensitive Markov Models for Peptide Scoring and Identification from Tandem Mass Spectrometry. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Computational methods for peptide identification via tandem mass spectrometry (MS/MS) lie at the heart of proteomic characterization of biological samples. Due to the complex nature of the peptide fragmentation process inside mass spectrometers, most extant methods underutilize the intensity information available in the tandem mass spectrum. Further, high noise content and variability in MS/MS datasets present significant data analysis challenges. These factors contribute to loss of identifications, necessitating the development of more complex approaches.
This dissertation develops and evaluates a novel probabilistic framework called Context-Sensitive Peptide Identification (CSPI) for improving peptide scoring and identification from MS/MS data. Employing Input-Output Hidden Markov Models (IO-HMM), CSPI addresses the above computational challenges by modeling the effect of peptide physicochemical features ("context") on their observed (normalized) MS/MS spectrum intensities. The flexibility and scalability of the CSPI framework enable incorporation of many different kinds of features from the domain into the modeling task. Design choices also extend to the underlying parameter representation, allowing the model to learn complex probability distributions and dependencies embedded in the data.
Empirical evaluation on multiple datasets of varying sizes and complexity demonstrates that CSPI's intensity-based scores significantly improve peptide identification performance, identifying up to ~25% more peptides at 1% False Discovery Rate (FDR) as compared with popular state-of-the-art approaches. It is further shown that a weighted score combination procedure that includes CSPI scores along with other commonly used scores leads to greater discrimination between true and false identifications, achieving ~4-8% more correct identifications at 1% FDR compared with the case without CSPI features.
Superior performance of the CSPI framework has the potential to impact downstream proteomic investigations (such as protein identification, quantification and differential expression) that utilize results from peptide-level analyses. Because CSPI is computationally intensive, its design and implementation support efficient handling of large MS/MS datasets through database indexing and parallelization of the computational workflow on a multiprocessing architecture.
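
As an illustration of what "identifications at 1% FDR" means in practice, the following is a minimal sketch and not the dissertation's implementation. It assumes a target-decoy search in which each peptide-spectrum match carries a score (higher is better) and a decoy flag, and it estimates FDR as decoys divided by targets; the abstract does not state which FDR estimator CSPI uses. The function name `count_at_fdr` and the input layout are hypothetical.

```python
def count_at_fdr(psms, fdr_threshold=0.01):
    """Count target matches accepted at a given estimated FDR.

    psms: iterable of (score, is_decoy) pairs; higher score = better match.
    Assumption: FDR is estimated as (#decoys / #targets) among all matches
    scoring at or above the current cutoff (a common target-decoy estimate).
    """
    # Rank matches from best to worst score.
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    targets = 0
    decoys = 0
    best = 0  # largest number of targets seen at an acceptable FDR

    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the deepest cutoff whose estimated FDR is within the threshold.
        if targets > 0 and decoys / targets <= fdr_threshold:
            best = targets

    return best


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    print(count_at_fdr(toy, fdr_threshold=0.01))
```

Comparing this count for two scoring functions on the same spectra is one way a relative gain such as "up to ~25% more peptides at 1% FDR" could be computed under these assumptions.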

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Grover, Himanshu	hig2@pitt.edu	HIG2

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Gopalakrishnan, Vanathi	vanathi@pitt.edu	VANATHI
Committee Member	Wallstrom, Garrick	Garrick.Wallstrom@asu.edu
Committee Member	Visweswaran, Shyam	shv3@pitt.edu	SHV3
Committee Member	Wu, Christine C.	chriswu@pitt.edu	CHRISWU

Date:

3 January 2013

Date Type:

Publication

Defense Date:

28 September 2012

Approval Date:

3 January 2013

Submission Date:

10 November 2012

Access Restriction:

5 year -- Restrict access to University of Pittsburgh for a period of 5 years.

Number of Pages:

121

Institution:

University of Pittsburgh

Schools and Programs:

School of Medicine > Biomedical Informatics

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

Bioinformatics, Machine learning, Input-output Hidden Markov Models, Proteomics, Mass spectrometry, Computational Proteomics, Peptide Identification, Database Searching

Date Deposited:

03 Jan 2013 19:39

Last Modified:

03 Jan 2018 06:15

URI:

http://d-scholarship.pitt.edu/id/eprint/16333
