
Context-sensitive Markov Models for Peptide Scoring and Identification from Tandem Mass Spectrometry

Grover, Himanshu (2013) Context-sensitive Markov Models for Peptide Scoring and Identification from Tandem Mass Spectrometry. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Computational methods for peptide identification via tandem mass spectrometry (MS/MS) lie at the heart of proteomic characterization of biological samples. Due to the complex nature of the peptide fragmentation process inside mass spectrometers, most extant methods underutilize the intensity information available in the tandem mass spectrum. Further, high noise content and variability in MS/MS datasets present significant data analysis challenges. These factors contribute to a loss of identifications, necessitating the development of more complex approaches.
This dissertation develops and evaluates a novel probabilistic framework called Context-Sensitive Peptide Identification (CSPI) for improving peptide scoring and identification from MS/MS data. Employing Input-Output Hidden Markov Models (IO-HMM), CSPI addresses the above computational challenges by modeling the effect of peptide physicochemical features ("context") on their observed (normalized) MS/MS spectrum intensities. The flexibility and scalability of the CSPI framework enable the incorporation of many different kinds of features from the domain into the modeling task. Its design choices, including the underlying parameter representation, allow it to learn complex probability distributions and dependencies embedded in the data.
Empirical evaluation on multiple datasets of varying sizes and complexity demonstrates that CSPI's intensity-based scores significantly improve peptide identification performance, identifying up to ~25% more peptides at 1% False Discovery Rate (FDR) as compared with popular state-of-the-art approaches. It is further shown that a weighted score combination procedure that includes CSPI scores along with other commonly used scores leads to greater discrimination between true and false identifications, achieving ~4-8% more correct identifications at 1% FDR compared with the case without CSPI features.
The superior performance of the CSPI framework has the potential to impact downstream proteomic investigations (such as protein identification, quantification and differential expression) that utilize results from peptide-level analyses. Because CSPI is computationally intensive, its design and implementation support efficient handling of large MS/MS datasets, achieved through database indexing and parallelization of the computational workflow using a multiprocessing architecture.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators | Email | Pitt Username | ORCID
Grover, Himanshu | hig2@pitt.edu | HIG2 |
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Gopalakrishnan, Vanathi | vanathi@pitt.edu | VANATHI |
Committee Member | Wallstrom, | | |
Committee Member | Visweswaran, Shyam | shv3@pitt.edu | SHV3 |
Committee Member | Wu, Christine C. | chriswu@pitt.edu | CHRISWU |
Date: 3 January 2013
Date Type: Publication
Defense Date: 28 September 2012
Approval Date: 3 January 2013
Submission Date: 10 November 2012
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 121
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Bioinformatics, Machine learning, Input-output Hidden Markov Models, Proteomics, Mass spectrometry, Computational Proteomics, Peptide Identification, Database Searching
Date Deposited: 03 Jan 2013 19:39
Last Modified: 03 Jan 2018 06:15

