Grover, Himanshu
(2013)
Context-sensitive Markov Models for Peptide Scoring and Identification from Tandem Mass Spectrometry.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Computational methods for peptide identification via tandem mass spectrometry (MS/MS) lie at the heart of proteomic characterization of biological samples. Due to the complex nature of the peptide fragmentation process inside mass spectrometers, most extant methods underutilize the intensity information available in the tandem mass spectrum. Further, high noise content and variability in MS/MS datasets present significant data analysis challenges. These factors contribute to loss of identifications, necessitating the development of more complex approaches.
This dissertation develops and evaluates a novel probabilistic framework called Context-Sensitive Peptide Identification (CSPI) for improving peptide scoring and identification from MS/MS data. Employing Input-Output Hidden Markov Models (IO-HMM), CSPI addresses the above computational challenges by modeling the effect of peptide physicochemical features ("context") on their observed (normalized) MS/MS spectrum intensities. The flexibility and scalability of the CSPI framework enable incorporation of many different kinds of features from the domain into the modeling task. Design choices also include the underlying parameter representation and allow learning of complex probability distributions and dependencies embedded in the data.
Empirical evaluation on multiple datasets of varying sizes and complexity demonstrates that CSPI's intensity-based scores significantly improve peptide identification performance, identifying up to ~25% more peptides at 1% False Discovery Rate (FDR) as compared with popular state-of-the-art approaches. It is further shown that a weighted score combination procedure that includes CSPI scores along with other commonly used scores leads to greater discrimination between true and false identifications, achieving ~4-8% more correct identifications at 1% FDR compared with the case without CSPI features.
Superior performance of the CSPI framework has the potential to impact downstream proteomic investigations (like protein identification, quantification and differential expression) that utilize results from peptide-level analyses. Because CSPI is computationally intensive, its design and implementation support efficient handling of large MS/MS datasets through database indexing and parallelization of the computational workflow using a multiprocessing architecture.
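The abstract reports its gains as counts of peptide identifications accepted at 1% FDR. As a minimal sketch of how such a count can be computed, the Python below assumes a target-decoy estimate of FDR (decoys divided by targets above a score cutoff). The abstract does not state how FDR was estimated, so this convention, the function name `count_at_fdr`, and the `(score, is_decoy)` input layout are all illustrative assumptions and not the dissertation's procedure.

```python
def count_at_fdr(psms, fdr_threshold=0.01):
    """Count target identifications accepted at a given FDR.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    FDR at a cutoff is estimated as decoys / targets among matches at or
    above the cutoff (an assumed target-decoy convention).
    Tied scores are processed in sort order; no special tie handling.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    accepted = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest cutoff whose estimated FDR is still acceptable.
        if targets > 0 and decoys / targets <= fdr_threshold:
            accepted = targets
    return accepted


# Hypothetical toy data: 50 high-scoring targets, one decoy, 30 more targets.
example = (
    [(200.0 - i, False) for i in range(50)]
    + [(100.0, True)]
    + [(90.0 - i, False) for i in range(30)]
)
print(count_at_fdr(example, 0.01))
print(count_at_fdr(example, 0.05))
```

Comparing two scoring functions under this scheme amounts to calling `count_at_fdr` on each function's scored matches at the same threshold and comparing the two counts.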
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Grover, Himanshu (Email: hig2@pitt.edu; Pitt Username: HIG2)
Date: 3 January 2013
Date Type: Publication
Defense Date: 28 September 2012
Approval Date: 3 January 2013
Submission Date: 10 November 2012
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Number of Pages: 121
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Bioinformatics, Machine learning, Input-output Hidden Markov Models, Proteomics, Mass spectrometry, Computational Proteomics, Peptide Identification, Database Searching
Date Deposited: 03 Jan 2013 19:39
Last Modified: 03 Jan 2018 06:15
URI: http://d-scholarship.pitt.edu/id/eprint/16333