
Analytical Techniques for the Improvement of Mass Spectrometry Protein Profiling

Pelikan, Richard Craig (2011) Analytical Techniques for the Improvement of Mass Spectrometry Protein Profiling. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Bioinformatics is rapidly advancing through the "post-genomic" era following the sequencing of the human genome. In preparation for studying the inner workings behind genes, proteins and even smaller biological elements, several subdivisions of bioinformatics have developed. The subdivision of proteomics, concerning the structure and function of proteins, has been aided by the mass spectrometry data source. Biofluid or tissue samples are rapidly assayed for their protein composition. The resulting mass spectra are analyzed using machine learning techniques to discover reliable patterns which discriminate samples from two populations, for example, healthy or diseased, or treatment responders versus non-responders. However, this data source is imperfect and faces several challenges: unwanted variability arising from the data collection process, obtaining a robust discriminative model that generalizes well to future data, and validating a predictive pattern statistically and biologically.

This thesis presents several techniques which attempt to intelligently deal with the problems facing each stage of the analytical process. First, an automatic preprocessing method selection system is demonstrated. This system learns from data and selects a combination of preprocessing methods which is most appropriate for the task at hand. This reduces the noise affecting potential predictive patterns. Our results suggest that this method can help adapt to data from different technologies, improving downstream predictive performance. Next, the issues of feature selection and predictive modeling are revisited with respect to the unique challenges posed by proteomic profile data. Approaches to model selection through kernel learning are also investigated. Key insights are obtained for designing the feature selection and predictive modeling portion of the analytical framework. Finally, methods for interpreting the results of predictive modeling are demonstrated. These methods are used to assure the user of various desirable properties: validation of the strength of a predictive model, validation of reproducible signal across multiple data generation sessions and generalizability of predictive models to future data. A method for labeling profile features with biological identities is also presented, which aids in the interpretation of the data. Overall, these novel techniques give the protein profiling community additional support and leverage to aid the predictive capability of the technology.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Pelikan, Richard
ETD Committee:
Committee Chair: Hauskrecht, Milos (Email: milos@pitt.edu; Pitt Username: MILOS)
Committee Member: Cooper, Gregory (Email: gfc@pitt.edu; Pitt Username: GFC)
Committee Member: Gopalakrishnan, Vanathi (Email: vanathi@pitt.ed)
Committee Member: Bigbee, William
Date: 30 June 2011
Date Type: Completion
Defense Date: 7 April 2011
Approval Date: 30 June 2011
Submission Date: 30 March 2011
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Intelligent Systems
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Machine Learning; Bioinformatics; Proteomics
Other ID: etd-03302011-144506
Date Deposited: 10 Nov 2011 19:33
Last Modified: 15 Nov 2016 13:37

