Mi, Zhibao (2009) ROBUST CROSS-PLATFORM DISEASE PREDICTION USING GENE EXPRESSION MICROARRAYS. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Microarray technology has been used to predict patient prognosis and response to treatment. Such predictions are beginning to influence disease intervention and control and are significant for public health. However, their adoption has been hindered by a lack of adequate clinical validation. Because both microarray analyses and clinical trials are time- and labor-intensive, it is crucial to use data accumulated across studies to validate findings from individual studies. Microarray data generated on different technologies have been accumulating for over a decade, yet using data from one platform to build a model that robustly predicts the clinical characteristics of new samples profiled on another platform remains a challenge. Current cross-platform prediction methods use only the genes common to both the training and test datasets. This approach has two main drawbacks: the model must be reconstructed for each new test dataset, and the information carried by genes that are not shared is lost. As a result, the prediction accuracy of these methods is unstable.
In this dissertation, a module-based prediction strategy was developed to overcome these drawbacks. In this method, groups of genes sharing similar expression patterns, rather than individual genes, were used as the basic elements of the predictor. This approach borrows information from similar genes when some genes are absent from the test data. By overcoming the problems of missing genes and cross-platform noise, the method yielded robust predictions that do not depend on information from the test data. Its performance was evaluated using publicly available microarray data. K-means clustering was used to group genes with similar expression profiles into gene modules, and small modules were merged into their nearest neighbors. A univariate or multivariate feature selection procedure was then applied, and a representative gene was identified from each selected module. Finally, a prediction model was constructed from the representative genes of the selected modules.
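The module construction and selection steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the dissertation's implementation: each gene's profile is z-scored before plain Lloyd's k-means, small modules are merged one at a time into the module with the nearest centroid, and the univariate selection is stood in for by a Welch t-statistic. The rule that scores a module by the largest absolute t-statistic among its members and takes that member as the representative gene is hypothetical, as are all function names and the toy data.

```python
import math
import random
from statistics import mean, stdev, variance

def standardize(profile):
    """Center and scale one gene's profile so genes cluster by pattern, not by level."""
    m, s = mean(profile), stdev(profile)
    return [(x - m) / s if s > 0 else 0.0 for x in profile]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(rows):
    return [mean(col) for col in zip(*rows)]

def kmeans(rows, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on gene profiles; returns one cluster label per gene."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sqdist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = centroid(members)
    return labels

def merge_small_modules(rows, labels, min_size):
    """Fold the smallest module into the module with the nearest centroid until every
    module has at least min_size genes (or only one module is left)."""
    modules = {}
    for i, lab in enumerate(labels):
        modules.setdefault(lab, []).append(i)
    while len(modules) > 1:
        small = min(modules, key=lambda m: len(modules[m]))
        if len(modules[small]) >= min_size:
            break
        c_small = centroid([rows[i] for i in modules[small]])
        nearest = min((m for m in modules if m != small),
                      key=lambda m: sqdist(c_small, centroid([rows[i] for i in modules[m]])))
        modules[nearest].extend(modules.pop(small))
    return list(modules.values())

def welch_t(values, y):
    """Two-sample Welch t-statistic between class-1 and class-0 samples."""
    a = [v for v, c in zip(values, y) if c == 1]
    b = [v for v, c in zip(values, y) if c == 0]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def build_and_select_modules(expr, y, k, min_size, n_select):
    """Cluster genes into modules, score each module by its best member's |t|,
    and return the top n_select modules as (member genes, representative gene)."""
    genes = list(expr)
    rows = [standardize(expr[g]) for g in genes]
    modules = merge_small_modules(rows, kmeans(rows, k), min_size)
    t = {g: abs(welch_t(expr[g], y)) for g in genes}
    scored = []
    for members in modules:
        names = [genes[i] for i in members]
        rep = max(names, key=t.get)  # hypothetical rule: the most discriminative member
        scored.append((t[rep], rep, names))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(names, rep) for _, rep, names in scored[:n_select]]

# Toy data: 7 genes x 8 samples; g1-g3 are up in class 1, g4-g5 down, g6-g7 uninformative.
expr = {
    "g1": [5.1, 5.3, 4.9, 5.2, 1.0, 1.2, 0.9, 1.1],
    "g2": [4.8, 5.0, 4.7, 5.1, 1.5, 1.3, 1.6, 1.4],
    "g3": [4.0, 4.4, 3.9, 4.2, 2.0, 2.3, 1.8, 2.1],
    "g4": [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1],
    "g5": [1.4, 1.1, 1.3, 1.2, 4.6, 4.9, 4.8, 4.7],
    "g6": [3.0, 2.0, 3.1, 2.1, 3.0, 2.0, 3.1, 2.1],
    "g7": [2.9, 2.1, 3.0, 2.2, 2.9, 2.1, 3.2, 2.0],
}
y = [1, 1, 1, 1, 0, 0, 0, 0]

selected = build_and_select_modules(expr, y, k=3, min_size=2, n_select=2)
for names, rep in selected:
    print(rep, names)
```

A fixed seed keeps the k-means step reproducible here; because k-means results can depend on initialization, running several restarts is a common safeguard.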
Because the predictor is built from gene modules rather than individual genes, the resulting model is portable to any test study, provided that at least some genes from each selected module are measured in that study. The new method showed advantages over traditional methods in prediction robustness to gene noise and gene mismatch in inter-study prediction.
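The portability property can be illustrated with a second minimal sketch. The abstract does not specify how a missing gene is handled, so the rule here is an assumption: when a module's representative gene is not measured in the test study, the measured module member that was most correlated with the representative in the training data is read in its place. Each gene is z-scored within its own study so that platform-specific scales cancel, and a nearest-centroid classifier stands in for the prediction model. Function names, the modules, and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize one gene within one study so platform-specific scales cancel."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]

def pearson(a, b):
    """Pearson correlation computed as the mean product of population z-scores."""
    return sum(x * y for x, y in zip(zscore(a), zscore(b))) / len(a)

def resolve_features(modules, train, test_genes):
    """Choose the gene to read for each module in the test study (hypothetical rule)."""
    chosen = []
    for members, rep in modules:
        if rep in test_genes:
            chosen.append(rep)
            continue
        available = [g for g in members if g in test_genes]
        if not available:
            raise ValueError(f"no gene from module {members} is measured in the test study")
        # Surrogate: the measured member most correlated with the representative in training.
        chosen.append(max(available, key=lambda g: pearson(train[g], train[rep])))
    return chosen

def fit_centroids(train, y, reps):
    """Class centroids of the within-study z-scores of the representative genes."""
    z = {g: zscore(train[g]) for g in reps}
    return {c: [mean(v for v, lab in zip(z[g], y) if lab == c) for g in reps]
            for c in sorted(set(y))}

def predict(centroids, test, features):
    """Assign each test sample to the nearest class centroid."""
    cols = [zscore(test[g]) for g in features]
    preds = []
    for j in range(len(cols[0])):
        x = [col[j] for col in cols]
        preds.append(min(centroids,
                         key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c]))))
    return preds

# Toy training study (platform A) with two selected modules; representatives g1 and g4.
train = {
    "g1": [5.1, 5.3, 4.9, 5.2, 1.0, 1.2, 0.9, 1.1],
    "g2": [4.8, 5.0, 4.7, 5.1, 1.5, 1.3, 1.6, 1.4],
    "g3": [4.0, 4.4, 3.9, 4.2, 2.0, 2.3, 1.8, 2.1],
    "g4": [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1],
    "g5": [1.4, 1.1, 1.3, 1.2, 4.6, 4.9, 4.8, 4.7],
}
y = [1, 1, 1, 1, 0, 0, 0, 0]
modules = [(["g1", "g2", "g3"], "g1"), (["g4", "g5"], "g4")]

# Toy test study (platform B): a different scale, and both representatives are missing.
test = {
    "g2": [480, 150, 510, 140],
    "g3": [400, 200, 420, 190],
    "g5": [120, 470, 130, 480],
}

centroids = fit_centroids(train, y, [rep for _, rep in modules])
features = resolve_features(modules, train, set(test))
print(features)                            # g2 or g3 stands in for g1; g5 stands in for g4
print(predict(centroids, test, features))  # [1, 0, 1, 0]
```

In this sketch the centroids are fitted once on the representative genes, so substituting a co-expressed gene changes only which column is read from the test study, not the fitted model; prediction fails only when no gene from a module is measured.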




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Mi, Zhibao (Email: zmi@ptilabs.com; Pitt Username: ZMI)
ETD Committee:
Committee Chair: Tseng, George C (Email: ctseng@pitt.edu; Pitt Username: CTSENG)
Committee Member: Feingold, Eleanor (Email: feingold@pitt.edu; Pitt Username: FEINGOLD)
Committee Member: Tang, Gong (Email: got1@pitt.edu; Pitt Username: GOT1)
Committee Member: Kaminski, Naftali (Email: nak38@pitt.edu; Pitt Username: NAK38)
Date: 29 January 2009
Date Type: Completion
Defense Date: 15 October 2008
Approval Date: 29 January 2009
Submission Date: 4 December 2008
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: cross-platform; gene mismatch; gene noise; Microarray; module-based prediction; robust prediction
Other ID: etd-12042008-152041
Date Deposited: 10 Nov 2011 20:08
Last Modified: 15 Nov 2016 13:53