
Effects of missing value imputation on down-stream analyses in microarray data

Oh, Sunghee (2010) Effects of missing value imputation on down-stream analyses in microarray data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Among high-throughput technologies, DNA microarray experiments measure the expression of a very large number of genes across many arrays and provide biological information relevant to disease. Studies of gene expression under various conditions and in various organisms have led, in public health, to the identification of genes that distinguish tumor from normal tissue, of clinically relevant tumor subtypes, and of prognostic signatures, and have ultimately provided potential targets for specific therapies. Despite these advances and the widespread use of microarrays, the experiments frequently produce multiple missing values because of flaws such as dust or scratches on the slides, insufficient resolution, or hybridization errors on the chips. Gene expression data therefore contain missing entries, and a large number of genes may be affected. Unfortunately, many downstream algorithms for gene expression analysis require a complete matrix as input, so effective missing value imputation methods are needed, and many have been developed in the literature. No imputation method is uniformly superior; performance depends on the structure and nature of the data set. In addition, imputation methods have mostly been compared using variants of the root mean squared error (RMSE), which measures the similarity between true and imputed expression values. The drawback of RMSE-based evaluation is that it does not reflect the true biological effect on down-stream analyses. In this dissertation, we will investigate how the missing value imputation process affects the biological results of differentially expressed gene discovery, clustering, and classification. Multiple statistical methods will be considered for each down-stream analysis. Quantitative measures reflecting the true biological effects in each down-stream analysis will be used to evaluate imputation methods and will be compared with RMSE-based evaluation.
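As an illustration of the RMSE-based evaluation described in the abstract, the following is a minimal sketch in Python, not the dissertation's own procedure. It assumes a simulated complete expression matrix, entries removed at random, and simple row-mean imputation as a stand-in for whichever imputation method is under study. The function names `rmse_on_missing` and `row_mean_impute`, the 5% missing rate, and the matrix size are all hypothetical choices made for this example.

```python
import numpy as np


def rmse_on_missing(true_values, imputed_values, missing_mask):
    """RMSE between true and imputed values, over the masked entries only."""
    diff = true_values[missing_mask] - imputed_values[missing_mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def row_mean_impute(matrix_with_nan):
    """Fill each NaN with the mean of the observed values in its row (gene).

    Assumption: a deliberately simple imputation method used only to make
    the example runnable; it is not claimed to be a method from the source.
    """
    row_means = np.nanmean(matrix_with_nan, axis=1, keepdims=True)
    return np.where(np.isnan(matrix_with_nan), row_means, matrix_with_nan)


# Simulated "complete" data: 100 genes (rows) by 10 arrays (columns).
rng = np.random.default_rng(0)
complete = rng.normal(size=(100, 10))

# Remove roughly 5% of entries at random to mimic missing values.
missing_mask = rng.random(complete.shape) < 0.05
observed = complete.copy()
observed[missing_mask] = np.nan

# Impute, then score the imputation against the known true values.
imputed = row_mean_impute(observed)
score = rmse_on_missing(complete, imputed, missing_mask)
print(f"RMSE over {missing_mask.sum()} imputed entries: {score:.3f}")
```

A score of this kind only measures how close the imputed numbers are to the true ones. It says nothing about whether the same genes would be called differentially expressed, or whether samples would cluster or classify the same way, which is the gap the dissertation sets out to examine.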




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators | Email | Pitt Username | ORCID
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Tseng, George C | ctseng@pitt.edu | CTSENG |
Committee Member | Jeong, Jonghyeon | jeong@nsabp.pitt.edu | JJEONG |
Committee Member | Kong, Lan | lkong@pitt.edu | LKONG |
Committee Member | Lin,
Date: 28 January 2010
Date Type: Completion
Defense Date: 19 June 2009
Approval Date: 28 January 2010
Submission Date: 15 October 2009
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: down-stream analyses; imputation; microarray; missing value
Other ID: etd-10152009-131427
Date Deposited: 10 Nov 2011 20:03
Last Modified: 15 Nov 2016 13:50

