Oh, Sunghee
(2010)
Effects of missing value imputation on down-stream analyses in microarray data.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Among high-throughput technologies, DNA microarray experiments measure expression for an enormous number of genes across many arrays and provide biological information about disease. Studies of gene expression under various conditions and in various organisms have led to the identification of genes that distinguish tumor from normal tissue, of clinically relevant tumor subtypes, and of prognostic signatures, and have ultimately provided potential targets for specific therapies for diseases of public health importance. Despite these advances and the widespread use of microarrays, microarray experiments frequently produce multiple missing values because of flaws such as dust or scratches on the slides, insufficient resolution, or hybridization errors on the chips. Gene expression data therefore contain missing entries, and a large number of genes may be affected. Unfortunately, many downstream algorithms for gene expression analysis require a complete matrix as input. Effective missing value imputation methods are therefore needed, and many have been developed in the literature. No imputation method is uniformly superior; performance depends on the structure and nature of the data set. In addition, imputation methods have mostly been compared using variants of the root mean squared error (RMSE), which measures the similarity between true and imputed expression values. The drawback of RMSE-based evaluation is that it does not reflect the true biological effect on downstream analyses. In this dissertation, we investigate how the missing value imputation process affects the biological results of differentially expressed gene discovery, clustering, and classification. Multiple statistical methods are considered for each downstream analysis. Quantitative measures reflecting the true biological effects in each downstream analysis are used to evaluate imputation methods and are compared with RMSE-based evaluation.
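For readers unfamiliar with the RMSE-based evaluation mentioned in the abstract, the following is a minimal sketch of the plain form of the measure, computed only over entries that were masked out and then imputed. The abstract does not say which RMSE variants the dissertation uses, so the function name `rmse_on_missing`, the mask convention, and the toy numbers are all assumptions made for illustration and are not taken from the dissertation.

```python
import math


def rmse_on_missing(true_rows, imputed_rows, missing_mask):
    """Plain RMSE over the entries flagged True in missing_mask.

    Hypothetical helper: rows are genes, columns are arrays, and the mask
    marks the entries that were hidden and then filled in by an imputation
    method. This is one simple variant, not the dissertation's definition.
    """
    squared_errors = [
        (t - p) ** 2
        for t_row, p_row, m_row in zip(true_rows, imputed_rows, missing_mask)
        for t, p, m in zip(t_row, p_row, m_row)
        if m
    ]
    if not squared_errors:
        raise ValueError("mask selects no entries")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy expression matrix, invented for illustration only.
true_values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
imputed_values = [[1.0, 2.5, 3.0], [4.0, 5.0, 5.5]]
mask = [[False, True, False], [False, False, True]]

score = rmse_on_missing(true_values, imputed_values, mask)
print(score)
```

A score like this summarizes only how close the imputed numbers are to the hidden ones; it says nothing about whether the same genes would be called differentially expressed or fall into the same clusters, which is the limitation the abstract raises.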
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Oh, Sunghee
Date: 28 January 2010
Date Type: Completion
Defense Date: 19 June 2009
Approval Date: 28 January 2010
Submission Date: 15 October 2009
Access Restriction: 5 year -- Restrict access to University of Pittsburgh for a period of 5 years.
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Biostatistics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: down-stream analyses; imputation; microarray; missing value
Other ID: http://etd.library.pitt.edu/ETD/available/etd-10152009-131427/, etd-10152009-131427
Date Deposited: 10 Nov 2011 20:03
Last Modified: 15 Nov 2016 13:50
URI: http://d-scholarship.pitt.edu/id/eprint/9475