McDade, Kevin
(2017)
Enabling Data-Guided Evaluation of Bioinformatics Workflow Quality.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Bioinformatics can be divided into two phases: the first converts raw data into processed data, and the second uses the processed data to obtain scientific results. It is important to consider the first, “workflow,” phase carefully, as there are many paths on the way to a final processed dataset. Some workflow paths may differ enough to influence the second phase, thereby leading to ambiguity in the scientific literature. Workflow evaluation in bioinformatics enables investigators to plan carefully how to process their data. A system that uses real data to determine the quality of a workflow can be based on the inherent biological relationships in the data itself. To our knowledge, a general software framework that performs real data-driven evaluation of bioinformatics workflows does not exist.
The Evaluation and Utility of workFLOW (EUFLOW) decision-theoretic framework, developed and tested on gene expression data, enables users of bioinformatics workflows to evaluate alternative workflow paths using inherent biological relationships. EUFLOW is implemented as an R package. The framework also permits user-guided utility and loss functions, so that the intended type of analysis can be considered in the workflow path decision. It was originally developed to address the quality of identifier mapping services between UniProt accessions and Affymetrix probesets to facilitate integrated analysis.¹ An extension to the framework evaluates Affymetrix probeset filtering methods on real data from endometrial cancer and TCGA ovarian serous carcinoma samples.² Further evaluation of RNASeq workflow paths demonstrates the generalizability of the EUFLOW framework. Three separate evaluations are performed: 1) identifier filtering of features with biological attributes, 2) threshold parameter selection for low gene count features, and 3) commonly utilized RNASeq data workflow paths on The Cancer Genome Atlas data.
The EUFLOW decision-theoretic framework developed and tested in my dissertation enables users of bioinformatics workflows to evaluate alternative workflow paths guided by inherent biological relationships and user utility.
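The idea of choosing among workflow paths with a user-supplied utility can be illustrated with a minimal Python sketch. This is not the EUFLOW R package or its API. The function names, the correlation cutoff, the gain and loss weights, and the toy numbers are all hypothetical, chosen only to show how a user's loss setting can change which path is preferred. The sketch assumes each path yields correlations for feature pairs expected to be biologically related and for pairs expected to be unrelated.

```python
from typing import Dict, List, Tuple

# Hypothetical illustration only: NOT the EUFLOW package or its API.
# Each workflow path is summarized by two lists of correlations:
#   related   - feature pairs expected to be biologically related
#   unrelated - feature pairs with no expected relationship
PathData = Tuple[List[float], List[float]]


def expected_utility(related: List[float], unrelated: List[float],
                     threshold: float, gain_tp: float, loss_fp: float) -> float:
    """Utility = gain * true-positive rate - loss * false-positive rate."""
    tp_rate = sum(r >= threshold for r in related) / len(related)
    fp_rate = sum(r >= threshold for r in unrelated) / len(unrelated)
    return gain_tp * tp_rate - loss_fp * fp_rate


def best_path(paths: Dict[str, PathData], threshold: float,
              gain_tp: float, loss_fp: float) -> Tuple[str, Dict[str, float]]:
    """Score every path and return the name of the highest-utility one."""
    scores = {
        name: expected_utility(rel, unrel, threshold, gain_tp, loss_fp)
        for name, (rel, unrel) in paths.items()
    }
    return max(scores, key=scores.get), scores


# Toy data (made up for illustration, not results from the dissertation).
paths = {
    "path_a": ([0.8, 0.6, 0.7, 0.2], [0.1, 0.5, 0.0, 0.3]),
    "path_b": ([0.9, 0.4, 0.3, 0.2], [0.1, 0.2, 0.0, 0.1]),
}

# Equal gain and loss favors the more sensitive path.
print(best_path(paths, threshold=0.5, gain_tp=1.0, loss_fp=1.0)[0])
# A user who penalizes false positives heavily prefers the stricter path.
print(best_path(paths, threshold=0.5, gain_tp=1.0, loss_fp=3.0)[0])
```

With these toy numbers, the preferred path switches once false positives are weighted more heavily, which is the kind of analysis-dependent decision the abstract describes.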
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Date: 12 May 2017
Date Type: Publication
Defense Date: 3 April 2017
Approval Date: 12 May 2017
Submission Date: 6 May 2017
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 159
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Workflow, Data-Guided, RNASeq Workflow, Workflow Evaluation
Date Deposited: 12 May 2017 19:01
Last Modified: 30 Jun 2022 15:22
URI: http://d-scholarship.pitt.edu/id/eprint/31910