
Enabling Data-Guided Evaluation of Bioinformatics Workflow Quality

McDade, Kevin (2017) Enabling Data-Guided Evaluation of Bioinformatics Workflow Quality. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.



Bioinformatics can be divided into two phases: the first converts raw data into processed data, and the second uses the processed data to obtain scientific results. It is important to consider the first "workflow" phase carefully, as there are many paths on the way to a final processed dataset. Some workflow paths may differ enough to influence the second phase, thereby leading to ambiguity in the scientific literature. Workflow evaluation in bioinformatics enables investigators to carefully plan how to process their data. A system that uses real data to determine the quality of a workflow can be based on the inherent biological relationships in the data itself. To our knowledge, a general software framework that performs real data-driven evaluation of bioinformatics workflows does not exist.
The Evaluation and Utility of workFLOW (EUFLOW) decision-theoretic framework, developed and tested on gene expression data, enables users of bioinformatics workflows to evaluate alternative workflow paths using inherent biological relationships. EUFLOW is implemented as an R package for evaluating workflow data. It also permits user-guided utility and loss functions, which enable the type of analysis to be considered in the workflow path decision. The framework was originally developed to address the quality of identifier mapping services between UNIPROT accessions and Affymetrix probesets to facilitate integrated analysis [1]. An extension to the framework evaluates Affymetrix probeset filtering methods on real data from endometrial cancer and TCGA ovarian serous carcinoma samples [2]. Further evaluation of RNASeq workflow paths demonstrates the generalizability of the EUFLOW framework. Three separate evaluations are performed: 1) identifier filtering of features with biological attributes, 2) threshold selection parameter choice for low gene count features, and 3) commonly utilized RNASeq data workflow paths on The Cancer Genome Atlas data.
The EUFLOW decision-theoretic framework developed and tested in my dissertation enables users of bioinformatics workflows to evaluate alternative workflow paths guided by inherent biological relationships and user utility.
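To make the decision-theoretic idea concrete, the following is a minimal sketch of choosing between workflow paths by expected utility. It is not EUFLOW's API (EUFLOW is an R package, and none of its functions are shown here). The outcome labels, the counts, the utility values, and the names `expected_utility` and `best_path` are all assumptions made for illustration. Python is used only for brevity.

```python
from typing import Dict

# Hypothetical data: for each workflow path, the number of feature pairs
# whose expected biological relationship was supported ("agree") or not
# supported ("disagree") after processing. These counts are made up.
Counts = Dict[str, int]


def expected_utility(counts: Counts, utility: Dict[str, float]) -> float:
    """Average utility over observed outcomes, weighted by their frequency."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes observed for this workflow path")
    return sum(utility[outcome] * n / total for outcome, n in counts.items())


def best_path(paths: Dict[str, Counts], utility: Dict[str, float]) -> str:
    """Return the name of the path with the highest expected utility."""
    return max(paths, key=lambda name: expected_utility(paths[name], utility))


# Two hypothetical workflow paths and a user-chosen utility function.
paths = {
    "path_a": {"agree": 80, "disagree": 20},
    "path_b": {"agree": 60, "disagree": 40},
}
utility = {"agree": 1.0, "disagree": -1.0}

scores = {name: expected_utility(c, utility) for name, c in paths.items()}
choice = best_path(paths, utility)
```

In this sketch the user expresses the goals of the downstream analysis through the `utility` mapping. Changing how heavily disagreement is penalized changes each path's score and can change which path is preferred.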




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators:
Creator | Email | Pitt Username | ORCID
McDade, Kevin | kkm5@pitt.edu | kkm5 |
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Gopalakrishnan, Vanathi | vanathi@pitt.edu | vanathi |
Committee Member | Day, Roger | day01@pitt.edu | day01 |
Committee Member | Chandran, Uma | chandran@pitt.edu | chandran |
Committee Member | Hochheiser, Harry | harryh@pitt.edu | harryh |
Committee Member | Lu, Xinghua | xinghua@pitt.edu | xinghua |
Committee Member | Weeks, Daniel | weeks@pitt.edu | weeks | 0000-0001-9410-7228
Date: 12 May 2017
Date Type: Publication
Defense Date: 3 April 2017
Approval Date: 12 May 2017
Submission Date: 6 May 2017
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 159
Institution: University of Pittsburgh
Schools and Programs: School of Medicine > Biomedical Informatics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Workflow, Data-Guided, RNASeq Workflow, Workflow Evaluation
Date Deposited: 12 May 2017 19:01
Last Modified: 30 Jun 2022 15:22

Available Versions of this Item

  • Enabling Data-Guided Evaluation of Bioinformatics Workflow Quality. (deposited 12 May 2017 19:01) [Currently Displayed]

