Velazquez, Enrique Israel
(2015)
Alternatives to relational databases in precision medicine: comparison of NOSQL approaches for big data storage using supercomputers.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
Abstract
Improvements in medical and genomic technologies have dramatically increased the production of electronic data over the last decade. As a result, data management is rapidly becoming a major determinant of, and an urgent challenge for, the development of Precision Medicine. Although successful data management is achievable using Relational Database Management Systems (RDBMS), exponential data growth is a significant contributor to failure scenarios. Growing amounts of data can also be observed in other sectors, such as economics and business, which, together with the previous facts, suggests that alternative database approaches (NoSQL) may soon be required for efficient storage and management of big databases. However, this hypothesis has been difficult to test in the Precision Medicine field because alternative database architectures are complex to assess and the means to integrate heterogeneous electronic health records (EHR) with dynamic genomic data are not readily available.
In this dissertation, we present a novel set of experiments for identifying NoSQL database approaches that enable effective data storage and management in Precision Medicine, using patients’ clinical and genomic information from The Cancer Genome Atlas (TCGA). The first experiment measures performance and scalability for biologically meaningful queries of differing complexity across different database sizes. The second experiment measures performance and scalability in database updates without schema changes. The third experiment assesses performance and scalability in database updates with schema modifications due to dynamic data. We have identified two NoSQL approaches, based on Cassandra and Redis, which seem to be the ideal database management systems for our Precision Medicine queries in terms of performance and scalability. We present NoSQL approaches and show how they can be used to manage clinical and genomic big data. Our research is relevant to public health since we focus on one of the main challenges to the development of Precision Medicine and, consequently, investigate a potential solution to the progressively increasing demands on health care.
Details
Item Type: | University of Pittsburgh ETD |
Status: | Unpublished |
Creators/Authors: |
Creators | Email | Pitt Username | ORCID |
---|---|---|---|
Velazquez, Enrique Israel | eiv2@pitt.edu | EIV2 | |
ETD Committee: | |
Date: | 28 September 2015 |
Date Type: | Publication |
Defense Date: | 29 June 2015 |
Approval Date: | 28 September 2015 |
Submission Date: | 17 July 2015 |
Access Restriction: | No restriction; Release the ETD for access worldwide immediately. |
Number of Pages: | 231 |
Institution: | University of Pittsburgh |
Schools and Programs: | School of Public Health > Human Genetics |
Degree: | PhD - Doctor of Philosophy |
Thesis Type: | Doctoral Dissertation |
Refereed: | Yes |
Uncontrolled Keywords: | Precision Medicine, Human Genetics, Genomics, DBMS, NoSQL, DNA sequencing, RNA sequencing, Bioinformatics, Personalized Medicine, Computational Genetics, Statistical Genetics, Supercomputers |
Date Deposited: | 28 Sep 2015 19:23 |
Last Modified: | 15 Nov 2016 14:29 |
URI: | http://d-scholarship.pitt.edu/id/eprint/25614 |