
Alternatives to relational databases in precision medicine: comparison of NOSQL approaches for big data storage using supercomputers

Velazquez, Enrique Israel (2015) Alternatives to relational databases in precision medicine: comparison of NOSQL approaches for big data storage using supercomputers. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



Improvements in medical and genomic technologies have dramatically increased the production of electronic data over the last decade. As a result, data management is rapidly becoming a major determinant of, and an urgent challenge for, the development of Precision Medicine. Although successful data management is achievable using Relational Database Management Systems (RDBMS), exponential data growth is a significant contributor to failure scenarios. Growing amounts of data can also be observed in other sectors, such as economics and business, which, together with the previous facts, suggests that alternative database approaches (NoSQL) may soon be required for efficient storage and management of big databases. However, this hypothesis has been difficult to test in the Precision Medicine field, since alternative database architectures are complex to assess and the means to integrate heterogeneous electronic health records (EHR) with dynamic genomic data are not readily available.
In this dissertation, we present a novel set of experiments for identifying NoSQL database approaches that enable effective data storage and management in Precision Medicine, using patients’ clinical and genomic information from The Cancer Genome Atlas (TCGA). The first experiment assesses performance and scalability using biologically meaningful queries of differing complexity and database sizes. The second experiment measures performance and scalability in database updates without schema changes. The third experiment assesses performance and scalability in database updates with schema modifications due to dynamic data. We have identified two NoSQL approaches, based on Cassandra and Redis, which seem to be the ideal database management systems for our precision medicine queries in terms of performance and scalability. We present NoSQL approaches and show how they can be used to manage clinical and genomic big data. Our research is relevant to public health since we focus on one of the main challenges to the development of Precision Medicine and, consequently, investigate a potential solution to the progressively increasing demands on health care.




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creator: Velazquez, Enrique Israel (eiv2@pitt.edu; Pitt username EIV2)
ETD Committee:
Thesis Advisor: Barmada, M. Michael (barmada@pitt.edu; Pitt username BARMADA)
Committee Member: Feingold, Eleanor (feingold@pitt.edu; Pitt username FEINGOLD)
Committee Member: Hochheiser, Harry (harryh@pitt.edu; Pitt username HARRYH)
Committee Member: Labrinidis, Alexandros (labrinid@pitt.edu; Pitt username LABRINID)
Committee Member: Minster, Ryan Lee (rminster@pitt.edu; Pitt username RMINSTER)
Date: 28 September 2015
Date Type: Publication
Defense Date: 29 June 2015
Approval Date: 28 September 2015
Submission Date: 17 July 2015
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 231
Institution: University of Pittsburgh
Schools and Programs: School of Public Health > Human Genetics
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Precision Medicine, Human Genetics, Genomics, DBMS, NoSQL, DNA sequencing, RNA sequencing, Bioinformatics, Personalized Medicine, Computational Genetics, Statistical Genetics, Supercomputers
Date Deposited: 28 Sep 2015 19:23
Last Modified: 15 Nov 2016 14:29

