Advanced distributed data integration infrastructure and research data management portal

Karataev, Evgeny (2017) Advanced distributed data integration infrastructure and research data management portal. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.

The amount of data available due to the rapid spread of advanced information technology is exploding. At the same time, continued research on data integration systems aims to provide users with uniform data access and efficient data sharing. The ability to share data is particularly important for interdisciplinary research, where a comprehensive picture of the subject requires large amounts of data from disparate sources across a variety of disciplines. While numerous data sets are available from various groups worldwide, the existing data sources are principally oriented toward regional comparative efforts rather than global applications, and they vary widely in both content and format. Such data sources cannot be easily integrated and maintained by small groups of developers.
I propose an advanced infrastructure for large-scale data integration based on crowdsourcing. In particular, I propose a novel architecture and algorithms to efficiently store dynamically incoming heterogeneous datasets, enabling both data integration and data autonomy. My proposed infrastructure combines machine learning algorithms and human expertise to perform efficient schema alignment and maintain relationships between the datasets. It provides efficient data exploration functionality without requiring users to write complex queries, and it performs approximate information fusion when an exact match does not exist. Finally, I introduce the Col*Fusion system, which implements the proposed advanced data integration infrastructure.


Item Type: University of Pittsburgh ETD
Status: Unpublished
Creator: Karataev, Evgeny
Email: Karataev.Evgeny@gmail.com
Pitt Username: EPK8
ORCID: 0000-0002-5750-0634
ETD Committee:
Committee Chair: Zadorozhny
Committee Member: Druzdzel
Committee Member: Pelechrinis
Committee Member: Nystrom
Date: 10 January 2017
Date Type: Publication
Defense Date: 6 May 2016
Approval Date: 10 January 2017
Submission Date: 19 October 2016
Access Restriction: 1 year -- Restrict access to University of Pittsburgh for a period of 1 year.
Number of Pages: 232
Institution: University of Pittsburgh
Schools and Programs: School of Information Sciences > Information Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Data Integration, Research Data Management, Data Management, Data Fusion, Crowdsourcing
Date Deposited: 10 Jan 2017 20:57
Last Modified: 10 Jan 2018 06:15

Available Versions of this Item

  • Advanced distributed data integration infrastructure and research data management portal. (deposited 10 Jan 2017 20:57) [Currently Displayed]

