
Minimizing User Effort in Large Scale Example-driven Data Exploration

Ge, Xiaoyu (2021) Minimizing User Effort in Large Scale Example-driven Data Exploration. Doctoral Dissertation, University of Pittsburgh. (Unpublished)


Abstract

Data Exploration is a key ingredient in a diverse set of discovery-oriented applications, including scientific computing, financial analysis, and evidence-based medicine. It refers to a series of exploratory tasks that aim to extract useful pieces of knowledge from data, and its challenge is to do so without requiring the user to specify precisely what information is being searched for. The goal of helping users effortlessly construct exploratory queries that effectively reveal interesting data objects has led to the development of a variety of intelligent semi-automatic approaches. Among such approaches, Example-driven Exploration is rapidly becoming an attractive choice for exploratory query formulation, since it attempts to minimize the amount of prior knowledge required from the user to form an accurate exploratory query.
In particular, this dissertation focuses on interactive Example-driven Exploration, which steers the user towards discovering all data objects relevant to the user's exploration based on the user's feedback on a small set of examples. Interactive Example-driven Exploration is especially beneficial for non-expert users, as it enables them to circumvent query languages by assigning relevancy to examples as a proxy for the intended exploratory analysis. However, existing interactive Example-driven Exploration systems fall short of supporting complex explorations over large, unstructured, high-dimensional data. To overcome these challenges, we have developed new methods of data reduction, example selection, data indexing, and result refinement that support practical, interactive data exploration.
The novelty of our approach is anchored in leveraging active learning and query optimization techniques that strike a balance between maximizing accuracy and minimizing user effort in providing feedback, while enabling interactive performance for exploration tasks with arbitrary, large-sized datasets. Furthermore, it extends exploration beyond structured data by supporting a variety of high-dimensional unstructured data, and it enables the refinement of results when the exploration task is associated with so many relevant data objects that they could overwhelm the user. To affirm the effectiveness of our proposed models, techniques, and algorithms, we implemented multiple prototype systems and evaluated them using real datasets. Some of them were also used in domain-specific analytics tools.
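For readers unfamiliar with the interaction pattern the abstract describes, the following is a minimal, illustrative sketch of an example-driven exploration loop driven by active learning. It is not taken from the dissertation: the nearest-centroid relevance model, the uncertainty-sampling rule, and all names (`explore`, `oracle`, `budget`, `seed_pos`, `seed_neg`) are assumptions chosen only to keep the example short and dependency-free. The `oracle` function stands in for the user's relevance feedback.

```python
import math
import random


def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def margin(point, pos_center, neg_center):
    """Signed margin: positive means closer to the 'relevant' centroid."""
    return math.dist(point, neg_center) - math.dist(point, pos_center)


def explore(data, oracle, seed_pos, seed_neg, budget):
    """Hypothetical exploration loop (assumed design, not the author's).

    data:     list of numeric tuples (the data objects)
    oracle:   callable simulating the user's relevant/irrelevant feedback
    seed_pos: index of one example already known to be relevant
    seed_neg: index of one example already known to be irrelevant
    budget:   number of additional examples the user is asked to label
    """
    labels = {seed_pos: True, seed_neg: False}

    def centers():
        pos = centroid([data[i] for i, y in labels.items() if y])
        neg = centroid([data[i] for i, y in labels.items() if not y])
        return pos, neg

    for _ in range(budget):
        unlabeled = [i for i in range(len(data)) if i not in labels]
        if not unlabeled:
            break
        pos_c, neg_c = centers()
        # Uncertainty sampling: ask about the object whose margin is
        # closest to zero under the current model.
        query = min(unlabeled, key=lambda i: abs(margin(data[i], pos_c, neg_c)))
        labels[query] = oracle(data[query])

    pos_c, neg_c = centers()
    # User-provided labels take precedence; the model fills in the rest.
    predicted = {
        i for i in range(len(data))
        if labels.get(i, margin(data[i], pos_c, neg_c) > 0)
    }
    return predicted, labels


if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.random(), rng.random()) for _ in range(200)]
    is_relevant = lambda p: p[0] < 0.5  # simulated user interest
    first_pos = next(i for i, p in enumerate(points) if is_relevant(p))
    first_neg = next(i for i, p in enumerate(points) if not is_relevant(p))
    found, asked = explore(points, is_relevant, first_pos, first_neg, budget=10)
    print(f"labeled {len(asked)} examples, predicted {len(found)} relevant objects")
```

The trade-off the abstract mentions appears here as the `budget` parameter: a larger budget asks more of the user but gives the model more labeled examples to work with.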



Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
Creators | Email | Pitt Username | ORCID
Ge, Xiaoyu | xig34@pitt.edu | xig34 | 0000-0002-2730-6304
ETD Committee:
Title | Member | Email Address | Pitt Username | ORCID
Committee Chair | Chrysanthis, Panos K. | panos@pitt.edu | panos | 0000-0001-7189-9816
Committee Member | Labrinidis, Alexandros | labrinid@cs.pitt.edu | labrinid | 0000-0003-1349-0056
Committee Member | Kovashka, Adriana | kovashka@cs.pitt.edu | kovashka | 0000-0003-1901-9660
Committee Member | Sharaf, Mohamed A. | msharaf@uaeu.ac.ae | |
Date: 8 October 2021
Date Type: Publication
Defense Date: 22 July 2021
Approval Date: 8 October 2021
Submission Date: 29 August 2021
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 154
Institution: University of Pittsburgh
Schools and Programs: Dietrich School of Arts and Sciences > Computer Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Data Exploration, Interactive Data Exploration, Interactive Search, Active Learning, Results Refinement
Date Deposited: 08 Oct 2021 19:39
Last Modified: 08 Oct 2021 19:39
URI: http://d-scholarship.pitt.edu/id/eprint/41742
