Minimizing User Effort in Large Scale Example-driven Data Exploration

Ge, Xiaoyu (2021) Minimizing User Effort in Large Scale Example-driven Data Exploration. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

Data Exploration is a key ingredient in a widely diverse set of discovery-oriented applications, including scientific computing, financial analysis, and evidence-based medicine. It refers to a series of exploratory tasks that aim to extract useful pieces of knowledge from data, and its challenge is to do so without requiring the user to specify precisely what information is being searched for. The goal of helping users effortlessly construct exploratory queries that effectively reveal interesting data objects has led to the development of a variety of intelligent semi-automatic approaches. Among such approaches, Example-driven Exploration is rapidly becoming an attractive choice for exploratory query formulation, since it attempts to minimize the amount of prior knowledge required from the user to form an accurate exploratory query.
In particular, this dissertation focuses on interactive Example-driven Exploration, which steers the user towards discovering all data objects relevant to the user’s exploration based on the user’s feedback on a small set of examples. Interactive Example-driven Exploration is especially beneficial for non-expert users, as it enables them to circumvent query languages by assigning relevancy to examples as a proxy for the intended exploratory analysis. However, existing interactive Example-driven Exploration systems fall short of supporting complex explorations over large, unstructured, high-dimensional data. To overcome these challenges, we have developed new methods of data reduction, example selection, data indexing, and result refinement that support practical, interactive data exploration.
The novelty of our approach lies in leveraging active learning and query optimization techniques that balance maximizing accuracy against minimizing the user effort spent providing feedback, while enabling interactive performance for exploration tasks over arbitrarily large datasets. Furthermore, it extends exploration beyond structured data by supporting a variety of high-dimensional unstructured data, and it enables the refinement of results when an exploration task is associated with so many relevant data objects that they could overwhelm the user. To affirm the effectiveness of our proposed models, techniques, and algorithms, we implemented multiple prototype systems and evaluated them using real datasets. Some of them were also used in domain-specific analytics tools.

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Ge, Xiaoyu	xig34@pitt.edu	xig34	0000-0002-2730-6304

ETD Committee:

Title	Member	Email Address	Pitt Username	ORCID
Committee Chair	Chrysanthis, Panos K.	panos@pitt.edu	panos	0000-0001-7189-9816
Committee Member	Labrinidis, Alexandros	labrinid@cs.pitt.edu	labrinid	0000-0003-1349-0056
Committee Member	Kovashka, Adriana	kovashka@cs.pitt.edu	kovashka	0000-0003-1901-9660
Committee Member	Sharaf, Mohamed A.	msharaf@uaeu.ac.ae

Date:

8 October 2021

Date Type:

Publication

Defense Date:

22 July 2021

Approval Date:

8 October 2021

Submission Date:

29 August 2021

Access Restriction:

No restriction; Release the ETD for access worldwide immediately.

Number of Pages:

154

Institution:

University of Pittsburgh

Schools and Programs:

Dietrich School of Arts and Sciences > Computer Science

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

Data Exploration, Interactive Data Exploration, Interactive Search, Active Learning, Results Refinement

Date Deposited:

08 Oct 2021 19:39

Last Modified:

08 Oct 2021 19:39

URI:

http://d-scholarship.pitt.edu/id/eprint/41742
