Methods and techniques for efficient processing of aggregated data

Yang, Fan (2022) Methods and techniques for efficient processing of aggregated data. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

Abstract

With the explosion of information, massive amounts of data are generated daily from many different sources. Because of limited infrastructure and human capacity for data integration, and the need for efficient processing, some data, especially historical data, are stored only in aggregated form at various levels of aggregation. For example, epidemiological data may record only monthly counts of infected people. Meanwhile, data analysis and machine learning models often require detailed knowledge of the data for accurate analysis and prediction, and this knowledge must be obtained either from the original data or, when those are unavailable, from the aggregated data.
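
To make the reconstruction problem concrete, here is a hypothetical Python sketch (not code from the dissertation). `aggregate` turns a fine-grained series into coarse totals, and `uniform_disaggregate` is a naive baseline that spreads each total evenly over its window. Any reconstruction should reproduce the observed totals when re-aggregated. The methods in this thesis aim to recover the within-window detail that this baseline discards. All names and data are illustrative assumptions.

```python
def aggregate(series, level):
    """Sum consecutive, non-overlapping windows of `level` values."""
    if level <= 0 or len(series) % level != 0:
        raise ValueError("level must be positive and divide len(series)")
    return [sum(series[i:i + level]) for i in range(0, len(series), level)]


def uniform_disaggregate(aggregates, level):
    """Naive baseline: spread each aggregate evenly over its `level` slots.

    The output re-aggregates to the input totals, but any pattern inside a
    window (peaks, trends, seasonality) is flattened away.
    """
    return [a / level for a in aggregates for _ in range(level)]


monthly = [10, 12, 15, 30, 45, 40, 20, 12, 8, 5, 4, 3]  # hypothetical counts
quarterly = aggregate(monthly, 3)           # [37, 115, 40, 12]
estimate = uniform_disaggregate(quarterly, 3)
```

The baseline satisfies the aggregate constraint, but it cannot recover the mid-year peak in `monthly`, which is exactly the kind of detail a better reconstruction tries to restore.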

Motivated by this challenge, this thesis aims to facilitate the generation and utilization of aggregated data in three respects: 1) reconstructing higher-resolution time series from aggregated data with acceptable performance; 2) selecting aggregated data for analysis with minimal loss of performance; and 3) generating aggregated data for future studies with less information loss.

Most data reconstruction methods use domain knowledge, e.g., smoothness, periodicity, or sparsity, to improve reconstruction accuracy. However, domain knowledge is limited and may be inaccurate in many applications, which leads to worse reconstruction. To tackle this, I present two advanced methods: 1) ARES, which performs data reconstruction by automatically discovering patterns in the time series using the annihilating filter technique; and 2) TURBOLIFT, which aims to improve the quality of any existing disaggregation method by refining its initial reconstruction.
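
As background only, here is a minimal NumPy sketch of the generic annihilating filter technique. It is not ARES, and it assumes noise-free, fully observed samples rather than aggregated ones. A series made of K exponential modes, x[n] = sum_k c_k u_k^n, is annihilated by a length-(K+1) filter h whose zeros are the modes u_k. So h can be found as the null vector of a Toeplitz matrix built from x, and its roots reveal the periodic and trend components. The function names `annihilating_filter` and `exponential_modes` are hypothetical.

```python
import numpy as np


def annihilating_filter(x, K):
    """Return a length-(K+1) filter h with sum_i h[i] * x[n - i] ~= 0 for n = K..N-1."""
    x = np.asarray(x)
    N = len(x)
    if N < 2 * K:
        raise ValueError("need at least 2*K samples to find K modes")
    # Each row holds x[n], x[n-1], ..., x[n-K] for n = K, ..., N-1 (a Toeplitz system).
    A = np.array([x[n - K:n + 1][::-1] for n in range(K, N)])
    # h spans the (numerical) null space of A: the right singular vector
    # belonging to the smallest singular value (the last row of Vh).
    _, _, Vh = np.linalg.svd(A)
    return Vh[-1].conj()


def exponential_modes(x, K):
    """Zeros of H(z) = sum_i h[i] z^{-i}, i.e. the modes u_k of x[n] = sum_k c_k u_k**n."""
    h = annihilating_filter(x, K)
    # np.roots expects coefficients from the highest power down: h[0] z^K + ... + h[K].
    return np.roots(h)


n = np.arange(36)
# Hypothetical series: a decaying trend plus a 12-sample seasonal cycle (K = 3 modes).
x = 2.0 * 0.9**n + np.cos(2 * np.pi * n / 12)
modes = exponential_modes(x, K=3)
```

Here the recovered modes should be close to 0.9 (the decay) and exp(±iπ/6) (a unit-modulus pair encoding the 12-sample period). With noisy or aggregated observations the null space is only approximate, and practical methods add further steps that this sketch omits.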

Although reconstruction provides a detailed view of the data, its performance may vary with the aggregation level, and it incurs extra computational cost. Moreover, in some cases analyzing coarse data may be sufficient to achieve acceptable accuracy. Therefore, I propose SMARTPROGNOSIS, which automatically suggests the aggregation levels that maximize performance for specific machine learning models.

Notably, most aggregation methods lose more information as the aggregation level increases, which yields lossy aggregated data: with annual counts, for example, it is hard to capture detailed trends within the year. To address this drawback, I propose IAGG, which aggregates data by emphasizing the critical information in the original data.
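
A tiny hypothetical example of this information loss (generic, not IAGG): two monthly series with opposite within-year trends have identical annual totals. Any method that works only from the annual counts therefore cannot tell them apart. The helper name and data are illustrative assumptions.

```python
def annual_totals(monthly):
    """Sum each consecutive block of 12 monthly values."""
    if len(monthly) % 12:
        raise ValueError("expected whole years of monthly data")
    return [sum(monthly[i:i + 12]) for i in range(0, len(monthly), 12)]


rising = [float(m) for m in range(1, 13)]  # hypothetical: grows through the year
falling = rising[::-1]                     # hypothetical: declines through the year

print(annual_totals(rising), annual_totals(falling))  # [78.0] [78.0]
```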

Details

Item Type:

University of Pittsburgh ETD

Status:

Unpublished

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Yang, Fan	fay28@pitt.edu	FAY28

ETD Committee:

Title	Member	Email Address	Pitt Username
Committee Chair	Zadorozhny, Vladimir	vladimirz@gmail.com	viz
Committee Member	Faloutsos, Christos	christos@cs.cmu.edu
Committee Member	Munro, Paul	paulwmunro@gmail.com
Committee Member	Pelechrinis, Konstantinos	kostas.pelechrinis@gmail.com

Date:

2 June 2022

Date Type:

Publication

Defense Date:

18 March 2022

Approval Date:

2 June 2022

Submission Date:

20 April 2022

Access Restriction:

No restriction; Release the ETD for access worldwide immediately.

Number of Pages:

109

Institution:

University of Pittsburgh

Schools and Programs:

School of Computing and Information > Information Science

Degree:

PhD - Doctor of Philosophy

Thesis Type:

Doctoral Dissertation

Refereed:

Yes

Uncontrolled Keywords:

Data Disaggregation, Data Navigation, Data Summarization

Date Deposited:

02 Jun 2022 21:11

Last Modified:

02 Jun 2022 21:11

URI:

http://d-scholarship.pitt.edu/id/eprint/42669
