Al-khateeb, Shadi
(2021)
A NOVEL APPROACH FOR IMPROVING THE QUALITY OF DATA USING AGGREGATION MECHANISM.
Doctoral Dissertation, University of Pittsburgh.
(Unpublished)
This is the latest version of this item.
Abstract
Due to the inception of the big data applications, it is becoming increasingly important to manage and analyze large volumes of data. However, it is not always possible to efficiently analyze very big chunks of detailed data. Thus, data aggregation techniques emerged as an efficient solution for reducing the data size and providing summary of the key information in the original data. For example, yearly stock sales are used instead of daily sales to provide a general summary of the sales. Data aggregation aims to group raw data elements in order to facilitate the assessment of higher-level concepts. However, data aggregation can result in the loss of some important details in the original data, which means that the aggregation should be done in a creative manner in order to keep the data informative even if there is a loss in some details. In some cases, we may have only aggregated versions of the data due to the data collection constraints as well as high storage and processing requirements of the big data. In these cases, we need to find the relationship between aggregated datasets and original datasets. Data disaggregation is one solution for this issue. However, accurate disaggregation is not always possible and easy to utilize.
In this dissertation, we introduce a novel approach to improve the quality of data to be more informative without disaggregating the data. We propose information preserving signature based preprocessing strategy, as well as an aggregation-based information retrieval architecture using signatures. We compensate the loss of details in the raw data by highlighting the most informative parts in the aggregated data. Our approach can be used to assess similarity and correspondence between datasets and to link aggregated historical data with most related datasets. We extended our approach to be used with time series datasets. We also created hybrid signatures to be used at any aggregation level.
Share
Citation/Export: |
|
Social Networking: |
|
Details
Item Type: |
University of Pittsburgh ETD
|
Status: |
Unpublished |
Creators/Authors: |
|
ETD Committee: |
|
Date: |
7 June 2021 |
Date Type: |
Publication |
Defense Date: |
8 February 2021 |
Approval Date: |
7 June 2021 |
Submission Date: |
9 April 2021 |
Access Restriction: |
No restriction; Release the ETD for access worldwide immediately. |
Number of Pages: |
101 |
Institution: |
University of Pittsburgh |
Schools and Programs: |
School of Computing and Information > Information Science |
Degree: |
PhD - Doctor of Philosophy |
Thesis Type: |
Doctoral Dissertation |
Refereed: |
Yes |
Uncontrolled Keywords: |
Aggregation, Signature, Data Quality, Data Preprocessing |
Date Deposited: |
07 Jun 2021 20:49 |
Last Modified: |
07 Jun 2021 20:49 |
URI: |
http://d-scholarship.pitt.edu/id/eprint/41036 |
Available Versions of this Item
-
A NOVEL APPROACH FOR IMPROVING THE QUALITY OF DATA USING AGGREGATION MECHANISM. (deposited 07 Jun 2021 20:49)
[Currently Displayed]
Metrics
Monthly Views for the past 3 years
Plum Analytics
Actions (login required)
|
View Item |