A NOVEL APPROACH FOR IMPROVING THE QUALITY OF DATA USING AGGREGATION MECHANISM

Al-khateeb, Shadi (2021) A NOVEL APPROACH FOR IMPROVING THE QUALITY OF DATA USING AGGREGATION MECHANISM. Doctoral Dissertation, University of Pittsburgh. (Unpublished)

This is the latest version of this item.

Abstract

With the rise of big data applications, it has become increasingly important to manage and analyze large volumes of data. However, it is not always possible to analyze very large volumes of detailed data efficiently. Data aggregation techniques have therefore emerged as an efficient way to reduce data size and provide a summary of the key information in the original data. For example, yearly stock sales may be used instead of daily sales to give a general summary of sales. Data aggregation groups raw data elements in order to facilitate the assessment of higher-level concepts. However, aggregation can lose important details of the original data, so it must be designed carefully to keep the data informative even when some details are lost. In some cases, only aggregated versions of the data are available, because of data collection constraints or the high storage and processing requirements of big data. In these cases, we need to find the relationship between the aggregated datasets and the original datasets. Data disaggregation is one solution to this problem; however, accurate disaggregation is not always possible or easy to achieve.
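
As a minimal illustration of the stock-sales example (not code from the dissertation), the sketch below sums hypothetical daily sales into yearly totals. The function name aggregate_yearly and the sample data are assumptions made for illustration. The point is that two quite different daily series can collapse to the same yearly summary, which is the loss of detail the abstract describes.

```python
from collections import defaultdict
from datetime import date


def aggregate_yearly(daily_sales):
    """Sum (date, amount) records into one total per calendar year."""
    totals = defaultdict(float)
    for day, amount in daily_sales:
        totals[day.year] += amount
    return dict(totals)


# Two hypothetical daily series with very different day-to-day behaviour.
steady = [(date(2020, 1, 1), 50.0), (date(2020, 6, 1), 50.0), (date(2021, 3, 1), 80.0)]
spiky = [(date(2020, 1, 1), 0.0), (date(2020, 6, 1), 100.0), (date(2021, 3, 1), 80.0)]

print(aggregate_yearly(steady))  # {2020: 100.0, 2021: 80.0}
print(aggregate_yearly(spiky))   # {2020: 100.0, 2021: 80.0}
```

Once only the yearly totals are stored, nothing distinguishes the steady series from the spiky one.
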
In this dissertation, we introduce a novel approach that improves data quality by making aggregated data more informative without disaggregating it. We propose an information-preserving, signature-based preprocessing strategy, as well as an aggregation-based information retrieval architecture that uses signatures. We compensate for the loss of detail in the raw data by highlighting the most informative parts of the aggregated data. Our approach can be used to assess similarity and correspondence between datasets and to link aggregated historical data with the most closely related datasets. We also extend our approach to time series datasets and create hybrid signatures that can be used at any aggregation level.
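
The abstract does not specify how the signatures are constructed, so the following is only a hypothetical sketch of the general idea of keeping an informative trace alongside an aggregate and then comparing datasets by similarity. The functions window_signature, window_means and cosine_similarity, and the (mean, max minus mean) encoding, are assumptions chosen for illustration; they are not the dissertation's method.

```python
import math


def window_signature(values, window):
    """Hypothetical signature (an assumption, not the dissertation's method):
    for each window keep its mean plus (max - mean), so that a peak hidden
    by averaging still leaves a trace in the aggregated representation."""
    signature = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        mean = sum(chunk) / len(chunk)
        signature.extend([mean, max(chunk) - mean])
    return signature


def window_means(values, window):
    """Plain aggregation: one mean per window, with no extra detail."""
    return [sum(values[i:i + window]) / len(values[i:i + window])
            for i in range(0, len(values), window)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


steady = [5.0, 5.0, 5.0, 5.0]
spiky = [0.0, 10.0, 5.0, 5.0]

# Plain window means cannot tell the two series apart...
print(cosine_similarity(window_means(steady, 2), window_means(spiky, 2)))  # approximately 1.0
# ...but the hypothetical signatures can.
print(cosine_similarity(window_signature(steady, 2), window_signature(spiky, 2)))  # about 0.816
```

The extra per-window term doubles the size of the aggregate but is still far smaller than the raw data. How much detail to retain, and at which aggregation level, is exactly the kind of design choice a real signature scheme has to make.
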

Details

Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors:
  Creators | Email | Pitt Username | ORCID
  Al-khateeb, Shadi | sha68@pitt.edu | sha68 | 0000-0002-5896-3462
ETD Committee:
  Title | Member | Email Address | Pitt Username | ORCID
  Committee Chair | Zadorozhny, Vladimir | viz@pitt.edu | VIZ | 0000-0001-6420-1926
  Committee Member | Munro, Paul | pwm@pitt.edu | pwm | 0000-0003-2398-9248
  Committee Member | Pelechrinis, Konstantinos | kpele@pitt.edu | kpele@pitt.edu | 0000-0002-6443-3935
  Committee Member | Grant, John | grant@cs.umd.edu | |
Date: 7 June 2021
Date Type: Publication
Defense Date: 8 February 2021
Approval Date: 7 June 2021
Submission Date: 9 April 2021
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 101
Institution: University of Pittsburgh
Schools and Programs: School of Computing and Information > Information Science
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: Aggregation, Signature, Data Quality, Data Preprocessing
Date Deposited: 07 Jun 2021 20:49
Last Modified: 07 Jun 2021 20:49
URI: http://d-scholarship.pitt.edu/id/eprint/41036

Available Versions of this Item

  • A NOVEL APPROACH FOR IMPROVING THE QUALITY OF DATA USING AGGREGATION MECHANISM. (deposited 07 Jun 2021 20:49)