MapReduce analysis for cloud-archived data

Palanisamy, B and Singh, A and Mandagere, N and Alatorre, G and Liu, L (2014) MapReduce analysis for cloud-archived data. Proceedings - 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2014. 51 - 60.

Abstract

Public storage clouds have become a popular choice for archiving certain classes of enterprise data, for example, application and infrastructure logs. These logs contain sensitive information such as IP addresses or user logins, so regulatory and security requirements often require the data to be encrypted before it is moved to the cloud. To derive business value from such data, analytics systems (e.g., Hadoop/MapReduce) first download it from the public cloud, decrypt it, and then process it at the secure enterprise site. We propose VNCache, an efficient solution for MapReduce analysis of such cloud-archived log data that does not require a priori data transfer and loading into the local Hadoop cluster. VNCache dynamically integrates cloud-archived data into a virtual namespace at the enterprise Hadoop cluster. Through a seamless data streaming and prefetching model, Hadoop jobs can begin execution as soon as they are launched, without any a priori downloading. With VNCache's accurate prefetching and caching, jobs often run on a locally cached copy of the data block, significantly improving performance. When no longer needed, data is safely evicted from the enterprise cluster, reducing the total storage footprint. Uniquely, VNCache is implemented with no changes to the Hadoop application stack. © 2014 IEEE.
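
The prefetch, cache, and evict cycle described in the abstract can be illustrated with a toy sketch. This is not VNCache's actual algorithm, which the abstract does not specify; it assumes a simple sequential read pattern, a fixed prefetch depth, and least-recently-used eviction. The class name PrefetchingBlockCache, the fetch_remote callable, and the parameters capacity and prefetch_depth are all hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class PrefetchingBlockCache:
    """Toy read-through block cache (illustrative only, not VNCache itself).

    Assumptions: blocks are read roughly sequentially, prefetch looks a fixed
    number of blocks ahead, and eviction is least-recently-used (LRU).
    """

    def __init__(self, fetch_remote, capacity=4, prefetch_depth=2):
        self.fetch_remote = fetch_remote    # hypothetical callable: block_id -> bytes
        self.capacity = capacity            # max number of blocks kept locally
        self.prefetch_depth = prefetch_depth
        self.blocks = OrderedDict()         # block_id -> data, least recent first
        self.hits = 0
        self.misses = 0

    def _store(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def read(self, block_id, total_blocks):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)
            data = self.blocks[block_id]
        else:
            self.misses += 1
            data = self.fetch_remote(block_id)
            self._store(block_id, data)
        # Prefetch the next few blocks. A real system would do this in the
        # background; it is done inline here to keep the sketch short.
        last = min(block_id + 1 + self.prefetch_depth, total_blocks)
        for nxt in range(block_id + 1, last):
            if nxt not in self.blocks:
                self._store(nxt, self.fetch_remote(nxt))
        return data
```

With a sequential scan, only the first read goes to the remote store on demand; later reads find their block already prefetched, and older blocks are dropped once the capacity is exceeded, which mirrors the reduced local storage footprint the abstract mentions.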


Details

Item Type: Article
Status: Published
Creators/Authors:
Palanisamy, B (Email: BPALAN@pitt.edu; Pitt Username: BPALAN)
Singh, A
Mandagere, N
Alatorre, G
Liu, L
Date: 1 January 2014
Date Type: Publication
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Journal or Publication Title: Proceedings - 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2014
Page Range: 51 - 60
Event Type: Conference
DOI or Unique Handle: 10.1109/ccgrid.2014.13
Institution: University of Pittsburgh
Schools and Programs: School of Information Sciences > Information Science
Refereed: Yes
ISBN: 9781479927838
Date Deposited: 24 Jun 2014 20:09
Last Modified: 02 Feb 2019 16:55
URI: http://d-scholarship.pitt.edu/id/eprint/22063
