CGAcc: A Compressed Sparse Row Representation-Based BFS Graph Traversal Accelerator on Hybrid Memory Cube

Qian, Cheng and Childers, Bruce and Huang, Libo and Guo, Hui and Wang, Zhiying (2018) CGAcc: A Compressed Sparse Row Representation-Based BFS Graph Traversal Accelerator on Hybrid Memory Cube. Electronics, 7 (11). p. 307. ISSN 2079-9292

Preview

PDF
Published Version
Download (1MB) | Preview

Official URL: http://dx.doi.org/10.3390/electronics7110307

Abstract

Graph traversal is widely used in map routing, social network analysis, causal discovery and many more applications. Because it is a memory-bound process, graph traversal puts significant pressure on the memory subsystem. Due to poor spatial locality and the increasing size of today’s datasets, graph traversal consumes an ever-larger part of application execution time. One way to mitigate this cost is memory prefetching, which issues requests from the processor to the memory in anticipation of needing certain data. However, traditional prefetching does not work well for graph traversal due to data dependencies, the parallel nature of graphs and the need to move vast amounts of data from memory to the caches. In this paper, we propose a compressed sparse row representation-based graph accelerator on the Hybrid Memory Cube (HMC), called CGAcc. CGAcc combines Compressed Sparse Row (CSR) graph representation with in-memory prefetching and processing to improve the performance of graph traversal. Our approach integrates the prefetching and processing in the logic layer of a 3D stacked Dynamic Random-Access Memory (DRAM) architecture, based on Micron’s HMC. We selected HMC to implement CGAcc because it can provide quite high bandwidth and low access latency. Furthermore, this device has multiple DRAM layers connected to internal logic to control memory access and perform rudimentary computation. Using the CSR representation, CGAcc deploys prefetchers in the HMC to exploit the short transaction latency between the logic and DRAM layers. By doing this, it can also avoid large data movement costs. In the runtime, CGAcc pipelines the prefetching to fetch data from DRAM arrays to improve memory-level parallelism. To further reduce the access latency, several optimized internal caches are also introduced to hold the prefetched data to be Processed In-Memory (PIM). A comprehensive evaluation shows the effectiveness of CGAcc. Experimental results showed that, compared to a conventional HMC main memory equipped with a stream prefetcher, CGAcc achieved an average 3.51× speedup with moderate hardware cost.

Citation/Export:
Social Networking:	Share \|

Details

Item Type:

Article

Status:

Published

Creators/Authors:

Creators	Email	Pitt Username	ORCID
Qian, Cheng
Childers, Bruce	childers@cs.pitt.edu
Huang, Libo
Guo, Hui
Wang, Zhiying

Date:

7 November 2018

Date Type:

Publication

Journal or Publication Title:

Electronics

Volume:

Number:

Publisher:

MDPI AG

Page Range:

p. 307

DOI or Unique Handle:

10.3390/electronics7110307

Schools and Programs:

School of Computing and Information > Computer Science

Refereed:

Yes

Uncontrolled Keywords:

compressed sparse row, graph traversal, hybrid memory cube

ISSN:

2079-9292

Official URL:

http://dx.doi.org/10.3390/electronics7110307

Funders:

National Natural Science Foundation of China, YESSunder

Article Type:

Research Article

Date Deposited:

21 Jul 2021 20:04

Last Modified:

21 Jul 2021 20:04

URI:

http://d-scholarship.pitt.edu/id/eprint/41448

Metrics

Monthly Views for the past 3 years

Plum Analytics

Altmetric.com

Actions (login required)

View Item

My Account

Search

Browse

Information

CGAcc: A Compressed Sparse Row Representation-Based BFS Graph Traversal Accelerator on Hybrid Memory Cube

Abstract

Share

Details

Metrics

Monthly Views for the past 3 years

Plum Analytics

Altmetric.com

Actions (login required)

Connect with us

Send Comments or Questions

Feeds