Framework on FPGA
HitGraph is an FPGA framework that accelerates graph processing based on the edge-centric paradigm. HitGraph takes an edge-centric graph algorithm and hardware resource constraints as input, determines the design parameters, and then generates a Register Transfer Level (RTL) FPGA design. This makes accelerator design for various graph analytics transparent and user-friendly by masking the internal details of the accelerator design process. HitGraph increases data reuse and parallelism through novel algorithmic optimizations:
(1) an optimized data layout that reduces non-sequential external memory accesses;
(2) an efficient update merging and filtering scheme that reduces data communication between the FPGA and external memory;
(3) a partition skipping scheme that reduces redundant edge traversals for non-stationary graph algorithms.
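The three optimizations above can be illustrated in software. The following is a minimal Python sketch, not the RTL design in this repository: it runs edge-centric SSSP over vertex partitions and mimics (1) by grouping edges per source partition and sorting them by destination, (2) by keeping only the best update per destination vertex before the gather phase, and (3) by skipping the scatter phase of partitions in which no vertex changed in the previous iteration. The function name `edge_centric_sssp`, the `partition_size` parameter, and the exact layout are assumptions made for illustration.

```python
INF = float("inf")


def edge_centric_sssp(num_vertices, edges, source, partition_size):
    """Edge-centric SSSP sketch. `edges` is a list of (src, dst, weight).

    Returns (dist, scattered), where `scattered` counts how many partition
    scatter passes were executed (to make the effect of skipping visible).
    """
    num_parts = (num_vertices + partition_size - 1) // partition_size

    def part_of(v):
        return v // partition_size

    # (1) Layout (assumed): edges grouped by source partition and sorted by
    # destination, so updates to the same vertex are produced back to back.
    part_edges = [[] for _ in range(num_parts)]
    for s, d, w in edges:
        part_edges[part_of(s)].append((s, d, w))
    for pe in part_edges:
        pe.sort(key=lambda e: e[1])

    dist = [INF] * num_vertices
    dist[source] = 0.0
    active = [False] * num_parts
    active[part_of(source)] = True
    scattered = 0

    while any(active):
        # Scatter phase: one update bin per destination partition.
        bins = [dict() for _ in range(num_parts)]
        for p in range(num_parts):
            if not active[p]:
                continue  # (3) partition skipping: nothing changed here
            scattered += 1
            for s, d, w in part_edges[p]:
                if dist[s] == INF:
                    continue  # (2) filtering: unreached source, no useful update
                cand = dist[s] + w
                dest_bin = bins[part_of(d)]
                if cand < dest_bin.get(d, INF):
                    dest_bin[d] = cand  # (2) merging: keep one update per vertex

        # Gather phase: apply updates and mark partitions that changed.
        active = [False] * num_parts
        for q in range(num_parts):
            for d, cand in bins[q].items():
                if cand < dist[d]:
                    dist[d] = cand
                    active[q] = True
    return dist, scattered
```

On a graph where part of the vertex set is unreachable from the source, the partitions holding those vertices are never scattered, which is the saving that partition skipping targets.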
Based on our design methodology, we accelerate Sparse Matrix-Vector Multiplication (SpMV), PageRank (PR), Single Source Shortest Path (SSSP), and Weakly Connected Component (WCC).
We use Intel Stratix 10 1SX280LH3F55I3XG to conduct our experiments.
The paper is available at https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8685122
We used "ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture" (https://web.cs.ucla.edu/~chiyuze/pub/fpga17.pdf) and "GraphOps: A dataflow library for graph analytics acceleration" (https://dl.acm.org/doi/pdf/10.1145/2847263.2847337) for baseline comparisons.
- Targeted FPGA: Intel Stratix 10 1SX280LH3F55I3XG
- Tools: Intel Quartus 20.1
The hitgraph core contains the implementations of graph algorithms discussed in the paper (shown in yellow).
We assume that partial input and output data are stored in internal memory before being loaded into the algorithm core (refer to the figure above).
Includes four algorithm processing cores: Sparse Matrix-Vector Multiplication (SpMV), PageRank (PR), Weakly Connected Component (WCC), and Single Source Shortest Path (SSSP). To run a given algorithm, set its module as the top module. Use these files for synthesis.
IP core templates generated by Intel Quartus 20.1. These can be used as a reference when creating the IP cores.
Contains all the individual modules.
Contains the complete testing flow for the configuration of 1 partition with 1 pipeline.
Contains unit test benches for the core modules.
Find: IP catalog => basic function => arithmetic => floating point function
Name: add
Other Info: choose Generate Enable and generate HDL
Find: IP catalog => basic function => arithmetic => floating point function
Name: mult
Other Info: In Functionality choose Generate Enable and generate HDL
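When checking simulation results, a small software reference for the multiply-then-accumulate pattern can be handy. The sketch below computes an edge-centric SpMV in Python and rounds every intermediate result to IEEE 754 single precision. It is only an illustration: the helper names `to_f32` and `spmv_edge_centric` are hypothetical, single precision is an assumption (the IP configuration above does not state the precision), and the accumulation order is the plain edge stream order, which may differ from the hardware pipeline.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def spmv_edge_centric(num_vertices, edges, x):
    """Compute y[dst] += weight * x[src] for every edge (src, dst, weight).

    Each product and each partial sum is rounded to single precision,
    mirroring a floating-point 'mult' stage followed by an 'add' stage
    (assumed single precision).
    """
    y = [0.0] * num_vertices
    for src, dst, weight in edges:
        product = to_f32(to_f32(weight) * to_f32(x[src]))  # 'mult' stage
        y[dst] = to_f32(y[dst] + product)                  # 'add' stage
    return y
```

For example, `spmv_edge_centric(3, [(0, 1, 2.0), (1, 1, 0.5), (2, 0, 4.0)], [1.0, 2.0, 3.0])` accumulates two products into vertex 1 and one into vertex 0. Because floating-point addition is not associative, results for larger inputs can differ from hardware in the last bits if the accumulation order differs.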
- Create a new project in Intel Quartus 20.1 with Intel Stratix 10 1SX280LH3F55I3XG as the target device
- Include the project files from the algorithm_processing_core folder
- Select the top module according to the algorithm to run on hardware
- Set up the IP cores as described in "IP cores Configurations"
- Run synthesis
- To simulate internal modules (unit tests), use the test benches in the test_tb folder