High-Level Synthesis

Posted on Sat 09 November 2024 in high_level_synthesis

Overview

My postdoctoral work centred on the question of whether FPGAs could play a role in exascale computing. Our consortium was looking to leverage high-level synthesis tools to facilitate the deployment of conventional HPC workloads onto FPGA accelerators. Many numerical algorithms are relatively straightforward: for a start, there isn't a great deal of control flow. Instead, the majority of the computational hotspots consist of nested for-loops whose bodies perform a series of arithmetic operations. Such algorithms are therefore prime candidates for dataflow computing, and consequently appealing targets for FPGAs.

I didn't know a single thing about FPGAs when I began this work, and even less about writing firmware. It was a complete trial by fire. By the end, I had a vague appreciation for how to use the high-level synthesis tools we were working with, but for the most part they were a black box -- I had next to no idea how the software I was writing would be implemented in the FPGA fabric. Since moving to industry, I've come into contact with FPGAs every now and then, but I've never really done anything with them. I have, however, steadily developed a better appreciation for firmware development and the concepts underlying FPGAs, partially by osmosis, but principally by picking them up every now and then to try and become a little less ignorant.

One particular idea I keep coming back to is that I'd love to understand how high-level synthesis actually works. There's clearly a tonne of academic literature out there, but my understanding is much too rudimentary to just start reading it. Sadly, virtually all high-level synthesis implementations are proprietary tools developed by the likes of Xilinx and Intel/Altera, so there isn't much in the way of software I can dig into and try to understand. So, I decided it was about time to write my own high-level synthesis software. Of course, it's going to be terrible. By starting with a really rudimentary implementation, the hope is that I'll be able to incrementally build up my understanding. Aspirationally, the objective is to create something that can ingest a basic computational kernel in C and derive some suboptimal RTL implementation of it.

Chapters

Part I, LLVM: Mapping an LLVM IR CFG to dataflow.
Part II, Scheduling: Deriving a basic schedule from the CFG.
Part III, FSMD: Mapping from a schedule to RTL using an FSM with datapath.
Part IV, Memory Subsystems: Inferring RAMs from loads and stores.
Part V, Loop Unrolling: Dealing with cyclic CFGs through loop unrolling.