GSoC Final Code Submission

    Google Summer of Code 2020 comes to a close tomorrow. It has been an amazing experience, and I am proud of what I have been able to accomplish. It has really opened my eyes to the world of open source, and I plan to stay involved long into the future. This post documents the work I have done since starting Summer of Code in June and serves as my final submission.

Goal and Status

    The goal of my work for Summer of Code was to support a partial-reconfiguration-like flow in SymbiFlow by defining a separate architecture for each partial reconfiguration region (also referred to as an ROI or partition region). I have successfully supported this use case and added a number of examples and documentation. I believe it is at the point where others can use the work for serious research and build on it with new features. Currently, a user can generate separate FASM for each partition region and for the overlay, then concatenate the separate FASM to generate a full bitstream. With a bit more work, it should be possible to generate real partial bitstreams for each partition region.

Zynq 7020 Support 

    The first pull request I got merged integrated Zynq 7020 support into symbiflow-arch-defs. This work is based on earlier, out-of-date work by Antmicro to add 7020 support, and it works on hardware. The pynqz1 board was used for the initial testing, and more boards have been tested since then. This was a good starting place to become familiar with pull requests and with working in symbiflow-arch-defs. The ability to use the pynqz1 board was also an important first step toward porting previous separate compilation work to SymbiFlow. Link to PR: https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1506

NULL Tile Filtering

    The next pull request was a small fix for mapping tile names to VPR coordinates. Some VPR coordinates contained both a real tile and a NULL tile, which caused issues when trying to determine the location of a tile. The solution was to filter out all NULL tiles so that the real tile is always chosen, since a NULL tile should never be used as a synthetic IO tile. This fix allowed a wider range of tiles to be used as synthetic IOs. Link to PR: https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1549
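
    The filtering idea can be sketched in a few lines of Python. This is only an illustration and not the code from the pull request: the function name, the record layout (tile name, tile type, VPR x, VPR y), and the tile names in the usage note are all assumptions made for the example.

```python
from collections import defaultdict


def map_vpr_coords_to_tiles(tiles):
    """Map each VPR (x, y) coordinate to the single real tile located there.

    `tiles` is a hypothetical iterable of (tile_name, tile_type, x, y)
    records. Tiles whose type is "NULL" are dropped before the lookup
    table is built, so a real tile always wins when it shares a
    coordinate with a NULL tile.
    """
    by_coord = defaultdict(list)
    for name, tile_type, x, y in tiles:
        if tile_type == "NULL":
            continue  # NULL tiles are never valid synthetic IO locations
        by_coord[(x, y)].append(name)

    result = {}
    for coord, names in by_coord.items():
        if len(names) != 1:
            # After filtering, more than one tile at a coordinate is unexpected.
            raise ValueError(f"ambiguous tiles at {coord}: {names}")
        result[coord] = names[0]
    return result
```

    For example, passing a NULL tile and an INT_L tile that share the coordinate (1, 2) yields a table with a single entry for (1, 2) that points at the INT_L tile.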

Synthetic IO Tile Wire Inference

    This pull request is really where the core of my work this summer began. It adds the ability to infer the wire required to place a synthetic IO from a Vivado node name, rather than having to specify the wire manually. This may not seem like a big change, but it makes the process of defining partition regions and overlays significantly easier because the user does not have to search in Vivado for the correct wire. Inferring wires also opens the door to future work on automatically choosing interfacing nodes, so that a user does not have to specify the low-level ROI definition at all. Link to PR: https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1561
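
    As a rough sketch of what wire inference means here: a node spans wires in several tiles, so once the node and the tile holding the synthetic IO are known, the wire is determined. The lookup table, function name, and the node and wire names below are hypothetical and chosen only for illustration; the pull request works against the real device database rather than a hand-written dictionary.

```python
def infer_wire(node_wires, node_name, tile_name):
    """Return the wire of `node_name` that lies in `tile_name`.

    `node_wires` is a hypothetical mapping from a node name to the list of
    (tile_name, wire_name) pairs that make up that node.
    """
    if node_name not in node_wires:
        raise KeyError(f"unknown node: {node_name}")

    candidates = [wire for tile, wire in node_wires[node_name] if tile == tile_name]
    if len(candidates) != 1:
        # Zero matches: the node does not pass through this tile.
        # Several matches: the tile alone is not enough to pick a wire.
        raise ValueError(
            f"expected exactly one wire of {node_name} in {tile_name}, "
            f"found {candidates}"
        )
    return candidates[0]
```

    With such a helper, the user only names the node and the tile, and the wire falls out of the lookup instead of being searched for by hand.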

Heterogeneous Synthetic IO Tiles

    The next big focus of my work was allowing multiple interface ports to cross the ROI border at the same tile. Previously, only one synthetic IO tile could be placed at any one VPR grid coordinate, which limited each tile to a single interface port. I got around this limitation by using a VPR feature called heterogeneous tiles, which allows multiple different kinds of tiles to be placed at the same VPR coordinate. Getting this to work required many changes to how synthetic IO tiles are inserted into the architecture. Link to PR: https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1565

Overlay Architecture Generation

    The last big feature addition for my work was overlay architecture generation. By "overlay architecture" I mean an architecture that contains everything in the full architecture except for what is contained in the partition regions. Typically this includes chip IOs and the PS region if it is a Zynq device, as well as logic. The overlay architecture also has to interface with each of the contained partition region architectures using synthetic IO tiles. These synthetic IO tiles are placed inside the hollowed-out partition regions so that they align with the synthetic IO tiles added in the partition region architecture.

    This pull request also added a number of other features needed for a fully working overlay architecture. The first was the addition of synthetic buffers. Before this addition, top-level IOs that should have been placed at synthetic IO locations were packed into real IOs by VPR and therefore could not be placed as synthetic IOs. The synthetic buffers allow top-level IOs to be defined in Verilog in a way that ensures they are not packed into real IOs. Currently this definition is manual and somewhat tedious, but it should be possible to automate it with a Yosys iopad map pass in the future.

    The other feature this pull request added was a refactoring of the bitstream target creation in CMake so that the FASM generated by multiple test targets can be concatenated before generating a bitstream. This change allows a user to map Verilog to each partition region and to the overlay separately, then concatenate the results to generate a full bitstream. A simple swbut overlay generation test was added to make sure everything works. Link to PR: https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1587
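
    Because FASM is a line-oriented text format, the concatenation step itself is simple. The sketch below shows the idea in Python, outside of CMake; it is not the code used in symbiflow-arch-defs, and the function name and the feature lines in the usage note are made up for illustration.

```python
def concatenate_fasm(fasm_sources):
    """Join several FASM texts into one FASM text.

    `fasm_sources` is an iterable of strings, one per separately mapped
    piece (for example the overlay and each partition region). Blank
    lines are dropped and the order of the inputs is preserved.
    """
    lines = []
    for source in fasm_sources:
        for line in source.splitlines():
            line = line.strip()
            if line:
                lines.append(line)
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

    For instance, concatenate_fasm(["TILE_A.FEATURE_0\n", "\nTILE_B.FEATURE_1\n"]) returns a two-line FASM text containing both features, which would then be handed to the usual FASM-to-bitstream step.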

Documentation and Example

    Lastly, I wrote documentation using a more complicated example. The example (called switch_processing) contains two partition regions where the switch values are fed through each of the partition regions before being connected to LEDs. 

    Although this is not a true real-world example, it demonstrates the power of the approach I have been working on over the summer. Each partition region can be mapped separately with either a module that processes its input to produce output or just an identity module. The processing in each partition region can then be turned on or off without doing any new mapping. Concatenating different FASM files and generating a new bitstream is all that is required to change the hardware function significantly. A similar approach is very useful in real-world designs, where the overlay may contain a lot of logic and take a long time to map. Instead of remapping the entire design when one module changes, only a small portion must be remapped.
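
    To make the savings concrete, the sketch below enumerates the full designs that can be assembled from separately mapped pieces. It is illustrative only: the function name, the region names, and the FASM file names are hypothetical and do not come from the switch_processing example itself.

```python
from itertools import product


def enumerate_designs(overlay_fasm, region_variants):
    """List every full design that can be built by concatenation.

    `overlay_fasm` is the (hypothetical) file name of the overlay FASM.
    `region_variants` maps a partition region name to the list of FASM
    file names that were mapped for that region. Each returned design is
    a tuple of file names: the overlay first, then one file per region.
    """
    regions = sorted(region_variants)
    choices = [region_variants[region] for region in regions]
    return [(overlay_fasm, *combo) for combo in product(*choices)]


def mapping_runs(region_variants):
    """Number of mapping runs needed: one for the overlay plus one per variant."""
    return 1 + sum(len(variants) for variants in region_variants.values())
```

    With two partition regions that each have an identity variant and a processing variant, five mapping runs (one overlay plus four region variants) are enough to assemble four different full designs, and switching between them only requires a new concatenation and bitstream generation.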

    This example was then used throughout the documentation to demonstrate how to define an overlay architecture and corresponding partition regions. Link to PR: https://github.com/SymbiFlow/symbiflow-arch-defs/pull/1649

Other Work

    Although I have not yet been able to release this work for licensing reasons (which I hope will be resolved soon), I have also been porting a more realistic example to the SymbiFlow partial reconfiguration flow. This test is based on the Rosetta rendering HLS benchmark, which communicates with the Zynq PS to accelerate rendering a 3D model. I have been able to get the design working on SymbiFlow when implemented monolithically (one module mapped to a full device architecture). I am close to having a design working with partition regions and will make it open source as soon as the licensing issues are cleared up.
