Posts

GSoC Final Code Submission

Google Summer of Code 2020 is coming to a close tomorrow. It has been an amazing experience and I am proud of what I have been able to accomplish. It has really opened my eyes to the world of open source and I plan to continue being involved long into the future. This post will serve to document the work I have done since starting Summer of Code in June and as my final submission for Summer of Code.

Goal and Status

The goal of my work for Summer of Code was to support a partial-reconfiguration-like flow in SymbiFlow by defining different architectures for each partial reconfiguration region (also referred to as ROI or partition region). I have successfully supported this use case along with a number of examples and documentation. I believe it is at the point where others can use the work for serious research and build off of it with new features. Currently, a user is able to generate separate FASM for each partition region and the overlay and concaten

August 24th, 2020

The switch processing test case is now working; I am going to clean it up a bit more to make the documentation clear. Finishing up Summer of Code with documentation by the end of the week. This has been a great experience and I am happy with all the work I have been able to accomplish this summer.

August 21st, 2020 Progress Report

Rendering Test Case

I now have a working decomposed rendering test case on SymbiFlow implemented monolithically. Soon I will progress to defining the partition region architectures for this test case to implement separately. This will probably be put on the back-burner for a bit while I finish up the documentation and switch processing test case.

Switch Processing Test Case

The switch processing test case is a new test case I came up with this week as a simpler test case that emulates the sort of 'streaming/pipelined' test cases like rendering. It is not actually inherently pipelined as each module could be fully combinational, but it demonstrates the consistent input and output from each partition region with arbitrary compute nodes mapped in each region.

The general idea is to pass the switch values into the first partition region, which produces some output based on the input and then passes onto the next partition region before finally being displayed on the

August 20th, 2020

Decomposed rendering benchmark is working on SymbiFlow monolithically. Next step is to define the partition region architectures to actually map separately. Also working on a benchmark that 'processes' the switch input two times (once for each partition region) before displaying it on the LEDs. This is a simpler case but shows the potential use for streaming and/or pipelined hardware where different stages can be replaced on the fly.
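A behavioral sketch of this benchmark idea, in Python rather than hardware: each partition region is modeled as a function from switch bits to output bits, and swapping a stage stands in for remapping one region without touching the overlay. The 4-bit width and the stage functions (`identity`, `add_one`, `reverse_bits`) are assumptions chosen for illustration, not the actual test modules.

```python
from typing import Callable, Sequence

WIDTH = 4                  # assumed number of switches and LEDs
MASK = (1 << WIDTH) - 1    # keeps every value within WIDTH bits

# A stage models the logic mapped into one partition region.
Stage = Callable[[int], int]


def identity(value: int) -> int:
    """Pass the input straight through."""
    return value & MASK


def add_one(value: int) -> int:
    """Increment the input, wrapping around at WIDTH bits."""
    return (value + 1) & MASK


def reverse_bits(value: int) -> int:
    """Reverse which input bit drives which output bit."""
    out = 0
    for i in range(WIDTH):
        if value & (1 << i):
            out |= 1 << (WIDTH - 1 - i)
    return out


def run_pipeline(switches: int, stages: Sequence[Stage]) -> int:
    """Feed the switch value through each region in order; the result drives the LEDs."""
    value = switches & MASK
    for stage in stages:
        value = stage(value)
    return value


if __name__ == "__main__":
    # Two regions: the first adds one, the second reverses the bit order.
    leds = run_pipeline(0b0011, [add_one, reverse_bits])
    print(format(leds, "04b"))
```

Replacing one entry in the `stages` list corresponds to loading different logic into one region while the fixed input and output interface stays the same.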

August 19th, 2020

Ran into some issues with the rendering test case. Need to do a better job decomposing logic into partition regions to ensure everything will fit and not have a large number of IOs. Modified the HLS to accommodate this by streaming data between two partitions.

August 18th, 2020

Continuing to work on the rendering implementation test case. Starting to get more concerned about licensing issues. Also in the process of moving, so my work this week will be a bit more limited.

August 14th, 2020 Progress Report

Overlay Device Generation

Overlay generation was merged this week. Remaining work will focus on making it easier to use and providing detailed documentation.

Testing

Working towards a set of more advanced tests for partition region and overlay architectures. Specifically, I'm going to be using the Zynq 7020 architecture and a DMA controller connected to the PS to interface with partition regions. This way I'll be able to have a streaming interface in and out of each partition region that can then have different logic mapped to change the results the PS receives.

For example, the partition region could contain an identity module, an add 1 module, or a rendering module, each of which can take the same inputs but produce different outputs. These modules can then be changed without having to recompile the overlay.

The only concern I have that I'm still trying to figure out is license issues. I can finish removing all Xilinx IP for the most part, but the

August 13th, 2020

Gave my virtual kanata talk today. Continuing to work on some partition region tests using DMA. Switching from the complex rendering benchmark to a simpler identity function for now.

August 12th, 2020

Overlay architecture generation got merged. Starting to work on documentation. Finishing presentation preparations for tomorrow.

August 11th, 2020

Continuing to work on test cases. Will start documentation soon. Preparing for my presentation at virtual kanata.

August 10th, 2020

Continuing to work on the Rosetta rendering benchmark. Started looking into using LiteX as an example. Cleaned up overlay architecture generation.

August 7th, 2020 Progress Report

Overlay Device Generation

Overlay generation CI is now green and just needs some final cleanup before being merged. I will then be working on documentation to ensure others are able to use my work.

Partition Regions with >1 Clock Region

To support large enough partition regions to be useful in many cases, they must be able to span across multiple clock regions. This produces a number of complications. Most importantly, clocks can't cross over clock region boundaries horizontally on their own. To remedy this issue we must either artificially connect together where the clock enters the partition region for each clock region (aka the output nodes of the BUFH) or expose the entire global clock tree for a clock that enters the partition region.

Currently I am exposing the global clock tree by creating a synth tile at the output of a BUFG. This seems to be working but will potentially create some issues in the future with sharing BUFGs between multiple partition regi

August 6th, 2020

Cleaned up overlay generation pull request. Still working on supporting partition regions with more than one clock region. Placing a synth tile at the output of a BUFG seems to expose enough of the global clock tree to allow clock routing in the two clock regions. Having issues with the constant network that seem to be caused by the constant network being unable to route across clock region boundaries.

August 5th, 2020

Working on a much larger test of partition regions. Running into a few issues with io constraints, working on a solution now. Started looking for users; HackerFoo seems interested in porting a PCIe harness.

August 4th, 2020

Working on supporting partition regions that cross multiple clock regions. Connecting together BUFH outputs in the partition region so clocks can be shared between clock regions. I think this should work, but if it doesn't I can expose the entire clock tree to the partition region.

August 3rd, 2020

Working on supporting partition regions that cross multiple clock regions. Need to have a switch between the clock syn io and the wire where the clock is fed into each other clock region. At least this is the current plan; there are potentially other ways that will work.

July 31st, 2020 Progress Report

Overlay Device Generation

I made a few final fixes to the overlay architecture generation, which should now be very close to being merged. An overlay and multiple partition regions can be defined and mapped to separately and then have the resulting fasm automatically merged by cmake before generating a bitstream. CI is red right now because of a regression in fasm2bels, but that should be fixed soon.

PRFlow Rendering Benchmark with SymbiFlow Partition Regions

One of the important test cases I would like to use is the PRFlow implementation of the Rosetta rendering benchmark. I have this test working in SymbiFlow on the 7020 with no partition regions, so the next step is to define a partition region to separate the user rendering application from the AXI DMA logic.

I still do not have a good algorithm for automatically picking partition pins, so for now I will have to either manually pick them or write some simplistic algorithm.

One major problem I have run int

July 30th, 2020

Still making fixes to overlay generation to get it merged. Should be done in a few more days. Working on a test case for the Rosetta rendering benchmark that uses partition regions. Confirmed that multiple different merged fasm tests can be generated by CMake and work on hardware.

July 29th, 2020

Mostly fixed merged bitstream generation for overlays. Still failing some CI tests. Building locally now to test on hardware. Have a working monolithic Rosetta/PRFlow rendering benchmark on SymbiFlow. Next step is to put the rendering module into a separate partition region.

July 24th, 2020 Progress Report

Heterogeneous Synth Tiles

I made a number of final fixes to heterogeneous synth tiles and got it merged. The biggest change was fixing the pin map csv generation to allow the removal of taking synth tiles as an input to ioplace generation. I did this by merging synth_tile_to_pinmap_csv and create_pinmap_csv so pin map csvs can have both synth and real IOs.

Overlay Device Generation Testing

I now have more evidence that overlay generation and heterogeneous synth tiles work. I created a counter test which generates an overlay to feed two clocks into a partition region, one with half the frequency of the other. I created two four bit counters inside the partition region, one controlled by each clock, and connected to LEDs. This test works on hardware, showing that overlays and partition regions line up correctly and that heterogeneous synth tiles work because clocks are fed in on the same tile location.

Overlay Device Generation CMake

This week I also started working on in

July 23rd, 2020

Used py-spy to profile graph node importing. Made more fixes to stacked synth tiles, which will hopefully go green now. Started adding cmake to merge together fasm before producing the bitstream.

July 22nd, 2020

Spent most of today making changes to heterogeneous synth tiles. Now doing what I think is a better solution: adding synth tiles to the pin map csv correctly instead of checking synth tiles in create_ioplace. Need to make some fixes to the roi harnesses in prjxray-db. Working on checking my fix for brams with a shared read/write port.

July 21st, 2020

Realized I made a mistake in the 2 clock overlay generation that caused the second clock to not get generated. Changed the overlay to use the PLL properly. 2 clocks are properly being fed into the partition region now. This can be seen in the video: the top row is a 4 bit counter running off a 100 MHz clock and the bottom row is a 4 bit counter running off a 50 MHz clock generated outside the partition region. Started working on supporting brams with shared read/write ports, which are required for the Rosetta rendering benchmark.

July 20th, 2020

Confirmed that one clock is being routed into the partition region by implementing a counter test. Building an overlay for a test that feeds 2 clocks into the partition region; this also tests heterogeneous synth tiles. Tomorrow will start on having the ability to merge FASM for an overlay and N partition regions and then generate a bitstream from that.

July 17th, 2020 Progress Report

Overlay Device Generation

The big news of this week is that overlay generation with a single partition region works on hardware! I am still using the required fasm features in the design.json definition to merge the overlay and partition region fasm, but it should also be possible to just merge the end fasm together (this will be required for multiple partition region overlays).

Overlay Tests

The main test I have been running is the simplest one I could imagine to show the overlay generation working on hardware. The overlay maps the switches and LEDs into the partition region, and the partition region connects these together. There are two versions, one that connects them directly, and one that reverses which switch connects to which LED.

First thing next week will be adding more tests, especially testing more complex logic inside the partition region. I also want to try adding another partition region and move around the first one to make sure the overlay generation work

July 16th, 2020

Spent most of the day debugging overlays. Need to have an explicit BUFG and SYN_OBUF for the clock. Constant net routing wasn't working properly. Turns out VCC and GND tiles for the overlay were not getting connected to constant nets correctly. Made a fix and regenerating overnight; hopefully this should fix it.

July 15th, 2020

Synth tile from node was merged into arch defs master. Finished writing test cases for overlay.py and got those changes merged into prjxray. Realized there was an issue with overlay io constraints: VPR packed synth IOs into real IOPADs in the netlist, making placement on synth IOs impossible. The solution is to add a SYN_OBUF and SYN_IBUF pb type to the architecture, allowing VPR to distinguish between real and synth IOs. Currently these SYN_OBUFs and SYN_IBUFs have to be added explicitly in verilog, but I should be able to add a pass to yosys to add them automatically.

July 14th, 2020

Synth tile from node docstrings created, should be merged tomorrow. Working on fixing overlay io place generation. Starting out with a simpler test case and copying fasm generated for the overlay into the pr design.json required features.

July 13th, 2020

Rebased synth io from node on master with the updated prjxray-db; it is running in CI now. Worked on adding a first test for partial reconfiguration regions with overlay. Making a number of CMake changes to allow for generating the overlay and pr separately and then merging.

July 10th, 2020 Progress Report

Synth Tiles From Node

New prjxray-db should be merged soon, allowing for the synth tile from node work to be merged early next week.

Heterogeneous Synth Tiles

The heterogeneous synth tiles pull request is ready to be merged, other than one Vivado diff-fasm test that failed. After loading the design into Vivado and tracing back pips to compare with the original fasm, it is clear this is an uncovered bug in fasm2bels. acomodi and litghost are planning to look into a fix next week so this work can be merged.

Overlay Device Generation

I created a pull request for adding the overlay python object to prjxray. I need to do some cleanup there and add some prjxray test cases.

I also believe overlay generation is working completely. I have yet to write a comprehensive test case, but I am able to generate reasonable architecture and rr_graph files. The next step will be writing an initial testcase for generating different PR regions and merging the fasm wi

July 9th, 2020

PR for heterogeneous synth tiles has an issue with one diff_fasm test (dram32x1d). Trying to diagnose with fasm2bels so the PR can be merged soon. Database changes were merged, so hopefully synth tile from node name can be merged soon. Overlay generation is still coming along well. Running tests and slowly working out bugs, but the support should be there.

July 8th, 2020

Heterogeneous synth tiles should be merged soon. Merged a change to prjxray that should allow the new prjxray-db to be merged into arch-defs. This frees up merging synth tile from node name. Overlay generation is coming along. Need to decide if synth tiles should/must be associated with a specific partition region or if they can just be lumped together.

July 7th, 2020

Proper stacked synth tile support seems to work now for simple test cases. Running on CI now. Need to develop a more robust test, but this probably requires working overlay generation first. Next steps are to merge all these changes into my local overlay generation branch and try to get overlay generation working. Hopefully a lot of these features can be merged in the meantime. Still waiting on sdf fixes for prjxray-db.

July 6th, 2020

Continued work on heterogeneous synth IO tiles. Creating a new tile definition for every location so the capacity matches up correctly. Still need to modify routing import and io place generation to match these changes.

July 2nd, 2020 Progress Report

Synth Tiles From Node

This week I added a change to prjxray that generates an explicit synth io type in the design.json rather than inferring it from the io name. Tim pushed a new database with these fixes to prjxray-db, which should hopefully allow my code for synth tiles from a node to integrate seamlessly with the current ROI harness.

Heterogeneous Synth Tiles

Supporting heterogeneous synth tiles has been my big push this week, and I think it is close to working. I still need to decide the correct way to propagate the correct z location for each synth tile, but it will likely be adding a z loc to the synth_tiles.json. I also need to decide if tiles should have more capacity than they need, or if I need to create a new tile definition for each synth tile location depending on how many synth tiles can be placed there.

Overlay Device Generation

I created an overlay python object as an inverse of the roi object. It allows a design to cut out specific regions

July 1st, 2020

Submitted a pull request for stacked synth tiles; it is still not finished and I am working out issues now. Got design.json generation with explicit port type definitions merged. Created an overlay python object to act as the inverse of an roi.
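To make the "inverse of an roi" idea concrete, here is a minimal sketch. It is not the prjxray overlay object: the `Region` and `Overlay` classes, their fields, and the inclusive rectangular bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular partition region with inclusive grid bounds."""
    x1: int
    y1: int
    x2: int
    y2: int

    def contains(self, x: int, y: int) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


class Overlay:
    """Hypothetical overlay: every grid location that is NOT in any partition region."""

    def __init__(self, regions: Iterable[Region]) -> None:
        self.regions: List[Region] = list(regions)

    def contains(self, x: int, y: int) -> bool:
        # The ROI test is inverted: a tile belongs to the overlay
        # only if no partition region claims it.
        return not any(region.contains(x, y) for region in self.regions)


if __name__ == "__main__":
    overlay = Overlay([Region(2, 2, 4, 4), Region(6, 0, 7, 1)])
    kept = [(x, y) for x in range(8) for y in range(8) if overlay.contains(x, y)]
    print(len(kept), "of 64 grid locations remain in the overlay")
```

With this shape, adding a second partition region is just another entry in the list, and the overlay shrinks accordingly.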

June 30th, 2020

Submitted a pull request for my updated synth tile creation. Submitted a pull request to prjxray to tweak the design.json generation and be able to merge the ROI into partition regions. Have working heterogeneous synth tile creation; the pull request is almost done. Need to modify top_io.place generation and add a test case.

June 29th, 2020

Have working heterogeneous synth tiles. Need to merge these into the current master and should be able to make a pull request tomorrow. Need to do a bit more testing to ensure it works with multiple tiles at the same location. Need to modify top_io.place generation to support it.

June 26th, 2020 Progress Report

Zynq 7020

This week I have made a lot of progress on a mix of items. I added final touches to zynq 7020 support, including CI and fixing the pynq z1 counter test to include an explicit BUFG. These changes were merged into master earlier this week, but there is still a bit more work to test the 7020 support for other boards so they can be supported as well.

Map to VPR Coord Fix

Much of this week was also spent on diagnosing and creating a proper fix for some synth tile locations mapping to multiple logical tiles. #1549 should be merged soon to fix this issue and allow synth tiles to be placed at arbitrary locations. This fix, along with the ability to create synth IOs from node names, will go a long way in making the ROI a more general partition region that can be easily defined.

Heterogeneous Synth IO Tiles

I am close to having heterogeneous synth IO tiles supported in symbiflow. These use the heterogeneous tiles feature of VPR upstream, combined with increased c

June 25th, 2020

Got zynq 7020 changes merged into master. Created a filter to fix synth tile map to coordinates picking up multiple tiles. Started researching heterogeneous tiles to have a better multi synth IO pad implementation.

June 24th, 2020

Fixed the zynq test, so the pull request should be good to go now. Changed design.json and fixed issues with custom synth io generation. Working on fixing the database with two logical tiles pointing to the same phy_tile.

June 23rd, 2020

Custom synth IOs almost working. Trying to fix a bug with rr_graph creation with custom synth IOs.

June 22nd, 2020

Made changes to the zynq-boards pull request to allow it to be merged soon. Figured out that issues with my synth tile creation were likely due to tile names at the border of a clock region mapping to two vpr coordinates. Created a fix for this issue by picking the non-empty tile.
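A minimal sketch of the "pick the non-empty tile" filter, under stated assumptions: the function name `pick_non_empty_coord`, the dictionary of tile types keyed by grid coordinate, and the `"EMPTY"` type label are hypothetical and do not reflect the actual symbiflow-arch-defs code or database schema.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def pick_non_empty_coord(
    candidates: List[Coord],
    tile_types: Dict[Coord, str],
    empty_type: str = "EMPTY",  # assumed label for a grid location with no real tile
) -> Coord:
    """Return the single candidate coordinate that holds a real (non-empty) tile.

    Raises ValueError if zero or several candidates are non-empty, since the
    mapping would still be ambiguous in that case.
    """
    non_empty = [
        coord
        for coord in candidates
        if tile_types.get(coord, empty_type) != empty_type
    ]
    if len(non_empty) != 1:
        raise ValueError(
            f"expected exactly one non-empty tile, found {len(non_empty)}"
        )
    return non_empty[0]


if __name__ == "__main__":
    # A tile name at a clock region border that maps to two grid coordinates.
    grid = {(10, 5): "EMPTY", (10, 6): "CLBLL_L"}
    print(pick_non_empty_coord([(10, 5), (10, 6)], grid))
```

Failing loudly when the filter does not narrow the candidates to one keeps an ambiguous mapping from silently placing a synth tile at the wrong location.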

June 19th, 2020 Progress Report

Node Only Synth IO Tiles

This week I started implementing support for a more generic ROI-like region. I have a mostly working ability to generate synth io tiles from a vivado node name that crosses the partition region border. This will allow for easier harness creation and make matching up the overlay with the partition region much easier.

I accomplished this feature by searching the connections database for a pair of wires on the given node with minimum Manhattan distance where one is outside the region and one is inside the region. This method also makes it trivial to get the wire the overlay synth tile needs to connect to (we already have the pair).

WIP Pull Request

Multiple Synth IOs at the Same VPR Coordinate

I am now working on implementing the ability to place multiple synth io tiles at the same vpr grid location. This will especially be useful for routing multiple clocks into a partition region, as all of these clocks must be routed into the partition region
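The wire-pair search on a node described in this report can be sketched as follows. This is an in-memory illustration only: the `Wire` and `Region` classes and the function name are hypothetical, and the real implementation queries the connections database rather than a Python list.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Wire:
    """Hypothetical wire record: a name plus the grid location of its tile."""
    name: str
    x: int
    y: int


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular partition region with inclusive bounds."""
    x1: int
    y1: int
    x2: int
    y2: int

    def contains(self, x: int, y: int) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def closest_crossing_pair(
    node_wires: Iterable[Wire], region: Region
) -> Optional[Tuple[Wire, Wire]]:
    """Return the (outside, inside) wire pair with minimum Manhattan distance.

    Returns None when the node does not cross the region boundary.
    """
    wires = list(node_wires)
    inside = [w for w in wires if region.contains(w.x, w.y)]
    outside = [w for w in wires if not region.contains(w.x, w.y)]
    if not inside or not outside:
        return None
    return min(
        product(outside, inside),
        key=lambda pair: abs(pair[0].x - pair[1].x) + abs(pair[0].y - pair[1].y),
    )


if __name__ == "__main__":
    region = Region(2, 0, 5, 5)
    node = [Wire("a", 0, 1), Wire("b", 1, 1), Wire("c", 2, 1), Wire("d", 3, 1)]
    outside_wire, inside_wire = closest_crossing_pair(node, region)
    print(outside_wire.name, inside_wire.name)
```

Because the function returns both halves of the pair, the partition region side and the overlay side can each take the wire they need from the same result.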

June 18th, 2020

Adding support for multiple synth io tiles at the same vpr location. Important for routing multiple clocks into the partition region. Also useful for arbitrary harnesses.

June 17th, 2020

Added the ability to create synth tiles from only a node name (automatically picks the correct wire). Test PR here. Still needs work to successfully make it through the flow. Need to add checks that the node specified will work for the IO direction intended. Need to change synth tiles to support multiple IPINs and OPINs, especially for multiple clocks.

June 16th, 2020

Figured out how to find the correct wire to make a synth pad given a node that crosses the boundary. Going to finish implementing tomorrow and then should be able to switch to creating the overlay.

June 15th, 2020

Mostly worked on figuring out how to choose partition pins. Can trace back nets of vivado partition pins to find reasonable nodes. Want to only require providing a node that crosses the partition boundary and then figure out the wire closest to the boundary but outside the region.

June 12th, 2020 Progress Report

Progress

After this week I have gained a much better understanding of how to approach the problem of partial reconfiguration. My approach will diverge quite a bit from what I originally planned on, based on advice from litghost. I also familiarized myself much more with the symbiflow-arch-defs cmake and began creating a setup to support the partial reconfiguration regions discussed. I also think this approach to partial reconfiguration closes the gap between offline and online pr.

Approach

Each of the partial reconfiguration regions plus the overlay will be a different symbiflow-arch-defs device (they could potentially be combined in some way in the future to be considered sub-devices of a single full device). Each partial reconfiguration region will be restricted in a very similar fashion to the ROI based on information in the design.json, or other similar file. The overlay device will be the reverse, containing the entire chip but the partition regions. We will make the c

June 11th, 2020

Very productive meeting today with Keith that has shifted my perspective and plan a bit. Align partition region with frames. Place synth tiles right outside of the roi, connecting to a wire that crosses the boundary. Find crossing wires with an sql query. Do the inverse for overlay generation in vpr. Exclude all rois from the overlay. Place synth tiles right inside the roi, connecting to the same wire that is crossing the boundary. Keep in symbiflow-arch-defs for now. One device definition for each pr region and one for the overlay too. Potentially could be automated to place correct cmake files based off of an input file. Include overlay/harness fasm with each roi device. Fasm allows for repeated data, so just cat all of the pr region fasms together.
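The last note, concatenating the FASM for each region, can be sketched in a few lines. This is an illustration only and not the CMake-driven merge used in the flow: the `merge_fasm` function is hypothetical and the feature names in the example are placeholders, not real database features.

```python
from typing import Iterable, List


def merge_fasm(sources: Iterable[str]) -> str:
    """Concatenate FASM texts in order, dropping only blank lines.

    Repeated feature lines are kept as-is, relying on the note above
    that FASM allows repeated data.
    """
    lines: List[str] = []
    for text in sources:
        for line in text.splitlines():
            stripped = line.strip()
            if stripped:
                lines.append(stripped)
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    # Placeholder feature names; real FASM features come from the database.
    overlay = "TILE_A.FEATURE_X\nTILE_B.FEATURE_Y\n"
    region_0 = "TILE_C.FEATURE_Z\n"
    region_1 = "TILE_D.FEATURE_W\nTILE_A.FEATURE_X\n"
    print(merge_fasm([overlay, region_0, region_1]), end="")
```

Keeping the merge as plain concatenation means the overlay and each partition region can be built independently and combined only at the final step before bitstream generation.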

June 10th, 2020

Finished reintegrating work to generate synth tiles for partition pins. Just need to finish testing to make sure it works as expected. Still contemplating the proper inputs to this project if it were to be separated from arch-defs. Haven't found a function to take an arbitrary subsection of the rr_graph. I have a feeling this is difficult, due to the lack of location information in the rr_graph, unless there is some information I am missing. Could take something like the channels.db as input.

June 9th, 2020

Continuing work to get previous work on synth_tiles reintegrated. It creates a synth_io pad near each partition pin and then creates a switch between the partition pin and the IO pad to force VPR to route to the correct location. Met with Tim and discussed separating this project from arch-defs. Set a meeting with Keith on Thursday to discuss partial reconfiguration.

June 8th, 2020

Started adding partition pin options in cmake for testing the single partition region. Began reintegrating previous work for generating synth tiles from vivado partition pins. Tried to build VPR with graphics to see placements but ran into issues with getting it to run.

June 5th, 2020 Progress Report

Today concludes the first week of Summer of Code 2020, and my first progress report. This has been a very useful week in terms of research and understanding the proper place to start with writing code next week.

Progress

The biggest question I have been trying to answer this week is where in the SymbiFlow tool chain to implement offline partial reconfiguration regions. I created a use cases document to consider potential uses for this work and think about what different users of partial reconfiguration would require. A few consistent and interrelated requirements of users for offline partial reconfiguration are small rr_graphs, fast compilation, and high levels of isolation between regions. For these reasons, I will proceed with implementing partial reconfiguration in symbiflow-arch-defs rather than VtR, as having physically separate (and therefore smaller) architecture files and rr_graphs will best reach these common user requirements for offline partial reconfiguratio

June 4th, 2020

Attended VtR meeting. Interested in work on placement constraints in VtR, especially constraining within a rectangle. Did more research into PR in arch-defs or VtR; I think they can be complementary (for different use cases). VtR placement constraint work gets you most of the way there for PR based in VtR. Leaning towards PR in arch-defs to optimize for speed after "overlay" generation and to allow much smaller rr_graphs for loading on board. Still working on trying to run custom bitstreams on the zynq with PS code generated by Vitis/SDSoC. This will help with testing real world PR later on. Some progress loading from SD card using zynq-mkbootimage (https://github.com/antmicro/zynq-mkbootimage). Fixed a mkbootimage bug, will submit a pull request soon.

June 3rd, 2020

Edited and sent intro email. Spent time refreshing myself on partition pin work from last summer; will look into merging code for custom synth pads in ROI as a first step. Tried uploading a xilinx generated elf and bitstream to the pynq z1 with mixed success.

June 2nd, 2020

Added user requirements to use cases document. Made changes to intro email. Rebased 7020 pull request to include updated prjxray-db submodule. Leaning towards implementing partial reconfiguration regions in arch defs based on user requirements.

June 1st, 2020

Today was the first official day of Summer of Code. Wrote a document on potential use cases for my project (Link). Tomorrow I will dive into researching whether to implement partial reconfiguration in symbiflow-arch-defs or VtR.

Accepted to Summer of Code

My proposal for Google SoC with SymbiFlow was accepted. Here is a link to the proposal: Proposal