diff --git a/doc/verilog_manual.md b/doc/verilog_manual.md
index 311c59a..5d2cd20 100644
--- a/doc/verilog_manual.md
+++ b/doc/verilog_manual.md
@@ -1,4 +1,92 @@
-# Firmware
+Copyright 2023 (C) Peter McGoron.
+
+This file is a part of Upsilon, a free and open source software project.
+For license terms, refer to the files in `doc/copying` in the Upsilon
+source distribution.
+
+__________________________________________________________________________
+
+The Hardware Maintenance Manual is an overview of the hardware (non-software)
+parts of Upsilon.
+
+# Crash Course in FPGAs
+
+Upsilon runs on a Field Programmable Gate Array (FPGA). FPGAs are sets
+of logic gates and other peripherals that can be reconfigured by a computer.
+FPGAs can implement CPUs, digital filters, and control code at a much
+higher speed than software on a computer. The downside is that FPGAs are much more
+difficult to program for.
+
+A large part of Upsilon is written in Verilog. Verilog is a Hardware
+Description Language (HDL), which looks similar to a programming language
+(such as C++ or Python).
+
+The difference is that Verilog compiles to a *piece of hardware* that
+operates on individual bits in sync with a clock. This
+differs from a *piece of software*, which is a set of instructions that a
+computer follows. Verilog is usually much less abstract than regular code.
+
+Regular code is tested on the system in which it is run. Hardware,
+on the other hand, is very difficult to test on the device that it
+is actually running on. Hardware is usually *simulated*. This project
+primarily simulates Verilog code using the program Verilator, where the
+code that runs the simulation is written in C++.
+
+Instead of strings, integers, and classes, the basic components of all
+Verilog code are the wire and the register, which carry bits (1 and 0).
+Wires connect components together, and registers store data, in a similar
+way to variables in software. Unlike usual programming languages, where
+code executes one step at a time, most FPGA code runs on the tick of
+a clock. Each block of code executes in parallel.
+
+To compile Verilog to a format suitable for execution on an FPGA, you
+*synthesize* the Verilog into a low-level format that uses the specific
+resources of the FPGA you are using, and then you run a *place and route*
+program to allocate resources on the FPGA to fit your design. Running
+synthesis on its own can help you understand how many resources a module
+uses. Place-and-route gives you *timing reports*, which tell you about
+major design problems that outstrip the capabilities of the FPGA (or the
+programs you are using). You should look up what "timing" on an FPGA is
+and learn as much as you can about it, because it is an issue that does
+not happen in standard software and can be very difficult to fix when
+you run into it.
+
+Once a bitstream is generated, it is loaded onto an FPGA through a cable
+using a programming tool (for this project, openFPGALoader).
+
+## Recommendations for Learners
+
+Kishore Mishra, *Advanced Chip Design*.
+
+[Gisselquist Technology][GT] is the best free online resource for FPGA
+programming out there. These articles will help you understand how to
+write *good* FPGA code, not just valid code.
+
+[GT]: https://zipcpu.com/
+
+Here are some exercises for you to ease yourself into FPGA programming.
+
+* Write an FPGA program that implements addition without using the `+`
+  operator. This program should add the two numbers bit by bit, handling
+  carry bits properly. Each one-bit stage is a *full adder*; chained, they form a *ripple-carry adder*.
+* Write an FPGA program that multiplies two signed integers together,
+  without using the `*` operator. The width of these integers should
+  not be hard-coded: it should be easy to change. A module that does
+  this is actually a part of this project: see
+  `boothmul.v`. You do not have to (and should not!) write it the way
+  Upsilon does.
+* Write an FPGA program that communicates over SPI. For simplicity,
+  you only need to support a single SPI mode: look up the SPI modes online
+  for details. There is an SPI slave device in this repository that you
+  can use to simulate the other end for the SPI master you write, but you should
+  also write an SPI slave yourself. For bonus points, connect your SPI master
+  to a real SPI device and confirm that your communication works.
+
+For each of these exercises, follow the complete "Design Testing Process"
+below. At the very least, write simulations and test your programs on
+real hardware.
+
+# Verilog Programming Guidelines
 
 See also [Dan Gisselquist][1]'s rules for FPGA development.
 
@@ -30,16 +118,144 @@ See also [Dan Gisselquist][1]'s rules for FPGA development.
   a memory location.
 * Keep all Verilog as generic as possible.
 * Always initialize registers.
+* Rerun tests after every change to the module.
 
-# Software
+## Design Testing Process
 
-* Use free and open source libraries only. All libraries must be compatible
-  with the GNU GPL v3.0.
-* Do not dynamically allocate memory.
-* Use the [SEI CERT C Coding Standard][2] as a guideline.
-* Use the [Linux kernel style guide][3] as a guideline (many parts of it
-  are not relevant for this project).
-* Try to offload as much processing as possible to the computer.
+### Simulation
 
-[2]: https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard
-[3]: https://www.kernel.org/doc/Documentation/process/coding-style.
+When you write or modify a Verilog module, the first thing you should do
+is write and run a simulation of that module. A simulation of that module
+should at the minimum compare the execution of the module with known
+results (called "ground truth testing"). A simulation should also consider
+edge cases that you might overlook when writing Verilog.
+
+For example, a module that multiplies two signed integers together should
+have a simulation that sends the module many pairs of integers, taking
+care to ensure that all combinations of signs are tested (e.g.
+positive times positive, negative times positive, etc.) and also that
+special cases are handled (e.g. largest 32-bit integer multiplied by
+most negative 32-bit integer, multiplication by 0 and 1, etc.).
+
+Writing simulation code is a very boring task, but you *must* do it.
+Otherwise there is no way for you to check that:
+
+1. Your code does what you want it to do
+2. Any changes you make to your code don't break it
+
+If you find a bug that isn't covered by your simulation, make sure you
+add that case to the simulation.
+
+The file `firmware/rtl/testbench.hpp` contains a class that you should
+use to organize individual tests. Make a derived class of `TB` and
+use the `posedge()` function to encode what default actions your test
+should take at every positive edge of the clock. Remember, in C++ each
+action is blocking: there is no equivalent to the non-blocking `<=`.
+
+If your test needs a lot of non-blocking logic, you
+should write a Verilog wrapper for your test that implements
+the non-blocking code. **Verilator only supports a subset of
+non-synthesizable Verilog. Unless you really need to, use synthesizable
+Verilog only.** See `firmware/rtl/waveform/waveform_sim.v` and
+`firmware/rtl/waveform/dma_sim.v` for examples of Verilog files used
+only for tests.
+
+### Test Synthesis
+
+**Yosys only accepts a subset of Verilog. You might write a bunch of
+code that Verilator will happily simulate but that will fail to go
+through Yosys.**
+
+Once you have simulated your design, you should use Yosys to synthesize it.
+This will allow you to understand how much and what resources the module
+is taking up. To do this, you can put the following in a script file:
+
+    read_verilog module_1.v
+    read_verilog module_2.v
+    ...
+    read_verilog top_module.v
+    synth_xilinx -flatten -nosrl -noclkbuf -nodsp -iopad -nowidelut
+    write_verilog yosys_synth_output.v
+
+and run `yosys -s scriptfile`. The options to `synth_xilinx` reflect
+the current limitations of F4PGA. The file `xc7.f4pga.tcl` that
+F4PGA downloads is the complete synthesis script; read it to understand
+the internals of what F4PGA does to compile your Verilog.
+
+### Test Compilation
+
+I haven't been able to do this for most of this project. The basic idea
+is to use `firmware/rtl/soc.py` to load only the module to test, and
+to use LiteScope to write and read values from the module. For more
+information, you can look at
+[the boothmul test](https://software.mcgoron.com/peter/boothmul/src/branch/master/arty_test).
+
+### Formal Verification
+
+This isn't used for this project, but it really should be.
+
+# LiteX
+
+LiteX is a System on a Chip builder written in Python. It easily integrates
+Verilog modules and large system components (CPU, RAM, Ethernet) into
+a design using a Python script.
+
+All code written for LiteX is in `gateware/soc.py`. Run this script to build
+the gateware. If you need to add new modules, you can add them to the design
+by modifying the `Base` and `UpsilonSoC` classes.
+
+All the code that you need to understand in `soc.py` is heavily documented.
+(If it's not, that means I don't understand it.)
+
+# Workarounds and Hacks
+
+## LiteX Compile Times Take Too Long for Testing
+
+Set `compile_software` to `False` in `soc.py` when checking for Verilog
+compile errors. Set it back when you do an actual compile run, or your
+program will not boot.
+
+If LiteX complains about not having a RISC-V compiler, that is because
+your system does not have a compatible RISC-V compiler in your `$PATH`.
+Refer to the LiteX install instructions above to see how to set up the
+SiFive GCC, which will work.
+
+## F4PGA Crashes When Using Block RAM
+
+This is really a Yosys bug (more precisely, an ABC bug). F4PGA defaults to using
+the ABC flow, which can break, especially for block RAM. To fix, edit out
+`-abc` in the tcl script (find it before you install it...)
+
+## Modules Simulate Correctly, but Don't Work at All in Hardware
+
+Yosys fails to calculate computed parameter values correctly. For instance,
+
+    parameter CTRLVAL = 5;
+    localparam VALUE = CTRLVAL + 1;
+
+Yosys will *silently* fail to compile this, setting `VALUE` to be equal
+to 0. The solution is to use macros.
+
+## Reset Pins Don't Work
+
+On the Arty A7 there is a Reset button. This is connected to the CPU and only
+resets the CPU. Possibly due to timing issues, modules malfunction if they
+share a reset pin with the CPU. The code currently connects button 0 to reset
+the modules separately from the CPU.
+
+## Verilog Macros Don't Work
+
+Verilog's preprocessor is awful. F4PGA (through Yosys) barely supports it.
+
+You should only use Verilog macros as a replacement for `localparam`.
+When you need to do so, you must preprocess the file with
+Verilator. For example, if you have a file called `mod.v` in the folder
+`firmware/rtl/mod/`, then in the file `firmware/rtl/mod/Makefile` add
+
+    codegen: [...] mod_preprocessed.v
+
+(putting it after all other generated files). The file
+`firmware/rtl/common.makefile` should automatically generate the
+preprocessed file for you.
+
+Another alternative is to use GNU `m4`.
diff --git a/gateware/soc.py b/gateware/soc.py
index 9c24de5..8e006f4 100644
--- a/gateware/soc.py
+++ b/gateware/soc.py
@@ -56,8 +56,8 @@ from litedram.modules import MT41K128M16
 from litedram.frontend.dma import LiteDRAMDMAReader
 from liteeth.phy.mii import LiteEthPHYMII
 
-# Refer to `A7-constraints.xdc` for pin names.
 """
+Refer to `A7-constraints.xdc` for pin names.
 DAC: SS MOSI MISO SCK
 0: 1 2 3 4 (PMOD A top, right to left)
 1: 1 2 3 4 (PMOD A bottom, right to left)
@@ -74,6 +74,12 @@ Outer chip header (C=CONV, K=SCK, D=SDO, XX=not connected)
 C4 K4 D4 C5 K5 D5 XX XX C6 K6 D6 C7 K7 D7 XX XX
 C0 K0 D0 C1 K1 D1 XX XX C2 K2 D2 C3 K3 D3
 0  1  2  3  4  5  6  7  8  9  10 11 12 13
+
+The `io` list maps hardware pins to names used by the SoC
+generator. These pins are then connected to Verilog modules.
+
+If there is more than one pin in the `Pins` string, the resulting
+name will be a vector of pins.
 """
 io = [
     ("differntial_output_low", 0, Pins("J17 J18 K15 J15 U14 V14 T13 U13 B6 E5 A3"), IOStandard("LVCMOS33")),
@@ -93,7 +99,8 @@ io = [
 class Base(Module, AutoCSR):
     """ The subclass AutoCSR will automatically make CSRs related
     to this class when those CSRs are attributes (i.e. accessed by
-    `self.csr_name`) of instances of this class.
+    `self.csr_name`) of instances of this class. (These CSRs are MMIO
+    registers; they are NOT RISC-V CSRs!)
 
     Since there are a lot of input and output wires, the CSRs are
     assigned using `setattr()`.
@@ -119,9 +126,19 @@ class Base(Module, AutoCSR):
     """
 
     def _make_csr(self, name, csrclass, csrlen, description, num=None):
-        """ Add a CSR for a pin `f"{name_{num}"` with CSR type
+        """ Add a CSR for a pin `f"{name}_{num}"` with CSR type
         `csrclass`. This will automatically handle the `i_` and
         `o_` prefix in the keyword arguments.
+
+        This function is used to automate the creation of memory-mapped
+        IO pins for all the converters on the device.
+
+        `csrclass` must be CSRStorage (read-write) or CSRStatus (read-only).
+        `csrlen` is the length in bits of the MMIO register. LiteX automatically
+        takes care of byte alignment, etc., so the length can be any positive
+        number.
+
+        Description is optional but recommended for debugging.
         """
 
         if name not in self.csrdict.keys():
@@ -191,6 +208,7 @@ class Base(Module, AutoCSR):
         self.kwargs["o_test_clock"] = platform.request("test_clock")
         self.kwargs["o_set_low"] = platform.request("differntial_output_low")
 
+        """ Dump all MMIO pins to a JSON file with their exact bit widths. """
         with open("csr_bitwidth.json", mode='w') as f:
             import json
             json.dump(self.csrdict, f)
@@ -236,10 +254,18 @@ class UpsilonSoC(SoCCore):
         platform = board_spec.Platform(variant=variant, toolchain="f4pga")
         rst = platform.request("cpu_reset")
         self.submodules.crg = _CRG(platform, sys_clk_freq, True, rst)
-        # These source files need to be sorted so that modules
-        # that rely on another module come later. For instance,
-        # `control_loop` depends on `control_loop_math`, so
-        # control_loop_math.v comes before control_loop.v
+        """
+        These source files need to be sorted so that modules
+        that rely on another module come later. For instance,
+        `control_loop` depends on `control_loop_math`, so
+        control_loop_math.v comes before control_loop.v
+
+        If you want to add a new Verilog file to the design, look at the
+        modules that it refers to and place it after the files that define those modules.
+
+        Since Yosys doesn't support modern Verilog, only put preprocessed
+        (if applicable) files here.
+        """
         platform.add_source("rtl/spi/spi_switch_preprocessed.v")
         platform.add_source("rtl/spi/spi_master_preprocessed.v")
         platform.add_source("rtl/spi/spi_master_no_write_preprocessed.v")
@@ -296,8 +322,6 @@ class UpsilonSoC(SoCCore):
             pads = platform.request("eth"))
         self.add_ethernet(phy=self.ethphy, dynamic_ip=True)
 
-        # Add the DAC and ADC pins as GPIO. They will be used directly
-        # by Zephyr.
         platform.add_extension(io)
 
         self.submodules.base = Base(ClockSignal(), self.sdram, platform)
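The manual's first exercise asks for addition "bit by bit, handling carry bits properly." Before writing the Verilog, it can help to have a software golden model to compare the simulation against. Below is a minimal sketch in Python. It is not part of Upsilon. The names `full_adder` and `ripple_carry_add` are hypothetical, and the operands are assumed to be unsigned `width`-bit values. The model chains one-bit full adders the same way a ripple-carry adder does in hardware.

```python
"""Golden model for the bit-by-bit addition exercise (hypothetical helper)."""


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder. Inputs are 0 or 1; returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(x: int, y: int, width: int) -> tuple[int, int]:
    """Add two unsigned `width`-bit values by chaining `width` full adders.

    Returns (sum modulo 2**width, final carry out). The operands are never
    combined with `+`; each bit pair goes through `full_adder`, and the carry
    ripples from bit 0 upward, as it would in the hardware.
    """
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    WIDTH = 4
    # Exhaustive check against Python's own addition, used here as the
    # ground truth: the {carry, sum} concatenation must equal x + y.
    for x in range(1 << WIDTH):
        for y in range(1 << WIDTH):
            s, c = ripple_carry_add(x, y, WIDTH)
            assert (c << WIDTH) | s == x + y
    print("all", (1 << WIDTH) ** 2, "pairs pass")
```

Exhaustive checking like this is only practical for small widths. For 32-bit operands, hand-picked edge cases plus random vectors are the realistic option.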
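The Simulation section's multiplier example calls for every sign combination, multiplication by 0 and 1, and the extreme 32-bit values. Those requirements can be turned into a reproducible list of ground-truth vectors. This is a hedged sketch in Python, not the project's C++ `TB`-based testbenches. `multiply_vectors`, `edge_values`, and `to_signed` are illustrative names. Getting the vectors into Verilator (for example, by writing them to a file the C++ test reads) is assumed and not shown. The expected products use Python's arbitrary-precision integers, so they are exact.

```python
"""Ground-truth vectors for a signed multiplier (hypothetical helper)."""
import itertools
import random
from typing import Iterator


def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a two's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def edge_values(width: int) -> list[int]:
    """Hand-picked operands: 0, +/-1, +/-2, and the ends of the signed range."""
    most_neg = -(1 << (width - 1))
    most_pos = (1 << (width - 1)) - 1
    candidates = {0, 1, -1, 2, -2, most_pos, most_neg, most_pos - 1, most_neg + 1}
    # Drop anything that does not fit in `width` bits (matters for tiny widths).
    return sorted(v for v in candidates if most_neg <= v <= most_pos)


def multiply_vectors(width: int, n_random: int = 1000,
                     seed: int = 0) -> Iterator[tuple[int, int, int]]:
    """Yield (a, b, expected_product) for a signed `width` x `width` multiplier."""
    edges = edge_values(width)
    # Pairing every edge value with every other one covers all sign
    # combinations (pos*pos, pos*neg, neg*pos, neg*neg) plus 0 and 1.
    for a, b in itertools.product(edges, repeat=2):
        yield a, b, a * b
    # A fixed seed makes the random vectors identical on every run, so a
    # failing case can be reproduced and added to the edge list.
    rng = random.Random(seed)
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    for _ in range(n_random):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        yield a, b, a * b


if __name__ == "__main__":
    WIDTH = 32
    vectors = list(multiply_vectors(WIDTH, n_random=100))
    for a, b, p in vectors:
        # Model what a 2*WIDTH-bit output bus would hold, then sign-extend it
        # back; this round-trips because the product always fits.
        raw = p & ((1 << (2 * WIDTH)) - 1)
        assert to_signed(raw, 2 * WIDTH) == p
    print(len(vectors), "vectors generated")
```

A product of two `width`-bit signed values always fits in `2 * width` bits. A testbench can therefore compare the raw output bus directly against the expected value after sign-extending it with `to_signed`.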