documentation
This commit is contained in:
parent
2cdbc1ae9f
commit
9c9b28116e
|
@ -1,4 +1,92 @@
|
||||||
# Firmware
|
Copyright 2023 (C) Peter McGoron.
|
||||||
|
|
||||||
|
This file is a part of Upsilon, a free and open source software project.
|
||||||
|
For license terms, refer to the files in `doc/copying` in the Upsilon
|
||||||
|
source distribution.
|
||||||
|
|
||||||
|
__________________________________________________________________________
|
||||||
|
|
||||||
|
The Hardware Maintenance Manual is an overview of the hardware (non-software)
|
||||||
|
parts of Upsilon.
|
||||||
|
|
||||||
|
# Crash Course in FPGAs
|
||||||
|
|
||||||
|
Upsilon runs on a Field Programmable Gate Array (FPGA). FPGAs are sets
|
||||||
|
of logic gates and other peripherals that can be changed by a computer.
|
||||||
|
FPGAs can implement CPUs, digital filters, and control code at a much
|
||||||
|
higher speed than a computer. The downside is that FPGAs are much more
|
||||||
|
difficult to program for.
|
||||||
|
|
||||||
|
A large part of Upsilon is written in Verilog. Verilog is a Hardware
|
||||||
|
Description Language (HDL), which is similar to a programming language
|
||||||
|
(such as C++ or Python).
|
||||||
|
|
||||||
|
The difference is that Verilog compiles to a *piece of hardware* that
|
||||||
|
deals with individual bits executing operations in sync with a clock. This
|
||||||
|
differs from a *piece of software*, which is a set of instructions that a
|
||||||
|
computer follows. Verilog is usually much less abstract than regular code.
|
||||||
|
|
||||||
|
Regular code is tested on the system in which it is run. Hardware,
|
||||||
|
on the other hand, is very difficult to test on the device that it
|
||||||
|
is actually running on. Hardware is usually *simulated*. This project
|
||||||
|
primarily simulates Verilog code using the program Verilator, where the
|
||||||
|
code that runs the simulation is written in C++.
|
||||||
|
|
||||||
|
Instead of strings, integers, and classes, the basic components of all
|
||||||
|
Verilog code are the wire and the register, which store bits (1 and 0).
|
||||||
|
Wires connect components together, and registers store data, in a similar
|
||||||
|
way to variables in software. Unlike usual programming languages, where
|
||||||
|
code executes one step at a time, most FPGA code runs at the tick of
|
||||||
|
a clock. Each block of code executes in parallel.
|
||||||
|
|
||||||
|
To compile Verilog to a format suitable for execution on an FPGA, you
|
||||||
|
*synthesize* the Verilog into a low-level format that uses the specific
|
||||||
|
resources of the FPGA you are using, and then you run a *place and route*
|
||||||
|
program to allocate resources on the FPGA to fit your design. Running
|
||||||
|
synthesis on its own can help you understand how many resources a module
|
||||||
|
uses. Place-and-route gives you *timing reports*, which tell you about
|
||||||
|
major design problems that outstrip the capabilities of the FPGA (or the
|
||||||
|
programs you are using). You should look up what "timing" on an FPGA is
|
||||||
|
and learn as much as you can about it, because it is an issue that does
|
||||||
|
not happen in standard software and can be very difficult to fix when
|
||||||
|
you run into it.
|
||||||
|
|
||||||
|
Once a bitstream is generated, it is loaded onto an FPGA through a cable
|
||||||
|
(for this project, using the program openFPGALoader).
|
||||||
|
|
||||||
|
## Recommendations for Learners
|
||||||
|
|
||||||
|
Kishore Mishra. Advanced Chip Design.
|
||||||
|
|
||||||
|
[Gisselquist Technology][GT] is the best free online resource for FPGA
|
||||||
|
programming out there. These articles will help you understand how to
|
||||||
|
write *good* FPGA code, not just valid code.
|
||||||
|
|
||||||
|
[GT]: https://zipcpu.com/
|
||||||
|
|
||||||
|
Here are some exercises for you to ease yourself into FPGA programming.
|
||||||
|
|
||||||
|
* Write an FPGA program that implements addition without using the `+`
|
||||||
|
operator. This program should add each number bit by bit, handling
|
||||||
|
carried digits properly. The one-bit building block is called a *full adder*.
|
||||||
|
* Write an FPGA program that multiplies two signed integers together,
|
||||||
|
without using the `*` operator. The width of these integers should
|
||||||
|
not be hard-coded: it should be easy to change. What you write in
|
||||||
|
this is something that is actually a part of this project: see
|
||||||
|
`boothmul.v`. You do not (and should not!) write it just like Upsilon
|
||||||
|
has written it.
|
||||||
|
* Write an FPGA program that communicates over SPI. For simplicity,
|
||||||
|
you only need to write it for a single SPI mode: search the internet
|
||||||
|
for details. There is an SPI slave device in this repository that you
|
||||||
|
can use to simulate an end for the SPI master you write, but you should
|
||||||
|
write the SPI slave yourself. For bonus points, connect your SPI master
|
||||||
|
to a real SPI device and confirm that your communication works.
|
||||||
|
|
||||||
|
For each of these exercises, follow the complete "Design Testing Process"
|
||||||
|
below. At the very least, write simulations and test your programs on
|
||||||
|
real hardware.
|
||||||
|
|
||||||
|
# Verilog Programming Guidelines
|
||||||
|
|
||||||
See also [Dan Gisselquist][1]'s rules for FPGA development.
|
See also [Dan Gisselquist][1]'s rules for FPGA development.
|
||||||
|
|
||||||
|
@ -30,16 +118,144 @@ See also [Dan Gisselquist][1]'s rules for FPGA development.
|
||||||
a memory location.
|
a memory location.
|
||||||
* Keep all Verilog as generic as possible.
|
* Keep all Verilog as generic as possible.
|
||||||
* Always initialize registers.
|
* Always initialize registers.
|
||||||
|
* Rerun tests after every change to the module.
|
||||||
|
|
||||||
# Software
|
## Design Testing Process
|
||||||
|
|
||||||
* Use free and open source libraries only. All libraries must be compatible
|
### Simulation
|
||||||
with the GNU GPL v3.0.
|
|
||||||
* Do not dynamically allocate memory.
|
|
||||||
* Use the [SEI CERT C Coding Standard][2] as a guideline.
|
|
||||||
* Use the [Linux kernel style guide][3] as a guideline (many parts of it
|
|
||||||
are not relevant for this project).
|
|
||||||
* Try to offload as much processing as possible to the computer.
|
|
||||||
|
|
||||||
[2]: https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard
|
When you write or modify a Verilog module, the first thing you should do
|
||||||
[3]: https://www.kernel.org/doc/Documentation/process/coding-style.
|
is write and run a simulation of that module. A simulation of that module
|
||||||
|
should at the minimum compare the execution of the module with known
|
||||||
|
results (called "Ground truth testing"). A simulation should also consider
|
||||||
|
edge cases that you might overlook when writing Verilog.
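The idea of ground truth testing can be sketched in software. The Python below is only a stand-in, not this project's test code: the hypothetical `ripple_add` plays the role of the module under test (in a real test this would be the simulated Verilog module), and Python's built-in `+` supplies the known results.

```python
def full_adder(a, b, carry_in):
    """Add three single bits; return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_add(x, y, width):
    """Add two unsigned `width`-bit integers one bit at a time.

    Hypothetical stand-in for the module under test.
    """
    result = 0
    carry = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


def check_against_ground_truth(width):
    """Exhaustively compare the model with Python's own addition."""
    mask = (1 << width) - 1
    for x in range(1 << width):
        for y in range(1 << width):
            total, carry = ripple_add(x, y, width)
            expected = x + y  # the known-good result
            assert total == expected & mask, (x, y)
            assert carry == expected >> width, (x, y)
    return (1 << width) ** 2  # number of cases checked
```

Exhaustive checking is only practical for small widths; for wider modules, a mix of hand-picked edge cases and random inputs is the usual substitute.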
|
||||||
|
|
||||||
|
For example, a module that multiplies two signed integers together should
|
||||||
|
have a simulation that sends the module many pairs of integers, taking
|
||||||
|
care to ensure that all possible permutations of sign are tested (e.g.
|
||||||
|
positive times positive, negative times positive, etc.) and also that
|
||||||
|
special cases are handled (e.g. the largest positive 32-bit integer multiplied by
|
||||||
|
the most negative 32-bit integer, multiplication by 0 and 1, etc.).
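One way to build such a list of inputs is sketched below in Python. This is an illustration under assumptions, not the project's simulation code: the function names, the default of 100 random pairs per sign combination, and the fixed seed are all hypothetical choices.

```python
import random


def signed_multiply_vectors(width=32, random_pairs=100, seed=0):
    """Return (a, b) input pairs for a signed `width`-bit multiplier."""
    most_pos = (1 << (width - 1)) - 1   # largest positive value
    most_neg = -(1 << (width - 1))      # most negative value

    # Special cases: every pairing of 0, 1, -1 and both extremes.
    special = [0, 1, -1, most_pos, most_neg]
    vectors = [(a, b) for a in special for b in special]

    # Random pairs for each of the four sign combinations.
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for sign_a in (1, -1):
        for sign_b in (1, -1):
            for _ in range(random_pairs):
                a = sign_a * rng.randint(1, most_pos)
                b = sign_b * rng.randint(1, most_pos)
                vectors.append((a, b))
    return vectors


def expected_product(a, b):
    """Ground truth: Python integers never overflow."""
    return a * b
```

Each pair would be sent to the module and the module's output compared with `expected_product(a, b)`, assuming the module's output is wide enough to hold the full product.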
|
||||||
|
|
||||||
|
Writing simulation code is a very boring task, but you *must* do it.
|
||||||
|
Otherwise there is no way for you to check that:
|
||||||
|
|
||||||
|
1. Your code does what you want it to do
|
||||||
|
2. Any changes you make to your code don't break it
|
||||||
|
|
||||||
|
If you find a bug that isn't covered by your simulation, make sure you
|
||||||
|
add that case to the simulation.
|
||||||
|
|
||||||
|
The file `firmware/rtl/testbench.hpp` contains a class that you should
|
||||||
|
use to organize individual tests. Make a derived class of `TB` and
|
||||||
|
use the `posedge()` function to encode what default actions your test
|
||||||
|
should take at every positive edge of the clock. Remember, in C++ each
|
||||||
|
action is blocking: there is no equivalent to the non-blocking `<=`.
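The difference between blocking and non-blocking updates can be shown with a small software model. The sketch below is in Python rather than C++ for brevity, and the class is hypothetical (it is not the `TB` class from `testbench.hpp`). It models two registers that swap values on every clock edge, as the Verilog `a <= b; b <= a;` would.

```python
class SwapRegisters:
    """Two registers that should exchange values on each clock edge."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def posedge_blocking(self):
        # Naive translation: statements run one after another, so the
        # second assignment sees the already-updated value of `a`.
        self.a = self.b
        self.b = self.a

    def posedge_nonblocking(self):
        # Phase 1: compute every next value from the *current* state.
        next_a = self.b
        next_b = self.a
        # Phase 2: commit all next values together.
        self.a = next_a
        self.b = next_b


regs = SwapRegisters(1, 2)
regs.posedge_nonblocking()
print(regs.a, regs.b)  # the values are swapped

regs = SwapRegisters(1, 2)
regs.posedge_blocking()
print(regs.a, regs.b)  # both registers now hold the same value
```

The same two-phase pattern (compute all next values, then commit them) applies when writing sequential test code in C++.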
|
||||||
|
|
||||||
|
If you have to do a lot of non-blocking code for your test, you
|
||||||
|
should write a Verilog wrapper for your test that implements
|
||||||
|
the non-blocking code. **Verilator only supports a subset of
|
||||||
|
non-synthesizable Verilog. Unless you really need to, use synthesizable
|
||||||
|
Verilog only.** See `firmware/rtl/waveform/waveform_sim.v` and
|
||||||
|
`firmware/rtl/waveform/dma_sim.v` for examples of Verilog files only
|
||||||
|
used for tests.
|
||||||
|
|
||||||
|
### Test Synthesis
|
||||||
|
|
||||||
|
**Yosys only accepts a subset of Verilog. You might write a bunch of
|
||||||
|
code that Verilator will happily simulate but that will fail to go
|
||||||
|
through Yosys.**
|
||||||
|
|
||||||
|
Once you have simulated your design, you should use Yosys to synthesize it.
|
||||||
|
This will allow you to understand which resources, and how many, the module
|
||||||
|
is taking up. To do this, you can put the following in a script file:
|
||||||
|
|
||||||
|
read_verilog module_1.v
|
||||||
|
read_verilog module_2.v
|
||||||
|
...
|
||||||
|
read_verilog top_module.v
|
||||||
|
synth_xilinx -flatten -nosrl -noclkbuf -nodsp -iopad -nowidelut
|
||||||
|
write_verilog yosys_synth_output.v
|
||||||
|
|
||||||
|
and run `yosys -s scriptfile`. The options to `synth_xilinx` reflect
|
||||||
|
the current limitations that F4PGA has. The file `xc7.f4pga.tcl` that
|
||||||
|
F4PGA downloads is the complete synthesis script; read it to understand
|
||||||
|
the internals of what F4PGA does to compile your Verilog.
|
||||||
|
|
||||||
|
### Test Compilation
|
||||||
|
|
||||||
|
I haven't been able to do this for most of this project. The basic idea
|
||||||
|
is to use `firmware/rtl/soc.py` to load only the module to test, and
|
||||||
|
to use LiteScope to write and read values from the module. For more
|
||||||
|
information, you can look at
|
||||||
|
[the boothmul test](https://software.mcgoron.com/peter/boothmul/src/branch/master/arty_test).
|
||||||
|
|
||||||
|
### Formal Verification
|
||||||
|
|
||||||
|
This isn't used for this project, but it really should be.
|
||||||
|
|
||||||
|
# LiteX
|
||||||
|
|
||||||
|
LiteX is a System on a Chip builder written in Python. It easily integrates
|
||||||
|
Verilog modules and large system components (CPU, RAM, Ethernet) into
|
||||||
|
a design using a Python script.
|
||||||
|
|
||||||
|
All code written for LiteX is in `gateware/soc.py`. Run this script to build
|
||||||
|
the gateware. If you need to add new modules, you can add them to the design
|
||||||
|
by modifying the `Base` and `UpsilonSoC` classes.
|
||||||
|
|
||||||
|
All the code that you need to understand in `soc.py` is heavily documented.
|
||||||
|
(If it's not, that means I don't understand it.)
|
||||||
|
|
||||||
|
# Workarounds and Hacks
|
||||||
|
|
||||||
|
## LiteX Compile Times Take Too Long for Testing
|
||||||
|
|
||||||
|
Set `compile_software` to `False` in `soc.py` when checking for Verilog
|
||||||
|
compile errors. Set it back when you do an actual compile run, or your
|
||||||
|
program will not boot.
|
||||||
|
|
||||||
|
If LiteX complains about not having a RISC-V compiler, that is because
|
||||||
|
your system does not have a compatible RISC-V compiler in your `$PATH`.
|
||||||
|
Refer to the LiteX install instructions above to see how to set up the
|
||||||
|
SiFive GCC, which will work.
|
||||||
|
|
||||||
|
## F4PGA Crashes When Using Block RAM
|
||||||
|
|
||||||
|
This is really a Yosys bug (and really, an ABC bug). F4PGA defaults to using
|
||||||
|
the ABC flow, which can break, especially for block RAM. To fix, edit out
|
||||||
|
`-abc` in the Tcl script (find it before you install it...).
|
||||||
|
|
||||||
|
## Modules Simulate Correctly, but Don't Work at All in Hardware
|
||||||
|
|
||||||
|
Yosys fails to calculate computed parameter values correctly. For instance,
|
||||||
|
|
||||||
|
parameter CTRLVAL = 5;
|
||||||
|
localparam VALUE = CTRLVAL + 1;
|
||||||
|
|
||||||
|
Yosys will *silently* fail to compile this, setting `VALUE` to be equal
|
||||||
|
to 0. The solution is to use macros.
|
||||||
|
|
||||||
|
## Reset Pins Don't Work
|
||||||
|
|
||||||
|
On the Arty A7 there is a Reset button. This is connected to the CPU and only
|
||||||
|
resets the CPU. Possibly due to timing issues, modules get screwed up if they
|
||||||
|
share a reset pin with the CPU. The code currently connects button 0 to reset
|
||||||
|
the modules separately from the CPU.
|
||||||
|
|
||||||
|
## Verilog Macros Don't Work
|
||||||
|
|
||||||
|
Verilog's preprocessor is awful. F4PGA (through Yosys) barely supports it.
|
||||||
|
|
||||||
|
You should only use Verilog macros as a replacement for `localparam`.
|
||||||
|
When you need to do so, you must preprocess the file with
|
||||||
|
Verilator. For example, if you have a file called `mod.v` in the folder
|
||||||
|
`firmware/rtl/mod/`, then in the file `firmware/rtl/mod/Makefile` add
|
||||||
|
|
||||||
|
codegen: [...] mod_preprocessed.v
|
||||||
|
|
||||||
|
(putting it after all other generated files). The file
|
||||||
|
`firmware/rtl/common.makefile` should automatically generate the
|
||||||
|
preprocessed file for you.
|
||||||
|
|
||||||
|
Another alternative is to use GNU `m4`.
|
||||||
|
|
|
@ -56,8 +56,8 @@ from litedram.modules import MT41K128M16
|
||||||
from litedram.frontend.dma import LiteDRAMDMAReader
|
from litedram.frontend.dma import LiteDRAMDMAReader
|
||||||
from liteeth.phy.mii import LiteEthPHYMII
|
from liteeth.phy.mii import LiteEthPHYMII
|
||||||
|
|
||||||
# Refer to `A7-constraints.xdc` for pin names.
|
|
||||||
"""
|
"""
|
||||||
|
Refer to `A7-constraints.xdc` for pin names.
|
||||||
DAC: SS MOSI MISO SCK
|
DAC: SS MOSI MISO SCK
|
||||||
0: 1 2 3 4 (PMOD A top, right to left)
|
0: 1 2 3 4 (PMOD A top, right to left)
|
||||||
1: 1 2 3 4 (PMOD A bottom, right to left)
|
1: 1 2 3 4 (PMOD A bottom, right to left)
|
||||||
|
@ -74,6 +74,12 @@ Outer chip header (C=CONV, K=SCK, D=SDO, XX=not connected)
|
||||||
C4 K4 D4 C5 K5 D5 XX XX C6 K6 D6 C7 K7 D7 XX XX
|
C4 K4 D4 C5 K5 D5 XX XX C6 K6 D6 C7 K7 D7 XX XX
|
||||||
C0 K0 D0 C1 K1 D1 XX XX C2 K2 D2 C3 K3 D3
|
C0 K0 D0 C1 K1 D1 XX XX C2 K2 D2 C3 K3 D3
|
||||||
0 1 2 3 4 5 6 7 8 9 10 11 12 13
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13
|
||||||
|
|
||||||
|
The `io` list maps hardware pins to names used by the SoC
|
||||||
|
generator. These pins are then connected to Verilog modules.
|
||||||
|
|
||||||
|
If there is more than one pin in the Pins string, the resulting
|
||||||
|
name will be a vector of pins.
|
||||||
"""
|
"""
|
||||||
io = [
|
io = [
|
||||||
("differntial_output_low", 0, Pins("J17 J18 K15 J15 U14 V14 T13 U13 B6 E5 A3"), IOStandard("LVCMOS33")),
|
("differntial_output_low", 0, Pins("J17 J18 K15 J15 U14 V14 T13 U13 B6 E5 A3"), IOStandard("LVCMOS33")),
|
||||||
|
@ -93,7 +99,8 @@ io = [
|
||||||
class Base(Module, AutoCSR):
|
class Base(Module, AutoCSR):
|
||||||
""" The subclass AutoCSR will automatically make CSRs related
|
""" The subclass AutoCSR will automatically make CSRs related
|
||||||
to this class when those CSRs are attributes (i.e. accessed by
|
to this class when those CSRs are attributes (i.e. accessed by
|
||||||
`self.csr_name`) of instances of this class.
|
`self.csr_name`) of instances of this class. (CSRs are MMIO,
|
||||||
|
they are NOT RISC-V CSRs!)
|
||||||
|
|
||||||
Since there are a lot of input and output wires, the CSRs are
|
Since there are a lot of input and output wires, the CSRs are
|
||||||
assigned using `setattr()`.
|
assigned using `setattr()`.
|
||||||
|
@ -119,9 +126,19 @@ class Base(Module, AutoCSR):
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def _make_csr(self, name, csrclass, csrlen, description, num=None):
|
def _make_csr(self, name, csrclass, csrlen, description, num=None):
|
||||||
""" Add a CSR for a pin `f"{name_{num}"` with CSR type
|
""" Add a CSR for a pin `f"{name}_{num}"` with CSR type
|
||||||
`csrclass`. This will automatically handle the `i_` and
|
`csrclass`. This will automatically handle the `i_` and
|
||||||
`o_` prefix in the keyword arguments.
|
`o_` prefix in the keyword arguments.
|
||||||
|
|
||||||
|
This function is used to automate the creation of memory mapped
|
||||||
|
IO pins for all the converters on the device.
|
||||||
|
|
||||||
|
`csrclass` must be CSRStorage (Read-Write) or CSRStatus (Read only).
|
||||||
|
`csrlen` is the length in bits of the MMIO register. LiteX automatically
|
||||||
|
takes care of byte alignment, etc. so the length can be any positive
|
||||||
|
number.
|
||||||
|
|
||||||
|
Description is optional but recommended for debugging.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
if name not in self.csrdict.keys():
|
if name not in self.csrdict.keys():
|
||||||
|
@ -191,6 +208,7 @@ class Base(Module, AutoCSR):
|
||||||
self.kwargs["o_test_clock"] = platform.request("test_clock")
|
self.kwargs["o_test_clock"] = platform.request("test_clock")
|
||||||
self.kwargs["o_set_low"] = platform.request("differntial_output_low")
|
self.kwargs["o_set_low"] = platform.request("differntial_output_low")
|
||||||
|
|
||||||
|
""" Dump all MMIO pins to a JSON file with their exact bit widths. """
|
||||||
with open("csr_bitwidth.json", mode='w') as f:
|
with open("csr_bitwidth.json", mode='w') as f:
|
||||||
import json
|
import json
|
||||||
json.dump(self.csrdict, f)
|
json.dump(self.csrdict, f)
|
||||||
|
@ -236,10 +254,18 @@ class UpsilonSoC(SoCCore):
|
||||||
platform = board_spec.Platform(variant=variant, toolchain="f4pga")
|
platform = board_spec.Platform(variant=variant, toolchain="f4pga")
|
||||||
rst = platform.request("cpu_reset")
|
rst = platform.request("cpu_reset")
|
||||||
self.submodules.crg = _CRG(platform, sys_clk_freq, True, rst)
|
self.submodules.crg = _CRG(platform, sys_clk_freq, True, rst)
|
||||||
# These source files need to be sorted so that modules
|
"""
|
||||||
# that rely on another module come later. For instance,
|
These source files need to be sorted so that modules
|
||||||
# `control_loop` depends on `control_loop_math`, so
|
that rely on another module come later. For instance,
|
||||||
# control_loop_math.v comes before control_loop.v
|
`control_loop` depends on `control_loop_math`, so
|
||||||
|
control_loop_math.v comes before control_loop.v
|
||||||
|
|
||||||
|
If you want to add a new Verilog file to the design, look at the
|
||||||
|
modules that it refers to and place it after the files with those modules.
|
||||||
|
|
||||||
|
Since Yosys doesn't support modern Verilog, only put preprocessed
|
||||||
|
(if applicable) files here.
|
||||||
|
"""
|
||||||
platform.add_source("rtl/spi/spi_switch_preprocessed.v")
|
platform.add_source("rtl/spi/spi_switch_preprocessed.v")
|
||||||
platform.add_source("rtl/spi/spi_master_preprocessed.v")
|
platform.add_source("rtl/spi/spi_master_preprocessed.v")
|
||||||
platform.add_source("rtl/spi/spi_master_no_write_preprocessed.v")
|
platform.add_source("rtl/spi/spi_master_no_write_preprocessed.v")
|
||||||
|
@ -296,8 +322,6 @@ class UpsilonSoC(SoCCore):
|
||||||
pads = platform.request("eth"))
|
pads = platform.request("eth"))
|
||||||
self.add_ethernet(phy=self.ethphy, dynamic_ip=True)
|
self.add_ethernet(phy=self.ethphy, dynamic_ip=True)
|
||||||
|
|
||||||
# Add the DAC and ADC pins as GPIO. They will be used directly
|
|
||||||
# by Zephyr.
|
|
||||||
platform.add_extension(io)
|
platform.add_extension(io)
|
||||||
self.submodules.base = Base(ClockSignal(), self.sdram, platform)
|
self.submodules.base = Base(ClockSignal(), self.sdram, platform)
|
||||||
|
|
||||||
|
|