Copyright 2024 (C) Peter McGoron.
This file is a part of Upsilon, a free and open source software project.
For license terms, refer to the files in ``doc/copying`` in the Upsilon
source distribution.
***************************************************
This manual describes the hardware portion of Upsilon.
===============
LiteX and Migen
===============
Migen is a library that generates Verilog using Python. It uses Python
objects and methods as a DSL within Python.
LiteX is a SoC generator using Migen. LiteX includes RAM, CPU, bus logic,
etc. LiteX is very powerful but not well documented.
===================
System Architecture
===================
Upsilon uses a RISC-V CPU running Linux to power most operations. It currently
uses a single-core VexRiscv CPU running mainline Linux 5.x. How the main core
communicates with the hardware is a software issue: see /doc/software.rst .
Basic configuration of the SoC is done in the /gateware/config.py file. If this
file does not exist, copy /gateware/config.py.def to /gateware/config.py .
This is the default config.
The **main CPU** is the VexRiscv core running Linux. All other CPUs or bus
masters can be overridden by this main CPU. To avoid confusion, "master" is
used when referring to something that is the master of the Wishbone bus: other
CPUs besides the main CPU are masters, even though their actions are
subordinate to the main CPU.
------------
Wishbone Bus
------------
The bus on all CPUs is the Wishbone bus. The Wishbone bus is a relatively simple
yet powerful master-slave architecture. For this project each bus has one master
and multiple slaves, but each slave can connect to multiple buses (and hence,
multiple masters).
All of the Wishbone bus lines are connected directly to the master with the
exception of the ``cyc`` signal. The ``cyc`` signal indicates that the master
has selected that slave device for a transfer. The ``stb`` signal will then
go up (sometimes at the same time) to indicate that there is valid data on all
other bus lines of interest. The bus master waits until ``ack`` is asserted.
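
The handshake can be modeled in plain Python, one call per clock edge. This
is a minimal sketch and not Upsilon code: the ``ToySlave`` class,
``run_transfer`` and ``max_cycles`` are hypothetical, only ``cyc``, ``stb``
and ``ack`` are tracked, and dropping ``ack`` once the master releases ``cyc``
is an assumption of the sketch.

```python
class ToySlave:
    """Hypothetical cycle-level model of a Wishbone slave's ack logic."""

    def __init__(self):
        self.ack = 0

    def tick(self, cyc, stb):
        """Advance one clock edge and return the new value of ack."""
        if cyc and stb and not self.ack:
            # Selected, strobe valid, not yet acknowledged: acknowledge.
            self.ack = 1
        elif not cyc:
            # Master released the bus: drop the acknowledgement (assumption).
            self.ack = 0
        return self.ack


def run_transfer(slave, max_cycles=8):
    """Master side: assert cyc and stb, wait for ack, then release the bus.

    Returns the number of clock edges until ack was seen, or None if the
    slave never answered within max_cycles.
    """
    for cycle in range(1, max_cycles + 1):
        if slave.tick(cyc=1, stb=1):
            slave.tick(cyc=0, stb=0)  # end the bus cycle
            return cycle
    return None


slave = ToySlave()
print(run_transfer(slave))  # clock edges until ack was seen
print(slave.ack)            # ack after the master released the bus
```

A master without a ``max_cycles`` style limit would wait forever on a slave
that never asserts ``ack``.
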
The main CPU has a timeout on the Wishbone bus, but other CPUs may not. This
is to simplify interconnect logic from a programming perspective.
-------------------------
The Wishbone Bus in LiteX
-------------------------
Each master and slave has a Wishbone bus ``Interface`` (under
``litex.soc.interconnect.wishbone``). To make it, do::

    self.bus = Interface(data_width=32, address_width=32, addressing="byte")

The bus is always going to be 32 bit and will always be transmitting 32 bit
words. The reason for ``addressing="byte"`` will be discussed in the next
section.
The basic structure of the bus handling code is::

    self.sync += [
        If(self.bus.cyc & self.bus.stb & ~self.bus.ack,
            Case(self.bus.adr[0:length], ...),
            self.bus.ack.eq(1)
        ).Elif(~self.bus.cyc,
            self.bus.ack.eq(0)
        )
    ]

``length`` is the length in bits that the bus code should look at. Since the
region could be anywhere in memory, the slave should never look at the entire
address (except for debugging purposes). Most of the time::

    length = self.width.bit_length()

Note that Migen differs from Verilog, since all indexing is LSB-first and the
last index is excluded. Hence ``adr[0:length]`` is equivalent to ``adr[length-1:0]``
in generated Verilog.
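
To make the slice convention concrete, here is a small plain-Python sketch
(not Migen) that models ``adr[0:length]`` on an integer address. The helper
name ``low_bits`` and the example addresses are made up for illustration.

```python
def low_bits(adr, length):
    """Model Migen's adr[0:length]: keep bits 0 .. length-1 (LSB first)."""
    return adr & ((1 << length) - 1)


# Two addresses in different parts of memory that share their low 4 bits.
print(hex(low_bits(0x8000_0004, 4)))
print(hex(low_bits(0x1234_5674, 4)))
```

Both calls give the same offset, which is why the slave can ignore the upper
address bits no matter where its region ends up in memory.
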
-----------------------------------
Rules For Writing Wishbone Bus Code
-----------------------------------
The Main CPU is word-addressed. It only reads at 32-bit word boundaries, and
will specify sub-word-unit writes using ``sel``. When a *bus* is
word-addressed, that means it expects addresses to be words. For instance,
``0x0`` is word 0, ``0x1`` is word 1, etc.
Since this is confusing, **all Upsilon Wishbone bus code must be byte
addressed.** This means that ``0x0`` is byte 0 of word 0 (little endian),
``0x1`` is byte 1 of word 0, and so on, while ``0x4`` is byte 0 of word 1. Even though
all masters and slaves must be byte-addressed, they are not required to handle
misaligned accesses. **Upsilon slaves can assume that all accesses are
word-aligned,** but they should give sane errors on misaligned access.
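
The difference between the two addressing schemes can be sketched in plain
Python. The helper names below are hypothetical and only illustrate the
arithmetic for a 32-bit bus.

```python
WORD_BYTES = 4  # the bus carries 32-bit words


def byte_to_word(byte_adr):
    """Split a byte address into (word index, byte within the word)."""
    return byte_adr // WORD_BYTES, byte_adr % WORD_BYTES


def word_to_byte(word_adr):
    """Byte address of byte 0 of the given word."""
    return word_adr * WORD_BYTES


def is_word_aligned(byte_adr):
    """True when a byte address falls on a 32-bit word boundary."""
    return byte_adr % WORD_BYTES == 0


for adr in (0x0, 0x1, 0x4):
    print(hex(adr), byte_to_word(adr), is_word_aligned(adr))
```
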
The only masters and slaves that are word-addressed are the ones that are
from LiteX itself. Those have special code to convert to the byte-addressed
masters/slaves.
If the slave has one bus, it **must** be an attribute called ``bus``.
Each class that is accessed by a wishbone bus **must** have an attribute
called ``width`` that is the size, in bytes, of the region. This must be a power
of 2 (exception: wrappers around slaves since they might wrap LiteX slaves
that don't have ``width`` attributes).
Each class **should** have an attribute ``public_registers`` that is a
dictionary. Its keys are the names of the registers shown to the programmer,
and each entry has the following required attributes:

1. ``origin``: offset of the register in memory
2. ``size``: size of the register in bytes (multiple of 4)

Other attributes, such as ``rw`` and ``direction``, are explained in
/doc/controller_manual.rst .
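
As an illustration only, a ``public_registers`` dictionary and a sanity check
for the rules above might look like the following sketch. The ``Reg`` class,
the register names and ``check_registers`` are hypothetical, and the actual
Upsilon classes may differ. Only ``origin``, ``size`` and ``width`` come from
the rules above.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    """Hypothetical register description; not the actual Upsilon class."""
    origin: int  # offset of the register within the region, in bytes
    size: int    # size of the register in bytes (multiple of 4)


def check_registers(width, public_registers):
    """Raise ValueError if the region or a register breaks the rules above."""
    if width <= 0 or width & (width - 1):
        raise ValueError(f"width {width} is not a power of 2")
    for name, reg in public_registers.items():
        if reg.size <= 0 or reg.size % 4:
            raise ValueError(f"{name}: size must be a positive multiple of 4")
        if reg.origin < 0 or reg.origin + reg.size > width:
            raise ValueError(f"{name}: does not fit in {width} bytes")


# Made-up register names for a 16-byte region.
public_registers = {
    "control": Reg(origin=0x0, size=4),
    "status": Reg(origin=0x4, size=4),
    "data": Reg(origin=0x8, size=8),
}
check_registers(16, public_registers)  # passes silently
```
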
-----------------------------
Adding Slaves to the Main CPU
-----------------------------
After adding a module with an ``Interface``, the interface is connected to
the main CPU bus by calling one of two functions.
If the slave region has no special areas in it, call::

    self.bus.add_slave(name, slave.bus, SoCRegion(origin=None, size=slave.width, cached=False))

If the slave region has registers, add::

    self.add_slave_with_registers(name, iface, SoCRegion(...), slave.public_registers)

where the SoCRegion parameters are the same as before. Each slave device
should have a ``slave.width`` and a ``slave.public_registers`` attribute,
unless noted. Some slaves have only one bus, some have multiple.
The Wishbone cache is very confusing and causes custom Wishbone bus code to
not work properly. Since a lot of this memory is volatile you should never
enable the cache (possible exception: SRAM).
---------------------------------------------------------
Working Around LiteX using pre_finalize and mmio_closures
---------------------------------------------------------
LiteX runs code prior to calling ``finalize()``, such as CSR allocation,
that makes it very difficult to write procedural code without preallocating
lengths.
Upsilon solves this with an ugly hack called ``pre_finalize``, which runs at
the end of the SoC main module instantiation. All pre_finalize functions are
put into a list which is run with no arguments and with their return result
ignored.
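
The mechanism amounts to a list of deferred callables. The sketch below is a
plain-Python toy, not the Upsilon SoC class: ``ToySoC``, its ``log`` attribute
and the two hook bodies are hypothetical.

```python
class ToySoC:
    """Hypothetical stand-in for the SoC main module."""

    def __init__(self):
        self.pre_finalize = []  # callables that take no arguments
        self.log = []

        # Modules that cannot finish their setup yet register a hook instead.
        self.pre_finalize.append(lambda: self.log.append("allocate CSRs"))
        self.pre_finalize.append(lambda: self.log.append("connect buses"))

        # End of instantiation: run every hook in order, ignoring results.
        for hook in self.pre_finalize:
            hook()


soc = ToySoC()
print(soc.log)
```
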
``pre_finalize`` calls are usually due to ``PreemptiveInterface``, which uses
CSR registers.
There is another ugly hack, ``mmio_closures``, which is used to generate the
``mmio.py`` library. The ``mmio.py`` library groups together relevant memory
regions and registers into instances of MicroPython classes. The only good
way to do this is to generate the code for ``mmio.py`` at instantiation time,
but the origin of each memory region is not known at instantiation time. The
functions have to be delayed until after memory locations are allocated, but
there is no hook in LiteX to do that, and the only interface I can think of
that one can use to look at the origins is ``csr.json``.
The solution is a list of closures that return strings that will be put into
``mmio.py``. They take one argument, ``csrs``, the ``csr.json`` file as a
Python dictionary. The closures use the memory location origin in ``csrs``
to generate code with the correct offsets.
Note that the ``csr.json`` file casefolds the memory locations into lowercase
but keeps CSR registers as-is.
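
A minimal sketch of the closure approach follows. The layout of ``csrs`` used
here (a ``memories`` entry keyed by lowercase region name, with a ``base``
field) is an assumption about ``csr.json``, and the region names, the class
names and ``make_region_closure`` are hypothetical.

```python
def make_region_closure(name, class_name):
    """Return a closure that emits one line of mmio.py for a memory region."""
    def closure(csrs):
        # Memory region names are lowercase in csr.json.
        base = csrs["memories"][name.lower()]["base"]
        return f"{name.lower()} = {class_name}({base:#x})\n"
    return closure


# Collected at instantiation time, before any origin is known.
mmio_closures = [
    make_region_closure("PicoRAM", "RAMRegion"),  # hypothetical names
    make_region_closure("SPI0", "SPIRegion"),
]

# Later, once csr.json exists (here a hand-written stand-in for it).
csrs = {"memories": {"picoram": {"base": 0x10000000},
                     "spi0": {"base": 0x10001000}}}
mmio_py = "".join(closure(csrs) for closure in mmio_closures)
print(mmio_py)
```

Each closure captures only names at instantiation time and looks up the
origin when it is finally called, so the generated text carries the real
offsets.
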
====================
System Within a Chip
====================
A *system within a chip* (SWiC) is a SoC within a SoC. Upsilon has the
capability to add SWiCs that can be controlled by the main CPU. The CPU for
the SWiC is the PicoRV32, which is a RISC-V RV32IMC core (RISC-V, 32 bit,
standard registers, multiplication, and compressed instructions).
The main CPU controls the SWiC through a special memory region on the Wishbone
bus. (Currently there are CSRs, but I consider this a hack and they will be
removed.) There are three ways the main CPU interacts with the SWiC:

1. Direct control. The main CPU can start and reset the SWiC CPU. It can
   also inspect the SWiC CPU's registers and program counter.
2. Exclusive registers. Small data can be transferred in the Main -> SWiC and
   SWiC -> Main directions using *Special Registers*. These are small
   registers that can be read by both CPUs but written by only one of them.
   They are used for sending parameters to programs without having to
   recompile them.
3. *Preemptive Interfaces* (PI), which connect a Wishbone slave to two or more
   Wishbone buses. Only one bus has read-write access to the slave at any time.
   The main CPU controls bus access. In the future, read access and write
   access may be controlled separately, instead of granting both or neither.

As an example of PI, the SWiC RAM is behind a PI. The main CPU resets the SWiC
(through direct control), fills the SWiC with machine code, fills the exclusive
registers with values, and then starts the SWiC CPU. External communication
(such as SPI) is through PI.
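
The access rule of a PI can be modeled in a few lines of plain Python. This is
a toy and not the ``PreemptiveInterface`` gateware: ``ToyPI`` and its methods
are hypothetical, and returning ``False`` for a blocked write is an assumption
made for the sketch (it says nothing about what the hardware does).

```python
class ToyPI:
    """Hypothetical model of a Preemptive Interface in front of a RAM."""

    def __init__(self, size, masters):
        self.mem = [0] * size
        self.masters = list(masters)
        self.owner = self.masters[0]  # master that currently has access

    def select(self, master):
        """Done by the main CPU: hand the slave to one master."""
        if master not in self.masters:
            raise ValueError(f"unknown master {master!r}")
        self.owner = master

    def write(self, master, adr, value):
        """Return True if the write went through, False if it was blocked."""
        if master != self.owner:
            return False
        self.mem[adr] = value
        return True


ram = ToyPI(4, ["main", "swic"])
ram.write("main", 0, 0x13)        # main CPU fills the RAM with machine code
print(ram.write("swic", 1, 5))    # blocked: the SWiC does not own the RAM yet
ram.select("swic")                # main CPU hands the RAM over
print(ram.write("swic", 1, 5))    # now the SWiC can write
```
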
---------------------------------
Adding Memory Regions to the SWiC
---------------------------------
PicoRV32 uses a byte-addressed bus. However, it looks like it will not attempt
non-word aligned accesses. Slaves written for the main CPU will work with the SWiC,
and vice-versa.
The process for connecting a Wishbone slave to the PicoRV32 bus is slightly
different because the usual LiteX code interferes with the build process (LiteX
only expects one Wishbone bus). The code for managing the SWiC bus is in
/gateware/region.py .
To add an ``Interface`` called ``iface``::

    pico.mmap.add_region(name, BasicRegion(origin=origin, size=iface.width, bus=iface))

Note that unlike in the main CPU, the origin of the region must be specified.
The origin does not have to be a power of 2, but it must have enough low-order
zero bits to address all ``iface.width`` bytes.
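
The constraint on the origin can be checked with a bit mask. The function
below is a hypothetical helper, not part of ``region.py``, and it assumes that
the width is a power of 2 as required for Upsilon slaves.

```python
def origin_fits(origin, width):
    """True if origin has enough low zero bits for a width-byte region.

    width is assumed to be a power of 2.
    """
    if width <= 0 or width & (width - 1):
        raise ValueError("width must be a power of 2")
    return origin & (width - 1) == 0


print(origin_fits(0x3000, 0x1000))  # 0x3000 is not a power of 2
print(origin_fits(0x1800, 0x1000))  # bit 11 of the origin overlaps the offsets
```
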
=====================
Workarounds and Hacks
=====================
---------------------------------------------
LiteX Compile Times Take Too Long for Testing
---------------------------------------------
Set ``compile_software`` to ``False`` in ``soc.py`` when checking for Verilog
compile errors. Set it back when you do an actual compile run, or your program
will not boot.
If LiteX complains about not having a RISC-V compiler, that is because your
system does not have a compatible RISC-V compiler in your ``$PATH``. Refer to
the LiteX install instructions above to see how to set up the SiFive GCC, which
will work.
----------------------------------
F4PGA Crashes When Using Block RAM
----------------------------------
This is really a Yosys bug (and really, an ABC bug). F4PGA defaults to using
the ABC flow, which can break, especially for block RAM. To fix, edit out
``-abc`` in the Tcl script (find it before you install it...)
This is mitigated by using ``SRAM`` in LiteX directly, which seems to
magically work.
-------------------------------------------------------------
Modules Simulate Correctly, but Don't Work at All in Hardware
-------------------------------------------------------------
Yosys fails to calculate computed parameter values correctly. For instance::

    parameter CTRLVAL = 5;
    localparam VALUE = CTRLVAL + 1;

Yosys will *silently* fail to compile this, setting ``VALUE`` to be equal
to 0. The solution is to use macros.
This also seems to magically work in PicoRV32. This may work if ``localparam
integer`` is used instead.
---------------------
Reset Pins Don't Work
---------------------
On the Arty A7 there is a Reset button. This is connected to the CPU and only
resets the CPU. Possibly due to timing issues, modules get screwed up if they
share a reset pin with the CPU. The code currently connects button 0 to reset
the modules separately from the CPU.
-------------------------
Verilog Macros Don't Work
-------------------------
Verilog's preprocessor is awful. F4PGA (through yosys) barely supports it.
You should only use Verilog macros as a replacement for ``localparam``.
When you need to do so, you must preprocess the file with
Verilator. For example, if you have a file called ``mod.v`` in the folder
``firmware/rtl/mod/``, then in the file ``firmware/rtl/mod/Makefile`` add::

    codegen: [...] mod_preprocessed.v

(putting it after all other generated files). The file
``firmware/rtl/common.makefile`` should automatically generate the
preprocessed file for you.
If your Verilog is complex enough to need generation, consider writing
it in Migen instead.
-------------------------
RAM Check failure on Boot
-------------------------
This is most likely a bus issue. You might have overloaded the CSR bus. Move
some CSRs to a wishbone bus module. This can also happen due to timing errors
across the main CPU bus, which can be alleviated by reducing combinational
logic and adding registers along the path.
--------------------------------------------------
Accesses to a Wishbone bus memory area do not work
--------------------------------------------------
Try reading 16 words (64 bytes) into the memory area and see if the
behavior changes. Many times this is due to the Wishbone Cache interfering
with volatile memory. Set the ``cached`` parameter in the SoCRegion to
``False`` when adding the slave.
---------------------
Migen Recursion Error
---------------------
You passed the wrong value (like a string) where Migen expected a statement
or a value. For instance, instead of an assignment statement, you put a
string indicating the value you want to assign.
---------------------
Sources Missing Error
---------------------
The LiteX build will stop after creating the module tree. This is because you
imported a module that does not exist. LiteX will silently fail if a Verilog
source file you added does not exist, so either remove the module or add the
file.
---------------------------------------------
I overrode finalize and now things are broken
---------------------------------------------
*Never* override the ``finalize()`` function in a Migen module.
Each Migen module has a ``finalize()`` function inherited from the class. This
does code generation and calls ``do_finalize()``, which is a user-defined
function.
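
Why overriding ``finalize()`` breaks things can be shown with a toy class.
``ToyModule`` below is a plain-Python stand-in written for this example and
is not Migen's real implementation. It only mimics the split between the
framework-owned ``finalize()`` and the user hook ``do_finalize()``.

```python
class ToyModule:
    """Toy stand-in for a Migen module; not Migen's real implementation."""

    def __init__(self):
        self.finalized = False
        self.generated = []

    def finalize(self):
        # Framework-owned: bookkeeping first, then the user hook, once only.
        if not self.finalized:
            self.finalized = True
            self.generated.append("framework code generation")
            self.do_finalize()

    def do_finalize(self):
        pass  # user-defined hook, empty by default


class Good(ToyModule):
    def do_finalize(self):
        self.generated.append("user logic")


class Bad(ToyModule):
    def finalize(self):  # overriding this skips the framework's work
        self.generated.append("user logic")


good, bad = Good(), Bad()
good.finalize()
bad.finalize()
print(good.generated)
print(bad.generated)
```

In the toy, ``Bad`` never runs the framework step and never marks itself as
finalized, while ``Good`` gets both the framework step and its own logic.
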
=========
TODO List
=========
Pseudo CSR bus for the main CPU?