Upsilon Maintenance Manual. This document may be distributed under your choice of the GNU GPL v3.0 (or any later version), or under the CC BY-SA 4.0.
Introduction
This document is aimed at maintainers of this software who are not experienced programmers (in either software or hardware). Its goal is to contain all information pertinent to the development process of Upsilon.
This manual is (hopefully) modular enough that you can just skip to the section you need without having to read the entire thing.
Organization of the Project
Upsilon uses LiteX and Linux for its FPGA code. LiteX generates HDL and glues it together. It also forms the build system of the hardware portion of Upsilon. Linux is the kernel portion, which deals with communication between the computer that receives scan data and the hardware that is executing the scan.
LiteX further uses F4PGA to compile the HDL code. F4PGA is primarily made up of Yosys (synthesis) and nextpnr (place and route).
Required Knowledge
This document is written under the assumption that you are using Linux. You can make this work on other platforms but I don't know how to.
Verilog is critical for writing hardware. You should hopefully not have to write much of it.
The kernel is written in C. This C is different than C you have written before because it is running "freestanding."
You do not need to know about Linux kernel development. You will need to know the basics of ssh, vi, and how to use Unix as a user.
Tests are written in C++ and Verilog. You will not have to write C++ unless you modify the Verilog files.
The macro processing language GNU m4 is used occasionally. You will need to know how to use m4 if you modify the main base.v.m4 file (e.g. adding more software-accessible ports).
Python is used for the SoC generator. The SoC generator uses a library called LiteX, which in turn uses migen. You do not need to know a lot about migen, but LiteX's documentation is poor so you will need to know some migen in order to read the code and understand how some modules work.
Compile Process
Although each component uses a different build system, you can run everything with
- make (compile everything in this folder)
- make clean (clean up all compiled files)
Setting up the Toolchain
The toolchain is primarily designed around modern Linux. It may not work properly on Windows or macOS. If you have access to a computational cluster (if you are at FSU physics, ask the Physics department) then you should set up the toolchain on their servers. You will be able to compile things on any computer with an internet connection.
F4PGA
- Clone F4PGA (if you want, checkout commit b6c5fff, but you should try checking out master first).
- Run scripts/prepare_environment.sh. Note that you will need to change the environment variable $F4PGA_INSTALL_DIR if you do not have access to the default directory (which requires root access).
- Run scripts/activate.sh. If you run into problems, open the file and copy the source and conda commands manually into your terminal.
- Install meson and ninja through pip.
All commands should be done in the conda environment.
LiteX
- Download litex_setup.py from the LiteX repository (Upsilon uses 2022.08) to some directory (don't put it in your home directory because there will be a bunch of downloaded repositories).
- Run litex_setup.py --init --install --user --tag 2022.08
- Download a GCC RISC-V cross compiler. If you have root access to the build machine, then you can probably install this with your package manager. Users of Ubuntu 14 can download the SiFive GCC. Otherwise you will have to compile a cross compiler (x86_64 host to RV32I target) manually.
- Put the GCC RISC-V cross compiler in your $PATH variable.
Buildroot
Buildroot builds a Linux system for the FPGA. To build the images, download a stable version of Buildroot that the config files support and run
make BR2_EXTERNAL=/upsilon_directory/buildroot litex_vexriscv_defconfig
OpenSBI
OpenSBI is a platform-independent interface between the hardware and the kernel.
Download the latest version of OpenSBI that the config files support. Copy the files in the opensbi directory to the targets directory and run
make CROSS_COMPILE=riscv64-linux-gnu- PLATFORM=litex/vexriscv
FPGA Build System
Make sure F4PGA and a RISC-V GCC compiler are in your path. Then just go into the firmware folder and run make. This should generate everything you need and compile the software. The synthesis suite is single threaded. This will take about 15-20 minutes on a good computer.
The FPGA firmware (aka gateware) build system is designed in a recursive manner. That means that each directory has a Makefile that processes all the files in the directory. There is a common.makefile in the rtl/ directory that is used when a rule (such as preprocessing a Verilog source file) is used in multiple Makefiles.
For the Arty A7, the bitstream is firmware/build/digilent_arty/gateware/digilent_arty.bit.
Software Build System
It is recommended to use the docker files.
Loading the Software and Firmware
Network Setup
You will need the FPGA and the controlling computer on the same wired network. DO NOT CONNECT THE FPGA TO A WIDE NETWORK. USE A PRIVATE LAN THAT ONLY CONTAINS THE CONTROLLING COMPUTER AND THE FPGA. DO NOT ATTEMPT TO CONNECT THE FPGA TO THE INTERNET. The controlling computer can still connect to the internet, but through another LAN port. The best thing to do is to buy a USB to Ethernet adapter.
The default TFTP client connects to 192.168.1.100.
Connecting to the FPGA Over USB
Connect to the FPGA over USB and run litex_term /dev/ttyUSB1 (or whatever connection it should be) and you should see the LiteX BIOS come up.
Loading the Firmware
Connect the FPGA to a computer using a Micro-USB to USB cable. Run openFPGALoader -c digilent digilent_arty.bit to upload the firmware (gateware) to the controller.
You can load the software using serial boot but this is very slow. The better thing to do is to use TFTP boot, which goes over Ethernet. WHEN YOU RUN TFTP, DO NOT EXPOSE THE TFTP SERVER ON AN INTERNET-CONNECTED NETWORK INTERFACE. THIS IS A BIG SECURITY RISK. ONLY RUN TFTP FOR THE AMOUNT OF TIME REQUIRED TO BOOT THE CONTROL SOFTWARE. You can read about how to set up a TFTP server on the OpenWRT wiki.
Using DNSMasq on Linux, run
dnsmasq -d --port=0 --enable-tftp --tftp-root=/path/to/firmware/directory --user=root --group=root --interface=$INTERFACE
Do not use --tftp-no-blocksize. If you do, the controller will only read the first 512 bytes of the kernel.
In the root of the TFTP server, have boot.bin be the kernel binary (zephyr.bin).
FPGA
Upsilon runs on a Field Programmable Gate Array (FPGA). FPGAs are sets of logic gates and other peripherals that can be changed by a computer. FPGAs can implement CPUs, digital filters, and control code at a much higher speed than a computer. The downside is that FPGAs are much more difficult to program for.
A large part of Upsilon is written in Verilog. Verilog is a Hardware Description Language (HDL), which is similar to a programming language (such as C++ or Python).
The difference is that Verilog compiles to a piece of hardware that deals with individual bits executing operations in sync with a clock. This differs from a piece of software, which is a set of instructions that a computer follows. Verilog is usually much less abstract than regular code.
Regular code is tested on the system in which it is run. Hardware, on the other hand, is very difficult to test on the device that it is actually running on. Hardware is usually simulated. This project primarily simulates Verilog code using the program Verilator, where the code that runs the simulation is written in C++.
Instead of strings, integers, and classes, the basic components of all Verilog code are the wire and the register, which carry bits (1 and 0). Wires connect components together, and registers store data, in a similar way to variables in software. Unlike usual programming languages, where code executes one step at a time, most FPGA code runs in parallel at the tick of the system clock.
To compile Verilog to a format suitable for execution on an FPGA, you synthesize the Verilog into a low-level format that uses the specific resources of the FPGA you are using, and then you run a place and route program to allocate resources on the FPGA to fit your design. Running synthesis on its own can help you understand how many resources a module uses. Place-and-route gives you timing reports, which tell you about major design problems that outstrip the capabilities of the FPGA (or the programs you are using). You should look up what "timing" on an FPGA is and learn as much as you can about it, because it is an issue that does not happen in standard software and can be very difficult to fix when you run into it.
Once a bitstream is generated, it is loaded onto an FPGA over a cable using a programming tool (for this project, openFPGALoader).
Recommendations to Learners
Gisselquist Technology is the best free online resource for FPGA programming out there. These articles will help you understand how to write good FPGA code, not just valid code.
Here are some exercises for you to ease yourself into FPGA programming.
- Write an FPGA program that implements addition without using the + operator. This program should add each number bit by bit, handling carried digits properly. The single-bit building block is called a full adder; chaining full adders gives a ripple-carry adder.
- Write an FPGA program that multiplies two signed integers together, without using the * operator. The width of these integers should not be hard-coded: it should be easy to change. What you write in this is something that is actually a part of this project: see boothmul.v. You do not have to (and should not!) write it just like Upsilon has written it.
- Write an FPGA program that communicates over SPI. For simplicity, you only need to write it for a single SPI mode: look up the details on the internet. There is an SPI slave device in this repository that you can use to simulate an end for the SPI master you write, but you should write the SPI slave yourself. For bonus points, connect your SPI master to a real SPI device and confirm that your communication works.
For each of these exercises, follow the complete "Design Testing Process" below. At the very least, write simulations and test your programs on real hardware.
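When you write the simulation for the first exercise, it helps to have a software reference model to compare the hardware output against. The following is a minimal sketch in Python, not part of Upsilon; the function names full_adder and ripple_carry_add are made up for this illustration. It adds two unsigned integers one bit at a time, passing the carry from each bit position to the next.

```python
def full_adder(a, b, carry_in):
    """Add three single bits; return (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(x, y, width=8):
    """Add two unsigned width-bit integers one bit at a time.

    Returns (result, carry_out), where result is truncated to width bits.
    """
    result = 0
    carry = 0
    for i in range(width):
        # Feed bit i of each operand, plus the carry from bit i - 1.
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry
```

A Verilog version would instantiate one full adder per bit (for example in a generate loop) instead of looping over time, but the carry chain is the same.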
Control and Status Registers in Hardware
LiteX uses "Control and Status Registers" (CSRs) to communicate between the CPU and any Verilog modules. (RISC-V CPUs have something with the same name, but Upsilon does not use that.)
Design Testing Process
Simulation
When you write or modify a Verilog module, the first thing you should do is write/run a simulation of that module. A simulation of that module should at the minimum compare the execution of the module with known results (called "Ground truth testing"). A simulation should also consider edge cases that you might overlook when writing Verilog.
For example, a module that multiplies two signed integers together should have a simulation that sends the module many pairs of integers, taking care to ensure that all possible permutations of sign are tested (e.g. positive times positive, negative times positive, etc.) and also that special cases are handled (e.g. largest 32-bit integer multiplied by the most negative 32-bit integer, multiplication by 0 and 1, etc.).
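One way to build such a list of test cases is sketched below in Python. This is only an illustration under assumptions: the function names, the choice of special values, and the number of random pairs are not taken from Upsilon's tests. It combines every pair of special values with a few random pairs for each of the four sign combinations, and records the expected product as ground truth.

```python
import itertools
import random


def signed_range(width):
    """Return (most negative, most positive) two's complement values."""
    return -(1 << (width - 1)), (1 << (width - 1)) - 1


def multiplier_test_vectors(width=32, random_per_sign=4, seed=0):
    """Return a list of (a, b, expected_product) tuples.

    The expected product needs up to 2 * width bits to be represented.
    """
    lo, hi = signed_range(width)
    rng = random.Random(seed)  # fixed seed so failures are reproducible

    # Every pairing of the special values: extremes, 0, 1 and -1.
    special = [lo, -1, 0, 1, hi]
    vectors = list(itertools.product(special, repeat=2))

    # Random operands for each of the four sign combinations.
    for sign_a, sign_b in itertools.product((1, -1), repeat=2):
        for _ in range(random_per_sign):
            a = sign_a * rng.randint(1, hi)
            b = sign_b * rng.randint(1, hi)
            vectors.append((a, b))

    return [(a, b, a * b) for a, b in vectors]
```

The resulting tuples could be written to a file and read by the C++ testbench, or the same idea could be implemented directly in the testbench.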
Writing simulation code is a very boring task, but you must do it. Otherwise there is no way for you to check that
- Your code does what you want it to do
- Any changes you make to your code don't break it
If you find a bug that isn't covered by your simulation, make sure you add that case to the simulation.
The file firmware/rtl/testbench.hpp contains a class that you should use to organize individual tests. Make a derived class of TB and use the posedge() function to encode what default actions your test should take at every positive edge of the clock. Remember, in C++ each action is blocking: there is no equivalent to the non-blocking <=.
If you have to do a lot of non-blocking code for your test, you should write a Verilog wrapper for your test that implements the non-blocking code. Verilator only supports a subset of non-synthesizable Verilog. Unless you really need to, use synthesizable Verilog only. See firmware/rtl/waveform/waveform_sim.v and firmware/rtl/waveform/dma_sim.v for examples of Verilog files only used for tests.
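The difference between blocking statements and Verilog's non-blocking <= can be shown with a small model. The sketch below is written in Python purely for illustration (the testbenches themselves are C++), and the register names a and b are hypothetical. It swaps two registers on a clock tick: the blocking version loses a value, while the two-phase version reads all current values first and then commits them together, which is what <= does.

```python
def tick_blocking(regs):
    """Sequential (blocking) update: the second statement sees the new a."""
    regs = dict(regs)
    regs["a"] = regs["b"]
    regs["b"] = regs["a"]  # a was already overwritten, so b keeps its value
    return regs


def tick_nonblocking(regs):
    """Two-phase update: compute every next value from the current state,
    then replace the whole state at once (like Verilog's <=)."""
    next_regs = {"a": regs["b"], "b": regs["a"]}
    return next_regs
```

If a test written in C++ needs this behaviour, the same two-phase pattern applies: compute all next values into temporaries, then assign them.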
Test Synthesis
Yosys only accepts a subset of Verilog. You might write a bunch of code that Verilator will happily simulate but that will fail to go through Yosys.
Once you have simulated your design, you should use Yosys to synthesize it. This will allow you to understand how much and what resources the module is taking up. To do this, you can put the following in a script file:
read_verilog module_1.v
read_verilog module_2.v
...
read_verilog top_module.v
synth_xilinx -flatten -nosrl -noclkbuf -nodsp -iopad -nowidelut
write_verilog yosys_synth_output.v
and run yosys -s scriptfile. The options to synth_xilinx reflect the current limitations that F4PGA has. The file xc7.f4pga.tcl that F4PGA downloads is the complete synthesis script; read it to understand the internals of what F4PGA does to compile your Verilog.
Test Compilation
I haven't been able to do this for most of this project. The basic idea is to use firmware/rtl/soc.py to load only the module to test, and to use LiteScope to write and read values from the module. For more information, you can look at the boothmul test.
Software Programming
The "software" is the code written in C that runs on the FPGA. This handles access to hardware components, running scripts sent by the controlling computer, and sending information between the hardware and the controlling computer.
Crash Course in Multithreaded Programming
Each script (up to 32 by default, change by redefining a macro) runs in a separate thread. This allows for multiple scripts to execute without having to explicitly hand control from one component to another, but since there is no defined execution path (one thread may execute before or after another thread), the program must handle scripts attempting to access the same component.
Upsilon handles multiple threads using
- Mutexes
- Thread Local Storage
Mutexes ("mutual exclusion") are objects that only allow for one thread to access them at a time. When one thread locks a mutex, other threads attempting to lock the mutex sleep until the thread unlocks the mutex. After the thread that locked the mutex unlocks it, some other thread gets the mutex.
Mutex management is important because if multiple threads attempt to read or write to a converter at the same time, the scripts could deadlock, requiring a hard reset of the system. (You could add manual deadlock aborting by adding new commands that call k_thread_abort, as long as all threads are not deadlocked. This is a hack but may be necessary.)
Each thread can lock the mutex as many times as it wants, but it must unlock the mutex the same number of times. Thread local storage (the __thread modifier) is used to count the number of times that each mutex is locked by a thread. Since (as the name implies) TLS is thread-local, there is no need to control access to it by mutexes: each thread gets its own local version of the thread local variables.
The software has to count the number of recursive locks because when the thread finally releases control of the mutex, another thread must be able to access the hardware in a well defined state: it should not attempt to write to hardware while the hardware is running (certain specific exceptions apply). When the unlock routines (see for example waveform_release()) reach the final unlock (e.g. waveform_locked[i] == 1), the software waits for the hardware to finish what it is doing before unlocking.
The kernel implements "time-slicing", which means that each running program executes in chunks. After each chunk is finished, another program can execute. The amount of time for each thread is controlled by CONFIG_TIMESLICE_SIZE in prj.conf. When executing critical code, use k_sched_lock and k_sched_unlock.
TODO: Use k_thread_time_slice_set to implement an abort check for threads.
Crash Course in Network Programming
The kernel communicates with the controlling computer using a TCP/IP connection. You should connect the controller and the computer to a router and assign the kernel a static IP.
Each script that runs on the kernel is a separate connection. Each connection runs on a separate thread, because each thread runs a Creole interpreter.
TCP can usually detect when a connection breaks, but you should gracefully shut down all connections. Otherwise dead connections can hang around for minutes at a time.
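On the controlling computer, a graceful shutdown could look like the following sketch. It uses only Python's standard socket module; the helper name close_gracefully and the timeout value are assumptions made for this example and do not come from Upsilon's client code.

```python
import socket


def close_gracefully(sock, timeout=2.0):
    """Signal end-of-stream, drain remaining data, then close the socket."""
    try:
        sock.settimeout(timeout)       # do not wait forever on a dead peer
        sock.shutdown(socket.SHUT_WR)  # tell the peer we are done sending
        while sock.recv(4096):         # read until the peer closes its side
            pass
    except OSError:
        pass  # peer already gone or timed out; nothing more to do
    finally:
        sock.close()
```

Shutting down the write side first lets the other end see a clean end-of-stream instead of waiting for a timeout.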
Static IPs
The client and controller IPs are baked into the software and firmware at build time. The software configuration is in software/prj.conf. The firmware configuration is in firmware/soc.py (see local_ip and remote_ip settings in SoCCore).
The static IP of the controlling computer, on the interface connected to the controller, must be the same as remote_ip. By default this is 192.168.1.100.
Logging
TODO: Do logging via UDP?
Logging is done via UART. Connect the micro-USB slot to the controlling computer to get debug output.
All you need to know is
- Use LOG_WRN for errors that you can recover from (e.g. closing a connection).
- Use LOG_ERR for errors that are fatal and halt the firmware, requiring a reset.
- Use LOG_INF for misc information (e.g. initialization completed, accepted connection, closing connection).
- Use LOG_DBG for debugging output.
If you need debugging output, add a line of the form
set_source_files_properties(src_file PROPERTIES COMPILE_FLAGS -DFILE_LOG_LEVEL=4)
This will enable debugging output for this file only. Do not enable debugging output for the entire system! This will make the debugging output unusable.
When you are done, change 4 to 3 in that line.
TODO: Ethernet debugging output.
Control and Status Registers in Software
CSR read and write functions are generated by /firmware/generate_csr_locations.py.
You should not need to directly call write and read on raw addresses.
If you add a new CSR, add it to the generator script.
Implementation Information
CSRs can be used in software by using litex_write8, litex_read16, etc. In the Zephyr source, look at soc/riscv/litex-vexriscv/soc.h for the complete implementation. Also look at include/zephyr/arch/common/sys_io.h to see how these functions are implemented.
Do not directly write to CSR ports without using litex_writeN and litex_readN, and do not directly use sys_io.h functions. If you are not careful you will not access the registers correctly and you will crash the software.
Controlling Computer
Creole
Creole is the bytecode that the kernel runs. It is written using a Python library. It looks very similar to assembly, but is custom built to make it easier to write direct assembly code.
Creole programs are the scripts run by the kernel to communicate with hardware and send messages over Ethernet to the controlling computer. Each Creole program should do one thing: e.g. monitor an ADC, run the raster scan, output waveforms, etc.
Creole programs should reserve the hardware modules (DAC, ADC, CLOOP, waveforms) that they use explicitly. This makes your program faster and less error prone.
Since the Creole assembler is a Python library, you can use things like Python format strings to automate production of Creole code. You can also add virtual instructions (by directly modifying the library) easily.
Creole has a concept of data blocks, assigned using the DB command. These blocks are used for waveforms and for printing sets of data out to the datastream.
Creole uses a self-synchronizing code to detect encoding and transmission errors. This makes programs bigger, but you should not write big Creole programs.
The controlling computer sends a 16-bit little-endian unsigned integer (the size of the Creole program in bytes) followed by the Creole bytecode.
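The framing described above can be sketched in Python with the standard struct module. The function name frame_program is hypothetical, and the bytes in the usage comment are placeholders rather than real Creole bytecode.

```python
import struct

MAX_PROGRAM_SIZE = 0xFFFF  # largest value a 16-bit unsigned length can hold


def frame_program(bytecode):
    """Prefix bytecode with its length as a 16-bit little-endian integer."""
    if len(bytecode) > MAX_PROGRAM_SIZE:
        raise ValueError("program does not fit in a 16-bit length field")
    # "<H" means little-endian, unsigned 16-bit.
    return struct.pack("<H", len(bytecode)) + bytecode


# Example: a 3-byte placeholder program is sent as 5 bytes in total.
# frame_program(b"\x01\x02\x03") == b"\x03\x00\x01\x02\x03"
```

The 16-bit length field limits a single program to 65535 bytes, which is another reason to keep Creole programs small.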
Hacks and Pitfalls
The open source software stack that Upsilon uses is novel and unstable.
LiteX
Set compile_software to False in soc.py when checking for Verilog compile errors. Set it back when you do an actual compile run, or your program will not boot.
If LiteX complains about not having a RISC-V compiler, that is because your system does not have a compatible RISC-V compiler in your $PATH. Refer to the LiteX install instructions above to see how to set up the SiFive GCC, which will work.
F4PGA
This is really a Yosys bug (and really, an ABC bug). F4PGA defaults to using the ABC flow, which can break, especially for block RAM. To fix, edit out -abc in the tcl script (find it before you install it...)
Yosys
Yosys fails to calculate computed parameter values correctly. For instance,
parameter CTRLVAL = 5;
localparam VALUE = CTRLVAL + 1;
Yosys will silently fail to compile this, setting VALUE to be equal to 0. The solution is to use macros.
Reset Pins
On the Arty A7 there is a Reset button. This is connected to the CPU and only resets the CPU. Possibly due to timing issues, modules get screwed up if they share a reset pin with the CPU. The code currently connects button 0 to reset the modules separately from the CPU.
Clock Speeds
The output pins on the FPGA (except for the high speed PMOD outputs) cannot switch fast enough to
Macros
Verilog's preprocessor is awful. F4PGA (through Yosys) barely supports it. You should only use Verilog macros as a replacement for localparam. When you need to do so, you must preprocess the file with Verilator. For example, if you have a file called mod.v in the folder firmware/rtl/mod/, then in the file firmware/rtl/mod/Makefile add
codegen: [...] mod_preprocessed.v
(putting it after all other generated files). The file firmware/rtl/common.makefile should automatically generate the preprocessed file for you.
If The Controlling Computer Cannot Connect to the Internet
When you connect your computer to the controller over Ethernet, your computer may attempt to route all traffic over the controller network (since it is wired) instead of another network (like a wireless network). This means that your computer can't connect to the internet (or your connection is really slow). If this happens to you on a Linux machine, you can change the routing table.
Run route -n (or ip route if this does not work) to print the routing table. Find the entry named default via [...] dev eth-interface. This is the default route for the ethernet device. Remove it using ip route del default via [...] dev eth-interface.
If the route keeps on reappearing, delete it and quickly enter ip route add default via [...] dev eth-interface metric 65534. This will make the route the last priority.
Getting The Correct IP for the Controlling Computer
Some routers can automatically assign IPs based on MAC address. If your router can do that, great. Otherwise you will need to configure your computer with a static IP.
- Remove your computer from the DHCP list that the router has.
- Run ip link set eth-interface up.
- Then run ip addr and run ip addr del [ip] dev eth-interface on each IP on the ethernet interface that is connected to the controller.
- Run ip addr add 192.168.1.100/24 dev eth-interface (or whatever IP + subnet mask you need).
- If ip route does not give a routing entry for 192.168.1.0/24, run ip route add 192.168.1.0/24 dev eth0 proto kernel scope link (again, change depending on different situations).
This will use the static IP 192.168.1.100, which is the default TFTP boot IP.