## Index
- [Index](#index)
- [Description](#description)
- [Area usage and maximal frequency](#area-usage-and-maximal-frequency)
- [Dependencies](#dependencies)
- [CPU generation](#cpu-generation)
- [Regression tests](#regression-tests)
- [Interactive debug of the simulated CPU via GDB OpenOCD and Verilator](#interactive-debug-of-the-simulated-cpu-via-gdb-openocd-and-verilator)
- [Using Eclipse to run the software and debug it](#using-eclipse-to-run-the-software-and-debug-it)
  * [By using gnu-mcu-eclipse](#by-using-gnu-mcu-eclipse)
  * [By using Zylin plugin (old)](#by-using-zylin-plugin-old)
- [Briey SoC](#briey-soc)
- [Murax SoC](#murax-soc)
- [Running Linux](#running-linux)
- [Build the RISC-V GCC](#build-the-risc-v-gcc)
- [CPU parametrization and instantiation example](#cpu-parametrization-and-instantiation-example)
- [Add a custom instruction to the CPU via the plugin system](#add-a-custom-instruction-to-the-cpu-via-the-plugin-system)
- [Adding a new CSR via the plugin system](#adding-a-new-csr-via-the-plugin-system)
- [CPU clock and resets](#cpu-clock-and-resets)
- [VexRiscv Architecture](#vexriscv-architecture)
  * [Plugins](#plugins)

## Description

This repository hosts a RISC-V implementation written in SpinalHDL. Here are some specs:

- RV32I[M][C][A] instruction set (Atomic only inside a single core)
- Pipelined from 2 to 5+ stages ([Fetch*X], Decode, Execute, [Memory], [WriteBack])
- 1.44 DMIPS/Mhz --no-inline when nearly all features are enabled (1.57 DMIPS/Mhz when the divider lookup table is enabled)
- Optimized for FPGA, does not use any vendor specific IP block / primitive
- AXI4, Avalon, wishbone ready
- Optional MUL/DIV extensions
- Optional instruction and data caches
- Optional hardware refilled MMU
- Optional debug extension allowing Eclipse debugging via a GDB >> openOCD >> JTAG connection
- Optional interrupts and exception handling with Machine, [Supervisor] and [User] modes as defined in the [RISC-V Privileged ISA Specification v1.10](https://riscv.org/specifications/privileged-isa/).
- Two implementations of shift instructions: Single cycle and shiftNumber cycles
- Each stage can have optional bypass or interlock hazard logic
- Linux compatible (SoC : https://github.com/enjoy-digital/linux-on-litex-vexriscv)
- Zephyr compatible
- [FreeRTOS port](https://github.com/Dolu1990/FreeRTOS-RISCV)

The hardware description of this CPU is done by using a very software oriented approach
(without any overhead in the generated hardware). Here is a list of software concepts used:

- There are very few fixed things. Nearly everything is plugin based. The PC manager is a plugin, the register file is a plugin, the hazard controller is a plugin, ...
- There is an automatic tool which allows plugins to insert data in the pipeline at a given stage, and allows other plugins to read it in another stage through automatic pipelining.
- There is a service system which provides a very dynamic framework. For instance, a plugin could provide an exception service which can then be used by other plugins to emit exceptions from the pipeline.

There is a gitter channel for all questions about VexRiscv :<br>
[![Gitter](https://badges.gitter.im/SpinalHDL/VexRiscv.svg)](https://gitter.im/SpinalHDL/VexRiscv?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)

For commercial support, please contact spinalhdl@gmail.com.

## Area usage and maximal frequency

The following numbers were obtained by synthesizing the CPU as toplevel without any specific synthesis options to save area or to get better maximal frequency (neutral).<br>
The clock constraint is set to an unattainable value, which tends to increase the design area.<br>
The dhrystone benchmark was compiled with the `-O3 -fno-inline` options.<br>
All the cached configurations have some cache thrashing during the dhrystone benchmark except the `VexRiscv full max perf` one. This of course reduces the performance. It is possible to produce
dhrystone binaries which fit inside a 4KB I$ and 4KB D$ (I already had this case once) but currently it isn't the case.<br>
The CPU configurations used below can be found in the `src/main/scala/vexriscv/demo` directory.

```
VexRiscv small (RV32I, 0.52 DMIPS/Mhz, no datapath bypass, no interrupt) ->
    Artix 7     -> 239 Mhz 494 LUT 505 FF
    Cyclone V   -> 189 Mhz 345 ALMs
    Cyclone IV  -> 179 Mhz 730 LUT 494 FF
    iCE40       -> 92 Mhz 1130 LC

VexRiscv small (RV32I, 0.52 DMIPS/Mhz, no datapath bypass) ->
    Artix 7     -> 238 Mhz 552 LUT 562 FF
    Cyclone V   -> 192 Mhz 390 ALMs
    Cyclone IV  -> 172 Mhz 832 LUT 551 FF
    iCE40       -> 85 Mhz 1292 LC

VexRiscv small and productive (RV32I, 0.82 DMIPS/Mhz) ->
    Artix 7     -> 225 Mhz 699 LUT 532 FF
    Cyclone V   -> 144 Mhz 493 ALMs
    Cyclone IV  -> 148 Mhz 1,111 LUT 526 FF
    iCE40       -> 63 Mhz 1596 LC

VexRiscv small and productive with I$ (RV32I, 0.70 DMIPS/Mhz, 4KB-I$) ->
    Artix 7     -> 225 Mhz 719 LUT 566 FF
    Cyclone V   -> 145 Mhz 511 ALMs
    Cyclone IV  -> 150 Mhz 1,138 LUT 532 FF
    iCE40       -> 66 Mhz 1680 LC

VexRiscv full no cache (RV32IM, 1.21 DMIPS/Mhz 2.30 Coremark/Mhz, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
    Artix 7     -> 219 Mhz 1486 LUT 975 FF
    Cyclone V   -> 149 Mhz 943 ALMs
    Cyclone IV  -> 138 Mhz 2,013 LUT 966 FF

VexRiscv full (RV32IM, 1.21 DMIPS/Mhz 2.30 Coremark/Mhz with cache thrashing, 4KB-I$,4KB-D$, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
    Artix 7     -> 204 Mhz 1661 LUT 1172 FF
    Cyclone V   -> 143 Mhz 1,118 ALMs
    Cyclone IV  -> 133 Mhz 2,278 LUT 1,061 FF

VexRiscv full max perf (HZ*IPC) -> (RV32IM, 1.38 DMIPS/Mhz 2.57 Coremark/Mhz, 8KB-I$,8KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch prediction in the fetch stage, branch and shift operations done in the Execute stage) ->
    Artix 7     -> 199 Mhz 1739 LUT 1229 FF
    Cyclone V   -> 132 Mhz 1,129 ALMs
    Cyclone IV  -> 126 Mhz 2,345 LUT 1,114 FF

VexRiscv full with MMU (RV32IM, 1.24 DMIPS/Mhz 2.35 Coremark/Mhz, with cache thrashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch, MMU) ->
    Artix 7     -> 167 Mhz 1927 LUT 1553 FF
    Cyclone V   -> 128 Mhz 1,302 ALMs
    Cyclone IV  -> 125 Mhz 2,685 LUT 1,466 FF

VexRiscv linux balanced (RV32IMA, 1.21 DMIPS/Mhz 2.27 Coremark/Mhz, with cache thrashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, catch exceptions, static branch, MMU, Supervisor, Compatible with mainstream linux) ->
    Artix 7     -> 179 Mhz 2685 LUT 2177 FF
    Cyclone V   -> 136 Mhz 1,666 ALMs
    Cyclone IV  -> 123 Mhz 3,350 LUT 2,059 FF
```
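
The listing above gives DMIPS/Mhz and maximal frequency separately; multiplying the two gives an absolute Dhrystone throughput for a given FPGA target. As a minimal sketch (the `dmips` helper name is made up for this illustration and is not part of the repository), the product can be computed with `awk`:

```sh
# Hypothetical helper: absolute DMIPS = (DMIPS/Mhz) * (maximal frequency in Mhz)
dmips() {
    awk -v per_mhz="$1" -v fmax="$2" 'BEGIN { printf "%.2f\n", per_mhz * fmax }'
}

# Figures taken from the Artix 7 rows of the listing above
dmips 0.52 239   # VexRiscv small
dmips 1.38 199   # VexRiscv full max perf
```

With these inputs the smallest configuration lands at roughly 124 DMIPS and the max perf one at roughly 275 DMIPS, even though the latter has the lower clock frequency.
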

The following configuration results in 1.44 DMIPS/MHz:

- 5 stage : F -> D -> E -> M -> WB
- single cycle ADD/SUB/Bitwise/Shift ALU
- branch/jump done in the E stage
- memory load values are bypassed in the WB stage (late result)
- 33 cycle division with bypassing in the M stage (late result)
- single cycle multiplication with bypassing in the WB stage (late result)
- dynamic branch prediction done in the F stage with a direct mapped target buffer cache (no penalties on correct predictions)

Note that recently, the capability to remove the Fetch/Memory/WriteBack stage was added to reduce the area of the CPU, which ends up with a smaller CPU and a better DMIPS/Mhz for the small configurations.

## Dependencies
On Ubuntu 14 :
```sh
# JAVA JDK 8
sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk -y
sudo update-alternatives --config java
sudo update-alternatives --config javac

# Install SBT - https://www.scala-sbt.org/
echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
sudo apt-get update
sudo apt-get install sbt

# Verilator (for sim only, really need 3.9+, in general apt-get will give you 3.8)
sudo apt-get install git make autoconf g++ flex bison
git clone http://git.veripool.org/git/verilator   # Only first time
unsetenv VERILATOR_ROOT  # For csh; ignore error if on bash
unset VERILATOR_ROOT  # For bash
cd verilator
git pull        # Make sure we're up-to-date
git checkout verilator_3_918
autoconf        # Create ./configure script
./configure
make
sudo make install
```
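
After the installation it can be handy to confirm that every tool is reachable from the shell. The following is only a sketch: `missing_tools` is a hypothetical helper written for this example, not something shipped with VexRiscv.

```sh
# Hypothetical helper: print each of the given tools that is NOT found on PATH
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools installed by the commands above; no output means everything was found
missing_tools java javac sbt verilator

# The Verilator version can then be checked by hand (3.9+ is needed):
#   verilator --version
```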
## CPU generation
You can find two example CPU instances in:

- src/main/scala/vexriscv/demo/GenFull.scala
- src/main/scala/vexriscv/demo/GenSmallest.scala

To generate the corresponding RTL as a VexRiscv.v file, run the following commands in the root directory of this repository:

```sh
sbt "runMain vexriscv.demo.GenFull"
```

or

```sh
sbt "runMain vexriscv.demo.GenSmallest"
```

NOTES:
- The first run can take a while.
- The VexRiscv project may need an unreleased master-head of the SpinalHDL repo. If it fails to compile, just get the SpinalHDL repository and
  do a "sbt clean compile publish-local" in it as described in the dependencies chapter.
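
When scripting the generation, it may be useful to check that the RTL file was really produced. The wrapper below is a sketch under the assumption stated above that `VexRiscv.v` is written to the directory where `sbt` is run; `gen_cpu` is a hypothetical name.

```sh
# Hypothetical wrapper: run one of the demo generators, then check for VexRiscv.v
gen_cpu() {
    rm -f VexRiscv.v                              # drop any stale output first
    sbt "runMain vexriscv.demo.$1" || return 1    # $1 is GenFull, GenSmallest, ...
    if [ -f VexRiscv.v ]; then
        echo "generated VexRiscv.v from $1"
    else
        echo "VexRiscv.v not found after running $1" >&2
        return 1
    fi
}

# Usage, from the root directory of the repository:
#   gen_cpu GenFull
#   gen_cpu GenSmallest
```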
## Regression tests

[![Build Status](https://travis-ci.org/SpinalHDL/VexRiscv.svg?branch=master)](https://travis-ci.org/SpinalHDL/VexRiscv)

To run the tests (Java, Scala and Verilator are required), run:

```sh
export VEXRISCV_REGRESSION_SEED=42
export VEXRISCV_REGRESSION_TEST_ID=
sbt "testOnly vexriscv.TestIndividualFeatures"
```

This will generate random VexRiscv configurations and test them with:

- ISA tests from https://github.com/riscv/riscv-tests/tree/master/isa and https://github.com/riscv/riscv-compliance
- Dhrystone benchmark
- Coremark benchmark
- Zephyr os
- Buildroot/Linux os
- Some handwritten tests to check the CSR, debug module and MMU plugins

You can rerun specific tests by setting VEXRISCV_REGRESSION_TEST_ID to their ids. For instance, if you want to rerun:

- `test_id_5_test_IBus_CachedS1024W1BPL32Relaxvexriscv.plugin.DYNAMIC_DBus_CachedS8192W2BPL16_MulDiv_MulDivFpga_Shift_FullLate_Branch_Late_Hazard_BypassAll_RegFile_SyncDR_Src__Csr_AllNoException_Decoder__Debug_None_DBus_NoMmu`
- `test_id_9_test_IBus_Simple1S2InjStagevexriscv.plugin.STATIC_DBus_SimpleLate_MulDiv_MulDivFpgaSimple_Shift_FullEarly_Branch_Late_Hazard_Interlock_RegFile_AsyncER_Src_AddSubExecute_Csr_None_Decoder__Debug_None_DBus_NoMmu`

then:

```
export VEXRISCV_REGRESSION_TEST_ID=5,9
```
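
Since the id is simply the number that follows the `test_id_` prefix, the list can also be derived from the test names. A minimal sketch, where `test_ids` is a hypothetical helper and the test names are truncated for readability:

```sh
# Hypothetical helper: extract the numeric ids from test names given as
# arguments and join them with commas
test_ids() {
    printf '%s\n' "$@" | sed -n 's/^test_id_\([0-9][0-9]*\)_.*/\1/p' | paste -sd, -
}

# Truncated names; the real ones are much longer
VEXRISCV_REGRESSION_TEST_ID="$(test_ids test_id_5_test_IBus_Cached test_id_9_test_IBus_Simple)"
export VEXRISCV_REGRESSION_TEST_ID
echo "$VEXRISCV_REGRESSION_TEST_ID"
```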

There are also a few environment variables that you can use to tune the random generation:

| Parameters                                  | range              | description |
| ------------------------------------------- | ------------------ | ----------- |
| VEXRISCV_REGRESSION_SEED                    | Int                | Seed used to generate the random configurations |
| VEXRISCV_REGRESSION_TEST_ID                 | \[Int\[,Int\]\*\]  | Random configurations that should be kept and tested |
| VEXRISCV_REGRESSION_CONFIG_COUNT            | Int                | Number of random configurations |
| VEXRISCV_REGRESSION_CONFIG_RVC_RATE         | 0.0-1.0            | Chance to generate a RVC config |
| VEXRISCV_REGRESSION_CONFIG_LINUX_RATE       | 0.0-1.0            | Chance to generate a linux ready config |
| VEXRISCV_REGRESSION_CONFIG_MACHINE_OS_RATE  | 0.0-1.0            | Chance to generate a machine mode OS ready config |
| VEXRISCV_REGRESSION_LINUX_REGRESSION        | yes/no             | Enable the linux test |
| VEXRISCV_REGRESSION_COREMARK                | yes/no             | Enable the Coremark test |
| VEXRISCV_REGRESSION_ZEPHYR_COUNT            | Int                | Number of zephyr tests to run on capable configs |
| VEXRISCV_REGRESSION_CONFIG_DEMW_RATE        | 0.0-1.0            | Chance to generate a config with writeback stage |
| VEXRISCV_REGRESSION_CONFIG_DEM_RATE         | 0.0-1.0            | Chance to generate a config with memory stage |
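
As an illustration, a shorter regression run could be set up as sketched below. Only the variable names come from the table; the values are arbitrary choices for this example, and how much time they actually save has not been measured here.

```sh
# Illustrative values only; the variable names come from the table above
export VEXRISCV_REGRESSION_SEED=42
export VEXRISCV_REGRESSION_TEST_ID=
export VEXRISCV_REGRESSION_CONFIG_COUNT=4
export VEXRISCV_REGRESSION_CONFIG_LINUX_RATE=0.0
export VEXRISCV_REGRESSION_LINUX_REGRESSION=no
export VEXRISCV_REGRESSION_COREMARK=no
export VEXRISCV_REGRESSION_ZEPHYR_COUNT=0

# Then launch the regression as before:
#   sbt "testOnly vexriscv.TestIndividualFeatures"
```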
## Interactive debug of the simulated CPU via GDB OpenOCD and Verilator
The procedure is the same as for running the tests, except that you have to add `DEBUG_PLUGIN_EXTERNAL=yes` to the make arguments.
This works for GenFull, but not for GenSmallest, as that configuration has no debug module.

Then you can use the https://github.com/SpinalHDL/openocd_riscv tool to create a GDB server connected to the target (the simulated CPU):

```sh
# In the VexRiscv repository, run the simulation to which OpenOCD can connect =>
sbt "runMain vexriscv.demo.GenFull"
cd src/test/cpp/regression
make run DEBUG_PLUGIN_EXTERNAL=yes

# In the openocd git, after building it =>
src/openocd -c "set VEXRISCV_YAML PATH_TO_THE_GENERATED_CPU0_YAML_FILE" -f tcl/target/vexriscv_sim.cfg

# Run a GDB session with an elf RISCV executable (GenFull CPU)
YourRiscvToolsPath/bin/riscv32-unknown-elf-gdb VexRiscvRepo/src/test/resources/elf/uart.elf
target remote localhost:3333
monitor reset halt
load
continue

# Now it should print messages in the Verilator simulation of the CPU
```
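
The GDB commands of the session above can also be stored in a command file and replayed with GDB's `-x` option, which avoids retyping them at each run. This is a sketch; the file name `vexriscv.gdb` is an arbitrary choice.

```sh
# Write the GDB commands from the session above into a command file
# (the file name is arbitrary)
cat > vexriscv.gdb <<'EOF'
target remote localhost:3333
monitor reset halt
load
continue
EOF

# Then start GDB with the command file, for example:
#   YourRiscvToolsPath/bin/riscv32-unknown-elf-gdb -x vexriscv.gdb VexRiscvRepo/src/test/resources/elf/uart.elf
```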
## Using Eclipse to run the software and debug it

### By using gnu-mcu-eclipse
You can download releases of the IDE here: https://github.com/gnu-mcu-eclipse/org.eclipse.epp.packages/releases

In the IDE, you can import a makefile project by:
- file -> import -> C/C++ -> existing Code as Makefile Project
- Select the folder which contains the makefile, select "Cross GCC" (not "RISC-V Cross GCC")

To create a new debug configuration:
- run -> Debug Configurations -> GDB OpenOCD Debugging double click
- Look at https://drive.google.com/open?id=1c46tyEV0xLwOsk76b0y2qqs8CYy7Zq3f for a configuration example

### By using Zylin plugin (old)
You can use the Eclipse + Zylin embedded CDT plugin to do it (http://opensource.zylin.com/embeddedcdt.html). Tested with Helios Service Release 2 (http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/helios/SR2/eclipse-cpp-helios-SR2-linux-gtk-x86_64.tar.gz) and the corresponding zylin plugin.

The following commands will download Eclipse and install the plugin.

```sh
wget http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/helios/SR2/eclipse-cpp-helios-SR2-linux-gtk-x86_64.tar.gz
tar -xvzf download.php?file=%2Ftechnology%2Fepp%2Fdownloads%2Frelease%2Fhelios%2FSR2%2Feclipse-cpp-helios-SR2-linux-gtk-x86_64.tar.gz
cd eclipse
./eclipse -application org.eclipse.equinox.p2.director -repository http://opensource.zylin.com/zylincdt -installIU com.zylin.cdt.feature.feature.group/
```

See https://drive.google.com/drive/folders/1NseNHH05B6lmIXqQFVwK8xRjWE4ydeG-?usp=sharing to import a makefile project and create a debug configuration.

Note that sometimes Eclipse needs to be restarted in order to be able to place new breakpoints.

## Briey SoC
As a demonstrator, a SoC named Briey is implemented in `src/main/scala/vexriscv/demo/Briey.scala`. This SoC is very similar to
the [Pinsec SOC](https://spinalhdl.github.io/SpinalDoc/spinal/lib/pinsec/hardware/):

![Alt text](assets/brieySoc.png?raw=true "")

To generate the Briey SoC Hardware:

```sh
sbt "runMain vexriscv.demo.Briey"
```

To run the verilator simulation of the Briey SoC which can then be connected to OpenOCD/GDB, first get those dependencies:

```sh
sudo apt-get install build-essential xorg-dev libudev-dev libts-dev libgl1-mesa-dev libglu1-mesa-dev libasound2-dev libpulse-dev libopenal-dev libogg-dev libvorbis-dev libaudiofile-dev libpng12-dev libfreetype6-dev libusb-dev libdbus-1-dev zlib1g-dev libdirectfb-dev libsdl2-dev
```

Then go in `src/test/cpp/briey` and run the simulation with (UART TX is printed in the terminal, VGA is displayed in a GUI):

```sh
make clean run
```

To connect OpenOCD (https://github.com/SpinalHDL/openocd_riscv) to the simulation :

```sh
src/openocd -f tcl/interface/jtag_tcp.cfg -c "set BRIEY_CPU0_YAML /home/spinalvm/Spinal/VexRiscv/cpu0.yaml" -f tcl/target/briey.cfg
```

You can find multiple software examples and demos here: https://github.com/SpinalHDL/VexRiscvSocSoftware/tree/master/projects/briey

You can find some FPGA projects which instantiate the Briey SoC here (DE1-SoC, DE0-Nano): https://drive.google.com/drive/folders/0B-CqLXDTaMbKZGdJZlZ5THAxRTQ?usp=sharing

Here are some measurements of Briey SoC timings and area :

```
Artix 7     -> 186 Mhz 3138 LUT 3328 FF
Cyclone V   -> 139 Mhz 2,175 ALMs
Cyclone IV  -> 129 Mhz 4,337 LUT 3,170 FF
```

## Murax SoC

Murax is a very light SoC (it fits in an ICE40 FPGA) which can work without any external components:

- VexRiscv RV32I[M]
- JTAG debugger (Eclipse/GDB/openocd ready)
- 8 kB of on-chip ram
- Interrupt support
- APB bus for peripherals
- 32 GPIO pins
- one 16 bits prescaler, two 16 bits timers
- one UART with tx/rx fifo

Depending on the CPU configuration, on the ICE40-hx8k FPGA with icestorm for synthesis, the full SoC has the following area/performance :

- RV32I interlocked stages => 51 Mhz, 2387 LC 0.45 DMIPS/Mhz
- RV32I bypassed stages => 45 Mhz, 2718 LC 0.65 DMIPS/Mhz

Its implementation can be found here: `src/main/scala/vexriscv/demo/Murax.scala`.

To generate the Murax SoC Hardware :

```sh
# To generate the SoC without any content in the ram
sbt "runMain vexriscv.demo.Murax"

# To generate the SoC with a demo program already in ram
sbt "runMain vexriscv.demo.MuraxWithRamInit"
```

The demo program included by default with `MuraxWithRamInit` will blink the
LEDs and echo characters received on the UART back to the user. To see this
when running the Verilator sim, type some text and press enter.

Then go in `src/test/cpp/murax` and run the simulation with :

```sh
make clean run
```

To connect OpenOCD (https://github.com/SpinalHDL/openocd_riscv) to the simulation :

```sh
src/openocd -f tcl/interface/jtag_tcp.cfg -c "set MURAX_CPU0_YAML /home/spinalvm/Spinal/VexRiscv/cpu0.yaml" -f tcl/target/murax.cfg
```
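
The OpenOCD command lines for Briey and Murax differ only in the SoC name and in the location of `cpu0.yaml`, which is hard-coded to one particular machine in the examples. As a sketch, a small helper can print the command for another checkout. `openocd_cmd` is a hypothetical name, and the assumption that `cpu0.yaml` sits at the root of the VexRiscv checkout is inferred from the path used above.

```sh
# Hypothetical helper: print the OpenOCD command line for a given SoC
# (briey or murax) and a given VexRiscv checkout (defaults to $PWD)
openocd_cmd() {
    soc=$1
    repo=${2:-$PWD}
    # BRIEY_CPU0_YAML / MURAX_CPU0_YAML, as used in the examples above
    var=$(printf '%s' "$soc" | tr '[:lower:]' '[:upper:]')_CPU0_YAML
    printf 'src/openocd -f tcl/interface/jtag_tcp.cfg -c "set %s %s/cpu0.yaml" -f tcl/target/%s.cfg\n' \
        "$var" "$repo" "$soc"
}

# Prints the Murax command shown above
openocd_cmd murax /home/spinalvm/Spinal/VexRiscv
```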

You can find multiple software examples and demos here: https://github.com/SpinalHDL/VexRiscvSocSoftware/tree/master/projects/murax

Here are some timing and area measurements of the Murax SoC:

```
Murax interlocked stages (0.45 DMIPS/Mhz, 8 bits GPIO) ->
    Artix 7     -> 215 Mhz 1044 LUT 1202 FF
    Cyclone V   -> 173 Mhz 737 ALMs
    Cyclone IV  -> 144 Mhz 1,484 LUT 1,206 FF
    iCE40       -> 64 Mhz 2422 LC (nextpnr)

MuraxFast bypassed stages (0.65 DMIPS/Mhz, 8 bits GPIO) ->
    Artix 7     -> 229 Mhz 1269 LUT 1302 FF
    Cyclone V   -> 159 Mhz 864 ALMs
    Cyclone IV  -> 137 Mhz 1,688 LUT 1,241 FF
    iCE40       -> 66 Mhz 2799 LC (nextpnr)
```

Some scripts to generate the SoC and call the icestorm toolchain can be found here: `scripts/Murax/`

A toplevel simulation testbench with the same features + a GUI is implemented with SpinalSim. You can find it in `src/test/scala/vexriscv/MuraxSim.scala`.

To run it :

```sh
# This will generate the Murax RTL + run its testbench. You need Verilator 3.9xx installed.
sbt "test:runMain vexriscv.MuraxSim"
```

## Running Linux
A default configuration is located in `src/main/scala/vexriscv/demo/Linux.scala`.

This file also contains:
- The commands to compile the buildroot image
- How to run the Verilator simulation in interactive mode

There is currently no SoC to run it on hardware; this is a work in progress. But the CPU simulation can already boot Linux and run user space applications (even Python).

Note that VexRiscv can run Linux on both cached and cacheless designs.

## Build the RISC-V GCC

A prebuilt GCC toolchain can be found here:

- https://www.sifive.com/products/tools/ => SiFive GNU Embedded Toolchain

The VexRiscvSocSoftware makefiles are expecting to find this prebuilt version in `/opt/riscv/__contentOfThisPreBuild__`

```sh
wget https://static.dev.sifive.com/dev-tools/riscv64-unknown-elf-gcc-20171231-x86_64-linux-centos6.tar.gz
tar -xzvf riscv64-unknown-elf-gcc-20171231-x86_64-linux-centos6.tar.gz
sudo mv riscv64-unknown-elf-gcc-20171231-x86_64-linux-centos6 /opt/riscv64-unknown-elf-gcc-20171231-x86_64-linux-centos6
sudo mv /opt/riscv64-unknown-elf-gcc-20171231-x86_64-linux-centos6 /opt/riscv
echo 'export PATH=/opt/riscv/bin:$PATH' >> ~/.bashrc
```

If you want to compile the rv32i and rv32im GCC toolchain from source code and install them in `/opt/`, do the following (will take one hour):

```sh
# Be careful, sometimes the git clone fails to successfully clone riscv-gnu-toolchain.
sudo apt-get install autoconf automake autotools-dev curl libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev -y

git clone --recursive https://github.com/riscv/riscv-gnu-toolchain riscv-gnu-toolchain
cd riscv-gnu-toolchain

echo "Starting RISC-V Toolchain build process"

ARCH=rv32im
rm -rf $ARCH
mkdir $ARCH; cd $ARCH
../configure --prefix=/opt/$ARCH --with-arch=$ARCH --with-abi=ilp32
sudo make -j4
cd ..

ARCH=rv32i
rm -rf $ARCH
mkdir $ARCH; cd $ARCH
../configure --prefix=/opt/$ARCH --with-arch=$ARCH --with-abi=ilp32
sudo make -j4
cd ..

echo -e "\\nRISC-V Toolchain installation completed!"
```
## CPU parametrization and instantiation example

You can find many examples of different configurations in the https://github.com/SpinalHDL/VexRiscv/tree/master/src/main/scala/vexriscv/demo folder.
Here is one such example:

```scala
import vexriscv._
import vexriscv.plugin._

//Instantiate one VexRiscv
val cpu = new VexRiscv(
  //Provide a configuration instance
  config = VexRiscvConfig(
    //Provide a list of plugins which will further add their logic into the CPU
    plugins = List(
      new IBusSimplePlugin(
        resetVector = 0x00000000l,
        cmdForkOnSecondStage = true,
        cmdForkPersistence = true
      ),
      new DBusSimplePlugin(
        catchAddressMisaligned = false,
        catchAccessFault = false
      ),
      new DecoderSimplePlugin(
        catchIllegalInstruction = false
      ),
      new RegFilePlugin(
        regFileReadyKind = plugin.SYNC,
        zeroBoot = true
      ),
      new IntAluPlugin,
      new SrcPlugin(
        separatedAddSub = false,
        executeInsertion = false
      ),
      new LightShifterPlugin,
      new HazardSimplePlugin(
        bypassExecute = false,
        bypassMemory = false,
        bypassWriteBack = false,
        bypassWriteBackBuffer = false
      ),
      new BranchPlugin(
        earlyBranch = false,
        catchAddressMisaligned = false
      ),
      new YamlPlugin("cpu0.yaml")
    )
  )
)
```
## Add a custom instruction to the CPU via the plugin system

Here is an example of a simple plugin which adds a simple SIMD_ADD instruction:

```scala
import spinal.core._
import vexriscv.plugin.Plugin
import vexriscv.{Stageable, DecoderService, VexRiscv}

//This plugin example will add a new instruction named SIMD_ADD which does the following :
//
//RD : Regfile Destination, RS : Regfile Source
//RD( 7 downto  0) = RS1( 7 downto  0) + RS2( 7 downto  0)
//RD(15 downto  8) = RS1(15 downto  8) + RS2(15 downto  8)
//RD(23 downto 16) = RS1(23 downto 16) + RS2(23 downto 16)
//RD(31 downto 24) = RS1(31 downto 24) + RS2(31 downto 24)
//
//Instruction encoding :
//0000011----------000-----0110011
//       |RS2||RS1|   |RD |
//
//Note : RS1, RS2, RD positions follow the RISC-V spec and are common for all instructions of the ISA

class SimdAddPlugin extends Plugin[VexRiscv]{
  //Define the concept of IS_SIMD_ADD signals, which specify if the current instruction is destined for this plugin
  object IS_SIMD_ADD extends Stageable(Bool)

  //Callback to setup the plugin and ask for different services
  override def setup(pipeline: VexRiscv): Unit = {
    import pipeline.config._

    //Retrieve the DecoderService instance
    val decoderService = pipeline.service(classOf[DecoderService])

    //Specify the IS_SIMD_ADD default value when instructions are decoded
    decoderService.addDefault(IS_SIMD_ADD, False)

    //Specify the instruction decoding which should be applied when the instruction matches the 'key' pattern
    decoderService.add(
      //Bit pattern of the new SIMD_ADD instruction
      key = M"0000011----------000-----0110011",

      //Decoding specification when the 'key' pattern is recognized in the instruction
      List(
        IS_SIMD_ADD              -> True,
        REGFILE_WRITE_VALID      -> True, //Enable the register file write
        BYPASSABLE_EXECUTE_STAGE -> True, //Notify the hazard management unit that the instruction result is already accessible in the EXECUTE stage (Bypass ready)
        BYPASSABLE_MEMORY_STAGE  -> True, //Same as above but for the memory stage
        RS1_USE                  -> True, //Notify the hazard management unit that this instruction uses the RS1 value
        RS2_USE                  -> True  //Same as above but for RS2.
      )
    )
  }

  override def build(pipeline: VexRiscv): Unit = {
    import pipeline._
    import pipeline.config._

    //Add a new scope on the execute stage (used to give a name to signals)
    execute plug new Area {
      //Define some signals used internally to the plugin
      val rs1 = execute.input(RS1).asUInt //32 bits UInt value of the regfile[RS1]
      val rs2 = execute.input(RS2).asUInt
      val rd = UInt(32 bits)

      //Do some computation
      rd(7 downto 0) := rs1(7 downto 0) + rs2(7 downto 0)
      rd(15 downto 8) := rs1(15 downto 8) + rs2(15 downto 8)
      rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
      rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)

      //When the instruction is a SIMD_ADD one, then write the result into the register file data path.
      when(execute.input(IS_SIMD_ADD)) {
        execute.output(REGFILE_WRITE_DATA) := rd.asBits
      }
    }
  }
}
```
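
A stock RISC-V assembler has no mnemonic for this custom instruction, so test software has to provide the raw instruction word in some way. The sketch below computes that word from the encoding documented in the plugin comment (`0000011 | rs2 | rs1 | 000 | rd | 0110011`); the `simd_add_encode` helper is hypothetical and not part of the repository.

```sh
# Hypothetical helper: build the 32-bit SIMD_ADD instruction word from
# register numbers (0..31), following the encoding
#   0000011 | rs2 | rs1 | 000 | rd | 0110011
simd_add_encode() {
    rd=$1; rs1=$2; rs2=$3
    printf '0x%08X\n' $(( (0x03 << 25) | (rs2 << 20) | (rs1 << 15) | (rd << 7) | 0x33 ))
}

# Word for SIMD_ADD with rd = x10, rs1 = x11, rs2 = x12
simd_add_encode 10 11 12
```

The resulting word could then be placed in an assembly source with a `.word` directive.
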

If you want to add this plugin to a given CPU, you just need to add it to its parameterized plugin list.

This example is a very simple one, but each plugin can really have access to the whole CPU:

- Halt a given stage of the CPU
- Unschedule instructions
- Emit an exception
- Introduce new instruction decoding specification
- Ask to jump the PC somewhere
- Read signals published by other plugins
- Override published signals values
- Provide an alternative implementation
- ...

As a demonstrator, this SimdAddPlugin was integrated in the `src/main/scala/vexriscv/demo/GenCustomSimdAdd.scala` CPU configuration
and is self-tested by the `src/test/cpp/custom/simd_add` application by running the following commands :

```sh
# Generate the CPU
sbt "runMain vexriscv.demo.GenCustomSimdAdd"

cd src/test/cpp/regression/

# Optionally add TRACE=yes if you want to get the VCD waveform from the simulation.
# Also you have to know that by default, the testbench introduces instruction/data bus stalls.
# Note the CUSTOM_SIMD_ADD flag is set to yes.
make clean run IBUS=SIMPLE DBUS=SIMPLE CSR=no MMU=no DEBUG_PLUGIN=no MUL=no DIV=no DHRYSTONE=no REDO=2 CUSTOM_SIMD_ADD=yes
```

To retrieve the plugin related signals in your waveform viewer, just filter with `simd`.
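
When inspecting those signals, a software reference for the expected `rd` value can help. The sketch below models the intended behaviour, four independent 8-bit additions with no carry between byte lanes; `simd_add` is a hypothetical helper and is not the self-test application mentioned above.

```sh
# Hypothetical software model of SIMD_ADD: four independent 8-bit additions,
# carries are not propagated from one byte lane to the next
simd_add() {
    a=$(( $1 )); b=$(( $2 )); result=0
    for shift in 0 8 16 24; do
        lane=$(( ((a >> shift) + (b >> shift)) & 0xFF ))
        result=$(( result | (lane << shift) ))
    done
    printf '0x%08X\n' "$result"
}

# Byte lanes 0x01+0x01, 0xFF+0x01 (wraps to 0x00), 0x02+0x01 and 0xFE+0x01
simd_add 0x01FF02FE 0x01010101
```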
## Adding a new CSR via the plugin system
Here are two examples of how to add a custom CSR to the CPU via the plugin system:
https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/demo/CustomCsrDemoPlugin.scala

The first one (`CustomCsrDemoPlugin`) adds an instruction counter and a clock cycle counter into the CSR mapping (and also does some tricky stuff as a demonstration).

The second one (`CustomCsrDemoGpioPlugin`) creates a GPIO peripheral directly mapped into the CSR.
## CPU clock and resets
Without the debug plugin, the CPU has a standard `clk` input and a `reset` input. With the debug plugin, the situation is the following:

- clk : As before, the clock which drives the whole CPU design, including the debug logic
- reset : Resets all the CPU state except the debug logic
- debugReset : Resets the debug logic of the CPU
- debug_resetOut : A CPU output signal which allows the JTAG to reset the CPU + the memory interconnect + the peripherals

So here is the reset interconnect in case you use the debug plugin:
```
                                  VexRiscv
                            +------------------+
                            |                  |
toplevelReset >----+--------> debugReset       |
                   |        |                  |
                   |  +-----< debug_resetOut   |
                   |  |     |                  |
                   +--or>-+-> reset            |
                          | |                  |
                          | +------------------+
                          |
                          +-> Interconnect / Peripherals
```
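The same wiring can be written as a small truth function. This is a plain-Scala sketch of the diagram above, not SpinalHDL code from the repository; `ResetInterconnectModel`, `Resets` and `resolve` are hypothetical names, and `systemReset` stands for the wire going to the interconnect and peripherals.

```scala
// Boolean model of the reset interconnect drawn above (hypothetical names).
object ResetInterconnectModel {
  final case class Resets(debugReset: Boolean, cpuReset: Boolean, systemReset: Boolean)

  def resolve(toplevelReset: Boolean, debugResetOut: Boolean): Resets = {
    // The CPU and the rest of the system are reset by the top level OR by the debug logic.
    val combined = toplevelReset || debugResetOut
    // The debug logic itself is only reset by the top level, so it survives its own reset request.
    Resets(debugReset = toplevelReset, cpuReset = combined, systemReset = combined)
  }
}
```

The point of the split is visible in `resolve(toplevelReset = false, debugResetOut = true)`: the CPU and peripherals are reset while `debugReset` stays low.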
## VexRiscv Architecture
VexRiscv is implemented via a 5-stage in-order pipeline on which many optional and complementary plugins add functionality to provide a functional RISC-V CPU.
This approach is completely unconventional and only possible through meta hardware description languages (SpinalHDL in the current case), but has proven its advantages
via the VexRiscv implementation:

- You can swap/turn on/turn off parts of the CPU directly via the plugin system
- You can add new functionality/instructions without having to modify any source code of the CPU
- It allows the CPU configuration to cover a very large spectrum of implementations without cooking spaghetti code
- It allows your code base to truly produce a parametrized CPU design

If you generate the CPU without any plugin, it will only contain the definition of the 5 pipeline stages and their basic arbitration, but nothing else,
as everything else, including the program counter, is added into the CPU via plugins.
### Plugins
This chapter describes the currently implemented plugins.
- [IBusSimplePlugin](#ibussimpleplugin)
- [IBusCachedPlugin](#ibuscachedplugin)
- [DecoderSimplePlugin](#decodersimpleplugin)
- [RegFilePlugin](#regfileplugin)
- [HazardSimplePlugin](#hazardsimpleplugin)
- [SrcPlugin](#srcplugin)
- [IntAluPlugin](#intaluplugin)
- [LightShifterPlugin](#lightshifterplugin)
- [FullBarrelShifterPlugin](#fullbarrelshifterplugin)
- [BranchPlugin](#branchplugin)
- [DBusSimplePlugin](#dbussimpleplugin)
- [DBusCachedPlugin](#dbuscachedplugin)
- [MulPlugin](#mulplugin)
- [DivPlugin](#divplugin)
- [MulDivIterativePlugin](#muldiviterativeplugin)
- [CsrPlugin](#csrplugin)
- [StaticMemoryTranslatorPlugin](#staticmemorytranslatorplugin)
- [MmuPlugin](#mmuplugin)
- [DebugPlugin](#debugplugin)
- [YamlPlugin](#yamlplugin)
#### IBusSimplePlugin
This plugin implements the CPU frontend (instruction fetch) via a very simple and neutral memory interface going outside the CPU.

| Parameters | type | description |
| ------ | ----------- | ------ |
| catchAccessFault | Boolean | When true, an instruction read response with read error asserted results in a CPU exception trap. |
| resetVector | BigInt | Address of the program counter after the reset. |
| cmdForkOnSecondStage | Boolean | When false, branches immediately update the program counter. This minimizes branch penalties but might reduce FMax because the instruction bus address signal is a combinatorial path. When true, this combinatorial path is removed and the program counter is updated one cycle after a branch is detected. While FMax may improve, an additional branch penalty will be incurred as well. |
| cmdForkPersistence | Boolean | When false, requests on the iBus can disappear/change before they are acknowledged. This reduces area but isn't safe/supported by many arbitration/slaves. When true, once initiated, iBus requests will stay until they are acknowledged. |
| compressedGen | Boolean | Enable RISC-V compressed instruction (RVC) support. |
| busLatencyMin | Int | Specifies the minimal latency between the iBus.cmd and iBus.rsp. A corresponding number of stages are added to the frontend to keep the IPC to 1.|
| injectorStage | Boolean | When true, a stage between the frontend and the decode stage of the CPU is added to improve FMax. (busLatencyMin + injectorStage) should be at least two. |
| prediction | BranchPrediction | Can be set to NONE/STATIC/DYNAMIC/DYNAMIC_TARGET to specify the branch predictor implementation. See below for more details. |
| historyRamSizeLog2 | Int | Specifies the number of entries in the direct mapped prediction cache of the DYNAMIC/DYNAMIC_TARGET implementations: 2^historyRamSizeLog2 entries. |

Here is the IBusSimpleBus interface definition:
```scala
case class IBusSimpleCmd() extends Bundle{
val pc = UInt(32 bits)
}
case class IBusSimpleRsp() extends Bundle with IMasterSlave{
val error = Bool
val inst = Bits(32 bits)
override def asMaster(): Unit = {
out(error,inst)
}
}
case class IBusSimpleBus(interfaceKeepData : Boolean) extends Bundle with IMasterSlave{
var cmd = Stream(IBusSimpleCmd())
var rsp = Flow(IBusSimpleRsp())
override def asMaster(): Unit = {
master(cmd)
slave(rsp)
}
}
```
**Important**: Check the cmdForkPersistence parameter. If it is not set, it can break the iBus compatibility with your memory system (unless you externally add some buffers).

Setting cmdForkPersistence and cmdForkOnSecondStage improves iBus cmd timings.

The IBusSimplePlugin includes bridges to convert from the IBusSimpleBus to AXI4, Avalon, and Wishbone interfaces.

This plugin implements a jump interface that allows all other plugins to issue a jump:
```scala
trait JumpService{
def createJumpInterface(stage : Stage) : Flow[UInt]
}
```
The stage argument specifies the stage from which the jump is asked. This allows the PcManagerSimplePlugin plugin to manage priorities between jump requests from
different stages.
#### IBusCachedPlugin
Simple and light multi-way instruction cache.

| Parameters | type | description |
| ------ | ----------- | ------ |
| resetVector | BigInt | Address of the program counter after the reset. |
| relaxedPcCalculation | Boolean | When false, branches immediately update the program counter. This minimizes branch penalties but might reduce FMax because the instruction bus address signal is a combinatorial path. When true, this combinatorial path is removed and the program counter is updated one cycle after a branch is detected. While FMax may improve, an additional branch penalty will be incurred as well. |
| prediction | BranchPrediction | Can be set to NONE/STATIC/DYNAMIC/DYNAMIC_TARGET to specify the branch predictor implementation. See below for more details. |
| historyRamSizeLog2 | Int | Specifies the number of entries in the direct mapped prediction cache of the DYNAMIC/DYNAMIC_TARGET implementations: 2^historyRamSizeLog2 entries. |
| compressedGen | Boolean | Enable RISC-V compressed instruction (RVC) support. |
| config.cacheSize | Int | Total storage capacity of the cache in bytes. |
| config.bytePerLine | Int | Number of bytes per cache line |
| config.wayCount | Int | Number of cache ways |
| config.twoCycleRam | Boolean | Check the tags values in the decode stage instead of the fetch stage to relax timings |
| config.asyncTagMemory | Boolean | Read the cache tags in an asynchronous manner instead of a synchronous one |
| config.addressWidth | Int | CPU address width. Should be 32 |
| config.cpuDataWidth | Int | CPU data width. Should be 32 |
| config.memDataWidth | Int | Memory data width. Could potentially be something other than 32, but only 32 is currently tested |
| config.catchIllegalAccess | Boolean | Catch when a memory access is done on a non-valid memory address (MMU) |
| config.catchAccessFault | Boolean | Catch when the memory bus responds with an error |
| config.catchMemoryTranslationMiss | Boolean | Catch when the MMU misses a TLB entry |

Note: If you enable the twoCycleRam option and wayCount is bigger than one, the register file plugin should be configured to read the regFile in an asynchronous manner.
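
To get a feel for how `config.cacheSize`, `config.bytePerLine` and `config.wayCount` relate, here is a small plain-Scala sketch using the usual set-associative cache arithmetic. It assumes the plugin follows that textbook relation; the `CacheGeometry` object and its function names are hypothetical and not part of VexRiscv.

```scala
// Textbook cache geometry arithmetic (hypothetical helper, assumed to match the plugin's layout).
object CacheGeometry {
  /** Number of lines stored in each way. */
  def linesPerWay(cacheSize: Int, bytePerLine: Int, wayCount: Int): Int = {
    require(cacheSize > 0 && bytePerLine > 0 && wayCount > 0, "all sizes must be positive")
    require(cacheSize % (bytePerLine * wayCount) == 0, "cacheSize must be a multiple of bytePerLine * wayCount")
    cacheSize / (bytePerLine * wayCount)
  }

  /** Number of memory bus words needed to refill one line. */
  def wordsPerLine(bytePerLine: Int, memDataWidth: Int): Int =
    bytePerLine * 8 / memDataWidth
}
```

With a 4096-byte cache, 32 bytes per line and a single way, this gives 128 lines, each refilled with 8 words over a 32-bit memory bus.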
#### DecoderSimplePlugin
This plugin provides instruction decoding capabilities to other plugins.

For instance, for a given instruction, the pipeline hazard plugin needs to know if it uses the register file source 1/2 in order to stall the pipeline until the hazard is gone.
To provide this kind of information, each plugin which implements an instruction declares it to the DecoderSimplePlugin plugin.

| Parameters | type | description |
| ------ | ----------- | ------ |
| catchIllegalInstruction | Boolean | When true, instructions that don't match a decoding specification will generate a trap exception |

Here is a usage example:
```scala
//Specify the instruction decoding which should be applied when the instruction matches the 'key' pattern
decoderService.add(
//Bit pattern of the new instruction
key = M"0000011----------000-----0110011",
//Decoding specification when the 'key' pattern is recognized in the instruction
List(
IS_SIMD_ADD -> True, //Inform the pipeline that the current instruction is a SIMD_ADD instruction
REGFILE_WRITE_VALID -> True, //Notify the hazard management unit that this instruction writes to the register file
BYPASSABLE_EXECUTE_STAGE -> True, //Notify the hazard management unit that the instruction result is already accessible in the EXECUTE stage (Bypass ready)
BYPASSABLE_MEMORY_STAGE -> True, //Same as above but for the memory stage
RS1_USE -> True, //Notify the hazard management unit that this instruction uses the RS1 value
RS2_USE -> True //Same as above but for RS2.
)
)
```
This plugin operates in the Decode stage.
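
The `key` in the example above is a masked literal: `0` and `1` positions must match, while `-` positions are don't-care. The following plain-Scala sketch shows what such a match boils down to. It is only a software model for illustration, `MaskedLiteralModel`, `Pattern` and `parse` are hypothetical names, and it says nothing about how the decoder logic is actually synthesized.

```scala
// Software model of matching an instruction against a masked literal key (hypothetical helper).
object MaskedLiteralModel {
  final case class Pattern(mask: Long, value: Long) {
    // Only the bits selected by the mask take part in the comparison.
    def matches(instruction: Long): Boolean = (instruction & mask) == value
  }

  /** Parses a 32-character key, most significant bit first. */
  def parse(pattern: String): Pattern = {
    require(pattern.length == 32, "an RV32 key has exactly 32 positions")
    pattern.foldLeft(Pattern(0L, 0L)) { (acc, c) =>
      val (maskBit, valueBit) = c match {
        case '1'   => (1L, 1L)
        case '0'   => (1L, 0L)
        case '-'   => (0L, 0L) // don't-care position
        case other => throw new IllegalArgumentException(s"unexpected character '$other'")
      }
      Pattern((acc.mask << 1) | maskBit, (acc.value << 1) | valueBit)
    }
  }

  // The SIMD_ADD key from the example, split by R-type field:
  // funct7, rs2, rs1, funct3, rd, opcode.
  val simdAddKey: Pattern =
    parse("0000011" + "-----" + "-----" + "000" + "-----" + "0110011")
}
```

With this model, `simdAddKey.matches(0x060000B3L)` is true (funct7 `0000011`, rd `x1`), while a regular `add x3, x1, x2` (`0x002081B3L`) does not match because its funct7 field is zero.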
#### RegFilePlugin
This plugin implements the register file.

| Parameters | type | description |
| ------ | ----------- | ------ |
| regFileReadyKind | RegFileReadKind | Can be set to ASYNC or SYNC. Specifies the kind of memory read used to implement the register file. ASYNC means zero cycle latency memory read, while SYNC means one cycle latency memory read which can be mapped into standard FPGA memory blocks |
| zeroBoot | Boolean | Load all registers with zeroes at the beginning of simulations to keep everything deterministic in logs/traces |

This register file uses a `don't care` read-during-write policy, so the bypassing/hazard plugin should take care of this.
#### HazardSimplePlugin
This plugin checks the pipeline instruction dependencies and, if necessary or possible, will stop the instruction in the decoding stage or bypass the instruction results
from the later stages to the decode stage.

Since the register file is implemented with a `don't care` read-during-write policy, this plugin also manages this kind of hazard.

| Parameters | type | description |
| ------ | ----------- | ------ |
| bypassExecute | Boolean | Enable the bypassing of instruction results coming from the Execute stage |
| bypassMemory | Boolean | Enable the bypassing of instruction results coming from the Memory stage |
| bypassWriteBack | Boolean | Enable the bypassing of instruction results coming from the WriteBack stage |
| bypassWriteBackBuffer | Boolean | Enable the bypassing of the previous cycle register file written value |
#### SrcPlugin
This plugin muxes different input values to produce SRC1/SRC2/SRC_ADD/SRC_SUB/SRC_LESS values which are common values used by many plugins in the execute stage (ALU/Branch/Load/Store).

| Parameters | type | description |
| ------ | ----------- | ------ |
| separatedAddSub | Boolean | By default SRC_ADD/SRC_SUB are generated from a single controllable adder/subtractor, but if this is set to true, it uses separate adders/subtractors |
| executeInsertion | Boolean | By default SRC1/SRC2 are generated in the Decode stage, but if this parameter is true, it is done in the Execute stage (It will relax the bypassing network) |

Except for SRC1/SRC2, this plugin does everything at the beginning of the Execute stage.
#### IntAluPlugin
This plugin implements all ADD/SUB/SLT/SLTU/XOR/OR/AND/LUI/AUIPC instructions in the execute stage by using the SrcPlugin outputs. It is a really simple plugin.

The result is injected into the pipeline directly at the end of the execute stage.
#### LightShifterPlugin
Implements SLL/SRL/SRA instructions by using an iterative shift register, taking one cycle per bit shifted.

The result is injected into the pipeline directly at the end of the execute stage.
#### FullBarrelShifterPlugin
Implements SLL/SRL/SRA instructions by using a full barrel shifter, so it executes all shifts in a single cycle.

| Parameters | type | description |
| ------ | ----------- | ------ |
| earlyInjection | Boolean | By default the result of the shift is injected into the pipeline in the Memory stage to relax timings, but if this option is true it will be done in the Execute stage |
#### BranchPlugin
This plugin implements all branch/jump instructions (JAL/JALR/BEQ/BNE/BLT/BGE/BLTU/BGEU) with primitives used by the CPU frontend plugins to implement branch prediction. The prediction implementation is set in the frontend plugins (IBusX).

| Parameters | type | description |
| ------ | ----------- | ------ |
| earlyBranch | Boolean | By default the branch is done in the Memory stage to relax timings, but if this option is set it's done in the Execute stage |
| catchAddressMisaligned | Boolean | If a jump/branch targets an unaligned PC address, it will fire a trap exception |

Each mispredicted jump will produce a penalty of between 2 and 4 cycles, depending on the `earlyBranch` and the `PcManagerSimplePlugin.relaxedPcCalculation` configurations.
##### Prediction NONE
No prediction: each PC change due to a jump/branch will produce a penalty.
##### Prediction STATIC
In the decode stage, a conditional branch pointing backwards or a JAL is branched speculatively. If the speculation is right, the branch penalty is reduced to a single cycle,
otherwise the standard penalty is applied.
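
The STATIC rule described above can be summarised as a one-line decision function. This is a plain-Scala sketch with hypothetical names (`StaticPredictionModel`, `predictTaken`), where `offset` is assumed to be the signed branch offset relative to the current PC.

```scala
// Decision rule of the STATIC predictor as described above (hypothetical helper).
object StaticPredictionModel {
  /** JAL is always predicted taken; a conditional branch only when it points backwards. */
  def predictTaken(isJal: Boolean, isConditionalBranch: Boolean, offset: Int): Boolean =
    isJal || (isConditionalBranch && offset < 0)
}
```

A loop-closing branch (negative offset) is therefore predicted taken, while a forward conditional branch is predicted not taken.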
##### Prediction DYNAMIC
Same as the STATIC prediction, except that to do the prediction, it uses a direct mapped 2-bit history cache (BHT) which remembers if the branch is more likely to be taken or not.
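
To illustrate the idea of a direct mapped 2-bit history cache, here is a conceptual plain-Scala model. It is a sketch under assumptions, not the VexRiscv implementation: the `BhtModel` names are hypothetical, the table is assumed to be indexed with the PC bits just above the two lowest ones, and each entry is assumed to be a classic saturating counter starting in the "weakly not taken" state.

```scala
// Conceptual model of a direct mapped BHT with 2-bit saturating counters (hypothetical names).
object BhtModel {
  final class Bht(historyRamSizeLog2: Int) {
    val entryCount: Int = 1 << historyRamSizeLog2
    // Counter values: 0/1 predict not taken, 2/3 predict taken. Initial state is an assumption.
    private val counters = Array.fill(entryCount)(1)

    // Assumed indexing: drop the two lowest PC bits, keep historyRamSizeLog2 bits.
    private def index(pc: Long): Int = ((pc >>> 2) & (entryCount - 1)).toInt

    def predictTaken(pc: Long): Boolean = counters(index(pc)) >= 2

    /** Trains the entry with the real branch outcome, saturating at 0 and 3. */
    def update(pc: Long, taken: Boolean): Unit = {
      val i = index(pc)
      counters(i) = if (taken) math.min(3, counters(i) + 1) else math.max(0, counters(i) - 1)
    }
  }
}
```

Because the table is direct mapped and untagged in this model, two branches whose PCs share the same index bits also share one counter, which is the usual area/accuracy trade-off controlled by `historyRamSizeLog2`.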
##### Prediction DYNAMIC_TARGET
This predictor uses a direct mapped branch target buffer (BTB) in the Fetch stage which stores the PC of the instruction, the target PC of the instruction and a 2-bit history to remember
if the branch is more likely to be taken or not. This is the most efficient branch predictor currently implemented on VexRiscv: when the branch prediction is right, it produces no branch penalty.
The downside is that this predictor has a long combinatorial path coming from the prediction cache read port to the program counter by passing through the jump interface.
#### DBusSimplePlugin
This plugin implements the load and store instructions (LB/LH/LW/LBU/LHU/LWU/SB/SH/SW) via a simple memory bus going out of the CPU.

| Parameters | type | description |
| ------ | ----------- | ------ |
| catchAddressMisaligned | Boolean | If a memory access is done to an unaligned memory address, it will fire a trap exception |
| catchAccessFault | Boolean | If a memory read returns an error, it will fire a trap exception |
| earlyInjection | Boolean | By default, the memory read values are injected into the pipeline in the WriteBack stage to relax the timings. If this parameter is true, it's done in the Memory stage |

Here is the DBusSimpleBus interface definition:
```scala
case class DBusSimpleCmd() extends Bundle{
val wr = Bool
val address = UInt(32 bits)
val data = Bits(32 bit)
val size = UInt(2 bit)
}
case class DBusSimpleRsp() extends Bundle with IMasterSlave{
val ready = Bool
val error = Bool
val data = Bits(32 bit)
override def asMaster(): Unit = {
out(ready,error,data)
}
}
case class DBusSimpleBus() extends Bundle with IMasterSlave{
val cmd = Stream(DBusSimpleCmd())
val rsp = DBusSimpleRsp()
override def asMaster(): Unit = {
master(cmd)
slave(rsp)
}
}
```
Note that bridges are available to convert this interface into AXI4 and Avalon.

There is at least one cycle latency between a cmd and the corresponding rsp. The rsp.ready flag should be false after a read cmd until the rsp is present.
#### DBusCachedPlugin
Multi-way cache implementation with a write-through and allocate-on-read strategy. (Documentation is WIP)
#### MulPlugin
Implements the multiplication instructions from the RISC-V M extension. The implementation was done in an FPGA-friendly way by using four 17*17 bit multiplications.
The processing is fully pipelined between the Execute/Memory/WriteBack stages. The results of the instructions are always inserted in the WriteBack stage.
#### DivPlugin
Implements the division/modulo instruction from the RISC-V M extension. It is done in a simple iterative way which always takes 34 cycles. The result is inserted into the
Memory stage.

This plugin is now based on the MulDivIterativePlugin one.
#### MulDivIterativePlugin
This plugin implements the multiplication, division and modulo of the RISC-V M extension in an iterative way, which is friendly for small FPGAs that don't have DSP blocks.

This plugin is able to unroll the iterative calculation process to reduce the number of cycles used to execute mul/div instructions.

| Parameters | type | description |
| ------ | ----------- | ------ |
| genMul | Boolean | Enables multiplication support. Can be set to false if you want to use the MulPlugin instead |
| genDiv | Boolean | Enables division support |
| mulUnrollFactor | Int | Number of combinatorial stages used to speed up the multiplication, should be > 0 |
| divUnrollFactor | Int | Number of combinatorial stages used to speed up the division, should be > 0 |

The number of cycles used to execute a multiplication is '32/mulUnrollFactor'.
The number of cycles used to execute a division is '32/divUnrollFactor + 1'.

Both mul/div are processed in the memory stage (late result).
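
The two cycle-count formulas can be evaluated with a small plain-Scala helper when choosing unroll factors. This is only a sketch: `MulDivCycles` and its functions are hypothetical names, and the model assumes unroll factors that divide 32 evenly, since the text above does not say how other values are rounded.

```scala
// Evaluates the cycle-count formulas quoted above (hypothetical helper).
object MulDivCycles {
  private def check(unrollFactor: Int): Unit = {
    require(unrollFactor > 0, "unroll factor should be > 0")
    // Assumption: only factors dividing 32 are modelled, rounding is not specified.
    require(32 % unrollFactor == 0, "this sketch only models factors that divide 32")
  }

  /** 32/mulUnrollFactor */
  def mulCycles(mulUnrollFactor: Int): Int = { check(mulUnrollFactor); 32 / mulUnrollFactor }

  /** 32/divUnrollFactor + 1 */
  def divCycles(divUnrollFactor: Int): Int = { check(divUnrollFactor); 32 / divUnrollFactor + 1 }
}
```

For instance, going from an unroll factor of 1 to 4 reduces a multiplication from 32 to 8 cycles and a division from 33 to 9 cycles, at the cost of a longer combinatorial path.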
#### CsrPlugin
Implements most of the Machine mode and a few of the User mode registers as specified in the RISC-V privileged spec.
The access mode of most of the CSRs is parameterizable through the CsrAccess type (NONE/READ_ONLY/WRITE_ONLY/READ_WRITE) to reduce the area usage of unneeded features.

| Parameters | type | description |
| ------ | ----------- | ------ |
| catchIllegalAccess | Boolean | |
| mvendorid | BigInt | |
| marchid | BigInt | |
| mimpid | BigInt | |
| mhartid | BigInt | |
| misaExtensionsInit | Int | |
| misaAccess | CsrAccess | |
| mtvecAccess | CsrAccess | |
| mtvecInit | BigInt | |
| mepcAccess | CsrAccess | |
| mscratchGen | Boolean | |
| mcauseAccess | CsrAccess | |
| mbadaddrAccess | CsrAccess | |
| mcycleAccess | CsrAccess | |
| minstretAccess | CsrAccess | |
| ucycleAccess | CsrAccess | |
| wfiGen | Boolean | |
| ecallGen | Boolean | |

If an interrupt occurs, before jumping to mtvec, the plugin will stop the Prefetch stage and wait for all the instructions in the later pipeline stages to complete their execution.

If an exception occurs, the plugin will kill the corresponding instruction, flush all previous instructions, and wait until the previously killed instructions reach the WriteBack
stage before jumping to mtvec.
#### StaticMemoryTranslatorPlugin
Static memory translator plugin which allows one to specify which range of the memory addresses is IO mapped and shouldn't be cached.
#### MmuPlugin
Hardware refilled MMU implementation. Allows other plugins such as DBusCachedPlugin/IBusCachedPlugin to instantiate memory address translation ports. Each port has a small dedicated
fully associative TLB cache which is refilled automatically via a dbus access sharing.
#### DebugPlugin
This plugin implements enough CPU debug features to allow comfortable GDB/Eclipse debugging. To access those debug features, it provides a simple memory bus interface.
The JTAG interface is provided by another bridge, which makes it possible to efficiently connect multiple CPUs to the same JTAG.

| Parameters | type | description |
| ------ | ----------- | ------ |
| debugClockDomain | ClockDomain | As the debug unit is able to reset the CPU itself, it should use another clock domain to avoid killing itself (only the reset wire should differ) |

The internals of the debug plugin are done in a manner which reduces the area usage and the FMax impact of this plugin.

Here is the simple bus to access it. The rsp comes one cycle after the request:
```scala
case class DebugExtensionCmd() extends Bundle{
val wr = Bool
val address = UInt(8 bit)
val data = Bits(32 bit)
}
case class DebugExtensionRsp() extends Bundle{
val data = Bits(32 bit)
}
case class DebugExtensionBus() extends Bundle with IMasterSlave{
val cmd = Stream(DebugExtensionCmd())
val rsp = DebugExtensionRsp()
override def asMaster(): Unit = {
master(cmd)
in(rsp)
}
}
```
Here is the register mapping:
```
Read address 0x00 ->
bit 0 : resetIt
bit 1 : haltIt
bit 2 : isPipBusy
bit 3 : haltedByBreak
bit 4 : stepIt
Write address 0x00 ->
bit 4 : stepIt
bit 16 : set resetIt
bit 17 : set haltIt
bit 24 : clear resetIt
bit 25 : clear haltIt and haltedByBreak
Read Address 0x04 ->
bits (31 downto 0) : Last value written into the register file
Write Address 0x04 ->
bits (31 downto 0) : Instruction that should be pushed into the CPU pipeline for debug purposes
```
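
For host-side tooling, the bit layout of address 0x00 can be captured in a few constants and a decoder. The following plain-Scala sketch only transcribes the mapping listed above; `DebugRegisterModel`, `Status` and the constant names are hypothetical and do not come from the VexRiscv or OpenOCD sources.

```scala
// Bit layout of debug register 0x00, transcribed from the mapping above (hypothetical names).
object DebugRegisterModel {
  final case class Status(resetIt: Boolean, haltIt: Boolean, isPipBusy: Boolean,
                          haltedByBreak: Boolean, stepIt: Boolean)

  private def bit(word: Int, index: Int): Boolean = ((word >>> index) & 1) == 1

  /** Decodes a word read from address 0x00. */
  def decodeStatus(word: Int): Status =
    Status(
      resetIt       = bit(word, 0),
      haltIt        = bit(word, 1),
      isPipBusy     = bit(word, 2),
      haltedByBreak = bit(word, 3),
      stepIt        = bit(word, 4)
    )

  // Command bits for a write to address 0x00.
  val StepIt: Int       = 1 << 4
  val SetResetIt: Int   = 1 << 16
  val SetHaltIt: Int    = 1 << 17
  val ClearResetIt: Int = 1 << 24
  val ClearHaltIt: Int  = 1 << 25 // also clears haltedByBreak
}
```

Reading `0x0A` from address 0x00 thus decodes to a CPU that is halted (`haltIt`) because of a breakpoint (`haltedByBreak`), and writing `ClearHaltIt` would resume it.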
The OpenOCD port is available here:
https://github.com/SpinalHDL/openocd_riscv
#### YamlPlugin
This plugin offers a service to other plugins to generate a useful YAML file describing the CPU configuration. It contains, for instance, the sequence of instructions required
to flush the data cache (information used by openocd).