Readme: fix spelling and made some clarifications

japm48 2020-02-11 18:18:39 +01:00 committed by GitHub
parent ee36c36fdd
commit fd961dccb5
1 changed files with 91 additions and 93 deletions

158
README.md

@ -7,7 +7,7 @@
- [CPU generation](#cpu-generation) - [CPU generation](#cpu-generation)
- [Regression tests](#regression-tests) - [Regression tests](#regression-tests)
- [Interactive debug of the simulated CPU via GDB OpenOCD and Verilator](#interactive-debug-of-the-simulated-cpu-via-gdb-openocd-and-verilator) - [Interactive debug of the simulated CPU via GDB OpenOCD and Verilator](#interactive-debug-of-the-simulated-cpu-via-gdb-openocd-and-verilator)
- [Using Eclipse to run the software and debug it](#using-Eclipse-to-run-the-software-and-debug-it) - [Using Eclipse to run and debug the software](#using-Eclipse-to-run-and-debug-the-software)
* [By using gnu-mcu-eclipse](#by-using-gnu-mcu-eclipse) * [By using gnu-mcu-eclipse](#by-using-gnu-mcu-eclipse)
* [By using Zylin plugin (old)](#by-using-zylin-plugin-old) * [By using Zylin plugin (old)](#by-using-zylin-plugin-old)
- [Briey SoC](#briey-soc) - [Briey SoC](#briey-soc)
@ -30,14 +30,14 @@ This repository hosts a RISC-V implementation written in SpinalHDL. Here are som
- RV32I[M][C][A] instruction set (Atomic only inside a single core) - RV32I[M][C][A] instruction set (Atomic only inside a single core)
- Pipelined from 2 to 5+ stages ([Fetch*X], Decode, Execute, [Memory], [WriteBack]) - Pipelined from 2 to 5+ stages ([Fetch*X], Decode, Execute, [Memory], [WriteBack])
- 1.44 DMIPS/Mhz --no-inline when nearly all features are enabled (1.57 DMIPS/Mhz when the divider lookup table is enabled) - 1.44 DMIPS/Mhz --no-inline when nearly all features are enabled (1.57 DMIPS/Mhz when the divider lookup table is enabled)
- Optimized for FPGA, do not use any vendor specific IP block / primitive - Optimized for FPGA, does not use any vendor specific IP block / primitive
- AXI4, Avalon, wishbone ready - AXI4, Avalon, wishbone ready
- Optional MUL/DIV extensions - Optional MUL/DIV extensions
- Optional instruction and data caches - Optional instruction and data caches
- Optional hardware refilled MMU - Optional hardware refilled MMU
- Optional debug extension allowing Eclipse debugging via a GDB >> openOCD >> JTAG connection - Optional debug extension allowing Eclipse debugging via a GDB >> openOCD >> JTAG connection
- Optional interrupts and exception handling with Machine, [Supervisor] and [User] modes as defined in the [RISC-V Privileged ISA Specification v1.10](https://riscv.org/specifications/privileged-isa/). - Optional interrupts and exception handling with Machine, [Supervisor] and [User] modes as defined in the [RISC-V Privileged ISA Specification v1.10](https://riscv.org/specifications/privileged-isa/).
- Two implementations of shift instructions: Single cycle and shiftNumber cycles - Two implementations of shift instructions: single cycle (full barrel shifter) and shiftNumber cycles
- Each stage can have optional bypass or interlock hazard logic - Each stage can have optional bypass or interlock hazard logic
- Linux compatible (SoC : https://github.com/enjoy-digital/linux-on-litex-vexriscv) - Linux compatible (SoC : https://github.com/enjoy-digital/linux-on-litex-vexriscv)
- Zephyr compatible - Zephyr compatible
@ -60,7 +60,7 @@ For commercial support, please contact spinalhdl@gmail.com.
The following numbers were obtained by synthesizing the CPU as toplevel without any specific synthesis options to save area or to get better maximal frequency (neutral).<br> The following numbers were obtained by synthesizing the CPU as toplevel without any specific synthesis options to save area or to get better maximal frequency (neutral).<br>
The clock constraint is set to an unattainable value, which tends to increase the design area.<br> The clock constraint is set to an unattainable value, which tends to increase the design area.<br>
The dhrystone benchmark was compiled with the `-O3 -fno-inline` option.<br> The dhrystone benchmark was compiled with the `-O3 -fno-inline` option.<br>
All the cached configurations have some cache trashing during the dhrystone benchmark except the `VexRiscv full max perf` one. This of course reduces the performance. It is possible to produce All the cached configurations have some cache trashing during the dhrystone benchmark except the `VexRiscv full max perf` one. This, of course, reduces the performance. It is possible to produce
dhrystone binaries which fit inside a 4KB I$ and 4KB D$ (I already had this case once) but currently it isn't the case.<br> dhrystone binaries which fit inside a 4KB I$ and 4KB D$ (I already had this case once) but currently it isn't the case.<br>
The CPU configurations used below can be found in the `src/scala/vexriscv/demo` directory. The CPU configurations used below can be found in the `src/scala/vexriscv/demo` directory.
@ -126,7 +126,7 @@ The following configuration results in 1.44 DMIPS/MHz:
- single cycle multiplication with bypassing in the WB stage (late result) - single cycle multiplication with bypassing in the WB stage (late result)
- dynamic branch prediction done in the F stage with a direct mapped target buffer cache (no penalties on correct predictions) - dynamic branch prediction done in the F stage with a direct mapped target buffer cache (no penalties on correct predictions)
Note that recently, the capability to remove the Fetch/Memory/WriteBack stage was added to reduce the area of the CPU, which end up with a smaller CPU and a better DMIPS/Mhz for the small configurations. Note that, recently, the capability to remove the Fetch/Memory/WriteBack stage was added to reduce the area of the CPU, which ends up with a smaller CPU and a better DMIPS/Mhz for the small configurations.
## Dependencies ## Dependencies
@ -146,7 +146,7 @@ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89
sudo apt-get update sudo apt-get update
sudo apt-get install sbt sudo apt-get install sbt
# Verilator (for sim only, realy need 3.9+, in general apt-get will give you 3.8) # Verilator (for sim only, really needs 3.9+, in general apt-get will give you 3.8)
sudo apt-get install git make autoconf g++ flex bison sudo apt-get install git make autoconf g++ flex bison
git clone http://git.veripool.org/git/verilator # Only first time git clone http://git.veripool.org/git/verilator # Only first time
unsetenv VERILATOR_ROOT # For csh; ignore error if on bash unsetenv VERILATOR_ROOT # For csh; ignore error if on bash
@ -162,10 +162,10 @@ sudo make install
## CPU generation ## CPU generation
You can find two example CPU instances in: You can find two example CPU instances in:
- src/main/scala/vexriscv/demo/GenFull.scala - `src/main/scala/vexriscv/demo/GenFull.scala`
- src/main/scala/vexriscv/demo/GenSmallest.scala - `src/main/scala/vexriscv/demo/GenSmallest.scala`
To generate the corresponding RTL as a VexRiscv.v file, run the following commands in the root directory of this repository: To generate the corresponding RTL as a `VexRiscv.v` file, run the following commands in the root directory of this repository:
```sh ```sh
sbt "runMain vexriscv.demo.GenFull" sbt "runMain vexriscv.demo.GenFull"
@ -186,11 +186,11 @@ NOTES:
[![Build Status](https://travis-ci.org/SpinalHDL/VexRiscv.svg?branch=master)](https://travis-ci.org/SpinalHDL/VexRiscv) [![Build Status](https://travis-ci.org/SpinalHDL/VexRiscv.svg?branch=master)](https://travis-ci.org/SpinalHDL/VexRiscv)
To run tests (need the verilator simulator), go in the src/test/cpp/regression folder and run : To run tests (Verilator simulator is required), go in the `src/test/cpp/regression` folder and run:
```sh ```sh
# To test the GenFull CPU # To test the GenFull CPU
# (Don't worry about the CSR test not passing, basicaly the GenFull isn't the truly full version of the CPU, some CSR features are disable in it) # (Don't worry about the CSR test not passing, basically the GenFull isn't the truly full version of the CPU, some CSR features are disabled in it)
make clean run make clean run
# To test the GenSmallest CPU # To test the GenSmallest CPU
@ -203,16 +203,16 @@ The self-test includes:
- 24 FreeRTOS tests - 24 FreeRTOS tests
- Some handwritten tests to check the CSR, debug module and MMU plugins - Some handwritten tests to check the CSR, debug module and MMU plugins
You can enable FreeRTOS tests by adding `FREERTOS=yes` to the command line, but it will take time to run. Also, it uses THREAD_COUNT host CPU threads to run multiple regression in parallel. You can enable FreeRTOS tests by adding `FREERTOS=yes` to the command line, but it will take time to run. Also, it uses `THREAD_COUNT` host CPU threads to run multiple regression tests in parallel.
## Interactive debug of the simulated CPU via GDB OpenOCD and Verilator ## Interactive debug of the simulated CPU via GDB OpenOCD and Verilator
It's as described to run tests, but you just have to add `DEBUG_PLUGIN_EXTERNAL=yes` in the make arguments. To use this, you just need to use the same command as with running tests, but adding `DEBUG_PLUGIN_EXTERNAL=yes` in the make arguments.
Work for the GenFull, but not for the GenSmallest as this configuration has no debug module. This works for the `GenFull` configuration, but not for `GenSmallest`, as this configuration has no debug module.
Then you can use the https://github.com/SpinalHDL/openocd_riscv tool to create a GDB server connected to the target (the simulated CPU) Then, you can use the [OpenOCD RISC-V](https://github.com/SpinalHDL/openocd_riscv) tool to create a GDB server connected to the target (the simulated CPU), as follows:
```sh ```sh
#in the VexRiscv repository, to run the simulation on which one OpenOCD can connect itself => #In the VexRiscv repository, to run the simulation on which one OpenOCD can connect itself =>
sbt "runMain vexriscv.demo.GenFull" sbt "runMain vexriscv.demo.GenFull"
cd src/test/cpp/regression cd src/test/cpp/regression
make run DEBUG_PLUGIN_EXTERNAL=yes make run DEBUG_PLUGIN_EXTERNAL=yes
@ -230,15 +230,15 @@ continue
# Now it should print messages in the Verilator simulation of the CPU # Now it should print messages in the Verilator simulation of the CPU
``` ```
## Using Eclipse to run the software and debug it ## Using Eclipse to run and debug the software
### By using gnu-mcu-eclipse ### By using gnu-mcu-eclipse
You can download releases of the IDE here : https://github.com/gnu-mcu-eclipse/org.eclipse.epp.packages/releases You can download releases of the IDE here: <https://github.com/gnu-mcu-eclipse/org.eclipse.epp.packages/releases>
In the IDE, you can import a makefile project by: In the IDE, you can import a makefile project by:
- file -> import -> C/C++ -> existing Code as Makefile Project - file -> import -> C/C++ -> existing Code as Makefile Project
- Select the folder which contain the makefile, select "Cross GCC" (not "RISC-V Cross GCC") - Select the folder which contains the makefile, then select "Cross GCC" (not "RISC-V Cross GCC")
To create a new debug configuration: To create a new debug configuration:
- run -> Debug Configurations -> GDB OpenOCD Debugging double click - run -> Debug Configurations -> GDB OpenOCD Debugging double click
@ -257,13 +257,13 @@ cd eclipse
See https://drive.google.com/drive/folders/1NseNHH05B6lmIXqQFVwK8xRjWE4ydeG-?usp=sharing to import a makefile project and create a debug configuration. See https://drive.google.com/drive/folders/1NseNHH05B6lmIXqQFVwK8xRjWE4ydeG-?usp=sharing to import a makefile project and create a debug configuration.
Note that sometime this Eclipse need to be restarted in order to be able to place new breakpoints. Note that sometimes Eclipse needs to be restarted in order to be able to place new breakpoints.
## Briey SoC ## Briey SoC
As a demonstrator, a SoC named Briey is implemented in `src/main/scala/vexriscv/demo/Briey.scala`. This SoC is very similar to As a demonstration, a SoC named Briey is implemented in `src/main/scala/vexriscv/demo/Briey.scala`. This SoC is very similar to
the [Pinsec SOC](https://spinalhdl.github.io/SpinalDoc/spinal/lib/pinsec/hardware/): the [Pinsec SoC](https://spinalhdl.github.io/SpinalDoc/spinal/lib/pinsec/hardware/):
![Alt text](assets/brieySoc.png?raw=true "") ![Briey SoC](assets/brieySoc.png?raw=true "")
To generate the Briey SoC Hardware: To generate the Briey SoC Hardware:
@ -271,7 +271,7 @@ To generate the Briey SoC Hardware:
sbt "runMain vexriscv.demo.Briey" sbt "runMain vexriscv.demo.Briey"
``` ```
To run the verilator simulation of the Briey SoC which can then be connected to OpenOCD/GDB, first get those dependencies: To run the verilator simulation of the Briey SoC, which can then be connected to OpenOCD/GDB, first get these dependencies:
```sh ```sh
sudo apt-get install build-essential xorg-dev libudev-dev libts-dev libgl1-mesa-dev libglu1-mesa-dev libasound2-dev libpulse-dev libopenal-dev libogg-dev libvorbis-dev libaudiofile-dev libpng12-dev libfreetype6-dev libusb-dev libdbus-1-dev zlib1g-dev libdirectfb-dev libsdl2-dev sudo apt-get install build-essential xorg-dev libudev-dev libts-dev libgl1-mesa-dev libglu1-mesa-dev libasound2-dev libpulse-dev libopenal-dev libogg-dev libvorbis-dev libaudiofile-dev libpng12-dev libfreetype6-dev libusb-dev libdbus-1-dev zlib1g-dev libdirectfb-dev libsdl2-dev
@ -289,7 +289,7 @@ To connect OpenOCD (https://github.com/SpinalHDL/openocd_riscv) to the simulatio
src/openocd -f tcl/interface/jtag_tcp.cfg -c "set BRIEY_CPU0_YAML /home/spinalvm/Spinal/VexRiscv/cpu0.yaml" -f tcl/target/briey.cfg src/openocd -f tcl/interface/jtag_tcp.cfg -c "set BRIEY_CPU0_YAML /home/spinalvm/Spinal/VexRiscv/cpu0.yaml" -f tcl/target/briey.cfg
``` ```
You can find multiple software examples and demos here: https://github.com/SpinalHDL/VexRiscvSocSoftware/tree/master/projects/briey You can find multiple software examples and demos here: <https://github.com/SpinalHDL/VexRiscvSocSoftware/tree/master/projects/briey>
You can find some FPGA projects which instantiate the Briey SoC here (DE1-SoC, DE0-Nano): https://drive.google.com/drive/folders/0B-CqLXDTaMbKZGdJZlZ5THAxRTQ?usp=sharing You can find some FPGA projects which instantiate the Briey SoC here (DE1-SoC, DE0-Nano): https://drive.google.com/drive/folders/0B-CqLXDTaMbKZGdJZlZ5THAxRTQ?usp=sharing
@ -313,7 +313,7 @@ Murax is a very light SoC (it fits in an ICE40 FPGA) which can work without any
- one 16 bits prescaler, two 16 bits timers - one 16 bits prescaler, two 16 bits timers
- one UART with tx/rx fifo - one UART with tx/rx fifo
Depending the CPU configuration, on the ICE40-hx8k FPGA with icestorm for synthesis, the full SoC has the following area/performance : Depending on the CPU configuration, on the ICE40-hx8k FPGA with icestorm for synthesis, the full SoC has the following area/performance:
- RV32I interlocked stages => 51 Mhz, 2387 LC 0.45 DMIPS/Mhz - RV32I interlocked stages => 51 Mhz, 2387 LC 0.45 DMIPS/Mhz
- RV32I bypassed stages => 45 Mhz, 2718 LC 0.65 DMIPS/Mhz - RV32I bypassed stages => 45 Mhz, 2718 LC 0.65 DMIPS/Mhz
@ -333,7 +333,7 @@ The demo program included by default with `MuraxWithRamInit` will blink the
LEDs and echo characters received on the UART back to the user. To see this LEDs and echo characters received on the UART back to the user. To see this
when running the Verilator sim, type some text and press enter. when running the Verilator sim, type some text and press enter.
Then go in src/test/cpp/murax and run the simulation with : Then go in `src/test/cpp/murax` and run the simulation with:
```sh ```sh
make clean run make clean run
@ -376,13 +376,13 @@ sbt "test:runMain vexriscv.MuraxSim"
## Running Linux ## Running Linux
A default configuration is located in src/main/scala/vexriscv/demo/Linux.scala A default configuration is located in `src/main/scala/vexriscv/demo/Linux.scala`.
This file also contains This file also contains
- The commands to compile the buildroot image - The commands to compile the buildroot image
- How to run the Verilator simulation in interactive mode - How to run the Verilator simulation in interactive mode
There is currently no SoC to run it on hardware, it is WIP. But the CPU simulation can already boot linux and run user space application (even python). There is currently no SoC to run it on hardware, it is WIP. But the CPU simulation can already boot linux and run user space applications (even python).
Note that VexRiscv can run Linux on both cache full and cache less design. Note that VexRiscv can run Linux on both cache full and cache less design.
@ -494,7 +494,7 @@ import spinal.core._
import vexriscv.plugin.Plugin import vexriscv.plugin.Plugin
import vexriscv.{Stageable, DecoderService, VexRiscv} import vexriscv.{Stageable, DecoderService, VexRiscv}
//This plugin example will add a new instruction named SIMD_ADD which do the following : //This plugin example will add a new instruction named SIMD_ADD which does the following:
// //
//RD : Regfile Destination, RS : Regfile Source //RD : Regfile Destination, RS : Regfile Source
//RD( 7 downto 0) = RS1( 7 downto 0) + RS2( 7 downto 0) //RD( 7 downto 0) = RS1( 7 downto 0) + RS2( 7 downto 0)
@ -509,7 +509,7 @@ import vexriscv.{Stageable, DecoderService, VexRiscv}
//Note : RS1, RS2, RD positions follow the RISC-V spec and are common for all instruction of the ISA //Note : RS1, RS2, RD positions follow the RISC-V spec and are common for all instruction of the ISA
class SimdAddPlugin extends Plugin[VexRiscv]{ class SimdAddPlugin extends Plugin[VexRiscv]{
//Define the concept of IS_SIMD_ADD signals, which specify if the current instruction is destined for ths plugin //Define the concept of IS_SIMD_ADD signals, which specify if the current instruction is destined for this plugin
object IS_SIMD_ADD extends Stageable(Bool) object IS_SIMD_ADD extends Stageable(Bool)
//Callback to setup the plugin and ask for different services //Callback to setup the plugin and ask for different services
@ -519,10 +519,10 @@ class SimdAddPlugin extends Plugin[VexRiscv]{
//Retrieve the DecoderService instance //Retrieve the DecoderService instance
val decoderService = pipeline.service(classOf[DecoderService]) val decoderService = pipeline.service(classOf[DecoderService])
//Specify the IS_SIMD_ADD default value when instruction are decoded //Specify the IS_SIMD_ADD default value when instructions are decoded
decoderService.addDefault(IS_SIMD_ADD, False) decoderService.addDefault(IS_SIMD_ADD, False)
//Specify the instruction decoding which should be applied when the instruction match the 'key' pattern //Specify the instruction decoding which should be applied when the instruction matches the 'key' pattern
decoderService.add( decoderService.add(
//Bit pattern of the new SIMD_ADD instruction //Bit pattern of the new SIMD_ADD instruction
key = M"0000011----------000-----0110011", key = M"0000011----------000-----0110011",
@ -533,8 +533,8 @@ class SimdAddPlugin extends Plugin[VexRiscv]{
REGFILE_WRITE_VALID -> True, //Enable the register file write REGFILE_WRITE_VALID -> True, //Enable the register file write
BYPASSABLE_EXECUTE_STAGE -> True, //Notify the hazard management unit that the instruction result is already accessible in the EXECUTE stage (Bypass ready) BYPASSABLE_EXECUTE_STAGE -> True, //Notify the hazard management unit that the instruction result is already accessible in the EXECUTE stage (Bypass ready)
BYPASSABLE_MEMORY_STAGE -> True, //Same as above but for the memory stage BYPASSABLE_MEMORY_STAGE -> True, //Same as above but for the memory stage
RS1_USE -> True, //Notify the hazard management unit that this instruction use the RS1 value RS1_USE -> True, //Notify the hazard management unit that this instruction uses the RS1 value
RS2_USE -> True //Same than above but for RS2. RS2_USE -> True //Same as above but for RS2.
) )
) )
} }
@ -545,19 +545,19 @@ class SimdAddPlugin extends Plugin[VexRiscv]{
//Add a new scope on the execute stage (used to give a name to signals) //Add a new scope on the execute stage (used to give a name to signals)
execute plug new Area { execute plug new Area {
//Define some signals used internally to the plugin //Define some signals used internally by the plugin
val rs1 = execute.input(RS1).asUInt val rs1 = execute.input(RS1).asUInt
//32 bits UInt value of the regfile[RS1] //32 bits UInt value of the regfile[RS1]
val rs2 = execute.input(RS2).asUInt val rs2 = execute.input(RS2).asUInt
val rd = UInt(32 bits) val rd = UInt(32 bits)
//Do some computation //Do some computations
rd(7 downto 0) := rs1(7 downto 0) + rs2(7 downto 0) rd(7 downto 0) := rs1(7 downto 0) + rs2(7 downto 0)
rd(15 downto 8) := rs1(15 downto 8) + rs2(15 downto 8) rd(15 downto 8) := rs1(15 downto 8) + rs2(15 downto 8)
rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16) rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24) rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)
//When the instruction is a SIMD_ADD one, then write the result into the register file data path. //When the instruction is a SIMD_ADD, write the result into the register file data path.
when(execute.input(IS_SIMD_ADD)) { when(execute.input(IS_SIMD_ADD)) {
execute.output(REGFILE_WRITE_DATA) := rd.asBits execute.output(REGFILE_WRITE_DATA) := rd.asBits
} }
@ -572,14 +572,14 @@ This example is a very simple one, but each plugin can really have access to the
- Halt a given stage of the CPU - Halt a given stage of the CPU
- Unschedule instructions - Unschedule instructions
- Emit an exception - Emit an exception
- Introduce new instruction decoding specification - Introduce a new instruction decoding specification
- Ask to jump the PC somewhere - Ask to jump the PC somewhere
- Read signals published by other plugins - Read signals published by other plugins
- override published signals values - Override published signals values
- Provide an alternative implementation - Provide an alternative implementation
- ... - ...
As a demonstrator, this SimdAddPlugin was integrated in the `src/main/scala/vexriscv/demo/GenCustomSimdAdd.scala` CPU configuration As a demonstration, this SimdAddPlugin was integrated in the `src/main/scala/vexriscv/demo/GenCustomSimdAdd.scala` CPU configuration
and is self-tested by the `src/test/cpp/custom/simd_add` application by running the following commands: and is self-tested by the `src/test/cpp/custom/simd_add` application by running the following commands:
```sh ```sh
@ -589,7 +589,7 @@ sbt "runMain vexriscv.demo.GenCustomSimdAdd"
cd src/test/cpp/regression/ cd src/test/cpp/regression/
# Optionally add TRACE=yes if you want to get the VCD waveform from the simulation. # Optionally add TRACE=yes if you want to get the VCD waveform from the simulation.
# Also you have to know that by default, the testbench introduce instruction/data bus stall. # Also you have to know that, by default, the testbench introduce instruction/data bus stall.
# Note the CUSTOM_SIMD_ADD flag is set to yes. # Note the CUSTOM_SIMD_ADD flag is set to yes.
make clean run IBUS=SIMPLE DBUS=SIMPLE CSR=no MMU=no DEBUG_PLUGIN=no MUL=no DIV=no DHRYSTONE=no REDO=2 CUSTOM_SIMD_ADD=yes make clean run IBUS=SIMPLE DBUS=SIMPLE CSR=no MMU=no DEBUG_PLUGIN=no MUL=no DIV=no DHRYSTONE=no REDO=2 CUSTOM_SIMD_ADD=yes
``` ```
@ -609,12 +609,12 @@ The second one (`CustomCsrDemoGpioPlugin`) creates a GPIO peripheral directly ma
Without the debug plugin, the CPU will have a standard `clk` input and a `reset` input. But with the debug plugin the situation is the following: Without the debug plugin, the CPU will have a standard `clk` input and a `reset` input. But with the debug plugin the situation is the following:
- clk : As before, the clock which drive the whole CPU design, including the debug logic - `clk`: as before, the clock which drives the whole CPU design, including the debug logic
- reset : Reset all the CPU states excepted the debug logics - `reset`: reset all the CPU states except the debug logic
- debugReset : Reset the debug logic of the CPU - `debugReset`: reset the debug logic of the CPU
- debug_resetOut : a CPU output signal which allows the JTAG to reset the CPU + the memory interconnect + the peripherals - `debug_resetOut`: a CPU output signal which allows the JTAG to reset the CPU + the memory interconnect + the peripherals
So here is the reset interconnect in case you use the debug plugin : So here is the reset interconnect, in case you use the debug plugin:
``` ```
VexRiscv VexRiscv
@ -635,20 +635,20 @@ toplevelReset >----+--------> debugReset |
## VexRiscv Architecture ## VexRiscv Architecture
VexRiscv is implemented via a 5 stage in-order pipeline on which many optional and complementary plugins add functionalities to provide a functional RISC-V CPU. VexRiscv is implemented via a 5 stage in-order pipeline on which many optional and complementary plugins add functionalities to provide a functional RISC-V CPU.
This approach is completely unconventional and only possible through meta hardware description languages (SpinalHDL in the current case) but has proven its advantages This approach is completely unconventional and only possible through meta hardware description languages (SpinalHDL, in the current case) but has proven its advantages
via the VexRiscv implementation: via the VexRiscv implementation:
- You can swap/turn on/turn off parts of the CPU directly via the plugin system - You can swap/turn on/turn off parts of the CPU directly via the plugin system
- You can add new functionalities/instruction without having to modify any sources code of the CPU - You can add new functionalities/instructions without having to modify any of the sources of the CPU
- It allows the CPU configuration to cover a very large spectrum of implementation without cooking spaghetti code - It allows the CPU configuration to cover a very large spectrum of implementations without cooking spaghetti code
- It allows your codebase to truly produce a parametrized CPU design - It allows your codebase to truly produce a parametrized CPU design
If you generate the CPU without any plugin, it will only contain the definition of the 5 pipeline stages and their basic arbitration, but nothing else, If you generate the CPU without any plugin, it will only contain the definition of the 5 pipeline stages and their basic arbitration, but nothing else,
as everything else, including the program counter is added into the CPU via plugins. and everything else, including the program counter is added into the CPU via plugins.
### Plugins ### Plugins
This chapter describes plugins currently implemented. This chapter describes the currently implemented plugins.
- [IBusSimplePlugin](#ibussimpleplugin) - [IBusSimplePlugin](#ibussimpleplugin)
- [IBusCachedPlugin](#ibuscachedplugin) - [IBusCachedPlugin](#ibuscachedplugin)
@ -674,7 +674,7 @@ This chapter describes plugins currently implemented.
#### IBusSimplePlugin #### IBusSimplePlugin
This plugin implement the CPU frontend (instruction fetch) via a very simple and neutral memory interface going outside the CPU. This plugin implements the CPU frontend (instruction fetch) via a very simple and neutral memory interface going outside the CPU.
| Parameters | type | description | | Parameters | type | description |
| ------ | ----------- | ------ | | ------ | ----------- | ------ |
@ -715,7 +715,7 @@ case class IBusSimpleBus(interfaceKeepData : Boolean) extends Bundle with IMaste
} }
``` ```
**Important** : Checkout the cmdForkPersistence parameter, because if it's not set, it can break the iBus compatibility with your memory system (unless you externaly add some buffers) **Important** : check out the cmdForkPersistence parameter, because if it is not set, it can break the iBus compatibility with your memory system (unless you externaly add some buffers).
Setting cmdForkPersistence and cmdForkOnSecondStage improves iBus cmd timings. Setting cmdForkPersistence and cmdForkOnSecondStage improves iBus cmd timings.
@ -759,10 +759,10 @@ Note: If you enable the twoCycleRam option and if wayCount is bigger than one, t
#### DecoderSimplePlugin #### DecoderSimplePlugin
This plugin provides instruction decoding capabilities to others plugins. This plugin provides instruction decoding capabilities to other plugins.
For instance, for a given instruction, the pipeline hazard plugin needs to know if it uses the register file source 1/2 in order to stall the pipeline until the hazard is gone. For instance, for a given instruction, the pipeline hazard plugin needs to know if it uses the register file source 1/2 in order to stall the pipeline until the hazard is gone.
To provide this kind of information, each plugin which implements an instruction documents this kind of information to the DecoderSimplePlugin plugin. Each plugin that implements an instruction provides this kind of information to the DecoderSimplePlugin plugin.
| Parameters | type | description | | Parameters | type | description |
| ------ | ----------- | ------ | | ------ | ----------- | ------ |
@ -771,7 +771,7 @@ To provide this kind of information, each plugin which implements an instruction
Here is a usage example: Here is a usage example:
```scala ```scala
//Specify the instruction decoding which should be applied when the instruction match the 'key' pattern //Specify the instruction decoding which should be applied when the instruction matches the 'key' pattern
decoderService.add( decoderService.add(
//Bit pattern of the new instruction //Bit pattern of the new instruction
key = M"0000011----------000-----0110011", key = M"0000011----------000-----0110011",
@ -797,15 +797,15 @@ This plugin implements the register file.
| Parameters | type | description | | Parameters | type | description |
| ------ | ----------- | ------ | | ------ | ----------- | ------ |
| regFileReadyKind | RegFileReadKind | Can bet set to ASYNC or SYNC. Specifies the kind of memory read used to implement the register file. ASYNC means zero cycle latency memory read, while SYNC means one cycle latency memory read which can be mapped into standard FPGA memory blocks | | regFileReadyKind | RegFileReadKind | Can be set to ASYNC or SYNC. Specifies the kind of memory read used to implement the register file. ASYNC means zero cycle latency memory read, while SYNC means one cycle latency memory read which can be mapped into standard FPGA memory blocks |
| zeroBoot | Boolean | Load all registers with zeroes at the beginning of simulations to keep everything deterministic in logs/traces| | zeroBoot | Boolean | Load all registers with zeroes at the beginning of the simulation to keep everything deterministic in logs/traces|
This register file uses a `don't care` read-during-write policy, so the bypassing/hazard plugin should take care of this. This register file uses a `don't care` read-during-write policy, so the bypassing/hazard plugin should take care of this.
#### HazardSimplePlugin #### HazardSimplePlugin
This plugin checks the pipeline instruction dependencies and, if necessary or possible, will stop the instruction in the decoding stage or bypass the instruction results This plugin checks the pipeline instruction dependencies and, if necessary or possible, will stop the instruction in the decoding stage or bypass the instruction results
from the later stages to the decode stage. from the later stages of the decode stage.
Since the register file is implemented with a `don't care` read-during-write policy, this plugin also manages these kind of hazards. Since the register file is implemented with a `don't care` read-during-write policy, this plugin also manages these kind of hazards.
@ -829,7 +829,7 @@ Except for SRC1/SRC2, this plugin does everything at the begining of Execute sta
#### IntAluPlugin #### IntAluPlugin
This plugin implements all ADD/SUB/SLT/SLTU/XOR/OR/AND/LUI/AUIPC instructions in the execute stage by using the SrcPlugin outputs. It is a realy simple plugin. This plugin implements all ADD/SUB/SLT/SLTU/XOR/OR/AND/LUI/AUIPC instructions in the execute stage by using the SrcPlugin outputs. It is a really simple plugin.
The result is injected into the pipeline directly at the end of the execute stage. The result is injected into the pipeline directly at the end of the execute stage.
@ -841,7 +841,7 @@ The result is injected into the pipeline directly at the end of the execute stag
#### FullBarrelShifterPlugin #### FullBarrelShifterPlugin
Implements SLL/SRL/SRA instructions by using a full barrel shifter, so it execute all shifts in a single cycle. Implements SLL/SRL/SRA instructions by using a full barrel shifter, so it executes all shifts in a single cycle.
| Parameters | type | description | | Parameters | type | description |
| ------ | ----------- | ------ | | ------ | ----------- | ------ |
@ -849,7 +849,7 @@ Implements SLL/SRL/SRA instructions by using a full barrel shifter, so it execut
#### BranchPlugin #### BranchPlugin
This plugin implement all branch/jump instructions (JAL/JALR/BEQ/BNE/BLT/BGE/BLTU/BGEU) with primitives used by the cpu frontend plugins to implement branch prediction. The prediction implementation is set in the frontend plugins (IBusX) This plugin implements all branch/jump instructions (JAL/JALR/BEQ/BNE/BLT/BGE/BLTU/BGEU) with primitives used by the cpu frontend plugins to implement branch prediction. The prediction implementation is set in the frontend plugins (IBusX).
| Parameters | type | description | | Parameters | type | description |
| ------ | ----------- | ------ | | ------ | ----------- | ------ |
@ -869,13 +869,13 @@ otherwise the standard penalty is applied.
##### Prediction DYNAMIC ##### Prediction DYNAMIC
Same as the STATIC prediction, except that to do the prediction, it use a direct mapped 2 bit history cache (BHT) which remembers if the branch is more likely to be taken or not. Same as the STATIC prediction, except that to do the prediction, it uses a direct mapped 2 bit history cache (BHT) which remembers if the branch is more likely to be taken or not.
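As a rough illustration of a direct mapped 2 bit history cache, here is a small software model. It is a sketch under assumptions, not the hardware description: the class name `BranchHistoryTable`, the table size, the indexing by low bits of the word address and the saturating-counter update rule are all chosen for this example and are not taken from the VexRiscv sources.

```scala
// Software model only. Table size, indexing and the saturating-counter
// update are assumptions made for illustration.
final class BranchHistoryTable(entries: Int) {
  require(entries > 0 && (entries & (entries - 1)) == 0, "entries must be a power of two")

  // One 2-bit counter per entry: 0 and 1 predict not taken, 2 and 3 predict taken.
  private val counters = Array.fill(entries)(1)

  // Direct mapped: the low bits of the word address select one entry, with no tag check.
  private def index(pc: Long): Int = ((pc >>> 2) & (entries - 1)).toInt

  def predictTaken(pc: Long): Boolean = counters(index(pc)) >= 2

  // Move the counter one step towards the observed outcome, saturating at 0 and 3.
  def update(pc: Long, taken: Boolean): Unit = {
    val i = index(pc)
    counters(i) = if (taken) math.min(3, counters(i) + 1) else math.max(0, counters(i) - 1)
  }
}
```

With this update rule a single mispredicted iteration does not flip a strongly biased branch, and two branches whose addresses map to the same entry share one counter.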
##### Prediction DYNAMIC_TARGET ##### Prediction DYNAMIC_TARGET
This predictor uses a direct mapped branch target buffer (BTB) in the Fetch stage which store the PC of the instruction, the target PC of the instruction and a 2 bit history to remember This predictor uses a direct mapped branch target buffer (BTB) in the Fetch stage which stores the PC of the instruction, the target PC of the instruction and a 2 bit history to remember
if the branch is more likely to be taken or not. This is the most efficient branch predictor actualy implemented on VexRiscv as when the branch prediction is right, it produce no branch penalty. if the branch is more likely to be taken or not. This is actually the most efficient branch predictor implemented on VexRiscv, because when the branch prediction is right, it produces no branch penalty.
The down side is that this predictor has a long combinatorial path coming from the prediction cache read port to the programm counter by passing through the jump interface. The downside is that this predictor has a long combinatorial path coming from the prediction cache read port to the programm counter, passing through the jump interface.
#### DBusSimplePlugin #### DBusSimplePlugin
@ -919,7 +919,7 @@ case class DBusSimpleBus() extends Bundle with IMasterSlave{
} }
``` ```
Note that bridges are available to convert this interface into AXI4 and Avalon Note that there are bridges available that can convert this interface into AXI4 and Avalon.
There is at least one cycle latency between a cmd and the corresponding rsp. The rsp.ready flag should be false after a read cmd until the rsp is present. There is at least one cycle latency between a cmd and the corresponding rsp. The rsp.ready flag should be false after a read cmd until the rsp is present.
@ -937,7 +937,7 @@ The processing is fully pipelined between the Execute/Memory/Writeback stage. Th
Implements the division/modulo instruction from the RISC-V M extension. It is done in a simple iterative way which always takes 34 cycles. The result is inserted into the Implements the division/modulo instruction from the RISC-V M extension. It is done in a simple iterative way which always takes 34 cycles. The result is inserted into the
Memory stage. Memory stage.
This plugin is now based on the MulDivIterativePlugin one. This plugin is now based on MulDivIterativePlugin.
#### MulDivIterativePlugin #### MulDivIterativePlugin
@ -955,14 +955,14 @@ This plugin is able to unroll the iterative calculation process to reduce the nu
The number of cycles used to execute a multiplication is '32/mulUnrollFactor' The number of cycles used to execute a multiplication is '32/mulUnrollFactor'
The number of cycles used to execute a division is '32/divUnrollFactor + 1' The number of cycles used to execute a division is '32/divUnrollFactor + 1'
Both mul/div are processed into the memory stage (late result). Both mul/div are processed in the memory stage (late result).
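The two cycle-count formulas above can be evaluated with a short helper. This is a hypothetical sketch: the object `MulDivCycles` is not part of VexRiscv, and it assumes that the unroll factors are positive and divide 32 evenly.

```scala
// Hypothetical helper, not part of VexRiscv: evaluates the two formulas quoted above.
object MulDivCycles {
  // Assumption of this sketch: the unroll factor is positive and divides 32.
  private def check(factor: Int): Unit =
    require(factor > 0 && 32 % factor == 0, "unroll factor is assumed to divide 32")

  // Cycles for a multiplication: 32 / mulUnrollFactor
  def mulCycles(mulUnrollFactor: Int): Int = {
    check(mulUnrollFactor)
    32 / mulUnrollFactor
  }

  // Cycles for a division: 32 / divUnrollFactor + 1
  def divCycles(divUnrollFactor: Int): Int = {
    check(divUnrollFactor)
    32 / divUnrollFactor + 1
  }
}
```

For example, `MulDivCycles.mulCycles(4)` evaluates to 8 and `MulDivCycles.divCycles(2)` evaluates to 17, which shows how unrolling trades area for fewer cycles.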
#### CsrPlugin #### CsrPlugin
Implements most of the Machine mode and a few of the User mode registers as specified in the RISC-V priviledged spec. Implements most of the Machine mode and a few of the User mode registers, as specified in the RISC-V priviledged spec.
The access mode of most of the CSR is parameterizable (NONE/READ_ONLY/WRITE_ONLY/READ_WRITE) to reduce the area usage of unneeded features. The access mode of most of the CSR is parameterizable to reduce the area usage of unneeded features.
(CsrAccess can be NONE/READ_ONLY/WRITE_ONLY/READ_WRITE) (CsrAccess can be `NONE/READ_ONLY/WRITE_ONLY/READ_WRITE`)
| Parameters | type | description | | Parameters | type | description |
| ------ | ----------- | ------ | | ------ | ----------- | ------ |
@ -992,11 +992,11 @@ stage before jumping to mtvec.
#### StaticMemoryTranslatorPlugin #### StaticMemoryTranslatorPlugin
Static memory translator plugin which allows one to specify which range of the memory addresses is IO mapped and shouldn't be cached. Static memory translator plugin which allows to specify which range of the memory addresses is I/O mapped and shouldn't be cached.
#### MmuPlugin #### MmuPlugin
Hardware refilled MMU implementation. Allows others plugins such as DBusCachedPlugin/IBusCachedPlugin to instanciate memory address translation ports. Each port has a small dedicated Hardware refilled MMU implementation. Allows other plugins such as DBusCachedPlugin/IBusCachedPlugin to instanciate memory address translation ports. Each port has a small dedicated
fully associative TLB cache which is refilled automaticaly via a dbus access sharing. fully associative TLB cache which is refilled automaticaly via a dbus access sharing.
#### DebugPlugin #### DebugPlugin
@ -1010,7 +1010,7 @@ The JTAG interface is provided by another bridge, which makes it possible to eff
The internals of the debug plugin are done in a manner which reduces the area usage and the FMax impact of this plugin. The internals of the debug plugin are done in a manner which reduces the area usage and the FMax impact of this plugin.
Here is the simple bus to access it, the rsp come one cycle after the request : Here is the simple bus to access it, the rsp comes one cycle after the request:
```scala ```scala
case class DebugExtensionCmd() extends Bundle{ case class DebugExtensionCmd() extends Bundle{
@ -1056,12 +1056,10 @@ Write Address 0x04 ->
bits (31 downto 0) : Instruction that should be pushed into the CPU pipeline for debug purposes bits (31 downto 0) : Instruction that should be pushed into the CPU pipeline for debug purposes
``` ```
The OpenOCD port is there : The OpenOCD port is here: <https://github.com/SpinalHDL/openocd_riscv>
https://github.com/SpinalHDL/openocd_riscv
#### YamlPlugin #### YamlPlugin
This plugin offers a service to others plugins to generate a usefull Yaml file about the CPU configuration. It contains, for instance, the sequence of instruction required This plugin offers a service to other plugins to generate a useful Yaml file describing the CPU configuration. It contains, for instance, the sequence of instructions required
to flush the data cache (information used by openocd). to flush the data cache (information used by openocd).