Merge branch 'master' into dev

# Conflicts:
#	README.md
#	build.sbt
Dolu1990 2020-03-09 13:41:23 +01:00
commit ab2f4cd2b7
3 changed files with 92 additions and 94 deletions

README.md

@ -7,7 +7,7 @@
- [CPU generation](#cpu-generation)
- [Regression tests](#regression-tests)
- [Interactive debug of the simulated CPU via GDB OpenOCD and Verilator](#interactive-debug-of-the-simulated-cpu-via-gdb-openocd-and-verilator)
- [Using Eclipse to run the software and debug it](#using-Eclipse-to-run-the-software-and-debug-it)
- [Using Eclipse to run and debug the software](#using-eclipse-to-run-and-debug-the-software)
* [By using gnu-mcu-eclipse](#by-using-gnu-mcu-eclipse)
* [By using Zylin plugin (old)](#by-using-zylin-plugin-old)
- [Briey SoC](#briey-soc)
@ -30,14 +30,14 @@ This repository hosts a RISC-V implementation written in SpinalHDL. Here are som
- RV32I[M][C][A] instruction set (Atomic only inside a single core)
- Pipelined from 2 to 5+ stages ([Fetch*X], Decode, Execute, [Memory], [WriteBack])
- 1.44 DMIPS/Mhz --no-inline when nearly all features are enabled (1.57 DMIPS/Mhz when the divider lookup table is enabled)
- Optimized for FPGA, do not use any vendor specific IP block / primitive
- Optimized for FPGA, does not use any vendor specific IP block / primitive
- AXI4, Avalon, wishbone ready
- Optional MUL/DIV extensions
- Optional instruction and data caches
- Optional hardware refilled MMU
- Optional debug extension allowing Eclipse debugging via a GDB >> openOCD >> JTAG connection
- Optional interrupts and exception handling with Machine, [Supervisor] and [User] modes as defined in the [RISC-V Privileged ISA Specification v1.10](https://riscv.org/specifications/privileged-isa/).
- Two implementations of shift instructions: Single cycle and shiftNumber cycles
- Two implementations of shift instructions: single cycle (full barrel shifter) and shiftNumber cycles
- Each stage can have optional bypass or interlock hazard logic
- Linux compatible (SoC : https://github.com/enjoy-digital/linux-on-litex-vexriscv)
- Zephyr compatible
@ -60,7 +60,7 @@ For commercial support, please contact spinalhdl@gmail.com.
The following numbers were obtained by synthesizing the CPU as toplevel without any specific synthesis options to save area or to get better maximal frequency (neutral).<br>
The clock constraint is set to an unattainable value, which tends to increase the design area.<br>
The dhrystone benchmark was compiled with the `-O3 -fno-inline` option.<br>
All the cached configurations have some cache trashing during the dhrystone benchmark except the `VexRiscv full max perf` one. This of course reduces the performance. It is possible to produce
All the cached configurations have some cache thrashing during the dhrystone benchmark except the `VexRiscv full max perf` one. This, of course, reduces the performance. It is possible to produce
dhrystone binaries which fit inside a 4KB I$ and 4KB D$ (I already had this case once) but currently it isn't the case.<br>
The CPU configurations used below can be found in the `src/main/scala/vexriscv/demo` directory.
@ -117,7 +117,7 @@ VexRiscv linux balanced (RV32IMA, 1.21 DMIPS/Mhz 2.27 Coremark/Mhz, with cache t
The following configuration results in 1.44 DMIPS/MHz:
- 5 stage : F -> D -> E -> M -> WB
- 5 stage: F -> D -> E -> M -> WB
- single cycle ADD/SUB/Bitwise/Shift ALU
- branch/jump done in the E stage
- memory load values are bypassed in the WB stage (late result)
@ -125,11 +125,11 @@ The following configuration results in 1.44 DMIPS/MHz:
- single cycle multiplication with bypassing in the WB stage (late result)
- dynamic branch prediction done in the F stage with a direct mapped target buffer cache (no penalties on correct predictions)
Note that recently, the capability to remove the Fetch/Memory/WriteBack stage was added to reduce the area of the CPU, which end up with a smaller CPU and a better DMIPS/Mhz for the small configurations.
Note that, recently, the capability to remove the Fetch/Memory/WriteBack stage was added to reduce the area of the CPU, which ends up with a smaller CPU and a better DMIPS/Mhz for the small configurations.
## Dependencies
On Ubuntu 14 :
On Ubuntu 14:
```sh
# JAVA JDK 8
@ -145,14 +145,14 @@ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89
sudo apt-get update
sudo apt-get install sbt
# Verilator (for sim only, realy need 3.9+, in general apt-get will give you 3.8)
# Verilator (for sim only, really needs 3.9+, in general apt-get will give you 3.8)
sudo apt-get install git make autoconf g++ flex bison
git clone http://git.veripool.org/git/verilator # Only first time
unsetenv VERILATOR_ROOT # For csh; ignore error if on bash
unset VERILATOR_ROOT # For bash
cd verilator
git pull # Make sure we're up-to-date
git checkout verilator_3_918
git checkout v3.916
autoconf # Create ./configure script
./configure
make
@ -161,10 +161,10 @@ sudo make install
## CPU generation
You can find two example CPU instances in:
- src/main/scala/vexriscv/demo/GenFull.scala
- src/main/scala/vexriscv/demo/GenSmallest.scala
- `src/main/scala/vexriscv/demo/GenFull.scala`
- `src/main/scala/vexriscv/demo/GenSmallest.scala`
To generate the corresponding RTL as a VexRiscv.v file, run the following commands in the root directory of this repository:
To generate the corresponding RTL as a `VexRiscv.v` file, run the following commands in the root directory of this repository:
```sh
sbt "runMain vexriscv.demo.GenFull"
@ -228,13 +228,13 @@ Also there is a few environnement variable that you can use to modulate the rand
| VEXRISCV_REGRESSION_CONFIG_DEM_RATE | 0.0-1.0 | Chance to generate a config with memory stage |
## Interactive debug of the simulated CPU via GDB OpenOCD and Verilator
It's as described to run tests, but you just have to add `DEBUG_PLUGIN_EXTERNAL=yes` in the make arguments.
Work for the GenFull, but not for the GenSmallest as this configuration has no debug module.
To use this, run the same command as for the regression tests, but add `DEBUG_PLUGIN_EXTERNAL=yes` to the make arguments.
This works for the `GenFull` configuration, but not for `GenSmallest`, as this configuration has no debug module.
Then you can use the https://github.com/SpinalHDL/openocd_riscv tool to create a GDB server connected to the target (the simulated CPU)
Then, you can use the [OpenOCD RISC-V](https://github.com/SpinalHDL/openocd_riscv) tool to create a GDB server connected to the target (the simulated CPU), as follows:
```sh
#in the VexRiscv repository, to run the simulation on which one OpenOCD can connect itself =>
#In the VexRiscv repository, run the simulation to which OpenOCD can connect =>
sbt "runMain vexriscv.demo.GenFull"
cd src/test/cpp/regression
make run DEBUG_PLUGIN_EXTERNAL=yes
@ -252,17 +252,17 @@ continue
# Now it should print messages in the Verilator simulation of the CPU
```
## Using Eclipse to run the software and debug it
## Using Eclipse to run and debug the software
### By using gnu-mcu-eclipse
You can download releases of the IDE here : https://github.com/gnu-mcu-eclipse/org.eclipse.epp.packages/releases
You can download releases of the IDE here: <https://github.com/gnu-mcu-eclipse/org.eclipse.epp.packages/releases>
In the IDE, you can import a makefile project by :
In the IDE, you can import a makefile project by:
- file -> import -> C/C++ -> existing Code as Makefile Project
- Select the folder which contain the makefile, select "Cross GCC" (not "RISC-V Cross GCC")
- Select the folder which contains the makefile, then select "Cross GCC" (not "RISC-V Cross GCC")
To create a new debug configuration :
To create a new debug configuration:
- run -> Debug Configurations -> GDB OpenOCD Debugging double click
- Look at https://drive.google.com/open?id=1c46tyEV0xLwOsk76b0y2qqs8CYy7Zq3f for a configuration example
@ -279,13 +279,13 @@ cd eclipse
See https://drive.google.com/drive/folders/1NseNHH05B6lmIXqQFVwK8xRjWE4ydeG-?usp=sharing to import a makefile project and create a debug configuration.
Note that sometime this Eclipse need to be restarted in order to be able to place new breakpoints.
Note that sometimes Eclipse needs to be restarted in order to be able to place new breakpoints.
## Briey SoC
As a demonstrator, a SoC named Briey is implemented in `src/main/scala/vexriscv/demo/Briey.scala`. This SoC is very similar to
the [Pinsec SOC](https://spinalhdl.github.io/SpinalDoc/spinal/lib/pinsec/hardware/):
As a demonstration, a SoC named Briey is implemented in `src/main/scala/vexriscv/demo/Briey.scala`. This SoC is very similar to
the [Pinsec SoC](https://spinalhdl.github.io/SpinalDoc/spinal/lib/pinsec/hardware/):
![Alt text](assets/brieySoc.png?raw=true "")
![Briey SoC](assets/brieySoc.png?raw=true "")
To generate the Briey SoC Hardware:
@ -293,7 +293,7 @@ To generate the Briey SoC Hardware:
sbt "runMain vexriscv.demo.Briey"
```
To run the verilator simulation of the Briey SoC which can then be connected to OpenOCD/GDB, first get those dependencies:
To run the verilator simulation of the Briey SoC, which can then be connected to OpenOCD/GDB, first get these dependencies:
```sh
sudo apt-get install build-essential xorg-dev libudev-dev libts-dev libgl1-mesa-dev libglu1-mesa-dev libasound2-dev libpulse-dev libopenal-dev libogg-dev libvorbis-dev libaudiofile-dev libpng12-dev libfreetype6-dev libusb-dev libdbus-1-dev zlib1g-dev libdirectfb-dev libsdl2-dev
@ -311,11 +311,11 @@ To connect OpenOCD (https://github.com/SpinalHDL/openocd_riscv) to the simulatio
src/openocd -f tcl/interface/jtag_tcp.cfg -c "set BRIEY_CPU0_YAML /home/spinalvm/Spinal/VexRiscv/cpu0.yaml" -f tcl/target/briey.cfg
```
You can find multiple software examples and demos here: https://github.com/SpinalHDL/VexRiscvSocSoftware/tree/master/projects/briey
You can find multiple software examples and demos here: <https://github.com/SpinalHDL/VexRiscvSocSoftware/tree/master/projects/briey>
You can find some FPGA projects which instantiate the Briey SoC here (DE1-SoC, DE0-Nano): https://drive.google.com/drive/folders/0B-CqLXDTaMbKZGdJZlZ5THAxRTQ?usp=sharing
Here are some measurements of Briey SoC timings and area :
Here are some measurements of Briey SoC timings and area:
```
Artix 7 -> 181 Mhz 3220 LUT 3181 FF
@ -335,13 +335,13 @@ Murax is a very light SoC (it fits in an ICE40 FPGA) which can work without any
- one 16 bits prescaler, two 16 bits timers
- one UART with tx/rx fifo
Depending the CPU configuration, on the ICE40-hx8k FPGA with icestorm for synthesis, the full SoC has the following area/performance :
Depending on the CPU configuration, on the ICE40-hx8k FPGA with icestorm for synthesis, the full SoC has the following area/performance:
- RV32I interlocked stages => 51 Mhz, 2387 LC 0.45 DMIPS/Mhz
- RV32I bypassed stages => 45 Mhz, 2718 LC 0.65 DMIPS/Mhz
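As a rough comparison, multiplying each Fmax by its DMIPS/MHz figure gives the absolute Dhrystone throughput of the two variants (this only combines the numbers above and ignores any other system effect):

```math
\begin{aligned}
\text{interlocked: } & 51\ \text{MHz} \times 0.45\ \text{DMIPS/MHz} = 22.95\ \text{DMIPS} \\
\text{bypassed: } & 45\ \text{MHz} \times 0.65\ \text{DMIPS/MHz} = 29.25\ \text{DMIPS}
\end{aligned}
```

So the bypassed variant is about 27% faster overall (29.25 / 22.95 ≈ 1.27), for 331 more LCs (2718 - 2387).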
Its implementation can be found here: `src/main/scala/vexriscv/demo/Murax.scala`.
To generate the Murax SoC Hardware :
To generate the Murax SoC Hardware:
```sh
# To generate the SoC without any content in the ram
@ -355,13 +355,13 @@ The demo program included by default with `MuraxWithRamInit` will blink the
LEDs and echo characters received on the UART back to the user. To see this
when running the Verilator sim, type some text and press enter.
Then go in src/test/cpp/murax and run the simulation with :
Then go in `src/test/cpp/murax` and run the simulation with:
```sh
make clean run
```
To connect OpenOCD (https://github.com/SpinalHDL/openocd_riscv) to the simulation :
To connect OpenOCD (https://github.com/SpinalHDL/openocd_riscv) to the simulation:
```sh
src/openocd -f tcl/interface/jtag_tcp.cfg -c "set MURAX_CPU0_YAML /home/spinalvm/Spinal/VexRiscv/cpu0.yaml" -f tcl/target/murax.cfg
@ -387,7 +387,7 @@ MuraxFast bypassed stages (0.65 DMIPS/Mhz, 8 bits GPIO) ->
Some scripts to generate the SoC and call the icestorm toolchain can be found here: `scripts/Murax/`
A toplevel simulation testbench with the same features + a GUI are implemented with SpinalSim. You can find it in `src/test/scala/vexriscv/MuraxSim.scala`.
A top level simulation testbench with the same features + a GUI is implemented with SpinalSim. You can find it in `src/test/scala/vexriscv/MuraxSim.scala`.
To run it:
@ -398,13 +398,13 @@ sbt "test:runMain vexriscv.MuraxSim"
## Running Linux
A default configuration is located in src/main/scala/vexriscv/demo/Linux.scala
A default configuration is located in `src/main/scala/vexriscv/demo/Linux.scala`.
This file also contains:
- The commands to compile the buildroot image
- How to run the Verilator simulation in interactive mode
There is currently no SoC to run it on hardware, it is WIP. But the CPU simulation can already boot linux and run user space application (even python).
There is currently no SoC to run it on hardware (this is a work in progress), but the CPU simulation can already boot Linux and run user space applications (even Python).
Note that VexRiscv can run Linux on both cache-full and cacheless designs.
@ -516,7 +516,7 @@ import spinal.core._
import vexriscv.plugin.Plugin
import vexriscv.{Stageable, DecoderService, VexRiscv}
//This plugin example will add a new instruction named SIMD_ADD which do the following :
//This plugin example will add a new instruction named SIMD_ADD which does the following:
//
//RD : Regfile Destination, RS : Regfile Source
//RD( 7 downto 0) = RS1( 7 downto 0) + RS2( 7 downto 0)
@ -531,7 +531,7 @@ import vexriscv.{Stageable, DecoderService, VexRiscv}
//Note : RS1, RS2, RD positions follow the RISC-V spec and are common for all instructions of the ISA
class SimdAddPlugin extends Plugin[VexRiscv]{
//Define the concept of IS_SIMD_ADD signals, which specify if the current instruction is destined for ths plugin
//Define the concept of IS_SIMD_ADD signals, which specify if the current instruction is destined for this plugin
object IS_SIMD_ADD extends Stageable(Bool)
//Callback to setup the plugin and ask for different services
@ -541,10 +541,10 @@ class SimdAddPlugin extends Plugin[VexRiscv]{
//Retrieve the DecoderService instance
val decoderService = pipeline.service(classOf[DecoderService])
//Specify the IS_SIMD_ADD default value when instruction are decoded
//Specify the IS_SIMD_ADD default value when instructions are decoded
decoderService.addDefault(IS_SIMD_ADD, False)
//Specify the instruction decoding which should be applied when the instruction match the 'key' parttern
//Specify the instruction decoding which should be applied when the instruction matches the 'key' pattern
decoderService.add(
//Bit pattern of the new SIMD_ADD instruction
key = M"0000011----------000-----0110011",
@ -555,8 +555,8 @@ class SimdAddPlugin extends Plugin[VexRiscv]{
REGFILE_WRITE_VALID -> True, //Enable the register file write
BYPASSABLE_EXECUTE_STAGE -> True, //Notify the hazard management unit that the instruction result is already accessible in the EXECUTE stage (Bypass ready)
BYPASSABLE_MEMORY_STAGE -> True, //Same as above but for the memory stage
RS1_USE -> True, //Notify the hazard management unit that this instruction use the RS1 value
RS2_USE -> True //Same than above but for RS2.
RS1_USE -> True, //Notify the hazard management unit that this instruction uses the RS1 value
RS2_USE -> True //Same as above but for RS2.
)
)
}
@ -567,19 +567,19 @@ class SimdAddPlugin extends Plugin[VexRiscv]{
//Add a new scope on the execute stage (used to give a name to signals)
execute plug new Area {
//Define some signals used internally to the plugin
//Define some signals used internally by the plugin
val rs1 = execute.input(RS1).asUInt
//32 bits UInt value of the regfile[RS1]
val rs2 = execute.input(RS2).asUInt
val rd = UInt(32 bits)
//Do some computation
//Do some computations
rd(7 downto 0) := rs1(7 downto 0) + rs2(7 downto 0)
rd(15 downto 8) := rs1(15 downto 8) + rs2(15 downto 8)
rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)
//When the instruction is a SIMD_ADD one, then write the result into the register file data path.
//When the instruction is a SIMD_ADD, write the result into the register file data path.
when(execute.input(IS_SIMD_ADD)) {
execute.output(REGFILE_WRITE_DATA) := rd.asBits
}
@ -594,15 +594,15 @@ This example is a very simple one, but each plugin can really have access to the
- Halt a given stage of the CPU
- Unschedule instructions
- Emit an exception
- Introduce new instruction decoding specification
- Introduce a new instruction decoding specification
- Ask to jump the PC somewhere
- Read signals published by other plugins
- override published signals values
- Override published signals values
- Provide an alternative implementation
- ...
As a demonstrator, this SimdAddPlugin was integrated in the `src/main/scala/vexriscv/demo/GenCustomSimdAdd.scala` CPU configuration
and is self-tested by the `src/test/cpp/custom/simd_add` application by running the following commands :
As a demonstration, this SimdAddPlugin was integrated in the `src/main/scala/vexriscv/demo/GenCustomSimdAdd.scala` CPU configuration
and is self-tested by the `src/test/cpp/custom/simd_add` application by running the following commands:
```sh
# Generate the CPU
@ -611,7 +611,7 @@ sbt "runMain vexriscv.demo.GenCustomSimdAdd"
cd src/test/cpp/regression/
# Optionally add TRACE=yes if you want to get the VCD waveform from the simulation.
# Also you have to know that by default, the testbench introduce instruction/data bus stall.
# Also note that, by default, the testbench introduces instruction/data bus stalls.
# Note the CUSTOM_SIMD_ADD flag is set to yes.
make clean run IBUS=SIMPLE DBUS=SIMPLE CSR=no MMU=no DEBUG_PLUGIN=no MUL=no DIV=no DHRYSTONE=no REDO=2 CUSTOM_SIMD_ADD=yes
```
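When writing or debugging the `simd_add` self-test, it can help to compute expected results in software. The sketch below is a plain Scala reference model of SIMD_ADD, assuming the lane layout described in the plugin comments (four independent 8-bit additions, carries never crossing lanes). `SimdAddModel` and `simdAdd` are hypothetical names, not part of VexRiscv.

```scala
// Hypothetical software reference model of the SIMD_ADD instruction (not part of VexRiscv).
object SimdAddModel {
  // Lane-wise 8-bit addition of two 32-bit register values (rs1, rs2 -> rd).
  // Each lane sum is truncated to 8 bits, so a carry out of one lane never reaches the next.
  def simdAdd(rs1: Int, rs2: Int): Int =
    (0 until 4).foldLeft(0) { (rd, lane) =>
      val shift = lane * 8
      val a = (rs1 >>> shift) & 0xFF
      val b = (rs2 >>> shift) & 0xFF
      rd | (((a + b) & 0xFF) << shift)
    }
}
```

For instance, `simdAdd(0x000000FF, 0x00000001)` returns 0: lane 0 wraps around and its carry is discarded instead of propagating into lane 1.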
@ -629,14 +629,14 @@ The second one (`CustomCsrDemoGpioPlugin`) creates a GPIO peripheral directly ma
## CPU clock and resets
Without the debug plugin, the CPU will have a standard `clk` input and a `reset` input. But with the debug plugin the situation is the following :
Without the debug plugin, the CPU will have a standard `clk` input and a `reset` input. But with the debug plugin the situation is the following:
- clk : As before, the clock which drive the whole CPU design, including the debug logic
- reset : Reset all the CPU states excepted the debug logics
- debugReset : Reset the debug logic of the CPU
- debug_resetOut : a CPU output signal which allows the JTAG to reset the CPU + the memory interconnect + the peripherals
- `clk`: as before, the clock which drives the whole CPU design, including the debug logic
- `reset`: reset all the CPU states except the debug logic
- `debugReset`: reset the debug logic of the CPU
- `debug_resetOut`: a CPU output signal which allows the JTAG to reset the CPU + the memory interconnect + the peripherals
So here is the reset interconnect in case you use the debug plugin :
So here is the reset interconnect, in case you use the debug plugin:
```
VexRiscv
@ -657,20 +657,20 @@ toplevelReset >----+--------> debugReset |
## VexRiscv Architecture
VexRiscv is implemented via a 5 stage in-order pipeline on which many optional and complementary plugins add functionalities to provide a functional RISC-V CPU.
This approach is completely unconventional and only possible through meta hardware description languages (SpinalHDL in the current case) but has proven its advantages
This approach is completely unconventional and only possible through meta hardware description languages (SpinalHDL, in the current case) but has proven its advantages
via the VexRiscv implementation:
- You can swap/turn on/turn off parts of the CPU directly via the plugin system
- You can add new functionalities/instruction without having to modify any sources code of the CPU
- It allows the CPU configuration to cover a very large spectrum of implementation without cooking spaghetti code
- It allows your code base to truly produce a parametrized CPU design
- You can add new functionalities/instructions without having to modify any of the sources of the CPU
- It allows the CPU configuration to cover a very large spectrum of implementations without cooking spaghetti code
- It allows your codebase to truly produce a parametrized CPU design
If you generate the CPU without any plugin, it will only contain the definition of the 5 pipeline stages and their basic arbitration, but nothing else,
as everything else, including the program counter is added into the CPU via plugins.
and everything else, including the program counter, is added into the CPU via plugins.
### Plugins
This chapter describes plugins currently implemented.
This chapter describes the currently implemented plugins.
- [IBusSimplePlugin](#ibussimpleplugin)
- [IBusCachedPlugin](#ibuscachedplugin)
@ -696,7 +696,7 @@ This chapter describes plugins currently implemented.
#### IBusSimplePlugin
This plugin implement the CPU frontend (instruction fetch) via a very simple and neutral memory interface going outside the CPU.
This plugin implements the CPU frontend (instruction fetch) via a very simple and neutral memory interface going outside the CPU.
| Parameters | type | description |
| ------ | ----------- | ------ |
@ -737,7 +737,7 @@ case class IBusSimpleBus(interfaceKeepData : Boolean) extends Bundle with IMaste
}
```
**Important** : Checkout the cmdForkPersistence parameter, because if it's not set, it can break the iBus compatibility with your memory system (unless you externaly add some buffers)
**Important**: check out the cmdForkPersistence parameter, because if it is not set, it can break the iBus compatibility with your memory system (unless you externally add some buffers).
Setting cmdForkPersistence and cmdForkOnSecondStage improves iBus cmd timings.
@ -781,19 +781,19 @@ Note: If you enable the twoCycleRam option and if wayCount is bigger than one, t
#### DecoderSimplePlugin
This plugin provides instruction decoding capabilities to others plugins.
This plugin provides instruction decoding capabilities to other plugins.
For instance, for a given instruction, the pipeline hazard plugin needs to know if it uses the register file source 1/2 in order to stall the pipeline until the hazard is gone.
To provide this kind of information, each plugin which implements an instruction documents this kind of information to the DecoderSimplePlugin plugin.
Each plugin that implements an instruction provides this kind of information to the DecoderSimplePlugin plugin.
| Parameters | type | description |
| ------ | ----------- | ------ |
| catchIllegalInstruction | Boolean | When true, instructions that don't match a decoding specification will generate a trap exception |
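The `key` of a decoding specification is a masked literal: `0`/`1` bits must match, `-` bits are ignored, and the leftmost character is bit 31. As an illustration only, the hypothetical helper below reproduces that matching rule in plain Scala (it is not the SpinalHDL `M"..."` implementation), together with the standard RISC-V R-type field packing used to build test instruction words.

```scala
// Hypothetical helper mimicking masked-literal matching (not the SpinalHDL implementation).
object MaskedKey {
  // '-' is a don't-care bit; the first character of the key describes bit 31.
  def matches(key: String, insn: Int): Boolean = {
    require(key.length == 32, "a key must describe all 32 instruction bits")
    key.zipWithIndex.forall { case (c, i) =>
      val bit = (insn >>> (31 - i)) & 1
      c match {
        case '-' => true
        case '0' => bit == 0
        case '1' => bit == 1
        case other => throw new IllegalArgumentException(s"unexpected key character '$other'")
      }
    }
  }

  // R-type field packing as defined by the RISC-V base ISA.
  def rType(funct7: Int, rs2: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Int =
    (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode
}
```

With the key `0000011----------000-----0110011`, an R-type word with funct7 = 3 and opcode 0x33 matches whatever its register fields are, while a plain ADD (funct7 = 0) does not.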
Here is a usage example :
Here is a usage example:
```scala
//Specify the instruction decoding which should be applied when the instruction match the 'key' pattern
//Specify the instruction decoding which should be applied when the instruction matches the 'key' pattern
decoderService.add(
//Bit pattern of the new instruction
key = M"0000011----------000-----0110011",
@ -819,15 +819,15 @@ This plugin implements the register file.
| Parameters | type | description |
| ------ | ----------- | ------ |
| regFileReadyKind | RegFileReadKind | Can bet set to ASYNC or SYNC. Specifies the kind of memory read used to implement the register file. ASYNC means zero cycle latency memory read, while SYNC means one cycle latency memory read which can be mapped into standard FPGA memory blocks |
| zeroBoot | Boolean | Load all registers with zeroes at the beginning of simulations to keep everything deterministic in logs/traces|
| regFileReadyKind | RegFileReadKind | Can be set to ASYNC or SYNC. Specifies the kind of memory read used to implement the register file. ASYNC means zero cycle latency memory read, while SYNC means one cycle latency memory read which can be mapped into standard FPGA memory blocks |
| zeroBoot | Boolean | Load all registers with zeroes at the beginning of the simulation to keep everything deterministic in logs/traces|
This register file uses a `don't care` read-during-write policy, so the bypassing/hazard plugin should take care of this.
#### HazardSimplePlugin
This plugin checks the pipeline instruction dependencies and, if necessary or possible, will stop the instruction in the decoding stage or bypass the instruction results
from the later stages to the decode stage.
from the later stages back to the decode stage.
Since the register file is implemented with a `don't care` read-during-write policy, this plugin also manages these kinds of hazards.
@ -851,7 +851,7 @@ Except for SRC1/SRC2, this plugin does everything at the begining of Execute sta
#### IntAluPlugin
This plugin implements all ADD/SUB/SLT/SLTU/XOR/OR/AND/LUI/AUIPC instructions in the execute stage by using the SrcPlugin outputs. It is a realy simple plugin.
This plugin implements all ADD/SUB/SLT/SLTU/XOR/OR/AND/LUI/AUIPC instructions in the execute stage by using the SrcPlugin outputs. It is a really simple plugin.
The result is injected into the pipeline directly at the end of the execute stage.
@ -863,7 +863,7 @@ The result is injected into the pipeline directly at the end of the execute stag
#### FullBarrelShifterPlugin
Implements SLL/SRL/SRA instructions by using a full barrel shifter, so it execute all shifts in a single cycle.
Implements SLL/SRL/SRA instructions by using a full barrel shifter, so it executes all shifts in a single cycle.
| Parameters | type | description |
| ------ | ----------- | ------ |
@ -871,7 +871,7 @@ Implements SLL/SRL/SRA instructions by using a full barrel shifter, so it execut
#### BranchPlugin
This plugin implement all branch/jump instructions (JAL/JALR/BEQ/BNE/BLT/BGE/BLTU/BGEU) with primitives used by the cpu frontend plugins to implement branch prediction. The prediction implementation is set in the frontend plugins (IBusX)
This plugin implements all branch/jump instructions (JAL/JALR/BEQ/BNE/BLT/BGE/BLTU/BGEU) with primitives used by the cpu frontend plugins to implement branch prediction. The prediction implementation is set in the frontend plugins (IBusX).
| Parameters | type | description |
| ------ | ----------- | ------ |
@ -891,13 +891,13 @@ otherwise the standard penalty is applied.
##### Prediction DYNAMIC
Same as the STATIC prediction, except that to do the prediction, it use a direct mapped 2 bit history cache (BHT) which remembers if the branch is more likely to be taken or not.
Same as the STATIC prediction, except that to do the prediction, it uses a direct mapped 2 bit history cache (BHT) which remembers if the branch is more likely to be taken or not.
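To make the "2 bit history" idea concrete, here is a software sketch of a direct mapped table of classic 2-bit saturating counters. It is an approximation, not the VexRiscv RTL: the counter encoding, the reset state (weakly not taken) and the index function (word-aligned PCs, low bits only) are all assumptions, and `TwoBitCounter`/`Bht` are hypothetical names.

```scala
// Assumed classic 2-bit saturating counter: states 0-1 predict "not taken", 2-3 predict "taken".
final case class TwoBitCounter(state: Int) {
  require(state >= 0 && state <= 3)
  def predictTaken: Boolean = state >= 2
  def update(taken: Boolean): TwoBitCounter =
    if (taken) TwoBitCounter(math.min(state + 1, 3))
    else TwoBitCounter(math.max(state - 1, 0))
}

// Hypothetical direct mapped history table indexed by low PC bits (word-aligned PCs assumed).
final class Bht(indexBits: Int) {
  private val table = Array.fill(1 << indexBits)(TwoBitCounter(1)) // start weakly not taken
  private def index(pc: Int): Int = (pc >>> 2) & ((1 << indexBits) - 1)
  def predict(pc: Int): Boolean = table(index(pc)).predictTaken
  def train(pc: Int, taken: Boolean): Unit = {
    val i = index(pc)
    table(i) = table(i).update(taken)
  }
}
```

The two-bit hysteresis means a single mispredicted iteration (for example the exit of a loop) does not immediately flip a strongly taken entry. Being direct mapped, two branches whose PCs share the same index bits also share one counter.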
##### Prediction DYNAMIC_TARGET
This predictor uses a direct mapped branch target buffer (BTB) in the Fetch stage which store the PC of the instruction, the target PC of the instruction and a 2 bit history to remember
if the branch is more likely to be taken or not. This is the most efficient branch predictor actualy implemented on VexRiscv as when the branch prediction is right, it produce no branch penalty.
The down side is that this predictor has a long combinatorial path coming from the prediction cache read port to the programm counter by passing through the jump interface.
This predictor uses a direct mapped branch target buffer (BTB) in the Fetch stage which stores the PC of the instruction, the target PC of the instruction and a 2 bit history to remember
if the branch is more likely to be taken or not. This is actually the most efficient branch predictor implemented on VexRiscv, because when the branch prediction is right, it produces no branch penalty.
The downside is that this predictor has a long combinatorial path coming from the prediction cache read port to the program counter, passing through the jump interface.
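The lookup/update behaviour of such a branch target buffer can be sketched in software as below. This is only a behavioural model under assumed details (entry layout, tag = full source PC, word-aligned index, reset history of 1), not the hardware implementation; `BtbEntry` and `Btb` are hypothetical names. A hit with a "taken" history yields the target PC directly, which is what lets a correct prediction cost no penalty in the Fetch stage.

```scala
// Hypothetical BTB entry: source PC used as tag, predicted target, 2-bit history.
final case class BtbEntry(valid: Boolean, sourcePc: Int, targetPc: Int, history: Int)

// Hypothetical direct mapped BTB model (word-aligned PCs assumed for the index).
final class Btb(indexBits: Int) {
  private val entries =
    Array.fill(1 << indexBits)(BtbEntry(valid = false, sourcePc = 0, targetPc = 0, history = 0))
  private def index(pc: Int): Int = (pc >>> 2) & ((1 << indexBits) - 1)

  // Returns Some(target) when the fetch PC hits an entry whose history predicts "taken".
  def lookup(pc: Int): Option[Int] = {
    val e = entries(index(pc))
    if (e.valid && e.sourcePc == pc && e.history >= 2) Some(e.targetPc) else None
  }

  // Called once the branch outcome is known; (re)allocates the entry for this PC.
  def update(pc: Int, taken: Boolean, target: Int): Unit = {
    val i = index(pc)
    val e = entries(i)
    val old = if (e.valid && e.sourcePc == pc) e.history else 1
    val history = if (taken) math.min(old + 1, 3) else math.max(old - 1, 0)
    entries(i) = BtbEntry(valid = true, sourcePc = pc, targetPc = target, history = history)
  }
}
```

The tag comparison is what prevents a different branch mapped to the same entry from using a stale target.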
#### DBusSimplePlugin
@ -941,7 +941,7 @@ case class DBusSimpleBus() extends Bundle with IMasterSlave{
}
```
Note that bridges are available to convert this interface into AXI4 and Avalon
Note that there are bridges available that can convert this interface into AXI4 and Avalon.
There is at least one cycle latency between a cmd and the corresponding rsp. The rsp.ready flag should be false after a read cmd until the rsp is present.
@ -959,7 +959,7 @@ The processing is fully pipelined between the Execute/Memory/Writeback stage. Th
Implements the division/modulo instruction from the RISC-V M extension. It is done in a simple iterative way which always takes 34 cycles. The result is inserted into the
Memory stage.
This plugin is now based on the MulDivIterativePlugin one.
This plugin is now based on MulDivIterativePlugin.
#### MulDivIterativePlugin
@ -977,14 +977,14 @@ This plugin is able to unroll the iterative calculation process to reduce the nu
The number of cycles used to execute a multiplication is '32/mulUnrollFactor'
The number of cycles used to execute a division is '32/divUnrollFactor + 1'
Both mul/div are processed into the memory stage (late result).
Both mul/div are processed in the memory stage (late result).
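The two cycle-count formulas above can be tabulated to pick unroll factors. The sketch simply encodes the stated formulas; the requirement that the factor divides 32 is an assumption made here to keep the integer division exact, and `MulDivLatency` is a hypothetical name.

```scala
// Cycle counts stated for MulDivIterativePlugin, as plain arithmetic (hypothetical helper).
object MulDivLatency {
  // Multiplication: 32 / mulUnrollFactor cycles.
  def mulCycles(mulUnrollFactor: Int): Int = {
    require(mulUnrollFactor > 0 && 32 % mulUnrollFactor == 0, "assumed: factor divides 32")
    32 / mulUnrollFactor
  }
  // Division: 32 / divUnrollFactor + 1 cycles.
  def divCycles(divUnrollFactor: Int): Int = {
    require(divUnrollFactor > 0 && 32 % divUnrollFactor == 0, "assumed: factor divides 32")
    32 / divUnrollFactor + 1
  }
}
```

For example, a mulUnrollFactor of 4 brings a multiplication down to 8 cycles, and a divUnrollFactor of 2 gives 17-cycle divisions, at the cost of more unrolled logic.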
#### CsrPlugin
Implements most of the Machine mode and a few of the User mode registers as specified in the RISC-V priviledged spec.
The access mode of most of the CSR is parameterizable (NONE/READ_ONLY/WRITE_ONLY/READ_WRITE) to reduce the area usage of unneeded features.
Implements most of the Machine mode and a few of the User mode registers, as specified in the RISC-V privileged spec.
The access mode of most of the CSR is parameterizable to reduce the area usage of unneeded features.
(CsrAccess can be NONE/READ_ONLY/WRITE_ONLY/READ_WRITE)
(CsrAccess can be `NONE/READ_ONLY/WRITE_ONLY/READ_WRITE`)
| Parameters | type | description |
| ------ | ----------- | ------ |
@ -1014,11 +1014,11 @@ stage before jumping to mtvec.
#### StaticMemoryTranslatorPlugin
Static memory translator plugin which allows one to specify which range of the memory addresses is IO mapped and shouldn't be cached.
Static memory translator plugin which allows one to specify which range of the memory addresses is I/O mapped and shouldn't be cached.
#### MmuPlugin
Hardware refilled MMU implementation. Allows others plugins such as DBusCachedPlugin/IBusCachedPlugin to instanciate memory address translation ports. Each port has a small dedicated
Hardware refilled MMU implementation. Allows other plugins such as DBusCachedPlugin/IBusCachedPlugin to instantiate memory address translation ports. Each port has a small dedicated
fully associative TLB cache which is refilled automatically through a shared access to the data bus (dBus).
#### DebugPlugin
@ -1032,7 +1032,7 @@ The JTAG interface is provided by another bridge, which makes it possible to eff
The internals of the debug plugin are done in a manner which reduces the area usage and the FMax impact of this plugin.
Here is the simple bus to access it, the rsp come one cycle after the request :
Here is the simple bus to access it, the rsp comes one cycle after the request:
```scala
case class DebugExtensionCmd() extends Bundle{
@ -1056,7 +1056,7 @@ case class DebugExtensionBus() extends Bundle with IMasterSlave{
```
Here is the register mapping :
Here is the register mapping:
```
Read address 0x00 ->
@ -1078,12 +1078,10 @@ Write Address 0x04 ->
bits (31 downto 0) : Instruction that should be pushed into the CPU pipeline for debug purposes
```
The OpenOCD port is there :
https://github.com/SpinalHDL/openocd_riscv
The OpenOCD port is here: <https://github.com/SpinalHDL/openocd_riscv>
#### YamlPlugin
This plugin offers a service to others plugins to generate a usefull Yaml file about the CPU configuration. It contains, for instance, the sequence of instruction required
This plugin offers a service to other plugins to generate a useful Yaml file describing the CPU configuration. It contains, for instance, the sequence of instructions required
to flush the data cache (information used by openocd).

View file

@ -3,7 +3,7 @@
#cannot rm build because it erases the software images that the makefile copies there
#rm -rf ./build
mkdir ./build
mkdir -p ./build
cd ./build
vivado -mode batch -source ../make_vivado_project.tcl -notrace

View file

@ -21,7 +21,7 @@ proc swap_bits { bit } {
# open_run impl_1
# Find all the RAMs, place in a list
set rams [get_cells -hier -regexp {.*core/system_ram/.*} -filter {REF_NAME =~ RAMB36E1}]
set rams [get_cells -hier -regexp {.*core/system_ram/.*} -filter {REF_NAME == RAMB36E1 || REF_NAME == RAMB18E1}]
puts "[llength $rams] RAMs in total"
foreach m $rams {puts $m}
@ -89,7 +89,7 @@ set i 0
foreach ram $rams {
# Get the RAM location
set loc_val [get_property LOC [get_cells $ram]]
regexp -- {(RAMB36_)([0-9XY]+)} $loc_val full ram_name loc_xy
regexp {(RAMB.+_)([0-9XY]+)} $loc_val full ram_name loc_xy
set memi [dict create ram $ram loc $loc_xy]