add murax peripheral extension tutorial

Sallar Ahmadi-Pour 2022-04-25 12:21:41 +02:00
parent 53d52692de
commit bd74833900
26 changed files with 2037 additions and 0 deletions

doc/gcdPeripheral/README.md

@ -0,0 +1,592 @@
# Tutorial on Implementing a Peripheral for the VexRiscv Based Murax SoC
**By**
**Sallar Ahmadi-Pour - Researcher, University of Bremen, Group of Computer Architecture**
[http://www.informatik.uni-bremen.de/agra/projects/risc-v/](http://www.informatik.uni-bremen.de/agra/projects/risc-v/)
[http://www.informatik.uni-bremen.de/agra/](http://www.informatik.uni-bremen.de/agra/)
## 1. Introduction
Traditional hardware design often requires languages like VHDL and Verilog, together with tooling that misses errors a static analysis of the design could catch. Additionally, the information developers receive from the tools is scarce and often leads inexperienced developers on an odyssey. Emerging tools for hardware design (Verilator, Yosys, etc.) and languages for hardware description (SpinalHDL, Amaranth, etc.) tackle these and other existing issues.
Projects like SpinalHDL and the highly configurable VexRiscv processor built on it are experiencing a rise in popularity and usage among academic and commercial users. This increased popularity calls for more educational resources. Given the particular popularity in academia, it seems only natural that researchers document their approaches and insights, and not only in peer-reviewed journal publications. This will allow the next generation of hardware designers to extend and explore big projects like VexRiscv.
## 2. Our Goal for this Tutorial
Murax SoC is a very lightweight RISC-V platform built around a VexRiscv configuration.
It features a basic set of peripherals (UART, GPIO, prescalers and timers) around a pipelined memory bus and an Apb3 peripheral bus.
The Murax SoC offers enough to be more than a toy system, yet it is small and leaves room for extensions.
Among the algorithms we could describe in hardware rather than software, calculating the Greatest Common Divisor (GCD) is a good example to start with. There are many digital design resources available on designing a GCD module.
We will add the hardware peripheral module to the Murax on the Apb3 bus, with memory mapped registers to control the module and to transfer the data for the calculation.
In this way we move the computation from software into a hardware implementation.
The aspects we will shed some light upon are:
a) How do we implement an algorithm that we know from the software domain in a hardware implementation suited for FPGAs?
b) How do we prepare and integrate a new peripheral into the Murax domain and map its control and data ports via memory mapped registers?
c) How do we extend the software to use the peripheral easily in our baremetal code?
For a) we will start from the pseudocode of the GCD and work our way to a hardware implementation in SpinalHDL.
We will evaluate that design in a SpinalHDL testbench with Verilator as the simulation backend and drive the testbench with randomly generated values which we compare to a software implementation of the same algorithm.
For b) we will look into the features of SpinalHDL and the structure of the Murax SoC to get an idea where and how to integrate our peripheral.
Before adding the peripheral into the Murax we also need to decide on the details of mapping our control and data ports to memory mapped registers (i.e., addresses, write/read/clear modes, etc.).
At the end there is a small list of possible extensions from which anyone can continue with their own additions.
## 3. GCD HW Implementation
Let us start the HW implementation by looking at some kind of specification.
```c
// Pseudocode of Euclid's algorithm for calculating the GCD
inputs: [a, b]
outputs: [ready, a]
ready := False
while(!ready):
    if(a > b):
        a := a - b
    else if(b > a):
        b := b - a
    else:
        ready := True
```
The pseudocode shows the GCD algorithm we want to implement in hardware.
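Before moving to hardware, it can help to see the pseudocode as plain C. The following is a minimal sketch and not part of the tutorial's sources: the function name `gcd_subtract` is our own choice, and we assume that both operands are non-zero, because subtracting zero never changes a value and the loop would then never terminate.

```c
#include <assert.h>
#include <stdint.h>

/* Subtraction-based Euclid, mirroring the pseudocode branch by branch.
 * Assumption of this sketch: a and b are both non-zero. */
uint32_t gcd_subtract(uint32_t a, uint32_t b)
{
    assert(a != 0 && b != 0);
    for (;;) {
        if (a > b) {
            a -= b;          /* a := a - b */
        } else if (b > a) {
            b -= a;          /* b := b - a */
        } else {
            return a;        /* a == b: the "ready" condition */
        }
    }
}
```

Calling `gcd_subtract(12, 18)` walks through (12, 18), (12, 6), (6, 6) and then returns 6. The same three branches reappear later as states of the control path.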
Implementing algorithms in hardware in the Register Transfer Level (RTL) style requires separating the control path (if, else, while, for) from the data path (moving, calculating and comparing data).
Inevitably, the results of calculations and comparisons affect the control flow, and the control flow affects the data flow.
Thus the two paths need to exchange this shared information.
But let us start by defining the interface of the module that will calculate the GCD.
![GCD Top Diagram](./img/murax-gcd-diagrams-gcd.png)
Our pseudocode already defines some in- and outputs that can aid us in defining the interface for our module.
At this point we don't want to think about which bus we connect our module to (APB, AXI, Wishbone, etc.).
We take care of that part later.
We simply know we have our input integers A and B, a signal to indicate the start of the calculation, the result and a signal indicating the completion of the calculation.
We choose 32 bit integers and use a valid-ready mechanism (we add a valid signal to kick off the calculation).
The interface features the values A, B and result as data signals; valid and ready are control signals.
Signals for reset and clock are omitted for readability (unless explicitly used, these are handled by SpinalHDL internally anyway).
From this top level perspective we can describe the behavior as follows: once we apply a set of operands A and B and then assert the valid signal, the module calculates the GCD over a variable number of clock cycles.
We know the result is ready and can be read once the ready signal is asserted.
Inside the GCD module we will have two other modules: the data path GCDData and the control path GCDCtrl.
We notice again, the data signals (opA, opB and result) belong to our data path and the control signals (valid and ready) belong to our control path.
![GCD top level block diagram](./img/murax-gcd-diagrams-gcd-dp+cp.png)
The data path will consist of some basic RTL blocks like multiplexers, a subtraction, comparators and registers.
The elements are connected and arranged such that they represent the dataflow of the algorithm.
Parts of the data path are enabled by the control path.
The control path will be represented by a Finite State Machine (FSM), which orchestrates the data path's calculation of the result.
![GCD data path](./img/murax-gcd-diagrams-gcd-datapath.png)
The diagram of the data path shows the processing elements for our algorithm in hardware, with their control input and outputs respectively.
From this we can already see what the interface towards the control path looks like.
The control path needs to know the results of the comparisons.
Vice versa, the data path is controlled by selecting the subtraction operands (or more precisely their order), by the register enables and by an initiation signal for a defined start state.
In the data path, the D flip-flops (DFF) hold the values A and B that are used for the calculation; they change value throughout the computation.
A subtractor computes `r = x - y`, with x being the "left" and y being the "right" operand.
The left and right operands are selected by multiplexers that are driven by the inputs from the control path.
Two comparators compute the greater than (cmpAgtB) and less than (cmpAltB) comparisons.
The result, the GCD of A and B, will be available in the A register after the calculation is done.
Completion of the calculation is signaled by the control path.
![GCD control path](./img/murax-gcd-diagrams-gcd-controlpath.png)
In the diagram of the control path we see the same interface, with inverse directions (this information will be helpful later in SpinalHDL).
The interface of the control path consists of the top level valid signal, the ready signal indicating the finished computation, and the results of the two comparisons `A > B` (*cmpAgtB*) and `B > A` (*cmpAltB*).
Initially the FSM is in an idle state, waiting for the valid signal to be asserted. On exit of this state, the init signal is set to 1 to clock the values of A and B into their respective registers.
Similar to the pseudocode, the FSM loops for the calculation and, based on the comparators of the data path, orchestrates the data path to calculate either `a := a - b` or `b := b - a`.
If both comparator outputs are 0, the end of the calculation is reached.
Within the `calcDone` state the `ready` signal is set to 1.
With the entry of the `idle` state the module becomes ready to calculate another GCD.
The control path drives all the outputs based on the state in the state machine (Moore FSM).
The guards on the transitions show the condition with which the respective transition occurs.
These block diagrams, the digital logic and the FSM can be implemented quickly in SpinalHDL. Things like the `DataControlIF` that interconnects the data path and the control path can be created and connected quickly in SpinalHDL as well.
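To make the interplay of the two paths more tangible, here is a small cycle-by-cycle software model in C. It is a sketch under our own assumptions, not generated from the design: the names `GcdModel`, `Ctrl`, `gcd_model_step` and `gcd_model_run` are made up for illustration, and the cycle counts it reports only approximate what the real hardware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { IDLE, CALCULATE, CALC_A, CALC_B, CALC_DONE } GcdState;
typedef struct { bool init, loadA, loadB, selL, selR, ready; } Ctrl;
typedef struct { GcdState state; uint32_t regA, regB; } GcdModel;

/* Simulate one clock cycle; returns the value of `ready` in that cycle. */
bool gcd_model_step(GcdModel *m, bool valid, uint32_t a, uint32_t b)
{
    /* data path -> control path: comparator outputs */
    bool cmpAgtB = m->regA > m->regB;
    bool cmpAltB = m->regA < m->regB;

    /* control path: default outputs, then state dependent values */
    Ctrl c = { false, false, false, false, false, false };
    GcdState next = m->state;
    switch (m->state) {
    case IDLE:      if (valid) { c.init = true; next = CALCULATE; } break;
    case CALCULATE: next = cmpAgtB ? CALC_A : (cmpAltB ? CALC_B : CALC_DONE); break;
    case CALC_A:    c.selR = true; c.loadA = true; next = CALCULATE; break;
    case CALC_B:    c.selL = true; c.loadB = true; next = CALCULATE; break;
    case CALC_DONE: c.ready = true; next = IDLE; break;
    }

    /* data path: operand multiplexers, subtractor, register enables */
    uint32_t x = c.selL ? m->regB : m->regA;
    uint32_t y = c.selR ? m->regB : m->regA;
    uint32_t sub = x - y;
    uint32_t nextA = m->regA, nextB = m->regB;
    if (c.init)  { nextA = a; nextB = b; }
    if (c.loadA) { nextA = sub; }
    if (c.loadB) { nextB = sub; }

    m->regA = nextA; m->regB = nextB; m->state = next;
    return c.ready;
}

/* Run one full calculation and report how many cycles the model needed.
 * Assumption: both operands are non-zero, otherwise the FSM never finishes. */
uint32_t gcd_model_run(uint32_t a, uint32_t b, unsigned *cycles)
{
    assert(a != 0 && b != 0 && cycles != 0);
    GcdModel m = { IDLE, 0, 0 };
    unsigned n = 1;
    bool ready = gcd_model_step(&m, true, a, b);   /* valid pulse */
    while (!ready) { ready = gcd_model_step(&m, false, 0, 0); n++; }
    *cycles = n;
    return m.regA;
}
```

In this model every subtraction costs two cycles (one in `CALCULATE`, one in `CALC_A` or `CALC_B`), which hints at why the runtime of the peripheral depends on the operands.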
## 4. SpinalHDL implementation
First we can take a look at the interface between the data and control path.
```scala
// in GCDTop.scala
case class GCDDataControl() extends Bundle with IMasterSlave{
  val cmpAgtB = Bool()
  val cmpAltB = Bool()
  val loadA = Bool()
  val loadB = Bool()
  val init = Bool()
  val selL = Bool()
  val selR = Bool()
  override def asMaster(): Unit = {
    out(loadA, loadB, selL, selR, init)
    in(cmpAgtB, cmpAltB)
  }
}
```
We define a Bundle that implements the `IMasterSlave` interface (see the [Bundle documentation](https://spinalhdl.github.io/SpinalDoc-RTD/master/SpinalHDL/Data%20types/bundle.html?highlight=master%20slave#master-slave)), which allows us to use an operator (`<>`) to interconnect modules and their signals without explicitly describing each wire and connection (other than inside the Bundle above).
In the Bundle we can define the signals with their types.
We override the `asMaster()` method (lines 10 to 13) from the `IMasterSlave` interface.
In the `asMaster()` method we define the signal direction from the point of view of the control path.
Thus `cmpAgtB` and `cmpAltB` are inputs and `loadA`, `loadB`, `selL`, `selR`, `init` are outputs.
SpinalHDL will infer the directions for the data path side when we use the `<>` operator.
With that our top level module will look very tidy:
```scala
// in GCDTop.scala
class GCDTop() extends Component {
  val io = new Bundle {
    val valid = in Bool()
    val ready = out Bool()
    val a = in(UInt(32 bits))
    val b = in(UInt(32 bits))
    val res = out(UInt(32 bits))
  }
  val gcdCtr = new GCDCtrl()
  gcdCtr.io.valid := io.valid
  io.ready := gcdCtr.io.ready
  val gcdDat = new GCDData()
  gcdDat.io.a := io.a
  gcdDat.io.b := io.b
  io.res := gcdDat.io.res
  gcdCtr.io.dataCtrl <> gcdDat.io.dataCtrl
}
```
Lines 2 to 8 define the input/output Bundle inline; lines 9 and 12 instantiate the control and data path. All other lines interconnect the IO signals. Note that in line 16 we interconnect the control and data path using the `<>` operator, as both use the shared interface description from earlier as part of their IO (called `dataCtrl` in the design). We will see this in the input/output bundles of the respective modules.
Our data path in SpinalHDL looks like this:
```scala
// in GCDData.scala
class GCDData() extends Component {
  val io = new Bundle {
    val a = in(UInt(32 bits))
    val b = in(UInt(32 bits))
    val res = out(UInt(32 bits))
    val dataCtrl = slave(GCDDataControl())
  }
  //registers
  val regA = Reg(UInt(32 bits)) init(0)
  val regB = Reg(UInt(32 bits)) init(0)
  // compare
  val xGTy = regA > regB
  val xLTy = regA < regB
  // mux
  val chX = io.dataCtrl.selL ? regB | regA
  val chY = io.dataCtrl.selR ? regB | regA
  // subtract
  val subXY = chX - chY
  // load logic
  when(io.dataCtrl.init){
    regA := io.a
    regB := io.b
  }
  when(io.dataCtrl.loadA){
    regA := subXY
  }
  when(io.dataCtrl.loadB){
    regB := subXY
  }
  io.dataCtrl.cmpAgtB := xGTy
  io.dataCtrl.cmpAltB := xLTy
  io.res := regA
}
```
Lines 2 to 7 show the Bundle for the IO signals. Note the signal in line 6 (`dataCtrl`): we use the Bundle defined earlier and give it the direction `slave()` instead of `in()` or `out()`.
This tells SpinalHDL to infer the directions of the Bundle signals according to the `asMaster()` method (in that case the inverse directions).
We will see this again in the control path.
The rest of the module (or component, as modules are called in SpinalHDL) consists of defining signals, registers, and behavior.
Registers can be defined through `Reg()`, which takes a type and optionally a reset value (via `init()`).
We can write to the registers in our `when()` blocks, which can be interpreted as the enable signals for the registers.
(*Side note: technically we describe a multiplexer in front of each register, as we have multiple enable cases and different data sources, but SpinalHDL lets us abstract from that a bit and keep it in the back of our minds.*)
Now for the control path of our GCD module:
```scala
// in GCDCtrl.scala
class GCDCtrl() extends Component {
  val io = new Bundle {
    val valid = in Bool()
    val ready = out Bool()
    val dataCtrl = master(GCDDataControl())
  }
  val fsm = new StateMachine{
    io.dataCtrl.loadA := False
    io.dataCtrl.loadB := False
    io.dataCtrl.init := False
    io.dataCtrl.selL := False
    io.dataCtrl.selR := False
    io.ready := False
    val idle : State = new State with EntryPoint{
      whenIsActive{
        when(io.valid){
          io.dataCtrl.init := True
          goto(calculate)
        }
      }
    }
    val calculate : State = new State{
      whenIsActive{
        when(io.dataCtrl.cmpAgtB){
          goto(calcA)
        }.elsewhen(io.dataCtrl.cmpAltB){
          goto(calcB)
        }.elsewhen(!io.dataCtrl.cmpAgtB & !io.dataCtrl.cmpAltB){
          goto(calcDone)
        }
      }
    }
    val calcA : State = new State{
      whenIsActive{
        io.dataCtrl.selR := True
        io.dataCtrl.loadA := True
        goto(calculate)
      }
    }
    val calcB : State = new State{
      whenIsActive{
        io.dataCtrl.selL := True
        io.dataCtrl.loadB := True
        goto(calculate)
      }
    }
    val calcDone : State = new State{
      whenIsActive{
        io.ready := True
        goto(idle)
      }
    }
  }
}
```
Lines 2 to 6 show the input/output signals again, and this time the `dataCtrl` signal, at line 5, has the direction `master()`.
This applies the directions that we set in the first code snippet.
SpinalHDL offers a library to build FSMs, and since this module is nothing but an FSM, our control path reads like a description of the state diagram.
We set default values for the outputs (lines 8 to 13) and apply the appropriate value in the respective state.
The API for FSMs in SpinalHDL offers much more than we use here.
In each state we can describe actions for `onEntry`, `onExit`, `whenIsNext` and for `whenIsActive` phases (see the [State Machine documentation](https://spinalhdl.github.io/SpinalDoc-RTD/master/SpinalHDL/Libraries/fsm.html)).
The `onEntry` phase refers to the cycle before entering the state, `onExit` is executed if the next cycle will be in a different state, and `whenIsNext` is executed if the state machine will be in that state in the next cycle.
This resembles the capabilities of FSMs in UML/SysML or in StateCharts.
It is also possible to nest FSMs hierarchically or to have delay states that last a certain number of cycles.
Describing these things in a classic HDL takes a lot of boilerplate, which SpinalHDL can generate for us instead.
With these modules we can already run some first simulations to test the functionality of our design.
As with traditional HDLs, we need a testbench for this in SpinalHDL as well.
The default way for [simulation in SpinalHDL](https://spinalhdl.github.io/SpinalDoc-RTD/master/SpinalHDL/Simulation/index.html) is to write a testbench with SpinalHDL and Scala and have it simulated through Verilator.
Verilator compiles our HDL (generated from SpinalHDL) to a C++ simulation model; our testbench interacts with that model, which gives us a fast simulation.
Let's jump straight into the simulation testbench and see how SpinalHDL aids our work here:
```scala
// in GCDTopSim.scala
object GCDTopSim {
  def main(args: Array[String]): Unit = {
    SimConfig.doSim(new GCDTop()){dut =>
      def gcd(a: Long,b: Long): Long = {
        if(b==0) a else gcd(b, a%b)
      }
      def RndNextUInt32(): Long = {
        ThreadLocalRandom.current().nextLong(Math.pow(2, 32).toLong - 1)
      }
      var a = 0L
      var b = 0L
      var model = 0L
      dut.io.a #= 0
      dut.io.b #= 0
      dut.io.valid #= false
      dut.clockDomain.forkStimulus(period = 10)
      dut.clockDomain.waitRisingEdge()
      for(idx <- 0 to 50000){
        a = RndNextUInt32()
        b = RndNextUInt32()
        model = gcd(a,b)
        dut.io.a #= a
        dut.io.b #= b
        dut.io.valid #= true
        dut.clockDomain.waitRisingEdge()
        dut.io.valid #= false
        waitUntil(dut.io.ready.toBoolean)
        assert(
          assertion = (dut.io.res.toBigInt == model),
          message = "test " + idx + " failed. Expected " + model + ", retrieved: " + dut.io.res.toBigInt
        )
        waitUntil(!dut.io.ready.toBoolean)
      }
    }
  }
}
```
In line 3 we set up our Design Under Test (DUT); here we could also set other simulation options, like generating a VCD wave trace.
We want to generate an arbitrary number of test cases and compare the results against a (different) implementation of the GCD algorithm in software.
Doing this for enough random cases can give confidence in the design, though it will not always cover edge cases and other aspects that are covered by a constrained random approach, white box testing or formal methods.
Lines 4 to 6 are our (recursive) software implementation.
Lines 7 to 9 generate a random number in the range of a UInt32. We have to do this by hand because of the way Java, Scala and SpinalHDL interact when it comes to numeric values and types.
Lines 10 to 15 set up the inputs of the DUT, and in lines 17 and 18 we set up the clock and trigger the first event for our signals to be applied to the inputs.
Lines 20 to 35 apply 50k random integer pairs to our design and to our software model, and compare the results after waiting for the hardware to finish.
We use `assert` to output a message to the terminal in case a test case doesn't match the software model.
If we add `.withWave` in line 3 we obtain a wave trace (though it's recommended to run fewer test cases then, as the dump will be huge otherwise).
![GCD wave trace](./img/simulationWave.PNG)
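The idea behind the testbench (random stimuli checked against an independent reference implementation) can also be tried out in plain C, without any simulator. The sketch below is only an illustration of that technique: `gcd_mod` stands in for the modulo-based software model, `gcd_sub` stands in for the design under test, and the `xorshift32` generator as well as the restriction to operands between 1 and 65536 are our own assumptions to keep the run short and repeatable.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model: modulo-based Euclid, like the model in the testbench. */
uint32_t gcd_mod(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Stand-in for the design under test: subtraction-based Euclid. */
uint32_t gcd_sub(uint32_t a, uint32_t b)
{
    while (a != b) {
        if (a > b) a -= b; else b -= a;
    }
    return a;
}

/* Small deterministic pseudo random generator (xorshift32). */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Compare both implementations on n random pairs; returns the number of
 * mismatches. Operands are limited to 1..65536 so that the subtraction
 * loop stays short and never sees a zero operand. */
unsigned count_mismatches(unsigned n, uint32_t seed)
{
    assert(seed != 0); /* xorshift32 must not start from zero */
    unsigned mismatches = 0;
    for (unsigned i = 0; i < n; i++) {
        uint32_t a = (xorshift32(&seed) & 0xFFFFu) + 1u;
        uint32_t b = (xorshift32(&seed) & 0xFFFFu) + 1u;
        if (gcd_sub(a, b) != gcd_mod(a, b)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

A call like `count_mismatches(1000, 1)` should report zero mismatches. As with the SpinalHDL testbench, this only builds confidence; it does not cover edge cases such as zero operands.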
## 5. GCD Murax Integration
Now we have a standalone module that we want to integrate into the Murax SoC.
Since the Murax uses the APB bus for its peripherals, our module needs to map its IO signals into the memory mapped space of the APB bus.
```scala
// in Apb3GCDCtrl.scala
object Apb3GCDCtrl {
  def getApb3Config = Apb3Config(
    addressWidth = 5,
    dataWidth = 32,
    selWidth = 1,
    useSlaveError = false
  )
}
class Apb3GCDCtrl(apb3Config : Apb3Config) extends Component {
  val io = new Bundle {
    val apb = slave(Apb3(Apb3GCDCtrl.getApb3Config))
  }
  val gcdCtrl = new GCDTop()
  val apbCtrl = Apb3SlaveFactory(io.apb)
  apbCtrl.driveAndRead(gcdCtrl.io.a, address=0)
  apbCtrl.driveAndRead(gcdCtrl.io.b, address=4)
  val resSyncBuf = RegNextWhen(gcdCtrl.io.res, gcdCtrl.io.ready)
  apbCtrl.read(resSyncBuf, address=8)
  apbCtrl.onRead(8)(resSyncBuf := 0)
  apbCtrl.onRead(8)(rdySyncBuf := False)
  val rdySyncBuf = RegNextWhen(gcdCtrl.io.ready, gcdCtrl.io.ready)
  apbCtrl.read(rdySyncBuf, address=12)
  gcdCtrl.io.valid := apbCtrl.setOnSet(RegNext(False) init(False), address=16, 0)
}
```
Looking at the other peripherals in the Murax, we get an idea of how to implement our own Apb3 mapping (this is also part of the SpinalHDL Workshop).
The component uses the APB3 bus as a slave peripheral.
In line 14 we create an instance of our GCD module; in line 15 we create an [APB3 Slave Factory](https://spinalhdl.github.io/SpinalDoc-RTD/master/SpinalHDL/Libraries/bus_slave_factory.html) for the APB bus connection of the component.
This factory lets us add memory mapped registers very easily and creates all the logic needed to interconnect them with our module properly.
Registers that can be read and written are shown in lines 16 and 17 (`driveAndRead()`).
We pass the signal we want to be buffered through that register and an address.
Our result is [buffered with a `RegNextWhen`](https://spinalhdl.github.io/SpinalDoc-RTD/master/SpinalHDL/Sequential%20logic/registers.html#instantiation) (which buffers the first argument `gcdCtrl.io.res` based on the enable signal that is the second argument `gcdCtrl.io.ready`).
We need this because our result is visible for the single clock cycle in which the ready signal is asserted by the control path.
We do something similar with the ready signal, and keep it buffered for longer than just one clock cycle (since we don't know when the software will check these registers).
The result and ready registers will be read-only (`read()`) on their respective addresses.
If the result is read (even if ready was not checked), we flush both registers, as if we had fetched the result and didn't need it anymore.
The valid signal shouldn't be asserted for longer than one clock cycle; this is achieved in line 24.
We use a register that falls back to 0/false on every clock cycle unless a 1 is written to it.
So if we write a 1/true into it, it is set to 0/false again after one cycle.
| Address | Name | Description | Mode |
|---------|-------|----------------------------------------------|----------------------------------|
| 0 | a | Operand A of the GCD(a,b) | R/W |
| 4 | b | Operand B of the GCD(a,b) | R/W |
| 8 | res | Result of GCD(a,b) | RO, clears res and ready on read |
| 12 | ready | Ready, 1 if result available, 0 otherwise | RO |
| 16 | valid | Valid, write 1 to start calculating GCD(a,b) | WO, clear after write |
In this way we implemented this memory mapped register bank with various modes.
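The access modes in the table can be summarized in a small behavioral model in C. This is a sketch of the intended register semantics as described above, not the logic that the slave factory actually generates. All names (`GcdRegs`, `regs_write`, `regs_read`, `regs_core_done`, `regs_tick`) are hypothetical, and returning 0 when reading the write-only valid register is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { REG_A = 0, REG_B = 4, REG_RES = 8, REG_READY = 12, REG_VALID = 16 };

typedef struct {
    uint32_t a, b;       /* R/W operand registers */
    uint32_t res;        /* buffered result */
    bool     ready;      /* buffered ready flag */
    bool     valid;      /* one-cycle start pulse */
} GcdRegs;

/* Bus write: only a, b and valid react to writes. */
void regs_write(GcdRegs *r, uint32_t addr, uint32_t data)
{
    assert(r != 0);
    switch (addr) {
    case REG_A:     r->a = data; break;
    case REG_B:     r->b = data; break;
    case REG_VALID: if (data & 1u) { r->valid = true; } break;
    default:        break;   /* res and ready are read-only */
    }
}

/* Bus read: reading res consumes the result and clears ready. */
uint32_t regs_read(GcdRegs *r, uint32_t addr)
{
    assert(r != 0);
    switch (addr) {
    case REG_A:     return r->a;
    case REG_B:     return r->b;
    case REG_READY: return r->ready ? 1u : 0u;
    case REG_RES: {
        uint32_t value = r->res;
        r->res = 0;
        r->ready = false;
        return value;
    }
    default:        return 0;    /* assumption: write-only reads as 0 */
    }
}

/* The GCD core finished: capture result and ready in the buffers. */
void regs_core_done(GcdRegs *r, uint32_t result)
{
    r->res = result;
    r->ready = true;
}

/* One clock tick: returns the valid pulse seen by the core, then clears it. */
bool regs_tick(GcdRegs *r)
{
    bool pulse = r->valid;
    r->valid = false;
    return pulse;
}
```

A typical sequence would be: write `a` and `b`, write 1 to valid, let the core finish, poll ready until it reads 1, then read res, which hands out the result once and resets both buffers.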
Now all that's left is to attach our module to the APB bus of the Murax SoC and write some bare metal firmware to access it.
We created our modules inside the VexRiscv structure as follows:
```
src/main/scala/
├── spinal
└── vexriscv
    ├── demo
    ├── ip
    ├── periph  <--- we add this directory with subdir
    │   └── gcd
    │       ├── Apb3GCDCtrl.scala
    │       ├── GCDCtrl.scala
    │       ├── GCDData.scala
    │       ├── GCDTop.scala
    │       └── GCDTopSim.scala
    ├── plugin
    └── test
```
To integrate our `Apb3GCDCtrl` peripheral into the Murax we need to modify the Murax SoC (`src/main/scala/vexriscv/demo/Murax.scala`) directly.
Deep in the source there will be a comment designating the start of the APB peripherals (`//******** APB peripherals *********`).
There we are going to add our peripheral and designate some memory mapped space to it.
This step is straightforward as we can add the peripheral similarly to the existing ones.
After the code for the timer `MuraxApb3Timer` module we add our GCD peripheral:
```scala
val gcd = new Apb3GCDCtrl(
  apb3Config = Apb3Config(
    addressWidth = 20,
    dataWidth = 32
  )
)
apbMapping += gcd.io.apb -> (0x30000, 1 kB)
```
And that's it!
The Murax SoC now supports our own GCD peripheral.
All that's left now is to use the peripheral in a piece of software.
## 6. Software Driver Integration
We start off the software part with the existing `hello_world` example and copy it into a new directory `gcd_world`.
Since we support a new peripheral in hardware, we should also support it in software (the registers are already accessible, but we make them more convenient for the developer).
We add a new file in the `gcd_world/src` directory called `gcd.h`.
```c
// in gcd.h
#ifndef GCD_H_
#define GCD_H_
#include <stdint.h>
typedef struct
{
  volatile uint32_t A;
  volatile uint32_t B;
  volatile uint32_t RES;
  volatile uint32_t READY;
  volatile uint32_t VALID;
} Gcd_Reg;
#endif /* GCD_H_ */
```
With that we define the available memory mapped registers starting from the base address of the peripheral.
We then edit the `murax.h` header file in the same directory:
```c
#ifndef __MURAX_H__
#define __MURAX_H__
#include "timer.h"
#include "prescaler.h"
#include "interrupt.h"
#include "gpio.h"
#include "uart.h"
#include "gcd.h"
#define GPIO_A ((Gpio_Reg*)(0xF0000000))
#define TIMER_PRESCALER ((Prescaler_Reg*)0xF0020000)
#define TIMER_INTERRUPT ((InterruptCtrl_Reg*)0xF0020010)
#define TIMER_A ((Timer_Reg*)0xF0020040)
#define TIMER_B ((Timer_Reg*)0xF0020050)
#define UART ((Uart_Reg*)(0xF0010000))
#define GCD ((Gcd_Reg*)(0xF0030000))
#endif /* __MURAX_H__ */
```
Our addition is the line `#define GCD ((Gcd_Reg*)(0xF0030000))`.
With that we create a way of accessing the memory mapped registers without directly referring to the peripheral's address (`0xF0030000`) or having to calculate offsets for the registers.
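To double check that the struct layout and the address really match the register table, a few compile-time checks can help. This is a sketch under assumptions: the constant names `APB_BASE` and `GCD_OFFSET` and the helper `gcd_reg_address` are our own, and the base `0xF0000000` is inferred from the `GPIO_A` definition in `murax.h` rather than taken from the SoC sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    volatile uint32_t A;
    volatile uint32_t B;
    volatile uint32_t RES;
    volatile uint32_t READY;
    volatile uint32_t VALID;
} Gcd_Reg;

/* Assumed values: base of the APB peripherals (as implied by GPIO_A) and
 * the offset that was passed to apbMapping for the GCD peripheral. */
#define APB_BASE   0xF0000000u
#define GCD_OFFSET 0x00030000u

/* Compile-time checks: the struct must mirror the register table. */
_Static_assert(offsetof(Gcd_Reg, A)     == 0,  "a is at address 0");
_Static_assert(offsetof(Gcd_Reg, B)     == 4,  "b is at address 4");
_Static_assert(offsetof(Gcd_Reg, RES)   == 8,  "res is at address 8");
_Static_assert(offsetof(Gcd_Reg, READY) == 12, "ready is at address 12");
_Static_assert(offsetof(Gcd_Reg, VALID) == 16, "valid is at address 16");

/* Absolute bus address of a GCD register, given its byte offset. */
uint32_t gcd_reg_address(size_t reg_offset)
{
    assert(reg_offset % 4 == 0 && reg_offset < sizeof(Gcd_Reg));
    return APB_BASE + GCD_OFFSET + (uint32_t)reg_offset;
}
```

With these values, `gcd_reg_address(offsetof(Gcd_Reg, A))` yields `0xF0030000`, the same address that the `GCD` macro points to, and the valid register ends up at `0xF0030010`.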
Now we can start writing our software!
In our `main.c` we add a function to make the peripheral handling a bit more convenient:
```c
uint32_t gcd(uint32_t a, uint32_t b){
  GCD->A = a;
  GCD->B = b;
  GCD->VALID = 0x00000001;
  uint32_t rdyFlag = 0;
  do{
    rdyFlag = GCD->READY;
  }while(!rdyFlag);
  return GCD->RES;
}
```
This function takes the parameters `a` and `b` and writes them to the respective hardware registers `A` and `B` of our peripheral.
Then the `VALID` signal is set (our Apb3 wrapper takes care of setting it back to 0).
All that's left is waiting for the result, which is done by polling the ready flag until it is set and then returning our result value `RES`.
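One property of the polling loop is worth keeping in mind: with the subtraction-based algorithm, an operand of 0 never reaches the equal case, so the ready flag would never be set and the loop would spin forever. A possible variation, shown here only as a sketch, bounds the number of polls. The function name `gcd_try`, its parameters and the idea of passing the register block as a pointer (which also allows testing on a host with a plain struct in RAM) are our own assumptions and not part of the tutorial sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    volatile uint32_t A;
    volatile uint32_t B;
    volatile uint32_t RES;
    volatile uint32_t READY;
    volatile uint32_t VALID;
} Gcd_Reg;

/* Start a calculation and poll READY at most max_polls times.
 * Returns true and stores the result on success, false on timeout. */
bool gcd_try(Gcd_Reg *dev, uint32_t a, uint32_t b,
             uint32_t max_polls, uint32_t *result)
{
    assert(dev != 0 && result != 0);
    dev->A = a;
    dev->B = b;
    dev->VALID = 0x00000001;      /* the wrapper clears this again */
    for (uint32_t i = 0; i < max_polls; i++) {
        if (dev->READY) {
            *result = dev->RES;   /* reading RES also clears READY */
            return true;
        }
    }
    return false;                 /* hardware did not finish in time */
}
```

On the Murax this could be called as `gcd_try(GCD, a, b, 100000, &res)`, where the poll limit is an arbitrary example value that would need tuning for real operands.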
The software contains a little more code for formatting numbers to print them onto the UART device but reading and understanding that is left as an exercise to the reader.
So how do we execute our software on the Murax now?
First we compile the software with the Makefile. For that, call `make` inside `src/main/c/murax/gcd_world`.
You should get some minor warnings and a memory usage summary like
```
Memory region Used Size Region Size %age Used
RAM: 1752 B 2 KB 85.55%
```
Now we can edit the `Murax.scala` one last time before we execute our simulation.
For this scroll down in the `Murax.scala` file until `MuraxWithRamInit`.
In order to load the memory with our new software instead of the `hello_world` example we edit this part.
```scala
object MuraxWithRamInit {
  def main(args: Array[String]): Unit = {
    SpinalVerilog(
      Murax(
        MuraxConfig.default.copy(
          onChipRamSize = 4 kB,
          onChipRamHexFile = "src/main/c/murax/gcd_world/build/gcd_world.hex"
        )
      )
    )
  }
}
```
Then in the root directory we open `sbt` and call `runMain vexriscv.demo.MuraxWithRamInit` or we call `sbt "runMain vexriscv.demo.MuraxWithRamInit"` directly.
This will call SpinalHDL to generate the modified Murax SoC with our small software example.
The last thing we need to do is call the simulation.
For that navigate to `src/test/cpp/murax` and call `make clean run`.
After some time you should see the following output in the terminal:
```
...
BOOT
hello gcd world
gcd(1,123913):
1
gcd(461952,116298):
18
gcd(461952,1162):
2
gcd(461952,11623):
1
```
Keep in mind that we are simulating an SoC. There is no shutdown for our simulation, so we have to stop it ourselves by pressing `CTRL+C`!
Otherwise the simulation won't stop.
## 7. Conclusion
In this tutorial we described how to convert pseudocode for the GCD calculation into SpinalHDL based hardware. Furthermore, the hardware was integrated into the VexRiscv based Murax SoC.
To demonstrate the usage, an example C project was set up and the hardware peripheral was used from within the software.
This tutorial covered the translation from RTL into SpinalHDL, writing a small wrapper for the APB3 bus used in the Murax SoC, integrating the peripheral into the Murax SoC with designated memory mapped space, and writing software in C for the Murax SoC that uses the hardware peripheral to calculate the GCD and print it on the UART of the Murax SoC.
Now there are a few open challenges to approach as an exercise. Here are two that follow up naturally on our existing code:
* The Murax SoC features interrupts, so we could stop polling our ready flag and trigger an interrupt from the `Apb3GCDCtrl` instead.
* Write the same algorithm in C and compare it with the hardware peripheral. Is it faster? Is it smaller? (Interacting with the peripheral in software still costs instructions, and thus memory.)


View File

@ -0,0 +1,134 @@
PROJ_NAME=gcd_world
DEBUG=no
BENCH=no
MULDIV=no
SRCS = $(wildcard src/*.c) \
$(wildcard src/*.cpp) \
$(wildcard src/*.S)
OBJDIR = build
INC =
LIBS =
LIBSINC = -L$(OBJDIR)
LDSCRIPT = ./src/linker.ld
#include ../../../resources/gcc.mk
# Set it to yes if you are using the sifive precompiled GCC pack
SIFIVE_GCC_PACK ?= no
ifeq ($(SIFIVE_GCC_PACK),yes)
RISCV_NAME ?= riscv64-unknown-elf
RISCV_PATH ?= /home/sallar/tools/riscv-64-newlib-dist/
else
RISCV_NAME ?= riscv32-unknown-elf
ifeq ($(MULDIV),yes)
RISCV_PATH ?= /home/sallar/tools/riscv-32-imac-ilp32-newlib-dist/
else
RISCV_PATH ?= /home/sallar/tools/rv32i-ilp32-dist/
endif
endif
MABI=ilp32
MARCH := rv32i
ifeq ($(MULDIV),yes)
MARCH := $(MARCH)m
endif
ifeq ($(COMPRESSED),yes)
MARCH := $(MARCH)ac
endif
CFLAGS += -march=$(MARCH) -mabi=$(MABI) -DNDEBUG
LDFLAGS += -march=$(MARCH) -mabi=$(MABI)
#include ../../../resources/subproject.mk
ifeq ($(DEBUG),yes)
CFLAGS += -g3 -O0
endif
ifeq ($(DEBUG),no)
CFLAGS += -g -Os
endif
ifeq ($(BENCH),yes)
CFLAGS += -fno-inline
endif
ifeq ($(SIFIVE_GCC_PACK),yes)
RISCV_CLIB=$(RISCV_PATH)/$(RISCV_NAME)/lib/$(MARCH)/$(MABI)/
else
RISCV_CLIB=$(RISCV_PATH)/$(RISCV_NAME)/lib/
endif
RISCV_OBJCOPY = $(RISCV_PATH)/bin/$(RISCV_NAME)-objcopy
RISCV_OBJDUMP = $(RISCV_PATH)/bin/$(RISCV_NAME)-objdump
RISCV_CC=$(RISCV_PATH)/bin/$(RISCV_NAME)-gcc
CFLAGS += -MD -fstrict-volatile-bitfields -fno-strict-aliasing
LDFLAGS += -nostdlib -lgcc -mcmodel=medany -nostartfiles -ffreestanding -Wl,-Bstatic,-T,$(LDSCRIPT),-Map,$(OBJDIR)/$(PROJ_NAME).map,--print-memory-usage
#LDFLAGS += -lgcc -lc -lg -nostdlib -lgcc -msave-restore --strip-debug,
OBJS := $(SRCS)
OBJS := $(OBJS:.c=.o)
OBJS := $(OBJS:.cpp=.o)
OBJS := $(OBJS:.S=.o)
OBJS := $(OBJS:..=miaou)
OBJS := $(addprefix $(OBJDIR)/,$(OBJS))
all: $(OBJDIR)/$(PROJ_NAME).elf $(OBJDIR)/$(PROJ_NAME).hex $(OBJDIR)/$(PROJ_NAME).asm $(OBJDIR)/$(PROJ_NAME).v
$(OBJDIR)/%.elf: $(OBJS) | $(OBJDIR)
	$(RISCV_CC) $(CFLAGS) -o $@ $^ $(LDFLAGS) $(LIBSINC) $(LIBS)
%.hex: %.elf
	$(RISCV_OBJCOPY) -O ihex $^ $@
%.bin: %.elf
	$(RISCV_OBJCOPY) -O binary $^ $@
%.v: %.elf
	$(RISCV_OBJCOPY) -O verilog $^ $@
%.asm: %.elf
	$(RISCV_OBJDUMP) -S -d $^ > $@
$(OBJDIR)/%.o: %.c
	mkdir -p $(dir $@)
	$(RISCV_CC) -c $(CFLAGS) $(INC) -o $@ $^
	$(RISCV_CC) -S $(CFLAGS) $(INC) -o $@.disasm $^
$(OBJDIR)/%.o: %.cpp
	mkdir -p $(dir $@)
	$(RISCV_CC) -c $(CFLAGS) $(INC) -o $@ $^
$(OBJDIR)/%.o: %.S
	mkdir -p $(dir $@)
	$(RISCV_CC) -c $(CFLAGS) -o $@ $^ -D__ASSEMBLY__=1
$(OBJDIR):
	mkdir -p $@
.PHONY: clean
clean:
	rm -rf $(OBJDIR)/src
	rm -f $(OBJDIR)/$(PROJ_NAME).elf
	rm -f $(OBJDIR)/$(PROJ_NAME).hex
	rm -f $(OBJDIR)/$(PROJ_NAME).map
	rm -f $(OBJDIR)/$(PROJ_NAME).v
	rm -f $(OBJDIR)/$(PROJ_NAME).asm
	find $(OBJDIR) -type f -name '*.o' -print0 | xargs -0 -r rm
	find $(OBJDIR) -type f -name '*.d' -print0 | xargs -0 -r rm
clean-all : clean
.SECONDARY: $(OBJS)

View File

@ -0,0 +1 @@
sbt.version=1.4.9

View File

@ -0,0 +1,98 @@
.global crtStart
.global main
.global irqCallback
.section .start_jump,"ax",@progbits
crtStart:
//long jump to allow crtInit to be anywhere
//do it always in 12 bytes
lui x2, %hi(crtInit)
addi x2, x2, %lo(crtInit)
jalr x1,x2
nop
.section .text
.global trap_entry
.align 5
trap_entry:
sw x1, - 1*4(sp)
sw x5, - 2*4(sp)
sw x6, - 3*4(sp)
sw x7, - 4*4(sp)
sw x10, - 5*4(sp)
sw x11, - 6*4(sp)
sw x12, - 7*4(sp)
sw x13, - 8*4(sp)
sw x14, - 9*4(sp)
sw x15, -10*4(sp)
sw x16, -11*4(sp)
sw x17, -12*4(sp)
sw x28, -13*4(sp)
sw x29, -14*4(sp)
sw x30, -15*4(sp)
sw x31, -16*4(sp)
addi sp,sp,-16*4
call irqCallback
lw x1 , 15*4(sp)
lw x5, 14*4(sp)
lw x6, 13*4(sp)
lw x7, 12*4(sp)
lw x10, 11*4(sp)
lw x11, 10*4(sp)
lw x12, 9*4(sp)
lw x13, 8*4(sp)
lw x14, 7*4(sp)
lw x15, 6*4(sp)
lw x16, 5*4(sp)
lw x17, 4*4(sp)
lw x28, 3*4(sp)
lw x29, 2*4(sp)
lw x30, 1*4(sp)
lw x31, 0*4(sp)
addi sp,sp,16*4
mret
.text
crtInit:
.option push
.option norelax
la gp, __global_pointer$
.option pop
la sp, _stack_start
bss_init:
la a0, _bss_start
la a1, _bss_end
bss_loop:
beq a0,a1,bss_done
sw zero,0(a0)
add a0,a0,4
j bss_loop
bss_done:
ctors_init:
la a0, _ctors_start
addi sp,sp,-4
ctors_loop:
la a1, _ctors_end
beq a0,a1,ctors_done
lw a3,0(a0)
add a0,a0,4
sw a0,0(sp)
jalr a3
lw a0,0(sp)
j ctors_loop
ctors_done:
addi sp,sp,4
li a0, 0x880 //880 enable timer + external interrupts
csrw mie,a0
li a0, 0x1808 //1808 enable interrupts
csrw mstatus,a0
call main
infinitLoop:
j infinitLoop
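
The two constants written to `mie` and `mstatus` at the end of the startup code can be rebuilt from individual bit fields. The following is a minimal sketch, not part of the commit: the macro and function names are hypothetical, and the bit positions are taken from the RISC-V privileged specification for machine mode.

```c
#include <stdint.h>

/* Bit positions from the RISC-V privileged specification (machine mode).
   The macro names are illustrative; they do not appear in the commit. */
#define MSTATUS_MIE   (1u << 3)   /* global machine interrupt enable    */
#define MSTATUS_MPP_M (3u << 11)  /* previous privilege mode = machine  */
#define MIE_MTIE      (1u << 7)   /* machine timer interrupt enable     */
#define MIE_MEIE      (1u << 11)  /* machine external interrupt enable  */

/* Value the startup code writes to the mie CSR. */
uint32_t crt_mie_value(void) {
    return MIE_MTIE | MIE_MEIE;
}

/* Value the startup code writes to the mstatus CSR. */
uint32_t crt_mstatus_value(void) {
    return MSTATUS_MPP_M | MSTATUS_MIE;
}
```

Under these assumptions `crt_mie_value()` matches the literal `0x880` and `crt_mstatus_value()` matches `0x1808` used in the assembly.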

View File

@ -0,0 +1,13 @@
#ifndef GCD_H_
#define GCD_H_
#include <stdint.h> /* uint32_t */
typedef struct
{
volatile uint32_t A;
volatile uint32_t B;
volatile uint32_t RES;
volatile uint32_t READY;
volatile uint32_t VALID;
} Gcd_Reg;
#endif /* GCD_H_ */
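
The field order of `Gcd_Reg` has to line up with the byte addresses (0, 4, 8, 12, 16) that the APB controller assigns to `a`, `b`, `res`, `ready` and `valid`. A small host-side check along these lines can catch a mismatch early. This is a sketch only: the struct is repeated so the block is self-contained, and `gcd_layout_matches_apb` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy of the register struct from gcd.h, repeated to keep the sketch self-contained. */
typedef struct {
    volatile uint32_t A;
    volatile uint32_t B;
    volatile uint32_t RES;
    volatile uint32_t READY;
    volatile uint32_t VALID;
} Gcd_Reg;

/* Returns 1 when every field sits at the byte offset used on the APB side. */
int gcd_layout_matches_apb(void) {
    return offsetof(Gcd_Reg, A)     == 0  &&
           offsetof(Gcd_Reg, B)     == 4  &&
           offsetof(Gcd_Reg, RES)   == 8  &&
           offsetof(Gcd_Reg, READY) == 12 &&
           offsetof(Gcd_Reg, VALID) == 16 &&
           sizeof(Gcd_Reg)          == 20;
}
```

If a register is added or reordered in the hardware description, the same change has to be mirrored in the struct, otherwise the software silently accesses the wrong register.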

View File

@ -0,0 +1,15 @@
#ifndef GPIO_H_
#define GPIO_H_
#include <stdint.h> /* uint32_t */
typedef struct
{
volatile uint32_t INPUT;
volatile uint32_t OUTPUT;
volatile uint32_t OUTPUT_ENABLE;
} Gpio_Reg;
#endif /* GPIO_H_ */

View File

@ -0,0 +1,17 @@
#ifndef INTERRUPTCTRL_H_
#define INTERRUPTCTRL_H_
#include <stdint.h>
typedef struct
{
volatile uint32_t PENDINGS;
volatile uint32_t MASKS;
} InterruptCtrl_Reg;
static void interruptCtrl_init(InterruptCtrl_Reg* reg){
reg->MASKS = 0;
reg->PENDINGS = 0xFFFFFFFF;
}
#endif /* INTERRUPTCTRL_H_ */

View File

@ -0,0 +1,110 @@
/*
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
*/
OUTPUT_FORMAT("elf32-littleriscv", "elf32-littleriscv", "elf32-littleriscv")
OUTPUT_ARCH(riscv)
ENTRY(crtStart)
MEMORY {
RAM (rwx): ORIGIN = 0x80000000, LENGTH = 2k
}
_stack_size = DEFINED(_stack_size) ? _stack_size : 256;
_heap_size = DEFINED(_heap_size) ? _heap_size : 0;
SECTIONS {
._vector ORIGIN(RAM): {
*crt.o(.start_jump);
*crt.o(.text);
} > RAM
._user_heap (NOLOAD):
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
PROVIDE ( _heap_start = .);
. = . + _heap_size;
. = ALIGN(8);
PROVIDE ( _heap_end = .);
} > RAM
._stack (NOLOAD):
{
. = ALIGN(16);
PROVIDE (_stack_end = .);
. = . + _stack_size;
. = ALIGN(16);
PROVIDE (_stack_start = .);
} > RAM
.data :
{
*(.rdata)
*(.rodata .rodata.*)
*(.gnu.linkonce.r.*)
*(.data .data.*)
*(.gnu.linkonce.d.*)
. = ALIGN(8);
PROVIDE( __global_pointer$ = . + 0x800 );
*(.sdata .sdata.*)
*(.gnu.linkonce.s.*)
. = ALIGN(8);
*(.srodata.cst16)
*(.srodata.cst8)
*(.srodata.cst4)
*(.srodata.cst2)
*(.srodata .srodata.*)
} > RAM
.bss (NOLOAD) : {
. = ALIGN(4);
/* This is used by the startup in order to initialize the .bss section */
_bss_start = .;
*(.sbss*)
*(.gnu.linkonce.sb.*)
*(.bss .bss.*)
*(.gnu.linkonce.b.*)
*(COMMON)
. = ALIGN(4);
_bss_end = .;
} > RAM
.rodata :
{
*(.rdata)
*(.rodata .rodata.*)
*(.gnu.linkonce.r.*)
} > RAM
.noinit (NOLOAD) : {
. = ALIGN(4);
*(.noinit .noinit.*)
. = ALIGN(4);
} > RAM
.memory : {
*(.text);
end = .;
} > RAM
.ctors :
{
. = ALIGN(4);
_ctors_start = .;
KEEP(*(.init_array*))
KEEP (*(SORT(.ctors.*)))
KEEP (*(.ctors))
. = ALIGN(4);
_ctors_end = .;
PROVIDE ( END_OF_SW_IMAGE = . );
} > RAM
}

View File

@ -0,0 +1,62 @@
//#include "stddefs.h"
#include <stdint.h>
#include "murax.h"
#include "main.h"
#define DEBUG 0
uint32_t gcd(uint32_t a, uint32_t b){
GCD->A = a;
GCD->B = b;
GCD->VALID = 0x00000001;
uint32_t rdyFlag = 0;
do{
rdyFlag = GCD->READY;
}while(!rdyFlag);
return GCD->RES;
}
void calcPrintGCD(uint32_t a, uint32_t b){
uint32_t myGCD = 0;
// itoa() takes an int: up to 10 digits, an optional '-' and the terminating NUL
char buf[12] = { 0x00 };
char aBuf[12] = { 0x00 };
char bBuf[12] = { 0x00 };
itoa(a, aBuf, 10);
itoa(b, bBuf, 10);
print("gcd(");print(aBuf);print(",");print(bBuf);println("):");
myGCD = gcd(a,b);
itoa(myGCD, buf, 10);
println(buf);
}
void main() {
GPIO_A->OUTPUT_ENABLE = 0x0000000F;
GPIO_A->OUTPUT = 0x00000001;
println("hello gcd world");
const int nleds = 4;
const int nloops = 2000000;
GCD->VALID = 0x00000000;
while(GCD->READY);
calcPrintGCD(1, 123913);
calcPrintGCD(461952, 116298);
calcPrintGCD(461952, 116298);
calcPrintGCD(461952, 116298);
while(1){
for(unsigned int i=0;i<nleds-1;i++){
GPIO_A->OUTPUT = 1<<i;
delay(nloops);
}
for(unsigned int i=0;i<nleds-1;i++){
GPIO_A->OUTPUT = (1<<(nleds-1))>>i;
delay(nloops);
}
}
}
void irqCallback(){
}
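
To cross-check the values printed by `calcPrintGCD`, a pure software reference can be run on the host. The sketch below uses the same subtract-based Euclid steps as the peripheral. `gcd_ref` is a hypothetical name, and the function assumes both operands are non-zero, since a zero operand would keep the loop running forever.

```c
#include <stdint.h>

/* Subtract-based Euclid: repeatedly subtract the smaller value from the larger.
   Assumes a != 0 and b != 0; with a zero operand the loop would not terminate. */
uint32_t gcd_ref(uint32_t a, uint32_t b) {
    while (a != b) {
        if (a > b) {
            a -= b;
        } else {
            b -= a;
        }
    }
    return a;
}
```

For the operand pairs used in `main`, this reference yields 1 for (1, 123913) and 18 for (461952, 116298).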

View File

@ -0,0 +1,78 @@
//----------------------------
// integer to ascii (itoa) with util functions
//----------------------------
// function to swap two numbers
void swap(char *x, char *y) {
char t = *x; *x = *y; *y = t;
}
// function to reverse buffer[i..j]
char* reverse(char *buffer, int i, int j) {
while (i < j)
swap(&buffer[i++], &buffer[j--]);
return buffer;
}
// Iterative function to implement itoa() function in C
char* itoa(int value, char* buffer, int base) {
// invalid input
if (base < 2 || base > 32)
return buffer;
// consider absolute value of number
int n = (value < 0) ? -value : value;
int i = 0;
while (n) {
int r = n % base;
if (r >= 10)
buffer[i++] = 65 + (r - 10);
else
buffer[i++] = 48 + r;
n = n / base;
}
// if number is 0
if (i == 0)
buffer[i++] = '0';
// If base is 10 and value is negative, the resulting string
// is preceded with a minus sign (-)
// With any other base, value is always considered unsigned
if (value < 0 && base == 10)
buffer[i++] = '-';
buffer[i] = '\0'; // null terminate string
// reverse the string and return it
return reverse(buffer, 0, i - 1);
}
//----------------------------
// print, println, dbgprint
//----------------------------
void print(const char*str){
while(*str){
uart_write(UART,*str);
str++;
}
}
void println(const char*str){
print(str);
uart_write(UART,'\n');
}
void dbgPrintln(const char*str){
#if DEBUG == 1
println(str);
#else
(void)str; // debug output disabled; mark the parameter as intentionally unused
#endif
}
void delay(uint32_t loops){
for(uint32_t i=0;i<loops;i++){
int tmp = GPIO_A->OUTPUT;
}
}
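
The `itoa` helper above takes a signed `int`, so register values above `INT_MAX` would be printed as negative numbers. One possible alternative for the 32-bit GCD registers is an unsigned conversion. This is a sketch under that assumption, with the hypothetical name `u32_to_dec`; it is not part of the commit.

```c
#include <stdint.h>

/* Hypothetical helper: writes v as decimal digits followed by a NUL byte.
   buf must hold at least 11 bytes (10 digits for 4294967295 plus the NUL). */
char *u32_to_dec(uint32_t v, char *buf) {
    char tmp[10];
    int n = 0;

    /* Produce the digits in reverse order; the do/while handles v == 0. */
    do {
        tmp[n++] = (char)('0' + (v % 10u));
        v /= 10u;
    } while (v != 0u);

    /* Copy them back in reading order and terminate the string. */
    int i = 0;
    while (n > 0) {
        buf[i++] = tmp[--n];
    }
    buf[i] = '\0';
    return buf;
}
```

Sizing the buffer from the widest possible value (10 digits plus the terminator) avoids overruns regardless of which operands are passed in.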

View File

@ -0,0 +1,20 @@
#ifndef __MURAX_H__
#define __MURAX_H__
#include "timer.h"
#include "prescaler.h"
#include "interrupt.h"
#include "gpio.h"
#include "uart.h"
#include "gcd.h"
#define GPIO_A ((Gpio_Reg*)(0xF0000000))
#define TIMER_PRESCALER ((Prescaler_Reg*)0xF0020000)
#define TIMER_INTERRUPT ((InterruptCtrl_Reg*)0xF0020010)
#define TIMER_A ((Timer_Reg*)0xF0020040)
#define TIMER_B ((Timer_Reg*)0xF0020050)
#define UART ((Uart_Reg*)(0xF0010000))
#define GCD ((Gcd_Reg*)(0xF0030000))
#endif /* __MURAX_H__ */
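
The peripheral base addresses in this header follow from the SoC description: the APB bridge is mapped at `0xF0000000` on the main bus and each peripheral sits at an offset inside that window (for example `0x30000` for the GCD peripheral). A minimal sketch of that arithmetic, with the hypothetical names `APB_BASE` and `apb_address`:

```c
#include <stdint.h>

/* Main-bus address of the APB bridge window (0xf0000000L in the Murax mapping). */
#define APB_BASE 0xF0000000u

/* CPU-visible address of a register located at 'offset' inside the APB window. */
uint32_t apb_address(uint32_t offset) {
    return APB_BASE + offset;
}
```

With this helper, `apb_address(0x30000)` gives the `GCD` base and `apb_address(0x20000 + 0x40)` gives the `TIMER_A` address defined above.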

View File

@ -0,0 +1,16 @@
#ifndef PRESCALERCTRL_H_
#define PRESCALERCTRL_H_
#include <stdint.h>
typedef struct
{
volatile uint32_t LIMIT;
} Prescaler_Reg;
static void prescaler_init(Prescaler_Reg* reg){
}
#endif /* PRESCALERCTRL_H_ */

View File

@ -0,0 +1,20 @@
#ifndef TIMERCTRL_H_
#define TIMERCTRL_H_
#include <stdint.h>
typedef struct
{
volatile uint32_t CLEARS_TICKS;
volatile uint32_t LIMIT;
volatile uint32_t VALUE;
} Timer_Reg;
static void timer_init(Timer_Reg *reg){
reg->CLEARS_TICKS = 0;
reg->VALUE = 0;
}
#endif /* TIMERCTRL_H_ */

View File

@ -0,0 +1,42 @@
#ifndef UART_H_
#define UART_H_
#include <stdint.h> /* uint32_t */
typedef struct
{
volatile uint32_t DATA;
volatile uint32_t STATUS;
volatile uint32_t CLOCK_DIVIDER;
volatile uint32_t FRAME_CONFIG;
} Uart_Reg;
enum UartParity {NONE = 0,EVEN = 1,ODD = 2};
enum UartStop {ONE = 0,TWO = 1};
typedef struct {
uint32_t dataLength;
enum UartParity parity;
enum UartStop stop;
uint32_t clockDivider;
} Uart_Config;
static uint32_t uart_writeAvailability(Uart_Reg *reg){
return (reg->STATUS >> 16) & 0xFF;
}
static uint32_t uart_readOccupancy(Uart_Reg *reg){
return reg->STATUS >> 24;
}
static void uart_write(Uart_Reg *reg, uint32_t data){
while(uart_writeAvailability(reg) == 0);
reg->DATA = data;
}
static void uart_applyConfig(Uart_Reg *reg, Uart_Config *config){
reg->CLOCK_DIVIDER = config->clockDivider;
reg->FRAME_CONFIG = ((config->dataLength-1) << 0) | (config->parity << 8) | (config->stop << 16);
}
#endif /* UART_H_ */
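
The bit packing done by `uart_applyConfig` and the two status accessors can be tried on the host without hardware. The sketch below repeats the same shifts and masks on plain integers; the function names are hypothetical and the sample status word in the usage note is made up for illustration.

```c
#include <stdint.h>

/* Same packing as uart_applyConfig: (dataLength - 1) at bit 0,
   parity shifted to bit 8, stop shifted to bit 16. */
uint32_t uart_frame_word(uint32_t dataLength, uint32_t parity, uint32_t stop) {
    return ((dataLength - 1u) << 0) | (parity << 8) | (stop << 16);
}

/* Same extraction as uart_writeAvailability: bits 23..16 of STATUS. */
uint32_t uart_status_write_availability(uint32_t status) {
    return (status >> 16) & 0xFFu;
}

/* Same extraction as uart_readOccupancy: bits 31..24 of STATUS. */
uint32_t uart_status_read_occupancy(uint32_t status) {
    return status >> 24;
}
```

For 8 data bits, no parity and one stop bit the frame word is 7. For an invented status word of `0x03100000`, the write availability is `0x10` and the read occupancy is 3.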

View File

@ -0,0 +1,559 @@
package vexriscv.demo
import spinal.core._
import spinal.lib._
import spinal.lib.bus.amba3.apb._
import spinal.lib.bus.misc.SizeMapping
import spinal.lib.bus.simple.PipelinedMemoryBus
import spinal.lib.com.jtag.Jtag
import spinal.lib.com.spi.ddr.SpiXdrMaster
import spinal.lib.com.uart._
import spinal.lib.io.{InOutWrapper, TriStateArray}
import spinal.lib.misc.{InterruptCtrl, Prescaler, Timer}
import spinal.lib.soc.pinsec.{PinsecTimerCtrl, PinsecTimerCtrlExternal}
import vexriscv.plugin._
import vexriscv.{VexRiscv, VexRiscvConfig, plugin}
import spinal.lib.com.spi.ddr._
import spinal.lib.bus.simple._
import scala.collection.mutable.ArrayBuffer
import vexriscv.periph.gcd._
import vexriscv.periph.tasks.gen._
import vexriscv.periph.tasks.map._
import vexriscv.periph.tasks.sort._
import vexriscv.periph.tasks.max._
import vexriscv.periph.tasks.sum._
import vexriscv.periph.tasks.hash._
/** Created by PIC32F_USER on 28/07/2017.
*
* Murax is a very light SoC which could work without any external component.
* - ICE40-hx8k + icestorm => 53 Mhz, 2142 LC
* - 0.37 DMIPS/Mhz
* - 8 kB of on-chip ram
* - JTAG debugger (eclipse/GDB/openocd ready)
* - Interrupt support
* - APB bus for peripherals
* - 32 GPIO pin
* - one 16 bits prescaler, two 16 bits timers
* - one UART with tx/rx fifo
*/
case class MuraxConfig(
coreFrequency: HertzNumber,
onChipRamSize: BigInt,
onChipRamHexFile: String,
pipelineDBus: Boolean,
pipelineMainBus: Boolean,
pipelineApbBridge: Boolean,
gpioWidth: Int,
uartCtrlConfig: UartCtrlMemoryMappedConfig,
xipConfig: SpiXdrMasterCtrl.MemoryMappingParameters,
hardwareBreakpointCount: Int,
cpuPlugins: ArrayBuffer[Plugin[VexRiscv]]
) {
require(
pipelineApbBridge || pipelineMainBus,
"At least pipelineMainBus or pipelineApbBridge should be enabled to avoid wipe transactions"
)
val genXip = xipConfig != null
}
object MuraxConfig {
def default: MuraxConfig = default(false, false)
def default(withXip: Boolean = false, bigEndian: Boolean = false) =
MuraxConfig(
coreFrequency = 12 MHz,
onChipRamSize = 8 kB,
onChipRamHexFile = null,
pipelineDBus = true,
pipelineMainBus = false,
pipelineApbBridge = true,
gpioWidth = 32,
xipConfig = ifGen(withXip)(
SpiXdrMasterCtrl.MemoryMappingParameters(
SpiXdrMasterCtrl
.Parameters(8, 12, SpiXdrParameter(2, 2, 1))
.addFullDuplex(0, 1, false),
cmdFifoDepth = 32,
rspFifoDepth = 32,
xip = SpiXdrMasterCtrl
.XipBusParameters(addressWidth = 24, lengthWidth = 2)
)
),
hardwareBreakpointCount = if (withXip) 3 else 0,
cpuPlugins = ArrayBuffer( //DebugPlugin added by the toplevel
new IBusSimplePlugin(
resetVector = if (withXip) 0xf001e000L else 0x80000000L,
cmdForkOnSecondStage = true,
cmdForkPersistence = withXip, //Required by the Xip controller
prediction = NONE,
catchAccessFault = false,
compressedGen = false,
bigEndian = bigEndian
),
new DBusSimplePlugin(
catchAddressMisaligned = false,
catchAccessFault = false,
earlyInjection = false,
bigEndian = bigEndian
),
new CsrPlugin(
CsrPluginConfig.smallest(mtvecInit =
if (withXip) 0xe0040020L else 0x80000020L
)
),
new DecoderSimplePlugin(
catchIllegalInstruction = false
),
new RegFilePlugin(
regFileReadyKind = plugin.SYNC,
zeroBoot = false
),
new IntAluPlugin,
new SrcPlugin(
separatedAddSub = false,
executeInsertion = false
),
new LightShifterPlugin,
new HazardSimplePlugin(
bypassExecute = false,
bypassMemory = false,
bypassWriteBack = false,
bypassWriteBackBuffer = false,
pessimisticUseSrc = false,
pessimisticWriteRegFile = false,
pessimisticAddressMatch = false
),
new BranchPlugin(
earlyBranch = false,
catchAddressMisaligned = false
),
new YamlPlugin("cpu0.yaml")
),
uartCtrlConfig = UartCtrlMemoryMappedConfig(
uartCtrlConfig = UartCtrlGenerics(
dataWidthMax = 8,
clockDividerWidth = 20,
preSamplingSize = 1,
samplingSize = 3,
postSamplingSize = 1
),
initConfig = UartCtrlInitConfig(
baudrate = 115200,
dataLength = 7, //7 => 8 bits
parity = UartParityType.NONE,
stop = UartStopType.ONE
),
busCanWriteClockDividerConfig = false,
busCanWriteFrameConfig = false,
txFifoDepth = 16,
rxFifoDepth = 16
)
)
def fast = {
val config = default
//Replace HazardSimplePlugin to get datapath bypass
config.cpuPlugins(
config.cpuPlugins.indexWhere(_.isInstanceOf[HazardSimplePlugin])
) = new HazardSimplePlugin(
bypassExecute = true,
bypassMemory = true,
bypassWriteBack = true,
bypassWriteBackBuffer = true
)
// config.cpuPlugins(config.cpuPlugins.indexWhere(_.isInstanceOf[LightShifterPlugin])) = new FullBarrelShifterPlugin()
config
}
}
case class Murax(config: MuraxConfig) extends Component {
import config._
val io = new Bundle {
//Clocks / reset
val asyncReset = in Bool ()
val mainClk = in Bool ()
//Main components IO
val jtag = slave(Jtag())
//Peripherals IO
val gpioA = master(TriStateArray(gpioWidth bits))
val uart = master(Uart())
val xip = ifGen(genXip)(master(SpiXdrMaster(xipConfig.ctrl.spi)))
}
val resetCtrlClockDomain = ClockDomain(
clock = io.mainClk,
config = ClockDomainConfig(
resetKind = BOOT
)
)
val resetCtrl = new ClockingArea(resetCtrlClockDomain) {
val mainClkResetUnbuffered = False
//Implement a counter to keep the reset high for 64 cycles
// This counter also resets the system automatically at boot.
val systemClkResetCounter = Reg(UInt(6 bits)) init (0)
when(systemClkResetCounter =/= U(systemClkResetCounter.range -> true)) {
systemClkResetCounter := systemClkResetCounter + 1
mainClkResetUnbuffered := True
}
when(BufferCC(io.asyncReset)) {
systemClkResetCounter := 0
}
//Create all reset used later in the design
val mainClkReset = RegNext(mainClkResetUnbuffered)
val systemReset = RegNext(mainClkResetUnbuffered)
}
val systemClockDomain = ClockDomain(
clock = io.mainClk,
reset = resetCtrl.systemReset,
frequency = FixedFrequency(coreFrequency)
)
val debugClockDomain = ClockDomain(
clock = io.mainClk,
reset = resetCtrl.mainClkReset,
frequency = FixedFrequency(coreFrequency)
)
val system = new ClockingArea(systemClockDomain) {
val pipelinedMemoryBusConfig = PipelinedMemoryBusConfig(
addressWidth = 32,
dataWidth = 32
)
val bigEndianDBus = config.cpuPlugins.exists(_ match {
case plugin: DBusSimplePlugin => plugin.bigEndian
case _ => false
})
//Arbiter of the cpu dBus/iBus to drive the mainBus
//Priority to dBus, !! cmd transactions can change on the fly !!
val mainBusArbiter =
new MuraxMasterArbiter(pipelinedMemoryBusConfig, bigEndianDBus)
//Instantiate the CPU
val cpu = new VexRiscv(
config = VexRiscvConfig(
plugins = cpuPlugins += new DebugPlugin(
debugClockDomain,
hardwareBreakpointCount
)
)
)
//Check the plugins used to instantiate the CPU to connect them to the SoC
val timerInterrupt = False
val externalInterrupt = False
for (plugin <- cpu.plugins) plugin match {
case plugin: IBusSimplePlugin =>
mainBusArbiter.io.iBus.cmd <> plugin.iBus.cmd
mainBusArbiter.io.iBus.rsp <> plugin.iBus.rsp
case plugin: DBusSimplePlugin => {
if (!pipelineDBus)
mainBusArbiter.io.dBus <> plugin.dBus
else {
mainBusArbiter.io.dBus.cmd << plugin.dBus.cmd.halfPipe()
mainBusArbiter.io.dBus.rsp <> plugin.dBus.rsp
}
}
case plugin: CsrPlugin => {
plugin.externalInterrupt := externalInterrupt
plugin.timerInterrupt := timerInterrupt
}
case plugin: DebugPlugin =>
plugin.debugClockDomain {
resetCtrl.systemReset setWhen (RegNext(plugin.io.resetOut))
io.jtag <> plugin.io.bus.fromJtag()
}
case _ =>
}
//****** MainBus slaves ********
val mainBusMapping = ArrayBuffer[(PipelinedMemoryBus, SizeMapping)]()
val ram = new MuraxPipelinedMemoryBusRam(
onChipRamSize = onChipRamSize,
onChipRamHexFile = onChipRamHexFile,
pipelinedMemoryBusConfig = pipelinedMemoryBusConfig,
bigEndian = bigEndianDBus
)
mainBusMapping += ram.io.bus -> (0x80000000L, onChipRamSize)
val apbBridge = new PipelinedMemoryBusToApbBridge(
apb3Config = Apb3Config(
addressWidth = 20,
dataWidth = 32
),
pipelineBridge = pipelineApbBridge,
pipelinedMemoryBusConfig = pipelinedMemoryBusConfig
)
mainBusMapping += apbBridge.io.pipelinedMemoryBus -> (0xf0000000L, 1 MB)
//******** APB peripherals *********
val apbMapping = ArrayBuffer[(Apb3, SizeMapping)]()
val gpioACtrl = Apb3Gpio(gpioWidth = gpioWidth, withReadSync = true)
io.gpioA <> gpioACtrl.io.gpio
apbMapping += gpioACtrl.io.apb -> (0x00000, 4 kB)
val uartCtrl = Apb3UartCtrl(uartCtrlConfig)
uartCtrl.io.uart <> io.uart
externalInterrupt setWhen (uartCtrl.io.interrupt)
apbMapping += uartCtrl.io.apb -> (0x10000, 4 kB)
val timer = new MuraxApb3Timer()
timerInterrupt setWhen (timer.io.interrupt)
apbMapping += timer.io.apb -> (0x20000, 4 kB)
val gcd = new Apb3GCDCtrl(
apb3Config = Apb3Config(
addressWidth = 20,
dataWidth = 32
)
)
apbMapping += gcd.io.apb -> (0x30000, 1 kB)
val xip = ifGen(genXip)(new Area {
val ctrl = Apb3SpiXdrMasterCtrl(xipConfig)
ctrl.io.spi <> io.xip
externalInterrupt setWhen (ctrl.io.interrupt)
apbMapping += ctrl.io.apb -> (0x1f000, 4 kB)
val accessBus = new PipelinedMemoryBus(PipelinedMemoryBusConfig(24, 32))
mainBusMapping += accessBus -> (0xe0000000L, 16 MB)
ctrl.io.xip.fromPipelinedMemoryBus() << accessBus
val bootloader = Apb3Rom("src/main/c/murax/xipBootloader/crt.bin")
apbMapping += bootloader.io.apb -> (0x1e000, 4 kB)
})
//******** Memory mappings *********
val apbDecoder = Apb3Decoder(
master = apbBridge.io.apb,
slaves = apbMapping
)
val mainBusDecoder = new Area {
val logic = new MuraxPipelinedMemoryBusDecoder(
master = mainBusArbiter.io.masterBus,
specification = mainBusMapping,
pipelineMaster = pipelineMainBus
)
}
}
}
object Murax {
def main(args: Array[String]) {
SpinalVerilog(Murax(MuraxConfig.default))
}
}
object Murax_iCE40_hx8k_breakout_board_xip {
case class SB_GB() extends BlackBox {
val USER_SIGNAL_TO_GLOBAL_BUFFER = in Bool ()
val GLOBAL_BUFFER_OUTPUT = out Bool ()
}
case class SB_IO_SCLK() extends BlackBox {
addGeneric("PIN_TYPE", B"010000")
val PACKAGE_PIN = out Bool ()
val OUTPUT_CLK = in Bool ()
val CLOCK_ENABLE = in Bool ()
val D_OUT_0 = in Bool ()
val D_OUT_1 = in Bool ()
setDefinitionName("SB_IO")
}
case class SB_IO_DATA() extends BlackBox {
addGeneric("PIN_TYPE", B"110000")
val PACKAGE_PIN = inout(Analog(Bool))
val CLOCK_ENABLE = in Bool ()
val INPUT_CLK = in Bool ()
val OUTPUT_CLK = in Bool ()
val OUTPUT_ENABLE = in Bool ()
val D_OUT_0 = in Bool ()
val D_OUT_1 = in Bool ()
val D_IN_0 = out Bool ()
val D_IN_1 = out Bool ()
setDefinitionName("SB_IO")
}
case class Murax_iCE40_hx8k_breakout_board_xip() extends Component {
val io = new Bundle {
val mainClk = in Bool ()
val jtag_tck = in Bool ()
val jtag_tdi = in Bool ()
val jtag_tdo = out Bool ()
val jtag_tms = in Bool ()
val uart_txd = out Bool ()
val uart_rxd = in Bool ()
val mosi = inout(Analog(Bool))
val miso = inout(Analog(Bool))
val sclk = out Bool ()
val spis = out Bool ()
val led = out Bits (8 bits)
}
val murax = Murax(
MuraxConfig.default(withXip = true).copy(onChipRamSize = 8 kB)
)
murax.io.asyncReset := False
val mainClkBuffer = SB_GB()
mainClkBuffer.USER_SIGNAL_TO_GLOBAL_BUFFER <> io.mainClk
mainClkBuffer.GLOBAL_BUFFER_OUTPUT <> murax.io.mainClk
val jtagClkBuffer = SB_GB()
jtagClkBuffer.USER_SIGNAL_TO_GLOBAL_BUFFER <> io.jtag_tck
jtagClkBuffer.GLOBAL_BUFFER_OUTPUT <> murax.io.jtag.tck
io.led <> murax.io.gpioA.write(7 downto 0)
murax.io.jtag.tdi <> io.jtag_tdi
murax.io.jtag.tdo <> io.jtag_tdo
murax.io.jtag.tms <> io.jtag_tms
murax.io.gpioA.read <> 0
murax.io.uart.txd <> io.uart_txd
murax.io.uart.rxd <> io.uart_rxd
val xip = new ClockingArea(murax.systemClockDomain) {
RegNext(murax.io.xip.ss.asBool) <> io.spis
val sclkIo = SB_IO_SCLK()
sclkIo.PACKAGE_PIN <> io.sclk
sclkIo.CLOCK_ENABLE := True
sclkIo.OUTPUT_CLK := ClockDomain.current.readClockWire
sclkIo.D_OUT_0 <> murax.io.xip.sclk.write(0)
sclkIo.D_OUT_1 <> RegNext(murax.io.xip.sclk.write(1))
val datas =
for ((data, pin) <- (murax.io.xip.data, List(io.mosi, io.miso)).zipped)
yield new Area {
val dataIo = SB_IO_DATA()
dataIo.PACKAGE_PIN := pin
dataIo.CLOCK_ENABLE := True
dataIo.OUTPUT_CLK := ClockDomain.current.readClockWire
dataIo.OUTPUT_ENABLE <> data.writeEnable
dataIo.D_OUT_0 <> data.write(0)
dataIo.D_OUT_1 <> RegNext(data.write(1))
dataIo.INPUT_CLK := ClockDomain.current.readClockWire
data.read(0) := dataIo.D_IN_0
data.read(1) := RegNext(dataIo.D_IN_1)
}
}
}
def main(args: Array[String]) {
SpinalVerilog(Murax_iCE40_hx8k_breakout_board_xip())
}
}
object MuraxDhrystoneReady {
def main(args: Array[String]) {
SpinalVerilog(Murax(MuraxConfig.fast.copy(onChipRamSize = 256 kB)))
}
}
object MuraxDhrystoneReadyMulDivStatic {
def main(args: Array[String]) {
SpinalVerilog({
val config = MuraxConfig.fast.copy(onChipRamSize = 256 kB)
config.cpuPlugins += new MulPlugin
config.cpuPlugins += new DivPlugin
config.cpuPlugins.remove(
config.cpuPlugins.indexWhere(_.isInstanceOf[BranchPlugin])
)
config.cpuPlugins += new BranchPlugin(
earlyBranch = false,
catchAddressMisaligned = false
)
config.cpuPlugins += new IBusSimplePlugin(
resetVector = 0x80000000L,
cmdForkOnSecondStage = true,
cmdForkPersistence = false,
prediction = STATIC,
catchAccessFault = false,
compressedGen = false
)
config.cpuPlugins.remove(
config.cpuPlugins.indexWhere(_.isInstanceOf[LightShifterPlugin])
)
config.cpuPlugins += new FullBarrelShifterPlugin
Murax(config)
})
}
}
//Will blink led and echo UART RX to UART TX (in the verilator sim, type some text and press enter to send UART frame to the Murax RX pin)
object MuraxWithRamInit {
def main(args: Array[String]) {
SpinalVerilog(
Murax(
MuraxConfig.default.copy(
onChipRamSize = 4 kB,
onChipRamHexFile = "src/main/c/murax/gcd_world/build/gcd_world.hex"
)
)
)
.printPruned()
}
}
object MuraxWithRamInitSynth {
def main(args: Array[String]) {
val config = SpinalConfig(
targetDirectory = "synth",
defaultClockDomainFrequency = FixedFrequency(12 MHz)
)
config
.generateVerilog(
Murax(
MuraxConfig.default.copy(
onChipRamSize = 4 kB,
onChipRamHexFile = "src/main/c/murax/gcd_world/build/gcd_world.hex"
)
)
)
.printPruned()
}
}
object Murax_arty {
def main(args: Array[String]) {
val hex = "src/main/c/murax/hello_world/build/hello_world.hex"
SpinalVerilog(
Murax(
MuraxConfig
.default(false)
.copy(
coreFrequency = 100 MHz,
onChipRamSize = 32 kB,
onChipRamHexFile = hex
)
)
)
}
}
object MuraxAsicBlackBox extends App {
println("Warning: this SoC does not have any ROM to boot from.")
val config = SpinalConfig()
config.addStandardMemBlackboxing(blackboxAll)
config.generateVerilog(Murax(MuraxConfig.default()))
}

View File

@ -0,0 +1,39 @@
package vexriscv.periph.gcd
import spinal.core._
import spinal.lib._
import spinal.lib.bus.amba3.apb.{Apb3, Apb3Config, Apb3SlaveFactory}
import spinal.lib.eda.altera.QSysify
import spinal.lib.slave
object Apb3GCDCtrl {
def getApb3Config = Apb3Config(
addressWidth = 5,
dataWidth = 32,
selWidth = 1,
useSlaveError = false
)
}
class Apb3GCDCtrl(apb3Config : Apb3Config) extends Component {
val io = new Bundle {
val apb = slave(Apb3(Apb3GCDCtrl.getApb3Config))
// maybe later
// val interrupt = out Bool
}
val gcdCtrl = new GCDTop()
val apbCtrl = Apb3SlaveFactory(io.apb)
apbCtrl.driveAndRead(gcdCtrl.io.a, address=0)
apbCtrl.driveAndRead(gcdCtrl.io.b, address=4)
// when result of calculation ready, synchronize it into memory mapped register
val resSyncBuf = RegNextWhen(gcdCtrl.io.res, gcdCtrl.io.ready)
apbCtrl.read(resSyncBuf, address=8)
// if result is read, it will be consumed, set ready to 0
apbCtrl.onRead(8)(resSyncBuf := 0)
apbCtrl.onRead(8)(rdySyncBuf := False)
// synchronize ready signal into memory mapped register
val rdySyncBuf = RegNextWhen(gcdCtrl.io.ready, gcdCtrl.io.ready)
apbCtrl.read(rdySyncBuf, address=12)
// set valid based on memory mapped register but clear/consume it after 1 cycle
gcdCtrl.io.valid := apbCtrl.setOnSet(RegNext(False) init(False), address=16, 0)
}

View File

@ -0,0 +1,68 @@
package vexriscv.periph.gcd
import spinal.core._
import spinal.lib._
import spinal.lib.master
import spinal.lib.fsm._
//Hardware definition
class GCDCtrl() extends Component {
val io = new Bundle {
val valid = in Bool()
val ready = out Bool()
val dataCtrl = master(GCDDataControl())
}
val fsm = new StateMachine{
io.dataCtrl.loadA := False
io.dataCtrl.loadB := False
io.dataCtrl.init := False
io.dataCtrl.selL := False
io.dataCtrl.selR := False
io.ready := False
val idle : State = new State with EntryPoint{
whenIsActive{
when(io.valid){
io.dataCtrl.init := True
goto(calculate)
}
}
}
val calculate : State = new State{
whenIsActive{
when(io.dataCtrl.cmpAgtB){
goto(calcA)
}.elsewhen(io.dataCtrl.cmpAltB){
goto(calcB)
}.elsewhen(!io.dataCtrl.cmpAgtB & !io.dataCtrl.cmpAltB){
goto(calcDone)
}
}
}
val calcA : State = new State{
whenIsActive{
io.dataCtrl.selR := True
io.dataCtrl.loadA := True
goto(calculate)
}
}
val calcB : State = new State{
whenIsActive{
io.dataCtrl.selL := True
io.dataCtrl.loadB := True
goto(calculate)
}
}
val calcDone : State = new State{
whenIsActive{
io.ready := True
goto(idle)
}
}
}
}
object GCDCtrlVerilog {
def main(args: Array[String]) {
SpinalVerilog(new GCDCtrl)
}
}

View File

@ -0,0 +1,54 @@
package vexriscv.periph.gcd
import spinal.core._
import spinal.lib._
import spinal.lib.slave
//Hardware definition
class GCDData() extends Component {
val io = new Bundle {
val a = in(UInt(32 bits))
val b = in(UInt(32 bits))
val res = out(UInt(32 bits))
val dataCtrl = slave(GCDDataControl())
}
/*
*
* // Pseudocode of Euclid's algorithm for calculating the GCD
* inputs: [a, b, start]
* outputs: [done, a]
* done := False
* while(!done):
* if(a > b):
* a := a - b
* else if(b > a):
* b := b - a
* else:
* done := True
*/
//registers
val regA = Reg(UInt(32 bits)) init(0)
val regB = Reg(UInt(32 bits)) init(0)
// compare
val xGTy = regA > regB
val xLTy = regA < regB
// mux
val chX = io.dataCtrl.selL ? regB | regA
val chY = io.dataCtrl.selR ? regB | regA
// subtract
val subXY = chX - chY
// load logic
when(io.dataCtrl.init){
regA := io.a
regB := io.b
}
when(io.dataCtrl.loadA){
regA := subXY
}
when(io.dataCtrl.loadB){
regB := subXY
}
io.dataCtrl.cmpAgtB := xGTy
io.dataCtrl.cmpAltB := xLTy
io.res := regA
}
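
The interplay between the control FSM (`idle`, `calculate`, `calcA`, `calcB`, `calcDone`) and the datapath registers can be mimicked in software. The following C model is a rough sketch, not generated from the RTL: all names are hypothetical, one call to `gcd_model_step` stands for one clock edge, and the step counts are those of the model only (the generated state machine may, for instance, spend extra cycles after reset). Both operands are assumed to be non-zero.

```c
#include <stdint.h>

/* States mirroring the FSM in GCDCtrl. */
typedef enum { IDLE, CALCULATE, CALC_A, CALC_B, CALC_DONE } GcdState;

/* Control state plus the two datapath registers of GCDData. */
typedef struct {
    GcdState state;
    uint32_t regA;
    uint32_t regB;
} GcdModel;

/* Advance the model by one clock. Returns 1 in the step where 'ready' would be high. */
int gcd_model_step(GcdModel *m, int valid, uint32_t a, uint32_t b) {
    switch (m->state) {
    case IDLE:
        if (valid) {            /* init: load both operands */
            m->regA = a;
            m->regB = b;
            m->state = CALCULATE;
        }
        return 0;
    case CALCULATE:             /* compare and pick the next subtraction */
        if (m->regA > m->regB)      m->state = CALC_A;
        else if (m->regA < m->regB) m->state = CALC_B;
        else                        m->state = CALC_DONE;
        return 0;
    case CALC_A:                /* selR + loadA: regA := regA - regB */
        m->regA -= m->regB;
        m->state = CALCULATE;
        return 0;
    case CALC_B:                /* selL + loadB: regB := regB - regA */
        m->regB -= m->regA;
        m->state = CALCULATE;
        return 0;
    case CALC_DONE:             /* ready is asserted, result is in regA */
        m->state = IDLE;
        return 1;
    }
    return 0;
}

/* Run one computation to completion; stores the result and returns the number of model steps. */
uint32_t gcd_model_run(uint32_t a, uint32_t b, uint32_t *res) {
    GcdModel m = { IDLE, 0u, 0u };
    uint32_t steps = 0;
    int ready = 0;
    while (!ready) {
        ready = gcd_model_step(&m, 1, a, b);
        steps++;
    }
    *res = m.regA;
    return steps;
}
```

In this model each subtraction costs two steps (one in `CALCULATE`, one in `CALC_A` or `CALC_B`), on top of three steps for loading, the final comparison and signalling `ready`. The pair (12, 18) therefore takes 7 steps and yields 6.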

View File

@ -0,0 +1,46 @@
package vexriscv.periph.gcd
import spinal.core._
import spinal.lib._
import spinal.lib.IMasterSlave
case class GCDDataControl() extends Bundle with IMasterSlave{
val cmpAgtB = Bool
val cmpAltB = Bool
val loadA = Bool
val loadB = Bool
val init = Bool
val selL = Bool
val selR = Bool
// define <> semantic
override def asMaster(): Unit = {
// as controller: output, input
out(loadA, loadB, selL, selR, init)
in(cmpAgtB, cmpAltB)
}
}
//Hardware definition
class GCDTop() extends Component {
val io = new Bundle {
val valid = in Bool()
val ready = out Bool()
val a = in(UInt(32 bits))
val b = in(UInt(32 bits))
val res = out(UInt(32 bits))
}
val gcdCtr = new GCDCtrl()
gcdCtr.io.valid := io.valid
io.ready := gcdCtr.io.ready
val gcdDat = new GCDData()
gcdDat.io.a := io.a
gcdDat.io.b := io.b
io.res := gcdDat.io.res
gcdCtr.io.dataCtrl <> gcdDat.io.dataCtrl
}
object GCDTopVerilog {
def main(args: Array[String]) {
SpinalVerilog(new GCDTop)
}
}

View File

@ -0,0 +1,52 @@
package vexriscv.periph.gcd
import spinal.core._
import spinal.sim._
import spinal.core.sim._
//import scala.util.Random
import java.util.concurrent.ThreadLocalRandom
object GCDTopSim {
def main(args: Array[String]) {
SimConfig.withWave.doSim(new GCDTop()){dut =>
// SimConfig.doSim(new GCDTop()){dut =>
def gcd(a: Long,b: Long): Long = {
if(b==0) a else gcd(b, a%b)
}
def RndNextUInt32(): Long = {
// draw from 1 .. 2^32-1: a zero operand would stall the subtract-based GCD hardware
ThreadLocalRandom.current().nextLong(1L, 1L << 32)
}
var a = 0L
var b = 0L
var model = 0L
dut.io.a #= 0
dut.io.b #= 0
dut.io.valid #= false
dut.clockDomain.forkStimulus(period = 10)
dut.clockDomain.waitRisingEdge()
for(idx <- 0 to 500){
// generate 2 random ints
a = RndNextUInt32()
b = RndNextUInt32()
// calculate the model value (software)
model = gcd(a,b)
// apply stimulus with random ints
dut.io.a #= a
dut.io.b #= b
dut.io.valid #= true
dut.clockDomain.waitRisingEdge()
dut.io.valid #= false
// wait until calculation of hardware is done
waitUntil(dut.io.ready.toBoolean)
assert(
assertion = (dut.io.res.toBigInt == model),
message = "test " + idx + " failed. Expected " + model + ", retrieved: " + dut.io.res.toBigInt
)
waitUntil(!dut.io.ready.toBoolean)
}
}
}
}