Update DMIPS/Mhz
Add cached config with maximal performance settings
FullBarrielShifterPlugin can now be configured to do everything in the execute stage
parent b3564e1b7e
commit 26732942e5

README.md | 34
@@ -21,8 +21,8 @@ This repository host an RISC-V implementation written in SpinalHDL. There is som
 
 - RV32IM instruction set
 - Pipelined on 5 stages (Fetch, Decode, Execute, Memory, WriteBack)
-- 1.16 DMIPS/Mhz when all features are enabled
-- Optimized for FPGA
+- 1.29 DMIPS/Mhz when all features are enabled
+- Optimized for FPGA, fully portable
 - AXI4 and Avalon ready
 - Optional MUL/DIV extension
 - Optional instruction and data caches
@@ -45,46 +45,62 @@ The hardware description of this CPU is done by using an very software oriented
 The following number where obtains by synthesis the CPU as toplevel without any specific synthesis option to save area or to get better maximal frequency (neutral).<br>
 The clock constraint is set to a unattainable value, which tends to increase the design area.<br>
 The dhrystone benchmark were compiled with -O3 -fno-inline<br>
+All the cached configuration have some cache trashing during the dhrystone benchmark except the `VexRiscv full max perf` one. This of course reduce the performance. It is possible to produce dhrystone binaries which fit inside a 4KB I$ and 4KB D$ (I already had this case once) but currently it isn't the case.<br>
 The used CPU corresponding configuration can be find in src/scala/vexriscv/demo.
 
 ```
-VexRiscv smallest (RV32I, 0.47 DMIPS/Mhz, no datapath bypass, no interrupt) ->
+VexRiscv smallest (RV32I, 0.51 DMIPS/Mhz, no datapath bypass, no interrupt) ->
 Artix 7 -> 346 Mhz 481 LUT 539 FF
 Cyclone V -> 201 Mhz 347 ALMs
 Cyclone IV -> 190 Mhz 673 LUT 529 FF
 Cyclone II -> 154 Mhz 673 LUT 528 FF
 
-VexRiscv smallest (RV32I, 0.47 DMIPS/Mhz, no datapath bypass) ->
+VexRiscv smallest (RV32I, 0.51 DMIPS/Mhz, no datapath bypass) ->
 Artix 7 -> 340 Mhz 562 LUT 589 FF
 Cyclone V -> 202 Mhz 387 ALMs
 Cyclone IV -> 180 Mhz 780 LUT 579 FF
 Cyclone II -> 149 Mhz 780 LUT 578 FF
 
-VexRiscv small and productive (RV32I, 0.78 DMIPS/Mhz) ->
+VexRiscv small and productive (RV32I, 0.82 DMIPS/Mhz) ->
 Artix 7 -> 309 Mhz 703 LUT 557 FF
 Cyclone V -> 152 Mhz 502 ALMs
 Cyclone IV -> 147 Mhz 1,062 LUT 552 FF
 Cyclone II -> 120 Mhz 1,072 LUT 551 FF
 
-VexRiscv full no cache (RV32IM, 1.14 DMIPS/Mhz, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
+VexRiscv full no cache (RV32IM, 1.20 DMIPS/Mhz, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
 Artix 7 -> 310 Mhz 1391 LUT 934 FF
 Cyclone V -> 143 Mhz 935 ALMs
 Cyclone IV -> 123 Mhz 1,916 LUT 960 FF
 Cyclone II -> 108 Mhz 1,939 LUT 959 FF
 
-VexRiscv full (RV32IM, 1.14 DMIPS/Mhz, I$, D$, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
+VexRiscv full (RV32IM, 1.13 DMIPS/Mhz with cache trashing, 4KB-I$,4KB-D$, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
 Artix 7 -> 250 Mhz 1911 LUT 1501 FF
 Cyclone V -> 132 Mhz 1,266 ALMs
 Cyclone IV -> 127 Mhz 2,733 LUT 1,762 FF
 Cyclone II -> 103 Mhz 2,791 LUT 1,760 FF
 
-VexRiscv full with MMU (RV32IM, 1.16 DMIPS/Mhz, I$, D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch, MMU) ->
+VexRiscv full max perf -> (RV32IM, 1.29 DMIPS/Mhz, 16KB-I$,16KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch, branch and shift operations done in the Execute stage) ->
+Artix 7 -> 216 Mhz 1978 LUT 1442 FF
+Cyclone V -> 105 Mhz 1,222 ALMs
+Cyclone IV -> 94 Mhz 2,735 LUT 1,702 FF
+
+VexRiscv full with MMU (RV32IM, 1.17 DMIPS/Mhz with cache trashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch, MMU) ->
 Artix 7 -> 223 Mhz 2085 LUT 2020 FF
 Cyclone V -> 110 Mhz 1,503 ALMs
 Cyclone IV -> 108 Mhz 3,153 LUT 2,281 FF
 Cyclone II -> 94 Mhz 3,187 LUT 2,281 FF
 ```
 
+There is the a summary of the configuration which produce 1.29 DMIPS :
+
+- 5 stage : F -> D -> E -> M -> WB
+- single cycle ADD/SUB/Bitwise/Shift ALU
+- branch/jump done in the E stage
+- memory load values are bypassed in the WB stage (late result)
+- 33 cycle division with bypassing in the M stage (late result)
+- single cycle multiplication with bypassing in the WB stage (late result)
+- dynamic branch prediction done in the D stage with an direct mapped 2 bit branch history cache
+
 ## Dependencies
 
 On Ubuntu 14 :
@@ -337,7 +353,7 @@ sudo mv /opt/riscv64-unknown-elf-gcc-20170612-x86_64-linux-centos6 /opt/riscv
 echo 'export PATH=/opt/riscv/bin:$PATH' >> ~/.bashrc
 ```
 
-But if you want to compile from sources in /opt/ the rv32i and rv32im gcc, do the following (will take hours):
+But if you want to compile from sources in /opt/ the rv32i and rv32im gcc, do the following (will take one hour):
 
 ```sh
 # Be carefull, sometime the git clone has issue to successfully clone riscv-gnu-toolchain.

@@ -41,7 +41,7 @@ object TestsWorkspace {
 // ),
 new IBusCachedPlugin(
   config = InstructionCacheConfig(
-    cacheSize = 4096,
+    cacheSize = 4096*4,
     bytePerLine =32,
     wayCount = 1,
     wrappedMemAccess = true,
@@ -66,7 +66,7 @@ object TestsWorkspace {
 // ),
 new DBusCachedPlugin(
   config = new DataCacheConfig(
-    cacheSize = 4096,
+    cacheSize = 4096*4,
     bytePerLine = 32,
     wayCount = 1,
     addressWidth = 32,
@@ -83,14 +83,14 @@ object TestsWorkspace {
     portTlbSize = 6
   )
 ),
-// new StaticMemoryTranslatorPlugin(
-// ioRange = _(31 downto 28) === 0xF
-// ),
-new MemoryTranslatorPlugin(
-  tlbSize = 32,
-  virtualRange = _(31 downto 28) === 0xC,
+new StaticMemoryTranslatorPlugin(
   ioRange = _(31 downto 28) === 0xF
 ),
+// new MemoryTranslatorPlugin(
+// tlbSize = 32,
+// virtualRange = _(31 downto 28) === 0xC,
+// ioRange = _(31 downto 28) === 0xF
+// ),
 new DecoderSimplePlugin(
   catchIllegalInstruction = true
 ),
@@ -102,7 +102,7 @@ object TestsWorkspace {
 new SrcPlugin(
   separatedAddSub = false
 ),
-new FullBarrielShifterPlugin,
+new FullBarrielShifterPlugin(earlyInjection = true),
 // new LightShifterPlugin,
 new HazardSimplePlugin(
   bypassExecute = true,
@@ -120,7 +120,7 @@ object TestsWorkspace {
 new CsrPlugin(CsrPluginConfig.all),
 new DebugPlugin(ClockDomain.current.clone(reset = Bool().setName("debugReset"))),
 new BranchPlugin(
-  earlyBranch = false,
+  earlyBranch = true,
   catchAddressMisaligned = true,
   prediction = DYNAMIC
 ),

@@ -0,0 +1,72 @@
+package vexriscv.demo
+import scala.sys.process._
+import java.io.File
+
+object DhrystoneBench extends App{
+  def doCmd(cmd : String) : String = {
+    val stdOut = new StringBuilder()
+    class Logger extends ProcessLogger {override def err(s: => String): Unit = {if(!s.startsWith("ar: creating ")) println(s)}
+      override def out(s: => String): Unit = {stdOut ++= s}
+      override def buffer[T](f: => T) = f
+    }
+    Process(cmd, new File("src/test/cpp/regression")).!(new Logger)
+    stdOut.toString()
+  }
+  val report = new StringBuilder()
+  def getDmips(name : String, gen : => Unit, test : String): Unit ={
+    gen
+    val str = doCmd(test)
+    val intFind = "(\\d+\\.?)+".r
+    val dmips = intFind.findFirstIn("DMIPS per Mhz\\: (\\d+.?)+".r.findAllIn(str).toList.last).get.toDouble
+    report ++= name + " -> " + dmips + "\n"
+
+  }
+
+  getDmips(
+    name = "GenSmallestNoCsr",
+    gen = GenSmallestNoCsr.main(null),
+    test = "make clean run REDO=0 IBUS=SIMPLE DBUS=SIMPLE CSR=no MMU=no DEBUG_PLUGIN=no MUL=no DIV=no"
+  )
+
+
+  getDmips(
+    name = "GenSmallest",
+    gen = GenSmallest.main(null),
+    test = "make clean run REDO=0 IBUS=SIMPLE DBUS=SIMPLE MMU=no DEBUG_PLUGIN=no MUL=no DIV=no"
+  )
+
+
+  getDmips(
+    name = "GenSmallAndProductive",
+    gen = GenSmallAndProductive.main(null),
+    test = "make clean run REDO=0 IBUS=SIMPLE DBUS=SIMPLE MMU=no DEBUG_PLUGIN=no MUL=no DIV=no"
+  )
+
+
+  getDmips(
+    name = "GenFullNoMmuNoCache",
+    gen = GenFullNoMmuNoCache.main(null),
+    test = "make clean run REDO=0 IBUS=SIMPLE DBUS=SIMPLE MMU=no"
+  )
+
+  getDmips(
+    name = "GenFullNoMmu",
+    gen = GenFullNoMmu.main(null),
+    test = "make clean run REDO=0 MMU=no "
+  )
+
+  getDmips(
+    name = "GenFullNoMmuMaxPerf",
+    gen = GenFullNoMmuMaxPerf.main(null),
+    test = "make clean run REDO=0 MMU=no"
+  )
+
+
+  getDmips(
+    name = "GenFull",
+    gen = GenFull.main(null),
+    test = "make clean run REDO=0"
+  )
+
+  println(report)
+}

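Note on `getDmips`: it keeps the last `DMIPS per Mhz` figure found in the captured output. A minimal standalone sketch of that extraction follows. The object name `DmipsParseSketch` and the single-pass pattern are assumptions made for illustration; the pattern is a stricter variant (escaped dot, direct capture group), not the one in the commit, and the real simulator log may be shaped differently.

```scala
object DmipsParseSketch {
  // Assumed stricter variant of the commit's pattern: the dot is escaped and
  // the number is captured directly, so no second regex pass is needed.
  private val dmipsLine = """DMIPS per Mhz:\s*(\d+(?:\.\d+)?)""".r

  /** Returns the last DMIPS/MHz figure found in `log`, or None if absent. */
  def lastDmips(log: String): Option[Double] =
    dmipsLine.findAllMatchIn(log).map(_.group(1).toDouble).toList.lastOption
}
```

Returning an `Option` avoids the exception that `.last` and `.get` would throw when the run prints no figure at all.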
@@ -0,0 +1,89 @@
+package vexriscv.demo
+
+import spinal.core._
+import vexriscv.ip.{DataCacheConfig, InstructionCacheConfig}
+import vexriscv.plugin._
+import vexriscv.{VexRiscv, VexRiscvConfig, plugin}
+
+/**
+ * Created by spinalvm on 15.06.17.
+ */
+object GenFullNoMmuMaxPerf extends App{
+  def cpu() = new VexRiscv(
+    config = VexRiscvConfig(
+      plugins = List(
+        new PcManagerSimplePlugin(
+          resetVector = 0x00000000l,
+          relaxedPcCalculation = false
+        ),
+        new IBusCachedPlugin(
+          config = InstructionCacheConfig(
+            cacheSize = 4096*4,
+            bytePerLine =32,
+            wayCount = 1,
+            wrappedMemAccess = true,
+            addressWidth = 32,
+            cpuDataWidth = 32,
+            memDataWidth = 32,
+            catchIllegalAccess = true,
+            catchAccessFault = true,
+            catchMemoryTranslationMiss = false,
+            asyncTagMemory = false,
+            twoStageLogic = true
+          )
+        ),
+        new DBusCachedPlugin(
+          config = new DataCacheConfig(
+            cacheSize = 4096*4,
+            bytePerLine = 32,
+            wayCount = 1,
+            addressWidth = 32,
+            cpuDataWidth = 32,
+            memDataWidth = 32,
+            catchAccessError = true,
+            catchIllegal = true,
+            catchUnaligned = true,
+            catchMemoryTranslationMiss = false
+          )
+        ),
+        new StaticMemoryTranslatorPlugin(
+          ioRange = _(31 downto 28) === 0xF
+        ),
+        new DecoderSimplePlugin(
+          catchIllegalInstruction = true
+        ),
+        new RegFilePlugin(
+          regFileReadyKind = plugin.SYNC,
+          zeroBoot = false
+        ),
+        new IntAluPlugin,
+        new SrcPlugin(
+          separatedAddSub = false,
+          executeInsertion = true
+        ),
+        new FullBarrielShifterPlugin(earlyInjection = true),
+        new HazardSimplePlugin(
+          bypassExecute = true,
+          bypassMemory = true,
+          bypassWriteBack = true,
+          bypassWriteBackBuffer = true,
+          pessimisticUseSrc = false,
+          pessimisticWriteRegFile = false,
+          pessimisticAddressMatch = false
+        ),
+        new MulPlugin,
+        new DivPlugin,
+        new CsrPlugin(CsrPluginConfig.small),
+        new DebugPlugin(ClockDomain.current.clone(reset = Bool().setName("debugReset"))),
+        new BranchPlugin(
+          earlyBranch = true,
+          catchAddressMisaligned = true,
+          prediction = DYNAMIC
+        ),
+        new YamlPlugin("cpu0.yaml")
+      )
+    )
+  )
+
+  SpinalVerilog(cpu())
+}

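Note on the cache parameters: `cacheSize = 4096*4`, `bytePerLine = 32` and `wayCount = 1` describe a 16 KB direct-mapped cache. The sketch below works out the resulting address split under the textbook layout (offset bits, then index bits, then tag). `CacheGeometrySketch` and `Geometry` are hypothetical names, and the actual VexRiscv cache may store additional bits per line, so treat the tag width as an approximation.

```scala
object CacheGeometrySketch {
  final case class Geometry(linesPerWay: Int, offsetBits: Int, indexBits: Int, tagBits: Int)

  // log2 restricted to exact powers of two, which is what cache sizes are here.
  private def log2(n: Int): Int = {
    require(n > 0 && (n & (n - 1)) == 0, s"$n is not a power of two")
    Integer.numberOfTrailingZeros(n)
  }

  /** Address split for a conventional set-associative cache (assumed layout). */
  def geometry(cacheSize: Int, bytePerLine: Int, wayCount: Int, addressWidth: Int): Geometry = {
    val linesPerWay = cacheSize / bytePerLine / wayCount
    val offsetBits  = log2(bytePerLine)
    val indexBits   = log2(linesPerWay)
    Geometry(linesPerWay, offsetBits, indexBits, addressWidth - indexBits - offsetBits)
  }
}
```

With the values from this configuration, `geometry(4096*4, 32, 1, 32)` gives 512 lines, 5 offset bits, 9 index bits and 18 tag bits; the previous 4 KB setting gives 128 lines, 7 index bits and 20 tag bits.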
@@ -31,20 +31,26 @@ object VexRiscvSynthesisBench {
     override def getRtlPath(): String = "VexRiscvFullNoMmuNoCache.v"
     SpinalVerilog(GenFullNoMmuNoCache.cpu().setDefinitionName(getRtlPath().split("\\.").head))
   }
 
   val fullNoMmu = new Rtl {
     override def getName(): String = "VexRiscv full no MMU"
     override def getRtlPath(): String = "VexRiscvFullNoMmu.v"
     SpinalVerilog(GenFullNoMmu.cpu().setDefinitionName(getRtlPath().split("\\.").head))
   }
 
+  val fullNoMmuMaxPerf= new Rtl {
+    override def getName(): String = "VexRiscv full no MMU max perf"
+    override def getRtlPath(): String = "VexRiscvFullNoMmuMaxPerf.v"
+    SpinalVerilog(GenFullNoMmuMaxPerf.cpu().setDefinitionName(getRtlPath().split("\\.").head))
+  }
+
   val full = new Rtl {
     override def getName(): String = "VexRiscv full"
     override def getRtlPath(): String = "VexRiscvFull.v"
     SpinalVerilog(GenFull.cpu().setDefinitionName(getRtlPath().split("\\.").head))
   }
 
-  val rtls = List(smallestNoCsr, smallest, smallAndProductive, fullNoMmuNoCache, fullNoMmu, full)
+// val rtls = List(smallestNoCsr, smallest, smallAndProductive, fullNoMmuNoCache, fullNoMmuMaxPerf, fullNoMmu, full)
+  val rtls = List(fullNoMmuMaxPerf)
 
   val targets = XilinxStdTargets(
     vivadoArtix7Path = "/eda/Xilinx/Vivado/2017.2/bin"

@@ -6,7 +6,7 @@ import spinal.lib.Reverse
 
 
 
-class FullBarrielShifterPlugin extends Plugin[VexRiscv]{
+class FullBarrielShifterPlugin(earlyInjection : Boolean = false) extends Plugin[VexRiscv]{
   object ShiftCtrlEnum extends SpinalEnum(binarySequential){
     val DISABLE, SLL, SRL, SRA = newElement()
   }
@@ -24,7 +24,7 @@ class FullBarrielShifterPlugin extends Plugin[VexRiscv]{
   SRC1_CTRL -> Src1CtrlEnum.RS,
   SRC2_CTRL -> Src2CtrlEnum.IMI,
   REGFILE_WRITE_VALID -> True,
-  BYPASSABLE_EXECUTE_STAGE -> False,
+  BYPASSABLE_EXECUTE_STAGE -> Bool(earlyInjection),
   BYPASSABLE_MEMORY_STAGE -> True,
   RS1_USE -> True
 )
@@ -33,7 +33,7 @@ class FullBarrielShifterPlugin extends Plugin[VexRiscv]{
   SRC1_CTRL -> Src1CtrlEnum.RS,
   SRC2_CTRL -> Src2CtrlEnum.RS,
   REGFILE_WRITE_VALID -> True,
-  BYPASSABLE_EXECUTE_STAGE -> False,
+  BYPASSABLE_EXECUTE_STAGE -> Bool(earlyInjection),
   BYPASSABLE_MEMORY_STAGE -> True,
   RS1_USE -> True,
   RS2_USE -> True
@@ -66,8 +66,9 @@ class FullBarrielShifterPlugin extends Plugin[VexRiscv]{
   insert(SHIFT_RIGHT) := (Cat(input(SHIFT_CTRL) === ShiftCtrlEnum.SRA & reversed.msb, reversed).asSInt >> amplitude)(31 downto 0).asBits
 }
 
-memory plug new Area{
-  import memory._
+val injectionStage = if(earlyInjection) execute else memory
+injectionStage plug new Area{
+  import injectionStage._
   switch(input(SHIFT_CTRL)){
     is(ShiftCtrlEnum.SLL){
       output(REGFILE_WRITE_DATA) := Reverse(input(SHIFT_RIGHT))

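Note on the shifter: the hunk shows one right shifter serving all three operations, with `Reverse(...)` applied to the result for `SLL`. The plain-Scala model below illustrates that idea in software. It assumes `reversed` holds the bit-reversed operand when the operation is `SLL`, which this diff does not show; `ShiftByReversalSketch` and its method names are hypothetical and this is not the plugin's code.

```scala
object ShiftByReversalSketch {
  /** Models Cat(sra & msb, x).asSInt >> amount, keeping bits 31..0. */
  def shiftRight(x: Int, amount: Int, arithmetic: Boolean): Int = {
    val ext    = if (arithmetic && x < 0) 1L else 0L   // extra 33rd bit
    val wide   = (ext << 32) | (x.toLong & 0xFFFFFFFFL) // 33-bit concatenation
    val signed = (wide << 31) >> 31                     // sign-extend from bit 32
    (signed >> (amount & 31)).toInt
  }

  def srl(x: Int, amount: Int): Int = shiftRight(x, amount, arithmetic = false)
  def sra(x: Int, amount: Int): Int = shiftRight(x, amount, arithmetic = true)

  /** Left shift built from the right shifter: reverse, shift right, reverse back. */
  def sll(x: Int, amount: Int): Int =
    Integer.reverse(shiftRight(Integer.reverse(x), amount, arithmetic = false))
}
```

Reusing the right shifter this way trades two bit reversals, which are only wiring in hardware, for a second shifter.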
File diff suppressed because it is too large