Update DMIPS/Mhz

Add cached config with maximal performance settings
FullBarrielShifterPlugin can now be configured to do everything in the execute stage
Dolu1990 2018-01-25 01:11:57 +01:00
parent b3564e1b7e
commit 26732942e5
11 changed files with 10234 additions and 17736 deletions


@ -21,8 +21,8 @@ This repository host an RISC-V implementation written in SpinalHDL. There is som
- RV32IM instruction set
- Pipelined on 5 stages (Fetch, Decode, Execute, Memory, WriteBack)
- 1.16 DMIPS/Mhz when all features are enabled
- Optimized for FPGA
- 1.29 DMIPS/Mhz when all features are enabled
- Optimized for FPGA, fully portable
- AXI4 and Avalon ready
- Optional MUL/DIV extension
- Optional instruction and data caches
@ -45,46 +45,62 @@ The hardware description of this CPU is done by using an very software oriented
The following numbers were obtained by synthesizing the CPU as top level, without any specific synthesis option to save area or to reach a better maximal frequency (neutral).<br>
The clock constraint is set to an unattainable value, which tends to increase the design area.<br>
The dhrystone benchmark was compiled with -O3 -fno-inline<br>
All the cached configurations suffer some cache thrashing during the dhrystone benchmark, except the `VexRiscv full max perf` one, which of course reduces their performance. It is possible to produce dhrystone binaries which fit inside a 4KB I$ and 4KB D$ (I already had this case once), but currently that isn't the case.<br>
The corresponding CPU configurations can be found in src/scala/vexriscv/demo.
```
VexRiscv smallest (RV32I, 0.47 DMIPS/Mhz, no datapath bypass, no interrupt) ->
VexRiscv smallest (RV32I, 0.51 DMIPS/Mhz, no datapath bypass, no interrupt) ->
Artix 7 -> 346 Mhz 481 LUT 539 FF
Cyclone V -> 201 Mhz 347 ALMs
Cyclone IV -> 190 Mhz 673 LUT 529 FF
Cyclone II -> 154 Mhz 673 LUT 528 FF
VexRiscv smallest (RV32I, 0.47 DMIPS/Mhz, no datapath bypass) ->
VexRiscv smallest (RV32I, 0.51 DMIPS/Mhz, no datapath bypass) ->
Artix 7 -> 340 Mhz 562 LUT 589 FF
Cyclone V -> 202 Mhz 387 ALMs
Cyclone IV -> 180 Mhz 780 LUT 579 FF
Cyclone II -> 149 Mhz 780 LUT 578 FF
VexRiscv small and productive (RV32I, 0.78 DMIPS/Mhz) ->
VexRiscv small and productive (RV32I, 0.82 DMIPS/Mhz) ->
Artix 7 -> 309 Mhz 703 LUT 557 FF
Cyclone V -> 152 Mhz 502 ALMs
Cyclone IV -> 147 Mhz 1,062 LUT 552 FF
Cyclone II -> 120 Mhz 1,072 LUT 551 FF
VexRiscv full no cache (RV32IM, 1.14 DMIPS/Mhz, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
Artix 7 -> 310 Mhz 1391 LUT 934 FF
VexRiscv full no cache (RV32IM, 1.20 DMIPS/Mhz, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
Artix 7 -> 310 Mhz 1391 LUT 934 FF
Cyclone V -> 143 Mhz 935 ALMs
Cyclone IV -> 123 Mhz 1,916 LUT 960 FF
Cyclone II -> 108 Mhz 1,939 LUT 959 FF
VexRiscv full (RV32IM, 1.14 DMIPS/Mhz, I$, D$, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
VexRiscv full (RV32IM, 1.13 DMIPS/Mhz with cache thrashing, 4KB-I$,4KB-D$, single cycle barrel shifter, debug module, catch exceptions, static branch) ->
Artix 7 -> 250 Mhz 1911 LUT 1501 FF
Cyclone V -> 132 Mhz 1,266 ALMs
Cyclone IV -> 127 Mhz 2,733 LUT 1,762 FF
Cyclone II -> 103 Mhz 2,791 LUT 1,760 FF
VexRiscv full with MMU (RV32IM, 1.16 DMIPS/Mhz, I$, D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch, MMU) ->
VexRiscv full max perf -> (RV32IM, 1.29 DMIPS/Mhz, 16KB-I$,16KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch, branch and shift operations done in the Execute stage) ->
Artix 7 -> 216 Mhz 1978 LUT 1442 FF
Cyclone V -> 105 Mhz 1,222 ALMs
Cyclone IV -> 94 Mhz 2,735 LUT 1,702 FF
VexRiscv full with MMU (RV32IM, 1.17 DMIPS/Mhz with cache thrashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, debug module, catch exceptions, dynamic branch, MMU) ->
Artix 7 -> 223 Mhz 2085 LUT 2020 FF
Cyclone V -> 110 Mhz 1,503 ALMs
Cyclone IV -> 108 Mhz 3,153 LUT 2,281 FF
Cyclone II -> 94 Mhz 3,187 LUT 2,281 FF
```
Here is a summary of the configuration which produces 1.29 DMIPS/Mhz:
- 5 stages : F -> D -> E -> M -> WB
- single cycle ADD/SUB/Bitwise/Shift ALU
- branch/jump done in the E stage
- memory load values are bypassed in the WB stage (late result)
- 33 cycle division with bypassing in the M stage (late result)
- single cycle multiplication with bypassing in the WB stage (late result)
- dynamic branch prediction done in the D stage with a direct-mapped 2-bit branch history cache
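
To relate the per-Mhz score to absolute throughput, multiply it by the clock frequency reached on a given target. The sketch below does this for the `VexRiscv full max perf` figures listed in the table above. The object and method names are hypothetical, and pairing the simulated DMIPS/Mhz with the synthesized maximal frequency is an assumption of this sketch: a real system with slower memory would likely score lower.

```scala
object ThroughputSketch {
  // DMIPS = (DMIPS per Mhz) * (clock frequency in Mhz)
  def dmips(dmipsPerMhz: Double, fmaxMhz: Double): Double = dmipsPerMhz * fmaxMhz

  def main(args: Array[String]): Unit = {
    val perMhz = 1.29 // "VexRiscv full max perf" score from the table
    // Maximal frequencies reported for that configuration
    val fmax = Seq("Artix 7" -> 216.0, "Cyclone V" -> 105.0, "Cyclone IV" -> 94.0)
    for ((target, mhz) <- fmax)
      println(f"$target%-10s -> ${dmips(perMhz, mhz)}%.1f DMIPS")
  }
}
```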
## Dependencies
On Ubuntu 14 :
@ -337,7 +353,7 @@ sudo mv /opt/riscv64-unknown-elf-gcc-20170612-x86_64-linux-centos6 /opt/riscv
echo 'export PATH=/opt/riscv/bin:$PATH' >> ~/.bashrc
```
But if you want to compile from sources in /opt/ the rv32i and rv32im gcc, do the following (will take hours):
But if you want to compile from sources in /opt/ the rv32i and rv32im gcc, do the following (will take one hour):
```sh
# Be carefull, sometime the git clone has issue to successfully clone riscv-gnu-toolchain.


@ -41,7 +41,7 @@ object TestsWorkspace {
// ),
new IBusCachedPlugin(
config = InstructionCacheConfig(
cacheSize = 4096,
cacheSize = 4096*4,
bytePerLine =32,
wayCount = 1,
wrappedMemAccess = true,
@ -66,7 +66,7 @@ object TestsWorkspace {
// ),
new DBusCachedPlugin(
config = new DataCacheConfig(
cacheSize = 4096,
cacheSize = 4096*4,
bytePerLine = 32,
wayCount = 1,
addressWidth = 32,
@ -83,14 +83,14 @@ object TestsWorkspace {
portTlbSize = 6
)
),
// new StaticMemoryTranslatorPlugin(
// ioRange = _(31 downto 28) === 0xF
// ),
new MemoryTranslatorPlugin(
tlbSize = 32,
virtualRange = _(31 downto 28) === 0xC,
new StaticMemoryTranslatorPlugin(
ioRange = _(31 downto 28) === 0xF
),
// new MemoryTranslatorPlugin(
// tlbSize = 32,
// virtualRange = _(31 downto 28) === 0xC,
// ioRange = _(31 downto 28) === 0xF
// ),
new DecoderSimplePlugin(
catchIllegalInstruction = true
),
@ -102,7 +102,7 @@ object TestsWorkspace {
new SrcPlugin(
separatedAddSub = false
),
new FullBarrielShifterPlugin,
new FullBarrielShifterPlugin(earlyInjection = true),
// new LightShifterPlugin,
new HazardSimplePlugin(
bypassExecute = true,
@ -120,7 +120,7 @@ object TestsWorkspace {
new CsrPlugin(CsrPluginConfig.all),
new DebugPlugin(ClockDomain.current.clone(reset = Bool().setName("debugReset"))),
new BranchPlugin(
earlyBranch = false,
earlyBranch = true,
catchAddressMisaligned = true,
prediction = DYNAMIC
),


@ -0,0 +1,72 @@
package vexriscv.demo
import scala.sys.process._
import java.io.File
object DhrystoneBench extends App{
def doCmd(cmd : String) : String = {
val stdOut = new StringBuilder()
class Logger extends ProcessLogger {override def err(s: => String): Unit = {if(!s.startsWith("ar: creating ")) println(s)}
override def out(s: => String): Unit = {stdOut ++= s}
override def buffer[T](f: => T) = f
}
Process(cmd, new File("src/test/cpp/regression")).!(new Logger)
stdOut.toString()
}
val report = new StringBuilder()
def getDmips(name : String, gen : => Unit, test : String): Unit ={
gen
val str = doCmd(test)
val intFind = "(\\d+\\.?)+".r
val dmips = intFind.findFirstIn("DMIPS per Mhz\\: (\\d+.?)+".r.findAllIn(str).toList.last).get.toDouble
report ++= name + " -> " + dmips + "\n"
}
getDmips(
name = "GenSmallestNoCsr",
gen = GenSmallestNoCsr.main(null),
test = "make clean run REDO=0 IBUS=SIMPLE DBUS=SIMPLE CSR=no MMU=no DEBUG_PLUGIN=no MUL=no DIV=no"
)
getDmips(
name = "GenSmallest",
gen = GenSmallest.main(null),
test = "make clean run REDO=0 IBUS=SIMPLE DBUS=SIMPLE MMU=no DEBUG_PLUGIN=no MUL=no DIV=no"
)
getDmips(
name = "GenSmallAndProductive",
gen = GenSmallAndProductive.main(null),
test = "make clean run REDO=0 IBUS=SIMPLE DBUS=SIMPLE MMU=no DEBUG_PLUGIN=no MUL=no DIV=no"
)
getDmips(
name = "GenFullNoMmuNoCache",
gen = GenFullNoMmuNoCache.main(null),
test = "make clean run REDO=0 IBUS=SIMPLE DBUS=SIMPLE MMU=no"
)
getDmips(
name = "GenFullNoMmu",
gen = GenFullNoMmu.main(null),
test = "make clean run REDO=0 MMU=no "
)
getDmips(
name = "GenFullNoMmuMaxPerf",
gen = GenFullNoMmuMaxPerf.main(null),
test = "make clean run REDO=0 MMU=no"
)
getDmips(
name = "GenFull",
gen = GenFull.main(null),
test = "make clean run REDO=0"
)
println(report)
}
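
The score extraction in `getDmips` can be tried in isolation. Below is a minimal sketch, assuming the simulation log contains lines of the form `DMIPS per Mhz: <number>`, as the regex in the commit implies. The object and method names are hypothetical. Unlike the code above, it escapes the decimal point and returns an `Option`, so a log without any result line yields `None` instead of throwing on `.last`.

```scala
object DmipsParseSketch {
  // Capture the number following the marker; the dot is escaped so it
  // only matches a literal decimal point.
  private val dmipsLine = """DMIPS per Mhz: (\d+(?:\.\d+)?)""".r

  // Return the last reported score, or None if the log has no result line.
  def lastDmips(log: String): Option[Double] =
    dmipsLine.findAllMatchIn(log).toList.lastOption.map(_.group(1).toDouble)

  def main(args: Array[String]): Unit = {
    // Hypothetical log excerpt, for illustration only
    val log = "warm-up\nDMIPS per Mhz: 0.51\nDMIPS per Mhz: 1.29\n"
    println(lastDmips(log))
  }
}
```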


@ -0,0 +1,89 @@
package vexriscv.demo
import spinal.core._
import vexriscv.ip.{DataCacheConfig, InstructionCacheConfig}
import vexriscv.plugin._
import vexriscv.{VexRiscv, VexRiscvConfig, plugin}
/**
* Created by spinalvm on 15.06.17.
*/
object GenFullNoMmuMaxPerf extends App{
def cpu() = new VexRiscv(
config = VexRiscvConfig(
plugins = List(
new PcManagerSimplePlugin(
resetVector = 0x00000000l,
relaxedPcCalculation = false
),
new IBusCachedPlugin(
config = InstructionCacheConfig(
cacheSize = 4096*4,
bytePerLine =32,
wayCount = 1,
wrappedMemAccess = true,
addressWidth = 32,
cpuDataWidth = 32,
memDataWidth = 32,
catchIllegalAccess = true,
catchAccessFault = true,
catchMemoryTranslationMiss = false,
asyncTagMemory = false,
twoStageLogic = true
)
),
new DBusCachedPlugin(
config = new DataCacheConfig(
cacheSize = 4096*4,
bytePerLine = 32,
wayCount = 1,
addressWidth = 32,
cpuDataWidth = 32,
memDataWidth = 32,
catchAccessError = true,
catchIllegal = true,
catchUnaligned = true,
catchMemoryTranslationMiss = false
)
),
new StaticMemoryTranslatorPlugin(
ioRange = _(31 downto 28) === 0xF
),
new DecoderSimplePlugin(
catchIllegalInstruction = true
),
new RegFilePlugin(
regFileReadyKind = plugin.SYNC,
zeroBoot = false
),
new IntAluPlugin,
new SrcPlugin(
separatedAddSub = false,
executeInsertion = true
),
new FullBarrielShifterPlugin(earlyInjection = true),
new HazardSimplePlugin(
bypassExecute = true,
bypassMemory = true,
bypassWriteBack = true,
bypassWriteBackBuffer = true,
pessimisticUseSrc = false,
pessimisticWriteRegFile = false,
pessimisticAddressMatch = false
),
new MulPlugin,
new DivPlugin,
new CsrPlugin(CsrPluginConfig.small),
new DebugPlugin(ClockDomain.current.clone(reset = Bool().setName("debugReset"))),
new BranchPlugin(
earlyBranch = true,
catchAddressMisaligned = true,
prediction = DYNAMIC
),
new YamlPlugin("cpu0.yaml")
)
)
)
SpinalVerilog(cpu())
}
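
For orientation, the cache parameters used in this configuration (`cacheSize = 4096*4`, `bytePerLine = 32`, `wayCount = 1`, `addressWidth = 32`) can be turned into line counts and address-field widths with textbook cache arithmetic. This is a sketch with hypothetical names. It assumes power-of-two sizes, and the tag width that the actual VexRiscv cache stores may differ from this textbook value.

```scala
object CacheGeometrySketch {
  final case class Geometry(linesPerWay: Int, offsetBits: Int, indexBits: Int, tagBits: Int)

  // Integer log2, valid for positive powers of two
  private def log2(x: Int): Int = 31 - Integer.numberOfLeadingZeros(x)

  def geometry(cacheSize: Int, bytePerLine: Int, wayCount: Int, addressWidth: Int): Geometry = {
    val linesPerWay = cacheSize / (bytePerLine * wayCount)
    val offsetBits  = log2(bytePerLine)  // selects a byte inside a line
    val indexBits   = log2(linesPerWay)  // selects a line inside a way
    Geometry(linesPerWay, offsetBits, indexBits, addressWidth - indexBits - offsetBits)
  }

  def main(args: Array[String]): Unit =
    println(geometry(cacheSize = 4096 * 4, bytePerLine = 32, wayCount = 1, addressWidth = 32))
}
```

With the values from this file, the sketch yields 512 lines, 5 offset bits, 9 index bits and 18 tag bits per cache.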


@ -31,20 +31,26 @@ object VexRiscvSynthesisBench {
override def getRtlPath(): String = "VexRiscvFullNoMmuNoCache.v"
SpinalVerilog(GenFullNoMmuNoCache.cpu().setDefinitionName(getRtlPath().split("\\.").head))
}
val fullNoMmu = new Rtl {
override def getName(): String = "VexRiscv full no MMU"
override def getRtlPath(): String = "VexRiscvFullNoMmu.v"
SpinalVerilog(GenFullNoMmu.cpu().setDefinitionName(getRtlPath().split("\\.").head))
}
val fullNoMmuMaxPerf= new Rtl {
override def getName(): String = "VexRiscv full no MMU max perf"
override def getRtlPath(): String = "VexRiscvFullNoMmuMaxPerf.v"
SpinalVerilog(GenFullNoMmuMaxPerf.cpu().setDefinitionName(getRtlPath().split("\\.").head))
}
val full = new Rtl {
override def getName(): String = "VexRiscv full"
override def getRtlPath(): String = "VexRiscvFull.v"
SpinalVerilog(GenFull.cpu().setDefinitionName(getRtlPath().split("\\.").head))
}
val rtls = List(smallestNoCsr, smallest, smallAndProductive, fullNoMmuNoCache, fullNoMmu, full)
// val rtls = List(smallestNoCsr, smallest, smallAndProductive, fullNoMmuNoCache, fullNoMmuMaxPerf, fullNoMmu, full)
val rtls = List(fullNoMmuMaxPerf)
val targets = XilinxStdTargets(
vivadoArtix7Path = "/eda/Xilinx/Vivado/2017.2/bin"


@ -6,7 +6,7 @@ import spinal.lib.Reverse
class FullBarrielShifterPlugin extends Plugin[VexRiscv]{
class FullBarrielShifterPlugin(earlyInjection : Boolean = false) extends Plugin[VexRiscv]{
object ShiftCtrlEnum extends SpinalEnum(binarySequential){
val DISABLE, SLL, SRL, SRA = newElement()
}
@ -24,7 +24,7 @@ class FullBarrielShifterPlugin extends Plugin[VexRiscv]{
SRC1_CTRL -> Src1CtrlEnum.RS,
SRC2_CTRL -> Src2CtrlEnum.IMI,
REGFILE_WRITE_VALID -> True,
BYPASSABLE_EXECUTE_STAGE -> False,
BYPASSABLE_EXECUTE_STAGE -> Bool(earlyInjection),
BYPASSABLE_MEMORY_STAGE -> True,
RS1_USE -> True
)
@ -33,7 +33,7 @@ class FullBarrielShifterPlugin extends Plugin[VexRiscv]{
SRC1_CTRL -> Src1CtrlEnum.RS,
SRC2_CTRL -> Src2CtrlEnum.RS,
REGFILE_WRITE_VALID -> True,
BYPASSABLE_EXECUTE_STAGE -> False,
BYPASSABLE_EXECUTE_STAGE -> Bool(earlyInjection),
BYPASSABLE_MEMORY_STAGE -> True,
RS1_USE -> True,
RS2_USE -> True
@ -66,8 +66,9 @@ class FullBarrielShifterPlugin extends Plugin[VexRiscv]{
insert(SHIFT_RIGHT) := (Cat(input(SHIFT_CTRL) === ShiftCtrlEnum.SRA & reversed.msb, reversed).asSInt >> amplitude)(31 downto 0).asBits
}
memory plug new Area{
import memory._
val injectionStage = if(earlyInjection) execute else memory
injectionStage plug new Area{
import injectionStage._
switch(input(SHIFT_CTRL)){
is(ShiftCtrlEnum.SLL){
output(REGFILE_WRITE_DATA) := Reverse(input(SHIFT_RIGHT))

File diff suppressed because it is too large (5 files)