482 lines
22 KiB
Plaintext
482 lines
22 KiB
Plaintext
Migen (Milkymist Generator)
|
|
a Python toolbox for building complex digital hardware
|
|
======================================================
|
|
|
|
Background
|
|
==========
|
|
Even though the Milkymist system-on-chip [1] is technically successful,
|
|
it suffers from several limitations stemming from its implementation in
|
|
manually written Verilog HDL:
|
|
|
|
(1) The "event-driven" paradigm of today's dominant hardware descriptions
|
|
languages (Verilog and VHDL, collectively referred to as "V*HDL" in the
|
|
rest of this document) is often too general. Today's FPGA architectures
|
|
are optimized for the implementation of fully synchronous circuits. This
|
|
means that the bulk of the code for an efficient FPGA design falls into
|
|
three categories:
|
|
(a) Combinatorial statements
|
|
(b) Synchronous statements
|
|
(c) Initialization of registers at reset
|
|
V*HDL do not follow this organization. This means that a lot of
|
|
repetitive manual coding is needed, which brings sources of human errors,
|
|
petty issues, and confusion for beginners:
|
|
- wire vs. reg in Verilog
|
|
- forgetting to initialize a register at reset
|
|
- deciding whether a combinatorial statement must go into a
|
|
process/always block or not
|
|
- simulation mismatches with combinatorial processes/always blocks
|
|
- and more...
|
|
A little-known fact about FPGAs is that many of them have to ability to
|
|
initialize their registers from the bitstream contents. This can be done
|
|
in a portable and standard way using an "initial" block in Verilog, and
|
|
by affecting a value at the signal declaration in VHDL. This renders an
|
|
explicit reset signal unnecessary in practice in some cases, which opens
|
|
the way for further design optimization. However, this form of
|
|
initialization is entirely not synthesizable for ASIC targets, and it is
|
|
not easy to switch between the two forms of reset using V*HDL.
|
|
|
|
(2) V*HDL support for composite types is very limited. Signals having a
|
|
record type in VHDL are unidirectional, which makes them clumsy to use
|
|
e.g. in bus interfaces. There is no record type support in Verilog, which
|
|
means that a lot of copy-and-paste has to be done when forwarding grouped
|
|
signals.
|
|
|
|
(3) V*HDL support for procedurally generated logic is extremely limited.
|
|
The most advanced forms of procedural generation of synthesizable logic
|
|
that V*HDL offers are CPP-style directives in Verilog, combinatorial
|
|
functions, and generate statements. Nothing really fancy, and it shows.
|
|
To give a few examples:
|
|
- Building highly flexible bus interconnect is not possible. Even
|
|
arbitrating any given number of bus masters for commonplace protocols
|
|
such as Wishbone cannot be done with the tools at V*HDL puts at our
|
|
disposal. This requires manual recoding of parts of the arbiter to add or
|
|
remove a master, which is tedious and often cause frustrating errors.
|
|
Each occurence of the latter can easily cause one or two hours of lost
|
|
productivity when combined with the long compilation times of moderately
|
|
complex system-on-chip designs.
|
|
- Building a memory infrastructure (including bus interconnect, bridges
|
|
and caches) that can automatically adapt itself at compile-time to any
|
|
word size of the SDRAM is clumsy and tedious.
|
|
- Building register banks for control, status and interrupt management
|
|
of cores can also largely benefit from automation.
|
|
- Many hardware acceleration problems can fit into the dataflow
|
|
programming model. Manual dataflow implementation in V*HDL has, again, a
|
|
lot of redundancy and potential for human errors. See the Milkymist
|
|
texture mapping unit [3][4] for an example of this. The amount of detail
|
|
to deal with manually also makes the design space exploration difficult,
|
|
and therefore hinders the design of efficient architectures.
|
|
- Pre-computation of values, such as filter coefficients for DSP or
|
|
even simply trigonometric tables, must often be done using external tools
|
|
whose results are copy-and-pasted (in the best cases, automatically) into
|
|
the V*HDL source.
|
|
|
|
Enter Migen, a Python toolbox for building complex digital hardware. We
|
|
could have designed a brand new programming language, but that would have
|
|
been reinventing the wheel instead of being able to benefit from Python's
|
|
rich features and immense library. The price to pay is a slightly
|
|
cluttered syntax at times when writing descriptions in FHDL, but we
|
|
believe this is totally acceptable, particularly when compared to VHDL
|
|
;-)
|
|
|
|
Migen is made up of several related components, which are briefly
|
|
described below.
|
|
|
|
Migen FHDL
|
|
==========
|
|
The Fragmented Hardware Description Language (FHDL) is the lowest layer
|
|
of Migen. It consists of a formal system to describe signals, and
|
|
combinatorial and synchronous statements operating on them. The formal
|
|
system itself is low level and close to the synthesizable subset of
|
|
Verilog, and we then rely on Python algorithms to build complex
|
|
structures by combining FHDL elements and encapsulating them in
|
|
"fragments".
|
|
The FHDL module also contains a back-end to produce synthesizable
|
|
Verilog, and some basic analysis functions. It would be possible to
|
|
develop a VHDL back-end as well, though more difficult than for Verilog -
|
|
we are "cheating" a bit now as Verilog provides most of the FHDL
|
|
semantics.
|
|
|
|
FHDL differs from MyHDL [2] in fundamental ways. MyHDL follows the
|
|
event-driven paradigm of traditional HDLs (see Background, #1) while FHDL
|
|
separates the code into combinatorial statements, synchronous statements,
|
|
and reset values. In MyHDL, the logic is described directly in the Python
|
|
AST. The converter to Verilog or VHDL then examines the Python AST and
|
|
recognizes a subset of Python that it translates into V*HDL statements.
|
|
This seriously impedes the capability of MyHDL to generate logic
|
|
procedurally. With FHDL, you manipulate a custom AST from Python, and you
|
|
can more easily design algorithms that operate on it.
|
|
|
|
FHDL is made of several elements, which are briefly explained below.
|
|
|
|
BV
|
|
--
|
|
The bit vector (BV) object defines if a constant or signal is signed or
|
|
unsigned, and how many bits it has. This is useful e.g. to:
|
|
- determine when to perform sign extension (FHDL uses the same rules as
|
|
Verilog).
|
|
- determine the size of registers.
|
|
- determine how many bits should be used by each value in
|
|
concatenations.
|
|
|
|
Constant
|
|
--------
|
|
This object should be self-explanatory. All constant objects contain a BV
|
|
object and a value. If no BV object is specified, one will be made up
|
|
using the following rules:
|
|
- If the value is positive, the BV is unsigned and has the minimum
|
|
number of bits needed to represent the constant's value in the canonical
|
|
base-2 system.
|
|
- If the value is negative, the BV is signed, and has the minimum
|
|
number of bits needed to represent the constant's value in the canonical
|
|
two's complement, base-2 system.
|
|
|
|
Signal
|
|
------
|
|
The signal object represents a value that is expected to change in the
|
|
circuit. It does exactly what Verilog's "wire" and "reg" and VHDL's
|
|
"signal" and "variable" do.
|
|
|
|
The main point of the signal object is that it is identified by its
|
|
Python ID (as returned by the id() function), and nothing else. It is the
|
|
responsibility of the V*HDL back-end to establish an injective mapping
|
|
between Python IDs and the V*HDL namespace. It should perform name
|
|
mangling to ensure this. The consequence of this is that signal objects
|
|
can safely become members of arbitrary Python classes, or be passed as
|
|
parameters to functions or methods that generate logic involving them.
|
|
|
|
The properties of a signal object are:
|
|
- a bit vector description
|
|
- a name, used as a hint for the V*HDL back-end name mangler.
|
|
- a boolean "variable". If true, the signal will behave like a VHDL
|
|
variable, or a Verilog reg that uses blocking assignment. This parameter
|
|
only has an effect when the signal's value is modified in a synchronous
|
|
statement.
|
|
- the signal's reset value. It must be an integer, and defaults to 0.
|
|
When the signal's value is modified with a synchronous statement, the
|
|
reset value is the initialization value of the associated register.
|
|
When the signal is assigned to in a conditional combinatorial statement
|
|
(If or Case), the reset value is the value that the signal has when no
|
|
condition that causes the signal to be driven is verified. This enforces
|
|
the absence of latches in designs. If the signal is permanently driven
|
|
using a combinatorial statement, the reset value has no effect.
|
|
|
|
The sole purpose of the name property is to make the generated V*HDL code
|
|
easier to understand and debug. From a purely functional point of view,
|
|
it is perfectly OK to have several signals with the same name property.
|
|
The back-end will generate a unique name for each object. If no name
|
|
property is specified, Migen will analyze the code that created the
|
|
signal object, and try to extract the variable or member name from there.
|
|
It then uses the module name that created the signal, a underscore, and
|
|
the variable name. For example, if we are in module "foo", the following
|
|
statements will create one or several signal(s) named "foo_bar":
|
|
bar = Signal()
|
|
self.bar = Signal()
|
|
self.baz.bar = Signal()
|
|
bar = [Signal() for x in range(42)]
|
|
|
|
Operators
|
|
---------
|
|
Operators are represented by the _Operator object, which generally should
|
|
not be used directly. Instead, most FHDL objects overload the usual
|
|
Python logic and arithmetic operators, which allows a much lighter syntax
|
|
to be used. For example, the expression:
|
|
a * b + c
|
|
is equivalent to:
|
|
_Operator('+', [_Operator('*', [a, b]), c])
|
|
|
|
Slices
|
|
------
|
|
Likewise, slices are represented by the _Slice object, which often should
|
|
not be used in favor of the Python slice operation [x:y].
|
|
Implicit indices using the forms [x], [x:] and [:y] are supported.
|
|
Beware! Slices work like Python slices, not like VHDL or Verilog slices.
|
|
The first bound is the index of the LSB and is inclusive. The second
|
|
bound is the index of MSB and is exclusive. In V*HDL, bounds are MSB:LSB
|
|
and both are inclusive.
|
|
|
|
Concatenations
|
|
--------------
|
|
Concatenations are done using the Cat object. To make the syntax lighter,
|
|
its constructor takes a variable number of arguments, which are the
|
|
signals to be concatenated together (you can use the Python '*' operator
|
|
to pass a list instead).
|
|
To be consistent with slices, the first signal is connected to the bits
|
|
with the lowest indices in the result. This is the opposite of the way
|
|
the '{}' construct works in Verilog.
|
|
|
|
Replications
|
|
------------
|
|
The Replicate object represents the equivalent of {count{expression}} in
|
|
Verilog.
|
|
|
|
Assignments
|
|
-----------
|
|
Assignments are represented with the _Assign object. Since using it
|
|
directly would result in a cluttered syntax, the preferred technique for
|
|
assignments is to use the eq() method provided by objects that can have a
|
|
value assigned to them. They are signals, and their combinations with the
|
|
slice and concatenation operators.
|
|
As an example, the statement:
|
|
a[0].eq(b)
|
|
is equivalent to:
|
|
_Assign(_Slice(a, 0, 1), b)
|
|
|
|
If statement
|
|
------------
|
|
The If object takes a first parameter which must be an expression
|
|
(combination of the Constant, Signal, _Operator, _Slice, etc. objects)
|
|
representing the condition, then a variable number of parameters
|
|
representing the statements (_Assign, If, Case, etc. objects) to be
|
|
executed when the condition is verified.
|
|
|
|
The If object defines a Else() method, which when called defines the
|
|
statements to be executed when the condition is not true. Those
|
|
statements are passed as parameters to the variadic method.
|
|
|
|
For convenience, there is also a Elif() method.
|
|
|
|
Example:
|
|
If(tx_count16 == 0,
|
|
tx_bitcount.eq(tx_bitcount + 1),
|
|
If(tx_bitcount == 8,
|
|
self.tx.eq(1)
|
|
).Elif(tx_bitcount == 9,
|
|
self.tx.eq(1),
|
|
tx_busy.eq(0)
|
|
).Else(
|
|
self.tx.eq(tx_reg[0]),
|
|
tx_reg.eq(Cat(tx_reg[1:], 0))
|
|
)
|
|
)
|
|
|
|
Case statement
|
|
--------------
|
|
The Case object constructor takes as first parameter the expression to be
|
|
tested, then a variable number of lists describing the various cases.
|
|
|
|
Each list contains an expression (typically a constant) describing the
|
|
value to be matched, followed by the statements to be executed when there
|
|
is a match. The head of the list can be the an instance of the Default
|
|
object.
|
|
|
|
Instances
|
|
---------
|
|
Instance objects represent the parametrized instantiation of a V*HDL
|
|
module, and the connection of its ports to FHDL signals. They are useful
|
|
in a number of cases:
|
|
- reusing legacy or third-party V*HDL code.
|
|
- using special FPGA features (DCM, ICAP, ...).
|
|
- implementing logic that cannot be expressed with FHDL (asynchronous
|
|
circuits, ...).
|
|
- breaking down a Migen system into multiple sub-systems, possibly
|
|
using different clock domains.
|
|
|
|
The properties of the instance object are:
|
|
- the type of the instance (i.e. name of the instantiated module).
|
|
- a list of output ports of the instantiated module. Each element of
|
|
the list is a pair containing a string, which is the name of the
|
|
module's port, and either an existing signal (on which the port will
|
|
be connected to) or a BV (which will cause the creation of a new
|
|
signal).
|
|
- a list of input ports (likewise).
|
|
- a list of (name, value) pairs for the parameters ("generics" in VHDL)
|
|
of the module.
|
|
- the name of the clock port of the module (if any). If this is
|
|
specified, the port will be connected to the system clock.
|
|
- the name of the reset port of the module (likewise).
|
|
- the name of the instance (can be mangled like signal names).
|
|
|
|
Memories
|
|
--------
|
|
Memories (on-chip SRAM) are supported using a mechanism similar to
|
|
instances.
|
|
|
|
A memory object has the following parameters:
|
|
- the width, which is the number of bits in each word.
|
|
- the depth, which represents the number of words in the memory.
|
|
- an optional list of integers used to initialize the memory.
|
|
- a list of port descriptions.
|
|
|
|
Each port description contains:
|
|
- the address signal (mandatory).
|
|
- the data read signal (mandatory).
|
|
- the write enable signal (optional). If the port is using masked
|
|
writes, the width of the write enable signal should match the number
|
|
of sub-words.
|
|
- the data write signal (iff there is a write enable signal).
|
|
- whether reads are synchronous (default) or asynchronous.
|
|
- the read enable port (optional, ignored for asynchronous ports).
|
|
- the write granularity (default 0), which defines the number of bits
|
|
in each sub-word. If it is set to 0, the port is using whole-word
|
|
writes only and the width of the write enable signal must be 1. This
|
|
parameter is ignored if there is no write enable signal.
|
|
- the mode of the port (default WRITE_FIRST, ignored for asynchronous
|
|
ports). It can be:
|
|
* READ_FIRST: during a write, the previous value is read.
|
|
* WRITE_FIRST: the written value is returned.
|
|
* NO_CHANGE: the data read signal keeps its previous value on a
|
|
write.
|
|
|
|
Migen generates behavioural V*HDL code that should be compatible with all
|
|
simulators and, if the number of ports is <= 2, most FPGA synthesizers.
|
|
If a specific code is needed, the memory generator function can be
|
|
overriden using the memory_handler parameter of the conversion function.
|
|
|
|
Fragments
|
|
---------
|
|
A "fragment" is a unit of logic, which is composed of:
|
|
- a list of combinatorial statements.
|
|
- a list of synchronous statements.
|
|
- a list of instances.
|
|
- a list of memories.
|
|
- a set of pads, which are signals intended to be connected to
|
|
off-chip devices.
|
|
|
|
Fragments can reference arbitrary signals, including signals that are
|
|
referenced in other fragments. Fragments can be combined using the "+"
|
|
operator, which returns a new fragment containing the concatenation of
|
|
each pair of lists.
|
|
|
|
Fragments can be passed to the back-end for conversion to Verilog.
|
|
|
|
By convention, classes that generate logic implement a method called
|
|
"get_fragment". When called, this method builds a new fragment
|
|
implementing the desired functionality of the class, and returns it. This
|
|
convention allows fragments to be built automatically by combining the
|
|
fragments from all relevant objects in the local scope, by using the
|
|
autofragment module.
|
|
|
|
Migen Core Logic
|
|
================
|
|
Migen Core Logic is a convenience library of common logic circuits
|
|
implemented using FHDL:
|
|
- a multi-cycle integer divider.
|
|
- a round-robin arbiter, useful to build bus arbiters.
|
|
- a multiplexer bank (multimux), useful to multiplex composite
|
|
(grouped) signals.
|
|
- a condition-triggered static scheduler of FHDL synchronous statements
|
|
(timeline).
|
|
|
|
Migen Bus
|
|
=========
|
|
Migen Bus contains classes providing a common structure for master and
|
|
slave interfaces of the following buses:
|
|
- Wishbone [5], the general purpose bus recommended by Opencores.
|
|
- CSR-2, a low-bandwidth, resource-sensitive bus designed for
|
|
accessing the configuration and status registers of cores from
|
|
software.
|
|
- ASMIbus, a split-transaction bus optimized for use with a
|
|
high-performance, out-of-order SDRAM controller.
|
|
- DFI [12] (partial), a standard interface protocol between memory
|
|
controller logic and PHY interfaces.
|
|
|
|
It also provides interconnect components for these buses, such as
|
|
arbiters and address decoders. The strength of the Migen procedurally
|
|
generated logic can be illustrated by the following example:
|
|
wbcon = wishbone.InterconnectShared(
|
|
[cpu.ibus, cpu.dbus, ethernet.dma, audio.dma],
|
|
[(0, norflash.bus), (1, wishbone2asmi.wishbone),
|
|
(3, wishbone2csr.wishbone)])
|
|
In this example, the interconnect component generates a 4-way round-robin
|
|
arbiter, multiplexes the master bus signals into a shared bus, determines
|
|
that the address decoding must occur on 2 bits, and connects all slave
|
|
interfaces to the shared bus, inserting the address decoder logic in the
|
|
bus cycle qualification signals and multiplexing the data return path. It
|
|
can recognize the signals in each core's bus interface thanks to the
|
|
common structure mandated by Migen Bus. All this happens automatically,
|
|
using only that much user code. The resulting interconnect logic can be
|
|
retrieved using wbcon.get_fragment(), and combined with the fragments
|
|
from the rest of the system.
|
|
|
|
Migen Bank
|
|
==========
|
|
Migen Bank is a system comparable to wishbone-gen [6], which automates
|
|
the creation of configuration and status register banks and
|
|
interrupt/event managers implemented in cores.
|
|
|
|
Bank takes a description made up of a list of registers and generates
|
|
logic implementing it with a slave interface compatible with Migen Bus.
|
|
|
|
A register can be "raw", which means that the core has direct access to
|
|
it. It also means that the register width must be less or equal to the
|
|
bus word width. In that case, the register object provides the following
|
|
signals:
|
|
- r, which contains the data written from the bus interface.
|
|
- re, which is the strobe signal for r. It is active for one cycle,
|
|
after or during a write from the bus. r is only valid when re is
|
|
high.
|
|
- w, which must provide at all times the value to be read from the
|
|
bus.
|
|
|
|
Registers that are not raw are managed by Bank and contain fields. If the
|
|
sum of the widths of all fields attached to a register exceeds the bus
|
|
word width, the register will automatically be sliced into words of the
|
|
maximum size and implemented at consecutive bus addresses, MSB first.
|
|
Field objects have two parameters, access_bus and access_dev, determining
|
|
respectively the access policies for the bus and core sides. They can
|
|
take the values READ_ONLY, WRITE_ONLY and READ_WRITE.
|
|
If the device can read, the field object provides the r signal, which
|
|
contains at all times the current value of the field (kept by the logic
|
|
generated by Bank).
|
|
If the device can write, the field object provides the following signals:
|
|
- w, which provides the value to be written into the field.
|
|
- we, which strobes the value into the field.
|
|
|
|
Migen Flow (Work in progress)
|
|
==========
|
|
Many hardware acceleration problems can be expressed in the dataflow
|
|
paradigm, that is, using a directed graph representing the flow of data
|
|
between actors.
|
|
|
|
Actors in Migen are written directly in FHDL. This maximizes the
|
|
flexibility: for example, an actor can implement a DMA master to read
|
|
data from system memory. It is conceivable that a CAL [7] to FHDL
|
|
compiler be implemented at some point, to support higher level
|
|
descriptions of some actors and reuse of third-party RVC-CAL
|
|
applications. [8] [9] [10]
|
|
|
|
Actors communicate by exchanging tokens, whose flow is typically
|
|
controlled using handshake signals (strobe/ack).
|
|
|
|
Each actor has a "scheduling model". It can be:
|
|
- N-sequential: the actor fires when tokens are available at all its
|
|
inputs, and it produces one output token after N cycles. It cannot
|
|
accept new input tokens until it has produced its output. A
|
|
multicycle integer divider would use this model.
|
|
- N-pipelined: similar to the sequential model, but the actor can
|
|
always accept new input tokens. It produces an output token N cycles
|
|
of latency after accepting input tokens. A pipelined multiplier would
|
|
use this model.
|
|
- Dynamic: the general case, when no simple hypothesis can be made on
|
|
the token flow behaviour of the actor. An actor accessing system
|
|
memory on a shared bus would use this model.
|
|
|
|
Migen Flow automatically generates handshake logic for the first two
|
|
scheduling models. In the third case, the FHDL descriptions for the logic
|
|
driving the handshake signals must be provided by the actor.
|
|
|
|
If sequential or pipelined actors are connected together, Migen Flow will
|
|
attempt to find a static schedule, remove the handshake signals, optimize
|
|
away the control logic in each actor and replace it with a centralized
|
|
FSM implementing the static schedule.
|
|
|
|
An actor can be a composition of other actors.
|
|
|
|
Actor graphs are managed using the NetworkX [11] library.
|
|
|
|
|
|
References:
|
|
[ 1] http://milkymist.org
|
|
[ 2] http://www.myhdl.org
|
|
[ 3] http://milkymist.org/thesis/thesis.pdf
|
|
[ 4] http://www.xilinx.com/publications/archives/xcell/Xcell77.pdf p30-35
|
|
[ 5] http://cdn.opencores.org/downloads/wbspec_b4.pdf
|
|
[ 6] http://www.ohwr.org/projects/wishbone-gen
|
|
[ 7] http://opendf.svn.sourceforge.net/viewvc/opendf/trunk/doc/
|
|
GentleIntro/GentleIntro.pdf
|
|
[ 8] http://orcc.sourceforge.net/
|
|
[ 9] http://orc-apps.sourceforge.net/
|
|
[10] http://opendf.sourceforge.net/
|
|
[11] http://networkx.lanl.gov/
|
|
[12] http://www.ddr-phy.org/
|