doc: ASMI description
This commit is contained in:
parent
21f3def74b
commit
bf2f6f31e3
|
@ -0,0 +1,189 @@
|
||||||
|
Advanced System Memory Infrastructure (ASMI)
|
||||||
|
==============================================
|
||||||
|
|
||||||
|
Rationale
|
||||||
|
=========
|
||||||
|
The lagging of the DRAM semiconductor processes behind the logic
|
||||||
|
processes has led the industry into a subtle way of ever increasing
|
||||||
|
memory performance.
|
||||||
|
|
||||||
|
Modern devices feature a DRAM core running at a fraction of the logic
|
||||||
|
frequency, whose wide data bus is serialized and deserialized to and
|
||||||
|
from the faster clock domain. Further, the presence of more banks
|
||||||
|
increases page hit rate and provides opportunities for parallel
|
||||||
|
execution of commands to different banks.
|
||||||
|
|
||||||
|
A first-generation SDR-133 SDRAM chip runs both DRAM, I/O and logic at
|
||||||
|
133MHz and features 4 banks. A 16-bit chip has a 16-bit DRAM core.
|
||||||
|
|
||||||
|
A newer DDR3-1066 chip still runs the DRAM core at 133MHz, but the logic
|
||||||
|
at 533MHz (4 times the DRAM frequency) and the I/O at 1066Mt/s (8 times
|
||||||
|
the DRAM frequency). A 16-bit chip has a 128-bit internal DRAM core.
|
||||||
|
Such a device features 8 banks. Note that the serialization also
|
||||||
|
introduces multiplied delays (e.g. CAS latency) when measured in number
|
||||||
|
of cycles of the logic clock.
|
||||||
|
|
||||||
|
To take full advantage of these new architectures, the memory controller
|
||||||
|
should be able to peek ahead at the incoming requests and service
|
||||||
|
several of them in parallel, while respecting the various timing
|
||||||
|
specifications of each DRAM bank and avoiding conflicts for the shared
|
||||||
|
data lines. Going further in this direction, a controller able to
|
||||||
|
complete transfers out of order can provide more performance by:
|
||||||
|
(1) grouping together requests by DRAM row, in order to minimize time
|
||||||
|
spent on precharging and activating banks.
|
||||||
|
(2) grouping together requests by direction (read or write) in order to
|
||||||
|
minimize delays introduced by bus turnaround and write recovery
|
||||||
|
times.
|
||||||
|
(3) being able to complete a request that hits a page earlier than a
|
||||||
|
concurrent one which requires the cycling of another bank.
|
||||||
|
|
||||||
|
The first two techniques are explained with more details in [1].
|
||||||
|
|
||||||
|
To enable the efficient implementation of these mechanisms, a new
|
||||||
|
communication protocol with the memory controller must be devised. Migen
|
||||||
|
and Milkymist SoC (-NG) implement their own bus, called ASMIbus, based
|
||||||
|
on the split-transaction principle.
|
||||||
|
|
||||||
|
ASMIbus - the big picture
|
||||||
|
=========================
|
||||||
|
The ASMIbus consists of two parts: the control signals, and the data
|
||||||
|
signals.
|
||||||
|
|
||||||
|
The control signals are used to issue requests.
|
||||||
|
* Master->Slave:
|
||||||
|
- ADR communicates the memory address to be accessed. The unit is
|
||||||
|
the word width of the particular implementation of ASMIbus.
|
||||||
|
- WE is the write enable signal.
|
||||||
|
- STB qualifies the transaction request, and should be asserted
|
||||||
|
until ACK goes high.
|
||||||
|
* Slave->Master
|
||||||
|
- TAG_ISSUE is an integer representing the transaction ("tag")
|
||||||
|
attributed by the memory controller. The width of this signal is
|
||||||
|
determined by the maximum number of in-flight transactions that
|
||||||
|
the memory controller can handle.
|
||||||
|
- ACK is asserted at least one cycle after STB when TAG_ISSUE is
|
||||||
|
valid and the transaction has been accepted by the memory
|
||||||
|
controller.
|
||||||
|
|
||||||
|
The data signals are used to complete requests.
|
||||||
|
* Slave->Master
|
||||||
|
- TAG_CALL is used to identify the transaction for which the data
|
||||||
|
is "called". It takes the tag value that has been previously
|
||||||
|
attributed by the controller to that transaction during the issue
|
||||||
|
phase.
|
||||||
|
- CALL qualifies TAG_CALL.
|
||||||
|
- DATA_R returns data from the DRAM in the case of a read
|
||||||
|
transaction. It is valid for one cycle after CALL has been
|
||||||
|
asserted and TAG_CALL has identified the transaction.
|
||||||
|
The value of this signal is undefined for the cycle after a write
|
||||||
|
transaction data have been called.
|
||||||
|
* Master->Slave
|
||||||
|
- DATA_W must supply data to the controller from the appropriate
|
||||||
|
write transaction, on the cycle after they have been called using
|
||||||
|
CALL and TAG_CALL.
|
||||||
|
- DATA_WM are the byte-granularity write data masks. They are used
|
||||||
|
in combination with DATA_W to identify the bytes that should be
|
||||||
|
modified in the memory. The DATA_WM bit should be high for its
|
||||||
|
corresponding DATA_W byte to be written.
|
||||||
|
|
||||||
|
DATA_W and DATA_WM must always be driven low by a master, except during
|
||||||
|
the data call for a write transaction that it has requested.
|
||||||
|
|
||||||
|
Tags represent in-flight transactions. The memory controller can reissue
|
||||||
|
a tag as soon as the cycle when it appears on TAG_CALL.
|
||||||
|
|
||||||
|
Performance considerations
|
||||||
|
==========================
|
||||||
|
Note that the payload of a transaction lasts for only one cycle (i.e.
|
||||||
|
there are no bursts). Therefore, to be able to achieve 100% bandwidth
|
||||||
|
utilization, the issuance of a tag should also take no more than a
|
||||||
|
cycle.
|
||||||
|
|
||||||
|
For this purpose, the control signals are pipelined. When ACK is
|
||||||
|
asserted, STB can qualify a new request in the same cycle. This puts a
|
||||||
|
constraint on the arbiter, which must be able to switch combinatorially
|
||||||
|
to the next transaction on the assertion of ACK and still meet timing.
|
||||||
|
|
||||||
|
The controller is not allowed to generate ACK combinatorially from STB.
|
||||||
|
However, the master can generate STB combinatorially from ACK in order
|
||||||
|
to maximize bus bandwidth.
|
||||||
|
|
||||||
|
STB <0><1><0><0><1><1><0><1><1><0>
|
||||||
|
ACK <0><0><1><0><0><0><1><0><1><1>
|
||||||
|
TAG ------<A>---------<B>---<C><D>
|
||||||
|
|
||||||
|
SDRAM burst width and clock ratios
|
||||||
|
==================================
|
||||||
|
A system using ASMI must set the SDRAM burst length B, the ASMIbus word
|
||||||
|
width W and the ratio between the ASMIbus clock frequency Fa and the
|
||||||
|
SDRAM I/O frequency Fi so that all data transfers last for exactly one
|
||||||
|
ASMIbus cycle.
|
||||||
|
|
||||||
|
More explicitly, these relations must be verified:
|
||||||
|
B = Fi/Fa
|
||||||
|
W = B*[number of SDRAM I/O pins]
|
||||||
|
|
||||||
|
For DDR memories, the I/O frequency is twice the logic frequency.
|
||||||
|
|
||||||
|
Environment
|
||||||
|
===========
|
||||||
|
The ASMI consists of a memory controller (e.g. ASMIcon) and optionally
|
||||||
|
an arbiter/switch (e.g. ASMIswitch) thanks to which multiple masters can
|
||||||
|
access the shared system memory.
|
||||||
|
|
||||||
|
Links between them are using the same ASMIbus protocol described above.
|
||||||
|
In order to avoid duplicating the tag matching and tracking logic, the
|
||||||
|
master->slave data signals must be driven low when they are not in use,
|
||||||
|
so that they can be simply ORed together at the arbiter. This way, only
|
||||||
|
masters have to track (their own) transactions.
|
||||||
|
|
||||||
|
It is suggested that memory controllers use an interface to a PHY
|
||||||
|
compatible with DFI [2]. The DFI clock can be the same as the ASMIbus
|
||||||
|
clock, with optional serialization and deserialization happening across
|
||||||
|
the PHY, as specified in the DFI standard.
|
||||||
|
|
||||||
|
+-------+ +----------+
|
||||||
|
|Master1|<==>| | +----------+
|
||||||
|
+-------+ | | +-------+ +-------+ | Off-chip |
|
||||||
|
|ASMIswitch|<==>|ASMIcon|<-->|DDR PHY|<<====>>| SDRAM |
|
||||||
|
+-------+ | | +-------+ +-------+ |device(s) |
|
||||||
|
|Master2|<==>| | +----------+
|
||||||
|
+-------+ +----------+
|
||||||
|
|
||||||
|
<====> ASMIbus links
|
||||||
|
<----> DFI (or similar) links
|
||||||
|
<<==>> PCB traces to external SDRAM chips
|
||||||
|
|
||||||
|
Example transactions
|
||||||
|
====================
|
||||||
|
|
||||||
|
Basic transaction:
|
||||||
|
CTL <R A1>------------------------
|
||||||
|
ISSUE------< T1 >------------------
|
||||||
|
CALL ------------------< T1 >------
|
||||||
|
DAT_R------------------------<D A1>
|
||||||
|
DAT_W------------------------------
|
||||||
|
|
||||||
|
Two simple transactions:
|
||||||
|
CTL <R A1>------<R A2>------------------------------
|
||||||
|
ISSUE------< T1 >------< T2 >------------------------
|
||||||
|
CALL ------------------------< T1 >------< T2 >------
|
||||||
|
DAT_R------------------------------<D A1>------<D A2>
|
||||||
|
DAT_W------------------------------------------------
|
||||||
|
|
||||||
|
Interleaved transactions:
|
||||||
|
CTL <R A1>------<R A2><W A3><R A4><W A5>------------------------------
|
||||||
|
ISSUE------< T1 >------< T1 >< T2 >< T1 >< T1 >------------------------
|
||||||
|
CALL ------------------< T1 >------< T1 >< T1 >------< T1 >< T2 >------
|
||||||
|
DAT_R------------------------<D A1>------<D A2><D A4>------------------
|
||||||
|
DAT_W------------------------------------------------------<D A5><D A3>
|
||||||
|
|
||||||
|
<R Ax> Read address x
|
||||||
|
<W Ax> Write address x
|
||||||
|
< Tn > Tag n
|
||||||
|
<D Ax> Data to/from address x
|
||||||
|
|
||||||
|
|
||||||
|
[1] http://www.xilinx.com/txpatches/pub/documentation/misc/
|
||||||
|
improving%20ddr%20sdram%20efficiency.pdf
|
||||||
|
[2] http://www.ddr-phy.org/
|
|
@ -331,18 +331,18 @@ Migen Bus
|
||||||
Migen Bus contains classes providing a common structure for master and
|
Migen Bus contains classes providing a common structure for master and
|
||||||
slave interfaces of the following buses:
|
slave interfaces of the following buses:
|
||||||
- Wishbone [5], the general purpose bus recommended by Opencores.
|
- Wishbone [5], the general purpose bus recommended by Opencores.
|
||||||
- CSR-NG, a low-bandwidth, resource-sensitive bus designed for
|
- CSR-2, a low-bandwidth, resource-sensitive bus designed for
|
||||||
accessing the configuration and status registers of cores from
|
accessing the configuration and status registers of cores from
|
||||||
software.
|
software.
|
||||||
- FastMemoryLink-NG, a split-transaction bus optimized for use with a
|
- ASMIbus, a split-transaction bus optimized for use with a
|
||||||
high-performance, out-of-order SDRAM controller. (TODO)
|
high-performance, out-of-order SDRAM controller.
|
||||||
|
|
||||||
It also provides interconnect components for these buses, such as
|
It also provides interconnect components for these buses, such as
|
||||||
arbiters and address decoders. The strength of the Migen procedurally
|
arbiters and address decoders. The strength of the Migen procedurally
|
||||||
generated logic can be illustrated by the following example:
|
generated logic can be illustrated by the following example:
|
||||||
wbcon = wishbone.InterconnectShared(
|
wbcon = wishbone.InterconnectShared(
|
||||||
[cpu.ibus, cpu.dbus, ethernet.dma, audio.dma],
|
[cpu.ibus, cpu.dbus, ethernet.dma, audio.dma],
|
||||||
[(0, norflash.bus), (1, wishbone2fml.wishbone),
|
[(0, norflash.bus), (1, wishbone2asmi.wishbone),
|
||||||
(3, wishbone2csr.wishbone)])
|
(3, wishbone2csr.wishbone)])
|
||||||
In this example, the interconnect component generates a 4-way round-robin
|
In this example, the interconnect component generates a 4-way round-robin
|
||||||
arbiter, multiplexes the master bus signals into a shared bus, determines
|
arbiter, multiplexes the master bus signals into a shared bus, determines
|
||||||
|
|
Loading…
Reference in New Issue