mirror of
https://github.com/enjoy-digital/litex.git
synced 2025-01-04 09:52:26 -05:00
doc: ASMI description
This commit is contained in:
parent
21f3def74b
commit
bf2f6f31e3
2 changed files with 193 additions and 4 deletions
189
doc/asmi.txt
Normal file
189
doc/asmi.txt
Normal file
|
@ -0,0 +1,189 @@
|
|||
Advanced System Memory Infrastructure (ASMI)
|
||||
==============================================
|
||||
|
||||
Rationale
|
||||
=========
|
||||
The lagging of the DRAM semiconductor processes behind the logic
|
||||
processes has led the industry into a subtle way of ever increasing
|
||||
memory performance.
|
||||
|
||||
Modern devices feature a DRAM core running at a fraction of the logic
|
||||
frequency, whose wide data bus is serialized and deserialized to and
|
||||
from the faster clock domain. Further, the presence of more banks
|
||||
increases page hit rate and provides opportunities for parallel
|
||||
execution of commands to different banks.
|
||||
|
||||
A first-generation SDR-133 SDRAM chip runs both DRAM, I/O and logic at
|
||||
133MHz and features 4 banks. A 16-bit chip has a 16-bit DRAM core.
|
||||
|
||||
A newer DDR3-1066 chip still runs the DRAM core at 133MHz, but the logic
|
||||
at 533MHz (4 times the DRAM frequency) and the I/O at 1066Mt/s (8 times
|
||||
the DRAM frequency). A 16-bit chip has a 128-bit internal DRAM core.
|
||||
Such a device features 8 banks. Note that the serialization also
|
||||
introduces multiplied delays (e.g. CAS latency) when measured in number
|
||||
of cycles of the logic clock.
|
||||
|
||||
To take full advantage of these new architectures, the memory controller
|
||||
should be able to peek ahead at the incoming requests and service
|
||||
several of them in parallel, while respecting the various timing
|
||||
specifications of each DRAM bank and avoiding conflicts for the shared
|
||||
data lines. Going further in this direction, a controller able to
|
||||
complete transfers out of order can provide more performance by:
|
||||
(1) grouping together requests by DRAM row, in order to minimize time
|
||||
spent on precharging and activating banks.
|
||||
(2) grouping together requests by direction (read or write) in order to
|
||||
minimize delays introduced by bus turnaround and write recovery
|
||||
times.
|
||||
(3) being able to complete a request that hits a page earlier than a
|
||||
concurrent one which requires the cycling of another bank.
|
||||
|
||||
The first two techniques are explained with more details in [1].
|
||||
|
||||
To enable the efficient implementation of these mechanisms, a new
|
||||
communication protocol with the memory controller must be devised. Migen
|
||||
and Milkymist SoC (-NG) implement their own bus, called ASMIbus, based
|
||||
on the split-transaction principle.
|
||||
|
||||
ASMIbus - the big picture
|
||||
=========================
|
||||
The ASMIbus consists of two parts: the control signals, and the data
|
||||
signals.
|
||||
|
||||
The control signals are used to issue requests.
|
||||
* Master->Slave:
|
||||
- ADR communicates the memory address to be accessed. The unit is
|
||||
the word width of the particular implementation of ASMIbus.
|
||||
- WE is the write enable signal.
|
||||
- STB qualifies the transaction request, and should be asserted
|
||||
until ACK goes high.
|
||||
* Slave->Master
|
||||
- TAG_ISSUE is an integer representing the transaction ("tag")
|
||||
attributed by the memory controller. The width of this signal is
|
||||
determined by the maximum number of in-flight transactions that
|
||||
the memory controller can handle.
|
||||
- ACK is asserted at least one cycle after STB when TAG_ISSUE is
|
||||
valid and the transaction has been accepted by the memory
|
||||
controller.
|
||||
|
||||
The data signals are used to complete requests.
|
||||
* Slave->Master
|
||||
- TAG_CALL is used to identify the transaction for which the data
|
||||
is "called". It takes the tag value that has been previously
|
||||
attributed by the controller to that transaction during the issue
|
||||
phase.
|
||||
- CALL qualifies TAG_CALL.
|
||||
- DATA_R returns data from the DRAM in the case of a read
|
||||
transaction. It is valid for one cycle after CALL has been
|
||||
asserted and TAG_CALL has identified the transaction.
|
||||
The value of this signal is undefined for the cycle after a write
|
||||
transaction data have been called.
|
||||
* Master->Slave
|
||||
- DATA_W must supply data to the controller from the appropriate
|
||||
write transaction, on the cycle after they have been called using
|
||||
CALL and TAG_CALL.
|
||||
- DATA_WM are the byte-granularity write data masks. They are used
|
||||
in combination with DATA_W to identify the bytes that should be
|
||||
modified in the memory. The DATA_WM bit should be high for its
|
||||
corresponding DATA_W byte to be written.
|
||||
|
||||
DATA_W and DATA_WM must always be driven low by a master, except during
|
||||
the data call for a write transaction that it has requested.
|
||||
|
||||
Tags represent in-flight transactions. The memory controller can reissue
|
||||
a tag as soon as the cycle when it appears on TAG_CALL.
|
||||
|
||||
Performance considerations
|
||||
==========================
|
||||
Note that the payload of a transaction lasts for only one cycle (i.e.
|
||||
there are no bursts). Therefore, to be able to achieve 100% bandwidth
|
||||
utilization, the issuance of a tag should also take no more than a
|
||||
cycle.
|
||||
|
||||
For this purpose, the control signals are pipelined. When ACK is
|
||||
asserted, STB can qualify a new request in the same cycle. This puts a
|
||||
constraint on the arbiter, which must be able to switch combinatorially
|
||||
to the next transaction on the assertion of ACK and still meet timing.
|
||||
|
||||
The controller is not allowed to generate ACK combinatorially from STB.
|
||||
However, the master can generate STB combinatorially from ACK in order
|
||||
to maximize bus bandwidth.
|
||||
|
||||
STB <0><1><0><0><1><1><0><1><1><0>
|
||||
ACK <0><0><1><0><0><0><1><0><1><1>
|
||||
TAG ------<A>---------<B>---<C><D>
|
||||
|
||||
SDRAM burst width and clock ratios
|
||||
==================================
|
||||
A system using ASMI must set the SDRAM burst length B, the ASMIbus word
|
||||
width W and the ratio between the ASMIbus clock frequency Fa and the
|
||||
SDRAM I/O frequency Fi so that all data transfers last for exactly one
|
||||
ASMIbus cycle.
|
||||
|
||||
More explicitly, these relations must be verified:
|
||||
B = Fi/Fa
|
||||
W = B*[number of SDRAM I/O pins]
|
||||
|
||||
For DDR memories, the I/O frequency is twice the logic frequency.
|
||||
|
||||
Environment
|
||||
===========
|
||||
The ASMI consists of a memory controller (e.g. ASMIcon) and optionally
|
||||
an arbiter/switch (e.g. ASMIswitch) thanks to which multiple masters can
|
||||
access the shared system memory.
|
||||
|
||||
Links between them are using the same ASMIbus protocol described above.
|
||||
In order to avoid duplicating the tag matching and tracking logic, the
|
||||
master->slave data signals must be driven low when they are not in use,
|
||||
so that they can be simply ORed together at the arbiter. This way, only
|
||||
masters have to track (their own) transactions.
|
||||
|
||||
It is suggested that memory controllers use an interface to a PHY
|
||||
compatible with DFI [2]. The DFI clock can be the same as the ASMIbus
|
||||
clock, with optional serialization and deserialization happening across
|
||||
the PHY, as specified in the DFI standard.
|
||||
|
||||
+-------+ +----------+
|
||||
|Master1|<==>| | +----------+
|
||||
+-------+ | | +-------+ +-------+ | Off-chip |
|
||||
|ASMIswitch|<==>|ASMIcon|<-->|DDR PHY|<<====>>| SDRAM |
|
||||
+-------+ | | +-------+ +-------+ |device(s) |
|
||||
|Master2|<==>| | +----------+
|
||||
+-------+ +----------+
|
||||
|
||||
<====> ASMIbus links
|
||||
<----> DFI (or similar) links
|
||||
<<==>> PCB traces to external SDRAM chips
|
||||
|
||||
Example transactions
|
||||
====================
|
||||
|
||||
Basic transaction:
|
||||
CTL <R A1>------------------------
|
||||
ISSUE------< T1 >------------------
|
||||
CALL ------------------< T1 >------
|
||||
DAT_R------------------------<D A1>
|
||||
DAT_W------------------------------
|
||||
|
||||
Two simple transactions:
|
||||
CTL <R A1>------<R A2>------------------------------
|
||||
ISSUE------< T1 >------< T2 >------------------------
|
||||
CALL ------------------------< T1 >------< T2 >------
|
||||
DAT_R------------------------------<D A1>------<D A2>
|
||||
DAT_W------------------------------------------------
|
||||
|
||||
Interleaved transactions:
|
||||
CTL <R A1>------<R A2><W A3><R A4><W A5>------------------------------
|
||||
ISSUE------< T1 >------< T1 >< T2 >< T1 >< T1 >------------------------
|
||||
CALL ------------------< T1 >------< T1 >< T1 >------< T1 >< T2 >------
|
||||
DAT_R------------------------<D A1>------<D A2><D A4>------------------
|
||||
DAT_W------------------------------------------------------<D A5><D A3>
|
||||
|
||||
<R Ax> Read address x
|
||||
<W Ax> Write address x
|
||||
< Tn > Tag n
|
||||
<D Ax> Data to/from address x
|
||||
|
||||
|
||||
[1] http://www.xilinx.com/txpatches/pub/documentation/misc/
|
||||
improving%20ddr%20sdram%20efficiency.pdf
|
||||
[2] http://www.ddr-phy.org/
|
|
@ -331,18 +331,18 @@ Migen Bus
|
|||
Migen Bus contains classes providing a common structure for master and
|
||||
slave interfaces of the following buses:
|
||||
- Wishbone [5], the general purpose bus recommended by Opencores.
|
||||
- CSR-NG, a low-bandwidth, resource-sensitive bus designed for
|
||||
- CSR-2, a low-bandwidth, resource-sensitive bus designed for
|
||||
accessing the configuration and status registers of cores from
|
||||
software.
|
||||
- FastMemoryLink-NG, a split-transaction bus optimized for use with a
|
||||
high-performance, out-of-order SDRAM controller. (TODO)
|
||||
- ASMIbus, a split-transaction bus optimized for use with a
|
||||
high-performance, out-of-order SDRAM controller.
|
||||
|
||||
It also provides interconnect components for these buses, such as
|
||||
arbiters and address decoders. The strength of the Migen procedurally
|
||||
generated logic can be illustrated by the following example:
|
||||
wbcon = wishbone.InterconnectShared(
|
||||
[cpu.ibus, cpu.dbus, ethernet.dma, audio.dma],
|
||||
[(0, norflash.bus), (1, wishbone2fml.wishbone),
|
||||
[(0, norflash.bus), (1, wishbone2asmi.wishbone),
|
||||
(3, wishbone2csr.wishbone)])
|
||||
In this example, the interconnect component generates a 4-way round-robin
|
||||
arbiter, multiplexes the master bus signals into a shared bus, determines
|
||||
|
|
Loading…
Reference in a new issue