litex/doc/asmi.txt

171 lines
7.0 KiB
Plaintext

Advanced System Memory Infrastructure (ASMI)
==============================================
Rationale
=========
The lagging of the DRAM semiconductor processes behind the logic
processes has led the industry into a subtle way of ever increasing
memory performance.
Modern devices feature a DRAM core running at a fraction of the logic
frequency, whose wide data bus is serialized and deserialized to and
from the faster clock domain. Further, the presence of more banks
increases page hit rate and provides opportunities for parallel
execution of commands to different banks.
A first-generation SDR-133 SDRAM chip runs both DRAM, I/O and logic at
133MHz and features 4 banks. A 16-bit chip has a 16-bit DRAM core.
A newer DDR3-1066 chip still runs the DRAM core at 133MHz, but the logic
at 533MHz (4 times the DRAM frequency) and the I/O at 1066Mt/s (8 times
the DRAM frequency). A 16-bit chip has a 128-bit internal DRAM core.
Such a device features 8 banks. Note that the serialization also
introduces multiplied delays (e.g. CAS latency) when measured in number
of cycles of the logic clock.
To take full advantage of these new architectures, the memory controller
should be able to peek ahead at the incoming requests and service
several of them in parallel, while respecting the various timing
specifications of each DRAM bank and avoiding conflicts for the shared
data lines. Going further in this direction, a controller able to
complete transfers out of order can provide even more performance by:
(1) grouping requests by DRAM row, in order to minimize time spent on
precharging and activating banks.
(2) grouping requests by direction (read or write) in order to minimize
delays introduced by bus turnaround and write recovery times.
(3) being able to complete a request that hits a page earlier than a
concurrent one which requires the cycling of another bank.
The first two techniques are explained with more details in [1].
To enable the efficient implementation of these mechanisms, a new
communication protocol with the memory controller must be devised. Migen
and Milkymist SoC (-NG) implement their own bus, called ASMIbus, based
on the split-transaction principle.
Topology
========
The ASMI consists of a memory controller (e.g. ASMIcon) containing a hub
that connects the multiple masters, handles transaction tags, and
presents a view of the pending requests to the rest of the memory
controller.
Links between the masters and the hub are using the same ASMIbus protocol
described below.
It is suggested that memory controllers use an interface to a PHY
compatible with DFI [2]. The DFI clock can be the same as the ASMIbus
clock, with optional serialization and deserialization happening across
the PHY, as specified in the DFI standard.
+-------+ +---+
|Master1|<==>| | +----------+
+-------+ | +-------+ +-------+ | Off-chip |
|Hub|ASMIcon|<-->|DDR PHY|<<====>>| SDRAM |
+-------+ | +-------+ +-------+ |device(s) |
|Master2|<==>| | +----------+
+-------+ +---+
<====> ASMIbus links
<----> DFI (or similar) links
<<==>> PCB traces to external SDRAM chips
Signals
=======
The ASMIbus consists of two parts: the control signals, and the data
signals.
The control signals are used to issue requests.
* Master->Hub:
- ADR communicates the memory address to be accessed. The unit is
the word width of the particular implementation of ASMIbus.
- WE is the write enable signal.
- STB qualifies the transaction request, and should be asserted
until ACK goes high.
* Hub->Master
- TAG_ISSUE is an integer representing the transaction ("tag")
attributed by the hub. The width of this signal is determined by
the maximum number of in-flight transactions that the hub port
can handle.
- ACK is asserted when TAG_ISSUE is valid and the transaction has
been registered by the hub. A hub may assert ACK even when STB is
low, which means it is ready to accept any new transaction and
will do as soon as STB goes high.
The data signals are used to complete requests.
* Hub->Master
- TAG_CALL is used to identify the transaction for which the data
is "called". It takes the tag value that has been previously
attributed by the hub to that transaction during the issue
phase.
- CALL qualifies TAG_CALL.
- DATA_R returns data from the DRAM in the case of a read
transaction. It is valid for one cycle after CALL has been
asserted and TAG_CALL has identified the transaction.
The value of this signal is undefined for the cycle after a write
transaction data have been called.
* Master->Hub
- DATA_W must supply data to the controller from the appropriate
write transaction, on the cycle after they have been called using
CALL and TAG_CALL.
- DATA_WM are the byte-granular write data masks. They are used in
combination with DATA_W to identify the bytes that should be
modified in the memory. The DATA_WM bit should be high for its
corresponding DATA_W byte to be written.
In order to avoid duplicating the tag matching and tracking logic, the
master->hub data signals must be driven low when they are not in use, so
that they can be simply ORed together inside the memory controller. This
way, only masters have to track (their own) transactions for arbitrating
the data lines.
Tags represent in-flight transactions. The hub can reissue a tag as soon
as the cycle when it appears on TAG_CALL.
SDRAM burst length and clock ratios
===================================
A system using ASMI must set the SDRAM burst length B, the ASMIbus word
width W and the ratio between the ASMIbus clock frequency Fa and the
SDRAM I/O frequency Fi so that all data transfers last for exactly one
ASMIbus cycle.
More explicitly, these relations must be verified:
B = Fi/Fa
W = B*[number of SDRAM I/O pins]
For DDR memories, the I/O frequency is twice the logic frequency.
Example transactions
====================
Basic transaction:
CTL <R A1>------------------
ISSUE< T1 >------------------
CALL ------------< T1 >------
DAT_R------------------<D A1>
DAT_W------------------------
Two simple transactions:
CTL <R A1>------<R A2>------------------------
ISSUE< T1 >------< T2 >------------------------
CALL ------------------< T1 >------< T2 >------
DAT_R------------------------<D A1>------<D A2>
DAT_W------------------------------------------
Interleaved transactions:
CTL <R A1>------<R A2><W A3><R A4><W A5>------------------------
ISSUE< T1 >------< T1 >< T2 >< T1 >< T1 >------------------------
CALL ------------< T1 >------< T1 >< T1 >------< T1 >< T2 >------
DAT_R------------------<D A1>------<D A2><D A4>------------------
DAT_W------------------------------------------------<D A5><D A3>
<R Ax> Read address x
<W Ax> Write address x
< Tn > Tag n
<D Ax> Data to/from address x
[1] http://www.xilinx.com/txpatches/pub/documentation/misc/
improving%20ddr%20sdram%20efficiency.pdf
[2] http://www.ddr-phy.org/