171 lines
7.0 KiB
Plaintext
171 lines
7.0 KiB
Plaintext
Advanced System Memory Infrastructure (ASMI)
|
|
==============================================
|
|
|
|
Rationale
|
|
=========
|
|
The lagging of the DRAM semiconductor processes behind the logic
|
|
processes has led the industry into a subtle way of ever increasing
|
|
memory performance.
|
|
|
|
Modern devices feature a DRAM core running at a fraction of the logic
|
|
frequency, whose wide data bus is serialized and deserialized to and
|
|
from the faster clock domain. Further, the presence of more banks
|
|
increases page hit rate and provides opportunities for parallel
|
|
execution of commands to different banks.
|
|
|
|
A first-generation SDR-133 SDRAM chip runs both DRAM, I/O and logic at
|
|
133MHz and features 4 banks. A 16-bit chip has a 16-bit DRAM core.
|
|
|
|
A newer DDR3-1066 chip still runs the DRAM core at 133MHz, but the logic
|
|
at 533MHz (4 times the DRAM frequency) and the I/O at 1066Mt/s (8 times
|
|
the DRAM frequency). A 16-bit chip has a 128-bit internal DRAM core.
|
|
Such a device features 8 banks. Note that the serialization also
|
|
introduces multiplied delays (e.g. CAS latency) when measured in number
|
|
of cycles of the logic clock.
|
|
|
|
To take full advantage of these new architectures, the memory controller
|
|
should be able to peek ahead at the incoming requests and service
|
|
several of them in parallel, while respecting the various timing
|
|
specifications of each DRAM bank and avoiding conflicts for the shared
|
|
data lines. Going further in this direction, a controller able to
|
|
complete transfers out of order can provide even more performance by:
|
|
(1) grouping requests by DRAM row, in order to minimize time spent on
|
|
precharging and activating banks.
|
|
(2) grouping requests by direction (read or write) in order to minimize
|
|
delays introduced by bus turnaround and write recovery times.
|
|
(3) being able to complete a request that hits a page earlier than a
|
|
concurrent one which requires the cycling of another bank.
|
|
|
|
The first two techniques are explained with more details in [1].
|
|
|
|
To enable the efficient implementation of these mechanisms, a new
|
|
communication protocol with the memory controller must be devised. Migen
|
|
and Milkymist SoC (-NG) implement their own bus, called ASMIbus, based
|
|
on the split-transaction principle.
|
|
|
|
Topology
|
|
========
|
|
The ASMI consists of a memory controller (e.g. ASMIcon) containing a hub
|
|
that connects the multiple masters, handles transaction tags, and
|
|
presents a view of the pending requests to the rest of the memory
|
|
controller.
|
|
|
|
Links between the masters and the hub are using the same ASMIbus protocol
|
|
described below.
|
|
|
|
It is suggested that memory controllers use an interface to a PHY
|
|
compatible with DFI [2]. The DFI clock can be the same as the ASMIbus
|
|
clock, with optional serialization and deserialization happening across
|
|
the PHY, as specified in the DFI standard.
|
|
|
|
+-------+ +---+
|
|
|Master1|<==>| | +----------+
|
|
+-------+ | +-------+ +-------+ | Off-chip |
|
|
|Hub|ASMIcon|<-->|DDR PHY|<<====>>| SDRAM |
|
|
+-------+ | +-------+ +-------+ |device(s) |
|
|
|Master2|<==>| | +----------+
|
|
+-------+ +---+
|
|
|
|
<====> ASMIbus links
|
|
<----> DFI (or similar) links
|
|
<<==>> PCB traces to external SDRAM chips
|
|
|
|
Signals
|
|
=======
|
|
The ASMIbus consists of two parts: the control signals, and the data
|
|
signals.
|
|
|
|
The control signals are used to issue requests.
|
|
* Master->Hub:
|
|
- ADR communicates the memory address to be accessed. The unit is
|
|
the word width of the particular implementation of ASMIbus.
|
|
- WE is the write enable signal.
|
|
- STB qualifies the transaction request, and should be asserted
|
|
until ACK goes high.
|
|
* Hub->Master
|
|
- TAG_ISSUE is an integer representing the transaction ("tag")
|
|
attributed by the hub. The width of this signal is determined by
|
|
the maximum number of in-flight transactions that the hub port
|
|
can handle.
|
|
- ACK is asserted when TAG_ISSUE is valid and the transaction has
|
|
been registered by the hub. A hub may assert ACK even when STB is
|
|
low, which means it is ready to accept any new transaction and
|
|
will do as soon as STB goes high.
|
|
|
|
The data signals are used to complete requests.
|
|
* Hub->Master
|
|
- TAG_CALL is used to identify the transaction for which the data
|
|
is "called". It takes the tag value that has been previously
|
|
attributed by the hub to that transaction during the issue
|
|
phase.
|
|
- CALL qualifies TAG_CALL.
|
|
- DATA_R returns data from the DRAM in the case of a read
|
|
transaction. It is valid for one cycle after CALL has been
|
|
asserted and TAG_CALL has identified the transaction.
|
|
The value of this signal is undefined for the cycle after a write
|
|
transaction data have been called.
|
|
* Master->Hub
|
|
- DATA_W must supply data to the controller from the appropriate
|
|
write transaction, on the cycle after they have been called using
|
|
CALL and TAG_CALL.
|
|
- DATA_WM are the byte-granular write data masks. They are used in
|
|
combination with DATA_W to identify the bytes that should be
|
|
modified in the memory. The DATA_WM bit should be high for its
|
|
corresponding DATA_W byte to be written.
|
|
|
|
In order to avoid duplicating the tag matching and tracking logic, the
|
|
master->hub data signals must be driven low when they are not in use, so
|
|
that they can be simply ORed together inside the memory controller. This
|
|
way, only masters have to track (their own) transactions for arbitrating
|
|
the data lines.
|
|
|
|
Tags represent in-flight transactions. The hub can reissue a tag as soon
|
|
as the cycle when it appears on TAG_CALL.
|
|
|
|
SDRAM burst length and clock ratios
|
|
===================================
|
|
A system using ASMI must set the SDRAM burst length B, the ASMIbus word
|
|
width W and the ratio between the ASMIbus clock frequency Fa and the
|
|
SDRAM I/O frequency Fi so that all data transfers last for exactly one
|
|
ASMIbus cycle.
|
|
|
|
More explicitly, these relations must be verified:
|
|
B = Fi/Fa
|
|
W = B*[number of SDRAM I/O pins]
|
|
|
|
For DDR memories, the I/O frequency is twice the logic frequency.
|
|
|
|
Example transactions
|
|
====================
|
|
|
|
Basic transaction:
|
|
CTL <R A1>------------------
|
|
ISSUE< T1 >------------------
|
|
CALL ------------< T1 >------
|
|
DAT_R------------------<D A1>
|
|
DAT_W------------------------
|
|
|
|
Two simple transactions:
|
|
CTL <R A1>------<R A2>------------------------
|
|
ISSUE< T1 >------< T2 >------------------------
|
|
CALL ------------------< T1 >------< T2 >------
|
|
DAT_R------------------------<D A1>------<D A2>
|
|
DAT_W------------------------------------------
|
|
|
|
Interleaved transactions:
|
|
CTL <R A1>------<R A2><W A3><R A4><W A5>------------------------
|
|
ISSUE< T1 >------< T1 >< T2 >< T1 >< T1 >------------------------
|
|
CALL ------------< T1 >------< T1 >< T1 >------< T1 >< T2 >------
|
|
DAT_R------------------<D A1>------<D A2><D A4>------------------
|
|
DAT_W------------------------------------------------<D A5><D A3>
|
|
|
|
<R Ax> Read address x
|
|
<W Ax> Write address x
|
|
< Tn > Tag n
|
|
<D Ax> Data to/from address x
|
|
|
|
|
|
[1] http://www.xilinx.com/txpatches/pub/documentation/misc/
|
|
improving%20ddr%20sdram%20efficiency.pdf
|
|
[2] http://www.ddr-phy.org/
|