Right now litex_sim supports only SDR memories because it uses hardcoded
PhySettings. With this change PhySettings will be generated based on
selected sdram type which will allow us to use all the different types
of sdram chips in simulation.
litex_sim used a single predefined DRAM chip, with this it is now
possible to specify which one to use with --sdram-module and also
its data bus width can be set using --sdram-data-width
With add_memory_region, user needs to provide the memory origin, which should not be needed since
could be retrieved from mem_map and prevent automatic allocation which is already possible for csr
and interrupts.
New add_mem_region method now allows both: defining the memory origin in mem_map (which will then
be used) or let the SoC builder automatically find and allocate a memory region.
Hard IP blocks are fixed in location, so long/deep combinational paths routing to multiple hard IP blocks can lead to timing closure problems.
XADC and MMCM DRPs currently have their DEN pins triggered by the ".re" output of a CSR. This is asynchronously derived from a fairly complicated set of logic that involves a logic path that goes all the way back through the cache and arbitration mechanisms of the wishbone bus. On more complex designs, this is leading to a failure of timing closure for these paths, because the hard IP blocks can be located in disparate portions of the chip which "pulls" the logic cluster in opposite directions in an attempt to absorb the routing delays to these IP blocks, leading to non-optimal placement for everything else and thus timing closure problems.
This pull request proposes that we add a pipeline delay on these critical paths. This delays the commit of the data to the DRP by one cycle, but greatly relieves timing because the pipeline register can be placed close to the cluster of logic that computes addresses, caching, and arbitration, allowing for the routing slack to the hard IP blocks to be absorbed by the path between the pipe register and the hard IP block.
In general, this shouldn't be a problem because the algorithm to program the DRP is to hit the write or read CSR, and then poll the drdy bit until it is asserted (so the process is already pretty slow). The MMCM in particular should have almost no impact, because MMCM updates are infrequent and the subsequent lock time of the MMCM is pretty long. The XADC is potentially more problematic because it can produce data at up to 1MSPS; but if sysclk is around 100MHz, adding 10ns to the read latency is relatively small compared to the theoretical maximum data rate of one every 1,000ns.
Note that the xadc patch requires introducing a bit of logic into the non-DRP path. This is because without explicitly putting an "if" statement around the logic, you fall back to the non-blocking semantics of the verilog operator, which ultimately leads to a pretty hefty combinational path. By having a default "if" that should get optimized out when DRP is not enabled, when the DRP path /is/ enabled the synthesizer knows it can safely push the async signal into a simple mux as opposed to worrying about enforcing the non-blocking operator semantics to get the desired result.
Revert to treating SDRAM_DFII_PIX_[RD|WR]DATA CSRs as arrays
of bytes, but use the new uintX_t array accessors for improved
legibility, and to avoid unnecessary byteswapping.
Signed-off-by: Gabriel Somlo <gsomlo@gmail.com>