Picolibc doesn't maintain meson.build or cross*.txt files
for lm32 and or1k CPUs.
This commit adds meson.build files for both of them.
Also libc/Makefile got modifed to determine CPU family
from $(CPU) value and generate cross.txt file.
Now picolibc gets compiled with LiteX flags too.
It doesn't compile on neither of them yet,
but at least riscv still works.
Picolibc requires __iob array for its IO functions
This commit creates such array with dummy functions
using putchar/readchar from console.c
To prevent name conflicts printf and others were
removed from console.c
Also putchar had to be renamed to base_putchar
Right now it is still limited as it compiles only for one target,
but it should be possible to build BIOS with one command
Tested with digilent_arty.py
Adds a switch `--with-gpio`, which will add a 32 pin GPIOTristate
core, with the GPIOTristate signals exposed on the top-level
module. This can be used to add a custom GPIO module in the Verilated
simulation.
Signed-off-by: Leon Schuermann <leon@is.currently.online>
Support exposing tristate GPIOs with tristate pads, by avoiding
instantiation of tristate buffers directly in the module. This gives
the developers more flexibility in how they want to implement their
tristate IOs (for example with level shifters behind the IOs), and
allows to use the GPIOTristate core in the Verilated simulation as
Verilator does not support top-level inout signals.
Signed-off-by: Leon Schuermann <leon@is.currently.online>
I found that in some cases, initialized global variables don't work with user libraries, so a little change to the linker that I use, taken from the demo file, seems to solve the problem . I think that make more sense to put the global variables in sram and initial values in the main_ram, similar to the bios linker.
This allows setting a root device other than ram0, this is useful
when using a rootfs from the SD card. Doing this makes boot time
faster and saves on memory footprint used by an in ram initrd.
While the Depacketizer did correctly calculate a new last_be value for
the data with the header removed, it may happen that the last_be
overflows and thus relates to the current, non-delayed sink value. The
same goes for the Packetizer, just inversed. This introduces logic in
form of a simple FSM to handle these cases and properly output last_be
on the last valid bus word.
Co-authored-by: David Sawatzke <d-git@sawatzke.dev>
Signed-off-by: Leon Schuermann <leon@is.currently.online>
On slow configurations (ex iCEBreaker / SERV CPU / 12MHz SPI Flash freq) memspeed test was
too slow (>200s to do the random test for 1MB), so reduce test size to 4KB.
This will be less accurate but will still provide representative results which
is the aim of this test.
Integration from #1024 was working on some boards (ex Arty) but breaking others (ex iCEBreaker);
simplify things for now:
- Avoid duplication in spiflash_freq_init.
- Avoid passing useless SPIFLASH_LEGACY flag to software (software can detect it from csr.h).
- Only keep integration support for "legacy" PHY, others are not generic enough and can be passed with phy parameter.
litex_sim: SoC without RAM/SDRAM.
litex_sim --integrated-main-ram-size=0x1000: SoC with RAM of size 0x1000.
litex_sim --with-sdram: SoC with SDRAM.
litex_sim --integrated-main-ram-size=0x1000 --with-sdram: SoC with RAM (priority to RAM over SDRAM).
Inside the litex add_spi_flash function
we are detecting the devices that can't be used with
more efficient DDR version of litespi phy core
and we are choosing whether to instantiate the legacy or DDR core
This not only tests for the precise PHY model, but also whether there
is a model attribute in the ethphy instance and whether that is set to
True.
Signed-off-by: Leon Schuermann <leon@is.currently.online>
The different branches each constructed their own ethphy. We can split
this out, which increases code reuse and allows to use the GMII and
XGMII interface types with all of Ethernet, Etherbone or
Ethernet+Etherbone.
Signed-off-by: Leon Schuermann <leon@is.currently.online>
This renames the `clk_edge_t` struct to `clk_edge_state_t`, given it
only tracks the previous clock edge state.
Furthermore introduce a new `enum clk_edge` (typedef'd to
`clk_edge_t`) which represents all possible clock edges and add a
function (`clk_edge`) to retrieve the type of current clock edge as a
`clk_edge_t`.
Signed-off-by: Leon Schuermann <leon@is.currently.online>
When building with GCC 11:
../libbase/crt0.o: in function `_start':
litex/soc/cores/cpu/microwatt/crt0.S:54:(.text+0x38): relocation truncated to fit: R_PPC64_GOT16_DS against symbol `_fdata' defined in .data section in bios.elf
litex/soc/cores/cpu/microwatt/crt0.S:55:(.text+0x3c): relocation truncated to fit: R_PPC64_GOT16_DS against symbol `_edata' defined in .data section in bios.elf
litex/soc/cores/cpu/microwatt/crt0.S:56:(.text+0x40): relocation truncated to fit: R_PPC64_GOT16_DS against symbol `_fdata_rom' defined in *ABS* section in bios.elf
litex/soc/cores/cpu/microwatt/crt0.S:68:(.text+0x68): relocation truncated to fit: R_PPC64_GOT16_DS against symbol `_fbss' defined in .bss section in bios.elf
litex/soc/cores/cpu/microwatt/crt0.S:69:(.text+0x6c): relocation truncated to fit: R_PPC64_GOT16_DS against symbol `_ebss' defined in .bss section in bios.elf
litex/soc/cores/cpu/microwatt/crt0.S:80:(.text+0x90): relocation truncated to fit: R_PPC64_GOT16_DS against symbol `_fstack' defined in .bss section in bios.elf
boot.o: in function `copy_file_from_sdcard_to_ram':
litex/soc/software/bios/boot.c:622:(.text+0x18): relocation truncated to fit: R_PPC64_TOC16_DS against `.toc'
litex/soc/software/bios/boot.c:627:(.text+0x5c): relocation truncated to fit: R_PPC64_TOC16_DS against `.toc'+8
litex/soc/software/bios/boot.c:633:(.text+0x8c): relocation truncated to fit: R_PPC64_TOC16_DS against `.toc'+10
litex/soc/software/bios/boot.c:639:(.text+0xdc): relocation truncated to fit: R_PPC64_TOC16_DS against `.toc'+18
litex/soc/software/bios/boot.c:650:(.text+0x128): additional relocation overflows omitted from the output
This is because we pass -mcmodel=small. As the PowerPC ELF ABI
describes, the small code model restricts the relocations to 16-bit
offsets[1]. If we omit the option we get the default, which is the
medium model allowing 32-bit offsets.
http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655240_19143.html
Signed-off-by: Joel Stanley <joel@jms.id.au>
New version of binutils contain stricter checks for sections that are
not included in the linker script, resulting in this error:
ld: error: no memory region specified for loadable section `.note.gnu.build-id'
Disable this feature as there is no use for it on bare metal systems.
Signed-off-by: Joel Stanley <joel@jms.id.au>
binutils 2.34 contains stricter checks for sections that are not
included in the linker script, resulting in this error:
ld: bios.elf: error: PHDR segment not covered by LOAD segment
From the 2.34 NEWS:
The ld check for "PHDR segment not covered by LOAD segment" is more
effective, catching cases that were wrongly allowed by previous versions of
ld. If you see this error it is likely you are linking with a bad linker
script or the binary you are building is not intended to be loaded by a
dynamic loader. In the latter case --no-dynamic-linker is appropriate.
As the BIOS runs bare metal, we do not need to emit a PHDR segment.
Signed-off-by: Joel Stanley <joel@jms.id.au>
READ_CHECK_TEST_PATTERN_MAX_ERRORS was being computed incorrectly
which could result in integer underflows when computing core via
(max_errors - errors). This could happen e.g. when using
DFIRateConverter, which modifies DFII_PIX_DATA_BYTES. Now we use
DFII_PIX_DATA_BYTES in the equation so this should not happen.
The Linux LiteETH driver parses registers by name. Provide the expected
names for each register: "mac", "mdio", and "buffer", respectively.
For setting up tx and rx slots, the Linux driver expects three pieces
of information: number of tx and rx slots, and slot size. Update
json2dts_linux to provide the appropriate names and values.
When playing with CPUs and variants, users previously had to do a rm -rf build to ensure
a proper software build. Various developers already lost time on it so it's important
to handle it directly in the Builder which is now the case.
-litex_json2dts_linux (previously litex_json2dts).
-litex_json2dts_zephyr (previously litex_zephyr_dts_generator).
-litex_json2renode (previously litex_renode_generator).
litex_json2dts_zephyr and litex_json2renode are now also directly exposed.
- Fix compilation in sdram.c.
- Fix warnings.
- Move Sequential/Random mode printf to memtest.
- Reduce SPI Flash test size (Testing full SPI Flash makes the boot too long, especially in random mode).
Without this change, when `.data` section size wasn't multiple
of word size, `data_loop` in crt0 was jumping over `_edata` and
continued looping. As it works on words and right now 64 bit CPUs
are biggest ones supported - alignment is now 8 bytes.
Also removed `- 4` from stack address, as it needs to be aligned
to 16 bytes on RISC-V.
The current code only works with a memory bus because otherwise
"generate_cluster_name" doesn't get called.
Cluster_name is only needed in the finalize phase.
Therefore, the name will now be generated just before its usage.
Verifiable with:
litex_sim --cpu-type vexriscv_smp (should be broken before this commit)
RISC-V requires stack to be aligned to 16 bytes.
1d5384e669/riscv-elf.md (L183)
Right now, in bios/linker.ld, `_fstack` is being set to 8 bytes
before the end of sram region.
```
PROVIDE(_fstack = ORIGIN(sram) + LENGTH(sram) - 8);
```
Removing ` - 8` makes it aligned to 16.
Also there are changes in crt0.S for vexriscv,
vexriscv_smp and cv32e40p.
Code that was setting up stack, was adding 4 to its address
for some reason.
Removing it makes it aligned to 8 bytes, and with change in
bios/linker.ld to 16 bytes.
It also fixes `printf` with long long integers on 32bit
CPUs ([relevant issue](https://github.com/riscv/riscv-gcc/issues/63)).
A regular CPU can provides specific mapping constraints and we are overriding provided mapping
with these constraints.
The case of CPUNone is different and we can do the opposite: Give priority to User's mapping.
For the regular CPU case, the override was done silently, it is now logged during the build.
- Rename optional #define and allow defining them externally.
- Add comments.
- Rename FLASH_CHIP_MX25L12833F_QUAD to SPIFLASH_MODULE_QUAD_CAPABLE.
- Rename FLASH_CHIP_MX25L12833F_QPI to SPIFLASH_MODULE_QPI_CAPABLE.
The instructions used for QUAD/QPI are probably different between chips, we could
imagine providing them through the LiteX integration based on the passed SPI Flash
module.
One small FPGAs running the BIOS from SPI Flash, the default divisor of 9 was slowing down too
much BIOS boot time (It was OK on reboot after liblitespi auto-calibration). Reduce the default
divisor to avoid this.
The implementation was causing regressions on actual designs, rework done:
- Only keep a common iteration loop as before.
- Add iteration on CLKO dividers (to fall in the VCO range).
- Do the iterations as before, if while doing it we find a clock suitable for feedback: just use it.
- If no feedback clock has been found: create it (if at least one free output available, if not raise an error).
- Shift in _csr_rd_buf should only been done when buf is set.
- When CSR size is not an exact multiple of the CSR data-width, the gap is in
the low addresses, not the high ones. So offset is introduced to take this into
account.
Dynamically adjusting the phase of a feedback will cause it to unlock.
The phase adjust ports are shared by all the outputs, so there is no
technical way to prevent this. Allow the user to indicate that they
will not adjust a clock when requesting an output by setting
uses_dpa=False, and only consider those that the user has promised not
to use.
Reimplement the configuration loop to allow all 4 outputs to be used by
the user, if one of them is suitable for use as VCO feedback.
The new strategy is to first iterate over requested outputs to see if
any of them can be used as a feedback source. If one can, it is
selected, and if no output is suitable, it attempts to instantiate one.
Once the feedback path is selected, the VCO frequency is known and it
attempts to calculate the remaining outputs' settings.
In addition, this implementation now respects datasheet limits in two
new ways:
- It respects the post-input-divider minimum frequency of 10MHz
- It respects the max output frequency of 400MHz for instantiated
feedback outputs
I am slightly unhappy with the seemingly-repetitive for loops. However
each one has slightly different sematics and I don't see a way to
combine them that doesn't hinder readability.
**Problems**
On macOS USB CDC ACM, which appears as `/dev/tty.usbmodem*`,
somehow `lxterm` keeps failing to send a payload.
**Solution**
Increase delay.
It's very unknown why to me, however, probably macOS USB CDC ACM
driver implementation issue.
**Testing**
Tested on MacBook Air (2020, M1) for OrangeCrab (rev.0.2) target
with Linux on LiteX SoC bitstream build from current commit and
load prebuild Linux On LiteX image.
Some targets have multiple alt settings for DFU for different zone
of the flash. Allowing to specify which one to flash and not
rebooting immediately allows to flash several of them at once.
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
On designs which use Nexus parts without any external memory, it can be
difficult to fit an embedded ROM program larger than a few KiB. Radiant
cannot infer LRAM, and refuses to infer EBRAM under many circumstances
too, so large memories tend to just consume a huge number of LUTs.
This patch makes it possible to explicitly wire up an LRAM as a ROM,
populate its initial values with a program, and execute directly from
it. That lets us embed programs up to 64KiB.
In some SoCs where UART's PHY is managed externally (ex through a Bridge) we don't
necessarily want the UART TX to wait for the PHY to be ready (and then stall the
CPU) but just want to let the CPU print the UART and will just connect when useful
and handle backpressure when connected.
This is now possible by calling add_auto_tx_flush method, ex in the SoC:
self.uart.add_auto_tx_flush(sys_clk_freq)
Fix gcc warnings: use 'unsigned long' to represent memory addresses,
and remove 'static' from the definition of 'cdelay()', as it is called
from multiple C files.
FemtoRV is a minimalist RISC-V CPU with design process documented and
available at https://github.com/BrunoLevy/learn-fpga.
This CPU is a very nice way to discover/learn RISC-V and this LiteX support
can be useful to learn how to integrate a custom CPU with LiteX.
With this support, FemtoRV is now directly usable with LiteX Sim:
$litex_sim --cpu-type=femtorv
This should also enable its use on all boards (> 50) available in LiteX-Boards
repository (but hasn't been tested yet), ex:
$python3 -m litex_boards.targets.digilent_arty --cpu-type=femtorv --build
As discussed in #909, in some specific cases, it can be interesting to be able
to keep the CPU in reset while the rest of the SoC is still operating (ex the
peripherals/bridges).
With theses changes, the old behaviour is preserved to do a full SoC Reset (at
the exception that writing a 1 is now mandatory) and a separate field specific
to the CPU reset is added.
The SoC Reset is a pulse (otherwise the system would be stuck in Reset) while
the CPU Reset is based on the register value (so can be pulse or hold).
Before this patch, the loop would finish with lowest_div either set to the first failing value
or 0 even if it succeeded with 0. Fix it so that if all tests pass, it’ll end up being -1 before
the incrementation.
This patch also skips retesting the original value. If the retest failed, lowest_div would be incremented past the original value and could potentially wrap around.
* resolve issue #862 add description to soc.svd
The issue is that with no description provided it simply would
not put out a description tag, which breaks compatibility with
other programs.
Insert a somewhat useful default description including a timestamp
and the words "LiteX SoC".
* I2S fix: sample SYNC on the correct edge
The original Tx path implementation samples SYNC on the falling
edge, out of convenience with the fact that teh data must also
change on the falling edge.
This works OK, until you have a CODEC which has a ~40ns max
delay spec on the SYNC, and also has a slightly asymmetric
SYNC edge (the SYNC signal is also the WCLK or LRCLK depending on
which docs you read). The SYNC by spec is supposed to change
on the falling edge, and this extra delay is enough to cause
the SYNC to introduce occassional bit or frame shifts into
the audio.
This fix samples the SYNC on the rising edge, but still
changes the data on the falling edge, thus allowing for
implementations where SYNC has quite loose timings relative
to everything else (as is the case on the TLV320AIC3200)