Crypto-router ASIC
This project has received funding from the European Union’s Horizon 2020 research and innovation programme within the framework of the NGI-POINTER Project funded under grant agreement No 871528.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme within the framework of the NGI-ASSURE Project funded under grant agreement No 957073.
- NLnet page: nlnet 2021 crypto router
- Top-level bugreport: https://bugs.libre-soc.org/show_bug.cgi?id=589
- ASIC/IO Pin specification page: crypto router pinspec
Goal
To build the foundations of a cryptographic extension of the POWER ISA, allowing anyone interested to build upon this effort and make a Cryptorouter FPGA or ASIC for themselves.
Deliverables
See top-level bugreport #589 - all Milestones were achieved 100% successfully as defined, including one additional Milestone added after the initial approval in 2021, for power-modulo arithmetic (the basis of RSA, DH etc).
1) A set of general-purpose scalar instructions suitable for cryptographic applications as well as many other purposes
See Big integer arithmetic (bigint) and Bit manipulation (bitmanip) for rationale, instruction list and definition in pseudo-code.
Relevant milestones:
- Bug 770: 1. Discussion and Finalisation of Which Cryptographic Primitives to Implement
- Bug 776: 7. Documentation of designs, code, processes, and other relevant things as needed
2) Implementation and validation of the above instructions on the ISA simulator
As with all large software projects the implementation is scattered within the simulator code, which is available at: https://git.libre-soc.org/?p=openpower-isa.git;a=tree;hb=HEAD
Unit tests are available at:
The above uses the ISA Simulator (see Simulator Test API).
To run the above test cases, install the developer environment, go to the ~/src/openpower-isa/src/openpower/decoder/isa directory, and run:
python3 test_caller_bigint.py
python3 test_caller_bitmanip.py
Relevant Milestone:
- Bug 771: 2. Creation of Cryptographic-Primitive OpenPower ISA Pseudo-code
3) Reference HDL implementation of some instructions
(full implementation was not possible within the limited 2021-02-051 budget; see nlnet 2021 crypto router)
Code and tests are available:
- HDL implementation of Ternlogi bitmanip instruction
- HDL implementation of Grev bitmanip instruction
- HDL Implementation of Galois Field instructions
- Unit test for the HDL implementation of Ternlogi
- Unit test for the HDL implementation of Grev
- Formal verification for the HDL implementation of Ternlogi
- Formal verification for the HDL implementation of Grev
- Unit test and formal verification for the HDL implementation of Galois Field instructions
To run the HDL tests, just install the developer environment and directly run the test scripts referenced above.
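For readers unfamiliar with the two bitmanip instructions named above, the following is a minimal behavioural sketch in plain Python. It is not the HDL and is not taken from the project's pseudo-code. It assumes grev behaves as the generalised bit-reverse is commonly defined (bit p moves to position p XOR k) and that ternlogi applies an 8-bit truth table bitwise to three inputs. The function names, the 64-bit width and the order in which the three input bits form the table index are illustrative assumptions; consult the Bit manipulation (bitmanip) page for the authoritative definition.

```python
XLEN = 64                      # assumed register width for this sketch
MASK = (1 << XLEN) - 1


def grev(x, k):
    """Generalised bit-reverse: bit p of x ends up at position p XOR k."""
    k &= XLEN - 1
    out = 0
    for p in range(XLEN):
        if (x >> p) & 1:
            out |= 1 << (p ^ k)
    return out


def ternlogi(a, b, c, imm8):
    """Bitwise 3-input lookup: imm8 is the truth table.

    The index ordering (a is the most significant index bit) is an
    assumption made for this sketch only.
    """
    out = 0
    for p in range(XLEN):
        idx = (((a >> p) & 1) << 2) | (((b >> p) & 1) << 1) | ((c >> p) & 1)
        out |= ((imm8 >> idx) & 1) << p
    return out


if __name__ == "__main__":
    # k = 56 swaps 8-, 16- and 32-bit blocks, i.e. a 64-bit byte swap
    print(hex(grev(0x0102030405060708, 56)))
    # truth table 0x96 is a three-way XOR, 0xE8 is a majority vote
    print(hex(ternlogi(0xF0, 0xCC, 0xAA, 0x96)))
```

The two truth tables used in the example (0x96 and 0xE8) are symmetric in their inputs, so they give the same result whichever index ordering the real instruction uses.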
Relevant Milestones:
- Bug 772: 3. Creation of the HDL Code for the Instructions and Associated Unit-Tests
- Bug 840: 8. Formal proofs and unit tests for cryptoprimitives
4) Additional specification of and simulation for concepts like a REMAP engine and element width overrides
These, when implemented also in HDL, will allow hyper-efficient acceleration of many fundamental crypto algorithms in hardware.
These are implemented 100% in the ISA simulator, so Simple-V-PowerISA assembler can already be written and simulated successfully. Once the HDL for these key critical parts of SV is available (when funded), the exact same assembler that runs under the simulator may, as usual, be run on FPGA or ASIC.
(However, the limited budget of 2021-02-051 was insufficient to complete the HDL implementation.)
5) Implementation of a few cryptographic primitives that happen to also help accelerate cryptographic algorithms
Cryptographic algorithms routinely use multi-byte quantities. Some big-integer cryptographic primitives were implemented on top of the SVP64 vectorisation of the above scalar instructions:
- Big integer multiplication primitive
- Big integer division/modulus primitive
- Big integer modular exponentiation primitive
- A presentation on big integer arithmetic primitives on top of SVP64 vectorization.
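To illustrate what such primitives compute, here is a minimal Python sketch of schoolbook big-integer multiplication over 64-bit limbs, followed by square-and-multiply modular exponentiation. It is not the project's SVP64 code. The helper mul_add_carry is a hypothetical stand-in for a "multiply, add, return low and high halves" step and is not claimed to match the exact semantics of any instruction; the modular reduction simply uses Python's % operator rather than a limb-wise division.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(n, count):
    """Split a non-negative integer into `count` little-endian limbs."""
    return [(n >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs):
    """Recombine little-endian limbs into a Python integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mul_add_carry(a, b, carry):
    """Hypothetical helper: return (low, high) halves of a*b + carry."""
    t = a * b + carry
    return t & LIMB_MASK, t >> LIMB_BITS


def bigint_mul(x, y):
    """Schoolbook multiply of two limb vectors."""
    out = [0] * (len(x) + len(y))
    for j, yj in enumerate(y):
        carry = 0
        for i, xi in enumerate(x):
            lo, hi = mul_add_carry(xi, yj, carry)
            total = out[i + j] + lo
            out[i + j] = total & LIMB_MASK
            carry = hi + (total >> LIMB_BITS)
        out[len(x) + j] = carry
    return out


def powmod(base, exp, mod, limbs):
    """Square-and-multiply; `limbs` must be enough to hold `mod`."""
    result = 1 % mod
    base %= mod
    while exp:
        if exp & 1:
            prod = bigint_mul(to_limbs(result, limbs), to_limbs(base, limbs))
            result = from_limbs(prod) % mod   # reduction via Python, an assumption
        sq = bigint_mul(to_limbs(base, limbs), to_limbs(base, limbs))
        base = from_limbs(sq) % mod
        exp >>= 1
    return result


if __name__ == "__main__":
    m = (1 << 127) - 1
    print(powmod(3, 65537, m, 2) == pow(3, 65537, m))
```

The inner loop is the part that a vectorised multiply-add-with-carry would cover: each iteration consumes one limb and passes a carry to the next.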
To test the above primitives in the ISA simulator, install the developer environment, go to the ~/src/openpower-isa/src/openpower/decoder/isa directory, and run (warning: long running):
SILENCELOG=1 python3 test_aaa_caller_svp64_powmod.py
Relevant Milestone:
- Bug 1044: 9. Demo of modulo exponent biginteger
6) Implementation of a cryptographic algorithm (chacha20) using the new instructions and primitives
One catastrophic mistake made by many cryptographic instruction implementations is to create over-specific instructions, for example "multiply by 2 then subtract 5" (the basis of a RISC-V chacha20 "accelerator"!)
Using our general-purpose instructions, our implementation of chacha20 has only TEN INSTRUCTIONS in the inner loop of the entire algorithm - a 50 to 100-fold reduction in code size. See: chacha20 design document.
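As background on why general-purpose instructions suffice here: the core of ChaCha20 is a quarter-round built only from 32-bit addition, XOR and rotation. The sketch below is plain Python following the quarter-round as specified in RFC 8439, not the project's SVP64 assembler; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a, b, c, d):
    """ChaCha quarter-round: four add / xor / rotate steps."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Input values from the quarter-round test vector in RFC 8439, section 2.1.1
    out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(v) for v in out])
```

Every step is an add, an XOR or a rotate on a small set of words, and the same pattern repeats across the state with different word indices.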
To run the chacha20 test in the ISA simulator, go to the ~/src/openpower-isa/crypto/chacha20 directory, run make, and then run (warning: long running):
SILENCELOG=1 ./test-chacha20
This unit test may also be run directly:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_chacha20.py;hb=HEAD
Relevant Milestone:
- Bug 773: 4. High-Level Demos of Cryptographic and Other Relevant Algorithms
7) Binutils support for assembling the above instructions
Currently, our reference Python assembler needs to be used to translate assembly files containing the new instructions. However, many (not all) instructions were added to the Binutils assembler (gas) as well. See: code. To install, run the ./binutils-gdb-install script from the developer scripts.
Further reading: Bug 964 - binutils: support maddedu, divmod2du instructions
8) A flexible self-contained HDL platform (ls2) for implementing a System-on-Chip on an FPGA or ASIC
The ls2 platform can compile a Microwatt-compatible core (the reference Libre-SOC one, or Microwatt itself), together with select peripherals (internal RAM, SPI, Ethernet, HyperRAM, etc.), for your target FPGA board (Arty A7-100t, VERSA_ECP5, or other).
- Documentation (installation, running and uploading to an FPGA)
- Code
Relevant Milestone:
- Bug 774: 5. Equipment needed, such as FPGA boards and Ethernet PMODs
Helpful information for Cryptorouter implementations:
Given the work above, the information below is useful for anyone interested in working towards building a Cryptorouter FPGA or ASIC for themselves:
Specifications, 2020
All of these are entirely Libre-Licensed or are to be written as Libre-Licensed:
- 300 MHz single-core, Libre-SOC OpenPOWER CPU with bitmanip extensions
- 180/130 nm (TBD)
- 5x RGMII Gigabit Ethernet PHYs with SRAM on-chip, built-in.
- 2x USB ULPI PHYs
- Direct DMA interface (independent bulk transfer)
- JTAG, GPIO, I2C, PWM, UART, SPI, QSPI, SD/MMC
- On-board Dual-ported SRAM (for Packet Buffers)
- Opencores sdram
- Wishbone interfaces to all peripherals
- XICS ICP / ICS Interrupt Controller
Example packet transfer
- Packet comes in on RGMII port 1. Each PHY has its own dual-ported SRAM
- Packet is directly stored in internal (dual-ported SRAM) by the RGMII PHY itself
- Interrupt notification is sent to the processor (XICS)
- Processor inspects packet over Wishbone interface directly connected to 2nd SRAM port.
- Processor computes, based on decoding the ETH Frame, where the packet must be sent to (which other RGMII port: e.g. Port 2)
- Processor initiates Memory-to-Memory DMA transfer
- DMA Memory-to-Memory transfer, using Wishbone Bus, copies the ETH Frame from one on-board SRAM to the target on-board SRAM associated with Port 2.
- DMA Engine generates interrupt (XICS) to the CPU to say it is completed
- Processor notifies target RGMII PHY to activate "send" of frame out through target RGMII port 2.
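The packet flow above can be sketched as a small software model. This is purely illustrative Python, not HDL or firmware: the class and method names, the 2048-byte buffer size, the MAC-to-port lookup table and the decision to drop frames with an unknown destination are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    """One RGMII port with its own packet-buffer SRAM (size is assumed)."""
    number: int
    sram: bytearray = field(default_factory=lambda: bytearray(2048))
    frame_len: int = 0
    sent: list = field(default_factory=list)


class Router:
    def __init__(self, n_ports, mac_table):
        self.ports = {n: Port(n) for n in range(1, n_ports + 1)}
        self.mac_table = mac_table        # hypothetical: dest MAC -> port number
        self.irq_log = []                 # stands in for XICS notifications

    def phy_receive(self, port_no, frame):
        """PHY stores the frame in its own SRAM, then interrupts the CPU."""
        port = self.ports[port_no]
        port.sram[:len(frame)] = frame
        port.frame_len = len(frame)
        self.irq_log.append(("rx", port_no))
        self.cpu_handle_rx(port_no)

    def cpu_handle_rx(self, src_no):
        """CPU inspects the frame via the second SRAM port and routes it."""
        src = self.ports[src_no]
        dest_mac = bytes(src.sram[0:6])   # destination MAC is the first 6 bytes
        dst_no = self.mac_table.get(dest_mac)
        if dst_no is None:
            return                        # sketch only: unknown destination dropped
        self.dma_copy(src_no, dst_no)
        self.irq_log.append(("dma_done", dst_no))
        self.phy_send(dst_no)

    def dma_copy(self, src_no, dst_no):
        """Memory-to-memory DMA between the two on-board SRAMs."""
        src, dst = self.ports[src_no], self.ports[dst_no]
        dst.sram[:src.frame_len] = src.sram[:src.frame_len]
        dst.frame_len = src.frame_len

    def phy_send(self, port_no):
        """CPU tells the target PHY to transmit the buffered frame."""
        port = self.ports[port_no]
        port.sent.append(bytes(port.sram[:port.frame_len]))


if __name__ == "__main__":
    dest = bytes.fromhex("02aabbccdd02")
    frame = dest + bytes.fromhex("02aabbccdd01") + b"\x08\x00" + b"payload"
    router = Router(5, {dest: 2})
    router.phy_receive(1, frame)
    print(router.irq_log, router.ports[2].sent[0] == frame)
```

In this model the CPU only reads the header and issues commands; the frame body is moved solely by the DMA copy, which is the point of the design described above.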
Testing and Verification
We will need full HDL simulations as well as post P&R simulations. These may be achieved as follows:
- ISA-level unit tests as well as Formal Correctness Proofs. Example bpermd proof and individual unit tests for the Logical pipeline
- simulation with some peripherals developed in C++ as Verilator modules
- nmigen-based OpenPOWER Libre-SOC core co-simulation such as this unit test, test_issuer.py
- cocotb pre/post PnR including GHDL, Icarus and Verilator (where best suited)
Actual instructions being developed (bitmanip) may therefore be unit tested prior to deployment. Following that, rapid simulations may be achieved by running ls2 (the same HDL may also easily be uploaded to an FPGA). When it comes to Place-and-Route of the ASIC, the cocotb simulations may be used to verify that the GDS-II layout has not been "damaged" by the PnR tools.
Peripheral functionality tests must also be part of the simulations, particularly using cocotb, to ensure that the peripherals remain functional after PnR. Supercomputer access for compilation of Verilator and/or cxxrtl is available through fed4fire.