Resources and Specifications
This page aims to collect all the resources and specifications we need in one place for quick access. We will try our best to keep links here up-to-date. Feel free to add more links here.
- Resources and Specifications
- Getting Started
- OpenPOWER ISA
- Energy-efficient cores
- Open Access Publication locations
- Communities
- ppc64 ELF ABI
- Similar concepts
- Other GPU Specifications
- Other CPUs and ISAs worth considering
- Package Management
- JTAG
- Radix MMU
- D-Cache
- BW Enhancing Shared L1 Cache Design research done in cooperation with AMD
- RTL Arithmetic SQRT, FPU etc.
- Khronos Standards
- Open Source (CC BY + MIT) DirectX specs (by Microsoft, but not official specs)
- Graphics and Compute API Stack
- 3D Graphics Texture compression software and hardware
- Various POWER Communities
- Conferences
- Coriolis2
- Logical Equivalence and extraction
- Klayout
- image to GDS-II
- The OpenROAD Project
- Other RISC-V GPU attempts
- Tests, Benchmarks, Conformance, Compliance, Verification, etc.
- Bus Architectures
- Vector Processors
- LLVM
- Branch Prediction
- Python RTL Tools
- Other
- Real/Physical Projects
- ASIC tape-out pricing
- Funding
- Good Programming/Design Practices
- 12 skills summary
- Analog Simulation
- Libre-SOC Standards
- Server setup
- Testbeds
- Really Useful Stuff
- Digilent Arty
- CircuitJS experiments
- Logic Simulator 2
- ASIC Timing and Design flow resources
- Geometric Haskell Library
- Handy Compiler Algorithms for SimpleV
- TODO investigate
Getting Started
This section is primarily a series of useful links found online
Is Open Source Hardware Profitable?
RaptorCS on FOSS Hardware Interview
OpenPOWER ISA
- 3.0 PDF
- 2.07 PDF
- Virginia Tech course https://github.com/w-feng/CompArch-MIPS-POWER
- mini functional simulator https://github.com/god-s-perfect-idiot/POWER-sim
- https://raw.githubusercontent.com/linuxppc/public-docs/main/ISA/PowerPC_Assembly_IBM_Programming_Environment_2.3.pdf
Overview of the user ISA:
- Raymond Chen's PowerPC series
- Power ISA listings https://power-isa-beta.mybluemix.net/
OpenPOWER OpenFSI Spec (2016)
Energy-efficient cores
- https://arxiv.org/abs/2002.10143
- https://arxiv.org/abs/2011.08070
Open Access Publication locations
Communities
- https://www.reddit.com/r/OpenPOWER/
- http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/
- http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-community-dev/
- Open tape-out mailing list https://groups.google.com/a/opentapeout.dev/g/mailinglist
ppc64 ELF ABI
- EABI 1.9 supplement https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.9.html
- https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.9.pdf
- v2.1.5 https://openpowerfoundation.org/specifications/64bitelfabi/
Similar concepts
- https://www.tdx.cat/bitstream/handle/10803/674224/TCRL1de1.pdf Vector registers may be made "ultra-wide" (SX Aurora / Cray)
Other GPU Specifications
- https://developer.amd.com/wp-content/resources/RDNA_Shader_ISA.pdf
- https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf
- MALI Midgard
- Nyuzi
- VideoCore IV
- etnaviv
Other CPUs and ISAs worth considering
- https://en.m.wikipedia.org/wiki/Zilog_Z380
- Mitch Alsup 66000
- Hitachi Sh2 https://lists.j-core.org/pipermail/j-core/ http://shared-ptr.com/sh_insns.html
- 68080 except Length-Decode is a pig for Multi-Issue http://www.apollo-core.com/index.htm?page=coding&tl=1
Package Management
JTAG
-
Abstract
"The objective of this work is to design and implement a TAP controller IP core compatible with IEEE 1149.1-2013 revision of the standard. The test logic architecture also includes the Test Mode Persistence controller and its associated logic. This work is expected to serve as a ready to use module that can be directly inserted in to a new digital IC designs with little modifications."
Radix MMU
D-Cache
D-Cache Possible Optimizations papers and links
- ACDC: Small, Predictable and High-Performance Data Cache
- Stop Crying Over Your Cache Miss Rate: Handling Efficiently Thousands of Outstanding Misses in FPGAs
BW Enhancing Shared L1 Cache Design research done in cooperation with AMD
- Youtube video PACT 2020 - Analyzing and Leveraging Shared L1 Caches in GPUs
- Url to PDF of paper on author's website (clicking will download the pdf)
RTL Arithmetic SQRT, FPU etc.
Wallace vs Dadda Multipliers
Sqrt
- Fast Floating Point Square Root
- Reciprocal Square Root Algorithm
- Fast Calculation of Cube and Inverse Cube Roots Using a Magic Constant and Its Implementation on Microcontrollers (clicking will download the pdf)
- Modified Fast Inverse Square Root and Square Root Approximation Algorithms: The Method of Switching Magic Constants (clicking will download the pdf)
CORDIC and related algorithms
- https://bugs.libre-soc.org/show_bug.cgi?id=127 research into CORDIC
- https://bugs.libre-soc.org/show_bug.cgi?id=208
- BKM (log(x) and ex)
- CORDIC
- Does not have an easy way of computing tan(x)
- zipcpu CORDIC
- Low latency and Low error floating point TCORDIC (email Michael or Cole if you don't have IEEE access)
- http://www.myhdl.org/docs/examples/sinecomp/ MyHDL version of CORDIC
- https://dspguru.com/dsp/faqs/cordic/
IEEE Standard for Floating-Point Arithmetic (IEEE 754)
Almost all modern computers follow the IEEE Floating-Point Standard. Of course, we will follow it as well for interoperability.
- IEEE 754-2019: https://standards.ieee.org/standard/754-2019.html
Note: Even though this is such an important standard used by everyone, it is unfortunately not freely available and requires a payment to access. However, each of the Libre-SOC members already have access to the document.
Among other things, has a nice explanation on arithmetic, rounding modes and the sticky bit.
Nice resource on rounding errors (ulps and epsilon) and the "table maker's dilemma".
Past FPU Mistakes to learn from
- Intel Underestimates Error Bounds by 1.3 quintillion on Random ASCII – tech blog of Bruce Dawson
- Intel overstates FPU accuracy 06/01/2013
- How not to design an ISA https://player.vimeo.com/video/450406346 Meester Forsyth http://eelpi.gotdns.org/
Khronos Standards
The Khronos Group creates open standards for authoring and acceleration of graphics, media, and computation. It is a requirement for our hybrid CPU/GPU to be compliant with these standards as well as with IEEE754, in order to be commercially-competitive in both areas: especially Vulkan and OpenCL being the most important. SPIR-V is also important for the Kazan driver.
Thus the zfpacc proposal has been created which permits runtime dynamic switching between different accuracy levels, in userspace applications.
- SPIR-V 1.5 Specification Revision 1
- SPIR-V OpenCL Extended Instruction Set
- SPIR-V GLSL Extended Instruction Set
- OpenCL 2.2 API Specification
- OpenCL 2.2 Extension Specification
OpenCL released the proposed OpenCL 3.0 spec for comments in april 2020
- Announcement video slides (PDF)
Note: We are implementing hardware accelerated Vulkan and OpenCL while relying on other software projects to translate APIs to Vulkan. E.g. Zink allows for OpenGL-to-Vulkan in software.
Open Source (CC BY + MIT) DirectX specs (by Microsoft, but not official specs)
https://github.com/Microsoft/DirectX-Specs
Graphics and Compute API Stack
I found this informative post that mentions Kazan and a whole bunch of other stuff. It looks like many APIs can be emulated on top of Vulkan, although performance is not evaluated.
https://synappsis.wordpress.com/2017/06/03/opengl-over-vulkan-dev/
Pixilica is heading up an initiative to create a RISC-V graphical ISA
3D Graphics Texture compression software and hardware
Various POWER Communities
- An effort to make a 100% Libre POWER Laptop The T2080 is a POWER8 chip.
- Power Progress Community Supporting/Raising awareness of various POWER related open projects on the FOSS community
- OpenPOWER Promotes and ensure compliance with the Power ISA amongst members.
- OpenCapi High performance interconnect for POWER machines. One of the big advantages of the POWER architecture. Notably more performant than PCIE Gen4, and is designed to be layered on top of the physical PCIE link.
- OpenPOWER “Virtual Coffee” Calls Truly open bi-weekly teleconference lines for anybody interested in helping advance or adopting the POWER architecture.
Conferences
see conferences
Coriolis2
- LIP6's Coriolis - a set of backend design tools: https://www-soc.lip6.fr/equipe-cian/logiciels/coriolis/
Note: The rest of LIP6's website is in French, but there is a UK flag in the corner that gives the English version.
Logical Equivalence and extraction
- NETGEN
- CVC https://github.com/d-m-bailey/cvc
Klayout
- KLayout - Layout viewer and editor: https://www.klayout.de/
image to GDS-II
- https://nazca-design.org/convert-image-to-gds/
The OpenROAD Project
OpenROAD seeks to develop and foster an autonomous, 24-hour, open-source layout generation flow (RTL-to-GDS).
Other RISC-V GPU attempts
TODO: Get in touch and discuss collaboration
Tests, Benchmarks, Conformance, Compliance, Verification, etc.
RISC-V Tests
RISC-V Foundation is in the process of creating an official conformance test. It's still in development as far as I can tell.
- //TODO LINK TO RISC-V CONFORMANCE TEST
IEEE 754 Testing/Emulation
IEEE 754 has no official tests for floating-point but there are well-known third party tools to check such as John Hauser's TestFloat.
There is also his SoftFloat library, which is a software emulation library for IEEE 754.
Jacob is also working on an IEEE 754 software emulation library written in Rust which also has Python bindings:
- Source: https://salsa.debian.org/Kazan-team/simple-soft-float
- Crate: https://crates.io/crates/simple-soft-float
- Autogenerated Docs: https://docs.rs/simple-soft-float/
A cool paper I came across in my research is "IeeeCC754++ : An Advanced Set of Tools to Check IEEE 754-2008 Conformity" by Dr. Matthias Hüsken.
- Direct link to PDF: http://elpub.bib.uni-wuppertal.de/servlets/DerivateServlet/Derivate-7505/dc1735.pdf
Khronos Tests
OpenCL Conformance Tests
Vulkan Conformance Tests
MAJOR NOTE: We are not allowed to say we are compliant with any of the Khronos standards until we actually make an official submission, do the paperwork, and pay the relevant fees.
Formal Verification
Formal verification of Libre RISC-V ensures that it is bug-free in regards to what we specify. Of course, it is important to do the formal verification as a final step in the development process before we produce thousands or millions of silicon.
Possible way to speed up our solvers for our formal proofs https://web.archive.org/web/20201029205507/https://github.com/eth-sri/fastsmt
Algorithms (papers) submitted for 2018 International SAT Competition https://web.archive.org/web/20201029205239/https://helda.helsinki.fi/bitstream/handle/10138/237063/sc2018_proceedings.pdf https://web.archive.org/web/20201029205637/http://www.satcompetition.org/
- Minisail https://www.isa-afp.org/entries/MiniSail.html - compiler for SAIL into c
Some learning resources I found in the community:
- ZipCPU: http://zipcpu.com/ ZipCPU provides a comprehensive tutorial for beginners and many exercises/quizzes/slides: http://zipcpu.com/tutorial/
- Western Digital's SweRV CPU blog (I recommend looking at all their posts): https://tomverbeure.github.io/
- https://tomverbeure.github.io/risc-v/2018/11/19/A-Bug-Free-RISC-V-Core-without-Simulation.html
- https://tomverbeure.github.io/rtl/2019/01/04/Under-the-Hood-of-Formal-Verification.html
VAMP CPU
- Formal verification of a fully IEEE compliant floating point unit https://publikationen.sulb.uni-saarland.de/bitstream/20.500.11880/25760/1/ChristianJacobi_ProfDrWolfgangJPaul.pdf
- https://www-wjp.cs.uni-sb.de/forschung/projekte/VAMP/?lang=en
- the PVS/hw subfolder is under the 2-clause BSD license: https://www-wjp.cs.uni-sb.de/forschung/projekte/VAMP/PVS/hw/COPYRIGHT
- https://alastairreid.github.io/RelatedWork/papers/beyer:ijsttt:2006/
Automation
Bus Architectures
- Avalon https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/manual/mnl_avalon_spec.pdf
- CXM https://www.computeexpresslink.org/download-the-specification
Vector Processors
- THOR https://github.com/robfinch/Thor/blob/main/Thor2021/doc/Thor2021.pdf
- NEC SX-Aurora
- RVV
- MRISC32 https://github.com/mrisc32/mrisc32
LLVM
Adding new instructions:
Branch Prediction
Python RTL Tools
- https://ieeexplore.ieee.org/document/9591456 pylog fpga https://github.com/hst10/pylog
- Migen - a Python RTL
- Migen Tutorial
- There is a great guy, Robert Baruch, who has a good tutorial on nMigen. He also build an FPGA-proven Motorola 6800 CPU clone with nMigen and put the code and instructional videos online. There is now a page learning nmigen.
- Using our Python Unit Tests(old)
- https://chisel.eecs.berkeley.edu/api/latest/chisel3/util/DecoupledIO.html
Other
- https://github.com/chrische-xx/mpw7 10-bit SAR ADC
- Cray-1 Pocket Reference https://nitter.it/aka_pugs/status/1546576975166201856 https://ftp.libre-soc.org/cray-1-pocket-ref/ https://www.computerhistory.org/collections/catalog/102685876
- https://github.com/tdene/synth_opt_adders Prefix-tree generation scripts
- https://debugger.medium.com/why-is-apples-m1-chip-so-fast-3262b158cba2 N1
- https://codeberg.org/tok/librecell Libre Cell Library
- https://wiki.f-si.org/index.php/FSiC2019
- https://fusesoc.net
- https://www.lowrisc.org/open-silicon/
- http://fpgacpu.ca/fpga/Pipeline_Skid_Buffer.html pipeline skid buffer
- https://pyvcd.readthedocs.io/en/latest/vcd.gtkw.html GTKwave
- https://github.com/Ben1152000/sootty - console-based vcd viewer
- https://github.com/ics-jku/wal - Waveform Analysis
- http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_Resets.pdf Synchronous Resets? Asynchronous Resets? I am so confused! How will I ever know which to use? by Clifford E. Cummings
- http://www.sunburst-design.com/papers/CummingsSNUG2008Boston_CDC.pdf Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog, by Clifford E. Cummings In particular, see section 5.8.2: Multi-bit CDC signal passing using 1-deep / 2-register FIFO synchronizer.
- http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf Understanding Latency Hiding on GPUs, by Vasily Volkov
- Efabless "Openlane" https://github.com/efabless/openlane
- example of openlane with nmigen https://github.com/lethalbit/nmigen/tree/openlane
- Co-simulation plugin for verilator, transferring to ECP5 https://github.com/vmware/cascade
- Multi-read/write ported memories https://tomverbeure.github.io/2019/08/03/Multiport-Memories.html
- Data-dependent fail-on-first aka "Fault-tolerant speculative vectorisation" https://arxiv.org/pdf/1803.06185.pdf
- OpenPOWER Foundation Membership https://openpowerfoundation.org/membership/how-to-join/membership-kit-9-27-16-4/
- Clock switching (and formal verification) https://zipcpu.com/formal/2018/05/31/clkswitch.html
- Circuit of Compunit http://home.macintosh.garden/~mepy2/libre-soc/comp_unit_req_rel.html
- Circuitverse 16-bit https://circuitverse.org/users/17603/projects/54486
- Nice example model of a Tomasulo-based architecture, with multi-issue, in-order issue, out-of-order execution, in-order commit, with reservation stations and reorder buffers, and hazard avoidance. https://www.brown.edu/Departments/Engineering/Courses/En164/Tomasulo_10.pdf
- adrian_b architecture comparison https://news.ycombinator.com/item?id=24459041
- ericandecscent RISC-V https://libre-soc.org/irclog/%23libre-soc.2021-07-11.log.html
Real/Physical Projects
- Samuel's KC5 code
- https://chips4makers.io/blog/
- https://hackaday.io/project/7817-zynqberry
- https://github.com/efabless/raven-picorv32
- https://efabless.com
- https://efabless.com/design_catalog/default
- https://wiki.f-si.org/index.php/The_Raven_chip:_First-time_silicon_success_with_qflow_and_efabless
- https://mshahrad.github.io/openpiton-asplos16.html
ASIC tape-out pricing
Funding
Good Programming/Design Practices
- Liskov Substitution Principle
- Principle of Least Astonishment
- https://peertube.f-si.org/videos/watch/379ef007-40b7-4a51-ba1a-0db4f48e8b16
- http://flopoco.gforge.inria.fr/
- Fundamentals of Modern VLSI Devices https://groups.google.com/a/groups.riscv.org/d/msg/hw-dev/b4pPvlzBzu0/7hDfxArEAgAJ
12 skills summary
Analog Simulation
- https://github.com/Isotel/mixedsim
- http://www.vlsiacademy.org/open-source-cad-tools.html
- http://ngspice.sourceforge.net/adms.html
- https://en.wikipedia.org/wiki/Verilog-AMS#Open_Source_Implementations
Libre-SOC Standards
This list auto-generated from a page tag "standards":
Server setup
Testbeds
Really Useful Stuff
- https://github.com/im-tomu/fomu-workshop/blob/master/docs/requirements.txt
- https://github.com/im-tomu/fomu-workshop/blob/master/docs/conf.py#L39-L47
Digilent Arty
- https://store.digilentinc.com/pmod-sf3-32-mb-serial-nor-flash/
- https://store.digilentinc.com/arty-a7-artix-7-fpga-development-board-for-makers-and-hobbyists/
- https://store.digilentinc.com/pmod-vga-video-graphics-array/
- https://store.digilentinc.com/pmod-microsd-microsd-card-slot/
- https://store.digilentinc.com/pmod-rtcc-real-time-clock-calendar/
- https://store.digilentinc.com/pmod-i2s2-stereo-audio-input-and-output/
CircuitJS experiments
Logic Simulator 2
Features
- Micro-subset verilog-like DSL for coding the array of logic gates (parsed using Antlr)
- Monaco-based code editor with automatic linting/error reporting, smart indentation, code folding, hints
- IDE docking ui courtesy of JupyterLab's Lumino widgets
- Schematic visualisation courtesy of d3-hwschematic
- Testbench simulation with graphical trace output and schematic animation
- Circuit description as gates, boolean logic or verilog behavioural model
- Generate arbitrary outputs from truth table and Sum of Products or Karnaugh Map
[from the GitHub page. As of 2021/03/29]
ASIC Timing and Design flow resources
- https://www.linkedin.com/pulse/asic-design-flow-introduction-timing-constraints-mahmoud-abdellatif/
- https://www.icdesigntips.com/2020/10/setup-and-hold-time-explained.html
- https://www.vlsiguide.com/2018/07/clock-tree-synthesis-cts.html
- https://en.wikipedia.org/wiki/Frequency_divider
Geometric Haskell Library
- https://github.com/julialongtin/hslice/blob/master/Graphics/Slicer/Math/GeometricAlgebra.hs
- https://github.com/julialongtin/hslice/blob/master/Graphics/Slicer/Math/PGA.hs
- https://arxiv.org/pdf/1501.06511.pdf
- https://bivector.net/index.html
Handy Compiler Algorithms for SimpleV
Requires aligned registers:
More general:
TODO investigate
https://www.nextplatform.com/2022/08/22/the-expanding-cxl-memory-hierarchy-is-inevitable-and-good-enough/
https://github.com/idea-fasoc/OpenFASOC
https://www.quicklogic.com/2020/06/18/the-tipping-point/
https://www.quicklogic.com/blog/
https://www.quicklogic.com/2020/09/15/why-open-source-ecosystems-make-good-business-sense/
https://www.quicklogic.com/qorc/
https://en.wikipedia.org/wiki/RAD750
The RAD750 system has a price that is comparable to the RAD6000, the latter of which as of 2002 was listed at US$200,000 (equivalent to $284,292 in 2019).
https://theamphour.com/525-open-fpga-toolchains-and-machine-learning-with-brian-faith-of-quicklogic/
https://github.blog/2021-03-22-open-innovation-winning-strategy-digital-sovereignty-human-progress/
https://github.com/olofk/edalize
https://github.com/hdl/containers
https://twitter.com/OlofKindgren/status/1374848733746192394
You might also want to check out https://umarcor.github.io/osvb/index.html
https://www.linkedin.com/pulse/1932021-python-now-replaces-tcl-all-besteda-apis-avidan-efody/
“TCL has served us well, over the years, allowing us to provide an API, and at the same time ensure nobody will ever use it. I will miss it”.
https://sphinxcontrib-hdl-diagrams.readthedocs.io/en/latest/examples/comb-full-adder.html
https://sphinxcontrib-hdl-diagrams.readthedocs.io/en/latest/examples/carry4.html
FuseSoC is used by MicroWatt and Western Digital cores
OpenTitan also uses FuseSoC
LowRISC is UK based
https://antmicro.com/blog/2020/12/ibex-support-in-verilator-yosys-via-uhdm-surelog/
https://cirosantilli.com/x86-paging
https://stackoverflow.com/questions/18431261/how-does-x86-paging-work
http://denninginstitute.com/modules/vm/red/i486page.html
https://m.slashdot.org/story/391021 - mirror neural atrophy results in destruction of empathy