Title: An introduction to Libre-SOC and Cray-style Vectors for OpenPOWER
The current HPC / Supercomputer market has few top contenders: China ICT Loongson 3, Intel / AMD x86 and IBM POWER9/10. None of these have the features of one of the most iconic Supercomputer ISA: Cray-style Vectors, known historically for compact program size and massive number-crunching capability.
Libre-SOC is working under the close watchful eye of the OpenPOWER Foundation to develop Draft Extensions to the Power ISA, extending the already-proven Supercomputing-grade Power ISA with Cray-style Vectors. This talk will give a brief overview of the progress and roadmap.
Bio
Luke Kenneth Casson Leighton specialises in Libre Ethical Technology. He has been using, programming and reverse-engineering computing devices continuously for 44 years, has a BEng (Hons), ACGI, in Theory of Computing from Imperial College, and recently put that education to good use in the form of the Libre-SOC Project: an entirely Libre-Licensed 3D Hybrid CPU-VPU-GPU based on OpenPOWER. He writes poetry and has been developing a HEP Physics theory for the past 36 years in his spare time.
Links
- https://twitter.com/OpenPOWERorg/status/1408093712664567809?s=20
- https://www.linkedin.com/posts/openpower-foundation_openpower-workshop-at-cineca-activity-6813861142486745088-yWD_
- June 30th 7.30 am EST to 10.30 am EST
- https://academy.cineca.it/en/events/openpower-workshop
- https://www.hpc.cineca.it/center_news/openpower-workshop-cineca-june-30th-2021
- Video https://www.youtube.com/watch?v=gexy0z1YqFY
Talk links
- SVP64 Overview https://libre-soc.org/openpower/sv/overview/
- SIMD Considered harmful (massive understatement) https://www.sigarch.org/simd-instructions-considered-harmful/
- Carnegie course on Vector Processors https://course.ece.cmu.edu/~ece740/f13/lib/exe/fetch.php?media=seth-740-fall13-module5.1-simd-vector-gpu.pdf
- IAXPY AVX512 (quite shocking) https://godbolt.org/z/f8a7PMPWc
- 250 lines of hand-crafted assembler for VSX strncpy https://patchwork.ozlabs.org/project/glibc/patch/20200929152103.18564-1-rzinsly@linux.ibm.com/
- under 20 lines for Vectorised strncpy https://github.com/plctlab/rvv-benchmark/blob/master/strncpy.s
- IAXPY for VSX (around 60 lines of assembler) https://godbolt.org/z/4oGjTe8Ko
- FFMPEG MP3 code snippet inner loop https://ffmpeg.org/doxygen/3.1/mpegaudiodsp__template_8c_source.html#l00121
- FFMPEG MP3 assembler, 450 lines https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/audio/mp3/mp3_0_apply_window_float.s;hb=HEAD
- FFMPEG SVP64 MP3, under 100 lines https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=media/audio/mp3/mp3_0_apply_window_float_basicsv.s;hb=HEAD
- Cooley Tukey FFT algorithm https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm#Data_reordering,_bit_reversal,_and_in-place_algorithms
- in-place FFT Butterfky https://en.wikipedia.org/wiki/File:DIT-FFT-butterfly.png
- SVP64 bit-reverse LOAD https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_ldst.py;hb=HEAD
- SVP64 twin +/- Vectorised FMAC https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_fft.py;h=6cb2b522be85a2f86a0b505d1878dbcec645cb90;hb=8dfffc9c2ff7bb91715500160d1b057f9bef3ba0
- not part of the video, more info about REMAP remap