Libre-SOC Test API

A problem with complex specifications is ensuring Compliance. Whilst having Compliance Documentation is obviously critical (stating in writing what results are expected when an instruction is executed), actually checking that the results are as expected is tedious, protracted, and requires such extreme meticulousness that to expect it to be carried out manually is absurd. The sensible thing to do is to automate the process of testing Compliance, with a Test Suite.

However, even if a Test Suite is available, it may be specific to one implementation, only be installable on one type of system, or only work under certain very specific circumstances (such as requiring a bare-metal machine and not functioning fully under a Virtual Machine, or vice-versa).

It is clearly of benefit to make it easy to run a Compliance Test Suite for a given Instruction Set Architecture, but more than that it is important to make it easy to run against multiple disparate systems, all of which implement that same ISA.

Thus we have some requirements: in Software Engineering terms we have a Requirements Specification. The Test API must:

  1. be comprehensive and thorough
  2. run a full suite of tests sufficient to confirm that an implementation is 100% Compliant with every instruction to be implemented
  3. be generic (runnable or adaptable to run on multiple systems)
  4. be open so as to encourage wider adoption, increasing the probability of its use and thus avoiding catastrophic implementation mistakes.

The beginnings of this API came out of a need to "bootstrap" the Libre-SOC implementation of:

  • the Machine-readable Specification - semi-automated conversion of the Power ISA 3.0 PDF pseudo-code to an actual executable language - which produced a slow but functional python-based Simulator (ineptly named ISACaller)
  • the nmigen-based hardware implementation (several variants) again ineptly named TestIssuer
  • QEMU

QEMU was chosen for inclusion right at the start of the project because it is a functional bare-metal "executor" of Power ISA instructions, known and confirmed (for the most part) to be accurate in its implementation. One caveat: the Floating-Point execution is in no way accurate, as it relies on the underlying Host Operating System and Hardware. Therefore, if executed on a PowerPC system the FP guest execution produces correct answers, but if executed on an Intel, AMD or ARM system the guest produces wrong answers in certain cases.

As the test suite (the list of "known-good answers to certain instructions") grew, this list became "expected" results, and in and of itself became an important, integral part of the API. The initial approach was simply to bootstrap against other implementations: run a test against QEMU, then run the exact same instruction(s) against ISACaller, and if there are discrepancies, find out why and fix them.
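The triage process just described can be sketched as a simple comparison loop. Note that the `Implementation` class and `triage` function below are hypothetical stand-ins for illustration, not the actual Libre-SOC API:

```python
# Hypothetical sketch of cross-implementation triage: run the same
# instruction sequence on two implementations and report discrepancies.
# Implementation and triage are illustrative stand-ins, not the real API.

class Implementation:
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn  # takes a program, returns a dict of final register values

    def run(self, program):
        return self.fn(program)

def triage(program, reference, candidate):
    """Run program on both implementations, return a list of mismatches."""
    ref_state = reference.run(program)
    cand_state = candidate.run(program)
    mismatches = []
    for reg, expected in ref_state.items():
        actual = cand_state.get(reg)
        if actual != expected:
            mismatches.append((reg, expected, actual))
    return mismatches

# Toy stand-ins: a "reference" and a deliberately-buggy "candidate"
qemu = Implementation("qemu", lambda prog: {"r3": 5})
isacaller = Implementation("ISACaller", lambda prog: {"r3": 6})

print(triage(["addi 3,0,5"], qemu, isacaller))  # -> [('r3', 5, 6)]
```

Each mismatch names the register, the reference value, and the candidate value, which is exactly the information needed to "find out why and fix" a discrepancy.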

However, when errors in QEMU were discovered, it was realised that "expected results" needed to be meticulously and laboriously created. The context here: the official IBM / OpenPOWER ISA Compliance Test Suite is in a non-machine-readable format that would take months to massage into a machine-readable form. This is deemed too risky, because simple "transcription" errors from manual typing or mis-reading could be catastrophic. Therefore the Official Compliance Test Suite is excluded until such time as IBM / OpenPOWER releases a machine-readable Compliance Test Suite.

At present (at time of writing) the Test API can (very slowly) run tests against QEMU, and also run the exact same test(s) against ISACaller and TestIssuer. Over time a suite of "expected results" was extracted from the organically-evolving API into a more rigorous format, which gives:

  • the starting state of the implementation being tested (memory contents, register contents, Program Counter)
  • the instructions to be executed
  • the expected ending-state of both memory and registers, post-execution.
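A record in such a format might look like the following sketch. The field names and `TestCase` structure are illustrative assumptions, not the actual Libre-SOC on-disk format:

```python
# Illustrative sketch of an "expected results" test record: starting state,
# instructions to execute, and expected ending state. Field names here are
# assumptions for illustration, not the real format.

from dataclasses import dataclass, field

@dataclass
class TestCase:
    instructions: list                              # assembly listing to run
    initial_regs: dict = field(default_factory=dict)  # register number -> value
    initial_mem: dict = field(default_factory=dict)   # address -> value
    initial_pc: int = 0                               # Program Counter
    expected_regs: dict = field(default_factory=dict)
    expected_mem: dict = field(default_factory=dict)

def check(final_regs, final_mem, case):
    """Compare an implementation's post-execution state against expectations."""
    reg_ok = all(final_regs.get(r) == v for r, v in case.expected_regs.items())
    mem_ok = all(final_mem.get(a) == v for a, v in case.expected_mem.items())
    return reg_ok and mem_ok

case = TestCase(
    instructions=["addi 3, 0, 5", "addi 4, 3, 2"],
    initial_regs={3: 0, 4: 0},
    expected_regs={3: 5, 4: 7},
)
print(check({3: 5, 4: 7}, {}, case))  # -> True
```

Because such a record is implementation-neutral, the same `check` comparison can be applied to the dumped state of QEMU, ISACaller, TestIssuer, or any future implementation.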

Thus it is expected that each implementation helps verify the others, and, ironically and just as importantly, helps verify that the actual Power ISA specification is correct and its Compliance Suite sufficiently comprehensive to ensure implementation mistakes do not occur!

The next most logically-obvious thing to do is to extend the API to allow additional implementations to be easily included, in a modular, extensible fashion, and to formalise and document the API itself. This was where bug 985 came into play: adding the expected new Simulator (cavatools) as another option to triage against.

Cavatools already has the same gdb debug interface as QEMU, so it is expected to be reasonably straightforward to add. The QEMU interaction utilises a python-based gdbremote client to first upload a binary into memory, followed by commands to set the contents of registers, including the Program Counter. A breakpoint is set at the end of the program. Finally, QEMU is permitted to run, and following the trigger of the breakpoint, memory contents and register contents are dumped (over the gdb-remote interface); from these they can be compared against other implementations, or the Expected Results, or the Compliance Test Suite expected responses. Cavatools is expected to follow the exact same process, using the exact same python-based gdbremote client.
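The sequence of steps just described (upload, set registers, breakpoint, run, dump) can be sketched as follows. `GdbRemote` here is a minimal in-memory stand-in written for illustration; the method names are assumptions, not the real gdbremote client's API:

```python
# Sketch of the gdb-remote test sequence: upload binary, set registers
# (including the PC), set a breakpoint at the end, run, then dump state.
# GdbRemote is a toy in-memory stand-in, not the real gdbremote client.

class GdbRemote:
    def __init__(self):
        self.mem = {}
        self.regs = {}
        self.breakpoint = None

    def load_binary(self, addr, data):
        """Upload a program image into target memory, byte by byte."""
        for i, byte in enumerate(data):
            self.mem[addr + i] = byte

    def set_register(self, name, value):
        """Set a register's contents (including the Program Counter)."""
        self.regs[name] = value

    def set_breakpoint(self, addr):
        """Place a breakpoint, here intended for the end of the program."""
        self.breakpoint = addr

    def run_until_breakpoint(self):
        """Stand-in for execution: stop with the PC at the breakpoint."""
        self.regs["pc"] = self.breakpoint

    def dump_state(self):
        """Dump register and memory contents for post-run comparison."""
        return dict(self.regs), dict(self.mem)

target = GdbRemote()
target.load_binary(0x1000, b"\x38\x60\x00\x05")  # e.g. one 4-byte instruction
target.set_register("pc", 0x1000)
target.set_breakpoint(0x1004)
target.run_until_breakpoint()
regs, mem = target.dump_state()
print(regs["pc"] == 0x1004)  # -> True
```

The dumped `regs` and `mem` dictionaries are what would then be compared against other implementations or against the Expected Results.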