Choosing a Scheme implementation for a GNAT bootstrap compiler


The limits of my language means the limits of my world.
—Ludwig Wittgenstein (1889 -- 1951)

Finer grained comparison of selected Scheme implementations

The following chart compares the different Scheme (and other languages) implementations that may be used to bootstrap GNAT within the live-bootstrap procedure.

All of the following implementations have the following properties:

  • Have no pregenerated files (aka, easy to bootstrap).
  • Can be compiled with TCC, the Tiny C compiler.
  • Can be compiled with no dependencies appart from TCC and LibC; GNU make is allowed.
  • Have been tested in a RISC-V Linux environment.
  • Are somewhat small.
  • Are reproducible.

The lines of code statistic was performed with the tool cloc. Test files, examples and other non-required files are taken out. Notice, that library files are taken into account. The count was taken on the 2022-11-19 using the master branch or development snapshot of the programs with the exception of Lua, where v5.4 was used.

Compiled size refers to compilation of the required files using GCC, optimized for size (-Os), staticly linked, no debugging symbols and stripped.

ImplementationLicenseStatus1TypeStandardLibrary supportLines of C codeLines of !C codeRISC-V compiled sizePerformance2Call/use C codeTheoretical step in live-boostrapStep in live-bootstrap3Can be X-compiled?
Chibi SchemeBSDv3DevelopedVM, compiler4R7RS Small/RedScheme Red2700062100807K/1.1M5Quite slowYesmin 7, likely 20Yes, but with several steps
Ol See6MIT and LGPLv3DevelopedVM, compiler (VM)R7RS SmallLarge4400400001.1M???Yesmin 6, likely 20Yes, setup is not straight forward
S7BSD-0StableInterpreter, compiler7R5RS, R7RS SmallMedium8400027001.5MBestYes, see commentmin 6, likely 20Yes, but not the bindings to C programs
TR7BSD-0DevelopedInterpreterR7RS SmallSRFI-{1,69,111}145001200607KGoodNo, embeddablemin 6, likely 20Yes
Easy-ISLispBSDv2StableInterpreter, compiler8ISLispMedium294007900699K???Yesmin 7, likely 20Yes, but not compiled code
LuaMITStableVM-Small20000-696KGoodYesmin 7, likely 20Yes
JimTCLBSDv2StableInterpreter-Large3080060009552KGood?Yesmin 6, likely 20Yes

Choice of an implementation for a GNAT bootstrap compiler

License

An FSF approved license is a requirement.

  • All implementations fulfill this requirement.

We are looking for the most permissive license possible.

  • All implementations have very liberal licenses. In the case of Ol, it is dual licensed, so there is no problem there.

Status

Maintained, developed and stable implementations are preferred.

  • All implementations complete this requirement.

Type

Any type would do. However, implementations that can compile their code to C (which can then later be compiled to an executable) may have an advantage for being more performant and producing final binaries which would simplify the development.

  • All implementations complete this requirement.

Standard

The target language should have a complete (or close to complete) implementation of a defined standard. This would make relying on any such standarised implementation much easier and portable.

  • Lua and JimTCL do not have any standard. Lua follows its own design, while JimTCL implements a subset of the main TCL implementation. They have been added in order to provide a reference.

Library support

While a larger library support is great, we are looking to have a somewhat minimal and portable system. Therefore, we only expect the use of a reduced number of external libraries. The target language is R7RS small. This data point is given for reference only.

Lines of C code

This metric shows the ammount of lines of code written in C that would be required in order to run any given implementation. The reason this metric is added, is to indicate the ammount of checking, debugging (actual bugs and reproducibility bugs) that would be necessary in order use the chosen implementation.

The lower the better.

Lines of Scheme (!C) code

Similarly to the lines of C code metric, we want to know the ammount of work that would be required to go through the code for each implementation. In the case of lines written in Scheme, the libraries from each implementation are counted. Not all of the Scheme lines are critical to the correct operation of any given implementation.

The lower the better.

RISC-V compiled size

This metric gives a simple overview of the final weight of the binary for each implementation. The sizes shown in this section are by themselves worthless. A more valuable metric would be peak memory and CPU usage while running some bechmark suite.

The lower the better?

Performance

The performance metric only gives a hint towards how fast a given implementation may be. For our purpose, a lower memory usage is of greater importance.

Nonetheless, faster is better.

Call/use C code

For the purpose of creating a GNAT bootstrap compiler, it may not be entirely necessary to call or use some (compiled) C code. However, having the ability to do so could come in handy.

  • All implementations but TR7 can call C code. TR7 can, however, be embedded.

Theoretical step in live-bootstrap

This indicates how soon an implementatio could be compilen inside the live-bootstrap procedure. All implementations could be compiled very early.

The sooner the better.

Comments on each implementation

Chibi Scheme

Chibi seems to be one of the most complete and smallest implementations available. It is also the slowest when compared to S7 and TR7.

It has a wealth of libraries and SRFIs, which would be incredibly useful. However, since no other implementations will have them, it will make the bootstrap compiler dependent on Chibi. This is not bad if it works as we want it but it is also not a good thing.

Chibi has another great advantage, it can compile Scheme code to C code and then generate binaries. This would make deploying and using the bootstrap compiler much easier. It would possibly speed up the execution.

One drawback is its non-trivial compilation sequence. GNU Make is a hard requirement for it. GNU Make 3.8 may not be able to handle Chibi, which would push it further in the live-bootstrap procedure.

Ol Scheme/Lisp10

It’s official name is Otous Lisp, however, it is mostly R7RS Small compliant, see list of differences.

One advantage is its incredibly small C core size. It also comes with quite a few libraries that can be useful, but the same comment as with Chibi applies.

Sadly, TCC support requires a small set of patches, which could be upstreamed. The C code is also not very clean and well-structured. A lot of comments are in russian, which does not help.

NOTE: as of 2022-12-06 a PR allowing TCC to compile a basic VM binary has beed created. However, it seems that Ol requires a binary to compile: ./repl. An issue has been created to clarify this issue.

NOTE 2: on 2022-12-10 Ol was “taken out” of the table as it requires a binary11. It is being left in this article as it could be possible to take the binary out of the compilatin chain in the future. Ol is also much smaller than the rest, so there is value and merit to it and therefore I would like to keep it here.

Owl Scheme

Owl is a functional implementation/interpretation of Scheme. It has the smallest C core of all of the implementations. It also compiles itself by interpreting the Scheme compiler.

However, Owl seems unmaintained, has some issues with its compilation (non reproducible failures) and it is also limited while not being performant (as expected).

On 2022-12-06 it was taken out from the table as it requires a binary to compile itself. The offending file is ./fasl/init.fasl.

On 2022-12-26 Owl has been readed to the list as a process to solve its bootstrapping has started12.

S7 Scheme

S7 is probalby the fastest implementation by far. It is used in snd for audo processing. It comes with a wealth of modules, most of them imprelemented in C.

It has the largest C codebase by far. It also requres a native C compiler to correctly transpile and use Scheme files. It is also very difficult (if not near impossible) to cross-compile.

TR7

It is one of the smallest, fully Schme Small compliant implementation that strives for minimalism, performance and portability.

It is however, not capable of generating native executables. Nonetheless, José, the creator/continuator of TR7 has been incredibly helpful and receptive towards my feedback.

Easy-ISLisp

It in Lisp implementation that conforms to the ISLisp standard. ISLisp tries to put toghether the main features of all the Lisp types/implementations. This allows Easy-ISLisp to be fairly small while very capable.

The main issue I see with Easy-ISLisp is that it is the only ISLisp implementation that is fully open, minimal and portable. This means, If we were to use it, we would be stuck with it forever. On a positive note, is is fairly stable and develpment has switched to the implementation of libraries and large utilities.

Lua

I wanted to take a look at Lua to see what a “minimal” programming language looked like. I was pleasantly surprised to see, that the reference implementation is about 20000 lines of C code. That is indeed very small!

However, we will not consider Lua for the bootstrapping compiler as its metaprogramming features are lacking compared to Scheme and Lisp. However, it is worth mentioning Fennel, a Lisp dialect based on Lua. Versions 0.5.0 or newer cannot be bootstrapped without previously generated scripts. The latest (cleanly) bootstrappable version is 0.4.2, which does run on RISC-V.

JimTCL

JimTCL is a small implementation of the TCL programming language. TCL is by no means minimal in terms of its implementation. However, it tends to get used in embedded environments. For that reason, I took a look at JimTCL.

It sadly uses the autosetup configuration tool, which adds a lot of weight to the bootstrapping chain. However, JimTCL is itself quite minimal and powerful, whit batteries included if necessary.

Selection of an implementation

The final selection for an implementation should be an open process. As discussed in the #ada IRC channel, a Fossil forum will be setup for those discussions.

Footnotes

Changes

On the 2022-12-06 Owl was taken out of the table as it needs a precompiled binary to compile itself. The binary is ./fasl/init.fasl.

Footnotes

  1. ”Stable” refers to implementations that are mostly done and now receive bug fixes only. Though new features may be added from time to time.

  2. The performance metric is a purely indicational one, nothing to be taken seriously. 10 indicates best, 0 indicates worst. Some results have been taken from the R7RS benchmark suite.

  3. On the 2022-11-13, still to be determined.

  4. Requires a C compiler on the host.

  5. The lowest size is without all the libraries bundled.

  6. Ol requires a pregenerated FASL binary repl, therefore it is not valid. On 2022-12-26 a process to generate the repl binary from source has been started. This is expected to work with third party solutions, which would solve the bootstrapping problem. See the Github issue for more information.

  7. Requires a C compiler on the host.

  8. Requires a C compiler on the host.

  9. Generated using:

    cloc jimsh.c jim-subcmd.c jim-interactive.c utf8.c jimregexp.c jim.c jim-nosignal.c jimiocompat.{c,h} jim-subcmd.h utf8.h jimregexp.h jim.h jim-nosignal.c autosetup/
    github.com/AlDanial/cloc v 1.95  T=5.71 s (4.6 files/s, 9048.8 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    C                                9           6109           5994          30019
    Bourne Shell                     4            388           1028           4742
    Tcl/Tk                           8            170            621           1193
    C/C++ Header                     5            174            434            787
    -------------------------------------------------------------------------------
    SUM:                            26           6841           8077          36741
    -------------------------------------------------------------------------------
    
  10. Ol requires a pregenerated FASL binary repl, therefore it is not valid. On 2022-12-26 a process to generate the repl binary from source has been started. This is expected to work with third party solutions, which would solve the bootstrapping problem. See the Github issue for more information.

  11. Ol requires a pregenerated FASL binary repl, therefore it is not valid. On 2022-12-26 a process to generate the repl binary from source has been started. This is expected to work with third party solutions, which would solve the bootstrapping problem. See the Github issue for more information.

  12. Ol requires a pregenerated FASL binary repl, therefore it is not valid. On 2022-12-26 a process to generate the repl binary from source has been started. This is expected to work with third party solutions, which would solve the bootstrapping problem. See the Github issue for more information.