Choosing a Scheme implementation for a GNAT bootstrap compiler

Abstract:

Table of Contents

Finer grained comparison of selected Scheme implementations

The following chart compares the different Scheme (and other languages) implementations that may be used to bootstrap GNAT within the live-bootstrap procedure.

All of the following implementations have the following properties:

  • Have no pregenerated files (aka, easy to bootstrap).
  • Can be compiled with TCC, the Tiny C compiler.
  • Can be compiled with no dependencies appart from TCC and LibC; GNU make is allowed.
  • Have been tested in a RISC-V Linux environment.
  • Are somewhat small.
  • Are reproducible.

The lines of code statistic was performed with the tool cloc. Test files, examples and other non-required files are taken out. Notice, that library files are taken into account. The count was taken on the <2022-11-19 Sa> using the master branch or development snapshot of the programs with the exception of Lua, where v5.4 was used.

Compiled size refers to compilation of the required files using GCC, optimized for size (-Os), staticly linked, no debugging symbols and stripped.

Implementation License Status1 Type Standard Library support Lines of C code Lines of !C code RISC-V compiled size Performance2 Call/use C code Theoretical step in live-boostrap Step in live-bootstrap3 Can be X-compiled?
Chibi Scheme BSDv3 Developed VM, compiler4 R7RS Small/Red Scheme Red 27_000 62_100 807K/1.1M5 Quite slow Yes min 7, likely 20   Yes, but with several steps
Ol MIT and LGPLv3 Developed VM, compiler (VM) R7RS Small Large 4_400 40_000 1.1M ??? Yes min 6, likely 20   Yes, setup is not straight forward
Owl MIT Unmaintained6 VM, compiler4 R7RS subset Minimal 1_400 18_000 1.2M Slowest No min 6, likely 20   Most likely no
S7 BSD-0 Stable Interpreter, compiler4 R5RS, R7RS Small Medium 84_000 2_700 1.5M Best Yes, see comment min 6, likely 20   Yes, but not the bindings to C programs
TR7 BSD-0 Developed Interpreter R7RS Small SRFI-{1,69,111} 14_500 1_200 607K Good No, embeddable min 6, likely 20   Yes
Easy-ISLisp BSDv2 Stable Interpreter, compiler4 ISLisp Medium 29_400 7_900 699K ??? Yes min 7, likely 20   Yes, but not compiled code
Lua MIT Stable VM - Small 20_000 - 696K Good Yes min 7, likely 20   Yes
JimTCL BSDv2 Stable Interpreter - Large 30_800 6_0007 552K Good? Yes min 6, likely 20   Yes

Choice of an implementation for a GNAT bootstrap compiler

License

An FSF approved license is a requirement.

  • All implementations fulfill this requirement.

We are looking for the most permissive license possible.

  • All implementations have very liberal licenses. In the case of Ol, it is dual licensed, so there is no problem there.

Status

Maintained, developed and stable implementations are preferred.

  • All but Owl complete this requirement.

Type

Any type would do. However, implementations that can compile their code to C (which can then later be compiled to an executable) may have an advantage for being more performant and producing final binaries which would simplify the development.

  • All but Owl complete this requirement.

Standard

The target language should have a complete (or close to complete) implementation of a defined standard. This would make relying on any such standarised implementation much easier and portable.

  • Lua and JimTCL do not have any standard. Lua follows its own design, while JimTCL implements a subset of the main TCL implementation. They have been added in order to provide a reference.

Library support

While a larger library support is great, we are looking to have a somewhat minimal and portable system. Therefore, we only expect the use of a reduced number of external libraries. The target language is R7RS small. This data point is given for reference only.

Lines of C code

This metric shows the ammount of lines of code written in C that would be required in order to run any given implementation. The reason this metric is added, is to indicate the ammount of checking, debugging (actual bugs and reproducibility bugs) that would be necessary in order use the chosen implementation.

The lower the better.

Lines of Scheme (!C) code

Similarly to the lines of C code metric, we want to know the ammount of work that would be required to go through the code for each implementation. In the case of lines written in Scheme, the libraries from each implementation are counted. Not all of the Scheme lines are critical to the correct operation of any given implementation.

The lower the better.

RISC-V compiled size

This metric gives a simple overview of the final weight of the binary for each implementation. The sizes shown in this section are by themselves worthless. A more valuable metric would be peak memory and CPU usage while running some bechmark suite.

The lower the better?

Performance

The performance metric only gives a hint towards how fast a given implementation may be. For our purpose, a lower memory usage is of greater importance.

Nonetheless, faster is better.

Call/use C code

For the purpose of creating a GNAT bootstrap compiler, it may not be entirely necessary to call or use some (compiled) C code. However, having the ability to do so could come in handy.

  • All but TR7 and Owl can call C code. TR7 can, however, be embedded.

Theoretical step in live-bootstrap

This indicates how soon an implementatio could be compilen inside the live-bootstrap procedure. All implementations could be compiled very early.

The sooner the better.

Comments on each implementation

Chibi Scheme

Chibi seems to be one of the most complete and smallest implementations available. It is also the slowest when compared to S7 and TR7.

It has a wealth of libraries and SRFIs, which would be incredibly useful. However, since no other implementations will have them, it will make the bootstrap compiler dependent on Chibi. This is not bad if it works as we want it but it is also not a good thing.

Chibi has another great advantage, it can compile Scheme code to C code and then generate binaries. This would make deploying and using the bootstrap compiler much easier. It would possibly speed up the execution.

One drawback is its non-trivial compilation sequence. GNU Make is a hard requirement for it. GNU Make 3.8 may not be able to handle Chibi, which would push it further in the live-bootstrap procedure.

Ol Scheme/Lisp

It's official name is Owl Lisp, however, it is mostly R7RS Small compliant, see list of differences.

One advantage is its incredibly small C core size. It also comes with quite a few libraries that can be useful, but the same comment as with Chibi applies.

Sadly, TCC support requires a small set of patches, which could be upstreamed. The C code is also not very clean and well-structured. A lot of comments are in russian, which does not help.

Owl Scheme

It is a functional implementation/interpretation of Scheme. It has the smallest C core of all of the implementations. It also compiles itself by interpreting the Scheme compiler.

However, Owl seems unmaintained, has some issues with its compilation (non reproducible failures) and it is also limited while not being performant (as expected).

S7 Scheme

S7 is probalby the fastest implementation by far. It is used in snd for audo processing. It comes with a wealth of modules, most of them imprelemented in C.

It has the largest C codebase by far. It also requres a native C compiler to correctly transpile and use Scheme files. It is also very difficult (if not near impossible) to cross-compile.

TR7

It is one of the smallest, fully Schme Small compliant implementation that strives for minimalism, performance and portability.

It is however, not capable of generating native executables. Nonetheless, José, the creator/continuator of TR7 has been incredibly helpful and receptive towards my feedback.

Easy-ISLisp

It in Lisp implementation that conforms to the ISLisp standard. ISLisp tries to put toghether the main features of all the Lisp types/implementations. This allows Easy-ISLisp to be fairly small while very capable.

The main issue I see with Easy-ISLisp is that it is the only ISLisp implementation that is fully open, minimal and portable. This means, If we were to use it, we would be stuck with it forever. On a positive note, is is fairly stable and develpment has switched to the implementation of libraries and large utilities.

Lua

I wanted to take a look at Lua to see what a "minimal" programming language looked like. I was pleantly surprised to see, that the reference implementation is about 20000 lines of C code. That is indeed very small!

However, we will not consider Lua for the bootstrapping compiler and its metaprogramming features are lacking compared to Scheme and Lisp.

JimTCL

JimTCL is a small implementation of the TCL programming language. TCL is by no means minimal in terms of its implementation. However, it tends to get used in embedded environments. For that reason, I took a look at JimTCL.

It sadly uses the autosetup configuration tool, which adds a lot of weight to the bootstrapping chain. However, JimTCL is itself quite minimal and powerful, whit batteries included if necessary.

TODO Selection of an implementation

Footnotes:

1

"Stable" refers to implementations that are mostly done and now receive bug fixes only. Though new features may be added from time to time.

2

The performance metric is a purely indicational one, nothing to be taken seriously. 10 indicates best, 0 indicates worst. Some results have been taken from the R7RS benchmark suite.

3

On the <2022-11-13 So>, still to be determined.

4

Requires a C compiler on the host.

5

The lowest size is without all the libraries bundled.

6

And it does not seem to be fully complete.

7

Generated using:

cloc jimsh.c jim-subcmd.c jim-interactive.c utf8.c jimregexp.c jim.c jim-nosignal.c jimiocompat.{c,h} jim-subcmd.h utf8.h jimregexp.h jim.h jim-nosignal.c autosetup/
github.com/AlDanial/cloc v 1.95  T=5.71 s (4.6 files/s, 9048.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                9           6109           5994          30019
Bourne Shell                     4            388           1028           4742
Tcl/Tk                           8            170            621           1193
C/C++ Header                     5            174            434            787
-------------------------------------------------------------------------------
SUM:                            26           6841           8077          36741
-------------------------------------------------------------------------------

Date: 2022-04-30 Sa 00:00

Author: Fernando Oleo Blanco

Email: irvise _at_ irvise.xyz

Created: 2022-11-27 So 17:32

Emacs 27.1 (Org mode 9.4.4)

Validate