Choosing a Scheme implementation for a GNAT bootstrap compiler
The limits of my language means the limits of my world.
—Ludwig Wittgenstein (1889 -- 1951)
Finer grained comparison of selected Scheme implementations
The following chart compares the different Scheme (and other languages) implementations that may be used to bootstrap GNAT within the live-bootstrap procedure.
All of the following implementations have the following properties:
- Have no pregenerated files (aka, easy to bootstrap).
- Can be compiled with TCC, the Tiny C compiler.
- Can be compiled with no dependencies appart from TCC and LibC; GNU make is allowed.
- Have been tested in a RISC-V Linux environment.
- Are somewhat small.
- Are reproducible.
The lines of code statistic was performed with the tool cloc
. Test
files, examples and other non-required files are taken out. Notice, that
library files are taken into account. The count was taken on the
2022-11-19
using the master branch or development snapshot of the
programs with the exception of Lua, where v5.4
was used.
Compiled size refers to compilation of the required files using GCC
,
optimized for size (-Os
), staticly linked, no debugging symbols and
stripped.
Implementation | License | Status1 | Type | Standard | Library support | Lines of C code | Lines of !C code | RISC-V compiled size | Performance2 | Call/use C code | Theoretical step in live-boostrap | Step in live-bootstrap3 | Can be X-compiled? |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Chibi Scheme | BSDv3 | Developed | VM, compiler4 | R7RS Small/Red | Scheme Red | 27000 | 62100 | 807K/1.1M5 | Quite slow | Yes | min 7, likely 20 | Yes, but with several steps | |
Ol See6 | MIT and LGPLv3 | Developed | VM, compiler (VM) | R7RS Small | Large | 4400 | 40000 | 1.1M | ??? | Yes | min 6, likely 20 | Yes, setup is not straight forward | |
S7 | BSD-0 | Stable | Interpreter, compiler7 | R5RS, R7RS Small | Medium | 84000 | 2700 | 1.5M | Best | Yes, see comment | min 6, likely 20 | Yes, but not the bindings to C programs | |
TR7 | BSD-0 | Developed | Interpreter | R7RS Small | SRFI-{1,69,111} | 14500 | 1200 | 607K | Good | No, embeddable | min 6, likely 20 | Yes | |
Easy-ISLisp | BSDv2 | Stable | Interpreter, compiler8 | ISLisp | Medium | 29400 | 7900 | 699K | ??? | Yes | min 7, likely 20 | Yes, but not compiled code | |
Lua | MIT | Stable | VM | - | Small | 20000 | - | 696K | Good | Yes | min 7, likely 20 | Yes | |
JimTCL | BSDv2 | Stable | Interpreter | - | Large | 30800 | 60009 | 552K | Good? | Yes | min 6, likely 20 | Yes |
Choice of an implementation for a GNAT bootstrap compiler
License
An FSF approved license is a requirement.
- All implementations fulfill this requirement.
We are looking for the most permissive license possible.
- All implementations have very liberal licenses. In the case of Ol, it is dual licensed, so there is no problem there.
Status
Maintained, developed and stable implementations are preferred.
- All implementations complete this requirement.
Type
Any type would do. However, implementations that can compile their code to C (which can then later be compiled to an executable) may have an advantage for being more performant and producing final binaries which would simplify the development.
- All implementations complete this requirement.
Standard
The target language should have a complete (or close to complete) implementation of a defined standard. This would make relying on any such standarised implementation much easier and portable.
- Lua and JimTCL do not have any standard. Lua follows its own design, while JimTCL implements a subset of the main TCL implementation. They have been added in order to provide a reference.
Library support
While a larger library support is great, we are looking to have a somewhat minimal and portable system. Therefore, we only expect the use of a reduced number of external libraries. The target language is R7RS small. This data point is given for reference only.
Lines of C code
This metric shows the ammount of lines of code written in C that would be required in order to run any given implementation. The reason this metric is added, is to indicate the ammount of checking, debugging (actual bugs and reproducibility bugs) that would be necessary in order use the chosen implementation.
The lower the better.
Lines of Scheme (!C) code
Similarly to the lines of C code metric, we want to know the ammount of work that would be required to go through the code for each implementation. In the case of lines written in Scheme, the libraries from each implementation are counted. Not all of the Scheme lines are critical to the correct operation of any given implementation.
The lower the better.
RISC-V compiled size
This metric gives a simple overview of the final weight of the binary for each implementation. The sizes shown in this section are by themselves worthless. A more valuable metric would be peak memory and CPU usage while running some bechmark suite.
The lower the better?
Performance
The performance metric only gives a hint towards how fast a given implementation may be. For our purpose, a lower memory usage is of greater importance.
Nonetheless, faster is better.
Call/use C code
For the purpose of creating a GNAT bootstrap compiler, it may not be entirely necessary to call or use some (compiled) C code. However, having the ability to do so could come in handy.
Theoretical step in live-bootstrap
This indicates how soon an implementatio could be compilen inside the live-bootstrap procedure. All implementations could be compiled very early.
The sooner the better.
Comments on each implementation
Chibi Scheme
Chibi seems to be one of the most complete and smallest implementations available. It is also the slowest when compared to S7 and TR7.
It has a wealth of libraries and SRFIs, which would be incredibly useful. However, since no other implementations will have them, it will make the bootstrap compiler dependent on Chibi. This is not bad if it works as we want it but it is also not a good thing.
Chibi has another great advantage, it can compile Scheme code to C code and then generate binaries. This would make deploying and using the bootstrap compiler much easier. It would possibly speed up the execution.
One drawback is its non-trivial compilation sequence. GNU Make is a hard requirement for it. GNU Make 3.8 may not be able to handle Chibi, which would push it further in the live-bootstrap procedure.
Ol Scheme/Lisp10
It’s official name is Otous Lisp, however, it is mostly R7RS Small compliant, see list of differences.
One advantage is its incredibly small C core size. It also comes with quite a few libraries that can be useful, but the same comment as with Chibi applies.
Sadly, TCC support requires a small set of patches, which could be upstreamed. The C code is also not very clean and well-structured. A lot of comments are in russian, which does not help.
NOTE: as of 2022-12-06
a PR allowing TCC to compile a basic VM
binary has beed created. However, it seems that
Ol requires a binary to compile:
./repl
. An issue has been created to clarify this issue.
NOTE 2: on 2022-12-10
Ol
was “taken out” of the table as it requires a binary11. It is being
left in this article as it could be possible to take the binary out of
the compilatin chain in the future.
Ol is also much smaller than the
rest, so there is value and merit to it and therefore I would like to
keep it here.
Owl Scheme
Owl is a functional implementation/interpretation of Scheme. It has the smallest C core of all of the implementations. It also compiles itself by interpreting the Scheme compiler.
However, Owl seems unmaintained, has some issues with its compilation (non reproducible failures) and it is also limited while not being performant (as expected).
On 2022-12-06
it was taken out from the table as it requires a
binary to compile itself. The offending file is ./fasl/init.fasl
.
On 2022-12-26
Owl has been readed to the list as a process to
solve its bootstrapping has started12.
S7 Scheme
S7 is probalby the fastest implementation by far. It is used in snd for audo processing. It comes with a wealth of modules, most of them imprelemented in C.
It has the largest C codebase by far. It also requres a native C compiler to correctly transpile and use Scheme files. It is also very difficult (if not near impossible) to cross-compile.
TR7
It is one of the smallest, fully Schme Small compliant implementation that strives for minimalism, performance and portability.
It is however, not capable of generating native executables. Nonetheless, José, the creator/continuator of TR7 has been incredibly helpful and receptive towards my feedback.
Easy-ISLisp
It in Lisp implementation that conforms to the ISLisp standard. ISLisp tries to put toghether the main features of all the Lisp types/implementations. This allows Easy-ISLisp to be fairly small while very capable.
The main issue I see with Easy-ISLisp is that it is the only ISLisp implementation that is fully open, minimal and portable. This means, If we were to use it, we would be stuck with it forever. On a positive note, is is fairly stable and develpment has switched to the implementation of libraries and large utilities.
Lua
I wanted to take a look at Lua to see what a
“minimal” programming language looked like. I was pleasantly surprised
to see, that the reference implementation is about 20000 lines of C
code. That is indeed very small!
However, we will not consider Lua for the bootstrapping compiler as its metaprogramming features are lacking compared to Scheme and Lisp. However, it is worth mentioning Fennel, a Lisp dialect based on Lua. Versions 0.5.0 or newer cannot be bootstrapped without previously generated scripts. The latest (cleanly) bootstrappable version is 0.4.2, which does run on RISC-V.
JimTCL
JimTCL is a small implementation of the TCL programming language. TCL is by no means minimal in terms of its implementation. However, it tends to get used in embedded environments. For that reason, I took a look at JimTCL.
It sadly uses the autosetup configuration tool, which adds a lot of weight to the bootstrapping chain. However, JimTCL is itself quite minimal and powerful, whit batteries included if necessary.
Selection of an implementation
The final selection for an implementation should be an open process. As
discussed in the #ada
IRC channel, a
Fossil forum
will be setup for those discussions.
Footnotes
Changes
On the 2022-12-06
Owl was
taken out of the table as it needs a precompiled binary to compile
itself. The binary is ./fasl/init.fasl
.
Footnotes
-
“Stable” refers to implementations that are mostly done and now receive bug fixes only. Though new features may be added from time to time. ↩
-
The performance metric is a purely indicational one, nothing to be taken seriously. 10 indicates best, 0 indicates worst. Some results have been taken from the R7RS benchmark suite. ↩
-
On the
2022-11-13
, still to be determined. ↩ -
Requires a C compiler on the host. ↩
-
The lowest size is without all the libraries bundled. ↩
-
Ol requires a pregenerated FASL binary
repl
, therefore it is not valid. On2022-12-26
a process to generate therepl
binary from source has been started. This is expected to work with third party solutions, which would solve the bootstrapping problem. See the Github issue for more information. ↩ -
Requires a C compiler on the host. ↩
-
Requires a C compiler on the host. ↩
-
Generated using:
↩cloc jimsh.c jim-subcmd.c jim-interactive.c utf8.c jimregexp.c jim.c jim-nosignal.c jimiocompat.{c,h} jim-subcmd.h utf8.h jimregexp.h jim.h jim-nosignal.c autosetup/ github.com/AlDanial/cloc v 1.95 T=5.71 s (4.6 files/s, 9048.8 lines/s) ------------------------------------------------------------------------------- Language files blank comment code ------------------------------------------------------------------------------- C 9 6109 5994 30019 Bourne Shell 4 388 1028 4742 Tcl/Tk 8 170 621 1193 C/C++ Header 5 174 434 787 ------------------------------------------------------------------------------- SUM: 26 6841 8077 36741 -------------------------------------------------------------------------------
-
Ol requires a pregenerated FASL binary
repl
, therefore it is not valid. On2022-12-26
a process to generate therepl
binary from source has been started. This is expected to work with third party solutions, which would solve the bootstrapping problem. See the Github issue for more information. ↩ -
Ol requires a pregenerated FASL binary
repl
, therefore it is not valid. On2022-12-26
a process to generate therepl
binary from source has been started. This is expected to work with third party solutions, which would solve the bootstrapping problem. See the Github issue for more information. ↩ -
Ol requires a pregenerated FASL binary
repl
, therefore it is not valid. On2022-12-26
a process to generate therepl
binary from source has been started. This is expected to work with third party solutions, which would solve the bootstrapping problem. See the Github issue for more information. ↩