ELVM Compiler Infrastructure

ELVM is similar to LLVM but dedicated to esoteric languages. The project consists of two components: a frontend and a backend. Currently, the only frontend is a modified version of 8cc, which translates C code to an internal representation format called ELVM IR (EIR). Unlike LLVM bitcode, EIR is designed to be extremely simple, which makes it more feasible to write a translator from EIR to an esoteric language.

Currently, there are 49 backends:

Awk (by @dubek)
Bash
Befunge
Brainfuck
C
C++14 constexpr (compile-time) (by @kw-udon)
C++ Template Metaprogramming (compile-time) (by @kw-udon) (WIP)
C# (by @masaedw)
C-INTERCAL
CMake (by @ooxi)
CommonLisp (by @youz)
Crystal (compile-time) (by @MakeNowJust)
Emacs Lisp
F# (by @masaedw)
Forth (by @dubek)
Fortran (by @samcoppini)
Go (by @shogo82148)
HeLL (by @esoteric-programmer)
J (by @dubek)
Java
JavaScript
Kinx (by @Kray-G)
LLVM IR (by @retrage)
LOLCODE (by @gamerk)
Lua (by @retrage)
Octave (by @inaniwa3)
Perl5 (by @mackee)
PHP (by @zonuexe)
Piet
Python
Ruby
Scheme syntax-rules (by @zeptometer)
Scratch3.0 (by @algon-320)
SQLite3 (by @youz)
Swift (by @kwakasa)
Tcl (by @dubek)
TeX (by @hak7a3)
TensorFlow (WIP)
Turing machine (by @ND-CSE-30151)
Unlambda (by @irori)
Vim script (by @rhysd)
WebAssembly (by @dubek)
WebAssembly System Interface (by @sanemat)
Whirl (by @samcoppini)
W-Machine (by @jcande)
Whitespace
arm-linux (by @irori)
i386-linux
sed

The above list contains languages that are known to be difficult to program in, but with ELVM you can create programs in them. For example, you can easily create Brainfuck programs by writing C code. One of the interesting test cases ELVM has is a tiny Lisp interpreter. All of the above language backends pass this test, which means you can run Lisp on the above languages.

Moreover, 8cc and ELVM themselves are written in C, so we can run a C compiler written in the above languages to compile ELVM's compiler toolchain itself, though such compilation takes a long time in some esoteric languages.

A demo site

http://shinh.skr.jp/elvm/8cc.js.html

As written above, the ELVM toolchain itself runs on all supported language backends. The above demo runs the ELVM toolchain on JavaScript (and is thus slow).

Example big programs

ELVM internals

ELVM IR

Harvard architecture, not von Neumann (allowing self-modifying code is hard)
6 registers: A, B, C, D, SP, and BP
Ops: mov, add, sub, load, store, setcc, jcc, putc, getc, and exit
Pseudo ops: .text, .data, .long, and .string
mul/div/mod are implemented by _builtin*
No bit operations
No floating point arithmetic
sizeof(char) == sizeof(int) == sizeof(void*) == 1
The word size is backend dependent, but most backends use 24-bit words
A single program counter value may contain multiple operations
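
Since the IR has add and sub but no multiply, divide, or bit operations, those have to be built from what is available. The following is a minimal sketch of that idea, not the actual builtin routines: the function names `soft_mul` and `soft_divmod` are hypothetical, and the sketch handles non-negative integers only. It uses nothing but addition, subtraction, and comparison, doubling values by adding them to themselves.

```python
def soft_mul(a, b):
    """Multiply non-negative ints using only add, sub, and comparisons."""
    # Build a table of (2^k, a * 2^k) by repeated doubling (x + x).
    table = []
    power, value = 1, a
    while power <= b:
        table.append((power, value))
        power += power
        value += value
    # Walk the table from the largest power down, consuming b.
    result = 0
    for power, value in reversed(table):
        if b >= power:
            b -= power
            result += value
    return result


def soft_divmod(n, d):
    """Return (n // d, n % d) for n >= 0 and d > 0 without using / or %."""
    if d <= 0:
        raise ValueError("divisor must be positive in this sketch")
    # Build a table of (2^k, d * 2^k) for all multiples not exceeding n.
    table = []
    power, value = 1, d
    while value <= n:
        table.append((power, value))
        power += power
        value += value
    # Subtract the largest multiples first; what is left is the remainder.
    quotient = 0
    for power, value in reversed(table):
        if n >= value:
            n -= value
            quotient += power
    return quotient, n


if __name__ == "__main__":
    print(soft_mul(6, 7))
    print(soft_divmod(100, 7))
```

The doubling table keeps the number of additions proportional to the number of bits in the operand rather than to its value, which matters on backends where every operation is expensive.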

See ELVM.md for more detail.
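
To make the machine model above more concrete, here is a toy interpreter sketch. It is not the interpreter in ir/ and does not parse real EIR text. Everything about the encoding is an assumption made for illustration: instructions are Python tuples, the operand order (`store src, addr` and `load dst, addr`) is chosen for this sketch, only one conditional jump (`jne`) is modeled, `getc` is assumed to yield 0 at end of input, and words are assumed to be 24 bits wide. It does show three properties from the list: code and data live in separate memories, arithmetic wraps at the word size, and one program counter value holds several operations.

```python
WORD = 1 << 24  # assumed 24-bit word size


def run(blocks, data=(), stdin=""):
    """Run a toy program; blocks[pc] is a list of operations for that pc."""
    regs = dict.fromkeys(["A", "B", "C", "D", "SP", "BP"], 0)
    mem = dict(enumerate(data))  # data memory, separate from the code
    inp = iter(stdin)
    out = []

    def val(x):
        # A string operand names a register; anything else is an immediate.
        return regs[x] if isinstance(x, str) else x

    pc = 0
    while pc < len(blocks):
        next_pc = pc + 1
        for op, *args in blocks[pc]:
            if op == "mov":
                regs[args[0]] = val(args[1])
            elif op == "add":
                regs[args[0]] = (regs[args[0]] + val(args[1])) % WORD
            elif op == "sub":
                regs[args[0]] = (regs[args[0]] - val(args[1])) % WORD
            elif op == "load":
                regs[args[0]] = mem.get(val(args[1]), 0)
            elif op == "store":
                mem[val(args[1])] = regs[args[0]]
            elif op == "putc":
                out.append(chr(val(args[0]) % 256))
            elif op == "getc":
                regs[args[0]] = ord(next(inp, "\0"))
            elif op == "jmp":
                next_pc = args[0]
                break
            elif op == "jne":
                # Jump to block args[0] when the two operands differ.
                if val(args[1]) != val(args[2]):
                    next_pc = args[0]
                    break
            elif op == "exit":
                return "".join(out)
            else:
                raise ValueError(f"unknown op: {op}")
        pc = next_pc
    return "".join(out)


PROGRAM = [
    [("mov", "A", 65), ("mov", "B", 3)],                             # pc 0
    [("putc", "A"), ("add", "A", 1), ("sub", "B", 1),
     ("jne", 1, "B", 0)],                                            # pc 1
    [("exit",)],                                                     # pc 2
]

if __name__ == "__main__":
    print(run(PROGRAM))  # prints "ABC"
```

Jumps target a program counter value rather than an individual operation, which is why the loop body above is a single block.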

Directories

shinh/8cc's eir branch is the frontend C compiler.

ir/ directory has a parser and an interpreter of ELVM IR. ELVM IR has

target/ directory has backend implementations. Code in this directory uses the IR parser to generate backend code.

libc/ directory has an incomplete libc implementation which is necessary to run tests.

Notes on language backends

Brainfuck

Running a Lisp interpreter on Brainfuck was the first motivation of this project (bflisp). ELVM IR was designed for Brainfuck, but it turned out that such a simple IR is suitable for other esoteric languages as well.

As Brainfuck is slow, this project contains a Brainfuck interpreter/compiler in tools/bfopt.cc. You can also use other optimized Brainfuck implementations such as tritium. Note that you need an implementation with 8-bit cells. For tritium, you need to specify the `-b' flag.
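
To illustrate what "8-bit cells" means in practice, here is a small Brainfuck interpreter sketch. It is unrelated to tools/bfopt.cc and far too slow for real ELVM output. The function name `run_bf`, the tape length, and the choice to store 0 on end of input are assumptions of this sketch. The relevant detail is that every cell update is reduced modulo 256, so decrementing 0 yields 255.

```python
def run_bf(code, stdin="", tape_len=30000):
    """Interpret Brainfuck with cells that wrap at 8 bits."""
    # Precompute the position of each bracket's partner.
    stack, match = [], {}
    for i, ch in enumerate(code):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            j = stack.pop()
            match[i], match[j] = j, i

    tape = [0] * tape_len  # tape length is an arbitrary choice here
    ptr = pc = 0
    inp = iter(stdin)
    out = []
    while pc < len(code):
        ch = code[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256  # 8-bit wraparound
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256  # 0 - 1 becomes 255
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(inp, "\0")) % 256  # 0 on end of input
        elif ch == "[" and tape[ptr] == 0:
            pc = match[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = match[pc]  # jump back to the loop start
        pc += 1
    return "".join(out)


if __name__ == "__main__":
    print(run_bf("++++++++[>++++++++<-]>+."))  # prints "A"
```

An interpreter with wider cells would produce a different value for the `-` on a zero cell, which is one way a program written for 8-bit cells can misbehave elsewhere.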

Unlambda

This backend was contributed by @irori. See also 8cc.unl.

This backend is tested with @irori's interpreter. tools/rununl.sh automatically downloads it.

C-INTERCAL

This backend uses 16-bit registers and a 16-bit address space, though ELVM's standard is 24-bit. Due to the limited address space, you cannot compile large C programs using 8cc on C-INTERCAL.
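
The size difference between the two word sizes is easy to underestimate, so here is the arithmetic as a short sketch. The helper names are hypothetical, and the truncation shown is only an illustration of what fits in a given number of bits, not a description of how this backend handles out-of-range addresses.

```python
def address_space_words(bits):
    """Number of distinct addresses representable with the given width."""
    return 1 << bits


def truncate_address(addr, bits):
    """Value that remains if addr is reduced to the given width."""
    return addr % (1 << bits)


if __name__ == "__main__":
    print(address_space_words(16))      # 65536
    print(address_space_words(24))      # 16777216
    print(truncate_address(70000, 16))  # 4464
```

A 24-bit address space is 256 times larger than a 16-bit one, and any address above 65535 cannot be represented at all in 16 bits.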

This backend won't be tested by default because C-INTERCAL is slow. Use

$ CINT=1 make i

to run the tests. Note that you may need to adjust tools/runi.sh.

You can make faster executables by doing something like

$ cp out/fizzbuzz.c.eir.i fizzbuzz.i && ick fizzbuzz.i
$ ./fizzbuzz

However, compilation then takes much more time because it uses gcc instead of tcc.

Piet

This backend also has a 16-bit address space, so it has the same limitation as C-INTERCAL.

This backend won't be tested by default because npiet is slow. Use

$ PIET=1 make piet

to run them.

Befunge

BefLisp, which translates LLVM bitcode to Befunge, has very similar code. The interpreter, tools/befunge.cc, is mostly Befunge-93, but its address space is extended to make Befunge-93 Turing-complete.

Whitespace

This backend is tested with @koturn's Whitespace implementation.

Emacs Lisp

This backend is somewhat more interesting than other non-esoteric backends. You can run a C compiler on Emacs:

M-x load-file tools/elvm.el
open test/putchar.c (or write C code without #include)
M-x 8cc
Now you'll see ELVM IR. You need to prepend a backend name (`el' for example) as the first line.
M-x elc
M-x eval-buffer
M-x elvm-main

Vim script

This backend was contributed by @rhysd. You can run a C compiler on Vim:

Open test/hello.c (or write your C code)
:source /path/to/out/8cc.vim
Now you can see ELVM IR in the buffer
Please prepend a backend name (vim for Vim) to the first line
:source /path/to/out/elc.vim
You can see Vim script code as the compilation result in current buffer
You can :source to run the code

You can find more details and a released Vim script in 8cc.vim.

TeX

This backend was contributed by @hak7a3. See also 8cc.tex.

C++14 constexpr (compile-time)

This backend was contributed by @kw-udon. You can find more descriptions in constexpr-8cc.

sed

This backend is very slow, so only a limited set of tests runs by default. You can run all of them with

$ FULL=1 make sed

but it could take years to run all tests. I believe the C compiler in sed works, but I haven't confirmed it yet. You can try the Lisp interpreter instead:

$ FULL=1 make out/lisp.c.eir.sed.out.diff
$ echo '(+ 4 3)' | time sed -n -f out/lisp.c.eir.sed

This backend should support both GNU sed and BSD sed, so it is more portable than sedlisp, though much slower. Also note that, due to a limitation of BSD sed, programs cannot output non-ASCII characters or NUL.

HeLL

This backend was contributed by @esoteric-programmer. HeLL is an assembly language for Malbolge and Malbolge Unshackled. Use LMFAO to build the Malbolge Unshackled program from HeLL. This backend won't be tested by default because Malbolge Unshackled is extremely slow. Use

$ HELL=1 make hell

to run the tests. Note that you may need to adjust tools/runhell.sh.

This backend does not support all 8-bit characters on I/O, because the I/O of Malbolge Unshackled uses Unicode codepoints instead of single bytes in getc/putc calls. Further, the Malbolge Unshackled interpreter automatically converts newlines read from stdin, which cannot be reverted in a platform-independent way. The backend converts newlines from input back to the Linux encoding and applies modulo 256 operations to all input and output, but it cannot fully compensate for these issues this way. You should limit I/O to ASCII characters in order to avoid unexpected behaviour or crashes.
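
The following sketch illustrates why newline conversion plus a modulo 256 reduction cannot fully recover byte-oriented I/O from codepoint-oriented I/O. It is not the backend's code: the function name `normalize_input` is hypothetical, and treating both CRLF and a lone CR as a newline is an assumption of this sketch.

```python
def normalize_input(text):
    """Map codepoints read from stdin to values in the range 0..255."""
    # Assumed newline handling: collapse CRLF and lone CR to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Reduce every codepoint modulo 256, as the paragraph above describes.
    return [ord(ch) % 256 for ch in text]


if __name__ == "__main__":
    print(normalize_input("a\r\nb"))  # [97, 10, 98]
    print(normalize_input("\u20ac"))  # [172]
```

ASCII input passes through unchanged, but a character such as U+20AC collapses to 172, which is neither its UTF-8 byte sequence nor anything the program can map back to the original character.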

This backend may be replaced by a Malbolge Unshackled backend in the future.

TensorFlow

Thanks to control flow operations such as tf.while_loop and tf.cond, a TensorFlow graph is Turing complete. This backend translates EIR to Python code which constructs a graph equivalent to the source EIR. This backend is very slow and uses a huge amount of memory. I've never seen 8cc.c.eir.tf work, but lisp.c.eir.tf does work. You can test this backend by

$ TF=1 make tf

TODO: Reduce the size of the graph and run 8cc

Scratch 3.0

Scratch is a visual programming language.

Internally, a Scratch program consists of a JSON file that represents the program and some resources such as images or sounds. They are zip-archived, and you can import/export them from the project page (Create new one from here).

You can use tools/gen_scratch_sb3.sh to generate complete project files from the output of this backend, and tools/run_scratch.js to execute programs from the command line (the npm 'scratch-vm' package is required).

You can try "fizzbuzz_fast" sample from here.

Example (for `test/basic.eir`)

First, generate a Scratch project.

$ ./out/elc -scratch3 test/basic.eir > basic.scratch3
$ ./tools/gen_scratch_sb3.sh basic.scratch3
$ ls basic.scratch3.sb3
basic.scratch3.sb3

Execute it from Web browser

Visit https://scratch.mit.edu/projects/editor.
Click a menu item: "File".
Click "Load from your computer".
Select and upload the generated project file: basic.scratch3.sb3.
Wait until the project is loaded. (It takes a long time for a heavy project.)
Click the "Green Flag"

From the Web editor, to input special characters (LF, EOF, etc.) you have to enter them explicitly as follows:

special character	representation
LF	`＼n`
EOF	`＼0`
other character with codepoint XXX (decimal)	`＼dXXX`

Note that the escape character is ＼ (U+FF3C), not \.

For normal ASCII characters, you can just put them into the input field.
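
As a convenience, the escaping rules in the table above can be applied mechanically. The helper below is a hypothetical sketch, not part of tools/: the name `encode_scratch_input` and the `append_eof` parameter are made up for illustration. It assumes that the decimal codepoint after `＼d` is written without zero padding and that printable ASCII (codepoints 32 to 126) needs no escaping. Neither detail is confirmed by the text above, and how a literal ＼ in the input should be entered is not covered.

```python
ESC = "\uff3c"  # FULLWIDTH REVERSE SOLIDUS, the escape character


def encode_scratch_input(text, append_eof=False):
    """Encode text for pasting into the Scratch input field."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\n":
            out.append(ESC + "n")       # LF
        elif 32 <= cp < 127:
            out.append(ch)              # printable ASCII goes in as-is
        else:
            out.append(f"{ESC}d{cp}")   # decimal codepoint, unpadded (assumed)
    if append_eof:
        out.append(ESC + "0")           # EOF marker
    return "".join(out)


if __name__ == "__main__":
    print(encode_scratch_input("hi\n", append_eof=True))
```

The result can be pasted into the input field in one go, which avoids typing the full-width escape character by hand.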

Execute it from command line

First, install the npm package "scratch-vm" under the tools directory:

$ cd tools
$ npm install scratch-vm

Run it with tools/run_scratch.js:

$ echo -n '' | nodejs ./run_scratch.js ../basic.scratch3.sb3
!!@X

Future works

I'm interested in

adding more backends (e.g., 16bit CPU, Malbolge Unshackled, ...)
running more programs (e.g., lua.bf or mruby.bf?)
supporting more C features (e.g., bit operations)
eliminating unnecessary code in 8cc

Adding a backend shouldn't be extremely difficult. PRs are welcome!

Acknowledgement

I'd like to thank Rui Ueyama for his easy-to-hack compiler and for suggesting the basic idea which made this possible.
