This is the latest version of Oil, a Unix shell:
Oil version 0.8.pre5 - Source tarballs and documentation.
To build and run it, follow the instructions in INSTALL.txt. If you're new to the project, see Why Create a New Shell? and the 2019 FAQ.
Table of Contents
Semi-Automatic Translation to C++
Two Analogies: Go Compiler and TeX
DSLs and Code Generation
Wrapping Shell Dependencies
Appendix: Selected Metrics
shopt -s extglob is now respected. I'd still like more bug reports! See How To Test OSH.
(+) Test harness bug that will be fixed: 1539 should be 1560.
#758 | Incorrect fnmatch due to extended glob syntax |
#754 | Implement test -u and test -g |
#753 | ${var+foo} shouldn't cause error when 'set -o nounset' |
#727 | 1 ? (a=42) : b shouldn't require parentheses |
What's all this about C++? Here are two analogies to help explain what's going on.
GopherCon 2014: Go from C to Go by Russ Cox (YouTube, 31 minutes).
It's time for the Go compilers to be written in Go, not in C. I'll talk about the unusual process the Go team has adopted to make that happen: mechanical conversion of the existing C compilers into idiomatic Go code.
(Grind is the one-off tool that helped with translation, analogous to mycpp.)
The flavor of the work is similar to what I'm doing with Oil, but there's a key difference: Oil's source will remain in statically typed Python and DSLs like Zephyr ASDL for the foreseeable future. We won't be writing C++ by hand.
Static types play an important role in both translations.
How to compile the source code of TeX. Knuth wrote TeX in a dialect of Pascal, but it's not compiled with a Pascal compiler. Instead, it's translated to C and compiled with a C compiler.
The common thread is that we want to preserve the correctness of an existing codebase. Oil runs thousands of lines of existing bash scripts, including some of the biggest shell programs in the world.
Rewriting by hand would introduce a lot of bugs, so instead we write a custom translator and apply it to the codebase. In Oil's case, there are more code generators to remove dynamic typing and reflection, discussed below.
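To make the idea concrete, here is a minimal sketch of what translating a small, statically typed Python function to C++ could look like. The function SumLengths is a hypothetical illustration, not Oil source code, and the C++ below is written by hand rather than produced by mycpp. Real generated code may well use different runtime types than the standard library containers assumed here.

```cpp
#include <string>
#include <vector>

// Hypothetical typed Python input (NOT from the Oil codebase):
//
//   def SumLengths(words: List[str]) -> int:
//       total = 0
//       for w in words:
//           total += len(w)
//       return total
//
// Because every variable has a static type, each Python statement maps
// onto one C++ statement. This is an assumption-laden sketch that uses
// std::vector and std::string purely for illustration.
int SumLengths(const std::vector<std::string>& words) {
  int total = 0;
  for (const std::string& w : words) {
    total += static_cast<int>(w.size());  // len(w) in the Python version
  }
  return total;
}
```

The point of the sketch is that the type annotations do the heavy lifting: without them, a translator would not know whether total is an integer, a string, or something else.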
In addition to the new spec test metrics, these line counts give a feel for recent progress:
osh_parse.cc has 9,867 lines of code (raw data). I showed that the OSH parser can be gradually refactored and translated to C++. Notably, the result is as fast as hand-written C code.
osh_eval.cc has 16,491 lines of code. In addition to the parser, we translate the word and arithmetic evaluators.
osh_eval.cc has 20,875 lines of code. We translate the command evaluator, including assignments. So the resulting C++ interpreter can run code like readonly x=y; echo $x. Details below.
For comparison, the slow OSH interpreter consists of about 30K lines of Python code. This doesn't include the Oil language, which I haven't started translating.
The translation isn't going as quickly as I'd like it to, but it's working, and I'm solving interesting technical problems along the way.
As far as I can tell, this unusual process is the shortest path to a fast shell. (As mentioned in January, I encourage parallel efforts. Feel free to ask me about this.)
I keep a log of the translation process on Zulip.
declare -g foo=bar now work, so we have a path to translate more shell builtins to C++.
map[string, int].
osh_eval.cc doesn't even run ls, because it's an external process! But it understands the hairy details of word evaluation ${}, arithmetic evaluation $(( )), brace expansion {a,b}, and more.
More background: the March recap had a similar section with Zulip threads: mycpp: The Good, the Bad, and the Ugly.
Even though about two-thirds of OSH translates to C++ and compiles, and much of it runs correctly, there's still a lot of work left.
Oil is simply a big project: recall that bash consists of over 140K lines of code. I estimate that OSH implements 80% of bash, with significant fixes. And Oil is a new language with many features on top.
Oil's source code will remain in high-level languages for the foreseeable future, so we need to enhance the code generators to produce correct and fast C++.
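One example of the kind of C++ the generators would need to emit is scoped cleanup, which Python expresses with try/finally or a context manager. The sketch below shows the C++ counterpart: a class whose constructor does the setup and whose destructor does the cleanup. All names here (Options, ScopedErrExit, RunWithErrExitDisabled) are hypothetical and chosen for illustration; they are not taken from Oil or mycpp, and this is not a claim about what mycpp actually generates.

```cpp
// Hypothetical Python input using a context manager:
//
//   with ScopedErrExit(opts, False):
//       ...  # body runs with errexit disabled
//   # errexit is restored here, even if the body raised
//
// In C++ the same guarantee comes from a destructor, which runs whenever
// the object goes out of scope, including during exception unwinding.

struct Options {
  bool errexit = true;  // illustrative stand-in for shell option state
};

class ScopedErrExit {
 public:
  // Constructor plays the role of __enter__: save the old value, set the new.
  ScopedErrExit(Options* opts, bool value)
      : opts_(opts), saved_(opts->errexit) {
    opts_->errexit = value;
  }
  // Destructor plays the role of __exit__ (or a finally block): restore.
  ~ScopedErrExit() { opts_->errexit = saved_; }

  // Copying a guard would restore the option twice, so forbid it.
  ScopedErrExit(const ScopedErrExit&) = delete;
  ScopedErrExit& operator=(const ScopedErrExit&) = delete;

 private:
  Options* opts_;
  bool saved_;
};

// Returns the option value observed inside the guarded scope.
bool RunWithErrExitDisabled(Options* opts) {
  ScopedErrExit guard(opts, false);
  return opts->errexit;  // guard's destructor runs after this value is read
}
```

A translator that recognizes the with statement could emit the guard declaration at the top of a C++ block, so no explicit finally is needed.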
try/finally for scoped destruction, but C++ doesn't have finally. We should probably use Python's context managers, and have mycpp translate such blocks into constructors and destructors.
#ifdef. Exceptions are more like structs than classes, so they could logically be expressed with ASDL schemas.
In the January blog roadmap, I mentioned that there are two technical problems with translation.
One of them was wrapping native C code, which I no longer see as a risk. It's just work. The shell has three main dependencies:
fnmatch() in C++, and this is straightforward.
execve() is similar to wrapping libc, but errno handling is an issue I want to revisit. (These Unix comics are relevant.)
yield, which I can't (or don't want to) use in C++. I might rewrite it with fork() and write() to a pipe.
yield). A few weeks ago, I played with the shell and C code in his 2014 explanation of the coroutine prime number sieve (PDF).
As mentioned in January, the bare minimum for "success" is when OSH replaces bash for my own use.
After reviewing all this work, I still feel like OSH can be "finished" in 2020. I won't be extremely surprised if it isn't, but it seems reasonable.
On the other hand, it seems clear that the Oil language will remain a prototype for all of 2020. I haven't gotten much feedback on it, probably because there isn't much documentation.
This is disappointing, but I don't have a solution to this problem.
In short, the project's focus has necessarily narrowed. The only two goals on my radar are:
I should write a longer blog post about this, but almost everything else is cut. Oil will be more like a library than a shell. (As mentioned, I'll need basic GNU readline support for my own use.)
The docs are another sore point. I've mostly been writing them "on demand" (whenever anyone asks). It seems like that pattern will continue, given all the other work that needs to be done.
errexit (issue 709). I'd also like to resume work on Running ble.sh With Oil.
Feel free to ask questions in the comments or on Zulip!
Let's compare this release with the previous one, version 0.8.pre4.
We have nearly 70K lines of C++ code, including over 20K translated by mycpp.
osh_eval.cc
osh_eval.cc
The size of the osh_eval.opt.stripped executable differs between GCC and Clang, and I don't yet know why. In any case, the increase is consistent with translating and compiling more lines of code.
OSH spec tests:
There was no work on the Oil language! I'm a bit concerned by that, which is one reason for the scope reduction mentioned above.
We have ~300 new significant lines of code in OSH:
And ~500 new physical lines of code:
The parsing benchmark didn't change much:
Nor did the runtime benchmark: