generality through systems thinking
This post is part 2 of a multi-part series called “the computer of the next 200 years”.
This post is as-yet unfinished.
And always, he fought the temptation to choose a clear, safe course, warning 'That path leads ever down into stagnation.' - Frank Herbert
escaping the box
the approach i take in this series is to instead use runtime tracking at the lowest interfaces between the program and the outside world: syscalls, cpu instructions, and ELF files 1; interactions that cannot possibly be faked and are required for all programs that run anywhere on the system. this gives up two things: portability between OSes, and static analysis. but in turn it gains generality: we do not need to establish a new coordination mechanism between any two processes, and our system does not need to special-case any program, because we use the same approach for all of them.
by doing so, we “escape the box”. by moving features outside the process, switching costs are greatly reduced: if we build things at the OS level, we don't have to rewrite them for each program, so the interface boundary is smaller. our systems work even for languages that have not yet been invented! in some sense, this series is an exploration of just how good we can make our tooling without first establishing a new coordination mechanism.
note that this isn’t “just another tool”, because programs running in this system can interact freely with programs outside it. there is no vendor lock-in. i call this systems thinking because it works at the boundaries of the systems that already exist, in full detail, rather than at the level of the abstractions that are normally built on top. systems thinking is not limited to Unix processes; you can apply it to (e.g.) distributed systems, performance tracking, and debugging.
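to make "the same approach for all programs" concrete, here is a minimal sketch that reads the fixed-layout start of an ELF file. the function names (`describe_elf`, `describe_elf_file`) and the small `MACHINES` table are made up for illustration; the field offsets and constants come from the ELF format itself. nothing in it depends on which language produced the binary.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# A small subset of e_machine values from the ELF specification.
MACHINES = {0x03: "x86", 0x3E: "x86-64", 0x28: "arm", 0xB7: "aarch64", 0xF3: "riscv"}
# e_type values: what kind of object file this is.
TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}


def describe_elf(header: bytes) -> dict:
    """Decode the first 20 bytes of an ELF file (e_ident, e_type, e_machine)."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or data encoding")
    # EI_DATA: 1 = little-endian, 2 = big-endian.
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endian": "little" if ei_data == 1 else "big",
        "type": TYPES.get(e_type, hex(e_type)),
        "machine": MACHINES.get(e_machine, hex(e_machine)),
    }


def describe_elf_file(path: str) -> dict:
    """Hypothetical convenience wrapper: describe the ELF file at `path`."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

on a linux machine, calling `describe_elf_file` on any compiled binary should work the same way whether it was written in C, Rust, Go, or something else entirely.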
does this actually work?
this systems-level approach is surprisingly powerful! here are some existing tools that work at this level:
- docker. this works by sandboxing all processes and interposing an overlay filesystem to track their file writes. this sandbox uses linux-specific mechanisms, which is why docker runs in a linux VM on macOS and Windows.
- systemd socket activation, which decouples the socket file descriptor from the program listening on it, allowing services to be "lazy-activated" the first time a client connects to the socket.
- syscall tracking using strace
- stack backtraces using DWARF.
- debuggers (gdb, lldb, etc). these encode quite a lot of information about the language itself, but in theory work for any language with a C-compatible callstack.
- time-travel debuggers, like rr. these work by recording and replaying syscalls, so they can work no matter how many layers of FFI are going on in the program.
- dynamically loaded library metadata using ldd (and in general the dynamic loader has many surprising features most people don’t know about).
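as one example of how small these interfaces are, here is a sketch of the service side of systemd socket activation. the protocol is just two environment variables (`LISTEN_PID` and `LISTEN_FDS`) plus the convention that inherited descriptors start at fd 3. the function names `inherited_fds` and `listening_sockets` are made up for this sketch; a real service would more likely call `sd_listen_fds` from libsystemd.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the systemd protocol


def inherited_fds(environ=None, pid=None) -> list[int]:
    """Return the file descriptors passed to this process by the service manager."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    # LISTEN_PID guards against a child process inheriting the variables by accident.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


def listening_sockets() -> list[socket.socket]:
    """Wrap each inherited descriptor in a socket object."""
    # socket.socket(fileno=fd) wraps an existing descriptor and detects its family and type.
    return [socket.socket(fileno=fd) for fd in inherited_fds()]
```

because the contract is only "an open file descriptor plus two environment variables", the service can be written in any language, and the service manager never needs to know which one.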
does this only work for “C-like” languages?
note that all of the above debugging tools are hamstrung by languages with an embedded interpreter; they show information that is accurate but contains far too much info about the runtime internals to be useful to a programmer in that language. in response, people build language-specific tools such as pdb and Delve.
this limitation is specific to mapping runtime info back to the source language. if you do not attempt to map back to the source language (for example, schemes 1 and 2 in completed and orthogonal persistence), you do not need language-specific tooling, and you can get systems that work in full generality for any language. for instance, rr can replay any process even though it cannot let you debug a python process at the level you want to see.
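the idea behind replay can be shown with a toy: if every interaction with the outside world goes through one boundary, recording the results there is enough to reproduce the run, without understanding anything about the code inside. this is only an in-process illustration with invented names (`Boundary`, `program`); rr does the equivalent at the syscall level for unmodified binaries, which is far more involved.

```python
import os
import time


class Boundary:
    """Wraps calls to the outside world so they can be recorded or replayed."""

    def __init__(self, log=None):
        # With no log we are recording; with a log we are replaying it.
        self.replaying = log is not None
        self.log = list(log) if log is not None else []
        self._cursor = 0

    def call(self, name, fn, *args):
        if self.replaying:
            recorded_name, result = self.log[self._cursor]
            if recorded_name != name:
                raise RuntimeError(f"replay diverged: expected {recorded_name}, got {name}")
            self._cursor += 1
            return result  # the real function is never called during replay
        result = fn(*args)
        self.log.append((name, result))
        return result


def program(world: Boundary) -> str:
    # The "program" only reaches the outside world through world.call.
    started = world.call("clock", time.time)
    pid = world.call("getpid", os.getpid)
    return f"process {pid} started at {started}"


recorder = Boundary()
first_run = program(recorder)
# Replaying the log reproduces the run exactly, even though the clock has moved on.
second_run = program(Boundary(recorder.log))
```

the replay side never inspects `program` itself, which is why the same trick works no matter what language, or how many layers of FFI, sit inside the boundary.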
i’ll discuss how to map back to the source language in composable compilers. for now, we’ll stick with features that don’t require mapping runtime info back to the source.
this all sounds really cursed
this is cursed! it's true! working at this level of the stack exposes you to a whole new axis of bugs. you may discover that your program is broken only on AMD Zen, or that it breaks when using interruptible atomic accesses, or that it works
bibliography
- D. R. MacIver, "This is important"
- Wikipedia, "Zawinski’s Law of Software Envelopment"
- Graydon Hoare, "Rust 2019 and beyond: limits to (some) growth."
- Rich Hickey, "Simple Made Easy"
- Vivek Panyam, "Parsing an undocumented file format"
- The Khronos® Group Inc, "Vulkan Documentation: What is SPIR-V"
- Aria Desires, "C Isn't A Language Anymore"
- Google LLC, "Standard library: cmd.cgo"
- Filippo Valsorda, "rustgo: calling Rust from Go with near-zero overhead"
- WebAssembly Working Group, "WebAssembly"
- The Bytecode Alliance, "The WebAssembly Component Model"
- Josh Triplett, "crABI v1"
- Robert Lechte, "Programs are a prison: Rethinking the fundamental building blocks of computing interfaces"
- Wikipedia, "Executable and Linkable Format"
- DWARF Debugging Information Format Committee, "DWARF Debugging Standard"
- Robert O’Callahan et al., "RR"
- "ldd(1)"
- The strace developers, "strace: linux syscall tracer"
- Python Software Foundation, "pdb — The Python Debugger"
- Derek Parker, "Delve"
- Susanna Clarke, Jonathan Strange & Mr Norrell