March 15, 2020

RUNOFF and the birth of markup language

The invention of markup language can be traced back to MIT in the early ’60s. It wasn’t known as “markup language” until much later, but that’s clearly what RUNOFF was, and it inspired every markup language since. (If only, in some cases, to do things entirely differently, as Fortran “inspired” Lisp.)

RUNOFF was a document formatter written by Jerry Saltzer for CTSS, described in a 1964 memo alongside its companion editor command TYPSET and subsequently integrated into the operating system. It appears in both the 1965 and 1969 editions of the CTSS manual at page AH.9.01. A RUNOFF input file consisted of lines of text interspersed with control codes, each on its own line beginning with a period. A control code could be typed either as a full word (“.header”) or as a two-letter abbreviation (“.he”). When run, RUNOFF printed the text on a hard-copy terminal, formatted as directed by the control codes.
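To make that line-oriented convention concrete, here’s a toy reader in Python that sorts input lines into text and control codes, folding full control words into their two-letter abbreviations. This is only a sketch of the idea, not a reconstruction of CTSS RUNOFF: the `CONTROL_WORDS` table is hypothetical and contains just the “.header”/“.he” pair mentioned above, and the function names are my own.

```python
# Toy reader for RUNOFF-style input. Illustrative only: NOT the real CTSS
# RUNOFF, just the "dot at the start of a line" convention described above.

# Hypothetical table mapping full control words to two-letter abbreviations.
# Only ".header" -> ".he" comes from the text; a real table would be longer.
CONTROL_WORDS = {"header": "he"}
ABBREVIATIONS = set(CONTROL_WORDS.values())


def classify_line(line):
    """Return (kind, code, payload) for one input line.

    Lines starting with a period are control codes; everything else is text.
    Full control words are normalized to their two-letter abbreviation.
    """
    if not line.startswith("."):
        return ("text", None, line)
    word, _, args = line[1:].partition(" ")
    if word in CONTROL_WORDS:
        return ("control", CONTROL_WORDS[word], args.strip())
    if word in ABBREVIATIONS:
        return ("control", word, args.strip())
    raise ValueError(f"unknown control word: .{word}")


def parse(source):
    """Classify every line of a RUNOFF-style document."""
    return [classify_line(line) for line in source.splitlines()]


if __name__ == "__main__":
    doc = ".header Chapter 1\nSome text.\n.he Chapter 2"
    for item in parse(doc):
        print(item)
```

Normalizing both spellings to one code up front means the formatting logic downstream only has to handle one name per command, which is roughly the trade-off nroff later made permanent by dropping the long forms.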

This should ring a bell for anyone who’s read the source code of a Unix man page, or tried to write one. Yes, RUNOFF is the direct predecessor of nroff and troff on Unix (though in classic Unix form, they dumped the long control words and made the cryptic abbreviations mandatory), but its influence doesn’t stop there. In the ’70s every operating system worth its salt had some kind of RUNOFF-style formatter. On IBM mainframes, there was one called SCRIPT, which was the basis for the first implementation of GML, the Generalized Markup Language by Goldfarb, Mosher, and Lorie. From GML came SGML, and from there HTML and XML. So there’s a genealogical link between XML and man pages; who’da thunk it?

I’m looking to put RUNOFF in context, and Saltzer helpfully cites a long list of earlier programs. Most of them are text editors, and they inspired the editor component TYPSET more than the formatter RUNOFF. (Of course, back in those pre-ASCII days they were still figuring out what “text” was and how it should be edited. TYPSET couldn’t be used to edit program source code, nor could program editors be used to edit RUNOFF files, since they used different character encodings.) The two early text formatters that influenced RUNOFF were DITTO by Leslie Lowry and TJ-2 by Peter Samson, both first documented in 1963.

DITTO was the immediate predecessor to RUNOFF on CTSS and came with an editor called MEMO or MODIFY, with different commands depending on whether you wanted to create a new document or edit an existing one. (Likewise the original CTSS program editor was INPUT/EDIT.) The functionality of DITTO included headers, footers, and pagination, and the control codes resembled RUNOFF with the dot at the beginning of a line; troff-style two-letter abbreviations were also present. But many of these control codes were processed by MEMO/MODIFY rather than DITTO, as they did things like insert or replace lines in the source document. The clear separation of editing and publishing was something RUNOFF contributed. Separation of logical structure from presentation was even further off.

Meanwhile, TJ-2 came from the RLE PDP-1 hacker crowd. Though later commentary plays up the conflict between the hackers and the more “bureaucratic” time-sharing system developers, there was clearly a free flow of ideas between the two camps. TJ-2 was the first program to justify text between the left and right margins, even hyphenating as needed. It also supported centered lines and preformatted text, as the (manually entered) table in the 1963 memo shows. There were only six control codes, each introduced by the overbar character in the PDP-1’s character set FIODEC, while page size could be set out-of-band with the switches on the PDP-1 console. There was no associated editor program, and no way to save files to disk, since the PDP-1 had no disk; documents were instead read from and punched to paper tape.
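For anyone who hasn’t thought about what “justifying text between the margins” involves, here’s a minimal sketch of the generic greedy approach: pack as many words onto a line as fit, then spread the leftover width across the gaps between words. I don’t know how TJ-2 actually did it, so treat this as a stand-in technique rather than Samson’s algorithm; in particular it leaves out hyphenation entirely, which TJ-2 did support. The `justify` function and its parameters are my own invention.

```python
def justify(words, width):
    """Greedy full justification (no hyphenation).

    Fill each line with as many words as fit in `width` columns, then pad
    the gaps between words with extra spaces so every line except the last
    is exactly `width` wide. Lines with a single word are left ragged.
    """
    # Pass 1: greedily break the word list into lines.
    lines, current = [], []
    for word in words:
        # Would adding this word (plus one space per gap) overflow the line?
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    if current:
        lines.append(current)

    # Pass 2: distribute the spare columns across the gaps.
    out = []
    for i, line in enumerate(lines):
        if i == len(lines) - 1 or len(line) == 1:
            out.append(" ".join(line))  # last line or lone word: ragged
            continue
        gaps = len(line) - 1
        spaces = width - sum(len(w) for w in line)
        base, extra = divmod(spaces, gaps)  # leftmost gaps get the remainder
        text = ""
        for j, w in enumerate(line[:-1]):
            text += w + " " * (base + (1 if j < extra else 0))
        out.append(text + line[-1])
    return out


if __name__ == "__main__":
    for row in justify("the quick brown fox jumps over the lazy dog".split(), 16):
        print(repr(row))
```

With a 16-column measure the first two lines come out as `'the  quick brown'` and `'fox  jumps  over'`, and the last line stays ragged. Without hyphenation, a narrow measure can leave ugly gaps, which is presumably why TJ-2 bothered with a hyphenation dictionary at all.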

Side note: The “TJ” in TJ-2 stands for “type justifier,” and Saltzer cites a program called JUSTIFY by Samson. Wikipedia claims they were separate programs, but it’s my impression that they were one and the same. Wikipedia also cites a single reference to a “TJ-1,” which it interprets as a third such program; I suspect this is a typo, that there was never a TJ-1, and that the name TJ-2 was possibly a pun on MIT’s TX-2 computer.

So there wasn’t much that was new in RUNOFF, but it did take in the most useful features of DITTO and TJ-2, and it became significantly more widely used than both. Interestingly, both predecessors had features that were absent in RUNOFF. In DITTO you could encode multiple character sets within a single document, and the program would emit instructions to the human at the printer to change the type ball at the appropriate times. RUNOFF had a cruder but more practical solution: it simply omitted the unavailable characters from the printout and let the user draw them in by hand. TJ-2 had an elaborate interactive system for setting up the hyphenation dictionary using the PDP-1’s vector display and light pen. The standard version had a blank dictionary, so every word had to be hyphenated manually. This seems rather cumbersome, and besides, CTSS was purely typewriter-based and had no display, so Saltzer simply left hyphenation out of RUNOFF entirely.

But it’s from these modest beginnings that all of markup came. I pulled up Graphviz and made this chart of markup languages, well-known and forgotten, and how they influenced one another. Naturally, this is purely my after-the-fact interpretation, so take it with many grains of salt. I’m also leaving out many of the post-HTML markup languages, and specialized ones like POD and ReST, because there are already enough things on this chart.

[Chart: markup languages and how they influenced one another]

© 2020 Bronx River Software
