The invention of markup language can be dated back to MIT in the early ’60s. It wasn’t known as “markup language” until much later, but that’s clearly what RUNOFF was, and it inspired every markup language that has come since. (If only, in some cases, to do it entirely differently, as Fortran “inspired” Lisp.)
RUNOFF was a document formatter written by Jerry Saltzer for CTSS, described in a 1964 memo alongside its companion editor command TYPSET and subsequently integrated into the operating system. It appears in both the 1965 and 1969 editions of the CTSS manual at page AH.9.01. A RUNOFF input file consisted of lines of text interspersed with control codes, each on its own line starting with a period. The control codes could be typed in as either a full word (“.header”) or a two-letter abbreviation (“.he”). When executed, RUNOFF would output the text on a hard-copy terminal, formatted as directed by the control codes.
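To make the input format concrete, here is a minimal Python sketch of how a RUNOFF-style file splits into control lines and text lines. It is an illustration, not Saltzer’s code: the names parse_runoff, Line, and ABBREVIATIONS are hypothetical, the only control word taken from the description above is .header/.he, and it assumes, for simplicity, that a control word is a single word followed on the same line by its argument.

```python
from typing import NamedTuple, Optional

# Hypothetical sketch of a RUNOFF-style line classifier, not Saltzer's code.
# Only .header/.he comes from the description above; the names are illustrative.
ABBREVIATIONS = {"he": "header"}  # two-letter form -> full control word


class Line(NamedTuple):
    command: Optional[str]  # full control word, or None for an ordinary text line
    text: str               # the control line's argument, or the text itself


def parse_runoff(source: str) -> list[Line]:
    """Classify each input line as a control line (leading period) or text."""
    result: list[Line] = []
    for raw in source.splitlines():
        if raw.startswith("."):
            # Assumption: one control word, then its argument on the same line.
            word, _, argument = raw[1:].partition(" ")
            word = word.lower()
            # Normalize the abbreviation so ".he" and ".header" mean the same thing.
            result.append(Line(ABBREVIATIONS.get(word, word), argument))
        else:
            result.append(Line(None, raw))
    return result
```

For example, parse_runoff(".he My Memo\nSome text.") yields Line(command='header', text='My Memo') followed by Line(command=None, text='Some text.'), and ".header My Memo" produces the same control line as ".he My Memo".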
RUNOFF’s dot-prefixed control lines should ring a bell for anyone who’s read the source code to a Unix man page, or tried to write one. Yes, RUNOFF is the direct predecessor to nroff and troff on Unix (though in classic Unix form, they dumped the long control words and made the cryptic abbreviations mandatory), but its influence doesn’t stop there. In the ’70s every operating system worth its salt had some kind of RUNOFF-style formatter. On IBM mainframes, there was one called SCRIPT, which was the basis for the first implementation of GML, the Generalized Markup Language by Goldfarb, Mosher, and Lorie. From GML came SGML, and from there HTML and XML. So there’s a genealogical link between XML and man pages. Who’da thunk it?
I’m looking to put RUNOFF in context, and Saltzer helpfully cites a long list of earlier programs. Most of them are text editors, and were more of an inspiration to the editor component TYPSET than to the formatter RUNOFF. (Of course, back in those pre-ASCII days they were still figuring out what “text” was and how it should be edited. TYPSET couldn’t be used to edit program source code, nor could program editors be used to edit RUNOFF files, since they used different character encodings.) The two early text formatters that influenced RUNOFF were DITTO by Leslie Lowry and TJ-2 by Peter Samson, both first documented in 1963.
DITTO was the immediate predecessor to RUNOFF on CTSS and came with an editor called MEMO or MODIFY, with different commands depending on whether you wanted to create a new document or edit an existing one. (Likewise, the original CTSS program editor was INPUT/EDIT.) The functionality of DITTO included headers, footers, and pagination, and its control codes resembled RUNOFF’s, with the dot at the beginning of a line; troff-style two-letter abbreviations were also present. But many of these control codes were processed by MEMO/MODIFY rather than DITTO, as they did things like insert or replace lines in the source document. The clear separation of editing and publishing was something RUNOFF contributed. Separation of logical structure from presentation was even further off.
Meanwhile, TJ-2 came from the RLE PDP-1 hacker crowd. Though later commentary plays up the conflict between the hackers and the more “bureaucratic” time-sharing system developers, there clearly was a free flow of ideas between the two camps. TJ-2 was the first program to justify text between the left and right margins, even hyphenating as needed. It also supported centering lines and preformatted text, as the (manually entered) table in the 1963 memo shows. There were only six control codes, each introduced by the overbar character in the PDP-1’s FIODEC character set, while page size could be set out-of-band with the switches on the PDP-1 console. There was no associated editor program, nor any way to save files to disk, since the PDP-1 had no disk; files could, however, be read from and punched to paper tape.
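Justifying text means stretching each line so that it ends exactly at the right margin. The Python sketch below (the function name justify and its width parameter are hypothetical) shows the basic idea: fill lines greedily with whole words, then spread the leftover width across the gaps between words. It is not TJ-2’s algorithm, and it leaves out hyphenation entirely, which was the harder part TJ-2 also tackled.

```python
def justify(words: list[str], width: int) -> list[str]:
    """Hypothetical sketch: greedily fill lines up to `width` characters, then pad
    each line except the last so it ends exactly at the right margin.
    No hyphenation: a word longer than `width` simply overflows its own line."""
    lines: list[list[str]] = []
    current: list[str] = []
    for word in words:
        # Length of the line if this word is appended with single spaces.
        candidate = len(" ".join(current + [word]))
        if current and candidate > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)

    out: list[str] = []
    for i, line in enumerate(lines):
        if i == len(lines) - 1 or len(line) == 1:
            # The last line and single-word lines are left-aligned, not stretched.
            out.append(" ".join(line))
            continue
        gaps = len(line) - 1
        spaces = width - sum(len(w) for w in line)
        base, extra = divmod(spaces, gaps)
        pieces = []
        for j, w in enumerate(line[:-1]):
            # The leftmost `extra` gaps each get one additional space.
            pieces.append(w + " " * (base + (1 if j < extra else 0)))
        pieces.append(line[-1])
        out.append("".join(pieces))
    return out
```

With a width of 16, the words of “the quick brown fox jumps over the lazy dog” come out as "the  quick brown", "fox  jumps  over", and a left-aligned final line, "the lazy dog".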
Side note: The “TJ” in TJ-2 stands for “type justifier”, and Saltzer cites a program called JUSTIFY by Samson; Wikipedia claims they were separate programs, but it’s my impression that they were one and the same. Wikipedia also cites a single reference to a “TJ-1”, which it interprets to be a third such program; I figure this is probably a typo: there was never a TJ-1, and the name was possibly a pun on MIT’s TX-2 computer. UPDATE 1/6/2024: I was wrong. The Computer History Museum has manuals for TJ-2 and TJ-1.
So there wasn’t much that was new in RUNOFF, but it did take in the most useful features of DITTO and TJ-2, and became significantly more widely used than either. Interestingly, both predecessors had features that were absent in RUNOFF. In DITTO you could encode multiple character sets within a single document, and the program would emit instructions to the human at the printer to change the type ball at the appropriate times. RUNOFF had a cruder but more practical solution: it simply omitted the unavailable characters from the printout and let the user draw them in by hand. TJ-2 had an elaborate interactive system for setting up the hyphenation dictionary using the PDP-1’s vector display and light pen. The standard version had a blank dictionary, so every word had to be hyphenated manually. This seems rather cumbersome; besides, CTSS was purely typewriter-based and didn’t have a display, so Saltzer simply left hyphenation out of RUNOFF entirely.
But it’s from these modest beginnings that all of markup came. I pulled up Graphviz and made this chart of markup languages, well-known and forgotten, and how they influenced one another. Naturally, this is purely my after-the-fact interpretation, so take it with many grains of salt. And I’m leaving out many of the post-HTML markup languages, and specialized ones like POD and ReST, because there are already enough things on this chart.