October 9, 2022

on the origins of the dot

As anyone who used computers in the ’90s can tell you, in DOS a filename is up to eight bytes, a dot, and an extension of up to three bytes. The extension tells you the file type: TXT for plain text, BAS for a program in BASIC, and so on. The dot is standard notation but is not part of the filename as stored on disk, and although most commands use the dot notation, a few (like DIR) separate the filename and the extension with a space instead.
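To make the on-disk point concrete, here is a minimal sketch of how an 8.3 name maps to the 11 bytes a FAT directory entry actually holds: two space-padded fields and no dot. The function names are hypothetical, and the sketch assumes plain ASCII input; it skips the checks for forbidden characters, the special meaning of 0xE5 as a first byte, and long filenames.

```python
def to_fat_83(name: str) -> bytes:
    """Pack a name like 'gwbasic.exe' into the 11-byte on-disk form (sketch).

    Assumes plain ASCII and at most one dot; forbidden characters and the
    0xE5 first-byte special case are not checked.
    """
    base, _, ext = name.upper().partition(".")
    if not base or len(base) > 8 or len(ext) > 3 or "." in ext:
        raise ValueError(f"not a valid 8.3 name: {name!r}")
    # Each part is space-padded to its fixed width; the dot is never stored.
    return base.ljust(8).encode("ascii") + ext.ljust(3).encode("ascii")


def from_fat_83(raw: bytes) -> str:
    """Rebuild the usual dotted notation from the 11 stored bytes."""
    base = raw[:8].decode("ascii").rstrip()
    ext = raw[8:11].decode("ascii").rstrip()
    return f"{base}.{ext}" if ext else base


def dir_style(raw: bytes) -> str:
    """Roughly how DIR shows a name: padded base, a space, then the extension."""
    return f"{raw[:8].decode('ascii')} {raw[8:11].decode('ascii')}"


print(to_fat_83("gwbasic.exe"))             # b'GWBASIC EXE'
print(from_fat_83(b"DONKEY  BAS"))          # DONKEY.BAS
print(dir_style(to_fat_83("command.com")))  # COMMAND  COM
```

The dotted form only exists when software reassembles the two fields, which is why DIR could just as easily print them with a space in between.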

As anyone who used real computers in the ’90s can tell you, in Unix a filename is an almost arbitrary sequence of bytes (anything except a slash or a NUL), whose maximum length depended on which of the many Unix variants you were using. (The most common limits were 14 bytes on older systems and 255 bytes on newer ones.) A filename doesn’t need to have an extension; if there is one, the dot is just another character in the name. Unix files can have multiple extensions (like .tar.gz) or long extensions (like .html) that DOS would find unacceptable.
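A modern library shows the same Unix convention at work. Python’s pathlib, used here purely for illustration rather than as anything period Unix tools relied on, stores the whole name as one string and only *interprets* the text after each dot as a suffix. The `describe` helper is a hypothetical name.

```python
from pathlib import PurePosixPath


def describe(name: str) -> tuple[str, list[str], str]:
    """Read a Unix filename by convention: last suffix, all suffixes, stem."""
    p = PurePosixPath(name)
    return p.suffix, p.suffixes, p.stem


for name in ["fortune.c", "index.html", "backup.tar.gz", "Makefile"]:
    print(name, describe(name))
# fortune.c ('.c', ['.c'], 'fortune')
# index.html ('.html', ['.html'], 'index')
# backup.tar.gz ('.gz', ['.tar', '.gz'], 'backup.tar')
# Makefile ('', [], 'Makefile')
```

Nothing here is enforced by the filesystem: a four-letter suffix, two suffixes, or none at all are equally valid names.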

In DOS, an executable file has one of three extensions (COM, EXE, BAT), and you can leave the extension off to run a program: to run GWBASIC.EXE, just type gwbasic at the prompt. In Unix, extensions have no special meaning to the OS, executable files are flagged by execute permission bits in their metadata, and you have to refer to a file by its full name. Typically an executable won’t have an extension at all. Many DOS programs follow the convention that you can leave the extension off a filename if the program knows the type, hence LOAD "DONKEY" in GW-BASIC would load DONKEY.BAS. Although some Unix programs are aware of extensions (cc knows that fortune.c is source code and fortune.o is a binary object file), it’s almost never acceptable to leave an extension off a filename, and programs that let you do so, like TeX, are typically of non-Unix origin.
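The difference in lookup can be sketched in a few lines. This is a simplified model under stated assumptions: `files` stands in for a single directory listing, the PATH search and internal commands like DIR are ignored, and all function names are hypothetical. It does follow the real DOS rule that, when no extension is typed, COM is tried before EXE, and EXE before BAT.

```python
from typing import Optional

# Executable extensions, in the order DOS tries them when none is typed.
DOS_EXEC_EXTS = ("COM", "EXE", "BAT")


def resolve_dos(typed: str, files: set[str]) -> Optional[str]:
    """Which file a DOS-style prompt would run for `typed`, or None (sketch).

    `files` is a hypothetical stand-in for one directory of uppercase 8.3
    names; PATH searching and internal commands are not modeled.
    """
    name = typed.upper()
    base, dot, ext = name.partition(".")
    if dot:
        # An explicit extension must be one of the executable ones.
        return name if ext in DOS_EXEC_EXTS and name in files else None
    for candidate_ext in DOS_EXEC_EXTS:
        candidate = f"{base}.{candidate_ext}"
        if candidate in files:
            return candidate
    return None


def with_default_ext(name: str, default: str = "BAS") -> str:
    """Append a default extension only if none is given, as LOAD "DONKEY" does."""
    return name if "." in name else f"{name}.{default}"


def resolve_unix(typed: str, files: set[str]) -> Optional[str]:
    """Unix contrast: only the exact name matches (execute bits not modeled)."""
    return typed if typed in files else None


dos_dir = {"GWBASIC.EXE", "DONKEY.BAS", "HELLO.COM", "HELLO.BAT"}
print(resolve_dos("gwbasic", dos_dir))  # GWBASIC.EXE
print(resolve_dos("hello", dos_dir))    # HELLO.COM (COM beats BAT)
print(with_default_ext("DONKEY"))       # DONKEY.BAS
print(resolve_unix("fortune", {"fortune.sh"}))  # None
```

In the DOS model the extension is part of the lookup rules; in the Unix model it is just more characters that have to be typed.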

Clearly file extensions are entirely different conceptually in DOS and Unix. (In fact older versions of Unix don’t even call them “extensions” in the documentation, the usual term being “suffixes.”) Yet the usual notation is identical, and they serve the same purpose on both OSes. How did that happen? Obviously there’s no reason why a file type has to be represented after a dot at the end of a filename; in classic Mac OS, which ran on the other kind of computer you were likely to encounter in the ’90s, it isn’t in the filename at all, but in a hidden four-byte field in the metadata. And the dot isn’t necessary even if you have extensions; on some IBM mainframe OSes the extension followed a space, while the dot was the directory separator.

I figured there had to be a common origin behind both OSes’ use of the dot, and although evidence is sparse, I think there probably is one. DOS (development started 1980) got 8.3 filenames from CP/M (started 1974), which was mostly developed on TOPS-10 (released 1967). Many DOS conventions originated on TOPS-10, including dotted filenames with extensions, although TOPS-10 had a 6.3 character limit. Unix (started 1969) got the dotted suffix convention from Multics (started 1964). Now here’s where it gets interesting.

TOPS-10 was the operating system that shipped with the first PDP-10 computers in 1967, though it didn’t get that name until 1970. Earlier, it was called the “Monitor” or the “Time-Sharing Executive.” The PDP-10 was mostly compatible with its predecessor the PDP-6, and likewise the PDP-10 Monitor was based on the PDP-6 Monitor. Documentation for the PDP-6 Monitor shows that, as early as 1965, it used 6.3 filenames and the DEVICE:FILNAM.EXT notation that remains standard on Windows to this day. And indeed the first Multics papers in 1965 cite the PDP-6 Monitor as the first commercial timesharing operating system.

It’s not clear that anyone on the Multics team actually used a PDP-6 running the Monitor, but it’s quite likely that the Monitor, or something close to it, is where Multics got the notation, given the free flow of people and ideas among MIT, DEC, and BBN at the time. The PDP-6 was a flop, selling only 23 machines, but two of them were sold to MIT. One went to the Laboratory for Nuclear Science, which clearly used the Monitor; they contributed symmetric multiprocessing code back to DEC for eventual incorporation in TOPS-10. The other went to the AI Lab, which shared the Tech Square building with the Multics team and the rest of LCS. Of course, that PDP-6 was used to develop the AI Lab’s infamous Incompatible Time Sharing system, which inflicted Emacs and Scheme upon the world; there’s no record of whether it ever ran the Monitor. And ITS, like its direct predecessor CTSS (also the predecessor to TOPS-10 and Multics and, frankly, almost every other operating system ever written), used filenames with a 6-byte “primary” and a 6-byte “secondary” name, usually separated by a space.

Now, the “secondary name” in CTSS was also known as the “file type”, and from the earliest version of CTSS in 1961 it served the same function as a DOS or Unix extension. (It’s almost certainly the origin of named files in general; back then, a data storage device was usually treated as a single block and not subdivided, be it a spool of magnetic or paper tape, a deck of punched cards, or the rare and expensive hard disk.) In documentation the two names were usually separated by a comma, while in the commands themselves they were usually separated by a space. Since the space also separated command arguments, a different separator was desirable; somebody in Massachusetts (either Cambridge or Maynard) chose a dot, and it stuck.

What I’m still not sure about is whether Multics actually got it from the PDP-6 Monitor, whether there was another source that inspired both, or whether they just happened to pick the same separator. While researching this post, I found one Usenet post from 1992 suggesting that the BBN PDP-1 timesharing system had dots in 1964… this would predate the PDP-6 Monitor, but I can’t find any primary evidence that BBN’s system had named files at all, as its main data storage was punched paper tape. Memories can be fuzzy, after all, and today we’re further from that post than it was from the poster’s stint at BBN. In any case, people have been wondering about this for 30 years, and “probably somebody in Massachusetts” is as much of an answer as we’re likely to get.

© 2020-24 Bronx River Software