Computer Software and Unconventional Archives
I recently finished my dissertation in the history of science and technology at the University of Minnesota. I explored the history of technical change in software design and implementation, under the supervision of Professor Arthur Norberg and with the support of the Charles Babbage Institute. One of the biggest hurdles I faced when working on my dissertation was creating a methodology that would allow me to explore a field that is rife with proprietary information, born-digital data, and a narrative heavily influenced by the primary actors. I needed to create this methodology while staying true to the spirit of the current methods that guide historical inquiry. This “Talking Shop” piece describes why I prioritized technical change as my focus, explains the methodology I designed to examine technical change, and describes the unconventional archives I used in my research.
Ed. Note: Please see also Christine Aicardi’s review of Joline Zepcevski’s dissertation, Complexity and Verification: The History of Programming as Problem Solving (University of Minnesota, 2012).
I prioritized technical change in the design and implementation of software, in part, because of the important role that software plays in society. The amount of information created by today’s society far surpasses that of previous generations. IBM estimates that in the past two years we have created 90% of all of the available data in the world (IBM staff writer, “What is Big Data?”). The current era (after the advent of the computer) is often referred to in popular culture as the Information Age. Almost all of this information is collected, structured, retrieved and created using software. Society has become dependent on this information and, by extension, on software.
Given the important role software has in modern society, it is easy to imagine that much academic work has been done on the history, theory and culture of software production. In reality, this field of study is still very small. Further, much of the work is centered on the external elements. Examples of this focus include the history of the software industry, the culture of programming and explorations of the gender imbalance found in the field. When exploring the field in preparation for my dissertation, it became apparent to me that what was missing from the literature was a history that focused on the internal content of the field.
I argue that, in part, the reason that it is rare to find a history of the technical changes that have occurred in the field is that it is difficult to categorize software. At its most basic level it is hard to describe software as either a technology or a science. Is software theory? Is it an artifact? Or is it a document? The amorphous nature of software lends itself to all of these definitions.
My purpose is not to resolve this tension and it is not resolved in my dissertation. Instead, I accepted that software is multifaceted and can be considered from multiple perspectives. For the purposes of my argument, I focused on source code as a document that could be read and analyzed.
“Source code” is the source of the object (machine) code. It is written in a symbolic format that is human readable. Object code (sometimes called machine code) is written in binary notation – ones and zeros. Source code is transformed, by the computer, into object code.
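The contrast between readable source and its translated form can be seen in a few lines of Python. This is only a rough sketch, and it rests on an assumption that should be stated plainly: Python translates source into bytecode for a virtual machine rather than into native machine code, so the bytes below are an analogy for object code, not object code in the strict sense. The function name `add` and the label `<example>` are invented for illustration.

```python
import dis

# Human-readable source code, held as ordinary text.
source = "def add(a, b):\n    return a + b\n"

# compile() translates the source text into a code object.
module_code = compile(source, "<example>", "exec")

# Run the compiled module so that add() is defined, then fetch it.
namespace = {}
exec(module_code, namespace)
add = namespace["add"]

# The compiled form of the function is a sequence of raw bytes.
# (Python bytecode, used here as a stand-in for machine code.)
raw_bytes = add.__code__.co_code
as_bits = " ".join(f"{byte:08b}" for byte in raw_bytes)

print(source)    # what the programmer wrote and can read
print(as_bits)   # the same function as ones and zeros
dis.dis(add)     # a symbolic listing of those bytes
```

The first printout is a document a person can read; the second is the same function in a form meant only for the machine.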
My conceptualization of source code as a document is strongly rooted in my personal history. Taking programming classes at the University of Sydney in the late 1990s was an exercise in change. Object-oriented programming was trickling down into the freshman syllabus, while some professors were holding onto structured programming techniques. During my time in the department I saw several shifts in the programming languages used for educational purposes, from Prolog, to C++, to Java, all within three years. Each of these changes was heralded as revolutionary. This hyperbolic language about revolutionary change is not limited to university departments; it is one of the most consistent elements in the computing literature. Being introduced to three “revolutionary changes” over the course of three years prompted questions about what revolutionary meant in the computing field and the role rhetoric played in these changes.
During this time I was also taking classes in the history of science and technology. One focus of these classes that stood out in particularly sharp relief was the way historians of science and technology used primary documents to examine the difference between real and rhetorical change in various fields and disciplines. The idea of applying these concepts to the field that I was most interested in, the history of computing, appealed to me, but I was not sure how to apply the methods other historians were using to the more amorphous field of software.
While completing my preliminary research for the dissertation, I came across two anecdotes, both quite well known in the history of software, which heavily influenced my resulting methodology. The first is an anecdote related by Alan C. Kay, the creator of Smalltalk. In this story he walked into the University of Utah and found on his desk a memo asking him to make ALGOL for the Univac 1108 computer work. He describes the process of unrolling the program listing down the hall, crawling over it and reading the code. In doing so, he discovered how ALGOL worked. Another story, one from Tracy Kidder’s book, The Soul of a New Machine (1981), described Tom West (a project leader at Data General) looking at the circuitry of Digital Equipment Corporation’s VAX machine and seeing a reflection of the corporate organization it evolved within. If Alan Kay could recreate how the ALGOL programming language worked by reading it as a document, then did this mean that all source code was essentially a document? And, if company organization could be found in the hardware of a minicomputer, why couldn’t the social and economic milieu be found in the source code of a software application?
One of the most well-known authors in the field, Michael Mahoney, came to similar conclusions. He wrote about the concept of “reading the products of practice” (Michael Mahoney, “Issues in the History of Computing,” in Thomas J. Bergin and Rick G. Gibson, eds., History of Programming Languages II (New York: ACM Press, 1996), pp. 772-781 at p. 772). In software, the product of practice is source code, and this reinforced my decision to explore source code as a document. Mahoney specifically discusses the importance of looking at programs – and the difference between what a program does and what its documentation says. He talks about the importance of capturing the tricks of the trade, the tacit knowledge of the writer. Those tricks may be captured by reading source code as a document. This is what I have attempted to do.
By treating software as a document I have demonstrated that there are real technical changes in the way programmers wrote code between the early days of assembly language and the sophisticated programming environments of object-oriented languages. By extension, I have also illustrated where revolutionary language found in the computing literature is rhetoric, used to further an actor’s agenda.
Treating software as a document is not as straightforward as it may appear. The biggest hurdle is that in the later years of computing, source code is almost always proprietary to the company that created the software. Finding a stable software application that changed over time, but would have source code that was freely available (to be used as my object of research) throughout the time period being examined, was almost impossible. The best lead that I uncovered was the source code for games. In the earliest years of game coding, before packaged software, the source code for different games was printed out in computer enthusiast and hobbyist magazines, so that users could implement their own version. However, by the time of hand-held consoles, these were few and far between. As the games grew more lucrative their source code became heavily protected trade secrets. This is a problem pointed out by Michael Mahoney, when he argues that much of the art of programming is undocumented, and the software is becoming inaccessible (see “Issues in the History of Computing”). Furthermore, the decisions made when creating those programs are buried in corporate (and often proprietary) records.
As a result of these difficulties, I decided to focus on changes in the methodologies used to design software and the new programming languages that accompanied the methodological shift. This made it much easier to track technical change because programming languages must be transparent to the user. In fact, programmers learn about new programming languages through a published paper that defines the syntax of the language, resulting in a very visible document that explains these changes.
Upon shifting my focus to programming languages, I found three fruitful, if unconventional, archives of source code to serve my purposes: open source code, textbooks and illustrative journal articles. I also created what could be grandly described as my own “archive”. With less grandiosity, I created a number of small programs to assist my research.
The first of my unconventional archives was that of the open source code available in the collections of the Charles Babbage Institute. Open source simply means that the source (human readable) code is supplied with the object (machine readable) code. While there is a current open source movement, the Free/Libre/Open Source Software (FLOSS) movement, most of the sources I used were from a much earlier era in programming. Prior to 1969, all software was “open source”. This was because software was originally considered a “perk” or marketing tool that came along with the hardware. Software was considered trivial in comparison to the big iron of the hardware.
This is best illustrated by the story of IBM, the largest computer manufacturer. In the 1950s software costs were 4% of IBM’s engineering budget. However, by the mid-1960s, this had increased to 60% of the budget. Moreover, IBM was facing an antitrust lawsuit, in part, over its decision to bundle its hardware and software. In 1969, IBM chose to unbundle its hardware and software, selling its software separately. With its focus shifting to sales, IBM needed to protect its source code so that it was not straightforward to duplicate its software. Other manufacturers quickly followed suit. This expanded the field for third-party software providers, but it hid the changes in software design and implementation.
Stymied by the shift away from available source code, I turned my attention to journal articles that were intended to discuss and debate cutting-edge concepts within the academic computing field. These articles were often rife with hyperbole and rhetoric. To balance this, I compared the articles with later textbooks and manuals intended to explain these same techniques to vocational programmers and undergraduates, once the techniques were fully resolved and a consensus in the field had been reached. Often these texts were published years after the original adoption of the techniques by the “bleeding edge” of the computing community. This allowed me to compare the rhetoric of the original articles with the more moderate tones found in educational material. It also allowed me to track the modifications made to original theories.
Finally, to explore the difference between rhetoric and reality, I engaged in original research. Instead of using applications written by programmers, I wrote small programs in the style of the different languages, or in pseudocode, to illustrate the different methodologies. I based them on templates from well-known programmers such as Donald Knuth, famous for his comprehensive series of texts The Art of Computer Programming (1968-present). I used these illustrative programs to uncover changes in the way programmers write software, and then used the literature of the programming field to contextualize and situate these changes in the broader socio-economic milieu of the programming field.
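To give a sense of what such a small program might look like, here is a minimal sketch of one task written twice, first in a structured (procedural) style and then in an object-oriented style. It is not taken from the dissertation; the bank-account task and every name in it (`make_account`, `deposit`, `Account`) are invented purely for illustration, and both versions are written in Python rather than in the historical languages.

```python
# Structured (procedural) style: the data and the functions
# that act on it are kept separate.
def make_account(balance):
    return {"balance": balance}


def deposit(account, amount):
    account["balance"] += amount
    return account["balance"]


# Object-oriented style: the data and its operations are
# bundled together in a single class.
class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance


# Both versions perform the same computation.
procedural_total = deposit(make_account(100), 50)
object_total = Account(100).deposit(50)
print(procedural_total, object_total)
```

The two versions compute the same result; what differs is where the data lives and who is allowed to change it, which is the kind of difference a reader can look for in the code itself.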
The next challenge was to find ways of communicating the changes I was seeing in the code, both the code written by professional programmers and the code I wrote myself. This was a difficult challenge. Originally, using code excerpts seemed to be the answer. I thought that textually comparing small programs written using two different design methodologies would illustrate the differences. But this was not the case. The differences appeared, at best, subtle. The lines of code permeating the text of my dissertation were distracting. I turned instead to graphical representations of the code – using colors and symbols to highlight the areas of change. This graphical representation made my dissertation more accessible to a broader audience while still illustrating my point about real and rhetorical changes in the field.
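One simple way to mark areas of change with symbols, offered only as an illustrative sketch and not as the method used in the dissertation, is a line-by-line textual comparison. The example below uses Python’s standard `difflib` module; the two program fragments and the labels `structured` and `object_oriented` are invented for the purpose.

```python
import difflib

# Two small, invented program fragments to compare.
structured = """\
def deposit(account, amount):
    account["balance"] += amount
    return account["balance"]
"""

object_oriented = """\
class Account:
    def deposit(self, amount):
        self.balance += amount
        return self.balance
"""

# unified_diff marks removed lines with "-" and added lines with "+".
diff_lines = list(
    difflib.unified_diff(
        structured.splitlines(),
        object_oriented.splitlines(),
        fromfile="structured",
        tofile="object_oriented",
        lineterm="",
    )
)

# A rough character-level similarity score between 0 and 1.
similarity = difflib.SequenceMatcher(None, structured, object_oriented).ratio()

print("\n".join(diff_lines))
print(f"similarity: {similarity:.2f}")
```

A comparison like this shows which lines changed but says little about why, which is consistent with the observation that purely textual differences can appear subtle.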
Historians have a fully developed concept of what defines archives and archival sources. Clearly, the archives I explored and the sources I used did not always fit these conventional definitions. As a result, I needed to renegotiate that definition. There were two elements to this redefinition. The first is that many of my historical actors communicated with each other about their field in published sources. For example, Bjarne Stroustrup, one of my primary actors, has frequently declined to be interviewed because he believes he has already communicated his intentions in a series of interviews. He collated these answers into a long web page of “Frequently Asked Questions.” Moreover, many of my actors presented a historical perspective on their work at one of the History of Programming Languages Conferences, whose proceedings are transcribed and published as books. I argue that these can be reconsidered as oral histories. While they were not guided by a historian, they are a retrospective analysis of technological changes implemented by the primary actors.
More important, though, is the question surrounding source code. Is source code a primary or archival document? It is technically a published source, as the purpose of defining a programming methodology, or language, is to propagate the information throughout the field. But it is also the equivalent of a lab report of an experiment, often with the author’s thoughts appearing either in the code comments or, in the case of design methodologies, in the language of introduction and explanation. I argue that source code is an archival source. Using it is not unlike using scientific papers to explore the history of physics. When software is re-conceived as a document, the human motivations written into source code can be read from it. I believe this justifies its use as a primary source.
As different subdisciplines in the humanities grapple with our born-digital age, digital sources will characterize the scholarship of near history. An increasing number of scholars are going to need to reformulate their idea of what defines an archive or a primary source. It will be increasingly necessary to examine unconventional “archival” and primary sources like code repositories, listserv communications, even blog entries, to fully consider many topics.
Software, regardless of its definition as science or technology, document or artifact, is the object of study for computer scientists and software engineers. It is the product of their practice. Once it is reinterpreted as a document, the range of places where one might find such documents expands, leading to the need to justify the use of unconventional archives. I hope that the idea of software as a document will be a useful perspective for others interested in the role of software in society, and an impetus to archive source code for future generations. But I also believe it illustrates one element of the changing methods that historians will need to embrace to continue to understand our past.
Minneapolis College of Art and Design
Image: Student programmers at the Technische Hochschule in Aachen, Germany in 1970. Bundesarchiv, B 145 Bild-F031434-0006 / Gathmann, Jens / CC-BY-SA. Wikimedia Commons.
The views, perspectives, and opinions expressed here and by those providing comments are those of the author(s) and commentator(s) alone, and do not reflect the opinions of Dissertation Reviews, its members, editors, or advisory board members.