The Loom of Minerva: An Introduction to Computer Projects for the Literary Scholar

by Dr. Cora Angier Sowa


PART I: The Making of a Literary Project

Chapter 1: A Guide to the Labyrinth: The Problem and its Solution

(Revised December, 2009)


Note that links in this chapter to other paragraphs within this chapter or to external Web sites will work correctly. However, links to other chapters of The Loom of Minerva cannot be accessed from this Web page, because only this chapter is currently on this Web site. It is hoped that the entire MINERVA System will soon be available online.

You can also see, on other pages of this Web site, discussions from three demonstrations of the MINERVA System at the Chicago Colloquia on Digital Humanities and Computer Science, from 2006 (emphasizing individual applications programs), 2007 (emphasizing new project planning programs), and 2008 (emphasizing relationships between scholars and computers).

Henry Hudson Bridge

"The work of the engineer is not unlike that of a writer. How the original design for a new bridge comes to be may involve as great a leap of the imagination as the first draft of a novel." (-Henry Petroski, To Engineer is Human )

(Illustration: Henry Hudson Bridge over the Harlem River, New York City, photo by C.A. Sowa)


Build-a-project: Introducing the MINERVA System

You have a computer and access to large databases. You may have a specific query you want answered. You begin to think through the details and ask questions: Exactly what texts or images do you need, and how do you get them? What form should your output take -- a table of statistics? a list of names and dates? a chart or diagram? What information must the input data include to produce these results? What programs will best yield the information you want? If more than one person works on the project, who does what when? When the project is complete, were your questions answered, or, if not, did the study yield interesting information that can lead to further work?

The Project Planner and the MINERVA Program Suite

The MINERVA System provides a set of tools that guide the user through the steps of planning and carrying out a project. The Project Planner provides interactive screens on which the scholar can fill in blanks, type descriptions, or create diagrams. The user is aided in choosing data, selecting programs, planning the final output, assigning tasks in multiperson projects, and evaluating the success of the completed project. Examples are provided, with links to extended discussions in the text chapters. The MINERVA Program Suite also supplies individual programs to do specific things, such as make a concordance, create statistical tables, and find recurring clusters of associated words in a text. There are also programs to create or adapt data for the analytical programs. (Examples can be seen in the Screen images below. For a complete description of the Project Planner, see Chapter 2; for the MINERVA Program Suite, see Chapter 3; and for the OwlData programs to create your own data, see Chapter 4.)

Techniques used by MINERVA have been adapted from methods of Systems Analysis, widely used in science and industry, which emphasize diagramming techniques and modularity. Whatever your scholarly interest, you can build a project from the bits and pieces supplied by MINERVA, or you can add your own. It is modular and extensible, and requires only data that can be downloaded from the Internet, needing no proprietary texts or databases.

The work may be done top-down (big idea first) or bottom-up (little ideas eventually coalesce). The most important steps in project planning are the selection of the topic and its definition in precise terms. Vague descriptions such as "literary influence" or "realism" must be analyzed to see what they really involve, for example lexical identity, metrical similarity, and semantic context. Always important are intuition and the aesthetic sense. One should collect objective facts not simply for their own sake, but because there is some actual reason for collecting them, one that leads to some insight of literary, historical, or social importance.

Logic and analytical thought, not just computers

Many of the methods presented are equally applicable to non-computerized work, particularly techniques for examining vague and ill-defined concepts (like "sublimity" or "originality" or "derivative" or "socially conscious") and lending them a precise definition. It may seem odd to begin a discussion about computers by explaining that a computer is not needed. What we present, however, is not so much an advertisement for using computers as an encouragement to use the tools of logic, of analysis, and of organization, of which the computer is but a natural extension.

This chapter contains the following sections:

  1. Why planning is necessary: Literary Criticism, Systems Analysis, and the Computer
  2. The art of project planning
  3. The MINERVA System and the Project Planner
  4. Modularity: breaking the project into smaller parts
  5. Sainte-Beuve and the computer: Analyzing the language of criticism
  6. The computer system and its programs as art forms
  7. Top-down or bottom-up? Different ways to structure a project
  8. Expanding the project
  9. What makes a good project
  10. Cicero on the need for breadth of experience
  11. How the MINERVA System is organized
  12. Screen images: What you will see when you run MINERVA



Why planning is necessary: Literary Criticism, Systems Analysis, and the Computer

Birth of Athena

I began, "Poet, you who guide me,
consider my strength, whether it is sufficient,
before trusting me to the arduous passage."

Dante, The Divine Comedy, Inferno, Canto II. 10-12.1

I do not think there is any thrill that can go through the human heart like that felt by the inventor as he sees some creation of the brain unfolding to success . . . Such emotions make a man forget food, sleep, friends, love, everything.

Nikola Tesla, 1896.2

(Illustration: Birth of Athena from the head of Zeus, from a vase painting.)

The common origins of science and humanities

Arts and technology spring from similar impulses

The Greek goddess Athena -- the Roman Minerva -- was patroness both of intellectual wisdom and of crafts and technology. There is truth in this myth. Technology and science, on the one hand, and the pursuits which we call the humanities, on the other, spring basically from the same human impulses: to know, to understand, to shape, to discover the symmetry and beauty of our universe. The more "practical" among us direct these impulses in such a way as to predict and control; the more "impractical" (and this includes many scientists as well as humanists) pursue their work out of sheer appreciation for its beauty and aesthetic satisfaction.

Paradoxically, methods borrowed from scientific research and engineering offer a way back to the original values of literary study and criticism. These methods, with their emphasis on the setting of goals and definition of the means to attain them, lead us to focus on the actual reasons why we are studying a particular piece of literature, author, or genre. The goals should determine the tools used, not the other way around.

The art of project planning

Portrait of al Khwarizmi

An algorithm is a step-by-step procedure for solving a problem. The word "algorithm" comes from the name of Mohammed ibn Musa al-Khwarizmi, a 9th-century mathematician born in what is today Uzbekistan, who wrote many algorithms for solving mathematical problems. Our word "algebra" comes from the name of his book on elementary mathematics, Kitab al jabr wa'l muqabala.

(Illustration: an idealized portrait of Al Khwarizmi, on a stamp of the former USSR. We do not know what he really looked like.)

What do we do with all this data?

Today, enormous databases of every conceivable kind contain texts, images, topographic maps, film clips, audio recordings, and other digitized material. The archives of every library, museum, historical society, and private collection are now (or will be) on line. But this is only the beginning of the process. What can be done with all this stuff?

This situation contrasts with work done by earlier humanists using computers, who began with a specific problem that they wanted to study (such as irregularities in the versification of Vergil or authorship of the books of the New Testament), then put some text in machine-readable form and sought to create programs that could help them. The development of large-scale databases, such as the Thesaurus Linguae Graecae or the Encyclopedia Britannica, shifted attention to assembling vast amounts of data using production-line methods with no specific goal in mind. Programs were developed to search these databases and display their information.

"How do I use all these things? How can these facilities help me?" some literary scholars ask. "And what does this have to do with ordinary literary criticism or scholarship?" There is a tendency to let the tools drive the research; the scholar uses a particular technique simply because it is there, not because it contributes to any actual notion that the researcher may have about the chosen topic. Humanists try to out-scientize the scientists, collecting numbers for their own sake (forgetting that real scientists are always guided by intuition, beauty, symmetry, and other non-"objective" matters).

"Backing into" planning

Many scholars know exactly what they want to know; they just want a program that will give them an answer, without the need for any complicated planning.

Actually, it is common for researchers to "back into" planning. Once started, they ask: "How do I get files?" "How do I know which files I need?" "Will this program help me?" "How do I know what program to use?" "Will it give me the answers I want?" The answer is, it depends on how you define the problem, in terms that a computer will "understand." Exactly which factors do you want the computer to count, compute, display, or otherwise process? Your initial definition of the subject and how the computer defines it may not be the same. If you are looking for echoes of Spenser in the poems of Shelley, you may consider the following lines, from Spenser's The Ruines of Time and Shelley's The Triumph of Life:

Fled back too soone unto their native place (-Spenser)

Fled back like eagles to their native noon (-Shelley)

You, as a human reader, instantly see the correspondence. But do you tell the computer to look for recurrence of identical words? Or identical position in the verse? Or rhyme? Or similar metrical patterns? Or some combination of all of these? Or something else?
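The choice among these definitions can be made concrete. The following is a minimal sketch in Python, not a MINERVA program: the function names `words`, `shared_words`, and `same_position` are hypothetical, and the two tests shown (identical words anywhere in the line, identical words at the same word position) are only two of the possible definitions of an "echo."

```python
import re

SPENSER = "Fled back too soone unto their native place"
SHELLEY = "Fled back like eagles to their native noon"

def words(line):
    """Lowercase the line and split it into alphabetic words."""
    return re.findall(r"[a-z]+", line.lower())

def shared_words(a, b):
    """Definition 1: words that occur in both lines, wherever they stand."""
    return sorted(set(words(a)) & set(words(b)))

def same_position(a, b):
    """Definition 2: words that are identical AND stand at the same word index."""
    pairs = zip(words(a), words(b))
    return [(i, w) for i, (w, v) in enumerate(pairs) if w == v]

print(shared_words(SPENSER, SHELLEY))
# ['back', 'fled', 'native', 'their']
print(same_position(SPENSER, SHELLEY))
# [(0, 'fled'), (1, 'back'), (5, 'their'), (6, 'native')]
```

Neither test sees that "too" and "to," or "place" and "noon," occupy corresponding slots in the two verses; a definition based on meter or sound would need different code. The scholar decides which definition matters.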

To ask "Will this program help me?" or "Which files do I need?" is to approach the problem from the wrong end. The starting point should be to identify with clarity the topic of the project, then find (or create) the necessary resources. For the literary scholar, this means analyzing the actual language of criticism, in the etymological sense of the word "analyze," which comes from the Greek ana-luo, to break apart something into its constituent parts.

Pushing buttons and expecting an answer

Since we use computers all the time in our everyday lives, we are used to pushing a button and having something happen. But there is constant interaction between the human researcher and the computer, and between the user and the computer programmer who writes the programs. You have to know what you want to do before you know which button to push. There is human intervention at every stage: defining the topic, choosing appropriate input, choosing a program, and interpreting the results.

The MINERVA System and the Project Planner

The literary systems analyst

The MINERVA System is a set of tools for planning and carrying out a project in literary study. It provides boxes to fill in, spaces for typing descriptions, and screens for making diagrams with templates that can be modified. Its techniques have been borrowed from the discipline of Systems Analysis, well known in the industrial world and in the sciences, which emphasizes modularity (dividing a problem into parts) and extensibility (the capacity for adding more parts). In the manufacturing environment, project-planning software, such as the CATIA system used by both Boeing and Airbus to design and build airplanes, is employed in all phases of the Project Life Cycle, following a project from its inception to its completion, use, maintenance, and eventual retirement and replacement. Such software coordinates and tracks the activities of workers in far distant locations, working on parts of the same project at the same time. The computer and the Internet create opportunities for the same kind of collaboration in scholarly activities.

Top-down methods are emphasized, starting with the big picture and breaking it into smaller pieces that can be worked on separately, but bottom-up methods are also used, for example in writing down a group of ideas and letting them coalesce into a pattern (see "Top-down or bottom-up? Different ways to structure a project" below). The method includes analyzing literary criticism itself, to determine which aspects can be studied using a computer, then choosing inputs, outputs, and programs that will attain that goal. A project is defined as an undertaking that has a goal and an organized way to get to it.

The concept of a modular, extensible system, divided into parts, which can have more parts added to it, is not unknown in literary scholarship. David Packard Jr.'s Ibycus System in the field of Classics, described in Chapter 6, is an example that may be familiar to some.3

Although Systems Analysis does not necessarily have anything to do with computers, its techniques are well adapted to the use of computers. Some of the examples of research that are described in later chapters, in fact, did not use computers (for example, Carney's use of quantified methods of content analysis in his studies of Roman historiography, described in Chapter 6).

In addition to the Project Planner, the MINERVA System includes the MINERVA Program Suite, which provides programs for doing specific literary tasks, such as creating concordances, collecting statistics on word frequency, and identifying clusters of co-occurring words, as well as OwlData programs to create input data for the literary analysis programs.

What the MINERVA System contains

The MINERVA System is made up of programs and text chapters. These are linked so that one can start either with the text chapters (read with a standard browser), as illustrated by the programs, or with the programs, as explained by the text chapters.

The programs of the MINERVA System do not require any proprietary databases or programs, which may be unavailable to some users or require expensive licenses. Any text in ASCII format, such as those downloaded from the Internet, can be used. The programs themselves are open source, and source code is provided. The techniques of the Project Planner can, of course, be used with proprietary as well as non-proprietary materials.

The parts of the system are arranged as follows (for a more detailed listing of the actual programs, see below in How The MINERVA System is Organized):

  1. Systems Analysis Tutorial/Project Planner. Takes the user through the steps of the Project Life Cycle, from Selection of a Topic to Execution of the Program(s) to Evaluation of the Results. Interactive screens allow the user to plan his or her own project, filling in blanks to choose output layouts, input requirements, programs, and division of work. A study of Coleridge's The Rime of the Ancient Mariner is used as a Case Study. Screens are linked to sections of Chapter 2 of the text.

  2. Minerva Program Suite. Includes programs to make concordances, make maps of recurring themes, conduct statistical studies, perform cluster analysis of recurring words, "compose" original sentences, etc. Screens are linked to sections of Chapter 3.

  3. OwlData programs. Allow the user to create, download, and adapt new data for the MINERVA programs. Screens are linked to sections of Chapter 4.

  4. The Loom of Minerva text chapters. These comprise a Preface and ten chapters, the first four of which are linked directly to the MINERVA programs. These are linked to other chapters that supply historical and technical background material, including chapters on statistical methods and on applications.

    Every discussion of a program is illustrated by an actual problem in literary criticism, including examples from Vergil, Coleridge, Shelley, Baudelaire, Gertrude Stein, and others.

Coleridge's Rime of the Ancient Mariner as a Case Study

For the Project Planner, we use an analysis of Coleridge's The Rime of the Ancient Mariner as a Case Study. This text lends itself to computerized study as a sample project for many reasons. Coleridge used a wide variety of sources for his fantastic tale, rewriting it over and over in multiple versions. It has influenced other writers (including Mary Shelley in her Frankenstein) and inspired many artists and illustrators. It shares aspects with many other stories of accursed wanderers, and phrases associated with it have found their way into the speech even of those who never read the poem, such as "an albatross around one's neck" and "Water, water, every where, /Nor any drop to drink." It has been commented on by many critics, both in praise and in contempt.

Examples of various kinds of texts and criticism

In the discussions in the text chapters of the MINERVA programs, we examine a number of works of literature, and the critics and scholars who have studied them, with the purpose of seeing how quantitative methods could be brought to bear on both text and critic. Many of the examples of literary analysis are the actual problems whose solutions formed the basis for specific MINERVA programs. In addition to the study of Coleridge's Ancient Mariner (with criticism by the Victorian aesthete Walter Pater and others), these include a statistical study of repetition in Gertrude Stein (with analysis of her own works by Stein herself); intimacy of style in Vergil's Eclogues (as commented on by the Classical scholar J. Wight Duff); Baudelaire's combination of beauty and viciousness in Les Fleurs du Mal (through the eyes of the flamboyantly romantic critic Charles Augustin Sainte-Beuve); onomatopoeia in Victor Hugo's sea poem Une nuit qu'on entend la mer sans la voir (as seen by another poet, Swinburne); and Edna St. Vincent Millay's ambivalent career ambitions in Travel (as seen by differing critics). Other topics covered include a content analysis of various ancient writers' views of the Roman statesman Marius (by Cicero and others), cluster analysis of mythic themes in the Homeric Hymns, questions of authorship in St. Paul and Jane Austen, reconstruction of damaged manuscripts, and composition of poems and folktales by computer. We also consider criticism by Sainte-Beuve on Vergil and Molière and by poet T.S. Eliot on Dante.

The hierarchical chart in Figure 1.1 below illustrates the layout of the MINERVA System. The contents of MINERVA are described in more detail below, in How The MINERVA System is Organized.

Layout of MINERVA System

Modularity: breaking the project into smaller parts

Container ship

We divide the project into parts that can be worked on separately, while fitting them into an overall plan, like stowing containers, belonging to separate owners, on a modern cargo ship. We see this principle in the step of the Project Planner called Functional decomposition: dividing the project into parts, described in Chapter 2. (Illustration: a ship enters the port of New York through the Kill Van Kull, Staten Island. Photo by C.A. Sowa.)

The computable and the uncomputable

One of our first tasks is to break down the project into small steps that can be tackled individually. A computer can help with some parts of the problem but not with others. Uncomputability itself is not a reason not to choose a topic. Uncomputable parts are necessary, and will be reassembled with the computable parts at the end of the project. Half the trick is knowing when and where a computer should be used, a subject that we treat below under "Sainte-Beuve and the computer," later under "Choosing a Topic for Research," and in greater depth in Chapter 2 under "Clarification of Meaning: Identifying the Quantifiable".

The MINERVA System itself, as can be seen in the hierarchical chart above, is made up of separate modules, which can be added to as wished.

The epic formulas of programming

Breaking down a subject into its component parts is natural to the computer. It is the essence of the modern digital computer that it breaks down everything it does into a series of small activities that it performs over and over. Information is stored in the computer as an aggregation of ones and zeroes, called "bits," represented by circuits that are on or off, and bits are arranged in groups of eight, called "bytes." The digital computer, like an abacus, operates by calculating in discrete steps (the word is from the Latin digitus "finger," referring to calculation by counting on one's fingers; the opposite is an analog device, which moves by continuous amounts, like a sundial).
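To make the idea of bits and bytes concrete, here is a minimal sketch in Python. The function name `to_bits` is hypothetical, and the sketch assumes plain ASCII text, in which each character occupies exactly one byte; other encodings represent some characters with more than one byte.

```python
def to_bits(text):
    """Return one string of eight ones and zeroes for each byte of ASCII text."""
    return [format(byte, "08b") for byte in text.encode("ascii")]

# Show the byte that stands for each letter of a short word.
for letter, bits in zip("Owl", to_bits("Owl")):
    print(letter, bits)
# O 01001111
# w 01110111
# l 01101100
```

Each line of output is one byte: eight bits, each either on (1) or off (0).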

The programs of instructions that tell the computer what to do are built of reusable modules, repeated over and over. In this, they resemble Homeric and other oral poetry, with their reusable "formulas," "type scenes," and "themes." (See Chapter 3, "Homer, Beowulf, and Computer Programs: Using Prefabricated Pieces.") Typical modules of computer programs include reading input from a database or from information typed by the user; searching the database for information; performing calculations, such as applying statistical formulas, or finding correlations (for example between word length and sentence length or between educational level and number of books read); and formatting output for attractive viewing on the screen or in print. Activities such as reading, searching, calculating, and arranging, will look familiar to any methodical scholar, whether or not he or she is using a computer. Modules can be changed or moved around without disturbing the whole plan. (See "Subroutines: the Mechanics of Modularity" in Chapter 9.)

Putting it back together

Those parts of the project that are not computable, requiring human choice and interpretation, are equally important. Later in the project, we integrate all parts of the study, in the step of the Project Planner called Synthesis of computable and non-computable parts.

Sainte-Beuve and the computer: analyzing the language of criticism

Painting of ship

Our principal Case Study in the Project Planner is an analysis of narrative themes in Coleridge's Rime of the Ancient Mariner. The poem has been judged, both favorably and unfavorably, by many critics and writers, including Wordsworth, Southey, Swinburne, and Pater. Their styles and the content of their comments differ wildly. It is a challenge to computerize their disparate insights. (Illustration: painting of an antique ship, artist unidentified.)

Computerizing different styles of criticism

To use a computer in literary study, we must examine the language of criticism. Fashions in criticism come and go. There have been classical criticism, romantic criticism, traditional philology, New Criticism, deconstruction, postmodernism. The computer can be used with any of them. We just have to find the parts that can be quantified, that is, expressed by counting, sorting, or arranging aggregations of data. Some are more easily computerized, some with more difficulty.

Baudelaire's ornate kiosk

While many types of literary scholarship can benefit from using the computer, these chapters emphasize applications of technology to traditional belles-lettres or aesthetic literary criticism. This emphasis is not accidental, for I wish to show that the use of computers is not inimical to the most intuitive humanistic studies. The nineteenth-century French critic Sainte-Beuve may be taken as the prototype of the romantic critic. We explore his baroque criticism of Baudelaire in Chapter 3. An example is the following, translated from the French:

". . . M. Baudelaire has managed to build for himself, out at the very farthest point of a neck of land reputed uninhabitable and beyond the frontiers of known romanticism, a bizarre kiosk of his own, ornate and contorted, but at the same time dainty and mysterious. Here Edgar Poe is read, exquisite sonnets are recited, hashish is taken for the purpose of analyzing the experience afterward, and opium and every other more dangerous drug is served in cups of the most exquisite porcelain."

This statement can be analyzed, as we explain in Chapter 3 while discussing the program CATMAP, as "Baudelaire describes ugly thoughts in beautiful language."

Sainte-Beuve's essays on French and Latin literature are filled with a vocabulary of "purity" and "light" that would be widely derided today as "unscientific." Yet even such criticism can be brought within the range of the computer. I like to think, somewhat whimsically, that these chapters show how Sainte-Beuve might have used the computer, if he had had access to one and the inclination to use it.

As the computer scientist Marvin Minsky (whose work we discuss in Chapter 6) said,

"any procedure which can be precisely described can be programmed to be performed by a computer." 4

Literary material already is in quantified form--letters on the page

It may surprise the reader that literary material already exists to some extent in discrete, quantified form. Written language consists entirely of a set of marks upon the page, a limited set of symbols whose changing patterns approximate the words of spoken language, which in turn can only hint at the reality of the world and the author's thoughts about it. Stephen M. Parrish wrote of this dilemma in "Computers and the Muse of Literature":

Our only clues to "meaning" lie in black marks on a page

The real questions in the humanities, as a critic has . . . put it, are always qualitative and always unanswerable. What, then, could a data processing machine contribute to their analysis? . . . Without striking very deeply into literary theory, we ought to be able to see that this problem is not a new one, and has nothing to do with computers. It is not even distinctive to the field of literary study . . . In the study of literature, the barrier lies, as it has always lain, squarely across some of our central preoccupations. It lies, that is, between the black marks of ink on sheets of white paper, which are quantitative things, and the emotions and ideas that produced those marks, which are qualitative. Or between the marks and the values they embody. Or, depending on one's critical stance, between the marks and the emotions and ideas they arouse in the reader.5

The digital computer does everything that it does by adding, subtracting, multiplying, and dividing numbers, and moving these numbers around. Letters, words, and even pictures are represented by patterns of numbers. We might feel that a device that imposes discrete numbers on imprecise occurrences cannot be trusted to tell us about inexact experiences such as thought and beauty. But the fear is illusory, since in literary criticism it is not the real world we are studying, but our means of communicating about the world.

As early as the Middle Ages, Ramon Lull, and later, in the seventeenth century, Gottfried Wilhelm Leibniz attempted to express theological and philosophical truths as combinations of numbers (as described in Chapter 6, "The Kinds of Projects That Have Been Done in Literary Computing"). Despite all the fancy graphics and multimedia, the computer is still basically just a calculator. But we cannot just count everything, since the number of things that can be counted is infinite. We must decide which numbers will tell us something interesting.

It is in this sense that we can "learn to think like a computer," not in the sense of lessening the impact of great art, or "turning poetry to dust" (see Step 4 of the Systems Analysis Tutorial/Project Planner, "Clarification of Critical Terms," described in Chapter 2.)

The computer cannot confer total objectivity

The question is often asked how the computer can be used to ensure complete objectivity. The answer is that the computer will never confer complete objectivity. There are arbitrary decisions to be made in all parts of the project, starting with the choice of the subject to be studied and the definition of the way the problem is to be solved, and concluding with the final interpretation of the results. The very use of the term "objectivity" implies that to every question there is only one answer, which can be stated in only one particular way. The use of statistics to "prove" wildly contradictory conclusions is well known. (See, in Chapter 5, "The Significance of Statistics.") We cannot be "completely objective" in this sense, nor should we want to be.

We can actually draw upon our own personal experiences and memories to guide us in defining and interpreting our studies. In the Case Study of Coleridge's The Rime of the Ancient Mariner (Chapter 2, illustrating the Project Planner), we ask, for example, whether we have ever ourselves known any "Ancient Mariners" (of whatever walk of life), who love to tell their tales, whether we have ever felt the need to expiate a sense of guilt, whether we have ever had an uncanny experience, whether any other of our experiences can be brought to bear on our appreciation of the poem. (See, in Chapter 2, "Alternative definitions of the data: What if I change my mind?", Synthesis of computable and non-computable parts, and "The Computer is not an Oracle.")

As mathematician Richard Hamming has said:

"The purpose of computing is insight, not numbers."6

Just what is "early springtime freshness?"

Flowers in the garden

What do we mean when we say that a poem "has an early springtime freshness about it"? Are we reacting to its descriptions of flowers and growing things, or to the bouncy sweep of its phrasing and meter? Or something else? (Illustration: flowers in the garden, photo by C.A. Sowa)

Violets and childbirth in Pindar

Professor Finley once said that Pindar's Sixth Olympian Ode "had an early springtime freshness about it." This is an intriguing thought, but what does it mean? Consider in particular the verses in which Evadne "the violet-haired," pregnant by Apollo, gives birth beneath the wooded river bank to the boy Iamos, whose name comes from the word for "Violet" (ion). Here, in translation, is the passage:

. . . Laying down her purple-woven girdle
and her silver pitcher, in a dark blue thicket
she bore a boy whose mind was god-inspired.
Golden-haired Apollo sent to her Eileithuia [goddess of childbirth] of gentle counsel, and the Fates.
There came from her womb, through sweet pangs, Iamos,
immediately into the light. In pain,
she left him on the ground. Two grey-eyed serpents,
by the decision of the gods, nourished him with blameless
venom -- the honey of bees. . .
. . . He was hidden among the rushes in the impenetrable thicket,
his delicate body bathed in the light of yellow and deep-purple
violets; whence his mother declared that he would be known for all time
by that immortal name [Iamos]. . .

(-Pindar, Olympian Ode VI vv. 39-57)

What gives the impression of "springtime freshness?" Perhaps the many words for colors? The fact that violets (her hair color, his name) bloom in the spring? The connection with the birth of the child? The bees? How does this passage compare in vocabulary and theme with other odes of Pindar or with other Greek poetry?
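One of these questions, whether the passage is unusually rich in color words, can be made countable. The following is only a minimal sketch in Python, working on part of the English translation quoted above rather than on Pindar's Greek. The word list `COLOR_WORDS` and the function `count_color_words` are assumptions made for illustration and are not part of the MINERVA System.

```python
import re
from collections import Counter

# Hypothetical word list for illustration only; a real study would
# build its list from the Greek text, not from a translation.
COLOR_WORDS = {"purple", "silver", "blue", "golden", "grey", "yellow"}

def count_color_words(text):
    """Count each listed color word, treating hyphens and punctuation as separators."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w in COLOR_WORDS)

excerpt = (
    "Laying down her purple-woven girdle and her silver pitcher, "
    "in a dark blue thicket she bore a boy whose mind was god-inspired."
)
counts = count_color_words(excerpt)
print(counts)
```

A count of this kind answers nothing by itself. It becomes useful only when the same measure is applied to other odes or other poets, so that the scholar can judge whether the density of color words here is in fact unusual.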

To take another example, perhaps we feel that a certain dramatist's later plays "are better crafted than his earlier." Do the later plays show a change in large plot structure? Do they contain less or more digressive material? Do they perhaps include more verbal echoes from one scene to another? Are there more characters? Fewer characters?

Strategy: getting the computer to recognize words, sentences, motifs, and concepts

Quantified tools, including those provided by the computer, can be made to recognize various features in a work of literature, including simple units like letters, words, syllables, and sentences, but they can also be made to find larger configurations like themes, motifs, figures of speech, and concepts, if we define them in terms of smaller features.

Strategies for getting the computer to recognize literary units

  • The basic unit for analyzing a literary text is the individual letter. Each is represented in the computer by a unique code of bits -- ones and zeroes. A letter with an accent, ligature, or other mark, like é or ö or ç or Æ, is represented by a different bit pattern from one without it. Punctuation symbols like periods, commas, and quotation marks are also represented by bit codes. (See Appendix III to Chapter 10 under "The World as Ones and Zeroes: The Binary Representation of Information.")

  • To identify individual words, we search for the blanks between them. In the computer, a blank is a character, coded with its own pattern of bits, just like letters and punctuation. The computer treats everything between one blank and the next as a word. (Our subroutine to do this, called GetWord, is described in Chapter 9.)

  • A sentence can be defined as a string of words terminated by a period, colon, or question mark. To identify sentences, therefore, the machine can look for these marks of punctuation.

  • To search for a particular word, like the name "JUNO," the computer compares the pattern of letters in each word with the letters in the word it is searching for, but the user can just say, "Find the word 'JUNO'," and the machine will find it. (In the old days, the programmer had to specify each letter separately, as in a football cheer: "Give me a 'J';" "Give me a 'U';" "Give me an 'N';" "Give me an 'O'!")

  • Among larger patterns, some rhetorical figures, like alliteration (repetition of the same letter at the beginnings of successive words) and epanastrophe (repetition of the same letters at the end of one word and the beginning of the next), require only recognition of individual letters. Some examples of such studies are described in Chapter 6, under "Sound patterns, ancient and modern."

  • Poetic meter can also be studied by examining individual letters. For Greek and Latin poetry, the computer can be given rules for long and short syllables, so that the machine can analyze its quantitative meters (for examples, see Chapter 6, under "Metrical analysis of Greek and Latin poetry").

  • Grammatical forms can be identified by searching for particular combinations of prefixes and suffixes. (They can also be flagged by attaching identifying tags to them; see below and in Chapter 6 under "Dictionaries, lexicography, and language classification.")

  • Similes can be found by looking for specific words, such as "like," "such," or "as."

  • Themes and motifs, such as "love," "death," "fire," "honor," or "friendship," can be identified by looking for specific words that embody those concepts. This technique can even be turned on its head, by identifying themes from the words that occur together. Examples appear in the discussion of "Cluster Analysis" in Chapter 6; also see Chapter 3, in the discussion of the CLUMPS program ("Using cluster analysis to find associated ideas in a text: Nature and solitude in Shelley's Alastor").

  • Flagging the text provides another method of finding themes. The machine is given a text that has been specially prepared with an identifying mark for each feature one is looking for. A capital letter (such as "F" for "fire" or "L" for "love") could, for example, be placed before or after each "thematic" word one is interested in. (For flagging of themes, see below in the Case Study of Coleridge's Ancient Mariner; it is illustrated in Figure 2.33, under "Input requirements").

  • Flagging of texts with a grammatical category for each word is useful in the study of an author's grammatical usage. The use of flagged texts to identify grammatical forms is demonstrated in the MINERVA System. See Chapter 3, in the discussion of the GERTRUDE program, "Statistical analysis of repetitions (Gertrude Stein's Tender Buttons)."

  • Figures like personification and metaphor require semantic analysis as well as vocabulary search to determine when, for example, a word normally used with a human subject is being used with a non-human or inanimate subject. As we describe in Chapter 6 under "Rhetorical figures", methods of linguistic analysis and artificial intelligence (described below) have been employed to both study and create metaphors by joining words from disparate semantic fields, as in the metaphor "Socrates is a midwife..." For quick results, these figures, too, can be flagged in the text with special characters.

  • Linguistics is a complete field in itself, too large to be covered in depth here. Linguistic analyses, involving transformations, semantic networks, and other techniques of linguistic theory have useful applications for the literary scholar. These include parsing programs that analyze the grammatical, syntactic, and semantic structure of sentences, the study of metaphor mentioned above, and programs that translate from one language to another.

  • Artificial intelligence, the science of mimicking in machine functions the concept-making activities of the human mind, is also not covered here except in short comments. Among the accomplishments of artificial intelligence is the development of so-called expert systems, which can answer simple questions on specific topics. In literary studies, expert systems have been used to answer queries about grammatical forms (for example in French mediaeval texts, as described in Chapter 6 under "Dictionaries, lexicography, and language classification").
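Several of the simpler strategies in this list can be sketched in a few lines of code. What follows is a minimal illustration in Python, not the MINERVA programs themselves (which are written in Visual Basic), and `get_words` below is not the GetWord subroutine of Chapter 9. All function names and the sample text are assumptions made for this example. It shows how words can be found by looking for blanks, sentences by looking for a period, colon, or question mark, a particular word by comparison, alliteration by comparing first letters, and possible similes by looking for marker words.

```python
import re

def get_words(text):
    """Split the text at blanks, then strip punctuation from each piece."""
    words = []
    for chunk in text.split():
        word = chunk.strip('.,;:!?"\'()[]')
        if word:
            words.append(word)
    return words

def get_sentences(text):
    """Treat a period, colon, or question mark as the end of a sentence."""
    parts = re.split(r"[.:?]", text)
    return [p.strip() for p in parts if p.strip()]

def find_word(words, target):
    """Return the positions at which the target word occurs, ignoring case."""
    return [i for i, w in enumerate(words) if w.lower() == target.lower()]

def alliterations(words):
    """Return pairs of successive words that begin with the same letter."""
    pairs = []
    for first, second in zip(words, words[1:]):
        if first[0].lower() == second[0].lower():
            pairs.append((first, second))
    return pairs

def simile_candidates(sentences, markers=("like", "such", "as")):
    """Return sentences containing a marker word; a reader must still judge each one."""
    hits = []
    for sentence in sentences:
        lowered = [w.lower() for w in get_words(sentence)]
        if any(marker in lowered for marker in markers):
            hits.append(sentence)
    return hits

# Invented sample text, for illustration only.
sample = "Juno was angry. Her wrath burned like fire: who could calm the savage sea?"
words = get_words(sample)
sentences = get_sentences(sample)

print(find_word(words, "JUNO"))
print(alliterations(words))
print(simile_candidates(sentences))
```

The sketch also shows the limits of such definitions. A period inside an abbreviation would be taken for the end of a sentence, and a sentence containing "as" or "like" is only a candidate simile until a human reader confirms it. These are exactly the places where the computable and the non-computable parts of a project must be divided.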

The computer system and its programs as art forms

Playing lyre for a deer

. . . If a person
Asks his questions of [the lyre] knowingly, with skill and wisdom,
Clear of voice it teaches all kinds of things that delight the mind,
Easily, when played with soft familiarities,
For it avoids ill-suffering labor. But if a person
Questions it stupidly and violently from the start,
In vain it makes a false, bombastic jangle.

(-The Homeric Hymn to Hermes, vv. 482-488, translated by C.A. Sowa)

Hermes, speaking to Apollo, is describing the lyre, a machine for making music that he has just invented. The computer, like Hermes' lyre, is in itself a work of art, and the programs (the "music") that we write for it should respect its nature. (Illustration: Greek red-figured vase, lithograph by A. Rey for de Kaeppelin et Cie., ca. 1840.)

The program as a creative narrative

Computers and their programs are, at their best, art forms, each with its own structure, balance, and symmetry. A computer program is a narrative telling the computer what we want it to do. It has a beginning, middle, and end, and, like any work of literature, it has something to communicate!

Computer programs and programming languages differ greatly in style, although this aspect is usually visible to the nonprofessional computer user only in its effects. Programs can be compared to various literary forms, in the similarity of program elements to literary components such as vocabulary, narrative, and character development (see below under "Top-down or bottom up? Different ways to design a project," in Chapter 3 under "Project design is like Homeric poetry", and in Chapter 7 under "You can tell the same story in many different languages"). Some programs (and programming languages) are long and windy, like nineteenth-century novels; others have the algebraic rigor of a haiku. Programs that use a lot of graphics are like coffee-table books, consisting of mostly pictures and little text.

To "think like a computer" does not mean to dehumanize ourselves, but to think with logic and clarity. To be "scientific" does not mean to strip literature of its beauty.

"Scientific" doesn't have to mean "ugly" or "sterile"

Science is based on methods of rigorous testing and predictability, requiring the capacity to get the same answer with each repetition of the same experiment. It uses empirical methods, based on an accumulation of observations, recorded with word-for-word exactness. In the modern world, science comprises the most dominant and prestigious branch of thinking. Non-sciences, such as history, have attempted to resemble science, basing their conclusions on accumulations of observed events. A way of looking at the world has evolved, which values the quantifiable and the easily duplicated, while devaluing the shifting and ambiguous. This awe of engineering and "efficiency" and its effect on literature (for example in T.S. Eliot's The Waste Land) are discussed in Chapter 6.

Humanists frequently succumb to the temptation of trying to turn their fields into sciences, in order to enhance the prestige of their own disciplines. These attempts sometimes embody valid adaptations of scientific principles to humanistic studies; at other times, the scholar simply grasps at the trappings of scientific practice. Humanistic scholars who use the new technology abandon their own aesthetic values too readily, allowing themselves to be persuaded that words like "beauty," "elegance," and "charm" are inappropriate to the objective pursuit of scientific truth. They write in abstract jargon and display the results in impressive tables, assuming that the use of a computer will of itself confer significance upon their work. Scientific methods and technical terms have valid uses and meanings; indeed, many are described in these chapters. They do not of themselves, however, confer worth upon a project that does not otherwise have it. Seeking to rid themselves of intuition (which any real scientist will assert is indispensable to solving a problem), humanists pursue the will o' the wisp of out-scientizing the scientists, only to end in arid and useless exercises in abstraction, a quicksand of megabytes and multivariate analyses.

Pseudoscientific affectations are not confined to work in which a computer is used. They have been endemic to the more technical kind of scholarly writing for many years. The introduction of the computer has only aggravated the tendency. This, however, has little to do with real science or technology.

The aesthetic dimension in science

The best scientists and engineers have for centuries looked upon their occupation as an Art, and they recognize the importance of the emotional dimension as a guide to its proper pursuit. Words like "intuition," "elegance," and "beauty" are frequently encountered in scientific writings, as seen in the following excerpts:

Scientists' words about beauty in science:
  • C.B. Bazzoni expressed the importance of aesthetics in Kernels of the Universe, "Scientists . . . are necessarily poets -- I mean that in their work they must use the same powers of constructive imagination that poets and painters use." 7

  • Robert A. Frosch, geophysicist, oceanographer, and, at the time, Assistant Secretary of the Navy, reminded a group of systems analysts that in the proper assessment of a job they must ask, " . . . but is it a good system? Do you like it? Is it harmonious? Is it an elegant solution to a real problem?"8

  • Henry Petroski, in To Engineer is Human (quoted at the head of this chapter), states that "The work of the engineer is not unlike that of a writer. How the original design for a new bridge comes to be may involve as great a leap of the imagination as the first draft of a novel."9

  • Physicist Edward Teller, lecturing at UCLA, followed his explanation of one of the great scientific discoveries with the words, "Of what use is all this? Well, in the first place it's beautiful, and it's true..."10

  • E.W. Dijkstra, speaking of computer and program design in "Some Meditations on Advanced Programming," says "the tool should be charming, it should be elegant, it should be worthy of our love . . . In this respect, the programmer does not differ from any other craftsman: unless he loves his tools it is highly improbable that he will ever create anything of superior quality. Thus, at the same time these considerations tell us the virtues a program can show: Elegance and Beauty."11

Differences between humanities and technology

Science and engineering are like the humanities in that they are guided by beauty, elegance, and harmony, appeal to the emotions, and are forms of art. But they are different from the humanities in one important respect. While the humanities may encompass rational modes of thinking, the sciences demand and require logical thought and reasoning. The use of the computer enhances clarity of thought.

There are, of course, those who feel that their enjoyment of a thing or person comes from a sense of mystery, from not knowing too much about it. But the computer's aid is welcome to those who, when they love a thing or person, want to know as much as possible about the beloved.

Top-down or bottom-up? Different ways to design a project

Ways to design a project

There are many ways to design a project: 1. Top-down or hierarchical, which is the method we use the most in MINERVA; 2. Bottom-up, where ideas float around until they coalesce into a pattern; 3. "Characters" (or data) first; 4. General concept first. (Collage by C.A. Sowa.)

Plot- or character-driven programs and novels

A top-down methodology, where we start with a big idea that we decompose into smaller modules that can be moved around, replaced, or changed, is emphasized in the MINERVA System. But this is not the only way that a project can be created, and some scholars chafe at the suggestion that they need a central focus at all in the beginning, preferring to let the focus develop as the work progresses. There are many ways to get going, and, in fact, several of them have been integrated into the MINERVA System, where appropriate. Engineering projects, poems, and novels can all be worked on in different ways, by different creative minds or even by the same artists at different times.

Here are some of the ways to design a project (with comparisons to literary forms), and the ways that we have used them in the MINERVA project:


  • Top-down. This is the method emphasized in MINERVA. We develop an algorithm, hierarchically laying out the big subdivisions first. We describe the steps to be taken, the order in which they are to be executed, and the relationship of the parts to each other, filling in the details later. If we were writing a novel, we could start with the plot or "story," then develop the characters and fill out the narrative with incidents or episodes. The method is illustrated by charts like those introduced below in this chapter, including the functional decomposition (hierarchical) chart, the flow chart, and the circular wheel chart.

  • Bottom-up. Whether we are designing a system or writing a novel or short story, we can start with small bits and pieces, then watch them coalesce into a larger whole. We use this method in MINERVA, too. A writer might begin with bits of dialog or isolated descriptions, then let a plot arise from them. In the Project Planner, described in Chapter 2, we suggest that the student use a similar method in the design of a computer project, by writing down key thoughts about a topic in the form of a worksheet, whose items may be chosen from an initial project description. He or she then begins to pick out important ideas from the worksheet, grouping related items together and rearranging them into some kind of order (changing the order if desired).

    For testing a system, we also advocate bottom-up methods. After a project has been written, we test each part separately to make sure that each works properly, before testing the whole program (Chapter 10).

    We also use a bottom-up organization in our descriptions of projects carried out by different researchers who have used computers or other quantified methods in the study of language and literature (Chapter 6). First to be described are studies of individual sounds and sound patterns, then studies of vocabulary and grammar, then finally research on larger building blocks like themes and story patterns. There is no way, of course, of knowing, in most cases, whether these scholars themselves have used top-down, bottom-up, or other methods in their work.

  • Data first (characters first). A novel, instead of being plot-driven, may be character-driven. The author imagines a cast of characters, then watches to see what they do. Just so, the designer of a computer project may start by defining the data elements needed, such as text, dictionaries, statistical tables, etc., then decide what to do with them. In the Project Planner (see Chapter 2), we define the inputs and outputs that we want, then design an algorithm to get from here to there.

    In Visual Basic, the programming language in which MINERVA is written, we start with the layout of the screens before we begin to write a program that makes them look that way (Chapter 7). If you run the MINERVA programs, you see the result. If you are just reading the text, look below under "What you will see when you run MINERVA." With their emphasis on graphics, I compare Visual Basic programs to coffee-table books.

  • "Idea" or concept first. Here we start with a vague idea that must be defined. This is, in fact, what this whole course is about, taking a vague hunch, feeling, or intuitive appreciation and giving it more precise expression. Thus we gain insight into the object of our interest or obsession. We can use any of the methods above to accomplish this goal.

Expanding the project

MINERVA: an ongoing project

Techniques of Systems Analysis provide a means not only of planning and designing new projects, but of expanding and extending existing projects.

The MINERVA System for Study of Literary Texts, both as a series of programs and as a set of text chapters, is an ongoing project. Developed using the techniques of modularity described in the following chapters, it is expandable, so that modules can be changed or added. The suite of programs currently contains eight programs to perform specific types of literary analysis, eight "OwlData" programs (named for Athena's owl "mascot") with which to create data for the analytical programs, and sixteen programs in the Systems Analysis Tutorial/Project Planner. The organization of these programs is outlined below in "How the MINERVA System is Organized."

The MINERVA System is designed to be extended, deepening its exploration. The Project Planner has already been expanded by adding programs to aid the user in creating charts, diagrams, and verbal descriptions of a project, and new functions are being added to the CLUMPS and LEMMA programs for performing cluster analysis.

There are a number of ways in which a project can expand. This topic is also discussed in Chapter 2, under "Expanding the Labyrinth."


  • Adding more modules. In response to a user request, the MINERVA System has already grown by the addition, to the original suite, of two programs that perform cluster analysis (the study of recurring word groups), described in Chapter 6. They are based on Sowa and Sowa's Clump Finder project. Discussed in Chapter 3, the new module is called CLUMPS. In conjunction with it, we have added a further OwlData program, LEMMA, which creates a canonical dictionary for turning the text into stem form. The MINERVA System has also grown by adding more data entry screens to the Project Planner.

  • Expanding the functions of each module. Another possible expansion of MINERVA is for each of the programs that perform literary analysis to become a portal to a whole family of programs. Thus, instead of one program to make a simple concordance of a text, there could be a screen offering the user a choice of several different types of concordance; instead of one statistical program, we might see a gateway to a selection of various statistical measurements; instead of one program to compose original paragraphs, we might be offered a choice of several different kinds of composition programs -- perhaps one to compose paragraphs in the style of an author, one to compose haikus, one to put together original "folk tales," "soap operas," or "westerns." Already, the CLUMPS module includes two versions of the program, using slightly different algorithms to achieve similar results.

  • Adding personnel. More people can work on a project. These can include students and colleagues, at the same university or at other locations. They can also be data entry personnel working under contract.

  • Expanding spatially. Users can be added at various locations. At present, all users of MINERVA have their own copies of the system. The system may be eventually placed on line, which will require new measures to allow for simultaneous use.

  • Adding platforms. At present, the MINERVA Program Suite runs only on Windows. Macintosh users can only read the text chapters. Changes in the future can make the full system available on the Macintosh.

What makes a good project

Deciding what parts (if any) of your project can be mechanized can be a challenging, if fascinating, task. Your idea of what can be computerized may change as you go along -- as happens frequently in technology.

In the nineteenth century, the old hand-operated looms (which themselves replaced more primitive weaving methods) were replaced by the power-operated Jacquard loom, which was controlled by punched cards. The Jacquard punched card became the ancestor of the punched cards used in older modern computers, as we describe in Chapter 6. Modern weaving is, of course, computer-controlled. Pictured is one of the looms that predated Jacquard's invention.

(Illustration: "Ribbon loom," from an 18th-century engraving.)

Choosing a topic of intrinsic worth and computability

The best project has a clearly defined goal, but has wide importance. It can be divided into parts, large and small, computable and non-computable. Some tasks, such as, say, ascertaining the percentage of proper nouns in a long novel, are best done by a computer. Aspects of interpretation, on the other hand, are best done by humans.

Choosing a topic of both literary (or historic or social) worth and computability is one of the most important tasks we face as digital humanists. It is the subject of several steps in the Project Planner (described in Chapter 2): "Selection of a Topic," "Clarification of critical terms: identifying the quantifiable," "Functional decomposition: dividing the project into parts," and "Synthesis of computable and non-computable parts." (See also, in Chapter 2, Appendix I, "Finding the Computable in Various Styles of Criticism.") Just as the ancient gods divided heaven from earth, so we must divide the computable from the uncomputable. The test of intrinsic worth should always come first, before assessing computability; then we can see where (if anywhere) we may employ mechanical help. We should keep looking for a topic until we find one that has real significance. This is a better method than looking first at what is computable and then trying to find a justification for it.

"Write the report in advance, before you do the research"

Plan your goals in advance. To plan a project is to create its framework. Mathematician Richard W. Hamming gave the apparently paradoxical advice:

Write the final paper in advance, either in your mind or on paper, before you do the research. 12

Despite its apparent absurdity, the method is inherently sound, for the fictitious "final report" establishes a coherent structure into which the real results will fit once they are actually obtained. The plan will, of course, be revised as you go along. This is the method used in the step of the Project Planner called Output layout: what is the goal?

The "Hamming method" is helpful even when the real results turn out to be the exact opposite of the results predicted earlier. Even if the original hypothesis was completely wrong, a clear demonstration of why it was wrong may also have a beneficial result. Should a theory be a complete blind alley, the researcher can abandon the hypothesis and go on to more fruitful areas.

A flow chart for selecting a topic

The procedure for choosing a topic of literary worth and analyzing it for possible computability is illustrated in the flow chart in Figure 1.2. It is a pictorial view of the algorithm, or series of steps, for solving a problem. Where a functional decomposition or hierarchical chart like the one above in Figure 1.1 shows the parts of a project, but is silent about the order in which tasks will be performed, the flow chart establishes a sequence, shown by arrows linking the boxes.

How to choose a topic

Non-linear or circular thinking

Our activities do not always take a straight line; they may be circular or follow many branches, as seen in Figure 1.3, which shows decisions (represented by diamond shapes) to be made at key points. If the answer to the question "Is [the topic] worthwhile from a literary standpoint?" is "No," the arrow loops back to the top, allowing us to choose another topic. To the question "Could parts of [the topic] be quantified?" there are two possible answers: "Pursue a non-computer solution" and "Divide topic into computable, non-computable parts" (followed by "Pursue a combination solution," which is the usual answer).

Whichever route we choose, we eventually arrive at the END of the project (although, as we see below in "The Project Life Cycle," the END may not actually be the end at all, but only a new beginning).
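The decisions in this flow chart can also be written out as a short routine. The following Python sketch is an illustration under stated assumptions, not a program from the MINERVA System. The function name `choose_topic`, the two judgment functions passed to it, and the candidate topics are all hypothetical. The judgments of literary worth and of quantifiability are supplied by the scholar, since the machine cannot make them.

```python
def choose_topic(candidates, is_worthwhile, can_quantify_parts):
    """Walk through candidate topics, following the decisions of the flow chart.

    is_worthwhile and can_quantify_parts stand in for the scholar's own
    judgments; they are passed in as functions for that reason.
    """
    for topic in candidates:
        if not is_worthwhile(topic):
            continue  # answer "No": loop back to the top and choose another topic
        if can_quantify_parts(topic):
            # divide the topic into computable and non-computable parts
            return topic, "combination solution"
        return topic, "non-computer solution"
    return None, "keep looking"

# Hypothetical candidates and judgments, for illustration only.
candidates = ["count of commas", "color imagery in Pindar", "the poet's sincerity"]
worthwhile = {"color imagery in Pindar", "the poet's sincerity"}
quantifiable = {"count of commas", "color imagery in Pindar"}

topic, route = choose_topic(
    candidates,
    is_worthwhile=lambda t: t in worthwhile,
    can_quantify_parts=lambda t: t in quantifiable,
)
print(topic, "->", route)
```

Note that the test of worth comes before the test of computability, in the order recommended above: the first candidate is easily counted, but it is passed over because it fails the first question.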

Detail of choose a topic

The Project Life Cycle

The designing of a project may be compared to a journey or quest, the search for a solution to a problem. It is, however, sometimes more like a labyrinth, in which there are many different routes, and perhaps even many different outcomes. In the commercial and industrial world from which the discipline of Systems Analysis comes, the development of a project is often formally defined in terms of a Project Life Cycle. This life cycle contains phases or episodes, that are the stages of its life history. They are frequently defined thus:

Phases of the Project Life Cycle:

  1. Feasibility Study (to see if the project can or should be done),

  2. Analysis (to see how the project can be done and to compare alternative ways of carrying out the project),

  3. Design (to create an overall structure for the system),

  4. Programming (to write the individual programs that make up the system; if a vendor package is used, appropriate features are selected),

  5. Testing (to try out the system under a variety of conditions, to be sure it works),

  6. Implementation (to complete the system and make it available for use),

  7. Evaluation (to analyze the system's successes and failures, as a guide to future work, often leading to the beginning of another feasibility study).

Figure 1.4 illustrates the Phases of the Project Life Cycle. The circle shape of the wheel chart indicates the cyclicality of the process, with the last step leading back to a repetition of the first. The MINERVA System itself has gone through several life cycles of its own, and will go through more as future developments are planned. There is more about the Project Life Cycle in Chapter 2, under "The Project Life Cycle".
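The cyclical character of the wheel chart can be expressed very simply in code. This is a minimal Python sketch for illustration, in which the names `PHASES` and `next_phase` are assumptions of ours. The phases are kept in a list, and the step after the last one wraps around to the first.

```python
PHASES = [
    "Feasibility Study",
    "Analysis",
    "Design",
    "Programming",
    "Testing",
    "Implementation",
    "Evaluation",
]

def next_phase(current):
    """Return the phase that follows the current one, wrapping around the wheel."""
    position = PHASES.index(current)
    return PHASES[(position + 1) % len(PHASES)]

print(next_phase("Design"))
print(next_phase("Evaluation"))
```

The remainder operation (`%`) is what closes the circle: Evaluation leads back to a new Feasibility Study instead of running off the end of the list.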

Project Life Cycle

Cicero on the need for breadth of experience

Basketball players

"Just as ball players do not use the specific skills of gymnastic exercise in the game itself, but their very movements show whether they have learned the arts of the gymnasium or have had no such training, ... thus in our very speeches in the law courts, in the public assembly, and in the Senate, is easily revealed whether the speaker has simply wallowed around in his declamatory work, or whether he has approached his task of speaking fully instructed in all the liberal arts." (-Cicero, De Oratore, translated by C.A. Sowa)

Just as ball players and public speakers benefit from a broad array of experience even in the practice of their principal talent, so too, comprehensiveness of training, not just in science but in the arts and other fields of human endeavor, is necessary for the best use of the computer. (Illustration: basketball players on the West Fourth Street Courts, New York City, photo by C.A. Sowa.)

Comprehensiveness of training always shows

Two thousand years ago, Cicero took the position in his De Oratore that the finished orator must know not only oratory, but philosophy and all other disciplines that bear upon its practice. Such knowledge shows itself in the works produced by such a person, whether or not it seems relevant to the precise action of the moment. Crassus, the spokesman for Cicero's views in the dialog, speaks as follows (as translated from the Latin):

Cicero on the training of an orator:

. . . I feel that no one should be numbered among the orators who is not accomplished in all those arts that befit the independent man; for even if we do not actually use them in our speaking, it is nevertheless apparent and obvious whether we are ignorant of these subjects or have been trained in them. Just as ball players do not use the specific skills of gymnastic exercise in the game itself, but their very movements show whether they have learned the arts of the gymnasium or have had no such training; and just as with those who portray something, even if they are not at the moment employing the art of painting, yet it is not hard to see whether they know how to paint or lack this knowledge; thus in our very speeches in the law courts, in the public assembly, and in the Senate, even if the other arts are not expressly brought into play, nevertheless it is easily revealed whether the speaker has simply wallowed around in his declamatory work, or whether he has approached his task of speaking fully instructed in all the liberal arts.13

The use of the computer, too, requires for its most effective practice more than just a narrow proficiency in the operation of machinery or the writing of computer programs; it calls instead for a knowledge of and respect for subjects that touch upon it. Simple use of the computer should ideally be supplemented by knowledge of the history and philosophy of computing, reflection upon the limits of computability, and acquaintance with the pleasures of mechanical objects as well as with the delights of art, music, and literature. Humanists could reclaim science itself as one of the humanities (instead of making the humanities a branch of science -- or pseudoscience), by bringing to bear on the art of computing the full range of human experience. But then, every field worthy of note, from Classical literature to music to filmmaking to shipbuilding, in fact contains within it, in differing forms, all human experience.

b. How The MINERVA System is Organized


Theseus used a ball of thread to find his way through the Labyrinth; we use various techniques and methods to navigate our way through a project. (Illustration: the ruins of the Palace of Minos at Knossos, Crete, perhaps the original of the Labyrinth, photo by C.A. Sowa)

Division of MINERVA into its major parts

The MINERVA System is organized as shown in the hierarchical chart in Figure 1.5 (also called a functional decomposition for its division into "functions" or parts). It illustrates the two major parts of the project, the programs and the text chapters, and the accompanying "Help" screens for all the programs.

Overall view of MINERVA

Organization of the MINERVA programs

The MINERVA programs

The programs of the MINERVA System are represented by the chart in Figure 1.6. (For more detail, see Chapter 2, Figure 2.63, and Chapter 8, Figure 8.1.)

The programs of the MINERVA System fall into three groups:

  • The Systems Analysis Tutorial/Project Planner. This contains programs for building your own project, with introductory screens and interactive input screens. All programs have links to descriptions in Chapter 2, "Starting Out: How to Plan a Project."

  • MINERVA Programs for Literary Analysis, with links to descriptions in Chapter 3, "Picturing the Problem."

  • OwlData programs for creating your own data to use in the programs for literary analysis, with links to descriptions in Chapter 4, "Completing the Circle."

Relationship to text chapters

The program screens link directly to "Help" screens (accessed by clicking on the "Click Here for Directions and Link to Chapter (2, 3, or 4)" button on each screen), which contain condensed directions for using each program, with buttons to click for links to selected paragraphs of the text chapters of The Loom of Minerva. Below the chart are listed all the programs available in the system. The list has links to the same text chapters that you would get to from the programs.

Overall view of programs


Programs of the Systems Analysis Tutorial/Project Planner

  1. Selection of a topic
  2. Descriptive paragraph
  3. Worksheet
  4. Clarification of critical terms: Identifying the quantifiable
  5. Functional decomposition (hierarchical division into modules)
  6. Output layout
  7. Input requirements
  8. Flow chart(s)
  9. HIPO (Hierarchy plus Input-Process-Output) charts, with inputs and outputs for each step
  10. Program(s)
  11. Test plan
  12. Execution of the program
  13. Synthesis of computable and non-computable parts
  14. System dependencies: using a PERT (or "bubble") chart
  15. Evaluation of results
  16. The Project Life Cycle


Programs of the Minerva Program Suite: Programs for Literary Analysis

  1. CATMAP: visually mapping pairs of opposed categories (Example: Sainte-Beuve's Baudelaire: "good" and "bad" words in Les Fleurs du Mal)
  2. LOOKUP: determining the frequency of selected words (Example: Wight Duff on Vergil's Georgics)
  3. CONCORD: a concordance program (Example: Edna St. Vincent Millay's Travel)
  4. HUNTER: searching for words (Example: Onomatopoeia in Victor Hugo's sea poems)
  5. COOCCUR: searching for cooccurrences of pairs of words (Example: word pairs in Hugo's sea poems)
  6. GERTRUDE: statistical analysis (Example: Repetition in Gertrude Stein's Tender Buttons)
  7. COMPOSE: "playing back" the linguistic rules of composition (Example: A program to compose paragraphs "in the style of" Shakespeare, Chomsky, or any other author)
  8. CLUMPS: using cluster analysis to find associated ideas in a text (Example: Nature and solitude in Shelley's Alastor)
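The ideas behind LOOKUP (frequency of selected words) and COOCCUR (cooccurrences of pairs of words) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MINERVA code, which is written in Visual Basic: the function names `tokenize`, `lookup`, and `cooccur` are hypothetical, the rule for splitting a line into words is a simplification, "cooccurrence" is assumed here to mean "appearing in the same line," and the sample lines are invented for the demonstration.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(line):
    """Lowercase a line and split it into words (an assumed, simplified rule)."""
    return re.findall(r"[a-z']+", line.lower())


def lookup(lines, selected_words):
    """Count how often each selected word occurs in the whole text."""
    counts = Counter(word for line in lines for word in tokenize(line))
    return {word: counts[word] for word in selected_words}


def cooccur(lines, selected_words):
    """Count the lines in which each pair of selected words appears together."""
    wanted = set(selected_words)
    pairs = Counter()
    for line in lines:
        # Sorting gives every pair a fixed order, e.g. ("dark", "sea").
        present = sorted(wanted & set(tokenize(line)))
        pairs.update(combinations(present, 2))
    return dict(pairs)


# Invented sample lines, used only to exercise the functions.
sample = [
    "The sea is loud, the sea is dark,",
    "the wind and the sea together roar;",
    "the dark wind sleeps.",
]

print(lookup(sample, ["sea", "wind", "dark"]))
print(cooccur(sample, ["sea", "wind", "dark"]))
```

Counting by line is a design choice made for brevity; a window of several lines, or a sentence, would serve equally well as the unit within which two words are said to cooccur.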


Programs of the Minerva Program Suite: OwlData Programs for Creating Data

  1. MAKETEXT: a utility for creating small test files of original text input in the MINERVA format
  2. ASCITEXT: a utility for reformatting ASCII texts downloaded or copied from the Internet or elsewhere
  3. MAKEDICT: a utility for making an alphabetized list of words, without duplicates, from a text
  4. MERGLIST: a utility for merging two alphabetized lists
  5. CATDICT: a utility for creating sublists and tagged lists
  6. GERTTEXT: a utility for making flagged input texts
  7. PHRASES: a utility for creating phrase lists for random composition
  8. LEMMA: a utility for making a canonized dictionary, with up to three "stems" per word
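As a rough illustration of what utilities like MAKEDICT and MERGLIST do, the following Python sketch builds an alphabetized word list without duplicates and merges two such lists. It is not the MINERVA source: the names `make_dict` and `merge_lists` are hypothetical, the word-splitting rule is simplified, and dropping words that appear in both lists is an assumption about the desired result of a merge.

```python
import re


def make_dict(text):
    """Return the distinct words of a text in alphabetical order."""
    return sorted(set(re.findall(r"[a-z']+", text.lower())))


def merge_lists(first, second):
    """Merge two alphabetized word lists into one, dropping duplicates."""
    merged = []
    i = j = 0
    # Walk both lists in step, always taking the alphabetically smaller word.
    while i < len(first) and j < len(second):
        if first[i] < second[j]:
            word = first[i]
            i += 1
        elif second[j] < first[i]:
            word = second[j]
            j += 1
        else:
            # The same word is in both lists: keep one copy, advance both.
            word = first[i]
            i += 1
            j += 1
        if not merged or merged[-1] != word:
            merged.append(word)
    # One list is exhausted; copy what remains of the other.
    for word in first[i:] + second[j:]:
        if not merged or merged[-1] != word:
            merged.append(word)
    return merged


print(make_dict("The owl and the loom"))
print(merge_lists(["and", "loom", "owl"], ["loom", "thread"]))
```

Because both inputs are already in alphabetical order, the merge needs only a single pass through each list and no further sorting.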

Executable programs and source code

The disk contains the programs of the MINERVA System in both the compiled or executable version, which is what the user sees, and the source code, which is the version that can be changed or altered by the programmer. For the benefit of anyone wishing to experiment with the source code, text Chapters 7, 8, 9, and 10 include a complete explanation of how to use Visual Basic, the language in which most of the programs are written, and a technical description of each MINERVA program.

In their present form, these programs are copyrighted by the author. The reader should feel free to make modifications for research or teaching purposes, but is asked to credit the original author whenever such work leads to publication or public presentation.

Organization of the text chapters of The Loom of Minerva

Figure 1.7 shows the organization of the text part of the MINERVA System, the chapters of The Loom of Minerva. The chart depicts the three sections of the text. Verbal descriptions follow below the chart.

Chapters 2-4 describe specific programs, which are linked to these chapters. Each chapter also contains links to Chapters 5-10, which contain historical or technical supporting material. These other chapters can also be browsed on their own as desired.

Overall view of chapters


  • Part I, "The Making of a Literary Project": The chapters in this part (which includes the present chapter and Chapters 2, 3, and 4) lay out the steps for developing a literary project, illustrating the method with examples from literature and literary criticism. Chapter 2 describes the Project Planner; Chapters 3-4 describe the programs of the Minerva Program Suite.

  • Part II, "Variations on the Computer Theme in Literature": These chapters survey some applications of computers and computer-like devices to literary scholarship from the oldest beginnings to the present. Strategies of quantification, including statistical and other methods, are emphasized, as well as the influences of technology on literature, from antiquity to the present day. Synopses of modern scholarship in literature using a computer are provided, including a number of different applications.

  • Part III, "A Visual Basic Implementation of the MINERVA System," is a manual of Visual Basic that uses top-down methods to create modular programs. The MINERVA programs are also analyzed from a technical point of view. Testing and debugging (i.e. error-correcting) methods and other ways for "making your project work" are explained.

A closer look at the contents of the chapters

In Figure 1.8, we draw a detailed visual Table of Contents for the entire text, showing the individual chapters that make up each major part. As we did for the more general chart in Figure 1.7, we append verbal descriptions of the contents of each chapter.

Detail view of chapters


Synopsis: The first part demonstrates the steps for creating a project and illustrates the programs of the MINERVA System.

  • Chapter 1, "A Guide to the Labyrinth: The Problem and its Solution," is the introductory chapter, which you are reading. Some of the most important principles of modular design are demonstrated in it, such as the setting of a goal and the development of a step-by-step plan.

  • Chapter 2, "Starting Out: How to Plan a Project," describes the sequence of steps for designing and completing a project. They extend from "Selection of a Topic" through the final "Evaluation of Results." (For the actual steps, see above, under "Programs of the Systems Analysis Tutorial/Project Planner.") Coleridge's Rime of the Ancient Mariner, with criticism by Robert Southey, Charles Swinburne, and Walter Pater, is used as a case study to demonstrate the planning of a model project. We analyze different styles of criticism, seeing which parts of a topic can be subjected to quantifiable (and computerized) methods. Your opinion of what can be quantified may change as you analyze your topic and read the chapters. Programs in the Project Planner are linked to this chapter.

  • Chapter 3, "Picturing the Problem: The MINERVA Programs for Literary Analysis," describes the MINERVA programs that perform specific tasks, such as making concordances, gathering statistics on grammatical categories, making visual maps of key themes in a text, or discovering clusters of cooccurring word stems. As examples, it analyzes literary works by various writers, including Vergil, Baudelaire, Victor Hugo, Shelley, Edna St. Vincent Millay, and Gertrude Stein, together with a selection of critics for each author, from Sainte-Beuve to Swinburne to Gertrude Stein (on her own work). For each problem, we describe the use of one of the MINERVA programs. In addition to programs to study existing literature, there is a program that enables the computer to create "original" compositions "in the style of" an author, including Shakespeare and Noam Chomsky. (For the names of the programs, see above, under "Programs of the Minerva Program Suite: Programs for Literary Analysis.") Programs in the Minerva Programs for Literary Analysis are linked to this chapter.

  • Chapter 4, "Completing the Circle: Creating Data for New Projects," tells the reader how to use the OwlData programs to create original data for the MINERVA programs. What we put into our data determines what type of answers we can hope to get from it ("Garbage in, garbage out" is the common phrase, or, as we might put it, "Poetry in, poetry out"). (For the names of the programs, see above, under "Programs of the Minerva Program Suite: OwlData Programs for Creating Data.") OwlData programs are linked to this chapter.


Synopsis: The second part is an excursion into various applications of quantified methods in literary studies, emphasizing strategies of quantification, even in apparently unquantifiable material. Where the first part took as its principal topic the use of the MINERVA System, the second part is an analytical survey of work done by a variety of scholars, writers, inventors, and thinkers.

  • Chapter 5, "Statistical Methods and Literary Style," discusses common statistical procedures, and a few less common ones, that have been applied to literary material, and describes how the computer can be used in their application.

  • Chapter 6, "The Kinds of Projects That Have Been Done in Literary Computing," describes the history of computing both as analyzer and subject of literature, and discusses recent applications of computers to the study of literature.

    The first part of the chapter sketches the relationship between computer-like concepts and literature from Homeric antiquity and the Middle Ages to the modern day. Starting from Homer's fictional robots (and Hero of Alexandria's real ones) and ancient calculators like the Antikythera Mechanism, we proceed to Lull's mechanized logic in the Ars Magna of the 13th century, Leibniz' calculator of the 17th century, John Clark's Eureka machine of 1845 to compose Latin hexameters, and other forerunners. Moving up to the "giant brain" computers of the 1940's (one of which was programmed by Alan Turing to compose a love letter), we conclude with the Internet and the multimedia databases of today.

    The second part of the chapter presents a discussion of applications using modern computers, mostly beginning in the 1960's (although some are earlier) and extending to the present day. Different types of literary problem are arranged in a progression from studies of individual sounds (such as alliteration, versification, and the mechanized scansion of Classical poetry) through studies of words and vocabulary up to studies of themes, concepts, and the wider external context of a work of literature. We end by discussing programs to "write" stories and poetry, including a "McPoet."


Synopsis: The third part (strictly for techies!) is the programming section of the course. It is for students who wish to write their own programs in Visual Basic, or modify the ones provided on the CD. The top-down, modular methods of program development follow the same basic principles as are presented in the rest of the course, and can be applied to any programming language. Information is included on both Visual Basic 5/6 and Visual Basic.NET. Every programming language has its own idiosyncrasies, its own personality, its own style. Visual Basic is descended from BASIC, one of the oldest languages. It was originally a simplified language for beginners. You will find that in the MINERVA System we are creatively stretching the language to perform tasks that might surprise its designers.

  • Chapter 7, "Programming the Problem," begins with a brief description of various programming languages and how they are used to create programs that tell the computer what to do. Instruction is provided in the use of Microsoft's Visual Basic to create programs like those of the MINERVA System.

  • Chapter 8, "A Model Project for Literary Analysis: The MINERVA System," contains technical discussions of the Visual Basic programs that are currently included in the MINERVA System. The programs themselves are not reproduced, as they can be examined or printed from the disk.

  • Chapter 9, "Formulaic Programming in Visual Basic: Building Blocks of Programs," contains a small primer or manual of the commands of the Visual Basic programming language. Like Homeric poetry, computer programs are formulaic, and program commands are grouped according to the "formulas" they represent, like Input and Output, Searching for a Word, Sorting a List, Mathematical Operations, or Character String Functions. Only a subset of Visual Basic is included. Specialized features of the language have been omitted; they are not used in the MINERVA programs. This simplification not only makes the language easier to learn, but makes the programs more portable to other languages.

  • Chapter 10, "Do-It-Yourself: Techniques for Making Your Projects Work," presents some structured ways of making sure that your program works. Testing and correcting programs, so that they work as intended, is an important part of project development. The same techniques of Systems Analysis that we use to design a project are used to create a test plan, embodying the algorithm for testing a program or system. Technical Appendixes provide some basic definitions of computer workings, including data organization, the ASCII code for representing data in zeroes and ones, and the functions of Boolean logic used in the computer's computation.

Screen Images: What you will see when you run MINERVA

Build your own project or run individual programs

The illustrations below are a sample of what you see when you run the MINERVA programs. The first screen you see is the Minerva Startup screen. Choose "Run the programs" to go to the Minerva Menu, from which you can choose to build your own project using the Tutorial in Systems Analysis/Project Planner, or choose individual application programs from the Minerva Program Suite.

Below, you see the Startup screen, the Minerva Menu, the Minerva Information screen, the Systems Analysis Tutorial/Project Planner Menu, the Description screen for the Planner, and four of the Tutorial/Planner steps, "Selection of a Topic," "Descriptive Paragraph," "Clarification of Critical Terms," and "Functional Decomposition: dividing the project into parts," with their input screens. Also shown are screens from the programs CONCORD, CATMAP, GERTRUDE, COMPOSE, and CLUMPS, with some of their results. (The preceding links are to Chapter 2 for the Project Planner and to Chapter 3 for the individual programs.)

CONCORD creates a simple concordance of all words in a text, with the lines where each word occurs; CATMAP displays a "category map" showing the interplay of pairs of thematic words in the text; GERTRUDE collects statistics on the frequencies of grammatical forms in a text; COMPOSE uses a random number generator to "compose" paragraphs "in the style of" an author (in our example, we create a parody of noted linguist Noam Chomsky); and CLUMPS finds clusters of cooccurring word stems used in a text.
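To make the notion of a simple concordance concrete, here is a minimal Python sketch of the kind of result CONCORD is described as producing: every word of a text, listed alphabetically with the lines in which it occurs. It is an illustration only and does not reproduce the Visual Basic program; the function name `concordance` and the word-splitting rule are assumptions, and the two sample lines are the opening of Coleridge's poem.

```python
import re
from collections import defaultdict


def concordance(lines):
    """Map each word to the (line number, line text) pairs where it occurs."""
    index = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # A set ensures a line is listed once per word, even if the word repeats.
        for word in set(re.findall(r"[a-z']+", line.lower())):
            index[word].append((number, line))
    # Return the entries in alphabetical order of the headword.
    return dict(sorted(index.items()))


poem = [
    "It is an ancient Mariner,",
    "And he stoppeth one of three.",
]

for word, occurrences in concordance(poem).items():
    for number, line in occurrences:
        print(f"{word:10} {number:3}  {line}")
```

Each printed row gives a headword, the line number, and the full line as context, which is the basic layout of a printed concordance.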

The Minerva Startup Screen

Minerva Startup

The Minerva Menu

Minerva Menu

What you will see if you request "More Information About Using MINERVA"

Help for MINERVA users

What you see if you choose "Menu of the Systems Analysis Tutorial/Project Planner"

Systems Analysis Menu

What you see if you choose the "Description of the Tutorial/Planner"

HELP for Systems Analysis


What you see if you click on "Selection of a Topic"

Selection screen

Sample input, after you have clicked on "Start Your Own Project File"

Selection Input screen

What you see if you click on "Descriptive Paragraph"

Descriptive Paragraph screen

Sample input, after you have clicked on "Create a Paragraph About Your Own Project"

Paragraph Input screen

What you see if you click on "Clarification of Terms"

Clarification screen

Sample input, after you have clicked on "Identify Quantifiable Parts of Your Own Project"

Clarification Input screen

What you see if you click on "Functional Decomposition"

Functional Decomposition screen

Sample input, after you have clicked on "List Your Modules (Text Form)"

Functional Decomposition Text Input screen


What you see if you click on "CONCORD" on the MINERVA Menu and request a concordance to Coleridge's The Ancient Mariner

CONCORD screen filled in

Some results of CONCORD

CONCORD results

What you see if you click on "CATMAP" on the MINERVA Menu and request a map of words for "animals" and "forgiveness" in Coleridge's The Ancient Mariner

CATMAP screen filled in

Some results of CATMAP

CATMAP results

An alternative suggested representation of the results of CATMAP, illustrating rising and falling frequencies of word categories in different parts of the poem

CATMAP results as a graph

What you see if you click on "GERTRUDE" on the Minerva Menu and request statistics on Gertrude Stein's "Book"

GERTRUDE screen filled in

Some results of GERTRUDE

GERTRUDE results

What you see if you click on "COMPOSE" on the Minerva Menu and request a paragraph "in the style of" linguist Noam Chomsky

COMPOSE screen filled in

Some results of COMPOSE, created with a random number generator

COMPOSE results

What you see if you click on "CLUMPS" on the Minerva Menu

CLUMPS Gateway

What you see if you click on the Python version, requesting word clusters in Coleridge's The Ancient Mariner

CLUMPS Python version

Some results of CLUMPS

CLUMPS results

An alternative suggested presentation of the results of CLUMPS, showing stems shared between clusters

CLUMPS results as a graph

Notes to Chapter 1

1. Dante, The Divine Comedy (translation by C.A.Sowa).

2. Quoted in Margaret Cheney, Tesla, Man Out of Time, New York: Dorset Press, 1981, p. 107 (reprinted, paperback, Bantam Doubleday Dell, 1998).

3. Catia project-planning software is part of Dassault Systèmes' Product Lifecycle Management Suite, which includes Catia (computer-aided design), Delmia (to control manufacturing), and Enovia (to manage the database of designs and specifications). Various parts of this suite are being used in the 21st century by Boeing and Airbus to design, manufacture, and maintain their new airplanes.

David Packard Jr.'s Ibycus System is best known for its use in the Thesaurus Linguae Graecae, but it had many other applications. Stephen V.F. Waite's Logoi System, with which the camera-ready copy for my book Traditional Themes and the Homeric Hymns was created, used a typesetting program that ran on the Ibycus System. Ibycus had its own machines, its own operating system, and its own programming language, so that anyone could write new applications.

4. Marvin Minsky (1967): "any procedure which can be precisely described can be programmed to be performed by a computer." See Chapter 6, Note 20.

5. S.M. Parrish, in Edmund Bowles (ed.), Computers in Humanistic Research, Englewood Cliffs, NJ: Prentice-Hall, 1967, p. 125.

6. R.W. Hamming, Numerical Methods for Scientists and Engineers, New York: McGraw-Hill, 1962, p. V.

7. C.B. Bazzoni, Kernels of the Universe, New York: George H. Doran, 1927, p. 16.

8. R.A. Frosch, quoted in the IEEE Spectrum, Sept. 1969, pp. 24-28.

9. Henry Petroski, To Engineer is Human: The Role of Failure in Successful Design, New York: St. Martin's Press, 1985, p. 78.

10. Edward Teller, in a lecture on "Inertia" delivered at UCLA, Oct. 10, 1960.

11. E.W. Dijkstra, in Information Processing 62 (IFIP Proceedings 1962), Amsterdam: North-Holland Publishing Co., 1962, p. 538.

12. R. W. Hamming, in a talk entitled "You and Your Research," at IBM in Poughkeepsie, NY, August 8, 1969.

13. Cicero, De Oratore I. XVI. 72-73 (translation by C.A.Sowa).

Copyright © 2009, Cora Angier Sowa. All rights reserved.
