He had recently been doing research, he told us, on the influence of religious groups on the level of political participation by their members; in fact, he and his coworkers had developed a model that only one religious group appeared not to fit--the Lutherans. Curious to find out more about these people, he consulted the massive online catalog of Harvard University holdings (and other libraries) and identified some 15 titles of possible interest. Being, as he said, an "affluent University professor", he sent a graduate assistant off to find and retrieve the books--from several different locations as it turned out. When the books arrived, it took, Verba explained, about 1 minute to recognize that at least 11 of them were of no use at all. This disappointing result came only after a considerable expenditure of time and money by not only the information seeker, but by others involved in the process as well. He also lamented, at another point in his talk, the crudeness of most Internet searching--not the moral quality, so to speak, but the very coarse sieve through which most such search results are filtered. What can I--or any serious researcher--do, he almost pleaded, with 1,000 answers to even a fairly simple inquiry?
During the past year I have also attended a number of workshops and conferences on the Internet and on teaching the use of the Internet to faculty and students. Almost without exception, the speakers (mostly librarians) at these conferences accept the twin dogmas that the Internet is the "new literacy," the wave of the scholarly publishing future, and that faculty and students can now use the Internet to bring into their home, office, or classroom a vast array of valuable information and scholarly resources. Indeed, introducing researchers to the Internet has become a kind of moral imperative for many academic librarians, who apparently believe the ordinary world of print publishing to be a rapidly fading anachronism. At one institution with which I am familiar, for instance, the Internet is being inflicted on some 600 English Composition students as part of their so-called research training; as if they didn't have enough trouble just reading and writing.
On the street, one can hear the Internet called "datatrash". It has been described as a "toxic waste dump," a fairy tale, a river we can't step into even once, and as a haystack. Let's go further. The Internet may be likened to a haystack moving at, or near, the speed of light; any needle in it we might wish to find is simultaneously undergoing the corresponding Lorentz contraction. Ted Nelson, of all people, was quoted in the September issue of Atlantic Monthly, of all places, to the effect that the "so-called information age is really the age of information lost." In the same article, John Updike offered the opinion that fiction on the Internet is mostly "roadkill" anyway.
What's going on here? Trouble on the offramp to River City? The little boy who revealed the emperor to be without clothes did not necessarily mean to suggest that the unfortunate monarch was deserving of no respect whatsoever. Just so, it is not my intention to malign the Internet as of no value at all to librarians or their customers. But I believe that we often expect too much of it, and similar resources, and that we transfer this optimism to our customers without due regard for the problems and speedbumps. We encourage a faith that frequently is unfounded, and divert many library patrons from more appropriate, often print, resources. Turning our customers loose on the Internet is, to embrace a uniquely Arizona image, rather like sending them into the OK Corral with a slingshot--in other words, unarmed.
The first thing we need to realize is that the Internet is not a thing, even though I will sometimes speak of it that way. It is, at its most basic, merely an electronic communications network. To speak of using the Internet to find this or that piece of information, or to locate a specific source of information, is to treat the Internet as though it were a single and coherent compendium. But all of the techniques that are common to Internet access in nearly every electronic environment are rather more like the light switch in the reading room of a library than they are like a guide to the collection of items contained in that room. In particular, no coherent or global strategy for locating information and information sources will as yet yield useful results on the Internet. No useful filtering or discriminating mechanism has yet been developed for searching across the Internet that will sift out irrelevant (or less relevant) information while leaving the most relevant resources unscathed. In a word, no serious indexing exists for the Internet and is not likely to exist in the near future. I shall return to this difficulty in a moment.
Some critics are finally beginning to notice that the relative lack of scholarly content on the Internet is, as Ed Shreeves remarks in the July 1994 issue of The Journal of Academic Librarianship, a "serious impediment to the use of electronic resources." Indeed, until fairly recently, most electronic publishing on the Internet failed to pass the So What? test. That is, much of it was not of sufficient scholarly importance or interest to warrant the effort of trying to identify and control it. While this situation is beginning to change, it remains entirely unclear that scholars and researchers will rush to publish their findings on the Internet as an alternative to traditional print forms. There are many reasons for this, but I think they can conveniently be grouped into just two categories: the economic, and the academic.
As I write, most of what appears in any form on the Internet is a matter of altruism: little or no money is to be made from what is published there. There is no good reason to believe that electronic journals, at least at first, will be significantly cheaper to produce than their print counterparts. The Johns Hopkins University Press, for instance, has indicated that the print version of one of its journals might cost a subscriber $100, while the electronic version would be available at the bargain price of $90. Print publishers would have to spend considerable money retooling for electronic publishing unless they already have experience and personnel in that arena; new marketing and distribution networks would require cultivation; and advertising, especially as it appears in medical journals, might not work at all in the electronic environment. And, at least at the moment, large commercial publishers still dominate the production of academic and scholarly journals; about 100 publishers account for about 70% of the total journal output. It is, as Michael Gorman has remarked, "hard to see how a lucrative industry that is doing very well, thank you, would have any incentive to change to a method of publishing with murky financial prospects." This may be why many of the several dozen electronic journals that do now in fact exist have been started by universities or university departments. (One of the participants in the OCLC videoconference noted, in comparing non-profit and commercial publishing, that some information coming from the private sector costs as much as 40 times what the same or equivalent information costs--or would cost--coming from non-profit organizations.)
A central concern for any commercial publisher contemplating jumping into the electronic marketplace must surely be the twin bugaboos of copyright and plagiarism. In general, I think it is fair to say that the issues of copyright protection, plagiarism, fraudulent duplication, and textual integrity all remain to be resolved on the Internet.
The concept of copyright could only appear in a print society, largely because in oral and even manuscript cultures, texts never stabilize sufficiently to become an objective property. In fact, it wasn't until the 18th century that something like the modern concept of copyright was developed, assigning legal reality and definition to the proprietary author and the literary work. The first serious damage to this framework was done by the photocopier; libraries are required to place a warning above each photocopy machine, to which no one pays the slightest attention, about the legal limits on reproducing copyrighted material. Anyone using such a machine can violate most copyright restrictions in a matter of minutes without fear of discovery or punishment.
There are two general issues here:
1) We now have access, largely unmonitored and uncontrolled (and uncontrollable), to vast amounts of information and text over an ordinary telephone line. In most instances, one way or the other, the reader of these texts can bring them over to his or her own workstation, store them for later retrieval, and print and distribute them at will. Worse yet, the electronic version of the text itself can be redistributed to an essentially unlimited number of other readers (if they have access to the same telecommunications network) by the original reader with just a few keystrokes. Scott Bennett has remarked, in this connection, that "ease of copying is now the greatest threat to the monopoly position of copyright holders." The recent federal appeals court decision in American Geophysical Union, et al., v. Texaco Inc. suggests that journal publishers are still very much concerned to interpret "fair use" as restrictively as possible, and distinctly to their own advantage.
The concepts of copyright and plagiarism depend fundamentally, however, on the stability, objectivity, accessibility, and preservation of the text that is said to be copyrighted and therefore protected from unauthorized copying and plagiarism. And this is the second issue.
2) The purely electronic text is inherently unstable. For the author of an electronic text, the malleability of such text is, of course, extremely convenient; anyone who has written anything using a word processing program will surely agree. And electronic authoring almost certainly leads to greater quantity, if not quality, of textual matter. But we are immediately faced with a resistant ambiguity in the concepts of publication, textual integrity, authorship, and referent of bibliographic and indexing citations. Print-to-paper publishing neatly solves these problems: the printed page is, in a sense, self-archiving. But if a text exists only in electronic form, what precisely counts as "publication"? How do we determine the genuinely "final", or authoritative version (indeed, the version actually by the original author)?
If there is such a version, where is it? Or, to put it another way, if bibliographic citation is to past work in a discipline, exactly which past are we talking about? One imagines the following defense against a charge of plagiarism: "Well, your honor, that certainly is not the text I was looking at." Or perhaps, "Well, your honor, that certainly is not the text I uploaded and published on the Internet."
Finally, there is the matter of job security. The academic subculture of tenure and promotion decisions, of peer review, of refereed journal publication, and of citation counting and impact factor calculations cannot be very comfortable with many features of electronic publishing. This is very much a chicken-and-egg problem. Until electronic journals can be seen as legitimate participants in this academic environment, scholars and researchers will be reluctant to submit their best work to journal titles in electronic format; but until scholars and researchers are willing to submit their best work to journal titles in electronic format, electronic journals will not be regarded as legitimate players in the making of academic stars. There will inevitably continue to be some concern, in other words, that databases built without an acceptable and well-understood level of peer review and academic respectability will become "scholarly flea markets."
Here we find our connection to the problem of indexing the Internet. In the academic village, it often seems to matter less what is the quality of an article, than in what journal it appears. A "high-impact" journal, one to which reference is made often by other, equally high-impact journals (and authors), counts for more in the promotion and tenure calculus than lesser lights. The key to identifying these journals, and the articles that appear in them, is of course indexing. And there is, not surprisingly, a hierarchy among indexing publications as well. Nearly every U. S. pharmaceutical company has a policy, for instance, that they do not submit papers to journals not indexed in Medline.
Indexing is in fact an important component of the meaning of "publish." Part of what it means to publish something is to make that something available to the (relevant) public for scrutiny and review. This is as much a part of the peer review process as is the pre-publication review by a journal's editor and colleagues. And this involves several essential steps in the authoring and distribution process, nearly all of which are either absent or ill-understood in the electronic environment: creation of multiple, identical copies of the publication readily available to scholars for comparison and evaluation; an "address" for the publication to which bibliographic reference may reliably be made; standard locating mechanisms that distinguish the given publication from all others; and standard and comprehensive mechanisms for providing intellectual access to the item in question. In what sense, that is, may someone be said to have published something, made it available to the scholarly community, if--after publication--it becomes lost on the Internet? A tenure committee probably does not want to hear something like: "Well, I'm sorry if you can't find my startlingly original paper on chaos theory; I know it's out there somewhere."
No one believes, of course, that all of the Internet requires, or even deserves, indexing. Much of the traffic on the Internet is as ephemeral as it is ethereal. But certain chunks, especially those that represent genuine contributions to knowledge or are original research reports, will want bibliographic control no less than such information now receives in printed form. It is important to recognize that this level of access requires document-level identification, something virtually nonexistent now on the Internet. Some of this indexing will be done, as it should be, by major indexing services; the mere format of an electronic journal need not demand any different treatment by the indexing community (although problems remain as to precisely what the indexing and bibliographic pointers will "point" to). And indeed, any electronic journal aspiring to scholarly status would surely demand to be included in the coverage of one or more of the traditional indexing publications. An electronic chemistry journal, for instance, that did not receive coverage by Chemical Abstracts would be (in the postmodern way) severely marginalized in the realm of chemical discourse.
The temptation is to suppose that, because the Internet is already in machine-readable form, indexing the Internet need involve nothing more than asking a machine to read it. This is a frequent theme in discussions, both on and off the Internet itself, of this problem. In fact, when online library catalogs first became common, the suggestion was often heard that traditional cataloging practices (assignment of subject headings, for instance) would no longer be necessary; keyword searching was the answer to our prayers for fast and efficient subject searching. One occasionally still encounters this foolish idea, even within the profession. The assumption is, we know, quite false. It is not for nothing that the makers of large and complex databases invest considerable sums of money in indexing and vocabulary control to provide effective access to their data files. It is entirely obvious that intellectual indexing, vocabulary control, and structured search techniques are even more important in electronic data files than in printed files, precisely because of the great size of the databases and the genuinely remarkable power of the searching algorithms. But neither is this just a search engine problem, or not merely a search engine problem. A search and retrieval device or mechanism is only as good as that upon which it is asked to operate.
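The difference between keyword searching and searching under vocabulary control can be made concrete with a toy example. What follows is only a minimal sketch in Python; the four records, the authority file, and the function names are all invented for illustration and do not describe any actual catalog or indexing service.

```python
# Hypothetical records: free-text titles plus indexer-assigned subject headings.
RECORDS = [
    {"title": "A history of the motor car", "subjects": ["Automobiles"]},
    {"title": "Automobile design since 1945", "subjects": ["Automobiles"]},
    {"title": "Horseless carriages of the 1890s", "subjects": ["Automobiles"]},
    {"title": "Car sickness in children", "subjects": ["Motion sickness"]},
]

# Hypothetical authority file: variant terms lead to one preferred heading.
AUTHORITY = {
    "car": "Automobiles",
    "cars": "Automobiles",
    "automobile": "Automobiles",
    "motor car": "Automobiles",
}


def keyword_search(term):
    """Return titles that contain the term as a whole word (case-insensitive)."""
    term = term.lower()
    return [r["title"] for r in RECORDS if term in r["title"].lower().split()]


def subject_search(term):
    """Map the term to its preferred heading, then match assigned subjects."""
    heading = AUTHORITY.get(term.lower(), term)
    return [r["title"] for r in RECORDS if heading in r["subjects"]]


if __name__ == "__main__":
    print("keyword:", keyword_search("car"))
    print("subject:", subject_search("car"))
```

In this invented file, the keyword search for "car" finds one relevant title and one irrelevant one and misses two relevant titles altogether, while the search through the authority file retrieves all three works on automobiles and nothing else. The machine can read every word of the file; it is the human-assigned heading that makes the retrieval work.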
Let's think about this for a minute. One of the inflated claims made by Internet hucksters is that the network now makes possible direct access to the collections of very many of the world's great libraries. We now have, they like to say, the culture of the entire planet at our fingertips: the libraries, the museums, the archives, the galleries; you name it, it's on the 'Net.
Ted Roszak remarks drily that we have a name for visions like this: we call them "fairy tales."
Never mind the fact that many of our customers fail to understand that what they get when they access, say, Hollis or Melvyl, is only the library's online catalog, and most definitely not the books and journals themselves.
But suppose that a library patron or scholar at home really just does want merely to search the catalogs of some Internet libraries. What are the obstacles? For the unaware, that is, most of our customers, the problems add up to a nightmare.
One can learn much from Nicholson Baker's article in the April 1994 issue of The New Yorker; a great deal more, in fact, than most librarian critics of the piece understood or were willing to acknowledge. In particular, Baker reveals an intelligent and informed awareness of just what happens when a searcher goes shopping on the 'Net across a variety of library catalogs and databases.
Our hapless wanderer, for example, discovers that merely getting into, and then out of, a catalog may not be all that straightforward; in fact, escape may turn out to be impossible. He learns, probably without realizing it, that how--and if--a library has implemented authority control will substantially alter search results from one catalog to the next. She learns, also probably without realizing it, that decisions individual libraries make about the character of keyword and subject searches--what fields and subfields, for instance, are included in each and how they are combined--will similarly affect cross-catalog searching in unpredictable and significant ways. Why don't more catalogs, for instance, include their authority records in keyword searching?
Brand name shopping may not, he finds, yield the same quality at every supermarket. One library's version of the Notis, Innopac, or CARL search engine may differ significantly from that of another. Decisions about how to configure any particular search type, about which fields to include in each search strategy, and about subject and name authorities will dramatically affect the results of what appears to be the same search for an inquirer moving across catalogs, even though the catalog vendor is the same at each site. Almost never do the catalog interface and help screens reveal this crucial information. In fact, just the variety of help structures is astounding, and usually disappointing.
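A small sketch may show how much turns on these local decisions. The Python below is illustrative only: the two records, the see-reference table, and the function names are hypothetical, and the matching is a crude substring test, not the behavior of Notis, Innopac, CARL, or any other real system. The same query is run against the same records under three different configurations.

```python
# Hypothetical bibliographic records shared by every "catalog" below.
RECORDS = [
    {
        "title": "Adventures of Huckleberry Finn",
        "author": "Twain, Mark",
        "subjects": ["Mississippi River -- Fiction"],
    },
    {
        "title": "Mark Twain: a biography",
        "author": "Paine, Albert Bigelow",
        "subjects": ["Twain, Mark, 1835-1910"],
    },
]

# Hypothetical see-references taken from a name authority record.
AUTHORITY_VARIANTS = {"Twain, Mark": ["Clemens, Samuel Langhorne"]}


def searchable_text(record, fields, use_authority):
    """Build the text one catalog's keyword index would see for a record."""
    parts = []
    for field in fields:
        value = record[field]
        parts.extend(value if isinstance(value, list) else [value])
    if use_authority:
        parts.extend(AUTHORITY_VARIANTS.get(record["author"], []))
    return " ".join(parts).lower()


def keyword_search(term, fields, use_authority=False):
    """Crude substring match over whichever fields this catalog indexes."""
    term = term.lower()
    return [
        r["title"]
        for r in RECORDS
        if term in searchable_text(r, fields, use_authority)
    ]


if __name__ == "__main__":
    all_fields = ["title", "author", "subjects"]
    print(keyword_search("twain", ["title"]))
    print(keyword_search("twain", all_fields))
    print(keyword_search("clemens", all_fields))
    print(keyword_search("clemens", all_fields, use_authority=True))
```

Under these assumptions a title-only keyword index returns one record for "twain" where an index over title, author, and subject fields returns two, and a search for "clemens" finds Huckleberry Finn only in the catalog that folds authority records into its keyword index. Nothing on the searcher's screen need reveal which configuration is in force.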
Well, it's Thursday, so I must be in Canada. Or am I? Not infrequently, the actual geographic location of an Internet online catalog may be all but concealed from even the most persistent searcher. And will the National Library of Canada and the Library of Congress use exactly the same subject heading scheme, so that a consistent result in a cross-catalog subject search is guaranteed? The question only needs asking to reveal the trap.
It seems to me undeniable that the Internet contains a few information and scholarly gems, but mostly dross. And mining the ore (to retain the analogy) is uncertain at best, impossible at worst, and costly in any case. The Internet, and the information superhighway, has been oversold as the next generation in scholarly communication and academic publishing.
Having now traversed the Internet jungle with gun and camera, I want finally to consider why our customers (and here I concern myself largely with college undergraduates, although the conclusions are quite general) are unable to cope with these resources on their own, and frequently not even with considerable assistance, and why therefore expecting them to do so is professionally irresponsible. This is the tragedy in the self-sufficiency movement in public services librarianship.
What kind of intellectual, conceptual, and educational framework does the typical undergraduate bring to us, and to the library, within which to interpret and understand these sophisticated information resources? The answer is obvious to anyone who works daily with this population: virtually none. I have come to call this syndrome "bibliographic alienation." The concepts of evidence, of authority, of reasoned thought and narrative--and of how these are exemplified in the resources of a library and can be intellectually exploited--are all quite foreign to most undergraduates. In fact, higher-order conceptual skills of any kind are all quite foreign to most undergraduates. Leon Botstein calls this damaged literacy. "The actual command of the spoken and written word", he explains, "is insufficient to grasp, much less command, the realities in which we live. Even the literacy that permits the privileged in our society to graduate from high school and college is too compromised in these terms to be called a high order of literacy." Ignorance has proved to be more stubborn than anyone expected.
The Department of Education's National Excellence report of 1993 concluded that all available indicators tell us that "only a small percentage of students are prepared for . . . college-level work as measured by tests that are not very exacting or difficult."
What these results point to, it seems clear, is that for most undergraduates (indeed for most Americans) reading is an unnatural act. And when Diane Ravitch and Chester Finn asked "What Do Our 17-Year Olds Know?" in 1986, the answer they got was: apparently, not much. Of 8,000 students tested on very general knowledge of history and literature, the average score turned out to be what would normally be regarded as a failing grade.
A fascinating study done in 1992 by the U. S. Department of Education, Office of Research--aptly titled Tourists in Our Own Land--suggests that what these students actually take in college does not tend to improve this situation. To this extent, undergraduates in the library come to us ill-prepared not merely for the relatively prosaic task of using, say, indexes and reference books, but even to think clearly about what they are doing at all. The idiosyncratic structure of the Internet does not help here.
An article in the September 28 issue of The Chronicle of Higher Education announces enthusiastically that we are coming back to hypertext as the "information technology of the decade." Certain Internet access programs, such as World-Wide-Web and Mosaic, are being touted as the killer applications in this environment, as the hypertextualization, if you will, of the Internet. Michael Gorman names this crowd "technovandals." In the September 1994 issue of the magazine Chronicles, Gorman quotes this passage from a California State University planning document:
...learners increasingly can be free to determine their own learning paths divorced from the sequential, linear, directed flow of printed text, or the weight of authority. Responsibility for collecting, organizing, and analyzing information can be shifted from the provider to the end user. In the learning environment which is student centered and student controlled, learning becomes less structured and more associative, intuitive, dynamic, and potentially more creative.
Quite frankly, this description seems to me to fit neatly what we used to call "attention deficit syndrome." Johnny can't read, he can't add, he can't write, and he can't pay attention either. Neither, for that matter, can Mary.
Gorman comments, with evident sadness, on this vision:
I read these words on the 37th anniversary of the day that I first worked in a library. They did more to illuminate the thinking and motives of those who are dedicated to destroying academic libraries than anything I have ever heard or read. Students, teachers, and all those interested in education and learning would do well to heed their warning and understand their implications for education and society. These are people to whom the sustained reading of linear texts--the culture of the book--is anathema.
This is not merely the disgruntled perspective of a retrograde humanist. David Gelernter, professor of computer science at Yale University, made this revealing comment in the September 11 issue of the Sacramento Bee:
In practice ... computers make our worst educational nightmares come true. While we bemoan the decline of literacy, computers discount words in favor of pictures and pictures in favor of video. While we fret about the decreasing cogency of public debate, computers dismiss linear argument and promote fast, shallow romps across the information landscape. While we worry about basic skills, we allow into the classroom software that will do a pupil's arithmetic or correct his spelling.
Hypermedia [Gelernter continues] is just as troubling. It's a way of presenting documents on screen without imposing a linear start-to-finish order. This is a cute idea that is good in minor ways and terrible in major ones. Teaching children to understand the orderly unfolding of a plot or a logical argument is a crucial part of education. Authors don't merely agglomerate paragraphs; they work hard to make the narrative read a certain way, to prove a particular point. Dynamiting documents into disjointed paragraphs [Gelernter concludes] is one more expression of the sorry fact that sustained argument is fading rapidly.
With the introduction of the Internet and CD-ROM technology, and the tape loading of large bibliographic databases into (or through) online catalogs, we have made readily available in our libraries direct user access to databases just as large, and in many instances just as complex, as (for example) the early mediated DIALOG files. We have done so with virtually no serious thought as to how successful any of the resulting electronic searches would be. We have made the entirely unwarranted assumption that because many of our customers are "computer literate", they can with little or no assistance navigate the mysteries of electronic information retrieval. Even casual observation confirms what online catalog transaction analysis demonstrates: user failure in these systems is nothing short of spectacular. Working directly with library users to overcome these obstacles to the effective use of electronic retrieval systems is perhaps the single most significant challenge facing college and university library public services in the coming decade.