              _______  _______                                         __
             / _____/ /__  __/                                        / /
            / /__       / / ____    __  __  __ ___  __ __    ____    / /
           / ___/  __  / / / __ \  / / / / / //__/ / //_ \  / __ \  / /
          / /____ / /_/ / / /_/ / / /_/ / / /     / /  / / / /_/ / / /
          \_____/ \____/  \____/  \____/ /_/     /_/  /_/  \__/_/ /_/

December, 1995       _EJournal_  Volume 5 Number 2      ISSN 1054-1055
There are 699 lines in this issue.

              An Electronic Journal concerned with the
           implications of electronic networks and texts.
           At last count, 3018 subscribers in 37 Countries

         University at Albany, State University of New York
                       Department of English
                        EJOURNAL@albany.edu

CONTENTS:                                          [This is line 22 ]

ARCHIVING ELECTRONIC DOCUMENTS                     [Begins at line 65]
     A Report of a Proposal by David Bearman

PRIORITIES FOR PUBLICATION PRESERVATION            [at line 339]
     by Sheila Webber

AMBIGUITY MACHINES                                 [at line 452]
  [OR, "Precision, hah!  Computers are better at poetry than they
   are at math."]
     by Jacques Leslie (republished from BYTE magazine)

Editorial Comment                                  [at line 553]

Information about _EJournal_                       [at line 604]
     About Subscriptions and Back Issues
     About Supplements to Previous Texts

About _EJournal_ People                            [at line 661]
     Board of Advisors
     Senior Editors
     Consulting Editors

***********************************************************************
******************************************************************
* This electronic publication and its contents are (c) copyright *
* 1995 by _EJournal_.  Permission is hereby granted to give away *
* the journal and its contents, but no one may own it.  Any and  *
* all financial interest (except where noted) is hereby assigned *
* to the acknowledged authors of individual texts.               *
* This notification must accompany all distribution of _EJournal_.*
******************************************************************
========================================================================

   A R C H I V I N G   E L E C T R O N I C   D O C U M E N T S

              A Report of a Proposal by David Bearman

PREFACE (by _EJournal_)

In an era of electronic communications one can mount many arguments
against building more filing cabinets to hold paper.  This essay is an
_EJournal_ adaptation of David Bearman's extensive proposal about
standards for Virtual Archives.  It proposes six layers of "metadata"
that ought to surround all electronic records so they can be
considered permanent and trustworthy.  It also suggests, implicitly,
that we should establish and begin using standards for such metadata
capsules.  Along the way, almost incidentally, the essay highlights
the amazing amount of information embedded in those oft-maligned
folder-filled filing cabinets.

By pointing out some of the minutiae needed to upgrade everyday binary
storage into trustworthy virtual archiving, the essay (and especially
the detailed proposal) may spur the on-line community to establish
standards and "enforce" them soon- -perhaps with the example of SGML's
development as a model.  Or the proposal might prompt us to think
again about our paperless fantasies.

David Bearman's own complete proposal is to be presented in Beijing in
1996.  It encompasses the extreme case of recording business
transactions so they will be acceptable as evidence in courts.  The
most up-to-date versions of the requirements, specifications,
production rules, metadata standards, literary warrant and research
papers on the variables in electronic recordkeeping in organizations
are maintained on the Project's Website at the University of
Pittsburgh.
Go to: http://www2.lis.pitt.edu/~sochats/nhprc.html

----------------------------------------------------------------------

ARCHIVING ELECTRONIC DOCUMENTS
A Report of a Proposal by David Bearman

INTRODUCTION                                              [line 107]

Within ten years, most of the records in our society will be made and
transmitted and stored electronically.  We will be able to access them
from anywhere.  This means that our default assumptions about the
custody of records, which are still rooted in physical terms like
"hold" and "musty" and "on-site availability," will be replaced by
policies to ensure the more generic "retention" and "disposition" and
"virtual availability."  The ultimate goals of archiving won't change,
but the means of achieving those goals will.

Right now, we should be adopting standards and methods of virtual
archiving, using available technology, so that we can be spared having
to redesign multiple ad-hoc systems and then, a few years from now,
retrofit masses of accumulated documents.  Indeed, critical observers
have argued that the Global Information Infrastructure will not be
able to support "serious" work until we can satisfy requirements for
the "integrity," "authenticity," "reliability" and "archiving" of
digital information. [1]

THE METADATA

These six encapsulating layers would constitute documents destined to
qualify as and to be archived as virtual records:

     1. Handle.
     2. Terms and Conditions.
     3. Software and hardware dependency flags.
     4. The document-creation context.
     5. The binary-large-object, or BLOB- -the text/ display, the
        document, the record- -itself.
     6. The history of the record.

In visual, paper-based imagery, the first four layers are on the "top"
of the document and the sixth is on the "bottom."

Some points about the six Metadata layers:                [line 149]

1> Handle: Uniqueness is important, of course, and so is a flag
   showing that what follows (or is "enclosed," in paper-based
   terminology) is indeed a record.  Such flagging will have to
   conform to an international standard.

2> Terms and Conditions: At the outset, eligibility is set by the
   creator.  Are there restrictions involving permission, payment, or
   "not before" criteria?  What's the address of whoever can grant
   permission?  Is the record "read only"?  Within the access layer
   there should also be disposition information.  Who may authorize
   erasure?  Is there some "Sunset" provision?

3> Software and hardware dependencies: What kind of file is it?  Text,
   graphic, sound ....?  ASCII, raster, sampling ....?  Compressed?
   Encrypted?  With what hardware, operating system and application
   was it created?  When these data are provided (in the kind of
   drill-down detail laid out in Bearman's complete proposal), they
   will make it possible to adapt the record (and metadata) for
   reading with hardware and software constructed to specifications
   not currently anticipated.  [Worriers should look at Jeff
   Rothenberg, "Ensuring the Longevity of Digital Documents" in the
   January, 1995 _Scientific American_, pp 42-47.]

4> The context of the document's origin: Who created it?  Where was it
   broadcast, or to whom was it sent?  Who got copies?  For business
   transactions, who is responsible for contractual implications?

5> The encapsulated BLOB itself, the record, the document, the text/
   display- -the Binary Large OBject.  [It could, of course, contain
   links to other documents and sites; the thought of archiving www
   hypertexts is enough to make the virtual archiving of standalone
   documents look easy.]

6> The history: Viewed, copied, edited, filed, indexed, forwarded,
   abstracted- -when, and by whom, and what portions.  Among other
   matters, the history should assure readers that nothing from the
   original document has been left out along the way.

Stripped of details and references, this is the conceptual core of
David Bearman's proposal.  What follows are pointers (non-electronic)
to pertinent material.
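
[To make the six layers described above more concrete, here is one
minimal way they might be modeled in Python.  This is an illustrative
sketch only and is not part of Bearman's proposal: the class and field
names (EncapsulatedRecord, HistoryEvent, blob_digest), the use of
simple dictionaries for layers 2 through 4, and the SHA-256 digest
used to check that the BLOB is unchanged are all assumptions
introduced here.]

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class HistoryEvent:
    """One layer-6 entry: what was done, by whom, when, to what portion."""
    action: str            # e.g. "viewed", "copied", "forwarded"
    agent: str
    when: datetime
    portion: str = "all"


@dataclass
class EncapsulatedRecord:
    """Hypothetical container for the six metadata layers."""
    handle: str                     # 1: unique id that flags "this is a record"
    terms: dict[str, str]           # 2: access and disposition conditions
    dependencies: dict[str, str]    # 3: file type, encoding, hardware/software
    context: dict[str, str]         # 4: creator, recipients, responsibility
    blob: bytes                     # 5: the record itself
    history: list[HistoryEvent] = field(default_factory=list)  # 6
    # Assumption: a digest taken at encapsulation time, so later readers
    # can check that nothing in the BLOB has been changed or left out.
    blob_digest: str = field(init=False, default="")

    def __post_init__(self) -> None:
        self.blob_digest = hashlib.sha256(self.blob).hexdigest()

    def log(self, action: str, agent: str, portion: str = "all") -> None:
        """Append a use of the record to its history layer."""
        event = HistoryEvent(action, agent, datetime.now(timezone.utc), portion)
        self.history.append(event)

    def is_intact(self) -> bool:
        """True if the BLOB still matches the digest taken at creation."""
        return hashlib.sha256(self.blob).hexdigest() == self.blob_digest
```

[A record built this way carries its own history: calling
rec.log("viewed", "reader@example.org") appends a layer-6 entry, and
rec.is_intact() returns False if the BLOB is altered after
encapsulation.  The sketch does not attempt the international handle
standard or the detailed access rules that the proposal calls for.]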
---------------------                                     [line 193]

First, the professional and organizational aspects of reengineering
the archival profession can be explored in these references:

David Bearman & Margaret Hedstrom, "Reinventing Archives for
Electronic Records: Alternative Service Delivery Options" in Hedstrom,
Margaret ed. _Electronic Records Management Program Strategies,
Archives and Museum Informatics Technical Report #18_, 1993, pp.82-98

Margaret Hedstrom, "Electronic Archives: Integrity and Access in the
Networked Environment" (Second International Conference on Scholarship
and Technology in the Humanities, January 1994); forthcoming

New York State Archives and Records Administration, "Building
Partnerships for Electronic Recordkeeping: The New York State
Information Management Policies and Practices Survey.  Summary of
Findings," (by Margaret Hedstrom, Project Director), August 1994

-------------------------

Second, questions about the technical needs, and costs, of
reconfiguring paper-based archives are being addressed by Digital
Library projects throughout the world:

Library of Congress, The National Digital Library Program (Washington,
DC, January, 1995)

Research Libraries Group, "Digital Imaging Technology for
Preservation."  _Proceedings_ from an RLG Symposium held March 17 &
18, 1994, ed. Nancy E. Elkington (Mountain View CA, RLG, 1994)

Anne R. Kenney & Lynne K. Personius, "Joint Study in Digital
Preservation: Report Phase I (January 1990-December 1991)" (Washington
DC, Commission on Preservation and Access, 1992)

Anne R. Kenney, Michael A. Friedman and Sue A. Poucher, "Preserving
Archival Material Through Digital Technology."  Final Report (Ithaca
NY, Cornell University Library, 1993)

-------------------------                                 [line 235]

Third, these are places where general issues of recordkeeping
requirements have been discussed:

IEEE Mass Storage Systems Standards Technical Committee Metadata
Project, Second Meeting on Metadata for the Administration and Access
of Stored Information, Austin, Texas, February 17-18, 1994
  [Documents discussed at this meeting included:
   - "The Intelligent Archive" (UCRL-TB-115079-6 Lawrence Livermore
     Laboratory, Carol Hunter, Project Manager)
   - "Whitepaper on Data Management", Robyne Sumpter, Lawrence
     Livermore Laboratory, February 10, 1994
   - "A Metadata Capability Supporting the Hierarchical Storage and
     Access of Large Abstract Data Entities," J. C. Almond and Rekha
     Singhal, University of Texas CHPC]

New York State Archives & Records Administration, "Guidelines for the
Legal Acceptance of Public Records in an Emerging Electronic
Environment," (Albany, Dept. of Education, New York, 1994) 35pp.

NHPRC grant #93-030, "Variables in the Satisfaction of Archival
Requirements for Electronic Records Management."

David Bearman, "Electronic Evidence: Strategies for Managing Records
in Contemporary Organizations," (Pittsburgh, Archives & Museum
Informatics, 1994).

David Bearman and Ken Sochats, "Formalizing Functional Requirements
for Recordkeeping," unpublished draft paper included in University of
Pittsburgh "Functional Requirements for Recordkeeping" Project
"Reports and Working Papers" (LIS055/LS94001), September, 1994.

Richard J. Cox, "Re-Discovering the Archival Mission," _Archives and
Museum Informatics/Cultural Heritage Informatics Quarterly_, vol.8 #4,
1994, pp.279-300.

--------------------------                                [line 277]

Fourth, these are places where the documentation requirements of
specific record-keeping domains are discussed:

A. H.
Boss, "Electronic Data Interchange agreements: private contracting
  towards a global environment," _Northwestern Journal of
  International Law and Business_, vol.13 (1992);
Electronic Data Interchange Association, _The United States Electronic
  Data Interchange Standards_;
J. L. Lamprecht, _Implementing the ISO 9000 Series_ (1993);
A. J. Marcella & S. Chan, _EDI Security, Control and Audit_ (1993);
Miller, _GAAS Guide_ (1994);
J. A. Rabbitt, _The ISO 9000 Book: A Global Competitor's Guide to
  Compliance and Certification_ (1993);
J. P. Tomes, _Healthcare records management, disclosure and
  retention_ (1993);
B. Wright, _The Law of Electronic Commerce_ (1991)

The "Functional Requirements for Recordkeeping" project has compiled a
database of "warrant" for the requirements defined within the project.
The most up-to-date versions of the requirements, specifications,
production rules, metadata standards, literary warrant and research
papers on the variables in electronic recordkeeping in organizations
are maintained on the project WWW server at the University of
Pittsburgh.

          http://www2.lis.pitt.edu/~sochats/nhprc.html

--------------------------------------------              [line 313]

NOTE

[1] Clifford Lynch, "The Integrity of Digital Information: Mechanics
and Definitional Issues," _Journal of the American Society for
Information Science_, vol.45 #10, December, 1994, pp.737-744;

Peter Graham, "Intellectual Preservation in the Electronic
Environment," _Proceedings_, Library Collections and Technical
Services, 1992, pp.18-32 (Chicago, ALA, 1992);

Henry Perritt, "Public Information in the National Information
Infrastructure," Report to the Regulatory Information Service Center,
General Services Administration, and to the Administrator, Office of
Information and Regulatory Affairs, Office of Management and Budget,
5/20/94

******************************************************************
* This essay in Volume 5 Number 2 of _EJournal_ is (c) copyright *
* 1995 by _EJournal_.  Permission is hereby granted to give it   *
* away.  Any and all financial interest is assigned to David     *
* Bearman.  This notice must accompany all copies of this text.  *
******************************************************************
------------------------------------------------------------------

   P R I O R I T I E S   F O R   P U B L I C A T I O N    [line 339]
                P R E S E R V A T I O N

                    by Sheila Webber

[Editor's note: I plucked this Note from a List where members were
conversing about what constitutes "electronic publication."  Ms.
Webber's Note appears here with her permission; I was unable to reach
her for approval of the changes made in preparing it for publication.]

I think there is a need to look afresh at the whole spectrum of media
in which knowledge, entertainment, culture (or whatever) can be
transmitted, and then, as a separate task, to work out fresh criteria
for deciding what "publications"- -in whatever medium- -should be top
priority for cataloguing and preservation.

We should keep clear, I believe, the distinction between being
eligible for classification as a "publication" and being chosen for
archiving.  Not every publication should be kept; on the other hand,
e-stuff should not be excluded from preservation because it doesn't
fit into papyrocentric "publication" pigeonholes.

The British Library, for instance, in its leaflet on legal deposit,
says that "A work is said to be published when copies of it are issued
to the public."  As we know, it is not yet universally agreed, in an
electronic context, just what "issuing" consists of, or what a "copy"
is.  What if one were simply to substitute "accessible to" for "issued
to," so that the definition of "published" would include material in
electronic format that is publicly accessible?            [line 369]

Some might worry that this definition would make (e.g.) every World
Wide Web page a "publication," and that the entire Web would then need
to be catalogued and archived.  But this is a needless worry.
Not every "publication" needs to be archived.  The BL doesn't collect
every printed promotional leaflet or every free parish newsletter,
etc., on legal deposit.  Although such ephemera *do* meet the BL's
criteria, and it *could* collect them if it wanted to, it makes
decisions about not including certain types of publication in the
national bibliography and the national collection.

Over the years national libraries evolve policies based on pragmatism,
and on experience about what seems most likely to be of value to
future users.  For example, experience showed that material published
as books was more likely than material published as leaflets to be of
lasting value.  Therefore (as a cost-effective alternative to
considering the merits of every item individually) the type of
publication "book" is included in national bibliographies whereas
"leaflets" generally aren't.

As new types of publication evolve, organisations which currently have
a national/ global archive function will have to rethink what type of
publication is both appropriate and "reasonable" (including
cost-efficient) to record and preserve.  For example, a print journal
may ultimately be replaced by a web site which has new material and
new links added to it continuously; a company may provide information
of more than ephemeral interest at its "promotional" web site.  The
old decision-by-type approach will no longer be valid.  New criteria
for deciding what type of "publication" is likely to be of lasting
value will evolve.

Whilst for scholarly communication the printed (or digitally encoded)
word may still be the prime mechanism for recording and summarising
knowledge and research, this might not be the case in the future.
From the point of view of historical research, for example, a TV
broadcast by a world leader may already be the preferred and richest
source, rather than a transcript of what was said; a graphical
simulation might form a vital part of scientific research, and so
forth.
In everyday life, most citizens and (as surveys show) business people
use informal networks, non-print media and grey/trade literature
before approaching traditional information sources.       [line 411]

It is well known that for relatively new media such as films, sound
recordings and broadcast programmes, "bibliographic control" is still
often non-existent, or is restricted to in-print items.  Online
databases have been around for decades now, and more and more of them
have no print equivalent, but national libraries have for the most
part failed to work out how or what they should archive.

Obviously there are huge obstacles to rethinking the area of
cataloguing and archiving global knowledge: most particularly money
(lack of); the technology itself (the problem of obsolescent and
disappearing hardware and software); complicated issues of national
pride (e.g., we still seem to be playing the "who's got the largest
national library" game); and existing structures and procedures.

I suppose in some ways we shouldn't get too depressed, though.  Think
how long it took between printing being invented and the appearance of
systematic trade and national bibliographies (centuries?).  Even so,
we need to begin earnestly sorting out not only what constitutes
"publication," but also what publications- -in whatever medium or
format- -deserve to be preserved.

Sheila Webber
University of Strathclyde
sheila@dis.strath.ac.uk

******************************************************************
* This essay in Volume 5 Number 2 of _EJournal_ is (c) copyright *
* 1995 by _EJournal_.  Permission is hereby granted to give it   *
* away.  Any and all financial interest is assigned to Sheila    *
* Webber.  This notice must accompany all copies of this text.   *
******************************************************************
------------------------------------------------------------------

        A M B I G U I T Y   M A C H I N E S               [line 452]

Precision, hah!
Computers are better at poetry than they are at math.

by Jacques Leslie

It should not have taken the Pentium math-bug debacle to remind us
that computers do not always deliver absolute precision.  On the
contrary, for all their grounding in mathematical exactitude and their
annoying literal-mindedness, computers are really
ambiguity-generating machines.

Consider humble E-mail, perhaps the simplest form of computer-mediated
communication.  At first glance, the difference between a message
written on paper and sent through the postal service and its
identically worded electronic counterpart seems insignificant: Both
contain the same language, so their meaning is the same- -or is it?
One is an artifact of the material world, with intimations of
permanence.  The other, a captive of cyberspace, can be eliminated in
a keystroke.  One is evocative- -its paper quality, handwriting and
scent all convey nuances of meaning- -while the other is framed within
the bland uniformity of ASCII.

Moreover, it is unlikely that the two messages would use precisely the
same words.  The tendency in E-mail is toward informality: _Gonna_ and
_gotta_ replace _going to_ and _must_.  The shift in E-mail is toward
oral speech patterns, a rejection of the precision of written
discourse in favor of spontaneity.                        [line 480]

In publishing, the distinction has starker ramifications.  A
conventionally printed book may be valued not just for the words
inside it but as an object, whose worth is often dictated by such
factors as whether it is a first edition, whether its author signed
it, or whether a famous person owned it.  In contrast, an
electronically published book is an entirely different animal.  For
starters, the idea of copyright is undermined, since the digital book
is infinitely reproducible.  The notion of authorship is weakened, for
the new medium encourages collaboration, often by anonymous
contributors.
Even the idea of the book itself is threatened, as publishers of
electronic journals have already discovered.

Photography is also rendered fuzzy by digitization.  The malleability
of digitized photographs has caused people to look on all photos with
deepened skepticism.  It is now virtually impossible to tell which
images are digitally manipulated.  Although photographic trickery has
been around for almost as long as the camera, digital technology makes
it much easier to doctor an image.  Digital art has also broken down
the distinction between an "original" and its copies, for all possess
the same digital components.

In some digital pursuits, ambiguity is exactly the point.  In
Multiuser Dimensions, or MUDs, those adolescent computer playgrounds
in which players take on imaginary identities while cavorting within a
fictional universe, much of the excitement stems from the simple fact
that few players know with certainty anything about one another.  The
player with whom I'm conversing may be a man who presents himself as a
woman, a woman presenting herself as a man, or, for that matter, a
cleverly designed "bot" that responds to comments in ways that produce
the illusion of personhood.                               [line 513]

Digital audio raises interesting questions.  Composer John Oswald's
Plunderphonics, for example, consists of works by musicians ranging
from Beethoven to Michael Jackson that Oswald had digitally
manipulated in startling and amusing ways.  By creating "new" works
out of familiar ones, Oswald demonstrated that musical authorship is a
surprisingly complex issue, since all music borrows from all the
sounds that surround us.

One reason for the rise of computer-generated ambiguity surely is the
newness of digital technology.  Just as in the early years of
electrification and the telephone, we're still trying to figure out
what we want computers to do for us, and in the meantime we're
confused.  But something else is going on here as well.
We like to think of computers as innately masterful at computation
(thus, our ridicule for the Pentium when it returns inaccurate
results).  But we tend to forget that because some number strings are
infinitely long, even the most sophisticated computers must settle for
numerical approximations- -which is to say, imprecision.

And imprecision doesn't apply just to computation.  While computers'
breathtakingly broad impact stems from their capacity to render so
much of reality in 0s and 1s, we forget that those combinations of 0s
and 1s are all approximations or, put another way, metaphors.  Of
course, metaphors rightfully dwell in the province of poetry, where
they don't mimic reality but use ambiguity to evoke it.

by Jacques Leslie
jacques@well.com

******************************************************************
* Reprinted with permission, from the October 1995 issue of BYTE *
* magazine.  (c) by The McGraw-Hill Companies, Inc.  All rights  *
* reserved.                                                      *
******************************************************************
=========================================================================

        E D I T O R I A L   C O M M E N T                 [line 553]

I hope the renewal confusion is over.  THANKS for your patience.  We
knew that many addresses were no longer valid, and we guessed that
quite a few people would decide not to renew.  But we did not
anticipate being caught up in Albany's migration from Bitnet (and
Listserv).  The transition hasn't been as "transparent" as everyone
hoped.

Thanks, too, for the many thoughtful replies to the "ASCII vs. Web"
question.  Very few said "ASCII only"; some said "ASCII preferred";
most said "sure, move to the Web."  There were several compelling
rationales, though, for continuing to deliver the entire journal
instead of just telling you where to look for it.  So we will get our
server straightened out and keep e-mailing- -and we will keep issues
shorter than 1000 lines so we won't overload mailboxes.  And we will
keep developing our HTML/ www versions.
The journal is archived at

     http://www.hanover.edu/philos/ejournal/home.html

and there is also an experimental Web site at

     http://www.albany.edu/rachel.albany.edu/~ejournal/ejournal/ejournal.html

About Sheila Webber's text: It caught my eye as raising a distinction
not always acknowledged.  Because one of the founding purposes of
_EJournal_ was to share the kind of Good Point that appears on Lists
and would otherwise not be noted elsewhere, I grabbed it to share with
you.

About Jacques Leslie's text: Again, I liked it.  It expresses a point
of view widely shared outside the computing-specialist community, but
seldom so well articulated and widely distributed.  McGraw-Hill didn't
want to let us re-"print" it, at first, because doing so "would
compete with the electronic services McGraw-Hill offers."  They did go
on to say, however, that we could point to http://www.byte.com "where
the article will be posted."  I took the paradox as a challenge and
wrote (USMail) to question their decision; they thought the matter
over and graciously changed their approach.  Paradigm shifts take
time.

======================================================================
-------------------------------------------------------------
--------------------  I N F O R M A T I O N  --------------------
----------------------------------------------------------------------
-------------------------------------------------------------------------

About Subscribing and Sending for Back Issues:            [line 604]

In order to:                 Send to:               This message:
--------                     --------               --------
Subscribe to _EJournal_:     LISTSERV@albany.edu    SUB EJRNL YourName

Get Contents/Abstracts
 of previous issues:         LISTSERV@albany.edu    GET EJRNL CONTENTS

Get Volume 5 Number 1:       LISTSERV@albany.edu    GET EJRNL V5N1

Send mail to _EJournal_:     EJOURNAL@albany.edu    Your message...

Reach our archive site:
     http://www.hanover.edu/philos/ejournal/home.html

Reach our experimental Web site:
     http://rachel.albany.edu/~ejournal/ejournal/ejournal.html

----------------------------------------------------------------------

About "Supplements":                                      [line 625]

_EJournal_ continues to experiment with ways of revising, responding
to, reworking, or even retracting the texts we publish.  Authors who
want to address a subject already broached- -by others or by
themselves- -may send texts for us to consider publishing as a
Supplement issue.  Proposed supplements will not go through as
thorough an editorial review process as the essays they annotate.

---------------------------------------------------------------------------

About _EJournal_:

_EJournal_ is an all-electronic, e-mail delivered, peer-reviewed,
academic periodical.  We are particularly interested in theory and
practice surrounding the creation, transmission, storage,
interpretation, alteration and replication of electronic "text"- -and
"display"- -broadly defined.  We are also interested in the broader
social, psychological, literary, economic and pedagogical implications
of computer-mediated networks.  The journal's essays are delivered
free to Internet addressees.  Recipients may make paper copies;
_EJournal_ will provide authenticated paper copy from our read-only
archive for use by academic deans or others.

Writers who think their texts might be appreciated by _EJournal_'s
audience are invited to forward files to ejournal@albany.edu .  If you
are wondering about starting to write a piece for us, feel free to ask
if it sounds appropriate.  There are no "styling" guidelines; we try
to be a little more direct and lively than many paper publications,
and considerably less hasty and ephemeral than most postings to
unreviewed electronic spaces.  Essays in the vicinity of 5000 words
fit our format well.  We read ASCII; we continue to experiment with
other transmission and display formats and protocols.
                                                          [line 659]
-----------------------------------------------------------------------

Board of Advisors:

     Stevan Harnad       University of Southampton
     Dick Lanham         University of California at L. A.
     Ann Okerson         Association of Research Libraries
     Joe Raben           City University of New York
     Bob Scholes         Brown University
     Harry Whitaker      University of Quebec at Montreal

-------------------------------------------------------------------------

SENIOR EDITORS - December, 1995

     ahrens@alpha.hanover.edu        John Ahrens         Hanover
     dabrent@acs.ucalgary.ca         Doug Brent          Calgary
     kahnas@jmu.edu                  Arnie Kahn          James Madison
     nrcgsh@ritvax                   Norm Coombs         RIT
     richardj@bond.edu.au            Joanna Richardson   Bond
     ryle@urvax.urich.edu            Martin Ryle         Richmond

-------------------------------------------------------------------------

Consulting Editors - December, 1995

     bcondon@umich.edu               Bill Condon         Michigan
     djb85@albany                    Don Byrd            Albany
     folger@watson.ibm.com           Davis Foulger       IBM - Watson
     gms@psu.edu                     Gerry Santoro       Penn State
     nakaplan@ubmail.ubalt.edu       Nancy Kaplan        Baltimore
     r0731@csuohio                   Nelson Pole         Cleveland State
     ray_wheeler@dsu1.dsu.nodak.edu  Ray Wheeler         North Dakota
     srlclark@liverpool.ac.uk        Stephen Clark       Liverpool
     twbatson@gallua                 Trent Batson        Gallaudet
     wcooper@vm.ucs.ualberta.ca      Wes Cooper          Alberta

--------------------------------------------------------------------------

Editor:               Ted Jennings, emeritus, University at Albany
Editorial Associate:  Jerry Hanley, emeritus, University at Albany

--------------------------------------------------------------------------

University at Albany Computing and Network Services

--------------------------------------------------------------------------

University at Albany, State University of New York, Albany, NY 12222 USA