Harnad, S. (2003) Online Archives for Peer-Reviewed Journal Publications. International Encyclopedia of Library and Information Science. John Feather & Paul Sturges (eds). Routledge. http://www.ecs.soton.ac.uk/~harnad/Temp/archives.htm

Online Archives for Peer-Reviewed Journal Publications

Stevan Harnad

Intelligence/Agents/Multimedia Group
Department of Electronics and Computer Science
University of Southampton
Highfield, Southampton
harnad AT ecs.soton.ac.uk

SUMMARY: Peer-reviewed journals used to perform two functions for research and researchers -- (1) peer review and (2) distribution -- and research libraries used to perform two more -- (3) archiving and (4) access provision. In the online age, journals will need only to provide the peer-review service. Authors will self-archive their papers, both before and after peer review, in their institutional Eprint Archives, which will all be interoperable with one another, providing open access to all peer-reviewed research output as if it were all in one global archive.

There currently exist at least 20,000 peer-reviewed journals, across all scholarly and scientific disciplines, published in most of the research-active nations and tongues of the world. The (at least) 2 million articles that appear in them annually are accepted for publication only after they have successfully met the quality standards of the particular journal to which they were submitted. There is a hierarchy of quality standards across journals, from the most rigorous ones at the top -- usually the journals with the highest rejection rates and the highest "impact factors" (a measure of how often their articles are cited by other articles) -- all the way down to a virtual vanity press at the bottom.

The responsibility for maintaining each journal's quality standards is that of the editor(s) and referees. The editor chooses qualified experts ("peers") who then review the submissions and recommend acceptance, rejection, or various degrees of revision.

In the past, journals were not concerned with archiving. Their contents appeared on paper and the journal's responsibility was the peer review, editing, markup, typesetting, proofing, printing, and distribution of the paper texts. It was the subscribers (individual or institutional) who had to concern themselves with the archiving and preservation, usually in the form of the occupation of space on library shelves, occasionally supplemented by copying onto microfiche as a backup. The main backup, however, was the (presumably) preserved multiple copies on individual and institutional library shelves all over the world. It was this distributedness and redundancy that ensured that refereed journals were archival and did not vanish within a few days of printing, as ephemeral newspapers and leaflets might do.

In recent decades, journals have increasingly produced online versions in addition to on-paper versions of their contents. Initially, the online version was offered as an extra feature for institutional subscribers, and could be received only if the institution also subscribed to the paper version. Eventually, institutional site-licenses to the online version alone became a desired option for institutions. For approximately the same price as a paper subscription, online licenses offered much wider and more convenient access to institutional users than a single paper subscription ever could do.

This new option raised the problem of archiving again, however: Who owns and maintains the online archive of past issues? In paper days, it was clear that the subscriber owned the "archive," in the form of the enduring paper edition on the shelf. But with digital texts there is the question of storing them, upgrading them with each advance in technology, and in general seeing to it that they remain accessible to all institutional users online permanently.

If the journal maintains the online archive, (1) what happens when an institution discontinues its subscription? No new issues are received, of course, but (2) what about past issues, already paid for?

And (3) are publishers really in a position to become archivists too, adding to their traditional functions (peer review, editing, etc.) the function of permanent online archiving, upgrading, migration, preservation, and search/access-provision? Are these traditional library- and digital-library functions now to become publisher functions?

There is not yet a satisfactory answer to any of these questions, but the means of implementing the answers, once we decide what the correct ones are, have meanwhile already been created.

First, a means was needed to make the digital literature "interoperable." This required agreeing on a shared metadata tagging convention that would allow distributed digital archives to share information automatically, so that their contents were navigable as if they were all in the same place and in the same format. An unambiguous vocabulary had to be agreed upon so that digital texts could be tagged by their author, title, publication date, journal, volume, issue, etc. (along with keywords, subject classification, citation-linking, and even an inverted full-text index for searching).

These "metadata" tags could then be "harvested," both by individual users and by search engines that provided sophisticated navigational capabilities. In principle, the outcome would be as if each of the annual 2 million articles in the 20,000 peer-reviewed journals were all in one global archive.

This shared metadata tagging convention has been provided by the Open Archives Initiative (OAI) http://www.openarchives.org and is being adopted by a growing number of archives, including both journal archives and institutional archives. The OAI convention, however, does not answer the question of who should do the archiving: journals or institutions.
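As a rough illustration of what harvesting shared metadata tags can look like, the following Python sketch parses two small responses in the style of the OAI protocol's ListRecords output (with unqualified Dublin Core tags) and merges them into one index. It is a minimal sketch, not a full harvester: the archive names, identifiers, titles, and authors are made-up sample data, the helper functions are hypothetical, and no network request is made.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses that carry unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def make_response(identifier, title, creator, date):
    """Build a tiny ListRecords-style response from made-up sample values."""
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
          <dc:creator>{creator}</dc:creator>
          <dc:date>{date}</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text, archive_name):
    """Extract the shared metadata tags from one archive's response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "archive": archive_name,
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "creator": rec.findtext(".//dc:creator", namespaces=NS),
            "date": rec.findtext(".//dc:date", namespaces=NS),
        })
    return records


def harvest(responses):
    """Merge records from many archives into one list, sorted by date."""
    merged = []
    for archive_name, xml_text in responses.items():
        merged.extend(parse_records(xml_text, archive_name))
    return sorted(merged, key=lambda r: r["date"])


def search(index, creator):
    """Look up records by author across all harvested archives at once."""
    return [r for r in index if r["creator"] == creator]


# Hypothetical responses from two independent archives.
ARCHIVE_RESPONSES = {
    "archive-a": make_response("oai:archive-a.example:1", "Paper A", "Author, A.", "2001-04-26"),
    "archive-b": make_response("oai:archive-b.example:7", "Paper B", "Author, B.", "2002-01-15"),
}

index = harvest(ARCHIVE_RESPONSES)
for record in search(index, "Author, B."):
    print(record["archive"], record["identifier"], record["title"])
```

Because both archives use the same tags, the search runs over the merged index without needing to know which archive a record came from, which is the "one global archive" effect described above.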

Another growing movement, the Budapest Open Access Initiative (BOAI) http://www.soros.org/openaccess/ is likely to influence this outcome. To understand the form this may take, we have to distinguish two kinds of Archives: "Open Archives," which are all OAI-compliant Archives, and "Open-Access Archives," which are not only OAI-compliant, but access to their full-text contents is free.

Explaining why and how free online access is the optimal and inevitable solution for this special literature (of 20,000 peer-reviewed journals) goes beyond the scope of this article, but it is based on the fact that this literature differs from most other literatures in that it is without exception all an author give-away: Not one of the authors of the annual 2 million refereed articles seeks royalties or fees in exchange for his text. All these authors seek is as many readers and users as possible, for it is the research impact of these articles -- of which a rough measure is the number of times each is cited -- that brings these authors their rewards (employment, promotion, tenure, grants, prizes, prestige). It is not subscription/license sales revenue that brings authors these rewards: on the contrary, these toll-based access-barriers are also impact-barriers, and therefore at odds with the interests of research and researchers.

Hence, from their authors' point of view, the optimal solution for archiving is that the archives should be Open-Access Archives. There are two ways to achieve this. One is that (a) the journals add archiving to their existing services and make the contents of their archives Open-Access.

This is on the face of it a rather unrealistic thing to ask from journals, for it asks them to take on additional expenses, over and above their traditional ones, and yet to seek no revenue in exchange, but instead give away all their contents online. It becomes somewhat more realistic if we anticipate a future time when there is no longer any demand for the on-paper version, and so it, and all its associated expenses, can be eliminated, by downsizing to only the essentials.

It has been estimated that if journals performed only peer review, and nothing else, becoming only quality-control service-providers and certifiers, then their expenses per article would be reduced by about 75%. The average revenue per article is currently $2000 (the sum of all subscription, license, and pay-per-view income, mostly paid by institutions). But this still leaves $500 per article to be recovered, somehow: How to do it if the text is given away for free?
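The arithmetic behind the $500 figure can be checked with a short Python sketch. It uses only the two numbers given above and assumes, as the paragraph implicitly does, that the current average revenue per article stands in for the current cost per article; the function name is illustrative.

```python
def remaining_cost(current_per_article, reduction):
    """Cost left per article once everything but peer review is dropped."""
    return current_per_article * (1 - reduction)


REVENUE_PER_ARTICLE = 2000  # dollars, the average cited in the text
REDUCTION = 0.75            # estimated saving if journals only do peer review

print(remaining_cost(REVENUE_PER_ARTICLE, REDUCTION))  # prints 500.0
```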

We will return to this question in a moment, noting only that it is still futuristic, becoming relevant only when there is no longer enough demand for the paper version to cover all the costs as it had in the past. There are conceivably sources for covering a cost of $500 per article, including research grants and other possible sources of institutional or governmental subsidy in the interest of open access to research. But there is another possibility, not calling for subsidy:

The second way to achieve Open-Access Archives -- an immediate rather than a future-contingent way like (a) -- is through (b) the author/institution self-archiving of all peer-reviewed articles in their own institutional Eprint Archives. Institutions create Open-Access Eprint Archives for all of their own peer-reviewed research output: http://www.eprints.org. This provides immediate open access to the entire peer-reviewed journal literature for all would-be users, everywhere.

While the on-paper versions continue to be sold and bought, they continue to be the "true" archive, and all publication costs are covered the old way (through subscription and license payments to journals). But if and when the day arrives when there is no longer any demand or market for the publisher's paper version, institutions will by the same token have the 100% annual windfall savings out of which to redirect the 25% needed to cover the peer review costs for their own annual research output. And at that point the interoperable, OAI-compliant institutional Eprint Archives will also become the true archives of the peer-reviewed journal literature.
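Scaled up to the whole literature, the same proportions can be worked through in a short Python sketch. It is only an order-of-magnitude illustration: it takes the article's own figures (at least 2 million articles a year, an average of $2000 revenue per article, 25% needed for peer review) and assumes they apply uniformly, which real journals and institutions will not match exactly. The function name is illustrative.

```python
ARTICLES_PER_YEAR = 2_000_000   # "at least" figure from the article
REVENUE_PER_ARTICLE = 2000      # dollars, average from the article
PEER_REVIEW_SHARE = 0.25        # share still needed if only peer review remains


def redirect_savings(total_spend, share):
    """Split freed-up spending into the part redirected and the part saved."""
    redirected = total_spend * share
    return redirected, total_spend - redirected


total = ARTICLES_PER_YEAR * REVENUE_PER_ARTICLE
redirected, saved = redirect_savings(total, PEER_REVIEW_SHARE)

print(f"Current annual toll spending: ${total:,}")
print(f"Redirected to peer review:    ${redirected:,.0f}")
print(f"Net annual saving:            ${saved:,.0f}")
```

On these assumptions, roughly $4 billion a year in toll payments would be freed, of which about $1 billion would be redirected to peer review.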

Harnad, S. (2001) The Self-Archiving Initiative. Nature 410: 1024-1025 http://cogprints.soton.ac.uk/documents/disk0/00/00/16/42/index.html

Odlyzko, A.M. (2002) The rapid evolution of scholarly communication. Learned Publishing 15: 7-19 http://www.si.umich.edu/PEAK-2000/odlyzko.pdf