Self-Archiving FAQ

for the Budapest Open Access Initiative (BOAI)

[ French version. Powerpoints for promoting self-archiving -- Three relevant papers: (1) (2) (3) (4) -- Model for an institutional policy requiring self-archiving -- See also Helene Bosc's Inra site -- Berlin Declaration and Declaration of Institutional Commitment ]

Berlin Declaration -- Declaration of Institutional Commitment to Providing OA --
Model institutional self-archiving policy -- Institutional Archives Registry and List --
The Open Access (OA) vs. Toll Access (TA) citation impact advantage --
Citebase the citation-based scientometric search engine -- Standardized online OAI CV --
The usage/citation correlator/predictor -- Paracite citation seeker --
Powerpoints (to be used in promoting open-access provision)

What-is/why/how FAQs:

What is self-archiving?
What is the Open Archives Initiative (OAI)?
What is OAI-compliance?
What is an Eprint Archive?
How can I or my institution create an Eprint Archive?
How can an institution facilitate the filling of its Eprint Archives?
What is the purpose of self-archiving?
What is the difference between distributed and central self-archiving?
What is the difference between institutional and central Eprint Archives?
Who should self-archive?
What is an Eprint?
Why should one self-archive?
What should be self-archived?
Is self-archiving publication?
What about copyright?
What if my copyright transfer agreement explicitly forbids self-archiving?
Peer-review reform: Why bother with peer review?
Is self-archiving legal?
What if the publisher forbids preprint self-archiving?

What-to-do FAQs:

What can researcher/authors do to facilitate self-archiving?
What can researchers' institutions do to facilitate self-archiving?
What can libraries do to facilitate self-archiving?
What can research funders do to facilitate self-archiving?
What can publishers do to facilitate self-archiving?

"I-worry-about..." 32 prima facie concerns (subgrouped thematically):

I. 10. Copyright
            32. Poisoned Apple
II. 7. Peer review
            5. Certification
            6. Evaluation
            22. Tenure/Promotion
            13. Censorship
III. 29. Sitting Pretty
            4. Navigation (info-glut)
IV. 1. Preservation
            2. Authentication
            3. Corruption
            23. Version control
            25. Mark-up
            26. Classification
            16. Graphics
            15. Readability
            21. Serendipity
            18. Libraries'/Librarians' future
V. 19. Learned Societies' future
VI. 17. Publishers' future
            9. Downsizing
            8. Paying the piper
            14. Capitalism
            24. Napster
            31. Waiting for Gold
VII. 20. University conspiracy
            30. Rechanneling toll-savings
            28. Affordability
VIII. 12. Priority
            27. Secrecy
IX. 11. Plagiarism

What is self-archiving?

To self-archive is to deposit a digital document in a publicly accessible website, preferably an OAI-compliant Eprint Archive. Depositing involves a simple web interface where the depositor copy/pastes in the "metadata" (date, author-name, title, journal-name, etc.) and then attaches the full-text document. Software is also being developed to allow documents to be self-archived in bulk, rather than just one by one.

What is the Open Archives Initiative (OAI)?

The Open Archives Initiative (OAI) has designed a shared code for metadata tags (e.g., "date," "author," "title," "journal" etc.). See the OAI FAQ. The full-text documents may be in different formats and locations, but if they use the same metadata tags they become "interoperable." Their metadata can be "harvested" and all the documents can then be jointly searched and retrieved as if they were all in one global collection, accessible to everyone.
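As a rough illustration of what "harvested" metadata can look like in practice, here is a minimal sketch. It assumes the archive answers in the OAI-PMH 2.0 XML format with unqualified Dublin Core fields, which this FAQ does not spell out. The sample response, its identifier and all field values are invented for the example, and `parse_records` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses that carry Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical harvested response: the archive, identifier and values are invented.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:1234</identifier>
        <datestamp>2004-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Postprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:date>2004-05-20</dc:date>
          <dc:source>Journal of Examples</dc:source>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Return one dict per harvested record: its identifier and its metadata fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        fields = {}
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for child in dc:
                # Strip the "{namespace}" prefix so "title", "creator", etc. remain.
                tag = child.tag.split("}", 1)[-1]
                fields.setdefault(tag, []).append((child.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


for rec in parse_records(SAMPLE_RESPONSE):
    print(rec["identifier"], rec["fields"])
```

Because every compliant archive would label its fields the same way, the same parsing code could be pointed at any of them, which is the sense of "interoperable" used above.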

What is OAI-compliance?

OAI-compliance means using the OAI metadata tags. A document can be OAI-compliant and an Eprint archive can be OAI-compliant. All OAI-compliant documents in OAI-compliant archives are interoperable. This means distributed documents can be treated as if they were all in one place and one format.

What is an Eprint Archive?

An Eprint Archive is a collection of digital documents. OAI-compliant Eprint Archives share the same metadata, making their contents interoperable with one another. Their metadata can then be harvested into global "virtual" archives that are seamlessly navigable by any user (just as a commercial index or abstract database is navigable, but with full-text access).

How can I or my institution create an Eprint Archive?

Free Eprints software (itself using only free software) has been designed so institutions or even individuals can create their own OAI-compliant Eprint Archives. Setting up the archive only needs some space on a web server. Installing the Eprints software is relatively easy, and is being made easier with each successive release of the software. It requires a little webmaster time to set up, and a little webmaster time to maintain. This investment is very small. The real challenge is not creating or maintaining an Eprint Archive, but ensuring that it is promptly filled with its intended contents, which, for the BOAI, consist of pre-peer-review preprints and peer-reviewed, accepted postprints.
See the Institutional Archives Registry and List.

How can an institution facilitate the filling of its Eprint Archives?

(1) Install OAI-compliant Eprint Archives.

(2) Adopt a university-wide policy that all faculty maintain and update a standardised online curriculum vitae (CV) for annual review.
See the Institutional Archives Registry and List
(3) Mandate that the full digital text of all refereed publications should be deposited in the University Eprint Archives and linked to their entry in the author's online CV. (Make it clear to all faculty how self-archiving is in the interest of their own research and standing, maximizing the visibility, accessibility and impact of their work.)

(4) Offer trained digital librarian help in showing faculty how to self-archive their papers in their own university Eprint Archive (it is very easy).

(5) Offer trained digital librarian help in doing "proxy" self-archiving, on behalf of any authors who feel that they are personally unable (too busy or technically incapable) to self-archive for themselves. They need only supply their digital full-texts in word-processor form: the digital archiving assistants can do the rest (usually only a few dozen keystrokes per paper).

(A policy of mandated self-archiving for all refereed research output, together with a trained proxy self-archiving service to ensure that lack of time or skill does not become grounds for non-compliance, are the most important ingredients in a successful self-archiving program. The proxy self-archiving will only be needed to set the first wave of self-archiving reliably in motion. The rewards of self-archiving -- in terms of visibility, accessibility and impact -- will maintain the momentum once the archive has reached critical mass. And even students can do for faculty the few keystrokes needed for each new paper thereafter.)

(6) Digital librarians, collaborating with web system staff, should be involved in ensuring the proper maintenance, backup, mirroring, upgrading, and migration that ensure the perpetual preservation of the university Eprint Archives. Mirroring and migration should be handled in collaboration with counterparts at all other institutions supporting OAI-compliant Eprint Archives.
See the Institutional Archives Registry and List

What is the purpose of self-archiving?

The purpose of self-archiving is to make the full text of the peer-reviewed research output of scholars/scientists and their institutions visible, accessible, harvestable, searchable and useable by any potential user with access to the Internet. The purpose of thus maximizing public access to research findings online is that this in turn maximizes its visibility, usage and impact -- which in turn not only maximizes its benefits to researchers and their institution in terms of prestige, prizes, salary, and grant revenue but it also maximizes its benefits to research itself (and hence to the society that funds it) in terms of research dissemination, application and growth, hence research productivity and progress. This is why open access is both optimal and inevitable.
See the Institutional Archives Registry and List

What is the difference between distributed and central self-archiving?

All OAI-compliant Eprint Archives are interoperable. This means their contents are harvestable by cross-archive search engines like ARC or Citebase into global virtual archives. Hence OAI has eliminated the difference between self-archiving documents in one central archive or many distributed archives. Users need not know where documents are located in order to find, browse and retrieve them (any more than they do when they are using commercial indexing or abstracting services); and the full texts are all retrievable.
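The idea of a global "virtual" archive can be sketched in a few lines. This is only an illustration under simplifying assumptions, not a description of how ARC or Citebase actually work: the archive names, record fields and the helper names `build_index` and `search` are all hypothetical, and the search is a plain keyword match on titles.

```python
from collections import defaultdict


def build_index(archives):
    """Merge records harvested from many archives into one keyword index.

    `archives` maps an archive name to a list of records, each a dict with
    "id" and "title" keys (a hypothetical, simplified record shape).
    """
    index = defaultdict(set)   # word -> set of (archive, id) keys
    catalog = {}               # (archive, id) -> record
    for archive_name, records in archives.items():
        for record in records:
            key = (archive_name, record["id"])
            catalog[key] = record
            for word in record["title"].lower().split():
                index[word].add(key)
    return index, catalog


def search(index, catalog, query):
    """Return titles containing every query word, wherever they are archived."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(catalog[key]["title"] for key in hits)


# Invented example data: two separate institutional archives.
archives = {
    "univ_a": [{"id": "1", "title": "Open access citation impact"}],
    "univ_b": [
        {"id": "7", "title": "Citation impact of eprints"},
        {"id": "8", "title": "Peer review reform"},
    ],
}
index, catalog = build_index(archives)
print(search(index, catalog, "citation impact"))
```

The user of `search` never names an archive: results from both institutions come back together, which is the point being made about distributed versus central archiving.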

What is the difference between institutional and central Eprint Archives?

Because of OAI-compliance, it no longer matters whether documents are archived in one central Eprint Archive or in many distributed ones. They are all interoperable and harvestable into one virtual "central" archive in which all contents are seamlessly navigable and retrievable. Strategically, however, there is a difference between institutional and central self-archiving.

Self-archiving is done in order to maximize the visibility and accessibility of refereed research, and hence to maximize its usage by researchers and its impact on research. The benefits of maximizing research impact are felt by the researcher and the researcher's institution, rather than by some more central entity (such as the research discipline or learned society). The academic reward system (salaries, research funding) is centered on the researcher's institution. Publishing and impact confer advantages on both researcher and institution. Hence the researcher's institution is the natural one to host self-archiving and ensure that its archives are filled with its annual research output.
See the Institutional Archives Registry and List.

Who should self-archive?

The Budapest Open Access Initiative is focussed specifically on the refereed research literature, across all disciplines. It is the authors of these articles who should self-archive them, in order to maximize the visibility, accessibility, uptake and impact of their work. The self-archiving itself, however, though rapid and simple, can be done by "proxy," by digital archivers in the researcher's institution or its library. It can also be done in bulk, by (free) software (under development).
See the Institutional Archives Registry and List.

What is an Eprint?

Eprints are the digital texts of peer-reviewed research articles, before and after refereeing. Before refereeing and publication, the draft is called a "preprint." The refereed, accepted final draft is called a "postprint." (Note that this need not be the publisher's proprietary PDF version!) Eprints include both preprints and postprints (as well as any significant drafts in between, and any postpublication updates). Researchers are encouraged to self-archive them all. The OAI tags keep track of all versions. All versions should contain links to the publisher's official version of record.

Why should one self-archive?

In order to maximize the visibility and accessibility of one's research, and hence the usage and impact of one's work. Merely publishing it provides minimal impact; self-archiving it as well provides maximal impact.

What should be self-archived?

All significant stages of one's work, from the pre-refereeing preprint to the peer-reviewed, published postprint, to postpublication updates, should be self-archived. The OAI tags keep track of all versions. (Note that the postprint need not be the publisher's proprietary PDF, but that there should always be a link to it.)

Is self-archiving publication?

Definitely not. For purposes of establishing priority and asserting copyright, anything that is made public, even on a single piece of paper, meets the legal definition of "publication." Hence so does self-archiving. But for scholarly and scientific purposes, only meeting the quality standards of peer review, hence acceptance for publication by a peer-reviewed journal, counts as publication. Self-archiving should on no account be confused with self-publication (vanity press). (Self-archiving pre-refereeing preprints, however, is an excellent way of establishing priority and asserting copyright.)

What about copyright?

The author holds the copyright for the pre-refereeing preprint, so that can be self-archived without seeking anyone else's permission. For the refereed postprint, the author can try to modify the copyright transfer agreement to allow self-archiving, or, failing that, can append or link a corrigenda file to the already self-archived preprint. See "Is self-archiving legal?," "What if the publisher forbids preprint self-archiving?" and the Rights MEtadata for Open archiving Project and Directory of Journals' Policies on Author Self-Archiving.

What if my copyright transfer agreement explicitly forbids self-archiving?

See "Is self-archiving legal?," "What if the publisher forbids preprint self-archiving?" and the Rights MEtadata for Open archiving Project and Directory of Journals' Policies on Author Self-Archiving.

Peer-review reform: Why bother with peer review?

Peer review is not without its flaws, but improving peer review first requires careful testing of alternative systems, and demonstrating empirically that these alternatives are at least as effective as classical peer review in maintaining the quality of the refereed literature (such as it is). No alternatives have yet been tested or demonstrated effective.
Hence current peer review reform or elimination proposals are merely speculative hypotheses at this time, and red herrings insofar as the freeing of the peer-reviewed literature is concerned: The self-archiving initiative is directed at freeing the current peer-reviewed literature, such as it is, from the impact/access barriers of Subscription/License/Pay-per-view access-tolls, now. It is not directed at freeing the literature from peer review, or at testing or implementing untested alternatives to peer review (Cf. http://library.caltech.edu/publications/ScholarsForum/042399sharnad.htm and http://www.ecs.soton.ac.uk/~harnad/Ebiomed/com0509.htm#harn45).

The benefits of freeing the refereed literature now are a sure thing; the benefits (if any) from future alternatives to peer review (if any) are purely hypothetical, and certainly nothing to wait for or to hold us back from self-archiving.

Is self-archiving legal?

Texts that an author has himself written are his own intellectual property. The author holds the copyright and is free to give away or sell copies, on-paper or on-line (e.g., by self-archiving), as he sees fit. For example, the pre-refereeing preprint can always be legally self-archived.

Self-archiving of one's own, non-plagiarized texts is in general legal in all cases but two. The first of these two exceptions is irrelevant to the kind of self-archiving BOAI is concerned with, and for the second there is a legal alternative.

Exception 1: Where exclusive copyright in a "work for hire" has been assigned by the author to a publisher -- i.e., the author has been paid (or will be paid royalties) in exchange for the text -- the author may not self-archive it. The text is still the author's "intellectual property," in the sense that authorship is retained by the author, and the text may not be plagiarized by anyone, but the exclusive right to sell or give away copies of it has been transferred to the publisher.

Exception 1 is irrelevant to BOAI, because BOAI is concerned only with peer-reviewed research, for which the author is paid nothing, and no royalty revenue is expected, sought, or paid.

Exception 2: Where exclusive copyright has been assigned by the author to a journal publisher for a peer-reviewed draft, refereed and accepted for publication by that journal, then that draft may not be self-archived by the author (without the publisher's permission).

The pre-refereeing preprint, however, has already been (legally) self-archived. (No copyright transfer agreement existed at that time.)

So in those cases where the copyright transfer agreement does not yet give the author the green light to self-archive the refereed final draft ("postprint"), there is always the alternative of self-archiving a corrigenda file alongside the already archived preprint, listing the changes that need to be made to make the pre-refereeing preprint conform to the refereed postprint.
See the Rights MEtadata for Open archiving (Romeo) Directory of Journal Self-Archiving Policies. Of the 10,000+ journals surveyed, over 90% are already "green" (i.e., they have already given their green light to author self-archiving). Many of the remaining "gray" journals (under 10%) will agree if the author asks.
Perhaps the most sensible default strategy of all is the one that the physicists have been successfully practicing since 1991: "don't-ask/don't-tell": Simply self-archive your preprint as well as your postprint, and wait to see whether the publisher ever requests removal. After nearly a decade and a half of practicing this default strategy, and a quarter of a million self-archived papers, not a single paper has yet been removed because a publisher requested it. On the contrary, virtually all physics journals have since become officially "green" in response to the physics community's evident desire and determination to enjoy the research benefits of providing open access to their own papers by self-archiving them. In contrast, those researchers who during that decade and a half have not been practicing this default strategy have instead needlessly lost a decade and a half's worth of cumulative research impact.

What if the publisher forbids preprint self-archiving?

The right to self-archive the refereed postprint is a legal matter, because the copyright transfer agreement pertains to that text. But the pre-refereeing preprint is self-archived at a time when no copyright transfer agreement exists and the author holds exclusive and full copyright. So publisher policy forbidding prior self-archiving of preprints is not a legal matter, but merely a journal policy matter (just as it would be if the journal were to forbid the submission of papers by authors with blue-eyed uncles!). It would become a legal matter -- but a contractual matter, not a copyright one -- if the author were to sign a contract explicitly stating that the unrefereed preprint had not been previously self-archived online. Obviously an author should strike such arbitrary stipulations out of any contract.

This policy goes by the name of the "Ingelfinger Rule," originally invoked by the Editor of the New England Journal of Medicine (NEJM), Franz Ingelfinger, in order to protect public health (and the NEJM's priority) from any publicity about unrefereed findings prior to publication.

The Ingelfinger Rule (sometimes also referred to as a "prepublication embargo") is accordingly not a copyright matter, but a journal submission policy: "We will not consider for publication any preprint that has been previously self-archived."

BOAI makes no recommendations to authors regarding compliance with such policies, except to note that (1) the Ingelfinger Rule is not a legal matter, (2) the number of journals invoking the Ingelfinger Rule is rapidly diminishing in the face of self-archiving pressure from authors in the interests of research progress (Nature, for example, has dropped it, and other journals are following suit) and (3) the Ingelfinger Rule was probably never enforceable in any case.

What-to-do FAQs

What can researcher/authors do to facilitate self-archiving?

Make sure that your university or research institution has installed OAI-compliant Eprint Archives.

Self-archive your pre-peer-review preprints in your institutional (or central) Eprint Archives.

Self-archive your post-peer-review postprints (or corrigenda file) in your institutional (or central) Eprint Archives.
See the Institutional Archives Registry and List.

What can researchers' institutions do to facilitate self-archiving?

See "How can an institution facilitate the filling of its Eprint Archives?"
See the Institutional Archives Registry and List.
Sign the Declaration of Institutional Commitment to Providing OA.

What can libraries do to facilitate self-archiving?

Digital librarians are the natural candidates for maintaining the Eprint Archives, their institution's outgoing collection of peer-reviewed research output.

(1) Offer trained digital librarian help in showing faculty how to self-archive their papers in the university Eprint Archive (it is very easy).

(2) Offer trained digital librarian help in doing "proxy" self-archiving, on behalf of any authors who feel that they are personally unable (too busy or technically incapable) to self-archive for themselves. Authors need only supply their digital full-texts in word-processor form: the digital archiving assistants can do the rest (usually only a few dozen key/mouse-strokes per paper).

(The proxy self-archiving will only be needed to set the first wave of self-archiving reliably in motion. The rewards of self-archiving -- in terms of visibility, accessibility and impact -- will maintain the momentum once the archive has reached critical mass. And even students can do for faculty the few keystrokes needed for each new paper thereafter.)

(3) Digital librarians, collaborating with web system staff, should be involved in ensuring the proper maintenance, backup, mirroring, upgrading, and migration that ensure the perpetual preservation of the university Eprint Archives. Mirroring and migration should be handled in collaboration with counterparts at all other institutions supporting OAI-compliant Eprint Archives.
See the Institutional Archives Registry and List.

What can research funders do to facilitate self-archiving?

Mandate that publicly funded research must not merely be published but must also be publicly accessible online (whether through self-archiving, open-access journals, or both), as recommended by the UK Government Science and Technology Committee as well as the US House of Representatives Appropriations Committee.

Make it part of grant applications that CVs and bibliographies citing the applicant's prior work should contain links to the online full-text (whether self-archived or in open-access journals, or both).
Sign the Declaration of Institutional Commitment to Providing OA.

What can publishers do to facilitate self-archiving?

Support Open Access by adopting a "green" author self-archiving policy, i.e. giving your green light to author self-archiving of preprints and postprints (not necessarily the publisher's PDF) as over ninety percent of journals sampled (8,000+) have already done. See the Rights MEtadata for Open archiving (ROMEO) Directory of publishers' self-archiving policies.
See also FOS policy statements by learned societies and professional associations and "The Green Road to Open Access: A Leveraged Transition".

Publishers are encouraged to fill out the SHERPA/ROMEO webform describing their self-archiving policy statement for inclusion in the Romeo publishers directory and to email their journals list (with ISSNs and URLs) to Maria at http://romeo.eprints.org/corrections.php for inclusion in the Romeo journals directory.

1. Preservation

"I worry about self-archiving because archived eprints may not continue to exist or to be accessible in perpetuum on-line, the way they were on-paper."

This worry is misplaced. It is not really a worry about self-archiving at all, but about the online medium itself. As such, it needs to be directed toward the primary database in question, which is the toll-access refereed journal literature, currently in the hands of publishers and libraries, and most of it already in both paper and digital form. That is the official version of record. If you are worried about the preservation of the online version, it is to its publishers and subscribing/licensing librarians that your worry needs to be addressed. The preprints and postprints that are being self-archived by their authors in their institutional eprint archives today are intended to maximize impact by providing immediate open access; they are merely open-access supplements to that toll-based primary literature at this time, not substitutes for it.

To put even this misdirected worry into perspective, we must remember that print-on-paper is not permanent either. The only relevant parameter is the probability of future access. The on-paper probability, such as it is, is achieved by generating (a) multiple copies that are (b) geographically distributed (c) in a (relatively) robust medium and can be made (d) visible to the human eye.

All four of these properties can be achieved (and have been) on-line too, and the resulting preservation probability can be made as good as, or even better than, the current probability on-paper.
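To make the probability argument concrete, here is a minimal sketch. It assumes, purely for illustration, that each copy is lost independently and with the same probability over the period of interest. Neither that independence assumption nor the numbers used come from this FAQ, and `survival_probability` is a hypothetical helper name.

```python
def survival_probability(p_loss_per_copy: float, copies: int) -> float:
    """Probability that at least one of `copies` copies survives.

    Assumes every copy is lost independently with probability
    `p_loss_per_copy`; real losses may be correlated.
    """
    if not 0.0 <= p_loss_per_copy <= 1.0:
        raise ValueError("p_loss_per_copy must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All copies are lost with probability p**n; otherwise at least one survives.
    return 1.0 - p_loss_per_copy ** copies


# Illustrative only: how the chance of future access grows with mirrored copies.
for n in (1, 2, 5, 10):
    print(n, survival_probability(0.5, n))
```

Under these assumptions the medium matters less than the number and independence of the copies, which is why mirroring across institutions can serve online texts just as multiple distributed print runs serve paper.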

That should be the end of the story: For once this concern is no longer grounded in actual, objective probabilities, but only in prior habits and attendant intuitions, then we are talking about biases and superstitions and not about actual risks.

There are a few side issues: People worry about global power-failures, or global dictatorships. They should remind themselves that these are matters of probability too, and have their equivalents in paper.

People also, by analogy with current unreadable documents in obsolete word-processors or peripherals, worry about whether the digital code, even if preserved, will always be accessible and visible to the eye.

The answer is again probability: The reason print-on-paper has been faithfully preserved across generations (when it has been) is that the literate world's collective interests were vested in ensuring that it should do so. This same continuity of collective interests will exist for the digital corpus too, for the same reasons, except that digital code will be much easier to keep migrating to every successive new technology than print on-paper to every successive building or regime ever was.

(And there is always the option for those who are still not confident enough in the technology, despite the odds, of printing out hard copies as back-up: Indeed, that is a good way to put the magnitude of one's preservation worries to the test: Who will still feel the need to keep hard copies, and of how much of the corpus, once it's all on-line and accessible to everyone, everywhere, at all times?)

In short, setting up active preservation programs implemented by digital librarians is indeed important and necessary; but it would be completely irrational to interpret the need for robust preservation programs as a reason for any hesitation or delay whatsoever about proceeding with self-archiving right now -- a fortiori, because, for the time being, self-archiving is merely a supplement to, not a substitute for, the existing modes of preservation, on paper and online. If and when the day should ever come when primary journal publishers decide to downsize and become peer-review service-providers only, cutting costs by offloading the access and archiving burden entirely onto the network of institutional archives, then that institutional network will be quite ready, willing and able to take over the distributed digital preservation burden for its collective research legacy. But that time is not now, hence this worry (about self-archiving now) is misplaced.

2. Authentication

"I worry about self-archiving because you can never be sure whether you are reading the definitive version of an eprint on-line, the way you can be sure on-paper."

Again, the rational way to put this into context and proportion is to remind ourselves that the authenticity of an on-paper version is just a matter of probability too, and that the very same factors that maximize that probability on-paper can maximize it on-line too. Indeed, if we wish, we can make both the probability and the verifiability of authenticity on-line much higher than it currently is on-paper through techniques such as public hash/time-stamping and encryption.
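The hashing half of "public hash/time-stamping" can be illustrated with a short sketch. It assumes SHA-256 as the hash function, which this FAQ does not specify, and uses invented document text. The helper names `fingerprint` and `is_unaltered` are hypothetical, and the time-stamping step (having a trusted third party record the digest and date) is not shown.

```python
import hashlib


def fingerprint(document: bytes) -> str:
    """Return a SHA-256 digest that changes if even one byte of the text changes."""
    return hashlib.sha256(document).hexdigest()


def is_unaltered(document: bytes, recorded_digest: str) -> bool:
    """Check a retrieved document against a digest recorded at deposit time."""
    return fingerprint(document) == recorded_digest


# Invented example: record a digest at deposit, then check two later copies.
original = b"Refereed, accepted final draft (hypothetical text)."
recorded = fingerprint(original)
tampered = original + b" Extra sentence."

print(is_unaltered(original, recorded))
print(is_unaltered(tampered, recorded))
```

If the recorded digest is published somewhere the depositor cannot later change, any reader can repeat this check independently, which is something a paper reprint does not offer.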

Nor should the authentication issue be confused with the issue of Peer-Review (7) or Journal Certification (5) (separate questions), nor with the question of "Version Control (23)": There will be self-archived preprints, revised drafts, final accepted drafts (postprints; but not necessarily the publisher's PDF), updated, corrected post-postprints, peer comments, author replies, revised second editions. In all of this, the refereed, accepted final draft is one crucial "milestone," but not the only one, in the embryology of knowledge (and not even always the best one).

And last, some of the "authentication" worries arise from conflating self-archiving and self-publication. To say it in longhand: The main objective of the self-archiving initiative is the freeing of the refereed drafts from access/impact barriers. The refereed draft has already been "authenticated" by the journal that peer-reviewed it. Do not confuse that authentication with some worry you may have about whether this self-archived draft is indeed what the author purports it to be. The only thing the author is "self-certifying" in this case is that this is indeed the journal-certified final draft. There is of course always a possibility that it is not the journal-certified final draft; but that was also true when the author sent you an on-paper reprint. The probabilities can, as usual, be tightened to make them as high as we feel comfortable with in either case (especially with institution-CV-based self-archiving). And, as in the case of preservation, self-archiving is at this stage merely a supplement, not a substitute for existing forms of authentication. (Eprints, however, should always contain a link to the DOI of the publisher's official version.)

So, again, there are no rational authentication concerns at all to deter us from self-archiving immediately.

3. Corruption

"I worry about self-archiving because eprints can be altered or otherwise corrupted on-line in ways they could not be corrupted on-paper."

If the "authentication" worry (2) is the worry about "self-corruption" by the author who has self-archived his own paper, this second "corruption" worry is about "allo-corruption" by parties other than the author.

Again, the answer is that simple and effective means are available to ensure that an on-line draft is uncorrupted with as high a probability as we feel we need. So this too is a non-problem. (Nor should it, again, be conflated with self-publication issues, which are irrelevant to the self-archiving of refereed, journal-published papers.) Whatever level of incorruptibility we feel we need, we can have it for self-archived papers too.

Consequently, corruptibility worries provide no rational basis at all for deterring us from self-archiving immediately.

4. Navigation (info-glut)

"I worry about self-archiving because there is already too much to read, and it is already too hard to navigate it on paper; adding eprints will just make this situation even worse."

This worry deserves even less space than the others. It is incontestable that the information glut -- http://www.sims.berkeley.edu/how-much-info/summary.html -- is far more navigable and manageable on-line than on-paper.

The primary objective of self-archiving is to free the refereed journal literature from impact-blocking access-tolls on-line. That literature is already being published on-paper. (If you think it should not be, it is with the journals and their referees that you need to take issue, not with self-archiving or the on-line medium!) When it is all accessible toll-free on-line, there is no need for anyone to feel any more (or less) obliged to read the refereed literature than they did on-paper. Keeping it either off-line or toll-based is certainly no cure for the information glut (if there is one); it merely makes the existing access-tolls the arbitrary arbiters of whether or not one reads something, rather than the reader's own rational judgement. (And unrefereed preprints can of course always be ignored altogether, if the reader wishes, on-line just as on-paper.)

In short, no rational deterrent at all to immediate self-archiving from concerns about navigation or information glut.

5. Certification

"I worry about self-archiving because papers are not certified on-line, the way they are in a journal on-paper."

This worry is again based on conflating publication and archiving: The journal publisher (and referees) provide the certification; the archive merely provides access. The author, in self-archiving, "self-certifies" his refereed, published draft as indeed being the self-same draft that the journal refereed and published (and certified). And this being the case is, as usual, a matter of probability, whether on-line or on-paper. And that probability can be made as high as we feel we need.

Again, no rational deterrent to immediate self-archiving in the certification worry.

6. Evaluation

"I worry about self-archiving because there is no evaluative process on-line as there is on-paper."

Again, a conflation of publishing and archiving: Journal editors and their referees evaluate drafts and revisions, and if/when they are satisfied that their journal's quality standards have been met, they certify the final draft as having met them (peer review). The author self-archives the peer-reviewed postprints (and unrefereed preprints, and perhaps revised post-postprints), tagging them correspondingly. We can decide how high a probability we need that the peer-reviewed draft is indeed the peer-reviewed draft, but that is not the problem of evaluation, but just the question of Authentication (2) again.

So there is no rational deterrent to immediate self-archiving anywhere in the evaluation worry.

7. Peer review

"I worry about self-archiving because on-line eprints are not refereed, as they are on-paper: What will become of peer review?"

Again, a conflation of publishing and archiving, as well as of preprints and postprints: The author self-archives both pre-refereeing preprints and refereed postprints (etc.), and each is clearly tagged as such. The peer review continues to be performed by the referees, as it always was. Peer-review is medium-independent, and self-archiving in no way alters the peer review system.

Part of the impetus for the groundless worry that self-archiving or open access are somehow at odds with peer review comes from "peer-review reformers," who have somehow managed to link their completely independent reform agenda to the open-access agenda (probably because of a misinterpretation of the implications of self-archiving the unrefereed preprint).
Those who wish to reform or replace peer review first need to go out and test their alternatives, to demonstrate whether or not they will generate a literature of a quality, reliability, and usability at least equal to the one we have now. But meanwhile, self-archiving is about providing open access to the peer-reviewed literature we have now, such as it is, to free it from access-tolls, not from peer-review.

See:
The Invisible Hand of Peer Review.
http://www.nature.com/nature/webmatters/invisible/invisible.html

Peer Review Reform Hypothesis-Testing http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0479.html

A Note of Caution About "Reforming the System" http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/1169.html

Self-Selected Vetting vs. Peer Review: Supplement or Substitute? http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2340.html

No rational deterrent to immediate self-archiving in the peer-review worry.

8. Paying the piper

"I worry about self-archiving because someone surely has to pay for all this: you can't get something for nothing!"

There are many fallacies embedded in this worry, among them misunderstandings about the nature of global networked communication. Internet connectivity is now a standard part of the infrastructure of most of the world's universities and research institutions. If you are not equally worried about who pays for your emails, websites, and web-browsing, you should not be worrying about your self-archiving either. Moreover, paying access-tolls is not paying the pertinent piper here anyway! (I.e., it is not publishers who are paying for universities' network infrastructure!)

The refereed research literature is minuscule compared to the rest of the traffic on the Web. It is the flea on the tail of the dog. Worry about the storage and band-width for the growing daily creation and use of audio, video, and multimedia (most of it non-research use!) by researchers at universities and research institutions before even beginning to fret about the refereed flea.

As usual, there is also some of the archiving/publishing conflation here, thinking that we must find some sort of counterpart for the printing/distribution costs, somewhere. But there isn't any. The cost per-paper of permanent online archiving is virtually zero, yet everyone, everywhere, has access to it all, forever. This is a Gutenberg expense that has simply vanished in the PostGutenberg Galaxy, leaving only the Cheshire Cat's Grin.

There is indeed one essential publishing cost that still needs to be paid, but it has nothing to do with Internet use: It is the cost of implementing peer review. That cost, however, is only 10-30% of the access-tolls currently being paid, and hence could easily be paid out of the annual toll savings.

The last of the "who-pays-the-piper" worries is, I think, a variant of the Capitalism (14) worry. The best way to dispel it is to note that refereed publishing in the PostGutenberg Galaxy, once the literature has been freed through self-archiving, is likely (apart from whatever optional add-on products and services there may still be a market for) to downsize into a service (peer review), provided to the author-institution, instead of the toll-based product (the text) that was provided to the reader-institution in the Gutenberg era.

Nothing hinges on this, however, for as long as the world wants to keep paying for the toll-based product, even after the refereed literature has been self-archived, the piper will be fully paid, yet the literature will be free of all its access/impact barriers.

No rational deterrent to immediate self-archiving in the who-pays-the-piper worries.

9. Downsizing

"I worry about self-archiving because it may force journal publishers to shrink to a non-sustainable size, and then where would we be?"

No one can predict with certainty the evolutionary path that scientific/scholarly journal publishing will take once toll-free online access to the entire refereed corpus provided by author/institution self-archiving has prevailed. The toll-based market for the on-paper version, for the publisher's official on-line version or for other options may continue indefinitely, or it might shrink but re-stabilize at a lower level, or it might disappear altogether -- and this could happen relatively slowly or relatively quickly.

It is not clear in advance which of the current established journal publishers will want to continue doing what, under what conditions. The bottom line is that the only remaining essential service will be peer review. If and when that is the only service for which there remains a market, either current toll-access journal publishers will be able and willing to downsize to that new open-access journal niche, or they will terminate their journal operations, in which case their titles (that is, each journal's editor, editorial board, referees, and authorship) will simply migrate to new on-line-only open-access journal publishers who stand ready to adapt to the new niche [e.g., the Institute of Physics's New Journal of Physics, Public Library of Science, and BioMed Central]. Because self-archiving is distributed, gradual and anarchic, rather than growing locally, suddenly, and systematically, journal by journal, however, evolution will be gradual rather than abrupt, leaving plenty of time to adjust to a leveraged transition.

No rational deterrent to immediate self-archiving in worries about publisher downsizing.

10. Copyright

"I worry about self-archiving because it is illegal, it violates copyright agreements, and can jeopardize career and livelihood."

Please see the sections on copyright and on legal ways to self-archive despite restrictive copyright transfer agreements.

In brief, over 90% of journals already officially support self-archiving, and among those who do not yet support it, many will agree to author self-archiving if the author asks; and for those that still don't, self-archiving the preprint before submission and a "corrigenda" file after acceptance is sufficient, and completely legal. What career and livelihood depend on are peer review and research impact, and all self-archiving authors continue to enjoy both; neither one needs to be sacrificed for the other.

Ironically, open access is also being held back by those well-meaning advocates who think that open access is dependent upon or equivalent to copyright reform, with authors needing to retain copyright. (This is as incorrect and counterproductive as the belief that open access requires or entails peer-review reform, or that the only way to attain open access is through a transition to open-access ("golden") journal publishing.)
No rational deterrent to immediate self-archiving in copyright worries.

11. Plagiarism

"I worry about self-archiving because it is so much easier to steal someone else's text on-line, and publish it as one's own, than it is to do so on-paper."

This is again a matter of probability: Yes, "it is much easier to steal someone else's text on-line, and publish it as one's own, than it is to do so on-paper," but it is also much easier to detect such thefts on-line; and it is possible to do both (steal and detect) on-paper too.

Depending on how important we find it to do so, we can make escape from detection so improbable on-line that it becomes harder to plagiarize on-line than on-paper. It is not clear, however, whether it is even all that important to do so. Worries about plagiarism are usually based on the archiving/publishing conflation: Once one's findings have been refereed and published, it is hard for anyone else to derive any benefit from them at the expense of the author (the peer-reviewed version settles all subsequent authorship disputes).

Pre-refereeing preprints are another story; they are dealt with partly in the prior discussion of Authentication (2), and partly under Priority (12), below.

For refereed postprints, however, refraining from self-archiving them because of worries about plagiarism would be no more rational than refraining from publishing them on-paper in the first place, for the very same reason.

No rational deterrent to immediate self-archiving in plagiarism worries.

12. Priority

"I worry about self-archiving because one cannot establish priority on-line as one can on-paper."

Establishing priority is again a matter of probability, but it can readily be made much more definitive and reliable (and earlier) on-line than on-paper if we wish. See Authentication (2). More important, for the all-important refereed postprints, priority has already been established by publishing them, and the self-archiving is merely to maximize access and impact.

No rational deterrent to immediate self-archiving in priority worries.

13. Censorship

"I worry about self-archiving because censors could decide what can and cannot appear on-line."

This worry too is probably based in part on the usual archiving/publishing conflation (casting the Web and the Archive in the role of a Publisher who refuses to publish your work).

It is true that one's on-line literary goods are at the mercy of the archives and archivists. But one's analog on-paper literary goods were likewise at the mercy of the libraries. They could have chosen to "censor" our work too.

Again, it is just a matter of deciding how tight we wish to make the probabilities in this medium. Mirroring, caching/harvesting and distributed coding already go some way toward taking it out of any potentially sinister local hands. And for refereed, accepted postprints, this argument (against enhancing their access) makes no sense at all.

No rational deterrent to immediate self-archiving in worries about censorship.

14. Capitalism

"I worry about self-archiving because access-tolls are hallmarks of capitalism, market economics, supply and demand, free enterprise. Give-aways smack either of socialism, or market interference, or non-sustainability."

This too is merely a superstition. There are plenty of perfectly capitalistic precedents for give-aways, advertising being the most prominent one. If the thought of advertisers curtailing the potential impact of their ads by charging potential customers for access to them makes no sense, then it makes just as little sense to curtail the potential impact of research findings by charging potential users for access to them.

Nor is there any market interference in self-archiving one's own refereed research: If institutions and individuals want to pay for access-tolls to the on-paper version, or the publisher's official PDF, or further options, they can still do so; but there is no longer any need or justification for continuing to hold the essentials (the peer-reviewed draft) hostage to those toll-based options in the PostGutenberg era, any more than there was any need or justification for continuing to hold the essentials of long-distance communication hostage to postal transport costs in the era of telephony. (Rather than capitalism being under assault from self-archiving, trying to prevent researchers from benefiting from this new, more efficient and economical way of disseminating and maximizing the impact of their refereed research smacks of protectionism.)

Two variants on the capitalism-worry arise from scepticism about the eventual transition from providing a toll-based product to the reader-institution to providing a peer-review service to the author-institution. Note that, strictly speaking, it is not even necessary to answer these worries, as this eventual transition is hypothetical, whereas freeing the refereed literature now through self-archiving is not; but here are replies anyway:

Question 1: "Won't paying directly for the peer review service lead to inflated peer-review costs by the most prestigious journals?"

Question 2: "Won't peer-review revenues lower standards, so that lower-quality work is accepted in order to get more peer-review revenue?"

The answer to both is similar: Referees referee for free, and journal quality and prestige (and impact) depend on rejection rates. Trying to inflate revenue by lowering acceptance thresholds simply lowers quality, thereby favoring the competition, with its higher standards. This is a built in counter-weight. Likewise for raising peer-review rates: As referees referee for free, there is no reason one journal should charge more than another, and if they do, they risk driving not only the authors but also the unpaid referees to the competition. Because the competitive commodity in this anomalous give-away domain is quality, and nothing else.

A proposal has occasionally been voiced to preserve access-toll-barriers by buying authors off from self-archiving, by offering to share the revenue with them (royalty payments). But the trade-off between imprint-income and impact-income is so disproportionate for this anomalous domain that there is not faintly enough money available to make (refereed-research) authors prefer sacrificing their potential impact in exchange.

No rational deterrent to immediate self-archiving in worries about capitalism.

15. Readability

"I worry about self-archiving because it is inconvenient to read texts on screen, and hard on the eyes. It is also not suitable for bed, beach or bathroom reading."

At the moment it is undeniable that for extended, discursive reading, on-paper is still preferable to on-line. This will no doubt change, but even now it is no reason whatsoever for not self-archiving. First, a large proportion of the scientific and scholarly use of the refereed research literature consists of browsing and searching, not linear reading, and for this, on-line navigation is already incomparably superior. Second, there is still that vast potential readership to consider, whose access to your research in any form is currently blocked by unaffordable access tolls (Odlyzko 1999a, 1999b; http://www.arl.org/stats/index.html); for that entire disenfranchised population, it's either online or not at all. And last, even for linear reading, the archived version can always be printed off.

No rational deterrent to immediate self-archiving in worries about readability.

16. Graphics

"I worry about self-archiving because on-line graphics have coarser resolution than on-paper and require too much storage capacity and transmission time."

Graphics too will no doubt improve. With a few exceptions, such as fine arts and histology, digital graphics are already good enough. Users can always decide whether or not they feel they need to access the deluxe hard copy; no need to make a pre-emptive decision on their behalf, as the on-line version is in any case a supplement, not a substitute, for the time being. And graphics are quite a natural test-bed to see whether there is still any market left for any toll-based add-ons. In many cases, web illustrations are already considerably better than paper, with the potential for higher resolution and greater dynamic range, especially as links. This is particularly true for illustrations in fields where the data are collected digitally in the first place, such as Astronomy.

No rational deterrent to immediate self-archiving in worries about graphics.

17. Publishers' future

"I worry about self-archiving because of what it might do to journal publishers' future."

See the replies about Paying the Piper (8), Downsizing (9), and Capitalism (14), but note that this is all speculation and hypothesis, on both sides: If and when it should ever become necessary to do so -- it is not yet clear whether and when it will be necessary and all evidence to date is to the contrary -- then those journal publishers who are willing and able to cut inessential costs and downsize to a new open-access journal-publishing niche will be able to do so in a leveraged transition. In cases where they are not willing or able, new online-only open-access journal publishers [e.g., the Institute of Physics's New Journal of Physics, Public Library of Science, BioMed Central] will stand ready to take over the titles. The remaining peer-review service costs per submitted paper can be paid for by the author-institution out of 10-30% of its annual 100% access toll-savings. And refereed journal publication is only a small portion of publication, most of the rest of which, being non-give-away, will proceed on-line much the way it does on-paper.

No rational deterrent to immediate self-archiving in worries about publishers' future.

18. Libraries'/Librarians' future

"I worry about self-archiving because of what it might do to libraries' and librarians' future."

The refereed serials literature is all going on-line anyway, irrespective of the speed or success of the self-archiving initiative. If this requires restructuring of some librarian skills and functions, this will take place in any case. Some have thought that managing digital serials collections will fill the gap, but it is not clear how much management those will need, apart from paying the annual access toll-bills! Author/Institution Eprint Archives, on the other hand, will call for more digital librarian skills, in everything from helping researchers to do the self-archiving, to maintaining the institution's Eprint Archive and seeing to its continued interoperability with the rest of the world's Eprint Archives, its upgrading, and its preservation.

Moreover, in implementing and maintaining the institutional Eprint Archives, Libraries will be investing in the solution of their serials crisis. Of the 100% annual access-toll budget that this can potentially save, after 10-30% of it has been redirected to cover author-institution peer-review costs, the remaining 70-90% can be used to fund other librarians' activities, including the purchase of non-give-away materials such as books (whether on-paper or on-line).

No rational deterrent to immediate self-archiving in worries about libraries'/librarians' future.

19. Learned Societies' future

"I worry about self-archiving because of what it might do to Learned Societies' future."

Learned Societies are potential allies in and beneficiaries of the self-archiving initiative. First, they are us. Whatever is good for research, and for research impact, is therefore also good for Learned Societies.

But many of them are also journal publishers, and hence may one day be facing downsizing pains. Unlike commercial publishers, however, their first and last allegiance will of course be to research and researchers, that is, us. We will hear rationalizations about needing the access-toll revenues to fund "good works" such as meetings, scholarships and lobbying. But it will quickly become evident that, on the one hand, some of these good works are not essentials either, and certainly nothing that we would want to sacrifice research impact for; and, on the other hand, the subset of these good works that really is essential (e.g., meetings) will prove to be able to fund itself other ways too, rather than needing to be subsidized at the expense of research impact. (Imagine explicitly asking the society membership, once the causal connection between access and impact becomes common knowledge: "Are you willing to continue subsidizing your society's good works with your own lost research impact, by forgoing open-access and letting toll-access continue to decide who can and cannot use your [give-away] research?")

Learned Societies (and perhaps also University Presses) are also natural candidates for taking over the serials titles of commercial journal publishers who prefer to discontinue journal operations rather than scale down to just becoming peer-review service providers.

No rational deterrent to immediate self-archiving in worries about Learned Societies' future.

20. University conspiracy

"I worry about self-archiving because I worry that universities may have other plans for their researchers' writings, such as Eprint Archive Access-Tolls."

This worry seems to be based on some (one hopes) over-suspicious views about university administrators and their motives.

We should not forget that the give-away refereed literature is esoteric, with virtually no "market" per paper. So whereas there might be a basis for suspicion about what our hard-pressed universities might like to do if they could get their hands on our exoteric, non-give-away work (royalty-bearing books and textbooks), there's not much they could do to squeeze revenue out of our no-market, give-away refereed research reports even if they wanted to. On the contrary, our universities, like ourselves, co-benefit far more from the potential impact-income of our research output -- maximized by removing all access-barriers -- than from any potential imprint-income that could be squeezed out of it by in effect co-opting the "P" from the publishers' S/L/P (Subscription/License/Pay-Per-View) access-tolls and using it to charge institutional archive access-tolls.

Moreover, our universities' potential access-toll savings, and relief from their serials crises, are completely dependent on freeing access to our research. Any sign of university-levied archive-access tolls would simply serve to keep the current access tolls in place (simply changing the hand on the udder of the toll-based cash-cow).

No rational deterrent to immediate self-archiving in worries about University conspiracy.

21. Serendipity

"I worry about self-archiving because of those lucky happenstances that happen only when browsing index cards, library shelves, and journal contents."

This worry, despite its charm, does not deserve much space: With time, it will become evident that on-screen digital searching and browsing can be every bit as serendipitous as on-paper analog searching and browsing; chance adjacency effects are every bit as potent either way. The searching and browsing will simply be less exhausting to the limbs and fingers.

No rational deterrent to immediate self-archiving in worries about loss of serendipity.

22. Tenure/Promotion

"I worry about self-archiving because it does not count as refereed publication, and might even interfere with the chances for refereed publication."

Yet another instance of the archiving/publishing conflation: The self-archiving initiative is aimed at freeing refereed publication from access toll-based access/impact barriers (not from refereeing). Unrefereed preprints do not count as publications on-line any more than they do on-paper.

The other half of this worry is probably a variant of the Copyright (10) concerns (q.v.) as well as concerns about Embargo policies (Harnad 2000a, 2000b), both of which are groundless.

No rational deterrent to immediate self-archiving in worries about tenure/promotion.

23. Version control

"I worry about self-archiving because there may be many versions and there is no way to be sure which is which, and whether it is the right one."

There will be self-archived preprints, revised drafts, and final accepted drafts (postprints [but not necessarily the publisher's proprietary PDF]). OAI-compliant Eprint Archives will tag each version with a unique identifier. All versions will be retrieved by a cross-archive OAI search, and the "hits" can then be identified and compared by the user to select the most recent, official or definitive draft, exactly as if they had all been found in the same index catalogue.
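
The selection step described above can be sketched as follows. This is a minimal illustration, not a description of how any OAI service actually works: the record fields, the stage labels, the ranking rule, and the example identifiers are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    identifier: str  # hypothetical unique identifier assigned by an archive
    stage: str       # assumed labels: "preprint", "revised", "postprint"
    deposited: date  # date the version was self-archived


# Assumed ordering: a refereed postprint outranks a revised draft,
# which outranks an unrefereed preprint.
STAGE_RANK = {"preprint": 0, "revised": 1, "postprint": 2}


def most_definitive(hits):
    """Pick the highest-stage version, breaking ties by most recent deposit."""
    return max(hits, key=lambda v: (STAGE_RANK[v.stage], v.deposited))


# Hypothetical hits returned by a cross-archive search for one paper.
hits = [
    Version("oai:archive-a.example:101", "preprint", date(2001, 3, 1)),
    Version("oai:archive-b.example:77", "postprint", date(2001, 11, 20)),
    Version("oai:archive-a.example:102", "revised", date(2001, 7, 15)),
]

best = most_definitive(hits)
print(best.identifier, best.stage)
```

A user who wants the most recent draft regardless of stage would simply sort on the deposit date alone; the tags make either choice possible.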

24. Napster

"I worry about self-archiving because it seems to be stealing, like Napster or Gnutella."

Author-end give-aways of their own digital products via self-archiving are the antithesis of consumer-end rip-offs of others' non-give-away digital products via Napster (www.napster.com) or Gnutella (gnutella.wego.com).

It is very important to clearly distinguish and distance the two, because any inadvertent or willful conflation of the self-archiving initiative with Napster can only retard the progress of the self-archiving initiative toward the optimal and inevitable.

("Information is free" is nonsense: There is and always was both give-away and non-give-away information. Steal the latter and you simply kill the incentive to provide it in the first place.)

25. Mark-Up

"I worry about self-archiving because it would jeopardize proper mark-up."

Mark-up (the tagging of all functional parts of a document, such as titles, headings, sections, figures, tables, paragraphs, and any other potentially identifiable and manipulable sub-parts) is becoming increasingly important in digital documents. The most general mark-up "language" is called SGML and the subset of SGML that has been provisionally adopted for digital documents on the web is called XML. Most authors today use either Word, PDF, HTML, or TEX to create and render their documents. The documents thus produced do not have markup that is rich enough or flexible enough to allow important functions such as reference linking, flexible re-formatting, and reliable, intact migration to future formats for permanent preservation. This richer markup is currently provided by publishers and it must be done by hand and is therefore costly.

Hence an Eprint archive of documents self-archived without XML markup is only a short-term archive. A long-term archive requires the rich markup provided by publishers. But if present-day user preference for the free open-access documents prevents publishers from being able to recover their markup costs, will both the benefits of markup and the long-term functionality of the archived documents be lost?

The solution to this problem is the following:

(1) For now, self-archiving is not a substitute for what publishers do and provide, but a supplement to it, providing a parallel open-access version of the peer-reviewed text for any user whose institution cannot afford access to the publisher's toll-access version. The publisher's marked-up version will have more functionality, for those who can afford to pay for it, but the peer-reviewed full-text will at last be accessible to everyone, already maximizing its research impact today. This is the immediate short-term goal of self-archiving.

(2) Once the short-term goal of open access is attained, several alternative sequels become possible, and no one yet knows which of them will actually take place. The two main alternatives are:

(a) Nothing else changes. The self-archived version is accessible to all would-be users for free, and the publisher's marked-up version continues to be accessible only to those who can afford to pay. The publisher's revenues continue to pay for the mark-up, and its benefits are reserved for those who can afford to pay for them, as before, but the full-text without the markup (in WORD, HTML, PDF, or TEX) is available to everyone else.

It should be clear that if (a) is the eventual outcome, then that is no reason to hold us back from immediate self-archiving, as we have everything to gain from it (maximized access), and nothing to lose. The status quo continues, in parallel, along with the immediate effects of open access.
There is another possibility, however, and perhaps a more likely one:

(b) User preference for the open-access version reduces demand for the publisher's marked-up version to such an extent that its costs can no longer be covered from access tolls as they had been in the past. How is markup to be provided and paid for now?

If (b) is the eventual outcome, then because open-access will prevail, the cost-recovery can no longer be on the reader/institution end, in the form of access tolls. However, the reader/institutions also happen to be the author/institutions. Hence they are in a position to redirect a portion of their annual windfall toll savings to cover the remaining essential costs per outgoing paper rather than per incoming paper, as now. The collective cost currently paid by all subscribing institutions combined averages $1500 per incoming paper. If all subscribing institutions instead get back their portions of these costs, then the ~$500 per paper cost of peer review can easily be paid out of these annual windfall savings, with plenty of savings to spare. The cost per-paper of physical archiving is negligible: How much would markup cost, per paper, over and above peer review?
No one knows exactly, yet, but it is likely that a good deal of the task of markup can be offloaded onto the authors, just as digital text preparation has been, with the development of user-friendly XML markup tools. WORD will soon generate automatic XML versions, just as it now generates automatic HTML (and they will no doubt be equally inadequate, needing to be supplemented by some windows-based hand-manipulation by the author). But overall, it is likely that the pressure of necessity will inspire more and more effective and easy-to-use author-based markup capability.
The pressure of necessity that drives these adaptive changes, however, will come from the existence of the free open-access version. So markup concerns provide no reason to hold us back from immediate self-archiving.
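
The per-paper arithmetic in outcome (b) can be worked through in a few lines. The two dollar figures are the ones quoted above; the function name is hypothetical, and the cost of markup is left out because, as noted, no one yet knows it.

```python
# Figures quoted in the text above (US dollars per paper).
TOLL_COST_PER_PAPER = 1500        # collective access tolls paid per incoming paper
PEER_REVIEW_COST_PER_PAPER = 500  # approximate cost of peer review per paper


def windfall(toll: float, peer_review: float):
    """Return (savings left after paying peer review, peer review's share of the toll)."""
    remaining = toll - peer_review
    share = peer_review / toll
    return remaining, share


remaining, share = windfall(TOLL_COST_PER_PAPER, PEER_REVIEW_COST_PER_PAPER)
print(f"Savings left per paper: ${remaining}")
print(f"Peer review as a share of the toll: {share:.0%}")
```

Whatever markup turns out to cost per paper would have to be paid out of the remaining savings.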

26. Classification

"I worry about self-archiving because we would first need a subject classification system."

There are (at least) two ways to think of University Digital Archives, both of them important and valid, but definitely not the same:

(1) The University Digital Archive as the university digital library -- or, more specifically, the university digital library for all of the university's own scholarly, scientific and pedagogic output. (This includes journal articles, books, teaching materials, and any other digital content the university produces and wishes to include in its digital output.) See SPARC's position paper on institutional repositories and MIT's DSpace

There is no question but that a rigorous system of classification and tagging -- to make such a total university digital output navigable and integrable and interoperable with corresponding digital output from other universities in similar University Digital Archives -- would be extremely important to have, indeed a prerequisite for the usefulness and usability of such a university digital output library.

(2) The University Eprint Archive as a means of providing open access to all of the university's peer-reviewed research output (before and after peer review). Almost without exception, this is the work that also appears in the peer-reviewed journals sooner or later (indeed, that is how it gets peer-reviewed).

It should be clear that (2) is a very special subset of (1). But it should be equally clear that that special subset does not have any particular or pressing classification problem! These are not books. They are journal articles. Our journal articles are not indexed in our university library card catalogues (only the journals in which they appear are). When we want to search the journal literature, we do not look to any university classification system: we go to indexing services such as INSPEC, MEDLINE, ISI, etc. (Those do have their own classification systems, but it is unlikely that any of those classifications could out-perform google-style boolean search on an inverted full-text index, especially if aided by citation-frequency-based, hit-based, recency-based, or relevance-based ranking of search output, as done, for example, by citebase).
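To make "boolean search on an inverted full-text index" with citation-based ranking concrete, here is a minimal sketch. It is not a description of how citebase or any other service is implemented; the toy corpus, the citation counts, and the function names are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: paper id -> (full text, citation count). Not real data.
PAPERS = {
    "p1": ("open access increases citation impact", 40),
    "p2": ("peer review costs in journal publishing", 12),
    "p3": ("citation impact of self-archived physics papers", 75),
}


def build_inverted_index(papers):
    """Map each word to the set of paper ids whose text contains it."""
    index = defaultdict(set)
    for paper_id, (text, _citations) in papers.items():
        for word in text.lower().split():
            index[word].add(paper_id)
    return index


def boolean_and_search(index, papers, query):
    """Return ids of papers containing every query word, most-cited first."""
    words = query.lower().split()
    if not words:
        return []
    # Boolean AND: intersect the posting sets of all query words.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    # Rank the hits by citation frequency, highest first.
    return sorted(hits, key=lambda pid: papers[pid][1], reverse=True)


INDEX = build_inverted_index(PAPERS)
RESULTS = boolean_and_search(INDEX, PAPERS, "citation impact")
print(RESULTS)
```

No subject classification is consulted anywhere: the full text and the citation counts do all the work.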
It is important to make it crystal clear that the peer-reviewed research corpus -- and those University Eprint Archives for which that particular corpus is the main target literature at this time -- do not have a classification problem, and need not and should not wait for any solution to any classification problem before getting on with the infinitely more pressing task of getting themselves filled with their university's research output -- so that they can at last start plugging the chronic leak in its potential impact!
Agenda (1) (the university digital output library) is very important and worth pursuing; it is also an extremely valuable collaborator to agenda (2) (open access to peer-reviewed research through institutional self-archiving) -- but only if the two agendas facilitate rather than restrain one another (as any implication that agenda (2) has classification problems to solve would most definitely do).

27. Secrecy

"I worry about self-archiving because it would compromise the secrecy of patents and sponsored research."

Self-archiving is only for research results one wishes to make public, just as publishing is. Whatever one does not wish to publish, one does not self-archive. (Eprint Archives also have the option of depositing a text for internal use only, not accessible to the public, if/when this is judged useful.)

28. Affordability

"I worry about self-archiving because it will interfere with making toll-access more affordable."

The immediate purpose of self-archiving is to maximize research impact, not to make toll-access more affordable. Research impact has been unavoidably lost (by research and researchers) since the beginning of refereed research publication because of the high costs of providing paper access. The online medium now makes it possible for them to put an end to this cumulative impact loss. Of course, universally affordable toll-access would have the same effect (if it were truly universal -- i.e., if the universities of all potential users of all refereed research could afford to access it all). It would be splendid if journal publishers could provide universally affordable toll-access, and they are certainly encouraged to work toward doing so. But in the meanwhile, it is quite understandable that today's researchers prefer not to wait (for when and if universally affordable toll-access arrives). They will self-archive to maximize their research impact now (while they are still alive and compos mentis).

Some may think the competition to the toll-access version from the open-access version will keep toll-access less affordable; some may think it will have the opposite effect, encouraging cost-cutting and downsizing to the essentials, making it more affordable. If the price of the value-added toll-access version becomes affordable enough, and the demand for its added-value is sufficient to sustain the market, then it is demand for the open-access version that will shrink, and along with it the incentive to self-archive, for the universal affordability will make any further impact loss negligible.

That is not where we are right now, however, and researchers would be rather foolish to wait patiently to see how things may or may not eventually turn out, continuing to renounce their potential daily impact even today, when doing so is no longer necessary.

29. Sitting Pretty

"I don't worry about self-archiving because there is really no problem: My institution gives me all the access and impact I want or need already. I'm satisfied!"

If a researcher -- especially a researcher at a well-off institution -- does not exercise some critical reflection, the natural feeling is: "Where's the problem? I and others at my institution were already well-off in paper days. Now, in the online era we are even better off, with desktop online access to everything, instead of having to walk to the library, and with licensing 'big deals' that get us even more journals than we used to have!"

This is related in part to the "Harvards vs. Have-Nots" misconception. http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/3177.html It is also a symptom of not having understood the causal connection between access and impact.
Yes, the better-off institutions enjoy better access to the peer-reviewed journal literature than the less-well-off institutions (and better access than they had in paper-days). But no institution can afford toll-access to all or even most of the 24,000 peer-reviewed journals that exist. And most institutions can afford toll-access to only a small and shrinking portion of them (http://www.arl.org/stats/index.html). And even the Harvards (not only the Have-Nots) are groaning under their growing serials-budget expenditures. So no researcher, at any institution, has access to more than a fraction of what there is. And usage patterns in those lucky fields where open online access is most advanced show that when everything is accessible and a keystroke is the only barrier, users make vastly more use of the literature. http://cfa-www.harvard.edu/~kurtz/jasis-abstract.html

So much for access. The other side of the coin is even more important: Researchers at prestigious institutions will also say that they only write for one another. But they don't really mean it. All researchers are interested in their research impact (citation counts), not only because that is one of the things that advances their careers and funds their further research, but also because it is a measure of the size and importance of their contribution to knowledge. Few researchers are aware -- because the data on the strong causal connection between access and impact are new and still being gathered -- of the size of their own and their institution's cumulative daily, weekly, monthly, and yearly impact-loss owing to access-denial to those would-be users world-wide whose institutions cannot afford the toll-access to their work. http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0025.gif

In this equation, the Harvards are losing almost as much as the Have-Nots, because they are losing the potential impact from the users at the Have-Not institutions, which vastly outnumber the Harvards! Yes, the Harvards may be somewhat better off in their own access to the research output of others; but the following is just as true of them as it is of the Have-Nots: For every one of the 2,500,000 articles published annually in the 24,000 research journals it is a fact that it is not accessible to most of its potential users because of unaffordable toll-barriers. And (this too is critical): this would remain true even if all 24,000 journals were sold at cost.

It remains only to point out to the researchers who think they are sitting pretty today exactly how big their cumulating daily, weekly, monthly, and yearly impact loss is, as long as they delay making their research output open-access by self-archiving it. Estimates like the Kurtz study above show that this needless impact loss is substantial in terms of download impact. According to the most widely cited study of citation impact it is 336%. http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0006.gif

30. Rechanneling toll-savings

"I worry about self-archiving because if the toll system collapses, there will be no way for my institution to rechannel its library toll-savings from buying in the peer-reviewed research output of other institutions to paying journals for the peer-reviewing of our own institutional research output."

Where there is a will, there is a way. Necessity is the Mother of Invention.

31. Waiting for Gold

"I worry about self-archiving because open-access journals are the only stable solution."

There are two roads to open access to the 2,500,000 articles appearing yearly in the planet's 24,000 peer-reviewed journals. The "golden road" is to create or convert 24,000 open access journals. The "green road" is for authors to self-archive the articles they publish in the toll-access journals. The golden road awaits the creation or conversion of 23,000 open-access journals, one by one. (There are about 1000 open-access journals to date, since 1991 http://www.doaj.org/). The green road awaits only self-archiving. The roads are both worth taking and complementary, but the golden road is long, slow, and uncertain, whereas the green road is short, fast, and already proven (already providing three times as much open access yearly, and stably ongoing since 1991). The optimal open-access strategy is hence a dual one:
(1: GOLD) Publish your articles in an open-access journal whenever a suitable one exists today (currently 1000, <5%)
and
(2: GREEN) Publish the rest of your articles in the toll-access journal of your choice (currently 23,000, >95%) and also self-archive them in your institutional open-access eprint archives.
It has to be clearly understood that (1) the library's serials crisis and its attendant journal pricing/affordability problem is not the same as (2) the researchers' article access/impact problem. OA self-archiving (green) solves (2) but not (1). If and only if the stable end-game for journal publishing in the online age is indeed destined to be gold (and no one today knows whether or not that is so), then the green road is also the fastest and surest way to get us there. See Publishers' future (17). But there is no certainty that gold is indeed the destined end-game; publishers know this (that is why 92% of journals are already green). Hence it is irrational for librarians or researchers to resist or delay self-archiving -- with its certain capacity to generate 100% OA once it is mandated, leading to its certain benefits for research -- for the reason that it is merely green and not gold, and does not necessarily lead to gold!
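The "<5%" and ">95%" figures in the dual strategy follow directly from the approximate journal counts quoted in this answer. The short check below is only a sketch of that arithmetic; the variable names are hypothetical and the counts are the text's round estimates, not exact data.

```python
# Approximate counts quoted in the text.
TOTAL_JOURNALS = 24_000   # peer-reviewed journals overall
OA_JOURNALS = 1_000       # open-access ("gold") journals to date

TOLL_JOURNALS = TOTAL_JOURNALS - OA_JOURNALS

gold_share = OA_JOURNALS / TOTAL_JOURNALS
toll_share = TOLL_JOURNALS / TOTAL_JOURNALS

print(f"Gold journals: {OA_JOURNALS} ({gold_share:.1%})")
print(f"Toll-access journals: {TOLL_JOURNALS} ({toll_share:.1%})")
```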

32. Poisoned Apple

"I worry about self-archiving even if the journal gives me the green light to do so, because if I do, the light may change to red."

Over ninety percent of journals have already given their green light to author self-archiving for at least six reasons. Here are those six reasons, in approximate order of priority:
(1) OA is Optimal and Inevitable for Research and Researchers. Open Access (OA) is clearly on the way. Its benefits to research and researchers -- in terms of enhanced research usage and impact -- are demonstrated and undeniable. Its progress is unstoppable. Going green is a natural way for research journal publishers to show support for OA and confirm that they are not in conflict with what is in the best interests of research and researchers. Opposing OA today is becoming increasingly bad public relations for journal publishers.
(2) Green is a Hedge Against Gold. At the same time, the risks of converting to OA journal publishing ("gold") are still considerable: There are still uncertainties about who will pay and with what, and how much it should cost and for what. The OA cost-recovery model has not yet been tested long, and only by about 5% of journals. Hence going green is a rational hedge against pressure to go gold: "If authors want OA so badly, let them show it by providing it for themselves, with our green light and blessing, rather than pressuring us to make all the sacrifices, and take all the risk upon ourselves."
(3) The Risk of Going Green is Low: There are physics journals that have been effectively green since 1991, and some of their contents have been 100% OA through self-archiving for years now, yet their subscription revenues have not eroded. The American Physical Society (APS) was the first green publisher. One physics journal (JHEP), born gold (subsidised), even converted back to green, successfully, by migrating to a green publisher (IOP).
(4) If/When It Ever Came To That, Green Would Allow Publishers a Gradual Leveraged Transition to Gold. OA growth by author self-archiving is gradual and anarchic, article by article, rather than sudden and all-or-none, journal by journal. It gives journal publishers time to adapt to OA. If and when there should ever be a transition to gold, a prior green preparatory phase will allow this to be a stable leveraged transition rather than an abrupt and catastrophic one. (Equally important, the very same user-institution subscription/license cancellations that would drive the transition to gold, if/when they occurred, would at the same time generate the annual author-institution windfall savings that would then cover the institutional costs for author-institution-end (outgoing) payment for publication in place of the current user-institution-end (incoming) payment for subscription.)
(5) OA Enhances Journal Impact Too. Enhanced impact not only benefits research, as well as authors and their careers, but it benefits journals too, as the journal impact factor (which helps sell journals) is the average of its articles' individual impacts.
(6) Research Institutions and Funders Will Soon Be Mandating Self-Archiving. The US House Appropriations Committee and the UK Parliament Science and Technology Committee have both recommended that self-archiving be mandatory for funded research. As this mandate is implemented, it will produce pressure for journals to go green or risk losing their authors. It accordingly makes more sense to anticipate this mandate by going green now.

In the light of these 6 reasons for publishers to go green now, it is hard to imagine why anyone would dream that authors taking publishers up on their green light today, by going ahead and self-archiving -- thereby generating still more OA, and still more demand for and reliance upon OA -- would make it easier rather than harder for any journal not to be green than it is today, when 92% are already green. On the contrary, authors failing to go ahead and self-archive even now that the publisher's light is green would give opponents of OA strong grounds for arguing that the research community does not need or want OA as much as it purports to after all, and hence that there is no real call for either green or gold!

See also: the Los Alamos Lemma and Zeno's Paralysis
