SUMMARY: Arxiv is a Central Repository (CR) in which physicists have been self-archiving their unrefereed preprints and their peer-reviewed postprints since 1991. There is now a growing movement toward distributed Institutional Repositories (IRs). Thanks to the OAI Protocol, all OAI-compliant IRs and CRs are now interoperable: their metadata can be harvested into search engines that treat all of their contents as if they were in one big virtual CR. What authors self-archive is their peer-reviewed publications, not just their unrefereed preprints. An archive is merely a repository, not a certifier of having met a peer-reviewed journal's quality standards.
Since the research institutions themselves are the primary research providers, with a direct interest in maximising the uptake and usage of their own research output, the natural place for them to deposit that output is in their own IRs. Any central collections can then be harvested from those IRs via OAI. Institutions are also best placed to monitor and reward compliance with self-archiving mandates, both their own institutional mandates and those of the funders of their institutional research output. Arxiv has played an important role in getting us where we are, but it is likely that the era of CRs is coming to a close, and the era of distributed, interoperable IRs is now coming into its own in an entirely natural way, in keeping with the distributed nature of the Net/Web itself.
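To make the "one big virtual CR" idea concrete, here is a minimal sketch of what a harvester does with OAI metadata. It assumes simplified OAI-PMH ListRecords responses in Dublin Core (oai_dc) format; a real harvester would fetch them over HTTP with verb=ListRecords&metadataPrefix=oai_dc and would also handle resumption tokens and errors. The repository names, identifiers, titles, and helper functions below are all hypothetical, invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def sample_response(records):
    """Build a stripped-down ListRecords response (hypothetical sample data)."""
    items = "".join(
        f"<record><header><identifier>{ident}</identifier></header>"
        f"<metadata><oai_dc:dc><dc:title>{title}</dc:title></oai_dc:dc>"
        f"</metadata></record>"
        for ident, title in records
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" '
        'xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f"<ListRecords>{items}</ListRecords></OAI-PMH>"
    )


def parse_records(xml_text, source):
    """Extract identifier and title from each record in one response."""
    root = ET.fromstring(xml_text)
    parsed = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        parsed.append({"source": source, "id": ident, "title": title})
    return parsed


def build_virtual_index(responses):
    """Merge records from every repository into one index keyed by identifier."""
    index = {}
    for source, xml_text in responses.items():
        for rec in parse_records(xml_text, source):
            index[rec["id"]] = rec
    return index


def search(index, term):
    """Case-insensitive title search across all harvested repositories."""
    term = term.lower()
    return sorted(
        rec["id"] for rec in index.values()
        if term in (rec["title"] or "").lower()
    )


# One central repository and one institutional repository (both made up).
RESPONSES = {
    "central-repository.example": sample_response(
        [("oai:cr.example:1", "Quantum gravity preprint")]
    ),
    "institution-a.example": sample_response(
        [
            ("oai:ir-a.example:7", "Quantum optics postprint"),
            ("oai:ir-a.example:8", "Neural coding postprint"),
        ]
    ),
}

INDEX = build_virtual_index(RESPONSES)
print(search(INDEX, "quantum"))
```

The search does not care where a paper was deposited: once the metadata has been harvested, records from the CR and the IR sit side by side in the same index, which is why the locus of deposit can be distributed without any loss of findability.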
Ginsparg, Paul (2006) As We May Read. The Journal of Neuroscience, September 20, 2006, 26(38): 9606-9608 doi:10.1523/JNEUROSCI.3161-06.2006
Arxiv is a Central Repository (CR) in which physicists (mostly, along with many mathematicians and some computer scientists) have been self-archiving their unrefereed preprints and their peer-reviewed postprints since 1991. It is important to keep in mind that researchers self-archive preprints as well as postprints, because it makes a big difference whether one extrapolates from Arxiv as a preprint CR or a postprint CR, as we shall see below.
"[A]rticles are deposited [in Arxiv] by researchers when they choose (either before, simultaneous with, or after peer review), and the articles are immediately available to researchers throughout the world."
"As a pure dissemination system, [Arxiv] operates at a factor of 100-1000 times lower [1.0% - 0.1%] in cost than a conventionally peer-reviewed system (Ginsparg, 2001)."
This is true, but it is tantamount to saying that as a pure dissemination system, photocopying the articles published in journals operates at a fraction of the cost of publishing a journal: A fraction, but a parasitic fraction, for without the journal, there would be nothing to either photocopy or distribute in Arxiv.
"with many of the production tasks automatable or off-loadable to the authors, the editorial costs will then dominate the costs of an unreviewed distribution system by many orders of magnitude."
Translation: Online dissemination of unrefereed preprints alone costs a lot less than peer-reviewed publication. True, but what follows from that? Peer-reviewed publication costs a lot more than photocopying too, but what authors photocopy and distribute is their peer-reviewed publications, not just their unrefereed preprints.
"Although the most recently submitted articles have not yet necessarily undergone formal review, the vast majority of the articles can, would, or do eventually satisfy editorial requirements somewhere.... [Arxiv's moderated] submissions are at least 'of refereeable quality'."
Every paper is first an unrefereed preprint -- and then, eventually, most are revised into peer-reviewed, accepted articles (postprints). Hence if preprints are deposited in Arxiv at all, it stands to reason that Arxiv's most recently deposited (sic) papers (sic) have not yet undergone peer review. Tune in a year later, and they will have been, with the revised postprint now also deposited.
"[P]roposed modifications of the peer review include a two-tier system (for more details, see Ginsparg, 2002), in which, on a first pass, only some cursory examination or other pro forma certification is given for acceptance into a standard tier. At some later point, a much smaller set of articles would be selected for more extensive evaluation."
This is a speculative hypothesis. It is no doubt being tested to see whether it works, whether it delivers results of quality and usability comparable to standard peer review, whether it is cost-effective, and whether it can replace journals. But as it stands, the hypothesis alone does not tell us whether and how well it will work; Arxiv is certainly not evidence for the validity of this hypothesis, since virtually all papers in Arxiv still undergo standard peer review. Arxiv is merely a CR that provides Open Access (OA) to both the preprints and the postprints.
"using standard search engines, more than one-third of the high-impact journal articles in a sample of biological/medical journals published in 2003 were found at nonjournal Web sites (Wren, 2005)."
This is very interesting. This is the higher end of a self-archiving rate that we have found to range between about 5% and 25% across disciplines. Physics is of course even higher (mostly because of Arxiv) and computer science higher still (see Citeseer, a Google-style harvester of distributed, locally deposited papers).
"at least 75% of the publications listed [in neuroscience] were freely available either via direct links from the above Web page or via a straightforward Web search for the article title."
This is even more interesting. It means that in such fields the majority of the articles -- note that we are almost certainly not talking about unrefereed preprints here but about peer-reviewed postprints -- are being self-archived already, so the only thing that remains to be done is to deposit (or harvest) them into the author's own OAI-compliant IR rather than a random website, to maximise visibility, harvestability, and impact.
"The enormously powerful sorts of data mining and number crunching that are already taken for granted as applied to the open-access genomics databases can be applied to the full text"
Indeed. And semantic and scientometric analyses too (though article texts are not quite the same thing as the research data on which the articles are based, hence the analogy with the genomics database may be a bit misleading).
"it is likely that more research communities will join some form of global unified archive system without the current partitioning and access restrictions familiar from the paper medium"
What makes it most likely is the self-archiving mandates proposed or already adopted the world over (e.g., RCUK, Wellcome Trust, FRPAA, EC, plus individual institutional self-archiving mandates: CERN, Southampton, QUT, Minho).