The Access/Impact Problem and the Green and Gold Roads to Open Access
The research access/impact problem arises because journal articles are not accessible to all of their would-be users; hence, they are losing potential research impact. The solution is to make all articles Open Access (OA; i.e., accessible online, free for all). OA articles have significantly higher citation impact than non-OA articles. There are two roads to OA: the "golden" road (publish your article in an OA journal) and the "green" road (publish your article in a non-OA journal but also self-archive it in an OA archive). Only 5% of journals are gold, but over 90% are already green (i.e., they have given their authors the green light to self-archive); yet only about 10-20% of articles have been self-archived. To reach 100% OA, self-archiving needs to be mandated by researchers' employers and funders, as the United Kingdom and the United States have recently recommended, and universities need to implement that mandate.
The research journal-affordability problem, and the resulting crisis in university libraries' journal budgets, first brought the research article-access/impact problem to light, but the two problems are not the same. According to Ulrichsweb (http://www.ulrichsweb.com/ulrichsweb/analysis/), about 24,000 peer-reviewed research journals exist worldwide, across all disciplines and languages, publishing about 2.5 million articles per year. But because journal prices keep rising and library budgets are limited, each university can afford only a small portion of that total. Its users therefore have access to only a fraction of those articles, even though the online medium should in principle let every user reach every article. This is the research journal-affordability problem.
What the journal-affordability problem unmasked was a further problem: because most would-be users at most universities cannot access most of the 2.5 million articles published yearly (their universities cannot afford the journal access-tolls), much of the potential research impact of those inaccessible articles is being lost. An article's research impact is the degree to which its findings are read, used, applied, built upon, and cited by users in their own further research and applications. Research impact is a measure of the progress and productivity of research. That is why researchers' careers (their salaries, promotions, tenure, funding, prestige, and prizes) depend on their impact; it is also why their universities (which co-benefit from the research funding, progress, and prestige) and their research funding agencies (which are answerable for how they spend taxpayers' money) reward research impact.
Merely doing the research and then putting your findings in a desk drawer is no better than not doing the research at all. Researchers must submit their research to peer review [1] and then "publish or perish," so that others can use and apply their findings. But getting findings peer-reviewed and published is not enough either. Other researchers must find the findings useful, as shown by their actually using and citing them. And to use and cite them, they must first be able to access them. That is the research article access/impact problem.
To see that the journal-affordability problem and the article access/impact problem are not the same, one need only note that even if all 24,000 peer-reviewed research journals were sold to universities at cost (i.e., without a penny of profit), almost no university would have anywhere near enough money to afford all or even most of them, even at minimal access-tolls (http://fisher.lib.virginia.edu/cgi-local/arlbin/arl.cgi?task=setuprank). Hence, even then, not all would-be users could access all of the 2.5 million articles published each year, and that potential research impact would continue to be lost.
So although the two problems are connected (lower journal prices would indeed generate somewhat more access), solving the journal-affordability problem does not solve the research access/impact problem.
How big is the access/impact problem? Estimates are emerging, and their consistency and size are quite astounding. Lawrence [2] reported that in computer science the citation impact of conference articles whose full texts are accessible online toll-free is 336% higher than the impact of non-OA articles. (We call such toll-free online access "Open Access" (OA), in line with the definition provided in 2001 by the Budapest Open Access Initiative: http://www.soros.org/openaccess/read.shtml.) Kurtz et al. [3, 4] have reported similar effects in astrophysics, and Odlyzko [5] in mathematics.
We are charting this OA-impact advantage across all disciplines, and across time, in a study using a 12-year sample of 14 million articles from the Institute for Scientific Information (ISI) database. By trawling the Web to find which of these 14 million articles are and are not OA, we compare the citation counts of OA and non-OA articles matched within the same journal and year. Some results are already available for the physics/mathematics subset, with an effect size and direction comparable to what Lawrence [2] reported (Figure 1) [6].
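The matched comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's actual code. The `Article` record and its fields are hypothetical, and each article's OA status is assumed to be already known from the Web search. The statistic is mean citations of OA articles divided by mean citations of non-OA articles, per journal and year, which is one plausible choice among several.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Article:
    # Hypothetical record; the study's real data model is not described here.
    journal: str
    year: int
    citations: int
    is_oa: bool  # True if a toll-free full text was found on the Web


def oa_citation_ratios(articles):
    """Return {(journal, year): mean OA citations / mean non-OA citations}.

    Only journal-year cells containing both OA and non-OA articles are
    compared, so OA articles are matched against non-OA articles from the
    same journal and the same year.
    """
    cells = defaultdict(lambda: {True: [], False: []})
    for a in articles:
        cells[(a.journal, a.year)][a.is_oa].append(a.citations)

    ratios = {}
    for key, groups in cells.items():
        oa, non_oa = groups[True], groups[False]
        # Skip unmatched cells and cells where the ratio is undefined.
        if oa and non_oa and mean(non_oa) > 0:
            ratios[key] = mean(oa) / mean(non_oa)
    return ratios


if __name__ == "__main__":
    toy = [  # toy data for illustration only
        Article("J1", 2001, 12, True),
        Article("J1", 2001, 4, False),
        Article("J1", 2001, 2, False),
        Article("J2", 2002, 5, False),  # no OA partner: skipped
    ]
    print(oa_citation_ratios(toy))  # {('J1', 2001): 4.0}
```

Restricting the comparison to journal-year cells that contain both kinds of article is what makes it "matched." It controls for differences in citation rates between journals and for the time articles have had to accumulate citations.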
Figure 1. Open Access (OA) vs. Non-Open Access (non-OA) Citation Impact Comparisons for All (Physics/Mathematics) Fields. OA articles are those that are self-archived in http://arxiv.org/. The gray curve is OA + non-OA = "Total Articles" per year (scale on right). The lower set of deviations (all positive) is "OAP," the proportion OA/(OA + non-OA) of articles that have been made OA, by year. The upper set of deviations (all positive) is "OAA," the OA/non-OA citation advantage per year, relative to an even ratio of 1/1 (100%) in the number of citations to articles appearing in the same journal and year (scale on left). The leftmost value for each set of deviations is the 1992-2003 average; the rightmost value is the 2001-2003 average. Correlations also show that OAP grows by year, and perhaps a small positive relation between OAA and year and between OAA and OAP. Further details are at http://citebase.eprints.org/isi_study/.
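Read literally, the caption's two plotted quantities for a given year y can be written as follows. This is an interpretation, not the study's published formula. N counts articles and \bar{C} is the mean number of citations per article. The OA and non-OA groups are assumed to come from the same journals in year y.

```latex
\mathrm{OAP}_y = \frac{N^{\mathrm{OA}}_y}{N^{\mathrm{OA}}_y + N^{\mathrm{nonOA}}_y},
\qquad
\mathrm{OAA}_y = \frac{\bar{C}^{\mathrm{OA}}_y}{\bar{C}^{\mathrm{nonOA}}_y} - 1
```

On this reading, OAA_y = 0 corresponds to the even 1/1 (100%) ratio. A positive OAA therefore means that OA articles are cited more than non-OA articles from the same journal and year. For instance, with hypothetical means of 3 citations per OA article and 2 per non-OA article, OAA would be 3/2 - 1 = 0.5, i.e., a 50% advantage.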
How did some of the articles in those non-OA journals become OA? Because their authors "self-archived" them on the Web (i.e., made them accessible online toll-free for all would-be users): http://www.eprints.org/self-faq/. Physicists have been self-archiving in growing numbers since 1991, in a central archive called Arxiv (http://arxiv.org/show_monthly_submissions), as have computer scientists on their own Web sites, which are then harvested by Citeseer: http://citeseer.ist.psu.edu/cis.
But the self-archiving method with the greatest potential to provide OA is self-archiving in OAI-compliant Eprint Archives at the author's own university (http://software.eprints.org/handbook/). There are already over one hundred such institutional archives worldwide (http://archives.eprints.org/index.php?action=browse#type), and they are growing rapidly (but not yet rapidly enough; see Figure 2).
Figure 2. Growth of Institutional Archives and Contents. The graph covers all archives that have been flagged as "Research Institutional." The datestamps of the records exported by each archive's OAI-PMH interface are used to plot the cumulative number of records over time (upper curve, with a longer linear phase; scale left). The date of each archive's earliest OAI-PMH record is used to plot the cumulative number of archives over time (lower, more abruptly inflected curve; scale right). The number of metadata records exported by an archive may not reflect the number of full-text, publicly accessible documents it holds.
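The two cumulative curves described in the caption can be computed from per-record datestamps. The Python sketch below is a minimal version. It assumes the datestamps of every archive have already been harvested into a dictionary keyed by a hypothetical archive name. As the caption warns, it counts metadata records, not full texts.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_by_year(archives):
    """Cumulative archive and record counts per year.

    archives: dict mapping a (hypothetical) archive name to the list of
    datestamps (datetime.date) of the records it exports via OAI-PMH.
    An archive is counted from the year of its earliest record.
    Returns (years, cumulative_archives, cumulative_records).
    """
    # Year in which each non-empty archive first appears.
    starts = Counter(min(stamps).year for stamps in archives.values() if stamps)
    # Number of records stamped in each year, across all archives.
    records = Counter(d.year for stamps in archives.values() for d in stamps)
    if not records:
        return [], [], []
    years = list(range(min(records), max(records) + 1))
    # Counter returns 0 for years with no new archives or records.
    cum_archives = list(accumulate(starts[y] for y in years))
    cum_records = list(accumulate(records[y] for y in years))
    return years, cum_archives, cum_records


if __name__ == "__main__":
    toy = {  # toy data for illustration only
        "archive-a": [date(2001, 5, 1), date(2002, 1, 1), date(2002, 3, 1)],
        "archive-b": [date(2002, 6, 1)],
    }
    print(cumulative_by_year(toy))  # ([2001, 2002], [1, 2], [1, 4])
```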
OAI compliance means using the Open Archives Initiative's metadata-tagging protocol to tag the critical information (author, title, date, etc.) in a uniform way (http://www.openarchives.org/OAI/openarchivesprotocol.html). OAI compliance makes the many distributed archives "interoperable," so that they can all be harvested by cross-archive harvesters such as OAIster (http://oaister.umdl.umich.edu/o/oaister/) into a single, global, seamlessly searchable virtual OA archive. This global OA archive can then be enhanced with a "Google" for the research literature such as Citebase (http://citebase.eprints.org/). Citebase counts citations instead of links and can rank articles or authors by either citation impact or "usage impact" (downloads) [7-9]. Early measures such as the Citebase download/citation correlator (http://citebase.eprints.org/analysis/correlation.php) can even predict citations 2 years later from the number of downloads today (see Figure 3 for the download/citation pattern over time in an area of physics in which the correlation between downloads and citations is about 0.4).
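The Python sketch below illustrates the client side of one OAI-PMH ListRecords exchange using only the standard library. It builds a request and extracts the uniform Dublin Core fields from a response page. It uses `oai_dc`, the unqualified Dublin Core format that every OAI-PMH repository is required to support. The base URL and the sample response are hypothetical. A real harvester would also need to make the HTTP requests, handle errors and deleted records, and respect flow control.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only 'verb' may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page.

    The token is None when the response signals that the list is complete
    (an empty or absent resumptionToken element).
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": rec.findtext("oai:header/oai:datestamp", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "creators": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None


# Hypothetical one-record response page, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:42</identifier>
        <datestamp>2004-05-20</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Eprint</dc:title>
          <dc:creator>Doe, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken/>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://archive.example.org/oai"))
    print(parse_list_records(SAMPLE))
```

Because every compliant archive answers the same verbs with the same tagged fields, one parser like this serves any number of distributed archives. That uniformity is what lets a harvester merge them into a single searchable collection.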
Figure 3. The download/citation cycle over time. In most areas of physics the correlation between downloads and citations is between 0.3 and 0.4 [7]. These graphs show the downloads (smaller left box) and citations (larger right box) that would be included in calculating the correlation for two papers, counting downloads up to 4 months after deposit and citations up to 2 years after. The effect is cyclic: downloads generate citations, and citations generate further downloads.
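The correlation in Figure 3 is a correlation coefficient between per-paper download and citation counts, each taken over a fixed window after deposit. The Python sketch below makes several assumptions, and it is not Citebase's implementation:

* It uses Pearson's r, because the caption does not name the coefficient.
* It uses windows of 120 and 730 days as stand-ins for the caption's 4 months and 2 years.
* It assumes a hypothetical per-paper layout that lists each download's and each citation's age in days since deposit.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def windowed_counts(papers, download_days=120, citation_days=730):
    """Count each paper's downloads and citations inside fixed windows.

    papers: list of dicts with hypothetical keys 'download_ages' and
    'citation_ages', each a list of ages in days since deposit.
    Returns (downloads, citations), one count per paper.
    """
    downloads = [sum(1 for t in p["download_ages"] if t <= download_days) for p in papers]
    citations = [sum(1 for t in p["citation_ages"] if t <= citation_days) for p in papers]
    return downloads, citations
```

Calling `pearson(*windowed_counts(papers))` on a set of papers yields the early-download/later-citation correlation. With r between 0.3 and 0.4, early downloads account for only about 9-16% of the variance in later citations (r squared). Such predictions therefore describe tendencies across many papers rather than forecasts for any single article.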
Such performance indicators and predictors can be included in standardized university OAI CVs (http://paracite.eprints.org/cgi-bin/rae_front.cgi). Research funders and evaluators can then harvest these CVs to monitor self-archiving, to chart the progress and direction of research, and to help make decisions on promotion and funding [10, 11]. There is now evidence that as many as 39% of authors may already be providing OA for at least one of their articles [12, 13]. They do so by one of the three means of self-archiving: arbitrary Web sites, central disciplinary archives, or distributed university archives. This 39% now needs to be systematically increased to 100%, for all articles. The institutional self-archiving route is the most promising way to achieve that, because universities and their researchers share both the benefits of maximized research impact and the costs of lost impact.
All signs are favorable. There has been a great increase in OA consciousness in the past year, with many Declarations and Statements in support of OA worldwide such as:
Berlin Declaration: http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html.
WSIS Declaration: http://www.itu.int/wsis/documents/doc_multi-en-1161|1160.asp.
Bethesda Statement: http://www.earlham.edu/~peters/fos/bethesda.htm.
Budapest Open Access Initiative: http://www.soros.org/openaccess/view.cfm.
Public Library of Science: http://www.plos.org/about/history.html.
Wellcome Trust Statement: http://www.wellcome.ac.uk/en/1/awtvispolpub.html.
IFLA Statement: http://www.ifla.org/V/cdoc/open-access04.html.
In response to the research community's expressed desire for OA, the latest Joint Information Systems Committee/Rights MEtadata for Open archiving (JISC/RoMEO) survey of over 8,000 journals indicates that over 90% are already "green," that is, they have given their official green light to author self-archiving (http://romeo.eprints.org/stats.php) [14].
About 1,200 journals (approaching 5% of the total 24,000) are even "gold": they are OA journals, making all of their own contents OA (http://www.doaj.org/). To cover their costs, however, many of these gold journals have had to adopt the OA journal cost-recovery model [15]. Instead of the user-institution paying the journal's access-tolls for incoming articles, the author-institution pays the journal's peer-review and publication costs per outgoing article.
Currently, because this gold cost-recovery model is risky and untested, publishers are more willing to go green than gold in response to the research community's demand for OA. Publishers note that physics journals have been green since 1991, yet there has still been no cancellation pressure. Universities that can afford the official non-OA version pay for it, and users at universities that cannot afford it use the authors' self-archived OA versions. One prominent "born-gold" journal, the Journal of High Energy Physics (http://www.iop.org/EJ/journal/1126-6708), has even successfully made the transition backwards from gold to green to make ends meet after a few years of being toll-free. Yet its contents remain 100% OA, because 100% of its authors self-archive them.
Publishers have done their part in response to the research community's demand for OA by giving their green light to author-institution self-archiving. It is now time for more of the research community to take them up on it. It is not enough to sit and wait for all 24,000 journals to convert to gold (http://www.eprints.org/self-faq/#31.Waiting). And it is certainly not fair for researchers to demand that publishers make all the sacrifices and take all the risks while the research community does not bother to take the risk-free step of providing OA for its own articles, simply by self-archiving them, even though researchers purport to want and need OA so much.
The research community is ready at last to update its existing "publish or perish" mandate to require also providing Open Access to the articles it publishes in the online era. The UK Parliament Science and Technology Committee (http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/39903.htm) has recommended such legislation, and the U.S. House of Representatives (http://thomas.loc.gov/cgi-bin/cpquery/?&db_id=cp108&r_n=hr636.108&sel=TOC_338641&) has already voted in favor of it. Under this legislation, as one of the conditions for receiving research funding, the fundee would be required not merely to publish but also to self-archive all the articles resulting from the funded research.
In an author survey, Swan and Brown [12, 13] report that the vast majority of the authors they sampled said they would self-archive willingly if their employer (or funding body) required them to do so! Hence, universities and research funders are in the best position to usher in the OA era by adopting and implementing their own institutional OA provision policies (http://www.eprints.org/signup/sign.php).
More than 100 universities worldwide (http://archives.eprints.org/eprints.php?page=all) already have OA Eprint Archives. Adopting official university OA self-archiving policies will help to maximize both the number of such archives and the number of articles in them. Such policies would encourage the 39% of authors who already self-archive [12] to deposit their articles in their own university's OA Eprint Archive. They would also encourage those who do not yet self-archive to start doing so, for the sake of the enhanced impact that, as the citation studies have shown, OA will generate [6].
Along with the substantial recent rise in OA consciousness worldwide, there has also been an unfortunate tendency to equate OA exclusively with OA journal publishing (i.e., the golden road to OA) and to overlook the faster, surer, and already more heavily traveled green road of OA self-archiving. This oversight is probably a spin-off of conflating the journal-affordability problem with the access/impact problem. Let us hope that the mounting evidence of the powerful impact-generating effects of OA, plus incentives from their employers and funders, will at last induce the 61% of authors who have not yet done so to take to the green road, so that we can all enjoy the benefits of 100% OA at last.
1. Stevan Harnad, The Invisible Hand of Peer Review, Nature Online (5 November 1998), http://helix.nature.com/webmatters/invisible/invisible.html; longer version in Exploit Interactive 5 (2000), http://www.exploit-lib.org/issue5/peer-review/.
2. S. Lawrence, Online or Invisible?, Nature 411 (2001) (6837), p. 521, http://www.neci.nec.com/~lawrence/papers/online-nature01/.
3. Michael J. Kurtz, Guenther Eichhorn, Alberto Accomazzi, Carolyn S. Grant, Markus Demleitner, Stephen S. Murray, Nathalie Martimbeau and Barbara Elwell, Worldwide Use and Impact of the NASA Astrophysics Data System Digital Library, Journal of the American Society for Information Science and Technology 55 (2004).
4. Michael J. Kurtz, Guenther Eichhorn, Alberto Accomazzi, Carolyn S. Grant, Markus Demleitner, Stephen S. Murray, Nathalie Martimbeau and Barbara Elwell, The Bibliometric Properties of Article Readership Information, Journal of the American Society for Information Science and Technology 55 (2004), http://cfa-www.harvard.edu/~kurtz/jasist2.pdf.
5. A.M. Odlyzko, The Rapid Evolution of Scholarly Communication, Learned Publishing 15 (2002), pp. 7-19, http://www.catchword.com/alpsp/09531513/v15n1/contp1-1.htm.
6. S. Harnad and T. Brody, Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals, D-Lib Magazine 10 (June 2004) (6), http://www.dlib.org/dlib/june04/harnad/06harnad.html.
7. T. Brody and S. Harnad, Earlier Web Usage Statistics as Predictors of Later Citation Impact (2004) (in preparation), http://www.ecs.soton.ac.uk/~harnad/Temp/timcorr.doc.
8. S. Hitchcock, A. Woukeu, T. Brody, L. Carr, W. Hall and S. Harnad, Evaluating Citebase, An Open Access Web-Based Citation-Ranked Search and Impact Discovery Service (2003), http://opcit.eprints.org/evaluation/Citebase-evaluation/evaluation-report.html.
10. A. Smith and M. Eysenck, The Correlation between RAE Ratings and Citation Counts in Psychology, Technical Report, Psychology, University of London, Royal Holloway (2002), http://psyserver.pc.rhbnc.ac.uk/citations.pdf.
11. S. Harnad, L. Carr, T. Brody and C. Oppenheim, Mandated Online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise Whilst Making It Cheaper and Easier, Ariadne 35 (April 2003), http://www.ariadne.ac.uk/issue35/harnad/.
12. A. Swan and S.N. Brown, JISC/OSI Journal Authors Survey Report (2004).
13. A. Swan and S.N. Brown, Authors and Open Access Publishing, Learned Publishing 17 (2004) (3), pp. 219-224, http://www.ingentaselect.com/rpsv/cw/alpsp/09531513/v17n3/s7/.
14. J. Cox and L. Cox, Scholarly Publishing Practice: The ALPSP Report on Academic Publishers' Policies and Practices in Online Publishing, Association of Learned and Professional Society Publishers (2003), http://www.alpsp.org/2004pdfs/SFpub210104.pdf.
15. S. Harnad, Electronic Scholarly Publication: Quo Vadis?, Serials Review 21 (1995) (1), pp. 70-72 (Reprinted in Managing Information 2, no. 3, 1995), http://cogprints.ecs.soton.ac.uk/archive/00001691/00/harnad95.quo.vadis.html.
A preliminary version of part of this article appeared as Harnad, S., Brody, T., Vallieres, F., Carr, L., Hitchcock, S., Gingras, Y., Oppenheim, C., Stamerjohanns, H., and Hilf, E., The green and the gold roads to Open Access, Nature Web