Stevan Harnad, Canada Research Chair in Cognitive Science, Université du Québec à Montréal, Canada, firstname.lastname@example.org
Tim Brody, Doctoral Candidate in Computer Science, University of Southampton, email@example.com
François Vallières, Computer Analyst, Observatoire des sciences et technologies (OST), Centre interuniversitaire de recherche sur la science et la technologie (CIRST), Université du Québec à Montréal, Canada, firstname.lastname@example.org
Les Carr, Senior Lecturer, University of Southampton, email@example.com
Steve Hitchcock, Postdoctoral Fellow, University of Southampton, S.Hitchcock@ecs.soton.ac.uk
Yves Gingras, Professor, Université du Québec à Montréal, Canada, firstname.lastname@example.org
Charles Oppenheim, Professor, University of Loughborough, C.Oppenheim@lboro.ac.uk
Heinrich Stamerjohanns, Postdoctoral Fellow, University of Oldenburg, email@example.com
Eberhard R. Hilf, Professor, University of Oldenburg, firstname.lastname@example.org
Abstract. The research access/impact problem arises because journal articles are not accessible to all of their would-be users, hence they are losing potential research impact. The solution is to make all articles Open Access (OA, i.e., accessible online, free for all). OA articles have significantly higher citation impact than non-OA articles. There are two roads to OA: the "golden" road (publish your article in an OA journal) and the "green" road (publish your article in a non-OA journal but also self-archive it in an OA archive). Only 5% of journals are gold, but over 90% are already green (i.e., they have given their authors the green light to self-archive); yet only about 10-20% of articles have been self-archived. To reach 100% OA, self-archiving needs to be mandated by researchers' employers and funders, as the UK and US have recently recommended, and universities need to implement that mandate.
The research journal-affordability problem and the resulting university libraries' journal budget crisis were what first brought the research article-access/impact problem to light, but the journal-affordability problem and the article-access/impact problem are not the same. According to Ulrichsweb (http://www.ulrichsweb.com/ulrichsweb/analysis/), about 24,000 peer-reviewed research journals exist worldwide, across all disciplines and languages, publishing about 2.5 million articles per year. But because journal prices keep rising and library budgets are limited, each university can afford only a small portion of that total. This means their users have access to only a fraction of those articles, even though, in the online age, we would have expected otherwise. This is the research journal-affordability problem.
What the journal-affordability problem unmasked was a further problem: As a consequence of the fact that most of their would-be users at most universities cannot access most of the 2.5 million articles published yearly (because their universities cannot afford the journal access-tolls), much of the potential research impact of those inaccessible articles is being lost. An article's research impact is the degree to which its findings are read, used, applied, built-upon and cited by users in their own further research and applications. Research impact is a measure of the progress and productivity of research. That is the reason why researchers' careers (their salaries, promotions, tenure, funding, prestige, prizes) depend on their impact; it is also why their universities (which co-benefit from the research funding, progress and prestige) as well as their research funding agencies (which are answerable for the way they spend tax-payers' money) reward research impact.
Merely to do the research and then put your findings in a desk-drawer is no better than not doing the research at all. Researchers must submit their research to peer review (Harnad 1998) and then "publish or perish," so others can use and apply their findings. But getting findings peer-reviewed and published is not enough either: Other researchers must find the findings useful, as proved by their actually using and citing them. And to be able to use and cite them, they must first be able to access them. That is the research article access/impact problem.
To see that the journal affordability problem and the article access/impact problem are not the same, one need only note that even if all 24,000 peer-reviewed research journals were sold to universities at cost -- i.e., with not a penny of profit -- it would still be true that almost no university has anywhere near enough money to afford all or even most of the 24,000 journals, even at minimal access-tolls: http://fisher.lib.virginia.edu/cgi-local/arlbin/arl.cgi?task=setuprank . Hence it would remain true even then that not all would-be users could access all of the yearly 2.5 million articles, and hence that their potential research impact would continue to be lost.
So although the two problems are connected (lower journal prices would indeed generate somewhat more access), solving the journal affordability problem does not solve the research access/impact problem.
How big is the access/impact problem? Estimates are emerging, and their consistency and size are quite astounding. Lawrence (2001) reported that in computer science the citation impact of conference articles whose full texts are accessible online toll-free -- let us call that "Open Access" (OA), in line with the definition provided in 2001 by the Budapest Open Access Initiative: http://www.soros.org/openaccess/read.shtml -- is 336% higher than the impact of non-OA articles. Kurtz et al. (2004a, 2004b) have reported similar effects in astrophysics, and Odlyzko (2002) in mathematics.
We are charting this OA-impact advantage across all disciplines as well as across time in a study using a 12-year sample of 14 million articles from the Institute for Scientific Information (ISI) database. We are comparing the matched citation counts of OA versus non-OA articles by trawling the Web to find which of the 14 million articles within the same journal and year are and are not OA. Some results are already available for the physics/mathematics subset, with effect size and direction comparable to what Lawrence reported (Figure 1; Harnad & Brody 2004).
Figure 1. Open Access (OA) vs. Non-Open Access (non-OA) Citation Impact Comparisons for All (Physics/Mathematics) Fields. OA articles are those that are self-archived in http://arxiv.org/. Gray curve is OA + non-OA = “Total Articles” per year (scale on right). Lower set of deviations (green; all positive) is “OAP,” the proportion OA/(OA + non-OA) of articles that have been made OA, by year. Upper set of deviations (red; all positive except 2004) is “OAA,” the OA/non-OA citation advantage, per year, relative to an even ratio of 1/1 (100%) in the number of citations to articles appearing in the same journal and year (scale on left). Leftmost value for each set of deviations is the 1992-2004 average; rightmost value is the 2001-2003 average. Correlations also show OAP grows by year and perhaps a small positive relation between OAA and year and between OAA and OAP. Further details in: http://citebase.eprints.org/isi_study/
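The two quantities plotted in Figure 1 can be sketched in a few lines. This is only an illustration under stated assumptions: it takes OAA to be the ratio of mean citations per OA article to mean citations per non-OA article within one journal and year, expressed as a percentage above an even 1/1 ratio, and the function names and citation counts are invented for the example rather than taken from the ISI study.

```python
from statistics import mean


def oa_proportion(n_oa, n_non_oa):
    """OAP: proportion of articles that are OA, OA / (OA + non-OA)."""
    return n_oa / (n_oa + n_non_oa)


def oa_advantage(oa_citations, non_oa_citations):
    """OAA: percent by which mean OA citations exceed mean non-OA citations.

    Assumption: 'advantage' is measured on mean citations per article for
    articles from the same journal and year; 0.0 means an even 1/1 ratio.
    """
    ratio = mean(oa_citations) / mean(non_oa_citations)
    return (ratio - 1.0) * 100.0


# Invented citation counts for one hypothetical journal-year.
oa = [12, 8, 10]        # citations to the self-archived (OA) articles
non_oa = [4, 6, 5, 5]   # citations to the non-OA articles

oap = oa_proportion(len(oa), len(non_oa))
oaa = oa_advantage(oa, non_oa)
print(f"OAP = {oap:.2%}, OAA = {oaa:.0f}% above parity")
```

Comparing only within the same journal and year, as the study does, keeps differences in journal quality and article age from being mistaken for an OA effect.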
How did some of the articles in those non-OA journals become OA? Because their authors "self-archived" them on the Web (i.e., made them accessible online toll-free for all would-be users): http://www.eprints.org/self-faq/. Physicists have been self-archiving in growing numbers since 1991, in a central archive called Arxiv (http://arxiv.org/show_monthly_submissions ), as have computer scientists on their own Web sites, which are then harvested by Citeseer: http://citeseer.ist.psu.edu/cis.
But the self-archiving method with the greatest potential to provide OA is self-archiving in one's own university's OAI-compliant Eprint Archives (http://software.eprints.org/handbook/). There are already over one hundred such institutional archives worldwide (http://archives.eprints.org/index.php?action=browse#type) and they are growing rapidly (but not yet rapidly enough: see Figure 2).
Figure 2. Growth of Institutional Archives and Contents. Displays a graph of all archives that have been flagged as 'Research Institutional'. The date-stamps of records as exported by each archive's OAI-PMH interface are used to plot a cumulative graph of records over time. The date of the earliest OAI-PMH record is used to show the number of cumulative archives over time (green, scale right). The number of metadata records exported by an archive may not reflect the number of full-text, publicly accessible documents (red, scale left).
OAI-compliance means using the Open Archives Initiative's metadata-tagging protocol to tag the critical information (author, title, date, etc.) in a uniform way (http://www.openarchives.org/OAI/openarchivesprotocol.html). OAI-compliance makes those many distributed archives "interoperable", so that they can all be harvested by cross-archive harvesters such as OAIster (http://oaister.umdl.umich.edu/o/oaister/) into a single, global, seamlessly searchable virtual OA archive.
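To make the idea of uniform metadata-tagging concrete, here is a minimal sketch of what a harvester does with one archive. It builds an OAI-PMH `ListRecords` request for unqualified Dublin Core (`oai_dc`) and extracts author, title and date-stamp from a response. The base URL, the record contents and the helper names are hypothetical, and the response is a hand-written sample parsed offline rather than output from any real archive.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a harvester would send to an archive's OAI interface."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Pull identifier, date-stamp, title and creators out of a ListRecords reply."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": rec.findtext("oai:header/oai:datestamp", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "creators": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    return records


# Hand-written sample reply (hypothetical archive and record).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:1</identifier>
        <datestamp>2004-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Eprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:date>2004</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://archive.example.org/oai")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

Because every compliant archive answers the same request with the same tags, one harvester loop of this kind can cover all of them; a real harvester would also have to follow the protocol's resumption tokens to page through large result sets, which this sketch omits.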
This global OA archive can then be enhanced with a "google" for the research literature such as Citebase (http://citebase.eprints.org/), which counts citations instead of links and can rank articles by either the citation impact or the "usage impact" (downloads) for the article or the author (Brody & Harnad 2004; Hitchcock et al. 2003). Early-days measures like the Citebase download/citation correlator (http://citebase.eprints.org/analysis/correlation.php) can even predict eventual citations two years later from the number of downloads today (see Figure 3 for an area of physics in which the correlation between downloads and citations is about 0.4).
Figure 3. The Download/Citation Cycle Across Time. In most areas of physics the correlation between downloads and citations is between 0.3 and 0.4 ( Brody & Harnad 2004, in prep.). These graphs show the time-course of downloads (smaller left box) and citations (larger right box) that would be included in calculating the correlation for two papers, where downloads were included up to 4 months after deposit and citations up to 2 years. The effect is cyclic, downloads generating citations and citations generating further downloads.
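The kind of correlation reported in Figure 3 can be illustrated with a plain Pearson coefficient computed over papers, pairing each paper's early download count with its later citation count. This is a sketch only: the choice of Pearson's r is an assumption (the correlator's exact method is not described here), and the download and citation figures below are invented for the example, not drawn from Citebase.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Invented data, one entry per paper:
# downloads in the first 4 months after deposit, citations within 2 years.
early_downloads = [10, 20, 30, 40]
later_citations = [1, 3, 2, 4]

r = pearson_r(early_downloads, later_citations)
print(f"download/citation correlation r = {r:.2f}")
```

The windows matter: downloads are counted early and citations late, so that a positive r reflects early usage anticipating later citation rather than the two simply accumulating together.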
Such performance indicators and predictors can be included in standardized university OAI CVs (http://paracite.eprints.org/cgi-bin/rae_front.cgi) and then harvested by research assessors and evaluators to chart the progress and direction of research as well as to help make decisions on promotion and funding (Smith & Eysenck 2002; Harnad et al. 2003).
There is now evidence that as many as 39% of authors may already be providing OA for at least one of their articles by one or another of the three means of self-archiving (arbitrary Web sites, central disciplinary archives, distributed university archives) (Swan & Brown 2004a). This 39% now needs to be systematically increased to 100%, for all articles, and the institutional self-archiving route is the most promising way to achieve that because universities and their researchers share in the benefits of maximizing research impact and share in the costs of lost impact.
All signs are favorable: There has been a great increase in OA consciousness in the past year, with many Declarations and Statements in support of OA worldwide such as:
WSIS Declaration: http://www.itu.int/wsis/documents/doc_multi-en-1161|1160.asp
Bethesda Statement: http://www.earlham.edu/~peters/fos/bethesda.htm
Budapest Open Access Initiative: http://www.soros.org/openaccess/view.cfm
Public Library of Science: http://www.plos.org/about/history.html
Wellcome Trust Statement: http://www.wellcome.ac.uk/en/1/awtvispolpub.html
IFLA Statement: http://www.ifla.org/V/cdoc/open-access04.html
In response to the research community's expressed desire for OA, the latest JISC/Romeo survey of over 8,000 journals indicates that over 90% are already "green," that is, they have given their official green light to author self-archiving (http://romeo.eprints.org/stats.php) (Cox & Cox 2003).
About 1200 journals (approaching 5%) are even "gold," that is, they are OA journals, making all their own contents OA: http://www.doaj.org/. To cover their costs, however, many of these gold journals have had to adopt the OA journal cost-recovery model (Harnad 1995): Instead of the user-institution paying the journal access-tolls for incoming articles, the author-institution pays the journal peer-review and publication costs per outgoing article.
Currently the riskiness and untestedness of this gold journal cost-recovery model make publishers more willing to go green than gold in response to the research community's demand for OA. Publishers note that physics journals have been green since 1991, and yet there still has not been any cancellation pressure. Universities that can afford to pay for the official non-OA version do so. Users at universities that cannot afford the non-OA version use the authors' self-archived OA versions. One prominent “born-gold” journal, the Journal of High Energy Physics (http://www.iop.org/EJ/journal/1126-6708), has even successfully made the transition backwards from gold to green in order to make ends meet after a few years of being toll-free. Yet its contents remain 100% OA because 100% of its authors self-archive them.
Publishers have done their part in response to the research community's demand for OA by giving their green light to author-institution self-archiving. It is now time for more of the research community to take them up on it. It is not enough to sit and wait for all 24,000 journals to convert to gold (http://www.eprints.org/self-faq/#31.Waiting ). And it certainly isn't fair for researchers to demand that publishers make all the sacrifices and take all the risk upon themselves while the research community does not bother to take the risk-free step of providing OA (which they purport to want and need so much) for their own articles -- by simply self-archiving them.
The research community is ready at last to update its existing “publish or perish” mandate to require also providing Open Access to the articles it publishes in the online era. The UK Parliament Science and Technology Committee (http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/39903.htm) has recommended (and the US House of Representatives http://thomas.loc.gov/cgi-bin/cpquery/?&db_id=cp108&r_n=hr636.108&sel=TOC_338641& has already voted in favor of) legislation to the effect that as one of the conditions for receiving research funding it should be mandatory for the fundee not merely to publish but also to self-archive all the articles resulting from the funded research.
In an author survey, Swan & Brown (2004a, 2004b) report that the vast majority of their author sample indicated that they would self-archive willingly if their employer (or funding body) required them to do so! Hence, universities and research-funders are in the best position to usher in the OA era by adopting and implementing their own institutional OA provision policies (http://www.eprints.org/signup/sign.php).
More than 100 universities worldwide (http://archives.eprints.org/eprints.php?page=all) already have OA Eprint Archives. The adoption of official university OA self-archiving policies will help to maximize the number of such archives, as well as the number of articles in them -- by encouraging the 39% of authors who already self-archive (Swan & Brown 2004a) to deposit their articles in their own university's OA Eprint Archive and by encouraging those authors who do not yet self-archive to start doing so for the sake of the enhanced impact the citation studies have shown OA will generate (Harnad & Brody 2004).
Along with the substantial recent rise in OA consciousness worldwide there has also been an unfortunate tendency to equate OA exclusively with OA journal publishing (i.e., the golden road to OA) and to overlook the faster, surer and already more heavily travelled green road of OA self-archiving. This oversight is probably a spin-off of conflating the journal-affordability problem with the access/impact problem. Let us hope that the mounting evidence of the powerful impact-generating effects of OA, plus incentives from their employers and funders, will at last induce the 61% of authors who have not yet done so to take to the green road so that we can all enjoy the benefits of 100% OA at last.
Brody, T. & Harnad, S. (2004, in prep.) Earlier Web Usage Statistics as Predictors of Later Citation Impact. http://www.ecs.soton.ac.uk/~harnad/Temp/timcorr.doc
Cox, J. & Cox, L. (2003) Scholarly Publishing Practice: The ALPSP report on academic publishers' policies and practices in online publishing. Association of Learned and Professional Society Publishers. http://www.alpsp.org/2004pdfs/SFpub210104.pdf
Harnad, S. (1995) Electronic Scholarly Publication: Quo Vadis? Serials Review 21(1) 70-72 (Reprinted in Managing Information 2(3) 1995) http://cogprints.ecs.soton.ac.uk/archive/00001691/00/harnad95.quo.vadis.html
Harnad, S. (1998) The invisible hand of peer review. Nature [online] (5 Nov. 1998). Longer version in Exploit Interactive 5 (2000): http://www.exploit-lib.org/issue5/peer-review/
Harnad, S. & Brody, T. (2004) Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals, D-Lib Magazine 10 (6) June http://www.dlib.org/dlib/june04/harnad/06harnad.html
Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online RAE CVs Linked to University Eprint Archives: Improving the UK Research Assessment Exercise whilst making it cheaper and easier. Ariadne 35 (April 2003). http://www.ariadne.ac.uk/issue35/harnad/
Hitchcock, S., Woukeu, A., Brody, T., Carr, L., Hall, W., and Harnad, S. (2003) Evaluating Citebase, an open access Web-based citation-ranked search and impact discovery service. http://opcit.eprints.org/evaluation/Citebase-evaluation/evaluation-report.html
Kurtz, Michael J.; Eichhorn, Guenther; Accomazzi, Alberto; Grant, Carolyn S.; Demleitner, Markus; Murray, Stephen S.; Martimbeau, Nathalie; Elwell, Barbara (2004a) Worldwide Use and Impact of the NASA Astrophysics Data System Digital Library. Journal of the American Society for Information Science and Technology 55. http://cfa-www.harvard.edu/~kurtz/jasist1.pdf
Kurtz, Michael J.; Eichhorn, Guenther; Accomazzi, Alberto; Grant, Carolyn S.; Demleitner, Markus; Murray, Stephen S.; Martimbeau, Nathalie; Elwell, Barbara (2004b) The Bibliometric Properties of Article Readership Information. Journal of the American Society for Information Science and Technology 55. http://cfa-www.harvard.edu/~kurtz/jasist2.pdf
Lawrence, S. (2001) Online or Invisible? Nature 411 (6837): 521. http://www.neci.nec.com/~lawrence/papers/online-nature01/
Odlyzko, A.M. (2002) The rapid evolution of scholarly communication. Learned Publishing 15: 7-19 http://www.catchword.com/alpsp/09531513/v15n1/contp1-1.htm
Smith, A. & Eysenck, M. (2002) The correlation between RAE ratings and citation counts in psychology. Technical Report, Psychology, University of London, Royal Holloway. http://psyserver.pc.rhbnc.ac.uk/citations.pdf
Swan, A. & Brown, S.N. (2004a) JISC/OSI Journal Authors Survey Report. http://www.jisc.ac.uk/uploaded_documents/JISCOAreport1.pdf http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/3628.html
 A preliminary version of part of this article appeared as: Harnad, S., Brody, T., Vallieres, F., Carr, L., Hitchcock, S., Gingras, Y, Oppenheim, C., Stamerjohanns, H., & Hilf, E. (2004) “The green and the gold roads to Open Access.” Nature Web Focus. http://www.nature.com/nature/focus/accessdebate/21.html