Critique of EPS/RIN/RCUK/DTI "Evidence-Based Analysis of Data Concerning Scholarly Journal Publishing"

Monday, October 9. 2006

Stevan Harnad
American Scientist Open Access Forum

SUMMARY: This Report on UK Scholarly Journals was commissioned by RIN, RCUK and DTI, and conducted by EPS, but its questions, answers and interpretations are clearly far more concerned with the interests of the publishing lobby than with those of the research community.

The Report's two relevant overall findings are correct and stated very fairly in their summary form:
[1] "Overall, [self-archiving] of articles in open access repositories seems to be associated with both a larger number of citations, and earlier citations for the items deposited....The reasons for this [association] have not been clearly established - there are many factors that influence citation rates... Consistent longitudinal data over a period of years... would fill this gap."

[2] "There is no evidence as yet to demonstrate any relationship (or lack of relationship) between subscription cancellations and repositories... Proving or disproving a [causal] link between availability in self-archived repositories and cancellations will be difficult without long and rigorous research."
The obvious empirical and practical conclusion to draw from the findings -- that (1) all the self-archiving evidence to date is positive for research and that (2) none of the self-archiving evidence to date is negative for publishing -- would have been that the research community should now apply and extend these findings -- by applying and extending self-archiving (through self-archiving mandates) to all UK research output, along with consistent, rigorous longitudinal studies over a period of years, to test (1) whether the positive effect on citations continues to be present (and why) and (2) whether the negative effect on subscriptions continues to be absent.

But instead, the two overall findings are hedged with volumes of special pleading, based mostly on wishful thinking, to the effect that (1') the observed relationship between self-archiving and citations may not be causal, and that (2') there may exist an as-yet-unobserved causal relationship between self-archiving and cancellations after all.

Even that would be all right, if this Report's conclusions were coupled with a clear endorsement of the proposed self-archiving mandates, so that the competing hypotheses can be put to a rigorous long-term test. But the only test the commissioners of this Report seem to be interested in conducting is "Open Option" publishing, i.e., authors paying publishers to make their article OA for them, instead of self-archiving it for themselves. This would certainly be a nice way to hold author self-archiving and institution/funder self-archiving mandates at bay for a few years more, while at the same time protecting publishers from undemonstrated risk of revenue loss. But it would also leave global unmandated self-archiving to continue to languish at the current spontaneous 15% rate that the self-archiving mandates had been meant to drive up to 100%. And it would leave research unprotected from its demonstrated risk of impact loss. The option of having to pay to provide OA is certainly not likely to enhance the unmandated rate of uptake by authors (though I'm sure publishers would have no quarrel with funder mandates to provide OA coupled with the funds to pay publishers' asking price for paid OA, as provided by the Wellcome Trust).

The long-term test will nevertheless be conducted, because four out of eight UK Research Councils have already mandated self-archiving. Their citation rates and their cancellation rates can then be compared with those for the four that have not mandated self-archiving (and whose authors hence do it spontaneously by "self-selection"). Alas, this will be mostly comparing apples and oranges (e.g. MRC vs AHRC), and it will needlessly be depriving the oranges of several more years of potential growth enhancement. My guess is that all the other councils -- except possibly the paradoxical EPSRC (which evidently thinks, with the publishing lobby, that there's still some sort of pertinent pretesting to be done for a few more years here) -- will come to their senses long before that, unpersuaded by Reports like this one.

UK scholarly journals: 2006 baseline report
An evidence-based analysis of data concerning scholarly journal publishing.
Prepared on behalf of the Research Information Network, Research Councils UK and the UK Department of Trade and Industry.
By Electronic Publishing Services Ltd
In association with Professor Charles Oppenheim and LISU at Loughborough University Department of Information Science

This is a rather long and repetitious report, but it does contain a few nuggets. It is obviously biassed, but biassed in a restrained way, meaning it does not really try to conceal its biases, nor does it overstate biassed conclusions. It also (reluctantly, but in most cases candidly) acknowledges its own weaknesses.

(The Report was commissioned by RIN, RCUK and DTI, but it is glaringly obvious that the questions, answers and interpretations have been slanted toward the interests of the publishing lobby rather than those of the research community -- possibly because the research community has no lobby in this matter, apart from the OA movement itself! Nevertheless, there has been considerable circumspectness, at least in the summary and conclusion passages, with weak points and gaps usually pointed out explicitly rather than denied or concealed, and with the overall preoccupation with publishing interests rather than research interests very open too.)

Some quotes and comments:

Whilst some evidence does suggest that [self-archiving in] repositories [is] an important new factor in the journal cancellation decision process, and one which is growing in significance, there is no research reporting actual or even intended journal subscription cancellation as a consequence of the growth of OA self-archived repositories.

So far, this sounds fair and reasonable. (In fact, this is the gist of the Report! The rest is mostly special pleading.)

Subscriptions are reported to have been declining over a period of 10+ years, but for a number of reasons. Proving or disproving a link between availability in self-archived repositories and cancellations will be difficult without long and rigorous research. In this connection, the outcome of research recently announced by the Research Councils UK (RCUK) with the co-operation of Macmillan, Blackwell and Elsevier, will be eagerly awaited, even though a report is not due until late 2008.

With evidence of self-archiving's benefits to research mounting, and zero evidence yet of any negative effect at all on publisher revenue, publishers nevertheless seem quite willing to wait (and keep research waiting too), trying to fend off self-archiving and its potential benefits to research for a long time to come yet, in order to keep trying to find some evidence of negative causal effects on publisher revenue (or, failing that, to deny positive causal effects on research impact).

Note that whereas a link between OA self-archiving and subscription decline has not yet been "proved or disproved" (not for want of looking!) -- and it is for that reason that we are hearing these calls for "long and rigorous research" -- the vast preponderance of the evidence we do have has already "proved" a "link" between OA self-archiving and citation counts (a link that is almost certainly causal, despite the wishful thinking of some who have a vested interest in its all turning out to be merely a-causal self-selection and superstition on the part of authors).

The question that the research community accordingly needs to ask itself is whether self-archiving's evidence-based benefits to research should be held in abeyance still longer, and meanwhile interpreted by default as a-causal, in order to buy still more time to try to "prove/disprove" hypothetical subscription declines for which there is no evidence whatsoever to date, even in fields where self-archiving has been near 100% for years.

(Researchers should also go on to ask themselves whether the research benefits should be held in abeyance even if they are causally linked to a subscription decline: Is research impact to be sacrificed in the service of publisher revenue? Are we conducting and funding research in order to generate -- or to safeguard -- publisher revenue?)

There is no evidence as yet to demonstrate any relationship (or lack of relationship) between subscription cancellations and repositories. Work in this field would need sufficient, representative and balanced samples, and the collaboration of all stakeholders, including especially research institutions and publishers. Any such study will need to be maintained over a fairly extended period, with regular reports, since it seems likely that the position could change with time if the contents of self-archiving repositories become progressively more comprehensive.

This would be fine, if proposed as an extended research project to be conducted after self-archiving mandates are in place, to analyze their long-term effects on subscriptions.

But this would be an exceedingly self-serving suggestion on the part of the publishing community (and a methodologically empty one) if meant as a "pilot" study that must somehow be conducted before adopting self-archiving mandates. (And it would be exceedingly self-defeating of the research community to even consider accepting such a pre-emptive suggestion as a precondition, before adopting self-archiving mandates.)

There is some consistency in results that show more citations for articles self-archived in repositories as distinct from the same or similar articles available [only via journal] subscription (although there have also been a few contradictory results). Overall, deposit of articles in open access repositories seems to be associated with both a larger number of citations, and earlier citations for the items deposited.

This is a fair summary -- except that immediately after stating it, this "association" is about to be deconstructed (much as the "association" between cigarette-smoking and lung cancer was deconstructed for years and years by the tobacco industry, claiming that only correlation had been demonstrated, and not causation). Read on:

The reasons for this [association] have not been clearly established - there are many factors that influence citation rates, including the reputation of the author, the subject-matter of the article, the self-citation rate, and, of course, how important or influential the repository is in its own right. The little existing evidence suggests that a possible [sic] reason for increased citation counts is not that the materials were free, or that they appeared more rapidly, but that authors put their best work into OA format. This research was limited to one discipline, however [astronomy], and more extensive evidence is required to validate this finding.

What this (important) study by Kurtz et al. in astronomy suggests, however, is not what the vast majority of the evidence (no longer little!) shows. Moreover, as noted, this a-causal interpretation -- only one of the possible interpretations of the astronomy evidence -- also happens to be the interpretation that the publishing community prefers for all the self-archiving evidence, in all fields. The alternative interpretation is that the relationship is causal: that the OA advantage is not merely an arbitrary whim on the part of the better authors to make their work OA, to no causal effect at all (why on earth would they be doing it at all then?): They do it because making their work more accessible increases its accessibility, uptake, downloads, usage, applications, citations, impact -- exactly as the correlational evidence shows, without exception, in field after field.

(NB: The only methodologically unexceptionable way to demonstrate causation here, by the way, is to select a large enough random sample of articles, divide them in half randomly, mandate half of them to be self-archived and half not, and then compare their respective citation counts after a few years. No one is likely to do quite that study -- any more than it was likely that a large random sample of people would be divided in half randomly, with half mandated to smoke and half not! But we are in the process of doing an approximation to that causal study, by comparing the citation counts of articles in the IRs of the (few) institutions that have already mandated self-archiving with the average for other articles in the same journals/years in which those articles appeared, but that have not been self-archived; we will also compare the size of the OA advantage for mandated and comparable non-mandated self-archiving. [We do not believe for a moment that these data are necessary to demonstrate causation, as causation is a virtual certainty anyway, but we are ready to play the game, in order to try to cut short the absurd delay in doing the obvious: mandating self-archiving universally.])

Although quite a lot of evidence has been collected regarding the quantitative effect of OA on citation counts (whether in the form of OA journals or as self-archived articles), much of it is scattered, uses inconsistent methods and covers different subject areas.

Yet, despite this scatter, methodological inconsistency and diversity, virtually all of it keeps showing exactly the same consistent pattern: A citation (and download) advantage for the OA articles. (No amount of special pleading can make that stubborn pattern go away!)

Consistent longitudinal data over a period of years to measure IF trends in a representative range of journals would fill this gap

There is no gap! There is a growing body of studies, across all fields and all journals, that keeps showing exactly the same thing: the OA advantage (in article citations and article downloads: this is not about journal impact factors, especially because comparing different journals is comparing apples and oranges).

(There seems to be a confusion here between the existence of the correlation itself, between self-archiving and citation counts -- this is found consistently, over and over -- and the question of the causal relation, which will not be answered by longitudinal data (we have longitudinal data already!) but by comparing mandated and unmandated self-archiving: if they both show the OA advantage, then the effect is causal and self-selection bias is a minor component.)

e.g., studying a range of journals that were toll-access and went OA (or vice versa). In the short-term, more data in different disciplines measuring the impact on citation counts of articles in hybrid journals or articles that are available in both forms versus articles that are only available in one of the forms will improve the evidence base.

No, the question about the reality and causality of the OA advantage will not be settled by OA journal vs. non-OA journal comparisons; that can always be dismissed as comparing apples with oranges, and, failing that, can always be attributed to self-selection bias (i.e., choosing to publish one's better work in an OA journal)!

And if we wait for the uptake of hybrid Open Choice -- i.e., paying the journal to self-archive the published PDF for you -- these "longitudinal" studies are likely to take till doomsday (and any positive outcome can still be dismissed as self-selection bias in any case!).

What is needed is precisely the data already being gathered, on huge samples, across all disciplines, comparing citation counts for self-archived versus non-self-archived articles within the same journal and year. The result has been a consistent, high OA Advantage (which has elicited a lot of special pleading about causality).
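The within-journal, within-year comparison described above can be sketched as follows. This is only a minimal illustration, not the method of any of the studies cited: the records, the field layout and the function name `oa_advantage_by_cell` are hypothetical, and the advantage is expressed simply as the ratio of mean citations (OA over non-OA) in each journal/year cell.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented for illustration:
# (journal, year, is_oa, citation_count)
ARTICLES = [
    ("J1", 2001, True, 12), ("J1", 2001, True, 8),
    ("J1", 2001, False, 5), ("J1", 2001, False, 3),
    ("J2", 2002, True, 6), ("J2", 2002, False, 4),
    ("J2", 2002, False, 2),
]

def oa_advantage_by_cell(articles):
    """Return {(journal, year): mean OA citations / mean non-OA citations}."""
    cells = defaultdict(lambda: {True: [], False: []})
    for journal, year, is_oa, cites in articles:
        cells[(journal, year)][is_oa].append(cites)
    ratios = {}
    for cell, groups in cells.items():
        # Skip cells that lack either group or whose non-OA mean is zero.
        if groups[True] and groups[False] and mean(groups[False]) > 0:
            ratios[cell] = mean(groups[True]) / mean(groups[False])
    return ratios

ratios = oa_advantage_by_cell(ARTICLES)
```

Holding journal and year constant is what keeps the comparison from being one of apples with oranges; a ratio above 1 in a cell shows only the correlation, not its cause.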

So we will look at the mandated subset of the self-archived papers, to try to show that the OA advantage is not (only, or mostly) a self-selection effect (Quality Bias [QB]).

(There is undoubtedly a non-zero self-selection [QB] component in the OA advantage, but there are many other components as well, including a Quality Advantage [QA], an Early Access Advantage [EA], a Competitive Advantage [CA, which will, like QB, vanish once all articles are OA], and a Usage (Download) Advantage [UA]. At 100% OA, there will no longer be any QB or CA (or Arxiv Advantage [AA]), but EA, QA and UA will still be going strong. EA and UA components have already been confirmed by the Kurtz study in astronomy. QA is implied by the repeated finding of a positive correlation between citation count and the proportion of those articles with that citation count that are OA. The mandate study will try to show that this correlation is causal, i.e., QA, not QB.)

Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA.
The whole area of the relationship between citation counts and scholarly communication channels is confused because of problems associated with quality bias [QB] (e.g., if scholars tend to self-archive only their best work, as suggested by Kurtz et al. [in astronomy]; alternatively, it may be that only the best journals are OA). In other words, differences in citation counts and IFs may simply reflect the quality of the materials under study rather than having anything to do with the channel by which the material is made available.

First, the issue is article citation counts, not journal Impact Factors (IFs).

Second, this is all special pleading. The biggest OA effects are based on comparing articles within the same journal/year. The size of the effect is indeed correlated with the quality of the article, because no amount of accessibility will generate citations for bad articles, whereas good articles benefit the most from a level playing field, with all affordability/accessibility barriers removed: that is the Quality Advantage [QA]. The idea that the Quality Advantage is merely a Quality (Self-Selection) Bias [QB], i.e., that the advantage is merely correlational, not causal, is of course a logical possibility, but it is also highly improbable (and would imply that accessibility/affordability barriers count for nothing in usage and citations, and that the better work is being made OA by its authors for purely superstitious reasons, because doing so has no effect at all!).

Overall, we concur with Craig's introduction that "the problems with measuring and quantifying an Open Access advantage are significant. Articles cannot be OA and non-OA at the same time."

They need not be. It is sufficient if we take a large enough sample of articles that are OA and non-OA from the same journals and years. Randomly imposing the self-archiving would be the only way to equate them completely (and our ongoing study on mandated self-archiving will approximate this).

(The analysis by Craig, commissioned by Blackwell Publishing, has not, so far as I know, been published.)

"Further, the variation of citation counts between articles can be extremely high, so making controlled comparisons of OA vs. non-OA articles nigh on impossible" [Craig, Blackwell Publishing]

(The way Analysis of Variance works is to compare variation between and within putatively different populations, to determine the probability that they are in reality the same population. The published comparisons show that the OA/non-OA differences are highly significant, despite the high variance.)
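The between/within logic just described can be illustrated with a hand-computed one-way F statistic. This is a sketch under stated assumptions: the citation counts are invented, the function name `one_way_f` is hypothetical, and real citation data are highly skewed, so published analyses may well use transformed counts or other tests.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical citation counts, invented for illustration only.
oa = [2, 4, 6, 8]
non_oa = [1, 2, 3, 4]
f_stat = one_way_f([oa, non_oa])
```

A large F means the difference between the group means is large relative to the spread inside each group; high within-group variance makes a given difference harder to detect, but a large enough sample can still detect it.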

It would of course be absurd to try to compare citation counts for OA and non-OA articles having the same citation counts. But we can compare OA and non-OA article counts among articles having the same citation counts, in the same journals -- and what we find is a strong positive correlation between the citation count and the proportion of articles that are OA (just as Lawrence reported in 2001, but not only in computer science, but across all 12 disciplines studied so far, and with much bigger sample sizes):

Source 4.8: Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.
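The relationship described above (the proportion of articles that are OA at each citation count, and its correlation with the citation count) can be sketched as follows. The data are invented and the function names are hypothetical; this is not the procedure or the data of the cited study.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (citation_count, is_oa) pairs from a single journal.
ARTICLES = [
    (0, False), (0, False), (0, False), (0, True),
    (1, False), (1, False), (1, True), (1, True),
    (2, False), (2, True), (2, True), (2, True),
]

def oa_proportion_by_count(articles):
    """Return {citation_count: fraction of articles with that count that are OA}."""
    totals, oa = defaultdict(int), defaultdict(int)
    for cites, is_oa in articles:
        totals[cites] += 1
        oa[cites] += int(is_oa)
    return {c: oa[c] / totals[c] for c in sorted(totals)}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

props = oa_proportion_by_count(ARTICLES)
r = pearson(list(props.keys()), list(props.values()))
```

In this toy sample the OA proportion rises steadily with the citation count, so `r` is positive; whether such a correlation reflects QA or QB is exactly the question the mandate comparison is meant to address.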

Note that the appendix to the Report under discussion here, states, in connection with the above study, which it cites:

"Harnad is THE advocate of OA and, thus, whilst expert in the field, is inevitably biased."

There is a bit of irony in the fact that in connection with another of the studies it cites:

Source 4.9: Harnad, S, Brody, T, Oppenheim, C et al, Comparing the impact of open access versus non open access articles in the same journals, D-Lib Magazine, 10,(6), 2004,

the appendix of the Report goes on to say:

"Harnad is THE exponent of OA, but, thus, potentially less objective."

Ironic (or, shall we say, conflicted, since this Report aspires to be a neutral one as between the interests of the research community and the publisher community), because the sole named collaborator on the Report is also a co-author of the above-cited study!

Let us agree that we all have views on the underlying issues, but that reliable data speak for themselves, qua data, and our data (and those of others) keep showing the same consistent OA Advantage. The disagreement is only on the interpretation: whether or not the consistent correlations are causal. And here, allegiances are tugging on both sides: Those favouring causality tend to come from the research community, those favouring a-causality tend to come from the publishing community. (Let us hope that the data from mandated self-archiving will soon settle the matter objectively.)

"[since] any Open Access advantage appears to be partly [sic] dependent on self-selection, the more articles that are {self-}archived... you'd expect to see any Open Access advantage reduce." [Craig, Blackwell Publishing]

Note that Craig carefully says "partly" -- and that we agree that self-selection is one of the many potential contributors to the OA advantage.

We also agree, of course, that once 100% OA is reached, the OA citation advantage -- in the form of an advantage of OA over concurrent non-OA articles -- will be reduced: indeed it will vanish! With all articles OA, there can no longer be either a Competitive Advantage [CA] or a Self-Selection Advantage (Quality Bias, QB) of OA over (non-existent) non-OA.

But the Quality Advantage [QA] will remain. (Higher quality articles will be used and cited more than they would have been if they had not been OA: this is not a competitive advantage but an absolute one.) And the Early Advantage [EA] as well as the Usage (Download) Advantage [UA] will remain too (as already shown by Kurtz's findings in Astronomy).

"Authors self-archiving in the expectant belief that each and every paper they archive will receive an Open Access advantage of several hundred percent are going to be sorely disappointed." [Craig, Blackwell Publishing]

This too is correct, but who on earth thought that OA would guarantee that all work would be used, whether or not it was any good? OA levels the playing field so merit can rise to the top, unconstrained by accessibility or affordability handicaps. But bad remains bad, and let's hope that researchers will continue to avoid trying to build on weak or invalid findings, whether or not they are OA.

The OA advantage is an average effect, not an automatic bonus for each and every OA article; moreover, the OA advantage is highly correlated with quality: The higher the quality, the higher the advantage. It is this effect that is open to the a-causal interpretation that the Quality Advantage [QA] is merely a Quality Bias [QB] (Self-Selection). But, equally (and, in my view, far more plausibly) it is open to the causal interpretation that OA causes wider usage and citation precisely because it removes all accessibility/affordability constraints that are currently limiting uptake and usage. That does not mean everything will be used more, regardless of quality ("usefulness"): But it will allow users (who are quite capable of exercising self-selection too!) to access and use the better work, selectively.

In addition, since the distribution of citations is not gaussian -- a small percentage of articles receives most of the citations and more than half of articles receive no citations at all -- it is almost axiomatic that the OA advantage will be strongest in the high-quality range.

Finally, it is worth noting that all researchers in the field are agreed that if the vast majority of scholarly publications become available in OA form, no citation advantage to OA will be measurable.

It is a tautology that with 100% OA, the OA/NOA ratio is undefined! But EA will still be directly measurable, and it will be possible to infer UA and QA indirectly (UA by comparing downloads for articles of the same age, before and after OA for the same articles, and QA by doing the same with citations; the Kurtz study used such methods in Astronomy). But by that time (100% OA), not many people will still have any interest in the a-causal hypothesis.

Thus, what OA advantage there is will prove to be temporary if OA does become the standard mode of publication.

This, however, is simply incorrect. At 100% OA, the Competitive Advantage (CA) will be gone; the Self-Selection Advantage (Quality Bias, QB) will be gone; the method of comparing citation counts for OA and non-OA articles within the same journal and year will be gone. So much is true by definition.

But (as Kurtz has shown in Astronomy), the Early Advantage and the Usage Advantage will still be there. And the Quality Advantage will still be there too; and that was what this was all about: Not just a horse-race for who can make his articles OA first, so as to reap the competitive advantage before 100% OA is reached (though that's not a bad idea!); not a guarantee that, no matter how bad your work, you can increase your citations by making it OA; but a guarantor that with access-barriers removed, quality will have the best chance to have its full potential impact, to the benefit of research productivity and progress itself, as well as the authors, institutions and funders of the high quality work.

(There is a bit of a [lurid] analogy here with saying that if only we can get everyone to smoke, it will be clear that smoking has no differential effects on human health! Perhaps the converse is a better way to look at it: if only we could get everyone to stop smoking, smoking will no longer have a differential effect on human health!)

(PS: OA is not a "mode of publication": OA publication is a mode of publication. OA itself is a mode of access-provision, which can be done in two ways, via OA publication or via OA self-archiving of non-OA publications.)

Self archived articles

It is this area that has been most studied, with numerous key publications. Most of these are focussed on the citation advantage of self-archived articles rather than of OA journals. Craig, in an as yet unpublished review, provides an excellent overview of the evidence collected to date. Lawrence (Source 4.13) is significant because it was the first major paper that identified a citation advantage for OA self-archived articles, and it has been widely cited ever since. However, it was based on too small-scale a study to support general conclusions. Harnad et al. (Source 4.9) provides a useful summary of the state of play of OA advantage studies, while Hajjem et al. (Source 4.8) is fairly typical of the many articles produced by Harnad claiming that self-archiving leads to higher citation counts.

Let us be clear: The many OA vs. non-OA studies, ours and everyone else's, across more than a dozen different disciplines, many of them based on large-scale samples, all show the very same consistent pattern of positive correlation between OA and citation counts. Those are data, and they are not under dispute. The only "claim" under dispute is that that consistent correlation is causal...

Antelman (Source 4.1) is arguably the most carefully constructed study of the question. Articles in four disciplines were evaluated, and in each case it was found that open access articles had greater citation counts than non-open access articles.

One wonders why this particular small-scale study (of about 2000 articles in 4 fields) was singled out, but in any event, it shows exactly the same pattern as all the other studies (some of them based on hundreds of thousands of articles instead of just a few thousand, in three times as many fields).

Eysenbach challenges the notion that OA "green" articles (i.e., those in repositories) are more effective than OA "gold" (i.e., those published in OA journals, such as those produced by Public Library of Science) in obtaining high citation counts. It is this part of his paper that produced a furious response from Harnad, much of it focused on particular details.

The issue was not about OA green (self-archived) articles producing higher citation counts than OA gold (OA-journal)! No one had claimed one form of OA was more effective than the other in generating the OA Advantage before the Eysenbach study: It was Eysenbach who claimed to have shown gold was more effective than green -- indeed that green was only marginally effective at all!

And I think anyone reading the exchanges will see that all the fury is on the Eysenbach side. All I do is point out (rather patiently) where Eysenbach is overstating or misstating his case:

Harnad, S. (2006) PLoS, Pipe-Dreams and Peccadillos. PLoS Biology eletters (May 16, 2006) [1] [2] [3] [4]

Eysenbach's study does find the OA advantage, as many others before it did. It certainly doesn't show that the gold OA advantage is bigger than the green OA advantage, in general. It simply shows that for the 1500-article sample in the one journal tested, Proceedings of the National Academy of Sciences (PNAS), a very high impact journal, both paid OA (gold) and green OA (free) increased citation counts over non-OA, but gold increased them more than green. That result is undisputed. Its extrapolation to other journals is not:

The likely explanation of the PNAS result is very simple: PNAS is not a randomly chosen, representative journal: it is a very high-impact, very high visibility, interdisciplinary journal, one of very few like it (along with Nature and Science). Articles that pay for OA are immediately accessible at PNAS's own high-visibility website -- a website that probably has higher visibility than any single institution's IR today. So PNAS articles made freely accessible at PNAS's website get a bigger OA advantage than PNAS articles made freely accessible by being self-archived in the author's own IR.

The reason it definitely does not follow from this that gold OA is bigger than green OA is very simple: Most journals are not PNAS, and do not have the visibility or average impact of PNAS articles! Hence Eysenbach's valid finding for one very high-impact journal simply does not generalize to all, most, or even many journals. Hence it is not a gold/green effect at all, but merely a very high-end special case.

Apart from the spurious gold/green advantage, Eysenbach did confirm, yet again, (1) the OA advantage itself, and confirmed it (2) within a very short time range. These are both very welcome results (but not warranting to be touted, as they were, by both the author and by the accompanying PLoS editorial, as either the first "solid evidence" of the OA advantage -- they certainly were not that -- or a demonstration that gold OA generates more citations than green OA: the very same method has to be tried on middle and low-ranking journals too, before drawing that conclusion!). (Nor are the PLoS/PNAS results any more exempt from the methodological possibility of self-selection bias [QB] than any of the many prior demonstrations of the OA advantage, as authors self-choose to pay PNAS for gold OA as surely as they self-choose to self-archive for green OA!)

The fury on Eysenbach's part came from my pointing out that his and PLoS's claim to primacy for demonstrating the OA advantage (and their claim of having demonstrated a general gold-over-green advantage) was unfounded (and might have been due to both PLoS's and Eysenbach's zeal to promote publication in gold journals: Eysenbach is the editor of one too, but not a high-end one like PNAS or PLoS): Eysenbach's was just the latest in a long (and welcome) series of confirmations of the OA advantage (beginning with Lawrence 2001), the prior ones having been based on far larger samples of articles, journals and fields (and there was no demonstration at all of a general gold over green advantage: just the one non-representative, hence non-generalisable special case of PNAS).

Both authors believe that OA produces a citation advantage, but Eysenbach has presented evidence that casts doubt on Harnad's notion that the "green" route is the preferred route to getting that increased impact.

Green may not be the preferred route to OA for editors of gold journals, but it is certainly the preferred route for the vast majority of authors, who either have no suitable gold journal to publish in, or lack the funds (or the desire) to pay the journal to do what they can do for free for themselves. The only case in which paid gold OA may bring even more citations than free green OA (even though both increase citations) is in the very highest quality journals, such as PNAS, today -- but that high-end reasoning certainly does not generalise to most journals, by definition. (And it will vanish completely when OA self-archiving is mandated, and the harvested IR contents become the locus classicus to access the literature for those whose institutions are not subscribed to the journal in which a particular article appeared -- whether or not it is a high-end journal.)

(There is also a conflation of the (less interesting) question of (1) whether green or gold generates a greater OA citation advantage [answer: for high-end journals like PNAS, gold does, but in general there is no difference] with the (far more important) question of (2) whether green or gold can generate more OA [answer: green can generate far more OA, far more quickly and easily, not just because it does not cost the author/institution anything, but because it can be mandated without needing either to find the extra funds to pay for it or to constrain the author's choice of which journal to publish in].)

However, despite the intuitive attractiveness of the hypothesis that OA will lead to increased citations because of easier availability, the one systematic study of the reasons for the increased citations - by Kurtz (Source 4.12) - showed that in the field of astronomy at least, the primary reason was not that the materials were free, or that they appeared more rapidly, but that authors put their best work into OA format, and this was the reason for increased citation counts.

Astronomy is an interesting but anomalous field: It differs from most other fields in that:

(1) Astronomy consists of a small, closed circle of journals.

(2) Virtually all research-active astronomers (so I am told by the author) have institutional access to all those journals.

(3) For a number of years now, that full institutional access has been online access.

(4) So astronomy is effectively a 100% OA field.

(5) Hence the only room left for a directly measurable OA advantage in astronomy is (5a) to self-archive the paper earlier (at the preprint stage) [EA] or (5b) to self-archive it in Arxiv (which has evolved into a common central port of call, so it generates more downloads and citations -- mostly at the preprint stage, in astronomy).

(6) What Kurtz found was that under these conditions, higher quality (higher citation-count) papers were more likely to be self-archived.

(7) This might be a quality self-selection effect (QB) (or it might not), but it is clearly occurring under very special conditions, in a 100% OA field.

(8) Kurtz did make another, surprising finding, which has bearing on the question of how much of a citation advantage remains once a field has reached 100% OA.

(9) By counting citations for comparable articles before and after the transition to 100% OA, Kurtz found that the citations per article had actually gone down (slightly) rather than up, with 100% OA.

(10) But a little reflection suggests a likely explanation: This slight drop is probably a shift in balance with a level playing field:

(11) With 100% OA (i.e., equal access to everything), authors don't cite more articles, they cite more selectively, able now to focus on the best, most relevant work, and not just on the work their institutions can afford to access.

(12) Higher quality articles get more citations, but lower quality articles, of which there are far more (some perhaps previously cited by default, because of accessibility constraints), are cited less.

(13) On balance, total citations are slightly down, on this level playing field, in this special, small, closed-circle field (astronomy), once it reaches 100% OA.

(14) It remains to be seen whether total and average citations go up or down when other fields reach 100% OA.

(15) What Kurtz does report even in astronomy is that although total citations are slightly down, downloads are doubled.

(16) Downloads are correlated with later citations, but perhaps at 100% OA this is either no longer true, or true only for higher quality articles.

Similarly, more carefully conceived work on the impact of both OA journals and self-archiving on the quality of research communications, especially on the peer review system, will be required.

OA journals are peer-reviewed journals: What sort of impact are they feared to have on peer review?

And why on earth would the self-archiving of peer-reviewed, published postprints have any impact on the peer review system? The peers review for free. (Could this be just a veiled repetition of the question about the impact of self-archiving on journal revenues, yet again?)

Recently, the results of a study undertaken by Ware for ALPSP, which were published in March 2006 (Source 1.16, in Area 1), have provided at least some initial data on the question of the possible linkage between the availability of self-archived articles in an OA repository and journal subscription cancellations by libraries...: availability of articles in repositories was cited as either a "very important" or an "important" possible factor in journal cancellation by 54 per cent of respondents, even though ranking fourth after (i) decline of faculty need, (ii) reduced usage, and (iii) price. When respondents were invited to think forward five years, availability in a repository was still fourth-ranking factor, but the relevant percentage had risen to 81. Whilst this is not evidence of actual or even intended cancellation as a consequence of the growth of OA self-archiving repositories, it strongly suggests that such repositories are an important new factor in the decision process, and growing in significance.

Summary: No evidence of cancellations, but speculations by librarians to the effect that their currently fourth-ranking factor in cancellations might possibly become more important in the next five years...

Sounds like sound grounds for fighting self-archiving mandates and trying to deny research the benefit of maximized impact for yet another five years -- if one's primary concern is the possible impact of mandated self-archiving on publishers' revenue streams. But if one's primary concern is with the probable impact of mandated self-archiving on research impact, this sort of far-fetched reasoning has surely earned the right to be ignored by the research community as the self-serving interference in research policy that it surely is.

Stevan Harnad
American Scientist Open Access Forum