Wednesday, January 17, 2007
Citation Advantage For OA Self-Archiving Is Independent of Journal Impact Factor, Article Age, and Number of Co-Authors
In May 2006, Eysenbach published "Citation Advantage of Open Access Articles" in PLoS Biology, confirming -- by comparing OA vs. non-OA articles within one hybrid OA/non-OA journal -- the "OA Advantage" (higher citations for OA articles than for non-OA articles) that had previously been demonstrated by comparing OA (self-archived) vs. non-OA articles within non-OA journals.
This new PLoS study was based on a sample of 1492 articles (212 OA, 1280 non-OA) published June-December 2004 in one very high-impact (i.e., high average citation rate) journal: Proceedings of the National Academy of Sciences (PNAS). The findings were useful because not only did they confirm the OA citation advantage, already demonstrated across millions of articles, thousands of journals, and over a dozen subject areas, but they showed that that advantage is already detectable as early as 4 months after publication.
The PLoS study also controlled for a large number of variables that could have contributed to a false OA advantage (for example, if more of the authors that chose to provide OA had happened to be in subject areas that happened to have higher citation counts). Eysenbach's logistic and multiple regression analyses confirmed that this was not the case for any of the potentially confounding variables tested, including the (i) country, (ii) publication count and (iii) citation count of the author and the (iv) subject area and (v) number of co-authors of the article.
However, both the Eysenbach article and the accompanying PLoS editorial considerably overstated the significance of all the controls that were done, suggesting that (1) the pre-existing evidence, based mainly on OA self-archiving ("green OA") rather than OA publishing ("gold OA"), had not been "solid" but "limited" because it had not controlled for these potential "confounding effects." They also suggested that (2) the PLoS study's finding that gold OA generated more citations than green OA in PNAS pertained to OA in general rather than just to high-profile journals like PNAS (and that perhaps green OA is not even OA!):
Eysenbach (2006): "[T]he [prior] evidence on the “OA advantage” is controversial. Previous research has based claims of an OA citation advantage mainly on studies looking at the impact of self-archived articles... (which some have argued to be different from open access in the narrower sense)... All these previous studies are cross-sectional and are subject to numerous limitations... Limited or no evidence is available on the citation impact of articles originally published as OA that are not confounded by the various biases and additional advantages [?] of self-archiving or “being online” that contribute to the previously observed OA effects."
When I pointed out in a reply that subject areas, countries and years had all been analyzed separately in prior within-journal comparisons based on far larger samples, always with the same outcome -- the OA citation advantage -- making it highly unlikely that any of the other potentially confounding factors singled out in the PLoS/PNAS study would change that consistent pattern, Eysenbach responded:
Eysenbach: "[T]o answer Harnad's question 'What confounding effects does Eysenbach expect from controlling for number of authors in a sample of over a million articles across a dozen disciplines and a dozen years all showing the very same, sizeable OA advantage? Does he seriously think that partialling out the variance in the number of authors would make a dent in that huge, consistent effect?' – the answer is “absolutely”."
My doctoral student, Chawki Hajjem, has accordingly accepted Eysenbach's challenge, and done the requisite multiple regression analyses, testing not only (3) number of authors, but (1) number of years since publication, and (2) journal impact factor. The outcome is that (4) the OA self-archiving advantage (green OA) continues to be present as a robust, independent, statistically significant factor, alongside factors (1)-(3):
Factors tested, in order of size of contribution:
Article age (1) is of course the biggest factor: Articles' total citation counts grow as time goes by.
Journal impact factor (2) is next: Articles in high-citation journals have higher citation counts. This is not just a circular effect of the fact that journal citation counts are just average journal-article citation counts; it is a true Quality Bias (QB) selection effect (nothing to do with OA!), namely, the higher quality articles tend to be submitted to and selected by the higher quality journals.
The next contributor to citation counts is the number of authors (3): This could be because there are more self-citations when there are more authors; or it could indicate that multi-authored articles tend to be of higher quality.
But last, we have the contribution of OA self-archiving (4). It is the smallest of the four factors, but that is unsurprising, as surely article age and quality are the two biggest determinants of citations, whether the articles are OA or non-OA. (Perhaps self-citations are the third biggest contributor.) But the OA citation advantage is present for those self-archived articles (and stronger for the higher quality ones: the Quality Advantage, QA), refuting Eysenbach's claim that the green OA advantage is merely the result of "potential confounds" and that only the gold OA advantage is real.
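To make concrete what "partialling out" the other factors means, here is a minimal sketch of a multiple regression on purely synthetic data. It is not Hajjem's analysis and does not use his data: the sample size, the coefficients in `TRUE`, the use of a log citation count as the outcome, and the built-in tendency for OA articles to sit in higher-impact journals are all assumptions invented for illustration. The point is only that a raw OA vs. non-OA comparison can be inflated by a confound, whereas the regression coefficient for OA estimates its contribution with age, impact factor, and number of authors held constant.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def mean(values):
    return sum(values) / len(values)

# Hypothetical "true" effects on log citation count (invented, not estimates).
TRUE = {"intercept": 0.5, "age": 0.20, "impact": 0.15, "authors": 0.05, "oa": 0.30}
names = ["intercept", "age", "impact", "authors", "oa"]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows, y = [], []
for _ in range(5000):
    age = rng.randint(1, 12)           # years since publication
    impact = rng.uniform(0.5, 10.0)    # journal impact factor
    authors = rng.randint(1, 10)       # number of co-authors
    # Assumed confound: OA is more likely in higher-impact journals.
    oa = 1 if rng.random() < 0.1 + 0.04 * impact else 0
    predictors = [1.0, age, impact, authors, oa]
    signal = sum(TRUE[n] * v for n, v in zip(names, predictors))
    rows.append(predictors)
    y.append(signal + rng.gauss(0, 0.3))  # add random noise

fit = dict(zip(names, ols(rows, y)))

# Raw comparison that ignores the other three factors.
naive_gap = (mean([yi for r, yi in zip(rows, y) if r[4] == 1])
             - mean([yi for r, yi in zip(rows, y) if r[4] == 0]))

print("raw OA vs. non-OA gap:     %.3f" % naive_gap)
print("OA regression coefficient: %.3f" % fit["oa"])
```

In this toy setup the raw gap overstates the OA effect because it absorbs part of the impact-factor effect; the regression coefficient does not. A real analysis would also report standard errors and significance tests, which this sketch omits.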
I might add that the PLoS Editorial is quite right to say: "Since most open-access journals are new, comparisons of the effects of open access with established subscription-based journals are easily confounded by age and reputation": Comparability and confounding are indeed major problems for between-journal comparisons, comparing OA and non-OA journals (gold OA). Until Eysenbach's within-journal PNAS study, "solid evidence" (for gold OA) was indeed hard to find. But comparability and confounding are far less of a problem for the within-journal analyses of self-archiving (green OA), and with them, solid evidence abounds.
I might further add that the solid pre-existing evidence for the green OA advantage -- free of the limitations of between-journal comparisons -- is and always has been, by the same token, evidence for the gold OA advantage too, for it would be rather foolish and arbitrary to argue that free accessibility is only advantageous to self-archived articles, and not to articles published in OA journals!
Yet that is precisely the kind of generalization Eysenbach seems to want to make (in the opposite direction) in the special case of PNAS -- a very selective, high-profile, high-impact journal. PNAS articles that are freely accessible on the PNAS website were found to have a greater OA advantage than PNAS articles freely accessible only on the author's website. With just a little reflection, however, it is obvious that the most likely reason for this effect is the high profile of PNAS and its website: That effect is hence highly unlikely to scale to all, most, or even many journals; nor is it likely to scale in time, for as green OA grows, the green OA harvesters like OAIster (or even just Google Scholar) will become the natural way and place to search, not the journal's website.
Having taken up Eysenbach's challenge to test the independence of the OA self-archiving advantage from "potential confounds," we now challenge Eysenbach to test the generality of the PNAS gold/green advantage across the full quality hierarchy of journals, to show it is not merely a high-end effect.
Let me close by mentioning one variable that Eysenbach did not (and could not) control for, namely, author self-selection bias (Quality Bias, QB): His 212 OA authors were asked to rate the relative urgency, importance, and quality of their articles and there was no difference between their OA and non-OA articles in these self-ratings. But (although I myself am quite ready to agree that there was little or no Quality Bias involved in determining which PNAS authors chose which PNAS articles to make OA gold), unfortunately these self-ratings are not likely to be enough to convince the sceptics who interpret the OA advantage as a Quality Bias (a self-selective tendency to provide OA to higher quality articles) rather than a Quality Advantage (QA) that increases the citations of higher quality articles. Not even the prior evidence of a correlation between earlier downloads and later citations is enough. The positive result of a more objective test of Quality Bias (QB) vs. Quality Advantage (QA) (comparing self-selected vs. mandated self-archiving, and likewise conducted by Chawki Hajjem) will be reported shortly.
Brody, T., Harnad, S. and Carr, L. (2005) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 57(8) pp. 1060-1072.
Eysenbach G (2006) Citation Advantage of Open Access Articles. PLoS Biology 4(5) e157 DOI: 10.1371/journal.pbio.0040157
Hajjem, C., Harnad, S. & Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.
Harnad, S. (2006) PLoS, Pipe-Dreams and Peccadillos. PLoS Biology Responses.
Harnad, S. (2007) The Open Access Citation Advantage: Quality Advantage Or Quality Bias? (coming, stay tuned)
MacCallum CJ & Parthasarathy H (2006) Open Access Increases Citation Rate. PLoS Biol 4(5): e176 DOI: 10.1371/journal.pbio.0040176
Moed, H. F. (2006) The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section
Stevan Harnad & Chawki Hajjem
American Scientist Open Access Forum