Monday, October 9, 2006
SUMMARY: This Report on UK Scholarly Journals was commissioned by RIN,
RCUK and DTI, and conducted by EPS, but its questions, answers and
interpretations are clearly far more concerned with the interests of
the publishing lobby than with those of the research community.
The Report's two relevant overall findings are correct and stated very
fairly in their summary form:
[1] "Overall, [self-archiving] of articles in open
access repositories seems to be associated with both a larger number
of citations, and earlier citations for the items deposited....The
reasons for this [association] have not been clearly established -
there are many factors that influence citation rates... Consistent
longitudinal data over a period of years... would fill this gap."
[2] "There is no
evidence as yet to demonstrate any relationship (or lack of
relationship) between subscription cancellations and repositories...
Proving or disproving a [causal] link between availability in
self-archived repositories and cancellations will be difficult without
long and rigorous research."
The obvious empirical and practical conclusion to draw from the
findings -- that (1) all the self-archiving evidence to date is
positive for research and that (2) none of the self-archiving evidence
to date is negative for publishing -- would have been that the research
community should now apply and extend these findings -- by applying and
extending self-archiving (through self-archiving
mandates) to all UK research output, along with consistent,
rigorous longitudinal studies over a period of years, to test (1)
whether the positive effect on citations continues to be present (and
why) and (2) whether the negative effect on subscriptions continues to
be absent.
But instead, the two overall findings are hedged with volumes of
special pleading, based mostly on wishful thinking, to the effect that
(1') the observed relationship between self-archiving and citations may
not be causal, and that (2') there may exist an as-yet-unobserved
causal relationship between self-archiving and cancellations after all.
Even that would be all right, if this Report's conclusions were coupled
with a clear endorsement of the proposed self-archiving mandates, so
that the competing hypotheses can be put to a rigorous long-term test.
But the only test the commissioners of this Report seem to be
interested in conducting is "Open Option" publishing, i.e., authors
paying publishers to make their article OA for them, instead of
self-archiving it for themselves. This would certainly be a nice way to
hold author self-archiving and institution/funder self-archiving
mandates at bay for a few years more, while at the same time protecting
publishers from undemonstrated risk of revenue loss. But it would also
leave global unmandated self-archiving to continue to languish at the
current spontaneous 15% rate that the self-archiving mandates had been
meant to drive up to 100%. And it would leave research unprotected from
its demonstrated risk of impact loss. The option of having to pay to
provide OA is certainly not likely to enhance the unmandated rate of
uptake by authors (though I'm sure publishers would have no quarrel
with funder mandates to provide OA coupled with the funds to pay
publishers' asking price for paid OA, as provided by the Wellcome
Trust).
The long-term test will nevertheless be conducted, because four out of
eight UK Research
Councils have already mandated self-archiving. Their citation rates
and their cancellation rates can then be compared with those for the
four that have not mandated self-archiving (and whose authors hence do
it spontaneously by "self-selection"). Alas this will be mostly
comparing apples and oranges (e.g. MRC vs AHRC),
and it will needlessly be depriving the oranges of several more years
of potential growth enhancement. My guess is that all the other
councils -- except possibly the paradoxical EPSRC (which
evidently thinks, with the publishing lobby, that there's still some
sort of pertinent pretesting to be done for a few more years here) --
will come to their senses long before that, unpersuaded by Reports like
this one.
UK scholarly journals: 2006 baseline report
An evidence-based analysis of data concerning scholarly journal
publishing.
Prepared on behalf of the Research Information Network, Research
Councils UK and the UK Department of Trade and Industry.
By Electronic Publishing Services Ltd
In association with Professor Charles Oppenheim and LISU
at Loughborough University Department of Information Science
This is a rather long and repetitious report, but it does contain a few
nuggets. It is obviously biassed, but biassed in a restrained way,
meaning it does not really try to conceal its biases, nor does it
overstate biassed conclusions. It also (reluctantly, but in most cases
candidly) acknowledges its own weaknesses.
(The Report was commissioned by RIN, RCUK and DTI, but it is glaringly
obvious that the questions, answers and interpretations have been
slanted toward the interests of the publishing
lobby rather than those of the research community -- possibly
because the research community has no lobby in this matter, apart from
the OA movement itself! Nevertheless, there has been considerable
circumspection, at least in the summary and conclusion passages, with
weak points and gaps usually pointed out explicitly rather than denied
or concealed, and with the overall preoccupation with publishing
interests rather than research interests very open too.)
Some quotes and comments:
Whilst some evidence does suggest that
[self-archiving in] repositories [is] an important new factor in the
journal cancellation decision process, and one which is growing in
significance, there is no research reporting actual or even intended
journal subscription cancellation as a consequence of the growth of OA
self-archived repositories.
So far, this sounds fair and reasonable. (In fact, this is the gist of
the Report! The rest is mostly special pleading.)
Subscriptions are reported to have been
declining over a period of 10+ years, but for a number of reasons.
Proving or disproving a link between availability in self-archived
repositories and cancellations will be difficult without long and
rigorous research. In this connection, the outcome of research recently
announced by the Research Councils UK (RCUK) with the co-operation of
Macmillan, Blackwell and Elsevier, will be eagerly awaited, even though
a report is not due until late 2008.
With evidence
of self-archiving's benefits to research mounting, and zero evidence yet of
any negative effect at all on publisher revenue, publishers
nevertheless seem quite willing to wait (and keep research waiting
too), trying to fend off self-archiving and its potential benefits to
research for a long time to come yet, in order to keep trying to find
some evidence of negative causal effects on publisher revenue (or,
failing that, to deny positive causal effects on research impact).
Note that whereas a link between OA self-archiving and subscription
decline has not yet been "proved or disproved" (not for want of
looking!) -- and it is for that reason that we are hearing these calls
for "long and rigorous research" -- the vast preponderance of the
evidence we do have has already "proved" a "link" between OA
self-archiving and citation counts (a link that is almost certainly
causal, despite the wishful thinking of some who have a vested interest
in its all turning out to be merely a-causal self-selection and
superstition on the part of authors).
The question that the research community accordingly needs to ask
itself is whether self-archiving's evidence-based benefits to research
should be held in abeyance still longer, and meanwhile interpreted by
default as a-causal, in order to buy still more time to try to
"prove/disprove" hypothetical subscription declines for which there is
no evidence whatsoever to date, even in fields where self-archiving has
been near 100% for years.
(Researchers should also go on to ask themselves whether the research
benefits should be held in abeyance even if they are causally
linked to a subscription decline: Is research impact to be sacrificed
in the service of publisher revenue? Are we conducting and funding
research in order to generate -- or to safeguard -- publisher revenue?)
There is no evidence as yet to demonstrate any
relationship (or lack of relationship) between subscription
cancellations and repositories. Work in this field would need
sufficient, representative and balanced samples, and the collaboration
of all stakeholders, including especially research institutions and
publishers. Any such study will need to be maintained over a fairly
extended period, with regular reports, since it seems likely that the
position could change with time if the contents of self-archiving
repositories become progressively more comprehensive.
This would be fine, if proposed as an extended research project to be
conducted after self-archiving mandates are in place, to
analyze their long-term effects on subscriptions.
But this would be an exceedingly self-serving suggestion on the part of
the publishing community (and a methodologically empty one) if meant as
a "pilot" study that must somehow be conducted before adopting
self-archiving mandates. (And it would be exceedingly self-defeating of
the research community to even consider accepting such a pre-emptive
suggestion as a precondition, before adopting self-archiving mandates.)
There is some consistency in results that show
more citations for articles self-archived in repositories as distinct
from the same or similar articles available [only via journal]
subscription (although there have also been a few contradictory
results). Overall, deposit of articles in open access repositories
seems to be associated with both a larger number of citations, and
earlier citations for the items deposited.
This is a fair summary -- except that immediately after stating it, this
"association" is about to be deconstructed (much as the "association"
between cigarette-smoking and lung cancer was deconstructed for years
and years by the tobacco industry, claiming that only correlation had
been demonstrated, and not causation). Read on:
The reasons for this [association] have not
been clearly established - there are many factors that influence
citation rates, including the reputation of the author, the
subject-matter of the article, the self-citation rate, and, of course,
how important or influential the repository is in its own right. The
little existing evidence suggests that a possible [sic] reason for
increased citation counts is not that the materials were free, or that
they appeared more rapidly, but that authors put their best work into
OA format. This research was limited to one discipline, however
[astronomy], and more extensive evidence is required to validate this
finding.
The finding of this (important) study by Kurtz et al. in astronomy,
however, is not what the vast majority of the evidence (no longer
little!) shows. Moreover, as noted, this
a-causal interpretation -- only one of the possible interpretations of
the astronomy evidence -- also happens to be the interpretation that
the publishing community prefers for all the self-archiving
evidence, in all fields. The alternative interpretation is that the
relationship is causal: that the OA advantage is not merely an
arbitrary whim on the part of the better authors to make their work OA,
to no causal effect at all (why on earth would they be doing it at all
then?): They do it because making their work more accessible increases
its accessibility, uptake, downloads, usage, applications, citations,
impact -- exactly as the correlational evidence shows, without
exception, in field after field.
(NB: The only methodologically unexceptionable way to demonstrate
causation here, by the way, is to select a large enough random sample
of articles, divide them in half randomly, mandate half of them to be
self-archived and half not, and then compare their respective citation
counts after a few years. No one is likely to do quite that
study -- any more than it was likely that a large random sample of
people would be divided in half randomly, with half mandated to smoke
and half not! But we
are in the process of doing an approximation to that causal study, by
comparing the citation counts of articles in the IRs of the (few)
institutions that have already mandated
self-archiving with the average for other articles in the same
journals/years in which those articles appeared, but that have not been
self-archived; we will also compare the size of the OA advantage for
mandated and comparable
non-mandated self-archiving. [We do not believe for a moment that
these data are necessary to demonstrate causation, as causation is a
virtual certainty anyway, but we are ready to play the game, in order
to try to cut short the absurd delay in doing the obvious: mandating
self-archiving universally.])
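To make the logic of that comparison concrete, here is a minimal Python sketch. It is not the code of the study described above: the function names, the three-group layout and the toy numbers are all assumptions made purely for illustration. It computes the OA advantage (mean OA citations divided by the mean for non-OA articles) separately for mandated and for self-selected deposits, both against the same non-OA baseline.

```python
from statistics import mean

def oa_advantage(oa_citations, non_oa_citations):
    """Ratio of mean citations for an OA group to a non-OA baseline."""
    return mean(oa_citations) / mean(non_oa_citations)

def compare_mandated_with_self_selected(mandated, self_selected, non_oa):
    """Compare the OA advantage of mandated and self-selected deposits.

    Each argument is a list of citation counts. The sketch assumes all
    three groups come from the same journals and years, and that the
    non-OA baseline has a non-zero mean.
    """
    mandated_adv = oa_advantage(mandated, non_oa)
    self_selected_adv = oa_advantage(self_selected, non_oa)
    return {
        "mandated": mandated_adv,
        "self_selected": self_selected_adv,
        # Difference between the two advantages; a small gap alongside a
        # mandated advantage above 1 would point away from self-selection
        # as the main component.
        "gap": self_selected_adv - mandated_adv,
    }

# Made-up citation counts, for illustration only.
result = compare_mandated_with_self_selected(
    mandated=[6, 8, 10],
    self_selected=[9, 9, 12],
    non_oa=[4, 4, 4],
)
print(result)
```

If the mandated ratio stays well above 1, the advantage cannot be put down to authors choosing which papers to deposit, since under a mandate they do not choose.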
Although quite a lot of evidence has been
collected regarding the quantitative effect of OA on citation counts
(whether in the form of OA journals or as self-archived articles), much
of it is scattered, uses inconsistent methods and covers different
subject areas.
Yet, despite this scatter, methodological inconsistency and diversity,
virtually all of it keeps showing exactly the same consistent pattern:
A citation (and download) advantage for the OA articles. (No amount of
special pleading can make that stubborn pattern go away!)
Consistent longitudinal data over a period of
years to measure IF trends in a representative range of journals would
fill this gap
There is no gap! There is a growing body of studies, across all fields
and all journals, that keeps showing exactly the same thing: the OA
advantage (in article citations and article downloads: this is not
about journal impact factors, especially because comparing different
journals is comparing apples and oranges).
(There seems to be a confusion here between the existence of the
correlation itself, between self-archiving and citation counts --
this is found consistently, over and over -- and the question of the
causal relation, which will not be answered by longitudinal data (we
have longitudinal data already!) but by comparing mandated and
unmandated self-archiving: if they both show the OA advantage, then the
effect is causal and self-selection bias is a minor component.)
e.g., studying a range of journals that were
toll-access and went OA (or vice versa). In the short-term, more data
in different disciplines measuring the impact on citation counts of
articles in hybrid journals or articles that are available in both
forms versus articles that are only available in one of the forms will
improve the evidence base.
No, the question about the reality and causality of the OA advantage
will not be settled by OA journal vs. non-OA journal comparisons; that
can always be dismissed as comparing apples with oranges, and, failing
that, can always be attributed to self-selection bias (i.e., choosing
to publish one's better work in an OA journal)!
And if we wait for the uptake of hybrid Open
Choice -- i.e., paying the journal to self-archive the published
PDF for you -- these "longitudinal" studies are likely to take till
doomsday (and any positive outcome can still be dismissed as
self-selection bias in any case!).
What is needed is precisely the data already being gathered, on huge
samples, across all disciplines, comparing citation counts for
self-archived versus non-self-archived articles within the same journal
and year. The result has been a consistent, high OA Advantage (which
has elicited a lot of special pleading about causality).
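As a rough illustration of what such a within-journal/year comparison involves, here is a short Python sketch. It is not the code used in any of the studies cited here; the tuple layout, the function name and the toy data are assumptions for the example. It groups articles by journal and year and reports, for each cell, the ratio of mean OA citations to mean non-OA citations.

```python
from collections import defaultdict
from statistics import mean

def oa_advantage_by_journal_year(articles):
    """Return {(journal, year): mean OA citations / mean non-OA citations}.

    `articles` is an iterable of (journal, year, is_oa, citations) tuples.
    Cells lacking either group, or whose non-OA mean is zero, are skipped
    because the ratio is undefined there.
    """
    cells = defaultdict(lambda: {True: [], False: []})
    for journal, year, is_oa, citations in articles:
        cells[(journal, year)][bool(is_oa)].append(citations)

    ratios = {}
    for key, groups in cells.items():
        oa, non_oa = groups[True], groups[False]
        if not oa or not non_oa:
            continue
        baseline = mean(non_oa)
        if baseline == 0:
            continue
        ratios[key] = mean(oa) / baseline
    return ratios

# Made-up data, for illustration only.
sample = [
    ("Journal A", 2003, True, 12), ("Journal A", 2003, True, 8),
    ("Journal A", 2003, False, 5), ("Journal A", 2003, False, 5),
    ("Journal B", 2003, True, 3),  # no non-OA articles, so the cell is skipped
]
print(oa_advantage_by_journal_year(sample))
```

Comparing only within the same journal and year is what keeps this from being an apples-and-oranges comparison across journals.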
So we will look at the mandated subset of the self-archived papers, to
try to show that the OA advantage is not (only, or mostly) a
self-selection effect (Quality Bias [QB]).
(There is undoubtedly a non-zero self-selection [QB] component in the
OA advantage, but there are many other components as well, including a Quality
Advantage [QA], an Early Access Advantage [EA], a Competitive
Advantage [CA, which will, like QB, vanish once all articles are
OA], and a Usage (Download) Advantage [UA]. At 100% OA, there
will no longer be any QB or CA (or Arxiv Advantage [AA]),
but EA, QA and UA will still be going strong. EA and UA components have
already been confirmed by the Kurtz
study in astronomy. QA is implied by the repeated finding of a positive correlation
between citation count and the proportion of those articles with that
citation count that are OA. The mandate study will try to show that
this correlation is causal, i.e., QA, not QB.)
Harnad, S. (2005) OA Impact Advantage = EA
+ (AA) + (QB) + QA + (CA) + UA.
The whole area of the relationship between citation
counts and scholarly communication channels is confused because of
problems associated with quality bias [QB] (e.g., if scholars tend to
self-archive only their best work, as suggested by Kurtz et al. [in
astronomy]; alternatively, it may be that only the best journals are
OA). In other words, differences in citation counts and IFs may simply
reflect the quality of the materials under study rather than having
anything to do with the channel by which the material is made available.
First, the issue is article citation counts, not journal Impact Factors
(IFs).
Second, this is all special pleading. The biggest OA effects are based
on comparing articles within the same journal/year. The size of the
effect is indeed correlated with the quality of the article, because no
amount of accessibility will generate citations for bad articles,
whereas good articles benefit the most from a level playing field, with
all affordability/accessibility barriers removed: that is the Quality
Advantage [QA]. The idea that the Quality Advantage is merely a Quality
(Self-Selection) Bias [QB], i.e., that the advantage is merely
correlational, not causal, is of course a logical possibility, but it
is also highly improbable (and would imply that
accessibility/affordability barriers count for nothing in usage and
citations, and that the better work is being made OA by its authors for
purely superstitious reasons, because doing so has no effect at all!).
Overall, we concur with Craig's introduction
that "the problems with measuring and quantifying an Open Access
advantage are significant. Articles cannot be OA and non-OA at the same
time."
They need not be. It is sufficient if we take a large enough sample of
articles that are OA and non-OA from the same journals and years.
Randomly imposing the self-archiving would be the only way to equate
them completely (and our ongoing study on mandated self-archiving will
approximate this).
(The analysis by Craig, commissioned by Blackwell Publishing, has not,
so far as I know, been published.)
"Further, the variation of citation counts
between articles can be extremely high, so making controlled
comparisons of OA vs. non-OA articles nigh on impossible" [Craig,
Blackwell Publishing]
(The way Analysis of Variance works is to compare variation between and
within putatively different populations, to determine the probability
that they are in reality the same population. The published comparisons
show that the OA/non-OA differences are highly significant, despite the
high variance.)
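For readers who want to see the between/within comparison spelled out, here is a minimal Python sketch of the one-way F statistic. It is a textbook calculation, not the analysis from any cited study; the function name and the toy citation counts are assumptions for illustration. Turning F into a probability additionally needs the F distribution (available, for example, through `scipy.stats.f_oneway`), which is left out to keep the sketch dependency-free.

```python
from statistics import mean

def one_way_f_statistic(*groups):
    """One-way ANOVA F: between-group mean square / within-group mean square.

    Each group is a list of numbers. Assumes at least two groups and a
    non-zero within-group variance (otherwise the ratio is undefined).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Made-up, deliberately high-variance citation counts, for illustration only.
oa = [0, 2, 5, 9, 40]
non_oa = [0, 0, 1, 3, 6]
print(one_way_f_statistic(oa, non_oa))
```

High variance within each group inflates the denominator, so it makes a significant difference harder to obtain, not impossible to test for; with large samples the comparison remains perfectly feasible.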
It would of course be absurd to try to compare citation counts for OA
and non-OA articles having the same citation counts. But we can compare
OA and non-OA article counts among articles having the same
citation counts, in the same journals -- and what we find is a strong
positive correlation between the citation count and the proportion of
articles that are OA (just as Lawrence
reported in 2001, but not only in computer science, but across all 12
disciplines studied so far, and with much bigger sample sizes):
Source 4.8: Hajjem, C., Harnad, S. and Gingras, Y.
(2005) Ten-Year
Cross-Disciplinary Comparison of the Growth of Open Access and How it
Increases Research Citation Impact. IEEE Data Engineering
Bulletin 28(4) pp. 39-47.
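The kind of calculation behind that correlation can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the code of the study just cited: the input layout, the function names and the toy data are hypothetical, and the binning of citation counts into ranges is omitted for brevity.

```python
from collections import defaultdict
from math import sqrt

def oa_proportion_by_citation_count(articles):
    """Map each citation count c to the fraction of articles with c
    citations that are OA. `articles` is an iterable of (citations, is_oa)."""
    totals = defaultdict(int)
    oa_totals = defaultdict(int)
    for citations, is_oa in articles:
        totals[citations] += 1
        if is_oa:
            oa_totals[citations] += 1
    return {c: oa_totals[c] / totals[c] for c in sorted(totals)}

def pearson_r(xs, ys):
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data, for illustration only: (citation count, is the article OA?)
articles = (
    [(0, True)] * 1 + [(0, False)] * 3
    + [(1, True)] * 2 + [(1, False)] * 2
    + [(2, True)] * 3 + [(2, False)] * 1
)
proportions = oa_proportion_by_citation_count(articles)
print(proportions)
print(pearson_r(list(proportions.keys()), list(proportions.values())))
```

A positive coefficient here means that the more-cited an article is, the more likely it is to be OA; whether that reflects QA or QB is exactly the question of interpretation at issue.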
Note that the appendix to the Report under discussion here states, in
connection with the above study, which it cites:
"Harnad is THE advocate of OA and, thus, whilst
expert in the field, is inevitably biased."
There is a bit of irony in the fact that in connection with another of
the studies it cites:
Source 4.9: Harnad, S, Brody, T, Oppenheim, C et
al, Comparing
the impact of open access versus non open access articles in the same
journals, D-Lib Magazine, 10,(6), 2004,
the appendix of the Report goes on to say:
"Harnad is THE exponent of OA, but, thus,
potentially less objective."
Ironic (or, shall we say, conflicted, since this Report aspires to be a
neutral one as between the interests of the research community and the
publisher community), because the sole named collaborator on the Report
is also a co-author of the above-cited study!
Let us agree that we all have views on the underlying issues, but that
reliable data speak for themselves, qua data, and our data (and those
of others) keep showing the same consistent OA Advantage. The
disagreement is only on the interpretation: whether or not the
consistent correlations are causal. And here, allegiances are tugging
on both sides: Those favouring causality tend to come from the research
community, those favouring a-causality tend to come from the publishing
community. (Let us hope that the data from mandated self-archiving will
soon settle the matter objectively.)
"[since] any Open Access advantage appears to
be partly [sic] dependent on self-selection, the more articles that are
{self-}archived... you'd expect to see any Open Access advantage
reduce." [Craig, Blackwell Publishing]
Note that Craig carefully says "partly" -- and that we agree that
self-selection is one of the many potential contributors to the OA
advantage.
We also agree, of course, that once 100% OA is reached, the OA citation
advantage -- in the form of an advantage of OA over concurrent non-OA
articles -- will be reduced: indeed it will vanish! With all articles
OA, there can no longer be either a Competitive Advantage [CA] or a
Self-Selection Advantage (Quality Bias, QB) of OA over (non-existent)
non-OA.
But the Quality Advantage [QA] will remain. (Higher quality articles
will be used and cited more than they would have been if they had not
been OA: this is not a competitive advantage but an absolute one.) And
the Early Advantage [EA] as well as the Usage (Download) Advantage [UA]
will remain too (as already shown by Kurtz's findings in Astronomy).
"Authors self-archiving in the expectant belief
that each and every paper they archive will receive an Open Access
advantage of several hundred percent are going to be sorely
disappointed." [Craig, Blackwell Publishing]
This too is correct, but who on earth thought that OA would guarantee
that all work would be used, whether or not it was any good? OA levels
the playing field so merit can rise to the top, unconstrained by
accessibility or affordability handicaps. But bad remains bad, and
let's hope that researchers will continue to avoid trying to build on
weak or invalid findings, whether or not they are OA.
The OA advantage is an average effect, not an automatic bonus
for each and every OA article; moreover, the OA advantage is highly
correlated with quality: The higher the quality, the higher the
advantage. It is this effect that is open to the a-causal
interpretation that the Quality Advantage [QA] is merely a Quality Bias
[QB] (Self-Selection). But, equally (and, in my view, far more
plausibly) it is open to the causal interpretation that OA causes wider
usage and citation precisely because it removes all
accessibility/affordability constraints that are currently limiting
uptake and usage. That does not mean everything will be used
more, regardless of quality ("usefulness"): But it will allow users
(who are quite capable of exercising self-selection too!) to access and
use the better work, selectively.
In addition, since the distribution of citations is not gaussian -- a small
percentage of articles receives most of the citations and more than
half of articles receive no citations at all -- it is almost axiomatic
that the OA advantage will be strongest in the high-quality range.
Finally, it is worth noting that all
researchers in the field are agreed that if the vast majority of
scholarly publications become available in OA form, no citation
advantage to OA will be measurable.
It is a tautology that with 100% OA, the OA/NOA ratio is undefined! But
EA will still be directly measurable, and it will be possible to infer
UA and QA indirectly (UA by comparing downloads for articles of the
same age, before and after OA for the same articles, and QA by doing
the same with citations; the Kurtz study used such methods in
Astronomy). But by that time (100% OA), not many people will still have
any interest in the a-causal hypothesis.
Thus, what OA advantage there is will prove to
be temporary if OA does become the standard mode of publication.
This, however, is simply incorrect. At 100% OA, the Competitive
Advantage (CA) will be gone; the Self-Selection Advantage (Quality
Bias, QB) will be gone; the method of comparing citation counts for OA
and non-OA articles within the same journal and year will be gone. So
much is true by definition.
But (as Kurtz has shown in Astronomy), the Early Advantage and the
Usage Advantage will still be there. And the Quality Advantage will
still be there too; and that was what this was all about: Not just a
horse-race for who can make his articles OA first, so as to reap the
competitive advantage before 100% OA is reached (though that's not a
bad idea!); not a guarantee that, no matter how bad your work, you can
increase your citations by making it OA; but a guarantor that with
access-barriers removed, quality will have the best chance to have its
full potential impact, to the benefit of research productivity and
progress itself, as well as the authors, institutions and funders of
the high quality work.
(There is a bit of a [lurid] analogy here with saying that if only we
can get everyone to smoke, it will be clear that smoking has no
differential effects on human health! Perhaps the converse is a better
way to look at it: if only we could get everyone to stop smoking,
smoking will no longer have a differential effect on human health!)
(PS: OA is not a "mode of publication": OA publication is a
mode of publication. OA itself is a mode of access-provision, which can
be done in two ways, via OA publication or via OA self-archiving of
non-OA publications.)
Self archived articles
It is this area that has been most studied, with numerous key
publications. Most of these are focussed on the citation advantage of
self-archived articles rather than of OA journals. Craig, in an as yet
unpublished review, provides an excellent overview of the evidence
collected to date. Lawrence (Source 4.13) is significant because it was
the first major paper that identified a citation advantage for OA
self-archived articles, and it has been widely cited ever since.
However, it was based on a too small-scale a study to support general
conclusions. Harnad et al. (Source 4.9) provides a useful summary of
the state of play of OA advantage studies, while Hajjem et al. (Source
4.8) is fairly typical of the many articles produced by Harnad
claiming that self-archiving leads to higher citation counts.
Let us be clear: The many OA vs. non-OA studies, ours and everyone
else's, across more than a dozen different disciplines, many of them
based on large-scale samples, all show the very same consistent
pattern of positive correlation between OA and citation counts.
Those are data, and they are not under dispute. The only
"claim" under dispute is that that consistent correlation is causal...
Antelman
(Source 4.1)
is arguably the most carefully constructed study of the question.
Articles in four disciplines were evaluated, and in each case it was
found that open access articles had greater citation counts than
non-open access articles.
One wonders why this particular small-scale study (of about 2000
articles in 4 fields) was singled out, but in any event, it shows exactly
the same pattern as all the other studies (some of them based on
hundreds of thousands of articles instead of just a few thousand, in
three times as many fields).
Eysenbach
challenges the notion that OA "green" articles (i.e.,
those in repositories) are more effective than OA "gold" (i.e., those
published in OA journals, such as those produced by Public Library of
Science) in obtaining high citation counts. It is this part of his
paper that produced a furious response from Harnad, much of it focused
on particular details.
The issue was not about OA green (self-archived) articles producing
higher citation counts than OA gold (OA-journal)! No one had claimed
one form of OA was more effective than the other in generating the OA
Advantage before the Eysenbach study: It was Eysenbach who claimed to
have shown gold was more effective than green -- indeed that green was
only marginally effective at all!
And I think anyone reading the exchanges will see that all the fury is
on the Eysenbach side. All I do is point out (rather patiently) where
Eysenbach is overstating or misstating his case:
Harnad, S. (2006) PLoS, Pipe-Dreams and Peccadillos. PLoS Biology
eletters (May 16, 2006) [1] [2] [3] [4]
Eysenbach's study does find the OA advantage, as many others before it
did. It certainly doesn't show that the gold OA advantage is bigger
than the green OA advantage, in general. It simply shows that for the
1500-article sample in the one journal tested, Proceedings of the National Academy of
Sciences (PNAS), a very high impact journal, both paid OA (gold)
and green OA (free) increased citation counts over non-OA, but gold
increased them more than green. That result is undisputed. Its
extrapolation to other journals is not.
The likely explanation of the PNAS result is very simple: PNAS is not a
randomly chosen, representative journal: it is a very high-impact, very
high visibility, interdisciplinary journal, one of very few like it
(along with Nature and Science). Articles that pay for
OA are immediately accessible at PNAS's own high-visibility website --
a website that probably has higher visibility than any single
institution's IR today. So
PNAS articles made freely accessible at PNAS's website get a bigger OA
advantage than PNAS articles made freely accessible by being
self-archived in the author's own IR.
The reason it definitely does not follow from this that gold OA is
bigger than green OA is very simple: Most journals are not PNAS, and do
not have the visibility or average impact of PNAS articles! Hence
Eysenbach's valid finding for one very high-impact journal simply does
not generalize to all, most, or even many journals. Hence it is not a
gold/green effect at all, but merely a very high-end special case.
Apart from the spurious gold/green advantage, Eysenbach did confirm,
yet again, (1) the OA advantage itself, and confirmed it (2) within a
very short time range. These are both very welcome results (but not
warranting to be touted, as they were, by both the author and by the
accompanying PLoS
editorial, as either the first "solid evidence" of the OA advantage --
they certainly were not that -- or a demonstration that gold OA
generates more citations than green OA: the very same method has to be
tried on middle and low-ranking journals too, before drawing that
conclusion!). (Nor are the PLoS/PNAS results any more exempt from the
methodological possibility of self-selection bias [QB] than any of the
many prior demonstrations of the OA advantage, as authors self-choose
to pay PNAS for gold OA as surely as they self-choose to self-archive
for green OA!)
The fury on Eysenbach's part came from my pointing out that his and
PLoS's claim to primacy for demonstrating the OA advantage (and their
claim of having demonstrated a general gold-over-green advantage) was
unfounded (and might have been due to both PLoS's and Eysenbach's zeal
to promote publication in gold journals: Eysenbach is the editor of one
too, but not a high-end one like PNAS or PLoS): Eysenbach's was just
the latest in a long (and welcome) series of confirmations of the OA
advantage (beginning with Lawrence 2001), the prior ones having been
based on far larger samples of articles, journals and fields (and there
was no demonstration at all of a general gold over green advantage:
just the one non-representative, hence non-generalisable special case
of PNAS).
Both authors believe that OA produces a
citation advantage, but Eysenbach has presented evidence that casts
doubt on Harnad's notion that the "green" route is the preferred route
to getting that increased impact.
Green may not be the preferred route to OA for editors of gold
journals, but it is certainly the preferred route for the vast majority
of authors, who either have no suitable gold journal to publish in, or
lack the funds (or the desire) to pay the journal to do what they can
do for free for themselves. The only case in which paid gold OA may
bring even more citations than free green OA (even though both increase
citations) is in the very highest quality journals, such as PNAS, today
-- but that high-end reasoning certainly does not generalise to most
journals, by definition. (And it will vanish completely when OA
self-archiving is mandated, and the harvested IR contents become the
locus classicus to access the literature for those whose institutions
are not subscribed to the journal in which a particular article
appeared -- whether or not it is a high-end journal.)
(There is also a conflation of the (less interesting) question of (1)
whether green or gold generates a greater OA citation advantage
[answer, for high-end journals like PNAS, gold does, but in general
there is no difference] with the (far more important) question of (2)
whether green or gold can generate more OA [answer: green can
generate far more OA, far more quickly and easily, not just because it
does not cost the author/institution anything, but because it can be
mandated without needing either to find the extra funds to pay for it
or to constrain the author's choice of which journal to publish in].)
However, despite the intuitive attractiveness
of the hypothesis that OA will lead to increased citations because of
easier availability, the one systematic study of the reasons for the
increased citations - by Kurtz (Source 4.12) - showed that in the field
of astronomy at least, the primary reason was not that the materials
were free, or that they appeared more rapidly, but that authors put
their best work into OA format, and this was the reason for increased
citation counts.
Astronomy is an interesting but anomalous field: It differs from most
other fields in that:
(1) Astronomy consists of a small, closed circle of journals.
(2) Virtually all research-active astronomers (so I am told by the
author) have institutional access to all those journals.
(3) For a number of years now, that full institutional access has been
online access.
(4) So astronomy is effectively a 100% OA field.
(5) Hence the only room left for a directly measurable OA advantage in
astronomy is (5a) to self-archive the paper earlier (at the preprint
stage) [EA] or (5b) to self-archive it in Arxiv (which has evolved
into a common central port of call, so it generates more downloads and
citations -- mostly at the preprint stage, in astronomy).
(6) What Kurtz found was that, under these conditions, higher quality
(higher citation-count) papers were more likely to be self-archived.
(7) This might be a quality self-selection effect (QB) (or it might
not), but it is clearly occurring under very special conditions, in a
100% OA field.
(8) Kurtz did make another, surprising finding, which has bearing on
the question of how much of a citation advantage remains once a field
has reached 100% OA.
(9) By counting citations for comparable articles before and after the
transition to 100% OA, Kurtz found that the citations per article had
actually gone down (slightly) rather than up, with 100% OA.
(10) But a little reflection suggests a likely explanation: This slight
drop is probably a shift in balance with a level playing field:
(11) With 100% OA (i.e., equal access to everything), authors don't
cite more articles, they cite more selectively, able now to
focus on the best, most relevant work, and not just on the work their
institutions can afford to access.
(12) Higher quality articles get more citations, but lower quality
articles, of which there are far more (some perhaps previously cited by
default, because of accessibility constraints), are cited less.
(13) On balance, total citations are slightly down, on this level
playing field, in this special, small, closed-circle field (astronomy),
once it reaches 100% OA.
(14) It remains to be seen whether total and average citations go up or
down when other fields reach 100% OA.
(15) What Kurtz does report even in astronomy is that although total
citations are slightly down, downloads are doubled.
(16) Downloads are correlated with later citations, but perhaps at 100%
OA this is either no longer true, or true only for higher quality
articles.
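The arithmetic behind points (11)-(13) can be sketched with a small worked example. All the numbers below are hypothetical, chosen only to show how the average can fall slightly while the best articles gain; they are not Kurtz's data, and the two-tier split into "high quality" and "lower quality" articles is an illustrative assumption.

```python
# Hypothetical illustration of points (11)-(13): NOT Kurtz's data.
# Each tier is (number_of_articles, citations_per_article).

def mean_citations(tiers):
    """Average citations per article across all tiers."""
    total_articles = sum(n for n, _ in tiers)
    total_citations = sum(n * c for n, c in tiers)
    return total_citations / total_articles

# Assumed situation before 100% OA: some lower quality articles are
# cited by default, because they are the ones readers can access.
before = [(100, 20.0),   # high quality tier
          (900, 4.0)]    # lower quality tier (far more articles)

# Assumed situation at 100% OA: citing becomes more selective, so the
# high quality tier gains and the lower quality tier loses.
after = [(100, 24.0),
         (900, 3.5)]

mean_before = mean_citations(before)
mean_after = mean_citations(after)

print("mean before:", mean_before)
print("mean after: ", mean_after)
```

With these assumed figures the high quality tier gains citations, yet the overall average per article drops a little, because the small loss per article in the much larger lower quality tier outweighs the gain at the top.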
Similarly, more carefully conceived work on the
impact of both OA journals and self-archiving on the quality of
research communications, especially on the peer review system, will be
required.
OA journals are peer-reviewed journals: What sort of impact are they
feared to have on peer review?
And why on earth would the self-archiving of peer-reviewed, published
postprints have any impact on the peer review system? The peers review
for free. (Could this be just a veiled repetition of the question about
the impact of self-archiving on journal revenues, yet again?)
Recently, the results of a study undertaken by
Ware for ALPSP, which were published in March 2006 (Source 1.16, in
Area 1), have provided at least some initial data on the question of
the possible linkage between the availability of self-archived articles
in an OA repository and journal subscription cancellations by
libraries...: availability of articles in repositories was cited as
either a "very important" or an "important" possible factor in journal
cancellation by 54 per cent of respondents, even though ranking fourth
after (i) decline of faculty need, (ii) reduced usage, and (iii) price.
When respondents were invited to think forward five years, availability
in a repository was still fourth-ranking factor, but the relevant
percentage had risen to 81. Whilst this is not evidence of actual or
even intended cancellation as a consequence of the growth of OA
self-archiving repositories, it strongly suggests that such
repositories are an important new factor in the decision process, and
growing in significance.
Summary: No evidence of cancellations, but speculations by librarians
to the effect that their currently fourth-ranking factor in
cancellations might possibly become more important in the next five
years...
Sounds like sound grounds for fighting self-archiving mandates and
trying to deny research the benefit of maximized impact for yet another
five years -- if one's primary concern is the possible impact of
mandated self-archiving on publishers' revenue streams. But if one's
primary concern is with the probable impact of mandated self-archiving
on research impact, this sort of far-fetched reasoning has surely
earned the right to be ignored by the research community as the
self-serving interference in research policy that it surely is.
Stevan Harnad
American Scientist Open Access Forum