Central versus institutional self-archiving

Thursday, September 21, 2006

SUMMARY: NIH's, PLoS's, the Wellcome Trust's and now the UK MRC's unreflective support for PubMed Central (PMC), a Central Repository (CR), as the locus for direct self-archiving by authors is very unfortunate for Institutional Repositories (IRs), for self-archiving, and for Open Access (OA) progress in general. Alma Swan has published key papers on both OA self-archiving policy and institutional versus central self-archiving (IRs vs. CRs) analysing the reasons.
      (a) Institutional self-archiving and central self-archiving are at odds in the quest for a universal self-archiving policy solution that will cover all OA research output.
      (b) It would be awkward and inefficient to have a different external cross-institution CR as the locus of primary deposit for every funding area, subject area, combination of subject areas, or nation.
      (c) Researchers' own IRs are the most natural and efficient way to scale up to covering all of OA space from all disciplines, institutions and nations.
      (d) Direct central self-archiving is already obsolete in the OAI era of interoperable OAI-compliant IRs.
      (e) The optimal solution is for researchers to self-archive their own papers in their own OAI-compliant IRs and for CRs to be harvested from those distributed IRs.
      (f) Universities are in the best position to mandate self-archiving and monitor and reward compliance.
      (g) Mandating self-archiving in CRs instead simply creates an unsystematic and incoherent policy that does not scale up to covering all research output from all research institutions.
      (h) What the NIH, Wellcome Trust and MRC should be mandating is not direct depositing in PMC, but universal depositing in the fundee's own IR, from which PMC can then harvest collections.

Let me try to explain why unreflective support for PubMed Central (PMC, and UK PMC) as the locus for direct self-archiving by authors is very unfortunate for Institutional Repositories (IRs), for self-archiving, and for Open Access (OA) progress in general. The reason is very simple, and I very much hope that it will be given some thought by the many who are currently unquestioningly promoting central self-archiving. (Please note that this has nothing to do with the existence and enormous value of PMC itself: only with whether or not PMC (or any other Central Repository) should be the place where authors self-archive their papers, and the place where institutions and funders mandate that authors should self-archive their papers -- instead of self-archiving them in their own IRs.)

(1) PMC and UK PMC are grounded in two things: (i) the pre-OAI and pre-IR central-archiving model originating from the early and very successful Physics Arxiv and (ii) Harold Varmus's -- and hence NIH's, PLoS's, the Wellcome Trust's and now the UK MRC's -- fixation on the central (indeed the PMC) model of OA self-archiving. That self-archiving model is already obsolete in the OAI era of distributed, interoperable OAI-compliant IRs.

(2) Although they appear to be complementary -- after all, OAI renders all OAI-compliant archives, whether central or institutional, interoperable, and hence equivalent -- in reality, at this critical point in the evolution of OA self-archiving policy-making, (a) institutional self-archiving and (b) central self-archiving are profoundly at odds with one another in the quest for a systematic, universal self-archiving policy solution that will systematically scale up to cover all research output, from all institutions, in all disciplines, worldwide.

(3) In the OAI-interoperable age, the natural and optimal solution is for researchers to self-archive their own papers in their own OAI-compliant Institutional Repositories (IRs) and for whatever central archives one may wish to have -- whether subject-based or funder-based or national -- to be harvested, via the OAI Protocol for Metadata Harvesting (OAI-PMH), from the distributed local IRs, rather than deposited (or re-deposited) directly. That is what the OAI metadata-harvesting protocol was created for!
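To make the harvesting step concrete, here is a minimal sketch in Python of what a central archive does when it harvests an IR over OAI-PMH: it builds a ListRecords request and reads the Dublin Core metadata out of the response. The verb, the oai_dc metadata prefix, the optional set parameter and the XML namespaces come from the OAI-PMH specification; the repository address ir.example.edu, the set name "biomed", the sample record and the function names are hypothetical, invented for illustration. The sketch parses a canned response rather than calling a live server, and it ignores resumption tokens, which a real harvester would need to follow for large result lists.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and by the Dublin Core element set.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build the ListRecords request a harvester would send to an IR."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Selective harvesting: ask only for one set (e.g. a subject).
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract the OAI identifier, title and full-text links of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        oai_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        links = [
            el.text
            for el in rec.iterfind(".//dc:identifier", NS)
            if el.text and el.text.startswith("http")
        ]
        records.append({"oai_id": oai_id, "title": title, "links": links})
    return records


# Hypothetical response from a hypothetical IR, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:ir.example.edu:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:identifier>https://ir.example.edu/1234/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://ir.example.edu/oai", set_spec="biomed"))
    print(parse_records(SAMPLE_RESPONSE))
```

The author deposits once, locally; everything above happens on the harvester's side, with no further action by the author.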

(4) So although on the surface it looks as if there is room for complementarity, pluralism, and parallelism between Central Repositories (CRs) and Institutional Repositories (IRs), the question of what their optimal interrelationship should be is far more complicated insofar as formulating a systematic, effective OA self-archiving policy is concerned, and ensuring that the policy will scale up to cover all of OA space. There is a profound and important strategic conflict specifically related to institutional and research-funder self-archiving policy (mandates).

(5) Dr. Alma Swan has published key papers on both the subject of OA self-archiving policy and the subject of institutional versus central self-archiving (IRs vs. CRs).

(6) The gist of the strategic and practical conflict between IRs and CRs, as well as the basis for resolving it, is the following:

(7) Universities (and other research institutions) are the primary research providers. It is their researchers who conduct and publish the research. It is they and their researchers who are in a position to provide OA. It is they and their researchers who co-benefit from providing OA by self-archiving their own research output. The natural place for them to self-archive their own research output is in their own respective (OAI-compliant) IRs. This covers all the output of all their disciplines (some research institutions have just one research speciality, whereas others, including all universities, cover most or all research specialties).

(8) Universities (and other research institutions) are real entities, with their own institutional identity, and it is their own institutional visibility and productivity and research impact (along with the impact and progress of research in general) that they are motivated and indeed necessitated to promote and foster. CRs, in contrast, do not correspond to institutional entities with needs of their own. (The partial exception is when a CR is research funder-based, where the funder is an entity with interests. I will return to this.)

(9) Universities (and other research institutions) are also the ones that are in the strongest position to mandate the self-archiving of their own research output, as well as to monitor and to reward compliance with their self-archiving policy. (Again, the only exception is a research funder, or a national government.)

(10) Universities (and other research institutions) are helped in their efforts to mandate OA self-archiving by OA self-archiving mandates from the funders of their research, but (a) not all their research is funded, and (b) it would be extremely awkward and inefficient if, for a single institution's authors, there were a different external cross-institution CR that needed to be deposited in for every funder and every subject and every other possible combination of subjects (and nations!).

(11) Instead, the natural and efficient way to gather content into CRs -- whether funder CRs or subject-based CRs or multidisciplinary CRs or national CRs -- is to selectively harvest their contents from the individual, distributed IRs of the researchers' own institutions.
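The selective-harvesting arrangement can be sketched as follows, under loudly stated assumptions: the repositories, papers and URLs below are made up, and the Record type and harvest function are hypothetical names, not part of any real repository software. Each paper is deposited exactly once, in its author's own IR; a subject CR and a funder CR are then just different selections over the same distributed deposits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Record:
    """Metadata for one paper, deposited once in its author's own IR."""
    title: str
    subjects: FrozenSet[str]
    funder: Optional[str]  # None for unfunded research
    url: str               # where the full text lives (the IR)


def harvest(irs: Dict[str, List[Record]],
            wanted: Callable[[Record], bool]) -> List[Record]:
    """Collect, from every IR, the records a given CR is interested in."""
    return [rec for records in irs.values() for rec in records if wanted(rec)]


# Hypothetical IRs and deposits, for illustration only.
IRS = {
    "ir.alpha.example": [
        Record("Paper A", frozenset({"biomedicine"}), "Wellcome Trust",
               "https://ir.alpha.example/1"),
        Record("Paper B", frozenset({"physics"}), None,
               "https://ir.alpha.example/2"),
    ],
    "ir.beta.example": [
        Record("Paper C", frozenset({"biomedicine", "chemistry"}), "NIH",
               "https://ir.beta.example/1"),
    ],
}

# Two CRs defined purely as selections over the same IR deposits.
biomed_cr = harvest(IRS, lambda r: "biomedicine" in r.subjects)
nih_cr = harvest(IRS, lambda r: r.funder == "NIH")

if __name__ == "__main__":
    print([r.title for r in biomed_cr])
    print([r.title for r in nih_cr])
```

Note that the unfunded physics paper is covered by its IR even though no funder CR would ever have asked for it, and that adding a further CR means writing one more selection rule, not asking any author for one more deposit.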

(12) IRs are also the most natural and efficient and systematic and universal way to scale up to cover all of OA space -- originating from all disciplines, at all institutions, in all nations.

(13) A few generic OAI-compliant CRs are fine for provisionally or even permanently depositing research by researchers whose institutions do not yet have an IR (or by researchers who do not even have an institution!); but apart from that, direct depositing in CRs is extremely counterproductive at a time when self-archiving has not yet been established as a systematic research imperative.

(14) The optimal thing for both research institutions and funders to do now is to mandate self-archiving in the researcher's own IR (except where a default generic CR is needed because the researcher's institution does not yet have an IR).

(15) Compliance can be monitored and rewarded, primarily by the researcher's own institution, but also through the grant-fulfilment conditions of the funder.

(16) This will systematically scale up to cover all disciplines, at all institutions, globally.

(17) If central self-archiving (e.g., in PMC) is mandated instead, that simply creates an unsystematic and incoherent policy that does not translate into a general means of covering all research output of all research institutions.

(18) The NIH, Wellcome Trust and MRC self-archiving policies (though they make important contributions to OA) are hence complicating and retarding progress toward a universal, systematic solution for making all institutions' research output OA because of their insistence on direct deposit in PMC.

(19) What the NIH, Wellcome Trust and MRC should be mandating is not arbitrary direct depositing in PMC, but universal depositing in the fundee's own IR, from which PMC (and any other CRs) can then harvest collections, if they wish.

(20) In this way, institutional and funder self-archiving mandates can be synergistic instead of antagonistic. (Antagonistic mandates confuse researchers about where to self-archive, arouse resentment about the need to do multiple deposits, fail to generalize and scale up to a systematic, universal self-archiving policy and solution for all institutions, disciplines, funders and nations, and in general retard instead of accelerate progress in the formulation of effective and compatible self-archiving policies globally.)

(21) The last point is that not only is primary depositing in CRs a very bad idea, but in the OAI age CRs need not "house" the full-texts at all: they really only need to be "virtual archives" in much the way that Google or OAIster is: They harvest the metadata and links, allow focussed search, and then point back to the IRs for accessing the full-text itself. The notion of having to have one central "place" in which to put all papers is obsolete in the OAI age. (I am not referring to redundancy and preservation issues, for which some duplication is useful and indeed necessary; I am referring to the fallacious notion that we need CRs in order to have the target content for searching and accessing "all in one place." We do not; and we should not. Yet I am almost certain that this is the main reason so many people think they need a CR!)
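A "virtual archive" in this sense can be very small. The sketch below is a toy, assuming a hypothetical VirtualArchive class and made-up titles and URLs: it stores only harvested titles and links, supports a simple keyword search, and answers every query with links back to the IRs that hold the full texts. It holds no full text itself.

```python
class VirtualArchive:
    """A central search service that holds metadata and links, not papers."""

    def __init__(self):
        # Each entry is (title, URL of the full text in the author's IR).
        self._entries = []

    def add(self, title, ir_url):
        """Register one harvested record."""
        self._entries.append((title, ir_url))

    def search(self, term):
        """Return the IR links of all records whose title contains the term."""
        needle = term.lower()
        return [url for title, url in self._entries if needle in title.lower()]


# Hypothetical harvested records, for illustration only.
archive = VirtualArchive()
archive.add("Gene expression in yeast", "https://ir.alpha.example/1")
archive.add("Quantum gravity notes", "https://ir.alpha.example/2")
archive.add("Yeast metabolism revisited", "https://ir.beta.example/1")

if __name__ == "__main__":
    # The user searches "all in one place" but reads the papers at the IRs.
    print(archive.search("yeast"))
```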

Many well-meaning advocates of OA do not yet understand much of this, imagining that CRs like PMC will in some mysterious way manage to cover all of OA space. I hope the summary above will help to redirect the welcome and important contributions of the supporters of the NIH-PLoS-Wellcome-MRC OA initiatives in a direction that is more helpful for scaling up to cover the world's research output as a whole.

Pertinent Prior American Scientist Open Access Forum Topic Threads:
"Central vs. Distributed Archives" (began Jun 1999)
"PubMed and self-archiving" (began Aug 2003)
"Central versus institutional self-archiving" (began Nov 2003)
"Harold Varmus: 'Self-Archiving is Not Open Access'" (began June 2006)
Stevan Harnad
American Scientist Open Access Forum