APPENDIX 75
Memorandum from Dr Robert Cannon, Dr Nigel
Goddard, and Dr Fred Howell, Axiope
Summary: Open Access to scientific publications
will save public money, will facilitate scientific communication
and will improve access to scientific research by the general
public. But opening access to written publications is only the
first step in using information technology to improve scientific
productivity: the really exciting possibility is to open up access
to the actual research data underlying publications, since this
would allow significantly more value to be extracted from expensively
gathered data sets. Opening access to the electronic research
data on which publications are based is technically feasible and
would yield dramatic productivity benefits to the scientific community
and the nation. The prime example where public access to data
has shown its effectiveness is the genomic databases, which
have spawned entire new disciplines (eg, bioinformatics) and transformed
all other biomedical fields. In order to extend this success in
new fields, such as systems biology, functional genomics and systems
neuroscience, it is essential to widen access to actual primary
data (and not just the publications summarising the findings).
Open access to data also offers benefits to commercial publishers,
including a path forward for the traditional publishers that embraces
the Open Access model. A policy of open access to data is widely
supported by funding bodies but requires changes in the ways science
is funded and evaluated before it can be achieved.
1. Drs Cannon, Goddard and Howell are researchers
in Neuroinformatics, the combined field of Informatics and Neuroscience,
at the University of Edinburgh. They are also co-founders of a
software company, Axiope Limited, that is developing software to
facilitate scientific communication and data sharing in accordance
with themes expressed here. We became concerned about access to
scientific data because we were aware of many exciting potential
research projects that were nevertheless difficult or even impossible
to undertake because of the inability to access the results of
publicly funded research.
2. Access to computers and to the Internet
is dramatically changing the way people work. Many actions that
used to be difficult, time consuming or expensive are now almost
free. For example, it takes a few seconds and costs almost nothing
to view previous written evidence to this committee. However,
the existence of technology that can improve productivity does
not necessarily imply that it will be used. This is particularly
true within a scientific context where individual researchers
are encouraged to be independent. Developments that are generally
beneficial to the community may not actually take place in practice
because they are not locally beneficial to individual researchers
or organisations.
3. Science is in an excellent position to
benefit from technological developments in network infrastructure
and software but has been slow in making good use of them. Thanks
to substantial investment in JANET (the Joint Academic NETwork)
UK science has benefited from almost universal high speed internet
(well in excess of domestic broadband) for the last 10 years.
The reasons for the delay in realising the benefits of Internet
technology are partly technical, and partly cultural. Technical
reasons include availability of suitable software and the training
of researchers in IT skills. The cultural reasons concern the
motivation and reward systems in place in British and global science.
Many organisations are working to address different aspects of
the technical problems. For the full benefits of this technology
to be realised, government action is required to adjust the conditions
in which scientists operate in order to favour behaviours that
are ultimately of benefit all round. Some public-spirited researchers
are using the Web in an ad-hoc fashion to publish their data,
and some journals allow "supplementary data" to be uploaded
onto their sites, but there is no systematic approach to, or requirement
for, making scientific data public.
4. As scientists ourselves who depend on
the work of other researchers, we are keen to demonstrate that,
whatever mechanism is in place for publishing scientific papers,
the new technologies have enormous potential for improving scientific
productivity. The need for many scientific studies goes beyond
gaining access to the text and graphs that another researcher
puts in a paper. There is a need to gain access to the work itself:
to the primary data gathered during experiments from which the
publication is derived. Normally, very little provision is made
for the archiving, cataloguing or storage of this data. The result
is that even the researchers who gathered it are unable to locate
or reuse it just a few years after the publication of a paper.
Researchers themselves are not happy with this situation, but
under the existing system where publishing selected results in
high profile journals is paramount, it is very hard to justify
the time or resources required to do anything else.
5. Nevertheless, with external prompting,
some disciplines are moving in the direction of regarding data
publication as standard and the benefits are clear. For example,
the availability of genomic datasets (not just papers about them)
in public databases has spawned a huge growth in integrative and
cross-disciplinary research. Increasingly, particularly in biology
and medicine, new research depends on data from many different
experimental techniques, often from different laboratories. For
example, the benefits of having the raw data available for anyone
to analyse have prompted the gene expression community to encourage
major journals (including Nature, Science, The Lancet) to require
all papers based on gene expression micro-array experiments to
also make the raw data from their experiments freely available
online. This pioneering effort has led to the exhaustive recommendations
for presenting and publishing results from micro-array experiments: the
"MIAME" requirements. The consequence is that there
is now the potential for extracting more value from experiments
that have already been done. Furthermore, there appear to be
no losers in the game: those who collect the initial data see
it being used in ways they had never dreamed of. The other users
are able to do research that would have been impossible without
publication of the data.
6. The principle that research data gathered
at public expense should be made available for maximal use by
other scientists has already been adopted by several public funding
agencies. The National Institutes of Health in the United States
has adopted a requirement[295], effective from 1 October 2003
for grants in excess of $500,000 per annum,
that data be made publicly available. The NIH statement
makes it clear that the obligation will be extended as methods
and technologies become available to achieve this. The UK MRC
has a policy on data sharing[296]
that begins with the statement:
"The MRC expects investigators supported
by MRC funding to make their research data available in a timely
and responsible manner to the scientific community for subsequent
research with as few restrictions as possible."
More recently, on behalf of a consortium comprising
BBSRC, MRC, Wellcome Trust, JISC, DTI and NERC, the MRC has issued
an invitation to tender for analyses of the data sharing landscape
in the life sciences[297].
The same sentiments have been endorsed by the OECD[298]
with a statement by the Dutch Minister of Education, Culture and
Science including the following:
"It is obvious that Open Access will be
a necessary condition to realize the potential of research data
as the floating capital of global science. Governments of OECD
countries spend about $650 billion annually on research, expanded
use of data sources could impressively increase the taxpayers'
value of this expenditure."
We note that the term "Open Access"
is used here to refer to primary research data, not research publications.
7. Attitudes towards data sharing in the
scientific community cover the whole range from those who regard
it as scandalous that data is not routinely accessible, to those
who regard the data they collect as their own personal property
and resent any suggestion that it should be made available to
other researchers. What is clear, however, is that preparing material
for archiving and sharing incurs a cost in both time and materials.
While the eventual benefits are great, the main beneficiaries
are other scientists, not those who are responsible for doing
the work. Therefore, to achieve the transition into a data-sharing
culture, which is of greater benefit to everyone, individual scientists
need immediate reasons to prepare and publish their data. One
such inducement is the requirement by some journals that the data
upon which a paper is based must be in a public archive before
the paper can be published. Another possibility is that funding
councils impose conditions on grants that researchers must publish
the resulting data (as the NIH is beginning to do). A third possibility
is to introduce a system that adequately rewards researchers for
making data publicly available, perhaps with conditional grant
extensions of funds to cover data publication costs.
8. Traditional scientific publishers have
become content providers who face the challenge of the Open Access
model, as put forward by, for example, BioMed Central and the
Public Library of Science. We see significant opportunity for
the traditional publishers to refocus their business on value
added to the content provided, at no cost, by scientists. It is
already the case that journals which may charge for access to
publications do not charge for access to the data underlying those
publications, in the cases where it is available. Typically these
data are held in publicly funded databases (eg, GenBank), and it
is clear that scientists will not agree to deposit their primary
data exclusively in a commercial database. It is vital that the
ownership of the primary research data remains in public hands
and publicly accessible. Traditional publishers can refocus their
services on adding value through curation, cross-referencing and
providing the best possible search and discovery facilities. There
is a parallel with the Open Source movement in software development.
Traditional software companies (including the largest players
such as IBM) have found that they can increase their value by
promoting Open Source and providing additional services. Scientific
publishers need to make a similar transition.
9. Recommendations. The Government has a
crucial role to play in catalysing the transition to a scientific
culture in which research data is routinely stored in publicly
accessible archives. It can do this by:
(a) providing funds and assistance for pioneer
research groups to establish complete electronic archives of their
results, and encouraging the administrators of these archives
to co-operate with commercial entities that provide added-value
services.
(b) acknowledging the importance of making
data publicly available by rewarding data publication in terms
of the Research Assessment Exercise and other evaluations.
(c) in due course, mandating that all publicly
funded research data should be publicly accessible within a specified
time of acquisition (subject to patient confidentiality conditions,
etc).
February 2004
295 NIH Data Sharing Policy: http://grants2.nih.gov/grants/policy/data_sharing/
296 MRC policy: http://www.mrc.ac.uk/index/strategy-strategy/strategy-science_strategy/strategy-strategy_formulation/strategy-other_initiatives/strategy-data_sharing/strategy-data_sharing_policy-link
297 Joint councils' initiative: http://www.mrc.ac.uk/index/strategy/strategy-science_strategy/strategy-strategic_implementation/strategy-data_sharing-link
298 OECD on "Access to Research Data from Public Funding": http://www.oecd.org/document/41/0,2340,en_2649_201185_26391529_1_1_1_1,00.html