The Open Journal Project1, Multimedia
Research Group, Department of Electronics and Computer Science, University
of Southampton, Southampton SO17 1BJ, United Kingdom
* Electronic Publishing Research Group, Department of Computer Science, University of Nottingham, University Park, Nottingham NG7 2RD, United Kingdom
Contact for correspondence: email@example.com
Of these new features, links are one of the most important. Since 1995 the Open Journal project has been applying original software tools and techniques to support flexible linking in e-journal applications, based on selected journals which were available electronically but which were not all exclusively electronic. In doing so the project foretold the impact that links will have. Links are not a superficial feature of the Web, nor are they simple add-on features for e-journals. Links have the power to alter the character of journals fundamentally, most obviously in the development of 'distributed publishing' in which users can find items of interest irrespective of the publisher (Dixon 1998). Ultimately, distributed publishing may transform the way in which individual documents are compiled by sharing components or 'objects', figures say, from different sources, and by using network-based software processes, or services, to enhance presentation.
Links are important for a number of reasons:
The project's impact has also been marked by its reporting of e-journal
developments more generally (Hitchcock et
al. 1996, 1997a). Informed by these
findings and the reported experiences of other users and publishers, elsewhere
we assess the future for e-journals more broadly than for the Open Journal
approach alone, asking how we can make the most of e-journals, again with
the user perspective principally in mind. 2
Resources of a wide variety of types, from primary journals to databases, were generously provided by a group of twelve publisher partners (see the Appendix) most of which supported the project throughout its full period and some of which continue to work with the project developers.
Initially the functionality provided by the link service extended linking capabilities to documents in any format (Carr et al. 1995), although for journals in the project it became clear that this could be relaxed to the two main formats above. There was discussion about including capabilities for pages in TeX, which is possible, and this might be important for subject areas with more mathematical content, but this was beyond the scope of the project. The linking tools will be compliant with the linking component of XML, an important new format for Web documents that emerged during the latter months of the project (Carr et al. 1998a). Full compliance will be formalised following publication of the XLink and XPointer standards.
Recognising that it will never be possible to serve the information
needs of users from single resources or single Web sites, a vital feature
of this approach was that links could be applied to documents wherever
they are on the Web - the distributed journal scenario again.
|Open Journal||Citation linking||Keyword linking||PDF linking||Release status|
|Cognitive Science||Yes|| || ||Open release; closed end May 1998|
|Biology||Yes||Yes|| ||Released to selected evaluators|
|Computer Science||Yes||Yes||Yes||Internal project release|
The project was fortunate to be able to work with data provided by the Institute for Scientific Information (ISI) from its citation indexes. These indexes not only include abstracts from papers but the references too, and this is the basis of its distinctive services, now translated to the company's Web of Science (Hitchcock et al. 1998a). Working with a much smaller, but still substantial (500 MB), data set than Web of Science, the project demonstrated forward and backward linking within the secondary data but went further, extending the linking capability to remote but accessible full-text journals. For those journals it was shown that references could be linked - where the data set allows; in this example the data set was not big enough for comprehensive linking - to the secondary data or, potentially, to other full-text journals independently of the established authoring and publication process.
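The relationship between backward and forward linking can be illustrated with a small sketch. Backward links are the references a paper makes; forward links (which later papers cite a given work) can be derived by inverting them. The record identifiers and the function name below are hypothetical, and this is not the project's or ISI's implementation.

```python
from collections import defaultdict

# Hypothetical secondary data: paper id -> ids of the works it cites
# (the backward links recorded with each abstract).
REFERENCES = {
    "paper-A": ["paper-B", "paper-C"],
    "paper-B": ["paper-C"],
    "paper-C": [],
}


def build_forward_links(references):
    """Invert backward links to obtain forward ('cited by') links."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    # Sort for a stable, predictable ordering of citing papers.
    return {paper: sorted(citers) for paper, citers in cited_by.items()}


forward = build_forward_links(REFERENCES)
```

A paper that is never cited, such as "paper-A" here, simply has no entry in the forward index.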
An important and successful component of citation linking was a software-based citation agent developed for use in the project. This agent recognises reference data within a downloaded paper, matches the citations against a pre-indexed database of abstracts and links the references to entries in the database where matches are found. All of these actions are performed in 'real time' as the requested paper is downloaded. Conceived as an autonomous processor, the agent was partitioned as a 'library' of functions to allow it to be integrated with other programs. In the project this approach was used to enable the agent to be used with the link service and PDF software. In continuing post-project work with publishers, described below, the citation agent is the common feature of the planned applications.
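The recognise, match and link steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's agent: the index contents, the example.org URLs, the matching key (first author surname, year, first page), the reference pattern and the function names are all invented for the example.

```python
import html
import re

# Hypothetical pre-indexed abstracts database:
# (first author surname, year, first page) -> URL of the abstract record.
ABSTRACT_INDEX = {
    ("garfield", "1955", "108"): "http://example.org/abstracts/garfield-1955",
}

# Rough pattern for references of the form
# "Surname, I. (Year) Title. Journal, Vol. N, first-last".
REFERENCE_PATTERN = re.compile(
    r"^(?P<surname>[A-Z][A-Za-z'-]+),.*?\((?P<year>\d{4})[a-z]?\)"
    r".*?(?P<page>\d+)-\d+\s*$"
)


def parse_reference(line):
    """Recognise reference data; return a lookup key, or None if not recognised."""
    match = REFERENCE_PATTERN.match(line.strip())
    if match is None:
        return None
    return (match.group("surname").lower(), match.group("year"), match.group("page"))


def link_references(reference_lines, index=ABSTRACT_INDEX):
    """Wrap each matched reference in a link; leave unmatched ones as text."""
    linked = []
    for line in reference_lines:
        key = parse_reference(line)
        url = index.get(key) if key is not None else None
        text = html.escape(line)
        linked.append('<a href="%s">%s</a>' % (url, text) if url else text)
    return linked
```

Keeping the parsing and matching in separate functions mirrors, loosely, the idea of a library of functions that other programs can call; a real agent would need far more tolerant reference parsing than this single pattern.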
Some other research projects are developing tools for citation linking. While the Open Journal project emphasised text recognition, matching and linking, other projects are concerned with software agents that can find cited works on the Web (Han et al. 1997), and improved search services and parsing of other document formats (e.g. Postscript) to build automatic and comprehensive citation indexes (Giles et al. 1998). Another approach to serving links separately from documents is Hyper-G, an electronic publishing package based on a Web server with an object-oriented distributed network database and a separate link database (Schmaranz 1996). Hyper-G has been used in journal projects with Springer-Verlag and Academic Press.
In one sense serving links in this way is a rough-and-ready but practical method for implementing relationships, represented explicitly as links, between documents from different sources. This is a flexible approach but because no control is exercised over the documents the links can be unstable and need to be rigorously maintained. At the other end of the spectrum, an ideal application-independent and stable way of identifying documents or their components might be the Digital Object Identifier (Davidson and Douglas 1998). Publishers with prototype applications represented in the gallery of the International DOI Foundation include Academic Press, Elsevier, Springer-Verlag and Wiley. The DOI is not yet an accepted standard, and to become so would require wide agreement. While links simply provide access to works, however, the DOI has a more demanding remit: "The intent of the (Association of American Publishers) AAP's Enabling Technologies Committee (which designed the DOI) was to support copyright protection, while ameliorating inconvenience to users, by supporting technology that promotes interoperability" (Rosenblatt 1997). It is not clear that these objectives can be easily reconciled without compromising user access. There are also concerns about possible limitations, such as restricting organizations that are permitted to assign DOIs to 'legitimate' publishers.
Between the DOI and link serving is Hellman's (1998) proposal for the Scholarly Link Specification Framework (SLinkS) which applies DOI-like identifiers to documents from different publishers but controlled via an intermediary service.
It is already clear from a number of publishing arrangements that the electronic scholarly literature will be dominated by cross-linking on citations between different journals and services. ISI Links has been announced as its means of mediating citation linking between Web of Science, collaborating publishers and subscribing institutions. Linking applications where links are applied between different journals and documents directly managed by a single publisher have been described for the BioMedNet service (in Hitchcock et al. 1998b), HighWire Press (Rubinstein 1997) and the Institute of Physics (Dixon 1998).
Some Web-based abstracts services enable third-party users to create links to entries in these sites. The best known is the National Library of Medicine's Medline service, the basis of widespread citation linking in biomedical fields. NCBI Citation Matcher allows users to find the Medline ID of any article in the database, given its bibliographic information, and to use that ID in a URL to retrieve the record. A related development, the PubMed project, additionally links back out from Medline entries to full-texts on the servers of cooperating publishers.
The Astrophysics Data System Abstract Service also helps with bibliographic
code querying to link directly to abstracts from outside the abstract service.
Given the ubiquity of keywords within quality journals, and the frequency with which terms might appear given their position in the classification hierarchy, it became a relatively simple task programmatically to develop and display large numbers of keyword links (Hitchcock et al. 1998b). This creates problems for users, in this case the classic problem of information overload through too many links. Links appear to be random and are not well labelled, in other words, users are unsure where a link will take them or whether following the link will be useful.
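To show why keyword links are easy to generate in bulk, and how simply their number might be limited, here is a minimal sketch. It assumes plain-text input and a small hypothetical linkbase with lower-case keys and invented example.org destinations; the function name, the per-term cap and the title label are illustrative choices, not the project's link service.

```python
import re

# Hypothetical linkbase: lower-case keyword -> destination URL.
LINKBASE = {
    "cytoskeleton": "http://example.org/dictionary/cytoskeleton",
    "actin": "http://example.org/dictionary/actin",
}


def add_keyword_links(text, linkbase=LINKBASE, max_links_per_term=1):
    """Link keyword occurrences in plain text, capping the links made per term."""
    if not linkbase:
        return text
    # Try longer keywords first so that multi-word phrases take precedence.
    terms = sorted(linkbase, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    counts = {}

    def replace(match):
        word = match.group(0)
        term = word.lower()
        counts[term] = counts.get(term, 0) + 1
        if counts[term] > max_links_per_term:
            return word  # over the cap: leave this occurrence unlinked
        # The title attribute labels the destination for the reader.
        return '<a href="%s" title="Dictionary entry: %s">%s</a>' % (
            linkbase[term], term, word)

    return pattern.sub(replace, text)
```

With the default cap of one link per term, a repeated keyword is linked only at its first occurrence. A real service would also have to avoid rewriting text inside existing markup, which this sketch ignores.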
Links to dictionaries or glossaries produced from keywords can be more intuitive and useful, but even here the response was equivocal. Dictionaries are often assumed to be low-level texts, useful for novices and students. Specialists do not want to see links to dictionaries in research texts, as was the case in the Open Journal of Biology. Ironically the dictionary linked in this case, the Dictionary of Cell Biology, is produced by and for specialists and, being available on the Web, is the most up-to-date resource of its kind in a fast-moving field in which new terms are being created constantly. For users it seems that the benefit of even a well-labelled link is determined by their knowledge of the source being linked.
It would be easy to dismiss keyword linking on this evidence, but the opportunity for new, informed perspectives implemented through this type of link is worth pursuing. These links will almost always appear within the body of a text, not at a point that can be conveniently extracted. Citation links alone, while vital and useful and the obvious next step for e-journals, will in the long term be insufficient as a way of identifying relationships between texts or of creating new perspectives. There will be some technical refinements to the linking framework, and work continues to develop tools which enable users to reduce the number of links displayed and to apply links more selectively (Carr et al. 1998b). This is more of a culture gap than a technology gap, however, which on the part of the author, or link author, requires a better understanding of text structures (Renear 1997) and of the relationships between texts; and on the user side, requires more experience of this type of linking and raised expectations through better implementations.
One way forward, instead of using ready keywords, is to look again at texts for the occurrence of what might be called link words, quite a different concept from keywords and which are created with a better understanding of the linking strategy. With new demands being placed on e-journals perhaps link words will become an editorial task as common as creating keywords today.
On a smaller scale it would be possible to use this technique effectively
within single journals, where the effect of keyword linking would be to
overlay the journal index as links on the electronic text archive.
It required a major effort, but the project produced a working service for linking from PDF (Probets et al. 1998). It remains to be seen whether the dominance of PDF prevails for e-journals, but since it is more cost-effective to work with given formats than to convert them, PDF linking could be an important tool. Although converting references to HTML is one alternative, it does not address the need for links within the body of papers (keyword links, perhaps, once the implementation is refined).
There is, however, the danger that the PDF tools will have to be updated every time there is a change to the linking framework, as happened during the project, or each time Adobe changes the specification for PDF or the way in which it supports the format. In principle, because Open Journal linking is a server-side process (Carr et al. 1998b), this approach is both independent of the platform used and of the version of a given software application on the user's machine. In practice, communication between Adobe client and server software has been version dependent - there are incompatibilities between Adobe Acrobat Reader versions 2 and 3, for example.
Adobe PDF may be a de facto standard as far as e-journals are
concerned, but it is still a proprietary format, and this is not ideal
in an environment such as the Web which promotes the use of open, public
standards that are intended to allow improved interoperability between
|"It's a great service!"||"It is a WONDERFUL idea. However..."|
"It would be a good idea to have an opportunity for marking citations."
Yes, they could have been better. Mostly this answer is informed by results, as a development project should be, but other elements might have been foreseen. The Open Journals would have been better if we had:
This could be discussed at length, but of wider importance, given the results reported here, is whether the concepts demonstrated in the project might be more broadly applicable and enduring. There are two reasons why they might be:
Links can be created programmatically in large numbers, but results
suggest that more precision and control over the presentation of links
is needed. To assist, link editing tools have to be further developed to
enable authors, editors and other content and information service developers,
as well as programmers, to manage links for different applications.
First examples of reformatting to support linking include conversion of reference sections to HTML, as demonstrated by the Institute of Physics' e-journals service. This service, which uses PDF to present papers, links citations from extracted HTML to a database of abstracts held by the publisher.
Alternatively, publishers may be tempted towards the Web-based successor to SGML, XML, if it delivers more cost-effective production, particularly if e-journals can generate their own independent income streams to support this development. Potentially XML, with its linking components, offers significantly more native capability for linking applications than does PDF, or even HTML, but it is not yet widely used.
There is belated recognition that e-journals must offer more than the
printed equivalent and citation linking will be the first example. There
are a number of possible effects. As more data is shared, how will it be
managed, by whom and where? As shared data sources become larger, will
static linking be adequate in a fast-changing, expanding data environment?
It is possible that anarchic users will stretch the limits of what is acceptable, but the motivations for change can be seen even among established publishers, who invariably have limited access to Web users. What if a publisher could extend its reach by placing links to its works directly into other services, library services for example? How could these links be maintained, updated and managed? Is it possible that respected publishers might want to do this, interact with other services?
One is. The Institute of Physics' Stacks service - 'the ultimate linking service' - generates tables of contents (TOCs) with embedded hyperlinks, and is aimed at librarians, other publishers, aggregators, abstracting and indexing services and producers of information gateways. In contrast to the project's link service which can manage link inclusion in Web pages independently of the data creator, Stacks delivers TOCs and link data via email or file transfer for implementation by the local service provider.
Is this a more practical approach than the project has applied, more likely to appeal to publisher needs, or is it simply more limited and less flexible? Whichever, an important principle has been demonstrated by two developments independently: data, not just computers, are becoming perpetually more distributed on the Web. No data provider can survive alone. Data will be shared and interactive, and not just at the user level. Again, this is recognised in the XML initiatives (Khare and Rifkin 1997). The sooner this is more widely recognised, the more likely that established cultures can begin to change and efforts can be directed towards building an online information environment in which new opportunities to serve users can flourish, rather than trying to constrain this environment by imposing other publishing models.
The legacy of the Open Journal project may eventually
be commercial applications built by publishers and supported by commercial
tools4 first tested in the project. Perhaps
a broader legacy will be to have contributed to developments leading towards
distributed data, by motivating the user benefits at a time when the prevailing
culture, especially among information providers, was difficult to reconcile
with the emerging needs.
2 This paper has been developed from a presentation given at a one-day seminar Making the most of e-journals in April 1998 at Loughborough University, UK, organised jointly by the UK Online User Group (UKOLUG) and the UK Serials Group. To get the complete story from that presentation this paper can be read in conjunction with a paper published elsewhere in which we draw a broader picture of the needs of e-journals, not just from the Open Journal perspective, and how we might make more of them. We discover that the capabilities which it seems are most widely desired remain limited by the prevailing framework for commercial journals publishing. Some non-prescriptive solutions to this problem are suggested. The slides from the Loughborough talk are also available.
3 Personal correspondence with Ann Okerson in January 1998. Based on NewJour data published and unpublished at that time Ann said: "If there is any way to put numbers on this ejournal movement, I would say that 5,000 is very conservative -- but that 5,000 would be 'real' journals and that number will be sky high a year from now."
4 A version of the link service software
is available from Multicosm Ltd,
although it does not currently support journal applications as developed
in the project. Negotiation continues with the company with a view to commercialising
the link service for publishers, possibly with the additional components
built by the project. This process will be informed by demand from publishers,
particularly those experimenting with their own applications.
Carr, L., De Roure, D., Hall, W. and Hill, G. (1998b) Implementing an Open Link Service for the World-Wide Web. World Wide Web, Vol. 1, No. 2, 61-71 http://www.staff.ecs.soton.ac.uk/~lac/imp.pdf
Carr, L., De Roure, D., Hall, W. and Hill, G. (1995) The Distributed Link Service: a Tool for Publishers, Authors and Readers. World Wide Web Journal (special issue, Proceedings of the Fourth International World Wide Web Conference) No. 1, Winter 1995/96 http://www.w3.org/pub/Conferences/WWW4/Papers/178/
Davidson, L. A. and Douglas, K.
(1998) Digital Object Identifiers and Their Role in the Implementation
of Electronic Publishing. Socioeconomic Dimensions of Electronic Publishing
Workshop, held in cooperation with the 1998 IEEE International Conference
on Advances in Digital Libraries, April 1998
or see the updated version in HTML:
Digital Object identifiers: Promise and Problems for Scholarly Publishing. Journal of Electronic Publishing, Vol. 4, issue 2, December http://www.press.umich.edu/jep/04-02/davidson.html
Dixon, A. (1998) The Wannabee Culture:
Why No-One Does What They Used to Do. Issues in Science and Technology
Garfield, E. (1955) Citation indexes for science: a new dimension in documentation through association of ideas. Science, Vol. 122, 15 July, 108-111
Giles, C. L., Bollacker, K. D. and Lawrence, S. (1998) CiteSeer: An Automatic Citation Indexing System. Proceedings of the third ACM International Conference on Digital Libraries, Pittsburgh, USA, June (ACM: New York)
Han, Y., Loke, S. W. and Sterling, L. (1997) Agents for Citation Finding on the World Wide Web. In PAAM 97: Proceedings of the Second International Conference on the Practical Applications of Intelligent Agents and Multi-Agent Technology (Practical Application Company: Blackpool, UK), pp. 303-317
Hellman, E. (1998) Scholarly Link Specification Framework (SLinkS), public draft #1.5, November 24 http://www.openly.com/SLinkS/
Hitchcock, S. (1996) Open Journals.
Ariadne, issue 5, September
Hitchcock, S., Carr, L. and Hall, W. (1997a) Web Journals Publishing: a UK Perspective. Serials, Vol. 10, No. 3, November, 285-299 http://journals.ecs.soton.ac.uk/uksg.htm
Hitchcock, S., Carr, L. and Hall, W. (1996) A Survey of STM Online Journals 1990-95: the Calm Before the Storm. In Directory of Electronic Journals, Newsletters and Academic Discussion Lists, sixth edition, edited by D. Mogge, (Washington, D.C.: Association of Research Libraries), pp. 7-32, http://journals.ecs.soton.ac.uk/survey/survey.html
Hitchcock, S., Carr, L., Harris, S., Hey, J. M. N. and Hall, W. (1997b) Citation Linking: Improving Access to Online Journals. In Proceedings of the Second ACM International Conference on Digital Libraries, Philadelphia, USA, July (ACM: New York), pp. 115-122 http://journals.ecs.soton.ac.uk/acmdl97.htm
Hitchcock, S., Kimberley, R., Carr, L., Harris, S. and Hall, W. (1998a) Webs of Research: Putting the User in Control. In Proceedings of IRISS'98: Internet Research and Information for Social Scientists, Bristol, UK, March http://sosig.ac.uk/iriss/papers/paper42.htm
Hitchcock, S., Quek, F., Carr, L., Hall, W., Witbrock, A. and Tarr, I. (1998b) Towards Universal Linking for Electronic Journals. Serials Review, Vol. 24, No. 1, Spring, 21-33 http://journals.ecs.soton.ac.uk/IFIP-SerRev98.html
Hunter, K. (1998) Adding Value by Adding Links. Journal of Electronic Publishing, Vol. 3, Issue 3, March http://www.press.umich.edu/jep/03-03/hunter.html
Open Journal Project (1995) An Open Journal Framework: Integrating Electronic Journals with Networked Information Resources. JISC/eLib sheet flyer http://journals.ecs.soton.ac.uk/flyer.html
Open Journal Project (1998) Open Journal
Project: Final Report to eLib, August
Khare, R. and Rifkin, A. (1997) Capturing the State of Distributed Systems with XML. World Wide Web Journal, Vol. 2, No. 4, Fall, 207-218 http://www.cs.caltech.edu/~adam/papers/xml/xml-for-archiving.html
Probets, S., Brailsford, D. F., Carr, L. and Hall, W. (1998) Dynamic Link Inclusion in Online PDF Journals. In Proceedings of EP'98, the seventh International Conference on Electronic Publishing, Document Manipulation and Typography, St Malo, France, April http://www.ep.cs.nott.ac.uk/~sgp/ep98.pdf
Renear, A. (1997) The Digital Library Research Agenda: What's Missing -- and How Humanities Textbase Projects Can Help. D-Lib Magazine, July/August http://www.dlib.org/dlib/july97/07renear.html
Rosenblatt, B. (1997) The Digital Object Identifier: Solving The Dilemma Of Copyright Protection Online. The Journal of Electronic Publishing, Vol. 3, Issue 2, December http://www.press.umich.edu/jep/03-02/doi.html
Rubinstein, E. (1997) Notice the Library
Sprouting on Your Desktop? HMS Beagle, issue 15, September
http://www.biomednet.com/hmsbeagle/15/webres/insitu.htm (registration required)
Rusbridge, C. (1998) Towards the Hybrid Library. D-Lib Magazine, July/August http://www.dlib.org/dlib/july98/rusbridge/07rusbridge.html
Schmaranz, K. (1996) Professional Electronic Publishing in Hyper-G: The Next Generation Publishing Solution on the Web. WebNet 96, San Francisco, CA http://aace.virginia.edu/aace/conf/webnet/html/130.htm
Spink, A., Wilson, T., Ellis, D. and Ford, N. (1998) Modeling Users' Successive Searches in Digital Environments. D-Lib Magazine, April 1998 http://www.dlib.org/dlib/april98/04spink.html
Tenopir, C. and Ennis, L. (1998)
The Digital Reference World of Academic Libraries. Online, Vol.
22, No. 4, July
Weintraub, J. (1998) The Development
and Use of a Genre Statement for Electronic Journals in the Sciences. Issues
in Science and Technology Librarianship, Winter
W3C, the World Wide Web Consortium (1998) Extensible Markup Language (XML) 1.0. REC-xml-19980210, W3C Recommendation 10-February-1998 http://www.w3.org/TR/1998/REC-xml-19980210