In [Levy & Marshall] the assumption of a digital library being the repository of a fixed and permanent document collection is challenged. They argue that if the technology can accommodate fluid, revisable and even open-ended continuously-authored documents then these documents should surely find a home in a digital library. Also, from an archivist's point of view, semi-permanent and even ephemeral documents should be made accessible. This kind of argument presents a more open and dynamic environment.
In this paper we compare and contrast two approaches to information management systems as exemplified by two specific systems that, together with the WWW, could be useful for implementing some aspects of a digital library. These two approaches can be loosely summed up as 'open' and 'closed'. We go on to describe the "Open Journal Framework", a UK ELib project whose aim is to enhance the functionality of libraries of electronic publications by exploiting an open style of approach.
One of the main characteristics of Hyper-G that makes it a candidate for Electronic Library services on the Web is its guarantee of consistency: its undertaking to keep strict track of all the documents and inter-document hypertext links that it handles.
Hyper-G has a superficially similar architecture to the Web: client browsers are served documents by network servers but, unlike the Web, the hypertext links (relationships between the documents) are stored independently of the documents. Hyper-G moves one step on from the Web by adding support for link maintenance and management, linking between different media types, different sets of links for different users, a docuverse, text retrieval and some visualisation tools for navigating around 'clusters' of related materials.
Each Hyper-G server maintains a document management system, which keeps the attributes of the documents on the server, a link database which maintains the links, and an information retrieval engine, which can retrieve on both the attributes of the document and also the full text content of the document. The servers themselves may be arranged into hierarchies underneath a world wide 'root' server, but the user connects directly to only one server. Hyper-G can also arrange to collect documents from other servers such as Web and Gopher servers.
The Hyper-G client browsers provide an interface for document and catalogue browsing, authoring and link creation, supporting a variety of standard text, picture, movie and 3D data formats.
Hypertext integrity, both within documents and between documents, is maintained by the authoring clients. Each document knows the ids of all the links it uses, so even though the links are stored externally, a client that loads a document is also able to load all the links that document requires. The client is then able to edit the document (or move it or delete it) without causing integrity problems, since at the client end all links are effectively embedded within the document.
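The bookkeeping behind such a consistency guarantee can be illustrated with a minimal Python sketch. It is not Hyper-G's actual interface: the `ClosedServer` class, its method names and its data layout are assumptions made purely for illustration. It shows links held in a store separate from document content, and a deletion that removes every link touching the deleted document so that none is left dangling.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Link:
    link_id: int
    source_doc: str
    dest_doc: str


@dataclass
class ClosedServer:
    """Hypothetical server that stores links apart from document content."""
    documents: Dict[str, str] = field(default_factory=dict)  # doc id -> content
    links: Dict[int, Link] = field(default_factory=dict)     # link id -> link

    def add_document(self, doc_id: str, content: str) -> None:
        self.documents[doc_id] = content

    def add_link(self, link_id: int, source_doc: str, dest_doc: str) -> None:
        # Refuse links whose end points the server does not manage.
        for doc_id in (source_doc, dest_doc):
            if doc_id not in self.documents:
                raise KeyError(f"unknown document: {doc_id}")
        self.links[link_id] = Link(link_id, source_doc, dest_doc)

    def links_for(self, doc_id: str) -> List[Link]:
        # The links a client would load alongside the document itself.
        return [l for l in self.links.values() if l.source_doc == doc_id]

    def delete_document(self, doc_id: str) -> List[int]:
        # Remove the document and every link that starts or ends at it,
        # so that no dangling link survives the deletion.
        del self.documents[doc_id]
        removed = [i for i, l in self.links.items()
                   if doc_id in (l.source_doc, l.dest_doc)]
        for link_id in removed:
            del self.links[link_id]
        return removed
```

The point of the sketch is that integrity is cheap only because every link end point is known to the server; a link to a document outside the server cannot even be registered.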
To accommodate this using a closed system would require all the literature to be imported into the system (converted into the required format and have hypertext links added), or to simply ignore the links to the 'outside'. The former approach is likely to be very expensive, whereas the latter approach denies the advantage of link management to the majority of the links.
In this section we briefly describe Microcosm (a research system developed at the University of Southampton and now a commercial product) as used for managing local document resources and its successor, the DLS, a system which is being used for managing distributed information resources.
Microcosm has a fundamental model of a group of co-operating processes communicating via message passing which together supply various facilities for an information environment. Its main features are:
* a selection-action paradigm for user interaction. Fixed link anchors (or buttons) are simply an author's predefined binding of a particular selection within a document to a particular hypertext action (such as follow link). In general, readers of a Microcosm hypertext can invoke a range of hypertext actions on arbitrary selections.
* links held externally to the documents they reference. This allows links to be made between the native documents of third-party applications, such as wordprocessors, spreadsheets, databases or CAD packages.
* a message passing framework, into which various document viewers or hypertext servers may be slotted.
* a document manager which associates document ids with document locations and a set of other attributes (such as title, author, keywords, description).
In order to see how the components of Microcosm function together, consider how a link is followed. The user makes a selection in an open document in a word processor, and then chooses the menu action "Follow Link". The application packages the selection, its position within the document and the document's identifier into a message which is sent through the system. A link database intercepts the message, looks up any links that correspond to that selection, and returns a message containing a specification of those links, along with the original link request message (possibly to be intercepted by further link databases). Eventually, all the link specification messages are intercepted by a dispatcher, which presents the user with a dialog box containing descriptions of each of the applicable links. The user selects a link and the dispatch box sends a "Dispatch Link" message to the appropriate viewer. The viewer intercepts the message, opens the appropriate document and highlights the destination selection.
In this model links are resolved on the basis of the content of the object that the user has selected on the screen. This can be a piece of text, part of an image, an object in a CAD diagram, or a map reference in a GIS. An action such as "follow link" or "compute link" is then attached to this selection and that information is passed through the system. This is significantly different from the Hyper-G model, where links are requested by id rather than discovered by a dynamic process of computation. Also, the system architecture allows both in-house and third-party information processing tools to be incorporated into the system. A particular exploitation of this flexibility within an Electronic Library context is discussed in [Davis & Hey 1995].
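The message flow described above can be approximated by the following Python sketch. It is a simplification under stated assumptions, not Microcosm's implementation: the names `FollowLink`, `LinkSpec`, `LinkDatabase` and `follow_link` are hypothetical, and the co-operating processes are reduced to a list of objects called in turn.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FollowLink:
    """The request message: what was selected, where, and in which document."""
    selection: str
    offset: int
    doc_id: str


@dataclass(frozen=True)
class LinkSpec:
    """A link specification returned by a link database."""
    description: str
    dest_doc: str
    dest_selection: str


class LinkDatabase:
    """Hypothetical link database keyed on the selected content."""

    def __init__(self, table: Dict[str, List[LinkSpec]]) -> None:
        # Keys are normalised so that matching ignores letter case.
        self.table = {key.lower(): specs for key, specs in table.items()}

    def handle(self, message: FollowLink) -> List[LinkSpec]:
        return list(self.table.get(message.selection.lower(), []))


def follow_link(message: FollowLink, chain: List[LinkDatabase]) -> List[LinkSpec]:
    # The original request is passed on to every link database in turn;
    # the collected specifications are what the dispatcher would list
    # in its dialog box.
    found: List[LinkSpec] = []
    for database in chain:
        found.extend(database.handle(message))
    return found
```

Because the request carries the selected content rather than a link id, a further link database can be added to the chain without touching the documents or the other databases.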
The provision of an independent link service is designed to allow any information environment to be augmented with hypermedia functionality, whether or not it provides link following facilities itself. The WWW, of course, has a well-established method for expressing links as attributes of its native document format, and so the link service will provide a complementary set of links on top of those standard facilities. By contrast, a simple text editor (such as Windows Notepad) has no built-in hypertext links, and so the link service provides an otherwise non-existent service to such users. Without a link service, Web users can follow links from HTML documents or 'imagemapped' pictures into dead-end media such as spreadsheets, CAD documents or text; with the link service they can also follow links out of these media again.
End-users (readers or browsers) may choose to subscribe to this service by running a small interface agent which communicates with both the link service and the document viewer. For an information consumer on the Web, the link service provides an additional means of navigation that can be tailored very precisely to his or her exact needs.
When the user wishes to investigate links from some information, they select the data of interest and choose the Follow Link menu item from the interface agent. The agent grabs the current selection, tries to determine the current document context (which document was that selection made in? what was its URL? where in the document was the selection located?), and parcels this information into a message which is sent to the link server. (This process actually consists of creating an HTTP message with POST data and sending it to a Web server, since the link service is itself hosted by the Web.)
The link server then responds with a set of links which are available from the specified selection in the specified document. These links are presented to the user in the form of a 'clickable' list of destinations, displayed as a page of HTML by the Web viewer.
Figure 2a: A user requests a link from the link service
Figure 2b: The server responds with a page of available destinations
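The request shown in Figure 2a could be assembled along the following lines. This Python sketch only builds the HTTP POST message and does not send it. The server URL and the form field names (`action`, `selection`, `document`, `offset`) are assumptions for illustration; the DLS's actual message format is not given here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical address of a link server; not a real DLS endpoint.
LINK_SERVER_URL = "http://linkserver.example/follow"


def build_follow_link_request(selection: str, doc_url: str, offset: int) -> Request:
    """Package the selection and its document context as POST data."""
    form = {
        "action": "follow-link",   # assumed field names throughout
        "selection": selection,    # the text the user selected
        "document": doc_url,       # URL of the document it was selected in
        "offset": str(offset),     # position of the selection in the document
    }
    body = urlencode(form).encode("ascii")
    return Request(
        LINK_SERVER_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

An interface agent would pass the resulting request to `urllib.request.urlopen` and hand the HTML page of destinations that comes back to the Web viewer.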
As well as readers, authors may make use of the DLS by using the same interface agent. Since a part of the authoring process involves the author taking on the role of a reader, the author can benefit from the link service exactly as a reader can, but in addition an author can create links and edit link databases.
This kind of functionality is fairly straightforward, but the real advantage for the author comes in the kinds of link definition that are allowed. Following the Microcosm model, links may be declared to be more or less generic, i.e. having the location of the selected text constrained to appear more or less specifically within the static document context. A standard (or specific) link applies only at the exact place that the link source was selected, whereas a completely generic link will match the link source's selection at any place in any document. This facility allows the author to treat a link as a declaration which states "any place in such-and-such a document context that phrase 'X' is mentioned links to this data", and allows the author to create a set of documents along with a set of links that can be used to 'come to' the documents from other places as well as a set of links which 'go to' other documents from the current documents.
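The range from specific to generic links can be sketched as a rule whose constraints are optional. The Python below is an illustrative assumption rather than the DLS link format: `LinkRule`, its fields and `resolve` are hypothetical names, and a constraint left as `None` is read as "anywhere".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LinkRule:
    phrase: str                   # the link source's selected text
    dest: str                     # where the link leads
    doc: Optional[str] = None     # None: the link applies in any document
    offset: Optional[int] = None  # None: the link applies at any position

    def matches(self, selection: str, doc: str, offset: int) -> bool:
        if selection.lower() != self.phrase.lower():
            return False
        if self.doc is not None and doc != self.doc:
            return False
        if self.offset is not None and offset != self.offset:
            return False
        return True


def resolve(rules: List[LinkRule], selection: str, doc: str, offset: int) -> List[str]:
    """Return the destinations of every rule that applies to this selection."""
    return [rule.dest for rule in rules if rule.matches(selection, doc, offset)]
```

A rule with both `doc` and `offset` set behaves as a specific link, one with only `doc` set applies anywhere within that document, and one with neither set is completely generic.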
The 'come-to' link type leads to a resource-based authoring style in which an author can publish a largely standalone suite of documents, together with some link databases which define the 'routes' into, through and out of the documents. Making use of the link service allows the author to 'mix together' a number of these resources as the 'into' links for each of them will act on the text of the others and bind them all together. In fact, the 'into' links can act to bind the resources not just to each other, but to the larger Web of documents outside the author's control, that is, the readers' environment. One of the major benefits of this authoring style is the scope for information reuse: not only can the author vary the internal paths through the documents by changing the link databases, but also the documents themselves can be used and reused in many different situations by providing different sets of 'into' and 'out of' links.
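How the 'into' links of one resource act on the text of another can be shown with a small Python sketch. The representation of a link database as a phrase-to-destination mapping, the function name `links_into` and the example data in the note below are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# A hypothetical link database: phrase -> destination inside the resource.
Linkbase = Dict[str, str]


def links_into(text: str, linkbases: List[Linkbase]) -> List[Tuple[str, str]]:
    """Find every (phrase, destination) whose phrase occurs in the text."""
    lowered = text.lower()
    hits = []
    for linkbase in linkbases:
        for phrase, dest in linkbase.items():
            # A generic 'into' link applies wherever its phrase is mentioned.
            if phrase.lower() in lowered:
                hits.append((phrase, dest))
    return sorted(hits)
```

For example, given a glossary published with the link database `{"mitosis": "http://a.example/glossary#mitosis"}` and a review article published with `{"cell cycle": "http://b.example/review.html"}`, the sentence "Mitosis is one phase of the cell cycle." gains a link into each resource as soon as both link databases are in use, without the sentence itself being edited.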
A problem for users of library information services in Higher Education is the isolated and diverse nature of the electronic information resources. Although a user can, in theory, access many dozens of journals, databases and articles on subjects of interest from the same terminal, it is necessary to navigate a complicated path through many providers' information gateways in order to locate any particular piece of information of (as yet) undetermined relevance.
The goal of the project is to develop a framework of information retrieval technologies and electronic publishing practices to be used by information providers (especially journal publishers) which will allow them to make their publications available not as isolated, one-off resources, but as co-operating assets within an information delivery environment such as a library at an institution of Higher Education. To achieve this goal we aim to establish novel ways of seamlessly integrating journals that are available electronically over the network with other journals and information resources that are also available on the network, thus using the capabilities of the Distributed Link Service to realize the concept of the 'open' journal.
One of the major features of the DLS which helps this goal to be achieved is the use of generic links which enable the resource-based authoring paradigm described above. It is this facility that allows a journal to be published with a set of link databases that provide links
The concept of an Open Journal then is of a 'super journal' which consists of material from many individual journal, document and database resources, tied together by databases of links. The project is currently attempting to demonstrate this concept by producing an Open Journal of Biology, whose catalogue is seen in Figure 3. It consists of journals from a number of different publishers, served from a number of different sites in a number of data formats.
In [Levy], the description of the library as a static, closed system is challenged, since real world collections are subject to 'crumble', i.e. decay over time. So catalogues (as well as the documents they describe) require constant maintenance, without which consistency cannot be guaranteed. Perhaps we could say that Hyper-G would emphasise a consistent, controlled approach to library management, whereas Microcosm would lend itself to a more 'libertarian' approach. In the real world it is likely that neither approach is sustainable in its pure form, but what mixture of philosophies is required for a digital library is as yet unclear. [Carr96] reports on research to attempt to provide a mixture of Hyper-G-like consistency checking in combination with the DLS open environment.
L. Carr, D. De Roure, W. Hall, G. Hill, "The Distributed Link Service: A Tool for Publishers, Authors and Readers", The Web Revolution: Proceedings of the Fourth International World Wide Web Conference 1995
L. Carr, H. Davis, D. De Roure, W. Hall, G. Hill, "Open Information Services", Proceedings of the Fifth International World Wide Web Conference, Elsevier, 1996
H. Davis, W. Hall, I. Heath, G. Hill, R. Wilkins, "Towards an Integrated Information Environment with Open Hypermedia Systems", in ECHT '92, Proceedings of the Fourth ACM Conference on Hypertext, Milan, Italy, November 30-December 4, 1992, ACM Press, 181-190.
H. Davis, J. Hey, "Automatic Extraction of Hypermedia Bundles from the Digital Library", Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 87-96, 1995 <URL: http://csdl.tamu.edu/DL95>
U. Flohr, "Hyper-G Organises the Web", Byte Magazine, 20(11), 59-64, November 1995
W. Hall, L. Carr, H. Davis, R. Hollom, "The Microcosm Link Service and its Application to the World Wide Web", in Proceedings of the First WWW Conference, Geneva.
S. Hitchcock, L. Carr, W. Hall, "An Open Journal Framework: Integrating Electronic Journals with Networked Information Resources", ELIB Project <URL: http://journals.ecs.soton.ac.uk/flyer.html>
D. Levy & C. Marshall, "Going Digital: A Look at Assumptions Underlying Digital Libraries", Communications of the ACM 38(4), 77-84, ACM Press, April 1995
D. Levy, "Cataloging in the Digital Order", Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 31-37, 1995 <URL: http://csdl.tamu.edu/DL95>
lis-elib, "Link Integrity", a thread of the Mailing List for the Electronic Libraries Programme, archived at <URL: gopher://nisp.ncl.ac.uk:70/1m/lists-special/lis/lis-elib/archives/1996-03>
K. Schmaranz, "Hyper-G and Electronic Publishing", in "Hyper-G. The Next Generation Web Solution", H. Maurer (Ed), Addison-Wesley, 1996.
Hugh Davis is a lecturer in Computer Science at the University of Southampton, UK, and was a founder member of the multimedia research group. He was one of the inventors of the Microcosm open hypermedia system, and is manager of the Microcosm research laboratory. His research interests include data integrity in open hypermedia systems and the application of multimedia information retrieval techniques to corporate information systems and to digital libraries.
Wendy Hall is a Professor of Computer Science at the University of Southampton. She is variously a Director of the Multimedia Research Group, the University's Interactive Learning Centre and the Digital Library Centre, researching into multimedia information systems and their application to industry, commerce and education.
Jessie Hey is a chartered librarian/information specialist and qualified teacher who has worked in a variety of library/information roles at California Institute of Technology, CERN and Southampton Institute of Higher Education. This was followed by 12 years at IBM's UK Development Laboratory where her jobs included managing the technical and business information services and setting up an interactive learning centre. She is now pursuing postgraduate research with the Multimedia Research Group at the University of Southampton.