Multimedia Research Group,
Department of Electronics and Computer Science,
University of Southampton,
Highfield,
Southampton,
Hants,
SO17 1BJ
UK
A parallel development in recent years has been the widespread growth of the World Wide Web (WWW) [Berners-Lee 92] into what is probably the best-known hypertext system to date. While the WWW can be considered an open system, in that its protocols and formats are publicly documented and available, in its current form it provides a closed hypertext system. In particular, its use of embedded link information can cause problems for the authors and maintainers of WWW documents.
However, these problems with the WWW are not fundamental to its operation; they simply reflect hypertext practice at the time it was conceived. Its rapid growth has not allowed current implementations to keep pace with recent developments in hypertext research. It is possible to augment the facilities of the WWW and provide open link service facilities to WWW users and authors. This paper describes our first experiences of applying the approach taken by Microcosm to a WWW environment.
Although designed with large-scale hypertext in mind, the initial implementation of Microcosm was based on a personal workstation, restricting its use to single users or LAN-based workgroups. We are currently developing the model to operate in a fully distributed environment [Hill 94, De Roure 94].
The WWW, on the other hand, was designed with distributed access facilities from the beginning. These are provided in a very simple manner by a node addressing scheme which allows remote systems to be specified. However, the hypertext model implemented by the current generation of WWW tools is a simple point-to-point model based upon embedded links. This approach has several disadvantages which limit scalability and increase the maintenance effort required for a large distributed corpus of information.
· Link fossilisation and decay. Because link information is incorporated into the documents themselves, managing changing information becomes a significant overhead. As documents are moved, edited or deleted, every document which refers to them must also be altered to reflect the change. This problem grows as the document set becomes larger and more widely distributed, because there is no way to determine which documents refer to a given document, and hence no way to notify their maintainers of changes.
· Dead-ends. Another problem is that of dead-ends: documents from which no links can be followed. These occur because links are embedded in documents and can therefore only be applied to the WWW's native document format, HTML (HyperText Mark-up Language). Links cannot easily be followed from other document types, even when they are viewed by a WWW client.
· Author-led navigation. Because the WWW's hypertext model is based on embedded point-to-point links, exploration of the available information can only be author-led. That is, users can only follow links to documents which the author of the current document is aware of and considers to be relevant.
The separation of links from documents can aid in the management of large hypertext structures. For example, in Microcosm, details of all documents are recorded in a document database, and each is assigned a unique identifier. All linking is then carried out in terms of these identifiers. Thus, if a document is moved or renamed, details of this change are recorded once in the document database, and all links remain valid. The separate management of link information also allows links to be made between documents in proprietary formats. Links made in this way can be retrieved by the presenting application and made available without affecting the source format of the document, thereby ensuring that it can still be manipulated in its own right. Another advantage of separate link information is the ability to provide a selection of alternative link structures which may be applied to the same source documents.
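To make the identifier-based scheme concrete, the following Python sketch models a document database and links expressed purely in terms of document identifiers. It is illustrative only: the class and method names (`DocumentDatabase`, `register`, `move`, `follow`) and the identifier format are hypothetical and are not Microcosm's actual interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DocumentDatabase:
    """Hypothetical document database: stable identifier -> current location."""

    locations: dict[str, str] = field(default_factory=dict)

    def register(self, doc_id: str, path: str) -> None:
        self.locations[doc_id] = path

    def move(self, doc_id: str, new_path: str) -> None:
        # A move or rename is recorded once, here; no link has to be edited.
        if doc_id not in self.locations:
            raise KeyError(doc_id)
        self.locations[doc_id] = new_path

    def resolve(self, doc_id: str) -> str:
        return self.locations[doc_id]


@dataclass(frozen=True)
class Link:
    # Links refer to documents by identifier, never by path or URL.
    source_id: str
    selection: str
    dest_id: str


def follow(link: Link, db: DocumentDatabase) -> str:
    """Return the current location of the link's destination document."""
    return db.resolve(link.dest_id)
```

Because a link stores only an identifier such as `doc-42`, moving the underlying file changes a single database entry and `follow` keeps returning the right location; with embedded HTML links, every referring page would need editing instead.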
Microcosm in particular can offer additional advantages over the WWW when authoring material. For example, 'generic' links may be made, which apply to a particular text string wherever it occurs, rather than just where the link was originally authored. This can greatly ease the authoring of common links, such as those on names or keywords, which would normally need to be created for each occurrence. Similarly, the text retrieval facilities available in Microcosm can help identify potential links without the need to search documents manually. As well as reducing authoring effort, these facilities allow reader-led navigation of the hypertext. Rather than solely following a trail of clues provided by the author, the reader is able to 'query' the hypertext to find links. In addition, generic links will apply to documents which the link author has never seen, provided their content is relevant.
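A generic link can be sketched as a linkbase entry keyed on a text string; resolving links for a document then amounts to searching it for every such string. The sketch below assumes exact, case-sensitive, whole-word matching using Python's `re` module. The matching rules Microcosm actually applies are not described here, and `GenericLink` and `find_generic_links` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericLink:
    """Hypothetical linkbase entry: a text string and its destination."""
    text: str
    destination: str


def find_generic_links(
    document: str, linkbase: list[GenericLink]
) -> list[tuple[int, str, str]]:
    """Return (offset, matched text, destination) for every occurrence of
    every generic link's text in the document, sorted by offset."""
    hits = []
    for link in linkbase:
        # \b restricts matches to whole words; this assumes the link text
        # begins and ends with a word character (e.g. a name or keyword).
        pattern = re.compile(r"\b" + re.escape(link.text) + r"\b")
        for match in pattern.finditer(document):
            hits.append((match.start(), match.group(), link.destination))
    return sorted(hits)
```

A single entry for "Microcosm" therefore yields a link at every occurrence, including in documents written after the entry was authored.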
Clearly, the type of facilities provided by Microcosm can offer advantages to WWW authors and users. Similarly, the wide availability of the WWW and its distributed functionality can offer an easy method of making material authored in Microcosm more widely available. The following sections describe the ways in which Microcosm and the WWW may be integrated, and outline the results of the initial work that has been carried out.
This approach has been used to incorporate common WWW clients such as NCSA Mosaic and Netscape into Microcosm environments. This was initially done using the Microcosm universal viewer [Davis 94], and we are also investigating the development of a fully Microcosm-aware client.
This approach not only allows material to be created and maintained easily, but also allows it to be disseminated widely. Further discussion of our experiences with this approach is presented in the next section.
This architecture must be matched by a Microcosm-style interface for the client, which allows the user to make arbitrary selections in a document, then choose an action to carry out on that selection from a menu (e.g. Follow Link). This interface has been provided as an additional utility which can be used in conjunction with various popular WWW clients. The utility creates an HTTP request based on the selection and action, and causes the client to send this request to a predetermined WWW server. The document returned by the server is then displayed by the client; it typically lists the possible links from the chosen selection as a set of HTML buttons.
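The request built by the selection utility can be illustrated as an ordinary GET URL carrying the selection and the chosen action. The server address and the parameter names (`action`, `selection`, `source`) below are hypothetical; the paper does not specify them.

```python
from urllib.parse import urlencode

# Hypothetical link server address; not specified in the original text.
LINK_SERVER = "http://linkserver.example/cgi-bin/microcosm"


def build_link_request(selection: str, action: str, source_url: str) -> str:
    """Encode the user's selection and chosen menu action as a GET request
    URL that a WWW client can be instructed to fetch."""
    # urlencode percent-encodes the values and turns spaces into '+'.
    query = urlencode({"action": action, "selection": selection, "source": source_url})
    return f"{LINK_SERVER}?{query}"
```

Encoding the request as a plain URL means any unmodified WWW client can send it; only the link server needs to understand the parameters.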
Once the applicable links are established, the tool creates an HTML file and streams the text document to it, interleaved with HTML structuring information and the links that have been found. The links are formed by combining the destination document name with a URL 'stem' supplied to the tool as a command-line argument. The stem identifies the server which will provide access to the resulting documents.
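A minimal sketch of this conversion step, assuming the links found for a document are already available as non-overlapping character ranges paired with destination document names. The function name `to_html` and the tuple layout are hypothetical; the stem is simply prefixed to the destination name, so it is assumed to end in '/'.

```python
from __future__ import annotations

import html


def to_html(text: str, links: list[tuple[int, int, str]], url_stem: str) -> str:
    """Interleave plain text with HTML anchors. Each link is
    (start, end, destination name); its HREF is url_stem + destination name."""
    out = ["<HTML><BODY>"]
    pos = 0
    for start, end, dest in sorted(links):
        if start < pos:
            raise ValueError("overlapping links are not supported")
        out.append(html.escape(text[pos:start]))  # plain text before the link
        href = html.escape(url_stem + dest, quote=True)
        out.append(f'<A HREF="{href}">{html.escape(text[start:end])}</A>')
        pos = end
    out.append(html.escape(text[pos:]))  # remaining text after the last link
    out.append("</BODY></HTML>")
    return "".join(out)
```

Keeping the stem as a parameter means the same Microcosm material can be regenerated for a different server without touching any link.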
Some additional provisions must be made to ensure that all HTML requirements are met. For example, HTML does not permit a link to have multiple destinations, but Microcosm does, and this must be taken into account when creating the HTML version of Microcosm material. This is currently done by creating an intermediate page on which the available links are listed. Alternatively, the available links could be listed in the source document. In addition, the tool must attempt to identify paragraphs in the original text, so that WWW clients can parse and format the resulting document correctly.
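The two provisions described above can be sketched as follows: an intermediate page listing every destination of a multi-destination link, and a simple heuristic that treats blank lines as paragraph breaks. Both functions are hypothetical illustrations; the converter's actual paragraph detection is not described in detail.

```python
from __future__ import annotations

import html
import re


def intermediate_page(selection: str, destinations: list[str], url_stem: str) -> str:
    """Build a page listing every destination of a multi-destination link,
    since an HTML anchor can point to only one URL."""
    items = "".join(
        f'<LI><A HREF="{html.escape(url_stem + d)}">{html.escape(d)}</A></LI>'
        for d in destinations
    )
    return (
        f"<HTML><HEAD><TITLE>Links from {html.escape(selection)}</TITLE></HEAD>"
        f"<BODY><UL>{items}</UL></BODY></HTML>"
    )


def mark_paragraphs(text: str) -> str:
    """Heuristic: treat one or more blank lines as a paragraph break and wrap
    each paragraph in <P> tags so clients reflow the text correctly."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n".join(f"<P>{html.escape(p)}</P>" for p in paragraphs)
```

The anchor in the converted document then points at the intermediate page, which the converter would write alongside the main HTML file.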
The main advantage of this approach is the reduction in maintenance effort and the ease of re-use of material. If link structures need to be updated or altered, this can easily be done by adding new links to the Microcosm material and 'recompiling' the HTML documents. Similarly, if alternative link structures are required, these may be maintained in separate linkbases. Then, if the content of the documents is to be changed, this can be done once on the original documents and the HTML versions recreated. In a purely HTML-based approach, these changes would have to be made to each alternative version, greatly increasing the effort involved and the chance of mistakes being made.
Another benefit is improved support for group authoring of hypertext material. Microcosm is able to support multi-user access to a document set when working on a network, and offers facilities that make navigation of the available material much easier. This is a great benefit when material is being created by a number of authors. In addition, by following a pre-agreed approach to authoring, the links created may automatically highlight relationships between documents written by different authors. For example, authors can 'keyword' their documents by making appropriate generic links to them. Thus, documents created by different authors automatically become linked when keywords appear in the text. This implicit cross-referencing is hard to provide when authoring HTML documents directly, because all possible link destinations must be known in order for the appropriate links to be encoded. This behaviour is enhanced if the co-authors agree on an appropriate vocabulary for use when linking. These results have been verified during initial experiments with a small group of authors.
To improve the WWW authoring facilities offered by Microcosm, an HTML editor could be developed which works in conjunction with Microcosm. The editor could provide all structure management for the source documents, whilst using Microcosm to provide efficient link management. This would overcome the problems with limited structure in text-based source documents.
One possible extension of the mcm2html converter is to allow it to work in real time. It could then act as a gateway between an active Microcosm system and WWW clients, so that the WWW view of the available documents is always consistent with the current Microcosm version. This is not the case at present, as the WWW view must be explicitly 'compiled' from the Microcosm 'source'. Another benefit of this approach is that the use of alternative link structures could be enhanced. By providing some form of interface to the underlying link service offered by Microcosm, WWW users would be able to adjust the links being offered to them to suit their current requirements. For example, a user unfamiliar with the subject matter might use a dictionary linkbase, whilst an 'expert' would not need this facility and could turn it off.
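If the converter ran as a live gateway, the reader's choice of linkbases could travel with each request. The sketch below assumes a hypothetical `linkbases` query parameter and a simple in-memory list of links, each tagged with the linkbase it came from; neither reflects a documented Microcosm or mcm2html interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import parse_qs


@dataclass(frozen=True)
class OfferedLink:
    selection: str
    destination: str
    linkbase: str  # name of the (hypothetical) linkbase the link came from


def enabled_linkbases(query: str) -> set[str]:
    """Read the hypothetical 'linkbases' parameter, e.g.
    'linkbases=dictionary,research', into a set of linkbase names."""
    values = parse_qs(query).get("linkbases", [])
    return {name for value in values for name in value.split(",") if name}


def links_for(selection: str, links: list[OfferedLink], enabled: set[str]) -> list[str]:
    """Return destinations for a selection, drawn only from the linkbases
    the reader has switched on."""
    return [
        link.destination
        for link in links
        if link.selection == selection and link.linkbase in enabled
    ]
```

A novice's client might request `linkbases=dictionary,research`, while an expert's omits `dictionary`; the gateway then filters the same underlying links differently for each reader.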
With the ability to provide full WWW access to an underlying link service in this way, it would then be possible to achieve full co-operation between the Microcosm and WWW environments. With Microcosm operating as a link service in conjunction with a WWW server, and the Microcosm-style user interface for WWW clients described in section 3.3, a fully inter-operative environment could be created. Links could then be authored and followed from any WWW document into material stored and managed by Microcosm.
In addition, providing facilities for more exploratory, reader-led navigation of WWW material allows users to browse the available information in whatever way suits their current needs, rather than only along the path chosen by the author of the material. It also allows WWW material to be augmented by the flexible and configurable hypertext services that Microcosm offers.
The integration of Microcosm with WWW in this way allows the normally closed environment of WWW to incorporate features desired of open systems. This has benefits for both the author and user of WWW documents.
[Davis 92] H. Davis, W. Hall, I. Heath, G. Hill, R. Wilkins, "Towards an Integrated Information Environment with Open Hypermedia Systems", in ECHT '92, Proceedings of the Fourth ACM Conference on Hypertext, Milan, Italy, November 30-December 4, 1992, ACM Press, 181-190.
[Davis 94] H. Davis, S. Knight, W. Hall, "Light Hypermedia Link Services: A Study of Third Party Application Integration", in Proceedings of the Sixth ACM Conference on Hypertext, Edinburgh, Scotland, September 1994, ACM Press, 41-50.
[De Roure 94] D. De Roure, G. Hill, W. Hall, L. Carr, "A Scalable, Distributed Multimedia Information Environment", to be published in Proceedings of Mediacomm '95.
[Fountain 90] A. Fountain, W. Hall, I. Heath, H. Davis, "Microcosm: an Open Model With Dynamic Linking", in Hypertext: Concepts, Systems and Applications, Proceedings of the European Conference on Hypertext, INRIA, France, November 1990, 298-311.
[Hill 93] G. Hill, R. Wilkins, W. Hall, "Open and Reconfigurable Hypermedia Systems: A Filter Based Model", Hypermedia, 5(2), 1993.
[Hill 94] G. Hill, W. Hall, "Extending the Microcosm Model to a Distributed Environment", in Proceedings of the Sixth ACM Conference on Hypertext, Edinburgh, Scotland, September 1994, ACM Press, 32-40.