CSTR 92-15
(c) University of Southampton
Abstract
This report examines open hypermedia systems, and argues that such systems provide users with richer and more diverse ways to access and integrate information from large and dynamic data sets in a distributed, heterogeneous environment. In particular, the enhanced Microcosm model for open hypermedia is examined, and the ways in which it provides such an environment are discussed. The paper continues by investigating the advantages and the shortcomings of this model and identifies areas in which further work must be completed before such systems can become widely adopted, in particular the granularity of link anchors, editing, and version control. Possible solutions to these problems are presented and discussed.
At the University of Southampton we have been working since 1989 on a model for open hypermedia, and have produced a system called Microcosm [Fountain90]. This system was initially designed with the intention of providing a test bed on which the research team would be able to experiment with various ideas in the field of multimedia, but is now being used at a number of sites for integrating multimedia applications and delivery of research and teaching materials. The open perspective of the system is the main attraction of the product. However, the authors believe using a hypermedia system as a presentation platform for multimedia information is simply a partial solution to a larger problem, which is the provision of a complete hypermedia environment at Operating System level. In this paper we examine the progress of open hypermedia systems towards providing "industrial strength hypermedia".
Many people use PCs and workstations as regular tools in their working day. What do they do with these machines? On the whole they use a relatively small number of packages to process data in some way that is pertinent to their job, and the remainder of the time they use the machine as a kind of electronic log book. They prepare documents, they send and receive electronic mail, they save documents they have received, they keep calendars and "to-do" lists, and they keep notes about tasks in which they are involved. All these activities produce a large amount of data which is kept on the machine. How do they navigate through this data? Most users survive by using the file directory structure as the primary method of locating data, and perhaps some utility such as grep to help locate relevant files.
If hypermedia techniques for linking and navigating information are so effective, why are users not using these systems for integrating their data? We suggest, as did Malcolm et al. [Malcom91], that the answer to this question is that current systems do not have the properties to enable this facility: the cost of attempting to use current hypermedia systems in such a dynamic way is higher than the benefit gained by the added value. In order to make use of a document in a hypermedia system, it must generally first be imported to the system. As soon as links are added to the system the data to represent the link will be stored in the file as some form of mark-up. The data is now in a closed system; it is no longer possible to process the data with the package that created it, as it is stored in a format private to the hypermedia system. If the system allows the user to export the data again, it will certainly lose the link mark-up information. In any case where data is dynamic in nature it seems unlikely that users will be prepared to risk this process of "handing over" their data, and furthermore it seems that the manual effort involved is likely to deter such behaviour unless the user is fairly certain of the need for the extra facilities provided by the hypermedia system.
While hypermedia systems remain closed they are likely to continue to be confined to applications where the data remains relatively static. The next generation of hypermedia must appear to the user as a facility of the operating system that is permanently available to add information linking and navigation facilities with the minimum of user intervention and without loss of existing functionality. Such systems must be able to make links to and from data produced by other information processing tools, and must have tools for automatic link generation and maintenance. Sun's Link Service [Pearl89] and Microcosm [Fountain90] are examples of hypermedia systems that have attempted to address these problems.
An adaptable environment for the integration of data, tools and services.
Closed systems that keep data in a private format must also provide tools to access and process that data. For example, Intermedia [Yankelovich88] provides applications such as InterText and InterDraw. However, this approach is a closed solution. Arguments amongst users about choice of text editors and drawing packages become almost religious in intensity. Users do not want to be confined to a particular package, and in any case it is not possible to predict the facilities that all users will require, nor is it sensible to duplicate existing specialised software packages. Hypermedia systems must allow users to create data in whatever package they choose, and then to make links from data to data, or to make active anchors [Palaniappan90] in these applications.
A system which is platform independent and distributed across platforms.
Users will wish to link from data and applications on one machine to data and applications on other machines. They will want these operations to be transparent. In most respects this is an issue in the domain of distributed operating systems. However, the hypermedia functionality must be distributed across the operating system, and link information must be portable across hardware platforms.
A system which makes it easy for users to find, update, annotate and exchange information.
If users are to adopt hypermedia systems, then the system must add navigational aids to those which are already provided by the operating environment, in such a way that they are very easy to use. It must be possible to alter the data using the normal range of tools already available in the system and to add new information, and annotations, for both private and public use. When changes are made to public information, it must be possible to notify users of those changes. There must be a notion of public and private workspaces, and it must be possible to move information (including link information) between these workspaces.
A system in which all forms of data and media are treated in a conceptually similar manner.
If the process of making links has a different interface in each package then users will not easily learn to make use of the full range of facilities and media.
a) A system which does not impose any mark-up upon the data which prevents that data being accessible to other processes that do not belong to the system.
b) A system which can integrate with any tool that runs under the host operating system. Data produced by tools that are not part of the hypermedia system may be used within it without adding any special value to that data and without compromising the continued use of the data outside the system.
c) A system in which data and processes may be distributed across a network, and across hardware platforms.
d) A system in which there is no artificial distinction between readers and authors.
e) A system in which it is easy to add new functionality, i.e. new program modules may simply be inserted.
Clearly a system that conforms to these definitions will go a long way towards meeting the user requirements specified in the previous section.
Figure 1: The Microcosm Model.
In Microcosm the user interacts with a viewer. A viewer is any application in which data may be displayed. Messages to perform actions are sent from the viewer to Microcosm, which then dispatches the message through a chain of filters. Each of these filters is then given the opportunity to respond to the message by blocking it, passing it on or changing it before passing it on. Based on the message contents, some filters may add new messages to the chain. Eventually the message(s) will emerge from the filter chain and arrive at the Link Dispatcher. This will examine the messages to see if they contain any available actions (such as links to follow), and if so it will offer these actions to the user.
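The message flow described above can be sketched in a few lines of Python. This is an illustration only, not Microcosm code: the Message class, the tag names ("action", "selection", "destination") and the two example filters are assumptions made for the sketch. Each filter returns a list of messages, so an empty list blocks a message, a one-element list passes it on (possibly changed), and a longer list adds new messages to the chain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    """A tagged message; the tag names used here are hypothetical."""
    tags: Dict[str, str]


# A filter takes one message and returns zero or more messages.
Filter = Callable[[Message], List[Message]]


def run_chain(message: Message, filters: List[Filter]) -> List[Message]:
    """Pass a message through each filter in turn and return what emerges."""
    pending = [message]
    for flt in filters:
        pending = [out for msg in pending for out in flt(msg)]
    return pending


def drop_empty_selection(msg: Message) -> List[Message]:
    """Block follow-link requests that carry no selection."""
    if msg.tags.get("action") == "follow-link" and not msg.tags.get("selection"):
        return []
    return [msg]


def make_linkbase_filter(links: Dict[str, str]) -> Filter:
    """Build a filter that adds a message for every link it can resolve."""
    def linkbase_filter(msg: Message) -> List[Message]:
        out = [msg]  # the original message is always passed on
        if msg.tags.get("action") == "follow-link":
            dest = links.get(msg.tags.get("selection", ""))
            if dest is not None:
                out.append(Message({"action": "dispatch-link", "destination": dest}))
        return out
    return linkbase_filter


def link_dispatcher(messages: List[Message]) -> List[str]:
    """Collect the destinations that could be offered to the user."""
    return [m.tags["destination"] for m in messages
            if m.tags.get("action") == "dispatch-link"]


chain = [drop_empty_selection, make_linkbase_filter({"hypermedia": "intro.txt"})]
request = Message({"action": "follow-link", "selection": "hypermedia"})
offered = link_dispatcher(run_chain(request, chain))
```

Because every filter has the same signature, a new one can be added to the list without changing the dispatcher or the other filters.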
The model therefore allows for document to document links. These are links which may be followed from one particular document (or process) to another.
1. The specific link is a link from a particular object at a specific point in a source document that connects to a particular object in a destination document.
2. The local link is a link from a particular object at any point in a specific document that connects to a particular object in a destination document.
3. The generic link is a link from a particular object at any position in any document that connects to a particular object in a destination document.
All the above are links with static destinations, in that the destination has been fixed. In all the above cases following a link may cause a document to be displayed at a specified position or simply at the start. Alternatively, following a link may activate any process, such as a program running with a specific dataset. The local and generic links, which have dynamic source anchors, are a particularly powerful feature, described more fully in [Fountain90]: they allow a destination to be fixed just once, after which the link may be followed from an appropriate source object in new data or in a new file as soon as it has been created. Further link levels are:
4. Text Retrieval links. These are links which are dynamically computed when requested. There are two ways of achieving such links. The first is to use a grep filter, which attempts to match the text selected with the same text in any other document within some pre-defined set of text documents, and returns all possible document names into the link dispatcher. This method is relatively slow. A second method which we have implemented very successfully involves building an inverted index of all the desired documents before the system is used, and using standard information retrieval similarity calculations [Li92] to match the vocabulary of the selected source with the vocabulary of the documents, and offering the user links to the best matches. This method produces higher quality matches in a much shorter time, but the cost of pre-indexing the documents must be taken into account, and also the loss of dynamic response to changes in the documents.
5. Relevance Links. If a set of documents has been pre-indexed as described above, then it is also possible to cluster the documents based upon similarity of vocabulary. In such cases it is possible for the user to follow links to other documents in the same cluster.
The mechanisms described above provide a rich and varied set of methods for finding new information, over and above the standard method of navigating the file directory structure, and it should be stressed that the open model makes it possible to add new navigational methods with ease as and when they are identified and implemented.
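As an illustration of the second text retrieval method, the following Python sketch builds an inverted index in advance and ranks documents against a selection. The report does not say which similarity calculation is used, so the tf-idf weighting and cosine measure here are assumptions, as are the function names and the sample documents.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """documents maps a name to its text; returns (index, idf, norms)."""
    index = defaultdict(dict)  # term -> {document name: term frequency}
    for name, text in documents.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][name] = tf
    n = len(documents)
    idf = {term: math.log(1 + n / len(postings)) for term, postings in index.items()}
    squared = defaultdict(float)
    for term, postings in index.items():
        for name, tf in postings.items():
            squared[name] += (tf * idf[term]) ** 2
    norms = {name: math.sqrt(value) for name, value in squared.items()}
    return index, idf, norms


def best_matches(selection, index, idf, norms, limit=3):
    """Rank indexed documents by cosine similarity to the selected text."""
    query = Counter(tokenize(selection))
    scores = defaultdict(float)
    for term, qtf in query.items():
        for name, tf in index.get(term, {}).items():
            scores[name] += (qtf * idf[term]) * (tf * idf[term])
    q_norm = math.sqrt(sum((qtf * idf.get(term, 0.0)) ** 2 for term, qtf in query.items()))
    ranked = sorted(((score / (q_norm * norms[name]), name)
                     for name, score in scores.items()), reverse=True)
    return [name for _, name in ranked[:limit]]


docs = {
    "a.txt": "open hypermedia link service",
    "b.txt": "video audio bitmap viewer",
    "c.txt": "link anchors and link versions",
}
index, idf, norms = build_index(docs)
matches = best_matches("link service", index, idf, norms)
```

The index is built once, which is where the pre-indexing cost is paid; a document edited afterwards is not reflected in the results until the index is rebuilt.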
Writing a new filter is a simple task that could be undertaken by any Windows programmer without any special understanding of Microcosm internals. A procedure is written to analyse the incoming message for the required tag(s) and to take appropriate actions. This is then inserted into a standard shell which takes care of all the communication with the DDE channels, and the new filter is then compiled. This flexible modular approach makes it easy for any programmer to make major changes to the functionality of the system.
In the general case, this is true. The same problem was confronted in the design of Sun's Link Service [Pearl89], which requires that all applications that wish to use the service are modified to become "link service aware". Pearl argues that if a link service were a standard feature of the operating environment, then all serious applications would be written to make use of this feature. However, in the immediate future this is not likely to be the case and so users of the system are limited to using those applications, such as TextEdit, which have been specifically modified.
We have managed to overcome this problem. In Microcosm we have three types of viewers.
1. Fully aware Microcosm Viewers. These are viewers we have written ourselves, which have an action menu as described above. They package up suitable messages and communicate directly with the DDE channel. We have written ten such viewers which deal with data formats that are common in our environment, such as text, bitmaps, video, audio, Windows Metafiles and rich text. The advantage of using these viewers is that it is possible to have certain functionality that is not possible in other programs. A Microcosm viewer may display active areas as buttons. These are combinations of object selection and action which are in some way highlighted. Any specific link source may be a button, and the information about what to highlight is stored in the linkbase. The viewer requests this information when loading the data. Microcosm viewers may also be asked to start up with the focus at any particular point in the data.
2. Partially Aware Viewers. These are applications from external sources which we have adapted to be Microcosm aware. Many packages such as Word for Windows, Toolbook and Superbase have some level of programmability and access to the DDE. Indeed, DDE access is frequently cited as a selling point for Windows applications, as is access to Apple events for Macintosh programs. In such applications it is quite straightforward to write the necessary code to produce an action menu and to package a selection and action into an ASCII message for dispatch to the DDE channel.
The process of adapting such applications is qualitatively quite different from re-writing the application to become link-aware, and is the sort of task that may be undertaken by any competent user of a package, given appropriate guidance.
3. Unaware Viewers. In the worst case, where it is not possible to build into the viewer any form of action menu, and there is no DDE access, we have introduced the idea of "clipboard links". The user makes a selection from the application in the normal way, and then copies the selection to the clipboard. An action menu may then be chosen from the Microcosm icon, and Microcosm will then take responsibility for taking the clipboard contents and the chosen action and packaging them into a message. A refinement of this approach allows the user to order Microcosm to "monitor the clipboard": whenever the clipboard contents change Microcosm will automatically package the new contents with a pre-selected action, such as follow-link.
There are some problems with this approach. The interface to the action menu is not uniform across applications, and since Partially Aware viewers and Unaware viewers are usually not able to provide information about the exact position in a document at which data was selected, it may not be possible to provide specific links from such documents. However, local links and generic links are possible and provide the user with hypermedia functionality from any application.
Another advantage of keeping the links separate is that data remains accessible to the original application that created it. Users are free to continue to manipulate data free from artificial constraints created by the system. A further advantage is that it is possible to have more than one linkbase in place at any time. A common configuration for Microcosm is to have one linkbase that contains a set of links over a set of documents that was defined by the original author and for each user to have their own linkbase into which they store their own links and annotations. The idea may be extended to allow shared workspaces. Access permissions to such shared linkbases and the nodes or files to which they refer are the concern of the operating system, and therefore introduce no new problems for the hypermedia system.
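A minimal Python sketch of links held apart from the data, with an author linkbase and a user linkbase consulted together, might look as follows. The Link record and its field names are hypothetical and are not the Microcosm linkbase format; the sketch only shows how one record type can express specific, local and generic links by leaving the source document or the position unset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """One linkbase entry; the documents themselves are never modified."""
    selection: str                     # the source object, e.g. a word or phrase
    destination: str                   # destination document (or process)
    source_doc: Optional[str] = None   # None: any document (generic link)
    offset: Optional[int] = None       # None: any position (local or generic link)

    def matches(self, selection: str, doc: str, offset: int) -> bool:
        if self.selection != selection:
            return False
        if self.source_doc is not None and self.source_doc != doc:
            return False
        if self.offset is not None and self.offset != offset:
            return False
        return True


def follow_link(linkbases: List[List[Link]], selection: str, doc: str, offset: int) -> List[str]:
    """Ask every available linkbase for destinations matching the selection."""
    return [link.destination
            for linkbase in linkbases
            for link in linkbase
            if link.matches(selection, doc, offset)]


author_links = [
    Link("trie", "glossary.txt"),                      # generic link
    Link("Figure 1", "model.bmp", "report.txt", 120),  # specific link
]
user_links = [
    Link("trie", "my-notes.txt", "report.txt"),        # local link
]
destinations = follow_link([author_links, user_links], "trie", "report.txt", 40)
```

Adding or removing a linkbase changes only the list passed to follow_link, which is one way to picture private, public and shared link sets over the same files.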
A final advantage of link separation is the facility to process the links. We have produced a program which enables users to manipulate linkbases by such processes as merging links from other linkbases, deleting global references to files that have been removed and changing the scope of links. We envisage further programs that will make use of user defined link attributes.
Authoring effort may be considerably reduced in open hypermedia systems by making use of such facilities as generic links and computed links. These features are not specific to open systems, but the ability to extend the model, by adding features such as computed links, relies upon the system being open.
The system as described applies to any homogeneous file system: that is, any system on which all the files may be accessed and processed by the user's workstation. In the case of Microcosm this implies that all the files are either resident on the workstation or on some network drive. If they are available on a network drive, programs will be loaded onto the workstation before running. However, we see no barrier to allowing the filter processes to run on heterogeneous platforms, and, where the data may be passed between machines in a mutually compatible form, no barrier to allowing nodes to exist on different platforms. Again, the limits to distributing the hypermedia functionality are in the province of the operating system design rather than the hypermedia design. We are currently using an X-Windows based text viewer on a Unix system with a Microcosm linkbase and filters running on a remote PC workstation, in order to navigate text files resident on both the Unix system and the PC system.
In the majority of applications we have found this approach to be perfectly adequate. Where hypermedia is being used as an extension to the directory system as a method of navigating a large information space, this level of granularity in discovering the correct destination is all that users expect, and they are quite happy to continue from the start of a document or data set by browsing with whatever tools are provided by the application itself. Where authors have produced a more finely linked and usually more static hypermedia product, such as a tutorial, they have almost invariably used the Microcosm aware viewers which provide the ability to link to specific points.
Various possibilities exist to improve upon this situation. The most easily available is the use of system macros. For example, in MS-Windows, a recorder is available which can record and replay keystrokes. The creator of a link into an OEM package may produce a macro that moves the focus to the item of interest in the data. When the link is followed, the package is started with the appropriate data set and then the macro is replayed.
A further problem with maintaining the linkbase separately from the document is that it is not immediately clear what links are available to the user. In closed systems there is usually some form of visual clue, such as bold text, which indicates that a particular object is "live" and may be clicked on. Microcosm allows authors to make specific links into buttons, and link aware viewers will query the linkbases for any buttons and display the objects in some visible way. However, it is not feasible to check for all generic links in this way. We have investigated two solutions to this problem. The first involved collecting the set of all links, including generic links, that would be available from the current document, and storing this data in a structure known as a trie. When the user asked to show-links the viewer would scan the visible portion of the document attempting to identify any possible match in the trie, and would then highlight the appropriate matches in the viewer. This solution was not really satisfactory because of the time taken to load the data into the trie in the first instance, and the fact that it only works in fully aware viewers. We are currently working on an improved version of this algorithm that does away with the need to load the data into the trie every time a document is loaded.
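The trie-based scan can be illustrated with a short Python sketch. The report does not describe the trie's layout, so keying the trie on whole words (rather than characters) is an assumption, as are the function names and the sample anchors.

```python
import re

_END = object()  # sentinel key marking the end of a complete anchor


def words_of(text):
    """Split text into lower-case words."""
    return re.findall(r"\w+", text.lower())


def build_trie(anchors):
    """Store each anchor phrase as a path of words in nested dictionaries."""
    root = {}
    for phrase in anchors:
        node = root
        for word in words_of(phrase):
            node = node.setdefault(word, {})
        node[_END] = phrase
    return root


def find_anchors(visible_text, trie):
    """Return (word index, anchor phrase) for every anchor found in the text."""
    words = words_of(visible_text)
    found = []
    for start in range(len(words)):
        node = trie
        for position in range(start, len(words)):
            node = node.get(words[position])
            if node is None:
                break
            if _END in node:
                found.append((start, node[_END]))
    return found


trie = build_trie(["open hypermedia", "link", "link service"])
hits = find_anchors("Sun's Link Service is an open hypermedia system.", trie)
```

Scanning is quick once the trie exists; the cost the text mentions lies in build_trie, which in this sketch would have to run again for each document loaded.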
A second implementation of show-links relies upon the observation that authors had made 70% of their source anchors on only one or two complete words. If those anchors which had already been made into buttons were excluded, then the number goes up to around 90%. We therefore implemented an option where the user highlights an area of text and selects the action, show-links. The show-links filter splits the text into words and pairs of words, and sends these words on through the filter chain as follow-links messages. The linkbases respond to these messages and the possible links all appear in the link dispatcher, along with the piece of source text that was matched. We have found this solution to be adequate in helping users to find the links that are available, and it has the advantage that it can be called from any viewer, using the clipboard if necessary.
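The word and word-pair splitting performed by the show-links filter can be sketched in Python as below. The dictionary standing in for a linkbase and the function names are assumptions for the example; in Microcosm the candidates would be sent on as follow-link messages rather than looked up directly.

```python
import re
from typing import Dict, List, Tuple


def candidate_anchors(selected_text: str) -> List[str]:
    """Return every single word, then every pair of adjacent words."""
    words = re.findall(r"[a-z0-9]+", selected_text.lower())
    pairs = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return words + pairs


def show_links(selected_text: str, linkbase: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Pair each matched piece of source text with its link destinations."""
    results = []
    for anchor in candidate_anchors(selected_text):
        for destination in linkbase.get(anchor, []):
            results.append((anchor, destination))
    return results


linkbase = {"hypermedia": ["intro.txt"], "link service": ["pearl89.txt"]}
links = show_links("Sun's Link Service for hypermedia", linkbase)
```

A selection of n words produces at most 2n - 1 candidates, so the number of queries grows only linearly with the size of the highlighted area.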
The situation gets worse if a file which is the destination of a link is moved or deleted. Then the link will dangle. This is particularly a problem in a distributed file system where file availability may depend on network access, and in a system where user A may allow user B to copy a linkbase which refers to files that are in user A's private workspace and are therefore unavailable to user B.
Whenever a document is loaded all specific links from the document are checked. If any of these links have a date for the source document that is earlier than the date of the document itself, the user is warned.
Whenever a link is followed to a document, the date of the destination document is checked against the date of the link. Again, if they do not match the user is warned, or if the destination document no longer exists, the user is warned.
A link aware editor for text documents exists, which when loaded finds all references to the document in linkbases that are currently available, and updates these link anchors when editing is complete. During the edit the document is locked against other use.
This solution keeps the user aware of any inconsistencies, but makes no attempt to fix them when they occur. As a consequence of this approach we offer the following pragmatic advice to authors and users:
Try to avoid the use of specific links where possible. Document to document links and generic/local to document links have far less scope for becoming inconsistent.
Ownership of files should be confined to individuals, and the onus is on the individual to use the link aware editor where possible.
All linkbases used with a specific application should be kept on-line and in the available filters list so that the link aware editor may update them.
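The date checks made when a document is loaded and when a link is followed can be sketched in Python as follows. The SpecificLink record and its field names are hypothetical, and the file system calls are passed in as parameters so the example can be run without real files; by default they are os.path.getmtime and os.path.exists.

```python
import os
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SpecificLink:
    source_doc: str
    dest_doc: str
    source_stamp: float  # source modification time recorded when the link was made
    dest_stamp: float    # destination modification time recorded when the link was made


def check_on_load(doc_path: str, links: List[SpecificLink],
                  mtime: Callable[[str], float] = os.path.getmtime) -> List[str]:
    """Warn about links whose recorded source date predates the document's date."""
    current = mtime(doc_path)
    return [f"link to {link.dest_doc} may be out of date"
            for link in links
            if link.source_doc == doc_path and link.source_stamp < current]


def check_on_follow(link: SpecificLink,
                    exists: Callable[[str], bool] = os.path.exists,
                    mtime: Callable[[str], float] = os.path.getmtime) -> Optional[str]:
    """Return a warning if the destination is missing or its date does not match."""
    if not exists(link.dest_doc):
        return "destination no longer exists"
    if mtime(link.dest_doc) != link.dest_stamp:
        return "destination has changed since the link was made"
    return None


# Stand-in for the file system: document name -> modification time.
times = {"report.txt": 200.0, "figure.bmp": 150.0}
stale = SpecificLink("report.txt", "figure.bmp", source_stamp=100.0, dest_stamp=150.0)
warnings = check_on_load("report.txt", [stale], mtime=times.__getitem__)
```

As in the text, the checks only report an inconsistency; nothing in the sketch attempts to repair the link.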
1. No edits may be made to documents of which the link service is unaware.
This implies that the link service must be truly integrated with the operating system. An alternative approach is to keep a last edit date as an attribute within the application, and by comparing this with the operating system date, we can identify files that have been changed by non link aware applications.
2. The application is aware of all linkbases that might be affected by any edits, and has access to these linkbases in order to make suitable changes. (This must include linkbases from other projects in the case where applications intersect.)
This requirement tends to contradict the goal of open systems. Alternatives include identifying links that are fixed to out of date documents, and attempting to re-fix them into the new version of the document.
3. There is an algorithm for moving specific link anchors from an old version of a document to a new version.
This is an important algorithm that we have yet to work on. It is our belief that by using utilities such as diff it should be possible to repair linkbases by finding the new position of all link anchors, if indeed they still exist within the document. Storing some context within the link should make this process easier.
4. Versions of documents and linkbases are maintained automatically, and on traversing a link to a document, the user is able to step back through previous versions of the document.
The most recent version of a document must always be available outside of the versioning system for access and processing by external applications. Where automatic processes such as the algorithm above have been used to modify links, it may sometimes be desirable to return to the old version. This will not be possible unless multiple versions are maintained.
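The suggestion that a diff-style comparison could move specific link anchors from an old version of a document to a new one can be illustrated with Python's standard difflib module. This is a sketch of the idea only, not the algorithm the authors have yet to develop; the function name and the rule that an anchor survives only if it lies wholly inside one unchanged block are assumptions.

```python
import difflib
from typing import Optional


def relocate_anchor(old_text: str, new_text: str, start: int, length: int) -> Optional[int]:
    """Return the anchor's offset in new_text, or None if it was edited or deleted.

    The anchor is old_text[start:start + length]. It is considered to have
    survived only if it lies entirely within one block that the two versions share.
    """
    matcher = difflib.SequenceMatcher(None, old_text, new_text, autojunk=False)
    for block in matcher.get_matching_blocks():
        if block.a <= start and start + length <= block.a + block.size:
            return block.b + (start - block.a)
    return None


old = "Microcosm is an open hypermedia system."
new = "At Southampton, Microcosm is an open hypermedia system."
anchor = "open hypermedia"
moved_to = relocate_anchor(old, new, old.index(anchor), len(anchor))
```

When the function returns None the link cannot be repaired automatically, which is the case where stored context or an earlier version of the document would be needed.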
True open hypermedia systems are a relatively new concept. In the long term it seems likely that operating system vendors will provide increasingly useful link and object management systems as an integral part of the operating system and application software will be written so that it is aware of these services.
[Fountain90] Andrew M. Fountain, Wendy Hall, Ian Heath and Hugh C. Davis, MICROCOSM: An Open Model for Hypermedia With Dynamic Linking, in A. Rizk, N. Streitz and J. Andre (eds), Hypertext: Concepts, Systems and Applications. The Proceedings of The European Conference on Hypertext, INRIA, France, November 1990, Cambridge University Press, 1990
[Halasz88] Frank G. Halasz, Reflections on NoteCards: Seven Issues for the Next Generation of Hypermedia Systems, Communications of the ACM, vol 31, num 7, pp 836-855, 1988
[Halasz91] Frank G. Halasz, "Seven Issues": Revisited, Hypertext '91 Keynote Talk, 1991
[Hill92] Gary Hill and Wendy Hall, Microcosm: Intelligent Filter Management, Computer Science Technical Report, University of Southampton, UK, 1992
[Li92] Zhuoxun Li, Wendy Hall and Hugh Davis, Hypermedia Links and Information Retrieval, The Proceedings of the 14th British Computer Society Research Colloquium on Information Retrieval, Lancaster University, 1992
[Malcom91] Kathryn C. Malcolm, Steven E. Poltrock and Douglas Schuler, Industrial Strength Hypermedia: Requirements for a Large Engineering Enterprise, Hypertext '91 Proceedings, ACM Press, 1991
[Nielsen90] Jakob Nielsen, Hypertext and Hypermedia, Academic Press, London, 1990
[Palaniappan90] Murugappan Palaniappan, Nicole Yankelovich and Mark Sawtelle, Linking Active Anchors: A stage in the Evolution of Hypermedia, Hypermedia, vol 2, num 1, 1990
[Pearl89] Amy Pearl, Sun's Link Service: A Protocol for Open Linking, Hypertext '89 Proceedings, ACM Press, 1989
[Shipman89] Frank M. Shipman III, R. Jesse Chaney and G. Anthony Gorry, Distributed Hypertext for Collaborative Research: The Virtual Notebook System, Hypertext '89 Proceedings, ACM Press, 1989
Hugh Davis, Wendy Hall, Ian Heath, Gary Hill and Rob Wilkins
Department of Electronics and Computer Science
University of Southampton
Southampton SO9 5NH
e-mail mcm@ecs.soton.ac.uk