Structure and Hypertext
by
Leslie Alan Carr
A thesis submitted for the degree of
Doctor of Philosophy
in the
Faculty of Engineering and Applied Science
Department of
Electronics and Computer Science
November, 1994
UNIVERSITY OF SOUTHAMPTON
ABSTRACT
FACULTY OF ENGINEERING AND APPLIED SCIENCE
DEPARTMENT OF ELECTRONICS AND COMPUTER SCIENCE
Doctor of Philosophy
Structure and Hypertext
by L. A. Carr
Hypertext techniques are now beginning to be used in the ways that early researchers anticipated, from personal note taking to online help for ubiquitous computing environments and universal literature resources, yet despite this, hypertext models have remained substantially unchanged. This thesis investigates the nature of text, how it may best be modelled on a computer and how the connections between related texts may be expressed in a flexible and efficient way.
First we look at the development of hypertext systems and then compare that with the complex structured nature of texts themselves. LACE, a small-scale hypertext system based on structured texts is introduced and compared with other hypertext systems. Approaches to large-scale distributed hypertexts are discussed, and LACE-92, a system used to produce hypertexts from distributed information services is presented. Finally LACE-93, a new document architecture for global hypertext environments is proposed.
But most of all, many thanks and much appreciation to my wife Jan who has at last received a straight answer to the question "How long until you finish the last chapter?".
Conklin's famous introduction to the subject [42] classifies the various hypertext systems into four broad application areas. Two of these were macro literary systems for large-scale literatures and problem exploration tools for highly flexible personal use on a much smaller amount of information (the third was a derivative of the former category and the fourth a miscellaneous category). However, it is important to realise that the two applications are not somehow separable: a scholar who needs to study and `inwardly digest' a topic needs not only to express his or her own thoughts, but also to draw on the information capacity of a huge library with the musings and conclusions of other researchers. These two application areas are then simply complementary functions that the same hypertext system should be able to provide. Van Dam [147] emphasises the ability of the computer to enhance connectivity, and that must be provided both in the large (in the sense of a newly published book being added to a library by connecting its key words and concepts to the huge web of pre-existing information links) and in the small (by connecting and re-connecting the hypertext structures which represent an author's initial ideas according to a changing and evolving personal understanding of the subject).
A review of the hypertext literature of the last twenty years shows that hypertexts have been implemented variously as:
* indivisible information nodes connected by links: Memex [31], NoteCards [63,64], KMS (a commercial version of the ZOG research project [98]), HyperCard [5]
* structured nodes connected by links: Augment [53], Dynatext [46]
* documents and modular sets of connecting links: Intermedia [148], Microcosm [55]
* a knowledge-based framework of texts: Aquanet [92], gIBIS [10], Much [117]
* non-linear `unfolding' text : Guide [27]
* everything incestuously connected to everything else: Xanadu [109]
* a word-processed document with added links: Word for Windows [101]
The above systems mainly share a common general model of hypertext: `nodes' of information which are `linked' to each other and so can be characterised by the linking mechanism, by the nodes that are linked and by the user interface which determines the way that the user encounters and manipulates these node and link objects.
It is important to realise that simply displaying a new information type is not sufficient for true multi-media hypertext (or hypermedia). HyperCard may easily enough be persuaded to display sequences from a videodisk, but unless the methods for creating links between video sequences or between video stills and text are in place then the system is simply acting as an expensive remote control device. It is imperative that each new information type be completely integrated into the system's hypertext network. Both Intermedia and NoteCards are successful in this area because they themselves are built on extensible environments (Object-Oriented C and Lisp respectively) so that modules defining hypertext functionality for new information sources may be slotted into the existing application's framework.
A compromise often adopted is to graft a `dumb' medium (like videodisk) onto a true hypertext network by associating each frame/sequence on the video with one `place holding' node in the network. Hypertext operations on the new medium are then mapped onto existing operations on a normal node. For example, node A will always activate the showing of video frame A', and node B will cause video sequence B' to be played. A link from frame A' to sequence B' can then be effected by placing a button on node A which links it to node B. The nodes would commonly display a brief textual description of the video sequences that they shadow. HyperCard is an example of such a hybrid hypermedia system, since HyperTalk extension commands are widely available which interact with external devices (such as a video player) via the host computer's communications port, although no information can be received from the remote device.
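The frame-shadowing arrangement described above can be sketched in a few lines; the node names, player commands and one-way `send' channel below are invented for illustration, not taken from any real system.

```python
# Sketch of the `place holding' scheme: each node in the hypertext
# network shadows one frame or sequence on a dumb medium such as a
# videodisk. All identifiers and commands are invented for illustration.

class PlayerStub:
    """Stands in for the one-way channel to a remote video player."""
    def __init__(self):
        self.log = []

    def send(self, command):
        self.log.append(command)  # no reply is possible from the device

class MediaNode:
    """A normal hypertext node that shadows a frame or sequence."""
    def __init__(self, name, description, command):
        self.name = name
        self.description = description  # brief text shown on the node
        self.command = command          # player command for the shadowed item
        self.links = []                 # buttons linking to other nodes

    def visit(self, player):
        player.send(self.command)       # visiting the node shows the video
        return self.description

# A link from frame A' to sequence B' becomes a button on node A -> node B.
player = PlayerStub()
a = MediaNode("A", "Still: campus map", "SHOW FRAME 1201")
b = MediaNode("B", "Sequence: library tour", "PLAY 1300-1450")
a.links.append(b)

a.visit(player)
target = a.links[0]     # the reader presses the button on node A
target.visit(player)
```

Note that all hypertext operations act on the ordinary nodes; the dumb medium is driven entirely as a side-effect, which is exactly why no information can flow back from it.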
Even text-only systems may be classified according to the kinds of text which can be displayed. Some allow only fixed-font text (early systems such as NLS, HyperTIES and original versions of HyperCard), some allow multiple text fonts, sizes and styles (later versions of HyperCard, Guide), and some provide full composition facilities, including horizontal and vertical spacing (Word).
The size of the display is an important factor in any information system since it is desirable to present an uncluttered, easy-to-read screen to the user. Narrow newspaper columns are difficult to read because the eye has to keep rescanning to find the start of the line; small screens force the reader to keep flipping between pages of information.
If display-size is a problem for the reader, node-size is a problem for the author. Each node in a hypertext network typically requires a unique name and occupies a unique space in the information `hierarchy'. If the size of the node is fixed, then the author is faced with two choices, either to edit the material to fit within the node or to split the node into two new ones. The former choice may be effective during the initial construction of the network, but extra material that is to be added during a process of revision may force the latter to occur. If so, the split can be undertaken in two ways, either by re-partitioning the information into several logical chunks, each with its own node, or by creating a linear sequence of continuation nodes. Both of these courses of action may have the knock-on effect of invalidating some of the links to the existing node, requiring further revision work.
Intermedia is a prime example of a system with first-class links. There the links are stored separately from the documents to which they refer. These links also have the capacity for holding information in the form of attribute-value pairs that can be used as part of a reader's query.
It is the first-class nature of links that allows systems to provide a graphical browser. Without explicit storage of the links between the nodes it is impossible to tell the overall structure of the network without actually traversing it. Hence only systems such as NoteCards, Intermedia and derivatives of HAM provide such a function.
Insubstantial links are merely specifications for the address of a jump and as such only have any existence when they are invoked. HyperCard links, for example, are generally buttons containing the instruction ``go to card id 42106'' (or similar). However, buttons need not contain jump instructions, and, in fact, such a jump is not constrained to be attached to a button: it can instead be a part of a more general handler that is invoked when a particular key is pressed or invoked on some arbitrary event. It may also be the result of a more general computation, such as ``look up the id of the card which matches the text that the user has selected and go to it''. Because there is no clear correspondence between any particular object and the set of links in a hypertext network, HyperCard has no means of manipulating the network as a whole. Although it does maintain a graphically displayed list of the last 42 cards visited, no information is kept of alternative routes that may be followed.
NLS links are simply stored as part of the text of each node. Selecting the link reference causes a hypertext jump to be activated, but once again it cannot be easily distinguished as a link except by the user who understands the context. HyperTIES and ZOG also make use of links embedded in but displayed differently from the nodes' content. HyperTIES highlights the embedded link by displaying its name in a contrasting font; ZOG uses spatial highlighting, separating the links from the text. In both cases the `address' of the destination node is the same as the name of the link.
Guide is more perverse in that it has a clear idea of the existence of links and indeed cannot display the document without a knowledge of the state of each link, but it has a less-well defined concept of nodes. As has been explained, each Guide document is considered a continuous scroll of material with links `folding' and unfolding spans of material. However, the reference links to other documents are of the insubstantial kind.
Insubstantial links are by nature unidirectional (HyperCard has no ``come from'' command), which makes it difficult to model a naturally symmetric relationship between nodes and instead gives rise to a directed walk-through of a document. It also leads to the ``you can't get there from here'' phenomenon, where many links may lead into a node, but none out, leaving the reader stranded.
The granularity of the link source is also important. By limiting the positioning of links to an empty space at the bottom of the frame, the Memex forced the whole frame to act as the link's source. This of course makes it very difficult to know what is being linked from, i.e. what the key phrase on the frame is. The presence of many links on each frame makes the job harder still, a difficulty compounded by having complete frames as link destinations. Intermedia is much more flexible since it allows any span of text within a node to act as the link source. NLS, ZOG, HyperTIES and Guide all have specific words and phrases within the text that are bound to the link, whereas NoteCards, HyperCard and Acrobat anchor a link at a single position on a card. As has been mentioned before, HyperCard's buttons are fixed at a geographical position on the card rather than a logical position within the text, making it very cumbersome to edit a node.
All systems make it easy to follow the natural progression of a document whether it involves scrolling through a linear document (Intermedia, Guide) or tree-walking through a hierarchical structure of nodes using next-sibling and return-to-parent operations (ZOG, HyperCard). Of more interest is how cross-reference links are encountered--here there are several considerations for a reader to be aware of--how can a link be recognised (what is its visual representation), and once the link opportunity has been recognised how can the link be invoked?
A common side-effect of making a hypertext jump is losing the context of the original information since many systems can only display one node at a time on the screen. This adds to the disorientation of the user who is navigating through an unknown network of unfamiliar information. NLS contrives (under certain circumstances) to lock part of a node onto the display, keeping some familiar reference for the user despite being limited to a terminal-sized display. NoteCards and Intermedia allow many nodes to be displayed at a time and so avoid the dangers of losing the reader but at the risk of swamping him or her with too many pieces of concurrent information. Although the display can handle many windows some are bound to become partially concealed. This is made worse by the hoarding instincts of the reader who is frightened to put away any window `just in case' it is needed again. Guide has a novel approach to these problems as it performs an inline replacement of the link cue by the information stored at the link's destination. Hence clicking on a highlighted key-phrase in Guide may cause it to be replaced by a paragraph giving a fuller explanation of the phrase. In this way both the source and destination contexts are still visible to the reader and hyperspace disorientation is minimised.
The `hyper-' prefix in the word `hypertext' indicates `more than', thus hypertext is literally `more than' text. The advantages of nonlinearity, cross-reference jumps and multimedia information have to be balanced against some of the disadvantages mentioned above or the resulting system will implement hypotext, text which is less useful or less accessible than its print-bound counterpart.
In the following chapters we will examine the nature of text and how it can be adequately modelled by a computer. Then we will look again at hypertext, in particular a system developed by the author in 1988 which is based on these ideas of text. We then examine the use of computer environments which aid reading or writing both electronic documents and electronic non-linear hypertexts and then return to re-examine the models of hypertext beyond the `nodes and links' seen here.
In this chapter we look at the way in which texts themselves are constructed, how texts can be adequately represented on a computer and compare the way in which hypertext systems model a network of linked texts.
In a computing environment, `text' is usually thought of as a simple sequence of characters as defined by the ASCII encoding. In fact, even to the present day, the most prevalent type of document or file is the text file: a sequence of ASCII characters whose record ends define the line breaks for display purposes. However a text in its fullest sense consists of more than just its encoding and layout information. It principally contains high-level cognitive information, communicated in a natural language that the character coding and layout express. The expression of this embedded information is the crucial purpose of the text, and it is the writer's responsibility to construct the text so that it presents the information to the reader accurately and in a fashion which can be easily assimilated. In this way the intent or purpose of a text is not to produce a suitably formatted piece of paper, but to inform a reader.
Human cognition is often described in terms of a semantic network into which new facts are added through the learning process (see for example [54]). This gives rise to a close correspondence between texts and computer programs. A program specifies a set of actions which, when elaborated by a computing processor, produce a certain change of state in that system. Similarly, a text, when elaborated by a suitable cognitive processor, produces a change of state in that system, with an increased understanding achieved by an incrementally updated knowledge network.
Hierarchies bring structure to a text, but even sequence itself is a simple structuring tool, allowing a directed development of an argument or the build-up of context. A narrative text often makes use of parallel threads (sequences) which are interleaved throughout the text. Cross-reference is frequently seen in technical writing, allowing the author a means of emulating a network of ideas rather than a fixed hierarchy.
According to [141], the functions of superstructures become conventionalised in a given culture, leading to fixed schemas for the global content of a text. The following five superstructures are identified as being common to many text types:
Introduction: presuppositions and background
Problem: a twist on the state of affairs
Solution: resolution of the above
Evaluation: discussion of consequences
Conclusion: closing/summary
Stories, scientific papers, dramas and arguments are all identified as containing the above superstructures. Despite the variety of texts, the author is leading the reader towards a particular conclusion via a particular interpretation of the facts--a directed presentation. In each type of text, the structure acts both to direct and constrain the content--no introductory material is allowed to appear within a concluding section, nor is the conclusion allowed to precede the material which supports it.
.ce 1
.ft B
A title, a title, my kingdom for a title
.ft R
.sp 0.5i
In this chapter we look at the possible
Figure 2.1a: Nroff physical markup for a section heading
.H 1 "A title, a title, my kingdom for a title"
In this chapter we look at the possible
Figure 2.1b: Nroff mm logical markup for a section heading
Because of the tedious and repetitive nature of this `physically oriented' low-level typographic manipulation, markup languages adopt procedural abstractions (Figure 2.1b) which mirror higher-level physical document constructs like display paragraphs, hanging indents, bulleted lists and headings, and reflect a document's logical or abstract composition, such as its construction from chapters, sections and subsections, figures and tables. Emphasised text is no longer marked up with prescriptive physical commands to "change the font to italic", but with an abstract declaration that "the following text is emphasised". It is now the responsibility of the composition program to know how to suitably render emphasised text--in other words its role has expanded from dealing with page imaging semantics to dealing with document semantics. The advantage of this style of markup is that the author can concentrate on expressing ideas within an appropriate logical framework without worrying about issues of presentation that the compositor should deal with.
Markup systems which adhere to this philosophy (troff+mm, LATEX, GML) emphasise the logical nature of their markup, especially the facilities for expressing the document's overall hierarchical structure. However a closer inspection reveals that such markup is still implicitly tied to describing the physical layout of a printed document. In fact mm and LATEX markup for a `section' or `chapter' is defined in terms of lower-level primitives for changing fonts and leaving vertical space, and so is still a physically-oriented markup, rather than a truly logical one. Document structures like `sections' are catered for in name only; in fact one is really marking up the section heading alone. Another apparently separate component of the logical document structure, the footnote, only has any meaning in a paginated environment and may need to be re-interpreted as a marginal paragraph or an endnote in an on-line text presentation system.
SGML specifies each document architecture with a DTD (Document Type Definition) defining the hierarchy of structures which may compose the document. This architecture may be used by an interactive document editor to check the structure of the document being created, or by a document formatter to process the entire document. There is a strict syntax associated with the architecture which may be understood and verified by any SGML-compliant application, but each application is responsible for interpreting the meaning of the document structure, according to its requirements.
For example, figure 2.2 shows how a biographical dictionary may be marked up. To produce a printed document it may only be necessary to specially highlight the start of the entry, the name and dates of birth and death of the individual. The other tags may be completely ignored during formatting, with the text set as if they were not there. However, when forming a biographical database from the same document it may be deemed important to identify all the information marked above so that the database can be used to determine everyone who was educated at a particular university. Without the extra markup it would be impossible to pick out these details that make the data useful for many purposes apart from printing. The markup makes explicit the information that is embedded within the text, and this information can subsequently be reused in different ways.
<entry>
<biographand><name>John Smith</></>
<dob><day>12<month>June<year>194</dob>
<dod><day>1<month>Feb<year>1987</dod>
John was born in <birthplace><place>Edinburgh</birthplace> and studied <subject>English</> at <education><place>Southampton</> University</>, graduating in <graduation><date><yr>1956</graduation>. He married <spouse><name>Emma Jones</name></spouse> in 1962 and became <profession>MP</> for Southampton in 1975 until his death.
</entry>
Figure 2.2: SGML markup for an entry in a biographical dictionary
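The entry markup shown in figure 2.2 implies a document type definition governing which elements may appear and where. The following fragment sketches what such a DTD might look like; the element names and content models are assumptions made for illustration, not a DTD from any published application.

```sgml
<!-- Illustrative DTD sketch for the biographical dictionary entries of
     figure 2.2 (element names and content models are assumptions) -->
<!ELEMENT dictionary  - - (entry+) >
<!ELEMENT entry       - - (biographand, dob, dod,
                           (#PCDATA | birthplace | subject | education
                            | graduation | spouse | profession)*) >
<!ELEMENT biographand - - (name) >
<!ELEMENT (dob | dod) - - (day, month, year) >
```

Any SGML-compliant application could verify an entry against such a declaration, while remaining free to interpret the elements in its own way, as described above.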
According to [58] any document can be considered to have three parallel structures associated with it. These are the abstract representation, which is concerned with the logical structure of the information contained in a document and made explicit by some form of high-level markup; the physical representation, which is determined by a formatting process; and the page representation, which is defined by a viewing process. The physical representation corresponds to the document formatted for output on an infinitely long scroll, whereas the page representation is concerned with how the formatted representation can be mapped onto discrete pages. We have already seen that there are in fact a number of different structures which comprise the abstract document. The biographical dictionary example in figure 2.2 has a very fixed database-type micro-structure. The dictionary simply consists of an ordered list of entries, with no combining of data between entries into any higher level structure. In a report document, conversely, the superstructure would be combined with content-based macro-structures but probably without the detailed exposition of the microstructure.
In a text-processing environment, the technical author usually manipulates the abstract document (which is a union of the content of the document and the distinguishable markup interleaved with the content) via a text editor. This is normally done by treating the markup as text and providing a standard set of text manipulation functions which apply to both the content and the markup. Alternatively the editing process may provide the author with the same set of text manipulations but treat the markup separately by graphically interpreting it through the use of indentation, whitespace and highlighting (as found in IBM's LEXX editor). In a WYSIWYG environment the author is usually directly manipulating the physical representation (the-document-as-a-scroll) with all `markup' inserted invisibly and interpreted faithfully on the screen. There is little concept of the abstract structure, although `style sheets' give the illusion that such a structure exists by allowing logical names to be associated with groups of physical formatting specifications that are applied to specific paragraphs. Some WYSIWYG systems (e.g. Microsoft WORD) give the author access to the page representation, or the option of swapping between both representations as required.
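The relationship between these three structures can be illustrated with a toy pipeline; the line width, page length and sample content below are arbitrary choices, and the formatter is far cruder than any real composition system.

```python
# Toy pipeline relating the three parallel document structures:
# abstract (marked-up content) -> physical (an unpaginated scroll of
# formatted lines) -> page (the scroll cut into fixed-size pages).
# Sizes and content are invented for illustration.

abstract = [  # abstract structure: (tag, content) pairs
    ("title", "Structure and Hypertext"),
    ("para", "Hypertext techniques are now in everyday use."),
    ("para", "This chapter examines the construction of texts."),
]

def format_scroll(doc, width=40):
    """Formatting process: abstract -> physical (one long scroll)."""
    lines = []
    for tag, content in doc:
        line = ""
        for word in content.split():
            if line and len(line) + 1 + len(word) > width:
                lines.append(line)
                line = word
            else:
                line = (line + " " + word).strip()
        lines.append(line)
        lines.append("")  # blank line between blocks
    return lines

def paginate(scroll, page_length=5):
    """Viewing process: physical -> page representation."""
    return [scroll[i:i + page_length]
            for i in range(0, len(scroll), page_length)]

scroll = format_scroll(abstract)
pages = paginate(scroll)
```

The point of the separation is that the same abstract document can be re-run through a different formatter or a different pagination without touching the markup, just as the discussion above requires.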
Figure 2.3: A multimedia document
In a modern multimedia document environment, the models manipulated by the computer programs are no longer those of traditional printing technology, with common interfaces and operational semantics. A document may consist of a collection of video sequences, audio clips and computer animations as well as text. Abstract but physically-based markup can no longer be used to define `how to' present each piece of information because there is (as yet) no standard practice to follow for presenting non-textual information. In any case instructions such as `leave 2 seconds of space and then show this video clip in the top-left corner of the screen with that text next to it' leave little room for true hypermedia which necessitates user-directed interaction.
Although physically-oriented markup has a limited role in a truly multimedia environment, markup itself has a crucial role to play in describing each different component of the document, describing its representation and its purpose (especially for non-textual information). The markup can therefore be used in two ways:
i) To encode or represent the various document objects themselves
ii) To describe meta-information about the objects or their intended use.
For example, figure 2.3 demonstrates a document which consists of some text, a diagram and a video sequence. Figure 2.4 shows how it might be coded according to an SGML DTD, with the document structure used as a container to hold the various encodings of the document media (text, picture and video). The contents of the text objects are coded according to the DTD, whereas the diagram object is coded according to some external scheme and the video object as a mixture of SGML markup and external scheme. All the document objects (text, diagram and video) are thus wrapped in SGML markup and also carry special tags which give meta-information about the objects.
<mmdoc>
<element type=text>Welcome to the Department of Electronics and Computer Science. <p>Click on the map below.</>
<element type=diag size=7x8
rendering=winmetafile>
AA145367382A5...</>
<element type=text>The Department was formed in 1990 as a merger between the Departments of Electronics (Faculty of Engineering) and Computer Science (Maths). This has resulted in a successful partnership of hardware and software expertise.</>
<element type=video size=3x3 rendering=frames>
<frame num=1 timecode=002701>AA145367382A5...
<frame num=2 timecode=002702>AA145367382A5...
<frame num=3 timecode=002703>AA145367382A5...
</>
</mmdoc>
Figure 2.4: Representing the document with SGML
Some of the issues that a document composition system must deal with in converting between logical and physical structures (legibility, readability, hyphenation, typographic design) can be seen in [126], but the nature of the relationship between these structures varies between the different systems of logical markup. For LATEX and ms/mm, which are high-level macro packages built on top of specific text formatters (TEX and troff respectively), the high-level markup is simply a disguise for a sequence of physical formatting operations: each piece of `logical markup' is actually a set of font changes and spacing commands. ODA maintains separate physical and abstract structures for a document in parallel, and provides an explicit mapping between the two structures. SGML, by contrast, deals with no concept of a physical structure, and devolves all formatting issues to separate application programs. (In fact the SGML LINK facility [29] can be used to provide stylesheet-like formatting capabilities, and the forthcoming DSSSL standard [2] defines an ODA-like mechanism for mapping SGML structure onto a physical structure.)
Extending the hypertext model to include typed links allows a structure to be imposed on the information content. The most frequently used model for this structure is that of human memory. Bush's paper held the view that human memory worked in the same fundamentally unstructured way as the associative links provided by the Memex; however, modern theories of human cognition favour the semantic network [78], which, as a network of nodes joined by typed links, is congruent to a hypertext network with typed links. Various authors have tried to identify sets of link types sufficient for structuring a hypertext--[52] advocates the use of seven link types: being (subset relationship), showing, causing, using, having, including, and similarity. Nelson [109] provisionally suggests a large number of link types for Xanadu including correction, comment, translation, quote, expansion, suggested pathway and citation. Xanadu link types are entirely arbitrary, but in general the set of link types should be both small enough to maintain a rigid structure and large enough to be generally applicable in all situations.
Whereas the links model the relationships in a semantic network, the nodes of the hypertext are the information content proper. For a semantic network, each node has a fine information granularity, dealing in individual concepts or propositions, i.e. textual micro-structures where each node is a self-contained entity and does not rely on a global context for its meaning. Even at this level the hypertext network may be constructed to model one of two alternative semantic structures. It may either plot the relationship between facts and concepts in the knowledge domain or depict the cognitive structure of the expert's understanding of the field. The latter choice is seen as a powerful tool since the goal of education is to transfer the expert's cognitive structure to the novice [132].
Although the cognitive structure approach to hypertext provides a sound model of knowledge and concept, and is a popular model for implementing hypertext systems [77, 81, 84], it has no model of the text itself. Begeman & Conklin document the difficulty of comprehension for readers in such a hypertext environment, since, however clear the concepts in each individual node are, it becomes impossible to track the thread of these concepts through many linked nodes [10]. The author is limited to expressing ideas in fine-grained, distinct units which obscure the overall development of any larger ideas, i.e. there are no macro- or super-structures. Traditional text with its linear form allows for the development of ideas and arguments building on the successive disclosure of individual points and concepts. In such a medium there is an evolving context that can be built on: in a hypertext medium there is no such context as there is no `correct' path through the network, and no guarantee that the reader will have encountered a particular piece of information.
Hence, in a cognitive structure there can be no larger theme or overall point of view which directs the authoring process as the semantic network deals only with atomic facts or propositions and their interrelationships. One of the first rules of writing taught at school is that a text should have a beginning, a middle and an end. This is not the case in a cognitively structured hypertext since there is no ordering or natural sequencing imposed upon the information.
Since links (as opposed to buttons which anchor the links to a position on the computer screen) have no independent existence, no extra information can be attached to them in the form of names, types or attributes. It is also not possible to provide a graphical browser to aid navigation through the network (this statement refers to the `plain' package--since HyperCard contains a powerful programming language it is possible to emulate more sophisticated hypertext features).
Other systems (such as Intermedia) do implement links as first-class objects, and allow direct manipulation of nodes, links and the network as a whole. These implementations also provide for extra information (in the form of tables of attributes) to be attached to nodes and links for use in the navigation process, allowing the reader (or the system) to filter links according to some criterion of relevancy.
As an example, the gIBIS system [10] has a small number of node and link types that allow the reader to differentiate between the kinds of information that are connected to any particular node. When viewing a node the reader may choose to follow up other information which supports, contradicts or in some other way relates to the current issue. All this information is deducible from the type of the individual links, and allows the reader to make decisions about the linked material without traversing the link. A similar phenomenon is demonstrated by McAleese's [51] method of qualifying citations for traditional texts. By providing extra information about the type of the citation readers can judge whether the reference is relevant to them.
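The kind of link filtering described above can be sketched as follows; the node names, link types and query function are invented for illustration and are not gIBIS's actual vocabulary.

```python
# Sketch of filtering typed links, in the spirit of the gIBIS-style
# node and link types described above. The identifiers and link types
# are invented for illustration.

links = [
    {"from": "issue-1", "to": "pos-1",   "type": "responds-to"},
    {"from": "pos-1",   "to": "arg-1",   "type": "supports"},
    {"from": "pos-1",   "to": "arg-2",   "type": "objects-to"},
    {"from": "issue-1", "to": "issue-2", "type": "generalises"},
]

def outgoing(node, link_type=None):
    """Links leaving `node`, optionally filtered by type, so that a
    reader can judge linked material without traversing every link."""
    return [link for link in links
            if link["from"] == node
            and (link_type is None or link["type"] == link_type)]

# A reader at pos-1 can ask only for supporting material:
support = outgoing("pos-1", "supports")
```

Because the type lives on the link itself, the filtering decision needs no access to the destination nodes, which is precisely the property that makes such relevancy queries cheap.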
Structure in a hypertext network is the organising principle that determines how the individual nodes are arranged and related to each other through the links. Typed links provide a useful means of structuring the hypertext network by placing some organising criteria on the information accessible from each node in the network. Such an organisation of the relationship between nodes in the network may, according to the choice of link types, reflect the relationship between the propositions in the knowledge domain, the organisational structure of the author's understanding, or a higher level text-oriented structure.
The choice between various kinds of structure as a basis for a hypertext model has consequences for the users of the hypertext. We have seen that the primitive associative-semantic structure provides no context and provides certain difficulties for effective use by readers. In the next chapter we look at other structuring mechanisms which provide better navigation facilities.
In [40] Cole & Brown, remarking upon the similarities between paper documents and hypertexts, state
"it also seems sensible to make provision for readers to have the advantage of hypertext navigation when viewing a document on screen, even if the document is eventually intended to be read from paper. These aims could best be achieved by having a common underlying representation for the structures of both types of document, together with well-defined ways of mapping these structures into different forms of representation... it is not suggested, of course, that a document designed for paper would necessarily make a good hypertext or vice versa, only that a usable representation should be readily available by applying different presentation styles."
Here we describe LACE [38, 118], a hypertext presentation environment built on the LATEX document production system. LACE turns each document into a database of components which can be individually addressed. A document can be viewed as a contiguous whole or have isolated components extracted. The logical structure of the document is important because it provides both coherence to the document as a whole and a mechanism for deconstructing the document.
LACE addresses the goal of automatic hypertext generation by producing a hypertext from the original sources used to create a paper document. LACE's hypertext viewer uses a document's explicit structural information (chapters, sections, floats, marginalia) and existing navigation structures (table of contents, index, citations).
The advantage that the Memex had was a huge increase in speed and convenience for the user--no more walking through miles of bookshelves or flicking through hundreds of pages to locate a single piece of information. Instead, everything was to be available through the motion of a lever. One of the appealing characteristics of the Memex was that it gave the reader just what they were used to: complete pages of text (lots of information available at a glance) which were designed especially to make the reading process easier. Use was made of both horizontal and vertical white space to set off important information and to help divide the text into units of paragraphs, sections and chapters. Different styles of letter-shape were used to further highlight and draw attention to important information. All this was available because the Memex gave a photographic reproduction of the original texts at their original size.
Limitations of size and layout hinder many hypertext systems from fulfilling the fundamental goal of hypertext. Small fixed node sizes force the author to break material into unnatural chunks thus hiding information from the reader, and the lack of typographic devices means that the reader finds it harder to locate information. Typically the author is forced to use excessive spacing in the layout to force items to stand out which in turn exacerbates the problem of size.
A further limitation is the insular nature of these systems--documents have to be explicitly authored within the system and can only be read using it. There is a need for open standards of access when building up a network database of literature so that one system may act as a hypertext viewer to a set of documents, while another can perform textual criticisms of the documents' contents and yet another may perform automatic content analysis. A great deal of effort must be put into creating an online literature database; it is imperative that the result is easily extensible and reusable.
Centuries of use of paper-based books and journals have led to many developments in their presentation which have enhanced the way in which we extract information from the page. The techniques of typography and layout to which we have already alluded, footnotes, indexes, tables of contents, citations and bibliographies all help us to navigate through printed material. Even physical attributes of the document (such as the relative thickness of the document) provide navigational clues when we read [11]. It is important that problems of readability and the convenience of the user-interface are solved, otherwise that which we call hypertext (the prefix indicating facilities in excess of a normal text) actually becomes hypotext: substandard literature which is no longer as useful as its original paper form.
LACE was conceived as a solution to some of these limitations. Instead of a fixed array of characters, LACE supports typeset pages, with different font styles and sizes used as they would be in a printed document. A page allows more information to be presented to the user, so reducing the problem of information fragmentation. LACE avoids insularity by using documents that conform to various common generic markup schemes (LATEX, WEB and troff's man macros) allowing them to be viewed in a hypertext environment or formatted for printing without modification.
In the following sections LACE will be explained according to the various concerns of a hypertext system: how to represent, store and retrieve documents (back end issues); what facilities to provide for browsing published documents and navigating through the body of published works (reader's front end); and what facilities to provide for creating documents and for linking them into the existing body of literature (author's front end).
logical markup allows authors to make their intentions explicit
logical markup allows a document to be `ported' easily between diverse applications and systems
logical markup is very commonly used within the academic community
It is the first reason which is in fact the touchstone of LACE's approach to documents. An academic document often takes the form of a reasoned argument, and an argument involves a sequential development of points and a hierarchy of ideas and information which support and contribute to the main thesis. In a similar fashion a technical document frequently follows the structure of lexical taxonomy, in which the discourse proceeds from a general class to the subclasses and their particulars. The structure of such documents allows the reader to understand the contribution that a particular statement makes to the overall argument or theme and to make relationships between ideas that are being developed and ideas that have been previously established. If the structure is not stated clearly enough then readers are left to their own devices to make decisions about the function of a new piece of information, whether it is a subsidiary point of a previous topic or whether it stands alone as a major item in its own right, leading to ambiguity and confusion. Tyler [140] argues that the understanding of a document presupposes that the text as a whole is composed of a hierarchy of parts, and that comprehension of the text comes from construing those parts. The structure with which logical markup systems usually deal is a mixture of the macro-structures and superstructures of discourse analysis--the sections, subsections and sub-subsections are often used to dissect the subject-specific content, while their agglomeration into chapters and complete documents is controlled by the genre's (implicit) superstructure.
The use of logical markup which mirrors the logical structure of a document (structural markup) reinforces the semantics of the text that is being created. This is often helpful to the author as it coincides with the process of creating an outline of the document which both directs and constrains the authoring process. It is also helpful to the reader because the formatting process may use the structural semantics to provide visual clues to aid comprehension of the flow of the text (for example, the titles of key points may be emphasised by representing them in a bold font while subordinate information may be separated from major points by extra vertical and horizontal space). It is also useful in a hypertext environment, because it makes explicit both the division of information amongst separate nodes and an initial set of links that can be established between those nodes--in effect providing a method of automating the production of a hypertext network from a `flat' document.
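The derivation of nodes and links from structural markup can be sketched very simply: scan the sectioning commands of a LaTeX source and emit one node per element, plus a contents link to each. This is only an illustration of the principle; LACE's actual mechanism differs.

```python
# Sketch: structural markup drives automatic hypertext generation.
# One node per sectioning element, one contents link to each node
# (an illustration of the principle only, not LACE's implementation).
import re

SECTIONING = re.compile(r'\\(chapter|section|subsection)\{([^}]*)\}')

source = r"""
\chapter{Structure and Hypertext}
\section{Logical Markup}
\subsection{Why SGML Was Not Used}
"""

nodes = []   # one hypertext node per structural element
links = []   # links from the table of contents to each node
for level, title in SECTIONING.findall(source):
    nodes.append((level, title))
    links.append(("contents", title))
```

The hierarchy of sectioning commands thus yields both the division into nodes and an initial navigation structure for free.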
SGML would seem to be the natural markup scheme for use with LACE, but LATEX [85] and troff's man package were chosen instead because of their widespread use within the academic community. In both cases the markup is actually implemented by a programming language embedded in a lower-level formatting engine (see [8] and [85] for full descriptions). Of the two, LATEX is more widely used as it has been ported to all major computers and is compatible with all major printer types, whereas troff is mainly available on UNIX systems. LATEX is also the more flexible since it defines a one-to-many mapping between the abstract and physical structures by use of different document styles. Support has been provided for both troff's man macros and LATEX, although the description of LACE that follows will assume the latter.
As well as reinforcing the semantics of the text, structural markup frees the author from making decisions about the visual design of the document. This is especially important when the physical representation of the information may change radically for each different publication medium (computer screen, low fidelity computer printout or book). One of the goals of the LACE project is the reusability of literature. It is important that documents authored for one system are capable of being used in another, making the use of a text-based interchange format appropriate. LATEX allows the author to `plug in' different document styles to radically alter the physical representation of the document. LACE extends this capability by defining a hypertext style that formats the elements of the document's structure for display as nodes in a hypertext network.
The publishing process which makes a document accessible to the world at large involves entering it into a host-wide database of documents presided over by the document librarian, also known as the lace dæmon. The database holds information about the document such as its title, keywords that sum up its contents, its access permissions, its type (video, LATEX, WEB or man) and its location in the filestore.
In LACE the structural elements of the document are the nodes of the network--each element of the document may be individually addressed by naming its position within the structure and the title or number associated with it (e.g. `abstract', `chapter 1', `section Troubleshooting and Diagnostics' or `table 3.5'). As well as this, each document is published by entering its details into a site-wide database. A librarian process listens for requests for individual elements from particular documents, and displays them in an appropriate fashion on the console.
Each individual document is itself a database--a database of logical elements (in the logical markup sense) and the relationships between them. The document librarian is a process that listens over the network for requests of the form Far From The Madding Crowd:chapter 4. The librarian then looks in its database to find the document whose title is Far From The Madding Crowd, checks that it has public access, and works out where it is stored. The document is then inspected, and chapter 4 is extracted from it and sent over the network to the user who requested it. Each document also has a short nick-name stored in the database along with the title, so that requests don't become cumbersome to type!
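The librarian's request handling can be sketched as follows. The record fields and the nick-name mechanism shown here are illustrative assumptions; LACE's actual database format is described later.

```python
# Sketch of the librarian's request handling: parse "Title:element",
# resolve nick-names, and check access (record fields are assumptions).
catalogue = {
    "Far From The Madding Crowd": {
        "nickname": "madding",
        "access": "public",
        "path": "/docs/hardy/madding.ps",
    },
}
# nick-names resolve to the same catalogue record
nicknames = {rec["nickname"]: title for title, rec in catalogue.items()}

def resolve(request):
    """Turn a 'Title:element' request into (file path, element name)."""
    title, _, element = request.partition(":")
    title = nicknames.get(title, title)        # accept nick-names too
    record = catalogue.get(title)
    if record is None or record["access"] != "public":
        return None
    return (record["path"], element.strip())
```

Both `madding:chapter 4` and the full title address the same element, which is what makes the nick-names a pure convenience.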
The structure-based addressing scheme acts to unify various document types. Videodisks are published divided into chapters which can be logically subdivided into different video sequences or `sections'. This allows links to be made to various media: the librarian process is responsible for displaying the various data types in an appropriate fashion (a window on the console, a sequence on a separate video display monitor or an audio sequence from a CD player). In this sense, LACE is a hybrid hypermedia system, as explained in section 1.2.1, as it allows simple access to information in different media.
LACE uses all the facilities provided by the markup scheme to present the document to the reader in a useful way. For example, the author's use of chapters, sections and subsections will not only allow the hypertext machinery to request that particular element, but also provides a table of contents with buttons to call up the sections automatically. This information is also provided as a menu and in pictorial form as a tree.
To publish a paper, it must first of all be converted into a form that the LACE librarian will accept. For LATEX documents this involves adding the lace documentstyle option at the head of the document and running latex as normal followed by hyperdvi. WEB documents (from TEX's structured programming language [82]) should first be processed by tex (but should \input hyperwebmac instead of the standard webmac), and then by hyperdvi. Troff manual pages are processed by hypertroff -man. All these processes create a .ps file that will be used by the librarian, and a .ps.map file that indexes the PostScript file by the original document's logical structure.
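The role of the .ps.map file can be sketched as a mapping from logical element names to the PostScript pages that hold them. The format shown here is an assumption made for illustration; the real file format is not reproduced.

```python
# Sketch of the .ps.map index: logical element name -> (first, last)
# PostScript page. The format is an illustrative assumption, not the
# actual file layout produced by hyperdvi.
ps_map = {
    "abstract":   (2, 2),
    "chapter 1":  (3, 14),
    "section Troubleshooting and Diagnostics": (15, 19),
    "table 3.5":  (17, 17),
}

def pages_for(element):
    """Return the (first, last) PostScript pages holding a logical element."""
    return ps_map.get(element)
```

Given such an index, serving a request for a node reduces to extracting a page range from the formatted PostScript file.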
Publishing consists of making an entry in the librarian's database. This is accomplished by running the command lace -n foo, replacing foo with the name of the file that holds the document. The author is then prompted for a number of pieces of information (such as keywords describing the content of the document), and the database is updated.
At a casual glance, LACE provides a LATEX previewing facility, and allows the page to pass as part of the logical structure. This is convenient in allowing the reader to make use of the familiar book metaphor. When a request is made for a document without specifying any substructure the librarian process returns `page 1'. The LACE menu has the usual TEX previewer functions of moving between pages as well as the hypertext capability of stepping backwards through the list of nodes seen so far. The Table of Contents, List of Tables and Lists of Figures, usually seen in the document's front matter are also turned into submenus which bring up a new window containing only the appropriate document element. As well as the menu these table structures have buttons placed over each line, so that clicking on the line in the Table of Contents which refers to section 3.3 also brings up a new window containing that node.
LACE buttons are transparent `patches' which respond to a mouseclick by sending a request for a document part to the librarian process. Visual cueing is left to the document style or author to decide, though it is anticipated that the author will never have to explicitly request a hypertext link. Instead, LACE attempts to infer as many links as possible from the structural markup. This is done not only for navigation structures like the contents list and lists of tables structures, but also for cross references and citations.
Some of the document's physical representation has been changed for more suitable behaviour in a hypertext environment. Document formatters make every effort to move some structures (tables, figures and footnotes) out of the main body of text as they interfere with the flow of the discourse. Usually they are `floated' to the top of a new page or a page by themselves to minimise their interaction with the surrounding text. In a hypertext network they can be taken out of the containing node entirely. Tables, figures and footnotes are displayed by clicking on cross-references to them.
Figure 3.1: The title page of a LACE document
Figure 3.1 shows the title page of a LACE document. The window is surrounded by a frame with the title of the document, and an indication of the number of pages through the document (here on page 0 out of 26). In the top left-hand corner is the close button that shrinks the window to an icon when it is pressed with the left mouse button. In the bottom right-hand corner is the adjust button that shrinks or expands the window by clicking on it with the left mouse button and dragging the window until it has the required size. The bottom left-hand corner houses the zoom button which magnifies the size of the page by clicking on it with the left button (zoom in) or shrinks it with the right mouse button (zoom out). The upward, downward, left and right facing arrow-heads are scroll buttons that are activated by the left mouse button. To move the window, click anywhere in the frame and drag with the middle mouse button.
The menu is brought up with the right mouse button. The menu items Next Page and Previous Page advance through the document and back, one page at a time. Goto Page brings up a submenu with the page numbers that can be chosen directly, and Back returns the reader one at a time backwards through the list of pages that they have visited.
The Contents submenu lists the major document structures (exactly like a table of contents) and brings up the appropriate part in an independent window. It helps the reader to navigate quickly to the information wanted, as long as its position is known. There are also submenus for Figures and Tables which will only appear if the document had any figures or tables in it. A new window with the appropriate figure or table will be displayed when the reader makes a choice from one of these submenus. Since the object has been `floated out of the document' and away from the main flow of text the user will only see the object by using these menus, or by clicking on an explicit reference to that object.
The Add to Trail menu item puts a reference to the current element on display in the reader's trail. A trail is simply a document which is composed of a list of references to other documents, with the effect that a list of interesting references can be saved and replayed later. An automatic trail is also kept which consists of every document element seen, whether interesting or not! The Gotos and ComeFroms items give submenus detailing the other document parts (or nodes) that are referenced by this node or that reference this node, respectively. This enables the reader to jump to any piece of information that has been marked as being relevant to this node. The last item, Zap, destroys the window.
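Since a trail is just a document made of references, saving and replaying one is simple bookkeeping, as the following sketch shows (the data structures are illustrative assumptions).

```python
# Sketch of trails: a hand-picked trail plus an automatic history of
# every element seen (illustrative data structures, not LACE's own).
trail = []            # the reader's hand-picked trail of references
history = []          # the automatic trail of everything seen

def visit(element):
    """Record every document element displayed, interesting or not."""
    history.append(element)

def add_to_trail():
    """Save a reference to the element currently on display."""
    trail.append(history[-1])

visit("intro:section 1")
visit("intro:table 2")
add_to_trail()        # only the table is judged interesting

def replay(trail):
    """Yield each saved reference in order, as the reader replays the trail."""
    yield from trail
```

Because the trail is itself a document of references, it can be stored, published and replayed like any other LACE document.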
Figure 3.2 shows the end of a (somewhat hectic) session which demonstrates many of LACE's features. The title page of a document similar to that shown in figure 3.1 is in the top left corner, as are further pages. Page 1 is shown slightly to the right, displayed in the same fashion as a printed page, but with a number of hypertext effects added, all implemented as buttons on the page. A button is an invisible patch placed over a significant word or phrase on the window. Since it is invisible, it is left to the typesetting software to mark the phrase as `active', generally by using a different font. Each document style will choose its own convention for displaying important material. When the mouse pointer is moved over a button the cursor is changed to a cross to indicate that a hypertext jump is available.
Buttons have been placed over each entry in the table of contents, list of tables and list of figures. Clicking on any line in the table of contents or list of tables performs a jump to that part of the document--a button has been placed around each line of the table of contents that sends a request to the LACE librarian for the specified node. Footnotes are treated similarly. Each footnote marker (usually a superscripted number or asterisk) has a button placed over it. When the button is clicked, the text of the footnote is displayed in a new window. References to other parts of the document are also given buttons: clicking on a piece of text that looks like for further details, see section 4.5 will bring up a new window with section 4.5 in it. Citations of other works are treated similarly: the citation markers (like `[8]' or `Rahtz85') are given buttons that bring up the full reference from the document's bibliography. There is currently no method for bringing up that document, even if it is held in the system.
To the right and overlapping with the first page in figure 3.2 is a window containing the Introduction section from the same document. It contains some of the text which is on the first page, but is not finished by the page boundary, containing all the text from section 1 and any enclosed subsections. The long window over the top, slightly obscured by the menu is a footnote window, brought up by clicking the (obscured) footnote marker. Below the title page is an experimental table of contents, produced by the LATEX typesetting software as part of the LACE document style, intended to give an alternative form of graphical navigation. The other windows are section 1, a table and a figure from another document. Appendix 2.1 contains further information on the use of LACE along with details of its implementation.
LACE allows a large degree of access to multiple information types by virtue of its unified data addressing scheme. Many of these types of information are provided by the typesetting software (diagrams, tables, complex mathematical equations, graphs) and are really part of a fundamentally textual document. Video information truly is a different medium, although it is not currently fully integrated into the environment since it is not currently possible to have a video node (or a subpart of a video node) as the source of a link. However, the generality of the addressing scheme allows computational information types (such as dynamic references to tables in a relational database) to be intermixed with text and video.
The large screen which LACE uses to display its windows allows a large body of information to be shown to the reader at once. This allows the reader to work in the familiar book paradigm, using its familiar navigation and browsing techniques (tables of contents, indexes, cross references).
Links are not currently first-class objects, but are simply references to the destination address invisibly embedded in the text. However, much can be done at the time of publication, including building a list of the links into and out of each node. The role often played by a graphical browser is adequately given by the table of contents. Changing the publication process so that links are included in the document's map file is sufficient to make links into first-class objects, allowing the creation of a true graphical browser.
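Building the Gotos and ComeFroms lists at publication time amounts to inverting the link list, as this sketch illustrates (the link records shown are assumptions for illustration).

```python
# Sketch: derive the Gotos and ComeFroms submenus by inverting the
# list of links recorded at publication time (illustrative records).
from collections import defaultdict

links = [                     # (source node, destination node)
    ("section 1", "table 3.5"),
    ("section 1", "section 4.5"),
    ("section 2", "table 3.5"),
]

gotos = defaultdict(list)     # outgoing links from each node
comefroms = defaultdict(list) # incoming links to each node
for src, dst in links:
    gotos[src].append(dst)
    comefroms[dst].append(src)
```

Once both tables exist, a graphical browser needs nothing more than this inverted index to draw the network around any node.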
An advantage of LACE is that it allows reusability of each document. The same version that was produced as a paper report can be stored as part of a hypertext network because of the generic nature of the markup that is used to describe it. If hypertext is to succeed as a medium it is just this sort of integration which is necessary, otherwise a prohibitive expense will be incurred in redocumentation. Practical experience shows, however, that authors tend to write for a particular medium, and even for a particular house style, especially if they are not anticipating the reuse of their efforts. Often this involves abandoning logical markup and mixing text formatting commands with the logical markup to gain a specific layout effect. Unfortunately, this meant that it was frequently necessary to alter (although one might say "improve") the document source to produce a suitable hypertext conversion.
A disadvantage of LACE is that it is based on a non-WYSIWYG process, i.e. there are two versions of the document: the one that the author wrote (the LATEX document) and the PostScript version that the typesetting software created. Because of this it is difficult to make any changes to a document, including adding new links, without resorting to a `recompilation' process. Since the typesetting software also destroys the sense of the document (turning it into a list of characters and white spaces) it is difficult to provide a dynamic querying of the document. LACE makes up for this deficiency by searching the source of the document and returning the appropriate node from the compiled version.
Acrobat, a more recent document environment than LACE, turns each document into a database of objects which can be individually addressed. Unlike LACE, the document is structured not according to its logical content, but its physical formatting characteristics. Acrobat does not attempt to automatically generate hypertext from an existing document, instead it `normalises' the document's formatted representation and provides a viewer which can add hypertext links and annotations to the document. Both DynaText [46] and Grif [116] (also more recent systems) are similar in concept to LACE, but use SGML style markup instead of LATEX. DynaText directly interprets the SGML markup to produce a formatted physical structure, and so does not suffer from the disadvantages of LACE's compilation process. DynaText also provides full-text indexing of its documents, whereas LACE only deals with a document's gross structural elements.
The end result of the authoring process for a paper document is not a database of facts but a discourse or directed presentation whose structure is a part of the meaning of the discourse [135]. LACE exploits this explicit structure of presentation as a navigation tool for browsing the hypertext. Grif [116] and DynaText [46] also perform a similar function for the structure of documents expressed in SGML.
Nanard and Nanard [106] also work with a model of structured documents, but extend it with parallel structures representing knowledge domain concepts and task domain concepts. Explicit relationships (or links) are set up by the author between the various layers of structure (from the document to the knowledge base or from the knowledge base to the task descriptions), so that the system can use these different information sources to improve the reader's navigation around the document layer. In this system, document structure is augmented by a content-based knowledge structure as a navigational utility, and can be used to generate new views of the original documents (synthesised documents). See section 3.2.4 for a description of a system which provides this kind of layering of structures to aid authoring.
Others have also seen structure as a means of navigating a hypertext, even when structure is not explicitly present in the network. Such post-hoc structuring techniques are explored in [18], where graph-theoretical algorithms are used to identify abstractions such as aggregate concepts within a network. Another post-hoc method of imposing structure on an existing hypertext to aid readers is given in [94], where implicit structures are derived from the spatial relationships of a displayed hypertext network. Salton uses textual comparisons to identify similarities between groups of texts, and thus creates a linked text structure to allow navigation [127].
Hypertext studies have looked at the mechanisms of joining and intertwining information units, of the mechanics of hypertext jumps, the technology which supports reading and analyses of comprehension. What is frequently lacking is the context of a complete document life-cycle in which to fit the finished hypertext: how is information presented for reading, and how is a document composed from the raw information.
Structure can be used not only to present pre-existing material but also to direct and constrain the authoring process. This section describes an extension to LACE to encompass the authoring stage of the document life-cycle by providing a simple model of the authoring process.
The previous description of LACE has shown how a document's logical structure is useful for imposing a hypertext presentation upon it and why that should be so: a document is created and shaped according to the rules governed by its superstructure. This section covers a model for authoring new material in a hypertext environment. LACE utilised "off-the-peg" literature, while other systems have assumed that pre-written (or pre-planned) material is to be imported chunk-by-chunk [72] in an author-unfriendly fashion. Hutchings describes in chapter 8 of [72] the necessity for a complete storyboarding process to take place away from the hypertext system in order to establish the required structure and contents without confusion. Most realistic writing assignments, however, involve an author starting from scratch without the benefit of a file containing the contents of the finished assignment.
LACE-92 is a prototype system which implements a simple model of authorship in which an author first of all researches (i.e. searches for relevant ideas and information), then chooses the information appropriate to the task, then organises the information within an informal structure. This structure develops iteratively into the logical structure of the authored document, and is used to create a new LATEX or SGML document containing the references and quotations chosen in the earlier stage.
This model assumes that the author is provided with more than enough source material (or information) to produce the finished work, and places the author in the role of a sculptor, chiselling away unwanted material to reveal a work of art beneath.
LACE-92 uses WAIS information retrieval techniques (see section 4.2 for a full discussion of WAIS) to provide the author's set of source documents from network-based information servers. Much of the material would be from pre-published works, and may be in a structured, hypertext format, allowing the chosen sources to act as transclusions or links back to their native hypertext networks. In this way, a new document will automatically be linked into the existing literature from the very start.
The partitioning and sorting described above proceeds on a network diagram, where the overall structure of the network is constrained by the document's chosen superstructure. When these processes have run to completion, the author has produced a skeleton of the finished document. It contains a framework consisting of the points and themes which are to be presented, with the sources placed appropriately. The framework is fully linked with the source literature, and, by virtue of the chosen document structure is also internally linked. The network browser could also facilitate the production of high level summaries, automated tables of contents, citations, references, footnotes, glossaries and other physical document structures.
What goes into the network is mainly "information" and "quotations" which are linked back to the original sources. However the final stage of production is a written document, not a handful of transcluded texts and citations, so it is necessary to do some real writing! This may be achieved in the browser, but the whole framework can instead be exported as an SGML or LATEX document for editing in a conventional environment.
Figure 3.3: The WAIS information-gatherer component of LACE-92.
LACE-92 exists as a prototype implemented in HyperCard, and is based around two HyperCard stacks. One (see figure 3.3) is for information retrieval using the WAIS protocols and a WAIS-like interface and the other (see figure 3.4) for manipulating the relevant retrieved information.
According to the simple authoring model of Lace '92, the author starts with a writing task which involves selecting relevant material from a library of information. This is done by choosing from among a menu of information servers, each typically concerned with a single topic. The field at the top left of the WAIS window in figure 3.3 shows that the user has chosen the database of CACM articles at the Internet site quake.think.com (these are a set of articles on the subject of hypertext which were originally published in [1]). The field at the top right of the window shows that the user has asked for articles about "hypertext" and the scrolling field in the middle of the screen shows the list of articles returned by this query. The user clicks on one of the lines to retrieve the full article which is then displayed in the bottom field.
Figure 3.4: The information-organiser component of Lace'92.
The user may select any text which is relevant to the task in hand, and by dragging it offscreen can have it placed in a new field in the organising window (figure 3.4). Along with the selected text, the system stores the details of the chosen article's remote storage address, the server it was obtained from and the offset of the selection from the start of the article. As the user builds up more and more of these selections they can be easily moved around the screen to reflect some form of incremental organisation. Each text selection can have its enclosing field moved, resized or deleted, and the text can be reformatted with single keypresses.
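The bookkeeping described above can be sketched as follows. This is a modern illustration, not the HyperCard implementation, and the field names are assumptions rather than LACE-92's own:

```python
from dataclasses import dataclass

# A sketch of the provenance record LACE-92 keeps for each dragged-out
# selection: enough information to link the quotation back to its
# source on a remote server. Field names are illustrative assumptions.

@dataclass
class Selection:
    text: str        # the quoted passage itself
    server: str      # the WAIS server it was obtained from
    document: str    # the remote document's identifier
    offset: int      # offset of the selection from the start of the article

    def reference(self) -> str:
        """Render a link back into the source literature."""
        return f"{self.server}:{self.document}@{self.offset}"

sel = Selection("Hypertext is non-sequential writing.",
                "quake.think.com", "cacm/hypertext-88", 1024)
print(sel.reference())
```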
Once a collection of document fragments has been retrieved it is partitioned and sorted by laying the information out as an (informal) network. The network structure of the information is expressed by its position on the screen, with similarly relevant pieces of information being clustered together (this use of spatial layout is also described as a way of creating structure in [92]). To alleviate clutter on small screens, it is possible to select the nodes in a cluster and have them replaced by a named aggregate node which expands to a new (subordinate) network diagram by choosing New Partition from the Lace '92 menu. The network of document fragments should be created according to the required document superstructure, although this is not enforced. The collection, sorting and partitioning phases may go on in parallel and may be iterated many times, but once the user is ready to proceed with the writing phase of the task (combining the collected quotations and evidence into an original work) then the current state of the document can be dumped (in SGML or LATEX format) to a text file by choosing Export Structure from the Lace '92 menu. Not only is the partitioning reflected in the heading structure of the file, but also each quotation is recorded with its reference from the WAIS retrieval, so that this new piece of writing is created with `links' back into a wider body of literature.
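The Export Structure step can be sketched in outline. This is a minimal modern illustration assuming a nested-dictionary representation of the partitions; it is not Lace '92's actual HyperCard code, and the sample data is invented:

```python
# A sketch of mapping Lace '92 partitions onto LaTeX sectioning
# commands: each level of partitioning becomes a deeper heading, and
# each fragment is emitted under its partition. The nested-dict data
# layout is an assumption for illustration only.

def export_latex(partition, depth=0):
    headings = ["section", "subsection", "subsubsection"]
    lines = []
    for name, contents in partition.items():
        lines.append("\\%s{%s}" % (headings[depth], name))
        for item in contents:
            if isinstance(item, dict):          # a subordinate partition
                lines.extend(export_latex(item, depth + 1))
            else:                               # a quotation or fragment
                lines.append(item)
    return lines

doc = {"Background": ["Early systems...",
                      {"Memex": ["Bush's 1945 essay..."]}]}
print("\n".join(export_latex(doc)))
```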
Marshall & Shipman [94] point out that although unconstrained hypertexts are the norm, embedded constraints can aid the coherence and consistency of a network. Many systems which allow unconstrained hypertext support the author by making it easier to create consistent structures which are constrained according to some chosen type or model. These constraints may take the form of standard structural components which can be `plugged' together to produce the overall network. Both NoteCards and Intermedia developers have recognised the need for this kind of author support. The NoteCards approach is the Instructional Design Environment [79] which provides structure accelerators, in the form of template cards (containing prototype text and links for various styles of card), automatic links (which create a new anchor, link and card of the appropriate type in one operation) and a structure library (which is a global store of named, contentless hypertext structures taken from specific instantiations of networks of cards). Intermedia provides Hypermedia Templates [39] which are a combination of IDE's template cards and structure library, containing both node prototype contents and inter-node links.
Both the Instructional Design Environment and the Hypermedia Templates provide "fill-in-the-blanks" support for authors to be able to rapidly create larger hypertext networks, but provide no support for the cognitive disciplines involved in writing (creating the node content). The cognitive overhead required for producing the hypertext form is lessened, but support is given not for what to say or how to say it but only where to say it.
Various authors have proposed hypertext models containing composite nodes, i.e. nodes which `contain' subnetworks of linked nodes [49, 91, 130]. These models help authors to compose large complex artefacts from smaller subgraphs, also resulting in improved human comprehension of the network structure. The HyDesign model [91] also helps the author to structure the network by providing aggregate links for sequence, hierarchy and group abstractions. De Bra et al [49] introduce other network abstractions: the tower (for representing multiple levels of description of a concept) and the city (for multiple views and perspectives of a concept).
A number of other hypertext systems have been designed to allow explicit representation of structure. Nanard & Nanard describe in [105] their use of MacWeb to allow the description and capture of knowledge-domain information. The MacWeb hypertext kernel provides weakly typed nodes and links; the application built upon it makes use of a separate hypertext network to define relationships between the different link and node types which are to be used in the main hypertext. It is this augmented type structure which is used as a basis of building the hypertext according to a specific pattern of knowledge elicitation.
Kaindl and Snaprud [80] make it explicit that there are two quite separate structures which the author needs to attend to: the structure of the text and the structure of the underlying knowledge. Whereas IDE, HyperMedia Templates and Nanard's systems attempt to structure the text according to (gross) relationships in the knowledge domain, Kaindl presents a mechanism for unifying the two structures in which the hypertext is implemented by a system of frames inside a knowledge representation tool. The rules of the system ensure a close match between the structure of the hypertext and knowledge domains, with appropriate links automatically managed between the corresponding nodes.
Aquanet was designed to allow authors to express structured relationships between hypertext nodes, but experiences from its use [93] show that authors did not rely on a predefined library of network structures. Instead they tried to define their own schemas for hypertext structures, often without a full understanding of those structures, and were consequently frustrated by a system unable to support flexible schema modification. Similar problems of premature commitment were also seen by users of Notecards [93] and even users of non-hypertext structured document editors [22]. Aquanet users circumvented this problem by using the main display space as a drawing board, and expressing developing relationships between objects as similar spatial relationships between the object icons (e.g. similar objects may be placed in a messy pile; a `uses' relationship may be expressed by placing the `user' on the left-hand side of the `used'). Marshall & Rogers [93] describe the Aquanet users' various manipulations of representational structure as crucial to the author's interpretive process, and a basis for subsequent writing activity even if the hypertext network and content are not reused.
One of the aims of the SEPIA system [137, 138] is to help avoid problems of premature organisation of unfinished or poorly understood ideas by offering authors more assistance than organising linked nodes of text. It makes a similar but more extensive separation of domains than is seen in Kaindl's system. SEPIA provides an authoring environment based on the ideas of micro-, macro- and super-structures expanded in chapter 2. It provides a number of writing spaces in which different types of writing activity are performed, including a content space for building up semantic networks representing the domain knowledge (as well as notes, excerpts and whole authored texts), an argumentation space for generating, ordering and relating arguments about the knowledge and a rhetorical space for organising and re-organising the global outline, issues, arguments and coherent sentences. SEPIA is more complete than LACE-92 in that it provides a separate activity for the authoring processes associated with each `level' of structure composing the text. LACE-92 only allows the author to manipulate the final `rhetorically and argumentatively complete' text, even while trying to manipulate the basic relationships of the knowledge domain.
Stotts & Furuta [135] classify hypertexts as browsable databases (hyperbases) or nonlinear documents (hyperdocuments) according to the method of constructing a hypertext from the lexias. A hyperbase is non-intentional (i.e. there is no overall presentation or argument), evolves over time and requires search and query techniques to augment link following to form a useful browsing strategy. Conversely, if the network browsing strategy is largely determined by the structures imposed by a co-ordinated act of authorship then it forms a hyperdocument. Note that this differentiation concerns the network, not the components: a large corpus of documents could form a hyperbase if there was no overall structure to the collection. The following subsections describe several hypertext environments which make use of novel structuring techniques.
This system is very similar in concept to HyperSet: it also implements a hyperbase without any directed links. Its browsing strategy is similar, but instead of moving from `object to object' (as in normal hypertext) or `object to set to intersecting set to object', the navigation goes from object to powerset to object, as shown below.
O is the set of all objects, S is the set of all sets, P is the powerset of S, and the set of sets to which o belongs is denoted as s_o. We know that s_o ⊂ P and the number of elements of s_o is |s_o|. The set of sets to which both o and o' belong is given as s_o ∩ s_o', and has |s_o ∩ s_o'| items in it. A link notionally exists between o and o' if |s_o ∩ s_o'| > t (where t is some threshold value, possibly 0). The links are prioritised according to the value of |s_o ∩ s_o'|, so that the object with the largest number of common sets is given as first choice for the user's next destination.
For example, if

S = {intro, inter, advanced, biology, medicine, physics, mechanics, examples, text},

then

P = { {}, {intro}, {inter}, ... {physics}, {intro, inter}, ... {medicine, physics}, {intro, inter, advanced}, ... {biology, medicine, physics}, ... }

and

s_1 = {intro, medicine, text}
s_2 = {inter, mechanics, text}
s_3 = {intro, biology, examples}
s_4 = {intro, medicine, examples}

so that

|s_1 ∩ s_2| = 1, |s_1 ∩ s_3| = 1, |s_1 ∩ s_4| = 2,

and so the most important destination from object 1 would be object 4.
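The selection rule can be sketched as follows. This is a modern illustration with invented object identifiers and set tags, not part of the original system:

```python
# A sketch of set-based navigation: each object is tagged with the sets
# it belongs to, a notional link exists between o and o' when the
# intersection |s_o & s_o'| exceeds the threshold t, and candidate
# destinations are ranked by the size of that intersection.

def destinations(sets, origin, t=0):
    """Rank every other object by its set overlap with `origin`."""
    scored = [(len(sets[origin] & sets[other]), other)
              for other in sets if other != origin]
    ranked = sorted(scored, reverse=True)
    return [obj for score, obj in ranked if score > t]

s = {"a": {"intro", "text"},
     "b": {"intro", "biology"},
     "c": {"intro", "text", "examples"}}
print(destinations(s, "a"))     # best destination first
```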
Theseus is not a hypertext system nor strictly a hypertext model, since it addresses few issues of hypertext technology, the implementation of hypertext mechanisms or the behavioural properties which they have. Instead it is a model for authoring `hypertexts' using currently available technology and currently available systems; there is one commercial `hypertext' which the Theseus Project has produced on the subject of `Cytology' using SuperCard and QuickTime on the Macintosh environment [68]. Theseus hypertext organisation consists of two quite distinct conceptual layers: mediabases and subject paths [70].
A mediabase consists of a number of objects (or nodes). The objects are of different kinds and probably different media. They may be simple text strings, whole documents or complete executable applications. The only distinctive requirement for an object is that it must be complete, and able to be used without reference to other objects. This completeness is intended both in the sense of hypertext (objects do not contain links to other objects) and in the sense of meaning (each object should be a self-supporting, standalone statement, amenable to understanding without recourse to other objects). Obviously it is impossible to make an object completely standalone, without reference to any outside information; rather the aim is to have the mediabase populated with `objective' statements which can be `discussed' intelligently on their own standing.
A subject path (also known as a thesis) is a linear sequence which can contain references to objects in the mediabase. Two or more subject paths intersect when they make reference to the same mediabase object. A subject path may mix a lot of (multimedia) information in with the object references, or it may use the object references by themselves. The subject path may be retraced either a step at a time, or in one jump to the start of the path (thus the analogy of the legend of Theseus). The user maintains a personal index of possible future paths in another subject path. The set of subject paths makes up the subject layer.
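The two-layer organisation can be sketched as follows; this is an illustrative modern sketch with hypothetical object and path names, not Theseus code:

```python
# A sketch of the two Theseus layers: a mediabase of standalone objects
# and subject paths that are linear sequences of references into it.
# Two paths intersect wherever they cite the same mediabase object.

mediabase = {"m1": "an objective statement",
             "m2": "another standalone object",
             "m3": "a third object"}

paths = {"thesis A": ["m1", "m2"],
         "thesis B": ["m2", "m3"]}

def intersections(p, q, paths):
    """Mediabase objects at which two subject paths meet."""
    return sorted(set(paths[p]) & set(paths[q]))

print(intersections("thesis A", "thesis B", paths))
```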
A Theseus `author' is a person who has something to express about a topic. These personal viewpoints are related to a wider frame of reference: a public database of archived materials (the mediabase). Each personal viewpoint becomes braided to other personal viewpoints via the objects it points to in the mediabase. Each viewpoint is formulated according to the goals, tastes and understanding of its author. Each objective statement in the mediabase acts as a focus for different arguments within the set of personal viewpoints. A mediabase object is given significance by an author's use of it; and that use `grounds' both the object and the thesis in a network of associations.
Moulthrop proposes hypertext as a deconstructive medium which should be used to restructure and relink texts, and in which the meaning of a text is found from its relationship to other texts [102]. This is seen in Theseus, where the meaning of a text (or object) is defined in terms of the subject paths that include it. The hypermedium is seen to evolve from the intersection of subject paths, not the intersection or multiplicity of objects; indeed, the function of an object is to act as a site for these new intersections. The mediabase is a database of objects: somewhat like the Furuta hyperbase without connecting links. The subject paths on the other hand are hyperdocuments that apply intention and connection to the hyperbase.
One of the problems of Theseus is that it does not support reflexive information linking (a subject path cannot refer to other subject paths). The consequence of this is that every thesis made about a group of objects stands as written and cannot be annotated, commented on, criticised or supported. In fact, a thesis cannot even be referred to except as an indirect consequence of referring to its objects. This is a serious consequence of the model, but receives no mention in the literature except the statement that the subjective understanding displayed in any thesis is equally valid.
Another potential difficulty in the Theseus model is that (by implication) when referring to an object, the complete set of subject paths that also refer to that object are visible. Here the analogy with the legend of Theseus breaks down since the user may be presented with dozens or hundreds of alternative loci, not simply the two or three that are seen in a maze. The lack of reflexive linking prohibits managing this situation by providing summary or partitioning structures within the `hypermedium' to constrain and direct user browsing.
Keeping the node and links model intact but extending the node addressing scheme to allow remote nodes and defining a node transport mechanism allows a hypertext system to operate across a network. This is the approach taken by the World-Wide Web project.
Alternatively, the node and links model may be abandoned, and large-scale textual connections may be achieved by "on-the-fly" machine searches implemented in text archives and document retrieval systems. This is the approach taken by the WAIS project.
The approaches of text-retrieval and hypertext links are compared and contrasted and a compromise, applying loose links to a flexible domain of documents, is discussed. This is the approach of the Microcosm project.
The effect of these three hypertext paradigms on hypertext production and maintenance is discussed, and the use of Microcosm loose links within the World-Wide Web is described.
The Web is based on a client-server model, in which a standard transfer protocol (HTTP, or HyperText Transfer Protocol [14]) is used to communicate hypertext documents in a standard format (HTML, or HyperText Markup Language [13]). The client simply interprets the document's markup to provide a visual rendering of the document, to maintain a history list of the user's recent session, and to enter a dialogue with a remote server to obtain the destination document when the user activates a link. The URL (Universal Resource Locator, [15]) which is used to specify the document which serves as the link destination is composed of four parts (see figure 4.1). It is the client's job to parse the first part in order to use the correct retrieval protocol (usually HTTP) and the second part to establish a connection to the correct server host. It is the job of the server to interpret the third part (the path) to produce a document. The client displays the document returned by the host, and jumps to the part of the document labelled by the (optional) fourth part of the URL.
protocol://host/path#name
http://bright.ecs.soton.ac.uk/ResearchJournal/paper1.html
file://bright.ecs.soton.ac.uk/pub/papers/im/mcm.ps
Figure 4.1: Universal Resource Locator Definition and Examples
The path part of the URL is cast in terms of a path in a hierarchical name space and by default this path is interpreted as a file name relative to the server's `home directory'. If the path represents a directory, not a file, the server may return a directory listing with the names of the files as links which, when activated, return the actual files themselves. Similarly, a part of the name space can be used to specify a program to be run and the arguments to be passed to it, so that the user is returned a document `composed on the fly'. The URL's path component can also be interpreted as representing a document and a query string that must be matched against that document. The server may even effect a gateway to another information service (such as WAIS or Gopher). The client is ignorant of the different alternatives: it simply uses a URL as an address for retrieving a piece of information.
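The client's division of a URL into the four parts of figure 4.1 can be sketched as follows (a simplified modern illustration; real clients handle more schemes and edge cases):

```python
# A sketch of the client-side split of a URL: the protocol tells the
# client which retrieval scheme to use, the host identifies the server
# to connect to, the path is handed to the server for interpretation,
# and the optional name addresses a labelled part of the result.

def parse_url(url):
    protocol, rest = url.split("://", 1)
    url_part, _, name = rest.partition("#")     # optional fourth part
    host, _, path = url_part.partition("/")
    return {"protocol": protocol, "host": host,
            "path": "/" + path, "name": name or None}

print(parse_url("http://bright.ecs.soton.ac.uk/ResearchJournal/paper1.html"))
```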
HTML is undergoing various revisions: the current common form is known as HTML+, which is basic HTML with added tags for defining forms, or in-document dialog boxes. Various other document structures such as abstracts are also being added to the basic HTML repertoire.
<head><title>Example WWW Document</title></head>
<body>
<H1>Important Information</H1>
This is an example WWW document which contains a
<a href="http://site.edu/docs/mydoc.html">link</a>
to another document.
<p>
The word &quot;link&quot; in the previous paragraph is an anchor which
would appear highlighted on the display. If the user were to click on
the anchor, the document <b>docs/mydoc.html</b> would be retrieved from
the computer called <i>site.edu</i> on the Internet.
</body>
It is interesting to consider the structuring options available to the Web author. By taking advantage of the HTML architecture a text can be written as a single, coherent document, with internal cross-references. Alternatively, the text can be split into many nodes and the text's structure inferred from the links between the nodes (HTML does provide a link attribute to explicitly code the relationship between the source and destination of the links, but it is largely unused by the browsing software).
One of the factors that influences this choice is whether HTML is the original authoring environment for this particular text--a translation from another textual or hypertextual environment may dictate the preferred structuring paradigm. Information from a card-based system like HyperCard may be naturally chunked, whereas a Word document may remain as a single, complex entity. The latex2html program which is used to convert LATEX documents into HTML gives the author freedom to specify the degree of chunking--whether new nodes are to be started for each subsection, or section or chapter.
* from any given document the user selects a linked document by clicking on an anchor, so that a path of nodes is traversed in order to reach the intended destination
* the user can `jump' to the exact document required by specifying the known address (URL) of the document.
The former mechanism requires the user to follow semantic cues in the contents of the documents in order to repeatedly choose the correct links to follow. The latter requires the user to make use of an already-known address, which may come from:
* a compiled-in list of well-known documents provided with the Web client viewer software
* a short hotlist of remembered addresses of previously visited documents which were deliberately noted by the user
* a comprehensive list of every document seen by the Web viewing software
* outside the Web environment: new sites advertising their URLs on other electronic services (mailing lists and network news), or word of mouth from colleagues
i.e. apart from link following, it is only possible to navigate to a document if you have already been there, or if you are provided with a handle to it by its author or by someone else who has been there. This is then a `pure' link-following environment, without recourse to text searches or comprehensive document catalogues; it is almost impossible to navigate the Web with the aim of finding all documents about a particular topic.
Users of the Web will be aware that documents tend to fall into two broad categories: content-bearing documents about a particular topic and catalogue documents which contain no subject-domain information themselves, but do contain many links to other information sources. These other sources may themselves be content-full documents, or may be further catalogue documents. The significant point is that it is the catalogue documents which contain pointers to external resources and not the content documents. The content documents usually contain pointers within their local document structure (especially when a single document is implemented as a tree of nodes), but few if any pointers to other relevant works.
Unlike small-scale hypertext systems, it is not possible to enumerate the nodes participating in the network before visiting them. Neither is it possible to enumerate the links in the network since they are stored inside the nodes. Effectively the network unfolds through exploration: a starting point is required, from which one obtains a set of linked nodes, and from each of these further links are discovered. The process of obtaining a single node from the network takes a certain amount of time: each stage in the process (establishing a network connection, starting a remote HTTP server, extracting the node from a disk file, and transmitting it across the network) may take hundreds of milliseconds. Experience shows that several seconds are required to retrieve a typical node, given a lightly loaded server and a quiet network. Visiting every node in the Web is likely to take many weeks (at the time of writing) during which time the Web will be changing--it is impossible to take a completely static snapshot of the network. Even if time were no obstacle, there is no guarantee that every document which is part of the Web could be reached from an arbitrary node.
Although these constraints do not allow us to gain a complete picture of the Web, we can be confident that it forms a hierarchy to a first approximation. This is because the URL space is composed of a hierarchical Internet site name space combined with a hierarchical path name space. Beyond this, each document is contained by HTML's hierarchical document architecture.
Given the constraints on perceiving the Web above, some work was undertaken by the author to analyse the structures of the Web and the patterns of authorship seen. First of all a simple Web client called wwwgrab was written which takes a URL as an argument and retrieves the node addressed by that URL. Then a UNIX shell script (hyperfind) was written which takes a URL, invokes wwwgrab to fetch the document, analyses it and then recursively invokes hyperfind on all the URLs given as link destinations (this is an example of a WWW browsing program known as a spider). Different versions of hyperfind were tried: some which limited their explorations to a particular site or internet domain in order to gain a detailed and in-depth picture of a localised region of the Web and some which followed out-of-site links by preference in order to gain a large degree of coverage of the total Web, instead of becoming bogged down in a particular site's document archive.
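The recursive strategy of hyperfind can be sketched as follows. This is a modern re-sketch, not the original UNIX shell script: retrieval is stubbed with an in-memory table standing in for wwwgrab, and the link-extraction pattern is simplified:

```python
import re

# A sketch of hyperfind's recursion: fetch a node, extract the URLs of
# its link destinations, and recurse on each unvisited one. The WEB
# table below holds hypothetical nodes standing in for wwwgrab
# retrievals so that the sketch is self-contained.

WEB = {
    "http://a/": '<a href="http://a/x">x</a> <a href="http://b/">b</a>',
    "http://a/x": "a leaf node",
    "http://b/": '<a href="http://a/">back</a>',
}

def links(html):
    return re.findall(r'href="([^"]+)"', html)

def hyperfind(url, seen=None):
    seen = set() if seen is None else seen
    if url in seen or url not in WEB:
        return seen
    seen.add(url)                      # record the visit
    for dest in links(WEB[url]):       # recurse on each link destination
        hyperfind(dest, seen)
    return seen

print(sorted(hyperfind("http://a/")))
```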
The hyperfind script was run on the URL of a known WWW catalogue (the WWW sites list maintained by the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign) and on the URLs of several major Web sites (CERN in Switzerland and JNT in the UK). This exercise produced a list of some 12,000 nodes at 600 sites over a weekend. In order to be able to obtain some understanding of the structure of the Web beyond the first approximation of a hierarchy (above) a number of these sites were displayed graphically. A hierarchical visualisation of the results for a typical site is displayed in figure 4.3 (the data were extensively pruned to show only the relevant parts of the Web within this single site).
Examining this figure we see that there are about a dozen links from the home page to information about the Web project itself, general information about the city, the department and departmental events, specific information about the research groups within the department, private information for departmental personnel, pointers to other information providers within the University and also meta-information about the Web and other Internet services. The part of the Web relating to departmental research groups has been expanded to show six research groups, of which the Formal Methods group has been focussed on. The Formal Methods group contains links to each of the eight academic staff who compose the group, as well as to a number of the projects being undertaken. Following the link to one of the academics reveals a number of entries: information about a journal, abstracts of a couple of papers, lists of text books, a link to a (Gopher) mail archive and an FTP-able standard definition.
It seems to be highly significant that, having reached the bottom of the hierarchy at this point, where one would expect a significant amount of data to reside, the Web documents are either information about entities external to the Web (journals, textbooks, academic activities) or references to data which is held outside the Web proper (mediated by Gopher, FTP or even "snail mail"). In particular the two papers which are mentioned are not available in HTML: they must either be downloaded from an FTP archive in compressed PostScript form (and typically printed), or else they must be requested by paper post.
What these initial results seem to show is that currently most content-full documents are actually not Web-native (stored in HTML format and mediated by HTTP), but FTP-native (stored in a possibly compressed PostScript format, mediated by FTP), and the Web is used to provide an accessible route to these documents. The hypertext features of the Web actually implement a user-oriented navigation structure placed on top of the more primitive FTP archive or hierarchical file system; that navigation structure is based on a familiar `prospectus' metaphor (this is our organisation; here are the departments; here are the people who work here, their CVs and pointers to their papers and documents describing their research/commercial activities).
Figure 4.3: A snapshot of the Web at a typical site
One of the drawbacks of the Web's simple embedded links model is that it does not allow easy hyper-document maintenance. Since the links are cast in terms of a URL which gives explicit document location information, any change to the organisation of the destination site will invalidate the links. This is not an uncommon occurrence--the above analysis seems to indicate that about 3% of links are inoperative.
Another problem of the simple embedded links model is that of deadends: only HTML documents and images can contain links. Although links can point at other kinds of documents which the client will arrange to have displayed by the appropriate native viewer, these documents can not contribute to the Web--they are dead ends with no link following possible.
The embedded links model poses another problem for authors: how can a document be constructed so that it contains explicit embedded references to all the data destinations that are required? It is not uncommon to find Web documents where English discourse has been replaced with a list of "click here to see X" phrases.
The problem of topic-based navigation of the Web is similar to the problem of finding a file on a particular subject among the Internet's anonymous FTP services. In that environment at first enthusiastic volunteers published regular lists of sites and kinds of files at each site. Some sites also used to provide a file containing a complete list of all the files available from their machine. Eventually a single site provided a database of the names of files available at all of the well-known anonymous FTP sites; an interactive query service (known as archie) allowed any user to find out where a file was archived given a fragment from that file's name. This service has now been replicated across several dozen sites across the whole Internet, so that any user can obtain a list of potentially relevant files as long as the name of the file is indicative of its contents. A similar system could be applied to the Web; already software is available to allow the administrator to automatically catalogue each of the Web server's files.
A number of informal attempts are being made to provide similar services for the Web. Several programs like hyperfind (the generic term for a program which travels the Web is a spider or robot) have been used to create databases of URLs and document titles; a single URL is provided which gives a fill-out form to indicate the keywords which interest the user. This is matched against the database, and a list of clickable document titles is returned.
The problem with spiders is that they are too intrusive and take too long to run. One of the most well-known spider databases is currently running 5 months out of date. An alternative approach is to provide a shared database which users can voluntarily populate with information about their sites. This has been provided by the so-called Virtual Library project [97], which advertises a single URL corresponding to a fill-out form which can be used to register documents according to an evolving classification system. The main drawback with this approach is that it is a voluntary scheme, relying on authors providing information about their documents, and as such is not particularly well used.
The conceptual problem with all of these navigation services is the same as the general Web navigation problem: in order to use these services a user has to know that they exist. In order to discover their existence they must probably read about them from an alternative (broadcast or multicast) information source.
An alternative approach to large-scale `hypertext', as seen in the Wide Area Information Server (WAIS) products from Thinking Machines, is to do away with fixed links, and to maintain instead a distributed registry of nodes and their attributes [20]. For the user of such a hypertext environment, link following is exchanged for "on-the-fly" database searches in node registries. In the case of WAIS, the attribute set stored in the registry for each node is a complete inverted index of the node's contents; link following is supplanted by choosing documents which are considered relevant to the current node. This model is similar to that of StrathTutor, except that the node attributes are not assigned explicitly by an author but are inherent in the node's text. The relevance rating of each possible destination document is calculated according to the similarity of terms in the text of the potential destination and the current node. (It is possible to have the relevance measure performed on a subpart of the current node to focus the reader's interest.)
WAIS does not provide many of the facilities of a hypertext system since it is really an information retrieval environment. Typical GUI-based WAIS clients provide separate fields for reading documents and typing `queries', although minor modification to the front end would provide pseudo-link following by allowing queries to be expressed by selecting text from the document.
WAIS is a networked client/server system, in which the client document viewer sends a piece of text to the server. It is the server's job to analyse the text and to score the relevance of documents it has registered. It then returns a shortlist of document titles and scores to the client, which allows the user to make a choice. There are many servers available on the network which serve different document resources (most of which are subject based), but there is a chicken-and-egg problem here: in order to obtain a selection of relevant documents from a large set of documents the user must first choose a relevant server from a list of several hundred possible servers. One way of doing this is to send a proto-request to the (so-called) server-of-servers, containing a list of keywords which identify the subject area that the required server must be registered for.
Producing a document is trivial with this system: there are no links to add, nor any particular document format to attend to (WAIS is mainly used for simple textual documents). Any document (or selection from it) can be used as a source for triggering links (or relevance matches). In order to be considered as a destination for a match, a document must be submitted to a server and indexed.
The advantage of this approach is that any document can be instantly linked into the server's corpus by virtue of its textual similarities. Another advantage is that it is resilient to document addressing or organisational changes. All `links' are generated by the server and not stored by the clients. If a document is altered or removed from the server's collection then the server need only perform a re-indexing operation to maintain consistency.
This model of hypertext navigation is also seen in Salton's work [127] where a linked structure is superimposed over texts and based on the textual similarities of the texts. In Salton's work the links are explicit, whereas in WAIS they are implicit; in both cases they are automatically generated from the text contents.
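The relevance matching that replaces link following in these systems can be sketched in outline. The following fragment (a minimal illustration, not the WAIS implementation; all names are invented) ranks a set of registered documents against the text of the current node by the cosine similarity of their term-frequency vectors:

```python
import math
import re
from collections import Counter

def terms(text):
    """Tokenise crudely: lowercase runs of letters, counted."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def shortlist(current, corpus, n=3):
    """Rank registered documents by textual similarity to the current
    node and return the top n (title, score) pairs, as a WAIS-like
    server returns a shortlist of titles and scores."""
    q = terms(current)
    scored = sorted(((cosine(q, terms(body)), title)
                     for title, body in corpus.items()), reverse=True)
    return [(title, round(score, 2)) for score, title in scored[:n]]
```

A real server would hold the corpus as a pre-computed inverted index rather than re-tokenising per query, but the principle of scoring destinations by their textual similarity to the source is the same.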
Hypertext links may be authored explicitly between two items in a document corpus; however, this activity requires both creative, intellectual effort from a human author in setting up the links and non-creative, intellectual effort to maintain consistency among the links as the document corpus evolves. The need to minimise the load on the author and maintainer of such an interlinked document corpus has led to various efforts at automating the linking process, leaving the responsibility with the computer to generate and maintain the hypertext. This has important knock-on effects, as it allows an already existing literature to be integrated into a hypertext environment with (in theory) minimum effort.
In a hypertext environment which requires the author to create links manually the author has to choose both the source and destination of the link and then to invoke a linking operation. This may be done in a visual fashion by directly manipulating the contents of the document items to be linked, or else indirectly by naming the ends of the link. The main concern of manual linking is to correctly identify the address of the links' endpoints within a set of documents.
Automatic linking relies on a computer being able to derive suitable places within the document corpus to act as link endpoints. This may be done as a batch task to compile a "definitive" set of links between all suitable pairs of document items within the corpus, or else as an interactive search for all suitable destination endpoints from a specific document item selected by a reader. Whichever of the two methods is chosen, the identification of suitable endpoints is performed within various domains: syntactic, lexical or semantic.
It is possible to create hypertext links by identifying various superficial textual features common to various types of written information. Technical writing may employ phrases such as "see table 3" to indicate an internal cross-reference, or may indicate an external citation by adding it as a parenthetical comment after the information to which it refers, as in "through the use of SGML (Barron 1990)". Different publication bodies may vary the exact representation of these "link anchors", but all will aim for consistency so that the reader may easily understand what is being indicated. This has an obvious advantage when a computer is brought to bear on the text, since the links may be recognised by a simple set of regular expressions, without any attempt to analyse the meaning of the character strings.
cross_reference ::= see (([Ff]igure)|([Tt]able)) [0-9]+(\.[0-9]+)?
citation ::= \([A-Z][a-z]* 19[0-9][0-9]\)
Figure 4.4: egrep-style regular expressions for two types of link source.
Not every style of writing uses explicit references as above, so syntactic analysis is not a universally applicable technique. However, most documents available in computer-readable form are of a technical nature, so it affords a useful first attempt.
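As an illustration of this purely syntactic approach, the two patterns of Figure 4.4 can be exercised with a short script (a sketch in Python rather than egrep; the alternation is grouped so that it covers only the words `figure'/`table'):

```python
import re

# Link-source patterns corresponding to Figure 4.4.
CROSS_REFERENCE = re.compile(r"see (?:[Ff]igure|[Tt]able) [0-9]+(?:\.[0-9]+)?")
CITATION = re.compile(r"\([A-Z][a-z]* 19[0-9][0-9]\)")

def link_sources(text):
    """Return candidate link anchors found by purely syntactic matching,
    with no analysis of the meaning of the character strings."""
    return CROSS_REFERENCE.findall(text) + CITATION.findall(text)
```

Any string matched by either pattern becomes a candidate link source; the patterns encode the publication conventions, not the subject matter.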
Once the source of a link has been discovered, it is necessary to identify the link's destination. Given a reference to an internal item, it is easy to find that item from its heading or caption. However, citing a reference to an external document (external to the document, not to the corpus) implies a global naming scheme which would be too cumbersome to quote in the body of the text. Usually citations are a key to be looked up in a list of references at the end of the text. It is this bibliography which provides everything necessary to obtain the destination (given sufficient motivation).
We have seen the necessity for modern hypertext systems to be able to use existing sources of documentation if they are to be useful in a practical way. LACE extended the notion of reusability by allowing the automatic creation of links from the original (flat) document. However, LACE only dealt with links created automatically as a side-effect of a document's logical markup, i.e. based on the structure of the document, but there is far more information in the content of the document than in the markup. This section outlines an initial study of the literature, together with the results of some preliminary experiments, aimed at extending LACE to create automatic links based on the information-rich document content as well as the explicit document markup.
The literature deals with two approaches to document content: classification and indexing. The approach of the former is to relate the knowledge contained in a document to the universe of knowledge, specifying how the document is both similar and dissimilar to other documents. The latter highlights useful concepts contained in a document so that a reader may gain direct access to pertinent information. The process of indexing is analogous to that of creating links for a hypertext document: link creation makes specific references from one document to another, while indexing creates a table of `half-links' which are only resolved when the literature is being read.
Obviously classification and indexing are not independent, since the choice of the set of index terms will depend upon the subject to which the document addresses itself. Classification is difficult because it requires the ability to model the information contained in the document, whereas indexing requires only the ability to identify the important words and phrases which are associated with those concepts.
It is hoped that there is a method for extracting a set of index terms from a document or set of documents which is efficient (it should not require undue computing resources), general (it should work on literature from any field) and accurate (it should not produce insignificant terms nor miss out important ones). Given a set of index terms for each document in a hypertext system, any reference to such a term in one document would be linked to all other documents which also refer to it.
A plot of word frequency against rank can be divided into three areas:
1 the area to the left of an upper cut-off point which is populated by very common words
2 the area to the right of a lower cut-off point which is populated by very rare words
3 the area in the middle which contains words which make up the significant content of the document
It is assumed that overly-common words are insignificant (the, to, of...) and that overly rare words do not contribute to the content of the document. The resolving power of words (the ability to discriminate content) is supposed to reach a peak between the two limits and fall off rapidly towards those limits. However, the position of the cut-off points can only be determined by trial and error, so an alternative approach is generally used, which is to filter the list of words through a set of `fluff' or `stop' words--those words which are known to be `contentless'. After this, the resulting words are stripped of their suffixes to match the equivalent stems.
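A minimal sketch of this filtering pipeline follows (the stop list and suffix stripper below are illustrative placeholders only, not the lists actually used in these experiments):

```python
import re
from collections import Counter

# A tiny illustrative stop list; a real `fluff' list runs to hundreds of words.
STOP_WORDS = {"the", "to", "of", "a", "and", "in", "is", "that", "for"}

def stem(word):
    """Very crude suffix stripping (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_terms(text, n=5):
    """Candidate index terms: stop words removed, suffixes stripped,
    and the remaining stems ranked by frequency."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = Counter(stem(w) for w in words if w not in STOP_WORDS)
    return stems.most_common(n)
```

Filtering against a fixed stop list sidesteps the trial-and-error placement of the two cut-off points, at the price of maintaining the list itself.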
Stone & Rubinoff [125] use co-occurrence statistics to distinguish between core terms which occur in all documents throughout a given field and particular terms which discriminate subfields within that field. First an indexing vocabulary is obtained from the complete set of documents: this represents the kernel set. This set is then expanded by taking each discarded word and computing its `association' with each kernel term. If the sum of these associations is greater than some threshold then the word is added to the list of particular terms. This step is repeated several times with successively higher thresholds to find those discarded words which associate strongly with the core and particular terms.
sophistication: how different the document's pattern of index terms is compared to the pattern of index terms in the universe of documents to which it is related
pertinence: how the pattern of index terms compares to the pattern of terms in a user's query
They maintain that, given a query and a set of documents that in some way satisfy the query, the ideal starting point is a document which in its own field contains common concepts, i.e. which is unsophisticated. Taking all the documents which match the query, their (sophistication, pertinence) indexes are ranked; reading starts in the (low sophistication, high pertinence) quadrant and works through to the (high sophistication, low pertinence) quadrant as the user expresses interest.
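Assuming both measures are normalised to the range [0, 1], the suggested reading order can be expressed as a simple sort (a sketch; the tuple layout is invented for illustration):

```python
def reading_order(matches):
    """Order query matches from (low sophistication, high pertinence)
    towards (high sophistication, low pertinence).

    `matches` is a list of (title, sophistication, pertinence) triples,
    both measures assumed normalised to [0, 1]."""
    return sorted(matches, key=lambda m: (m[1], -m[2]))
```

Sorting on the pair places unsophisticated, highly pertinent documents first, which is exactly the recommended starting quadrant.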
Nelson [107] describes how IBM required their researchers to fill out a profile detailing their research interests in their own technical language. This profile was used to match against the descriptions of new papers and books that IBM processed on a weekly basis. The technique has obvious extensions for browsing in a hypertext system, by allowing the user to create a typical document detailing their research interests. This document can be evaluated by various of the above techniques to produce a `filter' for general queries (in fact, this is the method used by many WAIS front ends, as described in section 4.2).
The need is apparent to increase the usefulness of an index by increasing both these figures, although indexing every occurrence of every term would be counter-productive, providing too much material with a low signal-to-noise ratio. There is a need to balance representation, where the document is fully described by its index, with discrimination where the unique features of the document are highlighted.
The documents in use for these experiments are a month's CEEFAX news bulletins. These are composed of 585 individual news items, each about a paragraph in length. Appendix 2.4.1 lists the words appearing in the items in order of their frequency. All told there were 5535 unique words (no suffix stripping was done, so plurals are counted as separate words) appearing a total of 41,628 times. Appendix 2.4 contains a graph showing the frequency of occurrence of each word plotted against its rank in the frequency table. This graph conforms to the ideal plot mentioned in subsection 4.2.1.1, although it is a more extreme example of a hyperbola and it is not clear where the cut-off points should be drawn. (The displayed graph has been `zoomed in' on the interesting part. Zooming out to see the complete set yields a graph which hardly leaves the axes.)
Table 4.1 shows the probabilities expected for a function word as given by Bookstein, Swanson & Harter. The figure at co-ordinate (n,m) in the table is the probability that a function word will occur m times in a particular document, given that it occurs n times in all 585 news items.
The table shows that no statistical significance can be attached to words that appear less than several hundred times in the complete set of news items. Unfortunately, Appendix 2.4.1 shows that most of the words which occur so frequently are obviously function words (the, by, are, been, for...). It would appear that the individual documents are too small to be usefully treated by this method.
 m\n   2975   2000   1000    500    200    100     50     10      5      1
  1   .0314  .1119  .3093  .3635  .2428  .1440  .0784  .0168  .0084  .0017
  2   .0799  .1914  .2644  .1553  .0415  .0123  .0033  .0001  .0000  .0000
  3   .1355  .2181  .1506  .0442  .0047  .0007  .0000  .0000  .0000  .0000
  4   .1723  .1864  .0643  .0094  .0004  .0000  .0000  .0000  .0000  .0000
  5   .1753  .1274  .0220  .0016  .0000  .0000  .0000  .0000  .0000  .0000
  6   .1486  .0726  .0062  .0002  .0000  .0000  .0000  .0000  .0000  .0000
  7   .1079  .0354  .0015  .0000  .0000  .0000  .0000  .0000  .0000  .0000
  8   .0686  .0151  .0003  .0000  .0000  .0000  .0000  .0000  .0000  .0000
  9   .0387  .0057  .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0000
 10   .0197  .0019  .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0000
Table 4.1: Probabilities of Distribution of Function Words
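The entries of Table 4.1 appear consistent with a Poisson model for function words: a word occurring n times across the 585 news items occurs m times in a given item with probability e^(-lam) lam^m / m!, where lam = n/585. A short check (a sketch of the calculation, not the original program):

```python
import math

DOCUMENTS = 585  # number of CEEFAX news items in the test collection

def p_occurrences(n, m, docs=DOCUMENTS):
    """Probability that a word occurring n times across `docs` documents
    occurs exactly m times in one document, under a Poisson model with
    rate lam = n / docs (as in Bookstein & Swanson's function-word model)."""
    lam = n / docs
    return math.exp(-lam) * lam ** m / math.factorial(m)
```

For example, p_occurrences(2975, 1) evaluates to approximately .0314, matching the top-left entry of the table.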
The Curtice & Jones method does not define the criterion that two words must satisfy in order to co-occur. We have used the simple definition that two words co-occur if they both occur in the same sentence. From that definition, a scattergram can be obtained by plotting Ri against Ni. Their paper [45] indicates that the resultant points are scattered around a straight line of negative slope, noting that words corresponding to good index terms are those which appear below this line, i.e. words which appear with fewer other words than expected. Applying this to our experiment, a line of equation y = -0.0015x + 9 provides a good fit. Taking the words with maximum distance from the line yields the following encouraging `Top-30' list (some of the vocabulary is explained by the fact that an ambulance strike was taking place at the time).
ambulance    contaminated  emergency  man      ship
ambulances   crews         farms      ms       site
armed        criminal      feed       NatWest  unions
calls        Deng          ferry      party    walked
China        department    German     Philips  Walker
collision    dispute       London     plant    were
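The selection step of this method can be sketched as follows, taking Ni as a word's total frequency and Ri as the number of distinct words it co-occurs with in some sentence, and using the fitted line quoted above (the sample points in the usage note are invented for illustration):

```python
def distance_below_line(n_i, r_i, slope=-0.0015, intercept=9.0):
    """Signed distance of the point (Ni, Ri) below the fitted line
    Ri = slope*Ni + intercept; a positive value means the word
    co-occurs with fewer words than the line predicts."""
    return (slope * n_i + intercept) - r_i

def top_terms(points, k=3):
    """points maps each word to its (Ni, Ri) pair; return the k words
    lying furthest below the line, the candidate index terms."""
    return sorted(points, key=lambda w: distance_below_line(*points[w]),
                  reverse=True)[:k]
```

With hypothetical points such as {"ambulance": (100, 2), "the": (2975, 9)}, `ambulance' lies well below the fitted line while `the' lies on or above it, so only `ambulance' survives as a candidate index term.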
Microcosm is a hypermedia system that has been developed and used at Southampton University over a period of five years. It has evolved considerably over that time, but retains the fundamental model of a group of co-operating processes communicating via message passing which together supply various hypermedia environment facilities. Its main features are
* a selection-action paradigm for user interaction. Fixed link anchors (or buttons) are simply an author's predefined binding of a particular selection within a document to a particular hypertext action (such as follow link). In general, readers of a Microcosm hypertext can invoke a range of hypertext actions on arbitrary selections.
* links held externally to the documents they reference. This allows links to be made between the native documents of third-party applications, such as wordprocessors, spreadsheets, databases or CAD packages.
* a message passing framework, into which various document viewers or hypertext link servers (also known as filters) may be slotted. This framework is a circular chain in the current implementation, so a message will be received by the next application "downstream". This receiving application may process the message and block it, pass it on unchanged, or modify it in some way.
* a message format for coding information about user requests or hypertext facilities between the various components of the above framework.
* a document manager which associates document ids with file names and a set of other attributes (such as title, author, keywords and description).
In order to see how the components of Microcosm function together, consider how a link is followed. The user makes a selection in an open document in a viewer application, and then chooses the menu action "Follow Link". The application parcels the selection, its position within the document and the document's identifier into a message which is sent down the chain. A link database intercepts the message, looks up any links that correspond to that selection, and sends a message containing a specification of those links down the chain, along with the original link request message (possibly to be intercepted by further link databases). Eventually, all the link specification messages are intercepted by a dispatch filter, which presents the user with a dialog box containing descriptions of each of the applicable links. The user selects a link and the dispatch box sends a "Dispatch Link" message to the appropriate viewer. The viewer intercepts the message, opens the appropriate document and highlights the destination selection.
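The message-passing flow just described can be caricatured in a few lines (all structures here are invented simplifications; real Microcosm filters are separate communicating processes, not objects in one program):

```python
class LinkDatabase:
    """A filter holding (selection -> destination) links; it answers
    FOLLOW.LINK messages and passes everything on down the chain."""
    def __init__(self, links):
        self.links = links  # {selection text: destination document id}

    def handle(self, message):
        out = [message]  # pass the original request downstream
        if message["action"] == "FOLLOW.LINK":
            dest = self.links.get(message["selection"])
            if dest is not None:
                out.append({"action": "DISPATCH.LINK",
                            "selection": message["selection"],
                            "destination": dest})
        return out

def send_down_chain(message, filters):
    """Push a message through the filter chain and collect what reaches
    the end (in Microcosm, a dispatch filter offering the links found)."""
    messages = [message]
    for f in filters:
        messages = [m for msg in messages for m in f.handle(msg)]
    return [m for m in messages if m["action"] == "DISPATCH.LINK"]
```

Because every filter sees every message, several link databases can each contribute link specifications to the same request, which is exactly how multiple linkbases co-operate.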
The user interface to link authoring is very similar to that of Intermedia: a source is selected and the user chooses "Start Link", and then a destination is selected and the user chooses "Complete Link". How this differs from Intermedia is that the link is expressed not as a simple source point to destination point relationship, but as a mapping from a source selection in a particular context to a destination selection. When the user creates the link there are options to specify the context of the source selection: either this exact place in the source document, or any place in that document, or any place in any document. This choice allows the user to create a specific, local or generic link.
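The three context levels can be expressed as a predicate applied when a linkbase intercepts a follow-link request (a sketch with invented structures, not Microcosm's actual linkbase format):

```python
# Each link stores a source selection plus a context: "specific" (this
# offset in this document), "local" (anywhere in this document) or
# "generic" (anywhere in any document).

def applies(link, selection, doc, offset):
    if link["selection"] != selection:
        return False
    if link["context"] == "generic":
        return True
    if link["context"] == "local":
        return link["doc"] == doc
    return link["doc"] == doc and link["offset"] == offset  # specific

def available_links(linkbase, selection, doc, offset):
    """All link destinations applicable to a selection made at a given
    position in a given document."""
    return [l["dest"] for l in linkbase if applies(l, selection, doc, offset)]
```

A generic link matches its selection anywhere; a local link only within its own document; a specific link only at the stored offset.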
A generic link, the most common link type in Microcosm hypertexts, allows the author to associate a document with any occurrence of a particular textual string in any document. At first sight this may seem to be just a text retrieval operation, however there are certain key differences. Firstly, from a practical point of view, a generic link requires no indexing of the possible destination documents, nor a searching operation on every document in the hypertext in order to satisfy the link--a generic link has none of the overheads associated with text searching. Secondly, the difference between generic links and text retrieval is the difference between intentional and non-intentional hypertexts: a link expresses an author's knowledge of a relationship between the meaning of two entities in the hypertext, whereas a text retrieval operation expresses a statistical similarity in textual features of two hypertext entities. It is possible to liken a generic link to a text retrieval operation in reverse: a generic link works for a single destination and specifies a collection of applicable sources, whereas a text retrieval operation works from a specific source and describes a collection of applicable destinations. (See section 4.3.2 for more explanation about the declarative nature of generic links.)
The significance of generic links is particularly apparent to hypertext authors. When authoring links in many other systems, the question that a hypertext author is constantly posing is "where can I go to from here?". The answer to this question is usually `destination anchor a of destination node A'. For every piece of information (node) that is added to the system the author must add links that leave it, tying it into the hypertext corpus. In Microcosm the author's perspective is changed, and the question becomes "from whence should I be able to come here?" or "what characteristics must another node have in order to link here?". This may simply be a reposing of the original question, with the answer being `source anchor a in source node A', but the answer is usually far more general, given in terms of multiple sources (frequently constrained by a particular context). The significance of this becomes apparent when one considers resource-based hypertexts, where a large number of unchanging nodes is provided as an encyclopædia or anthology. By describing these `generic links' for the resource one may add any new node to the hypertext (a teacher's overview of a topic or a student's essay) and it will already be tied into the corpus by the existing link framework.
In this respect Microcosm provides an excellent framework for developing resource-based learning materials, as demonstrated in the HiDES history courses [41], or the SToMP TLTP initiative [6]. As a particular example of the advantages of this authoring paradigm, consider setting up a multiple-choice test based on material in a standard course text. In a normal environment containing only specific links between nodes, for each possible wrong link (i.e. wrong answer) a separate correcting explanation must be written for the user, recalling the material in the original sources. Using Microcosm, the question, the text of each answer and any explanations written will all automatically be linked back to the concepts in the original sources by virtue of generic links from those sources which apply to the text used in the quiz.
The previous IBM PC version of Microcosm (version 2.2) provided a limited set of generalised link facilities (specific links from a point in a particular document to a point in another particular document, local links from a selection in a particular document to a point in another particular document, and generic links from a selection in any document to a point in a particular document). The latest version of Microcosm (version 3.0) provides a hierarchy of logical types which can be used to classify each document in the hypertext. This new facility allows the `genericness' of links to be expanded: instead of just three levels of contextual constraint, a link can also be constrained by the class or superclass of its source document. For example, in a Computer Science hypertext application, a link between the word `PASCAL' and a glossary entry giving a short description of the programming language may be valid only in documents of type `Introduction', rather than in technically more detailed documents of type `Language Syntax'.
The flexibility of Microcosm link sources provides its `reversed' hypertext authoring paradigm (how may other nodes be linked to the current node?). It is possible to argue that generic links have similarities with StrathTutor's linking mechanism; in setting up a generic link between a source selection and a destination node, the author is classifying the destination node, or labelling it with an attribute corresponding to the selection. Following a link is then a matter of selecting the text corresponding to an attribute. This is seen in practice, since an author frequently sets up a set of generic links on a resource document by selecting the phrases which are seen as describing the contents and setting up generic links from those points to themselves. Effectively the author, using the generic link mechanism, is labelling the document with textual key words or key phrases. Thus the authoring paradigm has become declarative in nature, describing the data rather than the processes involved in document links. Microcosm is similar in this use of generic links to systems like WAIS and StrathTutor. In all these cases there are no explicit connections between source and destination documents. Instead, the destination documents, either labelled or indexed, are a set of external information resources that bind, on the fly, to specific occurrences of source document selections. Microcosm differs from these systems because these bindings are the expression of authored links, as more usually seen in standard hypertext model systems like Intermedia or HyperCard.
Hypertext packages are frequently difficult to author in a scalable or generic fashion which allows for expansion or economic re-use for different purposes. The links, authored for a particular purpose, are fixed inside the document content and fixed to specific destinations. Expanding a Microcosm hypertext by adding new nodes involves one of two scenarios. If the nodes are new general resources (primary materials) then a group of new generic links must be added which will retrospectively apply to the existing hypertext components. If instead they are new secondary materials (students' essays or teachers' commentaries on the primary materials) then they will already be affected by the existing links. In this respect the Microcosm hypertext model is incrementally scalable.
Changing the purpose of the hypertext may involve keeping the collection of nodes substantially the same, but reworking links to provide different structures of access. In many hypertext packages changing the links means rewriting the texts because the links are embedded in the texts--in Microcosm it simply means applying a new set of linkbases to the same material, in a similar way to Intermedia's use of webs. The second advantage to Microcosm is that material which is added during the `repurposing' process will be automatically affected by any retained linkbases. Since many hypertext packages provide embedded point-to-point linking (i.e. from here you can go here) they fail to offer such expandability or maintainability.
Microcosm still retains many disadvantages when it comes to maintaining a hypertext. Since the current implementation has most links expressed in terms of fixed destinations (a generic link has a flexible source) then changes made to the destination may invalidate a link. The Document Manager provides a level of indirection between document identifier and physical file name, so renaming or re-organising a collection of hypermedia resources is not harmful, but removing a component file may leave a `dangling link'. Editing the contents of a destination file is likely to lose synchronisation between the offsets held in the link databases and the document contents and so `shift' the link anchor away from the correct link endpoint. This is not a problem for link sources, unless a specific link has been used.
This `editing problem' is not unique to Microcosm: see [50] for the way in which it is apparent in HyTime, and [48] for more details about how a hypermedia environment can be constructed to minimise the effects of this problem.
This hypertext has been translated from an existing book, and has an initial structure which is based on that book. The material is split into disjoint nodes which follow the book's division into sections and subsections, although the nodes only implicitly model the hierarchical structure of the original. The hierarchy is made explicit only on the `Contents' node which contains a table of contents with buttons linking the section and subsection titles to the appropriate nodes. Generic links have been created according to the index entries which the original author has indicated.
As well as the organisational structuring and the content-based linking, summary documents (containing links to the nodes they summarise) and overview documents (outlining the educational aims and objectives of the hypertext) have been provided. These have been added as pseudo document types (actually aliases for document type TEXT), allowing the user to distinguish them from normal contentful text documents in Microcosm's standard `Open Document' dialogue box. The pseudotypes `Photograph' and `Diagram' have also been added as aliases for documents of type `BITMAP'. We can see the various structures inherent in and imposed on the hypertext:
* an organisational hierarchy accessed via the specific links on the `Contents' node
* a cross-reference web of subject-based generic links, accessible from any use of a technical term in the material
* a catalogue of available nodes listed by document type
The catalogue of nodes is an inherent part of the Microcosm system (at least of this implementation). Every document that is known to the Document Manager is available from the `Open Document' dialogue box, listed according to its type and with its (user-assigned) description. Since this dialogue box is one of the most obvious features of the Microcosm user interface, and since a hypertext network has no explicit starting node, users often make quite heavy use of this method of navigation in preference to hypertext links.
The original stack consists of a number of cards accessed through an Introductory card which contains a textual overview of the topics available and a set of buttons which lead to these topics. Contents and Index cards are also available to aid navigation. Version 2.0 was a faithful copy of this stack, with one document per card and with no generic links, only buttons (i.e. highlighted specific links) for navigating across the hypertext. The buttons either navigate from one topic to another (i.e. organisational structure) or provide subject-based cross-references. The only additional feature is the Microcosm catalogue of nodes, which simply mirrors the Contents node in this case.
Analysis of logs taken of the students' use of the Microcosm and HyperCard versions shows that 43% of the HyperCard users' interactions with the system were link-following actions, rather than browsing through the table of contents or alternative navigation mechanisms. Contrasted with that figure, only 26% of Microcosm users' actions were link following [73]. One of the reasons given for this is that the Microcosm catalogue is always conveniently available, whereas the Contents node must be explicitly requested. It is interesting and salutary to note that, as far as this group of inexperienced hypertext users was concerned, the most natural way to use a Microcosm hypertext is not through the linking facilities.
The second Microcosm Cell Motility application (version 2.2) is much more interesting in terms of its use of Microcosm features. It is structured as follows:
* the original nodes are retained.
* the original buttons (specific links) are retained. These are now known as the Tutorial Links.
* a set of Reference Links have been added. These are generic links to 30 of the most significant original nodes from appropriate key phrases.
* a Biology Dictionary has been added. This is a set of documents containing definitions for 3,000 biological terms along with the generic links which link into those dictionary documents.
* a Quiz has been added. This is a single Toolbook document which implements a set of multiple choice questions. It consists of a number of questions, each with a set of possible answers and, for each possible answer, an associated short (up to 256 character) explanation of why it is or is not correct.
Each of the above components is completely modular and has its own separate linkbase (the same Biology Dictionary is used in a number of different applications). The quiz deserves further elaboration: it has no links of its own, but it is subject to the links from the other resources. When the user chooses one of the multiple-choice answers, a dialogue box appears with the associated explanation and a button labelled `Further Explanation' which causes Microcosm to look for all the links which apply to any parts of the text in the explanation.
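The effect of this modularity can be sketched as follows. This is an illustrative model in Python, not part of Microcosm itself; the linkbase contents and names are invented for the example.

```python
# Illustrative model of modular linkbases: each component of the
# application (tutorial, dictionary, etc.) keeps its own table of
# generic links, and a text selection is matched against every
# linkbase that is currently installed.

# Hypothetical linkbase contents, keyed by the selectable phrase.
tutorial_links = {"cell motility": "node01"}
dictionary_links = {"flagellum": "dict0420", "cytoskeleton": "dict0879"}

def find_generic_links(selection, linkbases):
    """Return every destination offered for `selection` by any linkbase."""
    destinations = []
    for linkbase in linkbases:
        if selection in linkbase:
            destinations.append(linkbase[selection])
    return destinations

# The quiz has no linkbase of its own, but text in its explanations
# is still matched against the other components' linkbases.
installed = [tutorial_links, dictionary_links]
find_generic_links("flagellum", installed)   # one match, from the dictionary
```

Because each component's links live in a separate table, removing a component (or reusing the dictionary in another application) is simply a matter of changing the list of installed linkbases.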
The application is structured into a set of Reference Works (a physics textbook, a databook, a glossary, a set of biographies and a bibliography) and a set of Teaching Documents (Tutorial Documents, also known as Scripts, and Activity Documents, usually ToolBook simulations or question and answer sessions). The reference works are mainly linked to by generic links from appropriate text selections in the tutorial works, except for the bibliography which is linked to by specific links from the tutorials. The textbook also contains a large number of specific links to its sections from the various tutorials, but the databook is mainly intended for browsing.
The tutorial works are the main teaching mechanism and are closely bound to their related activities by specific links. Some generic links are used to lead users into particular teaching sessions on particular subjects, but the tutorial sessions have, as the project has developed, become largely sequential. Each teaching document is a long textual document, intended to be followed from beginning to end, and each document contains specific links to the next and previous teaching documents. (These links are over the words "Next" and "Previous" and so the order of teaching delivery has not been tied to the document content and can be customised by applying a different linkbase.) A decision was made to use no generic links between teaching documents, to protect students from the navigational confusion of jumping from topic to topic.
Much use is made of the logical type facility of Microcosm 3.0 for providing a classification hierarchy as a navigation mechanism for the students. Each major teaching theme also has a graphical map which indicates the units which should be followed, the order in which they come and the way they are grouped into sub-themes.
The authors of the project material have found that the structure of the hypertext has varied as the facilities of the viewing application have evolved. Originally each teaching document had many associated annotation documents, but after the viewer implemented popup windows for marginalia these documents were incorporated into the appropriate `master' document. This has pruned the classification hierarchy considerably, and substantially reduced the total number of documents.
This hypertext bears many similarities to the previous two examples: it contains a suite of reference information which is accessed by a network of generic links. It is also subject to a classification imposed by the document manager. Where it differs is in the information-rich tutorial documents. Although all the subject-based information can be found in the reference works, the tutorials are more than a simple set of directed walks through the reference resources. For this reason they have generic links leading into them, but no elaborate cross-referencing between them, which would hinder the pedagogical objective of helping students to understand a particular topic.
It is apparent from these informal studies that the document manager plays a major role in providing static structure and dynamic navigation around a Microcosm hypertext, especially with the latest facilities for assigning logical types to documents. The document manager provides a simple classification system, somewhat like the use of generic links described above but separate from the hypertext link facilities and the document contents. This fixed classification is appealing because it gives users the illusion of an absolute frame of reference with which to orient themselves. However, the real power of Microcosm lies in its generic links, which in turn support the concept of generic and reusable authoring.
The Dexter Hypertext Reference Model [62] is a general model for hypertext, and is often used as a basis against which to compare specific hypertext systems. The Dexter model divides a hypertext system into three layers: the storage layer, which holds the node, link and anchor components of the network; the runtime layer, which deals with user interaction mechanisms; and the within-component layer, which addresses the content and internal structures of individual nodes. A link is a combination of specifiers; each specifier consists of a component specification (which resolves to a single node in the hypertext) and an anchor composed of an identifier (unique within each node) and an (undefined) mechanism for pointing into the node's (opaque) contents.
In some measure Microcosm is isomorphic to the Dexter model: the storage of nodes and links is separated and held by the host file system. The presentation of nodes and links and the user's interaction is controlled by the viewing applications and can be changed according to the user's requirements. The within-component layer is equally vague in Dexter and Microcosm, where the interpretation of the content of nodes is left to the viewing applications. Both Microcosm and Dexter require an opaque handle into the node contents in order to specify the position of a link anchor; the handle may be a numerical offset from the start of the file, a hierarchical tree position, a two-dimensional co-ordinate in an image, or any required measurement. Where Dexter and Microcosm part company is in the relationship between links and nodes: each end of a Dexter link resolves (perhaps by a rule) to a single node, but a Microcosm link may resolve to arbitrarily many nodes. Because of this disparity it is difficult to describe Microcosm in terms of the Dexter model.
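The difference in resolution behaviour can be made concrete with a small sketch. This is illustrative Python with invented node and specifier names, not part of either system.

```python
# A Dexter specifier resolves to exactly one component; a Microcosm
# generic link is a rule which may resolve to arbitrarily many nodes.

def dexter_resolve(specifier, hypertext):
    """A Dexter component specification yields a single component."""
    return hypertext[specifier]          # exactly one node (or an error)

def microcosm_resolve(selection, linkbase):
    """A Microcosm generic link yields every node whose entry matches."""
    return [dest for (sel, dest) in linkbase if sel == selection]

hypertext = {"spec42": "nodeA"}
linkbase = [("motility", "nodeB"), ("motility", "nodeC")]

dexter_resolve("spec42", hypertext)       # one destination
microcosm_resolve("motility", linkbase)   # possibly many destinations
```

The one-to-many result of the second function is exactly what the Dexter model's single-node resolution cannot express directly.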
We have seen that Microcosm provides a declarative authoring model but that link following is usually explained procedurally, in terms of the flow of messages and the actions of each individual Microcosm process. Especially because of the dynamic nature of the Microcosm environment, where viewers and filters are added or removed at will, it has been difficult to describe `link following' formally. This section shows that both link creation and link following can be described declaratively, and gives a simple declarative model (expressed in Prolog) which demonstrates all the features of Microcosm.
The declarative model has repercussions for the end-users of the system: encouraging authors to think according to a declarative paradigm makes their task easier and allows greater extensibility. A similar comparison can be made between two commercial font rendering schemes: Adobe's Type 1 and Microsoft's TrueType. The former consists of a character outline and `hints' about the important or potentially problematic regions of each character shape. The latter consists of a program with instructions to render the shape of each character. Each TrueType character has to contain explicit instructions to draw itself at every device resolution, and the expertise to maintain a good-looking character shape on low-resolution devices such as computer displays. The advantage of Type 1 is that it is easier to code individual character shapes: all the intelligence is in the (external) font rendering engine, and as font rendering technology improves the same set of outlines and hints can be drawn with better quality. The secret of Type 1's success is the ability to come up with a general set of hints that applies to all kinds of character shapes under all kinds of conditions. By analogy, a declarative approach to hypertext links that provides a general set of mechanisms for expressing relationships between nodes may allow an improved link engine to provide a `better' set of destination sites from a given source.
relates([documentA, offset1, "selection"], [documentB, offsetB, "other"]).
relates([documentA, offset2, "foo"], [documentC, offsetZ, "bar"]).
Creating a link requires the user to make selections "Point A" and "Point B". The system then stores the attributes of these selections in a linkbase. The following Prolog fragment assumes for simplicity that there is only one linkbase, and that it is stored in memory, not written to a file in permanent storage.
createlink([SrcDoc,SrcOffset,SrcSelection],[DestDoc,DestOffset,DestSelection]):-
    assert(relates([SrcDoc,SrcOffset,SrcSelection],
                   [DestDoc,DestOffset,DestSelection])).
Following a specific link from "Point A" is equivalent to gathering all the relevant attributes of "Point A" (i.e. [documentA, offsetA, selectionA]) and evaluating the following Prolog fragment to find point B:
relates([documentA, offsetA, selectionA], [WhichDoc, WhichOffset, WhichSel]).
Following a local link from "Point A" is similar to the above, except that the following Prolog fragment is elaborated instead:
relates([documentA, _, selectionA], [WhichDoc, WhichOffset, WhichSel]).
Following a generic link from "Point A" is also similar to the above, except that the following Prolog fragment is elaborated instead:
relates([_, _, selectionA], [WhichDoc, WhichOffset, WhichSel]).
(In this case the three different kinds of link are obtained by the application of three different rules to the linkbase. This is also seen in the current Microcosm implementations: document, offset and selection data are stored for all three kinds of links, but an extra field is used to distinguish between the different link types. We shall pick up on this distinguishing information at a later stage.)
Having located the destination of the link, it must now be displayed to the user by the document dispatcher. Most of the facilities that the dispatcher makes use of (running a program, opening a document and making a selection) are beyond the scope of this model, but they appear in the following Prolog fragment.
dispatchlink([DestDoc,DestOffset,DestSel]):-
    typeofdocument(DestDoc,Type), applicationof(Type,App), run(App,DestDoc),
    makeselection(App,DestDoc,DestOffset,DestSel).
Assembling all these fragments yields the following simple schema for link following.
followlink(Source,Destination):-
    findlink(Source,Destination),
    dispatchlink(Destination).

findlink(Source,Destination):-
    specific(Source,Destination);
    local(Source,Destination);
    generic(Source,Destination).

dispatchlink([DestDoc,DestOff,DestSel]):-
    typeofdocument(DestDoc,Type), applicationof(Type,App),
    run(App,DestDoc),
    makeselection(App,DestDoc,DestOff,DestSel).

specific([SrcDoc,SrcOff,SrcSel],[DestDoc,DestOff,DestSel]):-
    relates([SrcDoc,SrcOff,SrcSel],[DestDoc,DestOff,DestSel]).

local([SrcDoc,SrcOff,SrcSel],[DestDoc,DestOff,DestSel]):-
    relates([SrcDoc,_,SrcSel],[DestDoc,DestOff,DestSel]).

generic([SrcDoc,SrcOff,SrcSel],[DestDoc,DestOff,DestSel]):-
    relates([_,_,SrcSel],[DestDoc,DestOff,DestSel]).
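For readers more familiar with a procedural notation, the same schema can be sketched in Python. This is an illustrative re-expression with an invented linkbase entry, not the thesis model itself; unlike the Prolog version, it collects all matching destinations in one pass.

```python
# The linkbase: (source, destination) pairs, where each point is a
# (document, offset, selection) triple. The entry is an invented example.
linkbase = [
    (("documentA", 100, "selection"), ("documentB", 200, "other")),
]

def specific(src, entry_src):
    return entry_src == src                      # match doc, offset and selection

def local(src, entry_src):
    return (entry_src[0], entry_src[2]) == (src[0], src[2])   # ignore the offset

def generic(src, entry_src):
    return entry_src[2] == src[2]                # match the selection alone

def find_links(src):
    """Try the three kinds of link, as findlink does."""
    return [dest for (entry_src, dest) in linkbase
            if specific(src, entry_src) or local(src, entry_src)
            or generic(src, entry_src)]

find_links(("anydoc", 999, "selection"))   # found by the generic rule
```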
Purely declarative frameworks have no side effects, but Microcosm makes judicious use of side effects to keep track of a user's session history. Adding the following definition (a pseudo-dispatcher which just saves its arguments to a file) to the above schema allows us to keep such a history.
dispatchlink(Destination):- tell(`History'), print(Destination), nl, told, fail.
The three link-following actions shown above are really just `plug and play' semantics: they are the most widely used semantics for link following, but are by no means the only ones. Another standard Microcosm link facility is the so-called "Computed Link", which performs text-retrieval operations based upon the selected text. Its definition would be similar in form to that of the generic link, i.e. ignoring document and offset information. (A more sophisticated version of this facility might make use of the selection's semantic context and so take all these details into account.) Any mix of these (and other) features may be chosen at the start of the session, so that it is impossible to tell in advance how link following will be accomplished. In fact any feature may be added or removed at any time during the session, so it is impossible to guarantee that a particular link (relationship between two document points) will be available under all circumstances. In short, Microcosm semantics are dynamically added to the system, rather than being a static feature of it. To accommodate this flexibility within the model, let us allow the link following predicates (or link resolvers) to be defined dynamically, for example, reading their names from an initialisation file, and asserting this list of names as a fact in the Prolog knowledge base.
startup:-see(`Resolvers'), read(Resolvers), seen, assert(resolvers(Resolvers)).
Now let us rewrite the findlink predicate to take a particular resolution function name as a parameter, rather than explicitly including each function as an alternative.
findlink(Resolve, Source, Dest):- call(Resolve, Source, Dest).
And consequently, let us make followlink use all the resolvers in turn, storing all the link destinations in a list. A variant of dispatchlink must now be used to handle multiple link destinations.
followlink(Source):-
    resolvers(ResList),
    findall(Dest,
            (member(Res, ResList), findlink(Res, Source, Dest)),
            DestList),
    dispatchlink(DestList).
dispatchlink(ListofDests):-
    ListofDests = [[_,_,_] | _],    % check we have a list of destinations here
    askuser(`Which links?', ListofDests, ActualDests),
    maplist(dispatchlink, ActualDests).
Now we have given three alternative link dispatchers: one which acts on a single destination, one which acts on multiple destinations, and the history pseudo-dispatcher. We could make the definition of followlink more symmetrical by allowing a dynamic list of dispatchers, mirroring the definitions for the list of resolvers.
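The dynamically configured resolution described above can be sketched as follows. This is an illustrative Python model; the resolver set and linkbase contents are invented, and in the Prolog model the resolver list would be read from an initialisation file rather than written as a literal.

```python
# Illustrative sketch of dynamically configured link resolution: the
# set of resolvers is data, not a fixed part of followlink, so
# resolvers can be added or removed at any time during a session.

linkbase = [(("doc1", 50, "Prolog"), ("gloss", 900, "Logic"))]

def specific(src):
    return [d for (s, d) in linkbase if s == src]

def generic(src):
    return [d for (s, d) in linkbase if s[2] == src[2]]

# In the Prolog model this list is read at start-up; here it is simply
# a list of functions which may be changed while the system runs.
resolvers = [specific, generic]

def follow_link(src):
    destinations = []
    for resolve in resolvers:            # the findall over all resolvers
        for dest in resolve(src):
            if dest not in destinations:
                destinations.append(dest)
    return destinations                  # handed to the multi-way dispatcher

follow_link(("doc9", 0, "Prolog"))   # resolved by the generic rule only
```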
If we wish to increase the range of available link types, how should extra information be held in the link base, and how should a new resolver be coded? For example, if a new kind of link were implemented which only applied to source documents of a given type, then either all the work could be accomplished in the resolver
srctypelink([SrcDoc,SrcOff,SrcSel],[DestDoc,DestOff,DestSel]):-
    relates([SrcDoc,SrcOff,SrcSel], [DestDoc,DestOff,DestSel]),
    typeof(SrcDoc, 'Introduction').
or all the necessary information could be coded in the link itself
relates([Doc, _, 'PASCAL'], [gloss, 1000, 'PASCAL']):-
    typeof(Doc, 'Introduction').
Examining the user interface for link creation in the PC Microcosm implementations shows that in creating a generic or local link, the user actually defines a specific link along with a rule for generalising it. An alternative view is that the user gives an example of the generic link, from which the other links can be deduced. Either way, the user expresses a specific relationship which is an instantiation of a general rule, and this is reflected in the entry in the linkbase, which codes all of the information about the link (even the source offset and source node, which are redundant for a generic link) along with an indication of the link type. Hence we could expand the linkbase entries as follows:
relates(generic, [doc21, 128, 'Internet'], [glossdoc, 1500, 'Networks']).
relates(specific, [doc34, 100, 'TeX'], [glossdoc, 2000, 'Typesetting']).
relates(srctype, [overviewdoc, 345, 'PASCAL'], [glossdoc, 90, 'PASCAL']).
The last of these examples may be a way to code the `generic link constrained by source document type' proposed earlier. The addition of a link type to a link is analogous to the addition of a hint to a Type 1 font: a link can always be deduced from the data in the linkbase entry (i.e. the specific example provided by the user), but if a suitably intelligent resolver is present it can use the link type to generalise a new set of links from this example. Any set of new link types will probably need access to more information about the link context than simply the document id, the selection and its offset within the document. Other information, such as the document type, or the document's description, keywords or even contents, can be obtained from the document manager by the link resolver, but here we choose to store the information explicitly in the linkbase in order to make the declarative model more transparent.
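The role of the type label can be sketched as a dispatch table. This is an illustrative Python model, not Microcosm code; the linkbase entries are invented and only two of the link types are shown.

```python
# Illustrative sketch of typed linkbase entries: each entry carries a
# type label, and a resolver registered under that label decides how
# to generalise the stored example link.

linkbase = [
    ("generic",  ("doc21", 128, "Internet"), ("glossdoc", 1500, "Networks")),
    ("specific", ("doc34", 100, "TeX"),      ("glossdoc", 2000, "Typesetting")),
]

def resolve_specific(src, entry_src):
    return src == entry_src                  # the whole triple must match

def resolve_generic(src, entry_src):
    return src[2] == entry_src[2]            # only the selection must match

resolvers = {"specific": resolve_specific, "generic": resolve_generic}

def find_links(src):
    """Apply each entry's own resolver to the source point."""
    return [dest for (kind, entry_src, dest) in linkbase
            if resolvers[kind](src, entry_src)]

find_links(("anydoc", 7, "Internet"))   # matched by the generic resolver
```

A smarter resolver for a given label can later be substituted without touching the stored links, which is the point of the Type 1 hint analogy.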
relates(srctype, [overviewdoc, 345, 'PASCAL', introtype, "Languages Overview"],
                 [glossdoc, 1000, 'PASCAL', referencetype, "Languages Glossary"]).
The accompanying resolver for srctype links could be defined as follows:
srctypelink([SrcDoc, SrcOffset, SrcSel, SrcType, SrcDesc], Dest):-
    relates(srctype, [_, _, SrcSel, SrcType, _], Dest).
The model now gives a mechanism for describing flexible generalisations on link sources, but can it also provide generalisations on link destinations (such as the `Computed Links' text-retrieval mechanism in PC Microcosm), given that each resolver produces only one destination per solution? In fact this is taken care of by followlink's use of findall, which not only tries each resolver in turn, but also retries each resolver until it produces no more destinations. Hence computed links may be expressed as follows in the linkbase:
relates(computed, [overviewdoc, 345, 'PASCAL', introtype, "Languages Overview"],
                  [glossdoc, 1000, 'PASCAL', referencetype, "Languages Glossary"]).
and could be implemented by the following resolver (where the definition of the hypothetical grep is not included here):
computedlink([SrcDoc, SrcOffset, SrcSel, SrcType, SrcDesc],
             [DestDoc, DestOffset, DestSel, DestType, DestDesc]):-
    relates(computed, [_, _, SrcSel, _, _], [_, _, _, _, _]),
    grep(SrcSel, globalindex,
         [DestDoc, DestOffset, DestSel, DestType, DestDesc]).
In this model every link must be labelled with a type name which can be recognised by a resolver. In the same way that an improved font rendering algorithm may produce better character shapes from the same Type 1 description, improvements to the resolvers may produce more, or more relevant, sets of destinations from the same links. For example, the generic links resolver could be enhanced to take into account variations in spelling or the use of homonyms in the selected text. Similarly, the above srctypelink resolver may select documents not only with the given type, but also whose description or keyword attributes include the type name.
By working in a declarative environment, it is possible to expand the link type in the linkbase entries from a pure label into an expression which is itself a representation of the relationship. For example, the following linkbase entry represents the (rather pointless) link between any word beginning with the letter `a' and a particular destination:
relates(string2list(SrcSel, [a|_]),
        [mysrcdoc, 1234, "apple", 'text', "pointless"],
        [dictdoc, 514, "Alpha", 'text', "dictionary of letters"]).
where SrcDoc, SrcOffset, SrcSel, SrcType and SrcDesc, along with the corresponding Dest- forms, are Prolog variables which will be instantiated before the `type' expression is evaluated.
This experiment was undertaken by the author as a demonstration in conjunction with Oxford University's Elektra project (a study of 17th and 18th century women's literature). The texts were keyed in using a specific SGML document type based on the Text Encoding Initiative DTD [30], and were subsequently parsed by sgmls, an SGML parser. The intermediate output from the parser was processed by a UNIX awk script into an RTF file (for display by Word for Windows) and a linkbase (for use by Microcosm). The structure of each of the texts was similar: a title page, a table of contents and a sequence of chapters. Each chapter was divided into pages, paragraphs and lines (all strictly recorded in the markup so as to give a visual authenticity to the electronic reproduction). The translation process added whatever hypertext navigation and automatic link creation was possible; however, since the explicit structure of each text was simpler than that of the technical reports used in LACE (the chapters are not subdivided into sections, and there are no explicit cross references or citations), this was mainly limited to linking each element of the table of contents to the matching chapter. The translation process also added links from the title page, table of contents and first page of text to graphics files containing images of the corresponding page in the original printed document. A link was also made from any page in the document to biographical information about the author. These links were specified not by the document's structure, but were agreed as standards for the project.
Figure 4.5a: An Elektra document under Microcosm
This translation process was specific not only to the DTD but also to this particular application. The TEI DTD is very general, and so formatting and linking decisions were made so as to yield a specific style close to one particular original document. Some of the document markup allowed specifications of physical representation (e.g. the emphasis element had a representation attribute which could be used to code a specific font or style name to provide the emphasis) but the majority was chosen in the translation. Alternative measures such as the use of SGML's LINK specification to refer to style sheets might have been more appropriate for a more thorough experiment.
<!doctype ota>
<ota>
<text>
...
<pb n="19">but he taught them to be cruel while
<lb>he tormented them: the consequence
<lb>was, that they neglected him when he
<lb>was old and feeble; and he died in a
<lb>ditch.
<p>You may now go and feed your
<lb>birds, and tie some of the straggling
<lb>flowers round the garden sticks. After
<lb>dinner, if the weather continues fine,
<lb>we will walk to the wood, and I will
<lb>shew you the hole in the lime-stone
<lb>mountain (a mountain whose bowels,
<lb>as we call them, are lime-stones) in
<lb>which poor crazy Robin and his dog
<lb>lived.
</div>
<div type='chapter' n='3' id=CH3>
<head>CHAP. III.
<lb><hi rend='small italic'>The treatment of
animals—The story of
<lb>crazy Robin—The man confined in
<lb>the Bastille.</hi>
</head>
<p>In the afternoon the children bounded
<lb>over the short grass of the common,
</div>
</body>
</text>
</ota>
Figure 4.5b: SGML markup for Elektra document
Although this translation successfully allows Microcosm access to SGML documents, it is less sophisticated than the similar LACE process. There is no resultant collection of individual subnodes, nor can any part of the document be referenced externally. Instead, source and destination anchors are created inside the (linear) text as gotobuttons and bookmarks as described in section A1.11. Links that occur as a result of the translation of the document's internal structure are therefore not even handled by the Microcosm link engine, but by the word-processor itself. The user is free to make further links which are processed by Microcosm. See figures 4.5a and 4.5b for the display of an Elektra document under Microcosm together with its markup.
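The core of the structural linking described above (connecting each table-of-contents entry to its chapter) can be sketched as follows. This is an illustrative Python fragment; the chapter list stands in for the parser's output stream, and the headings shown are invented examples rather than the Elektra texts themselves.

```python
# Illustrative sketch of the translation step which links each entry
# in the table of contents to the matching chapter anchor.

# Stand-in for the parsed document structure: (chapter id, heading).
chapters = [("CH1", "The treatment of servants"),
            ("CH3", "The treatment of animals")]

def contents_links(chapters):
    """One link per chapter: heading text -> chapter anchor id."""
    return [(title, chapter_id) for (chapter_id, title) in chapters]

contents_links(chapters)   # (heading, anchor) pairs for the linkbase
```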
Link fossilisation is a significant disadvantage of WWW and occurs because link specifications have to be published as part of the document and cannot be changed without revising the document. Since links refer to their destination anchors via a specific machine name and path name, any change to the position of the destination requires every source document which refers to it to be changed: once published, a document can never be moved or deleted. Although this is not an insurmountable problem in a locally controlled context, WWW used as a world-wide publishing mechanism assumes that every document is forever associated with its published address.
Dead ends frequently occur in WWW because only native WWW documents can have embedded links. If traversing a link leads to a foreign document being displayed by a foreign application (for example, an RTF file displayed by Word) then no WWW links may be followed from it.
Microcosm does not suffer from these problems. Dead ends do not occur because almost any program can be used as a Microcosm viewer for many different kinds of data: links can be followed not only between text and graphic files, but between wordprocessed documents (Microsoft Word), design documents (AutoCAD), spreadsheets (Excel), databases (SuperBase), video documents (AVI) and simulations (SuperCard). Links do not get fossilised because they are not embedded in the documents to which they refer and because they represent rules for linking sets of documents together, rather than specific hardwired document references. What Microcosm does lack is the ability to access documents distributed across machines, but that facility of the WWW can easily be `plugged into' Microcosm by allowing a URL to be used in place of a Microcosm document id, as outlined below.
* Allow WWW files to be accessed by constructing a Microcosm filter which intercepts "Dispatch Link" messages from the Link Dispatch dialogue box. If the document is local to the machine, send the message on unchanged. If, on the other hand, the document to be opened is specified by a URL, check to see if it has already been downloaded into a local cache directory. If not, request the document from the appropriate network server and write it to a local file. If appropriate use an SGML parser to translate the HTML into RTF for viewing in Word for Windows. Once the (possibly translated) file exists on the local disk, send a new Microcosm message asking for it to be dispatched instead of the remote URL.
* Build a filter that allows links to be made to or from a WWW document. This filter should come in front of the MakeLink filter, intercepting any messages which indicate links to be made to files in the cache directory. It should translate the local file name back into the original URL and emit a corrected Make.Link message for the real link maker to pick up and add to the appropriate linkbase.
* Build a filter that allows links to be followed from a WWW document. This is a simple filter that reacts to Follow.Link messages with a URL as the SourceSelection by outputting a Dispatch.Link message with the DestDocument set to the same URL.
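The first of these filters, the dispatch interceptor, can be sketched as follows. This is an illustrative Python model of the behaviour described in the bullet points; the `fetch` and `translate` helpers are hypothetical stand-ins for the network request and the HTML-to-RTF translation.

```python
# Illustrative sketch of the dispatch-intercepting filter: local
# documents pass through unchanged, while URLs are fetched into a
# cache (translating them if necessary) before being dispatched.

cache = {}   # URL -> local file name

def intercept_dispatch(document, fetch, translate):
    """Rewrite a Dispatch Link message so that it names a local file."""
    if not document.startswith("http:"):
        return document                   # local document: pass message on
    if document not in cache:
        raw = fetch(document)             # request from the network server
        cache[document] = translate(raw)  # e.g. write HTML as a local RTF file
    return cache[document]                # dispatch the local copy instead
```

A call such as `intercept_dispatch("http://host/doc.html", fetch, translate)` returns the cached local file name, and repeated requests for the same URL are served from the cache without a second network fetch.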
These modifications allow Microcosm to display HTML and other WWW files, follow WWW-type links in WWW files, create and follow Microcosm links to and from WWW files and create and follow WWW links in Microcosm files. These modifications are not an extension to or deviation from the standard version of Microcosm; rather, they constitute a different configuration of the standard Microcosm framework to deal with a new information service. (The reason for using Word as a Web viewer, in a similar fashion to the TEI viewer of the Elektra project described above, is that the standard Web viewers do not allow selections to be made, making them difficult to use with Microcosm.) This mixture of facilities has the following effects on the users of both systems:
Microcosm readers normally have access to two kinds of material: task-neutral resources (such as dictionaries or literary anthologies) and task-specific resources (comments, essays, questions). Now navigation is not limited to the local environment, but extends to external non-task-specific resources.
Microcosm authors benefit from the same improvements in navigation as readers. However, this places a heavier burden on the author who is acting as a teacher or trainer, since it is his or her responsibility to be acquainted with the (constantly growing) set of resources which can be made available to the readers. Microcosm documents (and whole document collections) can now be made globally available, as can the Microcosm linkbases.
WWW readers are freed from the tyranny of the button: in order to access a piece of information on the Web it is necessary either to know its address or to be able to find a document that contains a link which references it. In an environment which has no alternative methods of navigation (e.g. a hierarchical structure) this can cause considerable problems, especially if documents are revised [65]. While this is a problem even in a localised hypertext environment, it is especially significant in a global, unco-ordinated information system. Using Microcosm's `generic links', the reader should be able to select any relevant text as a link to the required information.
WWW authors have greater freedom in the authoring process: instead of providing explicit buttons for navigation to every relevant piece of material, generic links can be used to provide standard services across a whole domain of information.
The above modifications are currently being made available for Microcosm. So far the author has produced the software to retrieve a WWW document and translate it for viewing under Word for Windows.
A practical problem with this approach has been the conversion between HTML and RTF. Although HTML has been cast in terms of an SGML document type definition, HTML documents are seldom verified with a full SGML parser. Common usage frequently breaks the strict definition of the DTD, which means that most user-authored HTML documents cannot be correctly parsed by an SGML-based process. For this reason the DTD actually used here is more relaxed than the DTD distributed by the WWW development team.
The advantage that WWW brings to Microcosm is access to a global hyperbase, but the advantages that Microcosm bring to WWW are flexible authoring of links and the application of links to an increased range of information media.
The major features of Microcosm are the selection-and-action link-following paradigm, external linkbases and the message passing framework. WWW also provides a message-passing framework: messages (in the form of URLs) are sent by a client viewing application via HTTP to a WWW server and a document (in HTML format) is received back. A client which wants to obtain Microcosm link services can then express its link request message in URL format and send it via HTTP to a Web server. The server can invoke a process which mimics the action of the Microcosm linkbase filters, and which sends back a list (in HTML format) of destination documents that were matched by the linkbase. That document would be displayed to the user as an equivalent of Microcosm's Link Dispatch dialogue box, allowing the user to choose from among the available destinations by clicking on the HTML buttons which describe them.
An essential prerequisite for Microcosm generic links is the ability to make arbitrary selections within documents, not just to click on predefined buttons. This is accomplished in the PC environment by turning the wordprocessor Word for Windows into a WWW browser (as explained in the previous section), since the current version of the PC Mosaic viewer does not allow the user to make selections. Mosaic under the UNIX X Window environment does allow the user to make selections, and so a simple application called `Microcosm Lite' has been written which presents the user with a single button labelled `Follow Link'. When this button is pressed the application grabs the current selection (whether from the Mosaic viewer or any arbitrary window) and turns it into a URL which Mosaic then sends to the linkbase server.
Figure 4.6: Microcosm Lite in use
In this way, whichever application the selection was made in, the link request and destination display are handled by Mosaic (figure 4.6 shows a selection and the resultant set of links returned from the linkbase). Because of the current limitations of Microcosm Lite, only the selection and the type of the source document's application are sent to the linkbase, unless the selection came from Mosaic, in which case the source document name (or URL) and source document description (or title) are also available.
The link server and `Follow Link' message are specified by a URL as follows:
http://host.site/htbin/linkbase?userID+srcSel+srcOffset+srcDoc+srcType+srcDec
This message is received by the linkbase program htbin/linkbase on the machine host.site. The program matches the selection against the links it maintains, and responds with an HTML document containing a list of the destination documents. The link server can also accept an extended message format to create a new link in the link database (link type and destination data are added).
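As an illustration only, the HTML document returned by such a link server might take the following shape (the selection, document titles and URLs here are invented; the text above implies only the general form):

```html
<HTML>
<HEAD><TITLE>Follow Link</TITLE></HEAD>
<BODY>
<H1>Links available for "Mihailovich"</H1>
<UL>
<LI><A HREF="http://host.site/history/cetniks.html">The Cetnik Movement</A>
<LI><A HREF="http://host.site/history/leaders.html">Resistance Leaders</A>
</UL>
</BODY>
</HTML>
```

Displayed in Mosaic, such a list plays the role of Microcosm's Link Dispatch dialogue box: each entry acts as a button leading to one of the matched destinations.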
The link server should encapsulate as much of the Microcosm behaviour as possible in terms of configurable link databases. Since individual user sessions are not maintained, the link server in fact provides a static four-tier hierarchy of linkbases: the first associated with the specific source file in which the selection was made, the second associated with the collection of resources to which that file belongs (i.e. a linkbase for all the documents in the current directory), the third associated with the site (i.e. a per-linkserver linkbase) and lastly a private linkbase associated with the user who sent the request. The link server (a simple UNIX shell script in the current implementation) searches each of the linkbases for relevant links in turn. To provide these Microcosm link services, a Web server must have the linkbase shell script installed; to make use of them, a user need only install the simple `Microcosm Lite' application.
This chapter has analysed a number of approaches to Wide Area information access. WAIS provides a text-retrieval method of navigation to large, remote information resources. WWW provides a simple hypertext method of navigation around a distributed set of document clusters. Microcosm provides a local method of information access which is easy to scale to larger, distributed environments and which provides a degree of robustness in a dynamic corpus of documents, while retaining the advantages of authored links over purely statistical measures of document similarity. By applying Microcosm methods and authoring practices to the WWW environment it is hoped that the Web will become more robust and that generally useful resources will become easier to author.
LACE used a structured document architecture as a mechanism for expressing local coherence and a means of representing complex lexias which, although individually hyperdocuments, were collectively stored as a hyperbase. LACE-92 extended LACE by providing a mechanism for creating these locally coherent, structured lexias from information gleaned from a global hyperbase. In this chapter we look at LACE-93, a method of expanding coherence beyond the confines of a single local document, allowing authored intent to be applied with a global scope and so providing hyperdocument facilities.
One feature of the Web, or presumably of any global hypertext, is the mixture of co-operative and autonomous components of its use. The organisation of each Web site is independent of any of the others, but the details of this organisation, and summaries of the available data, are shared co-operatively with other organisations, and frequently published by `key' sites to the benefit of everyone. In contrast, the authoring of the documents at each site is typically performed in isolation, and without reference to the documents available elsewhere on the network. A co-operative effort may be involved within a site to make its collection of documents coherent, but this breaks down at the larger scale and is not exhibited between sites. In other words, according to Furuta and Stotts, at a certain scale the Web ceases to be a hyperdocument (no authored intent and no coherence) and becomes a hyperbase.
The point at which this transformation occurs is the point at which the Web becomes difficult for a reader to use. According to [135], a key feature of a hyperbase is the need to supplement link following with data querying as an information discovery strategy. However data query implies the ability to rapidly enumerate the nodes of the hypertext, a facility not present in the current implementation of the Web. Since information is highly distributed throughout the many sites that compose the Web, any topic-based task becomes all but impossible if the reader is not acquainted with a set of likely sites to start investigating. (The synthesis of Microcosm link facilities into the Web, explained in section 4.3.5, is a useful bridge between the hyperdocument and hyperbase states in that it provides authored links that act like content keyword queries.)
The exact scale at which the transformation between hyperdocument and hyperbase occurs is not fixed: it is certainly possible to store a collection of mainly unrelated articles as a single resource (or even a single document): this is a hyperbase at a very localised scale. Conversely, it should be possible to author a document which draws together information in resources across the global network, from widely diverse sites: this is a hyperdocument at the global scale. Let us refer to the latter as a coherence document, since it is a document which expresses an authored coherence between the contents of many separate resources, and it adds coherence to the hypertext network in which it features.
In figure 5.1, the first network is a classic `well-connected' hypertext, with each node `near to' any other node. This kind of network is frequently seen as the result of a planned authorship activity. A `coherence document' simply provides an alternative viewpoint on the network information to the view seen from any other node. The second network is partitioned into disjoint subnets, and the coherence document provides an original and genuinely summative and cohesive view of the network contents. So a coherence document can provide a useful function in a network which does not already exhibit a high degree of coherence.
Figure 5.1: Coherence Documents
The coherence document provides yet another private coherent viewpoint on an already well-connected network. The coherence document provides a unique private coherent viewpoint on a partitioned network: since the underlying network is not already well-connected, the document acts to `glue' together the information there, and hence provides a form of global coherence.
Catalogue nodes already provide simple lists of other places to go for organisational navigation; the coherence that the coherence document supplies should be in the subject domain: collating, comparing and contrasting the contents of other documents. It is post-hoc, added to the network as an afterthought (literally) and it must increase the structure inherent in the network without constraining the network.
In the above diagrams the coherence document is shown as being `different from' the network and without any links going to it, but the document must be placed somewhere within the document network and assigned a URL, otherwise it could never be accessed. In this sense it does not provide an organising view imposed on the network from above; rather it is a participant in the network and subject to the rules of the network. As a consequence, the coherence document is just `another document', albeit one expressed as the result of a particular authoring strategy. To make this document accessible to all interested parties it would be necessary to publicise its existence; since it is concerned with a single subject domain it may be necessary to provide a single, well-known catalogue URL which links to all the coherence documents.
Compare this with the authoring strategy of Theseus, in which the elements of the hyperbase are strictly isolated (no inter-component links) and the subject documents (coherence documents) are imposed onto and separate from the hyperbase.
A different type of document needs to be adopted to express global coherence: one which not only provides a view on the network by making promiscuous reference to other material, but also `grounds' these references within a coherent framework. This requires more than making passing acknowledgement of related texts: the reference must be expounded and its context explained. This kind of model can be seen in older forms of technical literature, as documented in [110]. Here a wider variety of intertextual mechanisms is used than in current technical literature--as well as citations, the use of titles, letters, and personal and historical narratives is seen to signal intertextual content. The purpose of these enhanced referential features is to indicate the relevance and importance of the new work in an environment where a scientific text is seen as intelligible only in the context of an existing body of texts.
So a coherence document may be seen as a weaving of internal and external ideas, local and transcluded paragraphs, according to a particular rhetorical form. Moulthrop comments that "discourse on hypertext could be conceived not as a series of discrete presentations but as contributions to an ongoing conversation" ([104]), so instead of a rhetoric of technical or scientific writing which compels authors to express their own thoughts, ideas and conclusions in isolation with brief reference to the work of others, we can postulate a rhetoric which encourages the inclusion of other writings with copious comment and annotation. Such inclusion leads the reader back to the original work, allowing them to see the same information in a new context. Here we also see at work Landow's claim that in an electronic book "the boundaries of the text become permeable" ([87]).
The WWW project is the only example of a large-scale co-operative hypertext environment that currently exists, and we have seen that common use of the Web makes it incoherent and highly partitioned. Yet to some extent this is not due to the fundamental features of the Web architecture: its documents can reference arbitrary resources, and the author's document model allows structured arguments to be expressed. However, the author's rhetorical model, which encourages brief references to external works, is reinforced by the Web's native document structure, which provides a link anchor to act as a button for the reader to activate a new document. The anchor is typically intended to be only a few words long since it is highlighted to stand out from the main text. Anchors may be annotated to record the relationship between the link's source and its destination, but this feature is not used in practice. Transclusions of external material are provided, but only as a mechanism for embedding graphical material for viewing.
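For illustration, the two native mechanisms mentioned here might be written as follows (the file names and the REL value are invented; REL is the seldom-used annotation facility, and IMG the graphical transclusion mechanism):

```html
... a point disputed in <A HREF="reply.html" REL="refutes">a later response</A>.
<IMG SRC="figure1.gif">
```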
By contrast, we have seen in section 2.2.3 that documents can be composed of objects, and that each object may have various attributes to define its intended use. This leads to a new document model: that of a document as a view on a collection of objects. The objects may be contained in a single file, spread across several files, or even shared between several host computers. The view defines how the document is treated--how it is to be collated and composed from its set of component objects.
At first sight this new model seems overly complicated, but consider as an example a business report marked up in SGML: it may consist of list-of-contents elements, index elements, glossary elements, revision history elements, security elements, chapter elements, heading elements, paragraph elements and text data. An SGML DTD may define the parse structure of this file, but what does the document actually consist of? What is the current information contained in it? What information is viewable given the security clearance of the current session? In short, faced with a collection of objects with a complex set of relationships between them, where does one start in order to simply elaborate the meaning of the document? The answer to this is that the application which processes the document is responsible for untangling the network of objects and recognising their interdependent semantics. In the SGML world objects have attributes which identify them and the use to which they are put. An element's tag name can be seen as a particular case of an attribute which is also used in parsing the document according to an external grammar. This grammar may not reflect the meaning of the document in any normal sense--it may only indicate the way that a document can be expressed as a linear stream of text. The meaning of the document is derived by an application that understands the document, and may involve extracting particular objects from the document based on their attributes. Making sense of a typical document may involve the following two simple steps:
a Find the highest priority content object (e.g. out of the possible sections, subsections, chapters, parts, books, or volume structures).
b Elaborate the inline text content of this object and recursively any sub-objects that it contains. Also interpret the relationships between this object and any other objects (footnotes, glossaries, index entries, tables of contents, cross-references, marginalia) and display the related objects or an indication of the relationship if deemed necessary.
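For the business report considered above, a DTD fragment might be sketched along the following lines (hypothetical; the element names follow the list given earlier, and the content models are invented for illustration):

```sgml
<!ELEMENT report  - - (contents?, revhistory?, security?,
                       chapter+, glossary?, index?)>
<!ELEMENT chapter - - (heading, (para | footnote | xref)+)>
<!ELEMENT para    - O (#PCDATA)>
```

Note that the parse structure declared here says nothing about which objects constitute the `current' document: that remains the responsibility of the processing application.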
This process may become more elaborate according to the meta-information that is stored with the document. For example, revision histories and security information could add extra levels of checking for inclusion or exclusion of particular objects. Increasingly, the lexical structure of a file or entity is not clearly mapped onto the content structure of a document. (This tendency has been exacerbated by HyTime's hyperlinks, as we shall see: footnotes, index terms, annotations and even whole sections and chapters may exist as independent entities, joined only by an ilink element in an independent part of the document.) In short, the SGML markup presents a container architecture for the document. In such a situation it is the responsibility of the controlling application to understand and make use of the contained contents by means of the relationships between the linked items.
A number of standards are emerging from industry and international bodies which are of particular relevance to document architectures. Many of them also move away from the simple assumption that a document is equivalent to a file by defining an object-based architecture which may be implemented as part of a file's contents. Although de jure standards are not usually welcomed as a popular computing activity, they are of particular importance when dealing with the production of a global information resource. Such a goal (which was one of the original aims of hypertext [108]) requires a colossal amount of investment from the producers of this information, and longevity of the product will be one of the main requirements to protect this investment. Crane [43], arguing that our contributions to this global information resource should remain part of the public record for an indefinite period, maintains that we should immediately move to a common interchange standard to be able to fully share information and functionality, and then allow this standard to evolve as the problems of large-scale hypermedia become better understood. Brown [25] expands on this by arguing that any hypertext source format should be text based and geared towards sharing not just between different hypertext systems, but between other software tools as well. The rest of this section will take a brief look at three commercial document standards (OLE2, OpenDoc and Acrobat), an evolving academic standard (HTML) and two international standards (MHEG and HyperODA). The section finishes with a longer description of the HyTime standard and some examples of its use.
Both OpenDoc and OLE2 provide a mechanism for simple embedded objects, a concept often demonstrated by having a spreadsheet included inside a word-processor document, and both of these technologies emphasise the component nature of documents and applications. Although it currently looks as if a `word-processor' document has objects from foreign applications permitted to exist inside its boundaries, these models anticipate a set of co-operating data objects, operated on by a set of co-operating software components. In such an environment there are no native word-processor documents, spreadsheets or databases: instead there is a collection of data objects which can be operated on by various software components inside a unifying data structure.
OpenDoc, OLE2 and PDF make use of an object-centered architecture, but still associate the document's contents with a single file. PDF's objects have publicly-defined contents which must be read in order to understand the composition relationships that build the document. By contrast, OLE2 defines a well-understood hierarchy of containment which can be used without understanding the format of the component data streams. OpenDoc devolves the responsibility of composing the document to the controlling application, providing neither a publicly-understood containment mechanism, nor publicly-understood object formats.
HyTime presents a significant step away from the notion of a document as a file by building on SGML's concept of a document as a group of entities. A HyTime document has an explicit hub which is the central defining element of the document's contents and from which the contained objects are linked via a well-defined set of relationships. The aim of HyTime is to preserve information about the scheduling and interconnection of related components of a hypermedia document (e.g. audio, music score and libretto in a CDROM version of an opera) that would otherwise be embedded inside application-specific `scripts'.
When SGML was proposed as a standard it was becoming more commonplace for authors to exchange individual documents electronically, and the requirement was for a common medium for expressing these documents. In recent years the development of international networks (such as the Internet) has enabled sharing on a wider scale, with repositories of documents and multimedia information being set up across continents. One of the important needs is to be able to tie these information resources together, linking to or citing other works published on a remote server. Many common applications do now provide hypertext facilities, enabling the linking of information. However, most of them do this as a product of an internal scripting language: the links are hidden and exist as a consequence of the execution of a program rather than as explicitly declared data objects, making it difficult to exchange the data between applications.
HyTime markup can express important information about documents: about their structure and the way they should be presented. This information is added value--it allows a document to be reused and interchanged between systems for many purposes and as such is an economic consideration. The benefits of generalised markup (as exemplified by SGML) for representing document structure are increasingly appreciated, especially in commercial and military organisations which have to deal with large volumes of information. Projects such as the Oxford English Dictionary [119] illustrate the benefits of this approach both for the production of different versions of the dictionary in printed form and for the production of a CDROM-based version with advanced searching capabilities.
HyTime is a methodology for describing document features and the relationships between different parts of documents, but it does not prescribe the meaning of these features or relationships. It uses terms like `hyperlinking' without defining what happens when a link is followed, or even how a link is activated. HyTime is not a system that can be executed to display multimedia documents and jump between document objects using hyperlinks: a separate application is needed to interpret a HyTime-compliant document and render it for display. It is in fact anticipated that the main use of HyTime will be for encoding documents for interchange between various proprietary systems, and although the HyTime standard provides various facilities to speed up native rendering of a HyTime document, HyTime is not necessarily the most suitable format for coding multimedia material.
Although HyTime is used to mark up hypermedia documents in conjunction with SGML it is not a single document architecture (i.e. it is not a DTD). Early versions of the draft standard did in fact define a HyTime DTD, but this was abandoned as being too restrictive. Instead HyTime is often referred to as a meta-DTD since it provides a set of standard components (or `architectural forms') which can be used to construct document architectures. As such, HyTime defines a (very large) family of document architectures, and rules for constructing their DTDs.
HyTime is both abstract and specific: it provides abstractions of facilities that are useful in building document architectures, but is very specific about how these abstract facilities must be coded. HyTime constructs are expressed as combinations of SGML elements and attributes which have to be interpreted by a HyTime engine subsequent to their parsing as SGML elements. Although such a HyTime engine may appear to play the role of a post-processor for SGML files, a more co-operative role is needed, since the SGML parser may be required to provide access to any objects in external entities which the HyTime engine needs to interpret. In fact, both HyTime and SGML processing engines are likely to be components of a larger document handling environment.
HyTime is primarily concerned with documenting the relationships between different parts of documents. SGML already has facilities for making references between elements of a document: elements may be labelled with an id attribute and then referred to by that label in another element's idref attribute. This facility can be used to implement cross-references, hypertext jumps, object class systems, style sheets or many other constructs; however it is quite restrictive for a number of reasons. Firstly, only whole elements may be addressed, and so document objects are rigidly defined with quite a coarse granularity--it is not possible to quote a reference to a relevant fragment of a paragraph. Secondly, every element which is to be addressed must be explicitly labelled (conversely, only elements which the author has bothered to label may be addressed). This is not a worrying restriction to the originator of a document, who is free to make whatever labelling additions may be desired, but an author who is trying to `link in' to an existing work (a standard reference resource such as a dictionary, or a seminal academic paper) may have great problems expressing an arbitrary link using just an idref. The third problem with SGML idrefs is that they may only refer to labels within the same document. This makes linking to external reference works impossible without including them in their entirety through an entity reference.
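The basic facility can be sketched as follows (the section and xref element names are invented for illustration; only the id and idref attributes are the SGML mechanisms in question):

```sgml
<section id=results>
<heading>Results</heading>
...
</section>
<xref idref=results>as discussed in the section on results</xref>
```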
Thus in order to allow flexible linking of documents, one of HyTime's major functions is to extend SGML's object addressing model. Object addresses may be constructed from a combination of sub-addressing techniques, starting from a well-known object, such as an SGML named external entity or a previously labelled SGML element (or HyTime object). From such a starting place it is possible to repeatedly narrow down the address by taking a linear offset from one of the ends of the object, or by specifying a hierarchical position within a tree-structured object. Object addresses (or a part of an object's address) may also be specified as the result of a query on the various properties of the document (its structure or data content). This flexible addressing mechanism may be used, for example, to allow a literature student to refer to a specific word or phrase buried inside a paragraph of a read-only document that is not even marked up in SGML.
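Using only the constructs that appear later in this chapter, a two-rung location ladder might be sketched as follows: a nameloc resolves a named external entity, and a dataloc then selects a three-word span within it (the entity name and offsets are invented):

```sgml
<nameloc id=chap2><namelist nametype="entity">chapter2.sgml</></>
<dataloc id=phrase locsrc=chap2 quantum=word><dimlist>120 3</></dataloc>
```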
HyTime also provides some facilities for describing hypertext links, providing a standard for representing marked-up links. Links can be contextual (i.e. embedded in the document at one of the anchor points) or independent (occurring at a position in the document which is unrelated to any of the objects that it links). A link can have multiple endpoints, with a role assigned to each endpoint and rules controlling traversal between the endpoints.
HyTime is a modular standard, with the document designer free to choose only those facilities which will be needed. The base module is always required and provides facilities using SGML constructs for object representation and addressing, as well as miscellaneous facilities for other HyTime modules. The measurement module defines the concept of addressing document objects according to a measurement along some abstract dimension (for example words 3 to 27 could be a measurement within a paragraph). Various standard units are defined for familiar temporal and spatial measurements. The location address module allows reference to be made to document objects which cannot be addressed with the normal SGML facilities of the base module: these objects can be referenced by name, position or query. A location ladder can be built up of gradually more and more specific location addresses (e.g. the draft chapter's fourth heading's third word's second letter). The hyperlinks module provides methods for representing link objects (based on the various object addressing and representation methods provided above) and the semantics associated with traversing the link. The scheduling module provides events which are objects positioned within a multi-dimensional co-ordinate space. The rendition module provides ways for describing the modifications that can be made to an object within an event and the ways that events can be projected from one co-ordinate system into another.
For example, in order to produce an index a list of terms has to be decided on, and then all the relevant occurrences of each term (or its synonyms) must be referenced in the text. Each index entry catalogues the relationship between its term and a number of occurrences in the document and can thus be modelled by a hyperlink (despite its name a hyperlink does not necessarily have anything to do with hypertext, only the connection of two document objects). A hyperlink encodes a connection between several document objects called `anchors' of the link, and assigns a `role' to each of the anchors. For an entry in an index there could be two anchors--the term to be indexed and the set of its occurrences within the document.
Figure 5.2a shows such an index entry. It connects a `term' element to an `occurrences' element whose instantiations have ids `t1' and `o1' respectively. Both elements are declared to occur inside the indexentry element. Indexentry is not part of HyTime; it is simply defined in the DTD (as shown in Figure 5.2b) with HyTime standard attributes. It is the HyTime attribute that identifies the indexentry as being an example of an independent link (ilink) to the HyTime engine. The HyTime engine can then handle the value of the linkends attribute to find the various anchors for the text processing application to use. (More likely, the value of the anchrole attribute would be fixed in the DTD and so not given in the document instance itself.)
In text processing environments, index terms are frequently given special markup in the body of the text. If this is the case, HyTime may locate the term's use by referring to the markup's id. If this is not the case, or the indexer does not have write access to the document's text, then HyTime may locate the index entries by using a dataloc (data location) element. A dataloc element identifies an anonymous span of data within another named object (called the location source, or locsrc: perhaps an element with an id, or a named entity) by giving an offset from one end of that object and an extent. For example, if this section (entitled `Text Processing') had been marked up with an id of textp, the following examples of a dataloc element could address the word `production', either by counting characters or words from the start of the section. (The dimlist element treats its numbers as a measurement along an abstract dimension, in this case the data content of a section element.)
<dataloc locsrc=textp quantum=str><dimlist>45 10</></dataloc>
or <dataloc locsrc=textp quantum=word><dimlist>8 1</></dataloc>
<indexentry anchrole="term occurrences" linkends="t1 o1">
<term id=t1>multimedia
<occurrences id=o1>oc1 oc2 oc3</>
</indexentry>
Figure 5.2a: An entry in an index
<!ELEMENT indexentry - - (term, occurrences?)>
<!ATTLIST indexentry HyTime NAME #FIXED ilink
anchrole NAMES #REQUIRED
linkends IDS #REQUIRED>
Figure 5.2b: Defining an IndexEntry construct in the DTD
<!ATTLIST occurrences HyTime NAME #FIXED nmlist
nametype NAME #FIXED element>
Figure 5.2c: Defining an Occurrences construct in the DTD
Since each term appears numerous times within the document the `occurrences' anchor is a HyTime multloc or multiple location, which consists of a list of ids, each resolving eventually (perhaps indirectly through a dataloc) to a word in the document.
By use of the HyTime-based indexentry document structure given above, we have enabled the document designer to express connections between a specific document object (here a piece of text) and numerous places in the document. This allows the index to refer not just to occurrences of a particular word, but to whole paragraphs of text, or pictures and diagrams. It is the responsibility of the index creator to decide how to represent each of these connections.
HyTime can be used to represent such a course syllabus by using a finite co-ordinate system (fcs) to represent a timeline, and then mapping each component of the course onto the appropriate position on that timeline.
<!ELEMENT semester - - (courseschedule)+ >
<!ATTLIST semester HyTime NAME #FIXED fcs
axisdefs NAME #FIXED timeaxis>
<!ELEMENT courseschedule - - (lecture)+ >
<!ATTLIST courseschedule
HyTime NAME #FIXED evsched>
<!ELEMENT lecture - - (content)+ >
<!ATTLIST lecture HyTime NAME #FIXED event
exspec IDREFS #REQUIRED>
<!ELEMENT duration - O (#PCDATA)
-- LexModel(snzi, s+, snzi) -->
<!ATTLIST duration HyTime NAME #FIXED extlist
id ID #REQUIRED>
<!ELEMENT content - O (#PCDATA)>
<!ATTLIST content HyTime NAME #FIXED nmlist>
Figure 5.3a: Defining a Timeline in a DTD
<semester><courseschedule>
<lecture exspec=single>
<content>chap1</>
<lecture exspec=dbl>
<content>chap3 chap4 sect6</>
<lecture exspec=single2>
<content>chap2</>
</courseschedule></semester>
<duration id=single>26 1</>
<duration id=dbl>37 2</>
<duration id=single2>78 1</>
Figure 5.3b: Using a Timeline in a document instance
Figures 5.3a and 5.3b show the definition and use of such a timeline. In figure 5.3b we see that a semester contains a course schedule which contains a number of lectures, each of which contains a set of contents and refers to a duration for the lecture. The contents themselves are references to the contents of a text book, perhaps indirectly through a dataloc. Figure 5.3a shows how this is defined using HyTime's constructs. The semester is an example of a finite co-ordinate system whose axes are defined by a timeaxis structure (not shown here). In fact there is just one axis here (the time axis) which would be measured in `teaching blocks' for convenience. The courseschedules which it contains are examples of HyTime's event schedules. Each schedule contains many events (lectures in this example) which tie a document object (the content elements) to a position and extent in the co-ordinate system (place the content along the time axis).
The purpose of the duration elements (HyTime extlist) is to specify the start and extent of the event in the units of the co-ordinate system. This example uses particularly opaque measurements, so to make it more useful to a human it would be better to project the events in this co-ordinate system onto a natural calendar by using the event projector facility of the rendition module.
Microcosm [5] is an open hypermedia system developed at the University of Southampton. One of its chief features is that no information concerning links is held in documents; instead all link information is held in external linkbases which contain the required details about the source and destination anchors of the links. It comprises independent components (document viewers and link managers) which communicate by passing messages. Working in such an open environment means that the system response may be sub-optimal, and so hypertexts developed in Microcosm may be translated to a cut-down but optimised delivery environment (such as Microsoft Help). One of the major problems inherent in such a translation is that the linking facilities of the two systems may not directly map onto each other. The rich nature of HyTime's linking capabilities makes it possible to translate hypertext semantics into a HyTime representation without loss of information, and it is therefore useful to use HyTime to form an intermediate representation (a kind of `Rich Hypertext Format') as a midway stage in mapping between two hypertext systems. The translation process then divides into a sub-process that converts a native Microcosm dataset into a HyTime-based representation, and a further translation process to convert (possibly a subset of) this HyTime representation into another hypermedia format [6].
\DocID history.intro \Offset 246 \Selection Mihailovich
Figure 5.4a: Microcosm Address Tuple
<nameloc id=histDoc>
<namelist nametype="entity">history.intro</></>
<dataloc id=mihail quantum=str locsrc=histDoc>
<dimspec>246
11</dimspec></>
Figure 5.4b: Address Tuple as a HyTime Location Ladder
The most common Microcosm addressing mechanism is the (document id, offset, extent) tuple. The Microcosm address specification tuple in figure 5.4a references a string of (implicit length) eleven characters starting at character offset 246 of a document whose id is history.intro. It could be expressed as the two-stage HyTime `location ladder' in figure 5.4b, in which the first (nameloc) element associates an SGML id histDoc with the document, and the second (dataloc) element locates the string within the identified document. Any reference to the name mihail will now resolve to the requested object.
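The tuple-to-ladder translation described above can be sketched programmatically. The following Python fragment is purely illustrative (neither Microcosm nor HyTime tooling was written in Python); the function name and id-generation parameters are assumptions, but the emitted element and attribute names follow figure 5.4b.

```python
# Illustrative sketch: emit the two-stage HyTime location ladder of
# figure 5.4b from a Microcosm (document id, offset, selection) tuple.
# The function name and parameters are hypothetical.

def location_ladder(doc_id, offset, selection, name_id, data_id):
    """Return SGML text for a nameloc/dataloc pair locating
    `selection` at `offset` within the document `doc_id`."""
    # First rung: associate an SGML id with the document entity.
    nameloc = ('<nameloc id=%s>\n'
               '<namelist nametype="entity">%s</></>' % (name_id, doc_id))
    # Second rung: locate the string within the identified document.
    # The extent is implicit in the selection, as in Microcosm.
    dataloc = ('<dataloc id=%s quantum=str locsrc=%s>\n'
               '<dimspec>%d %d</dimspec></>'
               % (data_id, name_id, offset, len(selection)))
    return nameloc + '\n' + dataloc

print(location_ladder('history.intro', 246, 'Mihailovich',
                      'histDoc', 'mihail'))
```

Note how the extent (11) never appears in the Microcosm tuple itself but is derived from the selection, whereas the HyTime dimspec must state it explicitly.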
HyTime links may have more than two anchors, and the document designer has to provide semantics for each of the anchors. By contrast, Microcosm links have only two anchors (source and destination), but a destination anchor may be composed of many documents' objects (the equivalent of a HyTime multiple location). HyTime links can take two forms--contextual links, whose definitions appear at one of the sites of the link anchors (i.e. in context), and independent links, whose definitions are given at some other place in the hyperdocument. Microcosm links are always of the latter type, since link definitions are stored in separate linkbases, referring to their anchor positions through the addressing mechanisms above.
A Microcosm linkbase can now be modelled as a collection of HyTime independent links:
<mcmlink anchrole="source destination" linkends="srcid dstid"
endterms="linkdisp1 linkdisp2">
where the multiple destination may be specified as a simple list of destinations as follows:
<nameloc id="dstid"><namelist nametype=element>
destid1
destid2 destid3</></>
This example is similar to the index example given previously, except that the information given by the link endterms is intended to specify how the link source and destination are to be portrayed--here the source is formatted as a button and provides a short preview of each component of the multiple destination. This is achieved using elements of the following form:
<displayinfo id="linkdisp1"> <anchorformat>button</></>
<displayinfo id="linkdisp2"> <anchorformat>normaltext</></>
which are referred to by references to their unique identifier (id) within the mcmlink element.
A Microcosm link may completely specify its source anchor (in terms of document, offset and content) in which case it is known as a specific link. But by leaving the offset or document unspecified the content acts as a source anchor for this link anywhere that it appears in any document. This is a generic link which no longer contains explicit connections to a source document location.
HyTime makes provision for locations to be specified as the result of a query performed on the content or structure of a document, defining a standard query notation (HyQ) for this purpose and it is possible to express the source locations of a generic link with such a query. This can be done by replacing the explicit dimension specification (dimspecs) in figure 5.4b above with an axis marker query which represents a matching operation against the required texts. Any query notation (e.g. regular expression searches) is allowed in this context. For specific links, the source specification srcid resolves (through a dataloc) to a single location. For generic links, srcid resolves to a multiple location through a query which returns a dataloc for each occurrence of a particular piece of text, where the query domain is either a single document (local link) or the entire hyperdocument (generic link).
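The resolution of a generic link source to a multiple location can be sketched as a query over the document set. The following Python fragment is an illustrative assumption: HyQ itself is not modelled, and a plain string match stands in for the query notation; the function name and data layout are hypothetical.

```python
import re

# Illustrative sketch: resolve the source anchor of a generic link by
# querying every document for occurrences of the anchor text, yielding
# one (document id, offset, extent) location per match -- the multiple
# location to which srcid resolves. A plain string match stands in
# for a real HyQ query.

def resolve_generic_source(anchor_text, documents):
    """documents: mapping of document id -> text content."""
    locations = []
    for doc_id, text in documents.items():
        for match in re.finditer(re.escape(anchor_text), text):
            locations.append((doc_id, match.start(), len(anchor_text)))
    return locations

docs = {'history.intro': 'Mihailovich led ... Mihailovich fell',
        'notes': 'unrelated text'}
print(resolve_generic_source('Mihailovich', docs))
```

Restricting `documents` to a single entry gives the local-link case; passing the entire hyperdocument gives the fully generic case.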
Figure 5.5: Lace '93 environment
In this section we present Lace '93, a distributed, structured, object-centred multimedia document architecture which addresses the issues of document models and authoring rhetoric to provide global hyperdocuments, allowing authors to apply coherence to the lexias which compose an existing hypertext network. It does this in a number of ways:
* providing a document architecture which allows a document to be represented as a collection of components, both local and remote
* providing explicit relationships between the components
* promoting a style of authorship (a rhetoric) which encourages the merging of local and remote components
* defining a viewer which can display the components and their relationships
A Lace '93 document is a particular view on a set of objects. It is implemented as a file containing a set of objects (or object specifications), a set of relationships between the objects, and (possibly) a set of local definitions for the implementation of the relationships. A Lace '93 environment, therefore, consists of an object manager, a relationship manager and a display manager.
An object specification may take one of three forms:
file reference: the contents of the object are the name of a file on the local host's file system
WWW reference: the contents of the object are the Universal Resource Locator of a document available via the World-Wide Web
ruler reference: based on HyTime's data location facilities, the contents of the object are interpreted as offset measurements from the ends of another object.
The ruler reference is used to define an object as a subpart of another object. The measurements take one of the following forms:

m n : the object consists of the n characters starting from the m'th character from the start of another object (these semantics are borrowed from HyTime's dimspec facility)

m -n : the object starts at the m'th character and continues to the n'th character from the end of another object (again following dimspec)

/first/ /second/ : the object starts at the first occurrence of the character string /first/ and continues to the next occurrence of /second/ in another object. If the first character of the second string is the caret (^) then the selection finishes immediately before the second match. This form is a convenience and is not based on a HyTime facility.
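The three measurement forms can be sketched in Python (a convenient notation here, not part of the Lace '93 implementation). The function below is a hypothetical reading of the semantics: dimspec counting is taken to be 1-based, and the marker strings are assumed to contain no spaces.

```python
# Illustrative sketch of ruler-reference resolution over a base object.
# Counting is 1-based (following the reading of HyTime's dimspec taken
# in this sketch); the function name is an assumption.

def resolve_ruler(base, spec):
    parts = spec.split()
    if spec.startswith('/'):                     # /first/ /second/ form
        first, second = [p.strip('/') for p in parts]
        start = base.index(first)
        if second.startswith('^'):               # finish before the match
            end = base.index(second[1:], start + len(first))
        else:
            end = base.index(second, start + len(first)) + len(second)
        return base[start:end]
    m, n = int(parts[0]), int(parts[1])
    if n >= 0:                                   # m n: n chars from the m'th
        return base[m - 1:m - 1 + n]
    return base[m - 1:len(base) + n + 1]         # m -n: to the n'th from end

text = 'The quick brown fox'
print(resolve_ruler(text, '5 5'))                # 'quick'
print(resolve_ruler(text, '/quick/ /fox/'))      # 'quick brown fox'
print(resolve_ruler(text, '/quick/ /^fox/'))     # 'quick brown '
```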
Since Lace '93 is a distributed document architecture not all the document's objects may be contained in the same file, or even on the same machine. In the extreme case (most likely in resource-based publishing) all the objects may be held externally with only the object relationships in the Lace '93 file.
PDF defines certain `implicit' object relationships, namely
a document is composed of a set of pages
a page contains a set of components
a page requires a set of font objects
a page is described by a thumbnail

Lace '93 will allow variant instantiations of a component based on physical rendering criteria (what resolution can the display provide?) or abstract rendering criteria (what are we trying to communicate to this student?). This would be enabled by allowing new relationships, such as
a text object translates into another text object (in French)
an image object previews a hi-res photo
a PostScript object renders an SGML object

Obviously these new kinds of relationships cannot be predefined by an International Standard, but must be allowed to be defined on a per-document-type basis. For this reason the object relationships must be labelled with an identifying `type' so that the displaying application knows how to treat the objects, and, starting at the root of the document, knows how to render the collection of objects as a whole. This kind of relationship facility is provided for in HyTime by the hyperlink, a construct which despite its name does not necessarily have anything to do with hypertext. A hyperlink simply associates a group of objects together, ascribing `roles' to each of the objects. A group of images could be tied together by a hyperlink with anchor roles "hires medres lowres caption".
Obviously there must be some complicity between the document type designer and the application designer so that the application can treat each object relationship appropriately. Some default rules for the treatment of unrecognised relationships would be necessary, such as `ignore the related objects' or `treat the relationship as equivalent to contains' or `treat the relationship as equivalent to contains-only-the-first-object'.
Let us adopt a notation for expressing the relationships between document objects: rel1(obj1, obj2, ..., objn) expresses a relationship, rel1, between the objects obj1 to objn. The following relationships may be useful for Acrobat
previews(thumbnail1, page1) a page's thumbnail preview object
contains(page1, text1, text2, photo1) the objects on a page
required(font1, text1, text7, text9) resources required for rendering

whereas the following extended relationships may be useful in Lace '93
abstracts(text1, page3) a text object summarises the information on a page
image(photo1, image2, bitmap3) three different image formats of the same data
revision(text1, text2, text3) three different versions of the same piece of text

The meaning of the first three relationships is built into Acrobat, whereas the latter three are not. Lace '93 makes it possible to use arbitrary relationships, but requires some method of defining the meaning of those relationships. It would be possible to produce an extended set of relationships and "hard-wire" them into Lace '93, but it is preferable to allow the relationships to be defined on a per-document-type basis. This is similar to SGML's approach, which abdicates responsibility for interpreting the meaning of the document to the relevant application while making as much of the content semantics explicit as possible. The problem then is how to adequately express the semantics of the relationships in an open fashion.
Microcosm's hypertext model allows the declaration and manipulation of arbitrary relationships between document items and so may provide an ideal engine for elaborating the relationships in Lace '93 documents. It has been demonstrated previously (section 4.3.2) that Microcosm's declarative link model caters for arbitrary link relationships, allowing either labelled relationships (such as the above revision, image or abstracts) or relationships referred to by an explicit specification of their semantics. However, in that model the relationships are specified mainly in terms of the object addresses. What is required by Lace '93 is not only a mechanism of expressing a fixed relationship between varied objects, but a mechanism for expressing flexible relationships between fixed objects. Here we are leaving the world of relationships between static document attributes and entering the world of relationships which depend on dynamic, runtime attributes of the system.
Microcosm services are usually invoked to follow a single link from one complete document to another, but what is proposed here is that Microcosm services are invoked en masse, in batch from a document's hub to build up the necessary view of the document, by resolving the links to the document components.
Let us take as an example some real but simple object relationships and examine their semantics.

Firstly, a document contains a set of pages. This containment relationship implies that the super-object consists of a sequential elaboration of a set of sub-objects. To render the whole document it would be necessary to construct each sub-object in order; in an interactive application, however, it is likely that the sub-objects would only be evaluated upon instruction from the user. Compare that containment relationship with a page contains a set of text and image objects: here the sub-objects must be evaluated (in any order) to produce the super-object.

A text object requires a font. This relationship implies that another object (not a sub-object) must be elaborated beforehand in order to display the first object.

An object is an alternative rendering of another object. This relationship implies that if a given object, required for display by a super-object, is unavailable or unsuitable for use, then an alternative object may be substituted in its place. This is a very general relationship and covers the case of alternative image resolutions, duplicate object copies available from network services and different renderings of a piece of textual information. The alternative object would normally be ignored, unless some special criterion was fulfilled (e.g. is this object's server down?).

An object summarises another object. This relationship implies that a given object contains a compact rendition of the information in another object. It could be used to provide abstracts for documents and document parts, or captions for figures and tables. The summary object may well be ignored unless asked for by the user.

An object explains another object. This relationship is similar to the above, but gives a more, rather than less, detailed explanation.
Among these relationships we can see three principal types: objects containing other objects, objects being alternatives for other objects, and objects providing extra information about other objects. These relationships are prototypes, or superclasses, of the actual relationships that are intended, and may be represented using the HyTime device of architectural forms. Using this device, each relationship, expressed as an SGML tag, has a #FIXED attribute which labels its supertype (e.g. attribute lace93 may have value contains) as well as other attributes which are used to define further relationship semantics. When the document is interpreted, each relationship element would have its lace93 attribute inspected to determine how to treat it.
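The architectural-form device amounts to a fixed tag-to-supertype table consulted by the interpreter. The following Python fragment is an illustrative sketch only; the particular tag names and the table itself are assumptions standing in for the #FIXED lace93 attribute values of a real document type.

```python
# Illustrative sketch of architectural-form dispatch: each concrete
# relationship tag carries a fixed `lace93` attribute naming its
# supertype, and the interpreter dispatches on the supertype rather
# than on the tag itself. The table entries are hypothetical examples.

LACE93_SUPERTYPE = {        # tag -> value of its #FIXED lace93 attribute
    'requires':  'contains',
    'includes':  'contains',
    'abstracts': 'extra',
    'explains':  'extra',
    'image':     'alternative',
    'revision':  'alternative',
}

def interpret(tag):
    """Return how a viewer should treat a relationship tag;
    unrecognised tags fall back to a default rule (here: ignore)."""
    return LACE93_SUPERTYPE.get(tag, 'ignore')

print(interpret('abstracts'))
print(interpret('footnote'))
```

The default rule in `interpret` corresponds to the `ignore the related objects' fallback discussed earlier; a viewer could equally substitute one of the contains-based defaults.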
In the current prototype, only the contains and extra supertypes have been implemented to allow rudimentary document composition. A number of slightly different containment relationships have been implemented, such as requires, where the contained objects need to be elaborated before the first object and includes, where the contained objects are elaborated after the original object. The extra objects may be elaborated in various ways, depending on the viewing application, document semantics and user's preference. Either the extra information may be incorporated into the document itself (perhaps as a marginal paragraph or with some visual highlighting to separate it from the contained text) or a reference to the material may be included in the form of a button or a menu choice.
The current prototype does not yet use Microcosm link services for implementing the inter-object relationships.
Here are the steps which are followed in order to render a Lace '93 document (an example of such a document and the resulting HTML document is given in Appendix 2.3).
* Parse the document object specifications and the object relationships.
* Find objects which have a relationship of supertype contains with respect to the root object.
* Render each of these objects (by recursively looking for contained objects) into the Display Manager native format and also each of the objects that are extra to these objects (this step is not recursive if the objects are not to be contained in the document).
* Send the composed document to the viewing application (in this case Mosaic).
The dynamically composed document should contain hypertext facilities (e.g. buttons) to lead the reader back to the original objects from which it was composed.
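The rendering steps above can be sketched as a recursive traversal. The Python fragment below is an illustrative assumption, not the prototype's code: the data layout (a list of (relation, supertype, subject, objects) tuples) and the indented-text output standing in for the Display Manager's native format are both hypothetical.

```python
# Illustrative sketch of the rendering steps: starting at the root
# object, recursively elaborate objects related to it by a relationship
# of supertype `contains`, and attach `extra` objects as references
# rather than inline content (the non-recursive case).

def render(obj, relationships, objects, depth=0):
    """Return the elaborated document as a list of indented lines."""
    lines = ['  ' * depth + objects.get(obj, obj)]
    for rel, supertype, subject, targets in relationships:
        if subject != obj:
            continue
        for target in targets:
            if supertype == 'contains':          # elaborate recursively
                lines.extend(render(target, relationships, objects,
                                    depth + 1))
            elif supertype == 'extra':           # non-recursive reference
                lines.append('  ' * (depth + 1) + '[see: %s]' % target)
    return lines

objects = {'root': 'Course Notes', 'ch1': 'Chapter 1', 'abs1': 'Abstract'}
rels = [('includes', 'contains', 'root', ['ch1']),
        ('abstracts', 'extra', 'ch1', ['abs1'])]
print('\n'.join(render('root', rels, objects)))
```

In a real Lace '93 environment the `[see: ...]` reference would become a button or menu choice in the composed document, leading the reader back to the original object.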
If the general understanding of the nature of a document changes, then this affects the use and production of documents, and the way that information is disseminated. If the unit of information dissemination is no longer a complete magnum opus but a finer-grained object, then information can be shared more effectively, references may be more precise, and the goal of effective information re-use is more achievable. So one of the barriers to a more coherent view on a global hypertext network is the implementation of documents as files, because it hinders effective sharing and reuse of information resources. It also causes the problem of chunking, or splitting information into a set of nodes (see section 1.2.2), where each node (like a file) is well-defined with inviolable boundaries.
One of the problems inherent in the Web is that it promotes the old equivalence document ≡ file. A single, complete file is viewed as an entity, information is authored and presented in terms of these entities, and links are made between information elements embedded in these entities. In order to express coherence it is necessary to make finer distinctions in the information to be presented. It is also necessary to present the information in a more immediate fashion than the traditional "click here to see this" paradigm, which both discourages the reader [146] and increases their disorientation [27].
Thus the Lace '93 document architecture achieves the goal of providing global coherence by allowing hypertext authors to work with information components rather than information containers: units of information which can be reused and represented in new contexts and for different purposes. This is of particular significance for the users of a working environment which is becoming increasingly distributed (because of the Internet) and whose components are being shared to an unprecedented degree (because of the WWW).
Lace displays the isomorphism of text and hypertext, providing a mechanism for expressing local coherence through complex structured lexias.
Lace-92 provides a mechanism for creating locally coherent, complex, structured lexias from the contents of a global hyperbase.
Lace-93 implements true hyperdocuments: a way of applying a coherent, authored view to a global hyperbase of components.
The previous chapter of this thesis argues for the redefinition of the fundamental nature of a document as a view on a set of distributed objects. This is of particular significance since documents are increasingly being defined in terms of objects, and objects are being managed increasingly in a distributed context [120].
Other pieces of the author's work also support this conclusion: the work on the World-Wide Web in sections 4.1, 4.3.4 and 4.3.5 shows up the failings of current document technology to produce a coherent distributed document environment; the work with Microcosm shows the usefulness of generic link specifications in producing flexible structures for hypertexts (section 4.3) and defines a formal specification of Microcosm link semantics (section 4.3.2) that can be used as the basis of a link engine for the proposed distributed document architecture.
There is however a definite anti-structure debate for the `cutting edge' of hypertext use. Moulthrop [102] argues that attempts (such as the above) to coerce hypertexts to behave like printed texts wrongly constrain the medium when it should be acting firstly as an adjunct to print (allowing authors to experiment with a dynamic text) and then as an independent deconstructive literary medium. It also seems ironic that the mechanisms for expressing logical document structure which in Lace are used to completely specify the semantics of a document are used in Lace '93 to describe a less well-prescribed document semantics based on the dynamic determination of object relationships.
The use of structure as a document construction tool is one area which seems well worth following up. Lace '92 based its information retrieval tools on the then-developing WAIS service, before the WWW project gained its enormous popularity. By providing a single addressing scheme for many current information services (HTTP, FTP, USENET news, WAIS), the Web makes it possible to reference almost every document held online on the "information superhighway", but it gives little support for any task other than browsing. Extending the domain of Lace '92 to include the Web would therefore not only add functionality to a user's Web interface, but also increase the number of documents which are written for the Web and linked into its global literature.
Extending Microcosm to support WWW access (described in section 4.3.4) provides chaperoned access to the Web: documents are automatically imported by the user's document manager and are classified for future reference according to subject material, session time and current task. Making Lace '92 into a filter for Microcosm could define an authoring agent capable of keeping track of the various authoring tasks assigned to an individual (write a lecture on object-oriented databases, a paper on hypermedia standards, a literature review of CSCW) and of automatically recording documents as relevant sources when the user browses them.
The hyperfind script has already been run on the URL of a known WWW catalogue (the WWW sites list maintained by the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign) and on the URLs of several major Web sites (CERN in Switzerland and JNT in the UK). This exercise has currently produced a list of some 12,000 nodes at 600 sites.
Large nodes (more than a few kilobytes) are likely to be complete documents rather than small chunks of information. Single-lexia documents should be nodes with a high degree of internal structuring, and should therefore have a relatively high proportion of markup. The minimal markup required to frame a document's contents comes to about 100 bytes, so a very low markup percentage may simply indicate this minimal framing. The markup required for adding headings of different levels does not add a significant volume to a node; it is the link markup which adds a significant amount, since the URLs (about 50 bytes long on average) are coded as markup attributes, not document content.
If the figures for markup size and link size are very close, then it is likely that most of the markup is being used to code links. This is frequently seen in catalog nodes which simply exist to point to other documents. It is also seen in documents generated as directory listings: they contain a title and one link for each file in the directory. In these cases the number of links will usually be quite high.
Nodes with a large proportion of links are probably catalog or hub nodes which exist only to point to other nodes and may be independent of the `content-bearing' network nodes. Nodes with a smaller proportion of links may just contain cross-references to nodes with related content.
From the node metrics it is possible to examine the connectivity of each node--how many nodes does it link to and (less conclusively) how many nodes link to it? Are the linked nodes within the same resource, within the same site or organisation, or world-wide?
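These heuristics can be sketched concretely. The Python fragment below is an illustrative assumption rather than the hyperfind script itself: the function name is hypothetical and the regular expressions are crude approximations, not a real HTML parser.

```python
import re

# Illustrative sketch of the node metrics discussed above: for an HTML
# node, estimate the proportion of bytes taken by markup, the share of
# that markup taken by link anchors, and the outbound link count.
# The regular expressions are rough approximations only.

def node_metrics(html):
    tags = re.findall(r'<[^>]*>', html)
    markup = sum(len(t) for t in tags)
    links = re.findall(r'<a\s[^>]*href[^>]*>', html, re.IGNORECASE)
    link_markup = sum(len(t) for t in links)
    return {'size': len(html),
            'markup_ratio': markup / len(html),
            'link_share_of_markup': (link_markup / markup
                                     if markup else 0.0),
            'links': len(links)}

node = '<title>Sites</title><a href="http://info.cern.ch/">CERN</a>'
print(node_metrics(node))
```

A node whose `link_share_of_markup` approaches 1 with many links would be classified as a catalog or hub node under the analysis above.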
The purpose of this work is to test the hypothesis that there is not a truly world-wide web, but a world-wide collection of local webs based on the hierarchical structures of the underlying services on which the Web is implemented.
The author has postulated that the declarative Microcosm model is useful for elaborating the relationships between the objects, and that the relationship section of a Lace '93 document is in fact a private `linkbase' which acts upon the private `docuverse' which is the set of objects which may compose the document. This is the topic of further work as it is not yet clear how:
links can be specified in terms of dynamic session properties rather than static document or object properties
Microcosm's flexible link semantics can be usefully combined with user-defined link relationships
In order to keep track of the document objects it is necessary to employ some form of distributed object manager which is different from the common file managers. The object manager should keep track of objects and allow access to them by various means (id and property queries), but also allow a flexible approach to objects (not a once-and-for-all partitioning of a file into fixed objects). There are a number of commercial or academic projects which provide object-centred services and may be of use in this context. CORBA [111] concerns the way in which objects can be interfaced to one another by an Object Request Broker, but it does not clearly define a database for storing objects. It also places heavy emphasis on software objects which embody computational activities, whereas Lace '93 objects are dumb (computationally inactive) document components. PCTE, the Portable Common Tools Environment [143], does define an object base, but the individual objects can only be accessed by following links from other objects, instead of by querying their attributes. Perhaps the most likely candidate is a Persistent Object Manager (the kernel of every OODBMS), which is used to provide basic object storage and retrieval services [90].
As well as the document model and format which has been described as making up Lace '93, it is necessary to consider an author's contribution to writing coherent documents. As with all SGML and HyTime uses, it is quite probable that the author will not directly manipulate the tagged document format, which will be hidden by the user interface. Work needs to be done on the constructs that may be included in the author's conceptual model, e.g.
transclusion The author's interface for specifying the containment relationships.
span links Links which are not abbreviated to a short button: perhaps displayed in the margin, perhaps consisting of several paragraphs of text.
relationships The author's interface to using the relationships between the local and remote objects which make up the document.
As well as considering the author's interface to the document model, work needs to be done on the author's rhetoric: i.e. the kind of statements that are useful to make in this environment and the style of writing and intertextual references that are encouraged (see [110]). Part of this is shown in the experience of the Microcosm project constructing generic, reusable document resources, but some of it is specific to Lace '93--the possibility of using not only straight transclusions but (as the viewing application becomes more sophisticated) marginalia, annotations, commentaries, expansions and other rhetorical devices.
In the early days of computing, a document was a sheaf of punched cards. Of course, if it was dropped it wasn't a document any more, yet despite this disadvantage the view of a document as hardware has had a lasting influence on our view of how computers should implement documents. As punched cards became less common, documents lost their physical status and became files stored on magnetic media, but the legacy of the punched card was seen in the constraints put on these files: every line in the document was limited to a width of 80 characters and the document became a sequence of lines in a text editor.
Sequences of lines became sequences of paragraphs, ASCII text turned into variable width, multi-font, scalable character glyphs, sequential files are now turning into `structured storages' with complex internal organisation, but still a document is considered a bounded data object: something with a metaphorical elastic band strapped around it to keep its contents in. Against this environment hypertext has stood apart, offering exotic display services and connectivity features which cannot be reconciled to documents with impermeable boundaries.
This thesis reviews the way that hypertexts and documents have been constructed, and argues for a re-evaluation of the way we represent and compose computer-augmented documentation which unbinds information from monolithic storage units--documents and files are not synonymous. Both text and hypertext documents can be expressed by the relationships between their (potentially distributed) information components; such an explicit model will finally allow the distinction between text and hypertext to be abandoned.
[2] Angerstein, P., `Summary of the Document Style Semantics and Specification Language (DSSSL), Draft International Standard 10179', International Standards Organisation Document ISO/IEC JTC1/SC18/WG8 N1427
[3] Apple Computer Inc., `OpenDoc: Shaping Tomorrow's Software', BYTE, February 1994
[4] Apple Computer Inc., `OpenDoc: Shaping Tomorrow's Software', White Paper. Available by anonymous FTP from cil.org at opendoc-interest/OD-overview.rtf (1993)
[5] Apple Computer Inc., Macintosh HyperCard User's Guide, Apple Computer Inc
[6] Bacon, R.A., `STOMP: Software Teaching of Modular Physics', Proceedings of the International Conference on Physics Computing, Legano, 1994.
[7] Barron D., `Why use SGML?', Electronic Publishing: Origination, Dissemination & Design, 2(1), 3-24, (1989)
[8] Barron D., Rees M., `Text Processing and Typesetting with UNIX', Addison Wesley (1987)
[9] Bechtel B., `Inside Macintosh as Hypertext', in [120], 312-323
[10] Begeman, M., Conklin, J., `The right tool for the job', Byte, 13 (10), 255-267, (1988)
[11] Benest I, `A HyperText System with Controlled Hype', HyperText II Conference Paper
[12] Berners-Lee, T. J., Cailliau, R., Groff, J.-F., `The World-Wide Web', Computer Networks and ISDN Systems, 24(4-5), 454-459.
[13] Berners-Lee, T., `Hypertext Markup Language (HTML): A Representation of Textual Information and MetaInformation for Retrieval and Interchange', Internet Draft. Available by anonymous FTP from info.cern.ch at /pub/www/doc/html-spec.txt (1993).
[14] Berners-Lee, T., `Hypertext Transfer Protocol (HTTP): A Stateless Search, Retrieve and Manipulation Protocol', Internet Draft. Available by anonymous FTP from info.cern.ch at /pub/www/doc/http -spec.txt (1993).
[15] Berners-Lee, T., `Uniform Resource Locators (URL): A Unifying Syntax for the Expression of Names and addresses of Objects on the Network', Internet Draft. Available by anonymous FTP from info.cern.ch at /pub/www/doc/url-spec.txt (1993).
[16] Bookstein, A and Swanson, DR `Probabilistic models for automatic indexing', Journal of the American Society for Information Science, 25, 312-318, (1974)
[17] Bornstein J., Riley V., `Hypertext Interchange Format--Discussion and Format Specification', Proceedings of the Hypertext Standardization Workshop Jan 16-18 1990 , National Institute of Science and Technology (Special Publication 500-178), 39-47
[18] Botafogo R., Shneiderman B., `Identifying Aggregates in Hypertext Structures', Proceedings of the 4th ACM Conference on Hypertext 1992, 63-74
[19] Botafogo, R. A., Rivlin, E., Shneiderman, B., `Structural analysis of Hypertexts: Identifying Hierarchies and Useful Metrics', ACM Transactions on Office Information Systems, 10 (2), 142-180, April 1992.
[20] Bowman, C. M., Danzig, P. B., Manber, U., Schwartz, M. F., `Scalable Internet Resource Discovery', Communications of the ACM, 37(8), 98-107, ACM Press, 1994.
[21] Brailsford D, Adobe's Acrobat--the Electronic Document Catalyst, Computer Science Technical Report, Nottingham University, UK
[22] Brown H., `Editing Structured Documents--Problems and Solutions', Electronic Publishing: Origination, Dissemination & Design, 5(4), 209-216 .
[23] Brown H., `Standards For Structured Documents', British Computer Society Journal, 32(6), 505-514, (December 1989)
[24] Brown P., `Hypertext: The Way Forward', Document Manipulation and Typesetting, 183-191, Cambridge University Press 1988
[25] Brown P., `Standards for Hypertext Source files: the experience of UNIX Guide', Proceedings of the Hypertext Standardization Workshop Jan 16-18 1990 , National Institute of Science and Technology (Special Publication 500-178), 49-58
[26] Brown P., `UNIX Guide: lessons from ten years' development', Proceedings of the 4th ACM Conference on Hypertext 1992, 63-70
[27] Brown, P. J., `Turning Ideas into Products: The Guide System', Hypertext `87 Papers, 33-40, (November 1987)
[28] Bryan M, Standards for Text and Hypermedia Processing, Information Services and Use, 13 (1993), 93-102, IOS Press.
[29] Bryan, M., `SGML: An Authors Guide to the Standard Generalized Markup Language', Addison Wesley Publishing Company, 1988.
[30] Burnard L., Rolling your own with the TEI, Information Services and Use, 13 (1993), 141-154, IOS Press.
[31] Bush, V. `As We May Think', Atlantic Monthly, 101-108, (July 1945)
[32] Campbell, B. and Goodman J. M. `Ham: A General Purpose HyperText Abstract Machine', Communications of the ACM, 31.7, 856-861, (July 1988)
[33] Caras, GJ, `Comparison of Document Abstracts as Sources of Index Terms for Derivative Indexing by Computer', Proceedings of the American Documentation Institute Annual Meeting, 4, 157-161, (1974)
[34] Carlson, P., `The rhetoric of hypertext', Hypermedia, 2, 109-31.
[35] Carr L, `HyperCard Extensions for Multi-Media Databases', Southampton University Department of Computer Science Technical Report, 88-1
[36] Carr L, Barron D, Hall W, Why Use HyTime?, Electronic Publishing: Origination, Dissemination and Design, 2(1), 3-24 (Dec 1993)
[37] Carr L, Davis H, Hall W, Experimenting with HyTime Architectural Forms for Hypertext Interchange, Information Services and Use, 13 (1993), IOS Press.
[38] Carr, L., Rahtz, S., Hall, W., `Experiments with TeX and hyperactivity', TeX90 Conference Proceedings, 13-20, Tugboat 12(1), TeX Users Group, PO Box 9506, Providence, Rhode Island, USA.
[39] Catlin K., Garrett N., Launhardt L., `Hypermedia Templates, An Author's Tool', Proceedings of the 3rd ACM Conference on Hypertext 1991 ,147-160
[40] Cole F., Brown H., `Standards: What can Hypertext Learn from Paper Documents?', Proceedings of the Hypertext Standardization Workshop Jan 16-18 1990, National Institute of Science and Technology (Special Publication 500-178), 59-70
[41] Colson, F., Hall, W. Multimedia Teaching with Microcosm-HiDES: Viceroy Mountbatten and the Partition of India. History and Computing 3(2), 89-98, 1991.
[42] Conklin, E. J., `Hypertext: An Introduction and Survey', IEEE Computer, 17-41, (September 1987)
[43] Crane G., `Standards for a Hypermedia Database: Diachronic vs Synchronic Concerns', Proceedings of the Hypertext Standardization Workshop Jan 16-18 1990, National Institute of Science and Technology (Special Publication 500-178), 71-81
[44] Croft W., `A Retrieval Model for incorporating Hypertext Links', Proceedings of the ACM Conference on Hypertext 1989, 213-224
[45] Curtice, RM and Jones, PE `Distributional Constraints and the Automatic Selection of an Indexing Vocabulary', Proceedings of the American Documentation Institute Annual Meeting, 4, 152-156, (1967)
[46] Database Publishing Systems Ltd, `DynaText', Product Note, Database Publishing Systems Ltd, 608, Delta Business Park, Great Western Way, Swindon, Wiltshire, UK.
[47] Davis H., Hall W., Heath I., Hill G., Wilkins R., Towards an Integrated Information Environment with Open HyperMedia Systems, Proceeding of the ACM Conference on Hypertext, ACM Press 1992.
[48] Davis. H. C., `Version Control for the Hypermedia Systems', PhD Thesis, Department of Electronics and Computer Science, University of Southampton, Southampton, UK, 1994.
[49] De Bra P., Houben G., Kornatsky Y., `An Extensible Data Model for Hyperdocuments', Proceedings of the 4th ACM Conference on Hypertext 1992,222-231
[50] DeRose, S. J., Durand, D. G., `Making Hypermedia Work: A User's Guide to HyTime', Kluwer Academic Publishers, 1994.
[51] Duncan, E. B., McAleese R. `Qualified citation indexing online?' In: National Online Meeting Proceedings--1982. Compiled by M E Williams and T Hogan. 77-85. Medford (NJ), Learned Information
[52] Duncan, E., `Structuring Knowledge Bases for Designers of Learning Materials', Hypermedia, 1 (1), Taylor Graham, 1989.
[53] Engelbart, D. C. and English, W. K., `A Research Center for Augmenting Human Intellect', AFIPS Conference Proceedings, 33.1
[54] Eysenck, M.W. & Keane, M.T. Cognitive Psychology: a Student's Handbook. Lawrence Erlbaum Associates, Hove, Sussex, 1990.
[55] Fountain A., Hall W., Heath I and Davis H, `MicroCosm: An Open Model for HyperMedia With Dynamic Linking', Southampton University Department of Computer Science Technical Report, 90-7
[56] Frei H., Stieger D., `Making Use of Hypertext Links when Retrieving Information', Proceedings of the 4th ACM Conference on Hypertext 1992, 102-111
[57] Frei H., Stieger D., `Making Use of Hypertext Links when Retrieving Information', Proceedings of the 4th ACM Conference on Hypertext 1992,102-111
[58] Furuta R., `An Object-Based Taxonomy for Abstract Structure in Document Models', British Computer Society Journal, 32(6), 494-504, (December 1989)
[59] Furuta R., Plaisant C., Shneiderman B., `A Spectrum of Hypertext Constructions', Hypermedia 1(2), 179-195.
[60] Furuta R., Stotts P., `The Trellis Hypertext Reference Model', Proceedings of the Hypertext Standardization Workshop Jan 16-18 1990, National Institute of Science and Technology (Special Publication 500-178), 83-93
[61] Gosling, J., `The NeWS Book', Sun Microsystems.
[62] Halasz F., Schwartz M., `The Dexter Hypertext Reference Model', Proceedings of the Hypertext Standardization Workshop Jan 16-18 1990 , National Institute of Science and Technology (Special Publication 500-178), 95-133
[63] Halasz, F, `Reflections on Notecards: 7 Issues for the Next Generation of HyperMedia Systems', Communications of the ACM, 31.7, 836-851, (July 1988)
[64] Halasz, F., Moran, T. P. and Trigg, R. H., `Notecards in a Nutshell', Proceedings of the 1987 ACM Conference on Human Factors in Computing Systems, 45-52
[65] Hall, W., `Ending the Tyranny of the Button', IEEE Multimedia 1(1), 60-68, Spring 1994.
[66] Hall, W., Carr, L., Davis, H., DeRoure D., `The Microcosm Link Service and its Application to the World-Wide Web', Proceedings of the First International World-Wide Web Conference 1994,25-34
[67] Harnden R., Stringer R., `Theseus', International Federation of Library Assistants, 18(3)
[68] Harnden R., Stringer R., `Theseus--A Model for Global Connectivity', Proceedings of UK Systems Society 3rd International Conference 1993, Plenum: New York.
[69] Harnden R., Stringer R., `Theseus--A Way of Doing', accepted for Hewson Report, HGC, Olney, Bucks.
[70] Harnden R., Stringer R., `Theseus--the Evolution of a HyperMedium', Cybernetics and Systems,Vol 24: 255-280
[71] Howell G, `Hypertext Meets Interactive Fiction', HyperText II Conference Paper
[72] Hutchings G., `Patterns of Interaction with a Hypermedia System: A Study of Authors and Users', PhD Thesis, Department of Electronics and Computer Science, University of Southampton, Southampton, UK, 1993.
[73] Hutchings G., Hall W., Colbourn C., `Patterns of Students' Interactions with a Hypermedia System', Interacting With Computers, 295-314, 5(3), Sept 1993, Butterworth-Heinemann
[74] Ichimura S., Matsushita Y., `Another Dimension to Hypermedia Access', Proceedings of the 5th ACM Conference on Hypertext 1993,63-72
[75] International Standards Organisation, Hypermedia/Time-based Structuring Language (HyTime), ISO/IEC Standard 10744, 1992
[76] International Standards Organisation, Standard Generalized Markup Language (SGML), ISO Standard 8879, 1986
[77] Jonassen, D. H., `Semantic Network Elicitation: Tools for Structuring Hypertext', Hypertext: state of the art, 142-152, Intellect: Oxford, 1990
[78] Jonassen, D.H. Hypertext/Hypermedia. Educational Technology Publications Inc., Englewood Cliffs, NJ, 1989.
[79] Jordan D., Russell D., Jensen A.-M. & Rogers R., `Facilitating the Development of Representations in Hypertext with IDE', Proceedings of the ACM Conference on Hypertext 1989, 93-104
[80] Kaindl H., Snaprud M., `Hypertext and Structured Object Representation: A Unifying View', Proceedings of the 3rd ACM Conference on Hypertext 1991 ,345-358
[81] Knopik, T., Ryser, S., `AI methods for structuring hypertext information', Hypertext: state of the art, 224-230, Intellect: Oxford, 1990
[82] Knuth, D. E., `The WEB system of structured documentation', Stanford Computer Science Report 980, Stanford, California, September 1983.
[83] Koegel JF et al, HyOctane: A HyTime Engine for an MMIS, Proceedings of Multimedia 93, ACM Press
[84] Koh, T., Loo, P. L., Chua, T., `On the design of a frame-based hypermedia system', Hypertext: State of the Art, 154-165, Intellect: Oxford, 1990
[85] Lamport L., `The LaTeX Book', Addison Wesley (1985)
[86] Landow, G. P., `The rhetoric of hypermedia: some rules for authors', Hypermedia and Literary Studies, MIT Press, Cambridge, 1991.
[87] Landow, G., `Writing With and Against A Hypertext System', Seminar, Department of Electronics & Computer Science, University of Southampton, UK, 1994.
[88] Lee. Z., `Computed Links for the Microcosm Hypermedia System', PhD Thesis, Department of Electronics and Computer Science, University of Southampton, Southampton, UK, 1993.
[89] Luhn, H. P., `The Automatic Creation of Literature Abstracts', IBM Journal of Research & Development, 2, 159-165, (1958)
[90] Manola, F., Heiler, S., Georgakopoulos, D., Hornick, M., Brodie, M., `Distributed Object Management', Technical Report, GTE Laboratories Inc., 1992.
[91] Marmann M., Schlageter G., `Towards a Better Support for Hypermedia Structuring: The HYDESIGN model', Proceedings of the 4th ACM Conference on Hypertext 1992, 232-241
[92] Marshall C., Halasz F., Rogers R., Janassen W., `Aquanet: A Hypertext tool to hold your knowledge in place', Proceedings of the 3rd ACM Conference on Hypertext 1991 ,261-274
[93] Marshall C., Rogers R., `Two Years before the Mist: Experiences with Aquanet', Proceedings of the 4th ACM Conference on Hypertext 1992,53-62
[94] Marshall C., Shipman F., `Searching for the Missing Link. Discovering Implicit Structure in Spatial Hypertext', Proceedings of the 5th ACM Conference on Hypertext 1993,217-230
[95] Maurer, H., Tomek, I., `Some aspects of Hypermedia Systems and their treatment in Hyper-G', Wirtschaftsinformatik, 32(2), 187-196, April 1990.
[96] Mayes J., Kibby M., Watson H., `StrathTutor: The Development and Evaluation of a Learning-by-Browsing System on the Macintosh', Computers in Education, 12(1), 221-229, (1988)
[97] McBryan, O., `GENVL and WWWW: Tools for Taming the Web', Proceedings of the First International World-Wide Web Conference 1994,79-90
[98] McCracken D., Akscyn R., `Experiences with the ZOG HCI System', International Journal of Man-Machine Studies, 21, 293-310, (1984)
[99] Michalak S., Coney M., `Hypertext and the Author/Reader Dialogue', Proceedings of the 5th ACM Conference on Hypertext 1993,174-182
[100] Microsoft Corporation, `Object Linking and Embedding: Version 2.0 ', Microsoft Technical Backgrounder
[101] Microsoft Corporation, Reference to Microsoft Word, Microsoft Corporation
[102] Moulthrop S., `Beyond the Electronic Book: A Critique of Hypertext Rhetoric', Proceedings of the 3rd ACM Conference on Hypertext 1991, 291-298
[103] Moulthrop S., `Hypertext and the "Hyperreal"', Proceedings of the ACM Conference on Hypertext 1989, 259-263
[104] Moulthrop S., `Towards a Rhetoric of Informating Texts', Proceedings of the 4th ACM Conference on Hypertext 1992,171-189
[105] Nanard J., Nanard M., `Using Structured Types to incorporate Knowledge in Hypertexts', Proceedings of the 3rd ACM Conference on Hypertext 1991 ,329-343
[106] Nanard J., Nanard M., `Should Anchors be Typed too?', Proceedings of the 5th ACM Conference on Hypertext 1993, 51-62
[107] Nelson, P., `User Profiling for Normal Text Retrieval', Proceedings of the American Documentation Institute Annual Meeting, 4, 228-295, (1974)
[108] Nelson, T. `Computer Lib', 2nd Edition, Microsoft Press, 1987
[109] Nelson, T. `Literary Machines', published by the author, ISBN 0-89347-056-2
[110] O'Neill J., `Intertextual Reference in Nineteenth Century Mathematics', Science in Context, 6(2), 435-468, (1993)
[111] Object Management Group, `The Common Object Request Broker: Architecture and Specification', Document Number 91.12.1
[112] Parunak H., `Don't Link Me In: Set Based Hypermedia for Taxonomic Reasoning', Proceedings of the 3rd ACM Conference on Hypertext 1991 ,233-242
[113] Parunak H., `Hypercubes Grow on Hypertrees (and other observations from the land of hypersets)', Proceedings of the 5th ACM Conference on Hypertext 1993,73-81
[114] Chen P. et al., `The VorTeX Document Preparation Environment', Lecture Notes in Computer Science 236, 45-54, (1986)
[115] Price R, MHEG: An Introduction to the future International Standard for Hypermedia Object Interchange, Proceedings of Multimedia 93, ACM Press
[116] Quint V., Vatton I., `Combining Hypertext and Structured Documents in Grif', Proceedings of the 4th ACM Conference on Hypertext 1992,23-32
[117] Rada, R., `Hypertext: From Text to Expertext', McGraw-Hill Book Company, London. 1991.
[118] Rahtz, S. P. Q., Carr, L. A., Hall, W. H., `Creating multimedia documents: hypertext processing', Hypertext: state of the art, 183-193, Intellect: Oxford, 1990
[119] Raymond D., Tompa F., Hypertext and the Oxford English Dictionary, Communications of the ACM, 31(7), 67-83 (1988).
[120] Reinhardt, A., `Managing the New Document', 91-104, Byte, August 1994
[121] Riley V., `An Interchange Format for Hypertext systems: the Intermedia Model', Proceedings of the Hypertext Standardization Workshop Jan 16-18 1990 , National Institute of Science and Technology (Special Publication 500-178), 213-222
[122] Ritchie I., `Hypertext--Moving Towards Large Volumes', British Computer Society Journal, 32(6), 516-523, (December 1989)
[123] Rizk A., Sauter L., `Multicard: An Open Hypermedia System', Proceedings of the 4th ACM Conference on Hypertext 1992,4-10
[124] Rizk A., Streitz N., André J. (Eds.) `Hypertext: Concepts, Systems and Applications', Proceedings of the European Conference on Hypertext, INRIA, France, (1990), Cambridge University Press
[125] Rubinoff, M and Stone, DC `Semantic Tools in Information Retrieval', Proceedings of the American Documentation Institute Annual Meeting, 4, 169-174, (1974)
[126] Rubinstein, R., `Digital Typography: An Introduction to Type and Composition for Computer System Design', Addison Wesley, 1988
[127] Salton G., `Selective Text Utilization and Text Traversal', Proceedings of the 5th ACM Conference on Hypertext 1993,131-144
[128] Shneiderman B., `Designing the User Interface', Addison-Wesley 1987
[129] Shneiderman B. & Kearsley G., `Hypertext Hands-On!', Addison-Wesley 1989
[130] Shackelford D., Smith J., Smith F., `The Architecture and Implementation of a Distributed Hypermedia Storage System', Proceedings of the 5th ACM Conference on Hypertext 1993,1-13
[131] Shackelford, D. E., `The Architecture and Implementation of a Distributed Hypermedia Storage System', Proceedings of the 5th ACM Conference on Hypertext 1993, 1-13
[132] Shavelson, R., `Methods for examining representations of subject matter structure in students' memory', Journal of Research in Science Teaching, 11, 231-249, 1974.
[133] Silverman, C & Halbert, M `Relevancy Revisited--the User as Learner', Proceedings of the American Documentation Institute Annual Meeting, 4, 53-57, (1974)
[134] Newcomb, S., Kipp, N., Newcomb, V., `The "HyTime" Hypermedia/Time-based Document Structuring Language', Communications of the ACM, 34(11), 67-83, (November 1991).
[135] Stotts P., Furuta R., `Hypertext 2000: Databases or Documents?', Electronic Publishing: Origination, Dissemination & Design, 4(2), 119-121, (1991)
[136] Stotts P., Furuta R., Ruiz J., `Hyperdocuments as Automata: Trace-based Browsing Property Verification', Proceedings of the 4th ACM Conference on Hypertext 1992,272-281
[137] Streitz N., Haake J., Hannemann J., Lemke A., Schuler W., Schütt H., Thüring M., `Sepia: A Co-operative Hypermedia Authoring Environment', Proceedings of the 4th ACM Conference on Hypertext 1992, 11-22
[138] Streitz N., Hannemann J., Thüring M., `From Ideas and Arguments to Hyperdocuments', Proceedings of the ACM Conference on Hypertext 1989, 343-364
[139] Thüring M., Haake J., Hannemann J., `What's Eliza doing in the Chinese Room? Incoherent Hypertexts and how to avoid them', Proceedings of the 3rd ACM Conference on Hypertext 1991, 161-177
[140] Tyler S., `The Said & The Unsaid: Mind, Meaning and Culture', Academic Press (1978)
[141] van Dijk, T.A., `Macrostructures: An Interdisciplinary Study of Global Structures in Discourse, Interaction and Cognition', Hillsdale N.J: L. Erlbaum, 1980
[142] van Dijk, T.A., `Text and Context', Longman's Linguistic Library, London: Longman, 1977.
[143] Wakeman, L., Jowett, J., `PCTE: The Standard for Open Repositories', Prentice Hall, 1993
[144] Wright P., `Cognitive overheads and prostheses: some issues in evaluating hypertexts', Proceedings of the 3rd ACM Conference on Hypertext 1991 ,1-12
[145] Wright P., Lickorish A., `An empirical comparison of two navigation systems for two hypertexts', HyperText II Conference Paper
[146] Wright, P. Cognitive Overheads and Prostheses: Some Issues in Evaluating Hypertexts. In Hypertext `91: Proceedings of Third ACM Conference on Hypertext, San Antonio, TX December 15-18 1-12, 1991.
[147] Yankelovich, N., van Dam, A., Meyrowitz, N., `Reading and Writing the Electronic Book', IEEE Computer, 15-30, (October 1985)
[148] Yankelovich, N. et al, `The Concept and Construction of a Seamless Information Environment', IEEE Computer, 81-96, (Jan 1988)
[149] Zunde, P. `Evaluating and Improving Internal Indexes', Proceedings of the American Documentation Institute Annual Meeting, 4, 86-89, (1974)
Thereafter, calling up such a frame would display a list of link names which the reader could choose to follow by manipulating an appropriate set of control levers. Chaining these `links' together provided `trails' of interest. Bush anticipated electronic encyclopaedias produced with ready-made meshes of these associative trails.
What Bush conceived was a hypertext system with bi-directional named links which mapped frames to frames. The nodes were ordered in a mainly hierarchical fashion (following standard library classification and indexing procedures) with arbitrary cross-referencing; however, this ordering was not to be an inherent feature of the system, but a discipline imposed on the initial set of links.
With the exception of links, the information stored was essentially analog in nature, and so no mechanically assisted browsing was possible. The main advantages that the memex offered a library user were therefore a much-increased speed of access and the explicit storage and browsing of trails of thought.
All of the project information was stored in files, with each file divided into hierarchical statements. Arbitrary reference links were allowed between statements; links were commonly displayed as tagged code-strings inside the text. The console was divided into multiple `windows', each of which provided a view onto some part of the data. A link was activated by clicking with the mouse on a link tag and then again on the window where the result was to be displayed. Links could be indirect, in which case the tag referred to a statement where the final link address was to be found.
NLS in its original form had no buttons; jumps were performed by activating a link specification (either by clicking on it with the mouse button or by typing its name). Each link specification was composed of three parts: the display start, which consisted of an address to jump to and a modification of that address; the view filter; and a format specification.
The display start was the name of a statement (the first word of the text of that statement), the name of a marker which was pointing to the statement, or the statement's id. The id is a statement's address within the file's hierarchy: for example, `6b5' refers to the fifth subsubstatement of the second substatement of statement 6. The address modification was an operation such as `successor', `predecessor', `parent' or `eldest child' that yielded a new statement according to the text's structure. The `search' operation allowed a statement to be selected by the text that it contained, with rules given in a content analysis language. The view filter was used to select which statements following the display start appeared to the user. Filtering took place according to the statements' depth in the hierarchy (level filtering) and their content (using the same rules as above). The format was used to restrict the length of the statements that were displayed and to control the space that separated them.
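This structural addressing lends itself to a simple mechanical treatment. The sketch below is illustrative only (the function names are assumptions, and NLS itself worked nothing like this internally): it parses an id such as `6b5' into a path of child indices and applies the address modifications described above.

```python
# Hypothetical sketch of NLS-style structural addressing.
# An id such as '6b5' alternates numbers and letters: statement 6,
# its second substatement (b), and that substatement's fifth child.
import re

def parse_id(sid):
    """Split an id like '6b5' into a path of 1-based child indices: [6, 2, 5]."""
    path = []
    for part in re.findall(r'\d+|[a-z]+', sid):
        if part.isdigit():
            path.append(int(part))
        else:
            # 'a' -> 1, 'b' -> 2, ... (single letters suffice for this sketch)
            path.append(ord(part) - ord('a') + 1)
    return path

def modify(path, op):
    """Apply an NLS-style address modification to a structural path."""
    if op == 'successor':
        return path[:-1] + [path[-1] + 1]
    if op == 'predecessor':
        return path[:-1] + [path[-1] - 1]
    if op == 'parent':
        return path[:-1]
    if op == 'eldest child':
        return path + [1]
    raise ValueError(op)
```

Under this reading, `parent' applied to `6b5' yields `6b', and `successor' yields `6b6', exactly as the hierarchy suggests.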
The level-filtering operation was commonly used to provide a summary of a document, on the assumption that the more detailed elaborations of an argument appear in the lower reaches of the document's structure. This leads to a distinctively artificial style of writing (an example of which can be seen in [53], which was authored with the NLS system).
ZOG consisted of a database of screen-sized text frames which were viewed one at a time on standard computer terminals. Each frame had a structured layout consisting of a title, topic information, menu selections and global pads. The `topic information' was the text that held the frame's knowledge; the title was a one-line summary of this information and gave the frame its unique identification. The menu area gave a set of alternative destinations for finding further information; by convention, the labels on the menu choices were the same as the names of the frames to which they led. The global pads sat at the bottom of the frame and provided a standard set of choices for the reader (for example, go back, go forward and help). All choices were made by selecting the labels with a mouse or by typing the number of the menu item (or the initial letter of the pad). Although the database could represent any arbitrary network topology, the ZOG designers expressed a strong preference for tree structures as the initial format of the data. Frames were therefore designed so that each menu item led to the children of the current frame and the global pads `next' and `previous' moved along the current frame's siblings. The ZOG philosophy stressed that menu items must only perform tree-wise navigation and that cross-reference jumps could only be performed by the pads. Browsing a ZOG database was accomplished purely by selecting menu items and pads; there was no facility to find a frame by satisfying a particular query, but the speed of response which ZOG achieved (a fraction of a second between selecting an item and it being displayed) allowed users to locate information and select potentially interesting branches very quickly. Each selection was associated with an action: the default was to go to the referenced frame, but a simple internal programming language allowed more complicated interaction with the user.
More sophisticated requirements were fulfilled by `agents', which were external programs invoked by the host computer's operating system and sharing a common convention for taking their input and placing their output in predefined ZOG frames.
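The tree-wise discipline that ZOG imposed on its frames can be sketched as a small data structure. The Python below is illustrative only (the field and function names are assumptions, not ZOG's): menu items lead to a frame's children, while the `back', `next' and `previous' pads navigate the tree.

```python
# Hypothetical sketch of a ZOG-style frame and its tree-wise navigation:
# menu items lead to children; pads move to the parent or among siblings.
class Frame:
    def __init__(self, title, topic=''):
        self.title = title          # one-line summary, also the unique id
        self.topic = topic          # the frame's body text
        self.menu = []              # child frames, reached via menu selections
        self.parent = None

    def add(self, child):
        child.parent = self
        self.menu.append(child)
        return child

def pad(frame, name):
    """Resolve a global pad ('back', 'next', 'previous') against the tree."""
    if frame.parent is None:
        return None
    siblings = frame.parent.menu
    i = siblings.index(frame)
    if name == 'back':
        return frame.parent
    if name == 'next':
        return siblings[i + 1] if i + 1 < len(siblings) else None
    if name == 'previous':
        return siblings[i - 1] if i > 0 else None
    raise ValueError(name)
```

Cross-reference jumps, which ZOG confined to the pads, would simply store a target frame rather than being derived from the tree as above.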
The user interface (now very familiar) consisted of a bitmapped display on which many overlapping windows were drawn. A mouse was used to manipulate the various objects portrayed in each of the windows, and a menubar of available commands was displayed at the top of the screen (commands were invoked by choosing them with the mouse or by control-keys on the keyboard). The applications were similar in function to the commercially available MacWrite, MacDraw and MacPaint, but added the capability of creating, editing and following links.
Creating a link was (intentionally) very similar to executing a Cut/Paste operation in the Macintosh desktop metaphor. A source item was selected and the ``Start Link'' command chosen; the destination item for the link was then selected and the ``Complete Link'' command chosen. Small icons were displayed at the source and destination to indicate the presence of the link. Double-clicking on such a link anchor point brought up a new window containing the destination point, in the same way that double-clicking on a program icon starts that program. This `seamless' grafting of added functionality onto a pre-existing user interface is emphasised throughout Intermedia.
The link end-points may be attached to any block (a contiguous selection of the document), and not just to the document `node' or window which contains them. This is an important distinction, as many systems distinguish between the container (frame, window or card) and the information in it by allowing links to be addressed only to a container. Subsequent editing or restructuring of such a document may leave many of the links invalidated because they no longer point to the correct information. Intermedia does not suffer from this drawback because links are attached to the information itself.
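The advantage of anchoring links to blocks rather than to containers can be illustrated with a sketch. In the hypothetical Python model below (all names are assumptions, not Intermedia's), a block is a persistent object whose span is updated as the text is edited, so a link addressed to a block id remains valid after an insertion.

```python
# Hypothetical sketch: block-anchored link ends survive editing because a
# block is a persistent object, not a frozen (container, offset) pair.
import itertools

_ids = itertools.count(1)

class Document:
    def __init__(self, text):
        self.text = text
        self.blocks = {}            # block id -> (start, end) span

    def make_block(self, start, end):
        bid = next(_ids)
        self.blocks[bid] = (start, end)
        return bid

    def insert(self, pos, s):
        """Edit the text; block spans shift but block ids stay valid."""
        self.text = self.text[:pos] + s + self.text[pos:]
        for bid, (a, b) in self.blocks.items():
            self.blocks[bid] = (a + len(s) if a >= pos else a,
                                b + len(s) if b >= pos else b)

    def resolve(self, bid):
        a, b = self.blocks[bid]
        return self.text[a:b]
```

A link stored as a pair of block ids therefore keeps pointing at the same selections however the surrounding text moves.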
When working with a particular document, a map window is displayed which shows icons representing the current document and the links that exist to other documents. This map is updated as the current document changes and as new links are added.
Both links and blocks may have property sheets associated with them (analogous to the style sheets that control the physical appearance of a paragraph in a word processor). The property sheets contain fields showing the creator id, the time of creation (automatically filled in), an explainer and a list of keywords (supplied by the author) and may be used as part of the query specification for a search.
Intermedia keeps block and link information separate from the documents themselves, storing it in webs instead. Opening a new web imposes a new set of links on a family of documents, allowing different users to maintain different perspectives on the same set of literature.
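The role of a web can be sketched as an external link database keyed by document and block. The Python below is purely illustrative (the class and record layout are assumptions): two webs impose two different sets of links on the same documents.

```python
# Hypothetical sketch of Intermedia-style webs: link records live outside
# the documents, so a different web gives a different view of the same texts.
class Web:
    def __init__(self, name):
        self.name = name
        self.links = []             # (source doc, source block, dest doc, dest block)

    def add_link(self, src_doc, src_block, dst_doc, dst_block):
        self.links.append((src_doc, src_block, dst_doc, dst_block))

    def links_from(self, doc):
        """All link records whose source lies in the given document."""
        return [l for l in self.links if l[0] == doc]

# Two webs over the same family of documents give two perspectives:
teaching = Web('teaching')
research = Web('research')
teaching.add_link('intro.txt', 'b1', 'basics.txt', 'b7')
research.add_link('intro.txt', 'b1', 'paper.txt', 'b2')
```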
NoteCards supports unidirectional typed links which connect information at the card level. The source of a link is displayed as an icon anchored at a point on the source card. Clicking that icon will display the notecard which is its destination. Each icon has a different appearance according to the nature of the information that is contained on the destination card.
There are two specialised card types: browsers and fileboxes. A filebox can `contain' other cards and fileboxes and is used to impose an initial hierarchical structure on a network of cards (the system imposes the restriction that each notecard must be stored in at least one filebox). A browser is a card which contains a diagram of a network of notecards. The diagram is created by the system and can be used for navigation, or edited directly to change the structure of the network by relinking notecards.
Navigation is mainly achieved by following links in one of three different contexts: a browser, a filebox or a notecard. Apart from these three mechanisms, there is a simple query system which searches for nodes matching the reader's specification.
HAM defines various objects which make up a hypertext network together with the operations that can be performed on those objects. HAM's top-level object is the graph, which represents a complete hypertext network and which is partitioned into a tree of contexts. A graph is composed of nodes joined by links. A node may contain text or binary data and may be subject to automatic version control. Links are used to relate a source and destination node and may likewise be subject to version control. The versioning allows the state of any node or link to be queried at any point in its history. Contexts, nodes and links may all have attributes attached which can provide application-specific information. HAM defines the following generic operations which can be applied to any object: create, destroy, get and change. All manipulate an object according to a particular version time. For example, to read a node the `get' operator is passed a reference to a node and a version time and returns the data that the object held at that time. Aside from miscellaneous operations that are only applicable to particular object types there is also a filter operation which takes a version time and a predicate (a test based on attribute values) and returns a list of all the objects in a graph which satisfied that predicate at that time.
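HAM's versioned `get' and filter operations can be sketched as follows. The Python below is a loose illustration, not HAM's actual interface: each object keeps a history of timestamped attribute snapshots, `get' replays the state at a given version time, and the filter tests a predicate against that state.

```python
# Hypothetical sketch of HAM-style versioning: every change is recorded with
# a version time, so any past state can be reconstructed and queried.
class VersionedObject:
    def __init__(self, t, data=None):
        self.history = [(t, dict(data or {}))]   # (time, attribute snapshot)

    def change(self, t, **attrs):
        snap = dict(self.history[-1][1])
        snap.update(attrs)
        self.history.append((t, snap))

    def get(self, t):
        """Return the attributes the object held at version time t (or None)."""
        state = None
        for when, snap in self.history:
            if when <= t:
                state = snap
        return state

def filter_graph(objects, t, predicate):
    """All objects in a graph that satisfied the predicate at time t."""
    return [o for o in objects
            if (s := o.get(t)) is not None and predicate(s)]
```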
HAM has been used to emulate various hypertext systems, but as it is not itself a hypertext system, those entries in the following table which depend on the capabilities of the front end have been marked as not applicable.
HyperTIES' unusual method of link selection has been found to be particularly efficient, especially for novice users [128]. The embedded menu technique has also been shown to be effective in an environment where those who are using it are not computer literate, and stands opposed to the practice of creating separate icons to act as link anchors or having a separate set of menu options. HyperTIES also includes an authoring package to allow people with limited computer skills to create and maintain a HyperTIES database. An introduction to hypertext intended for such users [129] has been published in book form in conjunction with a PC disk containing the same information in a HyperTIES database.
Behind the scenes HyperCard implements a full function programming language called ``HyperTalk''. The object-oriented language is used to write scripts for the various objects (stacks, cards, fields and buttons) allowing them to respond to various events (e.g. opening a stack, going to a new card, clicking the mouse or pressing a key). Besides the functions that can be defined in HyperTalk itself, new routines written in C or Pascal can be linked into the system. This has allowed new media (video, sound and animation) to be incorporated into HyperCard stacks (see [35]).
One of the problems of describing the capabilities of HyperCard (or indeed NoteCards) as a hypertext system is that it is coupled with a highly-extensible programming language and a sufficiently rich set of data types that allow the basic system to emulate more or less any other type of hypertext system. By carefully writing a set of scripts it is possible to make HyperCard act like Guide [27] or NoteCards. Such emulations may suffer from efficiency problems and not provide the level of service that a hand-tuned system will, but it still becomes difficult to draw the line between what HyperCard can and can't do.
Despite the power that HyperTalk provides (allowing computationally-active hypertexts), HyperCard remains difficult to use for pure hypertext, since buttons are anchored to a physical point on a card, not to a region of text within a field. For this reason, even the most minor editing operation on a field will require that all its link anchors be repositioned. Cumbersome work-arounds are possible, but it is for this reason that many claim that ``HyperCard is not hypertext''. Instead, HyperCard's ease of use, its integration with the Macintosh environment and the power of its programming language have led to its success both as a software prototyping tool and an easy-to-learn front-end to highly technical software such as the Oracle database. It is also being used as the basis for help systems for other software.
The author enters text as with a normal word processor (the Macintosh implementation provides the usual font-, size- and style-changing commands), then selects a piece of text and, through a menu operation, turns it into a replacement button. The button is initially given the default expansion ``..expansion..'', which the author then edits. A symmetrical operation allows the author to select an expansion and provide a name for its replacement button.
The newly-created replacement button-expansion text pair is now difficult to edit, because the text cannot be selected by the mouse. For this purpose there is a menu command which freezes the state of all buttons, allowing their text to be changed.
A typical Guide document is encountered in a `high level summary' form, where each summary is in fact a replacement button. Clicking on a summary will expand it to show more detailed information, with each expansion containing further replacement buttons. In this respect Guide acts as a folding editor, progressively disclosing more and more information at the reader's request. Slight variations of the replacement buttons are used to display relevant information in other windows or to jump to another document, but according to the author these are intended to be used only rarely. As Guide is aimed at the novice user, it discourages the use of both disorientating hypertext `gotos' and `find' operations, preferring inline expansions instead.
Guide has been used in commercial environments, and has been used as an online help tool for the Macintosh PageMaker program.
Both nodes and links are examples of objects composed of a set of unordered, named, typed slots containing values which are basic data types (numbers, text, pictures). Link objects have slots whose values are allowed to be other objects and each object type has a different graphical appearance for manipulation on a graphical browser.
The user interface consists of windows which contain graphical views onto the full structure of the hypertext network, a list of all the objects in the network and a view of the full contents of a selected object. The kinds of objects used in the hierarchy and the relationships between them are constrained by a set of schemas.
Aquanet's link objects are in fact n-ary relationships, and their graphical representation reflects their slot-based nature. To make links between a set of objects in Aquanet, the slots in the link object are filled in with the names of the linked objects. The literature does not make it clear how this is handled by the user interface; what is clear is that it is the opposite of most linking operations, since instead of selecting a node and applying a link to it, one selects a link and applies a set of nodes to it. A corollary is that there are no link anchors or buttons which can appear within a node; instead, when viewing the hypertext it is the nodes which appear inside the link relations. (Links are therefore node-to-node relationships.)
Word is not primarily intended as a hypertext system, and it presents a cumbersome interface to its hypertext facilities (authoring a link, in particular, is laborious). However, by customising it using the built-in programming language it is possible to build an adequate hypertext user interface.
A `node' within Acrobat is equivalent to a printed page, and may be of arbitrary size, but the information content of the node is fixed and may not be edited or manipulated in any way; it may only be viewed. A `link' associates a source with a particular view (location and magnification) of a particular destination node. The link source is either a rectangular area of the node (which may be highlighted by outlining) or an entry in a special hierarchical list of bookmarks (usually used as a table of contents).
Links are first-class objects as they are stored explicitly as separate objects within each document, but no graphical browser is provided. Links can only be made to local nodes (within the same document). The nodes are arranged in a sequence, corresponding to the print order of the pages of original data, but a hierarchy can be imposed by the use of bookmarks. Note that there will not usually be a one-to-one mapping between the `logical' section names in the tree of bookmarks and the nodes in the document.
At the centre of the LACE server's world, then, is a database which describes all the documents which have been published on this node. Each document in turn is a database of logical elements, text and links.
The first and second fields define a nickname and full name for the document, respectively. Either can be used for referring to the document over the network. The third field (currently unused) specifies the permissions associated with this document and the fourth (also unused at this time) gives a comma-separated list of keywords that describe the contents of this document. The fifth field gives the name of the directory in which the document is to be found, the sixth gives the name of the file in that directory. The last field is the type of the document, specifying which medium it is on (e.g. video) or which markup system has been used to represent it (e.g. TEX or troff).
The command updoc puts this database into the UNIX dbm format, in a database called `docs.byname' which is the object that the document server actually deals with.
test:Humanities Computing:public:humanities,archaeology:/usr/lace:foo.tex:latex
Figure A2.1: An entry from the Lace database
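As an illustration of the record layout described above, the following sketch (in Python, purely for illustration; it is not part of Lace, which stores this database in UNIX dbm form) unpacks the seven colon-separated fields of one entry:

```python
# Unpack the seven colon-separated fields of a Lace document database entry.
def parse_doc_entry(line):
    keys = ["nickname", "fullname", "permissions",
            "keywords", "directory", "filename", "doctype"]
    entry = dict(zip(keys, line.rstrip("\n").split(":")))
    # the keywords field is itself a comma-separated list
    entry["keywords"] = entry["keywords"].split(",")
    return entry

entry = parse_doc_entry(
    "test:Humanities Computing:public:humanities,archaeology:/usr/lace:foo.tex:latex")
```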
<Request>      ::= <RequestType><DocumentSpec>
<RequestType>  ::= a | n | o
<DocumentSpec> ::= <DocumentName> | <DocumentName>:<SubdocSpec>
<SubdocSpec>   ::= <SubdocType><Reference> | <Title>
<SubdocType>   ::= page | chapter | section | table | figure | footnote ...
<Reference>    ::= <Title> | <Number>
<Title>        ::= [a-zA-Z.,:;?!"`()]
<Number>       ::= <Digits> | <Digits>.<Number>
<Digits>       ::= [0-9]
Figure A2.2: Server Request Protocol
Request type n asks for a named document fragment to be displayed in a new window on the client's NeWS server. This is similar to the o request, which overwrites the contents of a window with a named document fragment. Request type a makes the server add an annotation to the named document fragment: the user is prompted for a title and the body of the annotation, which is then added to the annotation file associated with the document, and links to and from the annotation are inserted into the document's link file.
All of these requests involve asking for a document by name. This is how the name matching is performed: first the document name is checked against the list of nicknames and then against the list of full document names. The test is case insensitive and compresses all sequences of multiple blanks to one blank. A similar operation is done when matching a subdocument title to the requested title, except that any title which has the requested title as a prefix will satisfy the match (i.e. the request nfoo:intro will be matched by the Introduction of document foo).
The special name me is recognised as referring to `the current document', so that a document may send a request for me:section 2 to display section 2 of itself.
Any request which consists only of a document name will notionally have the subpart :page 1 appended to it. Note that the page is a physical structure created by the typesetting software, rather than an explicit logical structure provided by the author.
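The matching rules just described can be summarised in a short sketch (illustrative Python; the helper names are inventions, not Lace's): comparisons are case insensitive with runs of blanks compressed, a requested title matches any title it prefixes, and a bare document name gains the subpart :page 1.

```python
import re

def normalise(s):
    # case insensitive, with all runs of blanks compressed to one blank
    return re.sub(r" +", " ", s.strip().lower())

def match_title(candidate, requested):
    # any title which has the requested title as a prefix satisfies the match
    return normalise(candidate).startswith(normalise(requested))

def default_subpart(request):
    # "foo" is treated as "foo:page 1"; an explicit subpart is left alone
    return request if ":" in request else request + ":page 1"
```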
Hence the LACE server considers a document to be stored in a file whose name is given in the document database, except that any extension the filename has is ignored and a .ps extension is appended. This .ps file consists of two elements: a setup procedure, which typically defines the set of fonts that the document uses, and a list of pages, where each ``page'' is a procedure containing the instructions to print one page. Each document element in the page array is marked by POSTSCRIPT comments, as can be seen in figure A2.3.
%%Document Setup
{
/c-med.240 /Courier 33.208800 TeXPSmakefont def
/h-med.270 /Helvetica 37.359900 TeXPSmakefont def
/t-bol.300 /Times-Bold 41.511000 TeXPSmakefont def
/cmr10.300 /cmr 41.511000 TeXPSmakefont def
}
%%Document Setup End
%%Page List
%%Page: 1
{ 1 @bop1
%LACE mark section 1 begin Introduction
t-bol.360 @sf 141 1355 p 49 c 216 1355 p
(Intr) s 302 1355 p (oducti) s 435 1355 p (on) s t-rom.300 @sf 141 1442 p
(This) s 231 1442 p (report) s 345 1442 p (describes) s 516 1442 p (the) s ...
%LACE mark section 1 end
141 2324 p
%LACE mark section 2 begin Courses
t-bol.360 @sf 141 2373 p 50 c {
%LACE mark footnote 1 begin
cmr6.300 @sf 1641 2484 p 49 c t-rom.240 @sf 1658 2495 p 73 c 1683 2495
p (am) s 1738 2495 p (grateful) s 1856 2495 p (to) s 1896 2495 p (Lou) s 1964
2495 p (Burnard) s
%LACE mark footnote 1 end
} pop
t-rom.300 @sf 1611
2517 p (and) s 1684 2517 p (two) s 141 2566 p (one-term) s 304 2566 p (options)
s
}
Figure A2.3: A Lace Document in PostScript Form
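The %LACE markers are what allow the logical structure to be recovered from the formatted output. A mkmap-style scan might track byte offsets while looking for these markers, as in the following sketch (illustrative Python; the thesis's own tool is not listed here):

```python
# Record the byte extent of each logical structure marked in a .ps file.
def scan_markers(ps_bytes):
    structures = {}
    offset = 0
    for line in ps_bytes.splitlines(keepends=True):
        text = line.decode("latin-1").strip()
        if text.startswith("%LACE mark"):
            # e.g. "%LACE mark section 1 begin Introduction"
            parts = text.split()
            stype, snum, action = parts[2], parts[3], parts[4]
            if action == "begin":
                structures[(stype, snum)] = [offset, None, " ".join(parts[5:])]
            else:  # "end": record the offset just past this marker line
                structures[(stype, snum)][1] = offset + len(line)
        offset += len(line)
    return structures

sample = (b"prelude\n"
          b"%LACE mark section 1 begin Introduction\n"
          b"(This) s 231 1442 p (report) s\n"
          b"%LACE mark section 1 end\n")
structures = scan_markers(sample)
```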
As well as the .ps file, there is a map file which catalogues the positions of all the logical document structures in the .ps file (as shown in figure A2.4).
Each line in the file represents one structure in the document. The first and second fields are the byte offsets for the beginning and end of that structure from the start of the .ps file, the third and fourth fields represent the page numbers on which the structure starts and ends, the fifth field is the type of the structure (e.g. page, section, table), the sixth is the ordinal number of that structure (as in table 2 or subsection 2.3) as provided by the formatter, and the remainder of the line is the title of that structure. The name of the map file is the same as the .ps file, with the extension `.map' added to the existing `.ps' extension.
991 2031 0 0 page 0
2031 19591 1 1 page 1
19591 32923 2 2 page 2
13 957 0 0 special 1 Document Setup
12429 17273 1 1 section 1 Introduction
17310 86771 1 6 section 2 Courses
18153 18770 1 1 footnote 1
53045 56597 4 4 subsection 2.1 Equipment
67177 72080 5 5 figure 1 Questionaire given to students
72155 75018 5 5 table 1 Results for all groups of students
82668 86771 5 6 subsection 2.2 Conclusion on courses
@/usr/lace/cstr86-2.ann 0 354 1 1 annotation 1 Three Years On
Figure A2.4: A Lace Map File
An extension to this basic format has been made to allow annotations to be linked in without altering the original document. This can be seen in the final line of the figure, where an extra field (signified by the @ character) has been prepended. This extra field gives the name of a separate file to which the rest of the line refers. Hence the last line in figure A2.4 says that the first annotation (titled ``Three Years On'') to this document is to be found between bytes 0 and 354 in the file called /usr/lace/cstr86-2.ann.
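Reading one such map line is straightforward; the following sketch (illustrative Python, not Lace's own code) handles the whitespace-separated fields and the optional leading @file field, keeping the structure number as a string since it may be hierarchical (e.g. 2.3):

```python
# Parse one line of a Lace map file into a record.
def parse_map_line(line):
    fields = line.split()
    rec = {"file": None}
    if fields[0].startswith("@"):
        # annotation extension: the rest of the line refers to a separate file
        rec["file"] = fields[0][1:]
        fields = fields[1:]
    rec["start"], rec["end"] = int(fields[0]), int(fields[1])
    rec["startpage"], rec["endpage"] = int(fields[2]), int(fields[3])
    rec["type"], rec["number"] = fields[4], fields[5]
    rec["title"] = " ".join(fields[6:])
    return rec

rec = parse_map_line("@/usr/lace/cstr86-2.ann 0 354 1 1 annotation 1 Three Years On")
```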
As well as the .ps and .map files, there is a link file associated with the document. This link file lists all the references that exist between the document structures. The example shown in figure A2.5 shows that the introductory section of this document makes reference to section 2, and that annotation 1 is commenting upon section 5. This file is scanned by the document server to provide a menu of `Come Froms' and `GoTos' that relate to the current structure.
me:section Introduction me:section 2
me:section Humanities Computing me:footnote 1
me:section Humanities Computing me:section 4
me:annotation 1 me:section 5
Figure A2.5: A Lace Link File
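Deriving the `GoTos' and `Come Froms' menus from such a link file can be sketched as follows (illustrative Python; splitting each line on the "me:" boundary is an assumption, since the format shown does not delimit the two addresses explicitly):

```python
import re

def split_link_line(line):
    # each document-part address begins with "me:"; split on that boundary
    parts = re.findall(r"me:.*?(?=\s+me:|$)", line.strip())
    return parts[0], parts[1]

def gotos_and_comefroms(lines, part):
    pairs = [split_link_line(l) for l in lines]
    gotos = [dst for src, dst in pairs if src == part]       # links out of `part'
    comefroms = [src for src, dst in pairs if dst == part]   # links into `part'
    return gotos, comefroms

links = ["me:section Introduction me:section 2",
         "me:annotation 1 me:section 5"]
gotos, comefroms = gotos_and_comefroms(links, "me:section Introduction")
```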
The following document structuring commands are modified to add markers to the POSTSCRIPT, so that LACE can pick them out from the formatted output: abstract, chapter, section, subsection, subsubsection, table, figure, footnote, bibliography, aside
The definition of table, figure and footnote has been changed so that they take up no space on the page, but instead inhabit a separate window that is brought up when the reader presses a button over a reference to them. For example, a footnote window is brought up when the reader presses the button over the footnote marker in the main body of the text. Similarly a figure window is displayed when the reader presses a button over some text like `see figure 3', which will typically have been created by the new LACElabel and LACEref commands.
A new command link is defined. This takes two parameters--the first is a piece of text over which a button will be placed, the second is the LACE address of a document part. For example, the command \link{see section 3}{me:section 3} will make a new window with section 3 of this document pop up when the user clicks on the (invisible) button over the text `see section 3'.
A new environment aside is defined. This behaves very much like a footnote: the LATEX fragment
\begin{aside}{Click Me For More Information}
This project has been funded by the World Wildlife Appeal
\end{aside}
will produce an invisible button over the words ``Click Me For More Information'' which, when pressed, will bring up a window with the text that is in the body of the environment.
The label and ref commands have been extended in the form of the LACElabel and LACEref pair. These two commands differ from the standard LATEX forms in that they save not only the number of the current environment, but also its type, e.g. section 3.4 or table 2. In addition, the reference text has a link to the item it references.
The table of contents, list of figures and list of tables all have invisible buttons on each of the lines, so that clicking on any line will bring up a window with that part of the document in it.
Any use of the cite command to produce a bibliographic citation in the main body of the text has a button over it that brings up a window containing the full bibliography entry. No generalised facilities yet exist for bringing up the cited document, even if it is a published LACE document.
After invoking hyperlatex, use hyperdvi to produce the .ps file. Then use mkmap and mklinks to produce the auxiliary files.
Define a new command lN which is similar to LATEX's link command. The first parameter is the piece of text which is to have a button appearing over it, the second parameter is a LACE document part which will appear in a new subwindow when the link is activated.
The section command .SH has been modified to add markers to the .ps file, so that LACE can pick out the synopsis, usage description, and bugs sections from the formatted output.
The preprocessing stage of hyperman looks for strings of the form ...cat(1)... and turns these into buttons that are linked to the appropriate manual pages on the assumption that the manual page for foo in section N of the manual is published under the nickname ``foo(N)''.
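This rewriting rule amounts to a single pattern substitution, sketched here in Python (illustrative only; the BUTTON marker emitted below merely echoes the troff marker format used elsewhere in LACE and is an assumption about hyperman's intermediate form, not its actual output):

```python
import re

def link_man_refs(text):
    # turn each "name(N)" into a button linked to the page published
    # under the nickname "name(N)"
    return re.sub(r"\b([a-z_][\w.-]*)\((\d)\)",
                  lambda m: "BUTTON:%s:%s(%s)" % (m.group(0), m.group(1), m.group(2)),
                  text)

linked = link_man_refs("see cat(1) for details")
```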
There is a major problem with troff: any `specials' pushed through the formatting stage using the \! notation are floated to the start of the line, which makes it difficult to mark out the beginning and end of the active area of a button. The current way around this is to put a special null-length marker string into the text stream. For example, to place a button over the word `foo' which links the user to section 3 of document `bar', the troff sequence \\kxfoo\\ky\\h'|\\nxu'BUTTON:foo:bar:section 3\\h'|\\nyu' would be used. This puts the horizontal positions of the beginning and end of the text `foo' into registers x and y respectively; troff then backs up to the beginning of the text, writes the string `BUTTON:' followed by the text again, followed by the LACE document part name `bar:section 3', and then skips to the endpoint of the original text. The process which post-processes the .ps file takes the position of the start of the button to be the current position when the string `BUTTON' is found, and the width of the button to be the width of the string delimited by the next two colons.
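Splitting such a marker back into its parts is then simple; a sketch (illustrative Python, not the actual .ps postprocessor):

```python
# Split a marker of the form BUTTON:text:document:part into the button text
# (whose rendered width gives the button's width) and the LACE address.
def parse_button_marker(marker):
    prefix = "BUTTON:"
    assert marker.startswith(prefix)
    text, address = marker[len(prefix):].split(":", 1)
    return text, address

text, address = parse_button_marker("BUTTON:foo:bar:section 3")
```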
Each section definition is marked so that the postprocessor has the high-level structural information along with the low-level formatted text.
Every reference to another section (e.g.``...this code is used in section 5'' or <Store all the reserved words 64>) has a button attached to it that will bring up that section in a new window.
The index has buttons over the references to each section number.
LACEwindow is a subclass of the system-defined LiteWindow class, which has a number of extra attributes:
Prolog the POSTSCRIPT definitions required to display a particular sort of document, i.e. slightly modified versions of the TEX or troff prologs that would be sent to a laser printer. The prolog varies according to the document type.
Title the full title of this document (displayed in the window's title stripe)
Setup the POSTSCRIPT procedure that sets up the environment for this document. Usually defines the set of fonts to be used. Notice that the prolog is specific to a particular document type, whereas the setup is specific to a particular document.
Pages the array of contiguous pages that is currently known by the window. Each page is simply a procedure which when executed will render the page onto the screen.
MaxPage the number of the last page currently held in memory
MinPage the number of the first page currently held in memory
RealMaxPage the number of the last page of the document (usually `0' for the title page)
RealMinPage the number of the first page of the document
Menus the menu object that is displayed when the reader presses the mouse's menu button inside a LACE window.
The POSTSCRIPT code shown in figure A2.6 is a typical creation of a new LACEwindow.
%%Document Title
(Humanities Teaching)
%%First & Last Pages
0
7
%%First of This Page Set
1
%%Document Setup
{
/ag-book.360 /AvantGarde-Book 49.813200 TeXPSmakefont def
/p-bol.300 /Palatino-Bold 41.511000 TeXPSmakefont def
/p-ita.240 /Palatino-Italic 33.208800 TeXPSmakefont def
/cmmi10.300 /cmmi 41.511000 TeXPSmakefont def
/cmr6.300 /cmr 24.906600 TeXPSmakefont def
/cmsy10.300 /cmsy 41.511000 TeXPSmakefont def
}
%%Menus
[ (Contents =>)
  [ (1 Introduction) {(me:page 1) /doLink win send}
    (2 Humanities Computing) {(me:page 2) /doLink win send}
    (3 Undergraduate Courses) {(me:page 2) /doLink win send}
    (3.1 1985-86 course) {(me:page 2) /doLink win send}
    (3.2 Student reaction) {(me:page 4) /doLink win send}
  ] /new DefaultMenu send
  (Tables =>)
  [ (1 Computer Usage) {(me:table Computer Usage) /doLink win send}
    (2 Specific hardware) {(me:table Specific hardware) /doLink win send}
  ] /new DefaultMenu send
] /new DefaultMenu send
%%Page List
[
%%Page: 1
{ 1 @bop1 p-romsc.300 @sf 141 -54 p
(LIST) s 241 -54 p (OF) s 311 -54 p 84 c 334 -54 p (ABLES) s p-rom.300 @sf
... }
]
/new LACETeXWindow send
Figure A2.6: Creating a New Lace Window
The parameters given to a new window are, in order, the document title, the minimum and maximum page numbers of the document, the page number of the first page of the current batch, the document's setup procedure, the menus and the set of pages in the current batch.
The window may know about several pages at a time, because a particular document structure may span several physical pages. This is the reason for having a MaxPage and a RealMaxPage. If the reader requests a new page, the window first checks to see if it is in the current set of pages before passing on a request to the LACE server. This two-level storage is used to improve the response of the system.
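This two-level page storage can be modelled as follows (illustrative Python; the real windows are POSTSCRIPT objects, and the fetch function here stands in for an `o' request to the LACE server):

```python
# A window serves a requested page from its current batch when it can,
# and otherwise falls back to fetching a new batch from the server.
class PageCache:
    def __init__(self, pages, min_page):
        self.pages = pages        # current batch of page procedures
        self.min_page = min_page  # number of the first page in the batch

    def get(self, n, fetch):
        lo = self.min_page
        hi = self.min_page + len(self.pages) - 1
        if lo <= n <= hi:
            return self.pages[n - lo]         # hit: no server round trip
        self.pages, self.min_page = fetch(n)  # miss: ask the LACE server
        return self.get(n, fetch)

cache = PageCache(["page 1", "page 2"], 1)
```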
Looking carefully at the last line of figure A2.6 shows that the current implementation actually uses different classes for each type of document i.e. a LACETeXWindow class for TEX documents and a LACEtroffWindow for troff documents. These classes are used for efficiency, as they already have the (not insignificant) prologs elaborated in their environments.
The methods that the LACEwindow classes respond to are as follows:
/NextPage go to the next page of the document. If the next page is not in the page array, dispatch a request to the LACE server
/PrevPage go to the previous page of the document. If the previous page is not in the page array, dispatch a request to the LACE server
/GotoPage go to the page whose number is passed as a parameter, possibly sending a request to the LACE server.
/RecentPage backtrack through the list of last-read pages. Repeatedly issuing this message will return the reader to the `original' page
/CurrentPage returns the number of this page
/newpages takes two parameters: a new page array and the number of the first page in the array. This message is sent back to the window by the LACE server in response to an `o' request.
/dest destroy this window
/doLink takes a LACE address and sends a request to the LACE server to fire up a new window with those contents.
/AddToTrail add this document part to the current trail
/abortLACEserver kill the LACE server which is presiding over this window
on openStack
global connectionID, source, question
push card
go to first cd
put fld "Source" into source
put fld "Question" into question
makeMenus
lock screen
put "Starting connection..."
set the cursor to watch
push card
go to card "Comms Console"
put empty into fld "CommsLog"
put false into tickle
put TCPNameToAddr("152.78.64.8") into theHost
put TCPActiveOpen(theHost, 23, 0) into connectionID
if connectionID contains "fail" then
--put "The Result:" && connectionID
beep
answer "Cannot contact Wynkyn on Ethernet" with "OK"
hide message box
put empty into connectionID
pop card
exit openStack
end if
wait until TCPState(connectionID) is "established"
wait until TCPCharsAvailable(connectionID) > 0
gobble
TCPSend connectionID, numToChar(255)&numToChar(252)&numToChar(24)
repeat
get TCPrecvupto(connectionID, return, 2, empty)
if it contains "login:" then exit repeat
end repeat
TCPSend connectionID, "none"&return
wait for 2 seconds
TCPSend connectionID, "nowayhose"&return
repeat
put TCPrecvupto(connectionID, return, 2, empty) into str
set the cursor to busy
if prompt(str) then exit repeat
put str after fld "CommsLog"
end repeat
pop card
pop card
hide message box
end openStack
on closeStack
global connectionID
put "Closing connection..."
-- TCPSend connectionId, "logout"&return
TCPClose connectionID
-- wait until TCPState(connectionID) is "closed"
TCPRelease connectionID
put empty into connectionID
hide message box
end closeStack
on gobble
global connectionId
get TCPRecvChars(connectionID,TCPCharsAvailable(connectionID))
put it after fld "CommsLog"
end gobble
function prompt str
  if str contains "§§§" then TCPFAIL
put length(str) into max
return char max-1 to max of str is "$ "
end prompt
function doCommand theCmd
global connectionId
TCPSend connectionId, theCmd&return
put empty into theRes
repeat
put TCPrecvupto(connectionID, return, 2, empty) into str
set the cursor to busy
if prompt(str) then exit repeat
put str after theRes
end repeat
--delete line 1 of theRes
delete line 1 of theRes
put superStrip(theRes, lineFeed) into theRes
return theRes
end doCommand
function interpret id
if id is empty then return "no connection"
else return TCPState(id)
end interpret
on makeMenus
reset menubar
create menu "Lace-92"
put "Wais Query,Shell,Interrupt,-,New Partition,-,Export Structure" after menu "Lace-92" with~
menumessages "doWQ,doShell,doIntr,,makeAClass,,exportToFile"
end makeMenus
function wprompt str
  if str contains "§§§" then TCPFAIL
if last char of str is space then delete last char of str
put length(str) into max
return char max-5 to max of str is "quit]:"
end wprompt
on waisSearch sources, words, remote
global connectionId, tickle, numArts, wState
if wstate is "connected" then
waisExit
put "disconnected" into wstate
end if
if remote is empty then
put "/usr/lib/wais/bin/waisexsearch -h wynkyn -p 210 -d"&&word 1 of sources&&words into theCmd
else
put word 1 of sources into db
if db is "cacm" then put "/proj/wais/db/cacm/cacm" into db
put word 3 of sources into host
--put "/usr/lib/wais/bin/wtel@tex"&&db&&host&&words into theCmd
put "/usr/lib/wais/bin/waisexsearch -h"&&host&&"-p 210 -d"&&db&&words into theCmd
end if
lock screen
push card
go to card "Comms Console"
TCPSend connectionId, theCmd&return
put empty into theRes
put 0 into numArts
put empty into buffer
repeat
put TCPCharsAvailable(connectionId) into n
put TCPRecvChars(connectionID,n) after buffer
put length(buffer) into m
put buffer into str
if char m-1 to m of buffer is return&linefeed then
put empty into buffer
else
put number of lines of buffer into lb
if lb > 1 then
delete line 1 to lb-1 of buffer
end if
delete last line of str
end if
if str is empty and wprompt(buffer) then
exit repeat
end if
set the cursor to busy
if str is empty then next repeat
if wprompt(str) then exit repeat
put str after theRes
put str after fld "CommsLog"
select after last char of fld "CommsLog"
if numArts > 0 then
get word 2 of str
delete last char of it
put "Getting headline of article"&&it&"/"&numArts
end if
if str contains "+++- Spad" then put "Calling UK gateway"
if str contains "++ x25 server closed connection" then
put "JANET call failed"
beep
pop card
exit waisSearch
end if
if str contains " +++- bytes/pkts" then
put "search failed"
beep
pop card
exit waisSearch
end if
if str contains "Connected to" then put "Connected to UK gateway"
if str contains "waissearch" then put "Issuing information search"
if str contains "waisexsearch" then put "Issuing information search"
if str contains "SunOS Release" then put "Logged onto to UK gateway"
if str contains "NumberOfRecordsReturned:" then
put offset("NumberOfRecordsReturned:",str) into pos
get char pos to 30000 of str
put "Found"&&word 2 of it&&"relevant articles"
--put word 2 of str into numArts
end if
if numArts > 0 then
get word 2 of str
delete last char of it
put "Getting headline of article"&&it&"/"&numArts
end if
end repeat
hide message box
put true into tickle
--delete line 1 of theRes
delete line 1 of theRes
put superStrip(theRes, lineFeed) into theRes
pop card
put offset("Search Response:", theRes) into pos
if pos is 0 then
beep
else
delete char 1 to pos of theRes
delete first line of theRes
end if
put theRes into fld "Results"
put return&"quit: Finish" after fld "Results"
put empty into fld "Text"
put "connected" into wstate
-- waisExit
end waisSearch
on waisExit
global connectionId
TCPSend connectionId, "0"&return
put TCPrecvupto(connectionID, return, 10, empty) into str
TCPSend connectionId, "q"&return
put TCPrecvupto(connectionID, return, 10, empty) into str
repeat
put TCPrecvupto(connectionID, return, 2, empty) into str
set the cursor to busy
if prompt(str) then exit repeat
end repeat
end waisExit
on doIntr
global connectionId
TCPSend connectionId, numToChar(3)&return
end doIntr
on doShell
global connectionID
ask "What command?" with "date"
if it is empty then exit doShell
answer doCommand(it)
end doShell
on sink
global connectionId
put empty into theRes
repeat
put TCPrecvupto(connectionID, return, 2, empty) into str
set the cursor to busy
if prompt(str) or wprompt(str) then exit repeat
  if str is not empty then put str
end repeat
put empty
hide message
end sink
function TCPgetLine
global connectionId
put empty into str
repeat
put TCPrecvupto(connectionID, return, 2, empty) after str
  if str contains "§§§" then TCPFAIL
if last char of str is return then exit repeat
end repeat
return str
end TCPgetLine
on TCPFAIL
beep
answer "TCP driver failed. Connection aborted" with "Damn!"
exit to hyperCard
end TCPFAIL
on waisSelect n, dname
put offset(space&n&":",fld "Results") into pos
if pos is 0 then
beep
exit waisSelect
end if
get char pos to pos+1000 of fld "Results"
get first line of it
put word 4 of it into foo
delete char 1 to 6 of foo
if foo is empty then
put word 5 of it into nlines
put word 6 to 100 of it into title
else
put foo into nlines
put word 5 to 100 of it into title
end if
if dname is empty then
WAISchoice n, title, nlines
else
WAISchoice n, title, nlines, dname&":"&superstrip(char 1 to 26 of title,":")
end if
global waisID
put getWaisID() into waisID
end waisSelect
function getwaisID
global connectionId, source
if last word of source is "ecs.soton.ac.uk" then
TCPSend connectionId, "i"&&"/home/wynkyn/2/users/lac/tmp/FOO.ID"&return
else
TCPSend connectionId, "i"&return
end if
put empty into theRes
repeat
put TCPrecvupto(connectionID, return, 2, empty) into str
set the cursor to busy
if wprompt(str) then exit repeat
put str after theRes
end repeat
if last word of source is "ecs.soton.ac.uk" then
put "UNIX_temporary:FOO.ID" into fname
open file fname
read from file fname until eof
close file fname
put it into theRes
else
delete line 1 of theRes
put superStrip(theRes, lineFeed) into theRes
end if
return theRes
end getwaisID
on doWQ
global connectionID, source, question
click at -10,-10
if last word of source is "ecs.soton.ac.uk" then
put empty into remote
else
put true into remote
end if
waisSearch source, question, remote
end doWQ
on startProgress
show cd fld "Backdrop"
show btn "Item Backdrop"
show btn "Group Backdrop"
set the width of btn "Item" to 0
set the left of btn "Item" to the left of btn "Item Backdrop"
show btn "Item"
set the width of btn "Group" to 0
set the left of btn "Group" to the left of btn "Group Backdrop"
show btn "Group"
end startProgress
on progress i, imax, g, gmax
put round(min(i/imax,1.0) * the width of btn "Item Backdrop") into wid
set the rect of btn "Item" to (the left of btn "Item"), the top of btn "Item", (the left of btn "Item") + wid, the bottom of btn "Item"
put round(min(g/gmax,1.0) * the width of btn "Group Backdrop") into wid
set the rect of btn "Group" to (the left of btn "Group"), the top of btn "Group", (the left of btn "Group") + wid, the bottom of btn "Group"
end progress
on endProgress
hide cd fld "Backdrop"
hide btn "Item Backdrop"
hide btn "Group Backdrop"
set the width of btn "Item" to 0
hide btn "Item"
set the width of btn "Group" to 0
hide btn "Group"
end endProgress
on waisChoice n, title, nlines, fileIt
global connectionId, numArts, numLines, source
if last word of source is "ecs.soton.ac.uk" then
TCPSend connectionId, n&&"/home/wynkyn/2/users/lac/tmp/FOO.TMP"&return
else
TCPSend connectionId, n&return
startProgress
end if
get TCPgetLine()
put empty into theRes
put empty into fld "Text"
put 0 into l
put empty into buffer
repeat
put TCPCharsAvailable(connectionId) into n
put TCPRecvChars(connectionID,n) after buffer
put length(buffer) into m
put buffer into str
if char m-1 to m of buffer is return&linefeed then
put empty into buffer
else
put number of lines of buffer into lb
if lb > 1 then
delete line 1 to lb-1 of buffer
end if
delete last line of str
end if
if str is empty and wprompt(buffer) then
exit repeat
else if str is empty and buffer is (linefeed & "$ ") then
put "ERROR: crashed back to local host"
endProgress
beep
--pop card
exit waisChoice
end if
set the cursor to busy
if str is empty then next repeat
if wprompt(str) then exit repeat
put str after theRes
-- put str after fld "CommsLog"
-- select after last char of fld "CommsLog"
add number of lines of str to l
if fileIt is empty then
progress l,nlines, l,nlines
else
progress l,nlines, 1,numArts
end if
if str contains "++ x25 server closed connection" then
put "JANET call failed"
endProgress
beep
--pop card
exit waisChoice
else if str contains "+++- bytes/pkts" then
put "nfs.tn connection timed out"
endProgress
beep
--pop card
exit waisChoice
else if length(fld "Text") < 4000 then
put superStrip(str, lineFeed) after fld "Text"
end if
end repeat
if last word of source is "ecs.soton.ac.uk" then
put "UNIX_temporary:FOO.TMP" into fname
put empty into theRes
open file fname
repeat
read from file fname for 16000
if it is empty then exit repeat
put it after theRes
end repeat
close file fname
else
endProgress
--pop card
--delete line 1 of theRes
end if
put superStrip(theRes, lineFeed) into theRes
if fileIt is empty then
if length(theRes)>30000 then
put char 1 to 30000 of theRes into fld "Text"
answer "Only first 30K of article will be displayed. Store whole article in a file?" with "No" or "OK"
if it is "OK" then
ask file "Store article where?"
  if it is not empty then
put it into fname
open file fname
repeat
write char 1 to 10000 of theRes to file fname
delete char 1 to 10000 of theRes
if theRes is empty then exit repeat
end repeat
close file fname
end if
end if
else
put theRes into fld "Text"
end if
else
put fileIt into fname
open file fname
repeat
write char 1 to 10000 of theRes to file fname
delete char 1 to 10000 of theRes
if theRes is empty then exit repeat
end repeat
close file fname
end if
end waisChoice
on Z
global passKeys
put false into passKeys
end Z
on keyDown which
global passKeys
if passKeys is true then
pass keydown
exit keyDown
end if
if the target contains "card" then
if which is numToChar(8) then put "del" into act
else if which is "+" then put "bigger" into act
else if which is "-" then put "smaller" into act
else if which is "=" then put "normal" into act
else if which is "b" then put "bold" into act
else if which is "i" then put "italic" into act
else if which is "p" then put "plain" into act
else if which is "r" then put "reformat" into act
else if which is "s" then put "scroll" into act
else if which is "e" then put "edit" into act
else if which is "Z" then
put true into passKeys
exit KeyDown
else
beep
exit keyDown
end if
put number of cd flds into mc
repeat with d=2 to number of cd flds
put mc-d+2 into c
if the mouseLoc is within the rect of cd fld c then
if act is "del" then
set the cursor to watch
lock screen
choose field tool
select cd fld c
doMenu "Cut Field"
choose browse tool
unlock screen
else if act is "bigger" then
get the textSize of cd fld c
if it is 6 then get 7
else if it is 7 then get 8
else if it is 8 then get 9
else if it is 9 then get 10
else if it is 10 then get 12
else if it is 12 then get 14
else if it is 14 then get 18
else if it is 18 then get 24
else get it
set the textSize of cd fld c to it
else if act is "smaller" then
get the textSize of cd fld c
if it is 7 then get 6
else if it is 8 then get 7
else if it is 9 then get 8
else if it is 10 then get 9
else if it is 12 then get 10
else if it is 14 then get 12
else if it is 18 then get 14
else if it is 24 then get 18
else get it
set the textSize of cd fld c to it
else if act is "scroll" then
lock screen
if the style of cd fld c is "scrolling" then
put the rect of cd fld c into r
set the rect of cd fld c to item 1 of r, item 2 of r, (item 3 of r)-16, item 4 of r
set the style of cd fld c to rectangle
else
put the rect of cd fld c into r
set the rect of cd fld c to item 1 of r, item 2 of r, (item 3 of r)+16, item 4 of r
set the style of cd fld c to scrolling
end if
unlock screen
else if act is "reformat" then
lock screen
put reformat(cd fld c) into cd fld c
set the textStyle of line 1 of cd fld c to bold
unlock screen
else if act is "normal" then
set the textSize of cd fld c to 10
else if act is in "bold italic plain" then
set the textStyle of cd fld c to act
else if act is "edit" then
get the script of cd fld c
put line 2 of it into offs
delete line 1 to 3 of it
delete last line of it
global xx
put getDocFromID(it,1000) into xx
end if
exit keyDown
end if
end repeat
beep
else
pass keyDown
end if
end keyDown
function reformat s
put line 1 of s & return&return into XX
repeat with c=2 to number of lines of s
set the cursor to busy
if line c of s is empty then
if last line of XX != empty then
put return after XX
end if
else if first char of line c of s is space then
if last line of XX != empty then
put space after XX
end if
put word 1 to 999 of line c of s after XX
else if first char of line c of s is "\" then put return&line c of s &return after XX
else
if last line of XX != empty then
put space after XX
end if
put line c of s after XX
end if
end repeat
return XX
end reformat
on drawAClass c, r
--lock screen
put (the pattern)&&(the filled)&&(the lineSize)&&(the centered) into old
set the pattern to 1
set the filled to true
set the lineSize to 4
set the centered to false
choose select tool
drag from item 1 of r, (item 2 of r)-10 to item 3 of r, item 4 of r
doMenu "Clear Picture"
choose rectangle tool
drag from item 1 of r, item 2 of r to item 3 of r, item 4 of r
choose select tool
drag from item 1 of r, item 2 of r to item 3 of r, item 4 of r
repeat 20 times
doMenu "Darken"
end repeat
choose rectangle tool
set the centered to true
drag from round(((item 1 of r)+(item 3 of r))/2), item 2 of r to~
round(((item 1 of r)+(item 3 of r))/2)+60, item 2 of r+10
if there is a btn c then
select btn c
doMenu "Clear Button"
end if
choose Button Tool
doMenu "New Button"
set the name of btn "New Button" to c
set the style of btn c to transparent
set the rect of btn c to round(((item 1 of r)+(item 3 of r))/2)-60, (item 2 of r)-10,~
round(((item 1 of r)+(item 3 of r))/2)+60, item 2 of r+10
set the textAlign of btn c to center
set the textFont of btn c to helvetica
set the textSize of btn c to 14
set the textHeight of btn c to the textSize
set the textStyle of btn c to bold
set the autoHilite of btn c to true
choose browse tool
unlock screen
set the pattern to word 1 of old
set the filled to word 2 of old
set the lineSize to word 3 of old
set the centered to word 4 of old
end drawAClass
on makeAClass
global classlist, classRegs
ask "Name of class" with "nothing"
if it is empty or it is nothing then exit makeAClass
put it into newClass
if the short name of this cd is not "Overview"
then
put (the short name of this card)&"." before newClass
end if
put getRect() into newReg
put newClass & return after classList
put newReg&return after classRegs
put classList into fld "classList"
put classRegs into fld "classRegs"
drawAClass newClass, newReg
end makeAClass
function getRect
set the cursor to arrow
lock screen
doMenu "New Button"
put the number of btns into tmp
set the style of btn tmp to transparent
set the showName of btn tmp to false
set the width of btn tmp to 2
set the height of btn tmp to 2
hide btn tmp
unlock screen
wait until the mouse is down
set the topLeft of btn tmp to the clickLoc
show btn tmp
put item 1 of the clickLoc into sx
put item 2 of the clickLoc into sy
repeat until the mouse is up
set the rect of btn tmp to sx, sy, item 1 of the mouseLoc, item 2 of the mouseLoc
end repeat
get the rect of btn tmp
select btn tmp
doMenu "Clear Button"
choose browse tool
return it
end getRect
function membersofClass name
global classList, classRegs
repeat with c=1 to number of lines of classList
if line c of classList is name then exit repeat
end repeat
if line c of classList != name then return empty
put line c of classRegs into r
put empty into res
repeat with c=2 to number of cd flds
if the loc of cd fld c is within r then put c&space after res
end repeat
return res
end membersOfClass
function namesOf members
global classList
put empty into res
repeat with c=1 to number of words of members
put (the short name of cd fld (word c of members))&return after res
end repeat
return res
end namesOf
on mouseUp
if the target contains "card button" then
if the short name of this cd is "Overview" then
selectClass the short name of the target
else
go to card "Overview"
end if
else pass mouseUp
end mouseUp
on mouseDown
if "card field" is not in the target then exit mouseDown
global mdTick
if mdTick != empty then
if the ticks - mdTick < 20 then
doubleClick
exit mouseDown
end if
end if
put the ticks into mdTick
get the clickLoc
put item 1 of it into sx
put item 2 of it into sy
if (the right of the target - sx)<20 and (the bottom of the target - sy)<20 then put "size" into act
else put "move" into act
repeat until the mouse is up
get the mouseLoc
put item 1 of it into x
put item 2 of it into y
if x is sx and y is sy then next repeat
if act is "move" then
set the loc of the target to (item 1 of the loc of the target)+x-sx,~
(item 2 of the loc of the target)+y-sy
else
set the rect of the target to item 1 of the rect of the target,~
item 2 of the rect of the target,~
(item 3 of the rect of the target)+x-sx,~
(item 4 of the rect of the target)+y-sy
end if
put x into sx
put y into sy
end repeat
end mouseDown
on selectClass which
put namesOf(membersOfClass(which)) into m
answer m
zoomInOnClass which
end selectClass
on zoominonClass which
if there is a card which then go to card which
else
doMenu "New Card"
set the name of this card to which
doMenu "New Field"
hide cd fld 1
go back
put membersOfClass(which) into fnos
repeat with c=1 to number of words of fnos
select cd fld (word c of fnos)
doMenu "Copy Field"
go to card which
type V with commandKey, shiftKey
go back
end repeat
choose browse tool
go to cd which
put (the pattern)&&(the filled)&&(the lineSize)&&(the centered) into old
set the pattern to 1
set the filled to true
set the lineSize to 4
set the centered to false
choose rectangle tool
put the rect of the card window into r
drag from round(((item 1 of r)+(item 3 of r))/2)-120, item 2 of r+30 to~
round(((item 1 of r)+(item 3 of r))/2)+120, item 2 of r+60
choose Button Tool
doMenu "New Button"
set the name of btn "New Button" to which
set the style of btn which to transparent
set the rect of btn which to round(((item 1 of r)+(item 3 of r))/2)-120, item 2 of r+30,~
round(((item 1 of r)+(item 3 of r))/2)+120, item 2 of r+60
set the textAlign of btn which to center
set the textFont of btn which to helvetica
set the textSize of btn which to 24
set the textHeight of btn which to the textSize
set the textStyle of btn which to bold
set the autoHilite of btn which to true
set the pattern to word 1 of old
set the filled to word 2 of old
set the lineSize to word 3 of old
set the centered to word 4 of old
choose browse tool
end if
end zoomInOnClass
on exportToFile
global classList, source
ask file "Name the structured file"
if it is empty then exit exportToFile
put it into fname
set the cursor to watch
put "<document>"&return into foo
put empty into fs
repeat with c=1 to number of lines of classList
put line c of classList into cname
put "<section>"&cname&"</>"&return after foo
put membersOfClass(cname) into fnos
put fnos after fs
repeat with d= 1 to number of words of fnos
put "<quote header="&quote&line 1 of (cd fld (word d of fnos))&quote&&"link="&quote&"WAIS("&source&")"&quote&">"&return after foo
if line 2 of (cd fld (word d of fnos)) is empty then put line 3 to 30000 of (cd fld (word d of fnos)) after foo
else put line 2 to 30000 of (cd fld (word d of fnos)) after foo
put "</quote>"&return&return after foo
end repeat
put return after foo
end repeat
put empty into missing
repeat with c=2 to number of cd flds
if c is not in fs then put c&space after missing
end repeat
if missing != empty then
put "<section>Miscellaneous</>"&return after foo
put missing into fnos
put fnos after fs
repeat with d= 1 to number of words of fnos
put "<quote header="&quote&line 1 of (cd fld (word d of fnos))&quote&&"link="&quote&"WAIS("&source&")"&quote&">"&return after foo
if line 2 of (cd fld (word d of fnos)) is empty then put line 3 to 30000 of (cd fld (word d of fnos)) after foo
else put line 2 to 30000 of (cd fld (word d of fnos)) after foo
put "</quote>"&return&return after foo
end repeat
put return after foo
end if
put return&"</document>"&return after foo
open file fname
write foo to file fname
close file fname
-- answer foo
end exportToFile
function getDocFromID id, nlines
global connectionId, source
if last word of source is "ecs.soton.ac.uk" then
get "/usr/lib/wais/bin/getdoc > /home/wynkyn/2/users/lac/tmp/FOO.DOC"
else
get "/usr/lib/wais/bin/getdoc"
end if
TCPSend connectionId, it&return
put length(it)+1 into l
get TCPstate(connectionID)
put "Requesting document"
repeat with c=1 to number of lines of id
put line c of id into theLine
repeat
if theline is empty then exit repeat
TCPSend connectionId, word 1 to 20 of theLine & return
get TCPRecvUpTo(connectionID, return, 10, empty)
if it contains "§§§" then
TCPFAIL
end if
set the cursor to busy
put length(word 1 to 20 of theLine)+1 into l
delete word 1 to 20 of theLine
end repeat
end repeat
--TCPSend connectionId, numToChar(4)&return
repeat
set the cursor to busy
get TCPRecvUpTo(connectionID, return, 10, empty)
if it contains "§§§" then
TCPFAIL
end if
if it contains "done." then
exit repeat
end if
end repeat
put empty into theRes
wait 1 second
repeat
put TCPCharsAvailable(connectionId) into n
if n = 0 then exit repeat
get TCPRecvUpTo(connectionID,return,2,empty)
end repeat
put "Retrieving document"
put 0 into l
if last word of source is "ecs.soton.ac.uk" then
put "UNIX_temporary:FOO.DOC" into fname
open file fname
read from file fname until eof
close file fname
put it into theRes
open "UNIX_temporary:FOO.DOC" with "Giorgio:Applications:BBEdit 2.1.1 ƒ:BBEdit"
else
startProgress
put empty into buffer
repeat
put TCPCharsAvailable(connectionId) into n
put TCPRecvChars(connectionID,n) after buffer
put length(buffer) into m
put buffer into str
if char m-1 to m of buffer is return&linefeed then
put empty into buffer
else
put number of lines of buffer into lb
if lb > 1 then
delete line 1 to lb-1 of buffer
end if
delete last line of str
end if
if str is empty and wprompt(buffer) then
exit repeat
else if str is empty and buffer is (linefeed & "$ ") then
put "ERROR: crashed back to local host"
endProgress
beep
--pop card
exit getDocFromID
end if
set the cursor to busy
if str is empty then next repeat
if wprompt(str) then exit repeat
put str after theRes
add number of lines of str to l
progress l,nlines, l,nlines
if str contains "++ x25 server closed connection" then
put "JANET call failed"
endProgress
beep
--pop card
exit getDocFromID
else if str contains "+++- bytes/pkts" then
put "nfs.tn connection timed out"
endProgress
beep
--pop card
exit getDocFromID
end if
end repeat
endProgress
--pop card
delete line 1 of theRes
put superStrip(theRes, lineFeed) into theRes
end if
put empty
hide message box
return theRes
end getDocFromID
on startProgress
end startProgress
on endProgress
end endProgress
on progress
put the params
end progress
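The reformat function in the stack script above rejoins hard-wrapped text into paragraphs: blank lines mark paragraph breaks, lines beginning with a backslash are kept on their own line, and all other lines are appended to the current paragraph with a single space. A minimal Python sketch of the same logic (the function name mirrors the HyperTalk handler; it is an illustration, not part of the stack):

```python
def reformat(s: str) -> str:
    """Join hard-wrapped lines back into paragraphs, mirroring the
    HyperTalk reformat handler: the first line becomes a heading,
    blank lines separate paragraphs, backslash-prefixed lines stay
    on their own line, and other lines are space-joined."""
    lines = s.split("\n")
    out = lines[0] + "\n\n"
    for line in lines[1:]:
        if line == "":
            # paragraph break, but only if the current line is non-empty
            if not out.endswith("\n"):
                out += "\n"
        elif line.startswith("\\"):
            # troff-style command lines are preserved verbatim
            out += "\n" + line + "\n"
        elif line.startswith(" "):
            if not out.endswith("\n"):
                out += " "
            out += " ".join(line.split())
        else:
            if not out.endswith("\n"):
                out += " "
            out += line
    return out
```

As in the original, consecutive non-blank lines collapse into one logical paragraph, which suits text retrieved from line-oriented network services.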
<docobjects>
<object id=start><H1>My little document</H1>
This document is about dinosaurs.</object>
<object id=dino1>
<WWW>http://www.hcc.hawaii.edu/dinos/dinos.1.html</WWW></object>
<object id=dino>
<ruler dest=dino1>/For the/ /in the world./</ruler></object>
<object id=o1>Here's some inline document</object>
<object id=o2 type="application/postscript">
0 0 moveto 100 100 lineto stroke
(here's some inline postscript, i.e. formatted document) show
</object>
<object id=f1><WWW>http://bright/cs/papers/www94.html</WWW></object>
<object id=f2><ruler dest=f1>/<H1>/ /^<H1>/</></object>
<object id=o3><ruler dest=f1>96 3</></object>
<object id=o4>Glossary definition of the term hypermedia</object>
<object id=o5><WWW>http://bright.ecs.soton.ac.uk/</WWW></object>
<object id=o6><contents>hypermedia</></object>
<object id=p1><ruler dest=f1>/<H1>/ /^<H1>/</></object>
<object id=p2><file>Makefile</></object>
<object id=p3><file>colphoto.jpg</></object>
<object id=o7>Description of a graphic</object>
<object id=pic1><IMG SRC="http://bright/forest.gif"></object>
<object id=pic2><IMG SRC="http://bright/univ.gif"></object>
<object id=pic3><IMG SRC="http://bright/bargate.gif"></object>
</docobjects>
<docrelationships>
<includes objs="main start">
<includes objs="main dino">
<summary objs="o1 o3">
<quote objs="o4 o3">
<summary objs="o4 o5">
<generic objs="o6 o5">
<alternative objs="start o1">
<includes objs="main f2">
<imagechoice objs="o7 p1 p2 p3">
<includes objs="main pic1">
<alternative objs="pic1 pic2">
<alternative objs="pic1 pic3">
</docrelationships>
</lace93>
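The sample document above separates objects (content, possibly fetched by reference) from relationships between them. A minimal Python sketch of how such a graph might be elaborated, using a hand-built subset of the objects and relationships from the sample (the dictionaries and function names here are hypothetical, for illustration only):

```python
# Toy model of the LACE-93 sample: objects map ids to (already
# resolved) content; relationships are (name, obj1, obj2) triples.
objects = {
    "start": "<H1>My little document</H1> This document is about dinosaurs.",
    "dino": "text addressed into dino1 by a ruler",
    "pic1": "forest.gif",
    "pic2": "univ.gif",
}
relationships = [
    ("includes", "main", "start"),
    ("includes", "main", "dino"),
    ("includes", "main", "pic1"),
    ("alternative", "pic1", "pic2"),
]

def alternatives_of(obj_id):
    """All objects registered as alternatives to obj_id."""
    return [b for (name, a, b) in relationships
            if name == "alternative" and a == obj_id]

def elaborate(root="main", prefer=None):
    """Assemble a document by following 'includes' links from root,
    substituting a preferred alternative where one is chosen."""
    prefer = prefer or {}
    parts = []
    for (name, a, b) in relationships:
        if name == "includes" and a == root:
            chosen = prefer.get(b, b)
            parts.append(objects.get(chosen, ""))
    return parts
```

The point of the separation is visible here: swapping `pic2` for `pic1` changes the elaborated document without touching any object's content.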
%%
"<lace93>" MARKUP(SLACE93);
"</lace93>" MARKUP(ELACE93);
"<docobjects>" MARKUP(SDOCOBJ);
"</docobjects>" MARKUP(EDOCOBJ);
"<docrelationships>" { BEGIN DR; MARKUP(SDOCRELN); }
"</docrelationships>" { BEGIN 0; MARKUP(EDOCRELN); }
"<relationships>" MARKUP(SRELNS);
"</relationships>" MARKUP(ERELNS);
"<relationship>" MARKUP(SRELN);
"</relationship>" MARKUP(ERELN);
"<remark>" MARKUP(SREM);
"</remark>" MARKUP(EREM);
"<object"[ \t]*id=[^>]*">"
{
extern char *strchr();
char *s=strchr(yytext,'=');
strcpy(idval,s+1);
s=strchr(idval,'>');
*s='\0';
MARKUP(SOBJ);
}
"</object>" MARKUP(EOBJ);
<DR>"<"[^>]*">" MARKUP(RELATIONSHIP);
. { if(!indata && isspace(*yytext)) {}
else{ indata=1;
strcpy(textfrag,yytext); return(TEXT);}}
\n { if(!indata && isspace(*yytext)) {}
else{ indata=1;
strcpy(textfrag,yytext); return(TEXT);}}
%%
%{
#define MAXSTR 1024
#define MARKUP(x) indata=0;return(x)
char idval[MAXSTR], textfrag[MAXSTR], textval[MAXSTR], contentval[MAXSTR], relval[MAXSTR], ob1val[MAXSTR], ob2val[MAXSTR];
int indata;
extern int numobjs, numrels;
extern char *obspec[], *id[];
extern char *relname[], *relob1[], *relob2[];
%}
%%
lace93: SLACE93 objects docrelns ELACE93 ;
objects: SDOCOBJ objectplus EDOCOBJ ;
objectplus: object | objectplus object ;
object: SOBJ data EOBJ
{numobjs++;
obspec[numobjs]=(char *)malloc(strlen(contentval)+1);
id[numobjs]=(char *)malloc(strlen(idval)+1);
strcpy(obspec[numobjs],contentval);
strcpy(id[numobjs],idval);
};
docrelns: SDOCRELN docrelnplus EDOCRELN ;
docrelnplus: docreln | docrelnplus docreln ;
docreln: RELATIONSHIP
{ sscanf(yytext, "<%s objs=\"%[^ ] %[^\"]\">", relval, ob1val, ob2val);
numrels++;
relname[numrels]=(char *)malloc(strlen(relval)+1);
strcpy(relname[numrels],relval);
relob1[numrels]=(char *)malloc(strlen(ob1val)+1);
strcpy(relob1[numrels],ob1val);
relob2[numrels]=(char *)malloc(strlen(ob2val)+2);
strcpy(relob2[numrels],ob2val);
}
;
data : string {strcpy(contentval,textval);} |
data string {strcpy(contentval,textval);}
;
string : TEXT {strcpy(textval,textfrag);} |
string TEXT {strcat(textval,textfrag);}
;
%%
#include "lex.yy.c"
yyerror(s)
char *s;{
fprintf(stderr,"ERROR: %s\n",s);
}
#define MAXSTR 1024
#define MAXOBJS 100
#define MAXRELS 100
int numobjs=0, numrels=0;
char *obspec[MAXOBJS], *object[MAXOBJS], *id[MAXOBJS];
char *relname[MAXRELS], *relob1[MAXRELS], *relob2[MAXRELS];
extern char *resolve();
extern void doobj();
char *getobject(c)
int c;{
if(object[c]==NULL) object[c]=resolve(obspec[c]);
return(object[c]);
}
char *objbyid(id)
char *id;{
if(strcmp(id,"main")==0)return("");
else return(getobject(onumbyid(id)));
}
int onumbyid(i)
char *i;{
int c;
for(c=1; c<=numobjs; c++){
if(strcmp(id[c],i)==0)return(c);
}
return(0);
}
char *alternative_of(id)
char *id;{
int c;
for(c=1; c<=numrels; c++)
if((strcmp(relob1[c],id)==0) &&
(strcmp(relname[c],"alternative")==0)){
fprintf(stderr,"Would you rather see %s than %s?\n",
relob2[c], id);
}
return(id);
}
main(){
int c;
yyparse();
obspec[0]=object[0]=NULL;
/****
for(c=1; c<=numobjs; c++)
printf("OBSPEC %d (id %s) = '%s'\n", c, id[c], obspec[c]);
****/
for(c=1; c<=numobjs; c++)
object[c]=NULL;
/****
for(c=1; c<=numobjs; c++)
if(object[c]==NULL)object[c]=resolve(obspec[c]);
for(c=1; c<=numobjs; c++)
printf("OBJECT %d (id %s) = '%s'\n", c, id[c], object[c]);
for(c=1; c<=numrels; c++)
printf("RELATIONSHIP %d = '%s', %s->%s\n", c, relname[c],
relob1[c], relob2[c]);
****/
/** To elaborate the document we find everything that relates to
main **/
doobj("main");
/****
for(c=1; c<=numrels; c++)
if(strcmp(relob1[c],"main")==0){
if(strcmp(relname[c],"contains")==0){
printf("%s",objbyid(alternative_of(relob2[c])));
}
else printf("****main--%s-->%s\n",relname[c],relob2[c]);
}
****/
}
#include <string.h>
#define MAXSTR 1024
extern char *wwwgrab(), *newsgrab(), *objbyid();
char *filegrab(s)
char *s;{
int fd;
long len;
char *retbuf;
if((fd=open(s,0))<0)return(NULL);
len=lseek(fd,0L,2);
retbuf=(char *)malloc(len+1);
lseek(fd,0L,0);
read(fd,retbuf,len);
close(fd);
retbuf[len]='\0';
return(retbuf);
}
char *subpartstr(s, a, b)
char *s;
char *a, *b;{
char *retbuf, *p1, *p2;
int siz;
int notend=0;
if(*b=='^'){
b++;
notend++;
}
p1=strstr(s,a);
if(p1==NULL)return(NULL);
p2=strstr(p1+strlen(a),b);
if(p2==NULL)return(p1);
if(notend) siz=p2-p1;
else siz=p2+strlen(b)-1-p1+1;
retbuf=(char *)malloc(siz+1);
strncpy(retbuf,p1,siz);
retbuf[siz]='\0';
return(retbuf);
}
char *subpart(s, a, b)
char *s;
int a, b;{
char *retbuf;
int siz;
if(b<0)siz=strlen(s)+b-a+2;
else siz=b;
retbuf=(char *)malloc(siz+1);
strncpy(retbuf,s+a-1,siz);
retbuf[siz]='\0';
return(retbuf);
}
char *resolve(s)
char *s;{
char name[MAXSTR], str1[MAXSTR], str2[MAXSTR];
char *t;
int first, last;
if(*s!='<')return s;
if(sscanf(s,"<WWW>http:%s",name)==1){
t=strstr(name,"</");
*t='\0';
return(wwwgrab(name));
}
else if(sscanf(s,"<WWW>news:%s",name)==1){
t=strstr(name,"</");
*t='\0';
return(newsgrab(name));
}
else if(sscanf(s,"<news>%s",name)==1){
t=strstr(name,"</");
*t='\0';
return(newsgrab(name));
}
else if(sscanf(s,"<file>%s",name)==1){
t=strstr(name,"</");
*t='\0';
return(filegrab(name));
}
else if(sscanf(s,"<ruler dest=%[a-zA-Z0-9]>%d %d",
name, &first, &last)==3){
return(subpart(objbyid(name),first,last));
}
else if(sscanf(s,"<ruler dest=%[a-zA-Z0-9]>/%[^/]/ /%[^/]/",
name, str1, str2)==3){
return(subpartstr(objbyid(name),str1,str2));
}
else return(s);
}
1173 to
1140 of
1092 in
1019 a
715 and
468 has
449 said
380 for
372 was
358 on
355 mr
349 is
321 have
289 he
283 by
273 be
255 that
247 at
242 were
236 been
234 are
221 it
217 with
197 from
191 will
186 an
179 police
167 which
163 after
154 government
145 had
142 they
138 as
138 would
133 his
126 not
123 people
121 two
115 but
115 says
114 their
113 party
111 east
103 who
101 its
101 london
101 today
100 new
99 being
99 more
96 last
92 over
86 this
86 west
80 british
80 minister
77 also
77 ambulance
75 secretary
73 president
72 there
70 britain
70 german
70 out
69 one
68 about
66 first
66 than
63 into
62 leader
61 pay
58 all
58 man
58 up
57 expected
57 killed
57 million
57 no
57 year
55 against
55 us
54 germany
54 mrs
53 called
53 country
52 she
51 emergency
51 when
50 action
50 three
50 union
49 prime
49 talks
48 could
47 calls
47 labour
47 meeting
47 report
46 before
46 service
46 should
45 died
45 general
45 south
44 dispute
44 foreign
44 north
44 say
44 soviet
43 court
43 group
43 health
43 now
42 between
42 national
42 spokesman
41 crews
41 four
41 home
41 week
40 other
39 election
39 her
39 if
39 workers
38 arrested
38 five
38 found
38 injured
38 night
38 years
37 next
37 office
37 since
36 announced
36 nuclear
36 only
36 security
36 yesterday
35 during
34 bbc
34 communist
34 john
34 made
34 month
34 some
34 unions
34 work
33 any
33 former
33 near
33 taken
32 because
32 england
32 european
32 hospital
32 reported
32 six
32 water
31 car
31 power
31 thatcher
30 children
30 company
30 council
30 leaders
30 part
30 told
29 army
29 chief
29 members
29 plans
28 back
28 drug
28 germans
28 murder
28 trade
27 down
27 end
27 high
27 may
27 off
27 or
27 sir
27 world
26 campaign
26 city
26 commission
26 force
26 men
26 northern
26 parliament
26 state
26 united
25 area
25 held
25 least
25 number
24 conference
24 drugs
24 free
24 major
24 officials
24 put
24 states
23 chairman
23 charged
23 decision
23 ec
23 industry
23 ireland
23 officers
23 still
23 them
23 thought
23 time
23 under
23 visit
23 way
23 where
22 changes
22 fire
22 forces
22 help
22 later
22 left
22 seven
22 support
22 troops
21 border
21 committee
21 correspondent
21 defence
21 dr
21 elections
21 environment
21 inquiry
21 offer
21 opposition
21 public
21 stations
21 strike
21 take
21 well
20 agreement
20 another
20 central
20 commons
20 congress
20 days
20 families
20 hungary
20 leading
20 make
20 military
20 money
20 ms
20 place
20 protest
20 reports
20 shot
20 staff
20 use
20 want
19 agreed
19 authorities
19 authority
19 begun
19 bomb
19 bush
19 capital
19 cut
19 feed
19 following
19 including
19 plant
19 scotland
19 speaking
19 station
18 armed
18 case
18 community
18 death
18 due
18 energy
18 europe
18 further
18 given
18 go
18 head
18 increase
18 kong
18 led
18 most
18 official
18 programme
18 refused
18 several
18 so
18 social
18 taking
18 those
18 tomorrow
18 town
18 transport
18 violence
18 warned
18 working
17 already
17 attack
17 ban
17 become
17 billion
17 china
17 claims
17 county
17 earlier
17 gorbachev
17 investigation
17 meanwhile
17 meet
17 through
17 without
16 africa
16 anti
16 appeal
16 arrived
16 beirut
16 cabinet
16 chancellor
16 come
16 david
16 economic
16 give
16 him
16 hong
16 house
16 involved
16 israeli
16 lebanon
16 management
16 many
16 months
16 morning
16 mps
16 natwest
16 outside
16 poland
16 prague
16 radio
16 senior
16 sent
16 set
16 teachers
16 used
16 vote
16 wanted
16 wants
16 war
15 *
15 accident
15 appear
15 around
15 christian
15 collision
15 companies
15 contaminated
15 control
15 department
15 embassy
15 essex
15 increased
15 information
15 issue
15 jobs
15 member
15 news
15 oil
15 policy
15 post
15 second
15 services
15 shadow
15 stop
15 thousands
15 wales
14 according
14 accused
14 allegations
14 austria
14 boat
14 body
14 debate
14 education
14 full
14 holding
14 interest
14 ira
14 japan
14 kinnock
14 leadership
14 ministry
14 move
14 non
14 others
14 rate
14 sea
14 statement
14 summit
14 sweden
14 television
14 while
14 women
13 ago
13 american
13 appeared
13 believed
13 clarke
13 continue
13 countries
13 day
13 dead
13 did
13 eight
13 fans
13 farms
13 indian
13 issued
13 lead
13 local
13 long
13 lost
13 mass
13 miles
13 nearly
13 neil
13 officer
13 payments
13 peter
13 politburo
13 political
13 press
13 province
13 reforms
13 remain
13 resigned
13 responsible
13 royal
13 seized
13 sunday
13 threat
13 took
13 until
13 widespread
12 added
12 aids
12 attempt
12 better
12 black
12 board
12 both
12 build
12 can
12 claimed
12 co
12 coalition
12 collided
12 connection
12 criminal
12 deng
12 despite
12 explosion
12 fell
12 ferry
12 fired
12 gas
12 goods
12 groups
12 industrial
12 international
12 killing
12 krenz
12 manchester
12 march
12 mp
12 nhs
12 offered
12 opened
12 plan
12 previous
12 questioned
12 rejected
12 released
12 result
12 rise
12 saturday
12 seats
12 seen
12 ship
12 suspended
12 uk
12 urged
12 very
12 voted
12 wednesday
12 won
11 although
11 ambulances
11 asked
11 bank
11 believe
11 bid
11 call
11 challenge
11 charge
11 church
11 confirmed
11 cost
11 cover
11 czechoslovakia
11 de
11 deal
11 democrats
11 denied
11 director
11 discuss
11 every
11 follows
11 france
11 friday
11 future
11 great
11 greater
11 hurd
11 ill
11 imposed
11 india
11 kenneth
11 launched
11 legal
11 likely
11 line
11 moscow
11 must
11 parkinson
11 possible
11 proposed
11 protests
11 published
11 questioning
11 rail
11 replaced
11 ruling
11 scottish
11 serious
11 seriously
11 tv
11 walked
11 welcomed
11 woman
11 yorkshire
10 afternoon
10 agency
10 aid
10 alleged
10 among
10 announcement
10 began
10 belfast
10 berlin
10 cape
10 carried
10 cause
10 conditions
10 democracy
10 document
10 early
10 engine
10 ex
10 far
10 figures
10 find
10 ford
10 front
10 hold
10 hours
10 however
10 include
10 jaguar
10 king
10 klerk
10 known
10 law
10 laws
10 m
10 measures
10 might
10 missing
10 needed
10 normally
10 november
10 parties
10 phillips
10 present
10 pressure
10 red
10 reduction
10 refugees
10 release
10 republic
10 rights
10 ruled
10 schools
10 seeking
10 share
10 show
10 site
10 soon
10 standards
10 total
10 walker
10 written
10 yard
10 york
9 african
9 again
9 animal
9 association
9 awarded
9 brought
9 calling
9 casualties
9 centre
9 change
9 charges
9 coast
9 condemned
9 conspiracy
9 continuing
9 cook
9 crash
9 crime
9 critical
9 delegates
9 demand
9 described
9 douglas
9 each
9 eastern
9 economy
9 exodus
9 failed
9 fighting
9 football
9 forum
9 fund
9 glasgow
9 holland
9 human
9 independent
9 inflation
9 investment
9 irish
9 magistrates
9 making
9 much
9 newspaper
9 offences
9 passengers
9 peace
9 person
9 privatisati
9 rally
9 reached
9 rebels
9 refusing
9 relations
9 robert
9 sank
9 scheme
9 september
9 such
9 then
9 train
9 ulster
9 whether
9 worth
9 wounded
8 act
8 air
8 allow
8 amount
8 announce
8 annual
8 archbishop
8 attacked
8 available
8 away
8 business
8 came
8 care
8 cases
8 catholic
8 child
8 civil
8 clear
8 college
8 colombia
8 costs
8 couple
8 crashed
8 cross
8 custody
8 customs
8 czechoslova
8 december
8 decided
8 democratic
8 deputy
8 do
8 earth
8 el
8 electrical
8 electricity
8 family
8 ferranti
8 financial
8 gerasimov
8 guildford
8 hindu
8 included
8 investigate
8 israel
8 jiang
8 joint
8 justice
8 keep
8 kept
8 operation
8 pact
8 passenger
8 policies
8 poll
8 posts
8 princess
8 privatised
8 pro
8 research
8 resignation
8 review
8 road
8 role
8 saying
8 situation
8 solidarity
8 sources
8 spent
8 st
8 start
8 supplies
8 surrounded
8 survey
8 syrian
8 takeover
8 temple
8 term
8 terrorist
8 terrorists
8 threatened
8 transplant
8 unsafe
8 victims
8 wakeham
8 western
8 what
8 wife
8 yet
8 zone
7 able
7 across
7 affected
7 aged
7 airport
7 alan
7 amnesty
7 anthony
7 barnett
7 biggest
7 birmingham
7 bring
7 brooke
7 buy
7 camp
7 camps
7 cannabis
7 coach
7 cocaine
7 confirm
7 consider
7 constable
7 convicted
7 criticised
7 current
7 details
7 diplomat
7 dismissed
7 drew
7 dropped
7 egon
7 ensure
7 exchange
7 expecting
7 exploded
7 fall
7 forward
7 freedom
7 french
7 gould
7 hit
7 hungarian
7 hurricane
7 imported
7 introduced
7 jordan
7 judge
7 just
7 kohl
7 latest
7 leaving
7 letter
7 liberal
7 lifted
7 live
7 market
7 midnight
7 minutes
7 monday
7 monopoly
7 newcombe
7 open
7 orders
7 overtime
7 parents
7 paul
7 plane
7 planes
7 private
7 problem
7 production
7 pupils
7 question
7 rather
7 ridley
7 salvador
7 same
7 settlement
7 shares
7 shop
7 southern
7 special
7 stay
7 system
7 thames
7 too
7 trial
7 try
7 tuc
7 ubs
7 unless
7 unrest
7 wall
7 warsaw
7 weekend
7 whose
6 advanced
6 advertising
6 age
6 agriculture
6 ahead
6 airways
6 allowed
6 almost
6 aoun
6 apparently
6 areas
6 arms
6 attacks
6 attempted
6 august
6 banks
6 based
6 begin
6 best
6 blamed
6 captain
6 carnogursky
6 cash
6 civic
6 clashes
6 close
6 colombian
6 concern
6 concerned
6 consumer
6 continued
6 controls
6 councils
6 declared
6 defraud
6 disaster
6 discussed
6 duty
6 elected
6 employees
6 engineering
6 evidence
6 factory
6 february
6 figure
6 firemen
6 food
6 forced
6 form
6 fraud
6 fuels
6 funds
6 game
6 george
6 good
6 gordon
6 guerrilla
6 guerrillas
6 having
6 hearing
6 higher
6 himself
6 hoped
6 hospitals
6 hostages
6 hundred
6 important
6 improved
6 infected
6 inside
6 interview
6 invasion
6 island
6 jailed
6 james
6 join
6 journalist
6 language
6 lanka
6 large
[Word-frequency table: the vocabulary of the test corpus listed against its number of occurrences, from words occurring six times down to words occurring once. Many entries were truncated or garbled in extraction and are omitted here.]
A close-up of the hyperbola obtained when plotting the frequency (y axis) of a word against its rank (x axis, /1000).
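A rank–frequency table of the kind shown above is straightforward to derive from a corpus: count every word's occurrences, then sort the vocabulary by descending count so that rank 1 is the most frequent word. The sketch below illustrates the idea in Python; the sample text and the tokenisation rule are illustrative assumptions, not the method actually used for the corpus in this thesis.

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, frequency) pairs for the words of `text`,
    most frequent word first -- the data behind a Zipf-style
    frequency-against-rank plot."""
    # Illustrative tokenisation: lower-case alphabetic runs only.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [(rank, freq)
            for rank, (_, freq) in enumerate(counts.most_common(), start=1)]


# Hypothetical miniature corpus: repeated words produce the
# steeply falling frequency curve seen in the plot.
sample = "the cat sat on the mat and the dog saw the cat"
pairs = rank_frequency(sample)
```

Plotting the second element of each pair against the first produces the hyperbola-like curve described in the caption: a few words account for most occurrences, while the long tail of the vocabulary occurs only once or twice.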