As a group that has spent many years applying open hypermedia technologies to the Web, we have long been interested in finding out how people are building Web sites in practice: what anchors they are choosing to link from in their own pages and what pages they are choosing to link to. Since the first use of the terms hypertext and hypermedia by Nelson in the mid-1960s, associative linking has been at the heart of hypermedia authoring, and is the essence of non-sequential writing. This is the type of linking that we were most keen to investigate.
The initial objective of this study was to discover when and where open hypermedia technology might be most usefully applied, and to help us design better systems to support link authoring and maintenance. We were keen to map the types of links authored in Web pages onto an established link taxonomy, such as [DeRose 1989], and to attempt to identify patterns in WWW link usage.
However, when we embarked on this study we found, largely by inspection, that there are very few examples of Web sites where there is evidence of linking from within the content sections of documents. So rather than having many sites to analyse, as we first expected, we found ourselves asking the question: is the WWW killing hypermedia?
Our poster presents our search for examples of good subject-based hypertext linking, the linking statistics that we drew from those pages and the linking practices that the statistics represent.
In order to find suitable web pages to analyse, we canvassed for recommendations among colleagues in our research group and from readers of "Hypertext Kitchen" (http://hypertext.pair.com/), a web site for hypertext writers and researchers. From our enquiries we discovered a small number of scientific and technical sites with interesting linking strategies and rather more in the hyperliterature community. Of these we chose to focus our analysis on the scientific/technical sites, producing some broad statistics that describe each site's linking practice.
NASA's Astronomy Picture of the Day (http://antwrp.gsfc.nasa.gov/apod/) links each day's text not just to relevant information from previous days, but also to external educational and scientific Web pages which explain or illustrate any key phrases and technical terms used in the text. Scientific American (http://www.sciam.com/) provides a similar service for its "Enhanced Articles", and also provides more general related article links in the page's navigation section. Although these publications share a similar brief on the public understanding of science, Scientific American's British counterpart New Scientist (http://www.newscientist.com/) provides no within-text links, only separate navigation functions and lists of related articles, and so serves as a useful intra-genre comparison.
Although the web pages we analysed collated to form a relatively small data set, we recognised that the automated extraction of statistical information from these pages would be preferable to manual analysis. We therefore implemented the WebSeg tool to operate on our data set.
The WebSeg tool is able to extract link information from a set of web pages and then perform various statistical regressions on this information to provide us with visual insights into linking practices.
As well as simple link count and density metrics, we also wanted to analyse the purpose of each link, so it was important that WebSeg was able to distinguish between links occurring in navigation sections and links occurring in content sections. To provide this distinction, WebSeg automatically segmented the web pages in our data set into logical sections, based on structural and visual clues in the HTML markup (such as section headings and horizontal rules). WebSeg then calculated a link density metric for each identified section, from which we could determine whether the section was a navigation section (high link density) or a content section (lower link density), and hence derive the purpose of the links in the web pages.
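The segmentation and classification steps described above can be illustrated with a minimal sketch. It is not the WebSeg implementation: the names SectionParser and classify_sections are hypothetical, the boundary tags are limited to headings and horizontal rules, link density is assumed to mean links per word, and the 0.2 threshold is an arbitrary illustrative value.

```python
from html.parser import HTMLParser

# Assumption: headings and horizontal rules mark section boundaries.
BOUNDARY_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "hr"}


class SectionParser(HTMLParser):
    """Hypothetical helper: split a page into sections, counting words and links."""

    def __init__(self):
        super().__init__()
        # The first section holds anything that precedes the first boundary.
        self.sections = [{"words": 0, "links": 0}]

    def handle_starttag(self, tag, attrs):
        if tag in BOUNDARY_TAGS:
            # Start a new section at every heading or horizontal rule.
            self.sections.append({"words": 0, "links": 0})
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            # Count only anchors that actually link somewhere.
            self.sections[-1]["links"] += 1

    def handle_data(self, data):
        self.sections[-1]["words"] += len(data.split())


def classify_sections(html, threshold=0.2):
    """Return a (density, label) pair for each non-empty section.

    Density is links per word (an assumed definition); sections above the
    threshold are labelled navigation, the rest content.
    """
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    results = []
    for section in parser.sections:
        if section["words"] == 0 and section["links"] == 0:
            continue  # skip empty sections
        density = section["links"] / max(section["words"], 1)
        label = "navigation" if density > threshold else "content"
        results.append((density, label))
    return results
```

Calling classify_sections on a page's HTML source yields one pair per section, so links can then be attributed to navigation or content according to the section in which they occur. A real tool would need a more careful choice of boundary cues and threshold than this sketch assumes.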
In our initial study, we generated the following statistics from our data set:
The full results of our analysis are presented in [Carr et al, 2000].
Sites that link separately and internally to related items and previous issues require only a simple metadata match to ensure that the general links provided are relevant. Sites that indulge in integrated content linking require both author input to create suitable links in the first place and editorial input to ensure consistency of approach over time. As an online (linked) version of a printed text, the SCIAM enhanced articles go through separate author and editorial processes. APOD pages, by contrast, are not written by independent authors but by the editors of the site. Also by contrast, each APOD page is written explicitly to be linked: the textual content is constructed to function as an abstract with links providing all the detailed or background information required by the various readership profiles. As such, the task of writing the hypertext is seen by the authors as easier than that of writing a plain text. This is because a plain text must express every idea and elaboration that is necessary to the understanding of the subject, whereas a hypertext can be written as a skeleton of the necessary information with links being used to add 'flesh' to the subject.
Perhaps the fact that this kind of linking process (or more simply "writing process") is so at odds with people's normal experience of literacy explains why it is not more commonplace. The effort required to locate high quality Web pages to link to may also be an issue here: competent editorial experience and a knowledge of the kinds of material available in a particular subject domain are key to this kind of linking, and these are skills which the average Web site creator does not possess. Unless these issues can be addressed, we foresee that the WWW may kill hypermedia as we know it.
Thanks to Robert Nemiroff, co-producer of APOD, for insight into APOD's editorial and linking policies and processes. Thanks also to Mark Bernstein and readers of the Hypertext Kitchen.