The Development Of A Sound Viewer For An Open Hypermedia System

Stuart Goose and Wendy Hall
University of Southampton,
Highfield, Hants, UK
SO17 1BJ
Fax: +44 703 592865
E-mail: {sg93r,wh}@ecs.soton.ac.uk


Contents

  1. Introduction
  2. Visual Representation of Sound
  3. Microcosm
  4. Overview of the Sound Viewer
  5. Program Design
  6. Future Work
  7. Conclusions

Abstract

While rich support for a wide variety of media such as text, video and image is common among contemporary hypermedia systems, so too is the inadequate support for audio. The primary reason that audio has not attracted as much attention as other media can be attributed to its obvious lack of visual identity. The main focus of this work was to identify a generic and meaningful visual representation of audio within a hypermedia context, and significantly promote hypermedia support for audio through the provision of a sound viewer.

This paper describes the inherent difficulties in providing a consistent interface to audio, and discusses in some depth the issues raised during the development process. The sound viewer is then introduced and the associated concepts described. The creation and traversal of links to and from audio are facilitated by the sound viewer across formats including WAV (proprietary digital sound file format from Microsoft), CD (Compact Disc) Audio and MIDI (Musical Instrument Digital Interface). The resultant viewer provides a unified and extensible framework for interacting with audio from within an open hypermedia environment. The open hypermedia system Microcosm was used as the development platform for this work. Microcosm can be augmented to supply a hypermedia link service to additional media with minimal overhead.


1. Introduction

Although vision is generally accepted as the primary sense for normal sighted people, the auditory sense has some unique qualities [1, 2]:

  • auditory stimuli has the ability to make a longer lasting impression upon an individual than visual stimuli [3, 4].
  • better recall can generally be attained if the information was received aurally, as opposed to being read.
  • humans can react faster to auditory stimuli faster than to visual stimuli.
  • A charismatic orator is generally remembered for the mesmerising delivery of a famous speech, similarly a musical virtuoso may create an impact upon an audience after performing a remarkable solo. Our memory of such events serve to illustrate how people respond positively to exhilarating auditory stimuli. The richness of expression that this medium can convey is generally acknowledged, whether the sound is music or the spoken word.

    Many applications have benefited from the inclusion of non-speech sound, which has traditionally been used in the interface to denote warnings or status information. Experimental evidence exists to suggest that audio confirmation reduces errors [5]. The suitability of sound as a primary aid to navigation has also been satisfied [6], with users demonstrating that up to eight targets on the screen could be located with reasonable accuracy and speed using only auditory clues.

    Video and audio are examples of temporal media, with systems such as Intermedia[7] and The Elastic Charles [8] among the first credited with providing explicit handling and management of temporal media. Interesting work has also been published regarding auditory icons [9]. Through his SonicFinder [10], Gaver associated natural and environmental sounds to actions occurring within the interface, allowing them to purvey greater meaning to the user. Rather than employing natural sounds to represent actions, Blattner et al. [11] devised synthetic sounds which they referred to as earcons. An earcon is constructed from musical fragments called motives, with each one having a semantic interpretation. These motives are then blended using recognised musical techniques. Variations upon a theme can be generated, allowing a family of earcons to represent a genre of similar conditions, e.g. errors. Work such as this, reaffirms the importance and advantages of audio support within media enriched environments.

    A significant amount of commercial software is currently available that facilitates manipulation of audio media from within a GUI (Graphical User Interface). A variety of these tools and their respective user interfaces were observed in order to identify the most effective and successful paradigms adopted for work within an audio context. PC and Macintosh applications such as AudioTrax, SoundTools, SADiE and WaveEdit were among those examined.

    Given that sound is an essential medium for communicating thoughts and ideas together with the evidence that people are well disposed to auditory stimuli, we postulate that audio support within a hypermedia system is a fundamental requirement. Many hypermedia applications would benefit greatly from the inclusion of audio media, whether in the form of musical extracts or recorded interviews with people of interest.

    Considering voice is used as our primary means of communicating to one another, it would be natural to assume that audio would form an integral component in any hypermedia system, but we have found this not to be the case. Many hypermedia systems support a wide variety of media such as text, video and pictures, but audio has been gravely neglected. The fundamental reason that audio has not attracted as much attention as other media can be attributed to its obvious lack of visual identity. Even within such influential systems as Intermedia [7] and Athena Muse [12], there appears to be no documented evidence of the development of presentation components for audio that support mechanisms for link creation and traversal.

    The visual nature of window-based media viewers demand an appropriate graphical representation for each medium. The main focus of this work was to identify a generic and meaningful visual representation of audio within a hypermedia context, and significantly promote hypermedia support for audio through the provision of a sound viewer. The subsequent sections describe the inherent difficulties in providing a consistent interface to audio, and introduce the sound viewer as a solution to some of these problems.

    The developers of Intermedia introduced an interesting concept of active anchors [13], where the consequence of traversing a link to an active anchor results in the presentation of dynamic information associated with that anchor. In this paper we describe a new link attribute which enables the authoring of temporal anchors, and show how this can be incorporated into a presentation component for audio within an open hypermedia framework.

    2. Visual Representation of Sound

    A sound viewer is perhaps a contradiction in terms, as sound is invisible to the eye. The non-visual manifestation of this medium is in complete contrast with the array of media viewers available. In seeking an appropriate graphical metaphor for sound, a brief review of existing methods was conducted.

    2.1 Waveform

    The selection of a waveform to represent sound has been widely adopted by many computer applications in the audio domain. This is principally because it is the most accessible representation of this medium and is simple to render. The waveform for a given sound sequence can be deduced by sampling the sequence at given rate. Greater definition and accuracy of the waveform can be observed through proportional increases in the sampling rate.

    2.2 Manuscript

    Traditionally, composers have documented music upon manuscript. This entails using five parallel lines (the stave) to represent pitch, and suitably annotating it with symbols to represent notes and silences of varying duration. The language of music is universal and affords a high degree of precision and expression.

    2.3 Piano Roll

    Another representation employed by the music community is piano roll. A single octave of a piano keyboard is drawn vertically on the left hand side of a display, while markers, denoting a key depression, scroll from left to right across the display at regular time intervals. This method allows a musician to learn and play tunes via mimicry.

    2.4 No Visualisation (sound engineers)

    Currently the majority of sound editing by studio engineers is performed manually without any visual aids. All an engineer requires is fine control over the audio device for positioning the start and stop locations, and a playback facility for subsequent refinement of these positions. Time and position indicators are also relied upon for noting and subsequently relocating specific sequences.

    2.5 Summary

    The clear advantage of having a waveform displayed graphically is that an accurate and visible manifestation of the sound sequence is provided. The waveform can then be used as a basis for editing, making selections and performing actions upon. Other than providing a visible manifestation, a waveform is of limited benefit to a naive user when they listen to a sound sequence. An untrained user cannot easily relate a waveform to the sound they hear. The most information they are liable to gather from this representation is whether there is any sound or not, and hence whether it is relatively loud or quiet.

    As indicated earlier, manuscript can yield accuracy and power of expression, but this representation would only be desirable if the user had the ability to interpret such a notation. The task of deducing both manuscript and piano roll representations from audio and subsequently displaying it is a non-trivial process, and such technology is currently only available for the MIDI format. The manuscript and piano roll representations would also only be applicable to music, as they would be meaningless if the sound recording was of a conversation. Having to draw any distinction between conversation and music would be clearly undesirable as the representation would have to change to reflect the type of the current sequence. Automatically detecting whether the sound is of music or conversation is also non-trivial and not within the focus of this work.

    Despite the prevalence of the waveform within the interface of the majority of commercial audio packages, we remained sceptical of its value for the reasons expressed earlier and were concerned that it would be relegated to that of a token audio representation.

    When investigating the feasibility of displaying a waveform, it transpired that retrieving the information required was possible from the WAV format but not from MIDI or CD audio. As one of the key aims of this project was to provide a sound viewer with a uniform user interface to the spectrum of audio formats, the unsuitability of a waveform representation was confirmed. The visual representation selected should not compromise the current and future audio formats supported.

    The main requirements of a typical hypermedia author/user for the manipulation of audio was compiled and the essential points are listed below:

  • to have fine control over the audio device
  • to be able to identify the current position and duration of the sound sequence
  • the ability to make a selection over a portion of the sound sequence, playback and accurately refine that selection in preparation for creating an anchor
  • to be able to identify clearly any links present in the sound sequence
  • to be able to follow any links from the sound sequence in an intuitive manner
  • When reflecting upon these requirements it became apparent that the need for a specific conventional representation for sound was not as great as at first thought. The majority of hypermedia users are unlikely to be highly musically trained, so the requirement for any specialist representation is limited, although a complete system would address this need. The fundamental requirements of the average hypermedia user are not dissimilar to those of the studio engineer, discussed earlier, but with the obvious addition of link service capabilities. Although the sound viewer would lack a conventional graphical representation of sound, the more this approach was considered and how it related to hypermedia, the more it seemed both plausible and practical. Before we describe the user interface in detail it is necessary to introduce the host hypermedia system.

    3. Microcosm

    Microcosm is the product of research into open hypermedia systems conducted by the Multimedia Group at Southampton. It provides an open hypermedia environment, allowing linking between many different media types without the need to modify these resources in any way, which made it an ideal experimentation vehicle for the sound viewer.

    The most advanced version of Microcosm runs on PC under Microsoft Windows 3.1, but parallel developments for UNIX and Macintosh platforms are underway. It has been well documented in the literature [14, 15, 16, 17] and we only give a brief overview here to explain the necessary concepts.

    3.1 Brief Overview

    The Microcosm model was first described in [14]. It is best understood as a set of autonomous communicating processes which supplement the operating system facilities. From within Microcosm the user interacts with a viewer. A viewer is responsible for providing an interface for allowing the author/user to manipulate a particular media type and also to create and traverse links. Messages to perform such actions are sent from the viewer to Microcosm, which then dispatches the message through a chain of filters. Each of the filters is then given the opportunity to respond to the message. Based on the message contents, some filters may add new messages to the chain. Eventually the message(s) will emerge from the chain arriving at the Link Dispatcher. This filter examines the messages to see if they contain any available actions (such as links to follow), and if so presents these actions to the user.

    Microcosm supports a wide variety of media such as text, video, audio and images. The system can be augmented to support additional media with minimal overhead, this is largely attributable to the open design of the system. When another media type is to be supported, a new viewer must be implemented, or an existing application's functionality augmented.

    3.2 Viewers

    As described above, a viewer is responsible for providing an interface for allowing the author/user to manipulate a particular media type and also to create and traverse links.

    The reasons for augmenting third party applications with the hypermedia services are two fold: the original application is already available and ideal for viewing and manipulating the data; that the user is comfortable with the operation of the original application. Common sense suggests that most users would prefer to access hypermedia link services from within a familiar application as opposed to learning to use a dedicated hypermedia viewer.

    Microcosm supports three different classes of viewer - fully aware, partially aware and unaware - as detailed in [15, 18]. In order to provide full link service capability, the sound viewer was implemented as a fully aware Microcosm viewer.

    3.3 Inadequacies With Audio Support In Microcosm

    Elementary support for audio has been present within Microcosm for a considerable period now, but only through the invocation of a standard Microsoft Windows application to play a WAV file from the start through to completion. This limited support for sound has had a significant negative impact on the amount of audio material incorporated within Microcosm applications by authors. Some of the central reasons being:

  • WAV file format was the only audio media supported. The average user was not prepared to use the Microsoft Windows Sound Recorder to create a WAV file. This process involves a degree of experimentation in order to determine the optimum recording levels to keep distortion to a minimum. Reasonable quality recordings of conversations could be achieved, but recording music proved to be less successful. An attempt was made to record an excerpt of an opera from CD audio, which upon playback, culminated in the higher registers of the female vocal sounding intolerably distorted.
  • The maximum permitted length of a WAV file is dictated by how much computer memory is available. WAV files also consume an enormous amount of disk space even for fairly short sound excerpts. For example, a sound sequence 31.5 seconds in duration results in the generation of a 347.7 Kbytes WAV file. Incorporating several AVI video (proprietary digital video file format from Microsoft) and WAV sound files into a hypermedia application has the potential to consume copious amounts of storage space.
  • Originally, when a user followed a link to an audio document Microcosm would launch the Microsoft Windows Sound Recorder application with the WAV file to be played. From that moment onwards Microcosm was unable to have any influence over the application. The author may have preferred to have a single instance of a WAV file in storage and define the start and end points of numerous separate excerpts over that single sequence. This scenario could not be achieved with the existing configuration. A similar effect could be accomplished but only by creating multiple WAV files, undesirable because of the recording process and the considerable drain upon storage, compounded by the probability that some of the audio material will be replicated in these WAV files.
  • The most serious limitation with the existing arrangement is that links cannot be created in or followed from audio media. This is a fundamental service that any hypermedia system would be expected to provide. Every other fully aware viewer within Microcosm supports the actions of creating and follow links; we needed to find a way to extend this basic requirement to encompass sound.
  • Consequently, there was a pressing requirement for a purpose-built Microcosm sound viewer to address these shortcomings. The new viewer described in this paper provides an interface for allowing the author/user to manipulate the sound, and also facilitates creation and traversal of links to and from audio media. The audio media formats that this viewer currently supports are WAV, CD Audio and MIDI.

    4. Overview of the Sound Viewer

    The design of an intuitive user interface is of paramount importance to an application such as this as it will dictate how easy or difficult any interaction with audio media is to be. This factor will also impact upon how prevalent audio media becomes within any subsequent hypermedia applications. Fortunately, there already exists a well understood user model for interaction with audio devices. The sound viewer interface exploits the conventional control panel used by cassette decks and compact disc machines, as most people are well acquainted with their operation. Those familiar features have been augmented to provide hypermedia functionality. The resultant user interface can be seen in Figure 1: The user interface of the sound viewer working with Microcosm.

    Two concepts central to the operation of the viewer are those of the local and global views. A local view is a defined subset of a sound sequence, whereas a global view is the sequence in its entirety. For example, if an author was building a hypermedia application about a classical symphony it would be desirable if small excerpts, composed of indexes into the piece, could be created and subsequently played when required. These small excerpts that can be played on request constitute a local view, but at all times the user may be granted access to the global view and appreciate the symphony in its entirety.

    4.1 Detail and Overview Windows

    The sound viewer has two white rectangles almost spanning the width of the application window. The large rectangle at the top will be referred to as the detail window, and the smaller window beneath it will be referred to as the overview window.

    The length of the overview window represents the length of the sound sequence to be heard. The highlighted rectangle within the overview window moves from left to right as the sound sequence plays, providing a visual clue to the current position. If any links are present within the sound sequence they are represented as horizontal lines drawn in the overview window at the relative position in the sound sequence. The length of the line represents the period of time the link is valid.

    The highlighted rectangle in the overview window represents the exploded view seen in the detail window above. As the highlighted rectangle within the overview window moves along, a more detailed view of any links is scrolled horizontally within the detail window, both in synchronisation with the sound playing. Links are represented as horizontal lines within the overview window, but shaded rectangles within the detail window. Displayed within the shaded link rectangle is a textual annotation stating the destination document media type followed by a brief description of the link relationship (entered by the author when it was created).

    The notion of using a fisheye view to improve navigation and ease browsing large and complex objects has been widely investigated [19, 20]. Although not a classic fisheye view, the detail and overview windows combine to provide detailed local information placed within the wider context.

    4.1.1 Zoom Facility

    As mentioned previously, the highlighted rectangle in the overview window represents the exploded view seen in the detail window. The size of the highlighted rectangle in the overview window is representative of the view in the detail window. This feature allows the highlighted rectangle to be resized in the horizontal direction, using the mouse in the same manner a window is resized, and observe the corresponding view in the detail window being updated to reflect this change. Thus the smaller the highlighted rectangle within the overview window, the higher the zoom factor in the detail window, and vice-versa. This feature is demonstrated in Figure 2: Showing the effect when the highlighted rectangle in the overview window has been resized.

    4.1.2 Changing Current Position

    The overview window is really an embellished scroll bar. The current position in the sound sequence can be altered by "dragging" the highlighted rectangle to the desired location. The user may also click at a specified point within the overview window and the highlighted rectangle will move there, causing the sound sequence to played from that point onwards.

    4.1.3 Mouse Pointer

    To provide visual clues for the user, the mouse pointer changes in appearance to indicate that the mouse is in the correct position to either move or resize the highlighted rectangle within the overview window. The familiar four arrowheaded mouse cursor is displayed whenever the user places the mouse over the rectangle, indicating that it may be moved. The also familiar horizontal double arrowheaded mouse cursor is displayed whenever the mouse is positioned over either the left or right edge of the rectangle, indicating that it may be resized.

    4.2 Link Following

    As described earlier, links are the annotated shaded rectangles displayed in the detail window. The brief descriptions on the links serve as clues to the relationship binding the source and destination documents. If the user hears either a musical or vocal phrase that they find interesting and find that a link is present in the detail window, the link annotation can be read. If they are stimulated by this information and decide that they wish to find out more about the relationship, they will want to be able to follow this link.

    To follow a link, a user must position their mouse pointer over the relevant link rectangle in the detail window and perform a double click action. This action works when a sound sequence is being played and when paused.

    4.3 Position Counters

    Beneath the overview window are three boxes housing some additional features. At the bottom left a box entitled Position Counters can be seen. There are two times displayed within this box, one showing the time elapsed since the start of the track and the other the duration of the sound sequence to be heard.

    4.4 Selection Indicators

    Another box is situated at the bottom centre entitled Selection which contains two edit boxes labelled Start and Stop. An author will demand a simple but accurate mechanism when selecting a portion of sound over which an anchor may be created. These two edit boxes are equipped with up and down arrow icons which can be activated to increment or decrement the respective start and stop positions. The selection of sound is conventionally highlighted in black within the detail window, leaving the author free to play the selection and make subsequent refinements.

    4.5 Link Authoring

    Once the user is satisfied with the portion of sound that they have selected, the Start Link or End Link option from the Action menu can be selected to create a link anchor. The Microcosm link creation mechanisms are well documented [15, 21]. The link creation process using the sound viewer can be followed from Figure 3 through to Figure 4.

    Within the sound viewer, in the top left corner of Figure 3, a selected portion of sound is highlighted in black. The required destination document can be seen within the text viewer in the top right corner of this figure. Start Link has been selected from the Action menu of the sound viewer, and End Link selected from the Action menu of the text viewer. The Start Link and End Link windows then appear and can be seen displaying their respective selection information. Once the Complete... button is pressed the Linker window is displayed. The Linker dialog box allows the user to select a link type together with any associated attributes, or accept the default settings. A textual link description may also be entered at this stage. Once this short process has been completed the user selects OK to forge the link. It only makes sense at the moment to create specific links from audio because to create local or generic links would require intensive sound processing capabilities. This process in shown in Figure 3: A link being authored between sound and text can be seen in this screenshot.

    Once the new link has been created, the linkbase filter, responsible for link creation, sends the sound viewer a message endorsing this fact. The sound viewer dynamically updates itself to incorporate the new link, both within the overview and detail windows, which can be seen in Figure 4: This screen shot shows the sound viewer after dynamically updating itself to incorporate the new link, both within the overview and detail windows.

    4.6 Temporal Anchors

    Palaniappan et al [13] incorporated the concept of active anchors within Intermedia, where the traversal of a link to an active anchor resulted in the presentation of dynamic information associated with that anchor. The link authoring filter of Microcosm was extended to support a new link attribute to enable the authoring of temporal anchors.

    A temporal anchor is a link with respect to time. Temporal anchors are only applicable to temporal media such as audio and video. The semantics of a temporal anchor require that it is automatically activated by the viewer when the current position in the media is equivalent to the start position of which the anchor is valid. This feature enables additional information to be viewed at the appropriate time within the sound sequence. Imagine a piece of classical music is being played, with the accompanying pages of score being displayed automatically at specified intervals allowing a musician to chart its progress. The score would have previously required digitising, but the temporal anchors are the mechanism for triggering their display at the appropriate moments.

    4.7 Audio Controls

    The last box situated at the bottom right of the sound viewer interface, entitled Audio Sequence Controls, contains a collection of buttons that can be used to control the local view of the sound sequence. The operation of the Play, Pause and Repeat controls are self explanatory. If a portion of sound is currently selected then the audio controls operate upon the selected portion.

    4.8 Memory Store

    During a sound sequence the user may hear something that is particularly interesting that they may wish to return to at a later stage for closer inspection. This requirment can be fulfilled by using the memory feature of the viewer. Beneath the local audio controls are two buttons, Memory In and Memory Out. When Memory In is clicked the viewer records the current position in the audio media, essentially an audio bookmark. At any time the user may interrogate the contents of the memory by clicking on the Memory Out button. When this button is clicked a window is opened with the time of each memorised position displayed within a list box. The user may return to an entry by selecting a time and choosing the Move To button, or double-clicking with the mouse, to move to that particular location in the sound sequence.

    4.9 Custom Features

    The choice of time format can be decided by the user via the Units menu option. The two formats currently supported are Milliseconds and Minutes/Seconds.

    An extensive set of audio controls for navigating the global view is provided by selecting the Controls menu option. The extended control panel can be seen in Figure 5. This can be viewed in Figure 5: The extended control panel facilitates further browsing.

    5. Program Design

    Once the program specification had been clarified and the user interface issues confronted, a prototype was constructed. It soon became clear that this complex task could be divided into a number of components each with their own responsibilities. The modular architecture of the program can be appreciated in Figure 6: A diagram depicting the structure of the primary components of the application.

    The UNIX operating system maintains a layer of abstraction over the hardware, allowing both the user and the programmer to profit from a high level of device independence. It was decided that the design of the sound viewer would embody this concept to insulate those concerned from all audio device details. The front end of the program exploits a suite of audio device independent functions in order to control the audio media. This layer of abstraction allows the user interface aspects to be conveniently divorced from any audio device specifics. Beneath the abstraction, each of the supported audio formats supply a suite of functions that can be called from the audio device independent layer. Each of the media specific functions rely heavily upon the Microsoft Windows MCI (Media Control Interface) for this implementation. The sound viewer currently supports three audio formats, but due to the modular and extensible design, alternative and emerging formats may be rapidly supported.

    The audio hardware must be capable of working asynchronously rather than hogging system resources and degenerating the performance of Microsoft Windows. In order to run this program satisfactorily the computer hardware must meet the minimum specification of a standard multimedia PC [22].

    6. Future Work

    This section documents various ideas for extending the sound viewer. The first subsection discusses possible improvements while the second considers extensions specifically for musicians.

    6.1 Additional Features

    The general paradigm adopted within GUI based applications for making a selection is to "click and drag" the mouse over a word or phrase which is subsequently inverted to black. The current method provided within the sound viewer is an accurate and convenient mechanism, but it would make the selection process complete if this additional mechanism was supported.

    The brief text descriptions that annotate the links are composed of the destination document type, e.g. TEXT or VIDEO, and the link description separated by a colon. As each media type supported within Microcosm has an associated icon it, an improvement would be to paste this icon onto the link rectangle in preference to the text description as is done at present.

    Although the algorithm to arrange the links achieves the optimum use of the screen space available, it is a finite resource. The viewer imposes a current maximum of six links, that can be valid over the same time period, to be displayed on the screen at any one time. One solution, consistent with that of a text application, would be to supply a vertical scroll bar to allow the user to view any links not within the current display area.

    6.2 Musical Extensions

    The sound viewer could be enhanced to provide additional hypermedia functionality that would be of significant use to musicians. Support could be provided for generating and displaying manuscript and piano roll representations of music, but perhaps more interestingly, techniques for computing links into digital audio media. One could search for transposed recurrences of melodies or chords, intervallic successions and other interesting musical patterns. This could prove a fruitful area for future work.

    Musical texturing could also be supported when Microcosm is able to provide a degree of synchronisation. The rich texture of a piece of music can be appreciated when gradually constructed, building up the sound by introducing each instrument, or track, layer by layer. This approach allows the user to better understand the relationships between the rhythmic, melodic and harmonic content.

    7. Conclusions

    In this paper several major inadequacies of the audio support within Microcosm were presented. The new sound viewer has provided an alternative strategy for tackling these problems.

    The first problem being that WAV was the only audio format supported, and the associated difficulties with transferring sound recordings to this format. This process could become time consuming and often resulted in low quality recordings. The second problem was concerned with audio files consuming large quantities of storage space. One obvious solution to the above problems was to provide CD audio support. This alleviates concern over storage consumption, as the only information now required to be stored is the start and stop positions that the CD is to be played from and to. The prevalence of CD audio in recent years has meant that there is an immense catalogue of CD audio material available, making it vital that authors have the ability to incorporate this medium into their applications.

    Given that the technology to render or play both static and dynamic media on computers has been available for a considerable period, the challenge still remains to provide usable and powerful hypermedia linking mechanisms for them. The remaining problems alluded to earlier related to the absence of hypermedia support for audio. This paper has detailed the way in which the sound viewer provides controls for manipulating audio media together with the hypermedia support for creating and traversing links.

    At present the sound viewer supports only the specific/button link type within Microcosm, allowing visible point-to-point links to authored. Text-retrieval algorithms exist which have been used as a basis for creating dynamic links, but this concept can be related to the spectrum of multimedia data. The MAVIS project [23, 24] is currently examining techniques for achieving content-based indexing and retrieval for images. Preliminary results suggest that these techniques could be successfully applied to other media types, including video and audio. This advancement would allow the range of link types provided by Microcosm to become applicable to the panoply of media types.

    The provision of link anchors for video data is a worthy pursuit and tackled by various research groups. The Microcosm video viewer [25] allows authors to "click and drag" with the mouse to define a region of interest and then bind it within time and space of the video sequence. As the video sequence plays, anchors appear as polygonal outlines moving in time with the video and disappear when no longer valid. This mechanism provides a natural presentation of anchors that remains consistent with the model employed within other Microcosm viewers. While this approach has its merits it only provides link information with respect to the current frame. The user is afforded no overview or visual indication of the presence or relative position of any anchors within the sequence, such as is provided by the overview window in the sound viewer. It is highly likely that users of the video viewer would appreciate the combined benefits that both representations offer as has been explored in the sound viewer.

    As described above, a temporal anchor is a link with respect to time. A temporal anchor is automatically activated by the viewer when the current position in the media is equivalent to the start position of which the anchor is valid. The musical scenario outlined earlier required images of the score to be automatically displayed at specified moments throughout the sound sequence. This requirement raises the issue of synchronisation. Can it be guaranteed, within a specified tolerance, that the image of the music score will be displayed in order that the user may successfully follow the progression of the piece ?

    Various researchers have confronted the issues of specifying and delivering synchronised multimedia documents e.g., [26, 27, 28, 29]. This is a crucial area for the future development of multimedia information systems. At present, the sound viewer sends a follow link message upon which Microcosm responds by processing the link and invoking a viewer with the relevant document. This speed of this process is dependent upon a variety of factors: the speed of the machine hardware; the current processing load of the machine; the efficiency of the link processing stage; and whether the resultant document has to be retrieved over a network or not.

    Of the many proposed solutions to this difficult task, the most favourable is to provide such support in amongst the layers of the operating system. Existing mechanisms such as semaphores, message passing and clock scheduling have proved inadequate on their own, so new mechanisms such as triggering have been developed to complement these existing services. These problems are exacerbated when considering what types of mechanisms are required for synchronous co-ordination within a distributed environment.

    Alternative high level approaches may be sought through designing the document viewers in such a way that they may communicate and hence collaborate. A generic or composite viewer could be designed that would be responsible for playing and rendering all media types, thus giving it absolute control over the rate at which each medium may progress.

    8. Acknowlegements

    The authors are indebted to all members of the Multimedia Lab at Southampton for their help and advice, especially Nick Beitner, Ian Heath, Rob Wilkins and Hugh Davis. In addition, we are also very grateful to Deborah Swanberg of the University of San Diego for her constructive comments on the initial draft of this paper.

    This project was undertaken in collaboration with the Music Department at the University of Southampton, and our thanks extend to Nick Cook and Dan Leech-Wilkinson for their very helpful comments and for trialing the system.


    References

    [1] Card, S.K., Moran, T.P. and Moran A., The Psychology of Human-Computer Interaction, Published by LEA, 1983.

    [2] Eysenck, M., Keane, M., Cognitive Psychology: A Students Handbook, 1986.

    [3] Averbach, E. and Coriell, A.S., Short-term Memory in Vision, Bell System Technical Journal, 40, 309-328, 1961.

    [4] Darwin, C.J., Turvey M.T. and Crowder, R.G., An Auditory Analogue of the Sperling Partial Report Procedure: Evidence For Brief Auditory Storage, Cognitive Psychology, 3, 255-267, 1972.

    [5] Monk, A.F., Mode Errors: A User-centred Analysis and Some Preventative Measures Using Keying Contingent Sound, International Journal of Man-Machine Studies, 24(1), 1986.

    [6] Pitt, I. and Edwards, A., Navigating the Interface by Sound for Blind Users, In D. Diaper & N. Hammond (eds) HCI '91: People and Computers VI, BCS HCI SIG, 373-383, Cambridge University Press, 1991.

    [7] Yankelovich, N., Haan, B.J., Meyrowitz, N.K., Drucker, M., Intermedia: The Concept and the Construction of a Seamless Information Environment, IEEE Computer, 81, January 1988.

    [8] Brondmo, H.P., Davenport, G., Creating and viewing The Elastic Charles - a Hypermedia Journal. In McAleese R. and Green C., eds., Hypertext: State of the ART, 43-51, Intellect Ltd, 1990.

    [9] Gaver, W., Auditory Icons: Using Sound in Computer Interfaces, Human Computer Interaction, 2(2), 167-177, 1986.

    [10] Gaver, W., The SonicFinder: An interface that uses auditory icons, Human Computer Interaction, 4(1), 67-94, 1989.

    [11] Blattner, M., Sumikawa, D., Greenberg, R., Earcons and icons: Their structure and common design principles, Human Computer Interaction, 4(1), 11-44, 1989.

    [12] Champine, Geer, Ruh, Project Athena as a Distributed Computer System, IEEE Computer, 40, September 1990.

    [13] Palaniappan, M., Yankelovich, N., Sawtelle, M., Linking Active Anchors: A Stage In The Evolution Of Hypermedia, Hypermedia, 2(1), 47, January 1990.

    [14] Fountain, A.M., Hall, W., Heath, I., Davis, H.C., MICROCOSM: An Open Model for Hypermedia With Dynamic Linking, In Rizk, A., Streitz N., Andre, J., eds., Hypertext: Concepts, Systems and Applications. The Proceedings of The European Conference on Hypertext, INRIA, France, November 1990, Cambridge University Press, 1990.

    [15] Davis, H.C., Hall, W., Heath, I., Hill, G.J., Wilkins, R.J., Towards an Integrated Environment with Open Hypermedia Systems, In Proceedings of the ACM Conference on Hypertext, ECHT `92, Milan, Italy, 181-190, December 1992.

    [16] Hall, W., Heath, I., Hill, G.J., Davis, H.C., Wilkins, R.J., The Design and Implementation of an Open Hypermedia System, Computer Science Technical Report 92-19, Department of Electronics and Computer Science, University of Southampton, UK, 1992.

    [17] Hall, W., Ending the Tyranny of the Button, IEE Multimedia, 1(1), 60-68, Spring 1994.

    [18] Knight, S.J., Davis, H.C., Light Hypermedia Services: A Study of Third Party Application Integration, European Conference on Hypertext, 41-50, September 1994.

    [19] Furnas G.W., Generalised Fisheye Views. In Proceedings of CHI 1986 Human Factors in Computing Systems, Boston, Mass., ACM Press, 16-23, 1986.

    [20] Noik E.G., Exploring Large Hyperdocuments: Fisheye Views of Nested Networks. In Hypertext 93: The Proceedings of the Fifth ACM Conference on Hypertext, Seattle, 192-205, November 1993.

    [21] Hall, W., Davis, H.C., Hypermedia Link Services and Their Application to Multimedia Information Management, to appear in Journal of Information and Software Technology Special Issue on Multimedia. Also available as Computer Science Technical Report 93--19, Department of Electronics and Computer Science, University of Southampton, UK, 1993.

    [22] Multimedia PC Marketing Council, Standard Multimedia PC Configuration, Washington DC, 1992.

    [23] Wilkins, R.J., Griffiths, S.R., Lewis, P.H., Hall, W., Davis, H.C., MAVIS: Content Based Navigation Within Microcosm, Demonstrated at the European Conference on Hypertext, September 1994.

    [24] Wilkins, R.J., Griffiths, S.R., Lewis, P.H., Hall, W., Davis, H.C., The MAVIS Project-Extending Generic Links and Content-Based Retrieval to Non-Textual Documents in the Microcosm Model, Research Journal, Department of Electronics and Computer Science, University of Southampton, UK, December 1994.

    [25] Beitner, N.D., Multimedia Support in Hypermedia, Mini-Thesis, Department of Electronics and Computer Science, University of Southampton, UK, July 1993.

    [26] Little, T., Ghafoor, A., Synchronisation and Storage Models for Multimedia Objects, IEEE Journal on Selected Areas of Communications, 8(3), 413-427, April 1990.

    [27] Steinmetz, R., Synchronisation Properties in Multimedia Sytems, IEEE Journal on Selected Areas in Communications, 8(3), 401-412, April 1990.

    [28] Hardman L, Bulterman, D.C.A., van Rossum, G., The Amsterdam Hypermedia Model: Extending Hypertext to Support Real Multimedia, Hypermedia, 5(1), 47-69, 1993.

    [29] Buchanan, M.C. and Zellweger, P.T., Specifying Temporal Behaviour in Hypermedia Documents, In Lucarella, D., Nanard, J., Paolini, P., eds. The Proceedings of the ACM Conference on Hypertext, ECHT `92 Milano, ACM, 181-190, November 1992.