CSTR 94-03
(c) University of Southampton
Department of Electronics and Computer Science
University of Southampton
Southampton SO17 1BJ
Abstract
Many hypermedia systems support a wide variety of media such as text, video and pictures, but audio has been somewhat neglected. The central reason that audio has not attracted as much attention as other media is its obvious lack of visual identity. The visual nature of window-based applications, especially window-based hypermedia viewers, meant that the main focus of this work was to identify a meaningful representation of audio within a hypermedia context.
This paper introduces the sound viewer and describes the associated concepts. The issues raised during development are also discussed in some depth. The viewer facilitates the creation and traversal of links to and from sound media. The audio media formats supported are WAV[1], CD[2] Audio and MIDI[3]. The resultant viewer provides a unified and extensible framework for interacting with audio media from within an open hypermedia environment.
Microcosm is the product of research into open hypermedia systems conducted by a group within the department. Microcosm can be augmented to support additional media with minimal overhead. This is largely attributable to the open design of the system, which consequently makes it an ideal vehicle for experimenting with the sound viewer design.
auditory stimuli have the ability to make a longer-lasting impression upon an individual than visual stimuli.
an individual generally has better recall of information received aurally than of information they have read.
Charismatic orators are often remembered for their mesmerising delivery of famous speeches. The three main issues a good public speaker must be concerned with are tone of voice, content and body language. Of the three, a warm, interesting and varied tone of voice far outweighs the other two in importance, which illustrates how positively we respond to exhilarating auditory stimuli. Audio support within any hypermedia system is therefore an essential component for conveying both music and the spoken word.
After observing commercial software and consulting possible users (both computer-aware and musically aware), we gathered some contrasting opinions regarding this central issue.
One advantage of having a waveform displayed graphically is that it provides an accurate and visible manifestation of the sound sequence. The waveform can then be used as a basis for editing, making selections and performing other operations.
The consensus of opinion among those consulted was that a waveform was of limited benefit when listening to a sound sequence. A naive user cannot easily relate a waveform to sound. The most information they can glean from this representation is whether there is any sound at all and whether it is relatively loud or quiet.
The task of deducing the manuscript representation from the audio media and subsequently displaying it is not a trivial process, and such technology is currently only available for the MIDI format.
Another representation used by the music community is the piano roll, in which a single octave of a piano keyboard is displayed vertically on the left-hand side of the screen while markers showing which keys are depressed at regular time intervals scroll from left to right across the display.
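To make the piano-roll idea concrete, the sketch below renders note events as a text grid with time running left to right. It is illustrative only: the note tuple format (MIDI note number, start and end in milliseconds), the text rendering and the folding of every note into one octave by pitch class are assumptions, and the sound viewer itself does not implement this representation.

```python
# Hypothetical text rendering of a single-octave piano roll.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def piano_roll(notes, step_ms, length_ms):
    """Render (midi_note, start_ms, end_ms) events as rows of text.

    One row per key of a single octave, highest key at the top; one column
    per time step; '#' marks a key held down at the start of that step.
    Notes are folded into one octave by pitch class (an assumption).
    """
    columns = -(-length_ms // step_ms)  # ceiling division
    rows = []
    for pitch_class in reversed(range(12)):
        cells = []
        for col in range(columns):
            t = col * step_ms
            held = any(note % 12 == pitch_class and start <= t < end
                       for note, start, end in notes)
            cells.append("#" if held else ".")
        rows.append(f"{NOTE_NAMES[pitch_class]:<2} " + "".join(cells))
    return "\n".join(rows)


# Middle C held for one second, overlapping an E that starts half a second later.
print(piano_roll([(60, 0, 1000), (64, 500, 1500)], step_ms=500, length_ms=1500))
```

Scrolling the display, as described above, would amount to redrawing a moving window of these columns as playback advances.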
So if waveforms are of little help to the uninitiated user, what other representations might be useful?
Currently the majority of sound editing by studio engineers is performed manually without any visual aids. All an engineer requires is fine control over the audio device for positioning the start and stop locations, and a playback facility for subsequent refinement of these positions.
When investigating the feasibility of displaying a waveform, it transpired that the required information could only be retrieved from the WAV file format. The manuscript and piano roll representations would only be applicable to music; they would be meaningless if the sound recording was of a conversation. Having to distinguish between conversation and music was clearly undesirable, as the appropriate representation would then vary. Detecting whether the sound is music or conversation is non-trivial and outside the focus of this work.
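For WAV data, the information needed to draw a waveform is simply the stored sample values. The sketch below, using Python's standard `wave` module, reduces a file to one peak amplitude per display column, which is roughly what a waveform display plots. It assumes uncompressed 8-bit or 16-bit PCM and is not the viewer's code; the viewer deliberately does not draw waveforms.

```python
import array
import sys
import wave


def peak_envelope(source, columns):
    """Return one peak amplitude per display column, normalised to 0.0..1.0.

    Assumes uncompressed PCM with 8-bit (unsigned) or 16-bit (signed)
    samples; multi-channel frames are treated as one interleaved stream.
    """
    with wave.open(source, "rb") as w:
        width = w.getsampwidth()
        channels = w.getnchannels()
        raw = w.readframes(w.getnframes())
    if width == 1:
        samples = [b - 128 for b in raw]  # 8-bit WAV samples are centred on 128
        full_scale = 128
    elif width == 2:
        samples = array.array("h", raw)
        if sys.byteorder == "big":
            samples.byteswap()            # WAV sample data is little-endian
        full_scale = 32768
    else:
        raise ValueError("only 8- and 16-bit PCM is handled in this sketch")
    frames = len(samples) // channels
    peaks = []
    for col in range(columns):
        start = col * frames // columns * channels
        end = (col + 1) * frames // columns * channels
        chunk = samples[start:end]
        peaks.append(max((abs(s) for s in chunk), default=0) / full_scale)
    return peaks
```

Drawing a vertical bar proportional to each peak gives the familiar waveform picture; CD audio and MIDI offer no comparable sample data to the program, which is consistent with the limitation noted above.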
A list of requirements of a typical hypermedia author/user was made. The essential points are listed below:
to have fine control over the audio device
to be able to identify the current position and duration of the sound sequence
to be able to select a portion of the sound sequence in order to author an anchor, and to play back and subsequently refine that selection
to be able to identify any links in the sound sequence
to be able to follow any links in the sound sequence
When reviewing these requirements it became clear that a specific conventional representation of the sound was less important than first thought. The majority of hypermedia users are unlikely to be musically trained, so the need for any specialist representation is limited, although a complete system would address this requirement.
During the specification stage, initial reactions to a viewer that might lack a conventional graphical representation of the sound were unfavourable. However, the more the idea was considered in relation to hypermedia, the more plausible and practical it seemed. The user interface is described in section 3.
A significant amount of commercial software is currently available that facilitates manipulation of audio media from within a GUI[4] environment. A variety of these tools and their respective user interfaces were observed in order to identify the most effective and successful paradigms adopted for work within an audio context.
Video and audio are examples of temporal media, and systems such as Intermedia and The Elastic Charles [Brondmo90] were among the first to handle temporal media. The developers of Intermedia also introduced the concept of active anchors [Palaniappan90], whereby traversing a link to an active anchor results in the presentation of dynamic information associated with that anchor.
The most advanced version of Microcosm runs on the PC under Microsoft Windows 3.1, but parallel developments for UNIX and Macintosh platforms are underway.
Microcosm supports a wide variety of media such as text, video, audio and bitmaps. The system can be augmented to support additional media with minimal overhead, largely because of its open design. When another media type is to be supported, either a new viewer must be implemented or an existing application's functionality augmented.
There are two reasons for augmenting a host application with the hypermedia services rather than developing a new Microcosm viewer for that media type. The first is that the original application is already available and well suited to viewing and manipulating the data. The second is that the user is already comfortable with the operation of the host application: many users would prefer to master these hypermedia services from within a familiar application rather than learn to use a totally new one.
Microcosm Aware This type of document viewer is able to communicate with Microcosm at an intimate level. Messages are able to pass freely both to and from the rest of the system and so the viewer is fully integrated.
Microcosm Unaware These are viewers that are normally external applications that have not been modified in any way but are used to display information in the Microcosm environment. They are launched from within the system but then exist outside the Microcosm framework. Because of the nature of generic links, this type of viewer can still make use of them, provided selection-based information can be passed between the viewer and the Microcosm system. This can be achieved by exploiting features of the GUI that allow applications to share data (for example, the clipboard), and thus even a Microcosm-oblivious application can have very limited hypermedia functionality.
Partially Microcosm Aware In between these two extremes there are applications which are initially Microcosm unaware, but can be modified to become at least partially Microcosm aware. These are applications which have some form of programmability, and provide methods to allow communication with external applications by whatever means available, for example using the DDE interface that the Microcosm system provides.
Through these three methods, any existing application can be integrated into the hypermedia system to some degree, allowing Microcosm to provide the linking service between them. Information can be stored in whatever format is most convenient, and displayed either by the application normally used to view that format or by a newly written viewer. Either way, the system is not limited in the types of information it can display and link between; it is open.
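How a Microcosm-unaware application can still benefit from generic links can be sketched as a lookup: the selection it places on the clipboard is compared with the source text of every generic link. The `GenericLink` record, the list-based linkbase, the destination names and the whitespace and case normalisation are all hypothetical; they are not Microcosm's actual linkbase format or matching rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericLink:
    """Hypothetical generic link: fires on its source text wherever it occurs."""
    source_text: str
    destination: str
    description: str


def normalise(text):
    # Collapse runs of whitespace and ignore case (an assumed matching rule).
    return " ".join(text.split()).lower()


def follow_generic(selection, linkbase):
    """Return the generic links whose source text matches the selection,
    e.g. text that a Microcosm-unaware application copied to the clipboard."""
    key = normalise(selection)
    return [link for link in linkbase if normalise(link.source_text) == key]


linkbase = [
    GenericLink("sonata form", "sonata.txt", "Definition of sonata form"),
    GenericLink("recapitulation", "recap.wav", "Example recapitulation"),
]
print(follow_generic("  Sonata\nform ", linkbase))
```

Because a generic link depends only on the selected text, not on where it occurs, the unaware application needs no knowledge of anchors at all.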
Previously, the WAV file format was the only audio media format supported by Microcosm. The average user was not prepared to use the Microsoft Windows Sound Recorder to create a WAV file, a process that involves a degree of experimentation to determine the optimum recording levels and keep distortion to a minimum. Reasonable quality recordings of conversations could be achieved, but recording music proved less successful. An attempt to record an operatic excerpt from CD audio resulted, on playback, in the higher registers of the female voice being intolerably distorted.
The maximum length of a WAV recording is determined by how much computer memory is available. WAV files also consume an enormous amount of disk space, even for fairly short sound excerpts: for example, a WAV file 31.5 seconds in duration occupies 347.7 Kbytes. Incorporating several AVI[5] video and WAV sound files into a hypermedia application can therefore consume very large quantities of storage space.
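The storage figure follows from the arithmetic of uncompressed PCM: bytes per second equals sample rate × channels × bytes per sample. The recording format of the quoted example is not stated; assuming, purely for illustration, 11.025 kHz, 8-bit mono gives about 347 kB for 31.5 s (plus a header of a few dozen bytes), close to the quoted figure. The same duration at CD quality is sixteen times larger.

```python
def pcm_bytes(sample_rate_hz, channels, bits_per_sample, seconds):
    """Size in bytes of uncompressed PCM audio data (WAV header excluded)."""
    return sample_rate_hz * channels * (bits_per_sample // 8) * seconds


# Assumed low-quality recording format: 11.025 kHz, 8-bit, mono.
print(pcm_bytes(11025, 1, 8, 31.5))   # 347287.5 bytes, about 347 kB
# CD quality: 44.1 kHz, 16-bit, stereo.
print(pcm_bytes(44100, 2, 16, 31.5))  # 5556600.0 bytes, about 5.6 MB
```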
When a user followed a link to an audio document, Microcosm would launch the Microsoft Windows Sound Recorder application with the WAV file to be played. From that moment onwards Microcosm had no influence over the application. An author might have preferred to store a single copy of a WAV file and link to shorter extracts from that one conversation or piece of music. This was not possible: once launched, the entire WAV file had to be played to completion. The only workaround was to create several WAV files, which is wholly undesirable because the recording process is time consuming and some of the audio material may be replicated across these files.
But perhaps the most serious limitation of this situation is that links could not be created in or followed from audio. This is a fundamental service that any hypermedia system would be expected to provide. Every other viewer within Microcosm supports creating and following links, so it was time to extend this basic requirement to encompass sound.
Consequently there was a pressing requirement for a purpose-built Microcosm sound viewer to address these shortcomings. The new viewer described in this paper provides an interface for allowing the author/user to control the sound, and also facilitates creation and traversal of links to and from audio media. The audio media formats that this viewer supports are WAV, CD Audio and MIDI.
Fortunately, a well-understood user model for interaction with audio devices already exists. This interface exploits the conventional control panel used by cassette decks and compact disc machines, as most people are familiar with their operation. Those familiar features have been augmented to provide hypermedia functionality. The resultant user interface can be seen in Fig 2. Several screen shots of the sound viewer in use within Microcosm can be found in the Appendix.
The individual features of the interface are now described and, where necessary, new terminology is introduced to describe its operation.
Two concepts central to the operation of the viewer are the local and global views. A local view is an authored subset of a sound sequence, e.g. the introduction to a song; a global view is the song in its entirety. For example, an author developing a hypermedia application about a classical symphony would want to select small excerpts of the music to be played when required. If the symphony were stored on CD audio, the author could make a link to a certain excerpt to be played on request (a local view). At any time, however, the user may access the global view and listen to the symphony in its entirety.
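One way to model the two views is to treat both as a start/end pair over the same source, with the global view simply spanning the whole sequence. The sketch assumes millisecond positions; the `SoundView` name and its fields are hypothetical, since the report does not describe the viewer's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundView:
    """A view onto a sound sequence; all times are in milliseconds.

    A global view spans the whole sequence; a local view is an authored
    extract of it. Names and fields are hypothetical.
    """
    source: str      # e.g. a WAV file name or a CD identifier
    start_ms: int
    end_ms: int

    def __post_init__(self):
        if not 0 <= self.start_ms < self.end_ms:
            raise ValueError("a view needs 0 <= start_ms < end_ms")

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms

    def to_global(self, local_ms):
        """Map a position within this view to a position in the whole
        sequence, clamped to the view's bounds."""
        return self.start_ms + max(0, min(local_ms, self.duration_ms))


def global_view(source, length_ms):
    return SoundView(source, 0, length_ms)


symphony = global_view("symphony-cd", 40 * 60 * 1000)
exposition = SoundView("symphony-cd", 12_000, 45_000)   # a local view
print(exposition.duration_ms, exposition.to_global(1_000))  # 33000 13000
```

Because a local view is only a pair of positions, any number of extracts can share one stored recording, which is exactly what the earlier Sound Recorder arrangement could not offer.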
Fig. 2: The user interface of the sound viewer working with Microcosm.
The length of the overview window represents the length of the sound sequence to be heard. The black rectangle within the overview window moves from left to right as the sound sequence plays, providing a visual cue to the current position. Any links present within the sound sequence are represented as horizontal lines drawn in the overview window at their relative positions in the sound sequence. The length of each line represents the period of time over which the link is valid.
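The overview window is a proportional mapping from time to horizontal position. A minimal sketch, assuming millisecond times and integer pixel coordinates (the function names are hypothetical):

```python
def overview_x(t_ms, length_ms, width_px):
    """Map a time in the sequence to an x pixel in the overview window.
    The whole sequence fills the window; the result is clamped on screen."""
    return min(width_px - 1, t_ms * width_px // length_ms)


def link_line(start_ms, end_ms, length_ms, width_px):
    """Return (x0, x1) for the horizontal line marking a link. Its length is
    proportional to the period over which the link is valid."""
    return (overview_x(start_ms, length_ms, width_px),
            overview_x(end_ms, length_ms, width_px))


# A one-minute sequence in a 300-pixel overview window.
print(overview_x(30_000, 60_000, 300))        # 150
print(link_line(6_000, 12_000, 60_000, 300))  # (30, 60)
```

The same mapping positions the black rectangle: its left edge is the current position passed through `overview_x`.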
The black rectangle in the overview window marks the portion of the sequence that is shown, exploded, in the detail window above. As the black rectangle moves along the overview window, a more detailed view of any links scrolls horizontally within the detail window, both in synchronisation with the sound. Links are represented as horizontal lines within the overview window but as mustard-coloured rectangles within the detail window. Displayed within each link rectangle is a textual annotation giving the destination document's media type followed by a brief description of the link relationship (entered by the author when the link was created).
To follow a link, the user positions the mouse pointer over the relevant link rectangle in the detail window and double-clicks. This can be done both while a sound sequence is playing and while it is stopped.
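Following a link by double-clicking requires the reverse mapping: from a pointer position in the detail window back to a time and a row of link rectangles. The sketch below assumes the detail window shows the span covered by the black rectangle, that link rectangles sit in fixed-height rows, and uses hypothetical names (`LinkRect`, `LANE_HEIGHT_PX`) throughout.

```python
from dataclasses import dataclass

LANE_HEIGHT_PX = 20  # assumed height of one row of link rectangles


@dataclass(frozen=True)
class LinkRect:
    """A link as drawn in the detail window (names are hypothetical)."""
    start_ms: int
    end_ms: int
    lane: int        # row in which the rectangle is drawn
    label: str


def zoom_factor(view_span_ms, length_ms):
    """Magnification of the detail window relative to the overview window.
    A wider black rectangle (a longer span) gives a lower zoom."""
    return length_ms / view_span_ms


def x_to_time(x, view_start_ms, view_span_ms, width_px):
    """Convert an x pixel in the detail window back into a sequence time."""
    return view_start_ms + x * view_span_ms // width_px


def link_at(x, y, links, view_start_ms, view_span_ms, width_px):
    """Return the link rectangle under the pointer, or None; the viewer
    would follow this link on a double click."""
    t = x_to_time(x, view_start_ms, view_span_ms, width_px)
    lane = y // LANE_HEIGHT_PX
    for link in links:
        if link.lane == lane and link.start_ms <= t < link.end_ms:
            return link
    return None


links = [LinkRect(10_000, 20_000, 0, "TEXT: Programme notes"),
         LinkRect(15_000, 25_000, 1, "VIDEO: Rehearsal")]
# The detail window shows 10 s to 30 s across 400 pixels.
print(link_at(100, 25, links, 10_000, 20_000, 400).label)  # VIDEO: Rehearsal
```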
Once the user is satisfied with the portion of sound that they have selected, the Start Link or End Link option from the Action menu can be selected to create a link anchor. The Microcosm link creation mechanisms are well documented [Heath92]. The link creation process using the sound viewer can be followed in Fig 6 and Fig 7 in the Appendix.
The Action menu is common to all Microcosm viewers; in the sound viewer it has only two items, Start Link and End Link. These items become active once a legal selection has been made.
The Controls menu has only one item, Audio Console. Selecting it displays a window providing an extensive set of audio controls for navigating the global view. This extended control panel can be seen in Fig 5 in the Appendix.
Fig. 3: A diagram depicting the major components of the program.
The front end of the program uses a suite of audio-device-independent functions to control the audio media. This layer of abstraction allows the user interface to be conveniently separated from any audio device specifics. At a lower level, each supported audio format has a suite of functions that are called from the device-independent layer. Each of these media-specific functions relies heavily upon the Microsoft Windows MCI[6].
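The layering can be sketched as an abstract device interface with one backend per format. In this illustration each backend only builds MCI command strings (on Windows they would be passed to `mciSendString`); the class names are hypothetical, and the command strings follow MCI command-string syntax as we understand it (`waveaudio` and `sequencer` devices timed in milliseconds, `cdaudio` in track:minutes:seconds:frames), so they should be checked against the MCI reference before use. File paths containing spaces would need quoting.

```python
from abc import ABC, abstractmethod


class AudioDevice(ABC):
    """Device-independent interface used by the user-interface front end.

    Backends only build MCI command strings here; class names are
    hypothetical.
    """

    def __init__(self, alias):
        self.alias = alias

    @abstractmethod
    def open_commands(self):
        """Commands that open the device and set its time format."""

    @abstractmethod
    def position(self, ms):
        """Express a position in milliseconds in the device's time format."""

    def play_command(self, start_ms, end_ms):
        # Playing only a local view: MCI accepts explicit from/to positions.
        return (f"play {self.alias} from {self.position(start_ms)}"
                f" to {self.position(end_ms)}")

    def stop_command(self):
        return f"stop {self.alias}"


class FileDevice(AudioDevice):
    """WAV ("waveaudio") or MIDI ("sequencer") file, positioned in milliseconds."""

    def __init__(self, path, mci_type, alias):
        super().__init__(alias)
        self.path, self.mci_type = path, mci_type

    def open_commands(self):
        return [f"open {self.path} type {self.mci_type} alias {self.alias}",
                f"set {self.alias} time format milliseconds"]

    def position(self, ms):
        return str(ms)


class CDAudioDevice(AudioDevice):
    """A track on an audio CD, positioned as track:minutes:seconds:frames."""

    def __init__(self, track, alias="cd"):
        super().__init__(alias)
        self.track = track

    def open_commands(self):
        return [f"open cdaudio alias {self.alias}",
                f"set {self.alias} time format tmsf"]

    def position(self, ms):
        minutes, rest = divmod(ms, 60_000)
        seconds, rest = divmod(rest, 1_000)
        frames = rest * 75 // 1_000  # audio CDs have 75 frames per second
        return f"{self.track}:{minutes:02d}:{seconds:02d}:{frames:02d}"


for device in (FileDevice("intro.wav", "waveaudio", "wav"),
               FileDevice("theme.mid", "sequencer", "midi"),
               CDAudioDevice(track=3)):
    print(device.open_commands()[0], "|", device.play_command(10_000, 80_000))
```

The front end calls only `play_command` and `stop_command`, so adding a further format means adding a backend without touching the user interface.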
The brief text descriptions that annotate the buttons are divided into two pieces. The first word is the destination document type, e.g. TEXT or VIDEO. A colon separates this information from the link description that then follows. Each media type supported within Microcosm has an associated icon that is displayed when the user selects a document to view. It would be better to draw this icon onto the button rather than output the text description as is done at present.
A VU meter has been developed that would provide a visual association between the scrolling of the graphics windows and the playing of the audio media. Unfortunately, we have not yet found a way to retrieve the current output levels from the audio hardware. Once this has been achieved, it will be trivial to incorporate this feature into the sound viewer.
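The report does not say how output levels would eventually be obtained. One possible software alternative, for formats whose samples the program can read (such as WAV), is to compute the level of each block of samples as it is played. The sketch below computes an RMS level in dB relative to full scale and maps it onto a segmented meter; the 16-bit full scale, the -60 dB floor and the ten segments are arbitrary assumptions.

```python
import math


def level_dbfs(samples, full_scale=32768):
    """RMS level of a block of signed 16-bit samples, in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / full_scale) if rms > 0 else float("-inf")


def lit_segments(dbfs, segments=10, floor_db=-60.0):
    """Number of meter segments to light: floor_db or below lights none,
    0 dBFS lights them all."""
    if dbfs <= floor_db:
        return 0
    fraction = (dbfs - floor_db) / -floor_db
    return max(0, min(segments, round(fraction * segments)))


block = [16384, -16384] * 256   # a square wave at half full scale
level = level_dbfs(block)       # about -6.0 dBFS
print(round(level, 1), lit_segments(level))
```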
Although the algorithm that arranges the links makes the best use of the available screen space, that space is a finite resource. The viewer currently displays at most six links that are valid over the same time period. A solution consistent with text applications would be to supply a vertical scroll bar allowing the user to view any links outside the current display area.
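The report does not describe the algorithm used to arrange the links. A standard approach that fits the description is greedy interval partitioning: take links in order of start time and place each in the lowest row that is free when it starts, which never uses more rows than the largest number of links valid at the same instant. The sketch below does this and, mirroring the current six-link limit, leaves a link unplaced (`None`) when every row is occupied; with a vertical scroll bar the limit could simply be raised.

```python
def assign_lanes(links, max_lanes=6):
    """Assign each (start_ms, end_ms) link to a display row.

    Greedy interval partitioning: links are taken in order of start time and
    placed in the lowest row that is free when they start. Returns one row
    number per link, in input order, or None for a link that would need more
    than max_lanes rows.
    """
    order = sorted(range(len(links)), key=lambda i: links[i][0])
    row_free_at = []          # end time of the last link placed in each row
    rows = [None] * len(links)
    for i in order:
        start, end = links[i]
        for row, free_at in enumerate(row_free_at):
            if free_at <= start:          # row is empty by the time this link starts
                row_free_at[row] = end
                rows[i] = row
                break
        else:                             # no existing row is free
            if len(row_free_at) < max_lanes:
                row_free_at.append(end)
                rows[i] = len(row_free_at) - 1
    return rows


print(assign_lanes([(0, 10_000), (5_000, 15_000), (12_000, 20_000), (0, 30_000)]))
# [0, 2, 0, 1]
```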
The sound viewer could be enhanced to provide additional hypermedia functionality of significant use to musicians. Support could be provided for generating and displaying manuscript and piano roll representations of music but, perhaps more interestingly, also for computed links into digital audio media. One could search for transposed recurrences of melodies or chords, intervallic successions and other interesting musical patterns. This could prove a fruitful area for future work.
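Searching for transposed recurrences becomes straightforward once pitches are available (as in MIDI): a melody transposed by any amount keeps the same sequence of intervals, so matching intervals rather than pitches finds every transposition. The sketch below is one simple, exact-match version of this idea under that assumption; it ignores rhythm, and digitised audio would first need pitch extraction, which is a much harder problem.

```python
def intervals(pitches):
    """Successive differences in semitones; unchanged by transposition."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def find_transposed(melody, pattern):
    """Indices in melody where pattern starts, at any transposition.

    Both arguments are lists of MIDI note numbers. This is exact matching
    on the interval sequence, so rhythm and ornamentation are ignored.
    """
    if len(pattern) < 2:
        raise ValueError("a pattern needs at least two notes")
    m, p = intervals(melody), intervals(pattern)
    return [i for i in range(len(m) - len(p) + 1) if m[i:i + len(p)] == p]


c_major_scale = [60, 62, 64, 65, 67, 69, 71, 72]
# G-A-B (two rising whole tones) recurs starting on C, F and G.
print(find_transposed(c_major_scale, [67, 69, 71]))  # [0, 3, 4]
```

Each match could become the source anchor of a computed link, in the same way that a text search yields anchors for generic links.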
Musical texturing could also be supported once Microcosm is able to provide a degree of synchronisation. Texturing is the gradual construction of a piece of music, building up the sound by introducing each instrument or track layer by layer. This approach allows the user to appreciate more easily how the rhythmic, melodic and harmonic content of the piece is integrated.
A temporal anchor is a link anchor defined with respect to time. Temporal anchors are only applicable to temporal media such as audio and video. A temporal link is activated when the current position in the media is equal to the start position of the link.
This feature enables additional information to be displayed automatically, in synchrony with the sound sequence. The following scenario exemplifies its use. Imagine a piece of classical music being played while the accompanying pages of the score are displayed automatically at specified points, allowing a musician to follow its progress. The score would need to have been digitised beforehand; the temporal anchors are simply the mechanism for triggering the display of each page.
The Microcosm Linker has now been extended to provide an automatic button type allowing users to author such links in temporal media.
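In practice a playback position is only sampled at intervals (for example on a timer), so a test for exact equality with an anchor's start position could easily miss it. A more robust check, sketched below under that assumption, fires every anchor whose start lies after the previous sampled position and at or before the current one. The anchor tuple format and the page names are hypothetical.

```python
def fired_anchors(anchors, previous_ms, current_ms):
    """Return the (start_ms, destination) anchors reached since the last update.

    An anchor fires when previous_ms < start_ms <= current_ms. If the user
    seeks backwards (current_ms < previous_ms) nothing fires.
    """
    return [a for a in anchors if previous_ms < a[0] <= current_ms]


# Hypothetical score pages shown as a piece plays.
anchors = [(0, "score-p1.bmp"), (30_000, "score-p2.bmp"), (60_000, "score-p3.bmp")]
position = -1                                # before playback starts
for now in (250, 29_900, 30_150, 30_400):    # positions sampled by a timer
    for start, page in fired_anchors(anchors, position, now):
        print(f"{now} ms: display {page}")
    position = now
```

Here the second page is displayed at the 30150 ms sample even though no sample falls exactly on 30000 ms.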
The first problem was that WAV was the only audio format supported, with the associated difficulty of transferring sound recordings to this format; this process could be time consuming and often resulted in low-quality recordings. The second problem was that audio files consume large quantities of storage space.
One obvious solution to the above problems was to provide CD audio support, which has two major implications. The first is that storage consumption is no longer a serious concern, as only the start and stop positions on the CD need to be stored. The second is that, with the prevalence of CD audio in recent years, an immense catalogue of CD audio material is available, making it vital that authors are able to incorporate this medium into their applications.
The remaining problems discussed earlier related to the absence of hypermedia support for audio. This document has detailed the way in which the sound viewer provides controls for manipulating audio media and also the hypermedia support for creating and traversing links.
The sound viewer is to be incorporated within the next major release, Microcosm version 3.
Thanks also to Wendy Hall, Hugh Davis and Adrian Pickering for their focus and guidance over the duration of the project.
A very big thank you to Nick Cook and Dan Leech-Wilkinson from the Department of Music for providing me with the superb hardware on which to develop this project, and also for their comments and suggestions when this project was in its infancy.
Fig. 4: In this screen shot it can be seen that the black rectangle in the overview window has been resized to make it larger. The larger the black rectangle within the overview window, the lower the zoom factor in the detail window, and vice versa.
Fig. 5: This screen shot shows the extended control panel in a window beneath the main application window.
Fig. 6: A link being authored between sound and text can be seen in this screen shot.
Within the sound viewer at the top left of the figure, a selected portion of sound is highlighted in black. A highlighted selection can also be seen within the text viewer in the bottom left corner of the figure. Start Link has been selected from the Action menu of the sound viewer, and End Link from the Action menu of the text viewer. The Start Link and End Link windows display the respective selections. Once the Complete button has been clicked, the Linker window is displayed; the user then enters a link description and clicks OK to create the link.
Fig. 7: This screen shot shows the state of the sound viewer after the link has been authored. The sound viewer has been dynamically updated to incorporate the new link, both within the overview and detail windows. The user can clear the highlighted selection by performing a single mouse click within the detail window.
[Champine90] Champine, Geer, Ruh, Project Athena as a Distributed Computer System, Computer, pp40, September 1990.
[Eysenck86] Michael Eysenck, Mark Keane, Cognitive Psychology: A Student's Handbook, 1986.
[Fountain90] Andrew M. Fountain, Wendy Hall, Ian Heath and Hugh C. Davis, MICROCOSM: An Open Model for Hypermedia With Dynamic Linking, in A. Rizk, N. Streitz and J. Andre (eds), Hypertext: Concepts, Systems and Applications. The Proceedings of The European Conference on Hypertext, INRIA, France, November 1990, Cambridge University Press, 1990.
[Goose93] Stuart Goose, Sound Viewer for Microcosm, 3rd Year Project Report, Department of Electronics and Computer Science, University of Southampton, 1993.
[Heath92] Ian Heath, An Open Model for Hypermedia: Abstracting Links from Documents, PhD Thesis, Department of Electronics and Computer Science, University of Southampton, 1992.
[MPC] Multimedia PC Marketing Council, Standard Multimedia PC Configuration, Washington DC, 1992.
[Palaniappan90] Murugappan Palaniappan, Nicole Yankelovich, Mark Sawtelle, Linking Active Anchors: A Stage In The Evolution Of Hypermedia, Hypermedia, Volume 2 Number 1 pp47, January 1990.
[Yankelovich88] Yankelovich, Haan, Meyrowitz, Drucker, Intermedia: The Concept and the Construction of a Seamless Information Environment, Computer, pp81, January 1988.