The ContentSpec Protocol: Providing Document Management Services for OHP

Jon Griffiths, Sigi Reich and Hugh Davis
e-mail: {jpg96r | sr | hcd}@ecs.soton.ac.uk
Multimedia Research Group
Department of Electronics and Computer Science
University of Southampton
Southampton, SO17 1BJ, UK

Abstract

Users often face difficulties in accessing documents referenced by open hypermedia systems due to the different protocols employed. This is particularly a problem for OHP as it will often need to access documents in many varied hyperbase systems’ stores or other third party information repositories when navigating links. This position paper proposes a document management protocol, called the ContentSpec Protocol (CSP), for use in combination with the OHP, and with open hypermedia systems in general. This protocol would enable standardised access to documents in third party data stores, creating a wider information space for open hypermedia users to explore.

1. Introduction: Calling for a Document Management Protocol

There is growing demand for the definition of a standard storage protocol for open hypermedia systems. The Flag Taxonomy [Østerbye and Wiil 1996] is a conceptual reference model which identifies four functional modules and four protocols in open hypermedia systems. One of the missing protocols is a standardised storage protocol, used to encapsulate the Flag’s storage manager from the viewer and data manager modules. [Wiil and Leggett 1996], when examining the seven major dimensions associated with a hypermedia system, indicate that a commonly agreed standard for data exchange between hypermedia systems and information stores is a prerequisite for a truly open hypermedia system. [Grønbæk and Wiil 1997] updated the DeVise Hypermedia Framework as presented in [Grønbæk et al. 1994] and identified the need for a hypermedia service <=> hypermedia database protocol which they called the Document Manager Protocol. [Wiil and Whitehead 1997] experimented with the Extended HyperDisco model: using a wrapper to integrate a third party repository (the Chimera server) to interoperate with HyperDisco. They found that they needed a standard protocol in open hypermedia systems for dealing with the integration of external information repositories, which they called the OHS Storage Protocol. [Goose et al. 1997] suggest a Document Management Service Protocol when discussing a reference architecture for an open hypermedia system, used to enable an open hypermedia system to interoperate with document management systems. [Nürnberg and Leggett 1998] have identified that open hypermedia systems require a standard to enable structure processors to make requests to storage engines, which they call a Store Protocol. The Open Hypermedia Protocol specification paper [Davis et al. 1997] also needs a standardised protocol to enable communication between the Open Hypermedia Protocol and document management systems, where such a protocol is called the Document Management Protocol.

It is the functionality of a standardised storage protocol for assisting the OHP in particular which forms the focus of this paper. We examine the functionality that such a document management protocol should possess, focusing on managing document fragments and on helping open hypermedia systems to facilitate collaborative authoring. We then consider a proposal for a standardised storage protocol, called the ContentSpec Protocol, to be used in conjunction with the OHP.

2. Document Management Protocol Functionality

The properties of a standardised storage protocol for managing documents should include the following [Alton 1995, ODMA 1997, Schulzrinne et al. 1998]:

For the OHP specifically, additional useful functionality for the standardised document management protocol includes assisting the OHP with referencing and retrieving fragments of documents (as opposed to entire documents, see section 3), and enabling the OHP to facilitate collaborative working between open hypermedia authors [OHSWG 1997] (see section 4).

3. Document Fragments

Document readers are often only interested in a segment of a document they are downloading, but typically they are forced to download the entire document, as is the case when retrieving pages over the World Wide Web. The danger is that if the segment of interest is part of a large document, readers can spend a long time retrieving over the Internet a document in which they are mostly not interested.

The standardised document management protocol can address the problem of downloading document segments. It can be used to reference segments of large multimedia documents by retrieving only the parts of the document between boundaries specified by the document reader. For example, it could retrieve a fragment of a text document between two user-identified boundaries, such as two tokens embedded within the document or two particular words. The protocol could also be used to retrieve specific paragraphs, sections, pages or chapters of a text document.
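As an illustration only, the boundary-based retrieval of text fragments described above can be sketched in a few lines of Python. The function names `extract_fragment` and `extract_paragraph`, and the convention that paragraphs are separated by blank lines, are assumptions made for this sketch; they are not part of the CSP.

```python
def extract_fragment(text: str, start_marker: str, end_marker: str) -> str:
    """Return the text strictly between the first start_marker and the
    next end_marker that follows it."""
    start = text.find(start_marker)
    if start == -1:
        raise ValueError(f"start marker not found: {start_marker!r}")
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        raise ValueError(f"end marker not found: {end_marker!r}")
    return text[start:end]


def extract_paragraph(text: str, index: int) -> str:
    """Return the paragraph at a zero-based index.

    Assumption: paragraphs are separated by blank lines.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return paragraphs[index]
```

A store that supported such requests would only need to send the returned substring to the reader, rather than the whole document.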

The protocol could also be used to retrieve a segment of an image file, for example enabling users to identify an area of the image to be retrieved, such as a rectangle in pixel dimensions, where the image is a large bitmap file which would otherwise take a long time to download. The standard protocol could be used to specify two time boundaries which identify the location of a sound or movie segment to be retrieved, such as retrieving a 10 minute sound/movie fragment beginning at 30 minutes and ending at 40 minutes within a two hour long file. The protocol could also be used for retrieving segments of streaming sound or movie documents in a similar way, in situations where users are only interested in viewing a portion of the streamed media. The standard protocol may also use frames for referencing segments of a video file, for example retrieving a given number of frames starting at a specified frame offset into the movie document.
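The two kinds of boundary mentioned here, a pixel rectangle and a pair of time points, can be made concrete with a small sketch. The names `TimeRange` and `crop` and the frame rate of 25 frames per second are assumptions chosen for illustration; the paper does not fix any of them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TimeRange:
    """A media segment identified by two time boundaries, in seconds."""
    start_seconds: float
    end_seconds: float

    def to_frames(self, frames_per_second: float) -> Tuple[int, int]:
        """Map the time boundaries to (first_frame, frame_count)."""
        if self.end_seconds <= self.start_seconds:
            raise ValueError("end must be later than start")
        first = int(self.start_seconds * frames_per_second)
        last = int(self.end_seconds * frames_per_second)
        return first, last - first


def crop(pixels: List[List[int]], left: int, top: int,
         width: int, height: int) -> List[List[int]]:
    """Return the rectangle of a row-major bitmap given in pixel dimensions."""
    return [row[left:left + width] for row in pixels[top:top + height]]


# The 10 minute fragment from 30 to 40 minutes, at an assumed 25 frames/s.
segment = TimeRange(start_seconds=30 * 60, end_seconds=40 * 60)
first_frame, frame_count = segment.to_frames(25)
```

Under the assumed frame rate, the 30 to 40 minute segment corresponds to 15000 frames starting at frame 45000, which shows how a time-based reference and a frame-based reference can describe the same fragment.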

4. Collaborative Authoring

Collaborative information sharing enables working groups dispersed over the Internet to jointly author documents [Dix 1996]. The standard protocol can offer functionality to allow collaborative authors access to documents stored in document management systems. Whilst there are potentially many different modes of collaboration that may be adopted by remote authors [Wiil 1991], there are three modes in particular which are of interest as regards the document management protocol functionality: synchronous, parallel asynchronous and working on individual fragments asynchronously. Synchronous collaboration is where two or more authors are working on the same document at the same time and can see each other’s edits simultaneously. Parallel asynchronous collaboration is where authors edit the same document working independently of one another. The third type of collaboration is where authors work separately on different segments of the same document.

As promulgated by [Haake and Wang 1998], open hypermedia systems provide potential for enabling collaborative authoring, and the Open Hypermedia Protocol combined with a document management protocol offers an opportunity to assist in this process. A document management protocol is not a solution in itself, but it can benefit the functioning of a collaborative open hypermedia environment. One form of support that it can offer is controlling access to documents during the collaborative authoring modes. The protocol could be used to allow a user to check out a document and give permission for later users to check out that same document, for either synchronous or asynchronous editing, while the original user still has it checked out. The protocol must be capable of returning responses to the querying user where the document is already locked or the user does not have access rights on the specified document. When checking in a synchronously or asynchronously edited document, the protocol can be used to store different versions of the same document in the information repository. For checking out fragments of a document, the protocol would be used not only to retrieve the fragment, but also to lock it so as to prevent other users from checking out the same fragment, whilst still permitting later users to check out other fragments of the same document for parallel asynchronous work. The protocol would also allow the separate document fragments to be checked back into the repository. It might provide appropriate commands so that the user could instruct the re-building of the document to incorporate the new document fragments.
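The fragment check-out behaviour described in this section can be sketched as a small in-memory model. Everything named here (`FragmentRepository`, `Status`, the method names and the flat set of authorised users) is hypothetical and serves only to make the locking, response and versioning rules concrete; it is not the CSP interface.

```python
from enum import Enum
from typing import Dict, Iterable, List, Optional, Tuple


class Status(Enum):
    OK = "ok"
    LOCKED = "locked"
    ACCESS_DENIED = "access-denied"


class FragmentRepository:
    """Toy repository holding one document as named, ordered fragments."""

    def __init__(self, fragments: Dict[str, str],
                 authorised_users: Iterable[str]) -> None:
        # Every fragment keeps its full version history, oldest first.
        self._versions: Dict[str, List[str]] = {
            name: [content] for name, content in fragments.items()
        }
        self._locks: Dict[str, str] = {}  # fragment name -> lock holder
        self._authorised = set(authorised_users)

    def check_out(self, user: str, name: str) -> Tuple[Status, Optional[str]]:
        """Lock a fragment for a user and return its latest version."""
        if name not in self._versions:
            raise KeyError(name)
        if user not in self._authorised:
            return Status.ACCESS_DENIED, None
        holder = self._locks.get(name)
        if holder is not None and holder != user:
            return Status.LOCKED, None
        self._locks[name] = user
        return Status.OK, self._versions[name][-1]

    def check_in(self, user: str, name: str, content: str) -> Status:
        """Store a new version and release the lock held by the user."""
        if self._locks.get(name) != user:
            return Status.ACCESS_DENIED
        self._versions[name].append(content)
        del self._locks[name]
        return Status.OK

    def rebuild(self) -> str:
        """Re-build the document from the latest version of each fragment."""
        return "\n\n".join(history[-1] for history in self._versions.values())
```

In this sketch a lock on one fragment does not affect the others, so a second author can check out a different fragment of the same document while the first is still held.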

5. Related Work

The ODMA [ODMA 1997] have developed an API to enable third party applications to initiate actions in a document management system, but it only provides a common interface for document management systems and does not operate within hypermedia environments.

The WebDAV initiative [Whitehead and Wiggins 1998] aims to support authoring and versioning over the Internet through extending HTTP, but only WWW browsers can be adapted to communicate using the standardised storage protocol.

The BSCW group use the World Wide Web as the medium for enabling collaboration between remote authors through extending a Web server to provide a set of basic facilities for collaborative information sharing [Bentley et al. 1995], but collaboration is restricted to the WWW only.

HTTP does not fulfil the criteria mentioned in section 2. However, it provides several extension mechanisms, such as PEP or those in HTTP-NG (see http://www.w3.org) and could be used as a transport mechanism for CSP.

6. Proposal for a Document Management Protocol for OHP

The Open Hypermedia Protocol requires a standardised storage protocol, as has been described by [Davis et al. 1997]. This will enable the CSF to query third party information repositories for retrieving documents when manipulating links stored in the OHP link server; this is necessary because the OHP is not in itself a protocol for the actual storage and retrieval of documents. The proposal is for a standardised storage protocol based on the contentspec concept derived from the Open Hypermedia Protocol; it is called the ContentSpec Protocol (abbreviated to CSP). It will perform those document management retrieval functions outlined in section 2. It will interact with typical information repositories, such as the World Wide Web, file servers, databases, hyperbases and document management systems. The protocol will provide the functionality described in this paper: managing different document types; opening, locking, retrieving, closing and saving documents in third party information repositories; manipulating document meta-data; and querying repositories for document information.

The ContentSpec Protocol’s primary aim will be to operate within the OHP environment, but it will also be designed to work within open hypermedia system environments in general and adapted to take full advantage of the features which those environments have to offer. For example, it will be used to manipulate segments of multimedia documents, rather than forcing users to retrieve documents in their entirety; and it will be used to assist the OHP to enable collaborative authoring, a task which the OHP was designed to facilitate within open hypermedia systems [OHSWG 1997]. The ContentSpec Protocol can also provide further functionality to assist the operation of the OHP. It can be used to provide commands for informing the OHP link service whenever documents are manipulated within an information repository (for example moving, deleting, renaming or merging) where the integrity of links in the OHP link service may be affected. The protocol can also be used to load into cache those documents related to the current document being viewed, in advance of the user actively going to retrieve them [Nürnberg et al. 1996]. The protocol can also be used to manage different document versions.
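One way to picture the notification of the link service is as a callback raised by the store whenever a link-affecting operation occurs. The sketch below is an assumption-laden illustration: the class `NotifyingStore`, its methods and the `(action, name, new_name)` event shape are invented for this example and are not defined by the CSP or the OHP.

```python
from typing import Callable, Dict, List, Optional

# A listener receives (action, document name, new name if any).
Listener = Callable[[str, str, Optional[str]], None]


class NotifyingStore:
    """Toy document store that reports link-affecting operations."""

    def __init__(self) -> None:
        self._documents: Dict[str, str] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Register a listener, standing in here for the link service."""
        self._listeners.append(listener)

    def _notify(self, action: str, name: str,
                new_name: Optional[str] = None) -> None:
        for listener in self._listeners:
            listener(action, name, new_name)

    def save(self, name: str, content: str) -> None:
        # Saving does not change where a link points, so no event is raised.
        self._documents[name] = content

    def rename(self, old: str, new: str) -> None:
        self._documents[new] = self._documents.pop(old)
        self._notify("rename", old, new)

    def delete(self, name: str) -> None:
        del self._documents[name]
        self._notify("delete", name)
```

A link service subscribed in this way could update or retire the affected link endpoints as soon as a document is renamed or deleted, instead of discovering the broken link at traversal time.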

The initial prototype of the ContentSpec Protocol is presently under development. The current implementation focuses on integrating the OHP CSF with document management systems via the ODMA API using CSP. The commands of the ContentSpec Protocol are expressed in the Extensible Mark-up Language (XML). Figure 1 provides an example of the reference architecture that CSP will be operating in. Each type of document store is served by the CSP, although a given store may not be able to exploit CSP’s full range of document manipulation functionality because of its own document management limitations; for example, WWW servers do not typically offer document locking. The CSP will rely on different underlying transport mechanisms for sending messages; for example, it might use HTTP for the Web, TCP/IP for document management systems and DDE for the local file store.
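The paper does not give the XML syntax of CSP commands, so the following is only a guess at what an XML-encoded request might look like. The element and attribute names (`csp`, `command`, `document`, `fragment`, `start`, `end`) are hypothetical; the transport table simply restates the examples given in the paragraph above. The sketch uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Transport examples taken from the text; the store labels are assumptions.
TRANSPORTS: Dict[str, str] = {
    "web": "HTTP",
    "document-management-system": "TCP/IP",
    "local-file-store": "DDE",
}


def build_retrieve_command(document_id: str,
                           fragment: Optional[Tuple[str, str]] = None) -> str:
    """Serialise a hypothetical 'retrieve' command as an XML string."""
    root = ET.Element("csp", {"command": "retrieve"})
    ET.SubElement(root, "document").text = document_id
    if fragment is not None:
        start, end = fragment
        ET.SubElement(root, "fragment", {"start": start, "end": end})
    return ET.tostring(root, encoding="unicode")


def parse_command(message: str) -> dict:
    """Parse a command produced by build_retrieve_command."""
    root = ET.fromstring(message)
    fragment = root.find("fragment")
    return {
        "command": root.get("command"),
        "document": root.findtext("document"),
        "fragment": None if fragment is None
        else (fragment.get("start"), fragment.get("end")),
    }
```

Because the command is plain text, the same message could in principle be carried over any of the transports listed, with only the delivery mechanism changing from one type of store to another.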

Figure 1: Reference architecture for the CSP.

7. Summary and Conclusion

The ContentSpec Protocol meets the demand for a document management standard necessitated by the OHP in its quest to provide interoperability between link services and third party applications. The paper has also shown that the open hypermedia community requires a storage standard between third party applications and information repositories, a role which the CSP could also fulfil. The CSP can also be used to help address further outstanding open hypermedia issues, for example managing document fragments and supporting open hypermedia collaborative authoring.

References

[Alton 1995] Ken Alton. How To Choose A Technical Document and Workflow Management System: A White Paper on a Significant New Technology. Autodesk. 1995.

[Bentley et al. 1995] Richard Bentley, Thilo Horstmann, Klaas Sikkel and Jonathan Trevor. Supporting Collaborative Information Sharing with the World Wide Web: The BSCW Shared Workspace System. Proceedings of the Fourth World Wide Web Conference, Boston, USA, 11-14 December, 1995.

[Davis et al. 1997] Hugh Davis, Sigi Reich and Antoine Rizk. Towards Interoperability in Open Hypermedia Linkservices. Open Hypermedia Systems Work Group 3.5, 1997.

[Dix 1996] Alan Dix. Challenges and Perspectives for Cooperative Work on the Web. Proceedings of the ERCIM workshop on CSCW and the Web, Sankt Augustin, Germany, 7-9 February 1996.

[Goose et al. 1997] Stuart Goose, Andy Lewis and Hugh Davis. OHRA: Towards an Open Hypermedia Reference Architecture and a Migration Path for Existing Systems. Proceedings of the 3rd Workshop on Open Hypermedia Systems, Southampton, 6-11 April 1997.

[Grønbæk et al. 1994] Kaj Grønbæk, Jens A. Hem, Ole L. Madsen and Lennert Sloth. Cooperative Hypermedia Systems: A Dexter-Based Architecture. Communications of the ACM, Volume 37, Number 2, pp.64-74, February 1994.

[Grønbæk and Wiil 1997] Kaj Grønbæk and Uffe Kock Wiil. Towards a Reference Architecture for Open Hypermedia. In Proceedings of the 3rd Workshop on Open Hypermedia Systems, Hypertext’97, Southampton, UK, April 1997.

[Haake and Wang 1998] Jörg M. Haake and Weigang Wang. Collaboration Support in Open Hypermedia Environments. In Proceedings of the 4th Workshop on Open Hypermedia Systems, Hypertext’98, Pittsburgh, USA, June 1998.

[Nürnberg et al. 1996] Peter J. Nürnberg, John J. Leggett, Erich R. Schneider and John L. Schnase. Hypermedia Operating Systems: A New Paradigm for Computing. Proceedings of the ACM Conference on Hypertext’96, Washington D.C., USA, ACM Press, March 1996.

[Nürnberg and Leggett 1998] Peter J. Nürnberg and John J. Leggett. A Vision for Open Hypermedia Systems. In Journal of Digital Information (JoDI), Volume 1, Issue 2, January 1998.

[ODMA 1997] Open Document API version 2.0, 19 September 1997.
Available as http://www.aiim.org/odma/odma20.htm [1998 August 01]

[OHSWG 1997] Open Hypermedia Systems Working Group. OHSWG Compendium. 5 November 1997.
Available as http://www.ohswg.org/ohswg.html [1999 January 06]

[Østerbye and Wiil 1996] Kasper Østerbye and Uffe K. Wiil. The Flag Taxonomy of Open Hypermedia Systems. Proceedings of the ACM Conference on Hypertext’96, Washington D.C., USA, pp.129-139, ACM Press, March 1996.

[Schulzrinne et al. 1998] H. Schulzrinne, A. Rao and R. Lanphier. Real Time Streaming Protocol (RTSP). RFC2326, April 1998.

[Whitehead and Wiggins 1998] E. James Whitehead Jr. and Meredith Wiggins. WEBDAV: IETF Standard for Collaborative Authoring on the Web. IEEE Internet Computing, September – October 1998.

[Wiil 1991] Uffe K. Wiil. Using Events as Support for Data Sharing in Collaborative Work. In Proceedings of the International Workshop on CSCW, Berlin, Germany, pp.162-176, April 1991.

[Wiil and Leggett 1996] Uffe K. Wiil and J. Leggett. The HyperDisco Approach to Open Hypermedia Systems. Proceedings of the ACM Conference on Hypertext’96, Washington D.C., USA, pp.140-148, ACM Press, March 1996.

[Wiil and Whitehead 1997] Uffe Kock Wiil and E. James Whitehead Jr. Interoperability and Open Hypermedia Systems. Proceedings of the 3rd Workshop on Open Hypermedia Systems, Southampton, 6-11 April 1997.