"A distributed system is one in which I cannot get something done because a machine I've never heard of is down" - Leslie Lamport
2.1 The Nature of Distribution of Resources
The general trend of down-sizing that has taken place over the past fifteen years has had a significant impact on the way electronic information is organised and manipulated. Services, processing and data have moved from the traditional central mainframe of the 1970s into intranetworks of organisation-wide workstations and personal computers. Information is arranged, accessed and processed at a level that is more local to the user.

2.2 Distributed Information Systems
The protocols initially developed during the mid-1970s attempted to provide access to sets of collective information across a network. For example, the File Transfer Protocol (FTP) (Postel et al., 1985) allows users to transfer files across the network from remote machines to their own. However, this can only be achieved once the Internet address of the FTP server which hosts the file has been determined and the position of the file in the server's file hierarchy located. For large sites, the latter can be a non-trivial task.
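The retrieval step described above can be sketched with a standard FTP client library. The host name and file path in the usage example are hypothetical; both must be known in advance, which is precisely the limitation noted.

```python
from ftplib import FTP

def fetch(host, path, dest):
    """Retrieve one file from an FTP server; both the server's
    Internet address and the file's position in its hierarchy
    must be known beforehand."""
    ftp = FTP(host)          # connect to the named server
    ftp.login()              # anonymous login
    with open(dest, "wb") as f:
        ftp.retrbinary("RETR " + path, f.write)
    ftp.quit()

# e.g. fetch("ftp.example.org", "pub/docs/report.txt", "report.txt")
```

Note that nothing in the protocol helps the user discover which server holds the file; that gap is what the systems below address.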
More recently, however, protocols and systems are being developed not only to provide network access to clusters of files, such as that illustrated above, but also to describe some form of relationship between those files. These systems can then accept queries which are performed upon this relationship structure to provide potentially useful information about the nature of the files contained within the system.
2.3 Distributed Resource Locators
Distributed resource locators, as their name implies, provide the user with tools to locate and subsequently access files that are shared across machines within an inter-network. Generally, they issue requests in a specific protocol to some form of distributed resource server, which attempts to fulfil or resolve those requests, either directly or by contacting other distributed resource servers.

2.3.1 Archie
As mentioned earlier, one of the problems associated with FTP is that before a file can be retrieved, a user must know the Internet address of the FTP server on which it resides. When the Internet was in its earliest stages and relatively small, this was a manageable requirement. However, as the Internet has grown, the number of FTP servers has increased dramatically, making it impractical for users to track where particular files reside.
The Archie system meets this need by maintaining a searchable index of the file listings of anonymous FTP servers, allowing users to discover which server holds a given file. It provides a much-needed service that is lacking from the FTP protocol, but cannot be considered a truly distributed information system: at periodic intervals, each Archie server must re-contact each individual FTP server and re-index its file list to ensure that the index is up to date.
2.3.2 Wide Area Information Server
Wide Area Information Server (WAIS) (Kahle et al., 1991) is a searching tool that allows users to specify search criteria and apply them to a set of selected resources. It is then the task of the WAIS server to pass the query on to each WAIS server supporting a particular resource. These servers perform a full-text search against the query and return a list of matches, ranked in order of relevance.
Although the majority of resources available through WAIS servers are textual in nature, since text is far easier to index, graphical images are also being made accessible through the use of simple keyword association.
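A minimal sketch of this style of ranked full-text matching (an illustration only, not WAIS's actual scoring scheme): each document is scored by how many occurrences of the query terms it contains, and matches are returned in order of relevance.

```python
def rank(query, documents):
    """Score each document by occurrences of the query terms and
    return the names of matching documents, best match first."""
    terms = query.lower().split()
    scores = []
    for name, text in documents.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score:                      # keep matches only
            scores.append((score, name))
    return [name for score, name in sorted(scores, reverse=True)]

docs = {
    "a": "network file transfer protocol",
    "b": "full text search across the network",
    "c": "graphics and images",
}
print(rank("network search", docs))    # -> ['b', 'a']
```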
2.3.3 Gopher
The Gopher system (Anklesaria et al., 1993) is based around the concept of a distributed filing system that forms a hierarchical tree in which individual Gopher servers can incorporate information; intermediate nodes are equivalent to directories and leaf nodes are documents that may be rendered by Gopher clients.
At first glance, Gopher may appear similar in nature to FTP. However, unlike FTP, the Gopher protocol allows Gopher servers to be transparently integrated to provide a seamless view of a distributed document space, the sum of which is known as Gopherspace. This form of integration allows a Gopher server to place a pointer to another Gopher server at any point within its hierarchy, thus forming a primitive logical hierarchy. Additionally, Gopher search servers can be used to provide a virtual node that is the result of a search performed over some part of Gopherspace.
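The seamless integration described above can be illustrated with a toy hierarchy (the node and file names are invented): a "link" node in one server's tree points at the root of another server's tree, and a traversal descends through it transparently, so the user sees a single hierarchy.

```python
class Node:
    def __init__(self, name, kind, children=None, target=None):
        self.name = name
        self.kind = kind            # "dir", "doc" or "link"
        self.children = children or []
        self.target = target        # remote root, for "link" nodes

def walk(node, depth=0):
    """Depth-first listing; a 'link' node transparently descends
    into another server's hierarchy."""
    yield "  " * depth + node.name
    if node.kind == "link" and node.target:
        yield from walk(node.target, depth + 1)
    for child in node.children:
        yield from walk(child, depth + 1)

remote = Node("physics", "dir", [Node("paper.txt", "doc")])
local = Node("root", "dir", [
    Node("readme.txt", "doc"),
    Node("other-server", "link", target=remote),
])
for line in walk(local):
    print(line)
```

The listing interleaves both servers' contents, which is the essence of Gopherspace.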
2.4 Distributed Hypermedia
Essentially, hypermedia is the concept of managing information in a structured and associative manner. Early hypermedia pioneers, such as Vannevar Bush (Bush, 1945), were among the first to consider using a machine (what was to become the latter day computer) as the medium through which to store, link and access this information. Interestingly enough, Bush predicted the information explosion that would occur with electronic information as far back as 1945!
The first hypermedia systems to become available (for example, Guide (Owl, 1987) and Hypercard (Apple, 1987)) were separate applications in their own right and required that all data be specially formatted and committed to the central management of the system. This approach presented two distinct disadvantages; the problem of converting large volumes of information and the problem of reusing that information outside of the hypermedia system.
These, and other limitations of early hypermedia systems, have led recent research into `open' hypermedia, that is, a hypermedia system which is able to use data in its native form, without having to embed additional information which will affect its use outside of the hypermedia system's environment. Open hypermedia also advocates integration with other applications on the desktop and an open linking model, where multiple and different views can be placed upon a collection of data simply by altering the linking structure. A discourse of open hypermedia principles, motivations and potential advantages is given by Davis (Davis et al., 1992), Grønbæk (Grønbæk et al., 1992) and Pearl (Pearl, 1989).
As hypermedia systems increase in complexity and attempt to integrate with more existing systems, they have come under the same pressures as early centralised computing systems: to become distributed by expanding hypermedia data and functionality into and across networks. Distributed hypermedia systems can offer users greater flexibility in the data that they can access and share, alternative views on distributed information resources through the implementation of their own linking structures and increased heterogeneity across platforms and network architectures. The general advantages of distributed hypermedia are outlined by Goose and Dale (Goose et al., 1996).
The following sections give a brief survey of a cross-section of distributed open hypermedia systems that are in current use.
2.4.1 World Wide Web
Perhaps the most used and well-known distributed hypermedia system in existence today is the World Wide Web, also known as the Web, WWW and W³ (Berners-Lee et al., 1992). The Web was originally developed at the international organisation CERN in Switzerland for its high-energy physics community, but its applicability in a wider context soon became apparent.
The Web is essentially a client/server model in which Web servers offer and perform services on behalf of Web clients. The basic element of information that servers deal with is a document stored in the Hypertext Mark-up Language (HTML) (Berners-Lee et al., 1995). HTML is an application of the Standard Generalized Markup Language (SGML) that is oriented toward describing links and general layout within a document. Web clients access servers and place requests through a communication protocol called the Hypertext Transfer Protocol (HTTP) (Berners-Lee, 1995b). It is the task of the Web client to parse HTML documents, to render their contents and to present any links to the user. The HTML format also allows non-textual information, such as bitmaps, to be included within a document.
Documents are specified on Web servers through the use of a Uniform Resource Locator (URL). A URL consists of three components: the communication protocol to use, the Internet address of the Web server and a path representing the location of the document within the file hierarchy of the Web server. URLs are flexible enough to allow a number of protocols to be used, for example, FTP, Gopher, Network News (NNTP), etc.
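The three components can be seen by parsing an illustrative URL (the host and path here are hypothetical) with a standard library parser:

```python
from urllib.parse import urlparse

url = "http://www.example.org/pub/thesis/chapter2.html"
parts = urlparse(url)
print(parts.scheme)   # -> http                      (communication protocol)
print(parts.netloc)   # -> www.example.org           (server address)
print(parts.path)     # -> /pub/thesis/chapter2.html (location in hierarchy)
```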
The most influential development for the Web has been the introduction of HotJava and the Java programming language (Gosling et al., 1995). Applets are executable code that can be embedded within HTML documents, transferred across the network and executed on a variety of platforms, in an attempt to move processing from the server onto the client. They achieve heterogeneity by being written in Java, an object-oriented programming language that is compiled to platform-independent byte-code and then interpreted by a Java-aware Web client. The full potential of applet code has yet to be realised, since current examples are quite primitive. However, it represents a significant step forward for Web technology and will present some interesting developments in the future.
2.4.2 Hyper-G
Hyper-G (Kappe et al., 1993) is a distributed hypermedia project under development at the Technical University of Graz in Austria; its original aim was to provide a general-purpose university information system supporting a wide range of activities.
In a similar manner to the Web, Hyper-G functionality is provided by a set of Hyper-G servers. However, unlike the Web, Hyper-G provides a unified view on distributed resources by allowing aggregations of documents, called collections, to span multiple Hyper-G servers. Moreover, Hyper-G permits a more flexible linking model because links are stored in external link databases rather than embedded within documents. Although Hyper-G supports bi-directional links, they are still button-oriented in nature, which makes global and repetitive linking difficult.
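A minimal sketch of the external-link-database idea (illustrative only, not Hyper-G's actual data model; the document names are invented): links are held as source/destination pairs outside the documents, so the documents themselves remain unmodified and every link can be followed in either direction.

```python
# Links live in an external store, not inside the documents.
links = [
    ("intro.txt", "survey.txt"),
    ("survey.txt", "refs.txt"),
]

def links_from(doc):
    """Follow links forwards from a document."""
    return [dst for src, dst in links if src == doc]

def links_to(doc):
    """Follow links backwards: possible because the store, not the
    document, records the relationship."""
    return [src for src, dst in links if dst == doc]

print(links_from("survey.txt"))   # -> ['refs.txt']
print(links_to("survey.txt"))     # -> ['intro.txt']
```

Relinking the same documents is then just a matter of swapping in a different link store.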
The ability to integrate with other distributed information systems makes Hyper-G a powerful hypermedia tool. Indeed, it has been suggested that all existing Web servers could be replaced with Hyper-G servers without major problems (Flohr, 1995). Hyper-G's main weakness appears to be its lack of extensibility due to a non-modular architecture, which can make it difficult to customise to user requirements.
2.4.3 Microcosm: The Next Generation
Microcosm: The Next Generation (MCMTNG) (Goose et al., 1995) is a distributed open hypermedia system that is based around the philosophy and framework of the Microcosm open hypermedia system. Both systems have been developed at the Multimedia Research Laboratory within the University of Southampton.
Unlike both the Web and Hyper-G, the functionality of the MCMTNG system is represented by a set of asynchronous, communicating processes. This modular, process-driven architecture has the advantage that processes can be added to the system to increase or augment its functionality, or to customise it to a particular user's requirements. What distinguishes the MCMTNG system is that it exists at the user level, rather than at the site or domain level; multiple MCMTNG systems can execute within a domain for a number of users simultaneously.
When MCMTNG systems inside or outside of a domain wish to communicate, they initially make contact through a domain server. The domain server provides information about the MCMTNG systems that are executing within the domain and the information resources that are available. Once contact has been established, subsequent communication occurs through the two message routers of the respective MCMTNG systems.
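The initial-contact step can be caricatured as a registry lookup. This is purely illustrative, not the actual MCMTNG protocol; the user names and router addresses are invented.

```python
class DomainServer:
    """Toy domain server: maps each user's MCMTNG system to the
    address of that system's message router."""

    def __init__(self):
        self.systems = {}

    def register(self, user, router_address):
        self.systems[user] = router_address

    def lookup(self, user):
        """Initial contact: resolve a user's system to its router,
        after which communication proceeds router-to-router."""
        return self.systems.get(user)

ds = DomainServer()
ds.register("alice", "host-a:5000")
print(ds.lookup("alice"))   # -> host-a:5000
```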
2.5 Distributed Information Management
Distributed information management is a term that has traditionally been used to describe the integration and management of distributed database systems. More recently, however, it has been used to describe the integration and management of distributed information systems and resources across networks and protocols (De Roure, 1996).
The following sections describe four issues which are key to achieving successful distributed information management.
2.5.1 Resource Discovery
The purpose of resource discovery is to search through distributed information systems and to present new sources of relevant information to the user. Since resource discovery mechanisms are employed precisely because there are too many distributed information systems for the user to search manually, the searching algorithm must be accurate, ensuring that relevant data is not overlooked and that irrelevant data is discarded before it reaches the user.

2.5.2 Information Integrity
As information becomes distributed across wide areas, information integrity becomes a real need and a real problem. Because of the packet loss and latency associated with networks, it is difficult to ensure that consistency updates are made in a timely fashion. Additionally, in collaborative working environments, versioning and update control need to be implemented to ensure that edits are not lost.

2.5.3 Navigation Assistance
Navigation assistance is the process of helping the user to navigate some form of information resource. This resource could be the information contained within a distributed information system, or the information generated by a number of resource discovery algorithms. Either way, a navigation assistance algorithm can be employed to protect the user from information overload.
"A parallel would be the human reference librarian who does not comprehend the material in articles being sought, but does understand the conventions of card catalogues, abstract collections, citation indexes and bibliographical references. Because these relations can be made explicitly in hypertext they can be utilised without, for instance, having any deep comprehension of the meaning of any article title."
Wilkins (Wilkins, 1994) further describes navigation assistance as an algorithm that can fulfil a number of such requirements.
2.5.4 System Integration

Integration, however, involves more than protocol conversion, since there is a semantic problem to be overcome. For example, how are links translated between the Web and MCMTNG? If there is more information represented in an MCMTNG link, how is this stored within a Web link? Furthermore, is it possible to apply links across distributed information systems? If so, where are these links stored and who resolves them?
This illustrates that there are two fundamental approaches to system integration: arming distributed information management tools with the necessary information to be able to converse with multiple distributed information systems and equipping them with the necessary information to make semantic protocol conversions.
Currently, each distributed information system possesses its own set of tools to assist the user in navigating its information resource. However, these tools are only of value within that particular distributed information system and generally only perform the function of resource discovery. Therefore, there is a real need to develop tools that cross distributed information system boundaries to provide the user with a view on the entire collection of information resources that are available.
Additionally, distributed information management shows that users require more than simplistic resource discovery mechanisms. They require additional discovery aids, navigation aids and maintenance aids, each of which should integrate with a wide range of distributed information systems and should perform protocol conversion (both syntactic and semantic) in a sensible fashion.
This thesis advocates that a potential solution to this problem is to employ agent technology to perform these tasks on behalf of the user. These agents would be given goals by the user and would work autonomously and intelligently to achieve those goals. The next chapter examines the state of the art in agent technology and attempts to classify the differing views on what an agent actually represents.