How to recommend music to film buffs: enabling the provision of recommendations from multiple domains
University of Southampton, School of Electronics and Computer Science,
In broad terms, Recommender Systems use machine learning techniques to process historical data about their users' interests, encoded in user profiles. Once the algorithms used have been trained on user profiles, their output is used to compile, for each profile, a ranked list of all resources available for recommendation.
Collaborative Filtering is the most widespread method of carrying this out, building on the intuition that similar people will be interested in the same things. The weakness of this approach is that similarity can only be assessed between users who have expressed their preferences on a common set of resources. This requirement prohibits the sharing of preference data across different systems, and causes additional problems when new resources for recommendation become available, or when new users subscribe to the system.
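The co-rating requirement can be illustrated with a minimal sketch. It assumes profiles are dictionaries mapping resource identifiers to numeric ratings and uses cosine similarity over co-rated resources; the function name, the choice of similarity measure, and the example profiles are hypothetical and are not taken from the dissertation.

```python
from math import sqrt


def user_similarity(ratings_a, ratings_b):
    """Cosine similarity over the resources both users have rated.

    Returns None when the users share no rated resources, which is the
    case where this kind of user-to-user comparison cannot be made.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return None
    dot = sum(ratings_a[r] * ratings_b[r] for r in common)
    norm_a = sqrt(sum(ratings_a[r] ** 2 for r in common))
    norm_b = sqrt(sum(ratings_b[r] ** 2 for r in common))
    if norm_a == 0 or norm_b == 0:
        return None
    return dot / (norm_a * norm_b)


# Hypothetical profiles: two film raters and one music rater.
film_fan_1 = {"film:A": 5, "film:B": 3}
film_fan_2 = {"film:A": 4, "film:B": 2}
music_fan = {"album:X": 5, "album:Y": 1}

same_domain = user_similarity(film_fan_1, film_fan_2)
cross_domain = user_similarity(film_fan_1, music_fan)
```

Here `cross_domain` is `None` because the film rater and the music rater have no resource in common, whereas `same_domain` is a number. A new user or a newly added resource produces the same empty overlap.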
I propose that the difficulty can be overcome by identifying and exploiting semantic relationships between the resources available for recommendation themselves. Moreover, systems that are able to assess the strength of the relationship between any two resources can provide recommendations from multiple domains. For example, music recommendations can be made based on a person's film taste if strong semantic relationships can be identified between certain films and the music that person listens to.
The contributions made by this dissertation can be summarised as follows:
1. Facilitating the comparison of heterogeneous resources
The use of Wikipedia is proposed for this purpose, under the assumption that hyper-links between articles in Wikipedia convey latent semantic relationships between the concepts they describe. Thus, a methodology for projecting domain resources onto Wikipedia has been developed. The assumption is then validated by showing evidence that the projections are successful in retaining similarity between domain resources, in three independent domains.
2. Enabling the provision of recommendations from multiple domains
The aforementioned projections encode the links present in Wikipedia articles that are found to correspond to domain resources, and can be viewed collectively as a graph. In addition, the Internet is populated with social networks of people who express their preferences on a given set of resources in the form of ratings. Members of such communities are included as nodes in the graph, and their ratings of domain resources are represented as edges. A reversible Markov chain model was implemented to describe the probabilities associated with the traversal of edges in the integrated graph. Nodes that represent resources and other concepts the user is known to be interested in are then identified in the graph. Using these nodes as a starting point, the resource nodes most likely to be reached after an arbitrarily large number of edge traversals are considered the most relevant to the user and are recommended. Experimental results show that the framework is successful in predicting user preferences in domains different to those of the input.
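The graph traversal described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the model implemented in the dissertation. It treats the integrated graph as undirected and weighted, so that the plain random walk on it is a reversible Markov chain, and it adds a restart probability so that the long-run distribution depends on the starting nodes; the abstract does not say how the dissertation handles this. All node labels, edge weights, function names, and parameter values are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected weighted adjacency map from (node_a, node_b, weight) triples."""
    graph = defaultdict(dict)
    for a, b, w in edges:
        graph[a][b] = w
        graph[b][a] = w
    return graph


def walk_with_restart(graph, start_nodes, restart=0.15, iterations=100):
    """Approximate the long-run visiting probabilities of a random walk that
    follows edges in proportion to their weight and, with probability
    `restart` (an assumed parameter), jumps back to one of the start nodes."""
    nodes = list(graph)
    seed = {n: (1.0 / len(start_nodes) if n in start_nodes else 0.0) for n in nodes}
    prob = dict(seed)
    for _ in range(iterations):
        nxt = {n: restart * seed[n] for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            for m, w in graph[n].items():
                nxt[m] += (1.0 - restart) * prob[n] * w / total
        prob = nxt
    return prob


def recommend(graph, start_nodes, candidates, top_k=3):
    """Rank candidate resource nodes by their long-run visiting probability."""
    prob = walk_with_restart(graph, start_nodes)
    ranked = sorted(candidates, key=lambda n: prob.get(n, 0.0), reverse=True)
    return ranked[:top_k]


# Hypothetical toy graph: a rating edge plus Wikipedia-style link edges.
EDGES = [
    ("user:u1", "film:A", 1.0),            # rating edge
    ("film:A", "concept:composer", 1.0),   # link from the film's article
    ("album:X", "concept:composer", 1.0),  # link from the album's article
    ("album:Y", "concept:other", 1.0),
    ("film:B", "concept:other", 1.0),
    ("concept:composer", "concept:other", 0.2),
]
graph = build_graph(EDGES)
visiting = walk_with_restart(graph, {"film:A"})
top_album = recommend(graph, {"film:A"}, ["album:X", "album:Y"], top_k=1)
```

In this toy graph the walk starts at a film node and the candidates are albums, so the ranking crosses domains: `album:X` is reached through a concept it shares directly with the film and is ranked above `album:Y`, which is only reachable through a weaker, longer path.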