Robust, scalable, and practical algorithms for recommender systems

Ghazanfar, Mustansar Ali (2012) Robust, scalable, and practical algorithms for recommender systems. University of Southampton, Faculty of Physical and Applied Sciences, Doctoral Thesis, 240pp.

Record type: Thesis (Doctoral)

Abstract

The purpose of recommender systems is to filter information unseen by a user to predict whether a user would like a given item. Making effective recommendations from a domain consisting of millions of ratings is a major research challenge in the application of machine learning and data mining. A number of approaches have been proposed to solve the recommendation problem, where the main motivation is to increase the accuracy of the recommendations while ignoring other design objectives such as scalability, sparsity and imbalanced dataset problems, cold-start problems, and long tail problems. The aim of this thesis is to develop recommendation algorithms that satisfy the aforementioned design objectives, making the recommendation generation techniques applicable to a wider range of practical situations and real-world scenarios.

With this in mind, in the first half of the thesis, we propose novel hybrid recommendation algorithms that give accurate results and eliminate some of the known problems with recommender systems. More specifically, we propose a novel switching hybrid recommendation framework that combines Collaborative Filtering (CF) with a content-based filtering algorithm. Our experiments show that the performance of our algorithm is better than (or comparable to) the other hybrid recommendation approaches available in the literature. While reducing the dimensions of the dataset by Singular Value Decomposition (SVD) prior to applying CF, we discover that SVD-based CF fails to produce reliable recommendations for some datasets. After further investigation, we find that the SVD-based recommendations depend on the imputation method used to approximate the missing values in the user-item rating matrix. We propose various missing value imputation methods, which exhibit substantially better accuracy and performance than the traditional missing value imputation method, item average. Furthermore, we show how the gray-sheep users problem associated with a recommender system can effectively be solved using the K-means clustering algorithm. After analysing the effect of different centroid selection approaches and distance measures in the K-means clustering algorithm, we demonstrate how the gray-sheep users in a recommender system can be identified by treating their detection as an outlier problem. We demonstrate that the performance (accuracy and coverage) of the CF-based algorithms suffers in the case of gray-sheep users. We propose a hybrid recommendation algorithm to solve the gray-sheep users problem.
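As an illustration of why imputation matters for SVD-based CF, the following is a minimal sketch of the baseline described above: missing ratings are filled with the item average, and the filled matrix is then reduced to a rank-k approximation by truncated SVD. It is not the thesis's implementation; the function names, the toy rating matrix, and the choice of k = 2 are assumptions made for this example.

```python
import numpy as np

def impute_item_average(R):
    """Fill missing ratings (NaN) with the mean of each item's observed ratings."""
    R = np.asarray(R, dtype=float)
    observed = ~np.isnan(R)
    counts = observed.sum(axis=0)                     # ratings per item (column)
    sums = np.where(observed, R, 0.0).sum(axis=0)     # rating totals per item
    global_mean = sums.sum() / counts.sum()           # fallback for unrated items
    item_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.where(observed, R, item_means)          # item_means broadcasts over rows

def svd_low_rank(R_filled, k):
    """Rank-k approximation of the filled user-item matrix via truncated SVD."""
    U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Toy user-item matrix (rows: users, columns: items); NaN marks a missing rating.
ratings = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])

filled = impute_item_average(ratings)
predicted = svd_low_rank(filled, k=2)   # k = 2 is an arbitrary choice for the sketch
```

Because the SVD is computed on the filled matrix, every imputed value influences the resulting factors, so swapping `impute_item_average` for a different imputation function changes the predictions.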

In the second half of the thesis, we propose a new class of kernel mapping recommender system methods, which we call KMR, for solving the recommendation problem. The proposed methods find the multi-linear mapping between two vector spaces based on the structure-learning technique. We propose user- and item-based versions of the KMR algorithms and offer various ways to combine them. We report the results of an extensive evaluation conducted on five different datasets under various recommendation conditions. Our empirical study shows that the proposed algorithms offer state-of-the-art performance and remain robust under all conditions. Furthermore, our algorithms are quite flexible, as they can easily incorporate more information (ratings, demographics, features, and contextual information) in the form of kernels, and these kernels can be added or multiplied. We then adapt the KMR algorithm to incorporate new data incrementally. We offer a new heuristic, KMRincr, that can update the model without retraining it from scratch when new data are added to the recommender system, providing significant computational savings. Our final contribution involves adapting the KMR algorithms to build the model on-line. More specifically, we propose a perceptron-type algorithm, KMR percept, which is a novel, fast, on-line algorithm for building the model that maintains good accuracy and scales well with the data. We provide a temporal analysis of the KMR percept algorithm. The empirical results reveal that the performance of KMR percept is comparable to that of KMR and, furthermore, that it overcomes some of the conventional problems with recommender systems.
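The remark that kernels can be added or multiplied rests on a general property of kernel methods: the sum and the element-wise product of two positive semi-definite kernel matrices are again positive semi-definite. The sketch below checks this numerically. It does not reproduce the KMR algorithms; the feature matrices are random placeholders, and the kernel choices (linear for rating-derived features, Gaussian for demographic features) are assumptions made for this example.

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of the linear kernel k(x, y) = <x, y>."""
    return X @ X.T

def rbf_kernel(X, gamma=0.5):
    """Gram matrix of the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))   # clip tiny negative round-off

def min_eigenvalue(K):
    """Smallest eigenvalue of a symmetric matrix (>= 0 up to round-off if PSD)."""
    return np.linalg.eigvalsh(K).min()

# Hypothetical per-user feature matrices for six users (random placeholders).
rng = np.random.default_rng(0)
rating_features = rng.normal(size=(6, 4))
demographic_features = rng.normal(size=(6, 3))

K_ratings = linear_kernel(rating_features)
K_demo = rbf_kernel(demographic_features)

K_sum = K_ratings + K_demo     # additive combination of the two sources
K_prod = K_ratings * K_demo    # multiplicative (element-wise) combination

print(min_eigenvalue(K_sum), min_eigenvalue(K_prod))
```

Both combined matrices can be passed to any kernel-based learner in place of a single kernel, which is what makes it straightforward to fold in additional information sources.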
