Robust, scalable, and practical algorithms for recommender systems
 
The purpose of recommender systems is to filter information unseen by a user in order to predict whether the user would like a given item. Making effective recommendations from a domain consisting of millions of ratings is a major research challenge in the application of machine learning and data mining. A number of approaches have been proposed to solve the recommendation problem, but their main motivation is to increase the accuracy of the recommendations, while other design objectives such as scalability, sparsity and imbalanced dataset problems, cold-start problems, and long-tail problems are ignored. The aim of this thesis is to develop recommendation algorithms that satisfy these design objectives, making recommendation generation techniques applicable to a wider range of practical situations and real-world scenarios.
With this in mind, in the first half of the thesis, we propose novel hybrid recommendation algorithms that give accurate results and eliminate some of the known problems with recommender systems. More specifically, we propose a novel switching hybrid recommendation framework that combines Collaborative Filtering (CF) with a content-based filtering algorithm. Our experiments show that the performance of our algorithm is better than (or comparable to) that of the other hybrid recommendation approaches available in the literature. While reducing the dimensions of the dataset by Singular Value Decomposition (SVD) prior to applying CF, we discover that SVD-based CF fails to produce reliable recommendations for some datasets. After further investigation, we find that SVD-based recommendations depend on the imputation methods used to approximate the missing values in the user-item rating matrix. We propose various missing value imputation methods, which exhibit considerably better accuracy and performance than the traditional missing value imputation method, item average. Furthermore, we show how the gray-sheep users problem associated with a recommender system can effectively be solved using the K-means clustering algorithm. After analysing the effect of different centroid selection approaches and distance measures in the K-means clustering algorithm, we demonstrate how the gray-sheep users in a recommender system can be identified by treating their detection as an outlier problem. We demonstrate that the performance (accuracy and coverage) of CF-based algorithms suffers in the case of gray-sheep users, and we propose a hybrid recommendation algorithm to solve the gray-sheep users problem.
In the second half of the thesis, we propose a new class of kernel mapping recommender system methods, which we call KMR, for solving the recommendation problem. The proposed methods find the multi-linear mapping between two vector spaces based on the structure-learning technique. We propose user- and item-based versions of the KMR algorithms and offer various ways to combine them. We report the results of an extensive evaluation conducted on five different datasets under various recommendation conditions. Our empirical study shows that the proposed algorithms offer state-of-the-art performance and remain robust under all conditions. Furthermore, our algorithms are quite flexible, as they can easily incorporate more information (ratings, demographics, features, and contextual information) in the form of kernels; moreover, these kernels can be added or multiplied. We then adapt the KMR algorithm to incorporate new data incrementally. We offer a new heuristic, namely KMRincr, that can build the model without retraining the whole model from scratch when new data are added to the recommender system, providing significant computation savings. Our final contribution involves adapting the KMR algorithms to build the model on-line. More specifically, we propose a perceptron-type algorithm, namely KMRpercept, which is a novel, fast, on-line algorithm for building the model that maintains good accuracy and scales well with the data. We provide a temporal analysis of the KMRpercept algorithm. The empirical results reveal that the performance of KMRpercept is comparable to that of KMR and that, furthermore, it overcomes some of the conventional problems with recommender systems.
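As an illustration of the item-average baseline mentioned in the abstract, the following minimal Python sketch fills missing ratings with item averages and then applies a rank-k truncated SVD to the filled matrix. It is not taken from the thesis: the toy rating matrix, the function names `impute_item_average` and `svd_reconstruct`, and the use of NaN to mark missing ratings are all assumptions made for illustration only.

```python
import numpy as np


def impute_item_average(ratings):
    """Fill missing entries (NaN) with the mean rating of each item (column).

    Assumption: every item has at least one observed rating.
    """
    filled = ratings.copy()
    item_means = np.nanmean(ratings, axis=0)   # per-item mean, ignoring NaN
    rows, cols = np.where(np.isnan(filled))    # positions of missing ratings
    filled[rows, cols] = item_means[cols]
    return filled


def svd_reconstruct(filled, k):
    """Return the rank-k approximation of a fully filled rating matrix."""
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Hypothetical 3 users x 3 items matrix; NaN marks an unrated item.
ratings = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 1.0],
    [np.nan, 2.0, 2.0],
])

filled = impute_item_average(ratings)
predicted = svd_reconstruct(filled, k=2)
print(filled)
print(predicted)
```

Because the decomposition operates on the filled matrix, every imputed value influences the low-rank factors, which is one way to see why the choice of imputation method can matter for SVD-based CF.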
  
    
Ghazanfar, Mustansar Ali (2012) Robust, scalable, and practical algorithms for recommender systems. University of Southampton, Faculty of Physical and Applied Sciences, Doctoral Thesis, 240pp.
  
   
  
    
Record type: Thesis (Doctoral)
    
   
    
    
      
        
         
      
      
        
          
            
  
Text: MusiThesisFinal_Mirrored.pdf (Other)
   
  
  
 
          
            
          
            
           
            
           
        
        
       
    
   
  
  
More information
Published date: May 2012
Organisations: University of Southampton, Southampton Wireless Group
      
    
  
    
  
  
Identifiers
Local EPrints ID: 343761
URI: http://eprints.soton.ac.uk/id/eprint/343761
PURE UUID: ac91463e-a8f4-465c-a706-0e5914ad0a75
        
  
    
        
          
        
    
        
          
            
          
        
    
  
  Catalogue record
  Date deposited: 28 Jan 2013 16:23
  Last modified: 14 Mar 2024 12:07
  
  
 
 
  
    
    
Contributors
Author: Mustansar Ali Ghazanfar
Thesis advisor: Adam Prugel-Bennett
              
              
            
            
          
        
      
      
      
    
  
   
  