Multilinguality in knowledge graphs
 
  Content on the web is predominantly in English, which makes it inaccessible to people who speak only other languages. Knowledge graphs can store multilingual information, facilitate the creation of multilingual applications, and make these accessible to more language communities. In this thesis, we present studies to assess and improve the state of labels and languages in knowledge graphs and to apply multilingual information. We propose ways to use multilingual knowledge graphs to reduce gaps in coverage between languages.
We explore the current state of language distribution in knowledge graphs by developing a framework, based on existing standards, frameworks, and guidelines, to measure label and language distribution in knowledge graphs. We apply this framework to a dataset representing the web of data and to Wikidata. We find a lack of labelling on the web of data and a bias towards a small set of languages. Due to its multilingual editors, Wikidata has a better distribution of languages in labels.
We explore how this knowledge about labels and languages can be used in the domain of question answering. We show that our framework can be applied to the task of ranking and selecting knowledge graphs for a set of user questions.
One way of overcoming the lack of multilingual information in knowledge graphs is to transliterate and translate knowledge graph labels and aliases. We propose automatically classifying each label as a case for transliteration or for translation, so that a separate model can be trained for each task. Classification before generation improves results compared to using either a translation-based or a transliteration-based model on its own.
A use case of multilingual labels is the generation of article placeholders for Wikipedia in lower-resourced languages using neural text generation. On the basis of surveys and semi-structured interviews, we show that Wikipedia community members find the placeholder pages, and especially the generated summaries, helpful, and are highly likely to accept and reuse the generated text.
  
    University of Southampton
   
  
    
      Kaffee, Lucie-Aimée
      
        8975c12f-9033-47ed-a2eb-b674b707c2ac
      
     
  
  
   
  
  
    
      October 2021
    
    
  
  
    
    
      Carr, Leslie
      
        0572b10e-039d-46c6-bf05-57cce71d3936
      
     
  
       
    
 
  
    
      
  
 
  
  
  
  Kaffee, Lucie-Aimée (2021) Multilinguality in knowledge graphs. University of Southampton, Doctoral Thesis, 199pp.
  
   
  
    
      Record type: Thesis (Doctoral)
    
   
    
    
      
        
         
      
      
        
          
            
  
    Text: Thesis unsigned - Version of Record
    Text: PTD_Thesis_Kaffee-SIGNED (restricted to repository staff only)
    
  
  
 
          
            
           
            
           
        
        
       
    
   
  
  
  More information
  
    
      Published date: October 2021
        Identifiers
        Local EPrints ID: 456783
        URI: http://eprints.soton.ac.uk/id/eprint/456783
        PURE UUID: e8100e78-097c-4376-9e8c-6cd31ea91d5b
  Catalogue record
  Date deposited: 11 May 2022 16:42
  Last modified: 17 Mar 2024 02:32
      Contributors
          Author: Lucie-Aimée Kaffee
  
   
  