The University of Southampton
University of Southampton Institutional Repository

Topic modelling applied on innovation studies of Flemish companies


Crijns, Annelien, Vanhullebusch, Victor, Reusens, Manon, Reusens, Michael and Baesens, Bart (2023) Topic modelling applied on innovation studies of Flemish companies. Journal of Business Analytics, 6 (4), 1-12. (doi:10.1080/2573234X.2023.2186274).

Record type: Article

Abstract

Mapping innovation in companies for the purpose of official statistics is usually done through business surveys. However, this traditional approach has several drawbacks, such as low response rates, response bias, low frequency, and high costs. As an alternative, text-based models trained on web-scraped text from company websites have been developed to complement or substitute traditional business surveys. This paper uses web scraping and text-based models to map business innovation in Flanders, with a focus on identifying different types of innovation through topic modelling. More specifically, the scraped web texts are used to identify innovative economic sectors or topics, and to classify firms into these topics using Top2Vec and Lbl2Vec. We conclude that both models can be successfully combined to discover topics (or sectors) and to classify companies into these topics, which yields an additional parameter for mapping innovation in different regions.

Text
Topic_modeling_applied_on_innovation_studies_of_Flemish_companies - Accepted Manuscript

More information

Accepted/In Press date: 24 February 2023
e-pub ahead of print date: 3 March 2023
Published date: 2023
Additional Information: Publisher Copyright: © 2023 The Operational Research Society.
Keywords: Innovation, Lbl2vec, Text Analysis, Text classification, Top2vec, Topic modeling, Web scraping

Identifiers

Local EPrints ID: 477464
URI: http://eprints.soton.ac.uk/id/eprint/477464
ISSN: 2573-234X
PURE UUID: dd85f613-f56f-4c34-baf4-67c38190a8d1
ORCID for Bart Baesens: orcid.org/0000-0002-5831-5668

Catalogue record

Date deposited: 06 Jun 2023 17:09
Last modified: 17 Mar 2024 07:43

Contributors

Author: Annelien Crijns
Author: Victor Vanhullebusch
Author: Manon Reusens
Author: Michael Reusens
Author: Bart Baesens
