Topic modelling applied on innovation studies of Flemish companies
Mapping innovation in companies for the purpose of official statistics is usually done through business surveys. However, this traditional approach has several drawbacks, including non-response, response bias, low frequency, and high costs. As an alternative, text-based models trained on web-scraped text from company websites have been developed to complement or replace traditional business surveys. This paper uses web scraping and text-based models to map business innovation in Flanders, with a focus on identifying different types of innovation through topic modelling. More specifically, the scraped web texts are used to identify innovative economic sectors or topics, and to classify firms into these topics using Top2Vec and Lbl2Vec. We conclude that the two models can be successfully combined to discover topics (or sectors) and classify companies into them, which yields an additional parameter for mapping innovation in different regions.
Innovation, Lbl2Vec, Text analysis, Text classification, Top2Vec, Topic modeling, Web scraping
1-12
Crijns, Annelien
Vanhullebusch, Victor
Reusens, Manon
Reusens, Michael
Baesens, Bart
2023
Crijns, Annelien, Vanhullebusch, Victor, Reusens, Manon, Reusens, Michael and Baesens, Bart (2023) Topic modelling applied on innovation studies of Flemish companies. Journal of Business Analytics, 6 (4). (doi:10.1080/2573234X.2023.2186274).
Text
Topic_modeling_applied_on_innovation_studies_of_Flemish_companies
- Accepted Manuscript
More information
Accepted/In Press date: 24 February 2023
e-pub ahead of print date: 3 March 2023
Published date: 2023
Additional Information:
Publisher Copyright:
© 2023 The Operational Research Society.
Identifiers
Local EPrints ID: 477464
URI: http://eprints.soton.ac.uk/id/eprint/477464
ISSN: 2573-234X
PURE UUID: dd85f613-f56f-4c34-baf4-67c38190a8d1
Catalogue record
Date deposited: 06 Jun 2023 17:09
Last modified: 17 Mar 2024 07:43
Contributors
Author:
Annelien Crijns
Author:
Victor Vanhullebusch
Author:
Manon Reusens
Author:
Michael Reusens