|
Persistent Identifier
|
doi:10.21950/FWEML6 |
|
Publication Date
|
2025-03-27 |
|
Title
| Automatic financial term extractor |
|
Author
| Moreno-Sandoval, Antonio (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0002-9029-2216)
Porta, Jordi (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0001-5620-4916)
Carbajo-Coronado, Blanca (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0001-7693-0042) |
|
Point of Contact
|
Moreno-Sandoval, Antonio (Universidad Autónoma de Madrid. Laboratorio de Lingüística Informática) |
|
Description
| This dataset was created within the Spanish national project CLARA-FINT. The aim of this task within the project was to build an automatic financial term extractor for Spanish. First, linguistic annotation was applied to a set of texts: annual reports from the main Spanish listed companies in the IBEX 35 index. These annotations were then used to fine-tune a model for the financial term extraction task. This dataset contains the fine-tuned model, i.e., the automatic extractor, which is described in PORTA-ZAMORANO, J., CARBAJO-CORONADO, B., MORENO-SANDOVAL, A. (2024).
The extractor is a multilingual BERT (bert-multilingual) model fine-tuned for financial term extraction. The training data consisted of texts annotated by linguists, in which financial terms were highlighted in context. |
|
Subject
| Arts and Humanities; Computer and Information Science |
|
Keyword
| automatic term extractor
financial terms
linguistic annotation
fine-tuning
machine learning
computational linguistics |
|
Related Publication
| Is Supplement To: PORTA-ZAMORANO, J., CARBAJO-CORONADO, B., MORENO-SANDOVAL, A. (2024). Extraction and Structuring of Financial Terminology. Procesamiento del Lenguaje Natural, 73, 139-149. DOI 10.26342/2024-73-10 handle http://hdl.handle.net/10486/715714
CARBAJO CORONADO, B., A. MORENO SANDOVAL (2024) 'Financial concepts extraction and lexical simplification in Spanish.' RAEL: Revista Electrónica de Lingüística Aplicada, 22.1, 164-180. DOI: 10.58859/rael.v23i1.590 handle http://hdl.handle.net/10486/710979
MORENO SANDOVAL, A., A. GISBERT, H. MONTORO (2020) 'Fint-esp: a corpus of financial reports in Spanish' in Miguel Fuster-Márquez, Carmen Gregori-Signes, José Santaemilia Ruiz (eds.): Multiperspectives in Analysis and Corpus Design, Granada: Editorial Comares, pp. 89-102. handle http://hdl.handle.net/10486/718875
MORENO SANDOVAL, A. (2021) 'Financial Narrative Processing in Spanish.' Valencia: Tirant lo Blanch.
CARBAJO B., C. VARGAS, A. MORENO (2022) 'Extracción automática de terminología financiera: estado de la cuestión y propuesta para el desarrollo de FINTERM.' 39º Congreso Internacional de la Asociación Española de Lingüística Aplicada. Las Palmas de Gran Canaria, April 28.
References: Moreno-Sandoval, A., Campillos-Llanos, L., & García-Serrano, A. (2024). Language Resources in Spanish for Automatic Text Simplification across Domains (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2409.20466 handle http://hdl.handle.net/10486/715711 |
|
Notes
| Methodology:
- Cleaning the texts.
- Manual annotation of the financial terms by linguists.
- Fine-tuning a language model using the previous annotations as the gold standard.
- Applying the resulting model and subsequent human validation.
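The annotation and fine-tuning steps above can be illustrated with a minimal sketch of how highlighted terms might be turned into token-level training labels. This record does not state the labelling scheme used in the project, so the BIO scheme, the label names (`B-TERM`, `I-TERM`, `O`), the simple regex tokenizer, the function name `bio_labels`, and the example sentence are all assumptions made for illustration only.

```python
import re


def bio_labels(text, term_spans):
    """Split text into word and punctuation tokens and label each one.

    term_spans holds (start, end) character offsets of annotated terms.
    Labels follow a hypothetical BIO scheme: B-TERM, I-TERM, or O.
    """
    tokens, labels = [], []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        label = "O"
        for start, end in term_spans:
            # A token belongs to a term if it lies fully inside the span.
            if start <= match.start() and match.end() <= end:
                label = "B-TERM" if match.start() == start else "I-TERM"
                break
        tokens.append(match.group())
        labels.append(label)
    return tokens, labels


# Invented example sentence; "flujo de caja" (cash flow) is the annotated term.
sentence = "El flujo de caja aumentó en 2023."
term = "flujo de caja"
start = sentence.index(term)
tokens, labels = bio_labels(sentence, [(start, start + len(term))])

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Token and label pairs of this kind are a common input format for fine-tuning a token classification model; the actual preprocessing used for this dataset may differ.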
|
|
Language
| Spanish |
|
Funding Information
| Agencia Estatal de Investigación: PID2020-116001RB-C31 |
|
Depositor
| Moreno-Sandoval, Antonio |
|
Deposit Date
| 2025-03-13 |
|
Data Type
| code |
|
Software
| Python scripts |