Persistent ID
|
doi:10.21950/FWEML6 |
Publication date
|
2025-03-27 |
Title
| Automatic financial term extractor |
Author
| Moreno-Sandoval, Antonio (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0002-9029-2216)
Porta, Jordi (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0001-5620-4916)
Carbajo-Coronado, Blanca (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0001-7693-0042) |
Contact
|
Moreno-Sandoval, Antonio (Universidad Autónoma de Madrid. Laboratorio de Lingüística Informática) |
Description
| This dataset was created within the Spanish national project CLARA-FINT. The aim of this task within the project was to build an automatic financial term extractor for Spanish. The first step was to apply linguistic annotation to texts, namely annual reports from the main Spanish listed companies in the IBEX 35 index. The next step used these annotations to fine-tune a model for the financial term extraction task. This dataset contains the fine-tuned model, i.e., the automatic extractor. It is described in the paper PORTA-ZAMORANO, J., CARBAJO-CORONADO, B., MORENO-SANDOVAL, A. (2024).
The extractor is a multilingual BERT (bert-multilingual) model fine-tuned for financial term extraction. The training data for fine-tuning consisted of texts annotated by linguists, in which financial terms were highlighted in context. |
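As a rough illustration of how the output of such an extractor might be turned into a term list, the sketch below assumes the task is framed as token classification with BIO tags. The record does not state the labeling scheme, so the labels `B-TERM`, `I-TERM` and `O` and the function `decode_terms` are hypothetical and are not taken from the deposited scripts.

```python
from typing import List


def decode_terms(tokens: List[str], labels: List[str]) -> List[str]:
    """Group tokens tagged B-TERM/I-TERM into multiword term strings.

    Assumption: one label per token, using a hypothetical BIO scheme.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    terms: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B-TERM":
            # A new term starts; flush any term that was still open.
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif label == "I-TERM" and current:
            # Continue the term that is currently open.
            current.append(token)
        else:
            # "O" (or an I-TERM with no open term) closes any open term.
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


# Invented example sentence and labels, for illustration only.
tokens = ["El", "flujo", "de", "caja", "aumentó", "."]
labels = ["O", "B-TERM", "I-TERM", "I-TERM", "O", "O"]
print(decode_terms(tokens, labels))  # ['flujo de caja']
```

In practice the labels would come from the fine-tuned model's predictions rather than being written by hand.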
Subject
| Arts and Humanities; Computer and Information Science |
Keyword
| automatic term extractor
financial terms
linguistic annotation
fine-tuning
machine learning
computational linguistics |
Related publication
| IsSupplementTo: PORTA-ZAMORANO, J., CARBAJO-CORONADO, B., MORENO-SANDOVAL, A. (2024). Extraction and Structuring of Financial Terminology. Procesamiento del Lenguaje Natural, 73, 139-149. DOI 10.26342/2024-73-10 handle http://hdl.handle.net/10486/715714
CARBAJO CORONADO, B., A. MORENO SANDOVAL (2024) 'Financial concepts extraction and lexical simplification in Spanish.' RAEL: Revista Electrónica de Lingüística Aplicada, 22.1, 164-180. DOI: 10.58859/rael.v23i1.590 handle http://hdl.handle.net/10486/710979
MORENO SANDOVAL, A., A. GISBERT, H. MONTORO (2020) 'Fint-esp: a corpus of financial reports in Spanish' in Miguel Fuster-Márquez, Carmen Gregori-Signes, José Santaemilia Ruiz (eds.): Multiperspectives in Analysis and Corpus Design, Granada: Editorial Comares, pp. 89-102. handle http://hdl.handle.net/10486/718875
MORENO SANDOVAL, A. (2021) 'Financial Narrative Processing in Spanish.' Valencia: Tirant lo Blanch, 2021
CARBAJO B., C. VARGAS, A. MORENO (2022) 'Extracción automática de terminología financiera: estado de la cuestión y propuesta para el desarrollo de FINTERM.' 39º Congreso Internacional de la Asociación Española de Lingüística Aplicada. Las Palmas de Gran Canaria, 28 April
References: Moreno-Sandoval, A., Campillos-Llanos, L., & García-Serrano, A. (2024). Language Resources in Spanish for Automatic Text Simplification across Domains (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2409.20466 handle http://hdl.handle.net/10486/715711 |
Notes
| Methodology:
- Cleaning the texts.
- Manual annotation of the financial terms by linguists.
- Fine-tuning a language model on these annotations, which serve as the gold standard.
- Applying the resulting model and subsequent human validation.
|
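The step from manual annotation to fine-tuning data can be sketched as follows. This is a minimal illustration, assuming the annotations are stored as character offsets over the cleaned text and that the model is trained on per-token BIO labels; neither detail is stated in this record. The function `spans_to_bio`, the regex tokenizer, and the label names are hypothetical.

```python
import re
from typing import List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Turn annotated character spans into (token, BIO label) pairs.

    Assumptions: spans are (start, end) offsets with an exclusive end,
    and they align with token boundaries.
    """
    labeled: List[Tuple[str, str]] = []
    # Simple tokenizer: runs of word characters, or single punctuation marks.
    for match in re.finditer(r"\w+|[^\w\s]", text):
        label = "O"
        for start, end in spans:
            # A token belongs to a term if it lies fully inside the span.
            if match.start() >= start and match.end() <= end:
                label = "B-TERM" if match.start() == start else "I-TERM"
                break
        labeled.append((match.group(), label))
    return labeled


# Invented example sentence with one annotated term, for illustration only.
text = "El beneficio neto creció."
term = "beneficio neto"
start = text.index(term)
for token, label in spans_to_bio(text, [(start, start + len(term))]):
    print(token, label)
```

A real pipeline would additionally need to align these word-level labels with the subword tokens produced by the model's tokenizer.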
Language
| Spanish |
Grant information
| Agencia Estatal de Investigación: PID2020-116001RB-C31 |
Depositor
| Moreno-Sandoval, Antonio |
Deposit date
| 2025-03-13 |
Data type
| code |
Software
| Python scripts |