Persistent ID
|
doi:10.21950/JXFKRB |
Publication Date
|
2025-04-01 |
Title
| List of financial terms |
Author
| Carbajo-Coronado, Blanca (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0001-7693-0042)
Moreno-Sandoval, Antonio (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0002-9029-2216)
Porta, Jordi (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0001-5620-4916) |
Contact
|
Moreno-Sandoval, Antonio (Universidad Autónoma de Madrid. Laboratorio de Lingüística Informática) |
Description
| This dataset was created within the Spanish national project CLARA-FINT. It is derived from financial texts in the annual reports of the main Spanish listed companies, which are usually publicly available in the shareholder sections of the companies' websites. The work explores the creation of a gold standard manually annotated by linguists and its subsequent use to fine-tune a language model that automatically extracts further terms. These new terms are then validated by humans and incorporated into the definitive list of terms. There are 13,958 terms in total.
A list of terms in TXT format, with one term per line. It contains financial terms extracted from the annual reports of the main Spanish companies listed on the IBEX 35 index. First, terms are extracted manually through annotation, which yields a gold standard. Second, this gold standard is used to fine-tune a language model, which generates new candidate terms that are added to the definitive list after human validation. The list contains 13,958 terms in total. |
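Because the list is a plain TXT file with one term per line, it can be loaded with a few lines of Python. The sketch below is a minimal, hypothetical helper and not part of the dataset. It assumes UTF-8 text, which it decodes with `utf-8-sig` so that a possible byte-order mark is dropped. It also skips blank lines and builds a case-folded set for membership checks. The function names `parse_terms`, `load_terms` and `build_lookup` are illustrative only.

```python
from __future__ import annotations

from collections.abc import Iterable
from pathlib import Path


def parse_terms(lines: Iterable[str]) -> list[str]:
    """Return one term per non-blank line, with surrounding whitespace removed."""
    return [line.strip() for line in lines if line.strip()]


def load_terms(path: str | Path) -> list[str]:
    """Read a one-term-per-line TXT file.

    The "utf-8-sig" codec is an assumption about the file: it decodes plain
    UTF-8 (including accented Spanish characters) and also strips a
    byte-order mark if the file happens to start with one.
    """
    with open(path, encoding="utf-8-sig") as fh:
        return parse_terms(fh)


def build_lookup(terms: Iterable[str]) -> set[str]:
    """Case-folded set for quick, case-insensitive membership checks."""
    return {term.casefold() for term in terms}
```

For example, `terms = load_terms("path/to/terms.txt")` followed by `len(terms)` gives a count that can be checked against the 13,958 terms reported above. The path is a placeholder for wherever the downloaded file is stored.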
Subject
| Arts and Humanities; Computer and Information Science |
Keyword
| financial terms
annual reports
IBEX 35
fine-tuning
automatic extractor |
Related Publication
| CARBAJO CORONADO, B., A. MORENO SANDOVAL (2024) 'Financial concepts extraction and lexical simplification in Spanish.' RAEL: Revista Electrónica de Lingüística Aplicada. 22/1: 164-180. DOI: 10.58859/rael.v23i1.590. handle http://hdl.handle.net/10486/710979
PORTA-ZAMORANO, J., CARBAJO-CORONADO, B., MORENO-SANDOVAL, A. (2024). Extraction and Structuring of Financial Terminology. Procesamiento del Lenguaje Natural, 73, 139-149. DOI: 10.26342/2024-73-10. handle http://hdl.handle.net/10486/715714
MORENO SANDOVAL, A., A. GISBERT, H. MONTORO (2020) 'Fint-esp: a corpus of financial reports in Spanish' in Miguel Fuster-Márquez, Carmen Gregori-Signes, José Santaemilia Ruiz (eds.): Multiperspectives in Analysis and Corpus Design, Granada: Editorial Comares, pp. 89-102. handle http://hdl.handle.net/10486/718875
MORENO SANDOVAL, A. (2021) 'Financial Narrative Processing in Spanish.' Tirant lo Blanch. Print ISBN: 9788418802423; e-book ISBN: 9788418802430.
CARBAJO B., C. VARGAS, A. MORENO (2022) 'Extracción automática de terminología financiera: estado de la cuestión y propuesta para el desarrollo de FINTERM.' 39º Congreso Internacional de la Asociación Española de Lingüística Aplicada. Las Palmas de Gran Canaria, 28 April. |
Notes
| Methodology:
- Cleaning the texts.
- Manual annotation of the financial terms by linguists.
- Fine-tuning a language model on these annotations (the gold standard).
- Applying the resulting model, an automatic term extractor, to predict candidate terms, which yields a list of terms to be validated.
- Validating these candidate terms manually.
- Finally, merging the results into an expanded list that contains the gold-standard terms plus the new terms that passed human validation.
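The last three steps can be pictured as a small post-processing pipeline. The sketch below is illustrative only and rests on several assumptions that this record does not state:

- The fine-tuned model is a token classifier that emits plain `B`/`I`/`O` labels over whole-word tokens.
- Human decisions are recorded as a set of accepted strings.
- Duplicates between the gold standard and the new terms are detected case-insensitively.

`bio_to_terms`, `apply_validation` and `merge_term_lists` are hypothetical names. The actual CLARA-FINT extractor may work differently.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence


def bio_to_terms(tokens: Sequence[str], labels: Sequence[str]) -> list[str]:
    """Group BIO-tagged tokens into (possibly multi-word) candidate terms.

    Assumption: "B" begins a term, "I" continues it and "O" is outside any
    term. An "I" with no open term is treated as if it were a "B".
    Tokens are assumed to be whole words and are re-joined with spaces.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    terms: list[str] = []
    current: list[str] = []
    for token, label in zip(tokens, labels):
        if label == "B" or (label == "I" and not current):
            if current:  # a new term starts, so close the previous one
                terms.append(" ".join(current))
            current = [token]
        elif label == "I":
            current.append(token)
        else:  # "O" closes any open term
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


def apply_validation(candidates: Iterable[str], accepted: set[str]) -> list[str]:
    """Keep only the candidates that a human reviewer marked as accepted."""
    return [term for term in candidates if term in accepted]


def merge_term_lists(gold: Iterable[str], validated: Iterable[str]) -> list[str]:
    """Gold-standard terms first, then validated terms not already present.

    Duplicates are detected case-insensitively (an assumption), and the
    first spelling seen is the one kept.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for term in [*gold, *validated]:
        key = term.strip().casefold()
        if key and key not in seen:
            seen.add(key)
            merged.append(term.strip())
    return merged
```

If the model uses prefixed labels such as `B-TERM`, they would need to be mapped to `B`/`I`/`O` first. Subword tokens would likewise need to be rejoined into words before calling `bio_to_terms`.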
|
Language
| Spanish |
Grant Information
| Agencia Estatal de Investigación: PID2020-116001RB-C31 |
Depositor
| Moreno-Sandoval, Antonio |
Deposit Date
| 2025-03-13 |
Date of Collection
| Start Date: 2021-09-01 ; End Date: 2024-12-31 |
Kind of Data
| Text |