Persistent ID
|
doi:10.21950/WRH0SO |
Publication date
|
2025-04-01 |
Title
| The financial narrative summarisation shared task (FNS 2022 & 2023): Datasets |
Author
| Moreno-Sandoval, Antonio (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0002-9029-2216)
Carbajo-Coronado, Blanca (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0001-7693-0042) |
Contact
|
Moreno-Sandoval, Antonio (Universidad Autónoma de Madrid. Laboratorio de Lingüística Informática) |
Description
| Financial Narrative Processing (FNP) is a series of workshops organized by Lancaster University at international NLP conferences. The workshops address various aspects of the automatic processing of financial narratives, including automatic summarization. The LLI-UAM (Laboratorio de Lingüística Informática, Universidad Autónoma de Madrid) participated in 2022 and 2023 by creating Spanish-language datasets for the Financial Narrative Summarisation (FNS) shared task, in which AI systems are evaluated on the same dataset so that different approaches can be compared.
The dataset consists of complete company annual reports, the chairmen's letters (which are treated as summaries of the reports), and a version written by linguists that summarizes each chairman's letter in fewer than 1,000 words. Participants train their models on the dataset to generate summaries similar to the chairman's letter or to the simplified version, and the models are then applied to new evaluation reports that were not shared during training. Evaluation uses the ROUGE metric.
The dataset comprises 262 financial reports taken from the FinT-esp corpus. The reports were originally in PDF format and were converted into plain text; tables, footnotes, and headers were removed so that only the narrative content was retained. The reports range from 40 to 400 pages in length, with an average of 36,285 words. A total of 262 chairmen's letters were extracted, and 262 additional summary documents were created, each containing fewer than 1,000 words. This publication covers the dataset from the 2022 and 2023 competitions.
The files are plain-text (.txt) files containing the full reports, their respective chairmen's letters, and the summaries of these letters. They belong to the Financial Narrative Summarisation Shared Task (2022 and 2023). |
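The description names ROUGE as the evaluation metric without stating which variant or scoring tool the shared task used. As a rough illustration only, the sketch below computes ROUGE-N from clipped n-gram overlap between a generated summary and a reference such as a chairman's letter. The function names, the lowercase whitespace tokenization, and the absence of stemming are assumptions made for this example and are not taken from the task's official scorer.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N precision, recall and F1 from clipped n-gram overlap.

    Assumption: tokens are lowercase whitespace-separated words, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = "el beneficio neto aumentó"
    letter = "el beneficio neto aumentó un diez por ciento"
    print(rouge_n(generated, letter, n=1))
    print(rouge_n(generated, letter, n=2))
```

Scores from a simplified function like this will generally differ from those of an established ROUGE package, so it is suited to quick sanity checks rather than to reproducing published results.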
Subject
| Arts and Humanities; Business and Management |
Keyword
| dataset
summarization
shared task
FNS
annual reports
chairman's letter |
Related publication
| EL-HAJ, M., N. ZMANDAR, P. RAYSON, A. ABURA’ED, M. LITVAK, N. PITTARAS, G. GIANNAKOPOULOS, A. KOSMOPOULOS, B. CARBAJO CORONADO, A. MORENO SANDOVAL (2022) 'The Financial Narrative Summarisation Shared Task (FNS 2022).' In Proceedings of the LREC 2022, Marseille, June. handle http://hdl.handle.net/10486/711014
ZAVITSANOS, E., KOSMOPOULOS, A., GIANNAKOPOULOS, G., LITVAK, M., CARBAJO-CORONADO, B., MORENO-SANDOVAL, A., EL-HAJ, M. (2023) 'The Financial Narrative Summarisation Shared Task (FNS 2023).' In Proceedings of the 5th Financial Narrative Processing Workshop (FNP 2023) at the 2023 IEEE International Conference on Big Data (IEEE BigData 2023), Sorrento, December. handle http://hdl.handle.net/10486/711231
CARBAJO CORONADO, B., A. MORENO SANDOVAL (2024) 'Financial concepts extraction and lexical simplification in Spanish.' RAEL: Revista Electrónica de Lingüística Aplicada. 22/1: 164-180. DOI: 10.58859/rael.v23i1.590. handle http://hdl.handle.net/10486/710979
PORTA-ZAMORANO, J., CARBAJO-CORONADO, B., MORENO-SANDOVAL, A. (2024). Extraction and Structuring of Financial Terminology. Procesamiento del Lenguaje Natural, 73, 139-149. DOI 10.26342/2024-73-10 handle http://hdl.handle.net/10486/715714
MORENO SANDOVAL, A., A. GISBERT, H. MONTORO (2020) 'Fint-esp: a corpus of financial reports in Spanish' in Miguel Fuster-Márquez, Carmen Gregori-Signes, José Santaemilia Ruiz (eds.): Multiperspectives in Analysis and Corpus Design, Granada: Editorial Comares, pp. 89-102. handle http://hdl.handle.net/10486/718875
MORENO SANDOVAL, A. (2021) 'Financial Narrative Processing in Spanish.' Tirant lo Blanch. ISBN papel: 9788418802423, ISBN ebook: 9788418802430.
CARBAJO B., C. VARGAS, A. MORENO (2022) 'Extracción automática de terminología financiera: estado de la cuestión y propuesta para el desarrollo de FINTERM.' 39º Congreso Internacional de la Asociación Española de Lingüística Aplicada. Las Palmas de Gran Canaria, 28 April. |
Notes
| Methodology:
- Collection of financial reports.
- Cleaning of the reports.
- Extraction of the chairmen's letters from each report.
- Generation of summaries of the letters carried out by linguists.
|
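Once the files are downloaded, a user may want to confirm that each report, chairman's letter, and linguist-written summary behaves as the record describes, in particular that the summary has fewer than 1,000 words. The sketch below is a minimal check under stated assumptions: the function names and the whitespace-based word count are hypothetical choices for this example, and the record does not say how the authors counted words or how the files are named, so the texts are passed in as strings.

```python
def word_count(text):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def describe_triple(report, letter, summary, limit=1000):
    """Summarize one report/letter/summary triple given as plain strings.

    `limit` defaults to the 1,000-word bound stated in the dataset description.
    """
    report_words = word_count(report)
    summary_words = word_count(summary)
    return {
        "report_words": report_words,
        "letter_words": word_count(letter),
        "summary_words": summary_words,
        # "Fewer than 1,000 words" is read here as a strict inequality.
        "summary_within_limit": summary_words < limit,
        # Ratio is undefined for an empty report, so None is returned.
        "summary_to_report_ratio": (
            summary_words / report_words if report_words else None
        ),
    }


if __name__ == "__main__":
    stats = describe_triple(
        report="texto completo del informe anual de la empresa",
        letter="carta del presidente",
        summary="resumen breve",
    )
    print(stats)
```

Reading the three texts from disk is left to the user, since the pairing of file names across reports, letters, and summaries is not documented in this record.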
Language
| Spanish |
Grant information
| Agencia Estatal de Investigación: PID2020-116001RB-C31 |
Depositor
| Moreno-Sandoval, Antonio |
Deposit date
| 2025-03-13 |
Date of collection
| Start Date: 2021-09-01 ; End Date: 2024-12-31 |