Persistent ID
|
doi:10.21950/V8VSSO |
Publication date
|
2025-07-22 |
Title
| The Financial Document Causality Detection Shared Task (FinCausal 2025): Dataset |
Author
| Carbajo-Coronado, Blanca (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0001-7693-0042)
Moreno-Sandoval, Antonio (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0002-9029-2216)
Torterolo Orta, Yanco Amor (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0002-3688-3293)
Gozalo, Paula (ROR: https://ror.org/01cby8j38; ORCID: https://orcid.org/0000-0002-7505-5212) |
Contact
|
Moreno-Sandoval, Antonio (Universidad Autónoma de Madrid. Laboratorio de Lingüística Informática) |
Description
| The Financial Document Causality Detection Shared Task (FinCausal 2025) aims to improve causality identification in financial text. The shared task focuses on determining causality associated with both events and quantified facts. A cause can be either the justification of a statement or the reason explaining an outcome, so the task is one of relation detection. The main difference from the 2023 edition is that the task is framed as a Question Answering (QA) problem: the question is posed abstractively, while the predicted answer must be extractive. In addition, the Semantic Answer Similarity (SAS) metric has been introduced.
Given a context and an abstractive question, participants must extract from the context the literal answer to that question. The questions target causal relationships, asking for either a cause or an effect.
The task dataset was extracted from a corpus of Spanish financial annual reports from 2014 to 2018. Participants are provided with a CSV file containing the following fields: ID; Text; Question; Answer.
The standard way to participate is to fine-tune a model on the data annotated by linguists and validated through Inter-Annotator Agreement (IAA), and then use the fine-tuned model to predict the "ANSWER" field in the test set.
This publication contains the dataset used in the FinCausal 2025 competition. It is intended for participants to fine-tune their models and complete the task with the highest possible similarity to the gold standard, according to the established metrics.
It consists of texts annotated by linguists, each providing a context, an abstractive question, and the corresponding extractive answer, which addresses the causal nature of the question.
There are two versions available: one in English and one in Spanish. |
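The extractive constraint described above (the answer must appear literally in the context) can be checked mechanically. The following is a minimal sketch, not part of the dataset distribution: it assumes a semicolon-delimited CSV whose header names are exactly ID, Text, Question and Answer, and it uses two invented rows in place of the real file. The helper names load_rows and non_extractive_ids are hypothetical.

```python
import csv
import io

# Hypothetical sample mirroring the four documented fields (ID, Text, Question, Answer).
# The semicolon delimiter and the rows themselves are assumptions made for illustration;
# they are not taken from the real dataset.
SAMPLE_CSV = (
    "ID;Text;Question;Answer\n"
    "1;Revenue fell because demand weakened in Europe.;"
    "Why did revenue fall?;demand weakened in Europe\n"
    "2;Costs rose due to higher energy prices.;"
    "What caused costs to rise?;energy became more expensive\n"
)


def load_rows(handle, delimiter=";"):
    """Read all rows into dictionaries keyed by the header names."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def non_extractive_ids(rows):
    """Return the IDs of rows whose Answer is not a literal substring of Text."""
    return [row["ID"] for row in rows if row["Answer"] not in row["Text"]]


rows = load_rows(io.StringIO(SAMPLE_CSV))
bad_ids = non_extractive_ids(rows)
print(bad_ids)  # the second sample row paraphrases instead of quoting the text
```

To run the same check on a downloaded file, one would replace the in-memory sample with an opened file handle and adjust the delimiter and column names to whatever the actual file uses.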
Subject
| Computer and Information Science |
Keyword
| dataset
question answering
FinCausal
shared task
causality
cause-effect
annual reports
financial texts |
Related publication
| MORENO SANDOVAL, A., PORTA, J., CARBAJO-CORONADO, B., TORTEROLO, Y., SAMY, D. (2025) The Financial Document Causality Detection Shared Task (FinCausal 2025). In Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal), pages 214–221, Abu Dhabi, UAE. Association for Computational Linguistics. handle http://hdl.handle.net/10486/721102
PORTA-ZAMORANO, J., CARBAJO-CORONADO, B., MORENO-SANDOVAL, A. (2024). Extraction and Structuring of Financial Terminology. Procesamiento del Lenguaje Natural, 73, 139-149. DOI 10.26342/2024-73-10. handle http://hdl.handle.net/10486/715714
MORENO-SANDOVAL, A., PORTA-ZAMORANO, J., CARBAJO-CORONADO, B., SAMY, D., MARIKO, D., EL-HAJ, M. (2023) 'The Financial Document Causality Detection Shared Task (FinCausal 2023)'. In Proceedings of the 5th Financial Narrative Processing Workshop (FNP 2023) at the 2023 IEEE International Conference on Big Data (IEEE BigData 2023). Sorrento, December. handle http://hdl.handle.net/10486/711233
MORENO SANDOVAL, A. (2021) 'Financial Narrative Processing in Spanish.' Tirant lo Blanch. ISBN (print): 9788418802423, ISBN (ebook): 9788418802430.
MORENO SANDOVAL, A., A. GISBERT, H. MONTORO (2020) 'Fint-esp: a corpus of financial reports in Spanish' in Miguel Fuster-Márquez, Carmen Gregori-Signes, José Santaemilia Ruiz (eds.): Multiperspectives in Analysis and Corpus Design, Granada: Editorial Comares, pp. 89-102. handle http://hdl.handle.net/10486/718875 |
Notes
| Methodology
- Collection of financial reports.
- Cleaning of the reports.
- Linguistic annotation: generation of an abstractive question and its corresponding extractive answer based on the context.
- Validation through IAA (inter-annotator agreement).
|
Language
| English; Spanish |
Production date
| 2024-09-01 |
Grant information
| Agencia Estatal de Investigación: PID2023-151280OB-C21 |
Depositor
| Moreno-Sandoval, Antonio |
Deposit date
| 2025-07-16 |
Time period covered
| Start Date: 2024-07-01 ; End Date: 2025-01-31 |
Software
| Pages
Excel
Calc |
Related dataset
| Moreno-Sandoval, Antonio; Carbajo-Coronado, Blanca; Porta, Jordi, 2025, 'The financial document causality detection shared task (FinCausal 2023): Dataset,' https://doi.org/10.21950/2JOAZJ, e-cienciaDatos, V1. |