This is a fine-tuned model for bidirectional ES-EN financial translation. This repository contains the low-rank adaptation (LoRA) adapters obtained by fine-tuning the google/translategemma-12b-it model on a parallel dataset of financial reports from IBEX 35 companies. The model was specifically fine-tuned to be more adaptable to different input sizes (up to 2k tokens for optimal performance). A sample of the dataset can be found here: Moreno-Sandoval, A., Torterolo Orta, Y. A., Roseti, S. M., Carbajo-Coronado, B., & Porta, J. (2025). Financial ES-EN parallel corpus from annual reports (Version 1) [Dataset]. e-cienciaDatos. https://doi.org/10.21950/85MWYP
This model was developed within the framework of the following publication: Torterolo Orta, Y. A., Chatzi, M., & Moreno-Sandoval, A. (2026). TranslateGemma for ES-EN financial reports: Exploring adaptability to variable-sized contexts. In A. Moreno Sandoval & P. Martínez (Eds.), Proceedings of the 7th Financial Narrative Processing Workshop (FNP 2026) at LREC 2026 (pp. 87–97). ELRA. https://hdl.handle.net/10486/774941
This paper explores bidirectional financial Machine Translation (MT) between Spanish and English, focusing on the specialized domain of annual reports from IBEX 35 companies. Fine-tuned models are compared against zero-shot scenarios through a series of experiments, testing factors such as prompting strategies and model size. On the one hand, this work studies a combination of existing fine-tuning strategies aimed at improving the adaptability of MT models to variable-sized contexts, and, on the other hand, it analyzes the limitations detected in current evaluation metrics. Results are mixed: fine-tuned models show an improvement in both short and long-context scenarios in traditional metrics, while zero-shot predictions are clearly favored by neural metrics. In fact, in a reference-free assessment, the source and the human reference received worse scores than the off-the-shelf prediction models. Consequently, fine-tuning on the human-made dataset hardly improves the neural metrics against zero-shot generations. This suggests that neural metrics tend to favor the fluency of MT generations and literalness over creativity, among other technical limitations regarding long-context adaptability. From a practical standpoint, the low Translation Edit Rate (TER) scores suggest that specialized fine-tuning remains the most viable path for companies to implement efficient Machine Translation Post-Editing (MTPE) workflows, given the stylistic alignment.
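To make the Translation Edit Rate (TER) mentioned above more concrete, here is a minimal sketch of the idea behind it: count the word-level edits needed to turn a machine translation into the human reference, then divide by the reference length. This is a simplification and not the evaluation code used in the paper. Full TER also counts a shift of a whole word block as a single edit, which this sketch ignores, so its scores can be higher than true TER. The function names `word_edit_distance` and `approx_edit_rate` and the example sentences are hypothetical, chosen only for illustration. For real evaluations, an established implementation such as the one in sacreBLEU is the safer choice.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions).

    Assumption: block shifts, which full TER treats as one edit, are NOT modelled.
    """
    # prev[j] = edits needed to turn the hypothesis prefix seen so far into ref[:j]
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]  # turning i hypothesis words into an empty reference costs i deletions
        for j, ref_word in enumerate(ref, start=1):
            substitution_cost = 0 if hyp_word == ref_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # delete hyp_word
                    curr[j - 1] + 1,                  # insert ref_word
                    prev[j - 1] + substitution_cost,  # keep or substitute
                )
            )
        prev = curr
    return prev[-1]


def approx_edit_rate(hypothesis: str, reference: str) -> float:
    """Shift-free approximation of TER: word edits divided by reference length."""
    hyp_words = hypothesis.split()  # naive whitespace tokenization (an assumption)
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp_words, ref_words) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the dataset.
    mt_output = "net profit rose 5 % in 2023"
    human_reference = "net profit increased 5 % in 2023"
    print(approx_edit_rate(mt_output, human_reference))
```

In the example, one substitution is needed against a seven-word reference, giving a rate of about 0.14. A lower rate means a post-editor has fewer words to change, which is why low TER is relevant to MTPE workflows.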