This dataset is part of the GRESEL-UAM project (GRESEL: AI Generation Results Enriched with Simplified Explanations Based on Linguistic Features; in Spanish, Resultados de Generación de IA Enriquecidos con Explicaciones Simplificadas Basadas en Características Lingüísticas).
This dataset accompanies the publication "Assessing a Literary RAG System with a Human-Evaluated Synthetic QA Dataset Generated by an LLM: Experiments with Knowledge Graphs," which will be presented in September 2025 in Zaragoza at the conference of the Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN). The paper has already been accepted for publication in SEPLN’s official journal, Procesamiento del Lenguaje Natural.
This dataset consists of the Trafalgar knowledge graph database, based on the novel by Benito Pérez Galdós and implemented in Neo4j. This database is used in the RAG experiments presented in the publication. As a Knowledge Graph (KG), it offers several advantages over conventional RAG approaches (which are explored in the paper). The database structures the text of the novel and links elements such as paragraphs and chapters, as well as named entities like character names, places, and ships. More information about its creation and structure can be found in the methodology section.
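Once the database is running, one way to see how it is organised is to list its node labels and relationship types. The following is a minimal sketch, not part of the dataset: it assumes the official `neo4j` Python driver is installed and uses only the built-in procedures `db.labels()` and `db.relationshipTypes()`, so it makes no assumption about the actual Trafalgar schema. The function names, connection URI, and credentials are hypothetical placeholders.

```python
from typing import Callable, Dict, List

# Built-in Neo4j procedures; they do not depend on the Trafalgar schema.
LABELS_QUERY = "CALL db.labels() YIELD label RETURN label ORDER BY label"
REL_TYPES_QUERY = (
    "CALL db.relationshipTypes() YIELD relationshipType "
    "RETURN relationshipType ORDER BY relationshipType"
)


def summarize_schema(run_query: Callable[[str], List[dict]]) -> Dict[str, List[str]]:
    """Return the node labels and relationship types present in the graph.

    `run_query` is any callable that executes a Cypher string and returns
    the result rows as dictionaries.
    """
    labels = [row["label"] for row in run_query(LABELS_QUERY)]
    rel_types = [row["relationshipType"] for row in run_query(REL_TYPES_QUERY)]
    return {"labels": labels, "relationship_types": rel_types}


def inspect_database(uri: str, user: str, password: str,
                     database: str = "neo4j") -> Dict[str, List[str]]:
    """Connect to a running Neo4j instance and summarize its schema.

    All arguments are placeholders to be adapted to the local installation.
    """
    # Imported lazily so the helper above works without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:

        def run_query(cypher: str) -> List[dict]:
            with driver.session(database=database) as session:
                return [record.data() for record in session.run(cypher)]

        return summarize_schema(run_query)
```

For a local instance, a call such as `inspect_database("bolt://localhost:7687", "neo4j", "<password>")` would return the labels and relationship types, which can then be compared with the structure described in the methodology section.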
This is a neo4j.dump file, which contains an export of the Trafalgar database. This file can be used to replicate the database used in the experiments described in the paper.
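A dump of this kind is normally restored with the `neo4j-admin` command-line tool. The sketch below only builds the command for two common syntaxes (Neo4j 5 and Neo4j 4.x) and does not run anything. The Neo4j version used to create the dump is not stated here, so the `major_version` parameter, the function name, and the example path are assumptions to adjust to the local setup.

```python
from pathlib import Path
from typing import List


def build_load_command(dump_file: str, major_version: int = 5) -> List[str]:
    """Build a `neo4j-admin` command that loads a dump into a database.

    The target database name is taken from the file name, so
    "neo4j.dump" is loaded into the database called "neo4j".
    """
    dump = Path(dump_file)
    database = dump.stem  # "neo4j.dump" -> "neo4j"

    if major_version >= 5:
        # Neo4j 5 looks for <database>.dump inside the --from-path directory.
        return [
            "neo4j-admin", "database", "load",
            f"--from-path={dump.parent}",
            "--overwrite-destination=true",
            database,
        ]

    # Neo4j 4.x takes the archive path directly.
    return [
        "neo4j-admin", "load",
        f"--from={dump}",
        f"--database={database}",
        "--force",
    ]
```

The resulting list can be passed to `subprocess.run(...)` or joined into a shell command. Both variants overwrite any existing database of the same name, and the target database generally needs to be stopped before loading, so it is worth checking the Operations Manual for the installed version first.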