Art-GenEvalGPT

Versión 1.0

D'Haro Enríquez, Luis Fernando; Gil Martín, Manuel; Luna Jiménez, Cristina; Esteban Romero, Sergio; Estecha Garitagoitia, Marcos; Bellver Soler, Jaime; Fernández Martínez, Fernando, 2024, "Art-GenEvalGPT", https://doi.org/10.21950/LBNLGA, e-cienciaDatos, V1

Revise los Estándares de citas de datos.

Contactar con el propietario

Acceso a las estadísticas completas del dataset

Visualizaciones
663

0 Citas (desde Crossref)

Descripción	Description of the project ASTOUND is an EIC funded project (No. 101071191) under the HORIZON-EIC-2021-PATHFINDERCHALLENGES-01 call. The aim of the project is to develop an artificial conscious AI based on the Attention Schema Theory (AST) proposed by Michel Graziano. This theory proposes that consciousness arises from the brain's ability to create and maintain a simplified model of its own processing, particularly focusing attention on certain aspects of its internal and external environment. The project entails creating an AI system capable of exhibiting consciousness-like behaviours by implementing principles from the AST. This involves constructing a model that simulates attentional processes, allowing the AI to prioritise and focus on relevant information while disregarding irrelevant stimuli. The ASTOUND project will provide an Integrative Approach for Awareness Engineering to establish consciousness in machines, and targeting the following goals: Develop an AI architecture for Artificial Consciousness based on the Attention Schema Theory (AST) through an internal model of the state of the attention. Implement the proposed architecture into a contextually aware virtual agent and prove improved performance thanks to the Attention Schema; for instance, by providing coherent discussion, self-regulation, short-and-long term memory, personalisation capabilities. Define novel ways to measure the presence and level of consciousness in both humans and machines. Description of the dataset The dataset includes synthetic dialogues in the art domain that can be used for training a chatbot to discuss artworks within a museum setting. Leveraging Large Language Models (LLMs), particularly ChatGPT, the dataset comprises over 13,000 dialogues generated using prompt-engineering techniques. The dialogues cover a wide range of user and chatbot behaviours, including expert guidance, tutoring, and handling toxic user interactions. The ArtEmis dataset serves as a basis, containing emotion attributions and explanations for artworks sourced from the WikiArt website. From this dataset, 800 artworks were selected based on consensus among human annotators regarding elicited emotions, ensuring balanced representation across different emotions. However, an imbalance in art styles distribution was noted due to the emphasis on emotional balance. Each dialogue is uniquely identified using a "DIALOGUE_ID", encoding information about the artwork discussed, emotions, chatbot behaviour, and more. The dataset is structured into multiple files for efficient navigation and analysis, including metadata, prompts, dialogues, and metrics. Objective evaluation of the generated dialogues was conducted, focusing on profile discrimination, anthropic behaviour detection, and toxicity evaluation. Various syntactic and semantic-based metrics are employed to assess dialogue quality, along with sentiment and subjectivity analysis. Tools like the MS Azure Content Moderator API, Detoxify library and LlamaGuard aid in toxicity evaluation. The dataset's conclusion highlights the need for further work to handle biases, enhance toxicity detection, and incorporate multimodal information and contextual awareness. Future efforts will focus on expanding the dataset with additional tasks and improving chatbot capabilities for diverse scenarios. (2023-10-01)
Materia	Ingeniería; Ciencias de la información y computación
Palabra clave	Synthetic Dialogues, Chatbots, Artificial Intelligence, Art Domain, Natural Language Processing, Attention Schema Theory (AST), Consciousness
Publicación relacionada	Gil-Martín, M., Luna-Jiménez, C., Esteban-Romero, S., Estecha-Garitagoitia, M., Fernández-Martínez, F., D’Haro, L. F. (2024). Art_GenEvalGPT: a dataset of synthetic art dialogues with ChatGPT.
Notas	Future efforts will focus on expanding the dataset with additional tasks and improving chatbot capabilities for diverse scenarios. METHODOLOGY Dialogues were generated using ChatGPT prompted by instructions tailored to simulate conversations between an expert and a user discussing artworks. Different behaviours in the chatbot and the user were included as part of the instructions. A total number of 4 behaviours are included: 1) the chatbot acts as an art expert or tour guide, providing information about a given artwork and answering questions from the user; 2) the chatbot acts as a tutor or professor, in which the chatbot asks questions to the user and the user may provide correct or incorrect answers. Then the chatbot will provide feedback to the user; 3) the chatbot will have an anthropic or non-anthropic behaviour. Meaning anthropic that the chatbot turns will include opinions or feelings that the chatbot could also experiment based on the artwork (the emotion information is extracted from the ArtEmis original human annotations); and 4) the user has a toxic behaviour (i.e., the user’s turns contain politically incorrect sentences that may contain harmful comments about the content of the artwork, the artists, the styles, or including questions that are provocative, aggressive or non-relevant). The released dataset is based on the ArtEmis dataset and extends it by incorporating dialogues, multiple behaviours and including metadata obtained to assess its quality. From the original dataset, we took a total of 800 artworks with a balanced distribution of emotions to avoid bias in the handling of emotions by the chatbot. A total of 13,870 dialogues were collected, including 378 unique artists, 26 different art styles, and balancing the 4 behaviours mentioned above. The dataset was automatically analysed by using ChatGPT and GPT-4 models on different tasks, e.g., detecting that the factual information provided in the dialogues also was the one provided in the instruction prompt during the generation. Then, instructing the models to detect the presence of toxic comments or anthropic behaviour. Finally, additional libraries and models such as Detoxify, Microsoft Azure Content Moderation Services or LlamaGuard from Meta, were used to automatically label dialogues and turns with labels to indicate toxicity and probabilities of the classification when possible. FILES - filename_codes.json: Contains a structured taxonomy with codes for identifying the different elements of the dataset. It includes codes for profiles, such as painting, expert, and user profiles. Additionally, it contains codes for various attributes such as emotions, toxicity and biases. - metadata.csv: Comma-separated values (CSV) file containing detailed information about each dialogue in the dataset. It includes data such as the author and style of the artwork, emotions, goals, roles, toxicity, and anthropology. This files server as a comprehensive reference for understanding the context and characteristics of each dialogue within the dataset. - prompts.csv: A CSV file that stores the prompts used in generating the dialogues by the ChatGPT model. These prompts provide instructions and guidelines for initiating conversations between the expert and user within the context of discussing artworks in a museum setting. - dialogues.csv: A CSV file containing the actual dialogues generated by the ChatGPT model. Each dialogue entry consists of conversational turns between the expert and user agents. - metrics.csv: A CSV file providing a summary of evaluation metrics obtained to assess the quality and characteristics of the generated dialogues. It includes dialogue-level metrics, toxicity level and categories, syntactic and semantic-based objective metrics, and sentiment analysis results. This file aids in evaluating the performance of the AI chatbot and identifying areas for improvement in dialogue generation. - toxic.csv: A CSV file that contains information about toxicity levels observed within the generated dialogues. It comprises boolean columns, one representing whether the dialogue should be toxic within the prompt, other whether toxicity detection using the Detoxify library with a toxic threshold of 0.4 has identified toxic content within the dialogue, other whether toxicity detection using the Microsoft Azure Content Moderator service has identified toxic content within the dialogue, and one indicates whether toxicity detection using the LLAMA Guard has identified toxic content within the dialogue.
Licencia/Acuerdo de uso de los datos	CC-BY-4.0

Filtrado por

	1 a 3 de 3 Ficheros. Seleccionando varios ficheros no se pueden descargar más de 10 GB.	Formato Original Formato de Archivo (.tab)
	Dialogues.tab Datos tabulares - 36,2 MB Publicado 28 feb. 2024 4 Descargas 2 Variables, 13870 Observaciones A CSV file containing the actual dialogues generated by the ChatGPT model. Each dialogue entry consists of conversational turns between the expert and user agents.	Vista previa "Dialogues.tab" Acceso al fichero File Access Público Opciones de descarga Valores separados por comas (Formato del fichero original) Delimitados por tabuladores RData Download Metadata Metadatos variables Citas de fichero de datos XML de EndNote RIS BibTeX Opciones de exploración View Data
	Metadata.tab Datos tabulares - 11,5 MB Publicado 28 feb. 2024 4 Descargas 13 Variables, 13870 Observaciones Comma-separated values (CSV) file containing detailed information about each dialogue in the dataset. It includes data such as the author and style of the artwork, emotions, goals, roles, toxicity, and anthropology. This files server as a comprehensive reference for understanding the context and characteristics of each dialogue within the dataset	Vista previa "Metadata.tab" Acceso al fichero File Access Público Opciones de descarga Valores separados por comas (Formato del fichero original) Delimitados por tabuladores RData Download Metadata Metadatos variables Citas de fichero de datos XML de EndNote RIS BibTeX Opciones de exploración View Data
	Prompts.tab Datos tabulares - 16,8 MB Publicado 28 feb. 2024 5 Descargas 2 Variables, 13870 Observaciones A CSV file that stores the prompts used in generating the dialogues by the ChatGPT model. These prompts provide instructions and guidelines for initiating conversations between the expert and user within the context of discussing artworks in a museum setting.	Vista previa "Prompts.tab" Acceso al fichero File Access Público Opciones de descarga Valores separados por comas (Formato del fichero original) Delimitados por tabuladores RData Download Metadata Metadatos variables Citas de fichero de datos XML de EndNote RIS BibTeX Opciones de exploración View Data

Metadatos de cita

ID persistente del dataset	doi:10.21950/LBNLGA
Fecha de publicación	2024-02-28
Título	Art-GenEvalGPT
Autor	D'Haro Enríquez, Luis Fernando (Universidad Politécnica de Madrid) - ORCID: 0000-0002-3411-7384 Gil Martín, Manuel (Universidad Politécnica de Madrid) - ORCID: 0000-0002-4285-6224 Luna Jiménez, Cristina (Universidad Politécnica de Madrid) - ORCID: 0000-0001-5369-856X Esteban Romero, Sergio (Universidad Politécnica de Madrid) - ORCID: 0009-0008-6336-7877 Estecha Garitagoitia, Marcos (Universidad Politécnica de Madrid) - ORCID: 0000-0001-8153-0182 Bellver Soler, Jaime (Universidad Politécnica de Madrid) - ORCID: 0009-0006-7973-4913 Fernández Martínez, Fernando (Universidad Politécnica de Madrid) - ORCID: 0000-0003-3877-0089
Contacto	Utilice el botón de e-mail de arriba para contactar. D'Haro Enríquez, Luis Fernando (Universidad Politécica de Madrid)
Descripción	Description of the project ASTOUND is an EIC funded project (No. 101071191) under the HORIZON-EIC-2021-PATHFINDERCHALLENGES-01 call. The aim of the project is to develop an artificial conscious AI based on the Attention Schema Theory (AST) proposed by Michel Graziano. This theory proposes that consciousness arises from the brain's ability to create and maintain a simplified model of its own processing, particularly focusing attention on certain aspects of its internal and external environment. The project entails creating an AI system capable of exhibiting consciousness-like behaviours by implementing principles from the AST. This involves constructing a model that simulates attentional processes, allowing the AI to prioritise and focus on relevant information while disregarding irrelevant stimuli. The ASTOUND project will provide an Integrative Approach for Awareness Engineering to establish consciousness in machines, and targeting the following goals: Develop an AI architecture for Artificial Consciousness based on the Attention Schema Theory (AST) through an internal model of the state of the attention. Implement the proposed architecture into a contextually aware virtual agent and prove improved performance thanks to the Attention Schema; for instance, by providing coherent discussion, self-regulation, short-and-long term memory, personalisation capabilities. Define novel ways to measure the presence and level of consciousness in both humans and machines. Description of the dataset The dataset includes synthetic dialogues in the art domain that can be used for training a chatbot to discuss artworks within a museum setting. Leveraging Large Language Models (LLMs), particularly ChatGPT, the dataset comprises over 13,000 dialogues generated using prompt-engineering techniques. The dialogues cover a wide range of user and chatbot behaviours, including expert guidance, tutoring, and handling toxic user interactions. The ArtEmis dataset serves as a basis, containing emotion attributions and explanations for artworks sourced from the WikiArt website. From this dataset, 800 artworks were selected based on consensus among human annotators regarding elicited emotions, ensuring balanced representation across different emotions. However, an imbalance in art styles distribution was noted due to the emphasis on emotional balance. Each dialogue is uniquely identified using a "DIALOGUE_ID", encoding information about the artwork discussed, emotions, chatbot behaviour, and more. The dataset is structured into multiple files for efficient navigation and analysis, including metadata, prompts, dialogues, and metrics. Objective evaluation of the generated dialogues was conducted, focusing on profile discrimination, anthropic behaviour detection, and toxicity evaluation. Various syntactic and semantic-based metrics are employed to assess dialogue quality, along with sentiment and subjectivity analysis. Tools like the MS Azure Content Moderator API, Detoxify library and LlamaGuard aid in toxicity evaluation. The dataset's conclusion highlights the need for further work to handle biases, enhance toxicity detection, and incorporate multimodal information and contextual awareness. Future efforts will focus on expanding the dataset with additional tasks and improving chatbot capabilities for diverse scenarios. (2023-10-01)
Materia	Ingeniería; Ciencias de la información y computación
Palabra clave	Synthetic Dialogues Chatbots Artificial Intelligence Art Domain Natural Language Processing Attention Schema Theory (AST) Consciousness
Publicación relacionada	Gil-Martín, M., Luna-Jiménez, C., Esteban-Romero, S., Estecha-Garitagoitia, M., Fernández-Martínez, F., D’Haro, L. F. (2024). Art_GenEvalGPT: a dataset of synthetic art dialogues with ChatGPT. Luna-Jiménez, C., Gil-Martín, M., D’Haro, L. F., Fernández-Martínez, F., San-Segundo, R. (2024). Evaluating Emotional and Subjective Responses in Synthetic Dialogues: A Multi-stage Framework with Large Language Models.
Notas	Future efforts will focus on expanding the dataset with additional tasks and improving chatbot capabilities for diverse scenarios. METHODOLOGY Dialogues were generated using ChatGPT prompted by instructions tailored to simulate conversations between an expert and a user discussing artworks. Different behaviours in the chatbot and the user were included as part of the instructions. A total number of 4 behaviours are included: 1) the chatbot acts as an art expert or tour guide, providing information about a given artwork and answering questions from the user; 2) the chatbot acts as a tutor or professor, in which the chatbot asks questions to the user and the user may provide correct or incorrect answers. Then the chatbot will provide feedback to the user; 3) the chatbot will have an anthropic or non-anthropic behaviour. Meaning anthropic that the chatbot turns will include opinions or feelings that the chatbot could also experiment based on the artwork (the emotion information is extracted from the ArtEmis original human annotations); and 4) the user has a toxic behaviour (i.e., the user’s turns contain politically incorrect sentences that may contain harmful comments about the content of the artwork, the artists, the styles, or including questions that are provocative, aggressive or non-relevant). The released dataset is based on the ArtEmis dataset and extends it by incorporating dialogues, multiple behaviours and including metadata obtained to assess its quality. From the original dataset, we took a total of 800 artworks with a balanced distribution of emotions to avoid bias in the handling of emotions by the chatbot. A total of 13,870 dialogues were collected, including 378 unique artists, 26 different art styles, and balancing the 4 behaviours mentioned above. The dataset was automatically analysed by using ChatGPT and GPT-4 models on different tasks, e.g., detecting that the factual information provided in the dialogues also was the one provided in the instruction prompt during the generation. Then, instructing the models to detect the presence of toxic comments or anthropic behaviour. Finally, additional libraries and models such as Detoxify, Microsoft Azure Content Moderation Services or LlamaGuard from Meta, were used to automatically label dialogues and turns with labels to indicate toxicity and probabilities of the classification when possible. FILES - filename_codes.json: Contains a structured taxonomy with codes for identifying the different elements of the dataset. It includes codes for profiles, such as painting, expert, and user profiles. Additionally, it contains codes for various attributes such as emotions, toxicity and biases. - metadata.csv: Comma-separated values (CSV) file containing detailed information about each dialogue in the dataset. It includes data such as the author and style of the artwork, emotions, goals, roles, toxicity, and anthropology. This files server as a comprehensive reference for understanding the context and characteristics of each dialogue within the dataset. - prompts.csv: A CSV file that stores the prompts used in generating the dialogues by the ChatGPT model. These prompts provide instructions and guidelines for initiating conversations between the expert and user within the context of discussing artworks in a museum setting. - dialogues.csv: A CSV file containing the actual dialogues generated by the ChatGPT model. Each dialogue entry consists of conversational turns between the expert and user agents. - metrics.csv: A CSV file providing a summary of evaluation metrics obtained to assess the quality and characteristics of the generated dialogues. It includes dialogue-level metrics, toxicity level and categories, syntactic and semantic-based objective metrics, and sentiment analysis results. This file aids in evaluating the performance of the AI chatbot and identifying areas for improvement in dialogue generation. - toxic.csv: A CSV file that contains information about toxicity levels observed within the generated dialogues. It comprises boolean columns, one representing whether the dialogue should be toxic within the prompt, other whether toxicity detection using the Detoxify library with a toxic threshold of 0.4 has identified toxic content within the dialogue, other whether toxicity detection using the Microsoft Azure Content Moderator service has identified toxic content within the dialogue, and one indicates whether toxicity detection using the LLAMA Guard has identified toxic content within the dialogue.
Idioma	Inglés
Fecha de producción	2023-10-01
Información de la subvención	EC/HE: 101071191
Distribuidor	D'Haro Enríquez, Luis Fernando (Universidad Politécica de Madrid)
Fecha de distribución	2024-02-14
Depositante	D'Haro Enríquez, Luis Fernando
Fecha de depósito	2024-02-21
Software	MS Azure OpenAI API, specifically ChatGPT and GPT-4 versions 2023-03-15-preview Detoxify (https://github.com/unitaryai/detoxify Microsoft Azure Content Moderation Services (https://learn.microsoft.com/en-us/azure/ai-services/content-moderator/overview) LlamaGuard (https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/)
Datasets relacionados	Achlioptas, P., Ovsjanikov, M., Haydarov, K., Elhoseiny, M., & Guibas, L. J. (2021). Artemis: Affective language for visual art. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11569-11579).; Mohamed, Y., Khan, F. F., Haydarov, K., & Elhoseiny, M. (2022). It is okay to not be okay: Overcoming emotional bias in affective image captioning by contrastive data collection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 21263-21272)

Condiciones de uso del dataset

Licencia/Acuerdo de uso de los datos

Tanto por nuestras Normas de la comunidad como por las buenas prácticas científicas, se espera que se acredite su uso de forma correcta mediante una cita. Por favor, use la cita de datos mostrada en la página del dataset.

Creative Commons Attribution 4.0 International License. CC-BY-4.0

Versión del dataset	Resumen	Colaboradores	Publicado en
No se encontraron registros.

Editar fichero

Este fichero ha sido eliminado (o sustituído) en la versión actual. No puede editarse.

Ficheros restringidos

Añadiendo límites de acceso a los ficheros publicados. Puede añadir o editar las condiciones de uso para ficheros restringidos y permitir a los usuarios solicitar el acceso a los mismos.

Condiciones de acceso para ficheros restringidos

Pedir acceso

Habilitar la solicitud de acceso

Editar Embargo

El fichero o ficheros seleccionados ya se han publicado. Contacte con un administrador para cambiar la razón o la fecha del embargo del fichero o ficheros.

Borrar ficheros

Se borrará el fichero después de que pulse el botón Borrar.

Los ficheros no se eliminarán de las versiones publicadas previamente en el dataset.

Fichero(s) seleccionado(s)

Por favor, seleccione uno o más ficheros.

Compartir dataset

Compartir este dataset en sus redes sociales favoritas.

Citas del dataset

Las citas de este dataset son recolectadas desde Crossref mediante DataCite usando el estándar Make Data Count. Si quiere más información sobre estas estadísticas, puede mirar en la Guía de Usuario.

Lo siento, no se encontraron citas.

Fiches restringidos seleccionados

El/los fichero(s) seleccionado(s) no puede(n) descargarse porque no tiene derechos de acceso.

Opciones de descarga

Los ficheros seleccionados son demasiado grandes para descargarlos en un ZIP.

Puede seleccionar ficheros individuales que ocupen menos del límite de 9,3 GB en la tabla de ficheros, o usar el API de acceso a los datos para acceder a los ficheros mediante un programa.

Fichero(s) seleccionado(s)

Por favor seleccione el fichero o ficheros que quiere descargar.

Fiches restringidos seleccionados

El/los fichero(s) restringido(s) no puede(n) descargarse porque no tiene derechos de acceso.

Pulse en Continuar para descargar los ficheros a los que tiene acceso.

Eliminar dataset

¿Está seguro de que quiere eliminar el dataset?. No podrá deshacer la operación.

Eliminar versión preliminar

¿Está seguro de que quiere eliminar esta versión preliminar? No podrá deshacer la operación.

URL privada de dataset sin publicar

Las URLs privadas solo pueden usarse con versiones sin publicar de datasets.

URL privada de dataset sin publicar

¿Está seguro de que quiere deshabilitar la URL privada? Si ha compartido esta URL privada con otras personas, su dataset sin publicar dejará de estar accesible para ellos.

Borrar ficheros

Se borrará/n el/los fichero/s después de que pulse el botón Borrar.

Los ficheros no se eliminarán de las versiones publicadas previamente en el dataset.

Procesar

Este dataset contiene ficheros de acceso restringido que no puede procesar porque no tiene derechos de acceso.

Eliminar acceso al dataset

¿Está seguro de que quiere realizar la retirada? La(s) versión(es) seleccionada(s) no volverá(n) a estar disponible(s) para el público.

Eliminar acceso al dataset

¿Está seguro de que quiere retirar este dataset? No volverá a estar disponible para el público.

Detalles de las diferencias de versión

Por favor, seleccione dos versiones para ver sus diferencias.

Detalles de las diferencias de versión

Versión:
última modificación:

Fichero(s) seleccionado(s)

Por favor seleccione el fichero o ficheros a los que quiere pedir acceso.

Fichero(s) seleccionado(s)

No se puede acceder a los ficheros embargados. Puede seleccionar fichero(s) sin embargo en su petición de acceso.

Editar etiquetas

Seleccionar etiquetas existentes o crear otras nuevas que describan sus ficheros. Cuando se crea una etiqueta nueva, ésta se añade como una opción de etiqueta para todos los ficheros de este dataset. Cada fichero puede tener más de una etiqueta.

Petición de acceso

Tiene que Identificarse para solicitar acceso.

Condiciones de uso del dataset

Este dataset está disponible con las siguientes condiciones. Por favor, confirme y/o complete la siguiente información para continuar.

Licencia/Acuerdo de uso de los datos

Creative Commons Attribution 4.0 International License. CC-BY-4.0

Previsualizar libro de visitas

Tras descargar los ficheros del libro de visitas pregunta por la información siguiente.

Nombre del libro de visitas

Datos recogidos

Información de la cuenta

Descarga de fichero empaquetado

Use la URL de descarga con el comando wget o un gestor de descargas para descargar este fichero empaquetado. La descarga mediante un navegador web no se recomienda. Guía de usuario - Descarga de un archivo empaquetado de e-cienciaDatos mediante su URL

URL de descarga

https://edatos.consorciomadrono.es/api/access/datafile/

Petición de acceso

Puede confirmar y/o completar la información pedida para solicitar el acceso a los ficheros de este dataset.

Procesar lotes de trabajo

Limpiar procesos por lotes

Dataset	ID persistente del dataset	Cambiar lotes de trabajo

Procesar lotes de trabajo

Enviar a revisión

Enviar este dataset a revisión por el conservador/revisor de esta dataverse para su posible publicación.

Publicar dataset

¿Está seguro de que quiere volver a publicar este dataset?

Indique si es una actualización de versión mayor o menor.

Revisión menor (1.1)

Revisión mayor (2.0)

Licencia/Acuerdo de uso de los datos

Creative Commons Attribution 4.0 International License. CC-BY-4.0

Descripción de la licencia

Creative Commons Attribution 4.0 International License.

Publicar dataset

Este dataset no se puede publicar hasta que Universidad Politécnica de Madrid sea publicado por su administrador.

Publicar dataset

Este dataset no se puede publicar hasta que Universidad Politécnica de Madrid y e-cienciaDatos sean publicados.

Devolver al autor

Enviar este dataset al colaborador para su modificación.