Dataset persistent ID
|
doi:10.21950/1RRAWJ |
Publication date
|
2021-05-04 |
Title
| HESML V1R5 Java software library of ontology-based semantic similarity measures and information content models |
Author
| Lastra-Díaz, Juan J. (Universidad Nacional de Educación a Distancia (UNED)) - ORCID: orcid.org/0000-0003-2522-4222
Lara-Clares, Alicia (Universidad Nacional de Educación a Distancia (UNED))
Garcia-Serrano, Ana (Universidad Nacional de Educación a Distancia (UNED)) |
Contact
|
Lastra-Díaz Juan J. (Universidad Nacional de Educación a Distancia (UNED)) |
Description
| This dataset introduces HESML V1R5, the fifth release of the Half-Edge Semantic Measures Library (HESML) detailed in [13]. HESML V1R5 is a linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models for ontologies such as WordNet, SNOMED-CT, MeSH, GO, and any other ontology based on the OBO file format. HESML V1R5 implements most ontology-based semantic similarity measures and IC models reported in the literature, and it supports the evaluation of three pre-trained word embedding models. It also provides an XML-based input file format for specifying reproducible word/concept similarity experiments based on WordNet, SNOMED-CT, MeSH, or GO without any software coding. HESML V1R5 introduces the following novelties: (1) the parsing and in-memory representation of SNOMED-CT, MeSH, and any other ontology based on the OBO file format, such as the Gene Ontology (GO); (2) a new collection of efficient path-based similarity measures, obtained by reformulating previous path-based measures on top of the new Ancestors-based Shortest-Path Length (AncSPL) algorithm; and (3) a collection of groupwise similarity measures. The HESML library is freely distributed for any non-commercial purpose under a CC BY-NC-SA 4.0 license, subject to citing the two main HESML papers as an attribution requirement. However, the HESML distribution also includes other datasets, databases, and data files whose use requires attribution acknowledgement by any HESML user. Thus, we urge HESML users to comply with the licensing terms of the other resources distributed with the library, as detailed in its companion release notes. (2020-04-30) |
Subject
| Computer and Information Science |
Keyword
| HESML
semantic measures library
Ontology-based semantic similarity measures
Word embeddings
Information Content (IC) models
WordNet
UMLS
SNOMED-CT
MeSH
Gene Ontology (GO) |
Related publication
| J.J. Lastra-Díaz, A. Lara-Clares, A. García-Serrano, HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey, BMC Bioinformatics. 23:23 (2022). doi: 10.1186/s12859-021-04539-0 https://rdcu.be/cEvsU
Lastra-Díaz, J.J., García-Serrano, A., 2015b. A novel family of IC-based similarity measures with a detailed experimental survey on WordNet. Eng. App. of Artif. Intell. 46, 140–153.
Lastra-Díaz, J.J., García-Serrano, A., 2015a. A new family of information content models with an experimental survey on WordNet. Knowledge-Based Systems 89, 509–526.
Lastra-Díaz, J.J., García-Serrano, A., 2016a. A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet. Technical Report TR-2016-01. UNED. http://e-spacio.uned.es/fez/view/bibliuned:DptoLSI-ETSI-Informes-Jlastra-refinement
Lastra-Díaz, J.J., 2017. Recent Advances in Ontology-based Semantic Similarity Measures and Information Content Models based on WordNet. Ph.D. thesis. Universidad Nacional de Educación a Distancia (UNED). http://e-spacio.uned.es/fez/view/tesisuned:ED-Pg-SisInt-Jjlastra
Lastra-Díaz, J.J., Goikoetxea, J., Hadj Taieb, M.A., García-Serrano, A., Ben Aouicha, M., Agirre, E., 2019b. A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art. Engineering Applications of Artificial Intelligence 85, 645–665.
Lastra-Díaz, J.J., Goikoetxea, J., Hadj Taieb, M.A., García-Serrano, A., Ben Aouicha, M., Agirre, E., 2019c. Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity. Data in Brief.
Lastra-Díaz, J.J., Goikoetxea, J., Hadj Taieb, M., García-Serrano, A., Ben Aouicha, M., Agirre, E., 2019a. A large reproducible benchmark of ontology-based methods and word embeddings for word similarity. Submitted to Information Systems.
Lastra-Díaz, J.J., García-Serrano, A., 2018. HESML V1R4 Java software library of ontology-based semantic similarity measures and information content models. Mendeley Data, v4. doi: 10.17632/t87s78dg78.4 http://dx.doi.org/10.17632/t87s78dg78.4
Lastra-Díaz, J.J., García-Serrano, A., 2017. HESML V1R3 Java software library of ontology-based semantic similarity measures and information content models. Mendeley Data, v3.
Lastra-Díaz, J.J., García-Serrano, A., 2016d. HESML vs SML: scalability and performance benchmarks between the HESML V1R2 and SML 0.9 semantic measures libraries. Mendeley Data, v1. doi: 10.17632/5hg3z85wf4.1 http://doi.org/10.17632/5hg3z85wf4.1
Lastra-Díaz, J.J., García-Serrano, A., 2016c. HESML V1R2 Java software library of ontology-based semantic similarity measures and information content models. Mendeley Data, v2.
Lastra-Díaz, J.J., García-Serrano, A., 2016f. WordNet-based word similarity reproducible experiments based on HESML V1R1 and ReproZip. Mendeley Data, v1. doi: 10.17632/65pxgskhz9.1 http://doi.org/10.17632/65pxgskhz9.1
Lastra-Díaz, J.J., García-Serrano, A., 2016b. HESML V1R1 Java software library of ontology-based semantic similarity measures and information content models. Mendeley Data, v1.
Lastra-Díaz, J.J., García-Serrano, A., 2016e. WNSimRep: a framework and replication dataset for ontology-based semantic similarity measures and information content models. Mendeley Data, v1. doi: 10.17632/mpr2m8pycs.1 http://doi.org/10.17632/mpr2m8pycs.1
Aronson, A.R., 2001. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program, in: Proceedings of the AMIA Annual Symposium, ncbi.nlm.nih.gov. pp. 17–21.
Miller, G.A., 1995. WordNet: A Lexical Database for English. Communications of the ACM 38, 39–41. |
Notes
| This work was partially supported by the UNED predoctoral grant started in April 2019 (BICI N7, November 19th, 2018). |
Language
| English |
Production date
| 2020-04-30 |
Grant information
| UNED: BICI N7 |
Depositor
| Admin, Dataverse |
Deposit date
| 2020-07-21 |
Software
| NetBeans, Version: 8
Java, Version: 8 |
Related datasets
| Lastra-Díaz, J.J., Goikoetxea, J., Hadj Taieb, M.A., García-Serrano, A., Ben Aouicha, M., Agirre, E., 2019d. Word similarity benchmarks of recent word embedding models and ontology-based semantic similarity measures. e-cienciaDatos, v1. http://dx.doi.org/10.21950/AQ1CVX. |