Enriching information extraction pipelines in clinical decision support systems

Almeida, João Rafael Duarte de

Supervised by:

A. Pazos (Supervisor)
José Luis Oliveira (Co-supervisor)

University of defence: Universidade da Coruña

Date of defence: 20 March 2023

Examination committee:

Rui Pedro Lopes (Chair)
Virginia Mato-Abad (Secretary)
Joel Perdiz Arrais (Member)

Department:

Ciencias de la Computación y Tecnologías de la Información

Type: Thesis

Teseo: 797964

Abstract

Multicentre health studies increase the impact of medical research findings because of the number of subjects they can engage. To simplify the execution of these studies, the data-sharing process should be effortless, for instance through the use of interoperable databases. However, achieving this interoperability remains an open research topic, mainly because of data governance and privacy issues. In the first stage of this work, we propose several methodologies to optimise the harmonisation pipelines of health databases. This stage focused on harmonising heterogeneous data sources into a standard data schema, namely the OMOP CDM, which has been developed and promoted by the OHDSI community. We validated our proposal using datasets of Alzheimer’s disease patients from distinct institutions. In the following stage, aiming to enrich the information stored in OMOP CDM databases, we investigated solutions to extract clinical concepts from unstructured narratives, using information retrieval and natural language processing techniques. These solutions were validated on datasets provided in scientific challenges, namely the National NLP Clinical Challenges (n2c2). In the final stage, we aimed to simplify the protocol execution of multicentre studies by proposing novel solutions for profiling, publishing and facilitating the discovery of databases. Some of the developed solutions are currently being used in three European projects aiming to create federated networks of health databases across Europe.