Interoperable PRECISE4Q Data Management and Harmonisation Module

Create a harmonised data infrastructure to normalise and precisely characterise data from heterogeneous, longitudinal sources that differ in format, language and degree of structure.

The first objective is to provide the data infrastructure as a prerequisite for modelling. To this end, a data warehouse capable of hosting and providing the data used in the study will be created. The data warehouse will be built once the characteristics of all potential data sources, the user needs and the data exchange needs have been identified. It will adhere to the applicable legal and technological standards, allowing state-of-the-art data storage and exchange, a crucial prerequisite for the project. Furthermore, data will be harmonised to allow pooling of similar data across heterogeneous data sources. Data harmonisation efforts will revolve around collecting requirements for data integration and modelling, constructing a thesaurus for the languages used in the data sources and, building on it, a common ontology-based data model. In addition, natural language processing will be used to structure clinical texts. Data harmonisation and model development will form an iterative process organised in release cycles. To allow early development of models, the first release cycle of harmonised data will comprise the structured data, whereas a later release will provide the data extracted from unstructured sources, which take more time to process.
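
As an illustration only, the thesaurus-based pooling described above might look like the following minimal Python sketch. It is not the project's implementation: the thesaurus entries, the concept identifiers (`CONCEPT:...`), the record fields and the source names are all hypothetical, chosen to show how terms in different languages could be mapped to one shared concept before records are pooled.

```python
from collections import defaultdict

# Hypothetical thesaurus: (language, local term) -> common concept identifier.
# The identifiers are placeholders, not codes from any real ontology.
THESAURUS = {
    ("en", "stroke"): "CONCEPT:STROKE",
    ("de", "schlaganfall"): "CONCEPT:STROKE",
    ("es", "ictus"): "CONCEPT:STROKE",
    ("en", "hypertension"): "CONCEPT:HYPERTENSION",
    ("de", "bluthochdruck"): "CONCEPT:HYPERTENSION",
}


def harmonise(record):
    """Map one source record to the common model; return None if the term is unknown."""
    key = (record["lang"], record["term"].strip().lower())
    concept = THESAURUS.get(key)
    if concept is None:
        return None
    return {
        "patient_id": record["patient_id"],
        "concept": concept,
        "source": record["source"],
    }


def pool(records):
    """Group harmonised records by concept and collect unmapped records separately."""
    pooled = defaultdict(list)
    unmapped = []
    for record in records:
        harmonised = harmonise(record)
        if harmonised is None:
            unmapped.append(record)  # kept for review, e.g. to extend the thesaurus
        else:
            pooled[harmonised["concept"]].append(harmonised)
    return dict(pooled), unmapped


# Hypothetical structured records from three sources in three languages.
records = [
    {"source": "hospital_a", "lang": "en", "patient_id": "a-1", "term": "Stroke"},
    {"source": "hospital_b", "lang": "de", "patient_id": "b-7", "term": "Schlaganfall "},
    {"source": "hospital_b", "lang": "de", "patient_id": "b-8", "term": "Bluthochdruck"},
    {"source": "hospital_c", "lang": "es", "patient_id": "c-3", "term": "cefalea"},
]

pooled, unmapped = pool(records)
```

In this sketch, records whose terms are missing from the thesaurus are set aside rather than dropped, which fits an iterative process in which each release cycle can extend the thesaurus and re-run the mapping.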