
Data processing and analysis toolbox for AICCELERATE

8 - September

Data is the key to unlocking the full potential of AI solutions in healthcare. The challenge, however, is to make successful use of the abundance of data that has already been collected. Collecting data is not enough: advanced processing and analysis technologies are needed to put health data into context and foster a smarter hospital of the future, where AI-based technologies improve efficiency at various levels, including diagnosis, treatment recommendations, and patient flow management. The quality of health data is the cornerstone of any AI model and determines whether a solution is robust enough to be used under real-life conditions. Electronic health records (EHRs) are a highly valuable source of data, which Symptoma helps to make available with its proprietary data processing and analysis toolbox.

Symptoma’s toolbox for automated medical content analysis as the key to unlocking information in EHRs

By storing patient data digitally, electronic health records (EHRs) provide a comprehensive overview of a patient’s medical history and enable an efficient exchange of information between different healthcare providers. The information they contain can also be analyzed at the population level (e.g., for people with a particular disease) to gain new insights into diseases or possible therapeutic approaches.

Data in a structured form, such as age or gender, is easily machine-processable and is therefore already used in many healthcare settings. EHRs, however, contain not only structured data but also a wealth of free text, such as findings, notes from healthcare professionals, or discharge summaries. Free text presents a challenge: making the valuable information it contains understandable and thus usable by machines requires a specialized approach, so-called “natural language processing” (NLP).

Symptoma’s toolbox will unlock information from free text in EHRs by preparing and enriching the data, so that important outcomes can be predicted and meaningful insights extracted for the Smart Hospital Care Pathway Engine (SHCP) in the AICCELERATE project.

Concept extraction to improve accuracy

One example of information extraction from free text is so-called “concept extraction”. This method addresses the problem that many different words and phrases can express, or relate to, the same condition.

For example, let’s take the following clinical note that a healthcare professional might record in a patient’s file during admission to the hospital: “She reports fever and chills”.

With the help of concept extraction, which is based on a large knowledge base of symptoms and diseases, it is possible to extract the symptoms from this clinical note and assign them to the correct concepts “fever” and “chills”.

Does the word “fever” have to appear exactly the same in the text to be recognized? No! By mapping synonyms and associated concepts in our knowledge base, we can also extract the relevant concept (fever) from sentences such as “She reports high temperature”, or even put symptoms into a broader context and find potential causes for them.
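The synonym mapping described above can be sketched in a few lines of Python. This is only an illustrative toy, not Symptoma’s actual method: the SYNONYMS dictionary and the extract_concepts function are hypothetical names, and a real knowledge base of symptoms and diseases would be far larger and more sophisticated than a simple phrase lookup.

```python
import re

# Hypothetical toy knowledge base: surface phrase -> canonical concept.
SYNONYMS = {
    "fever": "fever",
    "high temperature": "fever",
    "chills": "chills",
    "shivering": "chills",
}


def extract_concepts(note):
    """Return the canonical concepts found in a note, in order of appearance."""
    text = note.lower()
    hits = []
    for phrase, concept in SYNONYMS.items():
        # Match whole words or phrases only, so "fever" does not match "feverfew".
        match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
        if match:
            hits.append((match.start(), concept))
    concepts = []
    for _, concept in sorted(hits):
        # Different phrases may map to the same concept; keep it once.
        if concept not in concepts:
            concepts.append(concept)
    return concepts


print(extract_concepts("She reports fever and chills"))
print(extract_concepts("She reports high temperature"))
```

Both example notes from the text resolve to the concept “fever”, even though the second one never uses that word.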

What differentiates Symptoma from other solutions currently available is that its technology is not limited to recognizing medical terms and expressions. In the clinical note “Patient feels sick after eating tiramisu,” for example, Symptoma correctly suggests ‘salmonella’ as a possible cause for the patient’s symptoms. This ability to handle the everyday language of both doctors and patients is one of its main strengths for use in real-life healthcare settings.

Dealing with the peculiarities of clinical text

Clinical free text, as found in EHRs, has some peculiarities compared to general domain text or even biomedical texts, such as scientific publications or textbooks. These peculiarities make machine processing even more challenging and call for tools that can still extract the relevant information correctly.

In clinical notes, such as those recorded by healthcare workers in medical records during ward rounds, these peculiarities become evident at first glance: clinical notes are often written in short, keyword-like form and contain medical jargon, abbreviations that may be ambiguous, or implicit information that is only apparent from the context. Further, they often contain a large number of negations, since a significant part of medical documentation and decision-making is concerned with ruling out symptoms or diseases.

Therefore, another important component in the processing of clinical texts is the correct identification of negations. In a sentence like “She denies any fever but reports nausea”, “fever” must not be extracted as a present symptom, whereas “nausea” must be recognized as one. To enable this important distinction between present and negated symptoms and diseases, so-called “negation detection” tools are used, which significantly increase the accuracy of concept extraction.
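A minimal sketch of rule-based negation detection is shown below. It is a simplified illustration, not the tool used in the project: the word lists and the classify_concepts function are assumptions made for this example. A trigger word such as “denies” opens a negation scope, and a word such as “but” closes it again.

```python
import re

# Hypothetical word lists for illustration only.
NEGATION_TRIGGERS = {"denies", "no", "not", "without"}
SCOPE_TERMINATORS = {"but", "however", "although"}
CONCEPTS = {"fever", "nausea", "chills"}


def classify_concepts(sentence):
    """Split the concepts in a sentence into (present, negated) lists."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    negated = False
    present, absent = [], []
    for token in tokens:
        if token in NEGATION_TRIGGERS:
            negated = True       # everything after the trigger is negated ...
        elif token in SCOPE_TERMINATORS:
            negated = False      # ... until a terminator ends the scope
        elif token in CONCEPTS:
            (absent if negated else present).append(token)
    return present, absent


present, absent = classify_concepts("She denies any fever but reports nausea")
print("present:", present)
print("negated:", absent)
```

For the example sentence from the text, “nausea” ends up in the present list and “fever” in the negated list. Real clinical notes need far richer rules, or a learned model, to handle abbreviations and implicit context.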

Embeddings to increase model performance

Another promising way to make the information contained in free text usable for downstream tasks is the use of so-called ‘embeddings’: entities such as words, sentences, or documents are mapped into a vector space using statistical methods or machine learning, so that entities with a greater similarity in meaning (e.g., fever, high temperature) are closer together than entities with less similarity (e.g., fever, blindness).
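The notion of “closer together” can be made concrete with cosine similarity, one common way to compare vectors. The three-dimensional vectors in this sketch are invented by hand purely for illustration; real embeddings are learned from data and typically have hundreds of dimensions.

```python
import math

# Hand-made toy vectors (assumption): real embeddings are learned, not written by hand.
TOY_EMBEDDINGS = {
    "fever": [0.9, 0.1, 0.0],
    "high temperature": [0.8, 0.2, 0.1],
    "blindness": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means very similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


fever = TOY_EMBEDDINGS["fever"]
print(cosine_similarity(fever, TOY_EMBEDDINGS["high temperature"]))
print(cosine_similarity(fever, TOY_EMBEDDINGS["blindness"]))
```

With these toy vectors, “fever” scores much higher against “high temperature” than against “blindness”, mirroring the example pairs in the text.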

Embeddings based on state-of-the-art deep learning models have led to major advances in a variety of NLP tasks in recent years and can also be used as additional variables in classic machine learning models to increase model performance. Further, embeddings of more complex data structures, such as knowledge graphs, can be learned, representing all entities and relations of this data structure while preserving their semantic meaning.
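Using embeddings as additional variables usually amounts to appending the embedding values to the structured variables, so that a classic model sees one combined feature vector per patient. The sketch below assumes a hypothetical build_feature_vector helper, an invented patient record, and an invented three-dimensional note embedding.

```python
def build_feature_vector(structured, note_embedding):
    """Concatenate structured variables with a free-text embedding."""
    # Structured part: age as a number, gender as a simple 0/1 indicator.
    structured_part = [
        float(structured["age"]),
        1.0 if structured["gender"] == "female" else 0.0,
    ]
    # Embedding part: appended unchanged as extra columns.
    return structured_part + list(note_embedding)


patient = {"age": 54, "gender": "female"}  # hypothetical structured data
note_embedding = [0.8, 0.2, 0.1]           # hypothetical embedding of a clinical note

features = build_feature_vector(patient, note_embedding)
print(features)
```

The resulting list could then be passed as one row of the input matrix to any classic machine learning model.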

In the AICCELERATE project, a knowledge graph is created that enables a multi-relational and multi-modal representation of patients. Based on this knowledge graph, embeddings are learned, which will serve as a comprehensive patient representation that can be used for all AI models.
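A knowledge graph is often stored as a list of (subject, relation, object) triples. The sketch below illustrates what a multi-relational patient representation could look like in that form. All entities, relation names, and helper functions are hypothetical, and the structure of the project’s actual graph is not described here.

```python
from collections import defaultdict

# Hypothetical triples mixing information from free text and structured data.
TRIPLES = [
    ("patient_1", "has_symptom", "fever"),
    ("patient_1", "has_symptom", "chills"),
    ("patient_1", "has_age_group", "50-59"),
    ("patient_2", "has_symptom", "nausea"),
    ("fever", "may_indicate", "infection"),
]


def build_graph(triples):
    """Index triples as graph[subject][relation] -> list of objects."""
    graph = defaultdict(lambda: defaultdict(list))
    for subject, relation, obj in triples:
        graph[subject][relation].append(obj)
    return graph


def describe(graph, entity):
    """Return all outgoing relations of an entity as a plain dictionary."""
    relations = graph.get(entity, {})
    return {relation: list(objects) for relation, objects in relations.items()}


graph = build_graph(TRIPLES)
print(describe(graph, "patient_1"))
```

An embedding method would then learn one vector per entity and relation from such triples, so that each patient node ends up with a vector summarizing its connections.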

By correctly extracting and enriching health data, Symptoma plays an important role in paving the way for a holistic representation of patient data. This in turn enables the development of patient-facing AI solutions, one of the main focuses of the AICCELERATE project.

Further information: www.symptoma.com

Kathrin Blagec – Data Scientist, Symptoma GmbH
Elisabeth Rabl – Marketing Assistant, Symptoma GmbH