SCAIView Literature Mining

The information retrieval system SCAIView Neuro supports such semantic searches in large text collections by combining free-text search with ontological representations of the entities identified by ProMiner, the named entity recognition system developed by Fraunhofer SCAI. ProMiner is used to pre-process and index the biomedical publications referenced in PubMed (currently more than 29 million documents) as well as documents from other incorporated sources. Queries can be run by human users through a graphical web interface (SCAIView Neuro) or by any information system through a programming interface (SCAIView API).

Figure: SCAIView Neuro user interface for interactive queries on documents

Such searches return answers to questions such as “Which genes / proteins / SNPs / drugs are related to a certain disease, pathway or epigenetics?”
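To illustrate the idea behind such a query, the following minimal sketch combines a free-text filter with entity annotations of the kind a named entity recognition step might produce. It does not use the SCAIView API: the `Document` class, the `related_entities` function and the three toy documents are assumptions made purely for illustration, and the ranking is a plain document count.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    # (entity_type, concept_name) pairs, as an NER step might produce them
    annotations: set = field(default_factory=set)


def related_entities(docs, free_text, concept, wanted_type):
    """Count entities of `wanted_type` in documents that match both
    the free-text term and the given annotated concept."""
    counts = Counter()
    for doc in docs:
        if free_text.lower() not in doc.text.lower():
            continue  # free-text filter
        if concept not in doc.annotations:
            continue  # semantic (concept) filter
        for entity_type, name in doc.annotations:
            if entity_type == wanted_type:
                counts[name] += 1
    return counts.most_common()


# Toy corpus, invented for illustration only
DOCS = [
    Document("d1", "APOE variants modulate amyloid deposition.",
             {("disease", "Alzheimer Disease"), ("gene", "APOE")}),
    Document("d2", "APOE and TREM2 influence amyloid clearance.",
             {("disease", "Alzheimer Disease"), ("gene", "APOE"),
              ("gene", "TREM2")}),
    Document("d3", "LRRK2 mutations in familial cases.",
             {("disease", "Parkinson Disease"), ("gene", "LRRK2")}),
]

hits = related_entities(DOCS, "amyloid",
                        ("disease", "Alzheimer Disease"), "gene")
print(hits)
```

In this sketch a gene is ranked by the number of matching documents in which it is annotated; a production system would typically use a statistically motivated relevance score instead.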

These text mining methods have proved efficient for extracting protein-protein interactions. They can be extended to identify relationships between other concept types such as chemicals, proteins, phenotypes and/or drugs. Technically, this step requires applying a syntactic parser to the document corpus and analysing the syntactic dependencies with the help of a vocabulary of key terms (typically verbs, their nominalized forms and synonyms) that are known to express biomedical relations or processes. The goal is to separate sentences in which the bio-entities merely co-occur from those in which they show some sort of interaction or typed relatedness.
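The distinction between co-occurrence and a typed relation can be sketched as follows. This is a deliberately simplified stand-in, not the parser-based method described above: instead of analysing syntactic dependencies, it only checks whether a key term occurs between the two entity mentions. The vocabulary `RELATION_TERMS` and the function `classify_sentence` are hypothetical names chosen for this example.

```python
import re

# Hypothetical vocabulary of key terms (verbs and nominalized forms)
RELATION_TERMS = {
    "binds", "binding", "inhibits", "inhibition",
    "phosphorylates", "phosphorylation", "activates", "activation",
}


def classify_sentence(sentence, entity_a, entity_b):
    """Return ('relation', term) if a key term lies between the two entity
    mentions, ('co-occurrence', None) if both entities occur without one,
    and ('none', None) if at least one entity is missing."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9\-]+", sentence)]
    a, b = entity_a.lower(), entity_b.lower()
    if a not in tokens or b not in tokens:
        return ("none", None)
    # Positions of the first mention of each entity, in sentence order
    start, end = sorted((tokens.index(a), tokens.index(b)))
    for token in tokens[start + 1:end]:
        if token in RELATION_TERMS:
            return ("relation", token)
    return ("co-occurrence", None)


print(classify_sentence("GSK3B phosphorylates MAPT in neurons.", "GSK3B", "MAPT"))
print(classify_sentence("GSK3B and MAPT were both measured.", "GSK3B", "MAPT"))
```

A token-window heuristic like this misses relations expressed outside the span between the two mentions and ignores negation, which is why a dependency-based analysis is preferable in practice.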

An essential activity in AETIONOMY is to generate hypotheses about multiscale mechanisms of neurodegenerative pathophysiology. To retrieve the main mechanisms involved in these neurodegenerative disorders, a list of pathways and mechanistic knowledge was extracted from PubMed using SCAIView Neuro. Because of the large number of synonyms found in the literature, this list was preprocessed and curated, leading to a final inventory of pathways and mechanisms that served as a guideline for annotating each individual triple. The inventory also includes all well-known mechanisms (e.g., amyloid cascade, neuro-inflammation, mitochondrial dysfunction…), which were stored in our mechanism repository NeuroMMSig (storing 126 AD and 76 PD mechanisms).
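As a rough illustration of the synonym handling involved in such a curation step, the sketch below maps raw mechanism names onto canonical names and collects unmapped terms for manual review. The synonym table and the function `build_inventory` are assumptions made for this example; they do not reproduce the actual curation procedure or the content of NeuroMMSig.

```python
# Hypothetical synonym table: variant spelling -> canonical mechanism name
SYNONYMS = {
    "amyloid cascade": "amyloid cascade",
    "amyloid cascade hypothesis": "amyloid cascade",
    "neuroinflammation": "neuro-inflammation",
    "neuro-inflammation": "neuro-inflammation",
    "mitochondrial dysfunction": "mitochondrial dysfunction",
    "mitochondrial impairment": "mitochondrial dysfunction",
}


def build_inventory(raw_terms):
    """Map raw terms to canonical names.

    Returns (inventory, unmapped): a sorted list of canonical names and
    the raw terms that need manual curation."""
    inventory, unmapped = set(), []
    for term in raw_terms:
        # Normalize case and collapse repeated whitespace before lookup
        key = " ".join(term.lower().split())
        if key in SYNONYMS:
            inventory.add(SYNONYMS[key])
        else:
            unmapped.append(term)
    return sorted(inventory), unmapped


raw = ["Neuroinflammation", "amyloid  cascade hypothesis",
       "Mitochondrial impairment", "tau spreading"]
inventory, unmapped = build_inventory(raw)
print(inventory, unmapped)
```

Terms that end up in `unmapped` are the ones a curator would inspect and either add to the synonym table or register as a new mechanism.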