Data Mining

To identify the main mechanisms involved in neurodegenerative disorders, we extracted a list of pathways and mechanistic knowledge from PubMed using SCAIView, a literature mining system from Fraunhofer SCAI. Because the literature contains a large number of synonyms, this list was preprocessed and curated, yielding a final inventory of pathways and mechanisms. The inventory served as a guideline for annotating each individual statement (triplet/assertion) in the disease models, which are encoded in the Biological Expression Language (BEL). We also made sure to include all well-known mechanisms (e.g., amyloid cascade, neuroinflammation, mitochondrial dysfunction) as entries in our mechanism repository, NeuroMMSig.
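The synonym curation step can be pictured as collapsing spelling variants onto one canonical mechanism name. The following is a minimal sketch of that idea only, not the actual NeuroMMSig pipeline: the synonym table, the `normalize` and `build_inventory` helpers, and the example terms are all hypothetical.

```python
from collections import defaultdict

# Hypothetical synonym table (assumption): normalized variant -> canonical name.
SYNONYMS = {
    "neuroinflammation": "Neuroinflammation",
    "neuro-inflammation": "Neuroinflammation",
    "neuro inflammation": "Neuroinflammation",
    "amyloid cascade": "Amyloid cascade",
    "amyloidogenic cascade": "Amyloid cascade",
}


def normalize(term):
    """Return the canonical mechanism name for a term, or None if unknown."""
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(term.lower().split())
    return SYNONYMS.get(key)


def build_inventory(raw_terms):
    """Group raw literature terms under their canonical mechanism name.

    Terms without a known canonical form are returned separately so that
    a curator can review them by hand.
    """
    inventory = defaultdict(set)
    unresolved = []
    for term in raw_terms:
        canonical = normalize(term)
        if canonical is None:
            unresolved.append(term)
        else:
            inventory[canonical].add(term)
    return dict(inventory), unresolved
```

Terms that do not resolve are kept apart rather than dropped, which mirrors the manual curation described above: an unknown term is a candidate for review, not an error.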

The next step was to annotate and evaluate each triplet in the models individually against the candidate mechanisms. To do so, we searched the literature and databases to determine which candidate mechanism(s) the entities in each BEL statement belong to, and we carefully reviewed each statement to establish which mechanism(s) it has been associated with in the literature. Not every statement was assigned to a mechanism, and some were assigned to several, because entities and their relationships can be involved in multiple pathophysiological mechanisms. Owing to the size of the models, this procedure took approximately one year of work for the Alzheimer's disease (AD) model and six months for the Parkinson's disease (PD) model. During this process, we also created database models that record, for instance, which entities were assigned to which candidate mechanisms, along with other multimodal enrichment data.
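Because a statement may belong to no mechanism, one mechanism, or several, the annotation is a many-to-many mapping. The sketch below shows one way such a mapping could be stored and inverted. It is an illustration under assumptions: the BEL-like statements use placeholder gene names, the mechanism labels are placeholders, and `index_by_mechanism` is a hypothetical helper, not part of any NeuroMMSig code.

```python
from collections import defaultdict

# Placeholder annotations (assumption): statement -> set of assigned mechanisms.
# An empty set means the statement was reviewed but not assigned.
annotations = {
    "p(HGNC:GENE_A) increases p(HGNC:GENE_B)": {"Mechanism A"},
    "p(HGNC:GENE_B) decreases p(HGNC:GENE_C)": {"Mechanism A", "Mechanism B"},
    "p(HGNC:GENE_D) association p(HGNC:GENE_E)": set(),
}


def index_by_mechanism(annotations):
    """Invert statement -> mechanisms into mechanism -> statements.

    Also returns the statements that were not assigned to any mechanism.
    """
    by_mechanism = defaultdict(set)
    unassigned = []
    for statement, mechanisms in annotations.items():
        if not mechanisms:
            unassigned.append(statement)
        for mechanism in mechanisms:
            by_mechanism[mechanism].add(statement)
    return dict(by_mechanism), unassigned


by_mechanism, unassigned = index_by_mechanism(annotations)
```

Here the second statement appears under both placeholder mechanisms, and the third is reported as unassigned, matching the two cases described in the text.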

Multimodal data are necessary to map biological entities to clinical studies in neurodegeneration, because these studies contain not only genetic markers but also variables ranging from brain scans to neuropsychological assessments. Conventional pathway analysis tools, such as Gene Set Enrichment Analysis (GSEA) with the Molecular Signatures Database (MSigDB), are limited to the molecular layer, and in particular to gene expression. In contrast, NeuroMMSig entries were enriched with imaging features, variant information (single nucleotide polymorphisms, SNPs), miRNAs, clinical studies, and drugs/chemicals, making them essentially multiscale and multimodal representations of candidate mechanisms. This richer mechanistic representation enables NeuroMMSig to accept not only molecular information (e.g., gene expression) but also data from the other modalities listed here. As a consequence, the NeuroMMSig approach overcomes several of the limitations associated with conventional pathway analysis tools.
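To make the contrast with gene-only tools concrete, a mechanism entry can be modelled as a record with one set per modality, so that a query containing only imaging features or SNPs can still match. This is a simplified sketch under assumptions: the `MechanismEntry` class, its field names, the `overlap_by_modality` helper, and all example values are hypothetical and do not reflect the actual NeuroMMSig schema or scoring.

```python
from dataclasses import dataclass, field

# Modalities named in the text (clinical studies are omitted for brevity).
MODALITIES = ("genes", "snps", "mirnas", "imaging_features", "drugs")


@dataclass
class MechanismEntry:
    """Hypothetical multimodal record for one candidate mechanism."""
    name: str
    genes: set = field(default_factory=set)
    snps: set = field(default_factory=set)
    mirnas: set = field(default_factory=set)
    imaging_features: set = field(default_factory=set)
    drugs: set = field(default_factory=set)


def overlap_by_modality(entry, query):
    """Return, per modality, the query items that the entry also contains.

    `query` maps a modality name to an iterable of identifiers.
    Modalities without any shared item are left out of the result.
    """
    hits = {}
    for modality in MODALITIES:
        shared = getattr(entry, modality) & set(query.get(modality, ()))
        if shared:
            hits[modality] = shared
    return hits


# Placeholder entry and an imaging-only query (no gene information at all).
entry = MechanismEntry(
    name="Mechanism A",
    genes={"GENE_A", "GENE_B"},
    imaging_features={"FEATURE_1"},
)
hits = overlap_by_modality(entry, {"imaging_features": ["FEATURE_1", "FEATURE_2"]})
```

A gene-set tool would have nothing to compare in this query, whereas the per-modality record still reports the shared imaging feature.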

The following list links to information on all our data mining approaches: