Approaches for Patient Subgroup Stratification

AETIONOMY aims to establish a molecular disease taxonomy of neurodegenerative diseases. At its core, this goal implies the existence of molecularly defined patient subgroups, which could diverge from the current classification of neurodegenerative diseases. As outlined above, AETIONOMY has taken a knowledge-driven approach to defining AD and PD disease mechanisms. The question is whether these mechanisms, possibly in combination, can discriminate patient subgroups and, specifically, help identify mixed AD/PD subtypes. The latter would call for a substantial revision of the way in which neurodegenerative diseases are currently understood.

To address these questions, partners UCB and Fraunhofer have established a data mining methodology that groups AD and PD patients using SNP-based genotypes and 15 shared AD/PD mechanisms derived from BEL. This mechanism-enhanced approach involves mapping SNPs to the genes encoded in the 15 molecular mechanisms, followed by dimensionality reduction (e.g., autoencoder networks) and clustering with a mixture of autoencoders and sparse non-negative matrix factorization (NMF). Our method has been applied to a merged ADNI and PPMI dataset, which contains de novo AD/PD patients as well as patients who converted to AD during the course of the study. The identified clusters were well separated and statistically stable, and (after correction for age, ethnicity, and gender effects) they showed statistically significant differences with respect to key clinical features of AD, such as language and cognition tests as well as specific brain volumes. A similar assessment with respect to clinical PD properties is ongoing.
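The grouping step can be illustrated with a minimal sketch under simplifying assumptions: minor-allele counts are summed per mechanism through hypothetical SNP-to-gene and gene-to-mechanism tables, and the resulting non-negative patient × mechanism matrix is factorized with a sparse NMF (Lee-Seung multiplicative updates with an L1 penalty on the patient loadings W). The autoencoder-based dimensionality reduction and the mixture of autoencoders are omitted. All identifiers (`SNP_TO_GENE`, `MECHANISMS`, `sparse_nmf`, the rsIDs and mechanism names) are illustrative placeholders, not the project's actual mappings or code.

```python
import math
import random

# Hypothetical annotations for illustration only (NOT the 15 BEL-derived mechanisms).
SNP_TO_GENE = {"rs1": "APOE", "rs2": "TREM2", "rs3": "LRRK2", "rs4": "SNCA"}
MECHANISMS = {
    "lipid_metabolism": {"APOE"},
    "immune_response": {"TREM2"},
    "vesicle_trafficking": {"LRRK2", "SNCA"},
}


def mechanism_features(genotypes):
    """Sum minor-allele counts (0/1/2) of all SNPs whose gene lies in each mechanism."""
    return [
        float(sum(c for snp, c in genotypes.items() if SNP_TO_GENE.get(snp) in genes))
        for genes in MECHANISMS.values()
    ]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def sparse_nmf(v, k, l1=0.1, iters=500, seed=0, eps=1e-9):
    """Approximate V (patients x features) by W @ H with W, H >= 0 and an L1 penalty on W."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H update: standard multiplicative rule for the squared Frobenius loss.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # Rescale each H row to unit L2 norm; W absorbs the scale, so W @ H is unchanged.
        # Without this, the L1 penalty could be evaded by shrinking W and inflating H.
        for i in range(k):
            norm = math.sqrt(sum(x * x for x in h[i])) or 1.0
            h[i] = [x / norm for x in h[i]]
            for r in range(n):
                w[r][i] *= norm
        # W update: the L1 weight in the denominator shrinks weak loadings toward zero.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[r][i] * num[r][i] / (den[r][i] + l1 + eps) for i in range(k)] for r in range(n)]
    return w, h


def assign_clusters(w):
    """Hard cluster label = index of the largest loading in each patient's row of W."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Toy cohort: three "lipid/immune" carriers and three "vesicle trafficking" carriers.
patients = [
    {"rs1": 2, "rs2": 1}, {"rs1": 2, "rs2": 2}, {"rs1": 1, "rs2": 1},
    {"rs3": 2, "rs4": 1}, {"rs3": 1, "rs4": 1}, {"rs3": 2, "rs4": 2},
]
V = [mechanism_features(p) for p in patients]
W, H = sparse_nmf(V, k=2)
labels = assign_clusters(W)
```

Placing the sparsity penalty on W rather than H is a design choice that favors patients loading on few factors, which makes the argmax assignment less ambiguous; other sparse NMF variants penalize H or constrain sparsity differently.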

The next step is a validation of the established grouping using genotypes from the independent ROSMAP cohort (AD) and the AETIONOMY PD study.
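One common way to quantify whether a grouping carries over to an independent cohort, or how stable it is under resampling, is the adjusted Rand index (ARI) between two labelings of the same patients, for example labels transferred from the original model versus labels obtained by re-clustering the validation cohort. The following plain-Python ARI is a sketch under that assumption; it is not necessarily the validation procedure used in the project.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert & Arabie) between two labelings of the same samples.

    1.0 means identical partitions (label names do not matter); values near 0
    mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_a)
    # Pairs of samples grouped together in both labelings (contingency-table cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together within each labeling separately (row and column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings put everything in one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Same partition with permuted label names -> 1.0
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# One cluster split in two -> 4/7, roughly 0.571
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```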

In parallel, partner UCB is working on subgroup identification using other omics data types available from ROSMAP: proteomics, DNA methylation, ChIP-seq, and gene expression. Moreover, these data will be used to understand differences between the genotype-based clusters.

Bayesian Modelling

Our multi-modal GBM model can be used to stratify patients: the figure below shows that the GBM model yields clearly separated cumulative hazard curves for the 10% of patients with the highest (red) and the lowest (green) predicted AD risk, respectively. The two patient groups differ significantly in a number of features, e.g., PET imaging diagnosis and neuropsychological test results. Moreover, there was a significant difference in APOE4 status and rs405509 genotype: ~73% of high-risk patients carry at least one APOE ε4 allele. According to dbSNP, rs405509 is located in the APOE gene region, and it synergizes with the APOE ε4 allele in the impairment of cognition. The T allele has been identified as a risk factor for AD in the literature (Ma et al. 2016).
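How such curves can be derived is sketched below, assuming the model's per-patient risk scores are already available: patients are ranked by predicted risk, the top and bottom 10% are selected, and the Nelson-Aalen estimator gives the cumulative hazard of conversion to AD in each group. The function names and input format are hypothetical, and the GBM itself is not reproduced.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard H(t) = sum over event times t_i <= t of d_i / n_i.

    times: follow-up times; events: 1 if conversion to AD was observed, 0 if censored.
    Returns (event_time, cumulative_hazard) steps at each time with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    cum_hazard = 0.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = removed = 0
        # Group tied times: events at t count as d_i; everyone at t leaves the risk set after t.
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            removed += 1
            i += 1
        if d:
            cum_hazard += d / at_risk
            steps.append((t, cum_hazard))
        at_risk -= removed
    return steps


def extreme_risk_groups(risk_scores, fraction=0.1):
    """Indices of patients in the top and bottom `fraction` of predicted risk (at least one each)."""
    order = sorted(range(len(risk_scores)), key=risk_scores.__getitem__)
    k = max(1, int(len(order) * fraction))
    return order[-k:], order[:k]
```

In use, one would call `high, low = extreme_risk_groups(risk)` and then `nelson_aalen` on the follow-up times and event indicators of each index set, plotting the two step functions against time.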


AETIONOMY disease hypotheses to be tested are represented as networks and stored in NeuroMMSig. The system was developed to enable patient subgroup stratification based on multimodal and multiscale patterns that indicate a perturbation of mechanisms. Some of the proposed mechanisms are briefly described in the table below.

Neuroinflammation and the immune system are involved in the pathology of both AD and PD (Table 1). In fact, other IMI projects, such as PHAGO, are currently trying to target key players in AD within these biological processes. For that reason, NeuroMMSig has been enriched with mechanistic subgraphs related to these two processes, such as the chemokine signaling, cytokine signaling, interferon signaling, Toll-like receptor, inflammatory response, and immune system response subgraphs. These subgraphs contain biomarkers selected in WP5, such as YKL-40, TLR4, and MRP14. Including these biomarkers in the subgraphs will allow the generated hypotheses to be tested once the clinical studies have been carried out: clinical measurements can be mapped to nodes in the networks to calculate a score for each patient, enabling patient subgroup identification. Since NeuroMMSig is inherently multimodal, not only biomarkers but also other readouts, such as imaging features or metabolites, can be mapped.
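Turning mapped measurements into a per-patient mechanism score can be sketched with a deliberately simple scheme, assumed here for illustration: each measured variable is z-scored against a control reference group, and a subgraph's score is the mean absolute z-score over its nodes that were actually measured. Apart from YKL-40, TLR4, and MRP14, the subgraph contents (e.g., the node `hippocampus_volume`) and the scoring rule are hypothetical; this is not the NeuroMMSig scoring algorithm.

```python
from statistics import mean, stdev

# Hypothetical subgraphs: name -> nodes that can carry measurements
# (fluid biomarkers, imaging features, metabolites, ...).
SUBGRAPHS = {
    "inflammatory_response": {"YKL-40", "MRP14", "TLR4"},
    "hippocampal_atrophy": {"hippocampus_volume"},
}


def zscores(patient, controls):
    """z-score each of the patient's measured variables against the control group."""
    out = {}
    for var, value in patient.items():
        ref = [c[var] for c in controls if var in c]
        if len(ref) >= 2:
            sd = stdev(ref)
            if sd > 0:
                out[var] = (value - mean(ref)) / sd
    return out


def subgraph_scores(patient, controls):
    """Mean |z| over each subgraph's measured nodes; None if no node could be mapped."""
    z = zscores(patient, controls)
    scores = {}
    for name, nodes in SUBGRAPHS.items():
        mapped = [abs(z[node]) for node in nodes if node in z]
        scores[name] = mean(mapped) if mapped else None
    return scores
```

Using the absolute z-score treats deviations in either direction as perturbation; a signed score would be preferable where the expected direction of change is known from the network edges. The resulting score vectors could then be clustered to propose patient subgroups.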

AD-specific hypotheses | PD-specific hypotheses
--- | ---
Syndecan-mediated uptake (heparan sulfate proteoglycan (HSPG)-mediated uptake) hypothesis; related to endocytosis processes | LRRK2 (most relevant SNP found in the literature)
KANSL1 and corticotropin-releasing hormone receptor-related “shared mechanism” identified by genetics and imaging analysis on chromosome 17 | Epigenetics (SNCA methylation in the CNS); PDE4D biomarker
AD-diabetes comorbidity: insulin signaling crosstalk to major AD pathophysiology mechanisms | Mitochondrial dysfunction

Table: Relevant examples of candidate mechanisms for AD and PD.

The use of knowledge graphs representing pathophysiology mechanisms for the stratification of patient subgroups is a non-trivial undertaking. Whereas clustering of clinical data can identify patterns of clinical readouts that can then be tested in independent clinical datasets for their ability to stratify patients according to the identified pattern, a candidate mechanism must first be mapped to variables in the clinical data, and the significance of the values of the mapped variables needs to be estimated or calculated (e.g., based on thresholds). For discrete variables (e.g., SNPs), the presence or absence of a SNP can be scored. SNPs are likely to be the most frequently mapped variables, as they are routinely measured in research cohorts such as ADNI and PPMI and are widely used to stratify patients into (risk) subgroups. Because individual SNPs may not be directly “mappable” (e.g., because SNPs linked to mechanisms have not been measured in a study cohort owing to different SNP detection platforms), methods for assigning SNPs to loci have to be applied. NeuroMMSigDB entries come with an LD-block annotation, which allows loci to be defined and SNPs in NeuroMMSigDB mechanisms to be mapped to SNPs measured in cohorts via LD blocks.
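The LD-block mapping can be sketched as follows, assuming LD blocks are given as sorted, non-overlapping base-pair intervals on a single chromosome in the same genome build as the SNP positions. A mechanism SNP that was genotyped is used directly; otherwise it is replaced by the cohort SNPs falling into the same block, and a simple presence score counts mechanism loci where at least one proxy carries a minor allele. All names, positions, and the scoring rule are hypothetical. The sketch also ignores LD strength and allele phase, so a proxy's minor allele is not guaranteed to tag the risk allele of the original SNP.

```python
from bisect import bisect_right


def block_of(position, blocks):
    """Index of the LD block (start, end), inclusive, containing `position`, or None."""
    starts = [start for start, _ in blocks]
    i = bisect_right(starts, position) - 1
    if i >= 0 and blocks[i][0] <= position <= blocks[i][1]:
        return i
    return None


def map_via_ld_blocks(mechanism_snps, cohort_snps, blocks):
    """Map each mechanism SNP (rsID -> position) to measured cohort SNPs.

    Directly genotyped SNPs map to themselves; others map to all cohort SNPs
    in the same LD block (an empty list if no block or no proxy exists).
    """
    cohort_by_block = {}
    for rsid, pos in cohort_snps.items():
        b = block_of(pos, blocks)
        if b is not None:
            cohort_by_block.setdefault(b, []).append(rsid)
    mapping = {}
    for rsid, pos in mechanism_snps.items():
        if rsid in cohort_snps:
            mapping[rsid] = [rsid]
        else:
            b = block_of(pos, blocks)
            mapping[rsid] = sorted(cohort_by_block.get(b, [])) if b is not None else []
    return mapping


def presence_score(genotype, mapping):
    """Number of mechanism loci where any mapped cohort SNP has a minor-allele count > 0."""
    return sum(
        1 for proxies in mapping.values() if any(genotype.get(p, 0) > 0 for p in proxies)
    )
```

Loci without any proxy (empty lists) simply contribute nothing to the score; reporting how many mechanism SNPs remained unmapped would help judge how well a mechanism is covered in a given cohort.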

Some NeuroMMSigDB entries include disease-stage annotations, and such associations can be used as a partitioning concept (which, however, does not go beyond the diagnosis made by the clinical experts who recruited the patients in the cohort). However, if combined with other modalities (SNPs, imaging readouts), the stage-specific assignment and the mechanistic context may gain explanatory potential that would trigger a more in-depth analysis of that mechanism and its role in stage-specific phenotypes (e.g., certain neuropsychological assessments, progression patterns, biomarker trajectories). We expect to gain more insight into the possible mapping of candidate mechanisms to disease stages during validation against independent cohort data.