BN AD Risk Model

To better understand how individual features contribute to AD risk prediction by our multi-modal GBM model, we ranked variables according to their relative importance in a final GBM model trained on all available data. Besides baseline diagnosis (DX), the 25 most relevant variables comprised results of different neuropsychological assessments (Alzheimer's Disease Assessment Scale Cognitive Plus - ADAS13, ADAS11, Functional Activities Questionnaire - FAQ, Clinical Dementia Rating Sum of Boxes - CDRSB, Rey Auditory Verbal Learning Test - RAVLT, Everyday Cognition Study Partner Report, Memory domain - EcogSPMem), neuroimaging features (Region Hippocampus, Region Entorhinal, Region MidTemp, Region WholeBrain8), amyloid PET (AV45) and FDG PET imaging results, APOE4 status, and patient age. Furthermore, they included several features describing the genetic population substructure (EV1, EV2, ...) as well as a feature capturing the functional impact of SNPs on the cell cycle. It has been suggested that dysfunctional neuronal cell cycle re-entry plays a fundamental role in AD pathology (Nagy, Esiri, and Smith 1998). More specifically, it has been hypothesized that the disease is caused by aberrant re-entry of different neuronal populations into the cell division cycle.
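The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for our model: it assumes raw per-feature importances are already available (for example, the summed split improvements that a GBM implementation attributes to each variable) and rescales them so they sum to 100%, the usual convention for reporting relative influence. The function names, feature names and values are hypothetical toy inputs, not results from the AD risk model.

```python
"""Sketch: rank features by relative influence (hypothetical inputs)."""


def relative_influence(raw: dict[str, float]) -> dict[str, float]:
    """Rescale non-negative raw importances so that they sum to 100 (%)."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("raw importances must have a positive sum")
    return {name: 100.0 * value / total for name, value in raw.items()}


def top_k(raw: dict[str, float], k: int = 25) -> list[tuple[str, float]]:
    """Return the k features with the largest relative influence.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    rel = relative_influence(raw)
    ranked = sorted(rel.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy importances; NOT values from the AD risk model.
    toy = {"feature_a": 2.0, "feature_b": 6.0, "feature_c": 2.0}
    for name, pct in top_k(toy, k=2):
        print(f"{name}: {pct:.1f}%")
```

With k=25 and the real importances, `top_k` would return the top 25 list discussed above; normalizing first makes rankings comparable across models trained on different data splits.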

Notably, most of the top 25 features were selected with high stability during the 10 times repeated 10-fold cross-validation procedure (Figure above); that is, the vast majority of GBM models trained during cross-validation contained the same most relevant features. This specifically includes the cell cycle feature mentioned above. Altogether, 170 features were selected in at least 50 of the 100 runs (see the full list in the Supplementary material). These comprised the neuropsychological assessments (ADAS, Ecog, RAVLT, CDRSB, MMSE, FAQ), PET imaging results (AV45, FDG), APOE4 status, age, baseline diagnosis, educational status, as well as different brain regions and pathways (including cell cycle). The most stably selected SNP, rs10509663 (selected 70/100 times), has been associated with CSF levels of amyloid-beta. Misfolding of this peptide is a well-known hallmark of AD and results in the characteristic plaques in the brains of AD patients.
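The stability count can be sketched as follows, again as an illustration under assumptions rather than the actual pipeline. It generates the 10 × 10 = 100 train/test splits of a 10 times repeated 10-fold cross-validation and counts, for each feature, in how many of the 100 fitted models it was "selected". What counts as selected is an assumption here (for instance, a nonzero relative influence in that run's GBM model); the helper names `repeated_kfold`, `selection_frequency` and `stable_features` and the toy selections are hypothetical.

```python
"""Sketch: selection frequency over repeated k-fold CV (hypothetical)."""
import random
from collections import Counter
from collections.abc import Iterable, Iterator


def repeated_kfold(n_samples: int, n_splits: int = 10, n_repeats: int = 10,
                   seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; n_splits * n_repeats splits in total."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        rng.shuffle(indices)  # new random partition for every repetition
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            test = set(folds[k])
            train = [i for i in indices if i not in test]
            yield train, sorted(test)


def selection_frequency(selected_per_run: Iterable[set[str]]) -> Counter:
    """Count in how many runs each feature was selected (once per run)."""
    counts: Counter = Counter()
    for selected in selected_per_run:
        counts.update(set(selected))
    return counts


def stable_features(counts: Counter, min_count: int = 50) -> list[str]:
    """Features selected in at least min_count runs, most frequent first."""
    keep = [name for name, c in counts.items() if c >= min_count]
    return sorted(keep, key=lambda name: (-counts[name], name))


if __name__ == "__main__":
    splits = list(repeated_kfold(n_samples=200))
    print(len(splits), "train/test splits")
    # Hypothetical per-run selections; in practice each set would come from
    # the GBM fitted on the corresponding training split.
    toy_runs = [{"feature_a", "feature_b"}] * 60 + [{"feature_a"}] * 40
    counts = selection_frequency(toy_runs)
    print(stable_features(counts, min_count=50))
```

The default `min_count=50` mirrors the "at least 50 out of 100 runs" threshold used above to define the 170 stably selected features.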

Interestingly, the immune system and ribosome pathways were the most stably selected pathways (84/100 times). It has recently been suggested that activation of the innate immune system plays a crucial role in disease progression (Heneka, Golenbock, and Latz 2015). Ribosome dysfunction has been observed as an early event in AD development (Ding et al. 2005).

The most influential SNP in the final GBM model was rs9871760 (selected 36/100 times), which has been associated with whole brain volume (Furney et al. 2011). The TT or CT genotypes of the second most relevant SNP, rs3756577 (CAMK2A, selected 32/100 times), have been associated with a nearly eightfold reduction in AD risk (Bufill et al. 2015). Two further examples are rs4263408 (selected 32/100 times) and rs6859 (selected 60/100 times). The SNP rs4263408 (UBE2K) has been found to affect amyloid-beta concentrations (Chouraki et al. 2014), and rs6859 (NECTIN2) has been associated with late-onset AD (Abraham et al. 2008). Altogether, the cumulative relative influence of all genetically derived features (including APOE4 status) was ~22% in our model, and 109 of the 170 features selected in at least 50 of the 100 cross-validation runs were genomically derived.
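The cumulative relative influence of a feature group is simply the sum of its members' relative influences. A minimal sketch, assuming raw per-feature importances and an explicit set of genetically derived feature names; how that set is assembled (for example from SNP identifiers, population-structure components, pathway scores and APOE4) is an assumption, as are the toy names and values. The second helper only reproduces the arithmetic for the share of genomically derived features among the stably selected ones (109 of 170).

```python
"""Sketch: cumulative relative influence of a feature group (hypothetical)."""


def group_share(raw: dict[str, float], group: set[str]) -> float:
    """Percentage of the total importance attributable to features in group."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("raw importances must have a positive sum")
    in_group = sum(value for name, value in raw.items() if name in group)
    return 100.0 * in_group / total


def fraction_pct(part: int, whole: int) -> float:
    """Percentage of part in whole, e.g. genomic features among stable ones."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Hypothetical toy importances; NOT values from the AD risk model.
    toy = {"clinical_x": 5.0, "imaging_y": 3.0, "genetic_z1": 1.0,
           "genetic_z2": 1.0}
    genetic = {"genetic_z1", "genetic_z2"}
    print(f"genetic share: {group_share(toy, genetic):.1f}%")  # 20.0%
    # Counts taken from the text: 109 of 170 stably selected features.
    print(f"genomic among stable: {fraction_pct(109, 170):.1f}%")  # 64.1%
```

Note that the two percentages answer different questions: `group_share` weights features by their influence in one model, whereas the 109/170 ratio (about 64%) only counts how many stably selected features are genomic.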

The next step is to validate the established machine learning model on the external AddNeuroMed cohort.