We used a linear mixed modeling approach to develop a longitudinal model for AD. A linear mixed model (LMM) is appropriate when the values in the data are not independent of each other. It takes both fixed effects and random effects into account. Fixed effects are expected to have a systematic and predictable influence on the data, while random effects are assumed to have a non-systematic, or unpredictable, influence. The data we deal with here are in longitudinal format, which means that each subject is measured repeatedly over a certain number of months or years. A random effect is added for each subject to handle this situation, because individual differences can be modeled by assuming a different intercept for each subject. The LMM is further applied to predict trends of sub-populations over various stages of the disease. The disease indicators, also known as predictors, are an essential component of the LMM: they help to predict the prognosis of the disease and enable the prediction of the value of a biomarker at a specific time point. The longitudinal model thus predicts the values of biomarkers from a diverse set of predictors. The combination of fixed and random effect predictors was found to have a significant effect on the biomarkers. Fixed effects such as the repeated time point of measurement and the gender of the subject play an important role in the prediction of brain volumes, such as hippocampal and ventricle volume, and of cerebrospinal fluid biomarkers such as Abeta. These predicted values are then used to generate the trajectories for the separate stages of the disease over the longitudinal time points of measurement.
(Publication in progress)
Several disease progression modeling (DPM) approaches for Alzheimer’s disease (AD) are applied in the course of this study, and they require cross-sectional as well as longitudinal clinical data. In addition, easy access and the largest possible number of records were further criteria that played a pivotal role in selecting the data set for our DPM purposes. Furthermore, because AD is one of the most complex multifactorial neurodegenerative diseases, DPM of AD requires measurements at multiple scales, ranging from the cellular and molecular up to the neuronal level. Taking all these principles into account, and after an extensive search, we chose the Alzheimer's Disease Neuroimaging Initiative (ADNI, http://adni.loni.usc.edu) and based our further analysis on it. Since 2004, ADNI researchers have been collecting several types of data, such as clinical, genetic and imaging data, from study volunteers throughout their participation in the study (CN: 500, MCI: 1000, AD: 200) in different phases, namely ADNI 1, ADNI GO and ADNI 2. To make it easier to track subject measurements, all measurements spread across multiple tables (except genetic data) are loaded into one data table called “adnimerge”, which consists of around 13,000 records of approximately 1,700 unique subjects. Volumetric measures (hippocampus, entorhinal, ventricles, mid-temporal volume and intracranial volume), CSF biomarkers (Aβ, Tau, P-tau) and cognitive tests (MMSE, CDRSB and MOCA) are the features included in the adnimerge data table, since they are the most studied biomarkers for AD. Although the adnimerge table provides the most important measurements needed for event-based modeling and longitudinal analysis, genetic data such as SNPs and gene expression data are also taken into consideration for other DPM approaches such as Bayesian modeling and clustering.
Because subjects did not complete all of the annual follow-up visits for various reasons, or because some biomarkers were not measured for them, ADNI has a non-negligible number of missing values. How these are handled in our different DPM approaches is explained in the disease modeling tab.
A suitable set of data was required for different types of data analysis, baseline as well as longitudinal. An extensive search was carried out to select an appropriate clinical study that could support these kinds of analyses and enable pattern identification and the generation of different biomarker trajectories.
The selection of the data was narrowed down to the Alzheimer's Disease Neuroimaging Initiative (ADNI), and we based our work on it for the following reasons:
ADNI was found to be one of the most significant and widely used data resources, and it is easily accessible to researchers.
ADNI is the study with the highest number of longitudinal records available at present. This characteristic of ADNI makes it rather unique amongst the publicly available AD studies.
We considered the following measures for our analysis. The “adnimerge” table merges the measurable features from all the major tables (12,328 records in this case). The other tables considered, for CSF biomarkers and plasma biomarkers, were “upennbiomk6” (692 records) and “upennplasma” (2,454 records) respectively. The data from these tables were divided into 4 subgroups:
Volumetric measures: hippocampal, ventricular, entorhinal and intracranial volume (ICV).
CSF biomarkers: CSF ABETA, TAU and PTAU.
Plasma biomarkers: Plasma AB42, AB40, AB42RatioAB40.
Cognitive tests: MMSE, CDR-SB, MOCA, Cognition
The above-mentioned biomarkers were selected for the following reasons:
The CSF biomarkers are the most studied biomarkers for AD and are the earliest to be affected in the disease.
The volumetric measures selected were available in the “adnimerge” table, which has the largest number of records in the ADNI data. Compared to the other measures, these volumetric measures also had fewer missing records.
We wanted to test the influence of plasma biomarkers in the different stages of the disease and that was the reason for the selection of the plasma biomarkers.
Cognitive tests such as MMSE and CDR-SB are commonly used in AD, and they have a larger number of records in the “adnimerge” table than the other cognitive tests. The MOCA score is considered an alternative to MMSE, especially for measurements in the earlier stages of AD.
Initial data transformation steps, which included data cleaning, normalization, mixing and outlier removal, were performed on the ADNI data particularly for the above-mentioned biomarkers and measures.
The data were tested for missing values, and a “non-random” pattern of missing data (missing not at random, MNAR) was observed for the volumetric and cognitive measures in the “adnimerge” table and for the plasma measures in the “upennplasma” table. Multiple imputation is not recommended for this type of data, so records with missing values were removed before further analysis.
The pattern of missing data for the CSF measures was random and the missing values were very few (up to 5 for each measure out of 692); these values were therefore imputed.
The cognition measure was built by combining MMSE and MOCA: it uses the MOCA score of subjects in the earlier stages and the MMSE score of subjects in the later stages.
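The transformation steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not say which imputation method was used for the CSF measures (mean imputation is assumed here), nor which stages count as "earlier" (NL and MCI are assumed), and the function names and record layout are hypothetical.

```python
from statistics import mean


def drop_incomplete(records, columns):
    """Keep only records in which every listed column has a value.

    Mirrors the removal of missing data for the MNAR tables.
    """
    return [r for r in records if all(r.get(c) is not None for c in columns)]


def impute_with_mean(records, column):
    """Fill missing values of one column with the mean of the observed values.

    Mean imputation is an assumption; the actual method is not stated.
    """
    observed = [r[column] for r in records if r.get(column) is not None]
    fill = mean(observed)
    return [
        {**r, column: r.get(column) if r.get(column) is not None else fill}
        for r in records
    ]


def combined_cognition(stage, mmse, moca):
    """Return MOCA for earlier stages and MMSE for later stages.

    Treating NL and MCI as the earlier stages is an assumption.
    """
    early_stages = {"NL", "MCI"}
    return moca if stage in early_stages else mmse
```

For example, `combined_cognition("MCI", mmse=27, moca=24)` would pick the MOCA score, while a subject in the Dementia stage would contribute the MMSE score.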
We implemented a linear mixed modelling (LMM) approach to develop a longitudinal model for AD. The LMM approach is appropriate when the values in the data are not independent of each other. Unlike the ordinary linear model, it deals with both fixed effects and random effects. Fixed effects are expected to have a systematic and predictable influence on the data. Random effects, on the other hand, are expected to have a non-systematic, unpredictable or ‘random’ influence on the data.
The data we deal with here are in longitudinal format, which means that each subject is measured repeatedly over a certain number of months or years. A random effect for each subject should be added to handle this situation, because the individual differences can be modelled by assuming a different random intercept for each subject. This means that a different intercept value is assigned to each subject, and these intercepts are estimated by the mixed model. Various predictors were considered for the development of the longitudinal model.
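As an illustration, a random-intercept model of the kind described above could be written as follows. The symbols are assumptions introduced here for clarity: y_ij is the biomarker value of subject i at visit j, t_ij is the time of that visit, x_i is a subject-level predictor such as gender, and b_i is the subject-specific random intercept. The actual predictor sets differ per biomarker.

```latex
y_{ij} = \beta_0 + \beta_1\, t_{ij} + \beta_2\, x_i + b_i + \varepsilon_{ij},
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

The fixed effects (β) are shared by all subjects, whereas b_i shifts the intercept of subject i up or down and thereby accounts for the correlation among repeated measurements of the same subject.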
A wide set of models was created for each biomarker, each consisting of different predictors, and the best model was selected based on an analysis of variance (ANOVA).
An algorithm was developed that uses the best selected model for each biomarker to predict the response variable values (the set of biomarkers and measures) from the set of predictor variables.
The data from ADNI were stratified according to the four subgroups of biomarkers and measures mentioned in the above section. These sets of stratified data were given as input to the algorithm, and a model was developed for each stage at each time point of measurement. These separate stage models were then merged and integrated into a cumulative longitudinal disease model.
The linear mixed longitudinal model makes it possible to predict the trajectories of the groups of subjects in the stratified cohort over various stages of the disease. This prediction process is assisted by the disease indicators. The disease indicators, also known as predictors, are an essential component of the LMM: they help to predict the prognosis of the disease and enable the prediction of the value of a biomarker at a specific time point in the disease. With the help of the longitudinal model, it is now possible to predict the probable trajectory of disease progression in a normal group of individuals. Partner FRAUNHOFER has developed a ‘longitudinal study viewer’, which also contributes to the IMI-project EPAD (more information at http://epad.scai.fraunhofer.de). The longitudinal viewer currently displays a version of the Clifford Jack models that has underlying numerical data (not just drawings). Furthermore, the major trajectories of ADNI generated by the above-mentioned modeling approach are available in that viewer. We have also tried to combine the biomarkers and created a view similar to the Clifford Jack models in this viewer.
The individual level trajectories were obtained for each measure and biomarker.
The modeling equations yielded the mean values of the biomarkers at each time point for the stratified cohort of individuals. Trajectories for the different stages of Alzheimer's disease (NL = normal, MCI = mild cognitive impairment, and Dementia) were obtained from these predicted mean values.
The mean values were chosen to examine the overall trajectory of the model because the values at each month varied widely. The 95% confidence interval and the prediction interval were also calculated for each of these average values. These trajectories were incorporated in the longitudinal viewer. The link for the longitudinal viewer is http://epad.scai.fraunhofer.de. This viewer has not yet been publicly released.
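A minimal sketch of how such a mean trajectory with intervals might be computed from predicted values is given below. It assumes a normal approximation (z = 1.96) and simple per-month sample statistics; the intervals in the actual analysis may instead have been derived from the fitted mixed model. The function name and the input layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

Z_95 = 1.96  # normal approximation for a 95% interval (assumption)


def mean_trajectory(predictions):
    """Summarise predicted values per month.

    predictions: iterable of (month, value) pairs for one stage and biomarker.
    Returns {month: {"mean": m, "ci": (lo, hi), "pi": (lo, hi)}}.
    Months with fewer than two values are skipped, because the sample
    standard deviation is undefined for them.
    """
    by_month = defaultdict(list)
    for month, value in predictions:
        by_month[month].append(value)

    trajectory = {}
    for month in sorted(by_month):
        values = by_month[month]
        n = len(values)
        if n < 2:
            continue
        m = mean(values)
        s = stdev(values)
        ci_half = Z_95 * s / sqrt(n)          # uncertainty of the mean
        pi_half = Z_95 * s * sqrt(1 + 1 / n)  # spread of a new observation
        trajectory[month] = {
            "mean": m,
            "ci": (m - ci_half, m + ci_half),
            "pi": (m - pi_half, m + pi_half),
        }
    return trajectory
```

The prediction interval is always wider than the confidence interval, because it covers the variability of an individual value and not only the uncertainty of the mean.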
Several models were developed for each biomarker and measure, and the best model was selected based on the p-value and the AIC value. The corresponding values for each model are shown in the table on the right side.
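The selection criteria can be illustrated with a short sketch. It assumes that the log-likelihood and the number of estimated parameters of each fitted model are already available, and that two nested models differ by exactly one parameter, so that the likelihood-ratio p-value can be computed from the chi-square distribution with one degree of freedom. The function names are hypothetical and this is not necessarily the procedure used in the analysis.

```python
from math import erfc, sqrt


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def lrt_p_value_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio test for nested models that differ by one parameter.

    For one degree of freedom, the chi-square survival function equals
    erfc(sqrt(x / 2)).
    """
    statistic = max(2 * (loglik_full - loglik_reduced), 0.0)
    return erfc(sqrt(statistic / 2))


def select_best(candidates):
    """candidates: {model name: (log_likelihood, n_params)}.

    Returns the name of the model with the lowest AIC.
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))
```

A call such as `select_best({"time": (ll_1, k_1), "time + gender": (ll_2, k_2)})` returns the name of the model with the lowest AIC, while `lrt_p_value_1df(ll_1, ll_2)` indicates whether the added predictor improves the fit significantly. When the models differ in their fixed effects, such comparisons are usually made on models fitted by maximum likelihood instead of REML.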