Trajectories for PPMI
The z-transformed and binarized UPDRS score was used as the target variable, and the remaining clinical data were used as predictor variables, to compare the performance of three supervised machine learning algorithms (xgboost, random forest and elastic net) on binary classification. Performance was measured as the average of the test AUC values obtained in the testing phases of cross-validation. The solid horizontal line in each box of the box plot in Figure 19 marks the median AUC value for that method.
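The target construction and the evaluation described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the analysis code used here: the function names are hypothetical, the binarization threshold of 0 on the z-score is an assumption (the cut-off is not stated in the text), and the toy numbers are made up rather than taken from PPMI.

```python
from statistics import mean, median, pstdev


def z_transform(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def binarize(z_scores, threshold=0.0):
    """Label 1 if the z-score exceeds the threshold, else 0.

    The threshold of 0 is an assumption made for illustration.
    """
    return [1 if z > threshold else 0 for z in z_scores]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_cv(fold_results):
    """Return (mean, median) of the test AUC over cross-validation folds.

    fold_results is a list of (test_labels, predicted_scores) pairs, one per fold.
    """
    aucs = [auc(labels, scores) for labels, scores in fold_results]
    return mean(aucs), median(aucs)


# Toy example with made-up scores (not PPMI data).
toy_labels = binarize(z_transform([12.0, 30.0, 18.0, 41.0]))
```

The mean returned by `summarize_cv` corresponds to the averaged test AUC, while the median corresponds to the horizontal line drawn in each box of the box plot.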
The xgboost model showed the best performance, with a median AUC value of 0.778, and was therefore used for feature selection.
An xgboost model trained on the entire data set was used to extract the most relevant and least redundant features. The importance of each feature was obtained using functions from the xgboost package. Of the three components of importance (gain, cover and frequency), gain was selected to interpret the relative importance of each feature. Gain quantifies the information gained by splitting a decision tree on that particular feature. In total, 160 clinical variables showed positive gain and were selected for use in Bayesian modelling. The top twenty features are shown in Figure 20. Table 2 gives the variable descriptions and the group to which each feature belongs.
| Variable | Description | Group |
|---|---|---|
| UPDRS1 | UPDRS I: evaluation of mentation, behavior, and mood | UPDRS |
| NP3RIGU_CL | 3.3c Rigidity - UE - Contralateral | UPDRS |
| NP3RIGL_CL | 3.3e Rigidity - LE - Contralateral | UPDRS |
| QUIP | Questionnaire for Impulsive-Compulsive Disorders in PD | Non-motor |
| ALDH1A1..rep.2. | ALDH1A1 (rep 2) (Ct) | Biological |
| HRSUP | Supine heart rate | Medical history |
| Abeta.42 | Abeta 42 (pg/ml) | Biological |
| SYSSUP | Supine BP - systolic | Medical history |
| RBD.pos | RBD Positive: RBD >= 5 | Non-motor |
| STAI.Trait | STAI - Trait Subscore | Non-motor |
| UPDRS2 | UPDRS II: self-evaluation of the activities of daily life (ADLs) | UPDRS |
| HSPA8..rep.1 | HSPA8 (rep 1) (Ct) | Biological |
| NP3FTAPL | 3.4b Finger Tapping Left Hand | UPDRS |
| NP3TTAP_IL | 3.7 Toe tapping - foot - Ipsilateral | UPDRS |
| GAPDH..rep.2. | GAPDH (rep 2) (Ct) | Biological |
| DIASTND | Standing BP - diastolic | Medical history |
| UPSIT | University of Pennsylvania Smell ID Test (UPSIT) | Non-motor |
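The gain-based selection step (keep every feature with positive gain, then rank by gain and report the top twenty) can be sketched as follows. This is a minimal sketch under stated assumptions: `select_by_gain` is a hypothetical helper, the gain values are invented for illustration, and rescaling the retained gains to sum to one is an assumption about how "relative importance" is expressed, not a statement about the package output used here.

```python
def select_by_gain(gain_by_feature, top_k=20):
    """Keep features with positive gain and rank them by descending gain.

    Returns (selected, top), where both are lists of (feature, relative_gain)
    pairs and `top` holds at most `top_k` entries. Relative gain is the gain
    divided by the total gain of the retained features (an assumption made
    for illustration).
    """
    positive = {f: g for f, g in gain_by_feature.items() if g > 0}
    total = sum(positive.values())
    ranked = sorted(positive.items(), key=lambda item: item[1], reverse=True)
    selected = [(f, g / total) for f, g in ranked]
    return selected, selected[:top_k]


# Hypothetical gain values; the feature names come from Table 2, the numbers do not.
toy_gain = {"UPDRS1": 0.30, "QUIP": 0.10, "UPSIT": 0.0, "HRSUP": 0.05}
selected, top = select_by_gain(toy_gain, top_k=2)
```

In this toy input, `UPSIT` is dropped because its gain is zero, which mirrors how variables without positive gain would be excluded before the Bayesian modelling step.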
Figure 1: Box plot of AUC values obtained during repeated cross-validation for the extreme gradient boosting (xgboost), random forest and elastic net learning algorithms. The higher the AUC, the better the model performance.
Figure 2: Relative importance of the top twenty clinical variables for predicting the clinical endpoint. The higher the gain score, the greater the contribution of the clinical feature to stratifying patients into slow-progressing and fast-progressing PD.
The traditional univariate approach to modelling a disease has limited power to assess disease state and progression. By combining a panel of biomarkers, we seek to provide a method that discriminates between slow progressors and fast progressors. The xgboost model combines data from different groups of biomarkers in order to divide the heterogeneous patient population into two homogeneous groups.
The selected features reflect the different data modalities. They comprise a combination of biological, imaging, non-motor and patient history features at baseline, which can be used to stratify patients and to indicate the rate of disease progression.