Introduction

Prevention of cardiovascular disease is based on the tenet that atherosclerotic disease occurs over time; risk factors and lifestyle are contributory, and appropriate modification can delay the onset of cardiovascular events. Previously established cardiovascular risk assessment models such as Framingham Risk Score (FRS)1, Systematic Coronary Risk Evaluation (SCORE)2 and QRISK2 score3 predict future risk based on well-established medical risk factors and pay little attention to lifestyle factors.

Recent studies have highlighted the potential of lifestyle data in predicting cardiovascular risk4. The INTERHEART study found that nine risk factors including smoking, history of hypertension or diabetes, waist/hip ratio, dietary patterns, physical activity, consumption of alcohol, blood apolipoproteins (Apo), and psychosocial factors, accounted for 90% of the population attributable risk for myocardial infarction in men and 94% in women. These findings suggest that there is room to improve cardiovascular risk assessment by incorporating new factors such as physical activity status, lifestyle and dietary habits5,6 alongside traditional risk predictors.

Prior studies have demonstrated increased cardiovascular risk from elevated blood pressure7,8,9; the PAMELA study found that combining office, rest and ambulatory blood pressure helps predict cardiovascular mortality, with an area under the curve of up to 0.81. Risk modelling has also been performed on dietary and lifestyle behaviour, although such studies are less frequent because they require detailed, time-consuming food frequency questionnaires10. However, owing to the diverse data sources and data types, including time series, an integrated assessment tool combining lifestyle, diet, ambulatory physiological parameters and clinical risk markers has not, to our knowledge, been developed.

Cardiovascular risk scores derived from traditional biostatistical methods such as logistic regression and Cox proportional hazard models11,12,13 provide parsimonious interpretation. However, their strict assumptions, such as homoscedasticity, distribution normality and relationship linearity, tend to oversimplify complex relationships and limit applications14. In several studies15,16,17, machine learning algorithms (MLA) were able to overcome these statistical drawbacks and improve discriminatory performance over traditional models. More recently, ensemble modelling techniques have also gained popularity, for example in the prediction of heart disease18,19, diabetes and hypertension20, and cancer diagnosis and classification21,22. An ensemble model combines different MLAs into one predictive model. Compared to a single model, an ensemble model is more robust and offers higher goodness-of-fit and better prediction accuracy23.

The aim of this paper is to investigate, using ensemble MLA, the additive value in cardiovascular risk prediction of four groups of risk factors (lifestyle survey questionnaires, clinical blood tests, 24 h ambulatory blood pressure and heart rate monitoring), chosen based on ease of information availability and regular clinical workflow. Because of the limitations of traditional biostatistical models, we used an ensemble MLA technique to learn the complex and non-linear interactions amongst the different groups of risk factors. To date, the application of ensemble MLA to lifestyle factors and clinical variables for individualised CVD risk assessment remains underexplored.

Methods

Data source and study population

Data used in this study were drawn from the SingHEART prospective longitudinal cohort study (ClinicalTrials.gov Identifier: NCT02791152). The study is a multi-ethnic population-based study conducted on healthy Asians aged 21–69 years without known diabetes mellitus or prior cardiovascular disease (ischemic heart disease, stroke, peripheral vascular disease). The study complied with the Declaration of Helsinki and written informed consent was given by participants. The study was approved by the SingHealth Centralized Institutional Review Board.

We included 600 volunteers aged 30 years or older with a valid calcium score in the main analysis of this study. Two hundred volunteers under the age of 30 years, who did not have a calcium score, were excluded, as the calcium score was the main outcome of our analysis.

Subset analysis for activity tracker data was performed on the 430 of the 600 volunteers who had adequate data. Although recruited subjects were issued an activity tracker to be worn over a period of five days, with the first and last days of the study being partial days, the tracker was worn inconsistently. Discounting the partial days, each subject would yield an activity log for three complete tracking days (equivalent to days with > 20 valid hours of steps and sleep data)24,25. For data consistency and quality, subjects with improper activity tracker usage, i.e. an activity reading log of less than five days and/or a sleep reading log of less than three days, were excluded.

Markers of CVD risk and outcome

Coronary artery calcium (CAC) scoring was used as the modelling outcome. Coronary calcium is a specific marker of coronary atherosclerosis, a precursor of coronary artery disease26; it also reflects arterial age under the influence of underlying comorbidities and lifestyle. The CAC score has also been regarded as the best marker for risk prediction of cardiovascular events27,28.

This study stratified subjects into classes of CVD risk based on the coronary artery calcium score: low risk if the score was 0, and high risk if the score was 100 or above. Subjects who did not fall into these two categories were considered intermediate risk.
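The stratification rule above can be written as a small function. This is an illustrative sketch only; the function name cac_risk_class is hypothetical and is not taken from the study code.

```python
def cac_risk_class(cac_score: float) -> str:
    """Map a coronary artery calcium (CAC) score to the risk class described above."""
    if cac_score < 0:
        raise ValueError("CAC score cannot be negative")
    if cac_score == 0:
        return "low"            # calcium score of 0
    if cac_score >= 100:
        return "high"           # calcium score of 100 and above
    return "intermediate"       # everything in between


if __name__ == "__main__":
    for score in (0, 12.5, 99.9, 100, 430):
        print(score, cac_risk_class(score))
```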

The aim of this study is to assess how accurately the machine learning algorithms handle different data types in the task of predicting high risk and low risk patients, based on calcium score.

Data variables used for MLA: lifestyle survey questionnaires, clinical blood tests, ambulatory blood pressure and activity tracking data

Table 1 summarizes the data from SingHEART that was used in this study.

Table 1 List of risk factors used for prediction in this study.

Data variables were categorized into four groups: lifestyle survey questionnaires, blood test data, 24-h ambulatory blood pressure, and activity tracking data from a commercially available Fitbit Charge HR29.

Data pre-processing, transformation and imputation were performed on the raw data. Variables were selected based on a priori knowledge from previous publications on cardiovascular risk assessment1,2,3 and on physician expert advice. In total, there were 30, 17, 12 and 16 unique variables in the respective groups: survey questionnaire, 24 h blood pressure and heart rate monitoring, blood tests and Fitbit data.

Framingham risk score (FRS) as the comparator

The Framingham 10-year risk score was computed using seven traditional risk factors: gender, age, single timepoint systolic blood pressure, Total Cholesterol (TC), High Density Lipoprotein (HDL), smoking status and presence of diabetes. A Framingham risk score of < 10% is considered low risk, while ≥ 20% is considered high risk30.

Modelling pipeline

Figure 1 shows the methodological framework of the present study. Exploratory analysis showed that ensemble MLA classifiers were superior at discriminating low risk individuals, while ensemble MLA regressors performed better at identifying individuals with high CVD risk. To leverage the merits of both the classifier and regressor MLAs, we used both approaches in our model.

Figure 1 Modelling flow chart using ensemble MLA for cardiovascular risk prediction.

The ensemble classifiers produce a binary prediction outcome: low or non-low risk. The ensemble regressors make a numerical prediction of the calcium score for individuals classified as non-low risk, which is then stratified into three bins of low, intermediate and high risk. The predicted numerical values may range from negative to positive. Negative predicted values were first converted to zero; the continuous predictions were then converted to discrete bins using unique-value percentile discretization, which ensures that records with the same numerical prediction are assigned the same risk category. Finally, the prediction outcome is determined by a decision node built on rule-based logic. The decision node assigns an outcome of low risk if the classifiers predict an individual to be at low CVD risk, and high risk if the classifiers predict non-low risk and the regressors predict high risk. Patients with incongruent classifier and regressor outcomes are considered unclassified.
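The post-processing and decision node described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study implementation: the source does not give the percentile cut points, so equal-frequency cuts over the sorted unique values are assumed here, and the names discretize_by_unique_percentiles and decision_node are hypothetical. Any non-low case that the regressor does not place in the high bin is treated as unclassified, which is one reading of the rule in the text.

```python
from bisect import bisect_right

LABELS = ("low", "intermediate", "high")


def discretize_by_unique_percentiles(predictions, labels=LABELS):
    """Bin continuous predictions with cut points taken over the unique values."""
    if not predictions:
        return []
    clipped = [max(0.0, p) for p in predictions]  # negative predictions become zero
    unique = sorted(set(clipped))
    n_bins = len(labels)
    # Assumption: equal-frequency cut points over the sorted unique values.
    cuts = [unique[(len(unique) * k) // n_bins] for k in range(1, n_bins)]
    # The bin depends only on the value, so tied predictions share a category.
    return [labels[bisect_right(cuts, p)] for p in clipped]


def decision_node(classifier_label, regressor_label):
    """Rule-based combination of the classifier and regressor outputs."""
    if classifier_label == "low":
        return "low"
    if classifier_label == "non-low" and regressor_label == "high":
        return "high"
    return "unclassified"  # incongruent outcomes


if __name__ == "__main__":
    predicted_cac = [-5, 0, 0, 10, 50, 50, 120, 300]  # hypothetical regressor outputs
    bins = discretize_by_unique_percentiles(predicted_cac)
    for value, risk_bin in zip(predicted_cac, bins):
        print(value, risk_bin, decision_node("non-low", risk_bin))
```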

The ensemble models in the classification and regression phases each fit three base learners: naive Bayes (NB), random forest (RF) and support vector classifier (SVC) for classification, and generalized linear regression (GLM), support vector regressor (SVR) and stochastic gradient descent (SGD) for regression. These base learners were chosen based on preliminary analysis, in which they handled missing values and outliers efficiently.

The ensemble model then uses a majority vote to determine the class label in the classification phase. For the regression phase, the ensemble model averages the normalized predictions from the base regressor models to form a numerical outcome.
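The two combination rules can be illustrated with a short sketch. The source does not state how the regressor predictions were normalized, so min-max scaling per base regressor is assumed here purely for illustration; the function names and the example predictions are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the base classifiers' predictions."""
    return Counter(labels).most_common(1)[0][0]


def min_max_normalize(values):
    """Scale one regressor's predictions to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def average_normalized(predictions_by_model):
    """Average the normalized predictions of several regressors, sample by sample."""
    normalized = [min_max_normalize(p) for p in predictions_by_model]
    return [sum(column) / len(column) for column in zip(*normalized)]


if __name__ == "__main__":
    # One sample, three base classifiers (stand-ins for NB, RF and SVC).
    print(majority_vote(["low", "non-low", "low"]))
    # Three samples, three base regressors (stand-ins for GLM, SVR and SGD).
    print(average_normalized([[0, 50, 100], [10, 20, 30], [1, 3, 2]]))
```

With three base classifiers and a binary label, a majority always exists, so no tie-breaking rule is needed in this setting.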

All models were trained using stratified five-fold cross-validation. As the SingHEART data had an imbalanced distribution of CVD risk based on the calcium score (low risk 63.4%, high risk 8.3%, intermediate risk 18.7%), we oversampled the minority class labels in the training set to allow the models to better learn features from the under-represented classes31. The data were first partitioned into five mutually exclusive subsets, each sharing the same proportion of class labels as the original dataset. At each iteration, the MLAs were trained on four parts (80%) and validated on the fifth, the holdout set (20%). The process was repeated five times, with five different but overlapping training sets. The resulting metrics from each fold were averaged to produce a single estimate.
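The resampling scheme can be sketched in plain Python as below. This is a simplified illustration, not the study code: plain random duplication stands in for the oversampling step (the Limitations section names SMOTE), the class counts are invented for the example, and the function names are hypothetical. The point it shows is that only the training folds are oversampled, while the holdout fold keeps the original class mix.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds parts with a similar class mix in each."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)  # deal each class round-robin
    return folds


def oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical, imbalanced label vector used only for illustration.
labels = ["low"] * 70 + ["intermediate"] * 20 + ["high"] * 10
folds = stratified_folds(labels)

for k, holdout in enumerate(folds):
    train = [i for j, fold in enumerate(folds) if j != k for i in fold]
    train_balanced = oversample(train, labels)  # the holdout fold is left untouched
    print(k, len(train), len(train_balanced), len(holdout))
```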

To simulate access to the different variable groups as per clinical workflow and ease of information availability, we assessed the performance of each variable group individually and in combination, as follows:

Model 1: Survey Questionnaire.

Model 2: 24 h ambulatory blood pressure and heart rate.

Model 3: Clinical blood results.

Model 4: Model 1 + Model 2.

Model 5: Model 1 + Model 3.

Model 6: Model 1 to Model 3.

Model 6*: Model 1 to Model 3 with feature selection.

Model 7: Physical activity and sleep trackers (exploratory subset analysis).

Variables in Model 6* were reduced using the SVC recursive feature elimination with cross-validation (SVC-RFECV) method to automatically select the set of predictors that yields the highest area under the receiver operating characteristic curve (AUC). Models 1–6 were trained using 600 subjects.

We also performed exploratory analysis using MLA on the Fitbit Charge HR data (Model 7). Model 7 was trained on a subset of 430 subjects constrained by availability of valid activity tracking data.
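The model definitions above can be expressed as a simple mapping from model name to variable groups. The sketch below is illustrative only: the group keys are hypothetical labels, the per-group counts are the 30, 17, 12 and 16 unique variables reported earlier, and the totals assume that no variable is shared between groups. Model 6* is omitted because the number of variables retained after feature selection is not stated.

```python
# Number of unique variables per group, as reported in the Methods.
VARIABLE_COUNTS = {"survey": 30, "abp_hr": 17, "blood": 12, "fitbit": 16}

# Variable groups available to each model (keys are hypothetical labels).
MODELS = {
    "Model 1": ["survey"],
    "Model 2": ["abp_hr"],
    "Model 3": ["blood"],
    "Model 4": ["survey", "abp_hr"],
    "Model 5": ["survey", "blood"],
    "Model 6": ["survey", "abp_hr", "blood"],
    "Model 7": ["fitbit"],
}


def n_candidate_variables(model_name):
    """Count the candidate variables for a model, assuming groups do not overlap."""
    return sum(VARIABLE_COUNTS[group] for group in MODELS[model_name])


if __name__ == "__main__":
    for name in MODELS:
        print(name, MODELS[name], n_candidate_variables(name))
```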

Evaluation methodology and metrics

Since no single metric can objectively evaluate cardiovascular risk prediction, we evaluated the performance of our models at the CVD risk class level using a panel of metrics: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-score and area under the receiver operating characteristic curve (AUC). The overall discriminative ability of the model was described by the area under the receiver operating characteristic (ROC) curve. All AUC metrics were accompanied by 95% confidence intervals (CI) and standard deviations (SD).
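For reference, the class-level metrics listed above follow directly from the confusion matrix, and the AUC equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. The sketch below uses invented labels and scores; the study itself reports using scikit-learn for these computations, so the hand-written functions here are only an illustration of the definitions.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def class_metrics(y_true, y_pred, positive):
    """One-vs-rest metrics for the risk class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def auc(y_true, scores, positive):
    """Rank-based AUC: share of positive/negative pairs ordered correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    y_true = ["high", "high", "low", "low", "low", "high"]   # hypothetical labels
    y_pred = ["high", "low", "low", "low", "high", "high"]   # hypothetical predictions
    scores = [0.9, 0.4, 0.3, 0.2, 0.6, 0.8]                  # hypothetical scores for "high"
    print(class_metrics(y_true, y_pred, positive="high"))
    print(auc(y_true, scores, positive="high"))
```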

To better understand the relative importance of different risk factors, we used a post-hoc approach to rank the variables by their contribution to CVD risk prediction. Feature importance was obtained from the SVC algorithm, where the relative importance of a variable was determined by the absolute size of its coefficient in relation to the others. All statistical analyses were conducted in a Python version 3.7 environment and all MLAs and evaluation metrics were constructed using the scikit-learn library.
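Ranking variables by the absolute size of their coefficients can be sketched as follows. The variable names and coefficient values are invented for illustration, and the function name is hypothetical. Note also that per-variable coefficients are only available from a support vector classifier with a linear kernel; the source does not state the kernel, so a linear kernel is an assumption here.

```python
def rank_by_abs_coefficient(names, coefficients):
    """Rank variables by |coefficient|, reported as a share of the total absolute weight."""
    total = sum(abs(c) for c in coefficients)
    if total == 0:
        raise ValueError("all coefficients are zero")
    ranked = sorted(zip(names, coefficients), key=lambda item: abs(item[1]), reverse=True)
    return [(name, abs(coef) / total) for name, coef in ranked]


if __name__ == "__main__":
    # Hypothetical coefficients from a linear model fitted on standardized inputs.
    names = ["age", "glucose", "sleep_hours"]
    coefficients = [1.5, -0.5, 0.0]
    for name, share in rank_by_abs_coefficient(names, coefficients):
        print(f"{name}: {share:.2f}")
```

Coefficient magnitudes are only comparable across variables when the inputs are on a common scale, which is why standardized inputs are assumed in the example.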

Results

Baseline characteristics

The SingHEART data consist of 800 anonymized individuals. After excluding cases with no coronary calcium scan or with other missing information, 600 subjects were used for this analysis. Tables 2, 3, 4 and 5 present the demographics, lifestyle survey questionnaires, clinical blood test and activity tracking data characteristics stratified by CVD risk class. The p-values displayed were obtained by comparing the low and high risk categories. Continuous variables are presented as mean ± standard deviation, while categorical variables are expressed as count and percentage.

Table 2 Demographics by risk categories.
Table 3 Self-reported lifestyle factors and 24 h blood pressure and heart rate monitoring data by risk categories.
Table 4 Blood test variables by risk categories.
Table 5 Fitbit Charge HR data by risk categories.

The cohort had a mean age of 49.6 years (range 29 to 69 years) and 46% were males. All the factors in the Framingham Risk score were significantly different between the low and high-risk classes on univariate analysis.

Among novel parameters such as 24 h ambulatory blood pressure and heart rate, higher blood pressure measures and their derivatives were consistently associated with increased risk (p-value < 0.001). Patients with lower risk had a lower mean average heart rate.

Model performance

The AUCs for all four individual variable groups (survey questionnaires, clinical blood tests, 24 h blood pressure and heart rate monitoring, and activity tracker) were higher than that of the conventional FRS for both low risk and high risk patients (p-value < 0.001). Of all the individual variable groups, survey questionnaires achieved the highest AUC for both low risk (AUC 0.715, 95% CI 0.681–0.750) and high risk (AUC 0.710, 95% CI 0.653–0.766). Adding clinical blood tests to the survey questionnaire did not improve the AUC for either the low risk (p-value = 0.441) or the high risk (p-value = 0.715) category. Adding 24 h blood pressure and heart rate monitoring significantly improved the overall performance compared to Model 1, which used the survey questionnaire only (p-value = 0.01 for the low risk group and 0.005 for the high risk group).

Table 6 shows the cross-validated model performance in terms of sensitivity, specificity, positive predictive value, negative predictive value, F1 and AUC. FRS had high sensitivity (91.4%) and low specificity (32.9%) in detecting low risk individuals, and low sensitivity (3.7%) and high specificity (99.3%) in detecting high risk individuals. The MLA models achieved a better balance between sensitivity and specificity.

Table 6 Performance of conventional Framingham Risk Score and MLA models by variable groups in low risk categories.

The continuous net reclassification of the lifestyle questionnaire survey variables over FRS in our population was 18% for low cardiovascular risk prediction and 39% for high cardiovascular risk prediction. For the combined Model 6*, the continuous net reclassification over FRS was 25% and 119% for the low and high risk categories respectively. Figure 2 shows the receiver operating characteristic curves comparing the various models in the low and high cardiovascular risk groups based on their CAC.
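As background for reading these figures, the continuous (category-free) net reclassification index sums two components, one for cases in the class of interest and one for the remaining cases, so it ranges from -200% to 200% and values above 100% are possible. The sketch below shows the standard definition with invented risk values; the function name is hypothetical, and "event" is used here to mean membership of the risk class being predicted, which is an assumption about how the index was applied in this cross-sectional setting.

```python
def continuous_nri(old_risk, new_risk, is_event):
    """Continuous NRI of a new model over an old one.

    NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)],
    where "up" means the new model assigns a higher predicted risk than the old one.
    """
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for old, new, event in zip(old_risk, new_risk, is_event):
        moved_up, moved_down = new > old, new < old
        if event:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_nonevent += 1
            up_nonevent += moved_up
            down_nonevent += moved_down
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    nri_event = (up_event - down_event) / n_event
    nri_nonevent = (down_nonevent - up_nonevent) / n_nonevent
    return nri_event + nri_nonevent


if __name__ == "__main__":
    old = [0.2, 0.3, 0.5, 0.4]            # hypothetical risks from the reference score
    new = [0.4, 0.5, 0.3, 0.6]            # hypothetical risks from the new model
    events = [True, True, False, False]
    print(f"{100 * continuous_nri(old, new, events):.0f}%")
```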

Figure 2 ROC curves for low risk group (left) and high risk group (right). Colours and line styles represent the prediction performance of the different models. Prediction performance for both low and high risk groups was significantly better in Model 6* than with FRS.

Conventional risk factor variables such as age, blood pressure readings, gender and family history of ischemic heart disease were the top ranking contributors to risk prediction in Model 1 (lifestyle survey). Other less conventional but important contributors include self-assessed physical activity and sleep hours.

For Model 2 (24-h blood pressure and heart rate monitoring), the percentage of time with blood pressure > 120/80 mmHg appeared to be the most important feature compared with other blood pressure readings. Average real variability of blood pressure during the wake period and the percentage of time with nocturnal diastolic hypertension (≥ 70 mmHg) also featured among the model's important variables.
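The two derived blood pressure features named above can be computed from a series of ambulatory readings as sketched below. The readings are invented, the function names are hypothetical, and two assumptions are made that the source does not confirm: a reading counts as above 120/80 mmHg if either the systolic or the diastolic value exceeds its limit, and readings are evenly spaced so that the share of readings approximates the share of time. Average real variability is the mean absolute difference between consecutive readings.

```python
def percent_time_above(readings, sbp_limit=120, dbp_limit=80):
    """Percentage of (systolic, diastolic) readings above the given limits."""
    if not readings:
        raise ValueError("no readings supplied")
    # Assumption: either component exceeding its limit counts as "above".
    above = sum(sbp > sbp_limit or dbp > dbp_limit for sbp, dbp in readings)
    return 100.0 * above / len(readings)


def average_real_variability(values):
    """Mean absolute difference between successive readings."""
    if len(values) < 2:
        raise ValueError("need at least two readings")
    diffs = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical wake-period readings in mmHg: (systolic, diastolic).
    readings = [(118, 76), (125, 79), (119, 85), (110, 70)]
    systolic = [sbp for sbp, _ in readings]
    print(percent_time_above(readings))
    print(round(average_real_variability(systolic), 2))
```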

In Model 3 (clinical blood test variables), the conventional risk factor variables of glucose, AST, haemoglobin, albumin and cholesterol readings topped the feature importance ranking.

In the exploratory analysis of activity tracking data, minutes spent “fairly active” and “very active”, and sleep-related logs, particularly minutes of REM and minutes of light sleep, were more important features than average daily steps, distance and floors.

Across all models, age, medication for blood pressure and dyslipidemia, albumin, glucose, wake period diastolic hypertension, LDL cholesterol and self-reported physical activity were the top predictors (see Fig. 3).

Figure 3 The top 15 features of MLA models showing the relative importance of the different variables in CVD risk prediction. Age, glucose, LDL cholesterol, wake period blood pressure variability, medication for BP and dyslipidemia, triglycerides and albumin reading were some common predictors across the different versions.

Discussion

This study looked at four groups of variables (survey questionnaires, clinical blood tests, 24 h ambulatory blood pressure and heart rate monitoring, and activity trackers) and their association with CAC score, for cardiovascular risk classification. We designed our modelling approach by first examining the discriminatory performance of variables in the readily accessible, self-reported survey questionnaire group, which did not require equipment or blood tests. The incremental contribution to the models’ performance was examined by sequentially adding other groups of variables, simulating availability of information as per clinical workflow. This was compared to the traditional FRS framework.

Previous well-established risk scores such as FRS1, SCORE2 and QRISK2 score3 were mostly derived using traditional risk factors like age, total cholesterol, HDL, systolic BP, smoking and diabetes, excluding physical activity, lifestyle and dietary habits. In our study, we found the risk estimation derived from the FRS framework to be suboptimal, with AUCs of 0.622 and 0.515 for the low and high risk profiles respectively when applied to this Asian population. The moderate performance of FRS in our cohort corresponds to prior published literature from primary care clinics in Asia32, although some larger cohort studies suggest higher areas under the curve of up to 0.768 (refs. 33, 34). While traditional risk factors remain robust, we hypothesize that non-traditional, personalized risk factors such as dietary intake, physical activity and ambulatory blood pressure can contribute to individual cardiovascular risk assessment. Recent studies such as CARDIA35 have demonstrated such potential, and we explored these novel variables using machine learning algorithms. Beyond enhancing individualised cardiovascular risk prediction, this allows users to identify modifiable behavioural factors that can improve risk profiles.

In this healthy Asian ethnic population, we found that variables from the survey questionnaire achieved an AUC of 0.715 and 0.710 for individuals with low and high CVD risk respectively. Interestingly, we observed that the addition of clinical blood tests on top of survey questionnaire risk factors did not significantly enhance the ensemble MLA’s ability to classify low and high cardiovascular risk, with non-significant p-values when the combined model (Model 5) was compared to the survey questionnaire model (Model 1). This suggests the potential of designing an MLA-based survey questionnaire that can be easily implemented for risk stratification. The survey questionnaire, without the need for blood tests, is less cumbersome and can be implemented as a population-wide survey to risk stratify patients. This finding complements the currently available health risk appraisals36, which highlight health risk but do not diagnose or risk stratify patients, which our current model can do. Our model can potentially vary risk outputs based on changes in the lifestyle behaviours included within the questionnaires; this gives patients an actionable plan, beyond medications, to reduce their cardiovascular risk.

The ideal cut-off for hypertension has been a constant debate37,38,39 and our study revealed interesting predictors which require further study. In-clinic and self-measured blood pressure are single timepoint measurements and do not reflect the actual variability and time-in-range of blood pressure while a person is performing their daily activities. There have been varying results on the correlation of blood pressure with cardiovascular events and end-organ outcomes40,41,42. However, there have been supporting studies suggesting that a blood pressure of 120/80 is optimal in preventing adverse cardiovascular events, especially strokes42,43,44. Our MLA models identified that a greater percentage of time with blood pressure < 120/80 is associated with a better cardiovascular profile. This brings about a new concept of time in range, which is an increasingly important measure in diabetology45. Our study suggests that time-in-range may be extrapolated to hypertension. Additionally, our study indicated that the daytime variability of blood pressure, which is increasingly recognised as a marker of cardiovascular risk, is also an important contributor. This concept is supported by recent studies demonstrating an association of increased variability with cardiovascular risk46,47,48. Although current blood pressure monitoring devices are single time-point, future wearables may be able to provide the time-in-range readouts and diurnal variability, which were important components associated with atherosclerosis in our study.

The physical activity data in our subgroup also revealed an interesting finding: active minutes were more important than total step count in predicting coronary atherosclerosis. This suggests that achieving the required metabolic equivalents and target heart rate is more important than distance travelled or steps taken, in line with the physical activity guideline of achieving 150 min of moderate physical exercise per week49.

A practical application of our findings would be in statin prescription, through the ability to modestly discern low risk from non-low risk, defined as a calcium score of 0 and a calcium score greater than 0 respectively. The American College of Cardiology suggests that patients with a zero calcium score on coronary arteries (very low risk patients) can defer statin therapy in the absence of an elevated cardiac risk of ≥ 20% in 10 years50. In this study, we found that our ensemble MLA performed better than the Framingham risk score in identifying low risk individuals (p-value < 0.001).

While there have been numerous studies on CVD risk prediction, the application of ensemble MLA to contemporary risk factors such as lifestyle and ambulatory physiological data in Asian populations remains understudied. A study modelled on survey-based responses51 suggested promising findings in the detection of patients at cardiovascular risk. Our work extends previous findings by examining the predictive value of the different groups of risk factors and their combined effect in classifying CVD risk among healthy asymptomatic individuals in an Asian population. Another key contribution of our study is the identification of novel risk factors which contribute to CVD risk classification. Our approach prioritizes easily obtainable variables, so that inputs to the risk prediction models are not restricted to laboratory or other advanced cardiac imaging tests; our models are versatile in that, while providing more information helps refine risk prediction, simple health behaviour and lifestyle inputs can already provide a risk prediction. From a population health perspective, this helps create patient self-awareness of health status and motivates higher risk patients to seek therapy early, thereby lowering health care expenditure in the long run. This work therefore presents opportunities for the use of self-assessed questionnaire data as a preliminary low-cost option to screen healthy individuals for CVD risk. Finally, we also demonstrated the suitability of machine-learned models when applied to datasets with numerous potential predictors. The use of an ensemble modelling technique to synthesize the outcome of multiple base learners can increase a model’s robustness and help prevent overfitting.

Limitation and future work

In our subanalysis of physical activity parameters from the Fitbit Charge HR, we found that data from such devices were unable to risk stratify patients with high confidence. We attribute the inconclusive performance to the relatively small sample size of patients with adequate Fitbit data, especially for patients in the high risk categories. Patients with high CVD risk account for 9.2% (55 out of 600) of the dataset in comparison to 70.2% (421) patients in low risk. Congruent with prior studies, we found associations between activity tracker determined physical activities, sleeping hours and sleep quality with cardiovascular health52, but a study with a larger sample size will be needed before such parameters can be reliably incorporated into a risk model.

Our study is limited by the small sample size of patients with high CVD risk, defined as calcium score ≥ 100. Individuals with high CVD risk account for 20.1% (124) of the dataset in comparison to 70.2% (421) individuals in low risk. We addressed the class-imbalance problem with the synthetic minority oversampling technique (SMOTE), which generates synthetic samples of the minority class. SMOTE not only mitigates the problem of overfitting caused by random oversampling, it also creates more instances of the minority class for the MLA to learn from53. We also performed only internal validation. This model is built on data from an Asian population; applicability to other populations will require further calibration. Additionally, we only assessed the performance of the model in high and low risk patients; this is due to the limited sample size and to prevent overfitting of the data. We will present this data after the completion of our prospective trial consisting of at least 2000 patients.
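The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration and not the implementation used in the study: the function names, the two-feature example data and the choice of k are hypothetical, and library implementations differ in details such as how many synthetic samples are drawn per original sample.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating towards near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical standardized features for four high-risk subjects.
    high_risk = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5]]
    for point in smote(high_risk, n_synthetic=3, k=2):
        print([round(v, 3) for v in point])
```

Because each synthetic point lies on a line segment between two real minority samples, the new samples stay inside the region already occupied by that class rather than being exact duplicates.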

As an extension to the current work, longitudinal follow-up information will be added to enrich the dataset by examining the continuity of each variable across different time points. A prospective trial evaluating this model is planned to provide a larger sample size for learning and model evaluation. Deep learning frameworks capable of capturing the complex interactions while preserving the order and temporal elements of the multiple readings can be explored in place of MLAs for more accurate CVD risk classification.