The Framingham Heart Study is a long-term prospective study of the subtypes of cardiovascular disease in a population of free-living subjects in the community of Framingham, Massachusetts; the study began in the late 1940s [1]. In 1998, the Framingham Risk Score was introduced to estimate an individual's 10-year cardiovascular risk using statistical methods [2]. Since then, several researchers have built on this study, exploring statistical and machine learning models to produce better predictions of the risks associated with cardiovascular diseases. Other studies used similar datasets to examine the utility and performance of machine learning methods for predicting the risk of cardiovascular disease subtypes not originally covered by the Framingham study, such as stroke, transient ischemic attack (TIA), and heart failure; their omission was considered one of the study's shortcomings until they were added in 2008 [3]. Methods used in these studies include Support Vector Machines (SVM), classification trees, AdaBoost with trees, logistic regression, and Naïve Bayes. Classification trees are commonly used to classify patients according to the likelihood of disease occurrence, but they can suffer from limited accuracy [4]. Bootstrap aggregation (bagging), boosting, and random forests address this limitation by combining many such trees as weak learners. Conventional logistic regression, nevertheless, still performs well for predicting the probability of coronary heart disease risk compared with the methods proposed in the data-mining literature.
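To make the weak-learner idea concrete, the following is a minimal pure-Python sketch of bagging: each weak learner (here a depth-one decision stump rather than a full classification tree) is trained on a bootstrap resample of the data, and the ensemble classifies by majority vote. The toy dataset, its two features (systolic blood pressure and age), and all thresholds are invented purely for illustration and do not come from the Framingham data.

```python
import random

def train_stump(X, y):
    """Exhaustively pick the (feature, threshold, sign) split that
    minimises misclassification error on the given sample."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            for sign in (1, -1):
                preds = [1 if sign * (row[j] - t) > 0 else 0 for row in X]
                err = sum(p != yi for p, yi in zip(preds, y))
                if err < best_err:
                    best_err, best = err, (j, t, sign)
    return best

def stump_predict(stump, row):
    j, t, sign = stump
    return 1 if sign * (row[j] - t) > 0 else 0

def bagged_ensemble(X, y, n_stumps=25, seed=0):
    """Bagging: train each weak learner on a bootstrap resample
    (sampling with replacement, same size as the original set)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_stumps):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def ensemble_predict(stumps, row):
    """Majority vote over all weak learners (ties go to class 0)."""
    votes = sum(stump_predict(s, row) for s in stumps)
    return 1 if votes * 2 > len(stumps) else 0

# Invented toy data: (systolic BP, age) -> disease label (0/1).
X_toy = [(120, 40), (115, 35), (180, 65), (170, 70), (125, 45), (175, 60)]
y_toy = [0, 0, 1, 1, 0, 1]
stumps = bagged_ensemble(X_toy, y_toy)
```

Averaging over resamples is what reduces the variance of individual trees; random forests extend this by also subsampling the features considered at each split, and boosting instead reweights examples the previous learners misclassified.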
A more sophisticated, algorithmic approach considers the history of each patient's electronic health records (EHR) to predict whether that patient will be hospitalized in the following year; it also draws on risk metrics such as the heart disease risk factors that emerged from the Framingham study [5].