Machine Learning Models in Prediabetes Screening: A Systematic Review
Correspondence Address:
Azmawati Mohammed Nawi,
Associate Professor, Department of Community Health, Faculty of Medicine, Cheras, Wilayah Persekutuan Kuala Lumpur, Malaysia.
E-mail: azmawati@ppukm.ukm.edu.my
Introduction: The increasing prevalence of type 2 Diabetes Mellitus (DM) can be addressed by identifying those with prediabetes and offering early interventions, using prescreening diagnostic tools to find them. Machine learning algorithms and big data mining approaches have been proposed for predictive disease modelling in hospital and clinical settings.
Aim: To outline the relative performance accuracy of different machine learning algorithms in predicting prediabetes.
Materials and Methods: A systematic literature search was conducted at Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol. The research question was formulated from the keywords “Prediabetes” (Population), “Internet of Things” and “prediction model” (Intervention), and “screening” and “risk” (Outcome). The review was registered with the International Prospective Register of Systematic Reviews (PROSPERO) (CRD42021264947), and Web of Science, Scopus, PubMed, Ovid and EBSCOhost were searched from 10th to 24th June 2021. Inclusion criteria were English-language prediction studies published between 2011 and 2021. Review articles, editorials, proceedings, commentary articles and articles not focusing on prediabetes were excluded. The quality of the articles was assessed with the Prediction Model Risk of Bias Assessment Tool (PROBAST).
Results: Five articles published between 2014 and 2021 were included. Sample sizes ranged from 570 to 24,331 participants. Three studies (from South Korea, the United States of America (USA) and Japan) suggested that their screening score prediction models could be applied in clinical settings for personalised risk assessment and targeted interventions, with predictors suitable for use in either clinics or hospitals. Because gender, age, Body Mass Index (BMI), blood pressure and waist circumference are simple to measure, models using these predictors could also be used in the community.
Conclusion: This review highlights the fact that the heterogeneity of the population used and validation issues may affect generalisation. Future studies should address these concerns to guide advocacy among healthcare providers in clinical practice as well as in data and expertise sharing for developing and validating urgently needed prediabetic prediction models.
Artificial intelligence, Detection, Impaired glucose tolerance, Internet of things, Predevelopment diabetes, Prediction model, Risk
Diabetes Mellitus is one of the world’s most serious public health problems, causing premature death and imposing a heavy global disease burden. It is among the top ten causes of death, with an estimated 1.5 million deaths globally in 2019 (1). The overall global burden of diabetes has increased significantly and will continue to rise over the next few decades. The global incidence of diabetes increased by 102.9%, from 11.3 million in 1990 to 22.9 million in 2017, and global diabetes-associated Disability Adjusted Life Years (DALYs) increased by 116.7%, from 32.3 million in 1990 to 70.4 million in 2019 (2). In 2021, an estimated 537 million adults had diabetes, and almost half of them (240 million) were undiagnosed (3). The prevalence of diabetes is predicted to rise to 570.9 million in 2025 and to 693 million in 2045 (2),(4). Diabetes that remains undetected and undiagnosed is left untreated and can lead to complications such as heart disease, kidney disease, diabetic retinopathy and neuropathy. These complications impose significant adverse health impacts and economic burdens on countries and their healthcare systems (5).
Prediabetes is a condition in which blood glucose levels are higher than normal but below the diabetes threshold (6). It is an intermediate stage between normal glucose tolerance and DM, and is defined as either Impaired Fasting Glucose (IFG), based on an elevated Fasting Plasma Glucose (FPG), or Impaired Glucose Tolerance (IGT), defined as a 2-hour plasma glucose of ≥7.8 and <11.1 mmol/L during an oral glucose tolerance test (7). It is regarded as a high-risk condition, with a high likelihood of progression to diabetes (8). Annually, approximately 5-10% of people with prediabetes develop diabetes; however, conversion rates vary depending on the definition of prediabetes and the population characteristics (9),(10). In 2017, an estimated 7.7% of the world population (374 million) had IGT, and this number is expected to increase to 587 million (8.4%) by 2045 (4). Lifestyle modifications and pharmaceutical interventions have been shown to reduce the incidence of diabetes, with an average relative risk reduction of 20% (11). Thus, one strategy for addressing the increasing prevalence of type 2 DM is to identify those with prediabetes and offer such interventions.
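To make the glycaemic categories concrete, the sketch below classifies a person from the two measurements of an oral glucose tolerance test. It assumes the commonly used American Diabetes Association cut-offs (IFG: FPG 5.6-6.9 mmol/L; IGT: 2-hour plasma glucose 7.8-11.0 mmol/L; diabetes: FPG ≥7.0 or 2-hour glucose ≥11.1 mmol/L), with an optional WHO-style IFG lower limit of 6.1 mmol/L. The function name and labels are illustrative, and HbA1c-based definitions are not covered.

```python
def classify_glucose(fpg: float, two_hour_pg: float, ifg_lower: float = 5.6) -> str:
    """Classify glycaemic status from OGTT values in mmol/L.

    Default cut-offs follow ADA-style criteria (an assumption of this sketch);
    pass ifg_lower=6.1 for the WHO lower limit of impaired fasting glucose.
    HbA1c is not considered.
    """
    if fpg >= 7.0 or two_hour_pg >= 11.1:
        return "diabetes"
    ifg = ifg_lower <= fpg < 7.0          # impaired fasting glucose
    igt = 7.8 <= two_hour_pg < 11.1       # impaired glucose tolerance
    if ifg and igt:
        return "prediabetes (IFG + IGT)"
    if ifg:
        return "prediabetes (IFG)"
    if igt:
        return "prediabetes (IGT)"
    return "normal"
```

For example, `classify_glucose(5.8, 6.0)` returns "prediabetes (IFG)" under the default cut-off but "normal" with `ifg_lower=6.1`, which illustrates how differing national definitions change who is labelled as having prediabetes.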
Risk assessment tools can be designed to predict the likelihood of a particular health outcome from a person’s attributes and risk factors. By focusing screening on the people at highest risk, they help to optimise the typically limited resources available for identifying illness (12). Risk assessment tools are therefore useful for identifying people with prediabetes who may benefit from intervention, and many advocate them as the first stage of a screening programme (13). Prescreening diagnostic tools have enabled clinicians to make better judgements and diagnose patients more quickly (14). Any delay in disease detection may result in irreversible complications, such as blindness from diabetic retinopathy or end-stage renal failure from diabetic nephropathy (15). Given the extensive screening and diagnostic processes required to protect the population from a variety of serious illnesses, certain healthcare sectors may face a shortage of diagnostic experts. Computer-assisted technology can therefore support the prescreening process and improve diagnosis and prognosis.
Machine learning is an area of artificial intelligence research that learns from past data, using statistical, probabilistic and optimisation algorithms to classify new input data (16). Previously, statistical analyses such as multivariate regression or correlation analysis were used to construct models by linearly combining the relevant variables (17). However, the digitisation of medical records has resulted in a wealth of multidimensional data being stored in health databases. These data present a unique opportunity for advanced machine learning approaches to pattern recognition and prediction (18). Unlike traditional statistical methods, machine learning methods classify data using a wide range of techniques, such as Boolean logic, absolute restrictions, conditional probabilities and unconventional optimisation methods, in a way that more closely resembles human reasoning. Although most machine learning approaches draw on concepts from statistics and probability, machine learning has become a more powerful classification tool because it can generate decisions or inferences from datasets that conventional statistical techniques cannot (19).
Numerous machine learning techniques have been applied in clinical settings for disease prediction and have demonstrated higher diagnostic accuracy than conventional methods (19). Support Vector Machines (SVM), Artificial Neural Networks (ANN), the Naïve Bayes algorithm and Random Forest (RF) are widely used machine learning approaches in disease risk prediction (20). Machine learning algorithms and big data mining approaches have also improved diabetes screening and prediction (21),(22). Despite the increasing applicability and effectiveness of machine learning algorithms for predictive disease modelling in hospital and clinical settings, we found little research that comprehensively evaluates published studies using machine learning algorithms to predict prediabetes (23). Therefore, this review aims to outline the relative performance accuracy of different machine learning algorithms in predicting prediabetes.
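For contrast with machine learning methods, the conventional approach of linearly combining variables can be sketched as a logistic regression risk equation: the predictors are weighted and summed into a linear predictor, which the logistic function converts into a probability. The coefficients and variable names below are hypothetical placeholders, not values from any study in this review.

```python
import math


def logistic_risk(features: dict[str, float], coefficients: dict[str, float], intercept: float) -> float:
    """Predicted probability from a logistic regression equation.

    The linear predictor is intercept + sum(beta_i * x_i); the logistic
    function maps it onto the 0-1 probability scale.
    """
    linear_predictor = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients for illustration only (not taken from any reviewed study).
coefficients = {"age_years": 0.03, "bmi": 0.08, "waist_cm": 0.02}
person = {"age_years": 50, "bmi": 27.0, "waist_cm": 92}
risk = logistic_risk(person, coefficients, intercept=-6.0)  # linear predictor = -0.5, risk ≈ 0.38
```

Methods such as RF, gradient boosting or ANN replace this single weighted sum with learned nonlinear functions and interactions, which adds flexibility at some cost to interpretability.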
The present systematic review was initiated at Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia, and was guided by the PRISMA protocol (24). The review was registered with PROSPERO (CRD42021264947), and the record can be retrieved at https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021264947 (25). The PRISMA protocol prompts researchers to source the right information at an appropriate level of detail. Following this protocol, we began the systematic literature review by formulating the research question.
Research question formulation: The research question was formulated using the Patient/Population, Intervention, Comparison and Outcome (PICO) framework, a tool that helps authors develop a suitable research question for a review. It is based on the population or problem, the intervention or interest, the comparison or context, and the outcome (26). Based on these concepts, three main aspects were included in the review: prediabetes (population), the Internet of Things and prediction models (intervention), and screening and risk (outcome). These guided the formulation of the main research objective.
Systematic Search Strategy
The three main processes in the systematic search strategy were identification, screening and inclusion.
Identification: The identification process involved searching for synonyms, Medical Subject Heading (MeSH) terms, related terms and variations of the main keywords: prediabetes, Internet of Things, prediction model, screening and risk (Table/Fig 1). This provided greater coverage for finding related articles in the selected databases (Web of Science, Scopus, PubMed, Ovid and EBSCOhost), which were searched over two weeks (10th-24th June 2021). These databases offer large collections of literature and advanced search functions. The search returned 776 articles; 28 duplicates were found and removed, leaving 748. Two further articles were identified by citation searching of the reference lists of the initially included articles, giving 750 articles.
Screening: The 750 articles were screened using each database’s sorting function. The inclusion criteria were: journal article, written in English, published in 2011-2021, and observational or interventional study with qualitative or quantitative data. We excluded systematic reviews, comments or letters to the editor, conference abstracts, animal studies, and in vivo or in vitro studies. Two teams of two or three authors independently screened the studies for inclusion. Any disagreement was resolved by discussion and consensus, with adjudication by a third author from the other team. We excluded 705 articles because of irrelevant population, intervention or outcome.
Eligibility for inclusion: The eligibility process aimed to select the articles that fulfilled the study objective, based on reading the article title and abstract. The remaining 45 articles were manually sorted according to whether they reported the use or development of machine learning models in prediabetes screening. Studies unrelated to the intervention of interest and intended outcome were excluded. Through this process, 36 articles were excluded because of irrelevant intervention (i.e., using only regression analysis), incomplete measures of effect on the outcome, or prediction of conditions other than prediabetes (for example, metabolic syndrome, gestational diabetes and diabetes). In the final eligibility process, only five articles were included (Table/Fig 2) (27),(28),(29),(30),(31).
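The record counts reported for the identification and screening steps can be reconciled with simple arithmetic, which shows that the 750 screened articles include the two added by citation searching. The variable names are illustrative; the numbers are those reported in the text.

```python
# Counts reported in the identification and screening steps.
identified = 776               # records retrieved from the five databases
duplicates_removed = 28
from_citation_search = 2       # added from reference lists of included articles
excluded_at_screening = 705    # irrelevant population, intervention or outcome

after_deduplication = identified - duplicates_removed          # 748
screened = after_deduplication + from_citation_search          # 750
assessed_for_eligibility = screened - excluded_at_screening    # 45

print(after_deduplication, screened, assessed_for_eligibility)  # 748 750 45
```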
Data Extraction and Analysis
Thematic analysis was used in the present systematic review because it is suited to synthesising and integrating findings from mixed research designs (32). Thematic analysis is a descriptive analysis that allows data to be merged with other data analysis techniques (33). The five selected articles were read in detail, especially the abstract, methods, results and discussion sections. Data were then extracted according to whether each study answered its own research questions, and the findings were summarised in (Table/Fig 3) (27),(28),(29),(30),(31). Thematic analysis was performed only after these steps. To generate relevant themes, each author identified patterns in the data extracted from the reviewed articles and grouped them, then categorised them into themes relating to the screening tool: practicality, usability, generalisation, missing data and validation. The themes were re-reviewed for accuracy, usefulness and faithful representation of the data. The developed themes were then submitted to a panel of experts well versed in systematic reviews and public health research, who agreed that the themes were appropriate and accurately reflected the results of the review.
Quality Appraisal
The quality of the final list of studies was assessed using PROBAST, which facilitates an objective assessment of the Risk Of Bias (ROB) and applicability of studies that develop, validate or update prediction models for individualised predictions in a focused and transparent manner (34). PROBAST was developed by a steering group that considered existing ROB tools and reporting guidelines; it was informed by a Delphi procedure involving 38 experts and refined through piloting (34). In the present study, the teams extracted data from all included studies and assessed the ROB, with any disagreement resolved by discussion and consensus involving a third author from the other team. We performed qualitative analysis and appraisal of the included articles by extracting all relevant information using a predesigned standardised data extraction form. PROBAST is organised into four domains (participants, predictors, outcome and analysis) and can be applied to model development studies, validation studies, or both.
The four domains contain a total of 20 signalling questions that help judge whether each domain has low, high or unclear ROB. Signalling questions are answered yes, probably yes, probably no, no or no information. Finally, an overall judgement was made about the ROB and about concerns regarding the applicability of each prediction model across all assessed domains (Table/Fig 4).
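The way domain-level judgements combine into an overall ROB rating can be sketched as a small rule, assuming PROBAST's general guidance: overall ROB is high if any domain is high, unclear if at least one domain is unclear and none is high, and low only if all four domains are low. This is an illustrative simplification; PROBAST's additional guidance (for example, considering downgrading a development-only study without external validation) is not modelled, and the names are hypothetical.

```python
from enum import Enum


class Rob(Enum):
    """Domain or overall risk-of-bias rating."""
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


# The four PROBAST domains.
DOMAINS = ("participants", "predictors", "outcome", "analysis")


def overall_rob(domain_ratings: dict[str, Rob]) -> Rob:
    """Combine the four domain ratings into an overall ROB judgement."""
    missing = set(DOMAINS) - set(domain_ratings)
    if missing:
        raise ValueError(f"missing domain ratings: {sorted(missing)}")
    ratings = [domain_ratings[d] for d in DOMAINS]
    if Rob.HIGH in ratings:
        return Rob.HIGH      # any high-risk domain makes the study high risk
    if Rob.UNCLEAR in ratings:
        return Rob.UNCLEAR   # no high-risk domain, but at least one unclear
    return Rob.LOW           # all four domains rated low
```

Under this rule, a study rated low in every domain except an unclear outcome domain is judged unclear overall, whereas the five studies in this review, rated low in all domains, come out low.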
Five articles, published between 2014 and 2021, were included in this review. Sample sizes ranged from 570 to 24,331 participants. Four studies were from Asia (South Korea, Japan, Qatar and China) and one was from North America (the USA). The machine learning approaches used were Artificial Neural Network (ANN), Support Vector Machine (SVM), Reverse Engineering Forward Simulation (REFS), XGBoost (XGB), RF, Gradient Boosting Machine (GBM), Deep Learning (DL) and GA-XGBT. All studies developed prediction models for prediabetes screening. Three studies defined prediabetes according to American Diabetes Association guidelines, while the other two used national definitions (South Korea and China). The differing units and ranges in these definitions resulted in slightly lower FPG thresholds. Using PROBAST, all five studies were judged to have low ROB.
Study population and databases: The largest dataset was obtained retrospectively for 2007-2012 from the Humedica electronic health record database of US adults, comprising 24,331 adults without type 1 diabetes whose blood glucose was in the low-risk (normoglycaemic) range at entry (28). The next largest was the study from Japan, which analysed data from comprehensive medical check-ups (2006-2017) of 9,906 healthy office workers aged 40-60 years without serious diabetes or advanced renal failure (29). The study from Qatar reported clinical, anthropometric and demographic data for 7,386 people aged 18-86 years from the Qatar Biobank, which has collected data from the general population since 2012; participants with a Body Mass Index (BMI) <18.5 kg/m² were excluded (30). The South Korean study used data on 4,685 South Korean adults aged 51-54 years from the 2010 Korean National Health and Nutrition Examination Survey (KNHANES), excluding participants with diabetes (27). Lastly, the study from China used 2011-2019 data from Shuguang Hospital, affiliated with Shanghai University of Traditional Chinese Medicine, comprising 570 prediabetic participants aged 57-68 years (31).
Validation and missing data: Three studies performed internal validation on the same database using either ten-fold or five-fold cross-validation (29),(30),(31). The study from the USA performed external validation using datasets from outside the Humedica database (28). The South Korean study performed both internal validation (ten-fold cross-validation) and external validation using 2011 KNHANES data (27). Regarding missing data, two studies excluded records with missing data (27),(29) and two used imputation (28),(30); the study from China did not mention missing data (31).
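Internal validation by k-fold cross-validation partitions one dataset into k folds; the model is trained k times, each time holding out a different fold for testing, and performance (such as AUC) is averaged across folds. The helper below is a hypothetical sketch of the index-splitting step only, written with the Python standard library; it does not reproduce any reviewed study's procedure.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Return k (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once; every index lands in exactly one test fold,
    and fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra index.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


# Example: ten-fold split of 100 hypothetical participants.
for train_idx, test_idx in k_fold_indices(100, 10):
    pass  # fit the model on train_idx, evaluate on test_idx, then average the scores
```

In practice, stratified folds (for example scikit-learn's `StratifiedKFold`) are often preferred so that each fold keeps a similar prevalence of prediabetes. Because every fold comes from the same database, cross-validation estimates internal performance and is not a substitute for external validation on independent data such as the 2011 KNHANES or non-Humedica datasets.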
ROC accuracy and applicability of machine learning in different settings: Three studies suggested that their screening score prediction models could be used in clinical applications related to personalised risk assessment and targeted interventions (27),(28),(29). The South Korean study used age, gender, family history of diabetes, alcohol intake, BMI, waist circumference, FPG, and systolic and diastolic blood pressure as predictors in ANN and SVM models, which achieved Area Under the receiver operating characteristic Curve (AUC) values of 0.768 and 0.761, respectively (27). Anderson JP et al., reported an AUC of 0.72 for their REFS model, which used age, BMI, High-Density Lipoprotein (HDL) cholesterol, triglycerides, alanine transaminase, C-Reactive Protein (CRP) and body temperature as predictors (28). In 2018, the study from Japan used a complete oral glucose tolerance test profile consisting of one-hour plasma glucose, one-hour immunoreactive insulin, two-hour plasma glucose and two-hour immunoreactive insulin, and reported AUCs of 0.75-0.78 for two XGBoost models, which outperformed the logistic regression model (29). Given the predictors used, these three models are suitable for clinic and hospital settings.
The study from China used tongue image datasets consisting of deep features, colour and texture features, and fused features. The authors reported that the GA-XGBT model had an AUC of 0.93 with colour and texture features, 0.816 with deep features and 0.914 with fused features for predicting prediabetes (31). The use of tongue images is applicable only in hospital settings, because the images must be collected by specially trained researchers using specialised machines (31).
The study from Qatar used gender, age, BMI, blood pressure and waist circumference as predictors in DL, GBM, XGB and RF models, and all four approaches achieved an AUC of 0.81. However, none of these four machine learning approaches outperformed the logistic regression model (30). Because these predictors are simple to obtain, the authors suggested that, beyond the clinical setting, the models could also be used by the community (general public) (16).
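The AUC values reported here can be read as the probability that a randomly chosen person with prediabetes receives a higher predicted risk than a randomly chosen person without it; 0.5 corresponds to chance and 1.0 to perfect discrimination. A minimal sketch of that pairwise definition follows; the labels and scores in the example are invented for illustration.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a randomly chosen case scores higher than
    a randomly chosen non-case (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # case ranked above non-case
            elif p == n:
                wins += 0.5      # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))
```

For instance, with labels `[1, 1, 0, 0, 1, 0]` and scores `[0.9, 0.4, 0.35, 0.8, 0.7, 0.1]`, 7 of the 9 case/non-case pairs are ranked correctly, giving an AUC of about 0.78. The pairwise loop is quadratic in sample size; rank-based implementations are preferable for datasets the size of those reviewed.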
To the best of our knowledge, this is the first systematic review of machine learning models used for screening people with prediabetes. The inclusion criteria were established to identify individuals who would benefit from interventions aimed at early detection and prevention of DM.
Machine learning-based prediabetes risk score models provide an alternative screening tool that is inexpensive and simple to administer to apparently healthy people in the general population (35),(36). Compared with the traditional screening method of IGT testing, these prediction models incorporate other risk factors and measures, such as body temperature, smoking habit, BMI, tongue image information, blood pressure and waist circumference, which require no additional procedure and are non-invasive (28),(30),(31),(35). This convenience and practicality could help extend early detection strategies in population-based settings.
Despite the high accuracy of the prediabetes prediction models identified in the present review, their usability in clinical practice is an important consideration. Applying some prediction models at tertiary healthcare centres requires clinical experts, as in the case of tongue features (31), which imposes additional resources and costs for screening. Moreover, implementing an artificial intelligence model that needs standardised equipment to collect and interpret information may not be cost-effective in overwhelmed hospital settings. In contrast, routine clinical measures are preferred and more practicable for mass screening of the population.
With the expansion of electronic health records, more robust and advanced computational approaches such as machine learning have become the focus of disease prediction research (20),(37). Supervised machine learning models can capture complex nonlinear relationships between dependent and independent variables across multiple data types (38). A distinctive characteristic of an ANN is the “black box” at the centre of its decision-making process. However, it has been argued that the same machine learning model may produce results with varying degrees of accuracy on the same dataset depending on the choice of underlying parameters (35),(39). Models tuned for high precision in specific populations may therefore be difficult to generalise across different settings.
In general, the characteristics of the data used to develop the prediction models were well described; however, some models sampled specific databases defined by employment status or tertiary healthcare centres rather than using population-based sampling (29),(31). Recruitment through specific databases is likely to produce a non-representative population and should be avoided (40),(41). Such prediction models should be used cautiously in future, as their predictions may be accurate only when screening similar groups of people.
Furthermore, it is imperative that model development studies clearly explain how missing data were handled (38),(39),(42). Several prediction model developers opted to exclude missing data rather than use imputation to assign plausible values (29),(31),(43). Multiple imputation generally produces less biased results and better discrimination than excluding records with missing data, because it retains the information in those records and reflects the uncertainty of the imputed values by pooling estimates across several plausible completed datasets (20),(44). By retaining participants with incomplete data, a prediction model better reflects the characteristics of the real-world population.
Validation of a prediction model is crucial for evaluating its discrimination and calibration, and thus the stability of the suggested model (20). External validation, regarded as the gold standard, should be performed before a prediction tool is considered for use in real-world settings (37),(41). However, only two of the studies included in the present review reported external validation. Although the prediction models performed reasonably well in predicting prediabetes, with AUCs ranging from 0.72 to 0.93, they should be examined critically for potential bias when replicated in other study populations.
Evaluating the application of a prediction model in clinical practice is vital before proceeding with advocacy activities (40),(45). All of the articles included in this review discussed the subsequent impact on healthcare practices, highlighting the importance of considering how a model will be used before it is developed. Based on personalised risk assessment, healthcare providers can plan more targeted interventions tailored to each person’s needs (28),(29),(30). With the growth of electronic health records, machine learning prediction models offer healthcare providers better usability for stratifying patients according to different risk factors.
The prediction models for prediabetes reviewed here take a computational approach, using big data for model development. Big data offers greater insight into the real-world population and hence greater representativeness. The predictor variables used in model development accurately measure specific individual characteristics and are based on routine clinical parameters used in healthcare settings.
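The pooling step that distinguishes multiple imputation from single imputation can be sketched with Rubin's rules: a parameter is estimated separately in each of m imputed datasets, the estimates are averaged, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The imputation model itself is not shown, and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Pool one parameter across m imputed datasets with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching estimates and variances from at least two imputations")
    pooled = mean(estimates)              # average of the m point estimates
    within = mean(variances)              # average within-imputation variance
    between = variance(estimates)         # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log-odds ratios and their variances from m = 3 imputed datasets.
estimate, standard_error = pool_rubin([0.50, 0.54, 0.46], [0.010, 0.012, 0.011])
```

Here the pooled estimate is 0.50, and the between-imputation spread widens the standard error from about 0.105 (within-imputation variance alone) to about 0.115, reflecting the extra uncertainty that complete-case analysis ignores.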
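Discrimination (AUC) and calibration answer different questions: the first asks whether cases are ranked above non-cases, the second whether predicted risks match observed rates. A minimal sketch of two common calibration summaries, calibration-in-the-large (observed versus expected event rate) and the Brier score, is given below; in external validation these would be computed on data not used for model development. Names and example values are illustrative, and a full assessment would also examine a calibration plot and calibration slope.

```python
import math


def calibration_summary(labels: list[int], predicted: list[float]) -> dict[str, float]:
    """Calibration-in-the-large and Brier score for predicted probabilities."""
    if not labels or len(labels) != len(predicted):
        raise ValueError("labels and predictions must be non-empty and of equal length")
    n = len(labels)
    observed_rate = sum(labels) / n          # proportion who actually have the outcome
    expected_rate = sum(predicted) / n       # mean predicted risk
    brier = sum((p - y) ** 2 for y, p in zip(labels, predicted)) / n
    return {
        "observed_rate": observed_rate,
        "expected_rate": expected_rate,
        # An observed-to-expected ratio near 1 suggests good calibration-in-the-large.
        "observed_to_expected": observed_rate / expected_rate if expected_rate > 0 else math.nan,
        "brier_score": brier,                # 0 is perfect; lower is better
    }
```

With labels `[1, 0, 0, 1, 0]` and predictions `[0.8, 0.2, 0.4, 0.6, 0.1]`, the observed rate is 0.40 against an expected 0.42, and the Brier score is 0.082.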
Limitation(s)
To date, few studies have developed machine learning prediction models for prediabetes, so it is difficult to assess the superiority of one model over another. In the present review, prediction models derived from certain populations, particularly in high-income countries, may not be applicable to populations in other regions because of differing genetic makeups and socio-economic backgrounds.
Prediction model studies on prediabetes are available and appear to show good accuracy. However, this review highlights that the heterogeneity of the populations used and validation issues may affect generalisation. Future studies should address these concerns to guide advocacy among healthcare providers. Because the clinical data measured vary widely between prediction studies, comparison will only be possible once a common benchmark dataset is established. Therefore, there is an urgent need for data and expertise sharing to develop and validate prediabetes prediction models.
Author contributions: All authors contributed to the design and implementation of the research, analysis of the results and writing of the manuscript.
DOI: 10.7860/JCDR/2022/53411.16385
Date of Submission: Nov 23, 2021
Date of Peer Review: Jan 08, 2022
Date of Acceptance: Feb 23, 2022
Date of Publishing: May 01, 2022
AUTHOR DECLARATION:
• Financial or Other Competing Interests: None
• Was Ethics Committee Approval obtained for this study? NA
• Was informed consent obtained from the subjects involved in the study? NA
• For any images presented appropriate consent has been obtained from the subjects. NA
PLAGIARISM CHECKING METHODS:
• Plagiarism X-checker: Nov 22, 2021
• Manual Googling: Feb 22, 2022
• iThenticate Software: Mar 22, 2022 (18%)
ETYMOLOGY: Author Origin