JCDR - Binary prediction, Naïve Bayes model, Predictive accuracy


Original article / research

Year : 2023 | Month : March | Volume : 17 | Issue : 3 | Page : YC06 - YC10


An Auxiliary Approach to Prediction of Binary Outcome with Bayesian Network Model: Exploration with Data for Recurrence of Breast Cancer

Published: March 1, 2023 | DOI: https://doi.org/10.7860/JCDR/2023/59472.17598
Sachit Ganapathy, KT Harichandrakumar, Kadhiravan Tamilarasu, Prasanth Penumadu, N Sreekumaran Nair

1. PhD Scholar, Department of Biostatistics, Jawaharlal Institute of Post Graduate Medical Education and Research, Puducherry, India. 2. Assistant Professor, Department of Biostatistics, Jawaharlal Institute of Post Graduate Medical Education and Research, Puducherry, India. 3. Professor, Department of Medicine, Jawaharlal Institute of Post Graduate Medical Education and Research, Puducherry, India. 4. Additional Professor, Department of Surgical Oncology, Jawaharlal Institute of Post Graduate Medical Education and Research, Puducherry, India. 5. Professor and Head, Department of Biostatistics, Jawaharlal Institute of Post Graduate Medical Education and Research, Puducherry, India.

Correspondence Address :
N Sreekumaran Nair,
Professor and Head, Department of Medicine, Jawaharlal Institute of Post Graduate Medical Education and Research, Puducherry, India.
E-mail: nsknairmanipal@gmail.com

Abstract

Introduction: Logistic regression is the classical statistical model used to predict a binary outcome variable. It carries theoretical assumptions of independence of the predictor variables and linearity of their association with the outcome on the log-odds scale. Alternative models developed in the machine learning context, such as the Naïve Bayes model (which makes similar assumptions) and the Bayesian Network (BN) model, can also be used for binary prediction.

Aim: To compare the predictive performance of logistic regression, Naïve Bayes and BN models in predicting the recurrence of breast cancer.

Materials and Methods: The dataset on recurrence of breast cancer was procured from the UCI Machine Learning Repository. The study was done on retrospective data from December 2021 to July 2022. The sample size was increased by bootstrapping with a logistic regression model. The dataset was split into training (70%) and testing (30%) datasets for internal validation. The effect estimates of the potential prognostic variables were obtained using a multiple logistic regression model. Naïve Bayes and BN models were also learnt from the training dataset. The indices of predictive accuracy were estimated for the models in both the training and testing datasets.

Results: Degree of malignancy and side of the affected breast were found to be significant predictors of recurrence of breast cancer. The BN model had the lowest misclassification rate and the best sensitivity in comparison with the other models, despite the imbalance in the outcome variable.

Conclusion: The BN model performed best in comparison with the logistic regression model when the assumptions of logistic regression were violated and the proportions of the outcome were imbalanced.

Keywords

Binary prediction, Naïve Bayes model, Predictive accuracy

Statistical models in health care have been extensively developed to help in medical decision-making (1). They assist in the process of making important decisions to achieve specific clinical outcomes and in managing the resources to be allocated. Prognostic modelling has had immense application in the field of medicine (2). Prognostic models estimate the probability of an outcome of a condition and also explore the relationship of the factors affecting this outcome. Unlike models which focus on a single explanatory variable and treat other variables as confounders, prognostic models incorporate the combined effect of variables to predict the outcome. They are particularly important in selecting the right treatment and managing resources (2).

When the outcome variable is binary, the logistic regression model is preferred for the prognosis of disease outcome (3). The binary logistic regression model captures the effect of predictor variables on the dependent binary variable by linearising the relationship using a logit (log-odds) link function. Although the performance of logistic regression as a prognostic model has been good, in practice several of its assumptions are violated (4). One of the most important assumptions of logistic regression is that the predictor variables are independent of one another. This assumption is almost never true in medical research, especially in prognostic models (5). Regression models developed in the frequentist context assume normality of the error term and homoscedasticity at each level of the independent variable in the model. Although these assumptions are often violated, logistic regression is widely used. Alternative predictive models have been suggested in the literature that can be used in place of the logistic regression model and that relax these assumptions (6). BN models are graphical representations consisting of Directed Acyclic Graphs (DAG) with nodes and edges, which can be used to query a binary outcome variable (7). Naïve Bayes models are simple classifiers, a subset of BN models, which assume conditional independence between the independent variables when predicting the outcome variable (8). These are some of the alternative models that can be explored for the prediction of a binary outcome variable.
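For reference, the linearisation mentioned above is usually written as follows. This is the standard textbook form of the binary logistic model rather than an equation taken from the article, and the symbols $p$, $\beta_j$ and $x_j$ are notation introduced here for illustration:

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad p = P(Y = 1 \mid x_1, \ldots, x_k)
```

The left-hand side is the log-odds of the outcome, which is the scale on which the association with each predictor is assumed to be linear.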

Breast cancer is one of the most prominent cancers affecting women around the world (9). Although recent advances have improved survival outcomes such as mortality, recurrence of breast cancer still persists at around 8-11% after different treatment modalities in India (10). The literature has established that some of the most common prognostic factors associated with recurrence of breast cancer include age, menopausal status, pathological N stage, pathological T stage, treatment modality, HER2, eGFR, and oestrogen and progesterone receptors (11).

The prognosis of a medical condition such as cancer depends on multiple factors which are correlated with one another. Clinical factors, sociodemographic factors and the treatment modalities given play a crucial role in the progression of breast cancer. Several statistical and machine learning models have been implemented for the prediction of recurrence of breast cancer and have shown excellent predictive ability (12),(13). Although these models have proven to be good, it is imperative to consider incorporating expert opinion into them, which can bring better insight into their practical use (14). This is the gap between clinical and modelling experts that needs to be bridged. BN models are an alternative approach that can incorporate the dependency between the factors through supervised learning from data and expert opinion. Data have also shown that hybrid BN models have good predictive accuracy and intuitive explanatory ability (15). In this study, our objective was to assess the predictive ability of the Naïve Bayes model and the BN model compared with the logistic regression model in predicting the recurrence of breast cancer.

Material and Methods

The present exploratory study, based on retrospective secondary data of breast cancer cases, was conducted from December 2021 to July 2022 at the Jawaharlal Institute of Post Graduate Medical Education and Research, Puducherry.

Models: The Naïve Bayes model: Naïve Bayes classifiers are probabilistic classifiers based on Bayes' theorem that use the properties of conditional independence to compactly represent a high-dimensional probability distribution (16). The variables are not completely marginally independent in this classifier model. The Naïve Bayes classifier can be constructed for an outcome variable Y with possible distinct classes {c1, c2, …, ck} which are mutually exclusive and exhaustive. The Naïve Bayes model, though, makes a very strong assumption about the independent variables. In the presence of n independent variables X1, X2, …, Xn, which are potential factors affecting the outcome variable Y, the Naïve Bayes assumption states that the Xi's are conditionally independent of each other given the outcome of the individual. Formally, it is represented as:

$(X_i \perp X_{-i} \mid Y)$ for all $i$

A Naïve Bayes model can be represented as a BN model, although its independence assumptions are strong and generally not true in practice. The joint probability distribution of the Naïve Bayes model under this assumption is given by:

$P(Y, X_1, \ldots, X_n) = P(Y) \prod_{i=1}^{n} P(X_i \mid Y)$
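As a minimal sketch of how this factorisation is used for classification, the example below multiplies a class prior by per-variable conditional probabilities and normalises over the classes. The variable names and every probability are hypothetical, chosen only to make the arithmetic easy to follow; they are not estimates from the breast cancer dataset.

```python
from math import prod

# Hypothetical toy numbers: invented for illustration, NOT study estimates.
# Prior P(Y) for a binary outcome.
prior = {"recurrence": 0.3, "no_recurrence": 0.7}

# Conditional tables P(X_i = value | Y), one table per predictor.
cond = {
    "deg_malig": {
        "recurrence": {"1": 0.1, "2": 0.4, "3": 0.5},
        "no_recurrence": {"1": 0.3, "2": 0.5, "3": 0.2},
    },
    "irradiation": {
        "recurrence": {"yes": 0.4, "no": 0.6},
        "no_recurrence": {"yes": 0.2, "no": 0.8},
    },
}


def joint(y, x):
    """P(Y = y, X = x) as prior times the product of per-variable conditionals."""
    return prior[y] * prod(cond[var][y][val] for var, val in x.items())


def posterior(x):
    """Normalise the joint over the classes to obtain P(Y | X = x)."""
    scores = {y: joint(y, x) for y in prior}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


patient = {"deg_malig": "3", "irradiation": "yes"}
post = posterior(patient)
predicted = max(post, key=post.get)  # class with the highest posterior
```

Because each predictor contributes one independent factor, the model needs only one small table per variable, which is what makes it compact but also what makes the independence assumption so strong.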

Bayesian Network (BN) model: BN models are graphical representations of the interdependencies between variables, expressed by a DAG and conditional probabilities. Let ‘G’ be a DAG; it consists of a set of variables, ‘X’, represented by nodes, and a set of directed edges, ‘E’, connecting these nodes (17). In BN models, a node without a parent node is parametrised by the assumed prior distribution, whereas nodes with parent nodes are parametrised by the conditional probability P(X|parent(X)). The joint probability of all the variables in the BN model is given by:

$P(x_1, x_2, \ldots, x_p) = \prod_{i=1}^{p} P(x_i \mid \mathrm{Parent}(x_i))$
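The product over parent-conditioned probabilities can be sketched in a few lines. The three-node network below (a class node Y pointing at two features, plus one feature-to-feature edge) and all of its probabilities are hypothetical and serve only to show the mechanics; they are not the network learnt in this study.

```python
from itertools import product

# Hypothetical structure: Y -> A, and (Y, A) -> B. Invented for illustration.
parents = {"Y": (), "A": ("Y",), "B": ("Y", "A")}

# cpt[node][parent_values][value] = P(node = value | parents = parent_values)
cpt = {
    "Y": {(): {1: 0.3, 0: 0.7}},
    "A": {(1,): {1: 0.6, 0: 0.4}, (0,): {1: 0.2, 0: 0.8}},
    "B": {
        (1, 1): {1: 0.9, 0: 0.1},
        (1, 0): {1: 0.5, 0: 0.5},
        (0, 1): {1: 0.4, 0: 0.6},
        (0, 0): {1: 0.1, 0: 0.9},
    },
}


def joint_probability(assignment):
    """Multiply P(x_i | Parent(x_i)) over all nodes for one full assignment."""
    p = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p *= cpt[node][pa_values][assignment[node]]
    return p


# One full assignment: 0.3 * 0.6 * 0.1
p_example = joint_probability({"Y": 1, "A": 1, "B": 0})

# Sanity check: the joint sums to 1 over all 2**3 assignments.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
```

A node with no parents looks up its prior under the empty tuple, which mirrors the distinction the text draws between root nodes and nodes with parents.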

Building a BN model includes steps of variable selection, structure learning and parameter learning, which can be undertaken by supervised learning from the data including expert opinion.

Dataset: The dataset for building the Naïve Bayes model was procured from an online database, the UCI Machine Learning Repository (18). The data came from a breast cancer study and are used to predict the recurrence event based on certain attributes. The total sample size in the dataset was 286. There were nine variables in the dataset: age, menopause status, tumour size, number of nodes involved, presence of node caps, degree of malignancy, breast, breast quadrant and status of irradiation. The dataset was sourced from the Institute of Oncology, University Medical Center, Ljubljana, Yugoslavia, by M. Zwitter and M. Soklic in 1988 and is available from: https://archive.ics.uci.edu/ml/datasets/breast+cancer. The dataset was inflated to a sample size of 1000 using a logistic regression equation with all the variables in the existing dataset as predictors and recurrence as the outcome. The total effective sample size used in the current manuscript was therefore 1000.
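The text does not spell out the inflation procedure. One plausible reading, sketched below purely as an assumption, is to resample covariate patterns with replacement and simulate the outcome from the fitted logistic probabilities. The function name `inflate`, the dummy-coded rows and the coefficient values are all hypothetical placeholders, not the authors' code or estimates.

```python
import math
import random


def inflate(rows, coefs, intercept, n_target, seed=0):
    """Resample covariate rows with replacement and simulate a binary outcome.

    `coefs` and `intercept` stand in for estimates that would come from
    fitting a logistic regression to the original data (assumed, not given).
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_target):
        x = rng.choice(rows)  # bootstrap draw of one covariate pattern
        eta = intercept + sum(coefs[k] * v for k, v in x.items())
        p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
        y = 1 if rng.random() < p else 0  # Bernoulli draw with probability p
        out.append({**x, "recurrence": y})
    return out


# Hypothetical dummy-coded rows and coefficients, for illustration only.
rows = [
    {"deg_malig_3": 1, "left_breast": 0},
    {"deg_malig_3": 0, "left_breast": 1},
    {"deg_malig_3": 0, "left_breast": 0},
]
coefs = {"deg_malig_3": 1.2, "left_breast": 0.5}
inflated = inflate(rows, coefs, intercept=-1.5, n_target=1000)
```

Under this reading the inflated outcomes follow the logistic model by construction, which is worth keeping in mind when comparing logistic regression against other models on the same data.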

Variables in the model: The dataset depicted the multivariable classification of the patients for the prognosis of breast cancer. The event of interest was the recurrence of the disease. The dataset contained the information for all the samples. The variables in the model were defined and categorised based on the criteria from the 8th edition of the AJCC Cancer Staging Form Supplement (19). The variables in the model and their recategorisation are given below:

1. Age of the patients at the time of diagnosis:
a. 10-39 years
b. 40-49 years
c. 50-59 years and
d. ≥60 years.
2. Whether the patient was pre- or post-menopausal at the time of the diagnosis:
a. postmenopausal, menopause at <40 years
b. postmenopausal, menopause at ≥40 years and
c. premenopausal
3. The greatest diameter of the excised tumour. Based on the tumour size chart, they were categorised as
a. T1 (0-2 cm),
b. T2 (2-5 cm) and
c. T3 (>5 cm).
4. The number of axillary lymph nodes that contain metastatic breast cancer visible on histological examination:
a. 0-2,
b. 3-9 and
c. >10
5. The presence of node caps, i.e., tumour at the capsule of the lymph node; over time, with more aggressive disease, the tumour may replace the lymph node.
6. The histological grade of the tumour:
• 1,
• 2 and
• 3, where Grade 1 predominantly consists of cells that retain their usual characteristics and Grade 3 predominantly consists of cells that are highly abnormal.
7. The side of the affected breast.
8. The breast was also divided into four quadrants and a central region, using the nipple as the central point; categorised as
• left-up
• left-low
• right-up
• right-low and
• centre
9. Whether radiation therapy was given or not.
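A recategorisation like the one listed above can be expressed as small mapping functions. The sketch below covers only age and tumour size. The function names are hypothetical, and the handling of boundary values (a tumour of exactly 2 cm or 5 cm going to the lower category) is an assumption, since the list does not state which side the boundaries fall on.

```python
def age_group(age_years):
    """Map age at diagnosis (years) to the four age groups listed above."""
    if age_years < 40:
        return "10-39"
    if age_years < 50:
        return "40-49"
    if age_years < 60:
        return "50-59"
    return ">=60"


def t_category(diameter_cm):
    """Map greatest tumour diameter (cm) to T1/T2/T3.

    Assumption: boundary values 2 cm and 5 cm fall in the lower category.
    """
    if diameter_cm <= 2:
        return "T1"
    if diameter_cm <= 5:
        return "T2"
    return "T3"


# Hypothetical patient, for illustration only.
example = {"age": age_group(47), "tumour": t_category(3.5)}
```

Keeping each mapping in its own function makes the category boundaries explicit and easy to audit against the staging criteria.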

Statistical Analysis

The dataset was split into two parts, a training dataset and a testing dataset. Approximately 70% of the data was used to train the models and the remaining 30% was used to test their classification accuracy. The distribution of the prognostic variables across the binary outcome of recurrence was assessed in the training dataset, the testing dataset and the entire dataset. Univariate logistic regression was performed first, and factors with a p-value <0.15 were carried forward as potential factors to build the multiple logistic regression model. A p-value <0.05 was considered statistically significant in the final model.
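
A random 70/30 partition of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name train_test_split, the seed value and the use of simple (unstratified) shuffling are all hypothetical choices, and the exact split sizes in the study are not reported here.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows reproducibly and cut them into training and testing parts."""
    rng = random.Random(seed)      # local generator so the split is repeatable
    shuffled = list(rows)          # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Row indices stand in for the 1000 inflated samples.
    train, test = train_test_split(range(1000))
    print(len(train), len(test))
```

With an imbalanced outcome, a stratified split (shuffling within each outcome class separately) would keep the proportion of recurrences similar in both parts; that variant is not shown.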

All the models were trained on the training dataset and then evaluated on both the training and testing datasets. The logistic regression model was built with all the potential prognostic variables, and the predicted probabilities were estimated from the model. The Naïve Bayes model was fitted with Laplace smoothing. The BN model was built in two steps. First, the structure of the BN model was learned using the Tree Augmented Naïve Bayes (TAN) method (20). Second, the conditional probabilities associated with each node were estimated using the Expectation-Maximisation (EM) method (21). Misclassification rate, sensitivity, specificity, Positive Predictive Value (PPV) and Negative Predictive Value (NPV) were estimated in both the training and testing datasets. All statistical analyses were performed in RStudio Version 1.2.1335, with Netica 6.09 used for the Bayes nets. The Naïve Bayes model was built using the naivebayes package.
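
To make the role of Laplace smoothing concrete, the following is a minimal sketch of a categorical Naïve Bayes classifier written in plain Python. It is not the implementation in the naivebayes R package used in the study; the class name CategoricalNaiveBayes, the smoothing parameter alpha and the six toy records are all hypothetical and serve only to illustrate the technique. Each conditional probability is estimated as (count + alpha) / (class count + alpha × number of levels), so a category never seen with a given outcome still receives a non-zero probability.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Naive Bayes for categorical predictors with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        n_features = len(X[0])
        # levels[j]: distinct categories observed for predictor j
        self.levels = [{row[j] for row in X} for j in range(n_features)]
        # counts[c][j][v]: training rows of class c with predictor j equal to v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.classes}
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                self.counts[label][j][value] += 1
        return self

    def log_scores(self, row):
        """Unnormalised log posterior for each class."""
        n = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / n)      # log prior
            for j, value in enumerate(row):
                k = len(self.levels[j])
                numerator = self.counts[c][j][value] + self.alpha
                denominator = self.class_counts[c] + self.alpha * k
                score += math.log(numerator / denominator)  # smoothed likelihood
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy records: (degree of malignancy, side of breast)
    X = [("3", "left"), ("3", "right"), ("1", "left"),
         ("2", "right"), ("1", "right"), ("3", "left")]
    y = ["recurrence", "recurrence", "no-recurrence",
         "no-recurrence", "no-recurrence", "recurrence"]
    model = CategoricalNaiveBayes(alpha=1.0).fit(X, y)
    print(model.predict(("3", "left")))
```

In the toy data no "no-recurrence" record has grade 3; without smoothing that class would receive a probability of exactly zero for any grade 3 patient, whereas with alpha = 1 it receives a small positive probability.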

Results

The distribution of all the factors in the model across both outcome categories in the training and testing datasets is given in (Table/Fig 1). The effect estimates from the univariate and multiple logistic regression models are shown in (Table/Fig 2). In the multiple logistic regression model, degree of malignancy and the side of the affected breast were the two variables that contributed significantly to the prediction of recurrence of breast cancer. The BN model developed using the TAN method for structure learning and the EM method for parameter learning is given in (Table/Fig 3). The probability distribution associated with each variable is shown in the network model.

In the training dataset, the misclassification rate was 33.52% for the logistic regression (LR) model, 31.09% for the BN model and 33.38% for the Naïve Bayes (NB) classifier, as given in (Table/Fig 4). When the same models were used to classify recurrence status in the testing dataset, the misclassification rate was 35.1% for the LR model, 36.42% for the BN model and 34.77% for the NB classifier. Sensitivity was poor for all the models. Specificity was excellent for all the models: 96.96% for the LR model, 91.52% for the BN model and 97.83% for the NB model in the training dataset, and 91.83% for the LR model, 87.02% for the BN model and 92.31% for the NB model in the testing dataset. PPV was 56.25% for the LR model, 60% for the NB model and 60.6% for the BN model in the training dataset, and 22.73% for the LR model, 23.81% for the NB model and 30.56% for the BN model in the testing dataset. NPV was 66.97% for the LR model, 66.86% for the NB model and 70.28% for the BN model in the training dataset, and 68.21% for the LR model, 68.33% for the NB model and 65.56% for the BN model in the testing dataset.
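
All five measures reported above are derived from the 2×2 table of predicted versus observed recurrence. The sketch below shows the standard definitions; the function name classification_metrics is hypothetical, and the counts in the example are invented for illustration and are not taken from (Table/Fig 4).

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the five accuracy measures from the cells of a 2x2 confusion matrix.

    tp: recurrences predicted as recurrence      fn: recurrences predicted as no recurrence
    fp: non-recurrences predicted as recurrence  tn: non-recurrences predicted as no recurrence
    Assumes every denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "misclassification_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),   # recurrences correctly identified
        "specificity": tn / (tn + fp),   # non-recurrences correctly identified
        "ppv": tp / (tp + fp),           # predicted recurrences that are true
        "npv": tn / (tn + fn),           # predicted non-recurrences that are true
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    for name, value in classification_metrics(tp=10, fp=5, tn=95, fn=40).items():
        print(f"{name}: {value:.2%}")
```

The hypothetical counts reproduce the qualitative pattern seen in the results: when most recurrences are predicted as non-recurrence, specificity stays high while sensitivity is low.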

Discussion

In the present study, the prognostic factors associated with recurrence of breast cancer were determined. Degree of malignancy and side of the affected breast were found to have an impact on the outcome. One study has shown that tumour size, grade of the cancer, nodal status and hormonal factors, along with smoking status, are significantly associated with recurrence of breast cancer (22). Another study has pointed out that receiving neoadjuvant chemotherapy reduced the risk of recurrence of breast cancer (23). The current dataset had variables related to disease status and not to lifestyle characteristics. The primary objective of this study was to compare the predictive ability of the BN, Naïve Bayes and logistic regression models. Even with imbalance in the proportion of the outcome variable, the BN model outperformed the other models overall. In the training dataset, the misclassification rate was lowest for the BN model, and the model showed a better ability to predict the recurrence of breast cancer, with better sensitivity, which is key in these models.

Naïve Bayes and logistic regression models have already been applied to predict the recurrence of breast cancer and have been shown to perform considerably well (24). The Naïve Bayes classifier offers a novel approach for categorising patients and offers good performance with low algorithmic cost and high speed of computation. Another study has shown that the Naïve Bayes model performs as well as other equivalent machine learning techniques (25). With just seven prognostic factors, a nomogram based on the Naïve Bayes model gave 80% accuracy, suggesting that the model can be translated to practical use. Bayesian classifiers have gained importance in classification problems in health care studies and have performed better than the classical approach to prognostic modelling (26). Even amongst the Bayesian classifiers, the Naïve Bayes model with tree augmented structure and gradient boosting has been shown to perform well in predictive accuracy (27). A study by Choi JP et al., has shown that hybrid BN models have excellent predictive ability in comparison with other machine learning algorithms in predicting breast cancer prognosis (15). The hybrid BN model had an AUC of 0.935, as compared to 0.930 and 0.813 for the artificial neural network and the classical BN model, respectively. BN models have also been applied to predict the risk of triple negative breast cancer from epidemiological factors and have been shown to perform well (28). Studies have compared the BN model with other machine learning algorithms, such as support vector machine and artificial neural network, for a binary outcome, and have shown that BN models are better or comparable in handling missing data and in predictive accuracy (29),(30). The BN model has further been shown to incorporate complex interactions of prognostic factors and to support individualised patient care in oncology (31). This suggests that efforts should be made to translate machine learning algorithms such as the BN model into a viable option for clinicians to use.

Witteveen A et al., on the other hand, have reported that conventional logistic regression models outperformed the BN model in predictive accuracy related to breast cancer (32). Although the BN model performed better in the development cohort, on validation the LR model had a C-statistic of 0.71 whereas it was 0.67 for the BN model. The difference observed in overall predictive ability between the models is not large. Generally, the difference in AUC or C-statistic between such models has been less than 0.05 in studies (33),(34). A study by Holm CE et al., has also shown that proper internal and external validation is often not accounted for in BN models (35).

Limitation(s)

Our study was limited to the factors available in the secondary data source, which did not include some important established prognostic factors for recurrence of breast cancer. Variables such as HER2, oestrogen receptors, progesterone receptors and eGFR values could have improved the predictive ability of the models. The outcome was imbalanced, and therefore a Synthetic Minority Oversampling Technique (SMOTE) for imbalanced classification could further strengthen the predictive accuracy of the models. External validation with an independent dataset was not performed, which limits the generalisability of the model. Other measures that reflect the overall discriminatory ability of a model, such as AUC, Gini coefficient and C-index, could also have been estimated; however, the intention of this study was to suggest alternative techniques for predicting a binary outcome.

Conclusion

The BN model can be used as an alternative model for predicting a binary outcome such as the recurrence of breast cancer. The predictive ability of the BN model was found to be better, and it can handle imbalanced classification better. BN models also provide a visually intuitive model with fewer assumptions. With further improvement, they can provide a better predictive model for clinicians to use at the bedside.

Acknowledgement

The authors thank Dr. P. Venkatesan for his contribution in helping them understand the models that were applied in this study.

References

1. Malehi AS, Pourmotahari F, Angali KA. Statistical models for the analysis of skewed healthcare cost data: A simulation study. Health Econ Rev. 2015;5(1):11.

2. Vogenberg FR. Predictive and prognostic models: Implications for healthcare decision-making in a modern recession. Am Health Drug Benefits. 2009;2(6):218-22.

3. Steyerberg EW, Eijkemans MJ, Harrell FE Jr, Habbema JD. Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets. Med Decis Making. 2001;21(1):45-56. Doi: 10.1177/0272989X0102100106. PMID: 11206946.

4. Schreiber-Gregory D, Bader K. Logistic and linear regression assumptions: Violation recognition and control. Proc Midwest SAS User Group. 2018;01-21.

5. Senaviratna NAMR, Cooray TMJA. Diagnosing multicollinearity of logistic regression model. Asian J Probab Stat. 2019;5(2):01-09.

6. Westreich D, Lessler J, Funk MJ. Propensity score estimation: Neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J Clin Epidemiol. 2010;63(8):826-33.

7. Cobb BR, Rumí R, Salmerón A. Bayesian network models with discrete and continuous variables. Advances in probabilistic graphical models. 2007:81-102.

8. Leung KM. Naive Bayesian Classifier [Internet]. 2007; Polytechnic University. Available from: https://cse.engineering.nyu.edu/~mleung/FRE7851/f07/naiveBayesianClassifier.pdf

9. Global Burden of Disease Cancer Collaboration, Fitzmaurice C, Abate D, Abbasi N, Abbastabar H, Abd-Allah F, et al. Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to 2017: A systematic analysis for the global burden of disease study. JAMA Oncol. 2019;5(12):1749-68.

10. Rangarajan B, Shet T, Wadasadawala T, Nair NS, Sairam RM, Hingmire SS, et al. Breast cancer: An overview of published Indian data. South Asian J Cancer. 2016;5(3):86-92.

11. Kim JY, Lee YS, Yu J, Park Y, Lee SK, Lee M, et al. Deep learning-based prediction model for breast cancer recurrence using adjuvant breast cancer cohort in tertiary cancer center registry. Front Oncol [Internet]. 2021 [cited 2022 Jul 16];11. Available from: https://www.frontiersin.org/articles/10.3389/fonc.2021.596364

12. Kim W, Kim KS, Lee JE, Noh DY, Kim SW, Jung YS, et al. Development of novel breast cancer recurrence prediction model using support vector machine. J Breast Cancer. 2012;15(2):230-38.

13. Ahmad LG, Eshlaghy AT, Poorebrahimi A, Ebrahimi M, Razavi AR. Using Three machine learning techniques for Predicting breast cancer recurrence. J Health Med Inform. 2013;4:124.

14. Štrumbelj E, Bosnić Z, Kononenko I, Zakotnik B, Grašič Kuhar C. Explanation and reliability of prediction models: The case of breast cancer recurrence. Knowl Inf Syst. 2010;24(2):305-24.

15. Choi JP, Han TH, Park RW. A hybrid Bayesian network model for predicting breast cancer prognosis. J Kor Soc Med Informatics. 2009;15(1):49-57.

16. Mitchell TM. Machine learning. International ed., [Reprint.]. New York, NY: McGraw-Hill; 20. 414 p. (McGraw-Hill series in computer science).

17. Bayesian Networks and Decision Graphs [Internet]. [cited 2022 Jul 16]. Available from: https://link.springer.com/book/10.1007/978-1-4757-3502-4.

18. UCI Machine Learning Repository: Breast Cancer Data Set [Internet]. [cited 2020 Nov 23]. Available from: http://archive.ics.uci.edu/ml/datasets/Breast+Cancer?ref=datanews.io.

19. Zanoni DK, Patel SG, Shah JP. Changes in the 8th Edition of the American Joint Committee on Cancer (AJCC) Staging of Head and Neck Cancer: Rationale and Implications. Curr Oncol Rep. 2019;21(6):52.

20. Friedman N, Geiger D, Goldszmidt M. Bayesian Network Classifiers. Mach Learn. 1997;29(2):131-63.

21. Ji Z, Xia Q, Meng G. A review of parameter learning methods in Bayesian Network. In: Advanced Intelligent Computing Theories and Applications [Internet]. Cham: Springer; 2015 [cited 2022 Oct 27]. pp. 03-12. Available from: https://link.springer.com/chapter/10.1007/978-3-319-22053-6_1.

22. Lafourcade A, His M, Baglietto L, Boutron-Ruault MC, Dossus L, Rondeau V. Factors associated with breast cancer recurrences or mortality and dynamic prediction of death using history of cancer recurrences: The French E3N cohort. BMC Cancer. 2018;18(1):171.

23. Stankov A, Bargallo-Rocha JE, Silvio AÑS, Ramirez MT, Stankova-Ninova K, Meneses-Garcia A. Prognostic factors and recurrence in breast cancer: Experience at the national cancer institute of Mexico. ISRN Oncol. 2012;2012:825258.

24. Kim W, Kim KS, Park RW. Nomogram of Naive bayesian model for recurrence prediction of breast cancer. Healthc Inform Res. 2016;22(2):89-94.

25. Dumitru D. Prediction of recurrent events in breast cancer using the Naive bayesian classification. Analele Univ Din Craiova Ser Mat Informatică. 2009;36.

26. Al-Aidaroos KM, Bakar AA, Othman Z. Medical data classification with Naive bayes approach. Information Technology Journal. 2012;11(9):1166-74.

27. Banu AB, Thirumalaikolundusubramanian P. Comparison of Bayes classifiers for breast cancer classification. Asian Pac J Cancer Prev. 2018;19(10):2917-2920. Doi: 10.22034/APJCP.2018.19.10.2917. PMID: 30362322; PMCID: PMC6291060.

28. Huang Y, Zheng C, Zhang X, Cheng Z, Yang Z, Hao Y, et al. The usefulness of bayesian network in assessing the risk of triple-negative breast cancer. Acad Radiol. 2020;27(12):e282-91.

29. Jayasurya K, Fung G, Yu S, Dehing-Oberije C, De Ruysscher D, Hope A, et al. Comparison of bayesian network and support vector machine models for two-year survival prediction in lung cancer patients treated with radiotherapy. Med Phys. 2010;37(4):1401-07.

30. Correa M, Bielza C, Pamies-Teixeira J. Comparison of bayesian networks and artificial neural networks for quality detection in a machining process. Expert Syst Appl. 2009;36(3):7270-79.

31. Reijnen C, Gogou E, Visser NCM, Engerud H, Ramjith J, van der Putten LJM, et al. Preoperative risk stratification in endometrial cancer (ENDORISK) by a bayesian network model: A development and validation study. PLoS Med. 2020;17(5):e1003111.

32. Witteveen A, Nane GF, Vliegen IMH, Siesling S, IJzerman MJ. Comparison of logistic regression and bayesian networks for risk prediction of breast cancer recurrence. Med Decis Mak Int J Soc Med Decis Mak. 2018;38(7):822-33.

33. Cho SM, Austin PC, Ross HJ, Abdel-Qadir H, Chicco D, Tomlinson G, et al. Machine learning compared with conventional statistical models for predicting myocardial infarction readmission and mortality: A systematic review. Can J Cardiol. 2021;37(8):1207-14.

34. Clark DO, Stump TE, Tu W, Miller DK. A comparison and cross-validation of models to predict basic activity of daily living dependency in older adults. Med Care. 2012;50(6):534-39.

35. Holm CE, Grazal CF, Raedkjaer M, Baad-Hansen T, Nandra R, Grimer R, et al. Development and comparison of 1-year survival models in patients with primary bone sarcomas: External validation of a Bayesian belief network model and creation and external validation of a new gradient boosting machine model. SAGE Open Med. 2022;10:20503121221076388.

Tables and Figures

[Table / Fig - 1] [Table / Fig - 2] [Table / Fig - 3] [Table / Fig - 4]

DOI and Others

DOI: 10.7860/JCDR/2023/59472.17598

Date of Submission: Aug 05, 2022
Date of Peer Review: Oct 13, 2022
Date of Acceptance: Nov 11, 2022
Date of Publishing: Mar 01, 2023

AUTHOR DECLARATION:
• Financial or Other Competing Interests: None
• Was Ethics Committee Approval obtained for this study? No
• Was informed consent obtained from the subjects involved in the study? Yes
• For any images presented appropriate consent has been obtained from the subjects. NA

PLAGIARISM CHECKING METHODS:
• Plagiarism X-checker: Aug 06, 2022
• Manual Googling: Nov 01, 2022
• iThenticate Software: Nov 10, 2022 (7%)

ETYMOLOGY: Author Origin
