Worldwide, breast cancer is one of the most serious threats to the lives of middle-aged women. Breast cancer diagnosis aims to classify a detected breast tumor as benign or malignant. Recent developments in data mining techniques have produced new model structures and algorithms that help medical workers improve classification accuracy. This study proposes a model that combines an ensemble method with an imbalanced learning technique for the classification of breast cancer data. First, the Synthetic Minority Over-Sampling Technique (SMOTE), an imbalanced learning algorithm, is applied to the selected datasets. Second, multiple baseline classifiers are tuned by Bayesian optimization. Finally, a stacking ensemble method combines the optimized classifiers to produce the final decision. Comparative analysis on two widely used breast cancer datasets shows that the proposed model achieves better performance and adaptivity than conventional methods in terms of classification accuracy, specificity, and AuROC, supporting the clinical value of the model.
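The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, standard-library-only version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority-class neighbours). The function name `smote`, its parameters, and the toy data are assumptions made for illustration. Unlike the original algorithm, which generates a fixed number of synthetic points per minority sample, this sketch picks the base sample at random.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 6 majority and 3 minority samples with two features each.
majority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0]]
minority = [[3.0, 3.0], [3.2, 2.8], [2.9, 3.3]]

# Generate enough synthetic minority points to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already occupied by the minority class. In a full pipeline of the kind the abstract describes, the balanced data would then be passed to the tuned base classifiers and the stacking ensemble.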
Published in | Applied and Computational Mathematics (Volume 7, Issue 3) |
DOI | 10.11648/j.acm.20180703.20 |
Page(s) | 146-154 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2018. Published by Science Publishing Group |
Keywords | Data Mining, Breast Cancer, Ensemble Method, Imbalanced Learning |
APA Style
Tongan Cai, Hongliang He, Wenyu Zhang. (2018). Breast Cancer Diagnosis Using Imbalanced Learning and Ensemble Method. Applied and Computational Mathematics, 7(3), 146-154. https://doi.org/10.11648/j.acm.20180703.20
@article{10.11648/j.acm.20180703.20,
  author = {Tongan Cai and Hongliang He and Wenyu Zhang},
  title = {Breast Cancer Diagnosis Using Imbalanced Learning and Ensemble Method},
  journal = {Applied and Computational Mathematics},
  volume = {7},
  number = {3},
  pages = {146-154},
  doi = {10.11648/j.acm.20180703.20},
  url = {https://doi.org/10.11648/j.acm.20180703.20},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.acm.20180703.20},
  year = {2018}
}
TY  - JOUR
T1  - Breast Cancer Diagnosis Using Imbalanced Learning and Ensemble Method
AU  - Tongan Cai
AU  - Hongliang He
AU  - Wenyu Zhang
Y1  - 2018/08/03
PY  - 2018
N1  - https://doi.org/10.11648/j.acm.20180703.20
DO  - 10.11648/j.acm.20180703.20
T2  - Applied and Computational Mathematics
JF  - Applied and Computational Mathematics
JO  - Applied and Computational Mathematics
SP  - 146
EP  - 154
PB  - Science Publishing Group
SN  - 2328-5613
UR  - https://doi.org/10.11648/j.acm.20180703.20
VL  - 7
IS  - 3
ER  -