THE ACCURACY PARADOX

Abstract

High accuracy is not necessarily an indicator of high model quality, and therein lies the accuracy paradox of predictive analytics: a predictive model with a given level of accuracy may have greater predictive power than a model with higher accuracy. The reason is that accuracy is easy to inflate when the classes are not balanced: if a single class contains most of the data, then classifying every case as this majority class already yields a high accuracy, even though such a classifier detects nothing about the remaining classes.
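
A minimal sketch of this effect, in Python with made-up class proportions (a hypothetical 95/5 split, not data from the publications below):

  from collections import Counter

  # Imbalanced ground truth: 95 cases of one class, 5 of the other.
  true_labels = ["healthy"] * 95 + ["sick"] * 5
  majority = Counter(true_labels).most_common(1)[0][0]   # "healthy"

  # A trivial classifier that always predicts the majority class.
  predictions = [majority] * len(true_labels)

  accuracy = sum(t == p for t, p in zip(true_labels, predictions)) / len(true_labels)
  print(f"accuracy = {accuracy:.2f}")   # 0.95, yet no "sick" case is ever detected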

We develop tools to analyze the behavior of multiple-class, or multi-class, classifiers by means of entropic measures on their confusion matrix or contingency table.
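
As an illustration of the kind of measures involved, the following Python sketch computes the input entropy, output entropy and mutual information of a made-up confusion matrix; the precise definitions of the entropy-modulated accuracy (EMA) and the normalized information transfer (NIT) factor are those given in the publications listed below.

  import numpy as np

  def entropy(p):
      """Shannon entropy in bits of a probability vector (zero entries ignored)."""
      p = p[p > 0]
      return float(-(p * np.log2(p)).sum())

  # Rows: true class X, columns: predicted class Y (hypothetical counts).
  confusion = np.array([[50.0,  5.0, 0.0],
                        [ 4.0, 30.0, 6.0],
                        [ 1.0,  2.0, 2.0]])

  joint = confusion / confusion.sum()   # joint distribution P(X, Y)
  p_x = joint.sum(axis=1)               # true-class marginal P(X)
  p_y = joint.sum(axis=0)               # predicted-class marginal P(Y)

  h_x, h_y, h_xy = entropy(p_x), entropy(p_y), entropy(joint.ravel())
  mi = h_x + h_y - h_xy                 # mutual information I(X; Y), the information transfer

  print(f"H(X) = {h_x:.3f} bits, H(Y) = {h_y:.3f} bits, I(X;Y) = {mi:.3f} bits")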

Keywords

Information theory, accuracy paradox, imbalanced sets, multi-class classifiers, machine learning, de Finetti, ROC, entropy-modulated accuracy (EMA), normalized information transfer factor (NIT), confusion matrix

Publications

  • Valverde-Albacete FJ, Peláez-Moreno C. 100% Classification Accuracy Considered Harmful: The Normalized Information Transfer Factor Explains the Accuracy Paradox. Paris MGA, ed. PLoS ONE. 2014;9(1):e84217. doi:10.1371/journal.pone.0084217. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3888391/
  • Valverde-Albacete FJ, Peláez-Moreno C. Two information-theoretic tools to assess the performance of multi-class classifiers. Pattern Recognition Letters. 2010;31(12):1665-1671. ISSN 0167-8655. http://dx.doi.org/10.1016/j.patrec.2010.05.017. http://www.sciencedirect.com/science/article/pii/S0167865510001662

Demos & Tools

Contact Persons
Carmen Peláez-Moreno (carmen at tsc.uc3m.es)
Francisco J. Valverde-Albacete (fva at tsc.uc3m.es)
