Saturday, November 23, 2013

On Vague Notions of Accuracy

Accuracy is not a reliable metric for the real performance of a classifier, because it will yield misleading results if the data set is unbalanced. -The Internet
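
To see how that plays out, here is a minimal sketch in Python. The data is invented for illustration (95 negatives, 5 positives), and the "classifier" is a hypothetical one that always guesses the majority class. It scores well on accuracy while never finding a single positive.

```python
# Hypothetical, made-up labels: 95 negatives and 5 positives (unbalanced).
actual = [0] * 95 + [1] * 5

# A stand-in "classifier" that always predicts the majority class (0).
predicted = [0] * len(actual)

# Accuracy: the fraction of predictions that match the actual label.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Recall: the fraction of actual positives that were predicted positive.
true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_positives / sum(actual)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.95
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

Ninety-five percent accurate, and useless for the class you probably cared about.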

Now this is what I call a multivariate frequency distribution chart, in that it barely qualifies as such, hence Art.
(aka - Categorganization)

As it turns out, the concept of a Confusion Matrix is itself so confusing, with its names splitting across entirely different semantic classes, that it takes a Confusion Matrix to understand it. Taking only two of these categories (the "machine-learning" and "statistics" disciplines), one can distribute the various names given to the idea of a confusion matrix into four potential states: one discipline, the other, both, or neither. Since "Contingency Table" falls under both, perhaps it is the most apt label for the concept. Other applicable terms include: multivariate frequency distribution of variables, two-class prediction problem, binary classification, predicted value vs. actual value, instantiation map.
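
For the "predicted value vs. actual value" reading, the whole idea fits in a few lines of Python. This is only a sketch: the two label lists are invented, and the names `actual`, `predicted`, and `table` are my own, not anything standard. It counts each (actual, predicted) pair and lays the counts out as a two-by-two table.

```python
from collections import Counter

# Hypothetical labels for eight items (1 = positive, 0 = negative).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Count how often each (actual, predicted) combination occurs.
counts = Counter(zip(actual, predicted))

# Rows are the actual class (0, then 1); columns are the predicted class.
table = [[counts[(a, p)] for p in (0, 1)] for a in (0, 1)]

for row_label, row in zip(("actual 0", "actual 1"), table):
    print(row_label, row)
# actual 0 [3, 1]
# actual 1 [1, 3]
```

The diagonal holds the agreements and the off-diagonal cells hold the two kinds of mistake, which is all any of the names above are describing.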

POST SCRIPT: On Theft and Creation in the Digital Age

Ship of Theseus
It seems like you can replace any component of a ship, and it is still the same ship. So you can replace them all, one at a time, and it is still the same ship. However, you can then take all the original pieces, and assemble them into a ship. That, too, is the same ship you began with.

Sorites paradox
aka the new Climate Change paradox: If you remove a single grain of sand from a heap, you still have a heap. Keep removing single grains, and the heap will disappear. Can a single grain of sand make the difference between heap and non-heap?