Posted on 14 December 2012

Part 12 in a series of videos recorded from ACM MIRUM 2012 in Nara, Japan.

In the machine learning research literature, many classifiers are reported to achieve excellent classification performance, yet that does not necessarily mean the results generalize to the heterogeneous data encountered in the real world. In this talk, Bob Sturm presents research that attempts to reproduce the results of two music genre classifiers: one that uses features from bags of frames, and another that uses sparse coding.

In one experiment, the two classifiers are shown to produce persistent misclassifications for particular genres. Some of these misclassifications are acceptable, perhaps resulting from faults in the data set, but others are not. In another experiment, a song is passed through a variety of frequency-selective filters, each with a different frequency response. Such minor acoustic processing does not change the actual genre of the song, yet the filters can trick a genre classifier into changing its prediction. These experiments show that even when a genre classifier produces excellent classification accuracies and confusion tables, its success does not necessarily generalize to other audio data or recording conditions.
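
The filtering experiment can be illustrated with a small, self-contained sketch. This is not the setup from the talk: the "song" is a synthetic two-tone signal, the filters are simple first-order designs, and `toy_classifier` is a hypothetical stand-in that thresholds one crude brightness feature instead of a trained bags-of-frames or sparse-coding model. The labels `genre_A` and `genre_B`, the sample rate, and all parameter values are assumptions chosen only for illustration.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed for this toy example


def make_signal(duration_s=1.0):
    """Synthetic 'song': a strong 200 Hz tone plus a weaker 2000 Hz tone."""
    n = int(SAMPLE_RATE * duration_s)
    return [
        math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE)
        for t in range(n)
    ]


def low_pass(x, a=0.5):
    """One-pole low-pass filter: y[n] = (1 - a) * x[n] + a * y[n-1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = (1 - a) * sample + a * prev
        y.append(prev)
    return y


def high_emphasis(x, b=0.5):
    """First-order high-frequency emphasis: y[n] = x[n] - b * x[n-1]."""
    return [x[0]] + [x[i] - b * x[i - 1] for i in range(1, len(x))]


def brightness(x):
    """Crude spectral feature: first-difference energy over signal energy."""
    diff_energy = sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))
    energy = sum(s * s for s in x)
    return diff_energy / energy


def toy_classifier(x, threshold=0.3):
    """Hypothetical stand-in for a trained genre classifier."""
    return "genre_A" if brightness(x) > threshold else "genre_B"


def run_filter_test():
    """Classify the same 'song' before and after each filter."""
    song = make_signal()
    versions = {
        "original": song,
        "low_pass": low_pass(song),
        "high_emphasis": high_emphasis(song),
    }
    return {name: toy_classifier(audio) for name, audio in versions.items()}


if __name__ == "__main__":
    for name, label in run_filter_test().items():
        print(f"{name}: {label}")
```

In this toy setting the low-pass version receives a different label from the original even though the underlying "song" is the same, which is the kind of fragility the filtering experiment probes. A real test would replace `toy_classifier` with the trained model under evaluation and use actual recordings.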