Posted on 31 July 2010

Paper details:

Most musical instrument recognition systems rely upon spectral information to classify sounds. Can temporal information improve classification accuracy even further?

Evidence from the psychoacoustic literature suggests that both spectral and temporal content carry information about acoustic timbre through the human auditory system. In other words, because humans recognize musical instruments so effortlessly, machines might also benefit from a combination of spectral and temporal information. Researchers use a model known as the cortical representation to emulate the information output from the middle stage of the human auditory system. Although some engineers have tried to build music information retrieval (MIR) systems that use the cortical representation, its high dimensionality makes it difficult to employ in practical systems.

But what if we could embody the traits of the cortical representation in another representation that also captures spectral and temporal content? One such candidate is nonnegative matrix factorization (NMF), a tool that can extract spectral and temporal information from spectrograms.
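To illustrate the idea (this is a generic sketch, not the paper's exact pipeline), NMF factorizes a magnitude spectrogram V into W and H, where the columns of W act as spectral templates and the rows of H as their temporal activations. Here scikit-learn's `NMF` is applied to a toy spectrogram built from two synthetic "notes":

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy magnitude spectrogram: two synthetic notes, each with its own
# spectral shape and temporal envelope (frequency bins x time frames).
rng = np.random.default_rng(0)
freqs, frames = 64, 100
spec_a = np.zeros(freqs); spec_a[5:10] = 1.0    # low-frequency note
spec_b = np.zeros(freqs); spec_b[30:35] = 1.0   # high-frequency note
env_a = np.exp(-np.linspace(0, 5, frames))      # fast decay
env_b = np.exp(-np.linspace(0, 1, frames))      # slow decay
V = (np.outer(spec_a, env_a) + np.outer(spec_b, env_b)
     + 0.01 * rng.random((freqs, frames)))      # small noise floor

# NMF gives V ~= W @ H: W holds the spectral content,
# H holds the temporal content (the envelopes of interest here).
model = NMF(n_components=2, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(V)   # shape (freqs, 2): spectral templates
H = model.components_        # shape (2, frames): temporal activations
```

The rows of H are the temporal envelopes whose shapes can then be parameterized and fed to a classifier.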

In this paper, we test the usefulness of temporal information extracted using NMF for instrument recognition. We mimic the multiresolution aspect of the cortical representation by using a multiresolution gamma filterbank that parameterizes the shape of temporal envelopes. Our results show that this method of temporal processing can classify isolated sounds from 24 instrument classes with 92.3% accuracy.
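A multiresolution gamma filterbank can be sketched as a bank of gamma-shaped kernels at several time scales, each convolved with a temporal envelope; the per-channel energies then summarize the envelope's shape across resolutions. The kernel parameterization, scales, and order below are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def gamma_kernel(order, scale, length, sr):
    """Gamma-shaped kernel t**(order-1) * exp(-t/scale), L1-normalized.
    (Illustrative parameterization; the paper's exact form may differ.)"""
    t = np.arange(length) / sr
    g = t ** (order - 1) * np.exp(-t / scale)
    return g / (g.sum() + 1e-12)

def gamma_filterbank_features(envelope, sr, scales=(0.01, 0.03, 0.1, 0.3), order=4):
    """Filter a temporal envelope with gamma kernels at several time
    scales and return the log energy of each channel (one feature per scale)."""
    feats = []
    for s in scales:
        k = gamma_kernel(order, s, length=int(sr * s * 8), sr=sr)
        y = np.convolve(envelope, k, mode="same")
        feats.append(np.log(np.sum(y ** 2) + 1e-12))
    return np.array(feats)

# Example: a decaying envelope (e.g. an NMF activation row) at 100 frames/s
env = np.exp(-np.linspace(0, 4, 400))
feats = gamma_filterbank_features(env, sr=100)
```

Fast-decaying envelopes (plucked or struck instruments) and slow-decaying ones (bowed or blown instruments) produce different energy profiles across the scales, which is the kind of temporal cue the classifier can exploit.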