Posted on 28 October 2011

Paper details:

Sparse coding and nonnegative matrix factorization (NMF) have revolutionized the area of music transcription. These methods usually decompose spectra from input signals as linear combinations of dictionary spectra. Often, the dictionary is either learned (as in NMF) or synthetic (e.g., harmonic templates, Gabor atoms, etc.).

Instead, what if you already have a massive overcomplete dictionary containing millions of well-labeled atoms from real-world sounds? Assuming that the dictionary is large and representative enough, there is no longer a need to learn. However, coding (i.e., finding the coefficients of the input with respect to the dictionary) remains a problem. With such a large dictionary, complexity becomes an issue.

We propose a method called Approximate Matching Pursuit (AMP) which allows you to efficiently obtain a sparse decomposition of an input vector using a massive overcomplete dictionary. AMP is a simple variant of matching pursuit or orthogonal matching pursuit. However, instead of finding an exact match between the input (or residue) and the dictionary at each iteration, we only find an approximate match. In doing so, we sacrifice a bit of accuracy to obtain significant savings in computation.

Using AMP, we decomposed input spectra of containing overlapping harmonic sounds, where the amount of polyphony varies. Our results showed that, while retrieval performance is similar to orthogonal matching pursuit based upon F-measure, execution time and computation is reduced significantly.