Posted on 20 October 2011

Paper details:

Super-resolution is the task of increasing the resolution (or sampling rate) of a signal in a way that recovers detail from the signal's ground truth, rather than through a fixed scheme such as linear or sinusoidal interpolation. For example, when you use Adobe Photoshop or GIMP to enlarge an image, the result will likely be blurry because the program does not know how to restore sharp edges.

Super-resolution has been studied extensively for images and video, but much less for audio. When an audio signal is upsampled from 4 kHz to 44.1 kHz using traditional sinc interpolation, the upsampled signal contains no information that was not already present in the 4 kHz version. As a result, frequency components above 2 kHz (the Nyquist frequency of the 4 kHz signal) are absent, and the signal sounds dull and muffled.
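As a rough illustration of this point, the following Python sketch upsamples a synthetic signal from 4 kHz to 44.1 kHz by zero-padding its spectrum, which is one way to perform (periodic) sinc interpolation. The test tones, the helper name sinc_upsample, and the use of NumPy are assumptions made for this example only.

```python
import numpy as np

def sinc_upsample(x, n_out):
    """Periodic sinc interpolation: zero-pad the spectrum, then invert."""
    spectrum = np.fft.rfft(x)
    padded = np.zeros(n_out // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum
    # Rescale so that sample amplitudes are preserved at the new length.
    return np.fft.irfft(padded, n=n_out) * (n_out / len(x))

fs_low, fs_high = 4000, 44100  # sampling rates in Hz

# One second of a 440 Hz tone plus a weaker 1 kHz partial, sampled at 4 kHz.
t_low = np.arange(fs_low) / fs_low
x_low = np.sin(2 * np.pi * 440 * t_low) + 0.5 * np.sin(2 * np.pi * 1000 * t_low)

# Upsample to one second at 44.1 kHz.
x_high = sinc_upsample(x_low, fs_high)

# Fraction of spectral energy above the original Nyquist frequency (2 kHz).
power = np.abs(np.fft.rfft(x_high)) ** 2
freqs = np.fft.rfftfreq(len(x_high), d=1 / fs_high)
high_band_ratio = power[freqs > fs_low / 2].sum() / power.sum()
print(f"energy above 2 kHz: {high_band_ratio:.2e} of total")
```

The ratio comes out at the level of floating-point noise: the upsampled signal has eleven times as many samples, but its band from 2 kHz to 22.05 kHz stays empty.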

There are a few methods for audio super-resolution. For example, Smaragdis et al. have proposed a method whose basic idea is to project the low-resolution input signal onto a low-resolution basis. The resulting coefficients (the coordinates of the input with respect to the low-resolution basis) are then multiplied with the corresponding components of a high-resolution basis to construct a high-resolution signal.
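The idea of paired bases can be sketched in a few lines of Python. This is a toy example under stated assumptions, not the cited authors' implementation: the dictionaries are random non-negative "spectra", the low-resolution basis is simply the low-frequency rows of the high-resolution one, and ordinary least squares stands in for whatever projection the actual method uses. All names (D_high, D_low, n_low_bins, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired dictionaries: column k of each describes the same atom.
n_bins, n_low_bins, n_atoms = 64, 16, 8
D_high = rng.random((n_bins, n_atoms))   # full-bandwidth magnitude spectra
D_low = D_high[:n_low_bins, :]           # the same atoms, low band only

# Simulate a full-bandwidth frame and the band-limited observation of it.
c_true = rng.random(n_atoms)
y_high_true = D_high @ c_true
y_low = y_high_true[:n_low_bins]

# Step 1: project the low-resolution input onto the low-resolution basis.
c_est, *_ = np.linalg.lstsq(D_low, y_low, rcond=None)

# Step 2: reuse the coefficients with the high-resolution basis.
y_high_est = D_high @ c_est

error = np.linalg.norm(y_high_est - y_high_true) / np.linalg.norm(y_high_true)
print(f"relative reconstruction error: {error:.2e}")
```

In this idealized setting the input lies exactly in the span of the low-resolution atoms and there are more low-band bins than atoms, so the missing bins are recovered almost perfectly. Real signals are noisier, and an overcomplete dictionary makes the coefficients non-unique, which is where sparse decompositions come in.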

What basis do you use? In the literature, that basis (or overcomplete dictionary, which need not be orthogonal) is usually obtained in one of two ways: by using a synthetic basis (Gabor, etc.) or by learning it from real data (NMF, K-SVD, etc.).

We ask this question: What if you already have a massive overcomplete dictionary containing millions of well-labeled atoms from real-world sounds? Can you make use of it? How? With such a large dictionary, complexity becomes an issue. In ordinary matching pursuit, every iteration compares the residual against every atom, so the cost per iteration is linear in the size of the dictionary. This does not scale to millions of atoms.
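To make the cost concrete, here is a minimal sketch of ordinary matching pursuit in Python (not the approximate variant proposed in the paper). The dictionary is random and the function and variable names are assumptions for illustration. The line that computes D.T @ residual is the bottleneck: it takes one inner product per atom on every iteration.

```python
import numpy as np

def matching_pursuit(x, D, n_iter):
    """Greedy sparse decomposition of x over the unit-norm columns of D."""
    residual = np.array(x, dtype=float)
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # One inner product per atom: O(d * N) work for N atoms of length d.
        correlations = D.T @ residual
        k = np.argmax(np.abs(correlations))
        # Accumulate the coefficient and remove that atom's contribution.
        coeffs[k] += correlations[k]
        residual -= correlations[k] * D[:, k]
    return coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 1000))
D /= np.linalg.norm(D, axis=0)  # normalize every atom to unit length

# A signal made of a single atom is recovered in one iteration.
x_single = 3.0 * D[:, 5]
coeffs_single, residual_single = matching_pursuit(x_single, D, n_iter=1)

# A mixture of atoms is approximated greedily, one atom at a time.
x_mix = 2.0 * D[:, 10] - 1.5 * D[:, 200] + 0.5 * D[:, 999]
coeffs_mix, residual_mix = matching_pursuit(x_mix, D, n_iter=10)
print(np.linalg.norm(residual_mix) / np.linalg.norm(x_mix))
```

With 1000 atoms the full correlation step is cheap, but with millions of atoms it dominates the running time, which is the motivation for replacing the exhaustive search with an approximate one.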

We propose a method called Approximate Matching Pursuit, which efficiently computes a sparse decomposition of an input vector over a massive overcomplete dictionary. Using it, we perform super-resolution of signals containing piano music. With a dictionary composed of sounds from the University of Iowa data set, we were able to accurately estimate missing high-frequency information from low-resolution inputs sampled at rates as low as 2-4 kHz. For more details, please see the paper above.

For more about Approximate Matching Pursuit, please see our upcoming paper in ISMIR 2011.