I investigate how machines can better understand and authenticate multimedia.

Music Understanding

Scene Analysis for Music Understanding

For thousands of years, scholars have acknowledged a connection between music and mathematics. However, it is only the recent convergence of music, mathematics, and computation that has given birth to the field of music information retrieval (MIR), an interdisciplinary technological area that attracts artists, scientists, and engineers alike. MIR could not have come at a better time. Today's tech-savvy music lover demands a more immersive and accessible listening experience than what was available 300 years ago. Yet digital music is now created and distributed at such a rate that the resulting databases are enormous and difficult to search.

MIR makes musical search easier than ever before. One MIR tool in particular, sparse and nonnegative factorization, has become popular for its ability to conveniently decompose an acoustic signal into musical notes or beats. This decomposition is elegant, accurate, and intuitive to humans, because each component corresponds to a recognizable musical event. Within the past five years, sparse and nonnegative factorizations have completely changed the way we perform musical signal analysis.
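To make the idea of decomposing a signal into notes concrete, here is a minimal sketch of a sparse nonnegative matrix factorization applied to a toy magnitude spectrogram. It is an illustration under stated assumptions, not the specific algorithm used in our work: the function name `sparse_nmf`, the multiplicative update rule with an L1 penalty on the activations, the `sparsity` value, and the synthetic two-note spectrogram are all chosen here for the example.

```python
import numpy as np

def sparse_nmf(V, rank, n_iter=500, sparsity=0.01, seed=0):
    """Approximate a nonnegative matrix V (frequency x time) as W @ H.

    Columns of W act as spectral templates ("notes"); rows of H are their
    activations over time. Uses multiplicative updates for the squared
    Euclidean error with an L1 penalty on H (an assumed, common choice).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + 1e-3
    H = rng.random((rank, n_frames)) + 1e-3
    eps = 1e-9  # guards against division by zero
    for _ in range(n_iter):
        # Update activations; the sparsity term in the denominator shrinks H.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        # Update spectral templates.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Rescale so each template has unit norm; W @ H is unchanged, and the
        # penalty on H cannot be evaded by inflating W.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

# Toy "spectrogram": two notes with different harmonic templates that
# overlap in time (a very simple case of polyphony).
n_freq, n_frames = 64, 40
note_a = np.zeros(n_freq)
note_a[[5, 10, 15]] = [1.0, 0.5, 0.25]
note_b = np.zeros(n_freq)
note_b[[8, 16, 24]] = [1.0, 0.5, 0.25]
activations = np.zeros((2, n_frames))
activations[0, :20] = 1.0    # note A sounds in frames 0-19
activations[1, 10:30] = 1.0  # note B sounds in frames 10-29
V = np.outer(note_a, activations[0]) + np.outer(note_b, activations[1])

W, H = sparse_nmf(V, rank=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_error)
```

In this sketch each column of `W` plays the role of a note's spectrum and each row of `H` indicates when that note is active, which is what makes the representation easy to interpret. Real recordings would first need a time-frequency transform, and the rank and sparsity weight would have to be chosen for the material at hand.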

Although the basic factorization algorithms have shown success in representing and decomposing simple musical signals, their performance degrades significantly when signals become more complicated, e.g., when they contain polyphony, unfamiliar acoustic environments, or heterogeneous mixtures. Furthermore, since the community's initial attempts to factorize musical signals, nobody has made a breakthrough that challenges the conventional use of sparse and nonnegative factorization beyond the typical applications of music transcription and source separation. In our work, we attempt to unify, simplify, and improve sparse and nonnegative factorizations for the purpose of music understanding, and we apply these tools in ways never before seen.

Image Forensics

Multimedia forensic methods allow us to maintain the integrity of the multimedia data around us. For example, we can embed a watermark into a digital image to bind the identity of its owner to the image itself. However, traditional forensic approaches such as watermarking are not applicable in many real-world scenarios, such as when we do not have access to the original data.

Instead, the research we perform at the University of Maryland focuses on intrinsic fingerprints — subsets of data which are, or have become, an intrinsic part of the data in question. By examining these intrinsic fingerprints, we can assess the authenticity of data without embedding any watermarks, thus increasing the applicability of these forensic methods.

We recently formulated a forensic methodology to identify the compression history of a digital image. By examining the intrinsic fingerprints in an image, we can tell which compression method (e.g., JPEG) was used, which helps determine the origin of the image and thereby assess its authenticity. Current efforts include the examination and detection of intrinsic fingerprints in other domains, particularly digital audio. Please see the publications below for more information.
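As a rough illustration of what an intrinsic fingerprint of compression can look like, the sketch below uses one well-known trace of lossy coding: after quantization, transform coefficients cluster near integer multiples of the quantization step. This is an assumed, simplified stand-in for illustration only and is not the methodology referred to above. The function name `estimate_quant_step`, its thresholds, and the synthetic Laplacian coefficients are all hypothetical choices.

```python
import numpy as np

def estimate_quant_step(coeffs, max_step=32, near=0.1, min_fraction=0.9):
    """Return the largest step q such that most coefficients lie close to a
    multiple of q, or 1 if no such step is found (no sign of quantization).

    `near` is the allowed distance from a multiple, as a fraction of q;
    `min_fraction` is the share of coefficients that must satisfy it.
    Both are illustrative thresholds, not tuned values.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    best = 1
    for q in range(2, max_step + 1):
        scaled = coeffs / q
        dist = np.abs(scaled - np.round(scaled))
        if np.mean(dist < near) >= min_fraction:
            best = q  # keep the largest step that explains the data
    return best

# Synthetic stand-in for transform coefficients of an image.
rng = np.random.default_rng(0)
raw = rng.laplace(scale=20.0, size=5000)

# Simulate one round of lossy compression: quantize with a fixed step,
# then add small noise to mimic rounding when pixels are reconstructed.
true_step = 8
compressed = true_step * np.round(raw / true_step)
compressed += rng.normal(scale=0.2, size=raw.size)

est_compressed = estimate_quant_step(compressed)
est_raw = estimate_quant_step(raw)
print("estimated step (compressed):", est_compressed)
print("estimated step (never compressed):", est_raw)
```

The point of the sketch is that the evidence comes from the data itself: nothing was embedded beforehand, yet the quantization step leaves a detectable pattern. A practical detector would work on actual block transform coefficients of an image and would have to cope with repeated compression and other processing.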

Publications

See publications.