A Conceptual Framework for Noise Reduction

Date published: 1 December 2016
Tags: noise reduction, speech enhancement

A Conceptual Framework for Noise Reduction
Jacob Benesty and Jingdong Chen
Springer International Publishing, NY (2015)
89 pp., softbound, 54.99 USD
ISBN: 978-3-319-12954-9

Purpose

This is one of the SpringerBriefs, a series that presents concise summaries of cutting-edge research and practical applications across a wide spectrum of fields. Featuring compact volumes of 50–125 pages, the series covers a range of content from professional to academic. Typical topics might include a timely report of state-of-the-art analytical techniques; a bridge between new research results, as published in journal articles, and a contextual literature review; a snapshot of a hot or emerging topic; an in-depth case study or clinical example; and a presentation of core concepts that students must understand in order to make independent contributions.

This brief is intended to provide a solid understanding of the noise reduction problem, with a focus on speech processing, so that the reader can design a well-targeted solution for a well-defined application. The authors propose a conceptual framework that can be applied to the many different aspects of noise reduction (or speech enhancement). The monaural or binaural noise reduction problem, in the time domain or in the frequency domain, with a single microphone or with multiple microphones, is presented in a unified way. The derivation of optimal linear filters is also simplified, as are the performance measures used to evaluate them.

Chapter 1, “Introduction,” provides a very brief description of the problem of noise reduction. The authors focus their discussion on speech enhancement and speech processing, but the material is meant to apply to the general problem of signal enhancement.

Chapter 2, “Conceptual Framework,” introduces the authors’ proposed conceptual framework for noise reduction. This formulation gives better insight into this fundamental problem. Within the framework, the authors define all the important performance measures and criteria, which are of great help in deriving the most well-known estimators. Key discussions concern the definitions of speech intelligibility and speech quality that are used throughout the rest of the work. Sections include signal model, principle of the conceptual framework, performance measures, mean squared error (MSE)–based criterion, a summary, and references for this chapter.

Chapter 3, “Single-Channel Noise Reduction in the Time Domain,” addresses what the authors identify as one of the most important schemes in speech enhancement, since most communication devices have only one microphone and time-domain processing seems intuitive and natural. While this approach has been well studied in the literature, the chapter revisits it from the perspective proposed in Chapter 2. Sections include signal model, linear filtering, performance measures, MSE-based criterion, optimal filters, simulations, and references.

Chapter 4, “Single-Channel Noise Reduction in the Short-Time Fourier Transform (STFT) Domain with Interframe Correlation,” studies the same problem as Chapter 3 but in the more convenient STFT domain. Contrary to most conventional approaches, the authors do not assume that successive STFT frames are uncorrelated. As a consequence, the interframe correlation is taken into account, and a filter, rather than just a gain, is used in each sub-band to enhance the noisy signal. Sections include signal model, linear filtering, performance measures, MSE-based criterion, optimal filters, particular case, simulations, and references.

Chapter 5, “Binaural Noise Reduction in the Time Domain,” deals with the important problem of producing two “clean” outputs from noisy observations picked up by multiple microphones. The chapter approaches this problem with widely linear theory in the time domain, where both the temporal and spatial information are exploited. Sections include signal model, widely linear filtering, performance measures, MSE-based criterion, optimal filters, simulations, and references.

Chapter 6, “Multichannel Noise Reduction in the STFT Domain,” exploits the spatial information available from signals picked up by a determined number of microphones at different positions in the acoustic space in order to mitigate the effect of noise. The processing is performed in the STFT domain. Sections include signal model, linear filtering, performance measures, MSE-based criterion, optimal filters, simulations, and references.

Summary and Recommendation

The brief is presented in six chapters, including a very short introduction on the topic. References are provided at the end of each chapter and point the reader to contemporary research as well as foundational studies in each area. The authors present their conceptual framework for studying the general problem of noise reduction in Chapter 2 and introduce two important performance measures: the speech intelligibility index and the speech quality index. Next, they propose a general MSE-based criterion from which all known estimators can be deduced. The remaining chapters show how to apply these different concepts to all classical noise reduction schemes. Excellent formulae, charts, and graphs are provided throughout the text, with many in full color.

Chuck H. Perala
Aviation and Aerospace Industry
Washington, DC, USA
cperala@yahoo.com