Auditory Scene Analysis

The root of the auditory scene analysis problem lies in the inherent complexity of our everyday acoustic environment. At any given moment, we may be surrounded by multiple sound-generating elements, such as a radio playing music or a group of people speaking, several of which produce acoustic energy simultaneously. In order to make sense of the environment, we must parse the incoming acoustic waveform into separate mental representations called auditory objects or, if they persist over time, auditory streams. The problem is compounded by the fact that each ear has access to only a single pressure wave that comprises acoustic energy coming from all active sound sources. The cocktail party problem, a classic example of speech perception in an adverse listening situation, can therefore be viewed as a real-world auditory scene analysis problem in which individuals must separate the various simultaneously active sound sources while integrating and following a particular ongoing conversation over time. It should come as no surprise, then, that auditory perceptual organization is commonly described along two axes: organization along the frequency axis (i.e., simultaneous organization) involves the moment-to-moment parsing of acoustic elements according to their frequency and harmonic relations; organization along the time axis (i.e., sequential organization) entails the grouping of successive auditory events that occur over several seconds into one or more streams. Understanding how speech and other complex sounds are translated from the single pressure waves arriving at the ears into internal sound object representations may have important implications for the design of more effective therapeutic interventions.
Moreover, it is essential to understand the effects of normal aging on listeners' ability to function in complex listening situations where multiple sound sources produce energy simultaneously (e.g., a cocktail party or a music concert) if we hope to provide treatments that help individuals cope with these changes and continue to have fulfilling auditory experiences.

Auditory scene analysis is particularly useful for thinking about complex acoustic environments because it acknowledges both that the physical world acts upon us, in that our perception is influenced by the structure of sound (i.e., bottom-up contributions), and that we can modulate how we process the incoming signals by focusing attention on certain aspects of an auditory scene according to our goals (i.e., top-down contributions). As for the inherent properties of the acoustic world that influence our perception, sounds emanating from the same physical object are likely to begin and end at the same time, share the same location, have similar intensity and fundamental frequency, and have smooth transitions and "predictable" auditory trajectories. Consequently, it has been proposed that acoustic, like visual, information can be perceptually grouped according to Gestalt principles of perceptual organization, such as grouping by similarity and good continuation (Bregman, 1990). Figure 64.1 shows a schematic diagram of these basic components of auditory scene analysis.
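As a toy illustration of two of the primitive cues listed above, common onset and harmonicity, the following Python sketch clusters spectral components whose onsets fall within a small tolerance and then keeps the members of each cluster that lie near integer multiples of its lowest frequency. The `Component` type, the function names, the 20 ms onset tolerance, and the 3% mistuning threshold are hypothetical choices made for illustration; they are not values or a model taken from this chapter or from Bregman (1990), and real auditory grouping weighs many cues jointly and far less rigidly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A single spectral component (hypothetical representation)."""
    freq_hz: float
    onset_s: float


def group_by_onset(components, tolerance_s=0.02):
    """Cluster components whose onsets fall within tolerance_s of the
    first onset in the current cluster (assumed common-onset cue)."""
    groups = []
    for comp in sorted(components, key=lambda c: c.onset_s):
        if groups and comp.onset_s - groups[-1][0].onset_s <= tolerance_s:
            groups[-1].append(comp)
        else:
            groups.append([comp])
    return groups


def is_harmonic(freq_hz, f0_hz, mistuning=0.03):
    """True if freq_hz is within +/- mistuning (a proportion) of the
    nearest integer multiple of f0_hz (assumed harmonicity cue)."""
    n = max(1, round(freq_hz / f0_hz))
    return abs(freq_hz - n * f0_hz) <= mistuning * n * f0_hz


def harmonic_members(group, mistuning=0.03):
    """Keep the components of a group that fit a harmonic series built
    on the group's lowest frequency."""
    f0 = min(c.freq_hz for c in group)
    return [c for c in group if is_harmonic(c.freq_hz, f0, mistuning)]


# Two hypothetical "sources": a 200 Hz complex starting at 0 s (with one
# mistuned partial at 630 Hz) and a 310 Hz complex starting at 150 ms.
mixture = [
    Component(200.0, 0.000), Component(400.0, 0.005), Component(630.0, 0.010),
    Component(310.0, 0.150), Component(620.0, 0.152), Component(930.0, 0.160),
]

for group in group_by_onset(mixture):
    freqs = [c.freq_hz for c in group]
    kept = [c.freq_hz for c in harmonic_members(group)]
    print(f"onset group {freqs} -> harmonic members {kept}")
```

With these made-up inputs, onset alone splits the mixture into two groups, and the harmonicity check then drops the mistuned 630 Hz partial from the first group while keeping all three components of the second. Applying the cues in sequence is a simplification chosen for readability; it is not a claim about the order in which the auditory system uses them.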

Many of the grouping processes are considered automatic or primitive because they can occur irrespective of a listener's expectancy and attention. Therefore, an initial stage of auditory scene analysis following basic feature extraction involves low-level processes in which fine spectral and temporal analyses of the acoustic waveform are employed so that distinct perceptual objects can be formed. However, the perception of our auditory world is not always imposed upon us. Our knowledge from previous experiences with various listening situations can influence how we process and interpret complex auditory scenes. These higher-level schema-driven processes involve the selection and comparison between current auditory stimulation and prototypical representations of sounds held in long-term memory. It is thought that both primitive and schema-driven processes are important for the formation of auditory objects, and these two types of mechanisms might interact with each other to constrain perceptual organization.

Figure 64.1 Schematic representation of a working model of auditory scene analysis. This model postulates low-level analyses where basic sound features are processed such as frequency, duration, intensity, and location. Scene analysis can be divided into at least two modes: primitive (bottom-up) and schema-driven (top-down). Primitive processes, the subject of most auditory scene analysis research, rely on cues immediately provided by the acoustic structure of the sensory input. These processes take advantage of regularities in how sounds are produced in virtually all natural environments (e.g., unrelated sounds rarely start at precisely the same time). Schema-driven processes, on the other hand, are those involving attention, or are based on experience with certain classes of sounds—for example, the processes employed by a listener in singling out a familiar melody interleaved with distracting tones. Our behavioral goals may further bias our attention toward certain sound attributes or the activation of particular schemas depending on the context. In everyday listening situations, it is likely that both primitive and schema-driven modes are at play in solving scene analysis problems.

The auditory scene analysis framework allows for the act of listening to be dynamic, since previously heard sounds may lead us to anticipate subsequent auditory events. This is well illustrated by phenomena in which the brain "completes" acoustic information masked by another sound, as in the continuity illusion (see Figure 64.2A).

In this illusion, discontinuous tone glides are heard as continuous when the silences are filled by noise bursts. In another example, missing phonemes can be perceptually restored when replaced by extraneous noise, and these effects occur primarily in contexts that promote the perception of the phoneme, suggesting that phonemic restoration may be guided by schema-driven processes (Repp, 1992).
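The stimulus logic of the continuity illusion can be sketched in a few lines. The following hypothetical Python function, using only the standard library, builds a rising tone glide with two gaps that are either left silent or filled with noise bursts. The sampling rate, glide range, gap timings, and amplitudes are illustrative assumptions rather than parameters from any cited study; the noise is set louder than the tone because the illusion generally depends on the noise being intense enough that it could have masked the tone.

```python
import math
import random


def glide_with_gaps(duration_s=1.0, sr=16000, f_start=500.0, f_end=1500.0,
                    gaps=((0.3, 0.4), (0.6, 0.7)), fill_with_noise=True,
                    tone_amp=0.2, noise_amp=0.6, seed=0):
    """Return a list of samples: a linear tone glide interrupted by gaps.

    Gaps are silent when fill_with_noise is False, or replaced by uniform
    white-noise bursts when it is True (all parameters are assumptions)."""
    rng = random.Random(seed)
    samples = []
    phase = 0.0
    for i in range(int(duration_s * sr)):
        t = i / sr
        # Instantaneous frequency rises linearly from f_start to f_end.
        freq = f_start + (f_end - f_start) * t / duration_s
        # Integrate frequency into phase so the glide has no discontinuities;
        # the phase keeps advancing during gaps, so the glide resumes on
        # its original trajectory after each interruption.
        phase += 2.0 * math.pi * freq / sr
        in_gap = any(start <= t < end for start, end in gaps)
        if not in_gap:
            samples.append(tone_amp * math.sin(phase))
        elif fill_with_noise:
            samples.append(noise_amp * rng.uniform(-1.0, 1.0))
        else:
            samples.append(0.0)
    return samples


silent_gaps = glide_with_gaps(fill_with_noise=False)  # gaps left silent
noise_filled = glide_with_gaps(fill_with_noise=True)  # gaps filled with noise
```

Only the gap condition differs between the two calls, so any perceptual difference between them can be attributed to the noise bursts. The sample lists could be scaled to 16-bit integers and written out with the standard-library `wave` module for listening; the code itself makes no claim about what a listener will hear.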

With respect to speech and music, schemata are acquired through exposure to auditory stimuli; they are consequently dependent on a listener's specific experiences. For instance, in speech processing, the use of prior context helps listeners to identify the final word of a sentence embedded in noise (Pichora-Fuller, Schneider, and Daneman, 1995). Similarly, it is easier to identify a familiar melody interleaved with distractor sounds if the listener knows in advance the title of the melody (Dowling, Lung, and Herrbold, 1987) or if they have been presented with the same melody beforehand (Bey and McAdams, 2002).

Even though the roles of both bottom-up and top-down processes are acknowledged in auditory scene analysis, the effects of age on these collective processes are not well defined. Yet it is clear that deficits in either simultaneous or sequential sound organization would have dramatic consequences for everyday acoustic computations such as speech perception. For example, deficits in concurrent sound organization may leave the perceiver unable to adequately separate the spectral components of the critical speech event from the background noise. A failure to segregate overlapping stimuli may also result in false recognition based on the properties of the two different sounds. For example, in dichotic listening procedures, which involve simultaneous presentation of auditory materials to the two ears, individuals presented with "back" in one ear and "lack" in the other ear often report hearing "black," suggesting that acoustic components from two different sources may be "miscombined" into one percept (Handel, 1989). Moreover, difficulties in integrating acoustic information over time and maintaining the focus of auditory attention may further limit the richness of acoustic phenomenology, leading to problems in social interaction and an eventual retreat from interpersonal relations.

Although auditory scene analysis has been investigated extensively for almost 30 years, and there have been several attempts to characterize central auditory deficits in older adults, major gaps between psychoacoustics and aging research remain. The goal of this chapter is to assess the scene analysis framework as a potential account of age-related declines in processing complex acoustic signals such as speech and music. By evaluating the role of bottom-up and top-down processes in solving scene analysis problems, this review may assist in the development of more effective ways to assess and rehabilitate older adults who suffer from common types of hearing difficulty. We now consider auditory scene analysis with respect to the initial parsing of the acoustic signal, before going on to discuss more schema-driven and attention-based mechanisms in audition. Primitive grouping may operate by segregating and grouping either concurrently or sequentially presented sounds.
