Journal of Sonic Studies, volume 6, nr. 1 (January 2014)
Iain McGregor; Phil Turner; David Benyon: USING PARTICIPATORY VISUALISATION OF SOUNDSCAPES TO COMPARE DESIGNERS’ AND LISTENERS’ EXPERIENCES OF SOUND DESIGNS
Designers chose whether they wished the listeners to be able to listen only once or multiple times. When the listeners were able to listen repeatedly, they were aware of every sound, which led to a 100% positive score for awareness. Within the 6 designs that listeners could not listen to repeatedly there were a total of 144 sound events. Listeners were aware of 98 of the sound events, which represented a level of 68% awareness. Listeners were unaware of sound events that did not have a recognisable source, such as “synth ambience” in Short Film or “stretching in and out synth transition” in Film Sound Effects. Sound events that the designers considered displeasing, such as “bathroom sounds” in Simulation or “weird branches coming out of the mouth” in Short Film, also went unnoticed by the listeners. This lack of awareness might be due to the sound event being regarded as uninformative by the designer, for example “girl’s voice” in Radio Drama and “rock first hit” in Soundscape Composition.
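The awareness level quoted above follows directly from the two reported counts. As a minimal sketch of that arithmetic (the function name `awareness_percentage` is illustrative and not part of the study):

```python
def awareness_percentage(noticed: int, total: int) -> int:
    """Return the share of sound events that listeners noticed, as a whole percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * noticed / total)


# Counts reported for the 6 designs that could be heard only once.
single_listen = awareness_percentage(noticed=98, total=144)
print(single_listen)  # 68
```

The same function returns 100 when every sound event is noticed, which matches the result reported for the designs that could be heard repeatedly.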
Spatial cues were used by 9 out of 10 designers and are well reflected in the subsequent visualisations that show the differences between the designer’s intention and the listeners’ reactions. Listeners perceived only 2 of the designs as having motion (Composition and Film Sound Effects). They thought that the design with the greatest amount of motion was Composition, with 69% of the sound events being regarded as having motion. The designer of Composition considered 42% of the sound events to have motion. For stationary sound events, the designers used almost the entire X axis (panning) and the entire Y axis (depth). The listeners experienced slightly less panning and depth for the static sound events. For sound events that had motion, the entire X and Y axes were used by the designers. In contrast, the listeners experienced the entire range of panning but a lesser amount of depth.
The Type attribute (music, sound effect, speech) is quite intuitive. Speech was predominantly used to classify identifiable words or phrases by both the designers and the listeners, such as “I’m calling you (Man)” in Auditory Display or “Butler’s voice” in Radio Drama. Music was chosen for the most part when there was a clearly identifiable melody such as “dub music” in Short Film or “flute music A” in Soundscape Composition. Sound effect was used for a wide range of sound events; examples include “birds” in Audio Logos and “recovery phase” in Sonification.
When classifying the Material attribute of sound events, gas was often chosen for sound events that involved the movement of air as in “the tonic (keynote)” of Laboratory, “the wheels of personal computers” in Simulation, or the “jet-entry” in Film Sound Effects. Liquid was predominantly selected for sound events such as the “waterfall” in Games Sound Effects and “water trickling” in Soundscape Composition.
For the classification of the Interaction attributes, impulsive was primarily used for percussive-type sound events, such as “message knocking” in Auditory Display, or “drumming fingers on a desk” in Simulation. Intermittent was chosen when sounds had a percussive element over an underlying sustained element, as in the “dog growl” in Games Sound Effects or the “juddering anacrusis” in Composition. Continuous was applied when there was a sustained sound event without any obvious percussive elements; examples include the “room ambience” in Short Film and the “background ambience” in Composition.
Within the Temporal attributes, short was chosen for brief non-repeating sound events, such as the “poof flash” in Film Sound Effects or the “voice ‘xxxxx’” in Audio Logos. Medium was used when a sound event was of indeterminate length, neither short nor long; examples include “Sid’s voice” in Radio Drama and “sounds emitted by a key-holder while someone is walking in the passage, other noises, some steps, aeration ducts” in Simulation. Long was applied to extended uninterrupted sound events, such as “theme music” in Auditory Display or “water” in Games Sound Effects.
The Spectral attribute high was commonly applied to bright percussive sound events both natural and man-made; these included “loud chirp” in Soundscape Composition and the “front catch” in Sonification. Mid was chosen for sound events that fell between high and low as well as for sound events that had broadband spectral content. Examples of broadband content include the “distorted evolving pad” in Short Film and “jet engine fires” in Film Sound Effects. Low was selected for obvious bass content, as in “bass rumble” in Composition and “leather bass drum” in Audio Logos.
When Dynamics attributes were applied, loud was often used for short prominent sound events, such as “gunshot” in Short Film or “mid catch” in Sonification. Medium was chosen for moderate intensity sound events that provided context for a further action; examples include “gun loading” in Short Film and “safe door jiggled” in Radio Drama. Soft was used to classify gentle sound events that formed an auditory backdrop; examples include “background ambience” in Composition and “birds” in Audio Logos.
The most obvious examples of informative sound events for the Content attribute were those associated with warnings such as “low battery alert” in Auditory Display or “Ringing bell (doorbell)” in Radio Drama. Neutral was applied to sound events that were regarded as neither necessary nor unnecessary to comprehend the sound design. Examples include “chirping beeps 1” in Film Sound Effects and “big leaf crunch” in Soundscape Composition. Uninformative was reserved for those sound events which were considered unnecessary, as in “leather bass drum” in Audio Logos or “voices” in Games Sound Effects, the latter of which was only uninformative from the listeners’ perspective.
Within the Aesthetics attributes, pleasing was predominantly applied to positive sound events that came from an acoustic source; examples include “birds high and loud” in Games Sound Effects and “classical guitar” in Audio Logos. Displeasing was often chosen for sound events that had negative associations, such as “dog growl” in Games Sound Effects or “distorted scream” in Short Film. Neutral was used for abstract sound events that had no physical analogue, such as “back reversal” in Sonification or “ripping detritus-drop” in Composition. The aesthetic ratings appear to be closely related to emotional responses rather than to whether a sound was considered beautiful.
Within the Clarity attribute, clear was often applied to explicit sound events that were foreground in the designs; examples include “the emission sounds of a television: a woman’s voice” from Simulation and “woman’s voice” from Radio Drama. Unclear was used for sound events that, whilst still audible, were difficult to discern, as in “female voice ‘Tomorrow’” from Games Sound Effects and “background ambience” in Composition. Neutral sound events were those which were regarded as neither clear nor unclear; examples comprise “rock bounce” from Soundscape Composition for the designers and “warning spearcon” from Auditory Display for the listeners.
In terms of the Emotions attributes, positive was applied to sound events with obvious affirmative associations, such as “kiss” in Games Sound Effects or “success” in Auditory Display. Neutral was used when the sound events were abstract; examples include “drive phase” from Sonification and “building transitional whoosh” in Film Sound Effects. Negative denoted sound events that were designed to have an unpleasant effect; these included “door” in Audio Logos and “Chetwood’s voice” in Radio Drama.
In general terms the designers’ responses were weighted towards the middle value in 4 of the attributes (Spectral, Dynamics, Aesthetics and Emotions) (see Table 17). Only one of the attributes (Material) had a value of under 10% (liquid, 7%) according to the designers. With regard to the listeners, 4 responses across 3 of the attributes (Type, Content and Clarity) fell below 10%: music (9%), uninformative (6%), neutral clarity (7%) and unclear (5%). For 6 of the attributes, the internal ranking of responses was consistent between the designers and listeners, although the percentages differed. The 4 remaining attributes that did not have consistent ranking between the two groups were Type, Interaction, Aesthetics and Emotions, although the majority response was always the same. Listeners rarely rated any sound event as music unless it was exclusively musical; in contrast, the designers rated sounds that had musical elements as music. The contrast between the Material and Interaction interpretations could be a case of degrees of differentiation. Both solid and impulsive were interpreted reliably, but there was obvious variation in terms of gas/liquid. This could be an area where training is required to produce consistent differentiation. Some of the variation in responses for Clarity and Content may be due to listeners being asked to consider sound events in isolation, and being provided with descriptions rather than having to interpret what they were listening to without guidance. The differences in Emotions and Aesthetics might be due to the designers applying more subtlety in their sound designs than the listeners could interpret. The central weighting for Dynamics with listeners could be solely due to the reproduction apparatus, in that the designers had access to equipment with improved dynamic range. The consistency for Temporal and Spectral attributes may be due to an inherent familiarity, irrespective of training.
In conclusion, all of the attributes were used by both the designers and the listeners, and as such appear to be suitable for describing soundscapes.
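The comparison of internal rankings between the two groups can be illustrated with a short sketch. The attribute values are those named in the text; the dictionary layout, the function names `ranking` and `same_ranking`, and the example responses are assumptions made purely for illustration and do not reproduce the study's data or tooling.

```python
from collections import Counter

# Values for each classification attribute, as named in the text.
# The dictionary layout itself is an assumption.
ATTRIBUTE_VALUES = {
    "Type": ("speech", "music", "sound effect"),
    "Material": ("gas", "liquid", "solid"),
    "Interaction": ("impulsive", "intermittent", "continuous"),
    "Temporal": ("short", "medium", "long"),
    "Spectral": ("high", "mid", "low"),
    "Dynamics": ("loud", "medium", "soft"),
    "Content": ("informative", "neutral", "uninformative"),
    "Aesthetics": ("pleasing", "neutral", "displeasing"),
    "Clarity": ("clear", "neutral", "unclear"),
    "Emotions": ("positive", "neutral", "negative"),
}


def ranking(attribute, responses):
    """Rank an attribute's values from most to least frequently chosen."""
    allowed = ATTRIBUTE_VALUES[attribute]
    unknown = set(responses) - set(allowed)
    if unknown:
        raise ValueError(f"unexpected values for {attribute}: {sorted(unknown)}")
    counts = Counter(responses)
    # sorted() is stable, so ties keep the order used in ATTRIBUTE_VALUES.
    return sorted(allowed, key=lambda value: -counts[value])


def same_ranking(attribute, designer_responses, listener_responses):
    """True when both groups order the attribute's values identically."""
    return ranking(attribute, designer_responses) == ranking(attribute, listener_responses)


# Invented responses, for illustration only.
designers = ["solid"] * 6 + ["gas"] * 3 + ["liquid"]
listeners = ["solid"] * 5 + ["gas"] * 4 + ["liquid"]
print(same_ranking("Material", designers, listeners))  # True
```

In this invented example the percentages differ between the groups but the ordering of values does not, which is the sense in which a ranking is described as consistent above.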
In order to address the reliability of the attributes, we examined 120 conditions of which a small number (21) proved to be of interest (see Table 18). When comparing designers’ and listeners’ responses, none of the attributes could be considered reliable for all of the sound designs, and Interaction was not reliable for any design. However, for three of the sound designs there is a significant level of reliability for a limited number of attributes. There are two factors to consider. First, the method may not be reliable or valid: listeners and designers agree sometimes but not at other times, because the method is flawed in some way, the most obvious example being in describing Interaction. Second, it is also possible that expert knowledge is very different from non-expert knowledge (Alves and Roque 2010; Cattell, Glascock and Washburn 1918; Kaufman, Baer, Cole and Saxton 2008), and we may be comparing apples with oranges: both are fruit, but one is of the genus Malus, the other of the genus Citrus. What Table 18 shows is that in certain instances listeners’ and designers’ experiences can be compared with confidence, but that the scope is limited.
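This section does not state which statistic was used to judge reliability. Purely as an illustration of how agreement between a designer's classification and the listeners' classification of the same sound events could be quantified, the sketch below computes Cohen's kappa, a standard chance-corrected agreement measure. It is a stand-in technique, not the authors' method, and the labels in the example are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(rater_a)
    # Proportion of sound events on which the two raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented Interaction labels for four sound events (illustration only).
designer = ["impulsive", "continuous", "intermittent", "impulsive"]
listeners = ["impulsive", "continuous", "continuous", "impulsive"]
print(f"{cohens_kappa(designer, listeners):.2f}")  # 0.60
```

A value of 1 indicates complete agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Running such a measure once per attribute and per design would give one result for each condition examined.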
Coleman (2008) highlights the distrust that designers have for non-experts’ descriptions of auditory environments. Audio professionals spend a considerable amount of time learning to shift between critical and natural listening. The visualisation allows a comparison to be made between designers’ and listeners’ listening experiences.
A simple comparison of the designer’s soundscape map of the pre-existing environment with the listeners’ illustrates where similarities and differences lie. Cross-referencing what participants were aware of with all of the recorded sound events highlights what was being attended to and what was ignored. The classification provides information about what the perceived events sound like, how relevant they are, whether they are pleasing and clear, and what, if any, their emotional impact was. This informs the designer what is favourable and what is considered to be neutral or unfavourable.
4.1 Expert Evaluation
A questionnaire was sent out to all ten designers with soundscape visualisations of their design. The questionnaire addressed classification, visualisation, and applications. Designers were asked to rate how important each attribute used in the soundscape visualisations was in order to compare sound designs with listeners’ experiences. Designers were invited to choose the most appropriate way to display the audio attributes used in the classification. An adapted visual questionnaire approach was used, where each visualisation option was pictorially represented, using a check box to indicate choice. This was followed by their level of agreement with the statement that the “soundscape visualisation allowed me to compare a sound design with the experience of listeners”. The questionnaire concluded with open-ended questions about what methods they currently employed to evaluate sound designs, how they could use this method, and suggestions for changes.
All ten designers completed the questionnaire; none of the questions were omitted. Seven out of the 12 audio attributes were considered to be either important or very important by 6 of the designers. A further 4 attributes were rated as important or very important by 5 of the designers, and only a single attribute (Interaction) was rated as being either important or very important by fewer than half of the designers. Awareness, Spatial Cues, Type, Dynamics, Content, Clarity and Emotions could be chosen as a reduced set of attributes for future visualisations.
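The selection of the reduced set can be expressed as a simple majority threshold. In the sketch below the counts of 6 and 5 follow the text, and the four attributes rated by 5 designers are inferred by elimination from the 12 attributes. The count for Interaction is not reported (only that it was fewer than half), so the value 4 is a placeholder. The names `IMPORTANT_COUNTS` and `reduced_set` are illustrative.

```python
# Designers (out of 10) rating each attribute "important" or "very important".
# Interaction's exact count is not given in the text; 4 is a placeholder.
IMPORTANT_COUNTS = {
    "Awareness": 6,
    "Spatial Cues": 6,
    "Type": 6,
    "Dynamics": 6,
    "Content": 6,
    "Clarity": 6,
    "Emotions": 6,
    "Material": 5,
    "Temporal": 5,
    "Spectral": 5,
    "Aesthetics": 5,
    "Interaction": 4,
}


def reduced_set(counts, n_designers=10):
    """Keep attributes rated important by a strict majority of designers."""
    return [name for name, count in counts.items() if count > n_designers / 2]


print(reduced_set(IMPORTANT_COUNTS))
```

With a strict majority as the cut-off, the attributes rated by exactly half of the designers are excluded along with Interaction, leaving the seven attributes listed above.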
The second part of the questionnaire asked designers for their preferred choice of displaying each audio attribute. Seven out of the 12 attributes had a single method of display chosen by the majority of the sound designers. Two of the methods of visualisation were chosen by all ten of the designers: the position on a grid for the Spatial Cues and symbols for the Type. A further 2 visualisation methods were chosen by nine out of ten designers: inclusion of object for Awareness and emoticons for Emotions. Opacity for Clarity, border dashes for Interaction and shape for Content were also chosen by more than half of the designers. There was no clear single choice of display for the remaining attributes Material, Temporal, Spectral, Dynamics, and Aesthetics.
A reduced set of 7 attributes has been suggested by the designers (Awareness, Spatial Cues, Type, Dynamics, Content, Clarity, Emotions), along with appropriate methods of display (see Figure 22). All but one of the attributes (Interaction) were considered to be either “important” or “very important” by at least half of the designers, with 7 attributes being selected by the majority. All of the designers agreed that soundscape visualisation allowed them to compare a sound design with the experience of listeners.