Journal of Sonic Studies, volume 6, nr. 1 (January 2014). Iain McGregor, Phil Turner and David Benyon: USING PARTICIPATORY VISUALISATION OF SOUNDSCAPES TO COMPARE DESIGNERS’ AND LISTENERS’ EXPERIENCES OF SOUND DESIGNS
Four case studies were chosen to illustrate the procedure. The remaining six designs are briefly reported in section 3.5. The experts’ evaluation can be found in section 4.1.
3.1 Design 01: Auditory Display
Auditory displays have been defined by Kramer (1994) as an interface between users and computer systems that uses sound, and they are considered a natural extension of the way in which sound is used in the physical world. Auditory displays differ from auditory interfaces in that they operate unidirectionally: an interface allows audio to be used as input as well as output, though it does not require audio input, whereas a display provides only output (McGookin and Brewster 2004). Speech interfaces are a specialist type of auditory interface that is predominantly confined to speech (Raman 2012). Auditory displays can be split into user interface audio and audio used in visualisation. User interface audio includes earcons, auditory icons, sound enhanced word processors (text to speech), and other applications, whilst sound in visualisation includes audification, sonification, and auralisation (Vickers 1999).
The sound events for the auditory display had been designed for a large manufacturer of electrical appliances for a variety of their products (Audio 1). The designer recorded no spatial cues, as the sound events were tested in isolation rather than within products (see Table 3). The designer made limited use of the material and interaction attributes, recording this information for only 9 of the 32 sound events. However, all of the other attributes were applied (see Figure 2). The majority of the sound events were considered by the listeners to be informative (square) and clear (opaque) (see Figure 3). Three of the sound events that were considered to be pleasing by the designer were found to be displeasing by the listeners (border width). There is a clear difference between the designer’s and listeners’ classification of music (musical notes symbol) and sound effects (loudspeaker symbol), with the designer considering the majority of sound events to be music, possibly due to the prominent use of earcons, which are often considered to be musical in nature by designers. The listeners predominantly classified these sound events as sound effects. They considered the aesthetics of the sound events to be more evenly distributed than the designer, who considered more to be either pleasing or displeasing. The listeners also classified more sound events as neutral than the designer, who considered the majority to be positive (emoticons).
As an auditory display, the sound design might be regarded as successful, as 26 out of the 27 sound events that were classified as informative by the sound designer were also classified as informative by the listeners (see Table 4). Similarly, 31 sound events were classified as clear by the listeners. The 3 sound events that were rated by the listeners as displeasing (AR, AT and BD) along with the 2 that were found to be uninformative (AT and AX) might benefit from further review. The major difference in the listeners’ and designer’s rating of the auditory display was in terms of the sound events being considered as sound effects rather than music by the listeners. The similarities were far more prevalent, especially in terms of the Temporal, Spectral, Dynamics, Content and Clarity attributes. This might mean that listeners did not fully appreciate the hierarchical associations that are inherent in earcon design. Listeners may perceive each earcon as a separate sound effect, which would require them to learn each icon individually, rather than recognise musical similarities.
3.2 Design 02: Sonification
Sonification refers to a technique for transforming data into an audible stream, analogous to data visualisation (Kramer et al. 1999). It can be argued that a sonification method must be objective, systematic and reproducible, as well as suitable for use with different data (Hermann 2008). Data can be split into auditory streams, where each stream is linked to a specific audio variable such as pitch, volume, note duration, fundamental wave shape, attack (onset) envelope, or overtone (harmonics) wave shape. This can make the data not only more informative, but can also potentially increase the amount of information that can be transmitted concurrently (Bly 1982).
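The mapping described above can be illustrated with a minimal parameter-mapping sketch. This is a hypothetical illustration, not the tool used by the designer in Design 02: each value of a data series is mapped linearly onto the pitch of a continuous sine tone, and the result is written out as a mono WAV file. The function names (`map_to_freq`, `sonify`, `write_wav`) and the chosen frequency range are assumptions for the example only.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100

def map_to_freq(value, vmin, vmax, fmin=220.0, fmax=880.0):
    """Linearly map a data value onto a frequency range (Hz)."""
    t = (value - vmin) / (vmax - vmin) if vmax > vmin else 0.0
    return fmin + t * (fmax - fmin)

def sonify(data, seconds_per_point=0.25):
    """Render a data series as a pitch-varying continuous tone (16-bit samples)."""
    vmin, vmax = min(data), max(data)
    samples = []
    phase = 0.0  # accumulate phase so the tone stays continuous between points
    n = int(SAMPLE_RATE * seconds_per_point)
    for value in data:
        step = 2 * math.pi * map_to_freq(value, vmin, vmax) / SAMPLE_RATE
        for _ in range(n):
            samples.append(int(32767 * 0.5 * math.sin(phase)))
            phase += step
    return samples

def write_wav(path, samples):
    """Write 16-bit mono PCM; mono, so no spatial cues are encoded."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(SAMPLE_RATE)
        f.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Example: a rising-then-falling trace becomes a rising-then-falling tone.
trace = [0.0, 0.2, 0.5, 0.9, 1.0, 0.7, 0.3, 0.1]
write_wav("sonification.wav", sonify(trace))
```

A per-point duration and a one-octave-plus pitch span keep the changes easy to track by ear; a real sonification would tune both to the data rate and the listeners’ task.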
This soundscape consisted of a 56 second video of an acceleration trace from a four man coxless rowing team sonified using a continuous tone that varied in pitch (Video 1) (Schaffert, Mattes and Effenberg 2010). The sonification is designed to help athletes improve their performance (see Table 5).
Video 1: Sonification

The designer classified all of these sound events as sound effects, with values of gas for the material, informative for the content, and clear for the clarity attribute. Interactions varied from impulsive to intermittent, with a single instance of continuous (see Figure 4). The listeners were aware of all 8 sound events, and considered all but 1 to be informative. The listeners grouped the materials of the sound events into 1 gas (magenta border), 5 liquid (cyan border), 1 solid (yellow border), and 2 as both liquid and solid (see Figure 5). Listeners experienced a greater range of spectral attributes than the designer.
The sonification could be considered successful, as almost all of the sound events were considered informative, and listeners were able to distinguish the differences in pitch (see Table 6). The range of pitch variation could be increased so that it extended into the low range, and some form of panning might be considered, if only to move the sound events into the centre of the stereo field. The designer’s and the listeners’ ratings for Type, Dynamics, Clarity and Emotions were identical. The Spectral, Content and Aesthetics attributes differed only slightly. The main differences between the designer’s and listeners’ responses were with the Y axis (depth), Material, Interaction and Temporal attributes. Whilst listeners found all of the elements informative, they experienced the sound events as being further away, as sounding more like liquid than gas, and the Interaction as being more continuous than impulsive.
3.3 Design 03: Simulation
A variety of systems exist for simulating soundscapes and/or acoustical environments. The simplest is to record an auditory environment using a multichannel microphone or multiple microphones and then to reproduce the recording through multiple loudspeakers (Bertet, Daniel and Moreau 2006; Holman 2000). More complex interactive systems are available, where a sound designer records all of the original samples and composes, or creates, a set of rules and parameters for real time soundscape generation (Schirosa, Janer, Kersten and Roma 2010; Valle, Lombardo and Schirosa 2009). Procedural audio systems, where all of the sounds are generated artificially, are also available and are commonly found in video games (Farnell 2011; L. J. Paul 2010).
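As a small illustration of the procedural approach mentioned above, the sketch below generates an impact-like sound entirely from rules, with no recorded samples: white noise is darkened with a one-pole low-pass filter and shaped by an exponentially decaying envelope. This is a generic, hypothetical example; it is not the algorithm of GeoGraphy or of any particular game audio engine, and the function name and parameter values are assumptions.

```python
import math
import random

SAMPLE_RATE = 44100

def procedural_impact(duration=0.3, decay=30.0, seed=0):
    """Synthesise an impact-like sound purely from rules (no samples):
    low-pass filtered noise under an exponentially decaying envelope."""
    rng = random.Random(seed)   # seeded so the same rules give the same sound
    n = int(SAMPLE_RATE * duration)
    samples = []
    prev = 0.0
    for i in range(n):
        noise = rng.uniform(-1.0, 1.0)
        prev = 0.6 * prev + 0.4 * noise          # one-pole low-pass darkens the noise
        env = math.exp(-decay * i / SAMPLE_RATE)  # fast exponential decay
        samples.append(prev * env)
    return samples

# Varying the rule parameters yields a family of related sounds,
# which is the appeal of procedural audio for games.
soft_thud = procedural_impact(duration=0.2, decay=45.0)
hard_hit = procedural_impact(duration=0.4, decay=20.0)
```

Because the sound is defined by parameters rather than recordings, a game can vary `duration` or `decay` at run time to avoid the repetitiveness of replaying a fixed sample.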
A 7 minute and 25 second simulation of the soundfield of a multimedia laboratory and its immediate environment was created for this soundscape (Audio 2). A soundfield can be defined as the auditory environment surrounding a particular sound source, representing the quantifiable characteristics of that source or event (Ohlson 1976). The simulation was created using a non-linear sequencing model called GeoGraphy (Valle et al. 2009), which had previously been tested by comparing simulations with recordings from real environments and asking listeners to identify which was which. Each sound event is a single zone, and the descriptions represent the sounds that were used to create the zone (see Table 7).
The designer considered the content of 9 of the sound events to be informative, 5 neutral and 5 uninformative. These were visualized as in Figure 6, showing different shapes to represent the different content. Sound events such as the photocopier and the film were informative, the sounds of people’s actions such as drying their hands or footsteps were neutral, and room tones were uninformative. Listeners were unaware of two of the sounds associated with the washing of hands (AJ and AK) (see Figure 7). The listeners thought that 14 sound events were informative, 1 neutral and 2 both informative and neutral. Listeners might have been trying to make sense of what they were listening to and constructing a narrative in order to understand the sequence of sound events. This could be attributed to the number of sound events that the listeners found to be clear (15), which contrasts with the designer, who rated 9 of them as clear and the remaining 10 as unclear.
The sound types were consistent (see Table 8), with all of the sound events being categorized identically by both the listeners and the designer. Four of the sound events were considered to be speech, with the remainder (15) being sound effect. There were no instances of music. The designer considered the sound events to have a greater difference in dynamics: 7 were loud, 7 medium and 5 soft. The listeners found 3 to be loud/medium, 1 soft, and the remainder (16) medium. This might suggest that the variation in dynamics is too subtle and that a greater difference needs to be applied in order to convey the range intended by the designer. More sound events were considered clear and informative by the listeners than by the designer, which is probably due to the artificial nature of listening out of context. The Aesthetics and Emotions aspects of the sound design were not communicated effectively, with almost all of the sound events being neutral.
3.4 Design 04: Game Sound Effects
The fourth design utilized sound effects from a commercially released console video game. All of the sound events were part of a company’s sound library, for designers to use in the construction of games. Eight separate audio files were included; the shortest was less than 1 second long and the longest 1 minute and 19 seconds (Audio 3). Half of the files, which were all recordings of a female voice speaking single words, were single sound events, and the remaining 4 were atmospheric constructs with between 3 and 5 sound events (see Table 9).
The designer considered all of the 18 sound events to be informative (see Figure 8), and either speech or sound effects. Full use was made of the range of the remaining attributes. For the material attribute, gas was predominantly used to classify the voices, most of the “birds”, and some of the dogs. Liquid was consistently chosen for “water”, and solid was applied to “kiss”, “hit”, and some of the dog sounds. There was increased consistency for the Interaction attribute. The designer used continuous to classify only the water sounds, all of the birds were intermittent, and all of the voices were impulsive. Only the dog sounds were inconsistent, being either impulsive or intermittent. The majority (10) of the sound events were temporally short, only 3 were medium, and 5 were long. Atmospheric sound effects, such as the waterfall, tended to be temporally long, whereas speech was either short or medium.
The listeners rated only 12 of the 18 sound events as informative (see Figure 9). Four were found to be uninformative, 1 was neutral, and 1 was both informative and uninformative, illustrating that there were contradictory responses. Each of the sound events classified by the listeners as uninformative, as well as the single neutral sound event, was speech. Three of these were also unclear, whilst the remaining 2 were clear. The designer regarded only 1 of the sound events as unclear.
When considering the sound design as a whole, sound effects can be considered successful when they are informative and convey the required emotions accurately. There is a difference between the two groups with regard to speech (see Table 10). The intended emotions were not conveyed: listeners consistently rated the speech events as neutral, as well as predominantly uninformative, whereas the designer judged them to be informative and to convey either positive or negative emotions. This is perhaps due to a problem with the dialogue delivery rather than the sound design. More sound events were considered clear by the listeners than by the designer, which may be due to the artificial nature of the task, where sound events were listened to in isolation, without reference to a game.
3.5 Other designs
Design 05 was a short film that had music and sound effects, but no dialogue (Video 2). The designer identified 45 sound events, but only 23 of these were recalled by the listeners (see Table 11). Twelve of the events that listeners were unaware of were classified as uninformative by the designer, but all the events that the listeners were aware of were classified as informative (18) or neutral (5) (see Figures 10 and 11).
Video 2: Short film
Design 06 was a 30 second soundscape composition, composed of a piece of music and sound samples of a person playing flute by a stream (Audio 4). The designer identified 15 sound events and made full use of the width and depth codes (see Table 12, Figures 12 and 13). The listeners were aware of 9 of the events and did not perceive such a wide spatial distribution. They combined the two pieces of flute music into a single sound event.
Design 07, a 42 second section from a radio drama, consisted of 14 sound events, 5 speaking characters, and 9 sound effects (Audio 5). One event identified as unclear by the designer was not noticed by the listeners, but otherwise they were aware of all the events (see Table 13, Figures 14 and 15). The characters were also classified in emotional terms, with the Aesthetics rated as neutral rather than pleasing or displeasing.
Design 08 was a set of audio logos (audio branding) that often form part of an advert; here the aesthetics, clarity, and emotional response were most important (Audio 6). The designer considered all but 4 of the sounds to be pleasing, whereas the listeners classified only 5 as pleasing (see Table 14, Figures 16 and 17). The outcome of this evaluation could be useful to feed back to the designers what people actually thought of the designs.
Design 09 was an abstract composition, included to see if the visualisations could be used for representing complex soundscapes (Audio 7). It was presented in surround sound to the listeners. There were 26 sound events, all classified by listeners and the designer as sound effects, and the listeners were aware of all of these (see Table 15, Figures 18 and 19).
Design 10 consisted of a 30 second audio sequence of film sound effects (Audio 8). The listeners were only aware of 18 of the sound events out of a total of 32 (see Table 16, Figures 20 and 21). Listeners were unaware of all of the sound events that the designer classified as soft. However, it did not follow that listeners were aware of each loud sound event.
Table 17 shows that a few of the attributes, such as Type, Temporal, Spectral and Emotions, were rated similarly. There were small differences in Material in relation to the rating of liquid and gas, and pronounced differences in Interaction, Dynamics, Content, Aesthetics and Clarity. In terms of Interaction, listeners rated more sound events as continuous than the designers did. For Dynamics, listeners tended more towards the mid value, whereas for Content, listeners rated the sound events as informative more often than the designers did. Finally, listeners found a greater percentage of the sound events to be pleasing and clear than the designers considered them to be.