The Journal of Sonic Studies

To refer to this article use this url: http://journal.sonicstudies.org/vol06/nr01/a08

1. Introduction

As human-computer interaction (HCI) and interaction design abandon the desktop in favour of our pockets and purses, the hands of children, and the wider environment, so issues of non-visual modalities of interaction are being foregrounded. We currently rely on a sound or vibration to inform us that a text message has arrived, or that we are late for the dentist. Beyond these simple alerts, sound is used to great effect in the design of virtual environments (Finney and Janer 2010; Nordahl 2010; Serafin and Serafin 2004), video games (Collins 2013; Murphy and Neff 2011; Nacke, Grimshaw and Lindley 2010), and artistic installations that embrace an existing site in order to enhance and empower spaces (Batchelor 2013; LaBelle 2006; Torehammar and Hellström 2012). The use of sound in these technical systems mirrors, to some degree, the place of sound in the real world, where it provides a broader “picture” of our surroundings which, in turn, orients us within the complexities of these “information” spaces wherein we daily find ourselves.

The issue of how interaction designers should design sound for use within HCI remains unresolved, as even talking about it, much less reasoning about it, is difficult. Barrass (Barrass 2005) refers to the novelty of his 1994 approach to sonification and how that laid the foundations for a comprehensive framework for auditory display design. Brazil (Brazil 2010) stated that there was still no systematic approach to auditory display and sonic interaction design, but that there was ongoing work. MacDonald and Stockman (MacDonald and Stockman 2013) highlight that auditory display design techniques are still not unified or easily understood. The expert knowledge upon which such design depends remains locked up in the professional practice of sound designers and ranges from what Anderson (Anderson 1974) refers to as the available propositional or observable, through to what Polanyi (Polanyi 1951) describes as the unarticulated tacit or yet to be formally systematized.

Notwithstanding these problems, there have been a number of attempts at systemising the design of sound within interactive technologies. Perhaps the most widely known is Brewster (Brewster 1994), who proposed guidelines for the design of earcons. Earcons are the abstract representations of some information using sounds, first introduced by Blattner, Sumikawa, and Greenberg (Blattner, Sumikawa and Greenberg 1989). Mynatt concentrated on the usability of auditory icons, proposing a method of design that addressed identifiability, conceptual mapping, physical parameters, and user preference (Mynatt 1994). Auditory icons are everyday sounds that correspond to computer events, as first developed by Gaver (Gaver 1986). Dombois and Eckel (Dombois and Eckel 2011) have developed guidelines for audification – the process of transforming data waveforms into sound, first discussed by Frysinger (Frysinger 1990) – and are most concerned with the suitability of the source material (data-set). Despite the many different approaches to sonification, such as audification, that have been developed since the 1980s, there is still no complete set of guidelines for their design (Walker and Nees 2011), although attempts have been made to formalise the process since (Kramer et al. 1999).

Frauenberger and Stockman (Frauenberger and Stockman 2009) have evaluated the work of Barrass (Barrass 2003), suggesting that patterns of auditory design can be developed and tagged with keywords to enable specialist and non-specialist designers to access this knowledge. In the style of Alexandrian patterns (Alexander 1979), the approach is to describe typical design problems and solutions. They note that identifying patterns from individual sound design solutions is difficult and that this problem could be lessened by increasing the size of the community contributing patterns as well as allowing sufficient time for patterns to be generated and shared. Despite these efforts, sound design for auditory displays remains something of a “black art”, being confined to a gifted few (Alves and Roque 2011).

While sound design within interaction design is poorly understood, this is not the case for sound in other design disciplines. Sound designers for games typically specialize early in their career to learn the craft and the professional tools that are required for their trade. Sound designers of soundscape installations in public buildings are often musicians who have a deep knowledge of how to affect people’s feelings and behaviour through sound (Hellström, Dyrssen, Hultqvist, Mossenmark and Sjösten 2011). Sound designers for radio and theatre predominantly train as recording studio or live music engineers first, and of course film schools provide another specialist route for film sound designers (Touzeau 2008). Within film, Walter Murch was the first to describe his work as sound design, as he moved the sound between mono, stereo and quadraphonic for the film Apocalypse Now (LoBrutto 1994). Ben Burtt was the first to work in what is now the recognisable role of a sound designer, on Star Wars Episode IV – A New Hope, being credited for special dialogue and sound effects (Whittington 2007).

However, as sound becomes an increasingly mainstream part of interaction design, we need to find ways of making the design knowledge of these specialist designers accessible to mainstream interaction designers. Sounds rarely exist in isolation, so designers need to be able to represent the whole “soundscape” of an auditory display, including the direction and loudness of the different sound events and how the display changes over time. There is a simple equivalent for the quick sketch of a visual designer: vocal sketching is an effective approach for creating monophonic sounds using the human voice. There are issues with the length of sounds and breath as well as creating complex sounds and requiring multiple contributors for harmonies (Ekman and Rinott 2010; Tahiroğlu and Ahmaniemi 2010a; Tahiroğlu and Ahmaniemi 2010b). There are a small number of professional practitioners who can successfully create complex sounds and who are less limited by the length of breath required, although they predominantly specialise in animation for film, games and television (F. Newman 2004). Giordano, Susini and Bresin (Giordano, Susini and Bresin 2013) point out that if continuous evaluation is required, then it is advisable for participants to listen to recordings as passive listeners rather than vocally sketch the sounds themselves. While a visual sketch remains fixed on the page, sound is temporal by nature, so it can be difficult to prototype a soundscape for user evaluation. It is, of course, easy to create and play a tune, or a sound effect, but even with current software, it remains difficult to present this in a way that lets designers understand what it will be like when it is fully orchestrated and deployed as part of an overall user experience.

Our way of dealing with these issues is to visualize the soundscape, so that complex temporal data can be captured and analysed in order to highlight similarities and differences in listening experiences. Visualisations can allow designers to view data quickly, identify problems and provide a consistent form of interpretation. Just as a web site designer will provide a wire frame of the design to get reactions from the clients, so we seek to provide a visualisation of a proposed soundscape design. This needs to capture the foreground and background sounds, the different types of sound, and of course the change of the soundscape over time. Allowing designers to use a visual form to represent a sound design allows them to validate their designs quickly and with confidence.

1.1 Listening, soundscapes, and sound design


Listening and hearing are different (Handel 1989), and Szendy (Szendy 2008) tells us that we can choose to listen. Madell and Flexer (Madell and Flexer 2008) define hearing as the acoustic mechanism of sound being transmitted to the brain, whereas listening is the process of focusing and attending to what can be heard. Thus, listening is an active process comprising conscious choice and subjective interpretation of what is heard (Blesser and Salter 2007).

A soundscape can be defined as the surrounding auditory environment that a listener inhabits (Porteous and Mastin 1985; Rodaway 1994; Schafer 1977). The soundscape surrounds the listener and is an anthropocentric experience (Ohlson 1976). The definition has not been standardized, but there is on-going work to create an ISO standard in order to establish its definition, conceptual framework, as well as methods and measurements of its study (Brown, Kang and Gjestland 2011; Davies et al. 2013). There is no complete model of the soundscape, as interpretation is affected by the sounds which can be heard, the acoustic space which affects the sounds, and listeners’ interpretations based upon what and how they are attending to the sounds (Davies 2013).

Luigi Russolo, as part of his 1913 Futurist manifesto, encouraged musicians to analyse noise in order to expand their sensibilities (Russolo, Filliou, Pratella and Press 1967). Granö differentiated between the study of “sound” and “noise” in 1929. He mapped auditory phenomena with reference to the “field of hearing” rather than “things that exist”. Granö did not use the term soundscape; instead the concept of proximity was applied, which represented the area immediately surrounding an inhabitant (Granö 1997). The concept was revisited in 1969 when Southworth tried to establish how people perceived the sounds of Boston and how this might affect the way they experienced the city (Southworth 1969). Schafer (Schafer 1977) and Truax (Truax 2001) attempted to formalise the concept using descriptions derived from existing terms such as soundmarks, rather than landmarks. Schafer (Schafer 1993) argued that all soundscapes should be designed or regulated to display what he terms high-fidelity (distinct, easily interpreted sounds), rather than low-fidelity (indistinct, difficult to interpret sounds). Soundscapes and the individual sounds that make up a soundscape have been shown to have a physiological and psychological impact upon listeners (Cain, Jennings and Poxon 2013). Sounds that are considered unpleasant cause a reduction in heart rate, and pleasant sounds lead to an increase in respiratory rates (Hume and Ahtamad 2013).

The work of the sound designer is to create an aesthetic combination of sound events, to produce a soundscape that is informative and/or evokes an emotional response in the listener. For example, in film and other linear media, sound may be used as a sleight-of-hand, making the audience believe that something has happened (Chion 1994). Video game sound designers have adopted many of the techniques associated with film sound (R. Newman 2009), but have added interactivity so that some of the sound events are directly controlled by gamers’ actions, whilst other sounds remain passively experienced within non-interactive sequences (Collins 2008).

Sound designers routinely manipulate the attributes of sound as part of their everyday practice. These include the sound’s pitch, loudness, timbre (or overall quality of the sound), duration, and direction. For example, the length of a sound can be used to convey a character’s emotions, such as a longer doorbell ring suggesting impatience (Kaye and Lebrecht 2000). The length of a silence (or lack of sound) can be useful to convey the passage of time or a change of location (Beaman 2006). Changing a sound’s pitch can make objects seem larger or smaller or alter the age or gender of a character (Beauchamp 2005; Collins 2008). Spatial cues, such as panning, can provide an insight about what a character is attending to (Beck and Grajeda 2008; Kerins 2010).

In interaction design, designers of auditory displays are concerned both with sounds being considered informative as well as creating appropriate acoustical properties (Brewster 2008; Buxton 1989). For example, Gaver’s Sonic Finder used auditory icons such as a scraping sound for objects being dragged across a computer desktop and a scrunching sound for putting a file in the wastebasket. Similar sounds are used on Apple’s operating system to this day (Gaver 1989). Microsoft’s Outlook email client, in contrast, uses abstract earcons, such as a soft tinkling when an email arrives in the user’s in-box.

1.2 Classifying listening experiences

Before designers can present and evaluate their designs for sound events and soundscapes, they need to establish what characteristics of sounds are most important – in short, they require a vocabulary. Each researcher describes sounds from their own perspective: some focus on the spatial characteristics of sounds, others on the dynamics or the aesthetics, and others may include additional qualities such as whether a sound is a background noise. The following brief treatment of some key writers in this field offers a flavour of the resulting incertitude.

Schafer (Schafer 1977), in one of the definitive treatments of sound, was concerned with a sound’s estimated distance and its environmental factors such as reverberation. Gabrielsson and Sjögren (Gabrielsson and Sjögren 1979) identified the feeling of space and nearness associated with sound events, while Amphoux (Amphoux 1997) added orientation and reverberation. Hellström (Hellström 1998) extended this by proposing enclosure, extension, centre, distance and direction, and Mason (Mason 2002) highlighted the width, diffuseness and envelopment.

Attributes concerned with the dynamics of sound have also been highlighted: Schafer (Schafer 1977) focuses on the intensity of a sound, Gabrielsson and Sjögren (Gabrielsson and Sjögren 1979) identified loudness, Amphoux (Amphoux 1997) was concerned with scale, and Hellström (Hellström 1998) specified strong and weak dynamics.

Temporal attributes included duration (Schafer 1977), atemporality (Amphoux 1997), and rhythm (Hellström 1998). Spectral attributes related to both frequency and timbre, with Schafer (Schafer 1977) identifying the brightness or darkness and fullness or thinness of timbre. A full sound has a broader range of spectra, while a thin sound has a much narrower range. Hellström (Hellström 1998) focused on both pitch and timbre, and Mason (Mason 2002) referred to timbral frequency.

Aesthetics were considered by Gabrielsson and Sjögren (1979) and Amphoux (1997). Clarity was specified in terms of “hi-fi” or “lo-fi” environments by Schafer (1977) and as clearness and distinctness by Gabrielsson and Sjögren (1979).

This is by no means exhaustive. However, our own recent work in this area aims to simplify and clarify this diversity.

1.3 What listeners hear

While sound designers can guide listeners by providing clues about what they should be attending to (Kerins 2010; Sonnenschein 2001), there has been relatively little work on directly comparing listener and sound designer experiences. There has, however, been much work distinguishing between musicians’ and less experienced listeners’ experiences in the field of psychoacoustics (Bharucha, Curtis and Paroo 2006; Marie, Kujala and Besson 2012; P. M. Paul 2009). Listening tests have been conducted within product design for the last 50 years or more and involve experienced (trained) listeners (Engelen 1998; Frank, Sontacchi and Höldrich 2010; Soderholm 1998).

Rumsey (Rumsey 1998) tells us that there are high levels of agreement when participants are experts, whereas non-experts’ responses are likely to vary more. Bech (Bech 1992) suggests that increasing the number of participants can improve the level of confidence in the findings. Yang and Kang (Yang and Kang 2005) highlight the differences between measurements and evaluations and how much they can vary, especially when it comes to different types of sound sources and levels of pleasantness. Listener testing is limited to products such as audio reproduction equipment and vacuum cleaners and has not migrated into mainstream media, and only partially into computing (Bech and Zacharov 2006). Tardieu, Susini, Poisson, Kawakami and McAdams (Tardieu et al. 2009) found that laboratory tests of sound signals (earcons) do not fully correspond with tests conducted under real world conditions.

In a previous study, the authors attempted to establish whether listeners have the same listening experience as the person who designed the sound (McGregor and Turner 2012). Surprisingly, there was little evidence as to whether what is designed to be heard is what is actually heard. A repertory grid technique was adopted using listener and designer generated constructs. One designer and 20 listeners rated 25 elements using the same attributes (descriptors) used in this study, within a surround sound recording created by a soundscape generative system. The listeners’ modal response was compared to the designer’s. The results suggested that it is perfectly feasible to compare designers’ and listeners’ experiences and to establish points of agreement and disagreement. The authors demonstrated an ontology of sound based on user experience rather than a designer’s training, with an approach based upon long-term experiences and listeners’ conceptualisation of sound.
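The comparison of a listeners’ modal response with a designer’s rating can be illustrated with a minimal sketch. It is not the procedure used in the earlier study: the function names, the sound events, and the “foreground”/“background” ratings are hypothetical, chosen only to show how points of agreement and disagreement might be tabulated.

```python
from collections import Counter


def modal_response(ratings):
    """Return the most frequent rating; ties go to the rating seen first."""
    return Counter(ratings).most_common(1)[0][0]


def compare_to_designer(designer, listeners):
    """Map each sound event to (designer rating, listener mode, agreement)."""
    comparison = {}
    for event, designer_rating in designer.items():
        mode = modal_response([listener[event] for listener in listeners])
        comparison[event] = (designer_rating, mode, mode == designer_rating)
    return comparison


# Hypothetical data: one designer and three listeners rating two sound events.
designer = {"door slam": "foreground", "traffic": "background"}
listeners = [
    {"door slam": "foreground", "traffic": "background"},
    {"door slam": "foreground", "traffic": "foreground"},
    {"door slam": "background", "traffic": "foreground"},
]

comparison = compare_to_designer(designer, listeners)
for event, (designed, heard, agree) in comparison.items():
    print(f"{event}: designed={designed}, heard={heard}, agree={agree}")
```

In this invented example the listeners’ mode matches the designer for the door slam but not for the traffic, which is the kind of disagreement such a comparison is meant to expose.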

1.4 Visualising soundscapes

We are, of course, not the first to propose visualizing sound and its attributes. The painter Wassily Kandinsky translated atonal music into canvases (Brougher and Zilczer 2005). Another artist, Roy de Maistre, curated the 1919 exhibition Colour in Art, where the musical notes A to F were converted to colours (A = red), paintings were accompanied by music, and colour charts were made available for the audience (Alderton 2011). Gibson (Gibson 2005) displayed different frequencies using colour, as did Matthews, Fong and Mankoff (Matthews et al. 2005). Circles have been regularly used to represent sound and can be found in a variety of visualisation schemes (Azar, Saleh and Al-Alaoui 2007; Frecon, Stahl, Soderberg and Wallberg 2004; Helyer, Woo and Veronesi 2009). Servigne, Kang and Laurini (Servigne et al. 2000) proposed that the varying intensities of noise could be visualized by altering the radius of circles. Gibson also adopted this approach by indicating the volume of a sound in a mix by the object’s size, with louder being larger than quieter. Shape has been used to designate the articulation of musical notes: legato = rounded, staccato = polygon (Friberg 2004).

Abstract shapes have also been applied to visualize phonemes that are not recognised by a phoneme recognition system, with high frequency sounds having spiky irregular forms (Levin and Lieberman 2004). Opacity has been used to a limited extent to communicate the volume or loudness of a sound event (Mathur 2009; Radojevic and Turner 2002; Thalmann and Mazzola 2008). Servigne, Laurini, Kang and Li (Servigne et al. 1999) suggested that graphic semiology would be appropriate for displaying sounds, proposing that smiling faces overlaid onto a map could be used to display a participant’s preferences: a smile represented “nice”, a neutral expression “neutral”, and a frown “not so good”. Bertin’s 1967 theory of cartographic communication was used to create the visualisation. Bertin proposed that the visual variables of shape, size, value, orientation, hue, texture, x and y coordinates could be applied to point, line, and area symbols. Monmonier (Monmonier 1993) argued that Bertin’s variables were also suitable for text, which could act as symbols within visualisations.
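Two of the mappings surveyed above, circle radius for intensity and opacity for loudness, can be sketched in a few lines. The sketch is illustrative only and does not reproduce any of the cited schemes: the SoundEvent fields, the normalised 0 to 1 intensity scale, and the radius and opacity ranges are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SoundEvent:
    name: str
    intensity: float  # assumed scale: 0.0 (quietest) to 1.0 (loudest)
    x: float          # assumed horizontal position on a grid
    y: float          # assumed vertical position on a grid


def to_circle(event, min_radius=2.0, max_radius=20.0):
    """Encode a sound event as a circle: louder means larger and more opaque."""
    level = min(max(event.intensity, 0.0), 1.0)  # clamp to the assumed scale
    radius = min_radius + level * (max_radius - min_radius)
    opacity = round(0.2 + 0.8 * level, 2)  # keep the quietest events faintly visible
    return {"label": event.name, "cx": event.x, "cy": event.y,
            "r": radius, "opacity": opacity}


bell = to_circle(SoundEvent("bell", 0.5, 1.0, 2.0))
print(bell)
```

Using a single intensity value to drive both size and opacity is a simplification; a fuller scheme would assign separate attributes to separate visual variables, in the spirit of Bertin.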

Figure 1 holds the set of symbols we have developed iteratively based on the literature highlighted above to represent the components of a soundscape (McGregor, Crerar, Benyon and Leplatre 2008; McGregor, Leplatre, Turner and Flint 2010). Each attribute of a classified sound event is visualised according to the symbols below, and then placed on a grid according to its perceived spatial location.


Figure 1: Symbols used for visualising sound events within a sound design