Representations of the Diegetic World

It is important to begin with how sonic worlds are formed across all audiovisual content. There is a Platonic notion in storytelling that is often cited in examinations of audiovisual sound theory: the diegesis, or the diegetic world. As Mary Ann Doane has noted, this is the space of the film world; in terms of sound, it is any sound that emanates from the story space in which events occur (Doane 1985). Claudia Gorbman reaches back to 1950s French scholars Gerard Genette and Etienne Souriau and comes to define diegesis as “the narratively implied spatiotemporal world of the actions and characters” (Gorbman 1987). All four of these writers use the term “space” (Doane) or “world” (the latter three) to define the diegesis. This marks a modification in the term over time from its Greek etymology.[2] Nevertheless, when one uses the term “diegetic sound” today, the intention is to designate sound that inhabits the created world and is physically present within the space of events. Anything outside the interior space of the film world is called nondiegetic. The most common example of nondiegetic sound is a musical score, which the characters do not hear and which is not physically present in the space of events. It is therefore not grounded in filmic reality.

The foundation of recorded audiovisual media (film, video) is photographic realism, which establishes the audience’s view of this diegetic framework. Sound is generally regarded as a means of reinforcing this photographic realism. “[S]ound is used to make the image ‘credible’ within a very narrow definition of ‘realism’” (Wayne 1997: 176). Animated imagery, which is not recorded photographically, is different in that it has no connection to any pre-existing, “real” location. However, the audio methodology has transferred from film to animation using the exact same codes of recorded photography. Sounds for animation tend to come from recordings of real-world objects. This works well for film, where there is a match between what is photographically seen and what is heard. But in animation, applying such anachronistic sounds to imaginary visuals produces what might be called nondiegetic realism, a seeming contradiction in language that reflects the contradiction of applying sounds of modernity and nature to narratives of fantasy. Because animation has no filmed reality to reinforce, this approach takes us out of the animated world by adhering to a realism that arises from outside the animated world rather than from inside it.

There are three reasons for this tendency, going back to the histories of film and of animation:

  1. Historical: Film sound from its beginning played two roles: a) to attract attention to the cinematic apparatus and b) to legitimize the image (Lastra 2000). Sound as a craft was also regarded as technical rather than artistic. Animation sound relied heavily upon music for both musical and sound effects moments (Curtis 1992).
  2. Technological: Location sound (or “production sound”) has always been a process of isolation: removing the location’s noise and ambience to preserve the primary signal, namely the voice.
  3. Methodological: While production sound is restrictive (see No. 2), post-production sound is primarily additive, in the sense that sonic elements are re-built in the studio. Foley (character movement sounds), ADR (dialogue replacement), and sound effects recording and design are created in layers to produce the expected audiovisual verisimilitude and rational continuity.

Let us consider the importance of all three sound tendencies by using a simple example: footsteps. If we see someone walking in a (non-animated) film, we expect to hear that sound. Hearing these footsteps satisfies tendency No. 1: to legitimize the image. However, during film production, microphones are aimed at an actor who is speaking dialogue in order to capture the voice cleanly. This is a reflection of tendency No. 2. As a result of this restriction, the footsteps themselves are not cleanly recorded on location, so Foley performers inside a post-production recording studio will walk in sync to the visual image to produce the desired sound, thereby fulfilling the expectations of tendency No. 3.

This film approach is mirrored in the animated form, except for the fact that animation has no “location sound.” Because there are no actors in existing settings, animation sound is produced from the imagination and applied to invented visuals. However, despite being an entirely different mode of production, animation takes its cues from film in its construction of filmic expectation. Imagine for example a scene similar to the above in an animated film in which a character is walking. Animation students and young animators tend to focus on this element and want to immediately produce sound for this action. This is because it is a synchronous moment that is clearly, visibly evident. There is a strong need to resolve this objective sense of rational closure. It does not matter so much what the sound is—for example, any kind of non-varying “tick tick tick” sound will suffice. The drive to fill the silence with something, anything, is strong. It is a matter of fulfilling the perceptual expectation. Professional-level animation projects will resolve the expectation through the film-based post-production methodology described in No. 3 above.