The Lacanian Gaze in Spatial Audio: How Immersive Soundscapes Redefine Non-Diegetic Intervention
The Auditory Gaze: When Sound Originates in the Viewer
Lacanian film theory has long understood the visual register as a space where the subject discovers their position within the symbolic order. The gaze, that unmasterable encounter with an object that seems to look back, ruptures the viewer's imaginary identification with the cinematic image. Yet psychoanalytic film theory has been overwhelmingly visual. The gaze, as Lacan articulated it, belongs emphatically to the scopic field. In this theoretical apparatus, sound has remained secondary: a support to the visual rather than a constitutive dimension of spectatorship.
Spatial audio in 2026 demands a reconsideration of this hierarchy. When sound is no longer bound to speakers at the front of the auditorium but instead surrounds the listener, arriving from every direction in three-dimensional space, the auditory field becomes more than accompaniment to the visual image. It becomes a space in which the subject is itself positioned and gazed upon. The listener discovers that they are not merely hearing the soundtrack; they are heard within it, their body becoming the origin point of the entire auditory field.
This constitutes what we might call the auditory gaze—a Lacanian concept applied to the acoustic register, where the subject encounters themselves as the object of an unseen, all-encompassing auditory presence. The traditional distinction between diegetic sound (sound emanating from within the story-world) and non-diegetic sound (commentary, score, voice-over) collapses when the entire sound field is spatially distributed around the listener's body.
The Collapse of Diegetic Boundaries in Immersive Space
Classical film sound theory depended on the distinction between diegetic and non-diegetic registers. Diegetic sound originates within the story-world: a door slamming, dialogue spoken by characters, music from a radio or an orchestra within the narrative space. Non-diegetic sound exists outside the diegesis: the orchestral score, the narrator's voice, sound cues designed for emotional impact rather than narrative motivation. This distinction organized how viewers understood the ontological status of sound: what belonged to the story-world and what represented an authorial intrusion into the viewer's consciousness.
But spatial audio destabilizes this binary. When a soundtrack is distributed throughout a three-dimensional acoustic space, with certain sounds positioned above the listener's head, others to the side, others emanating from what feels like an impossible distance, the distinction between diegetic source and non-diegetic commentary becomes untenable. Is the orchestral score, arriving from above and behind the listener's head, non-diegetic? Or has the entire three-dimensional space become diegetic, the listener positioned within a story-world that surrounds rather than displays itself?
Prestige cinema has begun to exploit this ambiguity. In such films, spatial audio design no longer simply supports the visual narrative; it constitutes the diegetic space. A horror film mixed for spatial audio can place unseen presences in the acoustic periphery, making the listener's body the threatened center of a hostile acoustic space. A music film can dissolve the boundary between score and diegetic music-making, leaving the listener unsure whether they are hearing a performance or an emotional commentary on performance.
The Subject as Acoustic Origin Point
In classical cinema, the viewer occupies a specific spatial position: seated before a screen, with sound emanating from speakers in prescribed locations (front center, left, right, possibly rear). The acoustic space is organized around the cinema itself, the viewing apparatus. Spatial audio changes this arrangement. Object-based formats like Dolby Atmos place individual sounds as objects in three-dimensional space, including overhead, and when rendered binaurally over headphones that space is organized around the listener's head. Sounds can arrive from above, from behind, from angles no frontal speaker array could produce. The listener becomes, in effect, the center point from which the entire acoustic field is organized.
This reorganization has profound psychological and phenomenological consequences. The viewer is no longer a distant observer of a sound-space positioned in front of them. They are inside the acoustic space, their body the implicit reference point for every sound's spatial positioning. This creates what we might call acoustic embodiment—the listener's recognition that their own body is the origin point of the auditory field.
Drawing on Lacan, we can theorize this as a species of the gaze. In Lacanian psychoanalysis, the gaze marks the moment when the subject recognizes that they are seen, when they discover their own objecthood within the visual field. Lacan's best-known illustration, from Seminar XI, is an anecdote from his youth: out with fishermen off the coast of Brittany, he was shown a sardine can glinting on the waves and told that although he could see the can, it could not see him. Lacan's point is that the can was nonetheless looking at him, from the point of light, disrupting his imaginary identification with a unified perspective. The gaze is this moment of rupture, where the subject discovers themselves as an object.
In spatial audio, an analogous rupture occurs in the acoustic register. The listener, positioned at the center of a three-dimensional sound field, discovers themselves heard. Sound arrives from behind their head, from positions they cannot see, from impossibly distant points. This auditory presence, surrounding and seemingly looking through the listener, positions them as an object within an acoustic space they cannot fully master or comprehend. The gaze, traditionally visual, becomes auditory.
Non-Diegetic Intervention Reconceived: The Authority of Omnidirectional Sound
Classical film music theory understood the orchestral score as a non-diegetic intervention—a voice of the filmmaker addressing the viewer, guiding emotional response, commenting on the narrative. The score occupied a position of authority; it spoke to the viewer from a position of quasi-omniscience, addressing them from a space outside the diegesis.
But when the score becomes spatially distributed, when orchestral sound arrives from above, behind, from impossible angles, the nature of this authority transforms. The score is no longer speaking to the viewer from an external vantage point. It is surrounding the viewer, positioning them within an acoustic space that is simultaneously diegetic and non-diegetic, simultaneously narrative commentary and immersive environment.
This creates what we might call immersive non-diegesis—a mode where commentary, emotion, and authorial intention are inscribed directly into the three-dimensional acoustic space surrounding the listener. The viewer is not being addressed by the score; they are being positioned within it. They discover themselves as an object within an acoustic field designed to evoke specific emotional and narrative responses.
Consider a prestige drama where a character experiences profound emotional rupture. The orchestral score, distributed in spatial audio, can literalize this rupture by surrounding the listener with sound, by positioning them within an acoustic space that mirrors the character's psychological disintegration. The score is simultaneously non-diegetic (it does not emanate from within the story-world) and radically diegetic (it constitutes the very space in which the listener is positioned). The listener experiences the character's emotional state not through distant observation but through acoustic embodiment.
Temporal Distortion and Acoustic Phenomenology
Spatial audio introduces new possibilities for temporal manipulation in the auditory register. Classical cinema depended on montage and editing in the visual register to compress or dilate time. Spatial audio, by contrast, can create temporal distortions that operate independently of visual editing.
A sound positioned far above and behind the listener's head creates an acoustic perspective that suggests distance and, by extension, temporal remove. Sounds layered at different spatial positions can create an effect of temporal simultaneity, as if multiple moments in time were occurring at once, each occupying a distinct acoustic position. The listener experiences temporal complexity not through the structure of editing but through the architecture of three-dimensional sound space.
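The acoustic cue for distance can be made concrete with the simplest one a renderer might apply: level falling as the source recedes. The Python below is a minimal sketch assuming a point source in a free field and a hypothetical 1 m reference distance; actual spatial renderers combine this with reverberation, filtering and their own rolloff curves, so it illustrates the principle rather than the behavior of any particular format.

```python
import math

def distance_gain_db(distance_m, ref_distance_m=1.0):
    """Level change, in dB, of a point source at distance_m relative to ref_distance_m.

    Assumes the free-field inverse-distance law: sound pressure falls as 1/r,
    so the level drops by 20*log10(2), about 6.02 dB, each time the distance doubles.
    Reverberation, air absorption and any renderer-specific rolloff are ignored.
    """
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(ref_distance_m / distance_m)

# Hypothetical placements, from close to the listener's head to far away.
for d in (1.0, 2.0, 8.0, 20.0):
    print(f"{d:>5.1f} m: {distance_gain_db(d):6.2f} dB")
```

Under these assumptions a source placed 8 m away plays roughly 18 dB quieter than the same source at 1 m. A mix aiming to suggest temporal as well as spatial remove would likely layer further cues, such as a higher proportion of reverberant sound, on top of this attenuation.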
This opens new possibilities for narrative temporality. A filmmaker can represent memory, anticipation, or psychological fragmentation not through flashback and flash-forward but through spatial audio design. The listener discovers themselves in multiple temporal moments simultaneously, each positioned at a different point in the acoustic space surrounding their body.
The Parallax of Sound: Perspective in the Acoustic Dimension
In visual cinema, parallax (the apparent shift in an object's position as the vantage point changes) is used to create depth and spatial dimension. As the camera moves laterally, foreground objects shift position more rapidly than background objects, creating an intuitive sense of three-dimensional space. In film theory and practice, however, parallax has traditionally been treated as a visual phenomenon.
Spatial audio introduces acoustic parallax. In formats that track the listener's head, sounds stay anchored in the virtual space rather than moving with the head. Turning the head alone rotates every source by the same angle; it is movement through space, in renderers that also track the listener's position, that makes nearer sounds shift against more distant ones, just as foreground objects slide against the background in a tracking shot. The result is a profound sense of sounds occupying fixed positions in three-dimensional space rather than emanating from flat speaker planes.
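The difference between turning the head and moving through space can be sketched numerically. The Python below is a toy two-dimensional model, not a description of any particular renderer: the listener faces straight ahead at the origin, and the source positions, the 30 degree head turn and the 0.5 m sideways step are hypothetical values chosen for illustration. It measures the angular gap between a near and a far source that both start directly in front of the listener.

```python
import math

def azimuth_deg(listener_xy, source_xy, yaw_deg=0.0):
    """Horizontal direction of a source as heard by the listener, in degrees.

    The listener faces +y when yaw_deg is 0; positive azimuths lie to the right (+x).
    Turning the head right by yaw_deg moves every source left by the same angle.
    The result is wrapped to the range [-180, 180).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    world = math.degrees(math.atan2(dx, dy))  # angle measured from +y toward +x
    return (world - yaw_deg + 180.0) % 360.0 - 180.0

# Hypothetical scene: one source 1 m ahead, one 20 m ahead, both straight in front.
NEAR = (0.0, 1.0)
FAR = (0.0, 20.0)
ORIGIN = (0.0, 0.0)

def separation(listener_xy, yaw_deg=0.0):
    """Angular gap between the near and the far source, as heard from listener_xy."""
    return azimuth_deg(listener_xy, NEAR, yaw_deg) - azimuth_deg(listener_xy, FAR, yaw_deg)

# Head rotation only: both sources swing by the same 30 degrees, so the gap stays at zero.
print(round(separation(ORIGIN, yaw_deg=30.0), 2))
# Sideways step of 0.5 m: the near source moves much further than the far one (gap of roughly 25 degrees).
print(round(separation((0.5, 0.0)), 2))
```

In this toy model, rotation preserves the relative layout of the scene exactly, while a half-metre step swings the near source by about 27 degrees and the far source by under 2 degrees. That difference is the acoustic counterpart of foreground and background sliding past each other in a moving shot.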
This acoustic parallax has significant consequences. The listener develops an intuitive sense of acoustic space comparable to the visual sense of depth: they can localize sounds, anticipate how the acoustic field will shift as they move, and grasp the spatial architecture of the sound environment. The acoustic dimension is thereby transformed from accompaniment into a fully realized spatial dimension, rivaling the visual in spatial complexity and cognitive engagement.
Ontological Implications: The Immersive Continuum
The traditional distinction between diegetic and non-diegetic sound depended on a clear boundary: sounds that belonged to the story-world versus sounds that addressed the viewer from outside the story-world. But spatial audio collapses this boundary. Every sound, whether originating within the narrative space or from authorial intention, becomes part of a unified immersive acoustic environment. The listener is positioned within what we might call the immersive continuum—an acoustic space that is simultaneously story-world and commentary, simultaneously diegetic and non-diegetic.
This has profound implications for how we understand the relationship between viewer and narrative. In classical cinema, the viewer occupied a position of relative stability. They watched the story unfold before them, heard the score commenting from a position of external authority. But in spatial audio, the viewer is destabilized, positioned within an acoustic space that surrounds and potentially threatens their sense of centered perspective.
The auditory gaze, that moment when the listener discovers themselves heard, positioned within an acoustic space that seems to look through them, represents a fundamental disruption of the secure viewing position. The listener is no longer a distant observer but an immersed participant, their body the implicit reference point for every auditory event. The distinction between diegetic sound and commentary dissolves into a unified acoustic field, and the listener discovers themselves as the point upon which this field converges.
In this sense, spatial audio represents a genuine ontological shift in how cinema can position its viewer-listener. No longer addressed from without, the listener is surrounded, encompassed, positioned as the object through which the entire auditory field is organized. The gaze, in this context, is decidedly auditory.