An Ecological Approach to Dynamic and Static Camera Framing Techniques

Abstract Media practitioners and scholars have long pointed out that a camera’s angle, movement, and distance are a fundamental part of video communication, as they alter the message in meaningful ways that in turn affect the viewer’s experience. Drawing on J. J. Gibson’s ecological approach to perception, this paper argues that camera effects are not arbitrary, but are rooted in how video interplays with our perceptual and motor systems, rendering it a universally accessible mode of communication. Through a comprehensive review of empirical and experimental studies on camerawork, the ecological paradigm is proposed as a coherent framework capable of explaining seemingly unrelated camera effects, with great potential to drive future research in audiovisual communication.


Introduction
Whether played from a film strip, a DVD, or a streaming service, Coppola's The Godfather remains The Godfather. What sets one movie apart from another, a YouTube ad, or a TV news clip is not the physical object it is stored on, but the unique way in which it structures light when screened on a two-dimensional surface. Such light patterns are meaningful to the observer because of their striking resemblance to the way light would be (and, in fact, typically was) bounced off a real three-dimensional layout (Anderson, 1996; Gibson, 1979; Hochberg & Brooks, 1978). Therefore, understanding the communicative power of video requires a perceptual approach that takes into account how our perceptual system responds to its optical structure.
Starting with H. Munsterberg's psychology of the motion picture (1916), a variety of theoretical approaches within film and communication studies have proposed that video communication capitalizes on preexisting motor, perceptual, and cognitive functions (Anderson, 1996; Bordwell, 2010; Dudai, 2008). In film studies, the so-called "cognitive turn," spearheaded by D. Bordwell and colleagues, marked a shift away from the field's focus on the arbitrary or conventional aspects of culture (Bordwell, 1989; Bordwell & Carroll, 1996). Instead, it introduced a new research paradigm in which cinematic experiences are studied in terms of how they draw on non-filmic capacities (Bordwell, 2010). Camera-related techniques stand out as an interesting object of study within this approach because of their intuitive connection to our unmediated natural experience of seeing. In this regard, media and visual communication scholars have long argued that the communicative power of video derives, at least in part, from the camera acting as a surrogate for the viewer. Drawing on J. Meyrowitz's empirical work on media paraproxemics (1986), P. Messaris argues: "by controlling the viewer's positioning vis-à-vis the characters, objects, or events in an image, including the image sequences of film or television, the image's producer can elicit responses that have been conditioned by the viewer's experience of equivalent interrelationships with real-life people, things, and actions" (Messaris, 1998, p. 73). Among the various takes on this notion, J. D. Anderson's ecological approach to film stands out as the most relevant theoretical effort for the present work. In his seminal book The Reality of Illusion, Anderson (1996) combined Gibson's ecological psychology with neuroscientific and cognitive research of the time to explain several cinematic phenomena. Although he did not specifically address camera framing, he argued that continuity editing techniques such as over-the-shoulder shots and the 30-degree rule work the way they do by virtue of the viewer adopting the camera's point of view.
Building upon the research initiated by J. D. Anderson (1996), I examine camera framing techniques through the lens of J. J. Gibson's (1979) ecological theory of perception. Media practitioners and media scholars have long pointed out that the angle, movement, and distance of a camera can alter the message in meaningful ways that in turn have an impact on the experience of the viewer. In other words, camerawork is a fundamental part of the semiotic system of video communication, one of the basic units that make up its "language". A growing body of experimental work has started to examine camera-related effects, offering an evidence-based scientific understanding of what the industry has traditionally considered rules of thumb. Adopting an ecological perspective, I argue that camerawork effects are not arbitrary, but are rooted in how video interplays with our perceptual and motor systems, making video a universally accessible mode of communication. I present the ecological paradigm here as a cohesive framework capable of explaining seemingly unrelated camera effects, one with great potential to drive future research in audiovisual communication.

Tenets of the ecological paradigm of relevance to video communication
Gibson was skeptical of an understanding of vision as based on information-poor sensory stimulation, a view prevalent in mainstream psychology. Instead, he built his paradigm around the way in which light is lawfully, and therefore meaningfully, structured by the three-dimensional world. Radiant light rays directly emitted from a source, like the light from a lightbulb, lack structure and are poor in information. However, when light bounces back and forth across all the surfaces in a space until reaching a state of equilibrium, it becomes ambient light. Unlike radiant light, ambient light is information-rich because it is lawfully structured by the physical layout. When ambient light converges into a point in space, it creates an optic array, a concept developed by Gibson to describe a 360-degree sphere in which visible (i.e., not occluded) surfaces appear as visual solid angles. An optic array is determined by a point of observation that can be occupied by an eye or a camera, but importantly, optic arrays are part of the environment and not a property of vision.
When movement occurs, the optic array turns into what Gibson called optic flow, a structural change or transformation in the array that can be local or global depending on the nature of the motion. Local optic flow occurs in a section of the optic array when something in the environment moves (e.g., a leaf, an animal), while global optic flow is a transformation of the entire optic array caused by motion of the point of observation through space, typically carried out by the observer. Gibson notes that far from being the exception, global flow is the basic form of visual perception in all sighted organisms: "Observation implies movement, that is, locomotion with reference to the rigid environment, because all observers are animals and all animals are mobile" (Gibson, 1979, p. 56). Gibson emphasized that a key role of optic flow is to facilitate depth perception, addressing a problem that has been a point of contention in vision research for centuries. The problem of depth perception stems from the fact that the retina is flat, so it inherently provides a two-dimensional input, raising the question of whether the brain must engage in reconstructive processes to enhance the information-poor visual input from the eyes in order to perceive depth. This issue is particularly relevant to video because video displays are also two-dimensional, yet we are able to perceive video scenes as having depth. As the point of observation moves through space, the real distance (i.e., depth) between the observer and the world is lawfully translated into optical velocity, with things closer to the observer transforming in optics faster than things farther away. An example is the experience of looking out of a car window: the side of the road will fly by, while the faraway mountains will slowly move through our visual field, even though the world around the car is in fact equally static. Thus, as pointed out by Gibson, optic flow avoids the need for the brain to reconstruct depth because it fully specifies depth information through motion, and motion is the natural state of perception.
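The lawful translation of depth into optical velocity described above can be sketched with a minimal geometric toy model (illustrative numbers only, not drawn from any cited study): for an observer translating laterally at speed v, a point directly abeam at distance d sweeps the visual field at an angular rate of v/d, which is why the roadside flies by while the mountains barely move.

```python
def angular_velocity(depth_m, lateral_speed_mps):
    """Angular rate (rad/s) at which a point directly abeam of a
    laterally translating observer sweeps the visual field.
    For distance d and observer speed v, d(theta)/dt = v / d, so
    optical velocity is inversely proportional to depth."""
    return lateral_speed_mps / depth_m

# Looking out of a car window while moving at 20 m/s:
roadside = angular_velocity(2, 20)       # post 2 m away: 10 rad/s
mountain = angular_velocity(2000, 20)    # ridge 2 km away: 0.01 rad/s
# The 1000:1 depth ratio is fully specified by the 1000:1 ratio of
# optical velocities -- no reconstructive inference required.
```

The point of the sketch is Gibson's: the depth relation is carried lawfully in the flow itself, so an observer (or a viewer of camera-yielded flow) need not "compute" depth from an impoverished image.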
Photography is the capture by a camera of a section of the optic array at a moment in time, resulting in an arrested array that is synthetic in nature (Anderson, 1996; Gibson, 1979). Video is like photography, but it can capture optic flow, that is, local and global transformations of the array over time. Therefore, photographic media offer viewers a remarkable approximation to what they would have seen had they occupied that point in space and time. The resemblance is so faithful that photography and video do not require learning or interpretation: their synthetic arrays are as directly accessible to our visual system as natural arrays are. Natural perception entails seeing an optic array that had our eyes at its center, and perception of photographic media entails seeing an optic array displaced in time and space that had the camera at its center. Far from being controversial, this understanding of natural and mediated perception as analogous was held by Gibson himself, who wrote extensively about pictorial perception (Gibson, 1979), as well as by scholars working within the ecological paradigm (Anderson, 1996; Blau, 2019), and by mainstream psychologists who were critical of some of the ecological tenets (e.g., Hochberg & Brooks, 1978).
Besides its focus on the structure of light, the ecological paradigm was among the first psychological theories to explain the way perception and action are intimately coupled. Mainstream psychology held a more passive view of perception, in which afferent signals from the senses were understood as independent of the active function of the motor system's efferent signals. Instead, Gibson insisted that perception and action cannot be understood without one another, because the primary purpose of perception is to direct the organism's action. He highlighted the role of proprioception, which he described as "self-sensitivity", referring to all those processes in which an organism perceives itself in order to guide its active interaction with the environment (Gibson, 1982). The vestibular and kinesthetic senses are a clear example of proprioception, as they track the position and movement of the body itself in order to direct motor action. Perhaps less obvious is the fact that vision is also proprioceptive. Eyes do not passively receive light from the environment; they scan and move in search of information, and they are mounted on a head-body system that also actively moves in order to support perception. Seeing things necessarily involves seeing oneself. We see the world from a place, the place where we stand: I know that I am here and not there. When I move, my movements will lawfully determine the structure of the global flow I will see. Our visual system is so elegantly attuned to proprioceptive information that when we walk and see a full transformation of the array in the form of global flow, instead of mistaking it for everything moving around us, we see ourselves moving. That is proprioception.
In exploring the link between perception and action, Gibson proposed the existence of affordances. I adopt here the definition proposed by Stoffregen et al. (2003), namely affordance as an action disposition that emerges from the relationship between the animal and its environment. An affordance is simultaneously specified by the characteristics of the environment and of the animal, so affordances are fully specified in the optic array and are not mental constructs. Solid, horizontal surfaces afford translation to animals that locomote but not to those that only swim, while vertical, hard surfaces might afford walking to some animals (e.g., a spider) but not others (e.g., a human). Furthermore, which affordances are more salient in a given moment depends on the context and the goal of the perceiver. To humans, water can afford both drinking and swimming. If I am opening the tap, I will perceive drinkability, but if I am jumping into a pool, swimmability will be foregrounded.
Proprioception and affordances are key aspects of how technical manipulations of camerawork interplay with the perceptual system of the viewer. When we see a video, we easily perceive not only surfaces, animals, and objects, but also what those afford, as well as proprioceptive information about us, the observers. Young children can distinguish the ground from the sky, or a dog from a cow, in a video. With the same ease, they can identify where they are within the screen world, i.e., which things are closer or farther away, and whether they are moving closer or farther away, to the right or left of the physical space defined by the video. All this information is determined by where the camera stands and by how it moves. Similarly, affordances are visually specified in the synthetic optic array of a video just as they are in the natural array, to the extent that observers can perceive affordances not only for themselves but also on behalf of a third person, even when they are presented in video displays (Stoffregen et al., 1999).
In the following sections I examine camera-related techniques in light of the ecological principles introduced. Based on their optical properties, camerawork can be divided into dynamic techniques, which determine changes in global optic flow, and static techniques, which involve manipulations of the arrested optic array.

Dynamic formal aspects of the camera
Camera-yielded optic flow differs in some ways from the first-hand global flow we experience when we move around in our everyday lives. As mentioned, global optic flow is information-rich in at least two ways: it specifies the motion of the observer, enabling proprioception, and it specifies the makeup of the spatial layout, enabling depth perception. However, camera-yielded global flow can differ from natural vision because it does not necessarily provide any additional depth information through motion, and because it might specify non-human observers.
Some dynamic techniques of camerawork, such as the pan, the tilt, and the zoom, yield global flow without providing additional depth information about the filmic space. A pan or tilt is a horizontal or vertical rotation of the camera around its own axis, typically on a tripod. Pans and tilts create optical change by continuously scanning contiguous sections of a single optic array, that is, without moving the point of observation through space. Similarly, the global flow caused by pulling on a zoom lens is an expansion or contraction of a section of a single optic array. While causing a global transformation of the array, these techniques do not enhance depth perception because they do not entail motion of the point of observation. All points in the image move at the same speed, regardless of their real-life distance from the camera. Things in the image get bigger or smaller, but we do not perceive ourselves as moving closer to or farther away from them. In practical terms, this means that pans, tilts, and zooms emphasize the flatness of the image, so they are not good candidates if the filmmaker's goal is to create a lifelike visual experience for the viewer. This could explain why the use of the zoom, so commonplace in 1970s and 1980s Hollywood, has declined significantly in contemporary cinema.
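The optical contrast between zooming and dollying can be made concrete with a toy pinhole-projection sketch (illustrative coordinates only, not drawn from the cited literature): doubling the focal length magnifies every image point by the same factor regardless of its real depth, whereas moving the camera forward displaces the projections of near points far more than far ones.

```python
def project(point, cam_z=0.0, focal=1.0):
    """Pinhole projection of a 3-D point (x, y, z) onto the image
    plane of a camera at (0, 0, cam_z) looking down the +z axis."""
    x, y, z = point
    return (focal * x / (z - cam_z), focal * y / (z - cam_z))

near, far = (1.0, 0.0, 2.0), (1.0, 0.0, 20.0)

# Zoom: doubling the focal length magnifies both points by exactly
# 2x, whatever their depth -- a flat, uniform transformation.
zoom_near = project(near, focal=2.0)[0] / project(near)[0]   # 2.0
zoom_far = project(far, focal=2.0)[0] / project(far)[0]      # 2.0

# Dolly: advancing the camera 1 m magnifies the near point's
# projection by 2x but the far point's by only ~1.05x -- the
# differential flow (motion parallax) that specifies depth.
dolly_near = project(near, cam_z=1.0)[0] / project(near)[0]  # 2.0
dolly_far = project(far, cam_z=1.0)[0] / project(far)[0]     # ~1.05
```

In the zoom case the near/far magnification ratio is 1 (no depth information); in the dolly case it is roughly 1.9, which is exactly the kind of differential transformation the text describes as enhancing depth perception.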
By contrast, all camera movements that entail real movement through space yield global optic flow that facilitates depth perception despite the two-dimensionality of the image. At least two studies have shown that camera movement enhances spatial perception compared to seeing static shots of the same scene edited together (Garsoffky et al., 2007; Kipper, 1986). Besides depth perception, dynamic camerawork techniques also vary in how they affect proprioception. Experimental research by Dayan et al. (2018) suggests that there is a tight link at the neurological level between emotion and the visual perception of self-motion on video. After quantifying the amount of global and local optic flow in film clips that were either emotional or neutral, Dayan et al. (2018) used fMRI scanning to map the viewers' neurological responses. They found that brain activation in areas dedicated to the perception of motion increased for emotional clips compared to neutral clips, and, more interestingly, that clips containing global flow elicited greater activation in emotional processing areas than clips containing local flow only (Dayan et al., 2018). Although future research is needed to establish a causal relationship, this study suggests that the perception of self-motion is emotionally evocative. A plausible explanation is that by facilitating depth perception and proprioception, the events in the movie become more consequential to the observer.
In terms of proprioception, dynamic camerawork techniques can be distinguished according to whether they specify anthropomorphic or non-anthropomorphic observers, something noted by Gibson: "The modes of camera movement that are analogous to the natural movements of the head-body system are, in this theory, a first-order guide to the composing of a film. The moving camera, not just the movement in the picture, is the reason for the empathy that grips us in the cinema" (Gibson, 1979, p. 298).
Non-anthropomorphic camera movement is a wide category, but the types most commonly used in filmmaking are mechanical techniques such as the tripod-based ones discussed above (e.g., pans, tilts) and others involving more complicated equipment such as cranes and dollies. Anthropomorphic camera flow can be achieved with handheld cameras, Steadicams, gimbals, and shoulder rigs, and because of the different technologies involved, they all specify slightly different types of "humanness". The key to anthropomorphic camera movement is that the dynamics of the operator's movements structure the camera-yielded global flow, so that the viewer can identify walking, running, or breathing events from the global flow. Psychological research using video displays has found that viewers report a greater experience of vection (i.e., the experience of self-motion) when human-like jitter is added to global optic flow compared to smooth, non-human-like global flow (Bubka & Bonato, 2010; Palmisano et al., 2011). This is remarkable considering that the addition of visual motion should in principle increase the conflict between the visual sense, which specifies motion, and the kinesthetic sense, which specifies non-motion for the sitting spectator (Palmisano et al., 2011). A plausible explanation is that, because anthropomorphic camera movement is more analogous to our direct experience of the world, it is more effective in engaging proprioception. This has been supported by experimental research. In two separate experiments, Heimann et al. (2014, 2019) found that the human-like motion of a Steadicam elicits greater motor activation at the neurological level compared to either a static perspective or the mechanical global flow specified by a dolly or tracking shot moving forward.
As a first effort in this direction, I ran a two-part experimental study designed to disentangle the roles of proprioception and depth perception in camera global flow (Cores-Sarría, 2022). To do this, I presented viewers with clips covering three conditions: a static perspective, human-like camera movement specifying breathing events, and fake human-like camera movement (i.e., jitter motion added in postproduction to a static perspective). I found that real human-like movement elicited greater physiological arousal, an index of emotional activation, than either static tripod shots or fake human-like camera movement. Furthermore, the real breathing camera performed as well as the no-motion condition on a series of engagement measures, while the fake one, lacking depth-yielding global flow, performed the worst (Cores-Sarría, 2022).

Static formal aspects of the camera
While the dynamic aspects of camerawork are unique to video, static formal aspects can be found in all forms of pictorial communication, a broader category that encompasses video and photography as well as other visual media such as painting, drawing, and computer-generated graphics. Like dynamic aspects, static aspects determine depth-related and proprioceptive information in the image.
Since Leonardo da Vinci's treatise on painting, there has been an interest in identifying the sources of information that facilitate depth perception in images. As pointed out by Cutting (2007), the basic depth sources are occlusion, height in the visual field, relative size and density, aerial perspective, accommodation, and stereopsis. These have traditionally been called depth cues, but the term is shunned by ecological perceptionists because it implicitly assumes a need for reconstruction by the brain from an information-poor stimulus. While Cutting (2007) provides a more detailed description of each of these, it is worth noting that all the sources of depth information listed are available in monocular vision, and thus in pictorial communication, with the exception of stereopsis, which requires a double point of observation. Furthermore, some, like occlusion and visual height, provide only ordinal information regarding proximity but do not quantify distance. Others, such as relative density (what Gibson called the texture gradient) and relative size, provide ratio information, so that precise distance can be known. For instance, if we see that the same thing is twice as big in one image as in another, it is twice as close to the observer.
In photography and film specifically, the aforementioned sources of depth information are determined by what I refer to here as the static aspects of camerawork. A distinction can be drawn between static aspects determined by the camera lens, namely focal length and depth of field, and static aspects determined by the physical position of the camera with regard to the scene, namely angle, level, height, and distance (Bordwell et al., 2016). Focal length can be short (less than 35 mm, a wide lens), medium (a normal lens), or long (more than 100 mm, a telephoto lens) (Bordwell et al., 2016), while depth of field, more commonly referred to as focus, can result in either shallow or deep focus. Things in the world project at different sizes and lengths in the image depending on the focal length, and these distortions affect the perception of the image's spatial layout. Wide lenses exaggerate perceived distance, while telephoto lenses tend to minimize the distance between objects and flatten the image. Accommodation in photographic media is regulated by the lens's focus, and it determines depth of field. In shallow focus, only things at a specific distance appear sharp, while things closer to and farther away from that point appear fuzzy or out of focus. In deep focus, most or all things in the image look sharp regardless of their distance from the camera. Thus, shallow focus magnifies depth differences more than deep focus, while things in deep focus tend to appear closer to each other and flatter. To the best of my knowledge, no experimental study has looked at the effects of focal length and depth of field, but it would be easy and enlightening to design experiments in which these aspects are manipulated in order to test how they affect the viewer's response to the things in the image.
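Both the ratio character of relative size and the flattening effect of long lenses can be sketched with a toy pinhole model (hypothetical numbers, offered only as an illustration of the kind of manipulation such experiments could formalize): projected size is inversely proportional to distance, and holding a subject's image size constant across focal lengths shows how a long lens enlarges the background relative to the subject.

```python
def image_size(real_size_m, distance_m, focal_m):
    """Pinhole model: projected size = focal length * real size / distance."""
    return focal_m * real_size_m / distance_m

# Relative size is ratio information: the same 1.8 m person, twice
# as far from the camera, projects at exactly half the size.
assert image_size(1.8, 5.0, 0.05) == 2 * image_size(1.8, 10.0, 0.05)

# "Telephoto flattening": frame the person at the same image size with
# a 24 mm and a 200 mm lens by stepping back, and a 10 m tall building
# 50 m behind them looms far larger through the long lens.
TARGET = 0.012  # desired image size of the person on the sensor, in m
for focal in (0.024, 0.200):
    subject_distance = focal * 1.8 / TARGET  # back up to hold the framing
    bg = image_size(10.0, subject_distance + 50.0, focal)
    print(f"{focal * 1000:.0f} mm lens: background/person = {bg / TARGET:.2f}")
# Prints a ratio of ~0.37 for the wide lens vs. ~2.08 for the telephoto:
# the long lens compresses apparent depth between subject and background.
```

The design choice here is deliberately minimal: a single projection equation reproduces both the ordinal/ratio distinction discussed above and the familiar lens "distortions", since both follow from the same geometry.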
The positional dimensions of angle, distance, height, and level provide depth information by determining what is occluded, what appears closer and farther away, the height of objects in the visual field, and so on. Camera position also determines the proprioceptive information available in the image by establishing where we, as observers, stand in relation to the scene. For angle, height, and level, the ground is taken as the reference point. Angle and level describe the vertical and horizontal angles relative to the ground, respectively, while height describes distance from the ground regardless of angle. Conversely, camera distance, often referred to as shot type, takes the human body as its point of reference. For example, a medium shot frames the body from the middle of the chest up, a medium long shot from the knees up, and a long shot typically includes the feet (Bordwell et al., 2016).
Empirical research on camera angle has found that low angles (i.e., the camera points upward) make the subject appear generally more powerful or in a more positive light, while high angles (i.e., the camera points downward) have the opposite effect, and straight-on angles are neutral (Giessner et al., 2011; Hoffman et al., 2023; Kraft, 1987; Mandell & Shaw, 1973; Meyers-Levy & Peracchio, 1992; Sevenants & d'Ydewalle, 2006). Empirical research on camera distance suggests that closer framing leads to more polarized evaluations of subjects, for instance by making televised uncivil behavior in politicians more apparent and provoking greater emotional arousal in the viewer (Mutz, 2007), or by increasing emotions of negative valence in film scenes (Benini et al., 2022). In a series of studies including experimental methods and content analyses, Bálint and colleagues have shown that closer framing tends to facilitate mental attribution, theory-of-mind engagement, and general empathy toward the character framed from up close, compared to longer framing (Bálint et al., 2020; Bálint & Rooney, 2019; Lankhuizen et al., 2022). Some media and communication scholars have suggested that camera distance functions as a measure of interpersonal distance within the screen, what Meyrowitz (1986) and Messaris (1998) call paraproxemics. Research on interpersonal distance in real-life scenarios has found that skin conductance, a physiological indicator of emotional arousal, increases when a confederate increases proximity to the subject (Candini et al., 2021).
Taken together, the camera angle and distance literature supports the notion that specific perspectives can polarize the response of the viewer, often in terms of emotion. For instance, my colleagues and I analyzed both camera distance and angle in a content analysis of a large dataset of pictures previously evaluated for emotional valence and arousal. We found that pictures with highly positive or negative scenes (e.g., sexual or threatening) resulted in the least intense emotional response when framed from a longer shot or a straight angle (Cores-Sarría et al., 2021).
Taking an ecological perspective, I suggest that camera position is meaningful because it regulates interpersonal distance, as other scholars have suggested, as well as distance from other objects and surfaces in the scene. In other words, positional aspects of camerawork are meaningful because they determine proprioception and the available affordances. Positioning the observer closer to an emotional stimulus should result in a stronger emotional response if proximity affects the physical affordances available. Importantly, the primary function of emotion is to guide the organism toward things in the environment that require action (Bradley, 2000). For example, seeing the movie's monster from a low angle and in a medium shot or close-up should not elicit the same response in a viewer as seeing it in a long shot and from a high angle. The first perspective positions the viewer below the source of danger and at arm's length, a position that would easily afford aggression. Similarly, seeing a sharp knife or a sexy naked person from up close foregrounds avoidance or approach affordances in a way that a longer distance would not. This may explain why fight and love scenes are rarely filmed in long shots, even though a wider and farther shot would show the full motion of choreographed body movements better than the tight frame of a close-up.

Camera framing as a firsthand visual experience
The arguments presented here are based on the assumption that camera framing places the viewer at a point within the spatial layout specified by the video. In other words, the power of camera framing stems from its creating a firsthand, egocentric visual experience. However, this argument raises some questions. Viewers bring their own real-world, egocentric perspective, so watching a video involves simultaneously perceiving two different points of observation: the one here, in my living room or the cinema, and the one there, within the film's space. Furthermore, the cinematic point of observation violates the coupling between perception and action in that we cannot act on and modify the events specified in the video world. This has led some perceptionists to propose that cinematic perception should be understood as an allocentric perceptual experience, in which we perceive the world from someone else's point of view (Blau, 2019). However, the optical structure captured and reproduced by a camera unmistakably specifies a point of observation, and our visual system evolved in a pre-pictorial context where we invariably occupied the point of observation specified by the optic array. Because images reproduce an optic array taken from a specific point of view, our visual experience of looking at an image must be egocentric.
To address the apparent conflict arising from a double point of observation, I propose that video viewing involves a delicate back and forth between the viewer's perspective and that of the image. To immerse oneself in a cinematic experience requires adopting the camera's point of view at the expense of ignoring our immediate surroundings. This can be facilitated by formal techniques like the ones discussed in this paper, namely using camera framing to enhance three-dimensionality, proprioception, and the foregrounding of affordances that are relevant to the narrative. It can also be potentiated by blocking visual information from the viewer's own point of view (e.g., a dark cinema room or enclosed virtual reality goggles). Just as some techniques are immersive, other techniques can have the opposite effect, replacing the awareness of being in the cinematic world with an awareness of being in the real world. For instance, some esthetic experiments make the camera actively adopt the point of view of a specific character within the scene. When the camera specifies an observer who is clearly identified as someone other than ourselves, the experience becomes allocentric rather than egocentric. By pulling the perceptual focus back to the viewer's own point of view rather than that of the camera, the resulting experience tends to be non-immersive and even uncomfortable.
Drawing from an ecological perspective, I have proposed different ways in which dynamic and static aspects of camera framing affect depth perception, proprioception, and the perception of affordances in the context of video viewing. This information may be useful to media practitioners seeking to achieve specific effects, as well as to scholars seeking to understand the mechanisms underlying such effects. Depth perception is key to video as a medium of communication because images are flat surfaces and, as such, do not have depth in themselves, so there is great value in identifying techniques that facilitate information about the three-dimensional layout of the scene. As pointed out by Gibson, there is no perception without proprioception, and this holds true in pictorial communication as well. Through proprioception, the filmmaker can highlight specific affordances, creating a more immersive experience in which the perceptual focus is drawn to the point of observation within the video. I suggest that, because the ultimate purpose of emotion is to inform and drive behavior, camera framing might influence the emotional response of the viewer by modulating which affordances are available and foregrounded in the cinematic scene.
The communicative power of camera framing stems from the way it enables firsthand perception within the world visually specified by the video message. In Gibson's own words: "There are metaphors to describe the powerful experience aroused by the picture that locates the observer in a virtual environment: one is taken out of oneself, one is transported; one is set down in a far place. […] What is induced in these pictures is not an illusion of reality but an awareness of being in the world. This is no illusion. It is a legitimate goal of depiction, if not the only one" (Gibson, 1979, p. 284).

Disclosure statement
No potential conflict of interest was reported by the author.