Zen in the art of sound engineering

Carlo A. Nardi

Università di Trento

If we consider the wide diffusion of visually interfaced music-production workstations, it seems hard to believe that most sound engineers still accomplish their tasks mainly by means of a thorough refinement of hearing. Starting from a conception in which the senses and their interaction are culturally constructed, hearing itself is less a residue of a stereotyped oral culture, implying immediacy and involvement, than a real aural perspective: techniques based on hearing allow a detachment between the hearing subject and the object heard, ensuring judgments based on a method that resembles that of scientific experiment. However, unlike the natural sciences, sound engineering is acquired largely through practice, owing to the difficulty of expressing its competences in words. Far from being a reason for discredit, in a world in which the dominant knowledge is often mediated by the written word, the body techniques thus learnt permit irreducible ways of approaching the world and the body itself. Moreover, in order to work properly, hearing needs to feel at ease, with the whole body alert and attentive, thanks to being placed in a protected environment. These considerations suggest a comparison between Zen and sound engineering, as both are learnt through practice and involve a particular setting of the sensory environment, enhancing full-bodied ways of understanding which are commonly neglected in the West.

Techniques of listening articulated listening to reason and rationality […]; listening became a discrete skill (Jonathan Sterne, The Audible Past)

Just as one uses a burning candle to light others with, so the teacher transfers the spirit of the right art from heart to heart, that it may be illumined (Eugen Herrigel, Zen in der Kunst des Bogenschiessens)

“The essential nature of the rock experience does not consist of decoding the music as a structure of meaning but rather in being able to place one’s own significance on the sensuous experience which it provides. Thus music is performed according to an aesthetic of sensuousness” (Peter Wicke, Rockmusik: zur Ästhetik und Soziologie eines Massenmediums)


If we consider the centrality in recording studios of music-production tools with a graphic interface, it seems hard to believe that most sound engineers still accomplish their tasks mainly by means of a thorough refinement of hearing. In this paper I intend to present a kind of practice in the recording studio which falsifies certain widespread claims about both technology and sensing. More precisely, I will show how studio equipment, rather than merely substituting for human functions, engages the user in a mediated activity which requires a special training of the senses. Moreover, I will counter reductionist definitions of technique, which in turn are often based on determinist, procedural and sight-centred interpretations of technology.
It is especially important to prevent verbal language from imposing its own structuring schemes on musical practices. More generally, an exclusive focus on the written word and mathematical language, with regard to the definition, storing, codification and expression of knowledge, would lead to a hypertrophy of abstractive competence. In this respect, many authors – among others, David Howes and Constance Classen – advocate the acknowledgement of the “multiple ways in which culture mediates sensation (and sensation mediates culture)” (Howes 2005, p. ix), in order to “recover a full-bodied understanding of culture and experience” (ibidem, p. 1). In fact no kind of knowledge, even the most abstract, can be learnt apart from the senses.

Techniques of the body

That said, extending Marcel Mauss’s pioneering conceptualization (Mauss 1936), a technique of the body can be defined as a more or less codified complex of rules and procedures, which is accepted by a community and which is transmitted or transmissible through training; a technique is aimed at performing a determined and recurrent intellectual and/or physical activity and can be described as a combination of the following aspects:

1)    the goal, the aim which it makes it possible to reach, its function;
2)    the process of transformation of the materials employed;
3)    the instruments adopted to obtain this transformation;
4)    the corporeal movements required in order to use the instruments, to operate on the material, to emit the proper signs (whether aural, visual, tactile, kinaesthetic, etc.);
5)    the material or symbolic configuration of the product.

    A technique, within a cultural framework, is learnt and employed in order to respond to a determined environmental or internal situation and according to a goal. For instance, the capacity to recognize and use the signs that characterize a style within a certain music community, or the ability to obtain a ‘well-balanced’ mix, can be interpreted in this sense. This knowledge is partly conscious and partly not, partly codified and partly not, partly verbalized and partly not. Indeed, it is not necessary that all the aspects of a technique be recognized or recalled every time, or even at all; on the contrary, such awareness could go as far as to inhibit the performance. Finally, I don’t want to suggest a static definition of technique: rather, the definition needs to allow a certain degree of flexibility, since the factors involved – tastes, equipment, market, etc. – change in relation to one another.

    Needless to say, the functioning of the human body in general, and of the sensorium in particular, is not determined once and for all. Nevertheless, “the history of the senses has been, essentially, the history of their objectification” (Mazzio 2003, p. 159). The very definition of sound has been objectified. At any rate, hearing should be conceived less as a perceptive faculty measurable in decibels per frequency than as a sensory ability acquired within a culturally informed context. Declaring that sound is just a wave with a determined frequency within the audible range is a simplification which doesn’t take into account the meaning assigned to sound and hearing themselves in their socio-cultural elaboration. Jonathan Sterne is very clear on this point: “Sound is a very particular perception of vibrations. You can take the sound out of the human, but you can take the human out of the sound only through an exercise in imagination. Sounds are defined as that class of vibrations perceived – and, in a more exact sense, sympathetically produced – by the functioning ear when they travel through a medium that can convey changes in pressure (such as air). […] The hearing of the sound is what makes it. My point is that human beings reside at the centre of any meaningful definition of sound” (Sterne 2003, p. 11).

Early recording technology: displaying sound

    Nevertheless, sound is not just a matter of hearing, as the history of recording technology demonstrates. More specifically, the Western ranking of the senses, which places sight at the top with hearing trailing behind, tends to reproduce dominant knowledge in visual terms and through visual means. The Enlightenment and the development of modern science place a special emphasis on seeing: the textual organization of knowledge, the link between rationalism and seeing – I believe it when I see it –, the growing relevance of pieces of technology like the microscope and the telescope, and the construction of the definition of the human body through its dissection are just some examples of the superior status of sight in the West. One may recall the Latin dictum verba volant scripta manent, whose sense has been reversed: in the past it used to mean that the written word is sterile and dead as a stone, while the spoken word can spread its wings and fly; nowadays the spoken word – that is, the oral/aural – is regarded as ephemeral, while the written word – the written/visual – is considered stable and substantial (see Borges 1978). This is in tune with the founding role we assign to books, charts, photographs, designer labels, fingerprints, etc. in the definition of reality.
    Not surprisingly, some of the first machines for capturing sound were aimed precisely at obtaining a visual record of it. It is well known that in 1857 Léon Scott, a French amateur scientist, invented an instrument to make a visual tracing of sound vibrations. This tool was intended to replace stenography with another form of writing – something analogous to contemporary speech-recognition programs. He called this instrument the phonautograph, a name whose etymology is also revealing. At any rate, the aim of visualizing sound predates Scott’s work. The visual representation of sound in wave forms (a subject which I will consider in depth later) finds an ancestor in Samuel Morland’s Tuba Stentoro-Phonica, dating back to the third quarter of the seventeenth century. In a book entitled Tuba-Stentoro-Phonica: An Instrument of Excellent Use, as Well at Sea, as at Land; Invented and Variously Experimented in the Year 1670 and Humbly Presented to the King’s Most Excellent Majesty Charles II in the Year 1671, Morland describes his theory of the conduct of sound, which he compares to the “circular Undulations of a vessel of water” produced through the percussion of its surface. The main difference is that sound spreads in the air, so it moves in spherical circles. He represented this idea graphically; that is, he was able to conceive of sound as functioning in waves, spreading in rays from the source, and being echoed as it encountered opposition.
    Another example of a machine which turned sound into image was Rudolph König’s manometric flame, a tympanic device consisting of a speaking trumpet with a tube that led to the so-called manometric capsule. In the latter, a speaking tube and diaphragm are fastened to a gas pipe; vibrations in the diaphragm vibrate the gas and create different patterns. Revolving mirrors can be used to enhance the effect. As with the phonautograph, the purpose here is to make speech visible (Bruce 1973, p. 111). The weakness of this machine is that it doesn’t provide a durable record of the visual account.
    As for the phonograph itself, its connotation as an instrument for writing was, if not blatant, at least ambiguous, as the very name of the device suggests. In fact, even when the phonograph and similar devices started to be commonly considered as devices to store and play back recorded sound, the emphasis on writing (and reading) continued to exercise a strong influence. It is significant that the idea of recording as a kind of photography proved so persistent. Edison himself, writes Michael Chanan, “spoke of ‘phonographing a sound’, on linguistic model of ‘photographing a scene’, and for decades people thought of sound recording as a kind of sound photography” (Chanan 1995, p. 137). Edward Wheeler Scripture (nomen omen), a well-known speech scientist who wrote several books and articles on different aspects of speech pathology, was concerned with replacing arbitrary writing with a more reliable sound trace that would capture the voice in its actuality and richness. Theodor W. Adorno too, some decades later, would express his belief that people could eventually be trained to read the grooves in a record in a way analogous to that in which musicians read a score.
    Still, the authoritative text remained the written one, as shown for instance by the use of the phonograph in ethnological studies: “Most early scholarly recordings were made largely for the purpose of later transcription. […] This was because it was the transcriptions that were considered the primary analytical basis for work in folklore or anthropology, not the recordings themselves” (Sterne 2003, p. 325). Hence ethnomusicologists regarded field recording as a temporary step towards obtaining the real ethnographic trace to be studied, the written score. It should be noted, however, that there was a justified concern about the vulnerability of audio media.

    The theoretical background of these projects, inventions and theories goes hand in hand with the foundation and consolidation of such disciplines as acoustics and otology, according to which “sound became a waveform whose source was essentially irrelevant; hearing became a mechanical function that could be isolated and abstracted from the other senses and the human body itself” (Sterne 2003, p. 23).
    Ernst Chladni, considered the founder of modern acoustics, was among the first to experiment with visual traces of sound. The sand figures on glass that bear his name provided visual evidence of sonic events, so that they could be analysed through the common means of science, or through similar ones purposely conceived within acoustics. In short, his work “embodies this connection between objectification, visualization, and the reversal of the general and the specific in theories of sound” (Sterne 2003, pp. 43-44). The sovereignty of the scientific paradigm is still so strong that very often, as mentioned above, sound is identified with its acoustic representation tout court. The way music software is conceived – I apologize for the sudden temporal gap – seems to confirm this point.

    The organization of the sensorium in the West has been taken, to a certain extent, as revealing an allegedly universal character of each individual sensory faculty. In particular, it is believed that seeing is analytical and reflective and needs distancing, while hearing is synthetic and immediate, merging the subject with the environment. However, the traits of this dichotomy are representative of Western ideological beliefs about the sensorium; this means that what Jonathan Sterne calls the “audiovisual litany” symbolizes the mainstream discourse on our sensory abilities, but doesn’t explain their practices (Sterne 2003).

Figure 1

Practice of theory

Now I will illustrate how sound engineering reveals the biases implied by this dichotomization, drawing on field research I conducted in various recording studios in Italy and Germany. In particular, the juxtaposition of Zen and sound engineering was suggested during a visit to Calyx Mastering studio, where Zen is habitually practised. At first I thought it was simply a curious detail; then I realized that the association was anything but a coincidence.
Immersed in the quietness of Viktoria Park in Kreuzberg, Berlin, Calyx is quite popular in and outside Germany, having developed masters for Tosca, Jazzanova, Nuspirit Helsinki and others. The studio is run by Bo Kondren, known in the German electronic music community as ‘das Ohr’, the Ear. Notwithstanding this fame, he and his assistant Henner Gerdes were kind enough to book three hours of their service for a very reasonable fee, and above all to explain to me meticulously all the phases that lead from a mix to a master.
In order to develop what I call, inverting Bourdieu’s idea, a ‘practice of theory’, the mastering process at Calyx concerned the mix of a song of mine. My aim was to take part in the process, not just as a researcher but also with an active role in music-making; furthermore, I could rely on a high familiarity with the musical material thus manipulated. Only in this way could I expect to gain an internal understanding of a process which cannot be grasped by mere verbal reflection or by classic visual observation.

Hearing perspectives

Starting from a conception in which the senses and their interaction are culturally constructed, hearing itself is less a residue of a stereotyped oral culture, implying immediacy and involvement, than a real aural perspective: techniques based on hearing allow instead a detachment between the hearing subject and the object heard, ensuring judgments based on a method that resembles that of the natural sciences. Talking about aural perspectives means placing the hearing person in a certain, calculated position of both symbolic and physical distance from the sounding objects, and within the resounding environment. Paraphrasing the Longman Dictionary’s definition of the term ‘perspective’, from this detached position the subject is able to “judge in a sensible way”, “compare situations”, supply the perception of sound with “distance and depth”, and construct her own knowledge according to the way hearing is structured. The aim is to reach an objective point of ‘view’ through the application of determined techniques of hearing, in a way that, in a mediated setting, can be traced back to the use of such devices as the telegraph, the stethoscope and even the phonograph, whose use entailed the development of a capacity to discriminate between a signal and background noise.
One premise of this approach, sustained by the objectification provided by recording, is to take sound apart into its meaningful constituents and treat them individually, in a sort of musical chemistry: sonic items are parcelled, disassembled, analysed, measured and subsequently reassembled during multi-tracking, sampling, looping, and so on. In general, it is about holding some parameters constant while changing others, in order to check the consequences of the intervention. Here we recognize the same procedure conducted in a scientific laboratory, with experimental and control variables. If the elaborate metering of the graphic interface portrays an instrument of precise and scientific measurement, techniques like that of dynamic equalizing provide what I call aural evidence – that is, another instance of the scientific method applied to music production.

Then we should consider those techniques which guarantee a necessary detachment from both the characteristics of the sound system (speakers, amplifier, sound card, etc.) and those of the environment (wall reflections, etc.). In fact the latter is a medium in itself: the sound engineer needs to employ specific expedients in order to filter the environmental ‘intrusion’ out of sound, or to control it effectively, by means of the application of techniques of hearing and a proper setting of the environment. At any rate, this result can be obtained only through a process of abstraction, as it is hard to imagine sound out of place.

Summing up, it seems that sound engineers place more reliance on their hearing than on their sight. Of course, this is an educated faculty of hearing. On the other hand, technology is employed mostly in order to isolate different sonic parameters and treat them separately (e.g. intervening on certain frequencies, filtering, etc.).


Now I want to point to some aspects of Zen which are relevant for the purposes of this dissertation. Zen is a Buddhist current, founded by Bodhidharma in China in A.D. 527; in 1191 Ei-Sai introduced it to Japan, where Zen took on a distinct and specific character. Even though the aim of Zen is to overcome the contrast between the particular and the whole, between the self and the other, this aim can be accomplished only by means of an attentive and responsive interplay between the senses and the world – a world carefully ‘prepared’, as shown by traditional exercises like archery, tea ceremony, ink painting, theatre, flower arrangement and swordsmanship. In other words, going beyond the deceptiveness of the world doesn’t mean staying away from the senses; rather, it is important that they are put in an appropriate condition:

“When followers of Zen fail to go beyond a world of their senses and thoughts, all their doings and movements are of no significance. But when the senses and thoughts are annihilated, all the passages to the Mind are blocked and no entrance then becomes possible. The original Mind is to be recognized along with the working of the senses and thoughts, only it does not belong to them, nor is it independent of them” (from Huang-Po's Sermon, Denshin Hoyo, Treatise On The Essentials. The Transmission Of Mind, in Suzuki 1935, p. 62, Engl. ed.).

In Zen the proper state of the body is achieved through a ritual, with a meticulous repetition of actions and a mimetic relationship between the sensing subject and the world. The ritual is a way both to reach and to express a kind of being which verbal knowledge could hardly convey. As a matter of fact, Zen can be acquired only through direct experience. This also means that it can’t be taught through, expressed in, or reduced to verbal or written language. Dorinne Kondo, writing about the tea ceremony, observes: “The essence of tea and of Zen is said to elude logical, discursive analysis. Zen favours experience and intuition over intellection, and although the tea ceremony has given rise to a long tradition of scholarly exegesis, the Zen arts continue to emphasize the primacy of transcendence through alogical, non-verbal means” (Kondo 1983, p. 287). Substance itself is revealed intuitively, which is not the same as by sheer chance: techniques are fundamental, but they must be embodied first, through a long and rigorous apprenticeship. Hence the practice of Zen gives life to a different and valuable form of knowledge; moreover, there is no need to conceptualize this understanding, let alone before experiencing it (see Herrigel 1948).

…and sound engineering

Back in the recording studio, we can see how the brief sketch of Zen traced here helps to make sense of at least three features of sound engineering: education through practice, incorporation of automatic techniques and careful setting of the sensory environment.
In fact, unlike the natural sciences, sound engineering is acquired largely through apprenticeship in a studio, partly owing to the difficulty of expressing its competences in words. Far from being a reason for discredit, in a world in which the dominant knowledge is often mediated by the written word, the body techniques thus learnt permit irreducible ways of approaching the world and the body itself.
Certain choices in sound engineering have to be made immediately, without too much thinking. Still, the detachment from consciousness is valuable only after a thorough training. Once a competence has been introjected, the body can respond automatically to a certain situation. As a consequence, the ability to judge can be reduced by an excess of consciousness.
Moreover, in order to work properly, hearing needs to feel at ease, and the whole body needs to be alert and attentive, provided that it is placed in a carefully prepared environment. Sound engineering requires getting rid of such misleading factors as the mediation of environment and equipment, in order to let hearing achieve its full potential. This also includes setting the right conditions for the body: the instability of hearing is in fact one of the main problems when attempting to achieve constancy in judging acoustic events. For instance, listening repeatedly to the same piece reveals the instability of hearing, and eventually leads to a sort of satiation of the latter.

Another aspect of the detachment implied by assuming a determined hearing technique occurs when a music-maker puts herself in someone else’s shoes, that is, when she hears what and how somebody else will hear a piece. Here the term “technique” means that a subject detaches herself from her normal sensory activity, aiming at matching someone else’s modality of making sense of the ‘noisy ball’ (Drobnick 2004). For instance, Antoine Hennion has stated on several occasions that the main competence of a record production team is that of being able to represent the audience’s way of hearing, in which connection he speaks of a real “dictatorship of the public” (Hennion 1983, p. 203). Furthermore, we can note that one of the concerns of sound engineers, in particular when working with renowned performers, is comprehending the latter’s modality of hearing and being able to reproduce a realistic copy of it on the recorded track. Though this disentanglement from the self is engaged temporarily and often consciously, its way of functioning can be better described as an empathic and immediate way of sensing.


Since I am a chronic Westerner, I need to find an explanation for the practice of Zen in a recording studio. Assuming that the experience of the audience will basically be synthetic more than analytic, the point is that a musical piece, after having been all but dissected, must be reconstructed into a meaningful, living whole. Consequently, in order to reach an effective communication with the listener, each fragment, which has formerly been treated separately, must be considered in relation to the final product. In fact, the aim of sound engineering is to provide a well-balanced, good-sounding product – where the adverb ‘well’ and the adjective ‘good’ are historical and contextual. Therefore, too much analyticity would lead the sound engineer away from her task.
In order to reconcile the dichotomy between analytic music-making and synthetic listening, the music-maker will put into practice determined techniques, whose effects are predictable, and in parallel simulate ways of listening similar to those of the audience. In other words, the analytical approach is not enough, and calls for a complementary disposition that will encourage immediacy over reflection, participation over detachment, synthesis over analyticity. Besides, both aspects of this approach are to be properly considered ‘techniques of the body’.
The metaphor of Zen can thus be interpreted within this cultural framework: being learned through practice and involving a particular setting of the sensory environment, it enhances full-bodied ways of being which are commonly neglected in the West, establishing a dialectic relationship with methods derived from the natural sciences.


