In a bittersweet combination of personal achievement and global emergency, I moved to Norway and started working on a new research project right when the COVID-19 epidemic started sweeping across the world. When I was settling in my new office and sketching out the next three years of academic careering, friends and relatives were quarantined in Wuhan or bracing for lockdown in other parts of the world. And as I’m writing this post, I’ve moved back to my university residence right before the WHO declared the virus a pandemic and my department building was locked for a few weeks of state-enforced social distancing. Regardless, the project I’m excited to be part of is about machine vision, and I’ll be looking into how this technology is deployed and utilized in Chinese everyday life, which seems to be slowly getting back to a degree of normalcy. In the meantime, China has been using machine vision to look into itself and try to track down the virus as it spread through its citizenry.
Following the outbreak through social media and news headlines, I noticed the diverse roles that machine vision has been playing in this situation, and followed how this technology is embroiled in variegated discussions ranging from the widespread distrust of authoritarian surveillance and the enrolling of homegrown tech industries into state propaganda to enthusiastic paeans to datafication and digital governance. More than a centralized attempt by the government to enforce a degree of legibility onto its citizens – what James Scott would call “seeing like a state” – China’s response to the COVID-19 epidemic seems to be benefiting from a new kind of sensing that is enabled by digital technologies deployed as an integral part of a state of exception. In Benjamin Bratton’s words, this is an important phenomenon because
[t]he optical positions of a state—how it sees the world and its constituents and how its citizens see themselves reflected through the ambient qualitative commons—might bear all the benefits and bankruptcies of earlier forms of communicative reason. (2015, p. 121)
But how did China’s “optical position” vis-à-vis the COVID-19 outbreak consolidate?
In the early days of the contagion, once the (by now disputed) origin of COVID-19 was pinpointed to the Huanan Seafood Wholesale Market in Wuhan, the virus was mostly tracked by concerned citizens via vernacular forms of coveillance: people would take photos of license plates and share them on social media, warning each other about the suspect presence of Wuhan residents in their town or district. In the absence of official measures or recommendations, all that people had at their disposal were mobile cameras, social networks, and the imperfect geolocational information provided by car plates. This strategy was quickly adopted by local authorities, which started tracking the movement of drivers across provinces; partial reports would circulate as copy-pasted news or screenshots, stoking a climate of ambient fear about Wuhanese residents. For example, one WeChat comment alleged that data from the Shenzhen Public Security System and the Center for Disease Control identified 60,000 people entering Shenzhen from Wuhan, worrying the local government and prompting a warning for the local population to stay put and prepare to spend the Spring Festival at home.
Tech companies quickly reoriented their platforms towards this sort of cross-regional surveillance, drawing on intersecting streams of data: as another WeChat message warns, Alibaba had supposedly tracked 6,000 Wuhan residents conducting Alipay transactions in Shanghai over a single day, hinting at a massive exodus from the city and an impending danger for the coastal metropolis. For many, this repurposing of smart city systems and platform urbanism was a foreseeable confirmation of the state’s surveillance capabilities. As Liza Lin documents in the Wall Street Journal, tracking the movement of infected or at-risk individuals was a readily available capacity of China’s surveillance apparatus, which extends from roads to subways and from smartphones to social media, combining visual identification with geolocation and other sorts of data trails.
Another intersection between COVID-19 and machine vision is more mundane, and revolves around people’s faces. As the virus spread, citizens resorted to using face masks (for a situated history of this object, see this), which were also made obligatory in the most affected provinces; yet, people quickly realized that wearing masks affected the convenience and usability of face recognition systems including smartphone locks, community gates and payment terminals. Many noted the uncanny resonance with the proposed ban on face masks during the 2019 Hong Kong protests, which was supposedly motivated by the need to identify protesters; even though pandemic and protest are two radically different states of exception, they spotlight the face as a contested nexus of biopolitical affordances. Regardless, technological fixes often trump convenience in exceptional times: mask-vending machines that use face recognition to identify buyers and ration masks started popping up, and eventually a tech company in Chengdu developed a 3D face recognition system capable of bypassing face masks while also checking a person’s body temperature.
Besides car plates and masked faces, local tech companies have competed to play a central role in facilitating life under quarantine, which conveniently also helped spotlight their commitment to public service by strengthening governmental responses. Alongside donations, emergency services and information platforms, artificial intelligence has predictably emerged as one of the central contributions of companies like Tencent, Baidu and Alibaba, which have offered their cloud computing platforms and predictive algorithms to researchers for efforts in tracking infection vectors and simulating drug compounds. A broader definition of AI has cemented the impression that this technology is powering China’s response to the epidemic, as exemplified by the many reports marveling at automated health information chatbots, autonomous vehicles used for disinfection and delivery, and a hospital ward “run entirely by robots” in Wuhan.
One specific example of how the use of machine vision technologies is both overblown and contested is offered by drones. As the COVID-19 virus spread beyond Wuhan, some residents of small towns and rural areas uploaded drone footage depicting people engaging in activities outdoors being remotely reprimanded via on-board loudspeakers and prompted to wear face masks and head back home. Given the funny lines dubbed over the footage, it is likely that these videos were mainly produced by minor online celebrities as humorous content to be shared on social media or as public service announcements for local officials; and yet, outside of China they were framed as a dystopian example of authoritarian control in a surveillance state, while local state media appropriated them as examples of national innovation. In practice, drones have been used for a broad range of tasks, including holding informational QR codes over traffic, spraying disinfectant and checking temperatures at a distance. While their actual usefulness in these contexts remains questionable, as David Li notes this sort of experimentation is only natural given the size of China’s drone industry and the ready availability of heavy-duty drones for agricultural use across the Chinese countryside.
QR codes themselves are perhaps the simplest form in which machine vision is utilized across China today, and throughout the outbreak they have been deployed as gateways for a wide variety of purposes – from linking citizens to quarantine declaration forms and cellphone roaming histories to directing digital payments and donations. The most relevant innovation here has been the system launched by Alibaba in Hangzhou that assigns a color-coded QR code to each citizen, based on a self-declared travel history and health status connected to real-name identification, allowing them to clear checkpoints in public spaces. The green, yellow and red QR codes have subsequently been launched in Shanghai as well, and their integration into Alipay’s ‘City Services’ function showcases the efforts of platforms to become infrastructures. Similarly, ride-hailing apps have also introduced a system requiring users to scan QR codes when they use public transportation or taxis, in order to collect data and share it with health authorities to help track contacts between passengers.
It’s important to note that machine vision also powers screening and diagnostic technologies. Checkpoints set up at airports, stations and building entrances often deploy thermal cameras, and these have reportedly been installed in buses as well, warning drivers about the presence of people with a fever onboard. A short video shared by the Global Times shows thermal cameras embedded in “high-tech smart helmets” worn by police agents in Chengdu, while a robot uses them to check the temperature of passersby in Guangzhou. If temperature is not an entirely helpful metric given COVID-19’s long incubation period, medical imaging procedures are central to its diagnosis, and platform companies like Alibaba and Ping An both claim to have developed automated systems capable of detecting coronavirus in CT scans with a higher accuracy than human evaluation in dramatically shorter times. More suspect claims – such as the development of a gait recognition system that could help track the virus – also abound, but it seems undeniable that machine learning applied to medical imaging can improve the processing of both citizen flows and medical data.
To answer my initial question: China developed its optical position in confronting the COVID-19 crisis by marshaling preexisting digital infrastructure and the surveillance apparatuses overlaid on top of it, and re-orienting large-scale technical systems (and the private companies behind them) towards the provision of an imperfect hodgepodge of technologies of vision: traffic and thermal cameras, drones and other automated vehicles, smartphones and QR codes, combined and nested in search of useful optics for the evolving state of exception. To be sure, this happened later than it should have, after cover-up attempts and the silencing of concerned medical professionals, with a delay that is costing more and more lives both in China and across the world. And to be even more sure, these technologies have often been tested and showcased in less-impacted cities and regions, while Hubei and its provincial capital Wuhan were struggling with a much more dire lack of hospital beds and supplies.
But in general, the widespread use of machine vision in responding to the COVID-19 outbreak – with both the innovation of tried and tested diagnostic technologies and the experimental reconfiguration of large-scale surveillance systems and vernacular usages – raises important questions about machine vision and its often assumed dehumanizing or illiberal potential. Rogier Creemers has sharply identified an ontological shift in China’s digital governance from the model of the panopticon to that of the ‘panspectron’, which aims at making Chinese society “legible and predictable” (2017, p. 89) by intersecting constant streams of ambient data instead of relying on direct surveillance. The question to ask is how much these sorts of optics can contribute to the kinds of seeing – at the inhuman sensory scales of viral contagion and global flows – that a state of exception requires to make itself unnecessary; in the case of a pandemic threatening to lock down the entire planet, this should by now be a pressing concern.
Bratton, B. H. (2015). The Stack: On software and sovereignty. MIT Press.
Creemers, R. (2017). Cyber China: Upgrading propaganda, public opinion work and social management for the twenty-first century. Journal of Contemporary China, 26(103), 85–100.
Scott, J. C. (1998). Seeing like a state: How certain schemes to improve the human condition have failed. Yale University Press.
At the end of last year, I had the privilege to be part of the best academic conference ever – full spoilers, it’s called Tuning Speculation, it’s broadly about sound, and its next iteration is happening in Bloomington, Indiana, in November 2018. Back then, I presented a bundle of largely incoherent gripes I have had with field recordings (as genre, practice, and aesthetic) ever since I picked up a dictaphone and started messing around with tapes in 2006. My talk was titled Postnaturalism, antinaturalism, unnaturalism: Pushing phonography beyond its field, and at some point video footage of it will be available online; for its largely improvised script, I drew on the work of some of my favorite sound artists to develop a few thoughts that I had already started outlining in an earlier podcast episode, and this blog post is an overdue attempt at taking stock of my own arguments, since these are still far from being material for a disciplined submission to the peer-review machinery. Feel free to disagree, for I might be wasting a couple thousand words taking down silent strawmen.
My main contention is that the practice of phonography and the genre aesthetics of field recordings have been (and, most troublingly, still are) consistently framed by a naturalist epistemology which, in many cases, reinforces an entrenched naturalist ontology of sound. Mine is a critique, rather than an outright rejection, of sonic naturalism – a comfortable mode of aurality that is too often unthinkingly defaulted to, and that in turn easily solidifies into a transparent ideology. After decades of representational critiques of naturalism in textual and visual media, the aural seems to have largely wiggled away from these accusations, and has hence become the privileged retreat for the naturalist onto-episteme. The reason behind this, I would argue, is that the affective experience of sound, combined with the unfamiliarity (and the resulting mythicization) of listening as a practice, makes the auditory sensorium easy to construe as the domain of an unmediated vibrational truth. I am still working on a convincing articulation of these arguments, and I am still reading up on writing by the many others who have already argued similar points; yet, a few weeks ago, I stumbled across an exhibition that forced me to think once again about my favorite representation of sonic naturalism: the lonely high-end microphone, wrapped in a fluffy windscreen, and propped up on its tripod in a field.
The catalyst precipitating this post was a couple of extended visits to KINO-EAR: Audio Document / Audio Documentary, a delightful exhibition curated by sound recordist and composer Yannick Dauby, open to the public until early July 2018 at TheCube Project Space in Taipei. Co-organized by the Taiwan Film Institute and the Taiwan International Documentary Festival, KINO-EAR features a selection of audio works from Taiwan, Japan, Hong Kong, France, the UK, the US and Australia, most of which are based on recorded sound. As the curatorial notes highlight, experiencing the KINO-EAR exhibition is a markedly aural affair: “a profusion of sound-woven stories, sceneries and events, and auditory descriptions […] unfold from the listening space and the display of diverse media such as documents and photographs”. While sound is tasked with weaving scenes and narratives, the reliance on a vocabulary of description and documentation to frame it should already raise a flag regarding the main claims (of representational accuracy, immediacy and truth) that are all too effortlessly attached to sound. As I make my way up to the exhibition space, the title of Susi Law’s recording of Hong Kong hawkers installed in the stairwell seems to offer reasonable advice: “Come and listen even though you are not going to buy it!”
Right at the entrance of KINO-EAR, accessible exhibition paratexts offer a quite coherent discourse about the practice of recording, foregrounding the figure of the recordist – not a great framing if you’re a fan of non-human agencies:
Recordists are people who observe the world with audition rather than vision. Through microphones and headphones, they interpret their sonic environment: the sound recording tools sometimes act as filter, sometimes as an enhancer, revealing details and/or creating a distance. The recorder allows to keep traces of acoustic events, in a non-neutral ways since it recall to us the sounds that have been modified by the position in space and time, the gestures and the equipment of the recordists.
Here the naturalist epistemology is quite evident, and might resonate with overtones familiar to visitors acquainted with anthropological critiques of observation and interpretation. While the recognition of the partial and constructed nature of any kind of representation in a time-based medium such as sound (shaped by acts of filtering, enhancing, cropping, distancing) is a welcome bit of self-reflexivity, the idea that sounds are modified by the non-neutral practice of recording seems to fall back on a naturalist ontology assuming acoustic events to exist somewhere in a neutral, original, natural state.
The sort of sonic naturalism I’m struggling to outline transpires quite strikingly from the five definitions of keywords pinned at the entrance of the exhibition: audio-naturalist, soundscape, acoustemology, sonic journalism, and audio document/audio documentary. Audio naturalists are here described as a subset of recordists who focus on natural sounds, and whose aesthetic approach “depends on the technical quality as well as the ‘purity’ of the recorded environment”. Soundscape, defined in opposition to sound environment, is “a process of listening to the place” that harks back to the phono-hygienic obsessions of R. Murray Schafer. Acoustemology is explained through the words of Steven Feld in terms of acoustic knowing as “the ways in which sound is central to making sense, to knowing, to experiential truth”. A quote by Peter Cusack predicates sonic journalism on the idea that “all sound, including non-speech, gives information about places and events and that careful listening provides valuable insights different from, but complimentary to, visual images and language.” Audio documents and documentaries, finally, are defined by Yannick Dauby himself as more or less constructed narrations through which recordists share their perceptual experiences. The largely interchangeable meanings of these keywords, as they are marshaled by the KINO-EAR exhibition, help boil them down to a workable definition of what I call sonic naturalism: the recording of soundscapes as a way of capturing reality and producing journalistic or documentary knowledge accessible through dedicated listening.
As an auspicious sign of a successful curatorial endeavor, some of the artworks escape, contradict and challenge the exhibition’s own framing devices. I sit down in the first room of the exhibition, a darkened space with two speakers and black cushions strewn across the floor that is dedicated to a selection of audio documentaries. Coincidentally, the work reverberating around me is Steven Feld’s Voices in the Forest, a 1983 multitrack recording that has been central in illustrating and disseminating Feld’s own formulation of acoustemology. Resulting from years of fieldwork among the Kaluli people, Feld’s composition highlights the possibilities offered by collaborative and dialogic practices of recording, interpreting and representing the rainforest, which, “as a co-inhabited world of plural sounding and knowing presences was, most deeply, a listening to histories of listening” (Feld, 2017). These intuitions would warrant a re-examination of field recordings as a whole, but instead, when framed by multiple keywords that they were aimed to counter or upend – including, most notably, the concept of soundscape itself – Feld’s pioneering work is somehow reduced to a colorful ambience: the participatory approach to local epistemologies of listening, the aspect of acoustemology that is perhaps Feld’s most important contribution to the anthropology of sound, disappears behind the uncanny snaps of woodcutting interspersed by Kaluli songs.
The following couple of audio documentaries – Gael Segalen’s Empty Cab and No Ticket by Jeanne Robet and Pali Meursault – introduce a more outright embrace of intimate feelings and narrative composition. While the former sketches a short vignette of a taxi ride in San Francisco, the latter revolves around a first-person narration by a passenger traveling on a train from Italy to France, and it buries sparse elements of naturalist phonography under layers of electronic processing, resulting in a suspenseful and intimate emotional narrative. It would be a disservice to criticize these works as paragons of naturalism, and their focus on precarious identities and urban mobilities brings a refreshing twist to the genre. By contrast, Yannick Dauby’s own audio documentary 山林 Forest, a twenty-minute collage of spoken testimonies overlaid with “beautiful soundscapes, peaceful representations of wilderness” recorded on the mountains of Northern Taiwan, offers a textbook example of naturalist epistemology at work: the voice of an aboriginal man is undercut by Mandarin translations, and both testimonies end up drowning the subtle sonic backdrop in didascalic descriptions of aboriginal living in forests that “still keep stories of their ancestors”. Similarly, Waterland 水上樂園 by Taiwanese artist Yen-ting Hsu 許雁婷, a fifteen-minute work about life in Taiwan’s coastal areas, intersperses comfortable snippets of crashing waves with fragments of local people talking and the occasional Taiwanese pop song hovering in the background: at its best, it feels like unreflexive eavesdropping on blurry everyday life scenes in places that can, as a woman’s voice puts it, “bring you closer to nature, to the environment”.
The second room of the exhibition showcases works accompanied by visual or textual components that might not fall directly under the definition of audio documentaries. One of the most prominent is David Toop’s Lost Shadows: In Defence of the Soul – Yanomami Shamanism, Songs, Ritual, 1978, which still takes me by surprise despite having listened to its CD release many times before. Installed here with some photographs and an excerpt from the original accompanying text, Toop’s work restlessly unsettles its own veneer of ethnomusicological dedication. The recordist is eager to surrender to an ascetic, if not heroic, practice of listening: “Here I have no language. I am helpless in this environment. Lost, I must listen, in isolation moving through further depths of listening. I listen in order to move out of my body, to collect sounds and bring them back into my body, as an act of repair.” And yet, Toop is also conscious of the absurdity of his position, and ready to dispel exoticism about the Yanomamö:
There is none of the ‘group mind’ psychology or cohesion seen in cinematic ritual. Men chat and laugh together as if discussing the absurdity of life or perhaps the absurdity of the sound recordist […] What they have been saying is that the foreigners are stupid to want to record their music and they are going to trick us out of many gifts.
Deep in the Amazon jungles, lost in the thrall of phonographic capture, David Toop identifies and confronts a central problem of both anthropology and ethnomusicology, formulating a hypernaturalist critique of naturalism that can still be leveled at most of the artworks surrounding me in the exhibition space: “To wait until a plane passes overhead or a motorbike vanishes over the horizon, to edit out intrusions or to move away from unwanted noise before pressing the record button is to be the author of a highly selective idealised representation of an event”.
On the opposite side of the room, in a similar arrangement of photographs, booklets and headphones, is something I had wanted to listen to properly for a long time: Peter Cusack’s Sounds from Dangerous Places. Undoubtedly the most interesting work of the whole exhibition, Cusack’s exemplar of sonic journalism embodies the paradox at the heart of the naturalist onto-episteme. As increasingly affordable recording devices and convenient digital distribution extend the reach of field recordists, the phonographer feels forced to push naturalist capture outwards, into places that are harder and more dangerous to reach – in Cusack’s case, disaster areas, nuclear sites and military zones characterized by “extreme or hostile conditions”. Based on two trips to Chernobyl’s exclusion zone in 2006 and 2007, Cusack’s work is a fascinating portfolio of minor voices and material details that avoids exploiting the eeriness of disaster porn and challenges the established boundaries of acoustic fidelity by including radiometer clicks, the electric crackling of generators, and the radioactive hiss of contaminated washing machines. “What can we learn by listening to the sound of dangerous places?”, Cusack asks. The answer emerging from his work falls unsurprisingly back onto familiar themes: that unchecked technological development encroaches upon traditional ways of life, that nature comes back by recolonizing an “undisturbed haven” in disaster’s wake, and that local knowledge wisely speaks “of a relationship between humans and nature that is utterly interdependent” – it is hard to disagree, but easy to want more.
At this point, my critique of sonic naturalism should be either clear or easily dismissed. Many of the works exhibited in KINO-EAR that I haven’t yet mentioned exemplify the naturalist onto-episteme to some degree, either by presenting recorded sounds as a neutral documentation of natural acoustic events, or by promoting listening as an unmediated way of knowing the world. The most evident marker of this framing is perhaps the aestheticization of remoteness: in Philip Samartzis’s Antarctica, an Absent Presence, field recordings captured in the South Pole are described as “a ghostly topography, a reinvention of the place, marked by a strong sense of absence, empty and white, deserted and silent”; while admitting that Antarctica is often stereotypically portrayed as the final wild frontier, Samartzis argues for phonography’s purchase on truth-seeking: “Antartica requires an interrogation about its untold secrets, which conceals a more accurate version of the truth”. Cédric Anglaret and Nicolas Perret’s Suania, a field recording work about Europe’s highest inhabited region, justifies its existence by connecting acoustic remoteness to ontological distance: “these beliefs and the still very traditional way of life are the pillars of a unique soundscape in which the harsh and spectacular natural environment echoes the ritual chants and ceremonies”. Luckily, a quirky audiovisual work looping on a small screen offers an ironic escape route from the naturalist trappings of harsh environments and uncontaminated traditions: Félix Blume’s #19 SON SEUL / WILD TRACK #1-19 is a collection of short videos depicting the sound engineer as he records “lonely sounds” to be used in the post-production of movies and documentaries, and it successfully foregrounds the absurdity and contrivedness of what, as listeners, we might confidently categorize as a “natural” sound.
Drawing me into hours of listening, watching and reading, the KINO-EAR exhibition generously nudged me into engaging with the vocabulary of field recording and wringing out a critique of sonic naturalism, of which this is anything but the most cogent version. What kept bugging me, as I slowly moved from pillow to chair to stool, was the striking contrast between the title of this exhibition and the cinematic mobility invoked by the “kino-” prefix as used by Dziga Vertov in compounds like “kino-eye” and “kino-pravda” to characterize a non-documentary constructivism working against both representation and narrative. Sonic naturalism and its onto-episteme, still current in much phonography, seem to run counter to Vertov’s proposition to capture things that escape the human auditory sensorium by provoking the field through the introduction of a non-human ear. Before things get confusing, Jean Rouch’s reading of Vertov himself might help close the circle:
The field changes the simple observer. When he works, he is no longer one who greeted the oldtimers at the edge of the village; to take up again Vertovian terminology, he “ethno-looks,” he “ethno-observes,” he “ethno-thinks,” and once they are sure of this strange regular visitor, those who come in contact with him go through a parallel change, they “ethno-show,” they “ethno-speak,” and, ultimately, they “ethno-think.” (Jean Rouch, quoted in Stoller, 1992)
The field changes the simple recordist too, and this perhaps is a good starting point to move from a critique of sonic naturalism to the articulation of postnaturalism, antinaturalism and unnaturalism in sound.
Feld, Steven (2017). “On Post-Ethnomusicology Alternatives: Acoustemology”, in F. Giannattasio & G. Giuriati (eds.), Perspectives on a 21st century comparative musicology: Ethnomusicology or transcultural musicology?, Udine, Italy: Intersezioni Musicali (pp. 83-98).
Stoller, Paul (1992). The cinematic griot: The ethnography of Jean Rouch. Chicago, IL: The University of Chicago Press.