Why We’re Exhausted by Zoom


Twitter, Facebook and the news media are filled with people lamenting their weariness after Zoom class sessions. I feel that, too.

The first day I had two Zoom classes in a row, I ended up bleary-eyed and exhausted. I just sat and watched something silly on Netflix, drank a glass of wine and did nothing productive until I could finally go to sleep. I’d had countless Zoom meetings previously, many of which I’d hosted. Some were almost joy-filled. So what was different?

I have spent a lot of time thinking, posting, talking about this. And it is clear: videoconferencing is nearly a replication of face-to-face interaction, but not quite, and that near miss depletes our energy. And anthropology can help explain what’s different. (I’m using Zoom to represent videoconferencing platforms in general. And I treasure and appreciate their benefits for connecting distant loved ones, despite the critique that follows.)

In a Zoom classroom with 30 students, we see faces — just like in a classroom. We see eye movement. We can hear voices. It can even be enhanced by chat — almost like hearing people thinking out loud. It is multimodal, to some extent. We see gestures, at least some big ones. All this is information used by our human capacity for understanding interaction. So far, so good.

Zoom works well for faculty members who lecture, or for groups that have formal meetings, with rules for who speaks and how to signal an interest in speaking. As long as the symphony is directed by an authority figure, order can be kept. The trumpets come in on cue. It is calm. Information and views can be exchanged. It beats a long email exchange any day!

But for the more interactive, active classrooms that I aim to create, this is terrible. When a classroom aims for (but doesn’t always achieve) democratic, nonauthoritarian conversation, rather than orchestrated teacher-centered pedagogy, all the tools of human interaction are recruited.

Over my decades of teaching, I’ve learned to read a room pretty well: the harmonized posture, the breaths, the laughter, the eye gaze. My classes are successful when everyone is so excited that they want to speak over each other out of sheer exuberance. When people sit up straight and say, “Wait! Do you mean …?” because they have a brand-new way to understand the world — that’s the superpower of anthropology. When students huddling around a text point to it, their gazes converging, and create a document they’re proud of. When people laugh simultaneously. When the affect and the cognition and the interaction work together.

I have also analyzed conversation quite a bit. In “ordinary” conversation — and that is a de-cultured formulation, isn’t it? — there is often brief overlap, as one speaker ends an utterance and another begins. And when it works well — when the hearer is successful at matching the prosodic contours, the rhythms and speeds of the speaker and anticipating the ending of the utterance — it’s like a symphony. And even when we need to repair the interaction, it’s incorporated into the conversation, sometimes with humor. Conversation has rhythm. Even our brain waves synchronize in a conversation. “The emotional/aesthetic experience of a perfectly tuned conversation is as ecstatic as an artistic experience,” Deborah Tannen writes. “It is a ratification of one’s place in the world and one’s way of being human … ‘a vision of sanity’” (quoting A. L. Becker at the end).

Anthropologists, linguists and sociologists who analyze conversation, which surely varies around the world, have nonetheless identified some common traits. N. J. Enfield’s recent book How We Talk and the work of conversation analysts such as the late Charles Goodwin point to multimodality, rules about eye gaze, patterns for rapid turn taking, and near-universal reliance on millisecond timing. Goodwin reminds us that “co-operative action sits at the center of human language, and symbols are essentially co-operative structures in which one party is operating on another.”

This is not what my Zoom classrooms are like.

There is a constant need to repair, to apologize. People are constantly talking at the same time and interrupting someone else’s signal. I am constantly switching views from one screen to another, to scan the faces (at least the faces of those who haven’t chosen to post a blank screen, permitting rest, multitasking or even absence). I am watching the eyes, listening for completion, listening for that intake of breath that indicates readiness to talk. I am continually repressing my lifelong, trained habit of uttering simultaneous encouragement through “continuers,” those back-channel cues that encourage the speaker to go on. Mmm-hmm, yeah, I know. None of that works; the platform is made for a single speaker at a time. It’s the folk model of how conversation works, but not what we actually find in practice.

In regular classrooms, we notice heads nodding, and we notice students who are distracted or gazing in one direction or another.

Humans use eye gaze as communicative information; that’s why we have sclera. (It’s not only to look at someone; sometimes looking away is proper. Many primates, including many humans, see direct gaze as a threat.) On Zoom, people may generally nod, but eye gaze can’t be tracked. We seek “joint attention,” that confirmation that everyone is sharing the focus. Instead, we get stares, or looking down or away, or watching the image on a screen, which may not even be in the center. What does it mean? We always want to know. Why did they do that?

That’s because, when we interact, the meaning is not just about the content, the semantics. Meaning is always also pragmatic: it does things. Did she say I’m confused about the assignment as an accusation or as an inside joke or because she needed clarification or to show leadership or simply to invite clarification? Was the laughter with me or at me? The meaning of classroom interaction is never just the “content” or the “information.” If it were that, we wouldn’t need to interact at all.

In the prototypical usage of these platforms, everyone is looking forward. A camera is broadcasting (unless people turn off their video, either to give themselves a rest from scrutiny or to mask their multitasking or even absence), but we’re not really looking at each other.

So all the communicative signs that embodied humans rely on are thinned, flattened, made more effortful or entirely impossible. Yet we interpret them anyway.

Technology does not completely determine our interactions. The medium is not always the message. Writing, pace Socrates, has brought some good to the world. We can write hymns of praise or also calls to hate. A hammer can build a sanctuary — or can murder an innocent person. These technologies, though, have affordances, as Gibson pointed out. It is easier to sit facing forward on a chair, though you can also sit backward. It is possible to use Facebook for lyric poetry. Users can contravene the designers’ intentions. I’m sure there’s a way to hack Zoom — and I don’t mean Zoombomb; I mean to roll up our sleeves and find a way to improve from within.

Pedagogy and interaction, though, are quite nearly baked into our platforms. Banked classrooms with lecterns assume a single central speaker and multiple listeners, though a determined teacher can have students turn around even in stadium-type seating. Learning management systems usually presume that the instructor controls all the communication, unless a discussion board is enabled.

The “pivot to online learning” or “online teaching” affords a number of different opportunities: for asynchronous interaction in discussions, for posting of brief video messages. Many brilliant pedagogues are using these options well. I embrace this, and, surely, we all need to learn about more affordances of more platforms.

I have used Zoom’s small-group breakout rooms for some tasks to some effect, though the feature is cumbersome. In one class, where students work in project teams, I have to put them into groups manually, which takes measurable minutes; then joining groups takes a little while, and then exiting each group takes time … I haven’t counted, but it definitely takes time, and students are frustrated with all their teachers having to learn the intricacies of Zoom. The dead time is, well, deadly to the rhythms.

When I see that technological platforms such as Zoom provide some imitations of face-to-face interaction, what I notice the most is that I miss the three-dimensional faces and the bodies and the eyes and the breaths.

Humans are delicately attuned to each other’s complete presence. If a perfectly tuned conversation provides a “vision of sanity,” then it is no wonder that an awkward, clunky, interrupted conversation provides the opposite. We are constantly interpreting others’ movements, timing, breaths, gazes, encouragement. It is our beautiful endowment. So we’re interpreting the misaligned gazes, the interrupted conversation, as stemming from the technology, not from the interlocutor. And that, my human friends, is a tale of human-technology-semiotic mismatch.


