AI is helping to decode animals’ speech. Will it also let us talk with them?
Deep in the rainforests of the Democratic Republic of the Congo, Mélissa Berthet found bonobos doing something thought to be uniquely human.
During the six months that Berthet observed the primates, they combined calls in several ways to make complex phrases¹. In one example, bonobos (Pan paniscus) that were building nests together added a yelp, meaning ‘let’s do this’, to a grunt that says ‘look at me’. “It’s really a way to say: ‘Look at what I’m doing, and let’s do this all together’,” says Berthet, who studies primates and linguistics at the University of Rennes, France.
In another case, a peep that means ‘I would like to do this’ was followed by a whistle signalling ‘let’s stay together’. The bonobos combine the two calls in sensitive social contexts, says Berthet. “I think it’s to bring peace.”
The study, reported in April, is one of several examples from the past few years that highlight just how sophisticated vocal communication in non-human animals can be. In some species of primate, whale² and bird, researchers have identified features and patterns of vocalization that have long been considered defining characteristics of human language. These results challenge ideas about what makes human language special, and even how ‘language’ should be defined.
Perhaps unsurprisingly, many scientists turn to artificial intelligence (AI) tools to speed up the detection and interpretation of animal sounds, and to probe aspects of communication that human listeners might miss. “It’s doing something that just wasn’t possible through traditional means,” says David Robinson, an AI researcher at the Earth Species Project, a non-profit organization based in Berkeley, California, that is developing AI systems to decode communication across the animal kingdom.
As the research advances, there is increasing interest in using AI tools not only to listen in on animal speech, but also to potentially talk back.
Combining calls
Researchers studying animal communication ask some of the same types of question that linguists do. How are speech sounds physically produced (phonetics)? How are sounds combined to make meaningful units (morphology)? What rules determine how phrases and sentences are structured (syntax)?
Until about a decade ago, researchers thought that only humans used a feature known in linguistics as compositionality. This is the combining of meaningful words, calls or other noises into expressions that have a meaning derived from those of their parts.
But in 2016, a study of Japanese tits (Parus minor) changed how scientists thought about compositionality. The birds looked for predators when they heard an ‘alert’ call and approached a sound’s source after hearing a ‘recruitment’ call. When they heard the calls in that order, they performed both behaviours³. But they didn’t do so when the order was reversed, suggesting compositionality: the combination of calls had its own meaning.
A study in 2023 extended that work. By presenting chimpanzees (Pan troglodytes) with fake snakes in the wild, scientists showed that the primates similarly combine ‘alarm’ and ‘recruitment’ vocalizations into a message that prompts others to gather around the caller to respond to a threat⁴.
However, humans remained the only species known to use compositionality in more than one way: for instance, by ordering words differently to change the meaning of a phrase, by adding endings to words to modify their meaning, and by creating metaphors and idioms to produce figurative expressions.

Bonobos in the Democratic Republic of the Congo can combine calls into phrases in several ways. Credit: Christian Ziegler/Nature Picture Library
But the study by Berthet and her colleagues softened that distinction between humans and other animals. They recorded 700 calls by 30 adult bonobos and found that the animals combined a finite number of calls in four ways¹. One, a yelp–grunt combination, the authors considered to have ‘trivial’ compositionality, because the meaning of the individual calls had merely been combined. (For instance, ‘the red car’ describes an object that is both red and a car.) In the three other cases, one call modified the other, resulting in ‘non-trivial’ compositionality. (‘A terrible actor’ describes a person who is bad at acting, not someone who is terrible and an actor.)
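The difference between the two kinds of compositionality can be made concrete with a toy model. The sketch below is only an illustration of the ‘red car’ and ‘terrible actor’ examples; the entities, the sets and the function names are invented here and have nothing to do with how the bonobo calls were analysed.

```python
# Toy model: every entity and set below is invented for illustration.
CARS = {"car_1", "car_2"}
RED_THINGS = {"car_1", "apple_1"}
ACTORS = {"alex", "sam"}

# Being bad at something is relative to an activity.
BAD_AT = {("alex", "acting"), ("sam", "cooking")}
ROLE_OF = {"actor": ("acting", ACTORS)}


def intersective(adjective_set, noun_set):
    """Trivial composition: the phrase picks out whatever is in both sets."""
    return adjective_set & noun_set


def terrible(noun):
    """Non-trivial composition: 'terrible' is interpreted relative to the noun."""
    activity, members = ROLE_OF[noun]
    return {x for x in members if (x, activity) in BAD_AT}


red_car = intersective(RED_THINGS, CARS)

# Treating 'terrible' as a plain set and intersecting gives the wrong reading:
# sam is terrible (at cooking) and an actor, but is not a terrible actor.
terrible_at_something = {who for who, _ in BAD_AT}
naive_terrible_actor = intersective(terrible_at_something, ACTORS)
terrible_actor = terrible("actor")

print(red_car, naive_terrible_actor, terrible_actor)
```

In this sketch the trivial case is a set intersection, whereas the non-trivial case needs a function in which one element changes how the other is interpreted.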
Evolutionary biologist Cédric Girard-Buttoz at the Lyon Neuroscience Research Center, France, and his colleagues reported in May that chimpanzees also combine a finite number of calls in several ways⁵. For some vocalizations, the meaning of the combined phrase can’t be determined from the meaning of the individual calls, as is the case for some idioms in human languages. For example, a hoot, used when resting on the ground, followed by a pant, which signifies playing and affiliation, prompted the chimpanzees to climb a tree, make a nest and rest together, even though neither call is typically associated with tree climbing, says Girard-Buttoz. Generating meaning in several ways is a building block of language, he adds.
Whales, too, have some notable features of human language. Researchers at Project CETI, a non-profit organization in New York City, have been tracking and recording sperm whales (Physeter macrocephalus) off the coast of the Caribbean island of Dominica to compile a large data set of movements and sounds. By finding patterns that link whale sounds and behaviours, the scientists hope to translate ‘whale speak’.
CETI linguist Gašper Beguš has been training generative-AI models to produce sounds and sequences of sounds that mimic those made by sperm whales. Whereas humans create distinct sounds by sending air through vocal folds in the throat, which vibrate at different frequencies, these whales send air through a lip-like structure in their nasal passage, which vibrates and creates clicks. The clicks are grouped into units called codas.

Scientists use drones to attach sensors that can collect bioacoustic and other data to sperm whales.
CETI scientists reported last year that sperm whales have their own ‘phonetic alphabet’, with codas varying in characteristics such as tempo and rhythm⁶. Beguš and his colleagues have since found that whale codas can differ in ways analogous to vowels and diphthongs in human language. Vowels in human speech differ on the basis of the tongue’s position and the shape of the lips, such as for the ‘ee’ in cheese versus the ‘o’ in hot. Diphthongs, or gliding vowels, are created by combining two vowels in a single syllable, such as in ‘pout’, resulting in a frequency change as the lips and tongue move.
Beguš’s team identified two codas with distinct sound patterns that the researchers called an a-vowel and i-vowel. They also found that these vowels changed frequency in four ways: they can rise, they can fall, they can fall then rise or they can rise then fall⁷. The frequency changes could be indicative of diphthongs.
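The four contour shapes are easy to picture as labels on a sequence of frequency measurements. The following is a minimal sketch under that assumption; the function name, the input format and the numbers are hypothetical, and this is not the analysis used by Beguš’s team.

```python
def classify_contour(freqs):
    """Give a coarse shape label to a frequency track.

    freqs: at least three frequency values in time order (units arbitrary).
    Returns 'rise', 'fall', 'fall-rise', 'rise-fall' or 'other'.
    """
    if len(freqs) < 3:
        raise ValueError("need at least three samples")
    start, end = freqs[0], freqs[-1]
    # A dip or peak is a value lying strictly beyond both endpoints.
    dips = min(freqs) < min(start, end)
    peaks = max(freqs) > max(start, end)
    if dips and peaks:
        return "other"  # more than one turning point; outside the four shapes
    if dips:
        return "fall-rise"
    if peaks:
        return "rise-fall"
    if end > start:
        return "rise"
    if end < start:
        return "fall"
    return "other"


# Made-up tracks, one per shape.
labels = [
    classify_contour([1, 2, 3]),
    classify_contour([3, 2, 1]),
    classify_contour([3, 1, 2]),
    classify_contour([1, 3, 2]),
]
print(labels)
```

The labels are deliberately coarse: small wobbles that stay between the first and last values are ignored, which a real acoustic analysis would need to handle with more care.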
What’s in a language
Whether the sophistication of animal communication is enough to qualify it as language depends on how a person defines the term and what they think about how animals think. There are two prevailing views, Beguš says. “One world view says that language and complex thought are intrinsically connected.” According to this view, complex thought came first and language is a way to externalize thoughts. If so, animals can’t have a language unless they are capable of complex thought.
The other view holds that language is just one kind of communication, like gestures or facial expressions, and complex thought isn’t required. In this case, animals could have a language with or without complex thought. Experiments that train animals to communicate with humans, such as those with the bonobo Kanzi, who died earlier this year, have hinted that animals might be capable of having a language. But that’s a different question from whether they use language on their own in the wild.
“The word is still out on whether we’ll find a full-on language,” says Robinson.
For one, some aspects of human language haven’t been found in other species yet. Three of the 16 features on a language checklist created by linguist Charles Hockett (displacement, productivity and duality) haven’t been identified in non-human animals.
Displacement is the ability to talk about abstract concepts, such as the past, the future or things that are distant. This feature hasn’t been seen convincingly in animal communication, although there is anecdotal evidence in some instances, such as dolphins calling the names of other dolphins that had disappeared years ago, and orangutans (Pongo spp.) telling others about a predator that was previously in an area, Berthet says.
Productivity is the ability to say things that have never been said or heard before, and be understood by another individual.
And duality describes meaningful messages made up of smaller meaningful units, which consist of even smaller, meaningless sounds. Although whales use clicks to create longer codas, scientists haven’t yet shown that clicks are meaningless and codas are meaningful.
Recursion is another feature that might be unique to human language. This is when sentences or phrases are embedded in each other to create deeper levels of meaning. By training crows (Corvus corone) to peck at open and closed brackets in the appropriate sequence on a touch screen, Diana Liao, who studies vocal communication and cognition at the University of Tübingen in Germany, and her colleagues found evidence that the animals are mentally capable of recursion⁸. “They do this even better than macaque monkeys and on par with human toddlers”, Liao says. However, it’s not clear whether crows use it in their communication.
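What counts as an ‘appropriate sequence’ of brackets can be stated precisely: every closing bracket must match the most recently opened one, so that pairs sit inside other pairs. The sketch below checks that property with a stack. The bracket symbols and the function name are assumptions made for illustration, not the stimuli or scoring used in the crow study.

```python
# Assumed symbol set; maps each closing bracket to its opening partner.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def is_well_nested(sequence):
    """Return True if every bracket closes the most recently opened one."""
    stack = []
    for symbol in sequence:
        if symbol in OPENERS:
            stack.append(symbol)
        elif symbol in PAIRS:
            # A closer with nothing open, or with the wrong opener, breaks nesting.
            if not stack or stack.pop() != PAIRS[symbol]:
                return False
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    return not stack  # anything left open also fails


embedded = is_well_nested("{()}")   # one pair embedded inside another
crossed = is_well_nested("{(})")    # pairs cross instead of nesting
unfinished = is_well_nested("((")   # opened but never closed
print(embedded, crossed, unfinished)
```

The stack is what makes the structure recursive: how deep the embedding can go is limited only by how many open brackets can be remembered at once.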
It’s also unclear whether animals have grammatical rules that define how vocal communication is structured. And, although primates have been shown to mix and match calls to generate meaning, the number of meanings that they can produce is “really far from what humans can do”, says Girard-Buttoz.
