Google AI could soon use a person’s cough to diagnose disease
Machine-learning system trained on millions of human audio clips shows promise for detecting COVID-19 and tuberculosis.
A team led by Google scientists has developed a machine-learning tool that can help to detect and monitor health conditions by evaluating noises such as coughing and breathing. The artificial intelligence (AI) system¹, trained on millions of audio clips of human sounds, might one day be used by physicians to diagnose diseases including COVID-19 and tuberculosis and to assess how well a person’s lungs are functioning.
This is not the first time a research group has explored using sound as a biomarker for disease. The concept gained traction during the COVID-19 pandemic, when scientists discovered that it was possible to detect the respiratory disease through a person’s cough².
What’s new about the Google system — called Health Acoustic Representations (HeAR) — is the massive data set that it was trained on, and the fact that it can be fine-tuned to perform multiple tasks.
The researchers, who reported the tool earlier this month in a preprint¹ that has not yet been peer reviewed, say it’s too early to tell whether HeAR will become a commercial product. For now, the plan is to give interested researchers access to the model so that they can use it in their own investigations. “Our goal as part of Google Research is to spur innovation in this nascent field,” says Sujay Kakarmath, a product manager at Google in New York City who worked on the project.
How to train your model
Most AI tools being developed in this space are trained on audio recordings — for example, of coughs — that are paired with health information about the person who made the sounds. For example, the clips might be labelled to indicate that the person had bronchitis at the time of the recording. The tool comes to associate features of the sounds with the data label, in a training process called supervised learning.
“In medicine, traditionally, we have been using a lot of supervised learning, which is great because you have a clinical validation,” says Yael Bensoussan, a laryngologist at the University of South Florida in Tampa. “The downside is that it really limits the data sets that you can use, because there is a lack of annotated data sets out there.”
Instead, the Google researchers used self-supervised learning, which relies on unlabelled data. Through an automated process, they extracted more than 300 million short sound clips of coughing, breathing, throat clearing and other human sounds from publicly available YouTube videos.
Each clip was converted into a visual representation of sound called a spectrogram. Then the researchers blocked segments of the spectrograms to help the model learn to predict the missing portions. This is similar to how the large language model that underlies chatbot ChatGPT was taught to predict the next word in a sentence after being trained on myriad examples of human text. Using this method, the researchers created what they call a foundation model, which they say can be adapted for many tasks.
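The spectrogram-and-masking idea can be sketched in a few lines. This is an illustrative toy, not the Google team's code: the frame length, hop size, test tone and the helper names `spectrogram`, `mask_frames` and `masked_mse` are all assumptions made for the example. It turns a short signal into a magnitude spectrogram, hides a run of time frames, and scores a candidate reconstruction only on the hidden frames, which is the general shape of a masked-prediction objective.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one row of frequency bins per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k; slow, but dependency-free.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        frames.append(row)
    return frames

def mask_frames(spec, start, width):
    """Zero out `width` time frames; return the masked copy and the hidden targets."""
    masked = [row[:] for row in spec]
    targets = {}
    for t in range(start, min(start + width, len(spec))):
        targets[t] = spec[t]
        masked[t] = [0.0] * len(spec[t])
    return masked, targets

def masked_mse(predicted, targets):
    """Mean squared error computed only over the frames that were hidden."""
    total, count = 0.0, 0
    for t, row in targets.items():
        for p, v in zip(predicted[t], row):
            total += (p - v) ** 2
            count += 1
    return total / count

# Toy "clip": a 1 kHz tone sampled at 8 kHz (hypothetical stand-in for real audio).
sample_rate = 8000
clip = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
spec = spectrogram(clip)
masked, targets = mask_frames(spec, start=4, width=3)
```

In a real system, a neural network would receive `masked` and be trained to drive `masked_mse` towards zero; here the functions only show what the model sees and what it is scored on.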
An efficient learner
In the case of HeAR, the Google team adapted it to detect COVID-19, tuberculosis and characteristics such as whether a person smokes. Because the model was trained on such a broad range of human sounds, to fine-tune it, the researchers only had to feed it very limited data sets labelled with these diseases and characteristics.
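The article does not say how the fine-tuning was carried out. One common way to adapt a pretrained model with little labelled data is a linear probe: the pretrained model is frozen, each clip is reduced to an embedding vector, and a small classifier is trained on those vectors. The sketch below assumes that setup and uses made-up four-dimensional embeddings in place of real model outputs; `toy_embedding`, the learning rate and the number of epochs are hypothetical choices for illustration.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability that embedding x belongs to the positive class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_linear_probe(embeddings, labels, epochs=100, lr=0.1):
    """Logistic regression trained by stochastic gradient descent on fixed embeddings."""
    weights, bias = [0.0] * len(embeddings[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            err = predict_proba(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

rng = random.Random(0)

def toy_embedding(label, dim=4):
    """Hypothetical stand-in for a frozen model's embedding of one audio clip."""
    centre = 1.0 if label == 1 else -1.0
    return [rng.gauss(centre, 0.5) for _ in range(dim)]

# A deliberately small labelled set: 10 positive and 10 negative clips.
labels = [0, 1] * 10
embeddings = [toy_embedding(y) for y in labels]

weights, bias = train_linear_probe(embeddings, labels)
accuracy = sum(
    (predict_proba(weights, bias, x) > 0.5) == (y == 1)
    for x, y in zip(embeddings, labels)
) / len(labels)
```

Only the handful of weights in the probe are learned, which is why a few dozen labelled examples can be enough when the embeddings already separate the classes well.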
On a scale where 0.5 represents a model that performs no better than a random prediction and 1 represents a model that makes an accurate prediction each time, HeAR scored 0.645 and 0.710 for COVID-19 detection, depending on which data set it was tested on — a better performance than existing models trained on speech data or general audio. For tuberculosis, the score was 0.739.
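The scale described here, with 0.5 for chance and 1 for a perfect ranking, matches the area under the receiver operating characteristic curve (AUROC), although the article does not name the metric. Assuming that is what was reported, the score can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. The function below is a minimal sketch of that definition; the labels and scores in the examples are invented.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented examples: a partly correct ranking, a perfect one, and an uninformative one.
partial = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 3 of 4 pairs ranked correctly
perfect = auroc([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9])    # every positive outranks every negative
constant = auroc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])   # all ties, equivalent to guessing
```

On this reading, a score of 0.739 would mean the model ranks a true case above a non-case in roughly 74% of such pairs.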
The fact that the original training data were so diverse — with varying sound quality and human sources — also means that the results are generalizable, Kakarmath says.
Ali Imran, an engineer at the University of Oklahoma in Tulsa, says that the sheer volume of data used by Google lends significance to the research. “It gives us the confidence that this is a reliable tool,” he says.
Imran leads the development of an app named AI4COVID-19, which has shown promise at distinguishing COVID-19 coughs from other types of cough³. His team plans to apply for approval from the US Food and Drug Administration (FDA) so that the app can eventually move to market; he is currently seeking funding to conduct the necessary clinical trials. So far, no FDA-approved tool provides diagnosis through sounds.
The field of health acoustics, or ‘audiomics’, is promising, Bensoussan says. “Acoustic science has existed for decades. What’s different is that now, with AI and machine learning, we have the means to collect and analyse a lot of data at the same time.” She co-leads a research consortium focused on exploring voice as a biomarker to track health.
“There’s an immense potential not only for diagnosis, but also for screening” and monitoring, she says. “We can’t repeat scans or biopsies every week. So that’s why voice becomes a really important biomarker for disease monitoring,” she adds. “It’s not invasive, and it’s low resource.”