Center for Voice Intelligence and Security

Voice Intelligence

Profiling humans from their voice

How much information can we deduce from the human voice? Currently, our work is focussed on deducing many aspects of the human persona from voice. Some of this work is described below.

Deducing faces from voices

Sounds produced in our vocal tract resonate in our vocal chambers. They are modulated by our articulators (tongue, lips, jaw, etc.) and further influenced by the structures within the vocal tract. The dimensions of our vocal chambers are highly correlated with our skull structure, which in turn defines our facial appearance to a very large extent. Many physical properties and dynamics of our vocal structures also correlate well with our age, ethnicity, height, gender and so on -- which in turn influence our appearance. Thus, it is easy to see how the signatures of many physical factors that naturally influence our voice can also carry information, directly or indirectly, about our facial structure.
Technologies that infer facial appearance and structure from voice leverage this web of information embedded in the voice signal. AI systems can also be designed to isolate this information.
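
As a rough illustration of how chamber dimensions leave a trace in the voice, the sketch below uses the textbook uniform-tube approximation of the vocal tract (a tube closed at the glottis and open at the lips), whose resonances fall at F_n = (2n - 1)c / 4L. This idealization is an assumption introduced here for illustration and is not the method used in the work described above; the function names and the speed-of-sound value are likewise illustrative.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed approximate value for warm, humid air


def tube_resonances(length_m, count=3):
    """Resonances (Hz) of an idealized uniform tube closed at one end."""
    return [(2 * n - 1) * SPEED_OF_SOUND_M_S / (4.0 * length_m)
            for n in range(1, count + 1)]


def estimate_tube_length(resonances_hz):
    """Invert the same model: average the length implied by each resonance."""
    lengths = [(2 * n - 1) * SPEED_OF_SOUND_M_S / (4.0 * f)
               for n, f in enumerate(resonances_hz, start=1)]
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Two hypothetical tract lengths, in meters.
    for length in (0.145, 0.175):
        freqs = tube_resonances(length)
        print(length, [round(f) for f in freqs],
              round(estimate_tube_length(freqs), 3))
```

Under this model a shorter tube yields higher resonances, which is one simple way in which body dimensions become audible. Real vocal tracts are not uniform tubes, so this should be read only as a first-order intuition.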

Publications

  • Reconstruction of Human Faces from Voice. Yandong Wen. PhD Thesis, Carnegie Mellon University. May 2022. pdf
  • Self-Supervised 3D Face Reconstruction via Conditional Estimation. Yandong Wen, Weiyang Liu, Bhiksha Raj and Rita Singh. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), pp. 13289-13298. pdf
  • More publications (search for "face")

Code

Applications

First introduced in 2017, this technology made a global debut at the World Economic Forum in 2018, creating faces from voices of speakers in a VR environment. Over a thousand speakers tested it at the WEF. Some of our work was published in this book in 2019.
Backed by HSBC Bank, a more recent version of this technology powered a global campaign on fraud by Wunderman Thompson in 2022 and 2023. First released in the UAE, it was a nominee at the Cannes Lions in June 2023. It did not win, though! Better luck next time!

  • The Race to Hide Your Voice. Wired, 1 Jun 2022. Magazine article
  • Artificial Intelligence: HSBC Releases AI Campaign Featuring the Faces of Fraudsters. Adweek.com, 17 Nov 2022. News article
  • HSBC and Wunderman Thompson put AI faces to fraudsters in global fraud prevention campaign. Marketing Beat, 18 Nov 2022. News article
  • HSBC puts faces to 'invisible' fraudsters in global blitz. Decision Marketing, 22 Nov 2022. Article
  • New and innovative HSBC campaign by Wunderman Thompson UK reveals the real faces of fraudsters. ArabAd, 21 Nov 2022. Article

Biomarker discovery

Biomarker discovery techniques identify or design/construct specific mathematical representations of voice signals that bear a strong correlation with a particular influence hypothesized to affect voice. As an example, consider a scenario where we theorize that the consumption of a specific medication must impact the voice. Yet in practical life this does not seem to be so -- these alterations, if present, remain undetectable to us, evading even the most direct voice signal analyses. In such instances, a biomarker discovery system could reveal the precise, distinctive signature indicative of the medication's influence on a given voice signal. This distinguishing signature might lie in a complex, high-dimensional mathematical space -- an imperceptible virtual representation that is only useful for computational purposes. Nevertheless, the ability to now extract the identified signature from new voice samples allows machines to learn and detect, merely from the speaker's voice, the recent consumption of that specific medication.
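
The idea of a signature that is invisible in any single measurement but detectable in a combined, higher-dimensional representation can be sketched on synthetic data. The example below is a minimal illustration, not the discovery pipeline described above: it swaps in a classical Fisher linear discriminant, and the data, the "medication" label, and all function names are hypothetical. Each synthetic feature shifts only slightly with the label, yet a learned combination of features tracks the label far more closely on held-out samples.

```python
import numpy as np


def fisher_direction(features, labels, ridge=1e-3):
    """Direction separating two classes under a shared-covariance model."""
    x0, x1 = features[labels == 0], features[labels == 1]
    mean_gap = x1.mean(axis=0) - x0.mean(axis=0)
    centered = np.vstack([x0 - x0.mean(axis=0), x1 - x1.mean(axis=0)])
    within_cov = centered.T @ centered / len(centered)
    within_cov += ridge * np.eye(features.shape[1])  # keep the solve stable
    return np.linalg.solve(within_cov, mean_gap)


def abs_correlation(values, labels):
    """Absolute Pearson correlation between a 1-D signature and the label."""
    return abs(np.corrcoef(values, labels)[0, 1])


def run_demo(seed=0, n_samples=400, n_features=20, shift=0.4):
    rng = np.random.default_rng(seed)
    labels = np.arange(n_samples) % 2  # hypothetical "took medication" flag
    # Synthetic stand-in for voice features: each one shifts only slightly.
    noise = rng.standard_normal((n_samples, n_features))
    features = noise + shift * labels[:, None]
    half = n_samples // 2
    train_x, test_x = features[:half], features[half:]
    train_y, test_y = labels[:half], labels[half:]
    direction = fisher_direction(train_x, train_y)
    # Compare the best individual feature with the learned combination,
    # both scored on samples that were not used to fit the direction.
    best_single = max(abs_correlation(test_x[:, j], test_y)
                      for j in range(n_features))
    combined = abs_correlation(test_x @ direction, test_y)
    return best_single, combined


if __name__ == "__main__":
    single, combined = run_demo()
    print(f"best single feature: {single:.2f}, combined signature: {combined:.2f}")
```

The projection onto the learned direction plays the role of the "signature" here: it has no direct perceptual meaning, but it can be computed for any new sample and compared against the label.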

We began work on biomarker discovery techniques in 2016, well before the control of latent space representations using neural architectures became mainstream. Today we continue to expand the set of entities for which we have successfully created AI-based discovery pipelines.

Publications

  • Feature engineering for profiling. Rita Singh. In: "Profiling humans from their voice", Ch.7, pp.269-298. Springer, July 2019. pdf
  • Voice profiling: Discovering biomarkers of health conditions in voice. Rita Singh. In: "Artificial Intelligence and Machine Learning in Healthcare," Arman Kilić and Artur Dubrawski (Eds.), Elsevier. Was to appear in June 2022, now appearing in March 2024. book and chapter pre-print from 2022.
  • More publications

Patent

Applications

Currently, multiple entities worldwide are engaged in exploring the possibilities of this technology in healthcare applications.

  • La inteligencia artificial puede diseñarse para descubrir patrones en la voz que los humanos no pueden percibir (Artificial intelligence can be designed to discover patterns in the voice that humans cannot perceive). Wired, 16 Jan 2023. Article; CNN Chile, News and Television, 18 Jan 2023. Article

Deducing vocal fold oscillations

Our vocal folds oscillate in a self-sustained manner during phonation (i.e., when we produce voiced sounds like a sustained "aa"). There is a wealth of information in the fine-level details of how they oscillate. Understanding these details can reveal an astonishing amount of information about the state of the speaker. However, historically it was hard to measure the vocal fold oscillations of each person as they spoke. Doing so required specialized instruments, to be used in clinical settings. At CVIS, we have been developing techniques to deduce the vocal fold oscillations of speakers from voiced sounds directly from recorded speech signals. This opens doors to analyzing vocal fold oscillations on an individual basis, and studying the changes in their patterns in response to various influencing factors -- from substances to mental problems to infectious diseases like Covid-19.
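
The simplest quantity that can be read off a recorded voiced sound is the rate at which the vocal folds oscillate (the fundamental frequency). The sketch below estimates it from the autocorrelation of a synthetic signal. It is only an illustration of recovering oscillation information from audio alone: the techniques described above use models of phonation to recover far finer detail, and the function names, test signal, and search range here are assumptions made for the example.

```python
import numpy as np


def synth_voiced(f0_hz, sample_rate, duration_s=0.1, harmonics=5):
    """Toy voiced sound: a sum of harmonics of f0 with 1/k amplitudes."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    return sum(np.sin(2 * np.pi * k * f0_hz * t) / k
               for k in range(1, harmonics + 1))


def estimate_f0(signal, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the oscillation rate from the strongest autocorrelation peak."""
    signal = signal - signal.mean()
    # Keep only non-negative lags of the full autocorrelation.
    autocorr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lag_min = int(sample_rate / fmax)  # shortest period considered
    lag_max = int(sample_rate / fmin)  # longest period considered
    best_lag = lag_min + int(np.argmax(autocorr[lag_min:lag_max + 1]))
    return sample_rate / best_lag


if __name__ == "__main__":
    rate = 16000
    voiced = synth_voiced(125.0, rate)
    print(f"estimated oscillation rate: {estimate_f0(voiced, rate):.1f} Hz")
```

Because the lag is restricted to whole samples, the estimate is quantized; practical systems typically interpolate around the peak and work on short overlapping frames.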

Based on our techniques, we built a live Covid-19 detection system in February 2020, and put together a protocol for analyzing a set of sustained vowel sounds and a couple of continuous speech examples. This protocol is now globally used as a basis for sound analysis for Covid-19 and other conditions. We began this work in 2017 with the interesting goal of detecting voice disguise in-vacuo (without knowing what the original voice sounds like), and have made considerable progress in refining our techniques and exploring practical applications for them.

Publications

  • Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation. Wayne Zhao and Rita Singh. Entropy 25(7), 1039; Special issue on Information-Theoretic Approaches in Speech Processing and Recognition. 2023. pdf
  • Vocal Fold Dynamics for Automatic Detection of Amyotrophic Lateral Sclerosis from Voice. Jiayi Zhang. Master's Thesis, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA, 2022. Abstract (This thesis won the 2022 The School of Computer Science Alumni Award for Undergraduate Education. Jiayi Zhang is now a Ph.D. student at the Lewis-Sigler Institute for Integrative Genomics, Princeton University.)

  • More publications

Patent

Applications

The most potent use of this technology is in the early detection of Parkinson's and other serious neuromuscular disorders and illnesses. With sufficient discriminative data, it can be applied to detect a plethora of diseases, to break voice disguise, to differentiate synthetic speech from real speech, and in many other profiling applications.

  • AI And Medical Diagnostics: Can A Smartphone App Detect Covid-19 From Speech Or A Cough?. Forbes, 5 May 2020. Article;
  • Do I sound sick to you? Researchers are building AI that would diagnose COVID-19 by listening to people talk. Business Insider, 30 Apr 2020. Article;

Deducing emotional, physiological, psychological and behavioral states and traits from voice

Vocal fold oscillations are not the sole cues that offer insights into the physiological alterations within our body. Subtleties embedded within our vocal production and control (vocal expression) are also linked with our emotional, behavioral, and psychological characteristics. The deduction of such "states" from vocal nuances has been an area of considerable research, spurred by centuries of observation and correlation between speech patterns and these characteristics.

Our work is two-faceted. In one, we join hands with researchers across the globe, contributing to mainstream techniques for deducing target states from voice. In the other, we move beyond these, aiming to discover the underlying traits: correlations that not only apply universally across populations but also relate specifically to an individual and their current state, and correlations with our genetic makeup. Our research aims to develop technologies to analyze diverse human states and traits, thereby enhancing our comprehension of the intricate relationship between vocal features and human characteristics. As an example, we deduce psychological traits from an aggregation of the emotional states of a person -- the emotional spectrum -- and objective measurements of various voice qualities. These are also supported by measurements of low-level features.

Publications

  • Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection. Hira Dhamyal, Bhiksha Raj, and Rita Singh. Proc. Interspeech 2022: 166-170. 2022. pdf
  • More publications (search for "emotion")

Patent

Applications

"Creating geriatric specialists takes time, and we already have far too few. In a year, fewer than three hundred doctors will complete geriatric training in the United States, not nearly enough to replace the geriatricians going into retirement, let alone alone meet the needs of the next decade. Geriatric psychiatrists, nurses, and social workers are equally needed, and in no better supply. The situation in countries outside the United States appears to be little different. In many, it is worse." -- Quoted from Chad Boult, Geriatrics Professor, in "Being Mortal" by A. Gawande, Surgeon, and Professor at Harvard Medical School and Harvard School of Public Health.
Despite the shrinking pool of geriatric specialists, especially in developed nations like the US, and gerontology's economic unattractiveness, emerging technologies promise a lifeline. As the elderly often struggle with self-care and mobility, access to consistent and affordable medical care remains a challenge. This new technology offers remote, cost-effective health monitoring, reducing reliance on others for transport to medical assessments. While it has broad applications, such as proactive public health monitoring, its true power lies in empowering the elderly, economically disadvantaged, and disabled populations in taking control of many aspects of their healthcare.

  • AI And Medical Diagnostics: Can A Smartphone App Detect Covid-19 From Speech Or A Cough?. Forbes, 5 May 2020. Article;
  • Do I sound sick to you? Researchers are building AI that would diagnose COVID-19 by listening to people talk. Business Insider, 30 Apr 2020. Article;