Center for Voice Intelligence and Security

Basic questions about the human voice

Voice, like all sound, is a pressure wave. It exerts physical pressure on things. The pressure physically moves the diaphragm of a microphone, which converts the motion into an electrical signal that electronic devices can sense to "record" the variations. The pressure also moves the eardrum in each ear, triggering electrochemical impulses that are sensed by our brain. Thus the nature of voice as an acoustic wave is well understood. However, there are many aspects of the human voice that are not understood -- aspects that we intuitively sense or believe, but with little or no scientific exploration to support them.

At CVIS, we are working to unravel some of the mysteries of the human voice. Some examples are mentioned below.

The voice of humanity

Have you heard a crowd talking? Ten people at a time? Perhaps a hundred? How about a thousand? How about a thousand people saying the exact same thing at the exact same time? What would that sound like? This is an easy experiment to run, because we have the ability to record sound signals, mix them together and play them back. But how many of us have done that? And what would we hear?
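The mixing step of such an experiment can be sketched in a few lines of Python. This is only a toy illustration, not the method used at CVIS: synthetic sine tones stand in for real recordings, and the sample rate, the 85-255 Hz pitch range, and the helper names `synth_voice`, `mix` and `rms` are all assumptions made for the example.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)


def synth_voice(pitch_hz, phase, duration_s=0.05, sample_rate=SAMPLE_RATE):
    """Return a toy 'voice': a unit-amplitude sine tone at the given pitch and phase."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * pitch_hz * t / sample_rate + phase) for t in range(n)]


def mix(signals):
    """Average equal-length signals sample by sample, so the mix stays within [-1, 1]."""
    count = len(signals)
    return [sum(samples) / count for samples in zip(*signals)]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


# A hypothetical crowd of 1000 "speakers", each with a random pitch and start phase.
rng = random.Random(0)
crowd = [
    synth_voice(rng.uniform(85.0, 255.0), rng.uniform(0.0, 2 * math.pi))
    for _ in range(1000)
]
mixture = mix(crowd)
print(len(mixture), round(rms(mixture), 4))
```

With real recordings, the lists of samples would come from audio files aligned in time. Averaging rather than summing is one simple design choice that keeps the mixture from clipping.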

In the video above, ten thousand people in Japan sang the same lyrics of a song at the same time. Listening to it, the question we ask is this: Are we hearing the average voice of humanity, or are we hearing the average Japanese voice? Is there such a thing as the average Chinese voice, the average Indian voice, the average French voice and so on? Are there as many average voices as there are regions? Or are there as many average voices as there are ethnicities? Countries? Languages? Genders? Ages? Heights...? We started working on discovering some answers, and had to restart with finding the right question to ask!

Publications

Code

Applications

One of the questions asked of voice evidence in crimes is "Who is this person -- who are we looking for?" Understanding the categories to which a voice belongs may help describe the speaker more accurately, or may boost the confidence of techniques that seek to profile a person from their voice.

The uniqueness of human voice

Everything we know about the process of voice production, the physics of sound, mathematics and statistics tells us that the human voice is unique. Our perception indicates otherwise: similar voices seem very common across the globe. Sensing the differences is not important to our survival, so we naturally don't. Can we measure the uniqueness of the human voice? Can we make machines sense and gauge its uniqueness?

We are working on understanding exactly where the differences between human voices lie.

Publications

Applications

Can a machine recognize every human on the planet from their voice? If so, could we walk through airports with just a casual "Hello!" spoken to a machine in a quiet room -- no passports needed? Understanding how to characterize the uniqueness of the human voice quantitatively and reliably will eventually impact the legal definition of a "Voiceprint" -- and what will or will not be accepted in courts of law in relation to voice evidence of any kind.

Voice and our DNA

The human voice reflects the intricate interplay between genetics and individuality. At the core of this lies the blueprint of life itself -- DNA. Within our genetic code, specific genes influence the development and characteristics of our vocal apparatus, shaping the unique timbre, pitch, and range that distinguish our voices. Genetic variations contribute to the diversity of vocal traits observed across individuals and populations. We also know that certain genetic mutations can lead to alterations in vocal production or to voice disorders. It is highly likely that DNA also influences the psychological and behavioral aspects of speech, such as language acquisition, intonation, and emotional expression. Exploring the connection between the human voice and DNA can reveal the genetic mechanisms underlying our vocal abilities, paving the way for a deeper understanding of vocal development and the remarkable individuality that resonates through our voices.

How do we even begin to explore this? Our voice changes subtly each day, responding to the elements that perturb us. This happens because the process of voice production is extremely complex. Many factors within and around us influence it -- including our thoughts, reactions, feelings and other mental processes. We are looking into cytogenetics and genomics to understand the relationships between our DNA and our voice, and thereby the relationship of voice to factors that disturb gene-mediated functions.

Publications

  • A Gene-Based Algorithm for Identifying Factors That May Affect a Speaker's Voice. Rita Singh. 2023. Entropy 25, No. 6: 897. Paper
  • Other publications

Code

Applications

Hundreds of factors influence us each day -- our environment, climate, temperature, what we eat or drink, the physical and mental perturbations we experience due to disease and so on. Changes in voice that result from some of these are not perceived by us. If we could discern -- even without direct observation -- just what factors might cause a change in voice (and the nature of those changes), we could build automated systems to detect (from voice) the factors that are affecting us at any time. This could power many applications that could benefit us. Using genetic knowledge to explore the relationship of various perturbing factors to changes in voice production is one of several powerful approaches that could enable this.

The range of variability of the human voice, and voice disguise

The human voice, individually and collectively, can vary naturally over a wide range. Most of these variations are involuntary, and stem from physiological traits, psychological states, sociolinguistic elements, and environmental conditions. Nevertheless, the extent to which it can vary -- its range of variation -- is not charted. Interestingly, this lack of understanding also extends to voice disguise, raising questions about the limits of vocal manipulation. Mimicry is a fine art. However, as we observe voice mimics (as in the example given above), interesting questions arise that currently remain unanswered. Some questions are: How pervasive are the changes in voice during impersonation? Is it possible to actually erase or infuse (false) biometric information in voice through these changes? From both production and perception perspectives, just how far can we go with voice disguise? How much of our voice, and which aspects of it, can we change?

The variability of the human voice, which is a complex blend of identity, emotion, individual variation and a multitude of other factors, remains an area of limited scholarly understanding. Despite our ability to alter pitch, tone, and accent, the underlying physical, physiological and psychological patterns may yet provide discernible signatures. Our effort to understand (and one day to chart) the variability of the human voice examines voice in a wide range of situations -- ranging from cattle auctioneering, to stage performances by voice actors and mimics, to character portrayals done by professional actors -- and brings to bear on them principles from scientific fields such as signal processing, psychoacoustics, linguistics, wave acoustics, psychology, and much more.

...

Publications

  • Formant manipulations in voice disguise by mimicry. Rita Singh, Deniz Gencaga, and Bhiksha Raj. 2016. In 2016 4th International Workshop on Biometrics and Forensics (IWBF). Paper
  • (Won the best paper award)

  • Other publications

Applications

Identifying the elements of our voice that are voluntarily changeable, and those that aren't under our voluntary control, can help refine algorithms used in both voice intelligence and security analysis, leading to more precise results. For example, recognizing the immutable biometric features of the voice can enable stronger authentication and more accurate profiling.