Center for Voice Intelligence and Security

Voice Security

Protecting our voiceprints

We make judgments about people from their voices all the time. Very soon, machines will analyze our voices and know more about us than we do -- because they will be able to make far finer-grained, far more accurate judgments about us (and about what influences us at the time of speaking) from our voice.

Voice is a potent biometric -- just like fingerprints and DNA. However, unlike fingerprints and DNA, it also carries information about us, about our environment, and about other factors at the time of speaking. Given the uniqueness of our voice, and given that an ever-increasing amount of information can be derived from it, the question to ponder is: how will we protect our voices in the future? Will it be possible to de-identify voices? From what we know so far, that is akin to asking "can we remove DNA from blood?" If not, then what are our options?

Our work on voice security focuses on exploring and developing some viable options, while at the same time building technologies for preventing the abuse and misuse of voice if these options prove to be insufficient. Some of our work is described on this page.

Privacy-preserving voice processing


Privacy-preserving voice processing is focused on safeguarding personal voice data from unauthorized access. It employs techniques such as homomorphic encryption, differential privacy, and secure multi-party computation to process voice data securely.

Homomorphic encryption allows operations directly on encrypted data, delivering results that, when decrypted, match operations as if performed on the raw data. Differential privacy offers statistical accuracy in data analysis without revealing information specific to individuals. Secure multi-party computation permits multiple parties to compute collective data results without revealing individual inputs. These methods, combined with voice biometrics and anonymization techniques, ensure that voice data can be processed and analyzed without compromising the privacy of the speaker.
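As a toy sketch of one of these techniques, the example below applies the Laplace mechanism from differential privacy to release a cohort-level voice statistic. The pitch values, clipping range, and privacy budget are all hypothetical, chosen only for illustration:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a statistic with Laplace noise calibrated for
    epsilon-differential privacy."""
    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Hypothetical cohort: average pitch (Hz) of five speakers.
pitches = np.array([118.0, 210.5, 175.2, 132.8, 198.1])
true_mean = pitches.mean()

# Sensitivity of the mean: replacing one speaker can shift it by at
# most (max - min) / n, assuming pitch is clipped to [50, 400] Hz.
sensitivity = (400.0 - 50.0) / len(pitches)

private_mean = laplace_mechanism(true_mean, sensitivity, epsilon=1.0)
print(true_mean)      # exact statistic -- kept private
print(private_mean)   # noisy release -- safer to publish
```

Shrinking epsilon increases the noise, trading statistical accuracy for a stronger privacy guarantee.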

Publications

  • Privacy-Preserving Machine Learning for Speech Processing. Manas Pathak. PhD thesis, Carnegie Mellon University, March 2012. pdf
  • Privacy-preserving Frameworks for Speech Mining. Jose Portelo. PhD thesis, Universidade de Lisboa, October 2015. pdf
  • An Information Theoretic Approach for Privacy Preservation in Distance-based Machine Learning. Abelino Jimenez. PhD thesis, Carnegie Mellon University, June 2019. pdf
  • More information and related publications

Applications

Privacy-preserving voice processing is vital to securing the confidentiality and integrity of applications such as voice assistants, call centers, and telecommunication services.


  • The computers are listening. The Intercept, 11 May 2015. Article
  • Sound familiar? The Economist, 13 Sep 2012. Article

Technologies for adversarial robustness


Adversarial attacks involve intentional alterations in the input data, aimed at causing specific types of erroneous outputs that suit the purposes of the adversary. In systems that deal with speech, these attacks can manipulate speech signals subtly, deceiving Automatic Speech Recognition (ASR) systems, speaker recognition models, or voice biometric security systems into producing desired (wrong) outputs. Technologies for adversarial robustness are focused on counteracting these threats.

Adversarial robustness focuses on enhancing system defenses via techniques such as adversarial training, which augments training data with adversarial examples, and defensive distillation, a process that makes models less sensitive to input perturbations. Gradient masking -- limiting how much useful gradient information an attacker can extract from the model -- can also fortify systems against attacks. Generative models such as Generative Adversarial Networks (GANs) can additionally be employed to produce synthetic speech data that makes training more robust.
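As a minimal sketch of the kind of attack these defenses must withstand, the example below applies the Fast Gradient Sign Method (FGSM) to a toy linear classifier over acoustic features; adversarial training would simply fold perturbed inputs like `x_adv` back into the training set. All weights and feature values here are invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """FGSM: nudge the input by eps in the sign of the gradient of the
    log-loss with respect to the input, which for a logistic model
    is (p - y) * w."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Hypothetical fixed classifier over 4 acoustic features.
w = np.array([1.5, -2.0, 0.5, 1.0])
b = -0.2
x = np.array([0.4, -0.3, 0.8, 0.1])   # clean input, predicted class 1
y = 1.0

x_adv = fgsm_perturb(x, y, w, b, eps=0.6)
print(sigmoid(w @ x + b) > 0.5)      # clean prediction: True
print(sigmoid(w @ x_adv + b) > 0.5)  # flipped by the perturbation: False
```

Real attacks on ASR and speaker recognition operate the same way in principle, but against deep networks and with perturbations shaped to stay inaudible.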

Publications

  • Assessing and enhancing adversarial robustness in context and applications to speech security. Raphael Olivier. PhD thesis, Carnegie Mellon University, July 2023. pdf
  • More publications (search for "privacy" or "security")

Applications

As the adoption of speech processing systems like smart voice assistants and voice-controlled IoT devices increases, ensuring their adversarial robustness becomes paramount to maintain trust and ensure secure, accurate operations.


AI systems for transformation, generation and detection of synthetic voices


Synthetic voice generation models learn from vast quantities of voice data to produce human-like speech from text. They synthesize an audio waveform that matches human speech patterns and tones, yielding synthetic voices that can closely resemble real human speech.

While synthetic voice technologies offer significant benefits in domains like entertainment, virtual assistants and accessibility, they also present challenges. Their capacity to create realistic, human-like speech has led to the rise of 'deepfakes' -- synthetic media where a person's voice is replicated with high accuracy. Such technology can be misused for misinformation, fraud, or cybercrime, making an individual appear to say things they never did. This raises significant privacy and security concerns. We are working to develop technologies that can detect synthetic speech.
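Deployed detectors use neural networks trained on subtle production artifacts; the toy sketch below only illustrates the general idea of scoring a spectral statistic that separates an unnaturally clean signal from one with natural variability. The signals are synthetic stand-ins, not real or deepfaked speech:

```python
import numpy as np

def spectral_flatness(x):
    """Geometric mean over arithmetic mean of the power spectrum;
    near 0 for a pure tone, near 1 for white noise."""
    power = np.abs(np.fft.rfft(x)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(power))) / np.mean(power)

# Toy signals standing in for natural and machine-generated frames.
t = np.linspace(0, 1, 8000, endpoint=False)
rng = np.random.default_rng(1)
natural_like = np.sin(2 * np.pi * 120 * t) + 0.3 * rng.normal(size=t.size)
suspiciously_clean = np.sin(2 * np.pi * 120 * t)

# A detector could flag frames whose flatness is implausibly low.
print(spectral_flatness(suspiciously_clean) < spectral_flatness(natural_like))
```

In practice no single hand-crafted statistic is reliable; research such as the work cited under Publications models the differences between human and machine speech with learned features.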

Publications

  • Audio Deepfake Detection Based on Differences in Human and Machine Generated Speech. Yang Gao. PhD thesis, Carnegie Mellon University, Jan 2023. pdf
  • More publications (search for "privacy" or "security")

Applications

These tools would help verify the authenticity of audio content, combat the misuse of deepfake technologies, and uphold trust in digital communications.


Voice steganography


Steganography is the art of hiding information. In cryptography, information is hidden in plain sight -- the encrypted message is visible, but is often impossible to decrypt without the right keys. In steganography, the very existence of the hidden information is concealed. We are using AI techniques to find ways to hide information imperceptibly in voice signals, and also for steganalysis -- detecting hidden information in voice signals.
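A minimal sketch of the classical, pre-neural idea: hiding message bits in the least-significant bits (LSBs) of 16-bit PCM samples, so the waveform changes by at most one quantization step. The carrier and message below are synthetic stand-ins:

```python
import numpy as np

def embed_bits(samples, bits):
    """Hide bits in the least-significant bit of 16-bit PCM samples."""
    carrier = samples.copy()
    carrier[: len(bits)] = (carrier[: len(bits)] & ~1) | bits
    return carrier

def extract_bits(samples, n):
    """Read the hidden bits back out of the first n samples."""
    return samples[:n] & 1

# Hypothetical carrier: a short burst of 16-bit audio samples.
rng = np.random.default_rng(7)
audio = rng.integers(-2000, 2000, size=64, dtype=np.int16)

message = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.int16)
stego = embed_bits(audio, message)

recovered = extract_bits(stego, len(message))
print(np.array_equal(recovered, message))   # message survives: True
print(int(np.max(np.abs(stego - audio))))   # distortion of at most 1 LSB
```

LSB embedding is easy to detect and destroy (e.g. by lossy compression), which is precisely why neural approaches like the one cited below aim for payloads that are both imperceptible and harder to find.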

Publications

  • Hide and speak: Deep neural networks for speech steganography. Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, and Joseph Keshet. Interspeech 2020. pdf and code
  • More publications

Applications

These tools could support audio watermarking and provenance verification, enable authorized covert communication, and help detect hidden payloads embedded in voice signals.


Voice authentication


Voice authentication leverages the biometric characteristics of an individual's voice to verify a speaker. It serves as a secure and convenient alternative to traditional passwords and PIN-based systems, and is currently used in banking, customer service, smart homes, and mobile device security in conjunction with other biometric authentication methods, such as fingerprint and face recognition.

Voice authentication systems work by extracting features from the user's speech that capture the essence of the speaker's unique identity. These features constitute a unique voiceprint for each speaker. During authentication, the system compares features extracted from the user's voice against the enrolled voiceprint to verify their identity. While voice authentication provides ease of use and increased security, it is not foolproof. Background noise, illness, aging, and advanced synthetic voice technologies can affect its accuracy. Furthermore, privacy concerns arise over the storage and potential misuse of biometric data, necessitating robust data protection measures.
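To make the matching step concrete, here is a minimal sketch of voiceprint comparison by cosine similarity. The embeddings, their dimensionality, and the threshold are all hypothetical stand-ins for what a neural speaker encoder would produce:

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two speaker embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enrolled_voiceprint, test_embedding, threshold=0.7):
    """Accept the claimed identity if the embeddings are similar enough."""
    return cosine_score(enrolled_voiceprint, test_embedding) >= threshold

# Hypothetical 4-dimensional embeddings (real systems use hundreds
# of dimensions produced by a trained speaker encoder).
enrolled = np.array([0.9, 0.1, 0.4, 0.2])
same_speaker = np.array([0.85, 0.15, 0.35, 0.25])
impostor = np.array([-0.3, 0.9, 0.1, -0.6])

print(verify(enrolled, same_speaker))  # genuine attempt: True
print(verify(enrolled, impostor))      # impostor attempt: False
```

The threshold sets the trade-off between false acceptances and false rejections, and is tuned on held-out trials rather than fixed a priori.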

At CVIS, we continue to work on cutting-edge biometric security technologies for voice authentication.

Publications

  • Optimizing Neural Network Embeddings Using a Pair-Wise Loss for Text-Independent Speaker Verification. Hira Dhamyal, Tianyan Zhou, Bhiksha Raj, Rita Singh. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 742-748. IEEE, 2019. pdf
  • Masked proxy loss for text-independent speaker verification. Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, and Rita Singh. In INTERSPEECH 2021, Brno, Czechia (Czech Republic). 2021. pdf
  • More publications

Applications

If voice verification systems become sufficiently accurate and secure, procedures such as entry through airports and other secure areas may become far more efficient and convenient.