Center for Voice Intelligence and Security

Profiling Humans from their Voice
Rita Singh
First published: July 2019
Publisher: Springer, Singapore
Copyright 2019 Springer-Nature, Switzerland, July 2019
ISBN: 978-981-13-8402-8
Also available on springer.com, other bookstores and eBay.

Chapter citations:

Profiling and its facets, Rita Singh in Profiling humans from their voice, Ch.1, pp.3-26, Springer, July 2019. pdf
Production and perception of voice, Rita Singh in Profiling humans from their voice, Ch.2, pp.27-83, Springer, July 2019. pdf
Relations Between voice and profile parameters, Rita Singh in Profiling humans from their voice, Ch.3, pp.85-131, Springer, July 2019. pdf
The voice Signal and its information content - 1, Rita Singh in Profiling humans from their voice, Ch.4, pp.133-169, Springer, July 2019. pdf
The voice Signal and its information content - 2, Rita Singh in Profiling humans from their voice, Ch.5, pp.171-220, Springer, July 2019. pdf
Qualitative aspects of the voice signal, Rita Singh in Profiling humans from their voice, Ch.6, pp.221-266, Springer, July 2019. pdf
Feature engineering for profiling, Rita Singh in Profiling humans from their voice, Ch.7, pp.269-298, Springer, July 2019. pdf
Mechanisms for profiling, Rita Singh in Profiling humans from their voice, Ch.8, pp.299-324, Springer, July 2019. pdf
Reconstruction of the human persona in 3D from voice, and its reverse, Rita Singh in Profiling humans from their voice, Ch.9, pp.325-363, Springer, July 2019. pdf
Applied profiling: Uses, reliability and ethics, Rita Singh in Profiling humans from their voice, Ch.10, pp.365-405, Springer, July 2019. pdf

Techniques for Noise Robustness in Automatic Speech Recognition
Tuomas Virtanen, Rita Singh, Bhiksha Raj (Eds)
First published: 5 October 2012
Copyright 2013 John Wiley & Sons, Ltd
Print ISBN: 9781119970880 | Online ISBN: 9781118392683 | DOI: 10.1002/9781118392683

Research papers

2025

Soham Deshmukh, Shuo Han, Hazim Bukhari, Benjamin Elizalde, Hannes Gamper, Rita Singh, and Bhiksha Raj. "Audio Entailment: Assessing Deductive Reasoning for Audio Understanding." Proceedings of the AAAI Conference on Artificial Intelligence, AAAI 2025. pdf
Satvik Dixit, Soham Deshmukh, and Bhiksha Raj. "MACE: Leveraging Audio for Evaluating Audio Captioning Systems." Under review (IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), SALMA Workshop), 2025.

2024

Abdul Waheed, Hanin Atwany, Bhiksha Raj, and Rita Singh. "What Do Speech Foundation Models Not Learn About Speech?" arXiv preprint arXiv:2410.12948, 2024. pdf
Abdul Waheed, Karima Kadaoui, Bhiksha Raj, and Muhammad Abdul-Mageed. "uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes." arXiv preprint arXiv:2407.01257, 2024. pdf
Abdul Waheed, Karima Kadaoui, and Muhammad Abdul-Mageed. "To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation." arXiv preprint arXiv:2406.04512, 2024. pdf
Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, and Olivier Siohan. "Audio-Visual Fine-Tuning of Audio-Only ASR Models." arXiv preprint arXiv:2312.09369, 2023. pdf
Benjamin Elizalde, Soham Deshmukh, Huaming Wang. "Natural language supervision for general-purpose audio representations." ICASSP 2024. pdf
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Raj, Shady Shehata, Hung-yi Lee. "Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech." arXiv preprint arXiv:2410.08271, 2024. pdf
Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Bhiksha Raj, and Rita Singh. "Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech." INTERSPEECH 2024, Kos, Greece. 2024. pdf
Fan Yang, Muqiao Yang, Xiang Li, Yuxuan Wu, Zhiyuan Zhao, Bhiksha Raj, Rita Singh. "A Closer Look at Reinforcement Learning-based Automatic Speech Recognition." In Computer Speech & Language (Elsevier). pdf
Hao Chen, Abdul Waheed, Xiang Li, Yidong Wang, Jindong Wang, Bhiksha Raj, and Marah I. Abdin. "On the Diversity of Synthetic Data and Its Impact on Training Large Language Models." arXiv preprint arXiv:2410.15226, 2024. pdf
Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, and Bhiksha Raj. "A General Framework for Learning from Weak Supervision." Proceedings of the 41st International Conference on Machine Learning (ICML), 2024. pdf
Hazim Bukhari, Soham Deshmukh, Hira Dhamyal, Bhiksha Raj, and Rita Singh. "SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios." Interspeech, 2024. pdf
Hira Dhamyal and Rita Singh. "Objective Measurements of Voice Quality." arXiv preprint (To be uploaded).
Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Haizhou Wang, Bhiksha Raj, and Rita Singh. "Prompting Audios Using Acoustic Properties for Emotion Representation." ICASSP, 2024. pdf
Hira Dhamyal, Bhiksha Raj, and Rita Singh. "Understanding Personality Bases." (To be submitted).
Jiayi Zhang and Rita Singh. "Vocal Fold Dynamics for Automatic Detection of Amyotrophic Lateral Sclerosis from Voice." Proceedings of the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. pdf
Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, and Shinji Watanabe. "AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models." Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. pdf
Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, and Shinji Watanabe. "ESPNET-CODEC: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech." Proceedings of SLT, 2024 (Published). pdf
Joseph Konan, Dareen Alharthi, Massa Baali, Shuo Han, Rita Singh, and Bhiksha Raj. "TIMIT-VC: A Voice Clone Detection Dataset with Multimodal Feature Extraction." (IRB Pending)
Joseph Konan, Shikhar Agnihotri, Ojas Bhargave, Shuo Han, Yunyang Zeng, Ankit Shah, and Bhiksha Raj. "Psychoacoustic Challenges of Speech Enhancement on VoIP Platforms." SynData4GenAI Workshop at Interspeech, 2024. pdf
Kai Qiu, Xiang Li, Hao Chen, Jie Sun, Jinglu Wang, Zhe Lin, Marios Savvides, and Bhiksha Raj. "AAR: Efficient Autoregressive Audio Modeling via Next-Scale Prediction." arXiv preprint arXiv:2408.09027, 2024. pdf
Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Yining Li, Kai Chen, Zhiqiang Shen, Matt Fredrikson. "Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization." The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024). pdf
Khai Duy Doan, Abdul Waheed, and Muhammad Abdul-Mageed. "Towards Zero-Shot Text-To-Speech for Arabic Dialects." 2024. pdf
Ksheeraja Raghavan, Samiran Gode, Ankit Shah, Surabhi Raghavan, Wolfram Burgard, Bhiksha Raj, and Rita Singh. "Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection." 2024. pdf
Kuang Yuan, Shuo Han, Swarun Kumar, and Bhiksha Raj. "DeWinder: Single-Channel Wind Noise Reduction using Ultrasound Sensing." Proceedings of Interspeech, 2024. pdf
Massa Baali, Abdulhamid Aldoobi, Hira Dhamyal, Rita Singh, and Bhiksha Raj. "PDAF: A Phonetic Debiasing Attention Framework for Speaker Verification." In Proc. IEEE Spoken Language Technology Workshop (SLT 2024), Macao, China. pdf
Minghao Liu, Zonglin Di, Jiaheng Wei, Zhongruo Wang, Hengxiang Zhang, Ruixuan Xiao, Haoyu Wang, Jinlong Pang, Hao Chen, Ankit Shah, Hongxin Wei, Xinlei He, Zhaowei Zhao, Haobo Wang, Lei Feng, Jindong Wang, James Davis, and Yang Liu. "Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond." arXiv preprint arXiv:2408.11338, 2024. pdf
Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, and Alham Fikri Aji. "LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions." Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024. pdf
Muhammad Ahmed Shah and Bhiksha Raj. "Fixed Inter-Neuron Covariability Induces Adversarial Robustness." Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024), 2024. pdf
Muhammad Ahmed Shah and Bhiksha Raj. "Revisiting Acoustic Features for Robust ASR." arXiv preprint arXiv:2409.16399, 2024 (Submitted to ICASSP). pdf
Muhammad Ahmed Shah, David Solans, Mika Heikkila, Bhiksha Raj, and Nicolas Kourtellis. "Speech Robust Bench: A Robustness Benchmark for Speech Recognition." arXiv preprint arXiv:2403.07937, 2024 (Submitted to ICLR). pdf
Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu. "Usee: Unified Speech Enhancement and Editing with Conditional Diffusion Models." In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024). pdf
Muqiao Yang, Umberto Cappellazzo, Xiang Li, Bhiksha Raj. "Improving Continual Learning of Acoustic Scene Classification via Mutual Information Optimization." In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024), pp. 7105-7109. 2024. pdf
Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj. "Towards Unified Evaluation of Continual Learning in Spoken Language Understanding." In Proc. Annual Conference of the International Speech Communication Association (Interspeech 2024). pdf
Oscar Chang, Hank Liao, Dmitriy Serdyuk, Ankit Shah, and Olivier Siohan. "Conformer is All You Need for Visual Speech Recognition." Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. pdf
Roshan Sharma, Hira Dhamyal, Mark Lindsey, Suwon Shon, Rita Singh, and Bhiksha Raj. "Speech versus Transcript: Does It Matter for Human Annotators in Speech Summarization?" ACL, 2024. pdf
Roshan Sharma, Ruchira Sharma, Hira Dhamyal, and Bhiksha Raj. "R-BASS: Relevance Aware Blockwise Adaptation for Speech Summarization." Findings of NAACL, 2024. pdf
Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe. "On the Evaluation of Speech Foundation Models for Spoken Language Understanding." Findings of ACL, 2024. pdf
Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, and Huaming Wang. "PAM: Prompting Audio-Language Models for Audio Quality Assessment." Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2024. pdf
Soham Deshmukh, Rita Singh, Bhiksha Raj. "Domain Adaptation for Contrastive Audio-Language Models." INTERSPEECH 2024, Kos, Greece. 2024. pdf
Soham Deshmukh, Benjamin Elizalde, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang. "Training Audio Captioning Models without Audio." ICASSP 2024. pdf
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, and Hsiu-Hsuan Wang. "Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study." 2024. pdf
Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Jiuxiang Gu, Jindong Wang, Zhe Lin, and Bhiksha Raj. "XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation." arXiv preprint arXiv:2412.01762, 2024. pdf
Xiang Li, Hao Chen, Kai Qiu, Jason Kuen, Jiuxiang Gu, Bhiksha Raj, and Zhe Lin. "ImageFolder: Autoregressive Image Generation with Folded Tokens." arXiv preprint arXiv:2410.01756, 2024. pdf
Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj. "QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition." 2024. pdf
Xiang Li, Jinglu Wang, Xiaohao Xu, Rita Singh, Yan Lu, Bhiksha Raj. "Rethinking Audiovisual Segmentation with Semantic Quantization and Decomposition." The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). pdf
Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Zhe Lin, Rita Singh, and Bhiksha Raj. "ControlVAR: Exploring Controllable Visual Autoregressive Modeling." 2024. pdf
Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazaki, Hao Chen, Xiaonan Huang, Bhiksha Raj. "R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations." The European Conference on Computer Vision (ECCV 2024). pdf
Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, and Zicheng Liu. "Completing Visual Objects via Bridging Generation and Segmentation." Proceedings of the 41st International Conference on Machine Learning (ICML), 2024. pdf
Xukun Zhou, Jiwei Li, Tianwei Zhang, Lingjuan Lyu, Muqiao Yang, Jun He. "Backdoor Attacks with Input-Unique Triggers in NLP." In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2024). pdf
Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj, Huaxiu Yao. "AutoPRM: Self-supervised Fine-grained Feedback for Multi-Step Reasoning via Controllable Question Decomposition." The 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). pdf
Zhongweiyang Xu, Yong Xu, Vinay Kothapally, Heming Wang, Muqiao Yang, Dong Yu. "SpatialCodec: Neural Spatial Speech Coding." In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024). pdf

2023

Ankit Shah, Shuyi Chen, Kejun Zhou, Yue Chen, Bhiksha Raj. "Approach to Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization and Constant-Q Transforms." 2023. pdf
Ankit Shah, Larry Tang, Po Hao Chou, Yi Yu Zhang, Ziqian Ge, Bhiksha Raj. "An Approach to Ontological Learning from Weak Labels." 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023).
Benjamin Elizalde, Soham Deshmukh, Mahmoud Al Ismail, Huaming Wang. "Clap: learning audio concepts from natural language supervision." ICASSP 2023. pdf
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Raj, Shady Shehata, Hung-yi Lee, "Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech." arXiv preprint arXiv:2309.09510 (2023). pdf
Francisco Teixeira, Alberto Abad, Bhiksha Raj and Isabel Trancoso. "Privacy-oriented manipulation of speaker representations." In IEEE Access, vol. 12, pp. 82949-82971, 2024, doi: 10.1109/ACCESS.2024.3409067. pdf
Francisco Teixeira, Alberto Abad, Bhiksha Raj, and Isabel Trancoso. "Privacy-Preserving Automatic Speaker Diarization." In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023. pdf
Hao Chen, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Bhiksha Raj, and Marios Savvides. "Softmatch: Addressing the quantity-quality trade-off in semi-supervised learning." 2023 International Conference on Learning Representations (ICLR 2023). 2023. pdf
Hao Chen, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Bhiksha Raj, and Marios Savvides. "Softmatch: Addressing the quantity-quality trade-off in semi-supervised learning." arXiv preprint arXiv:2301.10921 (2023). pdf
Hao Chen, Jindong Wang, Ankit Shah, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, and Bhiksha Raj. "Understanding and mitigating the label noise in pre-training on downstream tasks." arXiv preprint arXiv:2309.17002 (2023).
Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, and Rita Singh. "Prompting Audios Using Acoustic Properties For Emotion Representation." arXiv preprint arXiv:2310.02298 (2023).
Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Shuo Han, Yunyang Zeng, Ankit Shah, and Bhiksha Raj. "Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms." arXiv preprint arXiv:2310.07161 (2023).
Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Hojeong Lee, Ankit Shah, Shuo Han, Yunyang Zeng, Amanda Shu, Haohui Liu, Xuankai Chang, Hamza Khalid, Minseon Gwak, Kawon Lee, Minjeong Kim, Bhiksha Raj. "Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms." arXiv:2303.09048v1 pdf
Kandaswamy Paramasivan, Bhiksha Raj, Nandan Sudarasanam, Rahul Subburaj. "Prolonged school closure during the pandemic time in successive waves of COVID-19- vulnerability of children to sexual abuses – A case study in Tamil Nadu, India." Heliyon 9 (2023) e1786, Cell Press. Article

Khoa Vo, Sang Truong, Kashu Yamazaki, Bhiksha Raj, Minh-Triet Tran, and Ngan Le. "AOE-Net: Entities interactions modeling with adaptive attention mechanism for temporal action proposals generation." International Journal of Computer Vision 131, No. 1 (2023): 302-323. pdf
Kashu Yamazaki, Khoa Vo, Quang Sang Truong, Bhiksha Raj, and Ngan Le. "VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 3, pp. 3081-3090. 2023. pdf
Laurie M. Heller, Benjamin Elizalde, Bhiksha Raj, and Soham Deshmukh. "Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session." arXiv preprint arXiv:2302.09719 (2023). pdf
Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Dareen Alharthi, Hazim T. Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, and Rita Singh. "LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model." arXiv preprint arXiv:2310.04445 (2023).
Muhammad Ahmed Shah, Bhiksha Raj. "Training on Foveated Images Improves Robustness to Adversarial Attacks." Neural Information Processing Systems (NeurIPS) 2023. pdf
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, and Takuya Yoshioka. "Simulating realistic speech overlaps improves multi-talker ASR." In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023. pdf
Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj. "PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement." In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023. pdf
Raphael Olivier and Bhiksha Raj. "How many perturbations break this model? Evaluating robustness beyond adversarial accuracy." The Fortieth International Conference on Machine Learning (ICML) 2023. pdf
Raphael Olivier, Hadi Abdullah and Bhiksha Raj. "Transferable Adversarial Perturbations between Self-Supervised Speech Recognition Models." 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning, 2023. pdf
Rita Singh. 2023. "A Gene-Based Algorithm for Identifying Factors That May Affect a Speaker's Voice" Entropy 25, No. 6: 897. pdf
Roshan Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe, Rita Singh, Bhiksha Raj, "Introducing the Interview dataset and benchmarking methods for speech summarization", Automatic Speech Recognition and Understanding Workshop (ASRU), Taiwan. 2023.
Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj, "BASS: Block-wise Adaptation for Speech Summarization", In Proc. Interspeech 2023. Dublin, Ireland 2023. pdf
Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon An, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer, "Augmenting text for spoken language understanding with Large Language Models." arXiv preprint arXiv:2309.09390 (2023). pdf
Roshan Sharma, Weipeng He, Ju Lin, Egor Lakomkin, Yang Liu and Kaustubh Kalgaonkar. "Egocentric Audio-Visual Noise Suppression.", In Proc. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes. 2023. pdf
Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj. "BASS: Block-wise Adaptation for Speech Summarization." arXiv preprint arXiv:2307.08217 (2023).
Shentong Mo, Bhiksha Raj. "Weakly-Supervised Audio-Visual Segmentation." Neural Information Processing Systems (NeurIPS) 2023.
Samiran Gode, Supreeth Bare, Bhiksha Raj, and Hyungon Yoo. "Understanding Political Polarisation using Language Models: A dataset and method." arXiv preprint arXiv:2301.00891 (2023). pdf
Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang. "Pengi: An Audio Language Model for Audio Tasks." Neural Information Processing Systems (NeurIPS) 2023. code and paper
Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei Yu Wu, Hung-yi Lee, Karen Livescu, and Shinji Watanabe. "SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks." In Proc. 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Toronto, Canada. 2023. pdf
Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura, and Shinji Watanabe, "Speech summarization of long spoken document: improving memory efficiency of speech/text encoders", In Proc. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes. 2023. pdf
Thanh-Dat Truong, Ngan Hoang Le, Bhiksha Raj, Jackson Cothren, Khoa Luu. "FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding." Conference on Computer Vision and Pattern Recognition (CVPR), 2023. pdf
Thanh-Dat Truong, Hoang-Quan Nguyen, Bhiksha Raj, Khoa Luu. "Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments." Neural Information Processing Systems (NeurIPS) 2023.
Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, and Bhiksha Raj. "Continual Contrastive Spoken Language Understanding." arXiv preprint arXiv:2310.02699 (2023).
Wayne Zhao and Rita Singh. "Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation." Entropy 2023, 25(7), 1039; Special issue on Information-Theoretic Approaches in Speech Processing and Recognition. 2023. pdf
Xiang Li, Jinglu Wang, Xiaohao Xu, Muqiao Yang, Fan Yang, Rita Singh, Bhiksha Raj. "Towards Noise-Tolerant Speech-Referring Video Object Segmentation: Bridging Speech and Text." Empirical Methods in Natural Language Processing (EMNLP) 2023.
Xiang Li, Chung-Ching Lin, Yinpeng Chen, Zicheng Liu, Jinglu Wang, Bhiksha Raj, Rita Singh. "PaintSeg: Painting Pixels for Training-free Segmentation." Neural Information Processing Systems (NeurIPS) 2023. pdf
Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj. "Rethinking Voice-Face Correlation: A Geometry View." Proceedings of the 31st ACM International Conference on Multimedia (ACM-Multimedia), 2023. pdf
Xiang Li, Yinpeng Chen, Chung-Ching Lin, Rita Singh, Bhiksha Raj, Zicheng Liu. "Completing Visual Objects via Bridging Generation and Segmentation." arXiv preprint arXiv:2310.00808 (2023).
Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj. "Rethinking Audiovisual Segmentation with Semantic Quantization and Decomposition." arXiv preprint arXiv:2310.00132 (2023).
Xiang Li, Jinglu Wang, Xiaohao Xu, Xiao Li, Bhiksha Raj and Yan Lu. "Robust Referring Video Object Segmentation with Cyclic Structural Consensus." 2023 International Conference on Computer Vision (ICCV). 2023.
Xiang Li, Haoyuan Cao, Shijie Zhao, Junlin Li, Li Zhang, Bhiksha Raj. "Panoramic Video Salient Object Detection with Ambisonic Audio Guidance." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, No. 2, pp. 1424-1432. pdf
Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, and Hsiu-Hsuan Wang, "Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study." arXiv preprint arXiv:2309.15800 (2023). pdf
Yandong Wen, Weiyang Liu, Yao Feng, Bhiksha Raj, Rita Singh, Adrian Weller, Michael Black and Bernhard Scholkopf. "Pairwise Similarity is SimPLE." 2023 International Conference on Computer Vision (ICCV). 2023.
Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Zhen Wu, and Jindong Wang. "Freematch: Self-adaptive thresholding for semi-supervised learning." 2023 International Conference on Learning Representations (ICLR 2023). 2023. pdf
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jeeweon Jung, Soumi Maiti, Shinji Watanabe, "Reproducing Whisper Training using an Open-Source tool and Public Data", Automatic Speech Recognition and Understanding Workshop (ASRU), Taiwan. 2023.
Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj. "TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement." 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023).
Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj. "Token Prediction as Implicit Classification to Identify LLM-Generated Text." Empirical Methods in Natural Language Processing (EMNLP) 2023. pdf

2022

Yandong Wen. "Reconstruction of Human Faces from Voice." PhD Thesis, Carnegie Mellon University. May 2022. pdf
Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj. "Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction." In Proceedings of the 39th International Conference on Machine Learning (ICML 2022), Expressive Vocalizations Workshop and Competition. 2022. pdf
Sunit Sivasankaran, Chenda Li, Takuya Yoshioka. "Exploring Pre-training and Self-training for Noise Robust ASR." Proc. Interspeech 2022.
Yang, Muqiao, Ian Lane, and Shinji Watanabe. "Online continual learning of end-to-end speech recognition models." In Proc. Interspeech 2022, pp. 2668-2672. doi: 10.21437/Interspeech.2022-11093. pdf
Yang, Muqiao, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj. "Improving speech enhancement through fine-grained speech characteristics." In Proc. Interspeech 2022, pp. 2953-2957. doi: 10.21437/Interspeech.2022-11161. pdf
Vuong, Tyler, Nikhil Madaan, Rohan Panda, and Richard M. Stern. "Investigating the Important Temporal Modulations for Deep-Learning-Based Speech Activity Detection." In 2022 IEEE Spoken Language Technology Workshop (SLT), pp. 525-531. IEEE, 2023.
Li, Xiang, Jinglu Wang, Xiao Li, and Yan Lu. "Hybrid instance-aware temporal fusion for online video instance segmentation." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, pp. 1429-1437. 2022. pdf
Li, Xiang, Jinglu Wang, Xiao Li, and Yan Lu. "Video instance segmentation by instance flow assembly." IEEE Transactions on Multimedia (2022). doi: 10.1109/TMM.2022.3222643. pdf
Zhao, Yizhou, Xun Guo, and Yan Lu. "Semantic-aligned fusion transformer for one-shot object detection." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7601-7611. 2022. doi: 10.1109/CVPR52688.2022.00745. pdf
Zhao, Yizhou, Zhenyang Li, Xun Guo, and Yan Lu. "Alignment-guided temporal attention for video action recognition." Advances in Neural Information Processing Systems 35 (2022): 13627-13639. 2022. pdf
Sharma, Roshan, Shruti Palaskar, Alan W. Black, and Florian Metze. "End-to-end speech summarization using restricted self-attention." In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8072-8076. IEEE, 2022. doi: 10.1109/ICASSP43922.2022.9747320.
Sharma, Roshan, and Bhiksha Raj. "Cross-utterance context for multimodal video transcription." In 2022 56th Asilomar Conference on Signals, Systems, and Computers, pp. 1321-1325. IEEE, 2022. doi: 10.1109/IEEECONF56349.2022.10052073.
Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang. "Usb: A unified semi-supervised learning benchmark for classification." Advances in Neural Information Processing Systems 35 (2022): 3938-3961. pdf
Wang, Yidong, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang et al. "Usb: A unified semi-supervised learning benchmark." arXiv preprint arXiv:2208.07204 (2022). pdf
Chen, Hao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Marios Savvides, and Bhiksha Raj. "An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning." arXiv preprint arXiv:2211.11086 (2022). pdf
Olivier, Raphael, and Bhiksha Raj. "Recent improvements of asr models in the face of adversarial attacks." In Proc. Interspeech 2022. doi: 10.21437/Interspeech.2022-400. pdf
Shah, A., Singh, R., Raj, B. "On learning representations for automatic segmentation of histopathology images." ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1728-1732. 2022. doi: 10.1109/ICASSP43922.2022.9747520.
Shah, Ankit, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, and Chiori Hori. "Audio-visual scene-aware dialog and reasoning using audio-visual transformers with joint student-teacher learning." In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7732-7736. IEEE, 2022. pdf
Shah, Ankit Parag, Takaaki Hori, Jonathan Le Roux, and Chiori Hori. "DSTC10-AVSD Submission System with Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning." In Proceedings of DSTC10 Workshop at AAAI-2022. 2022. pdf
Hori, Chiori, Ankit Parag Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Jonathan Le Roux, and Tim K. Marks. "Overview of Audio Visual Scene-Aware Dialog with Reasoning Track for Natural Language Generation in DSTC10." In Proc. DSTC10 Workshop at AAAI. 2022. pdf
Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk. "Hear: Holistic evaluation of audio representations." In NeurIPS 2021 Competitions and Demonstrations Track, pp. 125-145. PMLR, 2022. pdf
Li, Xiang, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, and Yan Lu. "Online video instance segmentation via robust context fusion." arXiv preprint arXiv:2207.05580 (2022). pdf
Olivier, Raphael, and Bhiksha Raj. "Not all broken defenses are equal: The dead angles of adversarial accuracy." arXiv preprint arXiv:2207.04129 (2022). pdf
Li, Xiang, Jinglu Wang, Xiaohao Xu, Xiao Li, Yan Lu, and Bhiksha Raj. "R^2VOS: Robust Referring Video Object Segmentation via Relational Multimodal Cycle Consistency." arXiv preprint arXiv:2207.01203 (2022). pdf
Yang, Muqiao, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj. "Improving speech enhancement through fine-grained speech characteristics." arXiv preprint arXiv:2207.00237 (2022). pdf
Teixeira, Francisco, Alberto Abad, Bhiksha Raj, and Isabel Trancoso. "Towards End-to-End Private Automatic Speaker Recognition." arXiv preprint arXiv:2206.11750 (2022). pdf
Chen, Chonghan, Qi Jiang, Chih-Hao Wang, Noel Chen, Haohan Wang, Xiang Li, and Bhiksha Raj. "Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution." arXiv preprint arXiv:2206.09114 (2022). pdf
Wang, Yidong, Hao Chen, Qiang Heng, Wenxin Hou, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Zhen Wu, and Jindong Wang. "Freematch: Self-adaptive thresholding for semi-supervised learning." arXiv preprint arXiv:2205.07246 (2022). pdf
Shah, Ankit, Hira Dhamyal, Yang Gao, Daniel Arancibia, Mario Arancibia, Bhiksha Raj, and Rita Singh. "On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice." arXiv preprint arXiv:2204.04802 (2022). pdf
Olivier, Raphael, and Bhiksha Raj. "Recent improvements of asr models in the face of adversarial attacks." arXiv preprint arXiv:2203.16536 (2022). pdf
Mo, Shentong, Jingfei Xia, Xiaoqing Tan, and Bhiksha Raj. "Point3D: tracking actions as moving points with 3D CNNs." arXiv preprint arXiv:2203.10584 (2022). pdf
Liu, Weiyang, Yandong Wen, Bhiksha Raj, Rita Singh, and Adrian Weller. "Sphereface revived: Unifying hyperspherical face recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 45, no. 2 (2022): 2458-2474. pdf
Tang, Larry, Po Hao Chou, Yi Yu Zheng, Ziqian Ge, Ankit Shah, and Bhiksha Raj. "Ontological Learning from Weak Labels." arXiv preprint arXiv:2203.02483 (2022). pdf
Dhamyal, Hira, Bhiksha Raj, and Rita Singh. "Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection." In Proc. Conference of the International Speech Communication Association (Interspeech 2022) (2022): 166-170.
Ma, Yinghao, and Richard M. Stern. "Learnable Front Ends Based on Temporal Modulation for Music Tagging." arXiv preprint arXiv:2211.15254 (2022). pdf
Zhang, Mengchao, Richard M. Stern, Deborah Moncrieff, Catherine Palmer, and Christopher A. Brown. "Effect of Titrated Exposure to Non-Traumatic Noise on Unvoiced Speech Recognition in Human Listeners with Normal Audiological Profiles." Trends in Hearing 26 (2022): 23312165221117081. pdf
Vuong, Tyler, and Richard Stern. "Improved Modulation-Domain Loss for Neural-Network-based Speech Enhancement." Proc. Interspeech 2022 (2022): 206-210.
Dhamyal, Hira, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, and Rita Singh. "Describing emotions with acoustic property prompts for speech emotion recognition." arXiv preprint arXiv:2211.07737 (2022). pdf
Sharma, Roshan, and Bhiksha Raj. "Cross-utterance context for multimodal video transcription." In 2022 56th Asilomar Conference on Signals, Systems, and Computers, pp. 1321-1325. IEEE, 2022. pdf
Sharma, Roshan, and Bhiksha Raj. "XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers." arXiv preprint arXiv:2210.16643 (2022). pdf
Sharma, Roshan, Hira Dhamyal, Bhiksha Raj, and Rita Singh. "Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition." arXiv preprint arXiv:2210.16642 (2022). pdf
Olivier, Raphael, and Bhiksha Raj. "There is more than one kind of robustness: Fooling Whisper with adversarial examples." arXiv preprint arXiv:2210.17316 (2022). pdf
Olivier, Raphael, Hadi Abdullah, and Bhiksha Raj. "Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models." arXiv preprint arXiv:2209.13523 (2022). pdf

2021

Wen, Yandong, Weiyang Liu, Adrian Weller, Bhiksha Raj, and Rita Singh. "SphereFace2: Binary Classification is All You Need for Deep Face Recognition." In International Conference on Learning Representations (ICLR 2021). 2021. pdf
Zheng, Xiaochen, Benjamin Kellenberger, Rui Gong, Irena Hajnsek, and Devis Tuia. "Self-supervised pretraining and controlled augmentation improve rare wildlife recognition in uav images." In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), pp. 732-741. 2021. pdf
Yandong Wen, Weiyang Liu, Bhiksha Raj and Rita Singh. "Self-Supervised 3D Face Reconstruction via Conditional Estimation." In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), pp. 13289-13298. 2021. pdf
Al Ismail, Mahmoud, Soham Deshmukh, and Rita Singh. "Detection of COVID-19 through the analysis of vocal fold oscillations." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1035-1039. IEEE, 2021. doi: 10.1109/ICASSP39728.2021.9414201. pdf
Deshmukh, Soham, Mahmoud Al Ismail, and Rita Singh. "Interpreting glottal flow dynamics for detecting covid-19 from voice." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1055-1059. IEEE, 2021. doi: 10.1109/ICASSP39728.2021.9414530. pdf
Olivier, Raphael, and Bhiksha Raj. "Sequential randomized smoothing for adversarially robust speech recognition." In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 63. pdf
Deshmukh, Soham, Bhiksha Raj, and Rita Singh. "Improving weakly supervised sound event detection with self-supervised auxiliary tasks." In Proc. Interspeech 2021, 596-600. 2021. doi: 10.21437/Interspeech.2021-2079. pdf
Zhang, Anxiang, Ankit Shah, and Bhiksha Raj. "Training image classifiers using Semi-Weak Label Data." arXiv preprint arXiv:2103.10608 (2021). pdf
Olivier, Raphael, Bhiksha Raj, and Muhammad Shah. "High-frequency adversarial defense for speech and audio." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2995-2999. IEEE, 2021. doi: 10.1109/ICASSP39728.2021.9414525. pdf
Xia, Yangyang, Li-Wei Chen, Alexander Rudnicky, and Richard M. Stern. "Temporal Context in Speech Emotion Recognition." In Interspeech, vol. 2021, pp. 3370-3374. 2021. pdf
Vuong, Tyler, Yangyang Xia, and Richard M. Stern. "A modulation-domain loss for neural-network-based real-time speech enhancement." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6643-6647. IEEE, 2021. pdf
Vuong, Tyler, Yangyang Xia, and Richard M. Stern. "The Application of Learnable STRF Kernels to the 2021 Fearless Steps Phase-03 SAD Challenge." In Interspeech, pp. 4364-4368. 2021.
Shah, Ankit, Srishti Singh, and Shih-Yen Tao. "Feature extraction and evaluation for BioMedical Question Answering." arXiv preprint arXiv:2105.14013 (2021). pdf
Liu, Wenbo, Ming Li, Xiaobing Zou, and Bhiksha Raj. "Discriminative Dictionary Learning for Autism Spectrum Disorder Identification." Frontiers in Computational Neuroscience 15 (2021): 662401. pdf
Olivier, Raphael, and Bhiksha Raj. "Sequential randomized smoothing for adversarially robust speech recognition." arXiv preprint arXiv:2112.03000 (2021). pdf
Elizalde, Benjamin, Radu Revutchi, Samarjit Das, Bhiksha Raj, Ian Lane, and Laurie M. Heller. "Identifying actions for sound event classification." In 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 26-30. IEEE, 2021. pdf
Correia, Joana, Francisco Teixeira, Catarina Botelho, Isabel Trancoso, and Bhiksha Raj. "The in-the-wild speech medical corpus." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6973-6977. IEEE, 2021. pdf
Shah, Muhammad A., Raphael Olivier, and Bhiksha Raj. "Towards Adversarial Robustness Via Compact Feature Representations." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3845-3849. IEEE, 2021. pdf
Chernyak, Bronya Roni, Bhiksha Raj, Tamir Hazan, and Joseph Keshet. "Constant random perturbations provide adversarial robustness with minimal effect on accuracy." arXiv preprint arXiv:2103.08265 (2021). pdf
Gao, Yang, Jiachen Lian, Bhiksha Raj, and Rita Singh. "Detection and evaluation of human and machine generated speech in spoofing attacks on automatic speaker verification systems." In 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 544-551. IEEE, 2021. pdf
Yang Gao, Tyler Vuong, Mahsa Elyasi, Gaurav Bharaj and Rita Singh. "Generalized spoofing detection inspired from audio generation artifacts." In Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH 2021), Brno, Czech Republic. pp. 4184-4188. 2021. pdf

Shah, Muhammad A., Raphael Olivier, and Bhiksha Raj. "Optimal Strategies For Comparing Covariates To Solve Matching Problems." In 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10622-10628. IEEE, 2021. pdf
Shah, Muhammad A., Raphael Olivier, and Bhiksha Raj. "Exploiting non-linear redundancy for neural model compression." In 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9928-9935. IEEE, 2021. pdf
Zhao, Wenbo, Yang Gao, Shahan Ali Memon, Bhiksha Raj, and Rita Singh. "Hierarchical routing mixture of experts." In 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7900-7906. IEEE, 2021. pdf
Lian, Jiachen, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, and Rita Singh. "Masked proxy loss for text-independent speaker verification." arXiv preprint arXiv:2011.04491 (2020). pdf
Deshmukh, Soham, Bhiksha Raj, and Rita Singh. "Improving weakly supervised sound event detection with self-supervised auxiliary tasks." arXiv preprint arXiv:2106.06858 (2021). pdf
Vega, Rodolfo M., Enrique Peláez, and Bhiksha Raj. "Shadowing as peer experiential learning for faculty instructional development strategy: A case study on a computer science course." International Journal of Educational Research Open 2 (2021): 100091. pdf
Wen, Yandong, Weiyang Liu, Bhiksha Raj, and Rita Singh. "Self-supervised 3d face reconstruction via conditional estimation." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13289-13298. 2021. pdf
Hu, Kai, Jie Shao, Yuan Liu, Bhiksha Raj, Marios Savvides, and Zhiqiang Shen. "Contrast and order representations for video self-supervised learning." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7939-7949. 2021. pdf
Truong, Thanh-Dat, Chi Nhan Duong, Hoang Anh Pham, Bhiksha Raj, Ngan Le, and Khoa Luu. "The right to talk: An audio-visual transformer approach." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1105-1114. 2021. pdf

2020

Yang, Muqiao, Martin Q. Ma, Dongyu Li, Yao-Hung Hubert Tsai, and Ruslan Salakhutdinov. "Complex transformer: A framework for modeling complex-valued sequence." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4232-4236. IEEE, 2020. doi: 10.1109/ICASSP40776.2020.9054008. pdf
Winata, Genta Indra, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, and Pascale Fung. "Lightweight and efficient end-to-end speech recognition using low-rank transformer." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6144-6148. IEEE, 2020. pdf
Dhamyal, Hira, Shahan Ali Memon, Bhiksha Raj, and Rita Singh. "The phonetic bases of vocal expressed emotion: natural versus acted." In Proc. Interspeech 2020. pp. 3451-3455. 2020. doi: 10.21437/Interspeech.2020-3046. pdf
Vuong, Tyler, Yangyang Xia, and Richard Stern. "Learnable spectro-temporal receptive fields for robust voice type discrimination." arXiv preprint arXiv:2010.09151 (2020). pdf
Wuth, Jorge, Richard M. Stern, and Nestor Becerra Yoma. "Non causal deep learning based dereverberation." arXiv preprint arXiv:2009.02832 (2020). pdf
Stern, Richard M., and Anjali Menon. "Binaural Technology for Machine Speech Recognition and Understanding." The Technology of Binaural Understanding (2020): 511-545. pdf
Shah, Muhammad A., Khaled A. Harras, and Bhiksha Raj. "Sherlock: A crowd-sourced system for automatic tagging of indoor floor plans." In 2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pp. 594-602. IEEE, 2020. pdf
Shamsabadi, Ali Shahin, Francisco Sepúlveda Teixeira, Alberto Abad, Bhiksha Raj, Andrea Cavallaro, and Isabel Trancoso. "Foolhd: Fooling speaker identification by highly imperceptible adversarial disturbances." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6159-6163. IEEE, 2021. pdf
Deshmukh, Soham, Bhiksha Raj, and Rita Singh. "Multi-task learning for interpretable weakly labelled sound event detection." arXiv preprint arXiv:2008.07085 (2020). pdf
Koyama, Yuichiro, and Bhiksha Raj. "Exploring optimal dnn architecture for end-to-end beamformers based on time-frequency references." arXiv preprint arXiv:2005.12683 (2020). pdf
Koyama, Yuichiro, Tyler Vuong, Stefan Uhlich, and Bhiksha Raj. "Exploring the best loss function for DNN-based low-latency speech enhancement with temporal convolutional networks." arXiv preprint arXiv:2005.11611 (2020). pdf
Koyama, Yuichiro, Oluwafemi Azeez, and Bhiksha Raj. "Efficient integration of multi-channel information for speaker-independent speech separation." arXiv preprint arXiv:2005.11612 (2020). pdf
Shah, Muhammad A., and Bhiksha Raj. "Deriving compact feature representations via annealed contraction." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2068-2072. IEEE, 2020. pdf
Correia, Joana, Isabel Trancoso, and Bhiksha Raj. "Automatic in-the-wild dataset annotation with deep generalized multiple instance learning." In Proceedings of the Twelfth Language Resources and Evaluation Conference, pp. 3542-3550. 2020. pdf
Serizel, Romain, Nicolas Turpault, Ankit Shah, and Justin Salamon. "Sound event detection in synthetic domestic environments." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 86-90. IEEE, 2020. pdf
Chen, Rowland, Roger B. Dannenberg, Bhiksha Raj, and Rita Singh. "Artificial Creative Intelligence: Breaking the Imitation Barrier." In ICCC, pp. 319-325. 2020. pdf
Liang, Hao, Lulan Yu, Guikang Xu, Bhiksha Raj, and Rita Singh. "Controlled autoencoders to generate faces from voices." In Advances in Visual Computing: 15th International Symposium, ISVC 2020, San Diego, CA, USA, October 5–7, 2020, Proceedings, Part I 15, pp. 476-487. Springer International Publishing, 2020. pdf
Lian, Jiachen, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, and Rita Singh. "Mask Proxy Loss for Text-Independent Speaker Recognition." CoRR (2020). pdf
Shao, Jie, Kai Hu, Changhu Wang, Xiangyang Xue, and Bhiksha Raj. "Is normalization indispensable for training deep neural network?" Advances in Neural Information Processing Systems 33 (2020): 13434-13444. pdf
Zhao, Wenbo, and Rita Singh. "Speech-based parameter estimation of an asymmetric vocal fold oscillation model and its application in discriminating vocal fold pathologies." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7344-7348. IEEE, 2020. pdf

Wayne Zhao and Rita Singh. "Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation." Entropy 2023, 25(7), 1039; Special issue on Information-Theoretic Approaches in Speech Processing and Recognition. 2023. pdf

Books