Center for Voice Intelligence and Security

Profiling Humans from their Voice
Rita Singh
First published: July 2019
Publisher: Springer, Singapore
Copyright 2019 Springer-Nature, Switzerland, July 2019
ISBN: 978-981-13-8402-8
Also available on springer.com, other bookstores and eBay.

Chapter citations:

  • Profiling and its facets, Rita Singh in Profiling humans from their voice, Ch.1, pp.3-26, Springer, July 2019. pdf
  • Production and perception of voice, Rita Singh in Profiling humans from their voice, Ch.2, pp.27-83, Springer, July 2019. pdf
  • Relations between voice and profile parameters, Rita Singh in Profiling humans from their voice, Ch.3, pp.85-131, Springer, July 2019. pdf
  • The voice signal and its information content - 1, Rita Singh in Profiling humans from their voice, Ch.4, pp.133-169, Springer, July 2019. pdf
  • The voice signal and its information content - 2, Rita Singh in Profiling humans from their voice, Ch.5, pp.171-220, Springer, July 2019. pdf
  • Qualitative aspects of the voice signal, Rita Singh in Profiling humans from their voice, Ch.6, pp.221-266, Springer, July 2019. pdf
  • Feature engineering for profiling, Rita Singh in Profiling humans from their voice, Ch.7, pp.269-298, Springer, July 2019. pdf
  • Mechanisms for profiling, Rita Singh in Profiling humans from their voice, Ch.8, pp.299-324, Springer, July 2019. pdf
  • Reconstruction of the human persona in 3D from voice, and its reverse, Rita Singh in Profiling humans from their voice, Ch.9, pp.325-363, Springer, July 2019. pdf
  • Applied profiling: Uses, reliability and ethics, Rita Singh in Profiling humans from their voice, Ch.10, pp.365-405, Springer, July 2019. pdf
Techniques for Noise Robustness in Automatic Speech Recognition
Tuomas Virtanen, Rita Singh, Bhiksha Raj (Eds)
First published: 5 October 2012
Copyright 2013 John Wiley & Sons, Ltd
Print ISBN: 9781119970880 | Online ISBN: 9781118392683 | DOI: 10.1002/9781118392683

Research papers

2024

  1. Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang and Zicheng Liu. "Completing Visual Objects via Bridging Generation and Segmentation." The 41st International Conference on Machine Learning (ICML) 2024. pdf
  2. Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh and Bhiksha Raj. "A General Framework for Learning from Weak Supervision." The 41st International Conference on Machine Learning (ICML) 2024. pdf

2023

  1. Roshan Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe, Rita Singh, Bhiksha Raj, "Introducing the Interview dataset and benchmarking methods for speech summarization", Automatic Speech Recognition and Understanding Workshop (ASRU), Taiwan. 2023.
  2. Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jeeweon Jung, Soumi Maiti, Shinji Watanabe, "Reproducing Whisper Training using an Open-Source tool and Public Data", Automatic Speech Recognition and Understanding Workshop (ASRU), Taiwan. 2023.
  3. Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj, "BASS: Block-wise Adaptation for Speech Summarization", In Proc. Interspeech 2023. Dublin, Ireland 2023. pdf
  4. Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei Yu Wu, Hung-yi Lee, Karen Livescu, and Shinji Watanabe. "SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks." In Proc. 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Toronto, Canada. 2023. pdf
  5. Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura, and Shinji Watanabe, "Speech summarization of long spoken document: improving memory efficiency of speech/text encoders", In Proc. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes. 2023. pdf
  6. Roshan Sharma, Weipeng He, Ju Lin, Egor Lakomkin, Yang Liu and Kaustubh Kalgaonkar. "Egocentric Audio-Visual Noise Suppression.", In Proc. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes. 2023. pdf
  7. Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon An, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer, "Augmenting text for spoken language understanding with Large Language Models." arXiv preprint arXiv:2309.09390 (2023). pdf
  8. Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, and Hsiu-Hsuan Wang, "Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study." arXiv preprint arXiv:2309.15800 (2023). pdf
  9. Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Raj, Shady Shehata, Hung-yi Lee, "Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech." arXiv preprint arXiv:2309.09510 (2023). pdf
  10. Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj. "Token Prediction as Implicit Classification to Identify LLM-Generated Text." Empirical Methods in Natural Language Processing (EMNLP) 2023. pdf
  11. Xiang Li, Jinglu Wang, Xiaohao Xu, Muqiao Yang, Fan Yang, Rita Singh, Bhiksha Raj. "Towards Noise-Tolerant Speech-Referring Video Object Segmentation: Bridging Speech and Text." Empirical Methods in Natural Language Processing (EMNLP) 2023.
  12. Xiang Li, Chung-Ching Lin, Yinpeng Chen, Zicheng Liu, Jinglu Wang, Bhiksha Raj, Rita Singh. "PaintSeg: Painting Pixels for Training-free Segmentation." Neural Information Processing Systems (NeurIPS) 2023. pdf
  13. Muhammad A Shah, Bhiksha Raj. "Training on Foveated Images Improves Robustness to Adversarial Attacks." Neural Information Processing Systems (NeurIPS) 2023.
  14. Shentong Mo, Bhiksha Raj. "Weakly-Supervised Audio-Visual Segmentation." Neural Information Processing Systems (NeurIPS) 2023.
  15. Thanh-Dat Truong, Hoang-Quan Nguyen, Bhiksha Raj, Khoa Luu. "Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments." Neural Information Processing Systems (NeurIPS) 2023.
  16. Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang. "Pengi: An Audio Language Model for Audio Tasks." Neural Information Processing Systems (NeurIPS) 2023. code and paper
  17. Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj. "Rethinking Voice-Face Correlation: A Geometry View." Proceedings of the 31st ACM International Conference on Multimedia (ACM-Multimedia), 2023. pdf
  18. Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Shuo Han, Yunyang Zeng, Ankit Shah, and Bhiksha Raj. "Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms." arXiv preprint arXiv:2310.07161 (2023).
  19. Francisco Teixeira, Alberto Abad, Bhiksha Raj and Isabel Trancoso. "Privacy-oriented manipulation of speaker representations." arXiv preprint arXiv:2310.06652 (2023).
  20. Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, and Bhiksha Raj. "Continual Contrastive Spoken Language Understanding." arXiv preprint arXiv:2310.02699 (2023).
  21. Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, and Rita Singh. "Prompting Audios Using Acoustic Properties For Emotion Representation." arXiv preprint arXiv:2310.02298 (2023).
  22. Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Dareen Alharthi, Hazim T. Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, and Rita Singh. "LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model." arXiv preprint arXiv:2310.04445 (2023).
  23. Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu. "uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models." arXiv preprint arXiv:2310.00900 (2023).
  24. Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh. "Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech." arXiv preprint arXiv:2310.00706 (2023).
  25. Xiang Li, Yinpeng Chen, Chung-Ching Lin, Rita Singh, Bhiksha Raj, Zicheng Liu. "Completing Visual Objects via Bridging Generation and Segmentation." arXiv preprint arXiv:2310.00808 (2023).
  26. Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj. "Rethinking Audiovisual Segmentation with Semantic Quantization and Decomposition." arXiv preprint arXiv:2310.00132 (2023).
  27. Soham Deshmukh, Benjamin Elizalde, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang. "Training Audio Captioning Models without Audio." arXiv preprint arXiv:2309.07372 (2023).
  28. Chen, Hao, Jindong Wang, Ankit Shah, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, and Bhiksha Raj. "Understanding and mitigating the label noise in pre-training on downstream tasks." arXiv preprint arXiv:2309.17002 (2023).
  29. Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj. "BASS: Block-wise Adaptation for Speech Summarization." arXiv preprint arXiv:2307.08217 (2023).
  30. Xiang Li, Jinglu Wang, Xiaohao Xu, Xiao Li, Bhiksha Raj and Yan Lu. "Robust Referring Video Object Segmentation with Cyclic Structural Consensus." 2023 International Conference on Computer Vision (ICCV). 2023.
  31. Yandong Wen, Weiyang Liu, Yao Feng, Bhiksha Raj, Rita Singh, Adrian Weller, Michael Black and Bernhard Scholkopf. "Pairwise Similarity is SimPLE." 2023 International Conference on Computer Vision (ICCV). 2023.
  32. Kandaswamy Paramasivan, Bhiksha Raj, Nandan Sudarasanam, Rahul Subburaj. "Prolonged school closure during the pandemic time in successive waves of COVID-19- vulnerability of children to sexual abuses – A case study in Tamil Nadu, India." Heliyon 9 (2023) e1786, Cell Press. Article
  33. Rita Singh. 2023. "A Gene-Based Algorithm for Identifying Factors That May Affect a Speaker's Voice" Entropy 25, No. 6: 897. pdf
  34. Kashu Yamazaki, Khoa Vo, Quang Sang Truong, Bhiksha Raj, and Ngan Le. "VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 3, pp. 3081-3090. 2023. pdf
  35. Xiang Li, Haoyuan Cao, Shijie Zhao, Junlin Li, Li Zhang, Bhiksha Raj. "Panoramic Video Salient Object Detection with Ambisonic Audio Guidance". In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, No. 2, pp. 1424-1432. pdf
  36. Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, and Takuya Yoshioka. "Simulating realistic speech overlaps improves multi-talker ASR." In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023. pdf
  37. Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj. "PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement." In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023. pdf
  38. Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj. "TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement." 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023).
  39. Wayne Zhao and Rita Singh. "Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation." Entropy 2023, 25(7), 1039; Special issue on Information-Theoretic Approaches in Speech Processing and Recognition. 2023. pdf
  40. Ankit Shah, Larry Tang, Po Hao Chou, Yi Yu Zhang, Ziqian Ge, Bhiksha Raj. "An Approach to Ontological Learning from Weak Labels." 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023).
  41. Francisco Teixeira, Alberto Abad, Bhiksha Raj, and Isabel Trancoso. "Privacy-Preserving Automatic Speaker Diarization." In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023. pdf
  42. Raphael Olivier and Bhiksha Raj. "How many perturbations break this model? Evaluating robustness beyond adversarial accuracy." The Fortieth International Conference on Machine Learning (ICML) 2023. pdf
  43. Raphael Olivier, Hadi Abdullah and Bhiksha Raj. "Transferable Adversarial Perturbations between Self-Supervised Speech Recognition Models." 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning, 2023. pdf
  44. Chen, Hao, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Bhiksha Raj, and Marios Savvides. "Softmatch: Addressing the quantity-quality trade-off in semi-supervised learning." 2023 International Conference on Learning Representations (ICLR 2023). 2023. pdf
  45. Wang, Yidong, Hao Chen, Qiang Heng, Wenxin Hou, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Zhen Wu, and Jindong Wang. "Freematch: Self-adaptive thresholding for semi-supervised learning." 2023 International Conference on Learning Representations (ICLR 2023). 2023. pdf
  46. Thanh-Dat Truong, Ngan Hoang Le, Bhiksha Raj, Jackson Cothren, Khoa Luu. "FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding." Conference on Computer Vision and Pattern Recognition (CVPR), 2023. pdf
  47. Joseph Konan, Ojas Bhargave, Shikhar Agnihotri, Hojeong Lee, Ankit Shah, Shuo Han, Yunyang Zeng, Amanda Shu, Haohui Liu, Xuankai Chang, Hamza Khalid, Minseon Gwak, Kawon Lee, Minjeong Kim, Bhiksha Raj. "Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms." arXiv:2303.09048v1 pdf
  48. Ankit Shah, Shuyi Chen, Kejun Zhou, Yue Chen, Bhiksha Raj. "Approach to Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization and Constant-Q Transforms." arXiv:2303.03591v1. pdf
  49. Heller, Laurie M., Benjamin Elizalde, Bhiksha Raj, and Soham Deshmukh. "Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session." arXiv preprint arXiv:2302.09719 (2023). pdf
  50. Chen, Hao, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Bhiksha Raj, and Marios Savvides. "Softmatch: Addressing the quantity-quality trade-off in semi-supervised learning." arXiv preprint arXiv:2301.10921 (2023). pdf
  51. Gode, Samiran, Supreeth Bare, Bhiksha Raj, and Hyungon Yoo. "Understanding Political Polarisation using Language Models: A dataset and method." arXiv preprint arXiv:2301.00891 (2023). pdf
  52. Vo, Khoa, Sang Truong, Kashu Yamazaki, Bhiksha Raj, Minh-Triet Tran, and Ngan Le. "AOE-Net: Entities interactions modeling with adaptive attention mechanism for temporal action proposals generation." International Journal of Computer Vision 131, No. 1 (2023): 302-323. pdf

2022

  1. Yandong Wen. "Reconstruction of Human Faces from Voice." PhD Thesis, Carnegie Mellon University. May 2022. pdf
  2. Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj. "Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction." In Proceedings of the 39th International Conference on Machine Learning (ICML 2022), Expressive Vocalizations Workshop and Competition. 2022. pdf
  3. Sunit Sivasankaran, Chenda Li, Takuya Yoshioka. "Exploring Pre-training and Self-training for Noise Robust ASR." Proc. Interspeech 2022.
  4. Yang, Muqiao, Ian Lane, and Shinji Watanabe. "Online continual learning of end-to-end speech recognition models." In Proc. Interspeech 2022, pp. 2668-2672. doi: 10.21437/Interspeech.2022-11093. pdf
  5. Yang, Muqiao, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj. "Improving speech enhancement through fine-grained speech characteristics." In Proc. Interspeech 2022, pp. 2953-2957. doi: 10.21437/Interspeech.2022-11161. pdf
  6. Vuong, Tyler, Nikhil Madaan, Rohan Panda, and Richard M. Stern. "Investigating the Important Temporal Modulations for Deep-Learning-Based Speech Activity Detection." In 2022 IEEE Spoken Language Technology Workshop (SLT), pp. 525-531. IEEE, 2023.
  7. Li, Xiang, Jinglu Wang, Xiao Li, and Yan Lu. "Hybrid instance-aware temporal fusion for online video instance segmentation." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, pp. 1429-1437. 2022. pdf
  8. Li, Xiang, Jinglu Wang, Xiao Li, and Yan Lu. "Video instance segmentation by instance flow assembly." IEEE Transactions on Multimedia (2022). doi: 10.1109/TMM.2022.3222643. pdf
  9. Zhao, Yizhou, Xun Guo, and Yan Lu. "Semantic-aligned fusion transformer for one-shot object detection." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7601-7611. 2022. doi: 10.1109/CVPR52688.2022.00745. pdf
  10. Zhao, Yizhou, Zhenyang Li, Xun Guo, and Yan Lu. "Alignment-guided temporal attention for video action recognition." Advances in Neural Information Processing Systems 35 (2022): 13627-13639. 2022. pdf
  11. Sharma, Roshan, Shruti Palaskar, Alan W. Black, and Florian Metze. "End-to-end speech summarization using restricted self-attention." In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8072-8076. IEEE, 2022. doi: 10.1109/ICASSP43922.2022.9747320.
  12. Sharma, Roshan, and Bhiksha Raj. "Cross-utterance context for multimodal video transcription." In 2022 56th Asilomar Conference on Signals, Systems, and Computers, pp. 1321-1325. IEEE, 2022. doi: 10.1109/IEEECONF56349.2022.10052073.
  13. Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang. "Usb: A unified semi-supervised learning benchmark for classification." Advances in Neural Information Processing Systems 35 (2022): 3938-3961. pdf
  14. Wang, Yidong, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, Renjie Wang et al. "Usb: A unified semi-supervised learning benchmark." arXiv preprint arXiv:2208.07204 (2022). pdf
  15. Chen, Hao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Marios Savvides, and Bhiksha Raj. "An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning." arXiv preprint arXiv:2211.11086 (2022). pdf
  16. Olivier, Raphael, and Bhiksha Raj. "Recent improvements of asr models in the face of adversarial attacks." In Proc. Interspeech 2022. doi: 10.21437/Interspeech.2022-400. pdf
  17. Shah, A., Singh, R., Raj, B. "On learning representations for automatic segmentation of histopathology images." ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1728-1732. 2022. doi: 10.1109/ICASSP43922.2022.9747520.
  18. Shah, Ankit, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, and Chiori Hori. "Audio-visual scene-aware dialog and reasoning using audio-visual transformers with joint student-teacher learning." In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7732-7736. IEEE, 2022. pdf
  19. Shah, Ankit Parag, Takaaki Hori, Jonathan Le Roux, and Chiori Hori. "DSTC10-AVSD Submission System with Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning." In Proceedings of DSTC10 Workshop at AAAI-2022. 2022. pdf
  20. Hori, Chiori, Ankit Parag Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Jonathan Le Roux, and Tim K. Marks. "Overview of Audio Visual Scene-Aware Dialog with Reasoning Track for Natural Language Generation in DSTC10." In Proc. DSTC10 Workshop at AAAI. 2022. pdf
  21. Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk. "Hear: Holistic evaluation of audio representations." In NeurIPS 2021 Competitions and Demonstrations Track, pp. 125-145. PMLR, 2022. pdf
  22. Li, Xiang, Jinglu Wang, Xiaohao Xu, Bhiksha Raj, and Yan Lu. "Online video instance segmentation via robust context fusion." arXiv preprint arXiv:2207.05580 (2022). pdf
  23. Olivier, Raphael, and Bhiksha Raj. "Not all broken defenses are equal: The dead angles of adversarial accuracy." arXiv preprint arXiv:2207.04129 (2022). pdf
  24. Li, Xiang, Jinglu Wang, Xiaohao Xu, Xiao Li, Yan Lu, and Bhiksha Raj. "R^2VOS: Robust Referring Video Object Segmentation via Relational Multimodal Cycle Consistency." arXiv preprint arXiv:2207.01203 (2022). pdf
  25. Yang, Muqiao, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, and Bhiksha Raj. "Improving speech enhancement through fine-grained speech characteristics." arXiv preprint arXiv:2207.00237 (2022). pdf
  26. Teixeira, Francisco, Alberto Abad, Bhiksha Raj, and Isabel Trancoso. "Towards End-to-End Private Automatic Speaker Recognition." arXiv preprint arXiv:2206.11750 (2022). pdf
  27. Chen, Chonghan, Qi Jiang, Chih-Hao Wang, Noel Chen, Haohan Wang, Xiang Li, and Bhiksha Raj. "Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution." arXiv preprint arXiv:2206.09114 (2022). pdf
  28. Wang, Yidong, Hao Chen, Qiang Heng, Wenxin Hou, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Zhen Wu, and Jindong Wang. "Freematch: Self-adaptive thresholding for semi-supervised learning." arXiv preprint arXiv:2205.07246 (2022). pdf
  29. Shah, Ankit, Hira Dhamyal, Yang Gao, Daniel Arancibia, Mario Arancibia, Bhiksha Raj, and Rita Singh. "On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice." arXiv preprint arXiv:2204.04802 (2022). pdf
  30. Olivier, Raphael, and Bhiksha Raj. "Recent improvements of asr models in the face of adversarial attacks." arXiv preprint arXiv:2203.16536 (2022). pdf
  31. Mo, Shentong, Jingfei Xia, Xiaoqing Tan, and Bhiksha Raj. "Point3D: tracking actions as moving points with 3D CNNs." arXiv preprint arXiv:2203.10584 (2022). pdf
  32. Liu, Weiyang, Yandong Wen, Bhiksha Raj, Rita Singh, and Adrian Weller. "Sphereface revived: Unifying hyperspherical face recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 45, no. 2 (2022): 2458-2474. pdf
  33. Tang, Larry, Po Hao Chou, Yi Yu Zheng, Ziqian Ge, Ankit Shah, and Bhiksha Raj. "Ontological Learning from Weak Labels." arXiv preprint arXiv:2203.02483 (2022). pdf
  34. Dhamyal, Hira, Bhiksha Raj, and Rita Singh. "Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection." In Proc. Conference of the International Speech Communication Association (Interspeech 2022) (2022): 166-170.
  35. Ma, Yinghao, and Richard M. Stern. "Learnable Front Ends Based on Temporal Modulation for Music Tagging." arXiv preprint arXiv:2211.15254 (2022). pdf
  36. Zhang, Mengchao, Richard M. Stern, Deborah Moncrieff, Catherine Palmer, and Christopher A. Brown. "Effect of Titrated Exposure to Non-Traumatic Noise on Unvoiced Speech Recognition in Human Listeners with Normal Audiological Profiles." Trends in Hearing 26 (2022): 23312165221117081. pdf
  37. Vuong, Tyler, and Richard Stern. "Improved Modulation-Domain Loss for Neural-Network-based Speech Enhancement." Proc. Interspeech 2022 (2022): 206-210.
  38. Dhamyal, Hira, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, and Rita Singh. "Describing emotions with acoustic property prompts for speech emotion recognition." arXiv preprint arXiv:2211.07737 (2022). pdf
  39. Sharma, Roshan, and Bhiksha Raj. "Cross-utterance context for multimodal video transcription." In 2022 56th Asilomar Conference on Signals, Systems, and Computers, pp. 1321-1325. IEEE, 2022. pdf
  40. Sharma, Roshan, and Bhiksha Raj. "XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers." arXiv preprint arXiv:2210.16643 (2022). pdf
  41. Sharma, Roshan, Hira Dhamyal, Bhiksha Raj, and Rita Singh. "Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition." arXiv preprint arXiv:2210.16642 (2022). pdf
  42. Olivier, Raphael, and Bhiksha Raj. "There is more than one kind of robustness: Fooling Whisper with adversarial examples." arXiv preprint arXiv:2210.17316 (2022). pdf
  43. Olivier, Raphael, Hadi Abdullah, and Bhiksha Raj. "Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models." arXiv preprint arXiv:2209.13523 (2022). pdf

2021

  1. Wen, Yandong, Weiyang Liu, Adrian Weller, Bhiksha Raj, and Rita Singh. "SphereFace2: Binary Classification is All You Need for Deep Face Recognition." In International Conference on Learning Representations (ICLR 2021). 2021. pdf
  2. Zheng, Xiaochen, Benjamin Kellenberger, Rui Gong, Irena Hajnsek, and Devis Tuia. "Self-supervised pretraining and controlled augmentation improve rare wildlife recognition in uav images." In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), pp. 732-741. 2021. pdf
  3. Yandong Wen, Weiyang Liu, Bhiksha Raj and Rita Singh. "Self-Supervised 3D Face Reconstruction via Conditional Estimation." In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), pp. 13289-13298. 2021. pdf
  4. Al Ismail, Mahmoud, Soham Deshmukh, and Rita Singh. "Detection of COVID-19 through the analysis of vocal fold oscillations." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1035-1039. IEEE, 2021. doi: 10.1109/ICASSP39728.2021.9414201. pdf
  5. Deshmukh, Soham, Mahmoud Al Ismail, and Rita Singh. "Interpreting glottal flow dynamics for detecting covid-19 from voice." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1055-1059. IEEE, 2021. doi: 10.1109/ICASSP39728.2021.9414530. pdf
  6. Olivier, Raphael, and Bhiksha Raj. "Sequential randomized smoothing for adversarially robust speech recognition." In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 63. pdf
  7. Deshmukh, Soham, Bhiksha Raj, and Rita Singh. "Improving weakly supervised sound event detection with self-supervised auxiliary tasks." In Proc. Interspeech 2021, 596-600. 2021. doi: 10.21437/Interspeech.2021-2079. pdf
  8. Zhang, Anxiang, Ankit Shah, and Bhiksha Raj. "Training image classifiers using Semi-Weak Label Data." arXiv preprint arXiv:2103.10608 (2021). pdf
  9. Olivier, Raphael, Bhiksha Raj, and Muhammad Shah. "High-frequency adversarial defense for speech and audio." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2995-2999. IEEE, 2021. doi: 10.1109/ICASSP39728.2021.9414525. pdf
  10. Xia, Yangyang, Li-Wei Chen, Alexander Rudnicky, and Richard M. Stern. "Temporal Context in Speech Emotion Recognition." In Interspeech, vol. 2021, pp. 3370-3374. 2021. pdf
  11. Vuong, Tyler, Yangyang Xia, and Richard M. Stern. "A modulation-domain loss for neural-network-based real-time speech enhancement." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6643-6647. IEEE, 2021. pdf
  12. Vuong, Tyler, Yangyang Xia, and Richard M. Stern. "The Application of Learnable STRF Kernels to the 2021 Fearless Steps Phase-03 SAD Challenge." In Interspeech, pp. 4364-4368. 2021.
  13. Shah, Ankit, Srishti Singh, and Shih-Yen Tao. "Feature extraction and evaluation for BioMedical Question Answering." arXiv preprint arXiv:2105.14013 (2021). pdf
  14. Liu, Wenbo, Ming Li, Xiaobing Zou, and Bhiksha Raj. "Discriminative Dictionary Learning for Autism Spectrum Disorder Identification." Frontiers in Computational Neuroscience 15 (2021): 662401. pdf
  15. Olivier, Raphael, and Bhiksha Raj. "Sequential randomized smoothing for adversarially robust speech recognition." arXiv preprint arXiv:2112.03000 (2021). pdf
  16. Elizalde, Benjamin, Radu Revutchi, Samarjit Das, Bhiksha Raj, Ian Lane, and Laurie M. Heller. "Identifying actions for sound event classification." In 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 26-30. IEEE, 2021. pdf
  17. Correia, Joana, Francisco Teixeira, Catarina Botelho, Isabel Trancoso, and Bhiksha Raj. "The in-the-wild speech medical corpus." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6973-6977. IEEE, 2021. pdf
  18. Olivier, Raphael, Bhiksha Raj, and Muhammad Shah. "High-frequency adversarial defense for speech and audio." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2995-2999. IEEE, 2021. pdf
  19. Shah, Muhammad A., Raphael Olivier, and Bhiksha Raj. "Towards Adversarial Robustness Via Compact Feature Representations." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3845-3849. IEEE, 2021. pdf
  20. Zhang, Anxiang, Ankit Shah, and Bhiksha Raj. "Training image classifiers using Semi-Weak Label Data." arXiv preprint arXiv:2103.10608 (2021). pdf
  21. Chernyak, Bronya Roni, Bhiksha Raj, Tamir Hazan, and Joseph Keshet. "Constant random perturbations provide adversarial robustness with minimal effect on accuracy." arXiv preprint arXiv:2103.08265 (2021). pdf
  22. Gao, Yang, Jiachen Lian, Bhiksha Raj, and Rita Singh. "Detection and evaluation of human and machine generated speech in spoofing attacks on automatic speaker verification systems." In 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 544-551. IEEE, 2021. pdf
  23. Yang Gao, Tyler Vuong, Mahsa Elyasi, Gaurav Bharaj and Rita Singh. "Generalized spoofing detection inspired from audio generation artifacts." In Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH 2021), Brno, Czech Republic. pp. 4184-4188. 2021. pdf

  24. Shah, Muhammad A., Raphael Olivier, and Bhiksha Raj. "Optimal Strategies For Comparing Covariates To Solve Matching Problems." In 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10622-10628. IEEE, 2021. pdf
  25. Shah, Muhammad A., Raphael Olivier, and Bhiksha Raj. "Exploiting non-linear redundancy for neural model compression." In 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9928-9935. IEEE, 2021. pdf
  26. Zhao, Wenbo, Yang Gao, Shahan Ali Memon, Bhiksha Raj, and Rita Singh. "Hierarchical routing mixture of experts." In 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7900-7906. IEEE, 2021. pdf
  27. Lian, Jiachen, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, and Rita Singh. "Masked proxy loss for text-independent speaker verification." arXiv preprint arXiv:2011.04491 (2020). pdf
  28. Deshmukh, Soham, Bhiksha Raj, and Rita Singh. "Improving weakly supervised sound event detection with self-supervised auxiliary tasks." arXiv preprint arXiv:2106.06858 (2021). pdf
  29. Vega, Rodolfo M., Enrique Peláez, and Bhiksha Raj. "Shadowing as peer experiential learning for faculty instructional development strategy: A case study on a computer science course." International Journal of Educational Research Open 2 (2021): 100091. pdf
  30. Wen, Yandong, Weiyang Liu, Bhiksha Raj, and Rita Singh. "Self-supervised 3d face reconstruction via conditional estimation." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13289-13298. 2021. pdf
  31. Hu, Kai, Jie Shao, Yuan Liu, Bhiksha Raj, Marios Savvides, and Zhiqiang Shen. "Contrast and order representations for video self-supervised learning." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7939-7949. 2021. pdf
  32. Truong, Thanh-Dat, Chi Nhan Duong, Hoang Anh Pham, Bhiksha Raj, Ngan Le, and Khoa Luu. "The right to talk: An audio-visual transformer approach." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1105-1114. 2021. pdf

2020

  1. Yang, Muqiao, Martin Q. Ma, Dongyu Li, Yao-Hung Hubert Tsai, and Ruslan Salakhutdinov. "Complex transformer: A framework for modeling complex-valued sequence." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4232-4236. IEEE, 2020. doi: 10.1109/ICASSP40776.2020.9054008. pdf
  2. Winata, Genta Indra, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, and Pascale Fung. "Lightweight and efficient end-to-end speech recognition using low-rank transformer." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6144-6148. IEEE, 2020. pdf
  3. Dhamyal, Hira, Shahan Ali Memon, Bhiksha Raj, and Rita Singh. "The phonetic bases of vocal expressed emotion: natural versus acted." In Proc. Interspeech 2020. pp. 3451-3455. 2020. doi: 10.21437/Interspeech.2020-3046. pdf
  4. Vuong, Tyler, Yangyang Xia, and Richard Stern. "Learnable spectro-temporal receptive fields for robust voice type discrimination." arXiv preprint arXiv:2010.09151 (2020). pdf
  5. Wuth, Jorge, Richard M. Stern, and Nestor Becerra Yoma. "Non causal deep learning based dereverberation." arXiv preprint arXiv:2009.02832 (2020). pdf
  6. Stern, Richard M., and Anjali Menon. "Binaural Technology for Machine Speech Recognition and Understanding." The Technology of Binaural Understanding (2020): 511-545. pdf
  7. Shah, Muhammad A., Khaled A. Harras, and Bhiksha Raj. "Sherlock: A crowd-sourced system for automatic tagging of indoor floor plans." In 2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pp. 594-602. IEEE, 2020. pdf
  8. Shamsabadi, Ali Shahin, Francisco Sepúlveda Teixeira, Alberto Abad, Bhiksha Raj, Andrea Cavallaro, and Isabel Trancoso. "Foolhd: Fooling speaker identification by highly imperceptible adversarial disturbances." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6159-6163. IEEE, 2021. pdf
  9. Deshmukh, Soham, Bhiksha Raj, and Rita Singh. "Multi-task learning for interpretable weakly labelled sound event detection." arXiv preprint arXiv:2008.07085 (2020). pdf
  10. Koyama, Yuichiro, and Bhiksha Raj. "Exploring optimal dnn architecture for end-to-end beamformers based on time-frequency references." arXiv preprint arXiv:2005.12683 (2020). pdf
  11. Koyama, Yuichiro, Tyler Vuong, Stefan Uhlich, and Bhiksha Raj. "Exploring the best loss function for DNN-based low-latency speech enhancement with temporal convolutional networks." arXiv preprint arXiv:2005.11611 (2020). pdf
  12. Koyama, Yuichiro, Oluwafemi Azeez, and Bhiksha Raj. "Efficient integration of multi-channel information for speaker-independent speech separation." arXiv preprint arXiv:2005.11612 (2020). pdf
  13. Shah, Muhammad A., and Bhiksha Raj. "Deriving compact feature representations via annealed contraction." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2068-2072. IEEE, 2020. pdf
  14. Correia, Joana, Isabel Trancoso, and Bhiksha Raj. "Automatic in-the-wild dataset annotation with deep generalized multiple instance learning." In Proceedings of the Twelfth Language Resources and Evaluation Conference, pp. 3542-3550. 2020. pdf
  15. Serizel, Romain, Nicolas Turpault, Ankit Shah, and Justin Salamon. "Sound event detection in synthetic domestic environments." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 86-90. IEEE, 2020. pdf
  16. Chen, Rowland, Roger B. Dannenberg, Bhiksha Raj, and Rita Singh. "Artificial Creative Intelligence: Breaking the Imitation Barrier." In ICCC, pp. 319-325. 2020. pdf
  17. Liang, Hao, Lulan Yu, Guikang Xu, Bhiksha Raj, and Rita Singh. "Controlled autoencoders to generate faces from voices." In Advances in Visual Computing: 15th International Symposium, ISVC 2020, San Diego, CA, USA, October 5–7, 2020, Proceedings, Part I 15, pp. 476-487. Springer International Publishing, 2020. pdf
  18. Lian, Jiachen, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, and Rita Singh. "Mask Proxy Loss for Text-Independent Speaker Recognition." CoRR (2020). pdf
  19. Shao, Jie, Kai Hu, Changhu Wang, Xiangyang Xue, and Bhiksha Raj. "Is normalization indispensable for training deep neural network?." Advances in Neural Information Processing Systems 33 (2020): 13434-13444. pdf
  20. Zhao, Wenbo, and Rita Singh. "Speech-based parameter estimation of an asymmetric vocal fold oscillation model and its application in discriminating vocal fold pathologies." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7344-7348. IEEE, 2020. pdf

  • Wayne Zhao and Rita Singh. "Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation." Entropy 2023, 25(7), 1039; Special issue on Information-Theoretic Approaches in Speech Processing and Recognition. 2023. pdf