Investigating Central Compensation For Voice Onset Time In Noise Using Deep Learning

Amirhossein Sameti; Nematollah Rouhbakhsh; Amir Homayoun Jafari; Zahra Shirzhiyan

Abstract

Background and aim: The brain's ability to resolve rapid temporal cues such as voice-onset time (VOT) is essential for speech perception in challenging listening environments. We tested whether central auditory compensation for VOT in noise is reflected in the fidelity of cortical auditory evoked potentials (CAEPs) using a neural-network classifier and a cross-condition similarity metric.
Methods: Electroencephalography (EEG) was recorded from 22 normal-hearing adults in response to /ka/ and /ga/ syllables with varying VOTs, presented in quiet and in noise (+7 dB signal-to-noise ratio). We measured the N1-P2 peak amplitude of the CAEPs, used a convolutional neural network (CNN) to classify CAEPs by syllable identity, and computed a cross-condition correlation (r_cc) to quantify the similarity between responses in quiet and in noise.
Results: Background noise significantly reduced N1-P2 amplitude, behavioral performance, and CNN classification accuracy, confirming the degradation of phoneme-specific neural representations. Critically, inter-subject variability in behavioral speech-in-noise (SIN) performance was significantly correlated with both r_cc (r=0.443, p=0.02) and CNN accuracy in noise (r=0.492, p=0.01). Individuals with higher behavioral SIN scores exhibited CAEPs in noise that were more similar to their responses in quiet (higher r_cc) and more discriminable by the CNN. Scalp topography showed the highest r_cc values over fronto-central regions, which also showed the strongest correlation between r_cc and SIN performance.
Conclusion: The convergence of our findings demonstrates that successful SIN perception relies on the brain's capacity to maintain a stable, noise-invariant cortical representation of speech, particularly in fronto-central auditory regions. These EEG-derived metrics may serve as a research tool for future clinical investigations.
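
The two waveform metrics named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that r_cc is a Pearson correlation between the averaged CAEP waveform in quiet and the averaged waveform in noise, and that N1-P2 amplitude is the difference between the P2 maximum and the N1 minimum within fixed latency windows. The abstract does not specify either computation. The window limits, the function names, and the synthetic waveforms are all hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def n1_p2_amplitude(waveform, times, n1_window=(0.08, 0.15), p2_window=(0.15, 0.25)):
    """Peak-to-peak N1-P2 amplitude.

    The latency windows (in seconds) are assumed values for illustration,
    not taken from the article.
    """
    n1 = min(v for v, t in zip(waveform, times) if n1_window[0] <= t < n1_window[1])
    p2 = max(v for v, t in zip(waveform, times) if p2_window[0] <= t < p2_window[1])
    return p2 - n1


def synthetic_caep(t):
    """Toy CAEP: a negative peak near 100 ms and a positive peak near 200 ms."""
    n1 = -2.0 * math.exp(-((t - 0.10) / 0.02) ** 2)
    p2 = 1.5 * math.exp(-((t - 0.20) / 0.03) ** 2)
    return n1 + p2


# Synthetic data only: an attenuated, noisy copy stands in for the in-noise response.
rng = random.Random(0)
times = [-0.1 + 0.001 * i for i in range(601)]  # -100 ms to 500 ms in 1 ms steps
quiet = [synthetic_caep(t) for t in times]
noise = [0.6 * v + rng.gauss(0.0, 0.2) for v in quiet]

r_cc = pearson_r(quiet, noise)
amp_quiet = n1_p2_amplitude(quiet, times)
amp_noise = n1_p2_amplitude(noise, times)
print(f"r_cc = {r_cc:.3f}, N1-P2 quiet = {amp_quiet:.2f}, N1-P2 noise = {amp_noise:.2f}")
```

Under these assumptions, the same `pearson_r` function could also relate per-subject r_cc values to behavioral SIN scores, which is the kind of across-subject correlation reported in the Results.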

Issue	Articles in Press
Section	Research Article(s)
Keywords
speech-in-noise perception; voice-onset time; cortical auditory evoked potentials; convolutional neural network; electroencephalography; temporal processing

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

How to Cite

Sameti A, Rouhbakhsh N, Jafari AH, Shirzhiyan Z. Investigating Central Compensation For Voice Onset Time In Noise Using Deep Learning. Aud Vestib Res. 2026;.
