Human listeners benefit from binaural hearing in detection tasks. For example, in the tone-in-noise (TIN) detection task, the threshold for detection of out-of-phase tone in identical noise at the two ears (N0Sπ) is lower (i.e., better) than that for detection of an in-phase tone (N0S0) (e.g., Hirsh, 1948; Hawley et al., 2004). The difference in detection thresholds between the N0S0 and N0Sπ conditions is referred to as the binaural masking level difference (BMLD).
In N0Sπ stimuli, the difference between the tone-plus-noise waveforms at the two ears results in differences in interaural time or phase and level differences (ITDs, IPDs, or ILDs), as well as changes in the interaural correlation (IAC) (e.g., Domnitz and Colburn, 1976; Bernstein and Trahiotis, 1996). The statistics of the interaural phase and level cues, and their distributions for different signal-to-noise ratios (SNRs) for stimuli used in binaural detection experiments are described in Zurek (1991). Experiments designed to distinguish the relative importance of dynamic ITD vs. IAC cues have suggested that ITD is most important for 500-Hz binaural detection (van der Heijden and Joris, 2010). Furthermore, a psychophysical study that manipulated ITD and IAC cues over a wide range of frequencies showed that predictions for tone detection differ for ITD and IAC cues (Culling, 2011), and as expected, the role of the ITD cue is diminished at higher target frequencies. The challenge of discriminating between models based on these cues, which co-vary in stimuli used for binaural detection, was described by Domnitz and Colburn (1976), who stressed the importance of testing these models over a range of frequencies or other stimulus parameters in order to distinguish the models. Several subsequent models for binaural detection have focused on detection of a decrease in IAC upon addition of a tone in the N0Sπ condition and have tested this class of model across a wide range of stimulus conditions (e.g., Colburn, 1977; Bernstein and Trahiotis, 1997, 2017).
Human listeners can have substantial BMLDs (>3 dB) up to at least 8 kHz (van de Par and Kohlrausch, 1999; Goupell, 2012), yet physiological studies have mainly focused on low frequencies, for which the BMLD is typically larger (up to 20 dB, depending on bandwidth) (e.g., van de Par and Kohlrausch, 1999). Early physiological studies of detection of tones in N0Sπ stimuli focused on sensitivity of low-frequency neurons in the auditory midbrain (inferior colliculus, IC) to ITDs (e.g., Caird et al., 1991; McAlpine et al., 1996; Jiang et al., 1997a,b). Later physiological studies analyzed low-frequency IC responses in terms of the IAC cue (Palmer et al., 1999; Lane and Delgutte, 2005), and the effect of decorrelation was estimated over a wider frequency range in the owl (Asadollahi et al., 2010). The current study extends this work by applying an analysis of IAC cues to responses in the IC of awake rabbit across a wide range of frequencies. If interaural decorrelation explains neural responses to N0Sπ stimuli, then the difference in average rate between IC responses to diotic noise and binaurally uncorrelated noise should be correlated to the rate difference between responses to the noise-alone condition and the noise-plus-dichotic-tone condition. This correlation was directly tested in this study.
Additionally, human psychophysical studies have shown that BMLDs are robust across a range of noise levels (Buss et al., 2003) and in a roving-level paradigm, in which stimulus level was randomly varied from interval to interval (Henning et al., 2005). Therefore, in the current study neural responses were recorded over a wide range of noise levels to explore trends across sound level.
The IC is a nearly obligatory synapse along the ascending auditory pathway, thus all information available for perception must be encoded at this level. This fact makes the IC an interesting place to examine the relationship between neural and behavioral response properties in tasks such as masked detection. The IC receives afferent inputs from nearly all of the auditory brainstem nuclei (Cant and Oliver, 2018). IC neurons are sensitive to several features of stimuli, including ITDs and ILDs (Reviewed in Yin et al., 2019) and envelope frequency and depth (e.g., Langner and Schreiner, 1988; Krishna and Semple, 2000; Nelson and Carney, 2007; Zheng and Escabi, 2013). Addition of a dichotic tone to a diotic noise masker influences all of these cues. However, individual IC responses are complex in that each neuron responds to different cues with different sizes and directions of rate changes. In the current study, the sensitivities of individual neurons were evaluated using standard physiological characterizations, such as modulation transfer functions and responses to noise with ITDs and ILDs. Responses were then tested for their correlation to the IAC cue. Consistent with previous physiological and psychophysical studies, our results support the importance of the IAC in shaping IC responses to stimuli used to estimate BMLDs, and extend these results by illustrating that this correlation extends across a wide range of noise levels and frequencies.
The current study also computed rate-based IC neural thresholds for comparison with published detection thresholds for human listeners (van de Par and Kohlrausch, 1999; Buss et al., 2003; Goupell, 2012) and rabbits (Zheng et al., 2002).
Materials and methods
All neurophysiological procedures were approved by the University of Rochester Committee on Animal Resources. Recordings were from four awake, female Dutch-belted rabbits with normal hearing. Distortion product otoacoustic emissions (Whitehead et al., 1992) were used to monitor hearing over the timecourse of the study. Two of the rabbits were studied from 17 to 55 months of age, and two rabbits from age 13 to 23 months.
Surgical and recording procedures are described in detail in Fan et al. (2021). Briefly, rabbits were anesthetized with an intramuscular injection of ketamine (66 mg/kg) and xylazine (2 mg/kg) for both headbar placement and microdrive (five-drive, Neuralynx, Inc., Bozeman, MT, USA) implantation surgeries. The headbar was custom-designed, 3D-printed hard plastic, with a chamber that held the microdrive. The headbar was permanently mounted on the rabbit skull with stainless-steel screws and dental acrylic. After the rabbit recovered from the headbar surgery, a craniotomy was made to allow insertion of guidetubes from the microdrive through the dura. One microdrive held four guidetubes and tetrodes and allowed for independently advancing and retracting each tetrode. Each tetrode consisted of four twisted 18-μm platinum iridium wires, coated in epoxy (California Fine Wire Co., Grover Beach, CA, USA). The microdrive was replaced as needed, with guidetube positions varied across placements, to search for new neurons.
During recording sessions, the rabbit was placed in a double-walled, sound-proof chamber (Acoustic Systems, Austin, TX, USA), with head fixed using the headbar. Sound was delivered using Beyerdynamic DT990 (Beyerdynamic GmbH & Co., Heilbronn, Germany) or Etymotic ER2 earphones (Etymotic Research, Inc., Elk Grove Village, IL, USA) with custom ear molds for each rabbit. Ear molds were positioned deep in the concha and included an Etymotic probe tube for calibration. The stimulus system included an audio interface (16A, MOTU, Cambridge, MA, USA), a digital-to-analog converter (DAC3 HGC, Benchmark Media Systems, Inc., Syracuse, NY, USA), and earphones (Beyerdynamic DT990, Beyerdynamic GmbH and Co., Heilbronn, Germany or ER2, Etymotic Research). Wideband noise bursts were presented to search for auditory responses. Recordings were made with a multi-channel system (RHD, Intan Technologies, LLC., Los Angeles, CA, USA). When the characteristic frequencies (CFs) increased with tetrode depth, the tetrodes were determined to be in the central nucleus of the IC (ICC). Action potentials were identified offline using spike-sorting techniques applied to the tetrode recordings (Schwarz et al., 2012; Fan et al., 2021). After the termination of recording sessions in each animal, post-mortem histology was applied to verify tetrode locations in the IC.
Speakers were calibrated with ER-7C or ER-10B+ microphones (Etymotic Research) at the beginning of each recording session. The neurons were characterized in several ways before presenting TIN stimuli. Binaural sensitivity was determined by responses to contralateral, ipsilateral, and binaural wideband noise (0.1–19 kHz) at several sound levels. Responses to contralateral pure tones between 0.25 and 20 kHz from 10 to 70 dB SPL were used to identify CF, the frequency at which the neuron responded at the lowest sound level. Noise delay functions (NDFs) described rate responses to noise stimuli as a function of ITD; NDFs were recorded with wideband noise (0.1–19 kHz), 1-sec duration, 30-dB SPL spectrum level, and ITDs from –2,000 to 2,000 μs with a 200-μs stepsize. Responses to ILDs were recorded with the same noise bandwidth and duration as for the NDF. ILDs ranged from –15 to 15 dB with a 5-dB stepsize; the stimulus on the contralateral side had a fixed spectrum level of 30 dB re 20 μPa. Responses to contralateral sinusoidally-amplitude-modulated (SAM) wideband noise (0.1–19 kHz), with 1-sec duration, were collected to identify the shape of the modulation transfer function (MTF). SAM noises were described by:
where n(t) is the wideband noise with a spectrum level of 30 dB SPL, and fm is the modulation frequency. Modulation frequencies were logarithmically spaced between 2 and 350 Hz, with three steps/octave. Responses to contralateral unmodulated noise were also recorded. For all of the above characterizations, three repetitions of each stimulus condition were presented, in random sequence.
For TIN stimuli, the tone frequency and the center frequency of 1/3-oct gaussian noise maskers were chosen to be approximately equal to CF. Noise maskers were simultaneously gated with tone signals and generated by filtering wideband noise with a 5,000th-order FIR band-pass filter. TIN stimuli had 0.3-sec duration with 10-msec cos2 on/off ramps. Overall noise levels ranged from 35 to 75 dB SPL, with a10-dB stepsize. Signal-to-noise ratio (SNR) ranged from –12 to 8 dB, with a 4-dB stepsize; a noise-alone condition was also included. Tone levels and noise levels were presented in random order, and the order was shuffled for each of the 30 repetitions of the stimulus set. Responses were collected for sets of random noise, or reproducible noise (for the temporal analyses in Fan et al., 2021), or both. If more than one dataset was recorded, the dataset with responses to random noise waveforms was used for the analyses presented here. Among all neurons reported here, there were 55 neurons studied with random noise and 81 neurons studied with reproducible noise. No qualitative differences were observed between these two types of datasets, although the use of random noise would be expected to reduce the potential effect of external noise on neural responses.
To test the influence of IAC on IC neurons, responses to diotic (N0) and binaurally uncorrelated (Nu) noise were recorded. For both N0 and Nu conditions, the stimuli were 1/3-octave random gaussian noise, with 2-sec duration, at 65 dB SPL. Five repetitions of five N0 and ten Nu noise were presented, in random sequence.
Noise delay function shape classification
The shape of the NDF, the best ITD (dBITD), and the frequency of ITD tuning (fITD) were determined by fitting the NDF with a Gabor function (Lane and Delgutte, 2005), a sinusoid modulated by a gaussian function:
where dITD is the interaural delay, A, B, and σ are parameters for the amplitude, DC offset, and standard deviation of the gaussian function, respectively, and |•| refers to half-wave rectification. If a neuron’s CF was more than twice fITD (i.e., a high-frequency neuron), indicating that the neuron did not have fine-structure-based ITD sensitivity, then fITD was set to zero, and the NDF was refitted with the following gaussian function:
The function was fit to an NDF using a least-square fit, obtained with a trust-region-reflective algorithm (lsqcurvefit in MATLAB).
Each NDF was classified as peak-like, trough-like, or ITD-insensitive. In the following cases, the neuron was considered sensitive to ITDs: (1) for NDFs fitted with G1, if the absolute value of the amplitude (A) was more than 5 spikes/sec; (2) for NDFs fitted with G2, if the absolute value of the prominence (A/B) was more than 0.25; (3) for NDFs fitted with G2, for a fit with σ between 60 and 1,000 μs. If the amplitude (A) was positive, the neuron was classified as having a peak-like NDF; otherwise, the neuron was classified as having a trough-like NDF. Other neurons were classified as ITD-insensitive. The classification of each NDF generally agreed with a qualitative assessment (Figure 1).
Figure 1. Example neural ITD responses (black solid curve) and fitted Gabor function (blue dashed curve) for peak-like (left) and trough-like (right) NDFs. Vertical dotted line indicates the best ITD. The neuron’s CF, Best or Worst ITD (dBITD), and for cyclic ITD curves, the ITD tuning frequency (fITD) are described in the text.
Modulation transfer function shape classification
The MTF shape was classified with rules designed to be simple and to agree with qualitative descriptions of the functions. Enhancement or suppression was identified with the Mann-Whitney test as significantly higher or lower rates at two or more neighboring modulation frequencies than the rates in response to unmodulated noise. The presence or absence of enhancement or suppression was used to classify the MTF into the following four types: all-pass (AP, no enhancement or suppression), band-enhanced (BE, only enhancement), band-suppressed (BS, only suppression), and hybrid (both enhancement and suppression, over different ranges of modulation frequency).
Average rates, excluding 20-ms onset responses, were calculated for responses to all stimuli. For TIN stimuli, at each noise and tone level (i.e., SNR), a rate-based receiver-operating-characteristic (ROC, Egan, 1975) was calculated using average rate responses for all 30 noise-alone and tone-plus-noise presentations. The percent-correct performance was estimated from the area under the ROC curve. Note that rates in response to tone-plus-noise stimuli could be either higher or lower than rates in response to noise-alone stimuli, so the minimum percent correct was limited to 50%, regardless of the direction of change in rate. The neural threshold was estimated using linear interpolation to find the lowest SNR with 70.7% correct, which corresponds to a threshold estimated with a two-down, one-up tracking procedure (Levitt, 1971).
Responses to both N0S0 and N0Sπ stimuli were recorded from 136 isolated single units; responses to N0S0 of 111 of these units were presented in Fan et al. (2021). Responses to Nu were recorded for 68 units. The distribution of CFs is shown in Figure 2. All units were tested using a tone frequency within 1/3-octave of the neuron’s CF. Based on the MTF categorization criteria described above, there were 40 BE units (29.4%), 62 BS units (45.6%), 12 hybrid units (8.8%) and 22 AP units (16.2%). Distribution of MTF types across CFs is shown in Figure 2.
Figure 2. Distribution of MTFs across CF (in one-octave bins) for the units presented in this study. Gray shades from light to dark indicate units with band-enhanced (BE), band-suppressed (BS), hybrid and all-pass (AP) MTF shapes. Two neurons with CF of 12.1k were included in the last bin for simplicity. Most MTF types were represented across the range of CFs, although hybrid MTFs were not observed at the lower CFs.
Examples of single-neuron responses
Responses of several example units illustrate the complexity of response properties of the IC responses that were analyzed to test the IAC hypothesis. IC neurons have rates that vary with both ITD and ILD, and the interaction of these cues in the N0S0 and N0Sπ stimuli are complex (Zurek, 1991). Additionally, IC neurons are sensitive to periodicity in the stimulus as conveyed in their neural inputs. Adding a tone to narrowband gaussian noise flattens the stimulus envelope (Richards, 1992) and also reduces the amplitudes of neural fluctuations in peripheral responses (Carney, 2018). Therefore, the MTFs of IC neurons are interesting to consider, as well as sensitivity to the classical interaural cues. Neurons with BE MTFs (Figure 3A) are excited by fluctuations and therefore expected to have decreasing rate with increasing SNR for TIN stimuli. On the contrary, neurons with BS MTFs (Figure 3E) are suppressed by fluctuations and therefore expected to have increasing rate with increasing SNR. As expected, Neuron 1, with a BE MTF, had decreasing rate versus SNR at all noise levels (Figures 3C,D), and Neuron 2, with a BS MTF, had increasing rate versus SNR at all noise levels (Figures 3G,H). Both of these examples responded as predicted by their MTF types. Note that for both neurons in Figure 3, the average rate changed at lower SNRs for the N0Sπ condition than for the N0S0 condition, for all noise levels tested, indicating lower neural thresholds, consistent with psychophysical results (e.g., van de Par and Kohlrausch, 1999).
Figure 3. Responses of two example neurons (top and bottom row respectively). (A,E) MTF, response rates to amplitude-modulated noise; stars indicate modulation frequencies that had rates significantly different from the unmodulated condition. (B,F) ITD sensitivity, response rates vs. time delay in contralateral side (negative indicates ipsilateral side has delay). (C,D,G,H) responses to N0S0 and N0Sπ stimuli at different noise levels (different symbols) vs. SNR (from left to right); filled symbols indicate supra-thresholds. Errorbars indicate standard deviation. MTF shape and tone frequency for TIN stimuli (close to CF) are shown on the left. The example BE neuron had decreasing rate upon addition of a tone for both N0S0 and N0Sπ, while the example BS neuron had increasing rate for both conditions.
Neural responses to N0S0 stimuli have previously been described as having increasing rate as a function of tone level (Jiang et al., 1997a; Ramachandran et al., 2000), possibly based on the assumption that neurons respond more strongly to increasing stimulus energy (i.e., upon addition of a tone). Note that Neuron 1 in Figure 3 is an example of a neuron that had decreasing rate as tone level increased at each masker level, whereas it had increasing rate versus masker level for the noise-alone stimuli (SNR = –inf); these responses cannot be explained based on stimulus energy. The shape of NDF has been used to explain changes in neural responses for the N0Sπ condition (Jiang et al., 1997a,b): a diotic noise masker has zero ITD; adding a dichotic tone introduces non-zero ITDs. Neurons with peak-like NDFs respond most strongly to near-zero ITDs, and thus would be expected to have decreasing rate with increasing SNR based on the ITD hypothesis. In contrast, neurons with trough-like NDFs would be expected to have increasing rate with increasing SNR. Responses to N0Sπ stimuli of Neurons 1 and 2 can also be explained by their NDF shapes: Neuron 1 had a peak-like NDF shape (Figure 3B) and decreasing rate versus SNR for the N0Sπ condition; Neuron 2 had a trough-like NDF (Figure 3F) and increasing rate versus SNR.
Single-unit responses to N0S0 and N0Sπ stimuli were analyzed based on MTF properties and responses to ITDs and ILDs. In general, the directions and sizes of rate differences to N0S0 stimuli can be predicted based on MTF properties (Fan et al., 2021), but in response to N0Sπ stimuli, predictions of changes in rate based on MTF properties were only significant at the highest noise level tested (Fan, 2020). Rate differences were also weakly but significantly correlated to rate differences in the NDF, but the correlations decreased as stimulus level increased (Fan, 2020).
In general, IC responses to dichotic TIN stimuli are not easily explained by characterizations based on MTFs, ITDs, or ILDs (see below), likely because of the interaction of these cues in N0S0 and N0Sπ stimuli and because of the different types of sensitivity of IC neurons to these cues (Figure 4). For example, Neurons 3 and 4 both had BE MTFs and decreasing rate versus SNR for the N0S0 condition at most noise levels, as expected. However, for the N0Sπ condition, Neuron 3 had decreasing rate versus SNR that could be explained by its MTF shape, but not its trough-like NDF. In contrast, Neuron 4 had an increasing rate versus SNR that could be explained by its NDF shape, but not by its MTF shape. Neurons 5, 6, and 7 all had BS MTFs, and thus were expected to have increasing rates versus SNR, but the responses of these neurons differ. Neuron 5 had increasing rate versus SNR for both N0S0 and N0Sπ conditions, which could be explained by its BS MTF, but not by its peak-like NDF. The MTF of Neuron 6 did not explain responses to either N0S0 or N0Sπ stimuli, but responses to N0Sπ stimuli (decreasing rate) could be explained by its peak-like NDF. Neuron 7 also had decreasing rate versus SNR, which could not be explained by either MTF or NDF shape. Neuron 8 had an all-pass MTF, and responses to N0Sπ stimuli that could be explained by its peak-like NDF.
Figure 4. Responses of six example neurons (A–F). The left two columns show the neuron’s MTF and ITD sensitivity, respectively. The right three columns show the neuron’s response to N0S0 (blue circles) and N0Sπ (red squares) TIN stimuli at noise levels of 35, 55, and 75 dB SPL, respectively; filled symbols indicate supra-threshold responses. MTF shape and tone frequency of TIN stimuli (close to CF) are shown on the left.
Rate differences in response to N0Sπ stimuli and binaural cues
The rate differences in response to ITDs or ILDs were quantified by the difference between the maximum and minimum response rates over the range of stimuli tested. The maximum change in rate in response to N0Sπ stimuli, for both directions of rate change as a function of SNR, was significantly correlated to the maximum rate differences in both ITD and ILD responses (Figures 5A,B), explaining a small but significant proportion of the variance (i.e., r2). There was not an obvious difference between results shown in Figure 5 for lower-CF neurons (< 1.5 kHz, filled triangles) vs. higher-CF neurons (open circles). The significant correlation between the maximum rate differences for N0Sπ responses and rate differences for both ITD and ILD responses could be because (1) adding a dichotic tone not only introduces ITDs, but also ILDs; and/or (2) the dynamic ranges of ITD and ILD responses were significantly correlated (Figure 5C). Changes in neural responses to N0Sπ are likely due to a combination of ITD and ILD sensitivities and to the co-variation of these cues. The standard deviations of interaural phase and interaural level cues as a function of SNR have been previously described [see Figures 9 and 10 in Zurek (1991)].
Figure 5. Correlation between dynamic ranges of responses to N0Sπ and ITD (A), N0Sπ and ILD (B), and ILD and ITD (C) at 65 dB SPL (as indicated in titles). Correlation coefficients and p-values are shown at the top right of each panel; a star indicates that the correlation coefficient was significant after Bonferroni correction (p < 0.017). Neurons with CF below 1.5 kHz (low–CF) are shown with filled triangles, whereas neurons with CF above 1.5 kHz (high–CF) are shown with open circles. Solid gray lines indicate linear regressions.
Inferior colliculus responses to interaural correlation
Adding a dichotic tone to diotic noise introduces both ITD and ILD cues, as well as interaural decorrelation, but the changes in these cues differ for different tokens of noise waveform as well as for different SNRs. For example, the ITD of a N0Sπ stimulus is dominated by the ITD of the added tone with increasing tone level, but the effective ITD of a N0Sπ stimulus with a low-SNR tone (e.g., at threshold) is hard to estimate, and varies with the noise token due to the phase interaction between the noise and tone. Additionally, unlike a pure tone, the instantaneous ITD of N0Sπ stimuli varies throughout the duration of the stimulus waveform. Therefore, prediction of the rate-change direction upon addition of a tone at threshold based on sensitivity to static ITDs and ILDs is not simple. On the other hand, the effect of interaural decorrelation can be studied with a more straightforward method. To examine the effect of decorrelation, average rates were recorded in response to 1/3-octave diotic (N0) and binaurally uncorrelated (Nu) gaussian noise for 68 neurons. The Nu noises presented at the two ears were simply independent narrowband noise tokens. The correlation between the difference in average rate in response to the N0Sπ condition (the difference between average rates in response to noise-alone and to N0Sπ at 0-dB SNR) and the difference in average rates in response to the N0 and Nu conditions was significant at all noise levels (Figure 6), supporting the hypothesis that IC rates are influenced by IAC. The correlation was strongest for TIN stimuli with a masker level of 65 dB SPL, the level at which the N0 and Nu noise were presented. At 65 dB SPL, additional analyses of the rate differences in responses to N0Sπ stimuli at SNRs of –8 to 8 dB relative to the noise-alone condition were all significantly correlated to the rate difference between the responses to Nu and N0 noise, with correlation coefficients ranging from 0.71 to 0.84, and p values all less than 0.0001 (significant after Bonferroni correction, not shown). The significant correlation coefficients at all SNRs and noise levels indicated that, in general, the direction and size of the changes in rate in response to N0Sπ stimuli were explained by the change in the stimulus from N0 toward Nu. Note that there were only a few low-CF (<1.5 kHz, filled triangles) in this dataset, so it is clear that the correlations illustrated in Figure 6 applied to the much larger group of high-CF neurons (open circles).
Figure 6. Correlation between the rate difference elicited by addition of a dichotic tone (N0Sπ) at 0 dB SNR and the rate difference between responses to N0 and Nu conditions. Correlation coefficients and p-values are shown; a star indicates that the correlation coefficient was significant after Bonferroni correction (p < 0.0014). Neurons with CF below 1.5 kHz (low–CF) are shown in filled triangles, whereas neurons with CF above 1.5 kHz (high–CF) are shown in open circles. Solid lines show linear regressions.
Rate-based neural thresholds
Rate-based thresholds of all units for the N0S0 and N0Sπ conditions at five noise levels were computed and compared with behavioral data from previous studies (Figure 7). There was no clear trend in the numbers of units with increasing or decreasing rate-change direction across frequency, for either the N0S0 or N0Sπ condition, except a weak trend of more units with increasing rate at the lowest noise level tested (bottom row). The lowest rate thresholds across frequency were lower for the N0Sπ condition than for the N0S0 condition, as expected.
Figure 7. Rate-based threshold for N0S0 (A) and N0Sπ (B) conditions. Thresholds of most sensitive neurons across frequencies matched human behavioral data for the N0S0 condition, but had a trend different from human for the N0Sπ condition. Neural thresholds at 500 Hz matched rabbit behavioral data for both conditions.
The lowest rate thresholds at 500 Hz matched the mean rabbit behavioral detection threshold at the same frequency (Zheng et al., 2002). Compared with human thresholds, the lowest rate thresholds for the N0S0 condition were close to human thresholds across frequencies, but the lowest rate thresholds for the N0Sπ condition only matched human thresholds at high frequencies (note that the lower limit of SNRs tested limited this comparison, see below). Human thresholds from Goupell (2012) are slightly lower than van de Par and Kohlrausch (1999) at some frequencies, possibly due to differences in paradigm and stimulus bandwidths. Note that stimuli used in previous studies have slightly different parameters from this study: stimuli in Zheng et al. (2002) had 200-Hz bandwidth (vs. 116 Hz in this study) and an overall level of 63 dB SPL; stimuli in van de Par and Kohlrausch (1999) had bandwidths of 100, 250, 500 Hz and 1 kHz (vs. 116 Hz, 232, 463, and 926 Hz in this study) for center frequencies of 500 Hz, 1, 2, and 4 kHz, and with overall level of 70 dB SPL; stimuli in Buss et al. (2003) had 50-Hz bandwidth and overall noise levels of 42, 57, and 72 dB SPL; stimuli in Goupell (2012) had bandwidths of 78, 240, 456, and 888 Hz (vs. 116, 463, 926, and 1,852 Hz in this study) for center frequencies of 500 Hz, 2, 4, and 8 kHz. However, despite the discrepancies among stimuli, in general, the lowest rate-based thresholds could explain human thresholds for the N0S0 condition across all frequencies tested and for the N0Sπ condition at high frequencies. Note that the thresholds of most sensitive neurons across frequencies did not vary qualitatively across noise levels, consistent with human thresholds tested at multiple noise levels (Buss et al., 2003) and with a roving-level paradigm (Henning et al., 2005).
Rate-based neural binaural masking level differences
Neural BMLDs were evaluated in two ways: using the BMLDs of individual neurons, and using the BMLDs calculated from the N0S0 and N0Sπ thresholds of the neural population. For BMLDs of single neurons (Figure 8), only neurons with measurable thresholds for both N0S0 and N0Sπ conditions are plotted, together with human BMLDs (van de Par and Kohlrausch, 1999; Buss et al., 2003; Goupell, 2012). BMLDs were typically positive, indicating greater TIN sensitivity for N0Sπ compared to N0S0. There was no clear association observed between small or negative.
Figure 8. Binaural masking level differences (BMLDs) calculated based on single-neuron thresholds for both N0S0 and N0Sπ conditions. Open triangles indicate that the direction of change in rate vs. SNR at threshold for the N0Sπ condition was the same as for the N0S0 condition, whereas filled triangles indicate opposite direction of change in rate at threshold for the N0S0 and N0Sπ conditions. Only neurons that had measurable thresholds in both N0S0 and N0Sπ conditions are shown here.
BMLDs and rate-change direction for either N0S0 or N0Sπ conditions, in contrast to a previous report (Jiang et al., 1997a). There was also no clear pattern of same (open symbols) or opposite (filled symbols) rate-change directions for N0S0 and N0Sπ conditions across frequency (i.e., thresholds were similar for upward and downward triangles). Overall, there were more neurons with the same rate-change directions than with opposite rate-change directions (more open symbols than filled symbols) between N0S0 and N0Sπ conditions. Among neurons with opposite rate-change directions across conditions, more neurons had decreasing rate at threshold for the N0S0 condition (more filled downward than upward triangles). At 500 Hz, single-neuron BMLDs were close to human BMLDs at noise levels of 45 and 65 dB SPL, but not at other noise levels. At 1 kHz and above, the maximum single-neuron BMLDs were larger than human BMLDs. The maximum BMLDs were similar across noise levels, as well as across frequencies, unlike human BMLDs that decrease substantially with increasing frequency (van de Par and Kohlrausch, 1999; Goupell, 2012).
To calculate BMLDs of the neural population, neural thresholds for the most sensitive subset of neurons were calculated for 0.5, 1, 2, 4, and 8 kHz for the N0S0 or N0Sπ conditions. The decision to focus on the most sensitive units for this analysis, as proposed by the lower-envelope principle (Barlow et al., 1971), was based on the fact that many of the neural thresholds were significantly higher than behavioral thresholds (Figure 7). Due to the limited SNR range that was tested, many sensitive neurons were suprathreshold (greater than 70.7% correct) at the lowest tested SNR, especially for the N0Sπ condition. To reduce the number of neurons for which the BMLD estimate was limited in this way, individual thresholds were recalculated using a criterion of 79.1% correct for the population-threshold results shown in Figure 9 (squares and diamonds). Individual symbols in Figure 9 represent all neurons that had thresholds above the lowest SNR tested. For each frequency, the population threshold was based on the neurons with thresholds in the lowest 10th percentile within a one-octave range centered at that frequency. Thresholds at 55–75 dB SPL had similar patterns and were plotted together in Figure 9, which shows that neural population thresholds for both N0S0 (blue solid line) and N0Sπ conditions (red dashed line) did not vary across frequency. Human thresholds were moved up by 4 dB to align the means of the human and N0S0 thresholds of the population, to better compare the trend across frequency (Figure 9). Human N0Sπ thresholds increase as a function of frequency, whereas thresholds of the neural population did not. Therefore, human and neural BMLDs had different trends across frequency: human BMLDs decrease with increasing frequency, whereas neural BMLDs did not. The BMLDs based on the neural population thresholds were smaller than the maximum single-neuron BMLDs, as expected due to averaging across the subsets of sensitive neurons for calculation of the population thresholds. However, the BMLDs based on either neural-population or single-neuron thresholds had similar trends across frequency.
Figure 9. N0S0 (solid blue line) and N0Sπ thresholds (dashed red line) of the neural population across frequency. Individual neural thresholds at 79.1% correct for N0S0 (blue square) and N0Sπ (red diamond) conditions, for noise levels of 55–75 dB SPL are shown for all neurons with measurable thresholds above the lowest SNR tested. Symbols with a black star indicate that the threshold was lower than the lowest measured SNR. Human detection thresholds are from van de Par and Kohlrausch (1999) and shifted up by 4 dB for comparison with neural thresholds, which were computed using a higher criterion. Neural binaural masking level differences (BMLDs) had a different trend across frequency compare to human BMLDs.
In the current study, single-neuron responses to TIN stimuli were recorded in the IC for both N0S0 and N0Sπ conditions over a wide range of target frequencies, as well as noise and tone levels. For the population of neurons, changes in rate due to interaural decorrelation were strongly correlated with changes in rate upon addition of an out-of-phase tone to identical noise at all noise levels.
Comparison with previous physiological studies
There have been a limited number of physiological studies of neural responses to both N0S0 stimuli and N0Sπ stimuli in the IC (Jiang et al., 1997a,b; Lane and Delgutte, 2005). The results here were most comparable to those of Jiang et al. (1997a,b), who used a tone target, as opposed to the chirp target used in Lane and Delgutte (2005). There were a few differences between the stimuli used in the current study and in Jiang et al. (1997a) that may explain differences in the results between the two studies. First, responses were only recorded for a tone frequency of 500 Hz in Jiang et al. (1997a), for neurons with a range of CFs, up to 1.5 kHz. Large differences between the tone frequency and CF would be expected to affect response properties. For example, the response of a model auditory-nerve (AN) fiber (Zilany et al., 2014) is saturated in response to a CF tone at 65 dB SPL, but not in response to a 65-dB-SPL tone one octave below CF. Therefore, when the tone frequency is far from CF, AN rates would vary with stimulus sound level, possibly a stronger cue than the relatively small change in neural fluctuations that would result from an off-CF tone. Thus, the difference between CF and target-tone frequency could explain the finding that the majority of neurons in Jiang et al. (1997a) had increasing rate with increasing SNR for the N0S0 condition, whereas many neurons in the current study had decreasing rate versus SNR.
Second, many neurons in the current study did not have measurable thresholds due to the limited range of SNRs tested, but finer steps and a wider range of SNRs were used in Jiang et al. (1997a), so thresholds were measurable for almost all neurons. However, it is worth noting that a 20-dB range of SNRs were tested in this study; thus, neurons without a measurable threshold over this SNR range were largely insensitive to addition of a tone. Thresholds for more neurons might have been measured if the SNR had been increased further, but such thresholds would likely reflect changes in response to tone levels high above behavioral thresholds, and would thus not be relevant to tone-in-noise detection.
Third, the masker in Jiang et al. (1997a) had a bandwidth from 50 Hz to 5 kHz and a level of 65 dB SPL, whereas the current study used 1/3-octave noise centered at the tone frequency, presented over a wide range of noise levels, including 65 dB SPL. The difference in masker bandwidth between studies represents a large difference in noise spectrum level: e.g., 28 dB SPL for Jiang et al. (1997a) 65 dB SPL overall level noise, versus a spectrum level of 44 dB SPL for the 500-Hz target tone tested at the overall noise level of 65 dB SPL in the current study. This difference in spectrum level would have elicited different responses in the periphery, especially at low stimulus frequencies. Even though peripheral neurons respond to a wide frequency range at high sound levels (Ruggero, 1992), the tuning is usually asymmetric and spreads more toward lower frequencies (Schmiedt, 1989). Therefore, for low-CF neurons (e.g., 1 kHz), possibly only the low frequency components of the noise masker used in Jiang et al. (1997a) effectively masked the tone. Additionally, due to non-linear cochlear compression (Robles and Ruggero, 2001), neural responses would differ for maskers having different spectral levels, though the overall level may be matched.
The role of interaural correlation in N0Sπ responses and relationship to other binaural cues
Adding an out-of-phase tone reduces the IAC (e.g., Bernstein and Trahiotis, 2017). The change in rate elicited by an out-of-phase tone was significantly correlated with the rate difference between responses to N0 and Nu noise (Figure 6); the large proportion of variance explained (37–69%) suggested an important role of the IAC in physiological N0Sπ responses.
Results showed that both ITD-, and IAC-based cues explained a proportion of neural responses to N0Sπ stimuli (maximum 34 and 69%, respectively) (Fan, 2020). The ITD-based hypothesis explained responses at low-to-medium noise levels, whereas the IAC-based hypothesis explained TIN responses at all noise levels. The IAC-based hypothesis explained a larger proportion of variance in rate responses at 65 dB SPL, at which N0 and Nu noise responses were collected. However, these cues are not independent. For example, the decreasing trend in the proportion of results explained by the ITD-based hypothesis as noise level increased could be due to the fact that envelope ITDs dominated responses of the high-CF neurons, which were the majority of the neurons in the population studied here. However, the fluctuation amplitudes in AN responses saturate (i.e., flatten) at higher sound levels, and thus binaural differences in the neural representations of the stimulus envelope would also decrease with increasing sound level, which would explain a weaker effect of envelope ITDs at high sound levels. Also, at high frequencies, IAC-cues have been proposed to be envelope-based (Durlach, 1964; Bernstein and Trahiotis, 1996).
Some effort has been made to separate the role of IAC and ITD in binaural detection (van der Heijden and Joris, 2010; Culling, 2011). Based on results from these studies, both ITD and ILD cues are proposed to contribute to interaural decorrelation. Adding an out-of-phase tone not only introduces ITDs, but also ILDs; additionally, the added binaural cues are time-varying. The dynamic range of neural responses to ILD was correlated not only to that of N0Sπ responses, but also to the dynamic range of ITD responses (Figure 5). Fluctuations of ITD in an N0Sπ stimulus increase with increasing tone level, whereas fluctuations of ILD first increase and then decrease as tone level increases (Zurek, 1991). Therefore, interaural decorrelation involves a nonlinear combination of ITD and ILDs cues: both ITD and ILD cues affect IAC at low tone levels, whereas at high tone levels (e.g., above 4 dB SNR), ITD cues dominate IAC. This proposed idea is consistent with a previous modeling study (Mao and Carney, 2014) in which ITD cues are shown to dominate in stimuli with low modulation depths (e.g., tone-plus-noise), and the combination of ITD and ILD cues dominate in stimuli with high modulation depths (e.g., noise). In that study, the nonlinear combination of ITD and ILD cues is described by the slope of the interaural envelope difference (SIED), whereas detection in the N0Sπ condition at high frequencies has been proposed to be explained by the envelope-based IAC (Durlach, 1964; Bernstein and Trahiotis, 1996). Thus, the SIED cue is hypothesized to be a specific implementation of an envelope-based IAC in explaining N0Sπ responses.
Neural binaural masking level differences vs. human binaural masking level differences
Rate-based thresholds were estimated for both N0S0 and N0Sπ conditions in order to estimate neural BMLDs over a range of frequencies and noise levels. For the N0S0 condition, the lowest rate-based thresholds across frequency could explain human detection thresholds. For the N0Sπ condition, the lowest rate-based thresholds across frequency had a different trend from human detection thresholds: neural thresholds were higher (i.e., worse) than human thresholds at low frequencies, and lower (i.e., better) than human thresholds at high frequencies. Many neurons had BMLDs as large as 20 dB. BMLDs estimated based on the most sensitive units in the neural population and estimates of maximum BMLDs for single neurons only varied slightly across frequency, whereas human BMLDs decrease substantially with increasing frequency. BMLDs estimated for the neural population were shown to be slightly lower than maximum single-neuron BMLDs across all frequencies, because individual neurons with the lowest thresholds in either the N0S0 or N0Sπ condition did not always have the lowest thresholds in the other condition.
Rate-based neural thresholds were similar across noise levels, consistent with human psychophysical studies (Buss et al., 2003). Human BMLDs have been shown to be minimally affected by the roving-level paradigm, in which stimulus levels randomly vary from interval to interval (Henning et al., 2005). Similar patterns of rate-based neural BMLDs across noise levels could explain the level-resistance of human listeners.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://osf.io/kbrnw/.
The animal study was reviewed and approved by University of Rochester, University Committee on Animal Resources.
LF designed and conducted the experiment, analyzed data, and wrote the manuscript. KH was involved in data analysis and provided feedback on the manuscript. LC proposed the general hypothesis, was involved in experimental design, data analysis, and edited the manuscript. All authors contributed to the article and approved the submitted version.
This study was supported by NIH-DC-010813.
Thanks to Kristina Abrams for help with animal husbandry and surgical procedures, to Margaret Youngman and Meron Abate for assistance with data collection, and to Douglas Schwarz for help with software and hardware development. Thanks to Emily Buss and Matthew Goupell for providing the human threshold values from their studies. Also thanks to Goupell for discussions about binaural cues. Ralf Haefner, Marc Schieber, and Steven McAleavey provided useful suggestions for analysis and presentation. We remember Armin Kohlrausch, who discussed this work with us and provided extensive comments on a previous version of the manuscript.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Asadollahi, A., Endler, F., Nelken, I., and Wagner, H. (2010). Neural correlates of binaural masking level difference in the inferior colliculus of the barn owl (Tyto alba). Eur. J. Neurosci. 32, 606–618. doi: 10.1111/j.1460-9568.2010.07313.x
Bernstein, L. R., and Trahiotis, C. (1997). The effects of randomizing values of interaural disparities on binaural detection and on discrimination of interaural correlation. J. Acoust. Soc. Am. 102, 1113–1120. doi: 10.1121/1.419863
Bernstein, L. R., and Trahiotis, C. (2017). An interaural-correlation-based approach that accounts for a wide variety of binaural detection data. J. Acoust. Soc. Am. 141, 1150–1160. doi: 10.1121/1.4976098
Caird, D. M., Palmer, A. R., and Rees, A. (1991). Binaural masking level difference effects in single units of the Guinea pig inferior colliculus. Hear. Res. 57, 91–106. doi: 10.1016/0378-5955(91)90078-N
Cant, N. B., and Oliver, D. L. (2018). “Overview of auditory projection pathways and intrinsic microcircuits,” in The mammalian auditory pathways, eds D. L. Oliver, N. B. Cant, R. R. Fay, and A. N. Popper (Cham: Springer), 7–39. doi: 10.1007/978-3-319-71798-2_2
Fan, L. (2020). Physiological studies of binaural tone-in-noise detection in the inferior colliculus: The role of stimulus envelope and neural fluctuations. Unpublished dissertation. Rochester, NY: University of Rochester.
Fan, L., Henry, K. S., and Carney, L. H. (2021). Responses to diotic tone-in-noise stimuli in the inferior colliculus: Stimulus envelope and neural fluctuation cues. Hear. Res. 409:108328. doi: 10.1016/j.heares.2021.108328
Hawley, M. L., Litovsky, R. Y., and Culling, J. F. (2004). The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer. J. Acoust. Soc. Am. 115, 833–843. doi: 10.1121/1.1639908
Henning, G. B., Richards, V. M., and Lentz, J. J. (2005). The effect of diotic and dichotic level-randomization on the binaural masking-level difference. J. Acoust. Soc. Am. 118, 3229–3240. doi: 10.1121/1.2047167
Jiang, D., McAlpine, D., and Palmer, A. R. (1997a). Responses of neurons in the inferior colliculus to binaural masking level difference stimuli measured by rate-versus-level functions. J. Neurophysiol. 77, 3085–3106. doi: 10.1152/jn.19126.96.36.19985
Jiang, D., McAlpine, D., and Palmer, A. R. (1997b). Detectability index measures of binaural masking level difference across populations of inferior colliculus neurons. J. Neurosci. 17, 9331–9339. doi: 10.1523/JNEUROSCI.17-23-09331.1997
Krishna, B. S., and Semple, M. N. (2000). Auditory temporal processing: Responses to sinusoidally amplitude-modulated tones in the inferior colliculus. J. Neurophysiol. 84, 255–273. doi: 10.1152/jn.2000.84.1.255
Lane, C. C., and Delgutte, B. (2005). Neural correlates and mechanisms of spatial release from masking: Single-unit and population responses in the inferior colliculus. J. Neurophysiol. 94, 1180–1198. doi: 10.1152/jn.01112.2004
Mao, J., and Carney, L. H. (2014). Binaural detection with narrowband and wideband reproducible noise maskers. IV. Models using interaural time, level, and envelope differences. J. Acoust. Soc. Am. 135, 824–837. doi: 10.1121/1.4861848
Nelson, P. C., and Carney, L. H. (2007). Neural rate and timing cues for detection and discrimination of amplitude-modulated tones in the awake rabbit inferior colliculus. J. Neurophysiol. 97, 522–539. doi: 10.1152/jn.00776.2006
Palmer, A. R., Jiang, D., and McAlpine, D. (1999). Desynchronizing responses to correlated noise: A mechanism for binaural masking level differences at the inferior colliculus. J. Neurophysiol. 81, 722–734. doi: 10.1152/jn.19188.8.131.522
Ramachandran, R., Davis, K. A., and May, B. J. (2000). Rate representation of tones in noise in the inferior colliculus of decerebrate cats. J. Assoc. Res. Otolaryngol. 1, 144–160. doi: 10.1007/s101620010029
Schwarz, D. M., Zilany, M. S., Skevington, M., Huang, N. J., Flynn, B. C., and Carney, L. H. (2012). Semi-supervised spike sorting using pattern matching and a scaled Mahalanobis distance metric. J. Neurosci. Methods 206, 120–131. doi: 10.1016/j.jneumeth.2012.02.013
van de Par, S., and Kohlrausch, A. (1999). Dependence of binaural masking level differences on center frequency, masker bandwidth, and interaural parameters. J. Acoust. Soc. Am. 106, 1940–1947. doi: 10.1121/1.427942
van der Heijden, M., and Joris, P. X. (2010). Interaural correlation fails to account for detection in a classic binaural task: Dynamic ITDs dominate N0Spi detection. J. Assoc. Res. Otolaryngol. 11, 113–131. doi: 10.1007/s10162-009-0185-8
Whitehead, M. L., Lonsbury-Martin, B. L., and Martin, G. K. (1992). Evidence for 2 discrete sources of 2f1-F2 distortion-product otoacoustic emission in rabbit. 1. Differential dependence on stimulus parameters. J. Acoust. Soc. Am. 91, 1587–1607. doi: 10.1121/1.404382
Zheng, L., Early, S. J., Mason, C. R., Idrobo, F., Harrison, J. M., and Carney, L. H. (2002). Binaural detection with narrowband and wideband reproducible noise maskers: II. Results for rabbit. J. Acoust. Soc. Am. 111, 346–356. doi: 10.1121/1.1423930
Zheng, Y., and Escabi, M. A. (2013). Proportional spike-timing precision and firing reliability underlie efficient temporal processing of periodicity and envelope shape cues. J. Neurophysiol. 110, 587–606. doi: 10.1152/jn.01080.2010
Zilany, M. S., Bruce, I. C., and Carney, L. H. (2014). Updated parameters and expanded simulation options for a model of the auditory periphery. J. Acoust. Soc. Am. 135, 283–286. doi: 10.1121/1.4837815