Liu, ShiTong (2021) Constructing Invariant Representation of Sound Using Optimal Features and Sound Statistics Adaptation. Doctoral Dissertation, University of Pittsburgh. (Unpublished)
Abstract
The ability to convey information using sound is critical for the survival of many vocal species, including humans. These communication sounds (vocalizations or calls) are often composed of complex spectrotemporal features that require accurate detection to prevent miscategorization. This task is made difficult by two factors: 1) the inherent variability in vocalization production, and 2) competing sounds from the environment. The auditory system must generalize across these sources of variability while maintaining sufficient sensitivity to detect subtle differences in fine acoustic structure. While several studies have described vocalization-selective and noise-invariant neural responses in the auditory pathway at a phenomenological level, the algorithmic and mechanistic principles behind these observations remain speculative.
In this thesis, we first adopted a theoretical approach to develop biologically plausible computational algorithms to categorize vocalizations while generalizing over variability in sound production and the environment. From an initial set of randomly chosen vocalization features, we used a greedy search algorithm to select the most informative features, maximizing vocalization categorization performance while minimizing redundancy between features. High classification performance could be achieved using only 10–20 features per vocalization category. The optimal features tended to be of intermediate complexity, offering a compromise between fine and tolerant feature tuning. Predictions of tuning properties of putative feature-selective neurons matched some observed auditory cortical responses. While this algorithm performed well in quiet listening conditions, it failed in noisy conditions. To address this shortcoming, we implemented biologically plausible algorithms to improve model performance in noisy conditions. We explored two model elements to aid adaptation to sound statistics: 1) de-noising of noisy inputs by thresholding based on wide-band energy, and 2) adjusting feature detection parameters to offset noise-masking effects. These processes were consistent with physiological observations of gain control mechanisms and principles of efficient encoding in the brain. With these additions, our model was able to achieve near-physiological levels of performance. Our results suggest that invariant representation of sound can be achieved based on task-dependent features with adaptation to input sound statistics.
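To illustrate the kind of greedy, redundancy-penalized feature selection the abstract describes, the following is a minimal sketch. It assumes binary feature-detector outputs (whether a candidate spectrotemporal feature exceeded its detection threshold for each sound) and binary category labels, and uses mutual information minus a redundancy penalty as the selection score; the function names, scoring rule, and parameters are illustrative assumptions, not the dissertation's exact formulation.

```python
# Illustrative sketch of greedy, redundancy-penalized feature selection.
# Assumption: detections are binary (feature exceeded threshold for a sound or not),
# and labels are binary (target vocalization category vs. distractor).
import numpy as np

def mutual_information(x, y):
    """Mutual information (bits) between two binary vectors."""
    mi = 0.0
    for xv in (0, 1):
        for yv in (0, 1):
            p_xy = np.mean((x == xv) & (y == yv))
            if p_xy > 0:
                p_x = np.mean(x == xv)
                p_y = np.mean(y == yv)
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

def greedy_select(detections, labels, n_features=15, redundancy_weight=0.5):
    """
    detections : (n_candidates, n_sounds) binary array of feature-detector outputs
    labels     : (n_sounds,) binary array, 1 = target vocalization category
    Returns indices of greedily chosen features that are informative about the
    category while overlapping little with already-selected features.
    """
    selected = []
    remaining = list(range(detections.shape[0]))
    for _ in range(n_features):
        best_idx, best_score = None, -np.inf
        for f in remaining:
            info = mutual_information(detections[f], labels)
            # Penalize features that mostly repeat already-selected features.
            redundancy = max(
                (mutual_information(detections[f], detections[s]) for s in selected),
                default=0.0,
            )
            score = info - redundancy_weight * redundancy
            if score > best_score:
                best_idx, best_score = f, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected

# Toy usage with random candidate detectors (placeholder data only).
rng = np.random.default_rng(0)
detections = rng.integers(0, 2, size=(200, 400))  # 200 candidate features x 400 sounds
labels = rng.integers(0, 2, size=400)             # 1 = target category
print(greedy_select(detections, labels, n_features=10))
```

The redundancy term is what keeps the selected set small (on the order of 10–20 features per category, as reported in the abstract): a feature that fires on the same sounds as an already-chosen feature adds little new evidence, so it is scored down even if it is individually informative.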
Details
Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators/Authors: Liu, ShiTong
ETD Committee:
Date: 3 September 2021
Date Type: Publication
Defense Date: 13 April 2021
Approval Date: 3 September 2021
Submission Date: 22 July 2021
Access Restriction: No restriction; release the ETD for access worldwide immediately.
Number of Pages: 117
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Bioengineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: acoustic simulation, animals, auditory cortex, auditory perception, marmosets, guinea pigs, vocalization, sound, gain control, computer simulation, noise, invariant representation
Date Deposited: 03 Sep 2021 18:30
Last Modified: 03 Sep 2021 18:30
URI: http://d-scholarship.pitt.edu/id/eprint/41466