
Constructing Invariant Representation of Sound Using Optimal Features And Sound Statistics Adaptation

Liu, ShiTong (2021) Constructing Invariant Representation of Sound Using Optimal Features And Sound Statistics Adaptation. Doctoral Dissertation, University of Pittsburgh. (Unpublished)



The ability to convey information using sound is critical for the survival of many vocal species, including humans. These communication sounds (vocalizations or calls) are often composed of complex spectrotemporal features that require accurate detection to prevent mis-categorization. This task is made difficult by two factors: 1) the inherent variability in vocalization production, and 2) competing sounds from the environment. The auditory system must generalize across these sources of variability while maintaining sufficient sensitivity to detect subtle differences in fine acoustic structure. While several studies have described vocalization-selective and noise-invariant neural responses in the auditory pathway at a phenomenological level, the algorithmic and mechanistic principles behind these observations remain speculative.

In this thesis, we first adopted a theoretical approach to develop biologically plausible computational algorithms that categorize vocalizations while generalizing over variability in sound production and in the environment. From an initial set of randomly chosen vocalization features, we used a greedy search algorithm to select the most informative features, maximizing vocalization categorization performance while minimizing redundancy between features. High classification performance could be achieved using only 10–20 features per vocalization category. The optimal features tended to be of intermediate complexity, offering an optimal compromise between fine and tolerant feature tuning. Predictions of the tuning properties of putative feature-selective neurons matched some observed auditory cortical responses. While this algorithm performed well in quiet listening conditions, it failed in noisy conditions. To address this shortcoming, we implemented biologically plausible algorithms to improve model performance in noise. We explored two model elements to aid adaptation to sound statistics: 1) de-noising of noisy inputs by thresholding based on wide-band energy, and 2) adjusting feature detection parameters to offset noise-masking effects. These processes were consistent with physiological observations of gain control mechanisms and with principles of efficient encoding in the brain. With these additions, our model was able to achieve near-physiological levels of performance. Our results suggest that invariant representation of sound can be achieved based on task-dependent features with adaptation to input sound statistics.
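
The greedy selection step described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not give the scoring rule, so the sketch assumes binary feature detections, uses mutual information with the category label as the "informativeness" of a feature, and uses the largest mutual information with any already selected feature as its "redundancy". The names `mutual_information`, `greedy_select`, `detections`, and `labels` are hypothetical.

```python
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length binary sequences."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:
                mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def greedy_select(detections, labels, k):
    """Greedily pick up to k feature indices.

    detections[j][i] is 1 if (hypothetical) feature j is detected in sound i.
    labels[i] is 1 if sound i belongs to the target call category.
    Assumed score: merit (MI with labels) minus redundancy (max MI with any
    already selected feature). Selection stops early when no remaining
    feature has a positive score.
    """
    merit = [mutual_information(d, labels) for d in detections]
    selected = []
    remaining = set(range(len(detections)))

    def score(j):
        redundancy = max(
            (mutual_information(detections[j], detections[s]) for s in selected),
            default=0.0,
        )
        return merit[j] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        if selected and score(best) <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 8 sounds, the first 4 belong to the target category.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
detections = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # informative
    [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of feature 0 (redundant)
    [0, 0, 0, 1, 0, 0, 0, 0],  # weaker, but covers the call feature 0 misses
    [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the category
]
chosen = greedy_select(detections, labels, k=3)
```

In this toy example the duplicate feature is penalized for its redundancy with the first pick, so the search prefers the weaker but complementary feature, which is the behavior the abstract attributes to minimizing redundancy between features.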




Item Type: University of Pittsburgh ETD
Status: Unpublished
Creators: Liu, ShiTong (Email: shl87@pitt.edu; Pitt Username: shl87)
ETD Committee:
Thesis Advisor: Sadagopan,
Committee Chair: Gandhi,
Committee Member: Kandler,
Committee Member: Smith, Matthew
Committee Member: Chandrasekaran,
Committee Member: ,
Date: 3 September 2021
Date Type: Publication
Defense Date: 13 April 2021
Approval Date: 3 September 2021
Submission Date: 22 July 2021
Access Restriction: No restriction; Release the ETD for access worldwide immediately.
Number of Pages: 117
Institution: University of Pittsburgh
Schools and Programs: Swanson School of Engineering > Bioengineering
Degree: PhD - Doctor of Philosophy
Thesis Type: Doctoral Dissertation
Refereed: Yes
Uncontrolled Keywords: acoustic simulation, animals, auditory cortex, auditory perception, marmosets, guinea pigs, vocalization, sound, gain control, computer simulation, noise, invariant representation
Date Deposited: 03 Sep 2021 18:30
Last Modified: 03 Sep 2021 18:30

