inferno
transfer in progress...
Repository for the paper "Brain-inspired model for early vocal learning and correspondence matching using free-energy optimization", PLoS Computational Biology.
Abstract
We propose a neural architecture called INFERNO, which stands for Iterative Free-Energy Optimization of Recurrent Neural Networks. Free-energy (noise) minimization is used for exploring, selecting, and learning sound primitives. The whole system is implemented with recurrent spiking neural networks that learn and retrieve the spike trains constituting the audio memory sequences, encoded at the millisecond scale.
Experiment 1 - compact representation in small dataset
Audio database: 14,000 MFCC vectors (3 minutes long); network size: 14,000 units (compression 1:1). Here we study the reconstruction capacity of the INFERNO architecture.
The audio dataset consists of 5 French sentences, each repeated three times by a native speaker (a young woman). The audio .wav file is converted into 14,000 MFCC vectors (dimension 12), sampled every 25 ms.
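As a rough illustration of the framing step that precedes MFCC extraction, the sketch below cuts a signal into 25 ms frames. The function name `frame_signal`, the 16 kHz sample rate, and the default of non-overlapping frames are assumptions made for this example; the text above only states that the vectors are sampled every 25 ms, and the repository's actual preprocessing may differ.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=None):
    """Split a 1-D sequence of samples into fixed-length frames.

    hop_ms defaults to frame_ms (no overlap). This default is an
    assumption: the README does not say whether frames overlap.
    """
    if hop_ms is None:
        hop_ms = frame_ms
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must span at least one sample")

    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


if __name__ == "__main__":
    sample_rate = 16000                  # hypothetical sample rate
    one_second = [0.0] * sample_rate     # one second of silence
    frames = frame_signal(one_second, sample_rate)
    print(len(frames), len(frames[0]))   # frame count, samples per frame
```

Each frame would then be reduced to one 12-dimensional MFCC vector by whatever feature extractor is used.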
The number of Striatal and Gp units is chosen so that the representation of the MFCC vectors is orthogonal: the size of the BG layers equals the number of MFCC vectors to retrieve in the sequence, i.e., 14,000 units.
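One way to read "orthogonal" here is a one-hot code, in which each MFCC vector in the sequence gets its own dedicated unit. The sketch below illustrates that reading only; the helper names `one_hot` and `dot` are hypothetical and are not taken from the repository.

```python
def one_hot(index, n_units):
    """Return a binary code with a single active unit at `index`."""
    if not 0 <= index < n_units:
        raise IndexError("index must lie in [0, n_units)")
    code = [0] * n_units
    code[index] = 1
    return code


def dot(a, b):
    """Inner product of two equal-length codes."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    n_units = 5  # toy size; Experiment 1 uses 14,000 units
    a = one_hot(0, n_units)
    b = one_hot(1, n_units)
    # Distinct codes share no active unit, so their inner product is zero.
    print(dot(a, b), dot(a, a))
```

Under this reading, a 1:1 compression means the layer needs as many units as there are vectors to retrieve.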
Audio Files (speaker #1, 20 seconds cut)
sentence: "un homme de bien agit et raisonne en homme de bien, un méchant agit et raisonne en méchant" Pierre Corneille ; Discours du poème dramatique (1660)
[Original wav file/Filtered]
[Reconstructed sound, Speaker #1, Sentence #1]
[Reconstructed sound, after free-energy minimization period #0]
[Reconstructed sound, after free-energy minimization period #1]
[Reconstructed sound, after free-energy minimization period #2]
[Reconstructed sound, after free-energy minimization period #3]
Experiment 2 - generalization in large dataset
Audio database: 140,000 MFCC vectors (29 minutes long); network size: 14,000 units (compression 1:10). Here we study the generalization capacity of the INFERNO architecture.
Experiment 2 uses a larger audio dataset, 27 minutes long, recorded from six native French speakers, three women and three men (same sentence as in Experiment 1). The audio .wav file is converted into 140,000 MFCC vectors (dimension 12), sampled every 25 ms.
The number of Striatal and Gp units is kept the same as in the first experiment (i.e., 14,000 units), which means that the BG layers are ten times smaller than the number of MFCC vectors to retrieve in the sequence.
This second experiment tests the generalization capabilities of the INFERNO architecture and its robustness to high variability.
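With ten times fewer units than vectors, several MFCC vectors necessarily end up sharing a unit. The toy sketch below pictures that sharing as nearest-prototype assignment. This is an illustrative stand-in, not the INFERNO learning rule, and the names `compression_ratio`, `squared_distance`, and `nearest_unit` are hypothetical.

```python
def compression_ratio(n_vectors, n_units):
    """Average number of MFCC vectors that each unit must represent."""
    return n_vectors / n_units


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_unit(vector, prototypes):
    """Index of the prototype closest to `vector` (assumed sharing scheme)."""
    return min(
        range(len(prototypes)),
        key=lambda i: squared_distance(vector, prototypes[i]),
    )


if __name__ == "__main__":
    print(compression_ratio(14000, 14000))    # Experiment 1
    print(compression_ratio(140000, 14000))   # Experiment 2

    # Toy 2-D example: two units, one input vector.
    prototypes = [[0.0, 0.0], [1.0, 0.0]]
    print(nearest_unit([0.9, 0.1], prototypes))
```

In such a scheme, reconstruction quality depends on how well a shared unit stands in for the different speakers' versions of the same sound.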
Audio Files (speaker #1, 20 seconds cut)
sentence: "un homme de bien agit et raisonne en homme de bien, un méchant agit et raisonne en méchant" Pierre Corneille ; Discours du poème dramatique (1660)
[Original wav file/Filtered, Speaker #1 only, Sentence #1]
[Reconstructed sound, Speaker #1-#6, Sentence #1]