Unsupervised Structural Plasticity

An online unsupervised algorithm for Winner-Take-All with binary synapses.

Unsupervised classification of high-dimensional spike trains

In certain practical Machine Learning problems the class labels or "right answers" are not provided. The learning techniques used to tackle such problems are termed unsupervised rules. Such problems arise in many real-world scenarios. Hence, after proposing two supervised structural plasticity based algorithms, namely the Liquid State Machine with Dendritically Enhanced Readout (LSM-DER) and the Morphology Optimizing Tempotron (MOT), we ventured into developing an unsupervised morphological learning rule for a layer of neurons. This work led to two innovations. First, we designed a novel architecture, a Winner-Take-All network employing Neurons with NonLinear Dendrites (WTA-NNLD). Second, for training WTA-NNLD we developed a branch-specific Spike Timing Dependent Plasticity based Network Rewiring (STDP-NRW) learning rule.

Architecture

The WTA is a computational framework in which a group of recurrent neurons cooperate and compete with each other for activation. WTA-NNLD is a spike-based WTA network employing neurons with lumped dendritic nonlinearities as the competing entities. To implement lateral inhibition, an inhibitory neuron is included which, upon activation, provides a global inhibition signal to all the neurons. Unlike traditional Machine Learning systems that require high-resolution weights, the proposed network uses low-resolution non-negative integer weights and trains itself by modifying the connections of inputs to dendrites. Hence, learning is reflected in a change of the 'morphology' or structure of the neurons (in terms of their connectivity pattern). This simplifies hardware implementation, since a low-resolution non-negative integer weight W can be implemented by activating a shared binary synapse W times through time-multiplexing schemes like Address Event Representation (AER).
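To make the neuron model concrete, here is a minimal Python sketch of how a neuron with nonlinear dendrites could combine binary synaptic connections: each dendritic branch sums the currents of the afferents wired to it and applies a lumped squaring nonlinearity, and the soma sums the branch outputs. The exponential synaptic kernel, the squaring nonlinearity, and all parameter values below are illustrative assumptions rather than the exact model of the paper.

```python
import numpy as np

# --- Illustrative parameters (assumed values, not taken from the paper) ---
N_AFFERENTS = 100      # input spike-train channels
N_DENDRITES = 10       # dendritic branches per neuron
SYN_PER_DENDRITE = 5   # binary synaptic contact points per branch
TAU_S = 0.005          # synaptic time constant (s)
I0 = 1.0               # synaptic kernel amplitude
DT = 0.001             # simulation step (s)
T_PATTERN = 0.5        # pattern duration (s)

rng = np.random.default_rng(0)

# Binary connectivity: conn[j, p] = index of the afferent wired to contact
# point p of dendrite j.  Learning rewires these indices, not real weights.
conn = rng.integers(0, N_AFFERENTS, size=(N_DENDRITES, SYN_PER_DENDRITE))

def synaptic_current(spike_times, t_grid, tau=TAU_S, amp=I0):
    """Exponentially decaying kernel summed over all presynaptic spikes."""
    cur = np.zeros_like(t_grid)
    for ts in spike_times:
        mask = t_grid >= ts
        cur[mask] += amp * np.exp(-(t_grid[mask] - ts) / tau)
    return cur

def nnld_drive(afferent_spikes, conn, t_grid):
    """Total somatic drive of one neuron with nonlinear dendrites.

    Each branch sums the currents of the afferents wired to it, applies a
    lumped squaring nonlinearity, and the soma sums the branch outputs.
    """
    total = np.zeros_like(t_grid)
    for branch in conn:
        z = np.zeros_like(t_grid)
        for afferent in branch:
            z += synaptic_current(afferent_spikes[afferent], t_grid)
        total += z ** 2          # lumped dendritic nonlinearity
    return total

# Toy input: Poisson spikes on every afferent.
t_grid = np.arange(0.0, T_PATTERN, DT)
afferent_spikes = [np.sort(rng.uniform(0, T_PATTERN, rng.poisson(5)))
                   for _ in range(N_AFFERENTS)]
drive = nnld_drive(afferent_spikes, conn, t_grid)
print("peak somatic drive:", drive.max())
```

Note that rewiring an input in this sketch amounts to changing an entry of conn rather than adjusting a real-valued weight.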

Figure: the WTA-NNLD architecture.

Figure: the STDP-NRW learning rule.

Learning rule

Since we consider binary synapses with weight 0 or 1, we cannot keep real-valued weights associated with them. Hence, to guide the unsupervised learning, we define a correlation coefficient based fitness value cnpj(t) for the pth synaptic contact point on the jth dendrite of the nth neuron of the WTA network, as a substitute for its weight. In the STDP-NRW algorithm, structural plasticity or connection modification happens on a longer timescale (at the end of each pattern) and is guided by the fitness value cnpj(t), which is updated by an STDP-type rule on a shorter timescale (at each pre- and post-synaptic spike). As shown in the accompanying figure, when a post-synaptic spike occurs at tpost1 the value of cnpj(t) increases, while the appearance of a pre-synaptic spike at tpre2 reduces cnpj(t). After the presentation of each pattern, STDP-NRW replaces the synapse having the least value of cnpj(t) in a target set with the synapse having the maximum value of cnpj(t) in a replacement set.
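The two timescales can be sketched as follows: a fitness trace that is nudged up on post-synaptic spikes and down on pre-synaptic spikes, and a connection swap at the end of each pattern. The update magnitudes, the way the target and replacement sets are chosen, and the toy pattern loop below are assumptions made for illustration only; the paper defines the exact rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# One neuron of the WTA: conn[j, p] is the afferent currently wired to the
# pth contact point of the jth dendrite; c[j, p] is its fitness trace.
N_DENDRITES, SYN_PER_DENDRITE, N_AFFERENTS = 10, 5, 100
conn = rng.integers(0, N_AFFERENTS, size=(N_DENDRITES, SYN_PER_DENDRITE))
c = np.zeros((N_DENDRITES, SYN_PER_DENDRITE))

# A small replacement set of candidate afferents whose fitness is also
# tracked (assumed here to be a fixed random subset of afferents).
candidates = rng.choice(N_AFFERENTS, size=8, replace=False)
c_cand = np.zeros(len(candidates))

A_PLUS, A_MINUS = 1.0, 0.5   # assumed fitness-update magnitudes

def on_post_spike(c, recently_active):
    """Short timescale: a post-synaptic spike raises the fitness of the
    contacts whose afferents were recently active (boolean mask)."""
    c[recently_active] += A_PLUS

def on_pre_spike(c, j, p):
    """Short timescale: a pre-synaptic spike arriving at contact (j, p)
    after the post-synaptic spike lowers its fitness."""
    c[j, p] -= A_MINUS

def rewire_after_pattern(c, conn, c_cand, candidates):
    """Long timescale (end of pattern): swap the current contact with the
    lowest fitness (target set) for the candidate afferent with the highest
    fitness (replacement set), then reset the traces of the swapped pair."""
    j, p = np.unravel_index(np.argmin(c), c.shape)
    best = int(np.argmax(c_cand))
    conn[j, p] = candidates[best]
    c[j, p] = 0.0
    c_cand[best] = 0.0

# Toy run: random fitness updates over one pattern, then a single rewiring.
for _ in range(50):
    on_post_spike(c, rng.random(c.shape) < 0.2)
    on_pre_spike(c, rng.integers(N_DENDRITES), rng.integers(SYN_PER_DENDRITE))
c_cand += rng.random(len(candidates))   # stand-in for candidate fitness updates
before = conn.copy()
rewire_after_pattern(c, conn, c_cand, candidates)
print("rewired contact (dendrite, contact):", np.argwhere(before != conn))
```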

Specificity vs. Sensitivity

When a pattern is presented to the WTA-NNLD and any one of the N neurons produces an output spike, a global inhibition current Iinh(t) is injected into all the N neurons. The slow time constant τs,inh of this signal controls the output firing activity of the WTA-NNLD. If a large value of τs,inh (relative to the pattern duration) is set, only one neuron produces an output spike, i.e., patterns of the same class are encoded by a single neuron. During training in this case, different neurons lock onto different classes of patterns and the output spike latency gradually decreases until the end of training. Hence, after training is complete, the unique neurons that have learned different classes of patterns rely only on the first few spikes (determined by the latency at the end of training) to predict a pattern's class, thereby significantly reducing the prediction time. In other words, the sensitivity of the network is increased. The problem with this approach, however, is that it neglects most of the pattern after the first few spikes, which may lead to many false detections. This limitation is demonstrated in the accompanying figure. Consider a C-class classification task and assume that after the training phase is complete, neuron Nf1 responds to patterns belonging to Class 1. Neuron Nf1 has trained itself to produce an output spike depending on the position of the first few spikes (red spikes in the dashed box) of the pattern, and it neglects the rest of the pattern when making a prediction. However, for longer patterns there is a chance that this spike set occurs somewhere inside a random pattern (not belonging to any class, or belonging to another class). The same neuron Nf1 responds to such patterns by producing a post-synaptic spike. Thus, although the trained WTA-NNLD is very sensitive in this case, it loses specificity.

Figure: specificity vs. sensitivity.

Figure: explanation of nsub (encoding a pattern as a sequence of subpatterns).

Role of Inhibitory Time Constant

To mitigate the above problem we set a moderate value of τs,inh. This ensures that for a single pattern, multiple neurons are capable of producing output spikes. Hence, patterns of the same class are now encoded by a sequence of successively firing neurons, where each neuron fires for one subpattern. We denote by nsub the number of subpatterns, which is set by a proper choice of τs,inh. Thus the original case of one neuron firing for each pattern corresponds to nsub = 1. In this article, for a C-class classification we define a successful trial as one in which (a) during the training phase WTA-NNLD learns different unique representations for patterns of different classes and (b) after completion of training and achieving success in (a), the network produces the same representation when presented with testing patterns corresponding to classes that it had learned during the training phase. When nsub = 1, i.e., no pattern subdivisions are made, this unique representation is a different neuron firing for different classes of patterns. When nsub > 1, the unique representation is a different sequence of successively firing neurons for different classes of patterns; in this case we allow the neurons to detect subpatterns within patterns. Since in this approach the WTA-NNLD takes the entire pattern into account before predicting its class, the number of false detections can be largely reduced.
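As a minimal illustration of what such a representation looks like when nsub > 1, the sketch below matches an observed sequence of winners against the sequences learned for each class. Exact sequence matching and the dictionary lookup are our own simplifying assumptions, not the paper's evaluation procedure.

```python
# Learned representations after training: class label -> sequence of the
# neurons that win on the successive subpatterns (nsub = 3 here).
learned = {
    1: (0, 3, 1),
    2: (2, 0, 4),
    3: (1, 4, 2),
}

def predict(firing_sequence, learned):
    """Return the class whose learned winner sequence matches, else None."""
    for label, seq in learned.items():
        if tuple(firing_sequence) == seq:
            return label
    return None  # unknown representation -> rejected (improves specificity)

print(predict([2, 0, 4], learned))   # -> 2
print(predict([0, 3, 2], learned))   # -> None (partial match is rejected)
```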

Problem description and performance

The benchmark task we have selected to analyze the performance of the proposed method is the Spike Train Classification problem. In the generalized Spike Train Classification problem, there are C template arrays of h Poisson spike trains with frequency f and duration Tp, labeled as classes 1 to C. Jittered versions of these templates are created by altering the position of each spike within the templates by a random amount. The network is trained on these jittered spike trains, and the task is to correctly identify a pattern's class. To demonstrate the performance of our system, as an example we consider a particular trial of four-class classification and look at the first and last three epochs of its training. It is evident from the accompanying figure that during the first three epochs WTA-NNLD produces arbitrary sequences of spikes. However, after the training of the network is complete, WTA-NNLD produces different firing sequences for different patterns while producing the same sequence when the same pattern is encountered.
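For reference, here is a sketch of how such benchmark inputs can be generated: Poisson templates for each class and Gaussian-jittered copies of them. The parameter values, the Gaussian jitter model, and the clipping of jittered spikes to the pattern window are assumptions made for this illustration and may differ from the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Assumed benchmark parameters (the paper's exact values may differ) ---
C = 4                # number of classes
H = 100              # Poisson spike trains per template
F = 20.0             # mean firing rate of each train (Hz)
T_P = 0.5            # pattern duration (s)
SIGMA_JITTER = 0.01  # std. dev. of the spike-time jitter (s)

def poisson_train(rate, duration, rng):
    """Homogeneous Poisson spike train on [0, duration)."""
    n = rng.poisson(rate * duration)
    return np.sort(rng.uniform(0.0, duration, n))

def make_templates(c, h, rate, duration, rng):
    """One template per class: an array of h Poisson spike trains."""
    return [[poisson_train(rate, duration, rng) for _ in range(h)]
            for _ in range(c)]

def jitter(template, sigma, duration, rng):
    """Shift every spike by Gaussian noise, keeping spikes inside the pattern."""
    return [np.clip(train + rng.normal(0.0, sigma, train.size), 0.0, duration)
            for train in template]

templates = make_templates(C, H, F, T_P, rng)
training_pattern = jitter(templates[0], SIGMA_JITTER, T_P, rng)  # jittered class-1 pattern
print(len(training_pattern), "spike trains; first train:", training_pattern[0][:5])
```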

Figure: firing sequences during the first and last three training epochs.

Figure: performance in the presence of statistical variations (mismatch).

Effect of statistical variation

In this section we analyze the robustness of our algorithm to hardware nonidealities by incorporating the statistical variations of the key subcircuits. The primary subcircuits needed to implement our architecture are the synapse, the dendritic squaring block, the neuron and the cnpj(t) calculator. While the variabilities of the synapse circuit are modeled by mismatch in the amplitude (I0) and time constant (τs) of the synaptic kernel function, the variabilities of the squaring block are captured by a multiplicative constant (cbni). Likewise, the nonidealities of the cnpj(t) calculator block are modeled as a multiplicative constant (ccni). We do not consider the variation of the inhibitory current kernel since it is global and only a single instance is present in the architecture. The accompanying figure shows the performance of the proposed method when these nonidealities are included in the model for nsub = 1 (top figure) and nsub = 5 (bottom figure), keeping σjitters = 0.1. The bars corresponding to I0, τs, cbni, and ccni denote the performance degradation when the statistical variations of I0, τs, cbni, and ccni are included individually. Finally, to mimic the actual hardware scenario we consider the simultaneous inclusion of all the nonidealities, which is marked by (...). The (...) bars show that there is an 8% and 6% decrease in performance for nsub = 1 and nsub = 5 respectively.
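A minimal sketch of how such mismatch can be injected into a software model: per-instance multiplicative factors drawn around the nominal parameter values, applied either to one parameter at a time or to all of them together. The Gaussian mismatch model, the 20% spread, and the parameter names in the snippet are illustrative assumptions, not measured circuit statistics.

```python
import numpy as np

rng = np.random.default_rng(7)

# Nominal circuit parameters (illustrative values).
NOMINAL = {"I0": 1.0, "tau_s": 0.005, "c_b": 1.0, "c_c": 1.0}
MISMATCH_STD = 0.2   # assumed 20% relative standard deviation per instance

def mismatched(nominal, which, n_instances, rng, std=MISMATCH_STD):
    """Draw per-instance parameter sets with Gaussian multiplicative mismatch.

    `which` selects the parameters that vary; the rest stay nominal.  Varying
    all of them together mimics the combined-nonideality case.
    """
    out = []
    for _ in range(n_instances):
        params = dict(nominal)
        for key in which:
            params[key] = nominal[key] * (1.0 + rng.normal(0.0, std))
        out.append(params)
    return out

# e.g. one parameter set per synapse/branch instance in the simulated network
per_synapse = mismatched(NOMINAL, which=["I0", "tau_s"], n_instances=500, rng=rng)
all_varied  = mismatched(NOMINAL, which=list(NOMINAL), n_instances=500, rng=rng)
print(per_synapse[0])
```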

Check out our paper to learn more

S. Roy and A. Basu, "An Online Unsupervised Structural Plasticity Algorithm for Spiking Neural Networks," IEEE Transactions on Neural Networks and Learning Systems. (Accepted) [pdf]