The Influence of Nonlinearities on the Storage Capacity of Neural Networks

More realistic nonlinear relations for the neural soma and synapses are considered, together with an alternative mean field theory (MFT) approach suited to strongly interconnected systems such as cortical matter. The usual procedure of averaging over quenched random states in fully connected networks for MFT is based on Boltzmann Machine learning, but that approach requires an unrealistically large number of samples to achieve reliable performance. We propose an alternative MFT in which the stochastic search for a solution is replaced by a large set of deterministic equations. Of course, this alternative theory is not strictly valid for an infinite number of elements. Another generalization is the inclusion of an additional term in the effective Hamiltonian, which improves the stochastic hill-climbing search by making it less likely to become trapped in local minima of the energy function. In particular, we focus on increasing the retrieval capability of neural networks by extending the replica-symmetric model with different nonlinear elements. Results of numerical modeling, as well as a broad discussion of the storage capacity of neural systems, are presented.


Introduction
Nonlinearities in nature, especially in biology and neuroanatomy, as well as in artificial technical systems and even in social life, play a marked role in the behavior of both small separate particles and large-scale, massive, strongly interconnected systems. Neuroanatomical systems, including the central nervous system with the massive interconnected neural networks (NN) of the cerebral cortex, belong to the latter.
In this paper, we focus on increasing the NN retrieval capability, narrowing the domain of stability, and transforming the replica-symmetric method by including different neural components with nonlinearities.
The question of neural network storage capacity is an old one; it has been present since NN were first studied theoretically. Hopfield (1982) pioneered the study of capacity for random patterns and established that the number of stored patterns is p = αN, where N is the number of neurons and α = α_c = 0.14 at temperature T = 0, but retrieval degrades rapidly when p > α_c N. However, Weisbuch (Weisbuch and Fogelman-Soulie, 1985) proved that locally stable patterns exist only if p < N/(2 ln N), i.e., considerably fewer than α_c N. If the patterns are strongly correlated, each with magnetization close to one, the storage capacity reaches N²/(ln N)² (Willshaw and Longuet-Higgins, 1970). In the random situation, the maximum number of storable patterns is 2N, as Cover (1985) established. For linearly independent patterns in the pseudo-inverse case (Kohonen, 1984), Kanter's capacity limit is N patterns.
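The capacity estimates quoted above are easy to compare numerically. The following sketch is our own illustration, not from the paper; it tabulates the relative capacity α_c N ≈ 0.14 N against the stricter error-free bound N/(2 ln N):

```python
import math

def capacity_estimates(N):
    """Two classical storage-capacity estimates for N neurons:
    the relative capacity 0.14*N (Hopfield, 1982) and the
    error-free bound N / (2 ln N) (Weisbuch and Fogelman-Soulie, 1985)."""
    relative = 0.14 * N
    error_free = N / (2.0 * math.log(N))
    return relative, error_free

for N in (100, 1000, 10000):
    rel, ef = capacity_estimates(N)
    print(f"N={N}: relative={rel:.0f}, error-free={ef:.1f}")
```

For any realistic N the error-free bound is considerably smaller than 0.14 N, which is precisely the point of the Weisbuch result.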
The influence of nonlinearities on the formation of different kinds of metastable states (retrieval, spin-glass, and mixture states) was analyzed by Sompolinsky (1986). He emphasized that nonlinearity of the learning algorithm affects the retrieval of patterns only weakly at small α. The effects of diluted synapses and external noise were also evaluated there. In another work (Matus and Perez, 1990), the state dependence of synaptic strengths, expressed by a squared polynomial function, was analyzed; computational experiments showed that the number of spurious states is reduced and the stability of retrieval states improved. In both of these works, however, the nonlinearities were not realistic but artificially idealized.
Thus, in this paper we turn to more realistic neuronal soma and synaptic nonlinear relations and build an alternative mean field theory (MFT) approach relevant for strongly interconnected systems such as cortical matter.
The MFT approach we propose is based on an analytical presentation of a state-dependent Boltzmann distribution and partition function, with the corresponding summations manipulated in an MFT approximation, and on the inclusion of an effective energy function that has a smoother landscape due to the extra terms.

Synapse Nonlinearities
The main subsystem of the cerebral cortex is the synapse-dendrite-soma-axon chain. Experiments demonstrate that all component elements of the chain are characterized by nonlinearities; some of them, such as neuron cells, are strongly nonlinear, while others, such as synaptic excitatory or inhibitory receptors, are weakly nonlinear. Synapses, both excitatory and inhibitory, typically operate by changing the conductance of the postsynaptic membrane, opening ion channels. The time course of the synaptic conductance changes, and consequently of the electrical current changes, differs and depends on the type of synapse. Fast excitatory (non-NMDA) and inhibitory (GABAa) synapses operate on a 1 ms time scale with a peak conductance on the order of 1 nS. This conductance is up to 10 times larger than that of the slow excitatory (NMDA) and inhibitory (GABAb) synapses, whose time scale is 10-100 ms. The influence of synaptic nonlinearities on the dynamic processes in the neuron is widely represented in (Segev et al., 1989; Rall and Shepherd, 1968; Jack et al., 1975), in neuron modeling software such as that of Hines and Carnevale from Duke University, in Hodgkin-Huxley (HH) modeling (Hodgkin and Huxley, 1939), and so on. We use the static current-voltage relation of NMDA (Foster and Fagg, 1984; Mayer et al., 1980), whose common shape is shown in Fig. 1. There is a domain where the slope corresponds to negative conductance. For our theoretical investigations we use a polynomial approximation following the example of Sompolinsky (Hopfield, 1982), where k and λ are constants and V is the excitatory or inhibitory potential expressed as voltage.
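The polynomial approximation referred to here (Eq. 1) is elided in this copy. As an illustration only, a cubic of the form I(V) = kV − λV³ reproduces the qualitative feature that matters, a domain of negative slope conductance; the cubic form itself and the default constants are our assumptions, not the paper's exact expression:

```python
def nmda_current(V, k=1.0, lam=0.5):
    """Hypothetical cubic stand-in for the NMDA current-voltage relation:
    I(V) = k*V - lam*V**3.  The slope dI/dV = k - 3*lam*V**2 turns
    negative for |V| > sqrt(k / (3*lam)), mimicking the
    negative-conductance domain visible in Fig. 1."""
    return k * V - lam * V ** 3

def slope(V, h=1e-6):
    """Numerical derivative dI/dV at the default constants."""
    return (nmda_current(V + h) - nmda_current(V - h)) / (2 * h)
```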
If one follows the Hebb rule (Hebb, 1949), which results from the conjunctive presence of presynaptic firing (activation) and postsynaptic firing, one may take the change of the synaptic weight as in Eq. (2), where a is the learning rate, S_i is the presynaptic rate (activation) from the axon output of the ith neuron, and S_j is the postsynaptic rate (activation) at the synapse in the dendrite of the jth neuron. The learning rule (2) is biologically plausible because it expresses the hypothesis that the simultaneous presence of both presynaptic and postsynaptic activity increments the synaptic strength.
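A minimal sketch of the Hebbian update (2), assuming the standard outer-product form ΔW_ij = a S_i S_j:

```python
import numpy as np

def hebb_step(W, S, a=0.1):
    """One Hebbian learning step: W_ij <- W_ij + a * S_i * S_j.
    The weight grows only where pre- and postsynaptic activity coincide."""
    return W + a * np.outer(S, S)

# toy usage: two coactive units strengthen their mutual coupling,
# while an anticorrelated unit acquires a negative coupling
W = np.zeros((3, 3))
S = np.array([1.0, 1.0, -1.0])
W = hebb_step(W, S)
```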

Dendrite Nonlinearities
Dendrites are unique treelike structures whose type depends on the neuron's role: pyramidal, amacrine, stellate, Purkinje, etc. They are also the largest constituent component of the brain. The dendritic tree is the place where information is updated and delivered to the cell body of the neuron or, through dendro-dendritic interactions, to neighboring dendrites. The dendritic branches are thin, starting near the soma with a diameter of a few microns and tapering to less than one micron where they contact axons or other dendrites. The terminals of dendrites carry spines. Many types of dendrites have strongly nonlinear voltage-dependent channels that play an important role in neuronal processes (Garliauskas, 1998). We would especially like to draw attention to the phenomenon of dendritic bistability. This phenomenon is based on slow inward currents distributed over the dendritic membrane (Schwindt and Crill, 1977; Llinas and Sugimori, 1980). These slow currents, experimentally observed in motoneurons, brain-stem dendrites, and Purkinje cells, cause dendritic membrane bistability: a steady depolarization persists, two stable states are formed, and the appearance of regions with positive and negative conductance slopes leads to the formation of an N-shaped current-voltage relation (C-VR).
The N-shaped current-voltage curve (b) in Fig. 2 crosses the abscissa V three times, at the points V_r, V_u, and V_d, where V_r is the rest potential, V_u is the potential at which the slope conductance is negative, and V_d is the depolarized potential. Later it was also established that the slow inward current mediated by calcium ions is the reason for the existence of stable depolarization in cat γ-motoneurons (Fig. 2, curve (b), point V_d).
This was confirmed experimentally by Llinas and Sugimori (1980), who directly demonstrated that the existence of dendritic stable depolarization depends on the inward calcium current.
In a simplified case the N-shaped C-VR was approximated by a broken-line curve (Gutman, 1984) or by a polynomial approximation (Garliauskas, 1998). Let us take the description from the latter reference, expressed by a polynomial where a is a constant (a = 3) and b is a parameter (ranging from 0.5 to 0.8), or by simplified forms of Nagumo's relation. The nonlinearities of synapses and dendrites are very important for information storage in massive fully connected neural networks.
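The polynomial itself is not reproduced in this copy. As an illustrative stand-in (not the exact Garliauskas (1998) expression), a generic cubic with three zero crossings at V_r, V_u, and V_d already yields the N-shaped curve of Fig. 2, with a negative-slope middle branch around V_u:

```python
def n_shaped_current(V, V_r=0.0, V_u=0.4, V_d=1.0):
    """Illustrative N-shaped current-voltage relation: a cubic that
    crosses the abscissa at the rest (V_r), unstable (V_u), and
    depolarized (V_d) potentials; V_r and V_d are the two stable states."""
    return (V - V_r) * (V - V_u) * (V - V_d)

def slope(V, h=1e-6):
    """Numerical derivative dI/dV; negative near V_u."""
    return (n_shaped_current(V + h) - n_shaped_current(V - h)) / (2 * h)
```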

Soma Nonlinearities
The neuron cell body (soma) is the central part of a neuron; through its biochemical and electrical processes it maintains the life of the cell, the main component of the brain.
Through its dendrites and the axon, the soma performs spatial and temporal summation of electrical signals. Spatial summation collects several weak signals together, converting them into a large one. If the signal after summation exceeds the axon hillock's threshold, the neuron fires and the output signal is transmitted along the axon to other neurons. This signal arrives at each terminal button unaffected. This uniformity of the hillock's signal stands in contrast to other processes in the brain, which behave as an analogue device.
The classical Hodgkin-Huxley formalization (Hodgkin and Huxley, 1939; Hodgkin and Katz, 1949), based on experiments on the squid giant axon and describing the background ionic currents, the sodium and potassium conductance changes, and the generation of periodic spiking activity (class 1, 2), does not answer the question of how the all-or-nothing threshold effect is formed in the cell body (hillock).
Although the all-or-nothing threshold effect of the neuron was experimentally confirmed almost a hundred years ago, theoretical modeling results were presented in (Jack et al., 1975; Garliauskas, 1998), where the minimum threshold voltage was calculated or modeled dynamically.

Axon Nonlinearities
The axon, which conducts the neuron's output signals to the target cells, is a highly specialized component of the CNS. It can contact ten thousand other neurons in the cortical system. Whereas information is integrated in the cell bodies and dendritic trees, the axons serve only to transmit signals from the cell body to other neurons via synapses and dendrites. Axonal propagation delay slows neural communication and influences possible computational modeling characteristics. The modeling is predetermined by the classical Hodgkin-Huxley model (Hodgkin and Huxley, 1939) and clearly demonstrated by Koch and Bernander (Koch et al., 1983).
Most authors confirm that the axon possesses uniformity of electrical behavior, ensuring that whatever pulse train is put into one end of the axon is faithfully propagated to the terminal at the other neuron. The pulses of action potentials, or spikes, originate at the axon hillock and propagate along the axon with constant velocity and amplitude. The axon exhibits radial symmetry, i.e., radial current is neglected, and it is characterized as an active, strongly nonlinear structure. In the active axon, the membrane presents a negative-resistance relation that changes in time and along the axon, as in a cable.

Preliminaries to Mean Field Theory of NN
Different peculiarities and phenomena of artificial neural networks were analyzed in the earliest contributions of Little (1978), Hopfield (1982), and Peretto (1984). All ideas in these works and in later ones (Amit et al., 1985a; Amit et al., 1985b; Gardner, 1986) revolve around the interaction of spin-glass and ferromagnetic macrostates and the storage capacity of memorized (embedded) patterns in a given network configuration. Stable and metastable states, first- and second-order phase transitions, the 2p degenerate ground states, Mattis states, replica-symmetry breaking, and many other fundamental thermodynamic properties of the Ising model were also analyzed there.
In the introduction I gave a short review of published work on the problem of the neural network's capacity to retrieve memorized states, first addressed by Hopfield (1982) in considering a general content-addressable memory. The Hopfield model, sometimes called the zero-temperature model, uses random sequential updating. The dynamics of the Hopfield model are completely asynchronous, with the discrete time step tightly coupled to the refractory period. The Little model is also based on a discrete principle of dynamical updating, though a synchronous one, because at each time step all neuron states update simultaneously. Hopfield (1982), Amit (Amit et al., 1985a; Amit et al., 1985b), Gardner (Gardner, 1986; Gardner and Derrida, 1988), and Buhmann (1989) paid no small attention to the storage capacity of fully connected neural ensembles. In particular, Sompolinsky's analytical study (Sompolinsky, 1986) estimates different assumptions about the influence of nonlinear updating of synapses and static noise. His assertion that the presence of synaptic nonlinearities positively influences retrieval capabilities requires an essential foundation based on consideration of the concrete nonlinear characteristics of the main components of neural networks. We aim to fill this gap in the research, at least in part.
Another question, close to the representation of synapse nonlinearities, concerns the absence or presence of synapses at all. In real biological neural networks not all neurons are connected to all others: one neuron is bound to at most approximately 10^4 neurons out of 10^10. Taking this fact into account raises the new problem of considering the asymmetry (dilution) of connections. Asymmetric diluted neural networks in dynamics have been considered by Sompolinsky (1986), Gardner and Derrida (1988), and Choi (1990). The effect of dilution on the capacity α_c and the overlap m_c for the retrieval states is inversely proportional to the dilution parameter of the synapses: the higher the concentration of surviving synapses, the higher the critical memory capacity (Sompolinsky, 1986).
The influence of noise (static Gaussian) embedded in the synaptic strengths on the memory capacity has been examined by Weisbuch and Fogelman-Soulie (1985), McEliece et al. (1987), and Sompolinsky (1986). The conclusion is uniform: noise influences the memory capacity, but under severely bounded conditions. Almost all information can be retrieved without loss if the level of the noise is less than 1/(2 ln N), i.e., the noise must satisfy η² ≪ 1. Correct retrieval of information is achieved when the noise vector is far from the limits 1 and −1. This is a very strong condition.

Alternative Mean Field Theory
In statistical physics there is no way other than averaging over the randomness of the observable physical particles. For the solution of thermodynamic problems there are two possible routes: the first is to average the free energy, the second is to define the partition function. Usually one transforms a free-energy average into a partition-function average. In statistical mechanics this method was first used by Kac (Kac and Lin, 1970), and later by Grinstein (Grinstein and Luther, 1976) and Emery (1975). For ferromagnetism, in its interpretation for neurophysiological disordered fully connected systems, applying the replica procedure to the infinite Ising model, the latter method was first used by Kirkpatrick and Sherrington (1978). Subsequently this method was widely disseminated among theoreticians, for example Amit (Amit et al., 1985b), Gardner (1986), Sompolinsky (1986), and many others. The theory based upon these procedures is called the mean field theory.
The general procedure of averaging over the quenched random states in fully connected networks for MFT is usually based on Boltzmann Machine learning. But this approach requires an unrealistically large number of sweeps (samples) to provide reliable performance. We propose an alternative MFT in which the stochastic search for a solution is replaced by a set of a large number of equations with deterministic features. Of course, this alternative theory will not be strictly valid for an infinite number of elements; it will be approximate, as the BM is too.

Analytical Presentation of Alternative MFT
Starting the formulation of an alternative MFT, we consider the basic assumptions of the fully connected neural network. Networks with symmetric couplings were studied in the early works of Little, Hopfield, and Amit, and it was shown that the system of states in dynamics converges to a local minimum, which is the solution of the global energy function. The solution remains absolutely stable. The state configuration of this minimum endows the network with embedded memory. The symmetric approximation of real neural networks has limited properties: it fails to reproduce natural chaotic behavior (Garliauskas, 2003) and temporal associations (Kanter and Sompolinsky, 1987), although many phenomena, such as associative memory, collective computation, fault tolerance, and the influence of noise and damage of constituent elements on storage capacity, have been successfully reproduced. It should be noted that biological neural networks are endowed with a high degree of asymmetry in their synaptic couplings. As shown by Garliauskas (2003), in this case the chaos phenomenon was successfully revealed, while the studies of Amit et al. (1985) showed that adding some asymmetry by dilution of couplings, acting as an internal noise, does not significantly increase the capacity to memorize or retrieve information.
We consider a neural network consisting of abstract neurons whose states form the configurations {S_i}_{i=1}^{N} = S, where N is the number of neurons; starting from an initial state S_0, the states relax to local minima of the energy function, where W_ij are the strengths of the synaptic couplings between neurons i and j.
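The energy function itself is elided in this copy; the standard Hopfield form E(S) = −(1/2) Σ_ij W_ij S_i S_j, which the surrounding text describes, can be sketched as follows (symmetric W with zero diagonal and zero threshold assumed):

```python
import numpy as np

def energy(S, W):
    """Hopfield energy E(S) = -0.5 * sum_ij W_ij S_i S_j."""
    return -0.5 * S @ W @ S

def async_update(S, W, i):
    """Step-function update of neuron i; with symmetric couplings this
    never increases E(S), so the dynamics relax to a local minimum."""
    S = S.copy()
    S[i] = 1.0 if W[i] @ S >= 0 else -1.0
    return S
```

A quick check on a two-neuron ferromagnet (W_01 = W_10 = 1) shows one update aligning the spins and lowering the energy.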
The step-function updating rule has been accepted; here h is the threshold value of the neurons.
Allowing randomness of the synapse strengths through the matrix |W|, we resort to a statistical updating methodology of network states for the learning processes. The probability of the state configuration S at a temperature T is then given by the Boltzmann distribution, and the average of a state-dependent function f(S) by the Gibbs distribution, where Z is the partition function. The standard updating procedure for computing ⟨f(S)⟩ is the Monte Carlo sampling technique, or simulated annealing, whose allowance of uphill moves provides the search for the global minimum of E(S). Other simulation techniques lead only to a local minimum of E(S), since the T-dependence then disappears and plays no part in the modeling. We now define and apply an alternative mean field theory, which may be called an MFT approximation: we replace the nondeterministic nature of the statistical system by a system of deterministic equations.
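A minimal Metropolis sketch of the sampling described above (our illustration; the paper's exact updating scheme is in its elided equations): at temperature T uphill moves are accepted with the Boltzmann probability, which is what allows the search to escape local minima of E(S).

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_step(S, W, T):
    """One single-spin Metropolis update sampling the Boltzmann
    distribution P(S) ~ exp(-E(S)/T) with E(S) = -0.5 * S W S."""
    i = rng.integers(len(S))
    dE = 2.0 * S[i] * (W[i] @ S)           # energy change of flipping S_i
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        S[i] = -S[i]                        # accept (possibly uphill) flip
    return S
```

As T → 0 only downhill flips survive and the scheme reduces to deterministic descent; at finite T the uphill acceptances implement the hill-climbing the text refers to.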
Let us carry out some analytic transformations using the Dirac δ-function and an integration technique based on small rectangular pulse calculation.
In the first step, let us formulate the theory for one neuron with two states, writing the usual δ-function in the complex plane. Substituting (10) into (9), after some simple mathematical transformations the equality (11) is obtained. In the general case of N neuron states S_i, using f(S) = exp(−E(S)/T), we obtain Z. Carrying out the multiplication in the exponent of the last integrand and returning to the index i, the effective energy is obtained. It is necessary to note that the effective energy function differs from the effective Hamiltonian (Peretto, 1984) by the presence of the additional term E(V), which gives a smoother surface than the simple E(S) and hence a smaller probability of becoming stuck in a local minimum.
The mean field variables U_i and V_i are determined by the saddle-point equations. After taking the partial derivatives, we obtain the system of equations (16)-(17). Because U_i for the neural network coincides with the sum of firing potentials from all neurons j, the neuron state variables are represented according to (17) by Eq. (18). The way of solving Eqs. (16)-(18) is discussed in the next section.
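Assuming the saddle-point equations take the standard deterministic mean-field form U_i = Σ_j W_ij V_j, V_i = tanh(U_i/T) (the exact Eqs. 16-18 are elided in this copy), the fixed point can be found by simple iteration:

```python
import numpy as np

def mft_fixed_point(W, T, V0, n_iter=500):
    """Iterate the deterministic mean-field equations
        U_i = sum_j W_ij V_j,   V_i = tanh(U_i / T)
    to (numerical) convergence on the saddle point."""
    V = np.asarray(V0, dtype=float).copy()
    for _ in range(n_iter):
        V = np.tanh(W @ V / T)
    return V
```

For a two-neuron ferromagnet at T = 0.5 this converges to the nonzero magnetization m solving m = tanh(2m) ≈ 0.957, replacing stochastic sampling by a deterministic computation, which is the point of the alternative MFT.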

Modeling Results of Memory Capacity Evaluation
Dynamic associative memory capacity is a measure of the ability of a neural network to store a set of binary patterns and, at the same time, to be capable of associative recall of those patterns. Another capacity measure, known as relative capacity, has been proposed as an upper bound on memory capacity. It has been shown (Amari and Maginu, 1988; Amari and Yanai, 1993; Hopfield, 1982; Amit et al., 1985b) that when patterns are remembered only approximately (i.e., no perfect retrieval is required), α must not exceed 0.14; this value is the relative capacity. Another result, on the capacity of this memory in the case of error-free recall by one-pass parallel convergence, is given by the absolute capacity (Weisbuch and Fogelman-Soulie, 1985; McEliece et al., 1987). On the other hand, if all memorized configurations are required to be in equilibrium with a probability close to 1, say 0.99, then an upper bound can be derived by requiring that all bits of all configurations be retrievable with less than a 1 percent error. Note that this stringent error-correction requirement necessitates small values of α.
The absolute capacity requires that error-free one-pass retrieval of a fundamental memory from random key patterns lying inside a Hamming hypersphere of given radius is achieved with probability approaching 1; that radius defines the radius of attraction of the fundamental memory. Amari and Maginu (1988) (also Amari and Yanai (1993)) analyzed the transient dynamic behavior of memory recall under the assumption of normally distributed noise with zero mean. The variance was calculated by taking the direct correlation up to two steps between the bits of the stored memories and the given one. Under these assumptions, the relative capacity was found to equal 0.16. This theoretical value is in good agreement with the early simulations reported by Hopfield (1982) and with the theoretical value of 0.14 reported by Amit et al. (1985) using the replica method.
In other words, they showed that each of the fundamental memories is an attractor with a basin of attraction surrounding it. They also showed that once initialized inside one of these basins, the state converges to the basin's attractor in a time of order ln(ln N). We have simulated an example of Eqs. (15)-(17), and the results, an asymptotic solution moving to the global attractor, are shown in Fig. 3.
The radius of the basin of attraction becomes asymptotically smaller; it depends on the nonlinearities.
We have checked some theoretical foundations of the recent works. Following (Amit et al., 1985, formula (9)), we reconstructed an error function with t = m/(2rα)^{1/2}, where m is the magnetization and r = p − s is the number of remaining patterns beyond s, where s remains finite as a common quantity of neurons at N → ∞. Setting r = 1, we calculated the error function versus the parameter t and α = p/N at temperature T = 0. The family of curves is presented in Fig. 4: as α decreases, the error decreases too.
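Under the Gaussian-noise approximation, the per-bit retrieval error implied by this error function can be sketched as follows (our hedged reading of the formula, with t = m/(2rα)^{1/2}; the exact expression (19) is elided in this copy):

```python
import math

def bit_error(m, alpha, r=1):
    """Approximate per-bit retrieval error under Gaussian crosstalk noise:
    P_err = (1 - erf(t)) / 2 with t = m / sqrt(2 * r * alpha)."""
    t = m / math.sqrt(2.0 * r * alpha)
    return 0.5 * (1.0 - math.erf(t))

for alpha in (0.01, 0.05, 0.1, 0.12, 0.14):
    print(f"alpha={alpha}: P_err={bit_error(1.0, alpha):.2e}")
```

The monotone decrease of the error with decreasing α reproduces the trend of the family of curves in Fig. 4.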
The range of variation of α was taken up to the critical value α < α_c. As α → 0, Eq. (18) asymptotically tends to keep m at the level 0.985. N_e is the average number of errors. The approximation (19) coincides well with the estimate of Hopfield (1982) up to the values E_rr ≈ 1.8 and α = 0.12, but beyond that the two curves diverge. Fig. 5a, and the insert Fig. 5b on an expanded scale, show the two domains of the match; the discrepancy is essential. Returning to Fig. 4 (the discrete values of α are marked: diamonds 0.01, boxes 0.05, circles 0.1, crosses 0.12, and the line 0.14): for α < α_c and the corresponding temperature T > T_c there is a competition between entropy effects (there are many more configurations at m = 0) and energy effects, when at low T the spins tend toward minimum energy at nonzero m. Below and above the critical temperature the behavior of the neural network is characterized as follows: for T < T_c, m = ±m(T); for T > T_c, m = 0. Below T_c one observes a macroscopic magnetization. The probability distribution P(m) has two peaks at m = ±m(T), whose widths decrease as N^{−1/2} in the thermodynamic limit N → ∞. The value of P(m) outside the peaks is quite small (e.g., P(0) ∼ exp(−CN^{2/3})). The main states ±m(T) are the so-called pure states; mathematically, these states are Boltzmann probability measures on the space of all configurations.
Further, we suppose that the output potentials of neighboring neurons j to the base neuron i are nonlinearly transformed twice: in the synapse action and in neuron i. The first transformation is given by Eq. (1) and shown by the curve with crosses (Fig. 6a); the second is given by Eq. (4) and shown by the curve with circles. The resulting data after inserting Eq. (1) into Eq. (4) are shown by the compound dependence (Fig. 6a, solid curve), which includes two straight ranges and an oscillation range near the origin.
The surface in 3D space showing the dependence on the potential and the temperature T is presented in Fig. 6b. This dependence is included in the further modeling.
Here ζ_i^µ and ζ_j^µ are the patterns at sites i and j, and the overlap of the state S_i with the µth pattern is expressed accordingly. Summing over µ and i and dividing by p and N, the macroscopic overlap is obtained, and the error in percent is then taken. The modeling was carried out by representing the input data as a state matrix A[i, j] and a pattern matrix B[i, p], with i = [1, N_1], j = [1, N_2], and p = [1, p_1] the upper ranges. The elements of the matrices were taken randomly. The averaged m is then represented according to (18) and (24). We considered W_ij as the synaptic strength in three cases: (a) the linear case, where k is the constant general strength for all neurons; (b) the nonlinear case, where W_ij was expressed by the current according to the current-voltage relation (1); (c) the doubly nonlinear case, where the synaptic strength was taken as in item (b) and, instead of the sigmoid function of the neurons, the strongly nonlinear Nagumo relation (4) was taken.
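The overlap and error measures described above can be sketched as follows (standard definitions assumed, since the corresponding equations are elided in this copy):

```python
import numpy as np

def overlap(S, xi):
    """Overlap of the state S with one pattern xi:
    m^mu = (1/N) * sum_i xi_i^mu * S_i."""
    return float(xi @ S) / len(S)

def macroscopic_overlap(S, patterns):
    """Average overlap over all p stored patterns."""
    return float(np.mean([overlap(S, xi) for xi in patterns]))

def error_percent(m):
    """Retrieval error in percent, with m = 1 meaning perfect recall."""
    return 100.0 * (1.0 - m)
```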
The results of the numerical modeling are presented in Fig. 7a and b. Fig. 7a shows that in the linear case (a) the error values are higher than in the nonlinear case (b), meaning that the expected memory capacity is higher in the nonlinear formulation of the model. Including the double nonlinearities based on the N-shaped relations of neurons, the results are even more striking: the minimum of the curve in Fig. 7b, at p = 3, T = 0.4, and N = 20, indicates that the error is essentially lower than in the two previous cases. Thus, the nonlinearities in the functional presentation of the neural networks provide a narrower range of characteristic variation (as noticed in (Sompolinsky, 1986)) and a more pronounced basin of attraction in the neuron state configuration space.

Conclusions
The proposed alternative mean field theory model generalizes the process of averaging over random observable elements of fully connected artificial neural networks by a large number of equations with deterministic features. Simulation with this model can take advantage of parallel processor updating. Another generalization is the inclusion of an additional term in the effective Hamiltonian, which improves the stochastic hill-climbing search by reducing the chance of dropping into a local minimum of the energy function. In particular, we pay attention to increasing the retrieval capability of neural networks by transforming the replica-symmetric model to include different nonlinear elements. Sompolinsky's analytical paper (Sompolinsky, 1986) estimates different assumptions about the influence of nonlinear updating of synapses and static noise. His assertion that the presence of synaptic nonlinearities positively influences retrieval capabilities requires an essential foundation based on consideration of the concrete nonlinear characteristics of the main components of neural networks. We have obtained some modeling results confirming this thesis. In the research in progress, the stability problems and the massively parallel modeling of the stochastic averaging process with nonlinearities of neural network components and noise are planned for realization.
A. Garliauskas received his Habil. Dr. degree in technical sciences from the Computer Center of the USSR Academy of Sciences, Novosibirsk, USSR, in 1977. He is head of the Laboratory of Neuroinformatics at the Institute of Mathematics and Informatics and a professor at Vilnius Gediminas Technical University. His research interests include neuroinformatics methodology, control problems, development of neural network algorithms, and chaos processes.

Fig. 1. The current-voltage relation of the synaptic NMDA receptor.

Fig. 2. The current-voltage relations for sodium, potassium, and net inward current.

Fig. 5. The error versus α. The dotted line shows the estimates of Hopfield (1982); the solid line was calculated by formula (18). (a) is the main display and (b) the expanded-scale display.

Fig. 6. The nonlinear characteristics of the neural network units. In (a), the cross curve marks the synaptic relation, the circle curve the N-shaped relation (4), and the solid curve the compound dependence of the joint action of the two nonlinearities. In (b), the surface of state potentials and temperature action without restriction is shown.

Fig. 7. The error versus the number of stored patterns. In (a), the upper curve marks the results of the linear case and the lower the nonlinear case. In (b), the error curve in the case of strong nonlinearities is shown.