Neural Networks - A Systematic Introduction

a book by Raul Rojas

Foreword by Jerome Feldman

Springer-Verlag, Berlin, New York, 1996 (502 p., 350 illustrations)

Foreword, Preface, chapter 1, chapter 2, chapter 3, chapter 4, chapter 5, chapter 6, chapter 7, chapter 8, chapter 9, chapter 10, chapter 11, chapter 12, chapter 13, chapter 14, chapter 15, chapter 16, chapter 17, chapter 18, References, Reported errata

Whole Book (PDF)

Foreword (PDF)

Preface (PDF)

1. The biological paradigm (PDF)

  • 1.1 Neural computation
    • 1.1.1 Natural and artificial neural networks
    • 1.1.2 Models of computation
    • 1.1.3 Elements of a computing model
  • 1.2 Networks of neurons
    • 1.2.1 Structure of the neurons
    • 1.2.2 Transmission of information
    • 1.2.3 Information processing at the neurons and synapses
    • 1.2.4 Storage of information - Learning
    • 1.2.5 The neuron - a self-organizing system
  • 1.3 Artificial neural networks
    • 1.3.1 Networks of primitive functions
    • 1.3.2 Approximation of functions
    • 1.3.3 Caveat
  • 1.4 Historical and bibliographical remarks

2. Threshold logic (PDF)

  • 2.1 Networks of functions
    • 2.1.1 Feed-forward and recurrent networks
    • 2.1.2 The computing units
  • 2.2 Synthesis of Boolean functions
    • 2.2.1 Conjunction, disjunction, negation
    • 2.2.2 Geometric interpretation
    • 2.2.3 Constructive synthesis
  • 2.3 Equivalent networks
    • 2.3.1 Weighted and unweighted networks
    • 2.3.2 Absolute and relative inhibition
    • 2.3.3 Binary signals and pulse coding
  • 2.4 Recurrent networks
    • 2.4.1 Stored state networks
    • 2.4.2 Finite automata
    • 2.4.3 Finite automata and recurrent networks
    • 2.4.4 A first classification of neural networks
  • 2.5 Harmonic analysis of logical functions
    • 2.5.1 General expression
    • 2.5.2 The Hadamard-Walsh transform
    • 2.5.3 Applications of threshold logic
  • 2.6 Historical and bibliographical remarks

3. Weighted Networks - The Perceptron (PDF)

  • 3.1 Perceptrons and parallel processing
    • 3.1.1 Perceptrons as weighted threshold elements
    • 3.1.2 Computational limits of the perceptron model
  • 3.2 Implementation of logical functions
    • 3.2.1 Geometric interpretation
    • 3.2.2 The XOR problem
  • 3.3 Linearly separable functions
    • 3.3.1 Linear separability
    • 3.3.2 Duality of input space and weight space
    • 3.3.3 The error function in weight space
    • 3.3.4 General decision curves
  • 3.4 Applications and biological analogy
    • 3.4.1 Edge detection with perceptrons
    • 3.4.2 The structure of the retina
    • 3.4.3 Pyramidal networks and the neocognitron
    • 3.4.4 The silicon retina
  • 3.5 Historical and bibliographical remarks

4. Perceptron learning (PDF)

  • 4.1 Learning algorithms for neural networks
    • 4.1.1 Classes of learning algorithms
    • 4.1.2 Vector notation
    • 4.1.3 Absolute linear separability
    • 4.1.4 The error surface and the search method
  • 4.2 Algorithmic learning
    • 4.2.1 Geometric visualization
    • 4.2.2 Convergence of the algorithm
    • 4.2.3 Accelerating convergence
    • 4.2.4 The pocket algorithm
    • 4.2.5 Complexity of perceptron learning
  • 4.3 Linear programming
    • 4.3.1 Inner points of polytopes
    • 4.3.2 Linear separability as linear optimization
    • 4.3.3 Karmarkar's algorithm
  • 4.4 Historical and bibliographical remarks

5. Unsupervised learning and clustering algorithms (PDF)

  • 5.1 Competitive learning
    • 5.1.1 Generalization of the perceptron problem
    • 5.1.2 Unsupervised learning through competition
  • 5.2 Convergence analysis
    • 5.2.1 The one-dimensional case - Energy function
    • 5.2.2 Multidimensional case - The classical methods
    • 5.2.3 Unsupervised learning as minimization problem
    • 5.2.4 Stability of the solutions
  • 5.3 Principal component analysis
    • 5.3.1 Unsupervised reinforcement learning
    • 5.3.2 Convergence of the learning algorithm
    • 5.3.3 Multiple principal components
  • 5.4 Examples
    • 5.4.1 Pattern recognition
    • 5.4.2 Image compression
  • 5.5 Historical and bibliographical remarks

6. One and two layered networks (PDF)

  • 6.1 Structure and geometric visualization
    • 6.1.1 Network architecture
    • 6.1.2 The XOR problem revisited
    • 6.1.3 Geometric visualization
  • 6.2 Counting regions in input and weight space
    • 6.2.1 Weight space regions for the XOR problem
    • 6.2.2 Bipolar vectors
    • 6.2.3 Projection of the solution regions
    • 6.2.4 Geometric interpretation
  • 6.3 Regions for two layered networks
    • 6.3.1 Regions in weight space for the XOR problem
    • 6.3.2 Number of regions in general
    • 6.3.3 Consequences
    • 6.3.4 The Vapnik-Chervonenkis dimension
    • 6.3.5 The problem of local minima
  • 6.4 Historical and bibliographical remarks

7. The backpropagation algorithm (PDF)

  • 7.1 Learning as gradient descent
    • 7.1.1 Differentiable activation functions
    • 7.1.2 Regions in input space
    • 7.1.3 Local minima of the error function
  • 7.2 General feed-forward networks
    • 7.2.1 The learning problem
    • 7.2.2 Derivatives of network functions
    • 7.2.3 Steps of the backpropagation algorithm
    • 7.2.4 Learning with backpropagation
  • 7.3 The case of layered networks
    • 7.3.1 Extended network
    • 7.3.2 Steps of the algorithm
    • 7.3.3 Backpropagation in matrix form
    • 7.3.4 The locality of backpropagation
    • 7.3.5 An example
  • 7.4 Recurrent networks
    • 7.4.1 Backpropagation through time
    • 7.4.2 Hidden Markov Models
    • 7.4.3 Variational problems
  • 7.5 Historical and bibliographical remarks

8. Fast learning algorithms (PDF)

  • 8.1 Introduction - Classical backpropagation
    • 8.1.1 Backpropagation with momentum
    • 8.1.2 The fractal geometry of backpropagation
  • 8.2 Some simple improvements to backpropagation
    • 8.2.1 Initial weight selection
    • 8.2.2 Clipped derivatives and offset term
    • 8.2.3 Reducing the number of floating-point operations
    • 8.2.4 Data decorrelation
  • 8.3 Adaptive step algorithms
    • 8.3.1 Silva and Almeida's algorithm
    • 8.3.2 Delta-bar-delta
    • 8.3.3 RPROP
    • 8.3.4 The Dynamic Adaption Algorithm
  • 8.4 Second-order algorithms
    • 8.4.1 Quickprop
    • 8.4.2 Second-order backpropagation
  • 8.5 Relaxation methods
    • 8.5.1 Weight and node perturbation
    • 8.5.2 Symmetric and asymmetric relaxation
    • 8.5.3 A final thought on taxonomy
  • 8.6 Historical and bibliographical remarks

9. Statistics and Neural Networks (PDF)

  • 9.1 Linear and nonlinear regression
    • 9.1.1 The problem of good generalization
    • 9.1.2 Linear regression
    • 9.1.3 Nonlinear units
    • 9.1.4 Computing the prediction error
    • 9.1.5 The jackknife and cross-validation
    • 9.1.6 Committees of networks
  • 9.2 Multiple regression
    • 9.2.1 Visualization of the solution regions
    • 9.2.2 Linear equations and the pseudoinverse
    • 9.2.3 The hidden layer
    • 9.2.4 Computation of the pseudoinverse
  • 9.3 Classification networks
    • 9.3.1 An application: NETtalk
    • 9.3.2 The Bayes property of classifier networks
    • 9.3.3 Connectionist speech recognition
    • 9.3.4 Autoregressive models for time series analysis
  • 9.4 Historical and bibliographical remarks

10. The complexity of learning (PDF)

  • 10.1 Network functions
    • 10.1.1 Learning algorithms for multilayer networks
    • 10.1.2 Hilbert's problem and computability
    • 10.1.3 Kolmogorov's theorem
  • 10.2 Function approximation
    • 10.2.1 The one-dimensional case
    • 10.2.2 The multidimensional case
  • 10.3 Complexity of learning problems
    • 10.3.1 Complexity classes
    • 10.3.2 NP-complete learning problems
    • 10.3.3 Complexity of learning with AND-OR networks
    • 10.3.4 Simplifications of the network architecture
    • 10.3.5 Learning with hints
  • 10.4 Historical and bibliographical remarks

11. Fuzzy Logic (PDF)

  • 11.1 Fuzzy sets and fuzzy logic
    • 11.1.1 Imprecise data and imprecise rules
    • 11.1.2 The fuzzy set concept
    • 11.1.3 Geometric representation of fuzzy sets
    • 11.1.4 Set theory, logic operators and geometry
    • 11.1.5 Families of fuzzy operators
  • 11.2 Fuzzy inferences
    • 11.2.1 Inferences from imprecise data
    • 11.2.2 Fuzzy numbers and inverse operation
  • 11.3 Control with fuzzy logic
    • 11.3.1 Fuzzy controllers
    • 11.3.2 Fuzzy networks
    • 11.3.3 Function approximation with fuzzy methods
    • 11.3.4 The eye as a fuzzy system - color vision
  • 11.4 Historical and bibliographical remarks

12. Associative Networks (PDF)

  • 12.1 Associative pattern recognition
    • 12.1.1 Recurrent networks and types of associative memories
    • 12.1.2 Structure of an associative memory
    • 12.1.3 The eigenvector automaton
  • 12.2 Associative learning
    • 12.2.1 Hebbian Learning - The correlation matrix
    • 12.2.2 Geometric interpretation of Hebbian learning
    • 12.2.3 Networks as dynamical systems - Some experiments
    • 12.2.4 Another visualization
  • 12.3 The capacity problem
  • 12.4 The pseudoinverse
    • 12.4.1 Definition and properties of the pseudoinverse
    • 12.4.2 Orthogonal projections
    • 12.4.3 Holographic memories
    • 12.4.4 Translation invariant pattern recognition
  • 12.5 Historical and bibliographical remarks

13. The Hopfield Model (PDF)

  • 13.1 Synchronous and asynchronous networks
    • 13.1.1 Recursive networks with stochastic dynamics
    • 13.1.2 The bidirectional associative memory
    • 13.1.3 The energy function
  • 13.2 Definition of Hopfield networks
    • 13.2.1 Asynchronous networks
    • 13.2.2 Examples of the model
    • 13.2.3 Isomorphism between the Hopfield and Ising models
  • 13.3 Convergence to stable states
    • 13.3.1 Dynamics of Hopfield networks
    • 13.3.2 Convergence proof
    • 13.3.3 Hebbian learning
  • 13.4 Equivalence of Hopfield and perceptron learning
    • 13.4.1 Perceptron learning in Hopfield networks
    • 13.4.2 Complexity of learning in Hopfield models
  • 13.5 Parallel combinatorics
    • 13.5.1 NP-complete problems and massive parallelism
    • 13.5.2 The multiflop problem
    • 13.5.3 The eight rooks problem
    • 13.5.4 The eight queens problem
    • 13.5.5 The traveling salesman
    • 13.5.6 The limits of Hopfield networks
  • 13.6 Implementation of Hopfield networks
    • 13.6.1 Electrical implementation
    • 13.6.2 Optical implementation
  • 13.7 Historical and bibliographical remarks

14. Stochastic networks (PDF)

  • 14.1 Variations of the Hopfield model
    • 14.1.1 The continuous model
  • 14.2 Stochastic systems
    • 14.2.1 Simulated annealing
    • 14.2.2 Stochastic neural networks
    • 14.2.3 Markov chains
    • 14.2.4 The Boltzmann distribution
    • 14.2.5 Physical meaning of the Boltzmann distribution
  • 14.3 Learning algorithms and applications
    • 14.3.1 Boltzmann learning
    • 14.3.2 Combinatorial optimization
  • 14.4 Historical and bibliographical remarks

15. Kohonen networks (PDF)

  • 15.1 Self-organization
    • 15.1.1 Charting input space
    • 15.1.2 Topology preserving maps in the brain
  • 15.2 Kohonen's model
    • 15.2.1 Learning algorithm
    • 15.2.2 Mapping low-dimensional spaces with high-dimensional grids
  • 15.3 Analysis of convergence
    • 15.3.1 Potential function - the one-dimensional case
    • 15.3.2 The two-dimensional case
    • 15.3.3 Effect of a unit's neighborhood
    • 15.3.4 Metastable states
    • 15.3.5 What dimension for Kohonen networks?
  • 15.4 Applications
    • 15.4.1 Approximation of functions
    • 15.4.2 Inverse kinematics
  • 15.5 Historical and bibliographical remarks

16. Modular Neural Networks (PDF)

  • 16.1 Constructive algorithms for modular networks
    • 16.1.1 Cascade correlation
    • 16.1.2 Optimal modules and mixtures of experts
  • 16.2 Hybrid networks
    • 16.2.1 The ART architectures
    • 16.2.2 Maximum entropy
    • 16.2.3 Counterpropagation networks
    • 16.2.4 Spline networks
    • 16.2.5 Radial basis functions
  • 16.3 Historical and bibliographical remarks

17. Genetic Algorithms (PDF)

  • 17.1 Coding and operators
    • 17.1.1 Optimization problems
    • 17.1.2 Methods of stochastic optimization
    • 17.1.3 Genetic coding
    • 17.1.4 Information exchange with genetic operators
  • 17.2 Properties of genetic algorithms
    • 17.2.1 Convergence analysis
    • 17.2.2 Deceptive problems
    • 17.2.3 Genetic drift
    • 17.2.4 Gradient methods versus genetic algorithms
  • 17.3 Neural networks and genetic algorithms
    • 17.3.1 The problem of symmetries
    • 17.3.2 A numerical experiment
    • 17.3.3 Other applications of GAs
  • 17.4 Historical and bibliographical remarks

18. Hardware for neural networks (PDF)

  • 18.1 Taxonomy of neural hardware
    • 18.1.1 Performance requirements
    • 18.1.2 Types of neurocomputers
  • 18.2 Analog neural networks
    • 18.2.1 Coding
    • 18.2.2 VLSI transistor circuits
    • 18.2.3 Transistors with stored charge
    • 18.2.4 CCD components
  • 18.3 Digital networks
    • 18.3.1 Numerical representation of weights and signals
    • 18.3.2 Vector and signal processors
    • 18.3.3 Systolic arrays
    • 18.3.4 One-dimensional structures
  • 18.4 Innovative computer architectures
    • 18.4.1 VLSI microprocessors for neural networks
    • 18.4.2 Optical computers
    • 18.4.3 Pulse coded networks
  • 18.5 Historical and bibliographical remarks

References (PDF)

Reported errata (PDF)