Dec. 17, 2009
16:31
9in x 6in
B-922
b922-ch01
1st Reading
CHAPTER 1
Introduction
1. Computational Ecology
Ecology is the scientific study of the relationship between organisms and
their environments. This concept was put forward by Haeckel as early as
1866. Through more than one hundred years of development, ecology has
become a major branch of knowledge. This is especially so since the early
1990s, when ecology evolved into one of the centers of modern science.
There are many sub-disciplines of ecology. By the organizational level of organisms, ecology is divided into molecular ecology, physiological ecology, population ecology, community ecology, ecosystem ecology, landscape ecology, etc.; by the taxa of organisms, there are plant ecology, animal ecology, microbial ecology, insect ecology, etc.; by landscape and habitat category, there are terrestrial ecology, marine ecology, wetland ecology, forest ecology, grassland ecology, etc.; by application category, there are agro-ecology, urban ecology, pollution ecology, etc.; and by scientific discipline, there are mathematical ecology, environmental ecology, chemical ecology, physiological ecology, economic ecology, behavioral ecology, etc.
Among the known ecological disciplines, only mathematical ecology is
a purely quantitative science. Mathematical ecology stresses the mathematical
analysis of ecological issues, mostly by developing analytical models and
equations.
Due to the complexity, nonlinearity and uncertainty of ecological problems, simple mathematical models or equations are far from enough to
address them. As the knowledge of ecology and computational science
advances, intensive computation is playing an increasingly important role in
ecological studies. Various theories and methods based on intensive computation, like artificial neural networks, agent-based modeling, systems simulation, numerical approximation, etc., are increasingly used in ecology. As a
result, an ecological discipline, computational ecology, is formally proposed
here to integrate and synthesize computation-intensive areas in ecology.
Research tasks in the discipline of computational ecology are described
below:
(1) Computational ecology is a science that focuses mainly on ecological research through the construction and application of the theories and methods of computational science, including computational mathematics. Intensive computation is one of the major features of computational ecology.
Most of the issues in computational ecology start from modeling, followed by intensive computation based on the model (iteration, training,
etc.). It aims at the simulation, approximation, prediction, recognition,
and classification of ecological issues. With computational ecology as
a unified platform, we may not only apply theories and methods of
computational science to ecology, but also construct new theories and
methods for computational science. It is an interface, membrane, or
gate between ecology and computational science.
(2) Ecology is the main body of computational ecology. Various sciences
are involved in computational ecology, including computational mathematics (such as numerical methods), artificial intelligence (artificial
neural networks, machine learning, etc.), computer science (algorithm
design, software development, etc.), probability theory, statistics, optimization theory, combinatorics, differential equations, functional analysis, algebraic topology, differential geometry, and others.
(3) The research areas of computational ecology involve (but are not limited to) the following aspects:
(a) Artificial neural networks, knowledge-based systems, machine
learning, data exploration, statistical computation (Bayesian computing, randomization, bootstrapping, Monte Carlo techniques,
stochastic process, etc.), computation-intensive inferential methods, heuristics, numerical and optimization methods, individual-based modeling and simulation (differential and difference
equation modeling and simulation, etc.), prediction, recognition,
classification, agent-based modeling and simulation, network analysis and computation, databases, and other computation-intensive
theories and methods.
(b) The development, evaluation and validation of software and algorithms for computational ecology. The development and evaluation of apparatus, instruments and machines for ecological and
environmental analysis, investigation and monitoring based on the
software of computational ecology.
2. Artificial Neural Networks and Ecological Applications
2.1. A brief history of artificial neural network development
An artificial neural network is a simulation system of the human brain. It can be implemented in both electronic hardware and computer software. It is a parallel distributed processor with large numbers of connections. Artificial neural networks can acquire knowledge by learning and possess the ability of problem-solving; the knowledge acquired is stored in the connection weights.
Research on modern artificial neural networks began approximately 60 years ago. The development of artificial neural networks has undergone four phases (Fecit, 2003).
(1) Birth phase. As early as 1943, McCulloch and Pitts described the neural network with mathematical tools and presented a mathematical model of neurons, the MP model, which was later developed into the theory of finite automata. Their work demonstrated that artificial neural networks can be used to compute any arithmetic and logic functions, and it is recognized as the origin of artificial neural network research.
In 1949 Hebb speculated that the conditioned response resulted from the characteristics of single neurons, and he presented a hypothesis on the learning law of neurons. The hypothesis was confirmed over the following 30 years and is widely known as the Hebb learning law.
The perceptron network developed by Rosenblatt (1958) was a
landmark event which initiated the engineering application of artificial
neural networks.
The adaptive linear element (Adaline), a variant of the perceptron, was subsequently proposed by Widrow and Hoff in 1962 and used in signal analysis and radar antenna control. The Widrow–Hoff learning law is still used in various neural network models.
In the late period of this phase, neural network research entered a recession because of the limitations of computing power.
(2) Transition phase. The key to promoting the development of artificial neural networks is to propose new models and new learning algorithms, while the mathematical principles of artificial neural networks are also indispensable. During the 1970s various network models, theories and learning algorithms were proposed.
In this phase Grossberg combined psychology and brain science to form a unified artificial neural theory.
From 1971 the Japanese scientist Amari developed theories on the dynamics and stability of artificial neural networks, in particular theories based on manifolds and probability theory.
In 1970 and 1973 Fukushima proposed the theory of the neural cognitive network based on his previous research on artificial system models of the human brain. Fukushima's models include the neural cognition model and a cognition model with selective attention based on the neural cognizer.
Research on associative memory made great achievements during this period. Various associative memory models were developed by Kohonen (1972), Anderson (1968, 1973, 1977), and other researchers.
(3) Peak phase. Around 1980, Feldman and Ballard began their neural network research and developed various neural network systems and theories covering natural language, logical reasoning, concept representation, parallel distributed processing, etc.
In 1982 and 1984 Hopfield published two papers on a new model and brought neural network research to a climax. The Hopfield network is an interconnected, nonlinear dynamic network.
Sejnowski began his neural network research in 1976 and, together with Hinton and Ackley, proposed the Boltzmann machine in 1984 and 1985, drawing on the methods and concepts of statistical physics.
During this period a milestone algorithm for multilayer neural networks, the backpropagation (BP) algorithm, was proposed by McClelland and Rumelhart in 1986.
(4) Phase of rapid development. The establishment of the International Neural Network Society in 1987 marked the beginning of a new era of neural network research and applications. Since then annual meetings or symposiums on neural networks have been convened around the world, and neural networks have been used in various areas of society.
2.2. Fundamentals of artificial neural networks
2.2.1. Biological neurons and mechanisms
A typical biological neuron is composed of four parts (Bian and Zhang, 2000; Fig. 1):
(1) Soma. The body of a neuron cell; it contains the nucleus and cytoplasm.
(2) Dendrite. A dendrite is typically less than 1 mm long and receives signals from other neurons. There are thousands of branched dendrites on the soma.
(3) Neurite (axon). It outputs signals to other neurons. Signals are transmitted along the neurite at a rate of dozens of meters per second. A neurite may have several branches connected to different neurons.
(4) Synapse. A synapse is a connection between two neurons. A synapse to a dendrite is excitatory, stimulating the next neuron, while a synapse to the soma is inhibitory, inhibiting the next neuron.
A neuron has two different states, i.e., excitation and inhibition (Bian and Zhang, 2000). A neuron in the inhibited state receives excitatory signals
Figure 1. A biological neuron.
from other neurons. Several inputs are algebraically summed. If the sum exceeds a threshold the neuron fires: it enters the excited state and delivers an output pulse to other neurons. After firing there is a refractory period during which the neuron does not respond to stimulation from other neurons; the threshold then drops back gradually. In theory a biological neuron can only transmit Boolean signals; however, the series of pulses emitted when a neuron fires may be treated as a frequency-modulated signal, and the density of these pulses may represent a continuous signal.
2.2.2. Types and mechanisms of artificial neural networks
A neural network can be regarded as a directed graph with nodes (input nodes and neurons, i.e., input nodes and computation nodes), synaptic connections, and functional connections. As far as connection types are concerned, there are two types of neural networks, i.e., feedforward networks and feedback networks. Feedforward networks are functional mapping networks, usually used for pattern recognition, function approximation and prediction (Haykin, 1994; Yan and Zhang, 2000; Fecit, 2003). Feedback neural networks are used as associative memories and optimization tools. In a feedforward network, every neuron receives inputs from the previous layer and yields outputs for the next layer; there is no feedback. A feedback network can be redrawn as an undirected graph in which each connection is bidirectional. In a feedback neural network all nodes are computation nodes, and each node has (n − 1) inputs and one output if the total number of nodes is n.
There are two phases in the workflow of a neural network:
(1) Learning phase. The states of all computation nodes are held constant while the connection weights are adjusted through the learning process.
(2) Working phase. The connection weights are constant during this phase, and the states of the computation nodes change until stable states are achieved.
2.2.3. Basic architecture of artificial neural networks
(1) One-input neuron
The architecture of a one-input neuron is indicated in Fig. 2. The mathematical expression of the one-input neuron is
y = f(wx + b),
Figure 2. One-input neuron.
Figure 3. Multiple-input neuron. There are n inputs for the neuron.
where w = the weight of input x; b = bias; y = output; f = transfer function. In this expression, the output of the accumulator, p = wx + b, is also called the net input of the transfer function f. Adding a bias b can increase the adaptability of neurons and neural networks.
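The expression y = f(wx + b) can be sketched in a few lines of code. This is a minimal illustration, not part of the original text: the function name `one_input_neuron` and the choice of tanh as the transfer function f are assumptions made for the example.

```python
import math

def one_input_neuron(x, w, b, f=math.tanh):
    """Single-input neuron: y = f(w*x + b).

    The net input p = w*x + b is passed through the transfer
    function f (tanh here, chosen purely for illustration).
    """
    p = w * x + b          # net input of the transfer function
    return f(p)
```

With w = 1 and b = 0 the neuron simply applies the transfer function to its input; a nonzero bias shifts the net input and thus the operating point of f.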
(2) Multiple-input neuron
The architecture of a multiple-input neuron is indicated in Fig. 3. The mathematical expression of the multiple-input neuron is
y = f(Σ_{i=1}^{n} w_{1i} x_i + b),
where w1i = the connection weight of source neuron i to target neuron 1,
i = 1, 2, . . . , n; b = bias; y = output; f = transfer function.
The architecture of the multiple-input neuron (n inputs) can be briefly
represented by a simpler illustration, as indicated in Fig. 4.
(3) One-layer feedforward neural network
A single neuron with multiple inputs is generally not enough to build a neural network. In a neural network there are usually several neurons operating in parallel (Hagan et al., 1996). A set of neurons operating in parallel forms a layer (Fig. 5). The mathematical expression of a one-layer feedforward neural network with s neurons is
y = f(wx + b),
where x ∈ Rn , y ∈ Rs , b ∈ Rs , and w = (wij )s×n .
Figure 4. The simpler representation of a multiple-input neuron. In this representation x is an n × 1 input vector; w is the 1 × n weight vector; b, p and y are scalars.
Figure 5. A one-layer feedforward network with s neurons. Each neuron has n inputs.
The architecture of a one-layer feedforward network with s neurons can be briefly represented by a simpler illustration, as indicated in Fig. 6.
The number of neurons in a one-layer feedforward neural network is determined by the number of network outputs.
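The layer expression y = f(wx + b) can be sketched as follows. The function name and the use of tanh are illustrative assumptions, with the weight matrix stored as a list of s rows of n weights.

```python
import math

def layer_forward(x, w, b, f=math.tanh):
    """One-layer feedforward network: y = f(w x + b).

    x: n inputs; w: s-by-n weight matrix (one row per neuron);
    b: s biases.  Returns the s outputs, one per neuron.
    """
    return [f(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(w, b)]
```

Each row of w holds one neuron's n weights, so the number of rows (and of biases) equals the number of network outputs s, as stated above.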
(4) Multilayer feedforward neural network
In a multilayer feedforward neural network, each layer has its own bias vector, net input vector, weight matrix, and output vector. The layer whose output is the network output is the output layer, and the remaining layers are hidden
Figure 6. The simpler representation of a one-layer feedforward network with s neurons. In this representation x is an n × 1 input vector; w is the s × n weight matrix; b, p and y are s × 1 vectors.
Figure 7. A two-layer feedforward neural network with s neurons in each layer. Each
neuron of the first layer receives n inputs.
layers. A multilayer feedforward neural network, such as a network using a sigmoid transfer function in the first layer and a linear transfer function in the second layer, can approximate most functions arbitrarily well (Hagan et al., 1996).
A two-layer feedforward neural network, as indicated in Fig. 7, is
represented by the following equation:
y = g(w2 f(w1 x + b1) + b2),
where x ∈ Rn, y ∈ Rs, b1 ∈ Rs, b2 ∈ Rs, w1 = (w1,ij)s×n, and w2 = (w2,ij)s×s.
As for the number of layers, two or three layers are enough in most cases.
There is no generally accepted algorithm or rule to determine the number of hidden neurons in a multilayer feedforward neural network.
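A two-layer forward pass matching the equation y = g(w2 f(w1 x + b1) + b2) can be sketched as follows, with a tanh hidden layer standing in for the sigmoid and a linear output layer, as in the sigmoid/linear pairing mentioned above; all names are illustrative assumptions.

```python
import math

def dense(x, w, b, f):
    # one layer: f(w x + b), with w stored as a list of rows
    return [f(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

def two_layer_net(x, w1, b1, w2, b2):
    """y = g(w2 f(w1 x + b1) + b2): tanh hidden layer f,
    linear output layer g."""
    h = dense(x, w1, b1, math.tanh)        # hidden layer output
    return dense(h, w2, b2, lambda p: p)   # linear output layer
```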
(5) Recursive neural network
A recursive network is a feedback neural network in which some of the network outputs are redirected to the inputs. Recursive networks are more powerful than feedforward networks.
A recursive network contains one or more time-delay modules that
form the network feedback. The mathematical representation of a time
delay element (Fig. 8) is
y(t) = x(t − 1),
where y(0) is the initial condition.
Figure 9 illustrates the architecture of a recursive neural network.
Figure 8. Time delay element.
Figure 9. A recursive neural network.
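The feedback through a time-delay element can be sketched for a single recurrent neuron. The specific wiring below (one external input plus the delayed output fed back) is an illustrative assumption, not the exact network of Fig. 9.

```python
import math

def run_recursive(u_seq, w_in, w_fb, b, y0=0.0, f=math.tanh):
    """Single-neuron recurrent sketch.  The time-delay element
    supplies the previous output, so at each step
    y(t) = f(w_in*u(t) + w_fb*y(t-1) + b), with y(0) = y0."""
    y, outputs = y0, []
    for u in u_seq:
        y = f(w_in * u + w_fb * y + b)   # y(t-1) enters via feedback
        outputs.append(y)
    return outputs
```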
2.2.4. Learning methods of artificial neural networks
There are three ways of network learning:
(1) Supervised learning. Given a set of training samples, the neural network adjusts its connection weights according to the difference between the desired outputs and the practical outputs.
(2) Unsupervised learning. The neural network adjusts its connection weights according to the statistical information carried by environmental data (Yan and Zhang, 2000); this is a self-organizing process.
(3) Reinforcement learning. In this learning scheme the external environment evaluates the network output, and the neural network adjusts its connection weights by reinforcing those actions encouraged by the environment. It is a learning method between supervised and unsupervised learning.
Three kinds of learning algorithms are used in neural networks (Yan and Zhang, 2000):
(1) Hebb learning law. The strength of the connection between two neurons is expected to increase if the activation of the two neurons is synchronous and to decrease if the activation is asynchronous, e.g., with the weight change
Δwij(t) = η xi(t) yj(t),
where xi(t) and yj(t) are the states of the two connected neurons at time t.
(2) Error correction learning law. The error is
ei(t) = yi(t) − zi(t),
where yi(t) = desired output of the ith neuron at time t; zi(t) = practical output of the ith neuron at time t; ei(t) = output error of the ith neuron at time t. The objective is to minimize some function of ei(t). An example is the delta rule:
Δwij(t) = η ei(t) xj(t),
where xj(t) is the jth input at time t, and Δwij(t) is the weight change.
(3) Competitive learning law. All output nodes compete with each other and finally only the strongest node is activated. Generally there are inhibitory connections among the output nodes. The learning law can be represented by the following:
Δwij(t) = η (xj(t) − wij(t)), if node i wins the competition;
Δwij(t) = 0, if node i fails in the competition.
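The error-correction (delta) rule can be sketched for a single linear neuron; the function and variable names below are assumptions for the example.

```python
def delta_rule_step(w, b, x, y_desired, eta=0.5):
    """One delta-rule update: e = y_desired - z,
    delta w_j = eta * e * x_j, delta b = eta * e."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b   # practical output
    e = y_desired - z                              # output error
    w = [wj + eta * e * xj for wj, xj in zip(w, x)]
    b = b + eta * e
    return w, b, e
```

Repeated application of this step drives the error toward zero when the target can be represented by the neuron.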
2.2.5. Adaptation of artificial neural networks
A neural network may ideally acquire knowledge by learning from a stationary environment. However, if the environment is non-stationary (time-changing), the neural network must be given adaptive ability in order to follow the changing environment (Yan and Zhang, 2000). In this case every different input is treated as a new data set, and the neural network is thus viewed as a predictor:
x(t) = f(x(t − 1), w(t − 1)),
e(t) = z(t) − x(t),
where z(t) = observed output at time t; x(t) = predicted output at time t; e(t) = output error at time t. The objective is to make e(t) approach 0.
2.3. Applications of artificial neural networks
Neural networks are currently applied in many areas (Haykin, 1994; Widrow, 1994; Hoffmann, 1998; Zhang, 2007). Some are listed below:
(1) Numerical computation. Function approximation, interpolation,
optimization, etc.
(2) Modeling. Chemical modeling, ecological modeling, dynamic
modeling of industrial processes, etc.
(3) Data mining and knowledge discovery. Between-variable relationship
discovery, classification, etc.
(4) Biological and medical applications. Gene discovery, protein prediction, biodiversity analysis, growth simulation, survival analysis,
community prediction, etc.
(5) Environmental applications. Pollutant prediction, environmental monitoring, habitat discrimination, etc.
(6) Visual and audio recognition and processing. Face and signature
recognition, radar and sonic image processing (image compression,
feature extraction, noise removal, etc.), robot visualization, target
identification, audio compression and recognition, recognition of
human tissues and cells, etc.
(7) Control systems. Robot control, orbit control, traffic dispatch
and control, production flow control, weapon manipulation, target
tracking, etc.
(8) Diagnostic systems. Disease diagnosis, vehicle diagnosis, machine
and flow diagnosis, cardiograph classification, etc.
(9) Industry and manufacturing. Quality monitoring and analysis, performance analysis, project bidding, product design and analysis, oil
exploration, etc.
(10) Communication systems. Route selection, aero-navigation, echo
canceling, etc.
(11) Economic and financial applications. Market analysis, advisory
system of stock exchange, real estate assessment, loan consultation, financial analysis, price prediction, cheque recognition and cash
detection, etc.
2.4. Ecological applications of artificial neural networks
Since the 1970s ecologists have sought to understand ecosystems by constructing mechanistic models. However, as the complexity of the ecosystems studied increased, more and more black boxes emerged in the models, and model complexity grew rapidly. The effectiveness and validity of mechanistic models declined as model complexity increased, and these models finally became unsolvable, unstable, and unreliable. Due to the complexity of ecosystems, empirical models have regained popularity in recent years (Tan et al., 2006). On the other hand, ecological relationships are highly nonlinear and thus cannot be reasonably described by classical models (Schultz and Wieland, 1997; Pastor-Barcenas et al., 2005), whether mechanistic or empirical.
Artificial neural networks have been recognized as the universal function approximators for complex and nonlinear ecological relationships
(Acharya et al., 2006; Nour et al., 2006; Zhang and Barrion, 2006; Zhang,
2007; Zhang et al., 2008). They have the advantages of more automated
model synthesis and analytical input–output models (Tan et al., 2006).
A large number of studies on ecological applications of artificial neural
networks were conducted in the last ten years.
Concerning the dynamic modeling of ecological or environmental processes, artificial neural networks have been used for modeling short- and middle-long-term concentration levels (Viotti et al., 2002), subsurface processes
(Almasri and Kaluarachchi, 2005), sediment transfer (Abrahart and White,
2001), subsurface drain outflow and nitrate–nitrogen concentration in tile
effluent and surface ozone (Sharma et al., 2003; Pastor-Barcenas et al.,
2005), flow and phosphorus concentration (Nour et al., 2006), dioxide dispersion (Nagendra and Khare, 2006), the growth of Chinese cabbage (Zhang
et al., 2007), and food intake dynamics of a holometabolous insect (Zhang
et al., 2008).
Artificial neural networks are often used for classification, recognition, and prediction of ecological issues. They have been used to explain
the observed structure of functional feeding groups of aquatic macroinvertebrates (Jorgensen et al., 2002). Backpropagation (BP) and radial
basis function (RBF) neural networks were used to simulate and predict
species richness of rice arthropods (Zhang and Barrion, 2006). They were
used in the classification and discrimination of vegetation (Marchant and
Onyango, 2003; Filippi and Jensen, 2006), habitat zones and functional
groups of invertebrates (Zhang, 2007). In addition, artificial neural networks have been used to explain observed changes in species composition
and abundance (Jaarsma et al., 2007), to construct transfer functions that
implement organism–environment relationships for paleoecological uses
(Racca et al., 2007), to classify community assemblages (Zhang, 2007;
Tison et al., 2007), and to determine the risk of insect pest invasion (Watts
and Worner, 2009).
Spatial distribution patterns of invertebrates can be effectively described by artificial neural networks (Cereghino et al., 2001; Zhang et al., 2008), which performed better than partial differential equations and spline functions.
Artificial neural networks have been compared to various conventional
models in terms of modeling performance. They proved to be more
effective than differential equations (Zhang et al., 2007; Zhang and Wei,
2008). They were superior to linear models, generalized additive models, and regression trees (Moisen and Frescino, 2002). They outperformed
logistic regression, multiple discriminant model and multiple regression in
predicting community composition (Olden et al., 2006) and the number of
salmonids (McKenna, 2005). They can also provide a feasible alternative
to more classical spatial statistical techniques (Pearson et al., 2002).
The need for better techniques, tools and practices to analyze ecological systems within an integrated framework has never been greater (Shanmuganathan et al., 2006). Approaches conditioned on data should thus be preferred (Lukacs et al., 2007). Artificial neural networks are universal and adaptive data-driven models, and wider ecological applications of them are expected in the future.
2.5. Important books and journals
There are a large number of books and journals on artificial neural networks.
Some books on theories and applications of artificial neural networks are
as follows:
(1) Anderson JA. An Introduction to Neural Networks. MIT Press, Cambridge, USA, 1995
(2) Smith SM. Neural Networks for Statistical Modeling. van Nostrand
Reinhold, New York, USA, 1993
(3) Chester M. Neural Networks: A Tutorial. Prentice-Hall, New Jersey, USA, 1993
(4) Hassoun MH. Fundamentals of Artificial Neural Networks. MIT
Press, Cambridge, USA, 1995
(5) Kohonen T. Self-Organizing Maps. Springer-Verlag, Germany, 1995
(6) Haykin S. Neural Networks: A Comprehensive Foundation.
Macmillan, New York, USA, 1994
(7) Nigrin A. Neural Networks for Pattern Recognition. MIT Press,
Cambridge, USA, 1993
(8) Fecit. Analysis and Design of Neural Networks in MATLAB 6.5.
Electronics Industry Press, Beijing, China, 2003
(9) Hagan MT, Demuth HB, Beale MH. Neural Network Design. PWS
Publishing Company, Boston, USA, 1996
(10) Yan PF, Zhang CS. Artificial Neural Networks and Simulated Evolution. Tsinghua University Press, Beijing, China, 2000
(11) Zhang WJ. Methodology on Ecology Research. Sun Yat-Sen University Press, Guangzhou, China, 2007
Some journals on theories and applications of artificial neural networks
are listed below:
(1) Neural Networks
http://www.elsevier.com/locate/neunet
(2) Neural Computation
http://www.mitpressjournals.org/loi/neco
(3) Artificial Intelligence
http://www.elsevier.com/locate/artint
(4) IEEE Transactions on Neural Networks
(5) Journal of Artificial Neural Networks
(6) Machine Learning
http://www.springerlink.com/content/100309/
(7) Network: Computation in Neural Systems
http://www.informaworld.com/smpp/title∼db=all∼content=
t713663148
(8) International Journal of Neural Systems
http://www.worldscinet.com/ijns/ijns.shtml
(9) IEEE Transactions on Circuits and Systems
(10) Ecological Modelling
http://www.elsevier.com/locate/ecolmodel
(11) Ecological Complexity
http://www.elsevier.com/locate/ecocom
PART I
Artificial Neural Networks:
Principles, Theories
and Algorithms
CHAPTER 2
Feedforward Neural Networks
Feedforward neural networks are one of the two most important types of neural networks. In a feedforward neural network, each neuron receives inputs from the previous layer and produces outputs for the next layer (Anderson, 1972; Anderson and Rosenfeld, 1989; Haykin, 1994; Hagan et al., 1996; Fecit, 2003; Marchant and Onyango, 2003). There is no feedback in a feedforward network, so it can be illustrated as an acyclic graph. In a sense, a feedforward neural network is a composite function built by repeatedly composing nonlinear functions. Perceptron networks, linear networks, BP networks, RBF networks, etc., are feedforward neural networks (Zhang, 2007).
The history of feedforward neural networks may be traced back to the perceptron (Rosenblatt, 1958; Minsky and Papert, 1969). Studying the perceptron is instructive for understanding more complex feedforward neural networks. The perceptron was the first layered neural network with a learning feature. An original perceptron contains an input layer (sensory layer, S), an intermediate layer (association layer, A) and an output layer (response layer, R) (Fig. 1). However, the connection weights between the input layer and the intermediate layer are fixed, so the intermediate layer cannot be regarded as a hidden layer. The connection weights between the intermediate layer and the output layer are adjustable by a learning procedure.
Figure 1. A three-layer perceptron.
1. Linear Separability and Perceptron
1.1. Linear threshold unit and Boolean function
A linear threshold unit is composed of a neuron and a set of adjustable
weights, in which the activation function is the threshold function. Defining
the threshold θ = −b, where b is the bias, the mathematical expression of
the linear threshold unit is
y = sgn(wT x − θ),
where x = (x1, x2, . . . , xn)T is an n-dimensional input, and w = (w1, w2, . . . , wn)T is the weight vector; sgn(z) = 1 if z ≥ 0, and sgn(z) = −1 if z < 0.
The linear threshold unit can realize such logical functions as AND,
OR, NOT, in particular NAND. As a consequence we have the following
theorem:
Theorem 1. Any Boolean function may be realized by a feedforward network composed of linear threshold units, and it can be realized by a three-layer (including the input layer) feedforward network (Yan and Zhang, 2000; Fecit, 2003).
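The gate-realization claim can be checked with a small sketch. The weight and threshold values below are illustrative choices (for 0/1 inputs, with output +1 read as true and −1 as false), not taken from the text.

```python
def threshold_unit(x, w, theta):
    """Linear threshold unit: y = sgn(w.x - theta),
    with sgn(z) = 1 if z >= 0 and -1 otherwise."""
    z = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1 if z >= 0 else -1

# Illustrative weight/threshold choices realizing three gates:
AND  = lambda x: threshold_unit(x, [1, 1], 1.5)
OR   = lambda x: threshold_unit(x, [1, 1], 0.5)
NAND = lambda x: threshold_unit(x, [-1, -1], -1.5)
```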
1.2. Linearly separable function
For a given function f , if there exist w and θ, such that
f(x) = sgn(wT x − θ),
the function f is called linearly separable. It is obvious that a two-layer network can only realize linearly separable functions. AND, OR, and NAND are all linearly separable.
As revealed above, a one-layer network, including the one-layer perceptron, can only act as a linear classifier. With the threshold function as the activation function, a multi-layer network may realize Boolean functions. If the activation functions of the neurons are continuous (e.g., the sigmoid function, sine function, etc.), the neural network will be able to approximate
any continuous function. It has been proved that a feedforward neural network with n inputs and m outputs can be treated as a nonlinear mapping
from n-dimensional Euclidean space to m-dimensional Euclidean space,
and this mapping is able to approximate any continuous function (with or
without countable discrete points) (Hornik et al., 1989). This conclusion
can be described by the following theorem:
Theorem 2 (Kolmogorov Theorem). Suppose that φ(·) is a bounded, non-constant, and monotonically increasing continuous function, In is the n-dimensional unit hypercube [0, 1]n, and C(In) is the set of continuous functions defined on In. Then given any function f ∈ C(In) and ε > 0, there are an integer m and a group of real constants αi, θi and ωij, where i = 1, 2, . . . , m, and j = 1, 2, . . . , n, such that the network output
F(x1, x2, . . . , xn) = Σ_{i=1}^{m} αi φ(Σ_{j=1}^{n} ωij xj − θi)
will arbitrarily approximate f(·), i.e.,
|F(x1, x2, . . . , xn) − f(x1, x2, . . . , xn)| < ε, (x1, x2, . . . , xn) ∈ In.
The above theorem indicates that a feedforward network with even only one hidden layer may be used as a general approximator. The theorem may also be extended to the pattern classification mapping f : A^n → {1, 2, . . . , m}, where f(x) = j if and only if x ∈ C_j, A^n is a compact set in R^n, A^n = ∪_j C_j, and C_i ∩ C_j is a null set if i ≠ j.
Generally, the more inflection points the target function has, the more hidden neurons are needed.
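As a sketch of the network form in Theorem 2, the output F is a weighted sum of sigmoidal ridge functions; the following NumPy snippet (with arbitrary illustrative parameters α_i, ω_ij, θ_i, not taken from the text) evaluates F for a given input:

```python
import numpy as np

def phi(t):
    # a bounded, non-constant, monotonically increasing function (sigmoid)
    return 1.0 / (1.0 + np.exp(-t))

def F(x, alpha, omega, theta):
    # F(x) = sum_i alpha_i * phi(sum_j omega_ij * x_j - theta_i)
    # x: (n,), omega: (m, n), alpha and theta: (m,)
    return alpha @ phi(omega @ x - theta)

# illustrative parameters: m = 3 hidden units, n = 2 inputs
rng = np.random.default_rng(0)
alpha = rng.normal(size=3)
omega = rng.normal(size=(3, 2))
theta = rng.normal(size=3)
y = F(np.array([0.2, 0.7]), alpha, omega, theta)
```

Because φ takes values in (0, 1), the output is always bounded by Σ|α_i|; the approximation power comes from choosing m, α, ω, and θ for the target function.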
1.3. Learning law of perceptron
The learning law of perceptron is a supervised learning law. Suppose x(k),
y(k), ŷ(k), and b(k) are network input, practical output, desired output, and
bias at k, respectively,
x(k) = (x1 (k), x2 (k), . . . , xn (k))T ,
y(k) = (y1 (k), y2 (k), . . . , ym (k))T ,
ŷ(k) = (ŷ1 (k), ŷ2 (k), . . . , ŷm (k))T ,
b(k) = (b1 (k), b2 (k), . . . , bm (k))T ,
w(k) = (wij (k)).
The learning law of a perceptron is a gradient descent method, expressed as follows:
wij (k + 1) = wij (k) + (ŷi (k) − yi (k))xj (k),
bi (k + 1) = bi (k) + ŷi (k) − yi (k).
Through the adjustment of connection weights and bias above, the network
output will approximate the desired output.
The Matlab functions for the learning law of perceptron are learnp
and learnpn (Mathworks, 2002).
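The update rule above can be sketched in a few lines of NumPy (the function name and training data are illustrative, not from the toolbox); here it is trained on the linearly separable AND function:

```python
import numpy as np

def perceptron_train(X, y_desired, epochs=20):
    # Perceptron learning law: w <- w + (yhat - y) x, b <- b + (yhat - y),
    # where yhat is the desired output and y the practical output.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, d in zip(X, y_desired):
            out = 1.0 if w @ x + b > 0 else 0.0   # threshold activation
            w += (d - out) * x                    # desired minus practical
            b += (d - out)
    return w, b

# AND is linearly separable, so the perceptron converges on it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
d = np.array([0, 0, 0, 1], float)
w, b = perceptron_train(X, d)
pred = (X @ w + b > 0).astype(float)
```

After training, `pred` reproduces the AND targets; by the perceptron convergence theorem, this loop terminates with zero error whenever the classes are linearly separable.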
1.4. Limitations of perceptron
Perceptron neural networks have some limitations:
(1) The perceptron can only be used for simple classification problems because its activation function is the threshold function; it can only classify linearly separable patterns.
(2) The perceptron is not able to solve the XOR (exclusive OR) problem.
(3) Singular input samples will lengthen the learning time.
2. Some Analogies of Multilayer Feedforward Networks
Multilayer linear neural networks can be related to the following areas (Zhang and Fang, 1982; Chen et al., 1987; Yan and Zhang, 2000):
(1) Regression analysis. Regression analysis aims to determine the x–y relationship according to observed samples (x, y). A multilayer feedforward network is designed to find a mapping f : R^n → R^m, such that

min ∫∫ ‖y − f(x)‖² p(x, y) dx dy.

The solution of this minimization problem is

f(x) = ∫ y p_{y|x}(x, y) dy,

where p_{y|x}(x, y) is the conditional probability of y given x. f(x) is just the regression of y against x. If f is a linear function (linear neural network), we have ŷ = Ax. The solution

A = R_{yx} R_{xx}^{−1}

will minimize E‖y − Ax‖².
(2) Discriminant analysis. Discriminant analysis is a supervised pattern classification method based on linear transformation. Different categories of data are separated as far as possible in the new coordinate system.
(3) Principal component analysis (PCA). PCA can be used in data reduction in order to identify a small number of factors that explain most of
the variance that is observed in a larger number of manifest variables
(SPSS, 2006).
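The linear regression solution in item (1) can be checked numerically; in this hypothetical noise-free example the correlation matrices are estimated from samples of a known linear map, and A = R_yx R_xx^{−1} recovers it:

```python
import numpy as np

rng = np.random.default_rng(1)
A_true = np.array([[2.0, -1.0],
                   [0.5,  3.0]])
X = rng.normal(size=(1000, 2))   # samples of x
Y = X @ A_true.T                 # y = A x (noise-free for clarity)

# sample correlation matrices R_yx = E[y x^T], R_xx = E[x x^T]
Ryx = Y.T @ X / len(X)
Rxx = X.T @ X / len(X)
A_hat = Ryx @ np.linalg.inv(Rxx)  # A = R_yx R_xx^{-1}
```

With noisy outputs, the same formula gives the least-squares estimate of the linear map rather than an exact recovery.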
3. Functionality of Multilayer Feedforward Networks
From the viewpoint of probability theory, if the true input–output relationship is p(x, y), the probability density of the output will be

p(y|x) = p(x, y)/p(x),
where the probability density of the input is

p(x) = ∫ p(x, y) dy.
A neural network tries to learn p(x, y) from samples and thus to approximate p(x, y) with ṗ(x, y, w).
The functionality of feedforward neural networks can be summarized as follows (Yan and Zhang, 2000):
(1) In function approximation, the network output is the regression function between y and x.
(2) In pattern classification, the network output is the posterior probability of the corresponding category.
(3) The correctness of the network output is higher for unknown samples with larger occurrence probability p(x), and lower for unknown samples with smaller occurrence probability.
(4) The lower the variance is, the higher the confidence level of network
output will be.
CHAPTER 3
Linear Neural Networks
Linear neural networks are the simplest networks which are composed
of several linear neurons. The prototype of linear neural networks was
primarily developed by Widrow and Hoff (1960), and was called Adaptive
Linear Element (ADALINE).
Different from the perceptron, the transfer function of linear neural networks is a linear function. Similar to feedforward neural networks, linear neural networks are used for function approximation, system modeling, adaptive filtering, prediction, control, and pattern classification (Widrow and Hoff, 1960; Widrow and Stearns, 1985; Widrow and Winter, 1988; Haykin, 1994; Anderson and Rosenfeld, 1989; Zhang, 2007). However, they can only classify linearly separable patterns.
1. Linear Neural Networks
Before introducing linear neural networks, a similar model, i.e., Generalized
Linear Model (GLM), is introduced here. Suppose xi = (x1i , x2i , . . . , xni )T ,
yi is a scalar variable, i = 1, 2, . . ., p, p is the number of samples, and
β = (β1 , β2 , . . . , βn )T is the parameter vector of linear model. A GLM
should meet the following rules:
(1) There is a strictly increasing and differentiable function g, such that
ηi = g(µi ) = (xi )T β,
where µi = E(yi ), g(·) is the link function.
(2) The probability distribution of y_i is of the exponential type (i.e., normal distribution, Poisson distribution, binomial distribution, etc.), as described in the following:

p(y_i; η, φ) = exp[(ηy − b(η))/φ + c(y, φ)],

where η is the natural parameter, φ is the dispersion parameter, and the parameters are relevant to the variance of y_i.
1.1. Adaline
The mathematical expression of ADALINE (Fig. 1) is: y = wx + b, where
x = (x1 , x2 , . . . , xn )T , y = (y1 , y2 , . . . , ys )T , b = (b1 , b2 , . . . , bs )T , and
w = (wij )s×n .
ADALINE may learn from its environment by adjusting connection
weights and thresholds according to some learning law like Widrow–Hoff
learning law, i.e., LMS (Least Mean Square) rule (Fecit, 2003).
1.2. Multilayer linear neural network
Figure 2 shows a multilayer linear neural network. In this network, ẁ_{n×m} and ẃ_{s×m} are the between-layer weight matrices. The rank of the overall weight matrix, w = ẃẁ, is less than or equal to m if all neurons are linear.
Figure 1. Adaline.
Figure 2. A multilayer linear neural network. There are m hidden neurons in the network
and m ≤ min(s, n).
2. LMS Rule
The Least Mean Square (LMS) rule is only efficient to one-layer linear
network (Fecit, 2003). Suppose x(k), y(k), ŷ(k), and b(k) are network input,
practical output, desired output, and bias at k, respectively, and
x(k) = (x1 (k), x2 (k), . . . , xn (k))T ,
y(k) = (y1 (k), y2 (k), . . . , ys (k))T ,
ŷ(k) = (ŷ1 (k), ŷ2 (k), . . . , ŷs (k))T ,
b(k) = (b1 (k), b2 (k), . . . , bs (k))T ,
w(k) = (wij (k)).
The LMS rule is approximately a gradient descent method, expressed as follows:

w_{ij}(k + 1) = w_{ij}(k) + η(ŷ_i(k) − y_i(k))x_j(k),
b_i(k + 1) = b_i(k) + η(ŷ_i(k) − y_i(k)),
where η is the learning rate. The training procedure of a linear neural
network is as follows:
(1) Calculate the network output, y = wx + b, and the error, e = ŷ − y.
Dec. 17, 2009
16:31
9in x 6in
B-922
b922-ch03
1st Reading
4 Computational Ecology
(2) Compare the mean square of the output error with the desired error. If the error is less than the desired error or the maximum number of epochs is reached, the training procedure terminates; otherwise continue to train.
(3) Adjust the weights and thresholds (without confusion, threshold and bias are used with the same meaning in the following context) using the LMS rule and return to (1).
A linear neural network converges if the input vectors are linearly independent and η is appropriately chosen.
The LMS rule has been widely used in echo cancellation systems for long-distance telephony and in other adaptive filter designs (Hagan et al., 1996). The LMS rule is also the foundation of the BP algorithm. The Matlab function for LMS is learnwh (Mathworks, 2002).
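The training procedure above can be sketched in NumPy (the function name and data are illustrative); since the target is an exact linear map, the Widrow–Hoff updates drive the weights to it:

```python
import numpy as np

def lms_train(X, yhat, eta=0.05, epochs=50):
    # Widrow-Hoff (LMS) rule for a one-layer linear network:
    # w <- w + eta*(yhat - y)*x, b <- b + eta*(yhat - y)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, d in zip(X, yhat):
            y = w @ x + b              # practical output
            w += eta * (d - y) * x     # desired minus practical output
            b += eta * (d - y)
    return w, b

# learn the linear map y = 2*x1 - x2 + 0.5 from samples
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 2))
d = 2 * X[:, 0] - X[:, 1] + 0.5
w, b = lms_train(X, d)
```

As the text notes, convergence depends on a suitably small learning rate η; too large a value makes the updates unstable.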
CHAPTER 4
Radial Basis Function Neural Networks
Radial basis function (RBF) neural networks are often used for function interpolation (Albus, 1971; Broomhead and Lowe, 1988; Powell, 1990; Zhang and Barrion, 2006). RBF neural networks are multilayer feedforward networks. An RBF network is composed of three layers. The first layer is the input layer. Neurons in the second layer do not have connection weights linked to the inputs; the outputs of neurons in the second layer are determined by the distances between the inputs and the centers of the basis functions. The third layer is usually a linear layer which yields a weighted sum of the outputs of the second layer (Haykin, 1994; Hagan et al., 1996). All weights from the input layer to the hidden layer are ones and the weights from the hidden layer to the output layer are adjustable (Bian and Zhang, 2000). In an RBF network, neurons respond only to inputs adjacent to the centers of the basis functions, i.e., each basis function responds only to a local neighborhood of the input space. RBF networks are therefore locally tuned approximators. Learning is fast in an RBF network because only a small number of parameters must be adjusted when learning new data, but more neurons are needed for high-dimensional inputs (Moody and Darken, 1989).
In a sense, probabilistic neural networks, general regression neural networks, wavelet neural networks, functional link neural networks, etc., are all RBF networks.
1. Theory of RBF Neural Network
1.1. Basic theory
An RBF neural network is an approximator of an unknown function f(x).
In general any function can be expressed as the weighted sum of a set of
basis functions (Bian and Zhang, 2000). In RBF network the function f(x)
is approximated by a set of basis functions composed of output functions
of hidden neurons. The output of basis function from input layer to hidden
layer is a nonlinear mapping and the network output is a linear mapping.
According to functional analysis theory, suppose H is a Hilbert space with reproducing kernel K(x, z), and {ϕ_i} is an orthonormal basis of H. If there is a constant a, such that (Hurt, 1989; Yan and Zhang, 2000)

⟨ϕ(x), ϕ(z)⟩ = aK(x, z),

where x = (x_1, x_2, . . . , x_n)^T and z = (z_1, z_2, . . . , z_n)^T, then any function f ∈ H can be approximated by a linear combination of a set of basis functions (Powell, 1990; Park et al., 1991)

f(x) = Σ_{i=1}^{N} w_i ϕ_i(x, z_i),

where {ϕ(x, z_i) | i = 1, 2, . . . , N} is a set of basis functions and z_i = (z_{i1}, z_{i2}, . . . , z_{in})^T.
Suppose there are N samples, (x^i, ŷ_i), i = 1, 2, . . . , N, where x^i = (x_1^i, x_2^i, . . . , x_n^i)^T. The theory above can be represented by the equation Φw = ŷ, where Φ = (ϕ_{ij})_{N×N}, ϕ_{ij} = ϕ(x^i − z^j), ŷ = (ŷ_1, ŷ_2, . . . , ŷ_N)^T, and w = (w_1, w_2, . . . , w_N)^T. The solution of this equation is w = Φ^{−1}ŷ. The matrix Φ is sometimes singular and a regularization process is thus needed (Yan and Zhang, 2000).
Some RBFs used in the above equation are as follows:

ϕ(x) = exp(−(x − µ_i)²/(2σ_i²))  (Gaussian kernel function),
ϕ(x) = (σ_i² + x²)^{β_i}.
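The interpolation equation Φw = ŷ can be sketched with a Gaussian basis function (the sample points, center placement, and σ below are illustrative choices; centers are put at the sample points themselves, a common option):

```python
import numpy as np

def gaussian(r, sigma=0.15):
    # Gaussian radial basis function of the distance r
    return np.exp(-r**2 / (2 * sigma**2))

x = np.linspace(0.0, 1.0, 9)       # sample points, also used as centers z_j
yhat = np.sin(2 * np.pi * x)       # desired outputs

# Phi_ij = phi(x_i - z_j), then solve Phi w = yhat, i.e. w = Phi^{-1} yhat
Phi = gaussian(np.abs(x[:, None] - x[None, :]))
w = np.linalg.solve(Phi, yhat)

def f(t):
    # RBF interpolant f(t) = sum_j w_j phi(t - z_j)
    return gaussian(np.abs(t - x)) @ w
```

By construction the interpolant passes through every sample point; as the text warns, Φ can be near-singular for closely spaced centers, which is where regularization becomes necessary.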
1.2. Choice of network architecture
In an RBF neural network the basis functions are generally not orthonormal, and the hidden representation is thus redundant. The number of hidden neurons and the parameters of the basis functions should be empirically determined (Yan and Zhang, 2000). Based on functional analysis theory, suppose {ϕ_i}, i = 1, 2, . . . , is a group of orthonormal functions (basis functions) that are continuous on [0, 1]; a continuous function f(x) on [0, 1] has the unique L² approximation

f(c, x) = Σ_{i=1}^{n} c_i ϕ_i(x),

where c = (c_1, c_2, . . . , c_n)^T can be determined by the projection on the basis functions

c_i = ∫_0^1 f(x)ϕ_i(x) dx.
The mean square error for the approximation with n basis functions will be

e_n² = Σ_{i=n+1}^{∞} c_i².

The mean square error e_n² declines as the number of basis functions increases. The basis functions ϕ_i(x) with the largest coefficients c_i can be chosen to minimize e_n² (Scott and Mulgrew, 1997). In this way the architecture of the RBF network can be determined.
2. Regularized RBF Neural Network
Regularization theory tries to find a function hidden in limited data. This is an inverse problem and is always ill-posed (Yan and Zhang, 2000). In the regularization method a constraint condition is set to guarantee the stability of the solution f(x), x = (x_1, x_2, . . . , x_n)^T. The regularization problem is to find
f(x) such that

min E(f) = Σ_{i=1}^{N} (ŷ_i − f(x_i))²/2 + λ‖Df‖²/2,

where N is the number of samples, ŷ_i is the desired output, and D is a linear differential operator. The Euler equation of this problem is (Poggio and Girosi, 1990)

D*Df(x) = Σ_{i=1}^{N} (ŷ_i − f(x_i))δ(x − µ_i)/λ,

where D* is the adjoint operator of D, and µ_i = (µ_{i1}, µ_{i2}, . . . , µ_{in})^T. The solution of the regularization problem is

f(x) = Σ_{i=1}^{N} w_i G(x, µ_i),

where G(x, µ_i) is the Green function of the operator D*D. Suppose that G(x, µ_i) = G(‖x − µ_i‖); G is thus an RBF and the solution is

f(x) = Σ_{i=1}^{N} w_i G(‖x − µ_i‖).
The neural network for this equation is the regularized RBF network with
only one hidden layer (Yan and Zhang, 2000; Fig. 1).
Figure 1. A regularized RBF neural network.
The RBF, G(‖x − µ_i‖), can be further normalized as the function of the hidden neurons (Moody and Darken, 1988)

z_j(x) = G(‖x − µ_j‖/σ_j²) / Σ_{i=1}^{m} G(‖x − µ_i‖/σ_i²),

so that Σ_{j=1}^{m} z_j(x) = 1, j = 1, 2, . . . , m.
The regularized RBF network is a universal approximator: it can approximate any multivariate function defined on a compact subset of R^n, and it is the best approximator. Moreover, the solution achieved by the regularized network is optimal in the sense that the approximation performance on sampling points and non-sampling points will not be substantially different (Yan and Zhang, 2000).
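With a Gaussian Green function, the weights of the regularized network can be obtained by solving (G + λI)w = ŷ, a standard result in regularization networks; the data, σ, and λ below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 20)
yhat = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)  # noisy samples

sigma, lam = 0.15, 1e-2
# Green/RBF matrix G_ij = G(|x_i - mu_j|), centers mu_j at the sample points
G = np.exp(-(x[:, None] - x[None, :])**2 / (2 * sigma**2))
w = np.linalg.solve(G + lam * np.eye(x.size), yhat)  # regularized weights

f_fit = G @ w   # fitted values at the sample points
```

Unlike the plain interpolant, the fit no longer passes exactly through the noisy samples; λ trades data fidelity against the smoothness constraint ‖Df‖².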
3. RBF Neural Network Learning
The parameters in RBF neural network learning include µi (RBF center),
σi2 (RBF variance), and wi (connection weights), i = 1, 2, . . . , N. The
center and variance of each RBF can be subjectively determined. However,
the following learning law is usually used (Wettschereck et al., 1992; Yan
and Zhang, 2000).
Firstly, define the objective function as

E = Σ_{i=1}^{N} e_i²,

e_i = ŷ_i − f(x^i) = ŷ_i − Σ_{j=1}^{m} w_j G(‖x^i − µ_j‖ c_j),

where m is the number of hidden neurons, N is the number of samples, and c_j² = 1/(2σ_j²). The learning law is thus

w_i(k + 1) = w_i(k) − η_1 ∂E(k)/∂w_i(k) = w_i(k) − η_1 Σ_{j=1}^{N} e_j(k)G(‖x^j − µ_i(k)‖ c_i),
µ_i(k + 1) = µ_i(k) − η_2 ∂E(k)/∂µ_i(k) = µ_i(k) − 2η_2 w_i(k) Σ_{j=1}^{N} e_j(k)G′(‖x^j − µ_i(k)‖ c_i) Σ_i^{−1}(k)(x^j − µ_i(k)),

Σ_i^{−1}(k + 1) = Σ_i^{−1}(k) − η_3 ∂E(k)/∂Σ_i^{−1}(k) = Σ_i^{−1}(k) + η_3 w_i(k) Σ_{j=1}^{N} e_j(k)G′(‖x^j − µ_i(k)‖ c_i)(x^j − µ_i(k))(x^j − µ_i(k))^T,

i = 1, 2, . . . , m,

where Σ_i is the covariance matrix and G′ is the derivative function of G.
The Matlab functions for the radial basis function network are newrb
and newrbe (Mathworks, 2002).
4. Probabilistic Neural Network
4.1. Architecture of probabilistic neural network
The probabilistic neural network (PNN) is a parallel realization of the Bayesian classifier. A PNN has two hidden layers. An exponential function is used to replace the sigmoid function of the RBF neural network (Specht, 1988, 1990). In the probabilistic neural network the number of pattern neurons equals the number of training samples, and the number of summation neurons equals the number of categories (Yan and Zhang, 2000; Zhang, 2007; Fig. 2).
The activation of a pattern neuron in the probabilistic neural network is g(y_i) = exp((y_i − 1)/σ²).
If σ = c, where c is a constant, the network will be a Bayes classifier; if σ → ∞, it tends to be a linear classifier; and the network tends to be a nearest-neighbor classifier if σ → 0 (Yan and Zhang, 2000).
The Matlab function for the probabilistic neural network is newpnn
(Mathworks, 2002).
Figure 2. A probabilistic neural network.
4.2. Learning of probabilistic neural network
The learning process of probabilistic neural network is:
(1) Calculate the dot product at each pattern neuron, y_i = x^T w_i, where x is the input vector (a sample) and w_i is the weight vector, and obtain the activation of each pattern neuron, g(y_i) = exp((y_i − 1)/σ²).
(2) Choose adequate weights and activation functions of pattern neurons
such that fA (x) and fB (x) represent distribution density functions of
classes A and B, respectively.
(3) Calculate the weighted output.
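The three steps can be sketched as follows; this toy example (function name, data, and σ are illustrative) normalizes inputs and weight vectors to unit length, which makes x^T w_i − 1 a similarity measure as the pattern-unit activation assumes:

```python
import numpy as np

def pnn_classify(x, train_X, train_labels, sigma=0.3):
    # Probabilistic neural network: one pattern neuron per training sample.
    # Pattern-unit activation g(y) = exp((y - 1)/sigma^2) with y = x^T w_i,
    # assuming unit-length inputs and weights (w_i = normalized sample i).
    x = x / np.linalg.norm(x)
    W = train_X / np.linalg.norm(train_X, axis=1, keepdims=True)
    g = np.exp((W @ x - 1.0) / sigma**2)          # pattern layer
    classes = np.unique(train_labels)
    sums = np.array([g[train_labels == c].sum()   # summation layer:
                     for c in classes])           # one neuron per category
    return classes[np.argmax(sums)]               # output decision

# two angular clusters as training samples
train_X = np.array([[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]])
labels = np.array([0, 0, 1, 1])
pred = pnn_classify(np.array([0.9, 0.2]), train_X, labels)
```

Shrinking σ makes the decision dominated by the single nearest pattern neuron (the nearest-neighbor limit noted above), while a very large σ flattens all activations.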
5. Generalized Regression Neural Network
The generalized regression neural network (GRNN) is a nonparametric regression model developed on the basis of the probabilistic neural network (Specht, 1991). It is usually used for function approximation.
Matlab function for generalized regression neural network is newgrnn
(Mathworks, 2002).
6. Functional Link Neural Network
The functional link artificial neural network (FLANN) is developed by introducing high-order terms of the input variables into the neural network
Figure 3. Functional link artificial neural network.
(Pao et al., 1992; Yan and Zhang, 2000; Fig. 3). It has been successfully used in ecological research (Zhang, 2007; Zhang et al., 2008).
Suppose that ϕ_i(x), i = 1, 2, . . . , p, are a set of basis functions with the following properties:

(1) the ϕ_i are linearly independent;
(2) sup_j (Σ_{i=1}^{j} ‖ϕ_i‖²)^{1/2} < ∞.
The practical output of the jth output node is thus

y_j = ρ(s_j(x)),

s_j(x) = Σ_{i=1}^{p} ω_{ji}ϕ_i(x),  j = 1, 2, . . . , m,

where x = (x_1, x_2, . . . , x_n)^T is the input vector, y = (y_1, y_2, . . . , y_m)^T is the practical output vector, ω_j = (ω_{j1}, ω_{j2}, . . . , ω_{jp})^T is the weight vector for node j, j = 1, 2, . . . , m, ρ(·) is a nonlinear function, and p is the number of basis functions.
Given K training samples {x^k, y^k}, k = 1, 2, . . . , K, where x^k = (x_1^k, x_2^k, . . . , x_n^k)^T and y^k = (y_1^k, y_2^k, . . . , y_m^k)^T, if the kth sample is added, its value for the inverse function s(·) of the nonlinear function ρ(·) should be computed. The network matrix equation is ΦW^T = S, where Φ = (ϕ(x^1), ϕ(x^2), . . . , ϕ(x^K))^T is a K × p matrix, S = (s^1, s^2, . . . , s^K)^T is a K × m matrix, W = (ω_1, ω_2, . . . , ω_m)^T is an m × p matrix, s^k = (s_1(x^k), s_2(x^k), . . . , s_m(x^k)), and ϕ(x^k) = (ϕ_1(x^k), ϕ_2(x^k), . . . , ϕ_p(x^k)), k = 1, 2, . . . , K. The analytical solution of the matrix equation above is the weight matrix (Zhang, 2007; Zhang et al., 2008)

W = ((Φ^T Φ)^{−1} Φ^T S)^T.
RBF neural network and wavelet neural network are special cases of
FLANN.
The Matlab function for functional link artificial neural network is flann
(Zhang et al., 2008).
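The least-squares solution W = ((Φ^TΦ)^{−1}Φ^TS)^T can be sketched with polynomial basis functions (an illustrative choice of ϕ_i) and a target that lies exactly in their span, so the recovered weights are the exact coefficients:

```python
import numpy as np

def phi(x):
    # illustrative basis functions phi_i(x): 1, x, x^2, x^3 (p = 4)
    return np.array([1.0, x, x**2, x**3])

K = 30
xs = np.linspace(-1.0, 1.0, K)
Phi = np.array([phi(x) for x in xs])    # K x p matrix
S = (2 * xs**3 - xs)[:, None]           # K x m desired s-values (m = 1)

# W = ((Phi^T Phi)^{-1} Phi^T S)^T, the least-squares weight matrix
W = (np.linalg.inv(Phi.T @ Phi) @ Phi.T @ S).T
```

Here the target 2x³ − x yields weights (0, −1, 0, 2) for the basis (1, x, x², x³); in practice `numpy.linalg.lstsq` is numerically preferable to forming (Φ^TΦ)^{−1} explicitly.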
7. Wavelet Neural Network
7.1. Principles of wavelet neural network
Generally, training samples are not evenly distributed in the input space. To fully mine the data information, the wavelet neural network (WNN) can be adopted. Multiresolution learning, in which higher resolution is applied to data-dense zones, lower resolution is applied to data-sparse zones, and the various learning results are finally combined, is an efficient method used in WNN (Liang and Page, 1997). A WNN is generated by using orthonormal wavelets as the basis functions in the FLANN or RBF neural network. WNN has been successfully used in the prediction of time series.
There are two types of units in the hidden layer of WNN (Liang and
Page, 1997; Yan and Zhang, 2000; Fig. 4):
(1) Units ϕ_{Lk}(x) (k = 1, 2, . . . , n_L) of the scaling function ϕ(x). The orthonormal functions ϕ_{Lk}(x), k = 1, 2, . . . , n_L, construct the approximation to the unknown function under different displacements when the coarsest resolution is L, where

ϕ_{mk}(x) = (2^{−m})^{1/2}ϕ(2^{−m}x − k)

and ϕ is an orthogonal function, n_L is the number of data points under the coarsest resolution, m = 1, 2, . . . , L; k = 1, 2, . . . , 2^{L−m}n_L.
Figure 4. Wavelet network under different resolutions. (a) Units ϕ_{Lk} (k = 1, 2, . . . , s) of the scaling function ϕ under resolution L. (b) Units ψ_{Lk} (k = 1, 2, . . . , s) of the wavelet function ψ are added. (c) Units ψ_{L−1,k} (k = 1, 2, . . . , T) of the wavelet function are added.
(2) Units ψ_{mk}(x), m = 1, 2, . . . , L; k = 1, 2, . . . , n_m, of the wavelet function ψ(x) are orthonormal functions for the details of a continuous square-integrable function F(x) ∈ L²(R), where

ψ_{mk}(x) = (2^{−m})^{1/2}ψ(2^{−m}x − k)

and ψ is an orthogonal function, m = 1, 2, . . . , L; k = 1, 2, . . . , 2^{L−m}n_L.
The orthogonal functions ϕ_{mk}(x) and ψ_{mk}(x) are also mutually orthogonal. The scaling function ϕ(x) and the wavelet function ψ(x) may be of the same type, and take one of the following functions: Legendre polynomial, Chebyshev polynomial, Laguerre polynomial, Hermite polynomial, and trigonometric functions (Li et al., 1996; Burden and Faires, 2001).
Suppose the function F(x) ∈ L2 (R) is unknown, and there is only a set
of discrete samples of F(x), a0k , k = 1, 2, . . . , n0 , which is the approximation to F(x) under the coarsest resolution from experiment. The following
Dec. 17, 2009
16:31
9in x 6in
B-922
b922-ch04
1st Reading
Radial Basis Function Neural Networks
11
recursion formula can be used (Liang and Page, 1997):

F_L(x) = Σ_{k=1}^{n_L} a_{Lk}ϕ_{Lk}(x),

F_{m−1}(x) = F_m(x) + Σ_{k=1}^{2^{L−m}n_L} d_{mk}ψ_{mk}(x),  m = L, L − 1, . . . , 1,

where

a_{mk} = Σ_{k=1}^{2^{L−m}n_L} h_k a_{m−1,k},

d_{mk} = Σ_{k=1}^{2^{L−m}n_L} g_k a_{m−1,k},  m = 1, 2, . . . , L,

and

h_k = ∫_{−∞}^{+∞} ϕ_{m−1,k}(x)ϕ_{mk}(x) dx,

g_k = ∫_{−∞}^{+∞} ϕ_{m−1,k}(x)ψ_{mk}(x) dx.

F_{m−1}(x) is the approximation to F(x) when the resolution is m − 1. The error of the above approximation is

Σ_{m=1}^{L} Σ_{k=1}^{2^{L−m}n_L} d_{mk}².
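The coarsening recursion can be illustrated with the Haar scaling and wavelet filters, a simple concrete choice of h_k and g_k (the data values are arbitrary); one coarsening step produces the approximation and detail coefficients, and the inverse step recovers the finer data exactly:

```python
import numpy as np

def haar_step(a):
    # One coarsening step: approximation a_m and detail d_m from a_{m-1},
    # using Haar filters h = (1/sqrt2, 1/sqrt2), g = (1/sqrt2, -1/sqrt2).
    a = np.asarray(a, float)
    approx = (a[0::2] + a[1::2]) / np.sqrt(2)   # a_mk = sum h_k a_{m-1,k}
    detail = (a[0::2] - a[1::2]) / np.sqrt(2)   # d_mk = sum g_k a_{m-1,k}
    return approx, detail

def haar_inverse(approx, detail):
    # F_{m-1} = F_m + sum_k d_mk psi_mk: recover a_{m-1} from (a_m, d_m)
    a = np.empty(2 * approx.size)
    a[0::2] = (approx + detail) / np.sqrt(2)
    a[1::2] = (approx - detail) / np.sqrt(2)
    return a

a0 = np.array([4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 3.0, 9.0])  # finest samples (m = 0)
a1, d1 = haar_step(a0)
rec = haar_inverse(a1, d1)
```

Repeating `haar_step` on the approximation coefficients walks down to the coarsest resolution m = L, and the sum of the squared discarded details Σ d_{mk}² is exactly the approximation error stated above.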
k=1
7.2. Wavelet neural network learning
Learning procedures of WNN are (Yan and Zhang, 2000):
(1) Construct a multiresolution coefficient grid for every dimension of the input variable. The interval of the grid is equal to the sampling interval of every dimension in the case of the highest resolution, i.e., m = 0, while there are only two data points for the coarsest resolution (m = L).
(2) Train the units of scaling function ϕ.
(3) Add appropriate units of the wavelet function if the error criterion is not met. Wavelet units are added where the sample space requires higher precision.
(4) Delete the wavelet units with small weights and test network using
new data.
CHAPTER 5
BP Neural Network
Feedforward networks are used to approximate nonlinear functions. However, their learning is difficult due to the existence of hidden layers. The backpropagation (BP) algorithm, a gradient descent algorithm and an extension of the LMS algorithm, makes feedforward network learning easier. The BP neural network is a one-way propagated multilayer feedforward network (Rumelhart and McClelland, 1986; Fecit, 2003). The BP algorithm was initially constructed by Werbos (1974), and later improved by Rumelhart
et al. (1986), Parker (1985), and Le Cun (1985). Currently BP is the most
widely used neural network (Zhang and Barrion, 2006; Zhang, 2007; Zhang
et al., 2008), with applications in pattern recognition (Zhang et al., 2008),
function approximation (Zhang and Barrion, 2006; Zhang, 2007), and data
compression, etc.
1. BP Algorithm
BP neural network is composed of an input layer, an output layer and one
or more hidden layers. No between-neuron connections exist within the
same layer. The transfer functions of the output layer are always linear, and (continuously differentiable) sigmoid transfer functions are used for the hidden layers (Haykin, 1994; Hagan et al., 1996; Fecit, 2003; Fig. 1).
There are two opposite signals between neurons or units in different
layers (Yan and Zhang, 2000; Fig. 1):
(1) Working signal. This signal is the information flow from input layer to
output layer. It is a function of input and weights.
Figure 1. Architecture of BP neural network.
(2) Error signal. The error, i.e., the difference between practical and desired
outputs, is propagated layer by layer from output layer to previous
layers.
The learning procedure consists of two parts, forward propagation and backward propagation (Fecit, 2003). In forward propagation, each layer's neurons only exert influence on the next layer's neurons. If there is an error between the desired and practical outputs, backward propagation starts to function. In backward propagation, the error signal flows back along the original path, and each layer's weights are modified until the input layer is reached. Forward and backward propagation are repeatedly conducted to minimize the error signal.
BP algorithm can be summarized as follows (Churing, 1995; Yan and
Zhang, 2000; Fig. 2):
(1) Perform initialization. Choose a suitable network architecture and set all weights and thresholds to small uniformly distributed random values.
(2) Conduct the following computation for every input sample:
(a) Feedforward computation. For the neuron j of layer l, we have
u_j^l(k) = Σ_{i=0}^{p} w_{ji}^l(k)y_i^{l−1}(k),
Figure 2. Flow diagram of BP algorithm.
where y_i^{l−1}(k) is the working signal transferred from unit i in layer l − 1, and p is the number of connections to unit j. If the transfer function of neuron j is a sigmoid function, then

y_j^l(k) = 1/(1 + exp(−u_j^l(k))),

φ′(u_j^l(k)) = ∂y_j^l(k)/∂u_j^l(k) = y_j^l(k)(1 − y_j^l(k)).

If neuron j is in layer 1, i.e., l = 1, set y_j^0(k) = x_j(k); if neuron j is in the output layer s, i.e., l = s, set y_j^s(k) = O_j(k) and e_j(k) = ŷ_j(k) − O_j(k).
(b) Backward computation. For hidden neurons and for output units, we have

δ_j^l(k) = y_j^l(k)(1 − y_j^l(k)) Σ_i δ_i^{l+1}(k)w_{ij}^{l+1}(k)

and

δ_j^s(k) = e_j(k)O_j(k)(1 − O_j(k)),

respectively.
(3) Modify the weights:

w_{ji}^l(k + 1) = w_{ji}^l(k) + ηδ_j^l(k)y_i^{l−1}(k).
(4) Set k = k + 1 and input a new sample, until the error

Σ_{i=1}^{N} Σ_{j=1}^{m} e_j²(k)/(2N)

is lower than the desired value, where m is the number of units in the output layer, N is the total number of samples, and e_j(k) = ŷ_j(k) − y_j(k).
2. BP Theorem
BP can be regarded as a nonlinear mapping from input space to output space, f : R^n → R^m, f(X) = Y. Given the input and output sets x^i ∈ R^n, y^i ∈ R^m, where x^i = (x_1^i, x_2^i, . . . , x_n^i)^T and y^i = (y_1^i, y_2^i, . . . , y_m^i)^T, there is
a mapping g and g(xi ) = yi , i = 1, 2, . . . , N. It is desirable to obtain a
mapping f ∈ F = {f |f : Rn → Rm }, which is an optimal approximation
onto g (Fecit, 2003; Zhang and Barrion, 2006; Zhang et al., 2008).
The Kolmogorov theorem described in Chap. 2 does not provide a method to construct a three-layer feedforward network that approximates any continuous function. The following BP theorem addresses this problem.
BP Theorem. Given any L² function f : [0, 1]^n → R^m, there is a three-layer BP network which can approximate f with arbitrary ε square error.
Using three-layer BP networks will require a large number of hidden
neurons. Multilayer BP network is thus widely used. In the multilayer BP
network, the number of hidden neurons and layers are always determined
by empirical methods (Zhang and Barrion, 2006; Zhang, 2007; Zhang et al.,
2008).
The Matlab function for BP neural network is newff (Mathworks,
2002).
3. BP Training
The following learning procedure is for two-layer BP networks, using
Matlab toolbox (Fecit, 2003):
(1) Initialize the weights and thresholds (biases) of all layers with small uniformly distributed random values; set the desired error threshold ERTHR, the maximum number of epochs, and the learning rate LR.
(2) Compute every layer’s outputs o, y, and error E:
o = tansig(w1 x, b1 ),
y = purelin(w2 o, b2 ),
e = ŷ − y.
(3) Compute the error changes δ2 and δ1 of every layer in backpropagation and modify the weights of every layer:

δ2 = deltalin(y, e),
δ1 = deltatan(o, δ2 , w2 ),
[dw1, db1] = learnbp(x, δ1 , LR),
[dw2, db2] = learnbp(o, δ2 , LR),
w1 = w1 + dw1,
w2 = w2 + dw2,
b1 = b1 + db1,
b2 = b2 + db2.
(4) Compute the sum of squared errors
sse = sumsqr(ŷ − purelin(w2 tansig(w1 x, b1 ), b2 )).
If sse < ERTHR, or maximum epochs are reached, stop training; or
else return to (2).
The above training is represented by Matlab function trainbp.
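The same two-layer loop can be sketched in NumPy instead of the toolbox functions (the architecture sizes, learning rate, epochs, and data below are illustrative assumptions): a tanh hidden layer plays the role of tansig, a linear output layer that of purelin, and the per-sample updates mirror steps (2)-(4).

```python
import numpy as np

rng = np.random.default_rng(4)

# two-layer network: tanh (tansig-like) hidden layer, linear output layer
n_in, n_hid = 1, 8
w1 = rng.uniform(-0.5, 0.5, (n_hid, n_in)); b1 = rng.uniform(-0.5, 0.5, n_hid)
w2 = rng.uniform(-0.5, 0.5, (1, n_hid));    b2 = rng.uniform(-0.5, 0.5, 1)

X = np.linspace(-1, 1, 40).reshape(-1, 1)
yhat = np.sin(np.pi * X)                        # desired outputs

lr = 0.05
for epoch in range(2000):
    for x, d in zip(X, yhat):
        o = np.tanh(w1 @ x + b1)                # forward propagation
        y = w2 @ o + b2
        e = d - y                               # error signal
        delta2 = e                              # linear output layer
        delta1 = (1 - o**2) * (w2.T @ delta2)   # backpropagated through tanh
        w2 += lr * np.outer(delta2, o); b2 += lr * delta2
        w1 += lr * np.outer(delta1, x); b1 += lr * delta1

sse = np.sum((yhat - (np.tanh(X @ w1.T + b1) @ w2.T + b2))**2)
```

The stopping test of step (4) would compare `sse` with ERTHR after each epoch; here a fixed epoch budget is used for brevity.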
4. Limitations and Improvements of BP Algorithm
A major problem of the basic BP algorithm is its slow rate of convergence. Using the basic BP algorithm may sometimes consume several weeks of computation time. Moreover, there are local minimum points in the goal function of the BP algorithm. A BP network of more than three layers has a high chance of encountering the local minimum issue.
The issue of local minimum points can be addressed by the genetic algorithm (Van Rooij et al., 1996), global optimization (Shang and Wah, 1996), the homotopy method (Gao and Yang, 1996), etc.
The convergence rate and local minimum issue can also be improved
by taking the following measures:
(1) Add a momentum term (Rumelhart et al., 1986; Vogl et al., 1988). In the BP algorithm, a larger step size η will result in instability and a smaller η will result in a slow convergence rate. As a consequence a momentum term α (0 < α < 1) can be added to the algorithm:

Δw_{ji}(k) = αΔw_{ji}(k − 1) + ηδ_j(k)y_i(k).
(2) Samples should be provided in a random way (Yan and Zhang, 2000).
Dec. 17, 2009
16:31
9in x 6in
B-922
b922-ch05
1st Reading
BP Neural Network
7
(3) The dimensionality of the input variables should be compressed before use in BP training. For a limited number of samples, the compression of variable dimension is important in order to guarantee the generalization performance of the trained neural network.
(4) Replace gradient descent method with conjugate gradient algorithm
or Levenberg–Marquardt algorithm in BP algorithm (Shanno, 1990;
Charalambous, 1992; Hagan and Menhaj, 1994).
(5) Use small uniformly distributed random values as initial values of
weights and thresholds. If all weights are zero’s or of the same value,
there will be no differences between hidden neurons and computation
will not be able to start.
(6) Change learning rate during the training process (Jacobs, 1988; Tollenaere, 1990).
(7) Batch processing. Network parameters are modified only after the entire training data set is presented.
CHAPTER 6
Self-Organizing Neural Networks
Neural networks improve their functionality by supervised learning (from training samples) or unsupervised learning from environments. Self-organizing networks are neural networks with unsupervised learning. They are used to find hidden laws and relationships in redundant data, and they adjust themselves to adapt to their environments (Fecit, 2003).
There are different architectural designs for self-organizing neural networks (Kohonen, 1982, 1988, 1990, 1995). A self-organizing network may
be composed of two layers, i.e., input layer and output layer. There are
between-layer forward connections and lateral connections within the output layer (Zhang and Li, 2007; Fig. 1). A self-organizing network may be
a multilayer feedforward network in which the self-organizing process is
performed layer by layer (Yang and Zhang, 2000).
In a self-organizing network, the changes of connection weights relate only to the states of neighboring neurons (Yang and Zhang, 2000). This is called local interaction. Stochastic local interactions may result in a global order. Local interactions in a self-organizing network follow three principles (Von der Malsburg, 1990): (1) connection strength tends to be self-reinforcing (Hebb rule); (2) all neurons compete with each other; the strongest neuron is activated and the others are inhibited (winner-take-all); (3) there is coordination among neurons, because a single neuron cannot function by itself.
There are several types of self-organizing neural networks, such as
self-organizing feature map network, self-organizing competitive learning
network, learning vector quantization (LVQ) network, Hamming network,
adaptive resonance theory network, etc.
Figure 1. Lateral connections of neurons in the self-organizing neural network.
1. Self-Organizing Feature Map Neural Network
1.1. Principle and algorithm of self-organizing feature map neural network
In a self-organizing feature map (SOM, or SOFM) neural network, the
neurons in a layer learn to represent different regions of a sample space and
the neighborhood neurons learn to respond to similar samples (Kohonen,
1982, 1995). This network can learn both the topology of sample space and
the distribution of samples (Song et al., 2007; Zhang, 2007; Zhang and Li,
2007; Fig. 2). A competitive network is used to classify these samples into
natural classes (Mathworks, 2002). SOM can be further used to recognize
additional samples.
SOM is a mapping from a higher-dimensional space to a 2-dimensional space (a 2-dimensional surface in an n-dimensional space, i.e., a principal surface) or a 1-dimensional space (a curve in an n-dimensional space, i.e., a principal curve). In a sense SOM is a generalization of PCA and thus has the functionality of feature extraction. Nevertheless, the mapping is not unique; it depends on the sample sequence and the initial weights (Bian and Zhang, 2000).
The principle of SOM is described below (Kohonen, 1982, 1988, 1990,
1995). Suppose the input x = (x1 , x2 , . . . , xn )T , and the weight vector
Figure 2. Architecture of the self-organizing feature map network.
of neuron j is wj = (wj1 , wj2 , . . . , wjn )T , j = 1, 2, . . . , N. First, input a sample x and find the neuron i (winner neuron) that matches x optimally, i.e., find the maximum wjT x. If all wj are normalized to a fixed Euclidean norm, this is equivalent to finding

i(x) = arg minj ‖x − wj ‖, j = 1, 2, . . . , N.
Second, determine the neighborhood δi (k) of the winner neuron i, which
is changeable as the number of iterations increases. Finally, determine the
rule for changing connection weights. The algorithm of SOM can be summarized as follows (Yan and Zhang, 2000):
(1) Initialize weights wj (0), j = 1, 2, . . . , N, with small random values.
(2) Choose a sample x randomly from the sample set and use x as the input.
(3) For step k, find the neuron i that matches x optimally, i.e.,

i(x) = arg maxj wjT x, j = 1, 2, . . . , N,

or

i(x) = arg minj ‖x − wj ‖, j = 1, 2, . . . , N.
(4) Determine the neighborhood δi (k) of the winner neuron i.
(5) Modify the weights as follows:

wj (k + 1) = wj (k) + η(k)[x(k) − wj (k)], j ∈ δi (k),
wj (k + 1) = wj (k), j ∉ δi (k).
(6) Stop training if a desirable network is achieved, or else let k = k + 1,
then return to (2).
Choosing the appropriate neighborhood δi (k) and learning rate η(k) is important for obtaining a well-trained network (Kohonen, 1988, 1990). In general, η(k) should be around 1.0 in the first 1000 iterations and decrease (but not below 0.1) thereafter. For a 2-dimensional SOM the neighborhood δi (k) may be taken as a square, hexagon, etc. At the beginning δi (k) is large, possibly even including all neurons, and then shrinks step by step until it includes only one or two adjacent neurons. During the convergence stage δi (k) includes only the nearest neuron, or even the winner itself.
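Steps (1)–(6), together with the shrinking neighborhood and decaying learning rate just described, can be sketched for a 1-D map of scalar samples (a pure-Python sketch; the particular decay schedules and neuron count are illustrative assumptions, not prescribed by the text):

```python
import random

def som_train(samples, n_neurons=10, n_iter=2000, seed=0):
    """Minimal 1-D SOM: random initial weights, random sample choice,
    winner by smallest distance, and a neighborhood delta_i(k) and
    learning rate eta(k) that both shrink as iterations proceed."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n_neurons)]                  # step (1)
    for k in range(n_iter):
        x = rng.choice(samples)                                   # step (2)
        i = min(range(n_neurons), key=lambda j: abs(x - w[j]))    # step (3)
        radius = max(1, round(n_neurons / 2 * (1 - k / n_iter)))  # step (4)
        eta = max(0.1, 1.0 - k / n_iter)                          # ~1.0 -> 0.1
        for j in range(n_neurons):                                # step (5)
            if abs(j - i) <= radius:
                w[j] += eta * (x - w[j])
    return w
```

After training on samples drawn from two clusters, neurons near each other on the 1-D map end up representing nearby regions of the sample space, illustrating topology preservation.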
The SOM algorithm has been improved in various ways, yielding many variants. Amari (1983) developed a theory of self-organizing neural fields and approached the problem of continuous SOM. SOM can also be treated as nonparametric regression, or constrained topological mapping (Cherkassky and Lari-Najafi, 1991). Data are adaptively divided into different regions, and simpler functions are used to conduct the approximation in each region.
The Matlab function for SOM neural network is newsom (Mathworks,
2002).
1.2. Performance of self-organizing feature map
SOM has the following properties: (1) Topology is preserved in the mapping: points with similar features in the input space are adjacent in the mapped space. (2) An area with greater distribution density in the original space corresponds to a larger area in the mapped space. (3) Cluster centers are used to represent the original inputs, a kind of data compression.
2. Self-Organizing Competitive Learning Neural Network
In self-organizing competitive learning neural networks, neurons in a competitive layer learn to represent different regions in a sample space. This
network can thus learn the distribution of samples. A competitive network
may be used to classify these samples into natural classes (Mathworks,
2002). In this network, the competitive layer is viewed as a classifier and
each neuron in this layer corresponds to a different category. Additional
samples can be recognized using the trained network (Zhang, 2007; Watts
and Worner, 2009).
The Matlab function for self-organizing competitive learning neural
network is newc (Mathworks, 2002).
3. Hamming Neural Network
The Hamming network is a simple competitive neural network used to find the stored prototype that has the smallest Hamming distance to the input vector.
It is composed of two layers, i.e., feedforward layer and recursive layer.
The feedforward layer links the input vector with the prototype vector, and
the recursive layer (output layer) uses a competitive algorithm, by lateral
inhibition among neurons in this layer, to find out which prototype vector
is the nearest to the input vector (Hagan et al., 1996; Fig. 3).
Figure 3. Hamming neural network.
The algorithm for the Hamming network is as follows (Yan and Zhang, 2000). Suppose there are m prototype vectors sj = (sj1 , sj2 , . . . , sjn )T , j = 1, 2, . . . , m. The Hamming distance between the input x = (x1 , x2 , . . . , xn )T and sj is

dh (x, sj ) = (n − xT sj )/2.
Let the feedforward weight matrix be W1 = (s1 , s2 , . . . , sm )T /2, and set the thresholds to n/2. The activation function of the neurons is

f(uj ) = uj /n, j = 1, 2, . . . , m,

where uj = n − dh (x, sj ). For the recursive-layer weight matrix W2 = (w2ij ), w2ij = −ε if i ≠ j, and w2ij = 1 if i = j. The output is

y(k + 1) = Φ(W2 y(k)),

where Φ is a diagonal (elementwise) operator with elements

f(u) = u, if u ≥ 0; f(u) = 0, if u < 0.
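The two layers can be sketched together as follows (a pure-Python sketch for ±1 vectors; the lateral-inhibition constant ε and the iteration cap are illustrative assumptions):

```python
def hamming_classify(x, prototypes, eps=0.1, max_iter=100):
    """Hamming network sketch: the feedforward layer scores each prototype
    with u_j = n - d_h(x, s_j) = (n + x.s_j)/2; the recursive layer applies
    lateral inhibition (-eps from every rival) until one winner survives."""
    n = len(x)
    # feedforward layer: matching scores (larger = smaller Hamming distance)
    y = [(n + sum(xi * si for xi, si in zip(x, s))) / 2 for s in prototypes]
    for _ in range(max_iter):
        # recursive layer: each neuron is inhibited by the sum of its rivals
        y_new = [max(0.0, y[j] - eps * sum(y[i] for i in range(len(y)) if i != j))
                 for j in range(len(y))]
        if y_new == y:
            break
        y = y_new
    return max(range(len(y)), key=lambda j: y[j])  # index of the surviving neuron
```

For an input that differs from a prototype in one bit, that prototype's neuron starts with the highest score and suppresses the others.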
4. WTA Neural Network
Winner-take-all (WTA) network is a simple competitive neural network
(Fig. 4).
Figure 4. Architecture of the WTA neural network.
Suppose weight vectors wj = (wj1 , wj2 , . . . , wjn )T , j = 1, 2, . . . , m.
The algorithm is:
(1) Normalize the weights

ŵj = wj /‖wj ‖, j = 1, 2, . . . , m.

(2) Choose the winner l such that

ŵlT x = maxj ŵjT x.

(3) Modify the weights

ŵl (k + 1) = ŵl (k) + η(k)[x − ŵl (k)],
ŵj (k + 1) = ŵj (k), j ≠ l.
(4) Stop training if desirable performance is reached, or else repeat the
training process.
5. LVQ Neural Network
Learning vector quantization (LVQ) network is a competitive neural network (Kohonen, 1990). It is a mixed network that generates classification
through both supervised and unsupervised learning.
An LVQ network is composed of two layers: a competitive layer and a linear layer. In the competitive layer several neurons may be assigned to the same class, and each class is assigned to a neuron in the linear layer. The competitive layer learns to classify input vectors (into subclasses), and the linear layer transforms the classification generated in the competitive layer into the target categories defined by the user. The number of neurons in the competitive layer is usually larger than that in the linear layer.
The algorithm of the LVQ network is (Yan and Zhang, 2000):
(1) Find a neuron i with the largest output in the output set after a vector x
is delivered to the network.
(2) Given that x belongs to class p and that neuron i was assigned to class q in the last learning step,

wi (k + 1) = wi (k) + η(k)[x(k) − wi (k)], p = q,
wi (k + 1) = wi (k) − η(k)[x(k) − wi (k)], p ≠ q,
wj (k + 1) = wj (k), j ≠ i.
Both LVQ and SOM are able to conduct clustering, but their performance depends largely on the initial values. Some new algorithms for LVQ have been developed, e.g., the GLVQ network (Pal et al., 1993).
The Matlab function for LVQ neural network is newlvq (Mathworks,
2002).
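One LVQ1 learning step, combining the unsupervised winner search with the supervised move-toward/move-away rule above, might look like this (a pure-Python sketch; the prototype layout and η value are illustrative):

```python
def lvq1_step(weights, labels, x, x_class, eta=0.1):
    """One LVQ1 update: find the nearest prototype i; move it toward x if
    its class label matches x's class, away from x otherwise; all other
    prototypes stay unchanged."""
    dist = lambda w: sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    i = min(range(len(weights)), key=lambda j: dist(weights[j]))
    sign = 1.0 if labels[i] == x_class else -1.0
    weights[i] = [wi + sign * eta * (xi - wi) for wi, xi in zip(weights[i], x)]
    return i
```

Repeating this step over a labeled training set pulls prototypes into the regions of their own class and pushes them out of foreign regions.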
6. Adaptive Resonance Theory
A key issue for competitive networks is how to guarantee the stability of the learning process while maintaining high adaptability; the weight matrix may not converge in many cases. Adaptive resonance theory (ART) is an improvement and extension of competitive learning, developed to solve this stability/adaptability dilemma (Carpenter and Grossberg, 1987, 1990; Carpenter et al., 1991). ART achieves stable, self-organizing recognition of complex environmental patterns. There are three types of ART, i.e., ART1, ART2, and ART3, used for Boolean inputs, continuous inputs, and hierarchical search, respectively. ART may adapt to nonstationary environments and can learn in real time. It has a stable recognition performance on learned objects and quickly adapts to new objects that have never been learned.
There are two layers, an input layer and an output layer, in ART networks (Hagan et al., 1996; Fig. 5). The input layer compares the input pattern with the desired pattern returned from the output layer. If the patterns do not match, the output layer is reset: the winner neuron and its desired pattern are cancelled, and a new round of competition is performed in the output layer. The new winner neuron then returns a desired pattern to the input layer. This process continues until the input and desired patterns match. Once they match, the input layer generates a new prototype pattern by combining the desired and input patterns.
Figure 5. General architecture of ART neural networks (Hagan et al., 1996).
Suppose there are n neurons in the input (Boolean) layer and m neurons in the output layer. Let ẇij denote the connection weights from the input layer to the output layer and ẅji the connection weights from the output layer to the input layer, i = 1, 2, . . . , n; j = 1, 2, . . . , m. The algorithm of ART1 is (Carpenter and Grossberg, 1987; Yan and Zhang, 2000):

(1) Initialize connection weights. Assign ẇij = aj , i = 1, 2, . . . , n; j = 1, 2, . . . , m, where am < · · · < a1 < 1/(α + n), 0 < α ≪ 1, and ẅji = 1, j = 1, 2, . . . , m; i = 1, 2, . . . , n.
(2) Input a Boolean sample x, x = (x1 , x2 , . . . , xn )T .
(3) Compute the weighted input of every output neuron

yj = Σi ẇij xi , i = 1, 2, . . . , n; j = 1, 2, . . . , m.

(4) Choose the winner neuron K by winner-take-all competition:

K = arg maxj yj .
(5) If

Σi ẅKi xi / Σi xi < β,

where β is the vigilance threshold, the input and desired patterns do not match each other; otherwise continue with the following steps.
(6) For the winner neuron K, modify the weights:

ẅKi = ẅKi xi , i = 1, 2, . . . , n,
ẇiK = ẅKi / (α + Σi ẅKi xi ),

and leave the weights of the other neurons unchanged: ẇij = ẇij and ẅji = ẅji for j ≠ K.

(7) Return to step (2) to input a new sample.
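The presentation–competition–vigilance–learning cycle of ART1 can be sketched as follows (a pure-Python sketch; resets are implemented by simply trying the next-best winner, and the α and β values are illustrative assumptions):

```python
def art1_step(x, w_up, w_down, alpha=0.5, beta=0.7):
    """One presentation of a Boolean vector x.
    w_up[j][i]: input->output weights; w_down[j][i]: output->input weights.
    Returns the index of the accepted (resonant) neuron, or None."""
    m = len(w_up)
    # winners tried in order of decreasing weighted input (WTA with resets)
    order = sorted(range(m),
                   key=lambda j: -sum(w_up[j][i] * x[i] for i in range(len(x))))
    for K in order:
        match = sum(w_down[K][i] * x[i] for i in range(len(x)))
        if match / max(1, sum(x)) >= beta:         # vigilance test passed
            w_down[K] = [w_down[K][i] * x[i] for i in range(len(x))]
            s = sum(w_down[K])
            w_up[K] = [w_down[K][i] / (alpha + s) for i in range(len(x))]
            return K
    return None                                    # no neuron resonates
```

After learning, the winner's top-down weights store the intersection of the old prototype and the input, which is what keeps learned categories stable.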
CHAPTER 7
Feedback Neural Networks
Most feedforward neural networks are learning-type networks and do not possess dynamic behaviors. Unlike feedforward networks, a feedback neural network converges to a stable state, so associative memory can be achieved through the transition of neuron states (Fig. 1). Because of the feedback mechanism, feedback neural networks are nonlinear dynamic systems (Fecit, 2003), and their nonlinear behaviors are diverse and complex. Nonlinear properties such as stability, attractors, chaos, unpredictability, and randomness are attractive topics in research on feedback networks.
In the feedback neural network, all neurons or units are the same. They
can connect to one another (Yan and Zhang, 2000; Fig. 1). A feedback
neural network is generated in a two-layer feedforward network when the
numbers of neurons in the input layer and the output layer are equal, if each
output is directly connected to the corresponding input (Bian and Zhang,
2000).
The most used feedback neural networks include Elman network and
Hopfield networks.
1. Elman Neural Network
An Elman neural network consists of several layers. The first layer has weights coming from the input, and each subsequent layer has weights coming from the previous layer. All layers except the last have a recursive (feedback) weight. The last layer is the network output (Mathworks, 2002; Fig. 2).
Figure 1. Architecture of feedback neural networks.
Figure 2. Architecture of Elman neural network.
The Elman network is a nonlinear system. It can be used to approximate any function with the desired accuracy in a finite time (Zhang, 2007), and it may store information for future use. This model is able to learn not only spatial patterns but also temporal patterns (Hagan et al., 1996; Fecit, 2003).
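A single-unit sketch makes the temporal memory concrete (pure Python; the weight values are illustrative constants rather than trained values):

```python
import math

def elman_forward(xs, w_in=0.5, w_rec=0.3, w_out=1.0):
    """Sketch of one Elman unit: the hidden state at step t feeds back
    into the hidden layer at step t+1 through a recursive weight
    (the 'context' connection)."""
    h, ys = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)  # hidden input + context feedback
        ys.append(w_out * h)                 # linear output layer
    return ys
```

Feeding a single pulse followed by zeros shows the recurrent memory: the output stays nonzero after the input has vanished and decays gradually, which is exactly what lets the network represent temporal patterns.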
The Matlab function for Elman neural network is newelm (Mathworks, 2002).
2. Hopfield Neural Networks
Hopfield neural networks possess the general properties of feedback or recursive networks (Hopfield, 1982, 1984; Fecit, 2003). Additionally, they meet the following conditions (Bian and Zhang, 2000): (1) the connection weights are symmetrical (wij = wji ), so the weight matrix is a symmetric matrix; (2) neurons have no self-feedback (wii = 0). Because of the symmetry, Hopfield neural networks are stable and have only isolated attractors.
Hopfield neural networks are used to store one or more stable target vectors. These vectors may be evoked at a certain time. They are also used for
optimization, linear programming, A/D transformation, and pattern recognition (Hopfield and Tank, 1985; Hagan et al., 1996).
Hopfield neural networks are designed to store some equilibrium points.
Given an initial condition, the network will converge to these points.
2.1. Discrete Hopfield neural network
Suppose that a discrete Hopfield neural network (DHNN) is an n-order network (Fig. 3), in which Wn×n is the (symmetrical) weight matrix, T is the n-dimensional threshold vector, and Ti is the threshold of neuron i. Neurons may only take the states 1 or −1. The state equation of the discrete Hopfield network is a group of nonlinear difference equations:

xi (t + 1) = sgn(Σj wij xj (t) − Ti ), i = 1, 2, . . . , n,

where xj (t) is the state of neuron j at time t, and t is a positive integer. The network's state is x(t) = (x1 (t), x2 (t), . . . , xn (t))T , xi (t) ∈ {1, −1}.
Discrete Hopfield networks work in two ways (Bian and Zhang, 2000):
(1) Asynchronous way. Only one neuron changes its state at a time. Other
neurons do not change their states.
Figure 3. Architecture of discrete Hopfield neural network.
(2) Synchronous way. Several neurons change their states simultaneously at each time step.
If there is a time t such that x(t + Δt) = x(t) for all Δt > 0, the state equation is stable. A Lyapunov function, i.e., an energy function, is defined to analyze the stability of the network:

E(t) = Σi Ti xi (t) − Σi Σj wij xi (t)xj (t)/2.

For asynchronous operation, the change of the energy function when xi changes its state is

ΔE(t) = −[xi (t + 1) − xi (t)][Σj≠i wij xj (t) − Ti ].
The energy function declines (or at least does not increase) whenever the state of the network changes. Because the network has only 2^n possible states, the energy function reaches a minimum, i.e., an equilibrium state. Due to the nonlinearity, the system has multiple isolated equilibrium states (isolated attractors). For synchronous operation, the attractors are isolated attractors or oscillation cycles of length 2.
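The asynchronous update rule can be sketched with a pattern stored by the Hebb rule (a pure-Python sketch; taking sgn(0) = +1 and using Hebbian storage are common conventions assumed here, not stated in the text):

```python
def hopfield_recall(x, W, theta=None, max_sweeps=50):
    """Asynchronous discrete Hopfield updates:
    x_i <- sgn(sum_j w_ij x_j - T_i), one neuron at a time,
    repeated until no state changes (an isolated attractor)."""
    n = len(x)
    theta = theta if theta is not None else [0.0] * n
    x = list(x)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            v = sum(W[i][j] * x[j] for j in range(n)) - theta[i]
            s = 1 if v >= 0 else -1
            if s != x[i]:
                x[i], changed = s, True
        if not changed:           # energy can no longer decrease
            break
    return x
```

Starting from a noisy probe, the state slides down the energy landscape to the stored pattern, which is the associative-memory behavior described above.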
2.2. Continuous Hopfield neural network
The architecture of a continuous Hopfield neural network (CHNN) is analogous to that of a discrete network (Hopfield, 1984). The changes of the neuron states are represented by a group of nonlinear differential equations:

Ci dxi /dt = Σj wij yj − xi /Ri + Ii ,
yi = g(xi ), i = 1, 2, . . . , n.
The energy function is

E(t) = −Σi Ii yi (t) − Σi Σj wij yi (t)yj (t)/2 + Σi ∫0yi g−1 (y)dy/Ri .

If g−1 (·) is a monotonic and continuous function, Ci > 0, and wij = wji , then dE/dt ≤ 0.
3. Simulated Annealing
The neural networks discussed so far assume that signals transmit deterministically between neurons. In real systems, however, signal transmission is perturbed by noise and the states of neurons change randomly (Yan and Zhang, 2000), i.e., s = 1 with probability P(v) and s = −1 with probability 1 − P(v). The probability P(v) is a sigmoid function:

P(v) = 1/[1 + exp(−2v/T)],

where T is analogous to temperature; there is no randomness when T = 0.
If the temperature of a heated object drops slowly, the interior of the object remains in equilibrium, and the internal energy of the object reaches a minimum when the temperature declines to a given point. This is called annealing (Yan and Zhang, 2000). The optimization algorithm that simulates annealing is the simulated annealing (SA) algorithm, summarized as follows:

(1) Randomly initialize the parameters, including the initial temperature T0 , and determine the annealing rule.
(2) Compute

x′ = x + Δx, ΔE = E(x′ ) − E(x),

where x = (x1 , x2 , . . . , xn )T , Δx is a uniformly distributed random perturbation, and ΔE is the energy change of the system. If ΔE < 0, x′ is accepted as the new state; otherwise x′ is accepted with probability P = exp(−ΔE/(kT )), where k is the Boltzmann constant. Changes of temperature T usually follow the rule

T(t) = αT(t − 1), 0.85 ≤ α ≤ 0.98.
Repeat this step until equilibrium state is realized.
(3) Perform annealing based on rule given in (1), and repeat (2), until T = 0
or the given temperature is reached.
SA was improved by researchers to raise the convergence rate (Ingber,
1993; Ingber and Rosen, 1992).
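Steps (1)–(3) can be sketched on a toy 1-D energy function (a pure-Python sketch; the step size, schedule constants, and iteration counts are illustrative assumptions, and the Boltzmann constant k is folded into T):

```python
import math, random

def simulated_annealing(energy, x0, step=0.5, T0=1.0, alpha=0.9,
                        n_temps=60, n_moves=50, seed=0):
    """SA sketch: propose x' = x + dx with dx uniform; accept if dE < 0,
    otherwise accept with probability exp(-dE/T); cool T geometrically
    (T <- alpha*T) and keep the best state seen."""
    rng = random.Random(seed)
    x = best = x0
    T = T0
    for _ in range(n_temps):
        for _ in range(n_moves):
            x_new = x + rng.uniform(-step, step)
            dE = energy(x_new) - energy(x)
            if dE < 0 or rng.random() < math.exp(-dE / max(T, 1e-12)):
                x = x_new
                if energy(x) < energy(best):
                    best = x
        T *= alpha                      # annealing rule T(t) = alpha*T(t-1)
    return best
```

Early, high-temperature iterations accept many uphill moves and explore broadly; as T shrinks, the walk settles into the deepest basin visited.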
4. Boltzmann Machine
The Boltzmann machine is a fully connected feedback network (Hinton et al., 1984; Fecit, 2003; Fig. 4), analogous to the Hopfield network. Neurons in a Boltzmann machine take only two states, 1 and −1, and only one neuron changes its state at a time. Hidden neurons are permitted and all neurons are stochastic. There are no self-feedback connections, and all between-neuron connections are bidirectional and symmetrical. The Boltzmann machine can be trained in a supervised way (Yan and Zhang, 2000).
Figure 4. Architecture of Boltzmann machine.
In Fig. 4, visible neurons interact with the environment. Visible neurons are fixed at some states and hidden neurons change their states according to the inputs received. The energy function is defined as

E = −Σi Σj≠i wji sj si .

The probability of a state change of neuron j from sj to −sj is

P(sj → −sj ) = 1/[1 + exp(−ΔEj /T)],

where the energy change ΔEj resulting from the state change is

ΔEj = −2sj vj = −2sj Σi wji si .

The probability of a state change of neuron j from −sj to sj is 1 − P(sj → −sj ). The objective of Boltzmann machine learning is to minimize E.
Supposing that visible neurons are divided into input neurons and output neurons, the algorithm of Boltzmann machine is:
(1) Initialize connection weights wji with uniformly distributed random
values in [−a, a], where a can be assigned a value, e.g., 1.
(2) Fix the states of the visible neurons according to the desired data pattern. Relax the network using SA, i.e., change the states of neurons following the rule

sj = 1 with probability P(vj ),
sj = −1 with probability 1 − P(vj ),

where

vj = Σi≠j wji si , P(vj ) = 1/[1 + exp(−2vj /T)].
Perform a specified number of iterations at every temperature T, until a given lower temperature is reached. Finally compute

p+ji = ⟨sj si ⟩+ = Σx Σy Q+xy sj|xy si|xy , i, j = 1, 2, . . . , n; i ≠ j,

where x and y are the states of the visible and hidden neurons, respectively, sj|xy is the state of neuron j when the visible state is x and the hidden state is y, n is the total number of neurons, and Q+xy = Qy|x Q+x . Q+x is the probability of visible state x when the states of the visible neurons are fixed, and Q−x is the probability of visible state x when the states of the visible neurons are free. State x may take values 1 to 2^c and state y values 1 to 2^d , where c and d are the numbers of visible and hidden neurons, respectively.
(3) Fix the states of the input neurons only, and repeat (2) to compute

p−ji = ⟨sj si ⟩− = Σx Σy Q−xy sj|xy si|xy , i, j = 1, 2, . . . , n; i ≠ j.
(4) Modify the connection weights by the rule

wji = wji + η(p+ji − p−ji ), i, j = 1, 2, . . . , n; i ≠ j.
(5) Repeat steps (2) to (4) until wji no longer changes, i.e., the network converges to a stable state.
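The stochastic relaxation inside step (2) can be sketched for a tiny network (a pure-Python sketch; the two-neuron coupling and the temperature value are illustrative assumptions):

```python
import math, random

def relax_sweep(s, W, T, rng):
    """One relaxation sweep: each neuron j takes state +1 with probability
    P(v_j) = 1/(1 + exp(-2 v_j / T)), where v_j = sum_{i != j} w_ji s_i."""
    for j in range(len(s)):
        v = sum(W[j][i] * s[i] for i in range(len(s)) if i != j)
        p = 1.0 / (1.0 + math.exp(-2.0 * v / T))
        s[j] = 1 if rng.random() < p else -1
    return s
```

With a strong positive coupling and a low temperature, repeated sweeps drive the two neurons into the same state, the low-energy configuration of the symmetric weight matrix.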
The Boltzmann machine will converge to the global minimum of the energy function, and the network distribution will match the probability distribution of the environment. However, it is computationally expensive. A mean field approximation method has been developed to improve its performance (Peterson and Anderson, 1987; Haykin, 1994). In this method the stochastic neuron is replaced with a deterministic neuron. Mean field theory may improve the training efficiency of the Boltzmann machine by one to two orders of magnitude.
CHAPTER 8
Design and Customization
of Artificial Neural Networks
Classical neural networks can approximate any nonlinear function and are widely used. Nevertheless, they do not naturally exploit the intrinsic mechanisms of some systems; special problems usually demand special network architectures (Zhang and Wei, 2008). Moreover, a complex system sometimes needs to be divided into simpler subsystems, which requires a composite architecture (modular network) of several neural networks. For example, a complex input space may be divided into several subspaces, with a separate neural network used in each subspace. As a result, designing and customizing neural networks for specific systems or problems is common practice.
1. Mixture of Experts
Mixture of experts (ME) is a modular neural network (MNN). An MNN uses high-order computational units to perform multiple tasks by dividing a problem into simpler subtasks. The division is conducted by partitioning the input–output response patterns into regions of identical features. A complex problem can be understood more precisely using an MNN than using conventional neural networks (Haykin, 1994; Almasri and Kaluarachchi, 2005).
ME is composed of k modules (expert networks) and a control module
(gating network) (Jacobs et al., 1991; Fig. 1). The control module assigns
different features of the input space to the different modules (Neural Ware,
2000). Each expert network yields an output corresponding to the input
Figure 1. Architecture of ME (mixture of experts).
vector and the network output is the weighted sum of these outputs with
the weights equal to the output of the gating network. The output of gating
network is considered as the prior probability that the expert network connected to the output of gating network will be used for a given input vector
(Almasri and Kaluarachchi, 2005).
Suppose the network input is x = (x1 , x2 , . . . , xn )T , the network output is y, the desired output is ŷ, the output of module i is yi = (y1i , y2i , . . . , ymi )T , and gi is the output of neuron i of the gating network (0 ≤ gi ≤ 1 and Σi gi = 1, i = 1, 2, . . . , k); then the network output is

y = Σi gi yi .

Assume that the outputs of every expert network have the same covariance; the desired output of expert network (module) i is then a normally distributed variable:

f(ŷ | x, i) = exp(−‖ŷ − yi ‖2 /2)/(2π)1/2 , i = 1, 2, . . . , k.
The network output is thus represented by a mixture of the k distributions:

f(ŷ | x) = Σi gi f(ŷ | x, i).

The objective of network learning is to maximize the logarithmic likelihood function of f(ŷ | x), i.e., ln f(ŷ | x) (Jacobs et al., 1991).
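The gating-plus-experts combination y = Σi gi yi can be sketched as follows (a pure-Python sketch; the linear gate with a softmax output and the constant expert functions are illustrative assumptions):

```python
import math

def mixture_output(x, experts, gate_weights):
    """ME sketch: softmax gating g_i over k experts (xi_i = v_i^T x),
    network output y = sum_i g_i * y_i."""
    xi = [sum(v_j * x_j for v_j, x_j in zip(v, x)) for v in gate_weights]
    mx = max(xi)                                   # stabilize the softmax
    e = [math.exp(t - mx) for t in xi]
    z = sum(e)
    g = [t / z for t in e]                         # 0 <= g_i <= 1, sum g_i = 1
    ys = [expert(x) for expert in experts]
    return sum(gi * yi for gi, yi in zip(g, ys)), g
```

With equal gate weights the output averages the experts; with a strongly tilted gate, the output follows a single expert, illustrating how the gating network assigns regions of the input space to modules.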
2. Hierarchical Mixture of Experts
Hierarchical mixture of experts (HME) is an expansion of ME (Jordan and
Jacobs, 1992; Fig. 2). In HME the input space is divided into some regions
and data fitting is separately conducted for each region. Some data points
may simultaneously belong to different regions. The boundaries of regions
will be automatically adjusted in the learning process (Yan and Zhang,
2000).
Suppose the input x ∈ Rn and the output y ∈ Rm . The output of expert network (module) (i, j) is a continuous generalized linear function

µij = f(wij , x),
Figure 2. Architecture of a two-layer hierarchical mixture of experts.
where wij is the connection weight. The function f is the logistic function (f(x) = 1/(1 + e−x )) if HME is used for Boolean classification. The output i of the first layer of the gating network is

gi = exp(ξi )/Σj exp(ξj ),

where ξi = viT x and vi is the weight vector; gi defines the first partition of the input space. The output j of gating network i in the second layer is

gj|i = exp(ξij )/Σk exp(ξik ),

where ξij = vijT x; gj|i defines a partition of the regions generated by the first partition. The weighted output of expert group i is

µi = Σj gj|i µij .
Finally, the network output is

y = Σi gi µi .
EM (Dempster et al., 1977; Laird, 1993; Yan and Zhang, 2000) and IRLS (Chen et al., 1999) algorithms have been used jointly in HME computation. The HME algorithm using EM is:

(1) For given data (x, y), compute the posterior probabilities hi and hj|i .
(2) Use the IRLS algorithm to modify the weight wij of expert network (i, j).
(3) Use the IRLS algorithm to modify the weight vi of the first layer of the gating network.
(4) Use the IRLS algorithm to modify the weight vij of every layer of the gating network (except the first layer).
(5) Start a new round of iteration based on the modified weights.
3. Neural Network Controller
Some neural networks have been designed to perform the control of
dynamic systems (Mathworks, 2002). There are usually two steps in the
neural network control, i.e., system identification and control design. The
objective of system identification is designing a neural network model for
Figure 3. NARMA-L2 neural network.
the system to be controlled. The neural network model is then used to train
the controller in the stage of control design. The feedback linearization controller, or NARMA-L2 controller, will be described here. This controller
was designed to remove nonlinearity from a nonlinear system and transform
it into a linear system (Mathworks, 2002; Fecit, 2003).
In order to design a NARMA-L2 controller, the first step is to identify
the system to be controlled (Fecit, 2003; Fig. 3). A neural network is trained
to simulate the dynamics of system.
(1) Choose a model architecture. The nonlinear autoregressive moving average (NARMA) model is a standard model of discrete nonlinear systems:

y(k + d) = f(y(k), y(k − 1), . . . , y(k − n + 1), u(k), u(k − 1), . . . , u(k − n + 1)),

where d ≥ 2 and u(k) is the system input. The objective is to train a neural network to approximate the nonlinear function f.
(2) If the system output is required to track the reference trajectory y(k + d) = yτ (k + d), a nonlinear controller of the following form should be developed:

u(k) = g(y(k), y(k − 1), . . . , y(k − n + 1), yτ (k + d), u(k − 1), . . . , u(k − n + 1)).
To avoid dynamic feedback and slow training, a new model was developed to approximate NARMA (Narendra and Mukhopadhyay, 1994, 1997):

y(k + d) = f(y(k), y(k − 1), . . . , y(k − n + 1), u(k − 1), . . . , u(k − n + 1)) + g(y(k), y(k − 1), . . . , y(k − n + 1), u(k − 1), . . . , u(k − n + 1)) · u(k).

The controller is

u(k) = [yτ (k + d) − f(y(k), y(k − 1), . . . , y(k − n + 1), u(k − 1), . . . , u(k − n + 1))]/g(y(k), y(k − 1), . . . , y(k − n + 1), u(k − 1), . . . , u(k − n + 1)).
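With the identified f and g subnetworks in hand, the control law is a direct rearrangement of the model. A sketch with stand-in callables (f_net and g_net are hypothetical placeholders for the two trained subnetworks, and the linear plant in the usage below is a toy example):

```python
def narma_l2_control(y_ref, y_hist, u_hist, f_net, g_net):
    """NARMA-L2 control law: given the identified model
    y(k+d) = f(.) + g(.)*u(k), choose
    u(k) = (y_ref(k+d) - f(.)) / g(.)
    so that the model output tracks the reference."""
    return (y_ref - f_net(y_hist, u_hist)) / g_net(y_hist, u_hist)
```

For a toy identified model y(k+d) = 0.5 y(k) + 2 u(k), the controller yields exactly the input that makes the model output equal the reference, which is the feedback-linearization idea.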
4. Customization of Neural Networks
4.1. Definition of neural network
The Matlab toolkit allows various new neural network architectures to be created. A new network is created by issuing the command "network" in the command window (Fig. 4). The neural network possesses the following properties, with initial parameters, structures, or functions, etc.
(1) Architecture properties. numInputs (0; the number of inputs);
numLayers (0; the number of layers); biasConnect ([];
bias connections); inputConnect ([]; input-layer connections);
layerConnect ([]; between-layer connections); outputConnect
([]; layer-output connections); targetConnect ([]; layer-target
connections); numOutputs (0; read-only; the number of outputs); numTargets (0; read-only; the number of targets);
numInputDelays (0; read-only; the number of input delays);
numLayerDelays (0; read-only; the number of layer delays).
(2) Subobject structures. inputs (0 × 1 cell of inputs); layers (0 × 1
cell of layers); outputs (1×0 cell containing no outputs); targets
(1 × 0 cell containing no targets); biases (0 × 1 cell containing no
Figure 4. Create a new neural network in the Matlab environment.
biases); inputWeights (0 × 0 cell containing no input weights);
layerWeights (0 × 0 cell containing no layer weights).
(3) Functions. adaptFcn (none; adapt function); gradientFcn
(none; gradient function); initFcn (none; initialization function);
performFcn (none; performance function); trainFcn (none;
training function).
(4) Parameters. adaptParam (none; parameters for adapting network); gradientParam (none; parameters for gradient function);
initParam (none; parameters for initialization); performParam
(none; parameters for network performance); trainParam (none;
parameters for training network).
(5) Weight and bias values. IW (0 × 0 cell containing no input weight
matrices; input-layer connection weights); LW (0 × 0 cell containing
no layer weight matrices; between-layer connection weights); b (0 × 1
cell containing no bias vectors; bias).
Figure 5. Architecture of the neural network designed.
A neural network can be developed, for example, using the following
settings (Fecit, 2003; Fig. 5):
%Generate an empty neural network
net=network;

%Architecture properties
%Set the number of inputs and network layers
net.numInputs=2; %Number of inputs
net.numLayers=3; %Number of layers

%Set bias connections. The bias connection matrix is a 3 x 1 matrix;
%net.biasConnect(i)=1 gives layer i a bias connection. Here layers 1
%and 3 have bias connections
net.biasConnect=[1;0;1];

%Set input-layer connections. The input connection matrix is a 3 x 2
%matrix; net.inputConnect(i,j)=1 connects input j to layer i. Here
%input 1 connects to layers 1 and 2, and input 2 connects to layer 2
net.inputConnect=[1 0;1 1;0 0];

%Set between-layer connections. The layer connection matrix is a 3 x 3
%matrix; net.layerConnect(i,j)=1 means a weight connection from layer j
%to layer i. Here layers 1 and 2 connect to layer 3
net.layerConnect=[0 0 0;0 0 0;1 1 0];

%Set output and target connections. Both are 1 x 3 matrices
net.outputConnect=[0 1 1]; %Connect layers 2 and 3 to the output
net.targetConnect=[0 0 1]; %Connect layer 3 to the target. The output of
%layer 3 is compared to the target of layer 3, yielding the error signal
%used in training

%Subobject structures
%Some layer properties may be changed. Here layer 3 keeps its default
%settings except for its initialization function
net.inputs{1}.range=[-5 10;-5 10]; %Set range of input 1
net.inputs{2}.range=[-5 5;-5 5;-5 5;-5 5;-5 5]; %Set range of input 2
net.layers{1}.size=8; %There are 8 neurons in layer 1
net.layers{1}.transferFcn='tansig'; %Set transfer function
net.layers{1}.initFcn='initnw'; %Nguyen-Widrow initialization function
net.layers{2}.size=5; %There are 5 neurons in layer 2
net.layers{2}.transferFcn='logsig';
net.layers{2}.initFcn='initnw';
net.layers{3}.initFcn='initnw';

%Set delays of input and layer weights
net.inputWeights{2,1}.delays=[0 1];
net.inputWeights{2,2}.delays=1;
net.layerWeights{3,3}.delays=1;

%Set some functions
net.trainFcn='trainlm'; %Training function: Levenberg-Marquardt
net.performFcn='mse'; %Performance function: mse
net.initFcn='initlay'; %Network initialization function: initlay. The
%network will initialize itself using the layer initialization functions.
The following code initializes, simulates, and trains the network, with the training goals defined:
net=init(net);
x={[2;3] [-3;4]; [2;-1;0;3;1;4] [1;-2;0;4;1;3]};
y={1 -1};
z=sim(net,x);
net.trainParam.epochs=1000;
net.trainParam.goal=1e-8;
net=train(net,x,y);
In addition to neural network models, various functions used in neural
networks, e.g., transfer functions, topological functions, distance functions,
initialization functions, training functions, etc., can also be customized,
designed, and loaded by users (Zhang and Li, 2008).
CHAPTER 9
Learning Theory, Architecture Choice and Interpretability of Neural Networks
1. Learning Theory
Artificial neural networks uncover the mechanisms or laws hidden in samples by learning from those samples. Learning theory and methods are attractive topics in neural network research and applications. Learning theory focuses on questions such as (Yan and Zhang, 2000): is the learning result able to approximate the hidden mechanism as the number of training samples increases (statistical performance of learning, or learnability)? How many samples are required, and how long is the computation time, for learning the mechanism (complexity of learning)? Does the learning algorithm converge? It should be noted that some problems are inherently unlearnable, irrespective of the architecture, performance, and learning method of the neural network used.
1.1. Statistical performance of learning
For supervised learning, the objective is to make the network output y approximate the desired output ŷ by finding a specific F(x, w) (adjusting the weights) from a given set of functions (a given architecture of neural network), i.e., y = F(x, w), where x ∈ R^n, y ∈ R^m, and w is the weight matrix. The question is whether a given set of samples contains enough information such that the trained neural network has a good generalization
capacity. The theory of empirical risk minimization was proposed to address this problem (Vapnik, 1995, 1999). Suppose the risk is

R(w) = ∫ (ŷ − F(x, w))² dP(x, ŷ),

where P(x, ŷ) is the joint distribution of x and ŷ. Suppose w_e is the weight matrix that minimizes the empirical risk R_e(w). If R_e(w) converges uniformly to the practical risk R(w), i.e.,

p{sup_{w∈W} |R(w) − R_e(w)| > ε} → 0 as N → ∞,

then R(w_e) converges in probability to the minimum possible value of R(w).
The above theory establishes the validity of empirical risk minimization. However, the number of training samples is not limitless, and therefore the convergence speed is the focus of learning (Yan and Zhang, 2000). The convergence speed directly relates to the VC dimension of the problem (the VC dimension of a function class F is defined as the maximum cardinality of a set S of points in R^n that can be divided in all possible ways, i.e., shattered, by F). If the VC dimension is finite, the problem is learnable and the empirical risk will converge to a minimum. For feedforward neural networks, the VC dimension (capacity of multilayer neural networks) represents their classification capacity. For a feedforward network with a single hidden layer, the VC dimension d satisfies (Sontag, 1998)

2n[h/2] ≤ d ≤ 2|W| log₂(eN),

where h is the number of hidden neurons, |W| is the number of adjustable connection weights, N is the number of neurons, [·] denotes the integer part, n is the input dimension, and e ≈ 2.7183. Sometimes the VC dimension is simply represented by |W|.
The VC dimension should be kept as low as possible. Some measures can be taken to reduce it: (1) limit the number of hidden neurons; (2) limit the number of connection weights; and (3) reduce the dimension of the input space.
Amari et al. (1997) suggested a simple principle for network learning,
i.e., the model is optimally trained when the ratio of the number of training
samples to the number of connection weights exceeds 30.
The number of hidden neurons (N) is sometimes determined using the following rules of thumb (Nagendra and Khare, 2006):

(1) N = number of input neurons + number of output neurons;
(2) the maximum number of neurons in the hidden layer (Nmax) is two times the number of input-layer neurons (Swingler, 1996; Berry and Linoff, 1997);
(3) N = number of training patterns divided by five times the total number of input and output neurons.
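These rules of thumb are simple enough to compute directly. A minimal Python sketch (the book's own examples use Matlab; the function names here are illustrative, not from the cited sources):

```python
def hidden_neurons_rule1(n_in, n_out):
    """Rule (1): N = number of input neurons + number of output neurons."""
    return n_in + n_out

def hidden_neurons_max(n_in):
    """Rule (2): at most twice the number of input-layer neurons."""
    return 2 * n_in

def hidden_neurons_rule3(n_patterns, n_in, n_out):
    """Rule (3): training patterns / (5 * (inputs + outputs))."""
    return n_patterns // (5 * (n_in + n_out))

# Example: 8 inputs, 1 output, 450 training patterns
print(hidden_neurons_rule1(8, 1))        # 9
print(hidden_neurons_max(8))             # 16
print(hidden_neurons_rule3(450, 8, 1))   # 10
```

The three rules often disagree, so they are best treated as starting points for trial and error rather than as prescriptions.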
1.2. Complexity of learning

It is impossible to engineer a network to perfectly learn the unknown. As a consequence, PAC (probably approximately correct) learning is a better target for neural network learning. PAC learning only demands approximate learning, in the sense of probability, at a given learning error (Valiant, 1984; Anthony and Biggs, 1992; Yan and Zhang, 2000). Given a probability distribution p and real values 0 < ε, δ ≤ 1, suppose there is an algorithm A that, using samples of a concept class C, is able to output a hypothesis h in time polynomial in 1/ε, 1/δ, and N (the size or scale of the problem to be studied, or the complexity of c ∈ C), such that

P(error(h, c) < ε) ≥ 1 − δ,   ∀c ∈ C,

where error(h, c) denotes the generalization error of h with respect to c. Then C is PAC learnable.
Suppose the VC dimension of a neural network is d, and d → ∞ as n → ∞ (n is the input dimension). When n → ∞, for any ε > 0, the problem is not PAC learnable by any algorithm using fewer than (d − 1)/(32ε) samples (Venkatech, 1992). For a given neural network with Boolean output, the relationship between the sample size N and the error ε should satisfy the following rule (Yan and Zhang, 2000): N ≥ d/ε, where d is often approximated by |W|, the number of adjustable connection weights.
1.3. Dynamic learning
Neural network learning is a dynamic process. A learning rule with continuous time can be considered to be a differential equation (Amari, 1990;
Yan and Zhang, 2000).
Suppose a single neuron has input x ∈ R^n and weight vector w ∈ R^n, and the desired output is ŷ. The input is supplied by the environment and described by the distribution p(x, ŷ). Amari (1990) developed a unified learning equation:

τ dw(t)/dt = −w(t) + η(t)r(w(t), x(t), ŷ)x(t),

which is equivalent to

∆w = ηr(w, x, ŷ)x − λw   (discrete equation),
dw/dt = ηr(w, x, ŷ)x − λw   (continuous equation),

where λ, η > 0 and r(w, x, ŷ) is the learning signal. If λ = 0 and r(w, x, ŷ) = ŷ − y = ŷ − sgn(wᵀx) in the discrete equation, we obtain the perceptron learning rule; if λ = 0 and r(w, x, ŷ) = ŷ − y = ŷ − wᵀx, we obtain the Widrow–Hoff learning rule; and if λ = 0 and r(w, x, ŷ) = y, the Hebb learning rule is obtained.
Various new learning rules may be developed based on the unified
learning equation, and dynamic properties of learning rules can be analyzed
according to the equation.
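The discrete form of the unified equation makes the relationship among the three classical rules concrete. A Python sketch (illustrative, not code from Amari (1990); λ is taken as 0 for the three special cases):

```python
def sgn(v):
    return 1.0 if v >= 0 else -1.0

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def unified_update(w, x, y_hat, r, eta=0.1, lam=0.0):
    """One step of the unified rule: delta w = eta * r(w, x, y_hat) * x - lam * w."""
    signal = r(w, x, y_hat)
    return [wi + eta * signal * xi - lam * wi for wi, xi in zip(w, x)]

# Learning signals for the three classical rules (lambda = 0)
perceptron = lambda w, x, y_hat: y_hat - sgn(dot(w, x))
widrow_hoff = lambda w, x, y_hat: y_hat - dot(w, x)
hebb = lambda w, x, y_hat: dot(w, x)   # r = y, the neuron's own output

w = unified_update([0.0, 0.0], [1.0, -1.0], -1.0, perceptron)
print(w)  # [-0.2, 0.2]
```

A new rule is obtained simply by supplying a different learning signal r; its dynamics can then be studied through the continuous equation.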
1.4. Generalization capacity
The generalization capacity represents the neural network's ability to respond to unknown samples (governed by the same mechanism as that hidden in the training samples). Using neural networks to interpolate, predict, recognize, etc., demands a high generalization capacity. An over-fitted network fits the details (noise) of the training samples and misses the general trends or mechanisms; such a network has a low generalization capacity. Some measures of generalization capacity are described in the following.

(1) A general rule describing the relationship between generalization error and training error is (Lee and Tenorio, 1993)

GMEE ≤ ε + λ((d/N) ln(N/d))^{1/2},

where GMEE is the generalization error (prediction error, or generalized minimum empirical error), ε is the training error, d is the VC dimension, and N is the number of training samples.
(2) An extension of the VC dimension, the regular dimension, has also been used to describe the generalization–training relationship (Gu and Takahashi, 1996):

R = ε + σ(dRe/N)^{1/2},

where R is the generalization (prediction) risk, ε is the training error, dRe is the regular dimension, σ is the variance of the loss function R(w), and N is the number of training samples. The training error ε can be expressed as the ratio of misclassified samples to the total number of samples.
(3) A network-capacity-based measure was developed to represent generalization capacity (Yan and Zhang, 2000):

G = log₂ C/m,

where C is the network capacity, i.e., the number of different mappings or functions realizable by adjusting the connection weights, and m is the output dimension. A small G represents a large generalization capacity.
2. Architecture Choice
The objective of architecture choice is to maximize the generalization capacity of the neural network. Compared to a complex network, a simple network demands a longer training time and has larger training errors, but it is easier to understand, to extract knowledge and rules from, and to realize in hardware. Moreover, it has a larger generalization capacity. As a result, the general principle is to choose the simplest architecture under otherwise equal conditions.
2.1. Regular method
A useful method for architecture choice is to follow the rule:

criterion of architecture choice = logarithmic likelihood function + λ × architecture complexity,
where the first term represents the goodness of fit of the model output to the input samples, the second term penalizes the model for its complexity, and λ is a factor adjusting the strength of the constraint.
2.2. Network pruning and construction
Network pruning, i.e., removing unimportant connection weights or neurons from a redundant network, has been suggested (Reed, 1993; Castellano et al., 1997; Setiono, 1997; Yan and Zhang, 2000). For example, in the weight decay method, a penalty function is added to the goal function of the BP algorithm, and the weights that play little part in the learning process decline exponentially.

The iterative pruning algorithm proposed by Castellano et al. (1997) can be used to prune not only hidden neurons but also connections in feedforward neural networks. In this algorithm, network pruning and modification of weights are conducted simultaneously so as to preserve the network outputs. The computational procedure of the algorithm is:
(1) Assign k=0.
(2) Remove the neuron h that has the smallest influence on network N^k, following the rule

h = arg min_{h∈H} Σ_{i∈I_h} w_hi² ‖y_h‖₂²,

where w_hi is the connection weight from neuron h to neuron i, H is the set of hidden neurons, I_h is the sending domain of neuron h, I_h = {j ∈ V | (h, j) ∈ L}, V is the set of all neurons, L is the set of all connections, and y_h is the signal sent from neuron h to I_h.
(3) Solve the following equation for δ using the conjugate gradient algorithm:

Y^k δ = Z^k.

(4) Generate a new network N^{k+1} = {V^{k+1}, L^{k+1}, w^{k+1}}, where

V^{k+1} = V^k − {h},
L^{k+1} = L^k − ({h} × I_h^k ∪ R_h^k × {h}),
w_ji^{k+1} = w_ji^k,          i ∉ I_h,
w_ji^{k+1} = w_ji^k + δ_ji,   i ∈ I_h.
R_h is the receiving domain of neuron h, R_h = {j ∈ V | (j, h) ∈ L}, and δ_ji is the value of the weight modification.

(5) Assign k = k + 1 and repeat steps (2) to (4) until the performance of network N^k declines significantly.
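Step (2), selecting the hidden neuron with the least outgoing influence, can be sketched as follows. This Python fragment is a simplified illustration under assumed data structures, not the code of Castellano et al. (1997); the weight-correction step (3) is omitted:

```python
def least_important_hidden(out_weights, signal_norms):
    """Pick the hidden neuron h minimizing sum_i w_hi^2 * ||y_h||_2^2.

    out_weights: dict h -> list of outgoing weights w_hi
    signal_norms: dict h -> squared norm ||y_h||_2^2 of h's output signal
    """
    def influence(h):
        return sum(w * w for w in out_weights[h]) * signal_norms[h]
    return min(out_weights, key=influence)

out_w = {1: [0.5, -1.2], 2: [0.05, 0.1], 3: [2.0, 0.3]}
norms = {1: 1.0, 2: 1.0, 3: 0.8}
print(least_important_hidden(out_w, norms))  # 2
```

Neuron 2 is selected because both its outgoing weights are small; in the full algorithm the remaining weights would then be corrected to preserve the network outputs.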
A neural network can also be constructed step by step from a simple and
small architecture. There are different algorithms of network construction,
for example, tiling algorithm (Mezard and Nadal, 1989), sequential addition
(Marchand et al., 1990), upstart algorithm (Frean, 1990), etc.
In general, there is no universal method or algorithm for the choice of architecture. In addition to the choice of the basic model, the number of hidden layers, neurons, etc., should be determined according to the specific system at hand (Zhang, 2007; Zhang et al., 2007; Zhang et al., 2008).
3. Interpretability of Neural Networks
The interpretability of neural networks has long been a major concern for neural network models. For example, screening the relative importance of the input variables of a system is a valuable research topic in ecology and environmental science. Developing methods for interpreting neural networks is the subject of recent research (Kemp et al., 2007).

So far, various methods to address a neural network's interpretability, such as neural interpretation diagrams (Özesmi and Özesmi, 1999), sensitivity analysis (Lek et al., 1996; Scardi, 1996; Recknagel et al., 1997; Gevrey et al., 2006), inference rule extraction, randomization tests of significance (Scardi and Harding, 1999; Olden, 2000; Olden and Jackson, 2002; Kemp et al., 2007), partial derivatives (Dimopoulos et al., 1999; Reyjol et al., 2001), and connection weight methods (Olden et al., 2004; Zhang and Wei, 2008), have been developed for practical use (Özesmi et al., 2006; Gevrey et al., 2003).
3.1. Sensitivity analysis
In sensitivity analysis (Lek et al., 1996), the network output corresponding to each input variable is determined by assigning designed values to one variable at a time while holding the other input variables constant (Lek et al., 1996; Özesmi et al., 2006). The variables that are held constant are
assigned their minimum, first quartile, median (or mean), third quartile,
and maximum values successively. The relative importance of the variable
is determined by comparing the input–output relationships. Moreover, white noise can be added to each input variable and the resulting output error examined (Scardi and Harding, 1999).
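A sensitivity profile of this kind can be sketched in a few lines. The Python below is an illustration, not code from Lek et al. (1996); for brevity the held-constant variables are fixed at their medians only, and `model` is assumed to map an input list to a scalar output:

```python
def sensitivity_profile(model, data, var_index, levels=5):
    """Response profile of `model` to one input variable (Lek-type profile).

    The probed variable is scanned from its minimum to its maximum while
    the other variables are fixed at their medians.
    """
    n_vars = len(data[0])
    columns = [sorted(row[j] for row in data) for j in range(n_vars)]
    median = [col[len(col) // 2] for col in columns]
    lo, hi = columns[var_index][0], columns[var_index][-1]
    profile = []
    for k in range(levels):
        x = list(median)
        x[var_index] = lo + (hi - lo) * k / (levels - 1)
        profile.append(model(x))
    return profile

# Toy model: the output depends strongly on x0 and weakly on x1
model = lambda x: 3.0 * x[0] + 0.1 * x[1]
data = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
print(sensitivity_profile(model, data, 0, levels=3))
```

A steep profile, as obtained here for x0, indicates an important variable; repeating the scan with the other variables fixed at their quartiles gives the full Lek procedure.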
3.2. Neural interpretation diagrams
Neural interpretation diagrams are used to analyze how the neural network weights each input variable and how the input variables interact to yield the network output (Özesmi and Özesmi, 1999; Özesmi et al., 2006). Neural interpretation diagrams are drawn by scaling the thickness of the connection lines between neurons according to the relative magnitudes of their weights. Positive and negative weights are represented by different colors. By scaling the thickness of the connection lines from the input neurons, the most important variables can be found. Interactions between input variables can also be observed. Neural interpretation diagrams are most useful for simple network architectures.
3.3. Randomization test of significance
A randomization test was developed to assess the statistical significance of connection weights and input variables (Olden, 2000; Olden and Jackson, 2002). In this method the neural network is trained on randomized data and all connection weights (input–hidden–output connection weights) are recorded. This procedure is repeated a given number of times, e.g., 1000 or 10000 times, to yield a null distribution for each connection weight, which is then compared to the actual value to calculate the significance level (Özesmi et al., 2006). Connections with little influence on the network output can be removed. Through the randomization test, the independent variables that significantly contribute to the network prediction can be identified.

HIPR (holdback input randomization) is a network-independent randomization test method (Kemp et al., 2007). It is a refinement of the method presented by Scardi and Harding (1999). The procedure of the HIPR method can be summarized as follows:
(1) Optimize the neural network.
(2) Use the test data set to determine the relative importance of the input variables: (a) sequentially input each data point in the test data set to the neural network, but replace the values of one input variable with uniformly distributed random values in the interval (0.1, 0.9), the range over which the network was originally trained; (b) calculate the MSE (mean squared error) of the neural network when the randomized test set has been presented; and (c) repeat the procedure for each input variable, each time substituting the original values with uniformly distributed random values.
3.4. Partial derivatives
The partial derivatives of the network output with respect to input variables can be used to determine the relative importance of input variables
(Dimopoulos et al., 1999; Reyjol et al., 2001). By plotting the partial derivatives of the network output with respect to an input variable, the changes
of network output with increasing values of the input variable can be determined (Özesmi et al., 2006).
3.5. Connection weight methods
Connection weights themselves represent the importance of connections and can thus be used to evaluate the relative importance of input variables.

The input variable relevance measure is essentially a connection weight method. In this method the relevance of an input variable is the sum of squared weights for that input variable divided by the total sum of squared weights for all input variables. Variables with higher relevance are more important (Özesmi and Özesmi, 1999).
A connection weight method proposed by Olden et al. (2004) is considered to outperform all previous methods in determining the relative importance of input variables. For each input variable, it sums the products of the connection weights from the input neuron to the hidden neurons and the connection weights from the hidden neurons to the output neuron (Kemp et al., 2007). The larger the sum of connection weights, the greater the importance of the variable connected to that input neuron. The relative importance of an input
variable is determined by

V_i = Σ_{k=1}^{N} w_ik w_ko,
where V_i is the relative importance of variable i, N is the total number of hidden neurons, w_ik is the connection weight from input variable i to hidden neuron k, and w_ko is the connection weight from hidden neuron k to output neuron o.
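For a single-output network, the sum of weight products is straightforward to compute. A minimal Python sketch (variable names are illustrative):

```python
def olden_importance(w_ih, w_ho):
    """Connection weight method of Olden et al. (2004).

    w_ih[i][k]: weight from input i to hidden neuron k
    w_ho[k]: weight from hidden neuron k to the (single) output neuron
    Returns V_i = sum_k w_ik * w_ko for each input variable i.
    """
    return [sum(w_ih[i][k] * w_ho[k] for k in range(len(w_ho)))
            for i in range(len(w_ih))]

w_ih = [[0.8, -0.4],   # input 1
        [0.1,  0.2]]   # input 2
w_ho = [0.5, 1.0]
print(olden_importance(w_ih, w_ho))  # [0.0, 0.25]
```

Note that the two paths from input 1 cancel each other exactly, so its importance is zero even though its individual weights are large; this signed cancellation is what distinguishes the method from sum-of-squares relevance measures.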
A connection weight method, IDM (importance detection method; Zhang and Wei, 2008), was presented for a specific neural network, ANNSSM (artificial neural network for state space modeling). In this algorithm the total importance of input variable i may be calculated by the following formula:

V_i = Σ_{k=1}^{n} Σ_{j=1}^{n} LW{k + n, j}/(1 + exp(−IW{j, i}))
    = Σ_{j=1}^{m} w_ij/(1 + exp(−w′_ji)),   i = 1, 2, . . . , n,

where n is the number of state variables, w_ij is the between-hidden-layer connection weight, and w′_ji is the input-hidden-layer connection weight. The relative importance of input variable i to output variable k may be obtained by

V_ki = Σ_{j=1}^{n} LW{k + n, j}/(1 + exp(−IW{j, i})),   k, i = 1, 2, . . . , n.
The larger the importance value, the greater the importance of the input variable. For ANNSSM, IDM performs better than the earlier connection weight method (Olden et al., 2004).

Many methods that determine the relative importance of input variables implicitly assume that all input variables are unit independent or have the same unit representation, e.g., the numbers of individuals of different biological taxa (Zhang and Wei, 2008). However, in many cases the input variables are different physical quantities, e.g., temperature and humidity. Perhaps temperature is mathematically a more important factor than humidity
according to a sensitivity analysis in neural network modeling, but humidity may be practically more influential than temperature if the humidity variation of the environment is much larger than the temperature variation. For this reason, in applications of those methods (e.g., sensitivity analysis), the range of each input variable (or the training set) must be deliberately determined from the practical variation of the studied input variable in order to achieve reasonable results. In addition, sequentially removing one input variable at a time and checking the model performance (fitting error, prediction error, etc.) is an alternative method.
March 23, 2010
20:1
9in x 6in
B-922
b922-ch10
1st Reading
CHAPTER 10
Mathematical Foundations of Artificial Neural Networks
Artificial neural networks involve many areas of mathematics, such as
probability theory, differential geometry, topology, numerical computation, differential equation theory, etc. Some mathematical principles for
the design and analysis of artificial neural networks are discussed in this
chapter.
1. Bayesian Methods
The learning process and the design of neural networks can be based on Bayesian methods (Yan and Zhang, 2000). The fundamental idea of Bayesian methods is that the probability that a postulate, A, is true is proportional to the product of the postulate's prior probability and the conditional probability of the information, I, being observed given that A is true.

Given a discrete sample space, S, and Ai ∈ S, i = 1, 2, . . ., where ∪Ai = S and Ai ∩ Aj = ∅ for i ≠ j, the Bayesian rule is

p(Ai/I) = p(I/Ai)p(Ai)/Σj p(I/Aj)p(Aj).

For a continuous sample space, the Bayesian rule is

p(A/I) = p(I/A)p(A)/∫ p(I/A)p(A)dA.
1.1. Model selection

The Bayesian rule has been used in the selection of models (Mackey, 1992). Suppose that there are several models (e.g., BP, RBF, etc.). Given the data or information, I, the posterior probability of model Ai is given by the Bayesian rule

p(Ai/I) = p(I/Ai)p(Ai)/p(I),

where p(Ai) is the prior probability of model Ai, and p(I/Ai) is the evidence of model Ai, which can be represented by

p(I/Ai) = ∫ p(I/w, Ai)p(w/Ai)dw,

where w = (w1, w2, . . . , wn) is the weight vector.
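For a discrete set of candidate models, the posterior is simply the prior–evidence product normalized over the models. A minimal Python sketch (the prior and evidence values are illustrative numbers, not from the cited source):

```python
def model_posterior(priors, evidences):
    """Posterior probabilities p(Ai/I) from priors p(Ai) and evidences p(I/Ai)."""
    joint = [p * e for p, e in zip(priors, evidences)]
    total = sum(joint)
    return [j / total for j in joint]

# Two candidate models with equal priors; the data favor the second model
post = model_posterior([0.5, 0.5], [0.02, 0.08])
print(post)
```

With equal priors the posterior simply follows the evidence, here 0.2 versus 0.8; in practice the hard part is evaluating the evidence integral p(I/Ai) over the weight space.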
1.2. Bayesian learning

Let p(w) be the distribution of network weights (including the various thresholds), w = (w1, w2, . . . , wn), in the absence of information or samples, I. Given information or samples, the posterior distribution is

p(w/I) = p(I/w)p(w)/∫ p(I/w)p(w)dw.

Without any prior information, the prior distribution, p(w), and conditional distribution, p(I/w), may be represented by exponential distributions (Yan and Zhang, 2000):

p(w) = exp(−cf(w))/h(c),
p(I/w) = exp(−ag(I))/q(a).

The posterior distribution of the network weights can be obtained by

p(w/I) = ∫∫ p(w/c, a, I)p(c, a/I)dc da.
1.3. Bayesian Ying-Yang system
Bayesian Ying-Yang system was proposed to unify various neural network models and learning algorithms (Xu, 1997). Ying-Yang system is the
generalization of Bayesian estimation of weights and Bayesian selection of models.

Perception and association are two of the most important mechanisms in neural network models. They are conducted by supervised or unsupervised learning procedures in the neural networks (Yan and Zhang, 2000), yielding an output (y) from an input (x). Perception aims to realize pattern recognition, extraction of characteristics, etc. Association is performed by neural networks for classification, function approximation, and control.
The relationship between network input, x, and output, y, can be
statistically represented by the joint distribution, p(x, y):
p(x, y) = p(x)p(y/x),
p(x, y) = p(y)p(x/y).
Corresponding to the above there are two models, M1 = {My/x , Mx } and
M2 = {Mx/y , My }. Mx and My are called Yang model (visible model) and
Ying model (invisible model), respectively. The above joint distribution is
thus expressed as
pM1 (x, y) = pMx (x)pMy/x (y/x),
pM2 (x, y) = pMy (y)pMx/y (x/y).
A Yang learning machine and a Ying learning machine are used to realize
pM1 (x, y) and pM2 (x, y), respectively. The learning model is named as a
Ying-Yang system. Learning is performed in the following way (Xu, 1997):
(1) Obtain an appropriate representation;
(2) Design a basic structure of the model;
(3) Determine the scale and size of the model;
(4) Learn the parameters in the model.
The learning algorithm tries to reach
pMx (x)pMy/x (y/x) = pMy (y)pMx/y (x/y).
Various neural network models, like RBF, multi-tier feedforward network,
etc., and learning algorithms, like PCA, Helmholtz machine, EM learning
algorithm, etc., are specific realizations of this theory (Xu, 1997).
2. Randomization, Bootstrap and Monte Carlo Techniques
Randomization, bootstrap and Monte Carlo techniques are widely used in artificial neural network research. Details on these techniques can be found in Gentle (2002), Manly (1997) and Zhang (2007).
2.1. Random numbers
2.1.1. General random numbers
Random numbers are the basis of randomization, bootstrap and Monte Carlo techniques.

To produce uniformly distributed random numbers on (0, 1), the Matlab codes are:

x=rand;
m=rand(5,6);

To arrange the integers 1 ∼ n randomly:

n=10;
x=randperm(n);
The random numbers from the Matlab generator are pseudo-random numbers. The generator resets whenever Matlab is started, so the same sequence of random numbers would be produced. The following codes record the present state (R) of the generator, and set the state to R:

R=rand('state');
rand('state',R);

A different generator state can be set at any time:

rand('state',sum(100*clock));

Theoretically, the Matlab generator may yield 2^1492 different random numbers, which is enough for general research purposes.
2.1.2. Probability distributions and random numbers
Random numbers of normal distribution (norm), χ2 distribution (chi2),
t distribution (t), F distribution (f), β distribution (beta), uniform
distribution (unif), and exponential distribution (exp), etc., can be generated in the Matlab environment. The command suffixes for probability density, cumulative probability distribution, inverse probability distribution, mean and variance, and random numbers are pdf, cdf, inv, stat, and rnd, respectively. Combining a distribution name with the corresponding suffix yields the different probability distributions and random numbers.
x=5;
%Yield a density value of the normal distribution with mean 5
%and standard deviation 30
p=normpdf(x,5,30);
%Yield a random number from the normal distribution with mean 3
%and standard deviation 25
r=normrnd(3,25);
%Yield a value of the cumulative t-distribution with
%8 degrees of freedom
s=tcdf(x,8);
%Yield the 0.95 percentile of the F-distribution with
%degrees of freedom 3 and 9
g=finv(0.95,3,9);
Random numbers for a given probability distribution can be generated from uniformly distributed random numbers (Gentle, 2002; Zhang, 2007). The common techniques include:

(1) Inverse transformation method. Given the probability distribution F(x) and its inverse function F⁻¹, generate a uniformly distributed random number on [0,1], U ∼ U(0,1), and take X = F⁻¹(U).
(2) Convolution method. Given that the random variable Y = X1 + X2 + · · · + Xn, where Xi, i = 1, 2, . . . , n, are independent and share the same probability distribution, first generate Xi, i = 1, 2, . . . , n, and take Y = X1 + X2 + · · · + Xn.
(3) Accept–reject method. Suppose the density function to be sampled is f(x); first find a density function g(x) and a constant c such that f(x) ≤ cg(x); then generate a random number, x, from g(x) and take r = cg(x)/f(x); finally, generate a uniformly distributed random number, u; x is the desired random number if ru < 1, or else repeat the above procedure.
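The inverse transformation method is easily illustrated with the exponential distribution, whose distribution function inverts in closed form. A Python sketch (illustrative, using the standard library generator):

```python
import math
import random

def exp_inverse_transform(lam, n, seed=1):
    """Inverse transformation method for the exponential distribution:
    F(x) = 1 - exp(-lam*x), so x = F^{-1}(u) = -ln(1 - u)/lam.
    """
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

sample = exp_inverse_transform(2.0, 10000)
print(sum(sample) / len(sample))  # close to the theoretical mean 1/lam = 0.5
```

The same recipe works for any distribution whose F⁻¹ is available; when it is not, the convolution or accept–reject methods above are used instead.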
The Matlab codes for generating random numbers of various probability distributions are:

%Generate a matrix of uniformly distributed random numbers
rand(3,4)
%Generate a matrix of normally distributed random numbers,
%N(3, 25^2)
random('norm',3,25,5,7);
%Generate a matrix of F-distributed random numbers with
%degrees of freedom 10 and 20
random('f',10,20,6,8)
%Generate a matrix of binomially distributed random numbers,
%B(5,0.3)
random('bino',5,0.3)
Latin hypercube sampling is a method for designing inputs (McKay et al., 1979). In the Latin hypercube method, the m factors (i.e., inputs) are all uniformly distributed random variables, U(0,1). The i-th realization of the j-th factor is set as

v_j = (p_j(i) + u_j − 1)/n,

where p_j(·), j = 1, 2, . . . , m, are independent random permutations of 1, 2, . . . , n, p_j(i) is the i-th element of the j-th permutation, u_j is a value sampled from U(0,1), and n is the sample size. The v_j disperse completely over the factor space.
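The Latin hypercube formula above can be sketched directly. A Python illustration (the function name is mine, not from McKay et al. (1979)):

```python
import random

def latin_hypercube(n, m, seed=0):
    """Latin hypercube sample: n realizations of m U(0,1) factors,
    v_j = (p_j(i) + u_j - 1)/n with p_j an independent random permutation.
    """
    rng = random.Random(seed)
    cols = []
    for _ in range(m):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        cols.append([(p - 1 + rng.random()) / n for p in perm])
    # Transpose to a list of n points, each with m coordinates
    return [[cols[j][i] for j in range(m)] for i in range(n)]

pts = latin_hypercube(5, 2)
# Each factor has exactly one value in each of the 5 equal bins
for j in range(2):
    print(sorted(int(p[j] * 5) for p in pts))  # [0, 1, 2, 3, 4]
```

The bin check makes the defining property visible: every factor is stratified into n equal intervals with exactly one sample per interval.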
2.1.3. Markov chain and random numbers
A Markov chain can be used to generate random numbers. Different transition matrices yield different methods for producing random numbers (Gentle, 2002). One of these methods is the Metropolis–Hastings algorithm: given the probability density function, pX , of random variable X, and the conditional probability density function, g(Yt+1 |Yt ), of the deviation of the Markov chain, then

(1) assign i = 0, and choose a starting point xi with pX (xi ) > 0;
(2) generate y with probability density function g(y|xi );
(3) calculate the Hastings ratio, r:

r = pX (y)g(xi |y)/[pX (xi )g(y|xi )];

(4) assign xi+1 = y if r ≥ 1; or else generate a uniformly distributed random number, v, and assign xi+1 = y if v < r, or else xi+1 = xi ;
(5) assign i = i + 1, and return to step (2).
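The algorithm can be sketched in Python for a standard normal target with a symmetric Gaussian proposal, for which the Hastings ratio reduces to r = pX (y)/pX (xi ) (a toy sketch under those assumptions; names are ours):

```python
import math
import random

def metropolis_hastings(p, n, x0=0.0, step=1.0):
    """Sample from the (unnormalized) density p with a symmetric Gaussian
    proposal g(y|x) = N(x, step^2); the Hastings ratio is then p(y)/p(x)."""
    x, chain = x0, []
    for _ in range(n):
        y = random.gauss(x, step)             # step (2): propose y
        r = p(y) / p(x)                       # step (3): Hastings ratio
        if r >= 1 or random.random() < r:     # step (4): accept/reject
            x = y
        chain.append(x)
    return chain

random.seed(2)
target = lambda x: math.exp(-x * x / 2.0)     # standard normal, unnormalized
chain = metropolis_hastings(target, 50000)
burned = chain[5000:]                         # discard burn-in
```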
2.1.4. Multivariable random numbers
Multivariate random numbers can be generated from single random variables (Gentle, 2002). Suppose X = (X1 , X2 , . . . , Xn )^T is a random vector, where the Xi , i = 1, 2, . . . , n, are independent and share the same probability distribution with mean 0 and variance 1. Construct a random vector, Y = AX, with covariance matrix AA^T , where |A| ≠ 0. The task is to find A such that AA^T = Σ, where Σ is the desired covariance matrix.
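For a 2 × 2 covariance matrix, the Cholesky factor is one convenient choice of A with AA^T = Σ; a Python sketch (function names ours):

```python
import random

def cholesky2(s):
    """Cholesky factor A of a 2x2 covariance matrix s, so that A A^T = s."""
    a11 = s[0][0] ** 0.5
    a21 = s[1][0] / a11
    a22 = (s[1][1] - a21 * a21) ** 0.5
    return [[a11, 0.0], [a21, a22]]

def mvn2(s, n):
    """Y = A X with X a pair of independent N(0,1) variates, so cov(Y) = s."""
    a = cholesky2(s)
    out = []
    for _ in range(n):
        x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
        out.append((a[0][0] * x1, a[1][0] * x1 + a[1][1] * x2))
    return out

random.seed(3)
sigma = [[4.0, 1.2], [1.2, 1.0]]
ys = mvn2(sigma, 40000)                       # sample covariance approaches sigma
```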
2.2. Randomization-based data partition
Data partition includes cross validation, Jackknife, etc. They are used to
mine more information from a limited data set (Gentle, 2002).
2.2.1. Cross validation
The principle of cross validation is to randomly (or systematically) divide the data set, (xi , yi ), i = 1, 2, . . . , n, into two parts, i.e., a training set (T) and a validation set (V). The training set is used to determine the parameters of a fitted model, y = f(x), and the validation set is used to validate and test the model. Moreover, the training set and validation set can be exchanged to estimate the fitting error:

E(R(Y0 , f(x0 ))) = [Σ_{i∈V} R(Yi , f1 (xi )) + Σ_{i∈T} R(Yi , f2 (xi ))]/n,

where f1 (x) and f2 (x) are the functions fitted using training sets T and V, respectively, and R(Y0 , f(x0 )) is the predictive error. A data set can also be divided into several disjoint subsets (Breiman, 2001), for example, K subsets of similar size. One subset is chosen as the validation set and the remaining subsets as the training set, in order to obtain the predictive error; the error averaged over the K subsets is the predictive error to be estimated.
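The K-subset scheme is K-fold cross validation; a Python sketch with a toy straight-line model y = ax fitted by least squares (the helper names and the model are ours, for illustration only):

```python
import random

def kfold_error(data, k, fit, loss):
    """K-fold cross validation: each fold serves once as validation set V,
    the rest form training set T; the averaged validation loss estimates
    the predictive error."""
    data = data[:]
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        valid = folds[i]
        train = [d for j, f in enumerate(folds) if j != i for d in f]
        model = fit(train)
        errors.append(sum(loss(model, xy) for xy in valid) / len(valid))
    return sum(errors) / k

def fit_line(train):
    """Least-squares slope of y = a*x (no intercept)."""
    sxx = sum(x * x for x, _ in train)
    sxy = sum(x * y for x, y in train)
    return sxy / sxx

random.seed(4)
data = [(x, 2.0 * x + random.gauss(0, 0.1)) for x in [i / 10 for i in range(1, 51)]]
err = kfold_error(data, 5, fit_line, lambda a, xy: (xy[1] - a * xy[0]) ** 2)
```

With noise of standard deviation 0.1, the cross-validated squared error should be close to the noise variance 0.01.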
2.2.2. Jackknife method
In the jackknife method, a data set is systematically divided in order to obtain estimates, e.g., of the variance or mean, from the data set (Gentle, 2002). Suppose the statistic T of a random sample, X1 , X2 , . . . , Xn , is the estimate of the population parameter θ. Divide the data set into r subsets, each with m elements (m is usually 1 or 2). Remove subset i from the data set
and calculate the estimate T−i from the remaining subsets. The estimate of
population parameter θ is thus

T′ = Σ_{i=1}^r T−i /r.

Moreover, the jackknife estimate of T is

J(T) = T* = Σ_{i=1}^r Ti*/r = rT − (r − 1)T′,

where Ti* = rT − (r − 1)T−i . If the Ti* are independent, the variance of T may be estimated with the jackknife variance V(J(T)):

V(J(T)) = Σ_{i=1}^r (Ti* − J(T))²/(r(r − 1)),

or

V(J(T)) = Σ_{i=1}^r (Ti* − T)²/(r(r − 1)).

Suppose m = 1; the bias of T may be expanded as (Gentle, 2002)

D(T) = Σ_{i=1}^∞ ai /n^i .

T is unbiased if ai = 0, i = 1, 2, . . .; T has second-order precision if a1 = 0. The jackknife estimate of the bias of T is

D(J(T)) = E(J(T)) − θ = n Σ_{i=1}^∞ ai /n^i − (n − 1) Σ_{i=1}^∞ ai /(n − 1)^i .
2.3. Bootstrap method
The principle of the bootstrap method is to treat the observed sample as a population and resample from this population. The statistical distribution is then inferred from the conditional distribution of the samples taken from this population. In statistical language (Gentle, 2002), suppose the observed sample is xi , i = 1, 2, . . . , n, the population parameter is θ, and the statistic T is used
to estimate θ. First, the sampling distribution of T should be determined in
order to achieve the confidence interval of θ. Statistic T is the functional of
empirical cumulative distribution function, Pn :

T = T(Pn ) = θ(Pn ) = ∫ g(x)dPn (x).
In the bootstrap method, the observed sample xi , i = 1, 2, . . . , n, is resampled to generate a resampling sample, xi*, i = 1, 2, . . . , n, with corresponding statistic T*. The variance of T is

V(T) = V(T*) = Σ_{j=1}^m (T*j − T̄*)²/(m − 1),

where T*j is the j-th resampled value of T*, T̄* is the average of the T*j, and V(T*) is the variance over the m resamples, all drawn from Pn , each of size n.
The confidence interval of θ can be derived based on the relationship
between θ and T , e.g., f(T, θ), and the confidence interval is (Gentle, 2002;
Zhang, 2007):
P(fα/2 ≤ f(T, θ) ≤ f1−α/2 ) = 1 − α.
fα/2 and f1−α/2 may be approximated with the bootstrap method when the probability distribution of the population is not available; they are determined by percentiles of the Monte Carlo samples of T* − To , where To is the T value of the observed sample.
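A Python sketch of the bootstrap variance estimate for the sample mean (function names ours); with m resamples drawn with replacement from the observed sample, V(T*) approaches σ̂²/n:

```python
import random

def bootstrap_var(sample, stat, m):
    """Resample the observed sample with replacement m times; the variance
    of the resampled statistics T*_j estimates V(T)."""
    n = len(sample)
    ts = [stat([random.choice(sample) for _ in range(n)]) for _ in range(m)]
    tbar = sum(ts) / m
    return sum((t - tbar) ** 2 for t in ts) / (m - 1)

random.seed(5)
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = lambda xs: sum(xs) / len(xs)
v = bootstrap_var(data, mean, 2000)           # close to (32/8)/8 = 0.5
```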
2.4. Monte Carlo method
The Monte Carlo method is used to test the characteristics of statistical methods, approximate the distribution of statistics by asymptotic approximation, and compute the expectation of functions of random variables (Manly, 1987). It is particularly efficient for hidden models without explicit relationships.
The Monte Carlo method has been used in function approximation. For example, suppose we need to estimate F(x) with f(x). Given a set of random inputs xi , i = 1, 2, . . . , n, the corresponding outputs are yi = f(xi ), i = 1, 2, . . . , n. From these outputs, the mean and variance of f(x) can be obtained.
The Monte Carlo method is also used to estimate variances, test statistical significance, etc. A procedure for a statistical test is as follows (Gentle, 2002; Zhang, 2007): generate a random sample from the observed sample and compute the characteristic (statistic) of the random sample; repeat this procedure n times (i.e., n randomizations); finally, test the null hypothesis that the observed samples are from the same distribution. The statistic, p, is

p = r/n,

or

p = (r + 1)/(n + 1),

where r is the number of randomizations whose characteristic is greater than that of the observed sample.
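A Python sketch of such a randomization test for the difference of two group means, using p = (r + 1)/(n + 1) (the data and function names are illustrative, not from the text):

```python
import random

def randomization_test(a, b, n_perm=2000):
    """Randomization test of H0: a and b come from the same distribution.
    Statistic: absolute difference of means. p = (r + 1)/(n + 1), with r the
    number of shuffles whose statistic is at least the observed one."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    r = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            r += 1
    return (r + 1) / (n_perm + 1)

random.seed(6)
g1 = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.2, 5.4]
g2 = [6.0, 6.4, 5.9, 6.2, 6.6, 6.1, 5.8, 6.3]
p = randomization_test(g1, g2)                # the groups barely overlap, so p is small
```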
Random sampling of a data set and generation of subsets of a data set are additional applications of the Monte Carlo method. Moreover, missing data can be generated by the Monte Carlo method (Gentle, 2002). A data matrix X is composed of an observed data block and a missing data block; we can generate m versions of the missing data and analyze the corresponding m complete data matrices.
3. Stochastic Process and Stochastic Differential Equation
Stochastic processes and stochastic differential equations are fundamental to the design and analysis of distributed artificial neural networks.
3.1. Stationary stochastic process
A stationary stochastic process is a stochastic process that meets the
condition
p(x(ti ) = ci |i = 1, 2, . . . , n) = p(x(ti + τ) = ci |i = 1, 2, . . . , n),
∀ti , ci , τ ∈ R.
A stochastic process is reversible if
p(x(ti ) = ci |i = 1, 2, . . . , n) = p(x(τ − ti ) = ci |i = 1, 2, . . . , n),
∀ti , ci , τ ∈ R.
3.2. Markovian process
A stochastic process is a Markovian process if

p(x(ti ) = ci |x(t1 ) = c1 , . . . , x(ti−1 ) = ci−1 ) = p(x(ti ) = ci |x(ti−1 ) = ci−1 ),
∀ti , ci ∈ R.
A Markovian process with both discrete time and discrete states is a Markov chain:

p(x(1) = c1 , . . . , x(t) = ct ) = p(x(t) = ct |x(t − 1) = ct−1 )
× p(x(t − 1) = ct−1 |x(t − 2) = ct−2 ) · · · p(x(2) = c2 |x(1) = c1 )p(x(1) = c1 ),

where p(x(t) = ct |x(t − 1) = ct−1 ), t = 2, 3, . . . , are the state transition probabilities. A Markov chain is stationary if the state transition probabilities are independent of time. A stationary Markov chain with finitely many states is a homogeneous Markov chain if

p(x(t) = cj |x(t − a) = ci ) = p(x(t + a) = cj |x(t) = ci ).
Suppose {x(t)} is a stochastic process. Define

p(u, τ|v, t) = p(x(τ) = u|x(t) = v),
p(u, τ) = p(x(τ) = u),
ω(u|v, t) = lim_{Δt→0} [p(u, t + Δt|v, t) − p(u, t|v, t)]/Δt,

where ω is the transition probability rate; ω(u|v) = ω(u|v, t) if ω is time independent.
For a homogeneous Markov chain in continuous time, suppose 0 ≤ n ≤ N, and define

ω− (n) = ω(n − 1|n),
ω+ (n) = ω(n + 1|n),
ω(m|n) = 0, if m ≠ n ± 1,
and

x = n/N,
Δx = Δn/N = 1/N = ε,
p(x, t) = Np(n, t).

The Fokker–Planck equation is

∂p(x, t)/∂t = −∂[g(x)p(x, t)]/∂x + (ε/2)∂2 [h(x)p(x, t)]/∂x2 ,

where the drift coefficient, g(x), and the diffusion coefficient, h(x), are defined as

g(x) = (ω+ (n) − ω− (n))/N,
h(x) = (ω+ (n) + ω− (n))/N.
Markov process–based models are not suitable for conducting discriminant decision-making. However, the latter can be achieved by artificial
neural networks. Therefore the combination of Markov models and artificial neural networks is advantageous to decision-making. Hidden Markov
Model (HMM) may be used to describe the time series (Yan and Zhang,
2000). In a sense, the forward–backward algorithm of HMM is equivalent
to that of a BP neural network (Hochberg et al., 1991). An HMM is a Markov chain with n states, xi , i = 1, 2, . . . , n. Given the state at time t, x(t), there are m possible outputs, yi , i = 1, 2, . . . , m. A discrete HMM is thus represented by a state transition matrix, P = (pij )n×n , an observation matrix, B = (bij )n×m , and an initial distribution vector, L = (li )n . The probability of an observed output series, yηt , t = 1, 2, . . . , T , is calculated with the forward–backward algorithm (Rabiner, 1989):
αi (0) = li ,
αi (t) = [Σ_{j=1}^n αj (t − 1)pji ] b(i, ηt ),
βi (T) = 1,
βi (t) = Σ_{j=1}^n pij b(j, ηt+1 )βj (t + 1),

where αi (t) is the forward probability and βi (t) is the backward probability. The probability of the entire observed output series is Σ_i αi (t)βi (t).
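The forward pass alone already yields the probability of an output series; a Python sketch with toy matrices (the numbers and function name are illustrative, not from the text):

```python
def hmm_forward(P, B, l, obs):
    """Forward algorithm: alpha_i(1) = l_i * b(i, o1);
    alpha_i(t) = [sum_j alpha_j(t-1) * p_ji] * b(i, ot).
    Returns P(observation sequence) = sum_i alpha_i(T)."""
    n = len(P)
    alpha = [l[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * P[j][i] for j in range(n)) * B[i][o]
                 for i in range(n)]
    return sum(alpha)

# two hidden states, two output symbols (toy numbers)
P = [[0.7, 0.3], [0.4, 0.6]]                  # state transition matrix
B = [[0.9, 0.1], [0.2, 0.8]]                  # observation matrix
l = [0.5, 0.5]                                # initial distribution
prob = hmm_forward(P, B, l, [0, 1, 0])
```

As a sanity check, the probabilities of all possible output sequences of a fixed length sum to 1.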
3.3. Stochastic differential equation
When a continuous dynamic system is perturbed by an external random variable, a noise term, V(t), must be incorporated in its differential equation (the Langevin equation):

dx/dt = f(x, t) + V(t).

The noise term, V(t), is the formal derivative of a random process v(t). The differential equation has the solution

x(t) = x(t0 ) + ∫_{t0}^t f(x, τ)dτ + v(t) − v(t0 ).

It is equivalent to

dx(t) = f(x, t)dt + dv(t).

The above equation is a stochastic differential equation.
A coefficient, w(t), may be included in the noise term to generate a new equation,

dx(t) = f(x, t)dt + w(t)dv(t),

and the corresponding solution is

x(t) = x(t0 ) + ∫_{t0}^t f(x, τ)dτ + ∫_{t0}^t w(τ)dv(τ),

where the latter integral term is defined as a Wiener integral. If w(t) is also dependent on x, the following Ito integral or Stratonovich integral should be used:
(1) Ito integral:

∫_{t0}^t w(x(τ))dv(τ) = lim_{Δti→0} Σ_i w(x(ti−1 ))(v(ti ) − v(ti−1 )).

(2) Stratonovich integral:

∫_{t0}^t w(x(τ))dv(τ) = lim_{Δti→0} Σ_i w(x(τi ))(v(ti ) − v(ti−1 )),
where τi = (ti + ti−1 )/2. The Ito and Stratonovich stochastic differential equations and their solutions are represented by

dxi = fi (x(t), t)dt + Σ_j wij (x(t))dvj (t),

xi (t) = xi (s) + ∫_s^t fi (x, τ)dτ + Σ_j ∫_s^t wij (x(τ))dvj (τ),

i = 1, 2, . . . , n.
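An Ito stochastic differential equation of this form can be integrated numerically with the Euler–Maruyama scheme, the standard discrete analogue of the Ito integral (this scheme is not discussed in the text; the example and names are ours). For an Ornstein–Uhlenbeck-type equation dx = −x dt + 0.5 dv, the stationary variance is 0.25/2 = 0.125:

```python
import math
import random

def euler_maruyama(f, w, x0, t0, t1, steps):
    """Ito-sense numerical solution of dx = f(x,t)dt + w(x)dv(t):
    x_{k+1} = x_k + f(x_k,t_k)*dt + w(x_k)*dW, with dW ~ N(0, dt)."""
    dt = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        x = x + f(x, t) * dt + w(x) * random.gauss(0.0, math.sqrt(dt))
        t += dt
    return x

random.seed(7)
finals = [euler_maruyama(lambda x, t: -x, lambda x: 0.5, 0.0, 0.0, 8.0, 400)
          for _ in range(2000)]
m = sum(finals) / len(finals)                  # near 0
v = sum((x - m) ** 2 for x in finals) / len(finals)   # near 0.125
```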
3.4. Canonical system
A canonical system

dxi /dt = fi (x1 , x2 , . . . , xn ), i = 1, 2, . . . , n,

must satisfy the condition

Σ_i ∂fi /∂xi = 0.
4. Interpolation
Interpolation means using discrete data to construct a functional relationship, or simplifying a given functional relationship, to compute intermediate values. The functional relationship achieved by interpolation is called the interpolation formula. Through interpolation, we can calculate functional values at certain nodes (Li et al., 2001; Burden and Faires, 2001; Zhang, 2007).
Similar to ANNs, interpolation methods are used to generate missing data, smooth data series, construct empirical models, simplify complex models, and predict population dynamics. The interpolation formulae based on algebraic polynomials are discussed below.
Firstly, we introduce the space of continuous functions, C[a, b], the space of all continuous functions on [a, b]. C[a, b] is a normed space, with the norm defined as the ∞-norm

‖f‖ = max_{a≤x≤b} |f(x)|.
It is a metric space with the metric defined as

d(f, p) = ‖f − p‖, f(x), p(x) ∈ C[a, b].

C[a, b] is also a Banach space.
Suppose there are n + 1 distinct nodes, i.e., interpolation nodes, of f(x) ∈ C[a, b] on [a, b],

a ≤ x0 , x1 , . . . , xn ≤ b,

and the functional values at these nodes are f(xi ), i = 0, 1, . . . , n. If Ln (x) ∈ Hn exists, where Hn is the set of polynomials with degree less than or equal to n, such that

Ln (xi ) = f(xi ), i = 0, 1, . . . , n,

then Ln (x) is the interpolation polynomial, i.e., interpolation function, of f(x) on [a, b] based on (xi , f(xi )), i = 0, 1, . . . , n. Ln (x) exists and is unique.
It is possible to calculate approximate values of f(x) from Ln (x) at points x ≠ xi , i = 0, 1, . . . , n. This is called interpolation if x ∈ [a, b], or else extrapolation.
4.1. Lagrange interpolation
The Lagrange interpolation polynomial is

Ln (x) = Σ_{i=0}^n li (x)f(xi ),

where li (x) is the basis function,

li (x) = [(x − x0 )(x − x1 ) · · · (x − xi−1 )(x − xi+1 ) · · · (x − xn )]/
[(xi − x0 )(xi − x1 ) · · · (xi − xi−1 )(xi − xi+1 ) · · · (xi − xn )],
i = 0, 1, . . . , n.
Lagrange interpolation is a linear interpolation if n = 1; a quadratic
interpolation if n = 2; and a cubic interpolation if n = 3.
The error of interpolation, i.e., the remainder of Ln (x), Rn (x) = f(x) − Ln (x), is given by the following theorem: suppose that f(x) is n times
continuously differentiable on [a, b] and f (n+1) (x) exists on (a, b); then the remainder term of the interpolation polynomial, Ln (x), is

Rn (x) = f (n+1) (ξ)ωn+1 (x)/(n + 1)!, x ∈ [a, b],

where ξ = ξ(x) ∈ (a, b), and

ωn+1 (x) = (x − x0 )(x − x1 ) · · · (x − xn ).
4.2. Newton interpolation
Newton interpolation is equivalent to Lagrange interpolation, but its calculation is simpler.
First, define difference quotients with different order:
• First-order difference quotient:
f [xi , xi+1 ] = (f(xi ) − f(xi+1 ))/(xi − xi+1 ),
i = 0, 1, . . . , n − 1
• Second-order difference quotient:
f [xi , xi+1 , xi+2 ] = (f [xi , xi+1 ] − f [xi+1 , xi+2 ])/(xi − xi+2 ),
i = 0, 1, . . . , n − 2
..
.
• n-th order difference quotient:
f [x0 , x1 , . . . , xn ] = (f [x0 , x1 , . . . , xn−1 ]−f [x1 , x2 , . . . , xn ])/(x0 −xn )
Then the Newton interpolation polynomial is

Nn (x) = f(x0 ) + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 ) + · · ·
+ f [x0 , x1 , . . ., xn ](x − x0 )(x − x1 ) · · · (x − xn−1 ).
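A Python sketch of the divided-difference table and a Horner-style evaluation of Nn (x) (function names ours); for a cubic f, the degree-3 interpolant must reproduce f exactly:

```python
def divided_differences(xs, ys):
    """In-place divided-difference table; coef[k] = f[x0, ..., xk]."""
    coef = list(ys)
    n = len(xs)
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - k])
    return coef

def newton_eval(xs, coef, x):
    """Horner-style evaluation of N_n(x) = f(x0) + f[x0,x1](x-x0) + ..."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

xs = [0.0, 1.0, 2.0, 4.0]
ys = [x ** 3 - 2 * x for x in xs]              # cubic test function
c = divided_differences(xs, ys)
val = newton_eval(xs, c, 3.0)                  # exact value is 27 - 6 = 21
```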
4.3. Hermite interpolation
If interpolation conditions are as follows
H(xi ) = f(xi ), H ′ (xi ) = f ′ (xi ),
i = 0, 1, . . . , n,
the following Hermite interpolation polynomial can be used:

H(x) = Σ_{i=0}^n αi (x)f(xi ) + Σ_{i=0}^n βi (x)f ′ (xi ),

where

αi (x) = [1 − 2(x − xi ) Σ_{j=0, j≠i}^n 1/(xi − xj )] ω²n+1 (x)/{(x − xi )2 [ω′n+1 (xi )]2 },

βi (x) = ω²n+1 (x)/{(x − xi )[ω′n+1 (xi )]2 },

ω′n+1 (xi ) = (xi − x0 )(xi − x1 ) · · · (xi − xi−1 )(xi − xi+1 ) · · · (xi − xn ).
Suppose that f(x) ∈ C[a, b] has a derivative of order 2n + 2 on (a, b); then the remainder term of the Hermite interpolation polynomial is

R(x) = f(x) − H(x) = f (2n+2) (ξ)ω²n+1 (x)/(2n + 2)!, x ∈ [a, b],
where ξ = ξ(x) ∈ (a, b).
4.4. Spline interpolation
The above-mentioned interpolation methods become complex and unstable as the order of the polynomial increases; for example, the interpolation error can be considerably large at the end nodes of the interpolation interval. Spline interpolation avoids such issues. It is among the most effective interpolation methods, and the cubic spline function is the most widely used. Cubic spline interpolation is piecewise cubic interpolation whose first- and second-order derivatives are continuous at the inner nodes; it is thus relatively smooth.
Suppose there are n + 1 different nodes on given interval [a, b]
a ≤ x0 < x1 < · · · < xn ≤ b.
If the order of polynomial S(x), on each sub-interval [xi , xi+1 ], is less
than or equal to m and larger than or equal to 1, and its m − 1 order
derivative, S (m−1) (x), is continuous at inner nodes, x1 , . . . , xn−1 , S(x) is
called the m-degree spline function. If f(x) ∈ C[a, b], and S(xi ) = f(xi ),
i = 0, 1, . . . , n, then S(x) is called the m-degree spline interpolation polynomial of f(x) on [a, b].
A cubic spline interpolation polynomial
S(x) = ai x3 + bi x2 + ci x + di , x ∈ [xi , xi+1 ],
i = 0, 1, . . . , n − 1.
should meet the following conditions:
(1) There are in total 2n conditions for interpolation and continuity;
(2) The first- and second-order derivatives should be continuous at inner
nodes
S ′ (xi −0) = S ′ (xi +0), S ′′ (xi −0) = S ′′ (xi +0),
i = 1, 2, . . . , n−1.
Moreover, one of the following three boundary conditions is usually demanded to be met:
(3) S ′ (x0 ) = f ′ (x0 ), S ′ (xn ) = f ′ (xn ),
(4) S ′′ (x0 ) = f ′′ (x0 ), S ′′ (xn ) = f ′′ (xn ),
(5) S ′ (x0 + 0) = S ′ (xn − 0), S ′′ (x0 + 0) = S ′′ (xn − 0).
The cubic spline interpolation polynomial is expressed as

S(x) = Mi+1 (x − xi )3 /(6li ) + Mi (xi+1 − x)3 /(6li )
+ (f(xi+1 )/li − Mi+1 li /6)(x − xi ) + (f(xi )/li − Mi li /6)(xi+1 − x),
x ∈ [xi , xi+1 ], i = 0, 1, . . . , n − 1,

where Mi = S ′′ (xi ), i = 0, 1, . . . , n; li = xi+1 − xi , i = 0, 1, . . . , n − 1.
The Mi , i = 0, 1, . . . , n, are obtained by solving the three-moment equations.
The cubic spline function exists and is unique under certain conditions. Moreover, it has the best-approximation property.
The time-changing survivorship of a moth, Spodoptera litura F., at 20°C, was interpolated using linear interpolation (Zhang, 2007), Hermite interpolation, and spline interpolation (Fig. 1). The following are the Matlab codes:

x=0.5:1:25.5;
%Time: 0.5,1.5,2.5,...,24.5,25.5
fx=[1 1 1 1 1 1 1 0.96 1 1 1 1 0.98 1 0.98 0.92 0.94 0.85 0.9
0.59 0.38 0.33 0.27 0.17 0.17 0.17];
%Survivorship
intp=0.5:0.1:25.5;
%Interpolation points: 0.5,0.6,0.7,...,25.4,25.5
subplot(4,1,1);plot(x,fx,'k.');
Figure 1. Survivorship interpolation of Spodoptera litura F.
ylabel('Observed');
lx=interp1(x,fx,intp,'linear');
%Linear interpolation
LinearInterp=[intp;lx]
%Results
subplot(4,1,2);plot(intp,lx,'k.');
ylabel('Linear Interpolation');
hx=interp1(x,fx,intp,'cubic');
%Cubic Hermite interpolation
HermiteInterp=[intp;hx]
%Results
subplot(4,1,3);plot(intp,hx,'k.');
ylabel('Hermite Interpolation');
sx=interp1(x,fx,intp,'spline');
%Cubic spline interpolation
SplineInterp=[intp;sx]
%Results
subplot(4,1,4);plot(intp,sx,'k.');
ylabel('Spline Interpolation');
xlabel('Time');
The interpolation methods may be extended to higher dimensions (Fig. 2). The Matlab codes for two dimensions are listed (Zhang, 2007):

x=1:5;
%x: 1,2,3,4,5
y=1:5;
%y: 1,2,3,4,5
Figure 2. Two-dimensional interpolation.
fx=[1.2 3.5 4.2 2.5 2.0;1.9 5.8 4.1 3.9 3.5;1.5 4.2 6.6 3.2
2.9;1.4 4.1 3.7 2.9 1.6;1.1 5.8 2.6 2.2 1.1];
xi=1:0.1:5;
yi=1:0.1:5;
subplot(4,1,1);mesh(x,y’,fx);
zlabel(’Observed’);
lx=interp2(x,y,fx,xi,yi’,’linear’);
%Linear interpolation
LinearInterp=lx
%Results
subplot(4,1,2);mesh(xi,yi’,lx);
zlabel(’Linear Interpolation’);
hx=interp2(x,y,fx,xi,yi’,’cubic’);
%Cubic Hermite interpolation
HermiteInterp=hx
%Results
subplot(4,1,3);mesh(xi,yi’,hx);
zlabel(’Hermite Interpolation’);
sx=interp2(x,y,fx,xi,yi’,’spline’);
%Cubic Spline interpolation
SplineInterp=sx
%Results
subplot(4,1,4);mesh(xi,yi’,sx);
zlabel(’Spline Interpolation’);
5. Function Approximation
In ecological studies, we often encounter functions with complex forms and tedious calculation. Function approximation methods can be used to simplify those functions, which simplifies and speeds up the calculation without losing accuracy (Shi and Gu, 1999; Li et al., 2001; Burden and Faires, 2001; Zhang and Barrion, 2006; Zhang, 2007). Some ANNs use function approximation methods as mathematical algorithms (Zhang et al., 2008).
Suppose that f(x) is a function on [a, b]. We try to construct a simple function, p(x), to approximate the given function, f(x). This is function approximation. It can also be stated as follows: for a given function f(x) ∈ C[a, b], find a function p(x) ∈ C[a, b] such that the error between f(x) and p(x) on [a, b] is smallest.
Common error metrics include:

(1) Uniform approximation: find functions pn (x) so that

lim_{n→∞} d(f, pn ) = 0.

(2) Lq approximation: find functions pn (x) so that

lim_{n→∞} ∫_a^b |f(x) − pn (x)|q w(x)dx = 0,

where q ≥ 1 and w(x) is the weight function on [a, b]. It is the quadratic approximation if q = 2. The weight function, w(x), satisfies the following conditions:

(a) w(x) ≥ 0, x ∈ [a, b];
(b) ∫_a^b |x|n w(x)dx exists, n = 0, 1, . . .;
(c) for a continuous function g(x) ≥ 0, g(x) ∈ C[a, b], if

∫_a^b g(x)w(x)dx = 0,

then g(x) ≡ 0 on (a, b).

(3) Least squares estimation: find a function p(x) that minimizes

Σ_{i=1}^m w(xi )(p(xi ) − f(xi ))2 .
We discuss these methods below, using an algebraic polynomial p(x) to approximate f(x).
5.1. Uniform approximation
5.1.1. Existence of algebraic polynomial of uniform
approximation
For uniform approximation, the Weierstrass theorem can be used: suppose f(x) ∈ C[a, b]; then for any ε > 0 there is an algebraic polynomial p(x) such that

‖f(x) − p(x)‖ < ε

uniformly on [a, b]. The Weierstrass theorem shows the existence of an algebraic polynomial of uniform approximation.
5.1.2. Chebyshev’s best uniform approximation
Suppose 1, x, . . . , xn are a group of linearly independent functions on [a, b], and Hn is the set of polynomials with degree less than or equal to n, Hn = span{1, x, . . . , xn } ⊂ C[a, b]. Any pn (x) ∈ Hn can be expressed as

pn (x) = a0 + a1 x + · · · + an xn ,

where ai ∈ R, i = 0, 1, . . . , n.
Suppose f(x) ∈ C[a, b]; then ‖f(x) − pn (x)‖ is defined as the deviation between f(x) and pn (x) on [a, b], and

En = min ‖f(x) − pn (x)‖, pn (x) ∈ Hn ,

is the least deviation. The problem is to find p∗n (x) such that

‖f(x) − p∗n (x)‖ = En = min ‖f(x) − pn (x)‖, pn (x) ∈ Hn .

This is Chebyshev's best uniform approximation; p∗n (x) is the Chebyshev polynomial of best uniform approximation.
Chebyshev's alternate node pattern. Suppose p(x) ∈ C[a, b]. If there is a node pattern xi , i = 1, 2, . . . , n, a ≤ x1 < · · · < xn ≤ b, such that

|p(xi )| = max_{a≤x≤b} |p(x)|, i = 1, 2, . . . , n,

and

p(xi ) = −p(xi+1 ), i = 1, 2, . . . , n − 1,

then it is said to be the Chebyshev alternate node pattern of p(x) on [a, b].
The following Chebyshev theorem shows the existence and uniqueness of the Chebyshev polynomial of best uniform approximation: suppose f(x) ∈ C[a, b] and p(x) ∈ Hn ; then p(x) is the polynomial of best uniform approximation of f(x) if and only if there is a Chebyshev alternate node pattern of f(x) − p(x) on [a, b] which contains at least n + 2 nodes.
By the theorem, we can see that if f(x) ∈ C[a, b], then there is a unique polynomial of best uniform approximation in Hn , and it is a Lagrange interpolation polynomial of f(x).
The polynomial of best uniform approximation, p(x), is often hard to find. Generally, it can be obtained based on the following theorem: suppose f(x) has an (n + 1)-th derivative on [a, b] and f (n+1) (x) does not change sign on [a, b]; then the end nodes a and b belong to the node pattern of f(x) − p(x), if p(x) ∈ Hn is the polynomial of best uniform approximation of f(x).
5.2. Quadratic approximation
If there is a p∗n (x) such that

∫_a^b |f(x) − p∗n (x)|2 w(x)dx = inf ∫_a^b |f(x) − pn (x)|2 w(x)dx, pn (x) ∈ Hn ,

then p∗n (x) is the best quadratic approximation of f(x) on [a, b] with respect to the weight function, w(x).
5.2.1. Orthogonal function system
Suppose f(x), g(x) ∈ C[a, b], and w(x) is the weight function on [a, b]; then the inner product of f(x) and g(x) on [a, b] is

(f, g) = ∫_a^b f(x)g(x)w(x)dx.

The continuous function space C[a, b] with this inner product is an inner product space. f(x) and g(x) are w(x)-weighted orthogonal on [a, b] if (f, g) = 0. If functions ϕ0 (x), ϕ1 (x), . . . , ϕn (x) are such that

(ϕi , ϕj ) = ∫_a^b ϕi (x)ϕj (x)w(x)dx = 0, i ≠ j,
(ϕi , ϕi ) = Ci ,

then {ϕi } is defined as an orthogonal function system on [a, b] with weight function w(x). If Ci ≡ 1, it is an orthonormal function system. If the ϕi (x), i = 0, 1, . . . , are algebraic polynomials, then they are said to be orthogonal polynomials.
The common orthogonal polynomials are as follows:
(1) Legendre polynomials
The Legendre polynomials are defined on [−1, 1], with weight function w(x) ≡ 1, and are obtained by orthogonalizing the system {1, x, . . . , xn , . . .}:

ϕ0 (x) = 1, ϕ1 (x) = x,
ϕn+1 (x) = ((2n + 1)xϕn (x) − nϕn−1 (x))/(n + 1), x ∈ [−1, 1], n = 1, 2, . . .

(2) Chebyshev polynomials
The Chebyshev polynomials are defined on [−1, 1], with weight function w(x) = 1/(1 − x2 )1/2 , and are obtained by orthogonalizing the system {1, x, . . . , xn , . . .}:

ϕ0 (x) = 1, ϕ1 (x) = x,
ϕn+1 (x) = 2xϕn (x) − ϕn−1 (x), x ∈ [−1, 1], n = 1, 2, . . .
(3) Laguerre polynomials
The Laguerre polynomials are defined on [0, ∞), with weight function w(x) = e−x :

ϕ0 (x) = 1, ϕ1 (x) = 1 − x,
ϕn+1 (x) = (2n + 1 − x)ϕn (x) − n2 ϕn−1 (x), x ∈ [0, ∞), n = 1, 2, . . .

(4) Hermite polynomials
The Hermite polynomials are defined on (−∞, ∞), with weight function w(x) = e−x² :

ϕ0 (x) = 1, ϕ1 (x) = 2x,
ϕn+1 (x) = 2xϕn (x) − 2nϕn−1 (x), x ∈ (−∞, ∞), n = 1, 2, . . .

(5) Trigonometric functions
The trigonometric system is defined on [0, 2π], with weight function w(x) ≡ 1. It has the form

ϕ0 (x) = 1,
ϕ1 (x) = cos(x), ϕ2 (x) = sin(x),
ϕ3 (x) = cos(2x), ϕ4 (x) = sin(2x),
ϕ5 (x) = cos(3x), ϕ6 (x) = sin(3x), . . . , x ∈ [0, 2π].
x ∈ [0, 2π].
5.2.2. Linearly independent function system
If ϕk (x) ∈ C[a, b], k = 0, 1, . . . , n − 1, are such that

a0 ϕ0 (x) + a1 ϕ1 (x) + · · · + an−1 ϕn−1 (x) = 0

if and only if a0 = a1 = · · · = an−1 = 0, we say that the ϕk (x), k = 0, 1, . . . , n − 1, are linearly independent. {ϕi } is a linearly independent function system if any finite number of ϕk (x) in the system {ϕi } are linearly independent. Spanned by the linearly independent function
system ϕk (x), k = 0, 1, . . . , n − 1,

S = span{ϕ0 , ϕ1 , . . . , ϕn−1 }
= {a0 ϕ0 (x) + a1 ϕ1 (x) + · · · + an−1 ϕn−1 (x) | a0 , a1 , . . . , an−1 ∈ R}

is a subspace of C[a, b].
5.2.3. Best quadratic approximation
The following theorem shows the existence and uniqueness of the best quadratic (L2) approximation: suppose f(x) ∈ C[a, b]; then f(x) has one and only one best quadratic approximation.
Suppose ϕk (x), k = 0, 1, . . . , n, are linearly independent, and solve the following equations:

(ϕi , ϕ0 )a0 + (ϕi , ϕ1 )a1 + · · · + (ϕi , ϕn )an = (ϕi , f), i = 0, 1, . . . , n.

The best quadratic approximation of f(x) is

a0 ϕ0 (x) + a1 ϕ1 (x) + · · · + an ϕn (x).
A simple way is taking ϕi (x) = xi , w(x) ≡ 1.
Orthogonal function systems, in particular orthogonal polynomials, are usually used as the linearly independent function systems. In that case we obtain the solution of the above equations:

ai = (ϕi , f)/(ϕi , ϕi ),

that is,

ai = ∫_a^b ϕi (x)f(x)w(x)dx / ∫_a^b (ϕi (x))2 w(x)dx, i = 0, 1, . . ., n,

and the ai are said to be the generalized Fourier coefficients of f(x) with respect to the orthogonal function system, {ϕi }. The expansion of f(x),

Σ_{i=0}^∞ ai ϕi (x),

is the generalized Fourier series. As a result, any function f(x) ∈ C[a, b] can be expanded as a generalized Fourier series.
5.3. Least squares approximation
Find the relationship between independent and dependent variables from experimental data (xi , f(xi )), i = 0, 1, . . . , m, so that the error

Σ_{i=0}^m w(xi )(p(xi ) − f(xi ))2

is minimal; p(x) is the least squares approximation. Here, w(x) is the weight function representing the weights of the various data nodes.
By constructing orthogonal functions ϕk (x), k = 0, 1, . . . , n, n ≤ m, the least squares approximation is

Σ_{j=0}^n aj ϕj (x).

The inner product is defined as

(ϕi , ϕj ) = Σ_{k=0}^m w(xk )ϕi (xk )ϕj (xk ),

(ϕi , f) = Σ_{k=0}^m w(xk )ϕi (xk )f(xk ),

and then

ai = (ϕi , f)/(ϕi , ϕi ).
As an example, suppose that the population dynamics of an animal is f(t) ∈ C[0, 2π]; the best quadratic approximation of f(t) is found based on the trigonometric functions. The expansion of f(t) is

p(t) = a0 /2 + Σ_{i=1}^∞ (ai cos it + bi sin it),

where

ai = (1/π) ∫_0^{2π} f(t) cos it dt, i = 0, 1, 2, . . .
bi = (1/π) ∫_0^{2π} f(t) sin it dt, i = 1, 2, . . .
This is just the Fourier series. The Fourier series converges uniformly to f(t) if f ′ (t) is piecewise continuous on [0, 2π]; thus its partial sum can be taken as the best quadratic approximation of f(t). The periodicity of population dynamics can also be analyzed with the Fourier series.
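A Python sketch computing the coefficients ai and bi by discretizing the integrals on [0, 2π] (the function names and the toy "population cycle" are ours); for a signal made of two harmonics, the partial sum with three terms recovers it:

```python
import math

def fourier_partial_sum(f, n_terms, n_grid=2000):
    """Approximate a_i = (1/pi) * int_0^{2pi} f(t)cos(it)dt (and b_i with sin)
    by a Riemann sum on a uniform grid, and return the partial sum p(t)."""
    h = 2 * math.pi / n_grid
    ts = [k * h for k in range(n_grid)]        # periodic, so a plain sum suffices
    a = [(1 / math.pi) * h * sum(f(t) * math.cos(i * t) for t in ts)
         for i in range(n_terms + 1)]
    b = [(1 / math.pi) * h * sum(f(t) * math.sin(i * t) for t in ts)
         for i in range(n_terms + 1)]
    def p(t):
        return a[0] / 2 + sum(a[i] * math.cos(i * t) + b[i] * math.sin(i * t)
                              for i in range(1, n_terms + 1))
    return p

# a toy "population cycle" with two harmonics
f = lambda t: 10 + 3 * math.cos(t) + 1.5 * math.sin(2 * t)
p = fourier_partial_sum(f, 3)
```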
6. Optimization Methods
The error function in a neural network can be minimized through various
optimization methods.
6.1. Steepest descent method
The steepest descent method is a basic unconstrained optimization technique. Given an unconstrained optimization problem, min f(x), where x = (x1 , x2 , . . . , xn )^T and f(x) is a nonlinear function of x, calculate t i ∈ R such that

f(xi + t i pi ) = min_t f(xi + tpi ).

The direction of search, pi , is determined by

pi = −∇f(xi ) = −(∂f/∂x1 , ∂f/∂x2 , . . . , ∂f/∂xn )^T |_{x=xi} .

Finally, the next point, xi+1 = xi + t i pi , is obtained.
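A Python sketch of steepest descent in which the exact line search min_t f(xi + tpi ) is replaced, as a simplifying assumption, by a fixed step length (function name ours):

```python
def steepest_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10000):
    """Move along p_i = -grad f(x_i) with a constant step (a simplification
    of the exact line search) until the gradient norm falls below tol."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# minimize f(x,y) = (x-1)^2 + 4*(y+2)^2, with gradient (2(x-1), 8(y+2))
xmin = steepest_descent(lambda x: [2 * (x[0] - 1), 8 * (x[1] + 2)], [0.0, 0.0])
```

The fixed step must be small enough for convergence; an exact or backtracking line search, as in the text, removes that tuning burden.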
6.2. Conjugate gradient method
Suppose the objective function, f(x), is approximately a quadratic function in the neighborhood of the extreme point, x∗ :

f(x) ≈ a + b^T x + x^T Ax/2.

Calculate

p0 = −b − Ax0 ,
gi = b + Axi ,
βi−1 = ‖gi ‖2 /‖gi−1 ‖2 ,
pi = −gi + βi−1 pi−1 ,
such that

f(xi + t i pi ) = min_t f(xi + tpi );

the next point, xi+1 = xi + t i pi , is thus obtained. The iteration terminates if ‖gi ‖ ≤ ε.
6.3. Newton method
Suppose the objective function, f(x), can be expanded as a quadratic Taylor polynomial in the neighborhood of the point xi:

f(x) ≈ f(xi) + ∇f(xi)^T Δx + Δx^T Ai Δx/2,

where Δx = x − xi and Ai is the Hessian matrix at xi. Calculate ti ∈ R such that

f(xi + ti pi) = min_t f(xi + tpi).

The direction of search, pi, is determined by pi = −(Ai)^{−1}∇f(xi). The next point, xi+1 = xi + ti pi, is therefore obtained.
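A minimal Python sketch of the Newton iteration with the full step ti = 1 (the 2×2 inverse is written out; Rosenbrock's function is a hypothetical test case):

```python
def newton_2d(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton iteration x_{i+1} = x_i + p_i with p_i = -H(x_i)^(-1) grad f(x_i)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if (g[0] ** 2 + g[1] ** 2) ** 0.5 < tol:
            break
        (a, b), (c, d) = hess(x)
        det = a * d - b * c
        p = [-(d * g[0] - b * g[1]) / det,   # -H^{-1} g via the 2x2 inverse
             -(a * g[1] - c * g[0]) / det]
        x = [x[0] + p[0], x[1] + p[1]]
    return x

# Hypothetical test: Rosenbrock's function, minimum at (1, 1).
grad_r = lambda x: [-400 * x[0] * (x[1] - x[0] ** 2) - 2 * (1 - x[0]),
                    200 * (x[1] - x[0] ** 2)]
hess_r = lambda x: [[1200 * x[0] ** 2 - 400 * x[1] + 2, -400 * x[0]],
                    [-400 * x[0], 200.0]]
x_n = newton_2d(grad_r, hess_r, [0.0, 0.0])
```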
7. Manifold and Differential Geometry

Neural field theory was developed on the basis of differential manifolds (Amari, 1985, 1987, 1998; Luo, 2004); it treats the global and macroscopic properties of neural networks and depends on the relationship between probability distributions and neural networks. A probability space is a non-Euclidean space and is thus generally studied using manifold theory. A manifold shows invariant properties under chart transformations and therefore belongs to the domain of differential geometry. A manifold is also a topological space and is thus an important topic in algebraic topology as well as differential geometry; the latter treats the global properties of spaces and manifolds, in particular the relationship between local and global properties (Chen and Chen, 1980; Meng and Liang, 1999; Wu, 1981). In general, a neural network can be represented as a manifold. It may be mapped to a point on a statistical manifold, and the parameters of the neural network are coordinates on the manifold. A neural network model can be embedded into a dual flat manifold. That is, a family of parameterized probability distributions is embedded into a Riemannian
manifold in probability distribution space, and every probability distribution is a point on the manifold and thus represents a neural network. The relationship between manifolds and submanifolds, and the coordination among hierarchical, multi-model neural networks, can be studied by neural field theory and manifold theory; details may be found in the studies of Amari (1985, 1987, 1995, 1998) and Luo (2004).
7.1. Differential manifold
A manifold is the generalization and extension of Euclidean space, curves, and surfaces. The neighborhood of any point on a manifold is homeomorphic to an open set of a Euclidean space. A local isomorphism between the manifold and Euclidean space may be constructed by a homeomorphic mapping (Amari, 1985, 1987; Luo, 2004).

Suppose M is a Hausdorff space and H^n is a closed half-space of R^n. M is an n-dimensional manifold if for every x ∈ M there is an open neighborhood of x which is homeomorphic to R^n or H^n. For x ∈ M, if there is an open neighborhood of x which is homeomorphic to H^n, x is called a boundary point. The set of all boundary points of M is the boundary of M, ∂M. If ∂M = ∅, M is a manifold (without boundary); otherwise it is a manifold with boundary.
Differential manifold and chart. The set M is a topological (differential) manifold if M is a topological space and

(1) M is a Hausdorff space;
(2) M has a countable base of topology;
(3) for any point p ∈ M, there is a neighborhood U of p and a homeomorphic mapping fu : U → fu(U) onto an open set, where (U, fu) is a chart of M.

Cr differential manifold. Given a chart set A = {(U, fu), (V, fv), . . . , (X, fx)} on an m-dimensional manifold M, A is defined as a Cr differential structure of M if the following conditions hold:

(1) {U, V, . . . , X} is an open covering of M;
(2) any two charts in A are Cr compatible;
(3) any chart (U, fu) of M that is Cr compatible with every chart in A must itself be in A.

Given a Cr differential structure on M, M is called a Cr differential manifold.

Cr differential structure. A chart set on M, ψ = {(Uα, fα) | α ∈ A}, is called a Cr differential structure on the n-dimensional manifold M if it satisfies these conditions:

(1) M = ∪α Uα;
(2) fα ∘ fγ^{−1} : fγ(Uα ∩ Uγ) ⊂ R^n → fα(Uα ∩ Uγ) ⊂ R^n and fγ ∘ fα^{−1} : fα(Uα ∩ Uγ) ⊂ R^n → fγ(Uα ∩ Uγ) ⊂ R^n are Cr differential homeomorphisms for any (Uα, fα), (Uγ, fγ) ∈ ψ with Uα ∩ Uγ ≠ ∅;
(3) (U, f) ∈ ψ whenever (U, f) is Cr compatible with every chart in ψ.
Consider a probability distribution family, M = {p(x; θ)}, where x ∈ X is a random variable, p(x; θ) > 0 is the density function of x, and θ = (θ1, θ2, . . . , θn) ∈ Θ is analogous to the chart on a manifold, with Θ ⊂ R^n an open set. M will exhibit a differential manifold structure when p(x; θ) is sufficiently smooth in the neighborhood of every point θ.

Several coordinate functions can be used on a manifold, for example, α = f(s) and β = g(s). A one-to-one coordinate transformation exists between α and β: α = f(g^{−1}(β)), β = g(f^{−1}(α)).
Suppose X is a compact set composed of the states of a system, and X is a smooth manifold in R^n; the observation is

y = f(x), x ∈ X, y ∈ R.

The following Takens embedding theorem is fundamental to time series modeling: given the observation series Y(n) = [y(n), y(n − τ), . . . , y(n − (m − 1)τ)], the state x of the system at time n can be reconstructed by the m-dimensional vector Y(n), where m ≥ 2d + 1 and d is the dimension of the phase space of the system. The minimum m for which the reconstruction works is called the embedding dimension (Yan and Zhang, 2000).
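The delay-coordinate reconstruction named in the theorem can be sketched directly (the scalar series, m, and τ below are hypothetical choices):

```python
import math

def delay_embed(y, m, tau):
    """Delay-coordinate vectors Y(n) = [y(n), y(n - tau), ..., y(n - (m-1)tau)]."""
    start = (m - 1) * tau            # first index with a full history
    return [[y[n - i * tau] for i in range(m)] for n in range(start, len(y))]

# Hypothetical scalar observation series of a smooth system.
y = [math.sin(0.3 * n) for n in range(100)]
Y = delay_embed(y, m=3, tau=5)
```

Each row of Y is one reconstructed state vector; choosing m and τ in practice usually relies on criteria such as false nearest neighbors or mutual information.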
7.2. Riemannian manifold
All tangent vectors at a point p on the manifold M are localized around p and thus form the tangent vector space Tp at p. For the probability distribution
manifold, M = {p(x; θ)}, take l(x; θ) = log p(x; θ); then ∂i = ∂l(x; θ)/∂θi, i = 1, 2, . . . , n, are linearly independent. {∂i} is the basis of Tp, i.e.,

v = Σ_{i=1}^{n} v^i ∂i,

where v is any tangent vector, v ∈ Tp, and v^i is the i-th component of v.
Riemannian manifold. If there is an inner product metric on the tangent vector space at every point of the manifold M,

hij(u) = ⟨∂i, ∂j⟩ = ⟨∂/∂u^i, ∂/∂u^j⟩,  i, j = 1, 2, . . . , n,

M is called a Riemannian manifold. The matrix H = (hij(u)) is called the metric tensor, {∂i} is the basis of Tp, and (U, u^i) is a chart on M.
Suppose x, y are tangent vectors and hx is an inner product function satisfying the following conditions:

(1) the mapping (x, y) → hx(x, y) is bilinear, and hx(x, y) = hx(y, x), ∀x, y ∈ Tx(M);
(2) hx(x, x) ≥ 0, ∀x ∈ Tx(M), and hx(x, x) = 0 only if x = 0;
(3) hx(x, y) is a smooth function on M if both x and y are smooth vector fields.

The length of a tangent vector x is defined by |x|² = ⟨x, x⟩ = x^i x^j hij. If ⟨x, y⟩ = x^i y^j hij = 0, the tangent vectors x and y are orthogonal.
Affine connection. ∇∂j is defined as the intrinsic change of the j-th basis tangent vector when the point θ on the manifold changes to θ + dθ, and ∇∂i ∂j, i.e., the covariant derivative, is defined as the intrinsic change of ∂j when θ changes in the direction of ∂i, where ∇∂i ∂j = Γ^k_{ij}(θ) ∂k(θ) and Γ_{ij,k}(θ) = ⟨∇∂i ∂j, ∂k⟩. Suppose T(M) is the set of smooth vector fields on the Riemannian manifold M; an affine connection on M is a covariant derivative ∇ : T(M) × T(M) → T(M) which satisfies the following condition: for vector fields A, B ∈ T(M), the covariant derivative of B in the direction of A is a vector field C. A curve ρ(t) satisfying ∇ρ′ ρ′ = 0 is called a geodesic curve. Given a family of probability distributions, S = {p(x; θ)}, with l(x; θ) = log p(x; θ), Amari (1985, 1987, 1995, 1998) defined
Γ^α_{ij,k}(θ) as

Γ^α_{ij,k}(θ) = E[(∂i ∂j l(x; θ) + ((1 − α)/2) ∂i l(x; θ) ∂j l(x; θ)) ∂k l(x; θ)].

If α = 1, the geodesic curve is an e-geodesic, and if α = −1, it is an m-geodesic.
7.3. Submanifold
Embedded manifold. For a smooth mapping f : M → N, where M and N are smooth manifolds, such that f is injective and the tangent mapping f∗ : TxM → Tf(x)N is injective, ∀x ∈ M, (M, f) is an embedded manifold of N, and f is the embedding mapping.

Closed submanifold. (M, f) is a closed submanifold of the manifold N if

(1) f(M) is a closed subset of N;
(2) ∀y ∈ f(M), there is a chart (U, u^i) of N such that y ∈ U and f(M) ∩ U is defined by u^{m+1} = u^{m+2} = · · · = u^n = 0, where m = dim M and n = dim N.

If (M, f) is a submanifold of the manifold N and f : M → f(M) is a homeomorphic mapping when f(M) carries the topology of a subspace of N, then (M, f) is called a regular submanifold of N.
7.4. Dual flat manifold
The differential manifold-based neural computation treats the invariant geometrical and topological structures of manifolds composed of neural networks (Amari, 1985, 1987). A structure of differential geometry, the dual flat manifold, has been used in neural networks (Amari, 1995, 1998; Luo, 2004).

Riemannian connection. The Riemannian connection is an affine connection with an invariant Riemannian measure; its components are

Γ^0_{ij,k}(θ) = (∂i gjk + ∂j gik − ∂k gij)/2.

Dual flat manifold. The tangent vector space Tθ is spanned by the basis tangent vectors ei = ∂i l(x; θ), i = 1, 2, . . . , n; the Riemannian measure on the manifold N is represented by gij(θ) = ⟨ei, ej⟩ = E[∂i l(x; θ) ∂j l(x; θ)],
i, j = 1, 2, . . . , n. Covariant derivatives ∇ and ∇*, i.e., dual connections on the manifold N, are dual if the equality

S⟨T, V⟩ = ⟨∇^α_S T, V⟩ + ⟨T, ∇^{−α}_S V⟩

holds, where S, T, and V are arbitrary vector fields on N. If the Riemannian torsion of N with respect to ∇ and ∇* is 0, N is defined as a dual flat manifold (Amari, 1985, 1987, 1995, 1998).
If N is a dual flat manifold, there are affine coordinate systems, θ and η, and potential energy functions, φ(θ) and ϕ(η), such that

gij(θ) = ∂²φ(θ)/(∂θi ∂θj),   θi = ∂ϕ(η)/∂ηi,
g*ij(η) = ∂²ϕ(η)/(∂ηi ∂ηj),   ηi = ∂φ(θ)/∂θi.

The basis vectors ei and e*i are dual, and

⟨ei, e*j⟩ = ⟨∂/∂θi, ∂/∂ηj⟩ = δ^j_i.
A statistical model is a family of probability distributions. A probability space, usually represented by a set of parameters, i.e., a parameter space, is a manifold once a topological structure is defined on it. This is a dual flat manifold (Amari, 1985, 1998; Luo, 2004).

A parameter set θ = (θ1, θ2, . . . , θn) of the probability distribution of a random variable x generates an n-dimensional manifold N = {p(x; θ)}. Take l(x; θ) = log p(x; θ); then the tangent vector space is T^α_θ = {T(x)}, where T(x) = {T^i ∂i l^α(x; θ)}, and

l^α(x; θ) = l(x; θ),  if α = 1,
l^α(x; θ) = 2p(x; θ)^{(1−α)/2}/(1 − α),  if α ≠ 1.

The inner product may be defined for S, T, V ∈ T^α_θ:

⟨S, T⟩ = Eα(S l^α · T l^α),
⟨∇^α_S T, V⟩ = Eα((S T l^α)(V l^α)).
7.5. Manifold of exponential family
If a neural network with a fixed topological structure can be recognized by the probability model of the exponential family with the parameter set θ =
(θ1, θ2, . . . , θn), i.e., the structure of the neural network can be expressed as a probability distribution of the exponential family of the random variable x,

p(x; u) = exp(θi(u) ri(x) + k(x) − ϕ(θ(u))),

where ϕ(θ) is the potential energy function, then the neural network model is called a neural network of the manifold of the exponential family. The probability distribution family, S, is called an n-dimensional manifold of exponential-family distributions; S is a dual flat manifold. When r = (r1, r2, . . . , rn) is a random real vector, the probability distribution is

p(r; θ) = exp(θi ri + k(r) − ϕ(θ)).
By these equalities (Amari, 1985, 1987, 1995, 1998): ∂i l(r; θ) = ri − ∂i ϕ(θ), ∂i ∂j l(r; θ) = −∂i ∂j ϕ(θ), and ∂i ∂j ∂k l(r; θ) = −∂i ∂j ∂k ϕ(θ), the metrics of the manifold S can be obtained:

Eθ(ri) = ∂i ϕ(θ),
gij(θ) = ∂i ∂j ϕ(θ),
Tijk(θ) = ∂i ∂j ∂k ϕ(θ),
Γ^α_{ij,k}(θ) = (1 − α) Tijk(θ)/2.

Neural networks of the manifold of the exponential family include the Boltzmann machine with hidden units, etc. Most neural networks can be described by probability distributions and manifolds of the exponential family (Amari, 1985, 1987, 1995, 1998; Luo, 2004).
7.6. Topological structure of manifold
According to the topological properties of input–output relationships, we may define a representation of the topological structure (Li, 2004; Chen, 1987).

Suppose both X and Y are simplicial complexes. If there is a continuous mapping H : X × I → Y such that X = H(x, 0) and Y = H(y, 1), then X is homotopic to Y, and H is a homotopic mapping.

Suppose X and Y are homotopic, with mappings f : X → Y and g : Y → X such that f ∘ g = IY and g ∘ f = IX; then H(x, t) = t f ∘ g + (1 − t) g ∘ f is the homotopic mapping of X to Y.
Suppose Cp(K) is the group of p-chains of an r-dimensional simplicial complex K, and ∂p : Cp(K) → Cp−1(K) is a homomorphism; ∂p is called the boundary operator (i.e., boundary resolution). A structure of a simplicial complex can be chain-decomposed to obtain the hierarchical structure of chains and trees. For example, an m-dimensional simplicial complex Xm can be decomposed as follows (Li, 2004):

Chain: Xm →(∂m) Xm−1 →(∂m−1) Xm−2 →(∂m−2) · · · →(∂1) X0,

and the tree decomposition applies the same boundary operators ∂m, ∂m−1, . . . , ∂1 level by level, branching at each dimension.
8. Functional Analysis

8.1. Functional representation
Technically, the functional relationship s = f(r), where s = y(t) and r = x(t), is called an operator. The functional relationship y(t) = f(t, x(τ)|τ ≤ t) is generally called a functional (Yan and Zhang, 2000). Usually a linear functional can be represented by

f(t, x(τ)|τ ≤ t) = ∫ g(t, τ) x(τ) dτ.

A common nonlinear functional is the n-th order regular homogeneous functional

f(t, x(τ)|τ ≤ t) = ∫ · · · ∫ g(t, τ1, τ2, . . . , τn) x(τ1) x(τ2) · · · x(τn) dτ1 dτ2 · · · dτn.
A functional, f(t, x(τ)|τ ≤ t), can be expanded as a Volterra series:

f(t, x(τ)|τ ≤ t) = g0(t) + ∫ g(t, τ) x(τ) dτ + ∫∫ g(t, τ1, τ2) x(τ1) x(τ2) dτ1 dτ2
+ · · · + ∫ · · · ∫ g(t, τ1, τ2, . . . , τn) x(τ1) x(τ2) · · · x(τn) dτ1 dτ2 · · · dτn + · · · ,

which is equivalent to a three-tier feedforward artificial neural network (Yan and Zhang, 2000).
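A discrete, truncated (second-order) analogue of the series can be evaluated directly; the kernels and input below are hypothetical:

```python
def volterra_response(x, g0, g1, g2):
    """Truncated discrete Volterra series:
    y(t) = g0 + sum_i g1[i] x(t-i) + sum_{i,j} g2[i][j] x(t-i) x(t-j)."""
    m = len(g1)                       # kernel memory length
    y = []
    for t in range(m - 1, len(x)):
        lin = sum(g1[i] * x[t - i] for i in range(m))
        quad = sum(g2[i][j] * x[t - i] * x[t - j]
                   for i in range(m) for j in range(m))
        y.append(g0 + lin + quad)
    return y

# Hypothetical zeroth-, first-, and second-order kernels with memory 2.
x_in = [1.0, 2.0, -1.0, 0.5]
y_out = volterra_response(x_in, g0=0.1, g1=[0.5, -0.2], g2=[[0.0, 0.3], [0.3, 0.0]])
```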
8.2. Functional analysis
Some methods of functional analysis may be used in the mathematical analysis of ANNs. Several principles and methods are discussed in the following (Rudin, 1991; Liu, 2000; Men and Feng, 2005; Zhang, 2007).
Deflation Principle 1 (the contraction mapping principle). Suppose (X, d) is a complete metric space and T : X → X is a mapping such that

d(Tx, Ty) ≤ θd(x, y),  ∀x, y ∈ X,

where θ ∈ (0, 1); then there is exactly one fixed point x′ ∈ X such that Tx′ = x′.

Deflation Principle 2. Suppose (X, d) is a complete metric space and T : X → X is a mapping; if there is a natural number n0 such that

d(T^{n0}x, T^{n0}y) ≤ θd(x, y),  ∀x, y ∈ X,

where θ ∈ [0, 1), then there is exactly one fixed point x′ ∈ X such that Tx′ = x′.
According to Deflation Principle 1, take x0 ∈ X and conduct the iterative calculation xn+1 = Txn; the sequence {xn} converges, and its limit is the fixed point x′. Moreover, the error of approximating x′ by xn is estimated by

d(xn, x′) ≤ θ^n d(x0, Tx0)/(1 − θ).

The nearer x0 is to Tx0, the smaller the error.
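The iteration and its a priori error bound can be sketched as follows; T = cos is a hypothetical contraction (on [cos 1, 1] its derivative satisfies |sin x| ≤ sin 1 < 0.85):

```python
import math

def fixed_point(T, x0, theta, tol=1e-10, max_iter=1000):
    """Iterate x_{n+1} = T(x_n); stop when the a priori bound
    theta^n * d(x0, T x0) / (1 - theta) falls below tol."""
    d0 = abs(T(x0) - x0)
    x = x0
    for n in range(1, max_iter + 1):
        x = T(x)
        if theta ** n * d0 / (1 - theta) < tol:   # guaranteed error bound
            return x, n
    return x, max_iter

# T = cos contracts with modulus theta = 0.85 on the interval the iterates visit.
x_fix, steps = fixed_point(math.cos, 1.0, theta=0.85)
```

The bound is conservative; the actual iterates typically reach the fixed point much earlier.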
As an example, consider a differential equation
dx/dt = f(x, t),
x|t=t0 = x0 ,
Computational Ecology
where f(x, t) is continuous on R2 , and satisfies the condition
|f(x1 , t) − f(x2 , t)| ≤ K|x2 − x1 |
with respect to x. To show that the above problem has a unique solution in
the neighborhood of t0 :
Take δ > 0, such that Kδ < 1. Define an operator on C[t0 − δ, t0 + δ]
t
f(x(τ), τ)dτ + x0 ,
Tx(t) =
t0
then T is the mapping of R2 to itself, and
t
(f(x1 (τ), τ) − f(x2 (τ), τ))dτ
d(Tx 1 , Tx 2 ) = max
|t−t0|≤δ
≤ max
|t−t0|≤δ
t0
t
K|x2 (τ) − x1 (τ)|dτ
t0
≤ Kδ max |x2 (τ) − x1 (τ)|
|t−t0|≤δ
= Kδd(x1 , x2 ).
The space C[t0 − δ, t0 + δ] is complete, and 0 ≤ Kδ < 1. The
existence and uniqueness of solution is thus shown by the Deflation
Principle.
Suppose X and Z are normed spaces and S(T) is a linear subspace of X. If the mapping T : S(T) → Z satisfies the conditions

T(x + y) = Tx + Ty,  T(αx) = αTx,  ∀x, y ∈ S(T), α ∈ K,

then T is said to be a linear operator from within X into Z, and S(T) is the domain of definition of T; T is said to be a linear operator on X into Z if S(T) = X. f is called a linear functional if it is a linear operator from a normed space X into the number field K. If, for the linear operator T from the normed space X into the normed space Z, there exists M > 0 such that ‖Tx‖ ≤ M‖x‖, ∀x ∈ X, T is called a bounded linear operator. T is continuous if and only if T is bounded.
Hahn–Banach Theorem on a Real Space. Suppose M is a linear subspace of the real linear space X, g : X → R, and

g(x + y) ≤ g(x) + g(y),  g(αx) = αg(x),  ∀x, y ∈ X, α ≥ 0.

Moreover, f is a linear functional on M and

f(x) ≤ g(x),  x ∈ M;

then there is a linear functional p(x) on X such that

p(x) = f(x),  x ∈ M,
−g(−x) ≤ p(x) ≤ g(x),  x ∈ X.
Riesz Theorem 1. Suppose f is a bounded linear functional on C[a, b]; then there is a function of bounded variation, v(t), on [a, b] such that

f(x) = ∫_a^b x(t) dv(t),  x ∈ C[a, b],
‖f‖ = V(v),

where V(v) is the total variation of v(t) on [a, b]. Moreover, from any function of bounded variation v(t) on [a, b], we may define a bounded linear functional on C[a, b] through the above expression.
Riesz Theorem 2. Suppose H is a Hilbert space and f is an arbitrary bounded linear functional on H; then there is exactly one yf ∈ H such that

f(x) = (x, yf),  ∀x ∈ H,
‖f‖ = ‖yf‖.
Hilbert–Schmidt Theorem. Suppose T is a self-adjoint compact operator on a Hilbert space H; then there is an orthonormal system {en}, composed of eigenvectors corresponding to eigenvalues {λn}, λn ≠ 0, such that

x = Σn αn en + x0,  with Tx0 = 0,
Tx = Σn λn αn en.
If {en} is infinite, then lim_{n→∞} λn = 0.
The decomposition of eigenspace is an important topic in signal processing.
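In finite dimensions the spectral expansion Tx = Σn λn αn en can be illustrated with a symmetric matrix; the dominant eigenpair may be found, for example, by power iteration. A Python sketch (the matrix and iteration count are hypothetical):

```python
def power_iteration(A, n_iter=200):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    n = len(A)
    v = [1.0] * n
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]                 # normalized eigenvector estimate
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(n))
                  for i in range(n))                # Rayleigh quotient v.Av
    return lam, v

A = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 3 and 1, eigenvectors (1,1) and (1,-1)
lam, v = power_iteration(A)
```

Deflating A by λ v v^T and repeating would recover the remaining eigenpairs, mirroring the expansion in the theorem.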
9. Algebraic Topology

The homotopy method (Lin, 1998) has been used to solve the local minimum problem in the BP algorithm.
To use the homotopy method to find the zero point of a nonlinear function f(x), we need a related, simpler function g(x). First the zero point of g(x) is obtained, and then, gradually via a transition, the zero point of f(x) is obtained. A homotopy function is constructed as follows:

H(t, x) = (1 − t)g(x) + tf(x),

where t is the parameter variable. During the training, t gradually changes from 0 to 1, and

H(0, x) = g(x), if t = 0: the zero point is easy to obtain;
H(1, x) = f(x), if t = 1: the zero point is what we need.

The trajectory x0(t) of the zero point of H(t, x) is traced as t changes from 0 to 1, and the solution moves from x0(0) to x0(1); the zero point is thus obtained.
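This continuation can be sketched numerically: step t from 0 to 1 and polish the tracked zero with a few Newton corrections at each step. In the Python sketch below, g, f, the step counts, and the numerical derivative are all hypothetical choices:

```python
def homotopy_zero(f, g, x0, steps=100, newton_iters=20):
    """Track the zero of H(t, x) = (1 - t) g(x) + t f(x) as t goes from 0 to 1.
    At each t the zero is polished by Newton steps with a numerical derivative."""
    H = lambda t, x: (1 - t) * g(x) + t * f(x)
    x = x0                                   # zero point of g at t = 0
    for k in range(1, steps + 1):
        t = k / steps
        for _ in range(newton_iters):
            h = 1e-7
            d = (H(t, x + h) - H(t, x - h)) / (2 * h)   # central difference
            x -= H(t, x) / d
    return x

f = lambda x: x ** 3 - 2 * x - 5    # zero point we need (near x = 2.0946)
g = lambda x: x - 1.0               # related, simpler function; zero at x = 1
x_root = homotopy_zero(f, g, 1.0)
```

Here dH/dx stays positive along the whole path, so the tracked zero never turns back; in general, path-following needs safeguards at turning points.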
10. Motion Stability

Given the motion equation of a system

dx/dt = f(x, t),

where

x(t) = (x1(t), x2(t), . . . , xn(t))^T,
f(x, t) = (f1(x, t), f2(x, t), . . . , fn(x, t))^T,
the system can be categorized by the form of f(x, t): if f(x, t) depends nonlinearly on x and explicitly on t, it is a nonlinear nonstationary system; if f(x, t) = f(x), a nonlinear stationary system; if f(x, t) = Ax, a linear stationary system; and if f(x, t) = A(t)x, a linear nonstationary system.
10.1. Motion stability and discrimination
Suppose x ∈ Ω ⊂ R^n, t ∈ I = (t1, t2), the domain of definition of f(x, t) is Ω × I, and f(x, t) is continuous on Ω × I. If there is a constant K such that

|f(x, t) − f(y, t)| ≤ K|x − y|,  ∀x, y ∈ Ω, t ∈ I,

then f(x, t) is said to satisfy the Lipschitz condition on Ω × I. f(x, t) satisfies the Lipschitz condition if the partial derivatives ∂fi(x, t)/∂xj, i, j = 1, 2, . . . , n, are finite. Given a system dx/dt = f(x, t), if f(x, t) is continuous and satisfies the Lipschitz condition on Ω × I, then there is a constant c > 0 and a unique solution x = x(t) on [t0 − c, t0 + c] for any (x0, t0) ∈ Ω × I, where x(t) is continuous and x(t0) = x0.
Suppose z(t), y(t), and x(t) are the given motion, the perturbation, and the observed motion, respectively, i.e., z(t) = x(t) − y(t). It is obvious that

dy/dt = f(y(t) + z(t), t) − f(z(t), t) = g(y, t).

Given the equation of the given motion, dz/dt = f(z, t), the perturbation equation is

dy/dt = g(y, t),  g(0, t) = 0.
Inside a B-neighborhood of y(t) = 0, i.e.,

{(y1(t), y2(t), . . . , yn(t)) | |yi(t)| < B, i = 1, 2, . . . , n},

given a real value 0 < ε < B, if there is a real value δ = δ(ε, t0) such that the perturbed motion yi(t) satisfies

|yi(t)| < ε,  i = 1, 2, . . . , n;  ∀t ≥ t0,

whenever |yi(0)| ≤ δ, i = 1, 2, . . . , n, the given motion z(t) is said to be stable. If the given motion z(t) is stable and

lim_{t→∞} yi(t) = 0,  i = 1, 2, . . . , n,
it is asymptotically stable. If there is a real value ε > 0 such that no value δ satisfying the stability condition exists, the given motion is unstable.

If x(t) = xe = (x1e, x2e, . . . , xne)^T, i.e., f(xe, t) = 0, the system is in an equilibrium state, and xe is the equilibrium state. For any real value ε > 0, if there is a value δ such that

|xi(t) − xie| < ε,  i = 1, 2, . . . , n;  ∀t ≥ t0,

whenever |xi(0) − xie| ≤ δ, i = 1, 2, . . . , n, xe is stable. If xe is stable and

lim_{t→∞} xi(t) = xie,  i = 1, 2, . . . , n,

the state xe is asymptotically stable. If there is a real value ε > 0 such that no value δ satisfying the stability condition exists, the state xe is unstable (Yan and Zhang, 2000).
If the nonlinear function f(x, t) is sufficiently smooth, the system can be linearized around the equilibrium state xe:

dΔx/dt = A(t)Δx(t),

by taking x(t) = xe + Δx(t) and f(x, t) ≈ A(t)Δx(t), where A(t) = ∂f(x, t)/∂x|x=xe. Suppose A(t) = A is constant and nonsingular, |A| ≠ 0: if the eigenvalues of A are negative real values, xe is a stable node; if the eigenvalues are conjugate complex numbers with negative real parts, xe is a stable focal point.
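For 2 × 2 linearizations this eigenvalue test is easy to sketch; the damped-oscillator matrix below is a hypothetical example of A = ∂f/∂x at xe:

```python
import cmath

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix A via the characteristic polynomial."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def classify_equilibrium(A):
    """Negative real eigenvalues -> stable node; complex conjugate
    eigenvalues with negative real parts -> stable focal point."""
    l1, l2 = eigenvalues_2x2(A)
    if abs(l1.imag) < 1e-12 and l1.real < 0 and l2.real < 0:
        return "stable node"
    if abs(l1.imag) > 1e-12 and l1.real < 0:
        return "stable focal point"
    return "other"

# Hypothetical linearization: a damped oscillator, eigenvalues (-1 ± i√7)/2.
A = [[0.0, 1.0], [-2.0, -1.0]]
kind = classify_equilibrium(A)
```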
Consider a real function V(x) defined in a neighborhood Ω of the origin, with V(0) = 0, V single valued on Ω, and V continuously differentiable with respect to xi, i = 1, 2, . . . , n. V(x) is positive definite (negative definite) if for any x ∈ Ω, except x = 0, V(x) > 0 (V(x) < 0); V(x) is positive semi-definite (negative semi-definite) if for any x ∈ Ω, V(x) ≥ 0 (V(x) ≤ 0); V(x) is sign-changing if V(x) takes positive, zero, and negative values for different x ∈ Ω.
Based on the definition of V(x), the Liapunov theorem, which is used to discriminate the stability of nonlinear systems, is as follows:

(1) the system is stable at the origin if there is a positive (negative) definite function V(x) in a neighborhood Ω of the origin and the derivative of V(x) is negative (positive) semi-definite;
(2) the system is asymptotically stable at the origin if there is a positive (negative) definite function V(x) in a neighborhood Ω of the origin and the derivative of V(x) is negative (positive) definite;
(3) the system is unstable at the origin if there is a function V(x) in a neighborhood Ω of the origin whose derivative is positive (negative) definite, but V(x) itself is not negative (positive) semi-definite.
10.2. Stability of feedback networks
Feedback neural networks are nonlinear dynamic systems, and therefore the stability of the system is an important topic (Yan and Zhang, 2000). The stability of neural networks can be analyzed in two ways. One way is to treat a neural network as a deterministic system and describe it with a group of nonlinear differential equations; the other is to treat it as a stochastic system and describe it with a group of nonlinear stochastic differential equations.

A feedback neural network may correspond to a model of continuous time–continuous state, discrete time–discrete state, discrete time–continuous state, or continuous time–discrete state.
Given a completely connected feedback neural network that contains n neurons, in which every neuron is connected to the remaining n − 1 neurons, the weight between neuron i and neuron j is wij. The output of the i-th neuron at time t is xi(t) = 1 or −1, i = 1, 2, . . . , n. The behavior of the feedback neural network can be considered a process of state transition, i.e., a dynamic process (Luo, 2004), represented by

dx/dt = −x(t) + f(Wx),

where W = (wij) is the weight matrix and f is the activation function. The stability of the system can be discriminated by using the Liapunov theorem.
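Such state transitions can be sketched numerically. The Python sketch below makes hypothetical choices: f = tanh (continuous states rather than the ±1 outputs above), a symmetric weight matrix (which admits an energy/Liapunov function), and simple Euler integration:

```python
import math

def simulate_feedback(W, x0, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = -x + f(Wx) with activation f = tanh."""
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        wx = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [xi + dt * (-xi + math.tanh(wi)) for xi, wi in zip(x, wx)]
    return x

# Hypothetical symmetric weights; the trajectory settles on a fixed point
# satisfying x = tanh(Wx).
W = [[0.0, 1.5], [1.5, 0.0]]
x_final = simulate_feedback(W, [0.2, -0.1])
```

With symmetric weights the state converges to an equilibrium rather than oscillating, which is the behavior the Liapunov analysis predicts.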
11. Entropy of a System

The probability model of a dynamic system can be developed based on the principle of maximum entropy and the principle of minimum relative information.
11.1. Principle of maximum entropy
If the prior distribution of micro-states is not given, the principle of maximum entropy is expressed as the following optimization problem:

max H = −Σ_{i=1}^{n} pi log pi,
Σ_{i=1}^{n} pi f(xi) = f̄(x),
Σ_{i=1}^{n} pi = 1,

where x(t) = (x1(t), x2(t), . . . , xn(t))^T; f(x) = (f(x1), f(x2), . . . , f(xn))^T ≥ 0 is the state function, which is an analogue of the energy function; f̄(x) is the mean of f(x); and pi is the occurrence probability of the micro-state
xi. Using the Lagrange multiplier method, the maximum entropy is obtained as Hmax = λ + µf̄(x), where µ ≥ 0 is the Lagrange parameter (1/µ is the analogue of temperature in thermodynamics), and

L(µ) = Σ_{i=1}^{n} exp(−µf(xi)),
λ = log L(µ),
pi = exp(−µf(xi))/L(µ),  i = 1, 2, . . . , n.

11.2. Principle of minimum relative information
If the prior distribution of micro-states, (p01, p02, . . . , p0n), is given, the principle of minimum relative information is represented by

min I = Σ_{i=1}^{n} pi log(pi/p0i),
Σ_{i=1}^{n} pi f(xi) = f̄(x),
Σ_{i=1}^{n} pi = 1.
The principle of minimum relative information minimizes the difference between {pi} and {p0i}. This principle is a generalization of the principle of maximum entropy and is applicable to both continuous and discrete systems. For a continuous system, the principle of minimum relative information is represented by

min I = ∫ p(x) log(p(x)/p0(x)) dx,
∫ p(x)f(x) dx = f̄(x),
∫ p(x) dx = 1.

The solution to this problem is

L(µ) = ∫ p0(x) exp(−µf(x)) dx,
p(x) = p0(x) exp(−µf(x))/L(µ),
Imin = −µf̄(x) − log L(µ).
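The discrete maximum entropy solution of Sec. 11.1 can be checked numerically by scanning the Lagrange parameter µ until the mean constraint is met. A Python sketch (the four-state energy function and the target mean are hypothetical):

```python
import math

def max_entropy_dist(f_vals, f_mean, mus):
    """Scan the Lagrange parameter mu for p_i = exp(-mu f(x_i)) / L(mu)
    so that the constraint sum_i p_i f(x_i) = f_mean is best satisfied."""
    best = None
    for mu in mus:
        L = sum(math.exp(-mu * fi) for fi in f_vals)      # partition function
        p = [math.exp(-mu * fi) / L for fi in f_vals]
        mean = sum(pi * fi for pi, fi in zip(p, f_vals))
        if best is None or abs(mean - f_mean) < best[0]:
            best = (abs(mean - f_mean), mu, p)
    return best[1], best[2]

# Hypothetical example: four micro-states with state function ("energy") f(x_i);
# constrain the mean to 1.2, below the uniform mean 1.5, so mu > 0.
f_vals = [0.0, 1.0, 2.0, 3.0]
mus = [k * 0.001 for k in range(5000)]   # scan mu over [0, 5)
mu, p = max_entropy_dist(f_vals, 1.2, mus)
```

In practice one would solve the constraint equation for µ by root finding rather than a scan; the scan keeps the sketch transparent.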
11.3. Principle of minimum mean energy

The principle of minimum mean energy describes the degree of convergence of a system to its limit state, given a certain degree of disorder. The principle of minimum mean energy is represented by

min f̄(x) = Σ_{i=1}^{n} pi f(xi),
−Σ_{i=1}^{n} pi log pi = E,
Σ_{i=1}^{n} pi = 1.
The solution is

L(µ) = Σ_{i=1}^{n} exp(−µf(xi)),
pi = exp(−µf(xi))/L(µ),  i = 1, 2, . . . , n.

11.4. Probability distribution with maximum entropy
The entropy of the exponential distribution

p(x) = λ exp(−λx),  x > 0, λ > 0;
p(x) = 0,  x ≤ 0,

is H = 1 − log λ. Among all density functions with the same mean, the exponential distribution has the maximum entropy.
The entropy of the normal distribution is

H = −∫_{−∞}^{∞} p0(x) log p(x) dx,

where p0(x) is an arbitrary density function having the same mean and variance as the normal distribution p(x). Among all density functions with the same mean and variance, the normal distribution has the maximum entropy.
12. Distance or Similarity Measures

Measures of distance and similarity are frequently used in the design of neural network models. There are many such measures (Zhang and Fang, 1982; Zhang, 2007). Given two n-dimensional vectors, x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn), the mathematical representations of some measures are described below.
12.1. Similarity measures
(1) Angular cosine:

s = xy^T/((xx^T)(yy^T))^{1/2}

(2) Alliance coefficient:

s = (r²/(r² + n..))^{1/2}

where there are p discrete values, t1, t2, . . . , tp, for x, and q discrete values, r1, r2, . . . , rq, for y; nkl is the number of elements for which x has tk and y has rl, k = 1, 2, . . . , p; l = 1, 2, . . . , q; and

r² = n.. (Σ_{i=1}^{p} Σ_{j=1}^{q} n²ij/(ni. n.j) − 1),
n.. = Σ_{i=1}^{p} ni.,
ni. = Σ_{j=1}^{q} nij,
n.j = Σ_{i=1}^{p} nij
(3) Linkage coefficient 1:
s = (r 2 /(n.. max(p − 1, q − 1)))1/2
(4) Linkage coefficient 2:
s = (r 2 /(n.. min(p − 1, q − 1)))1/2
(5) Linkage coefficient 3:
s = (r 2 /(n.. ((p − 1)(q − 1))1/2 ))1/2
(6) Point correlation coefficient:
s = (ad − bc)/((a + b)(c + d)(a + c)(b + d))1/2
where a is the number of element pairs for which both x and y are zero; d is the number of element pairs for which both x and y are non-zero; b is the number of element pairs for which x is zero and y is non-zero; c is the number of element pairs for which x is non-zero and y is zero.

(7) Quarter correlation coefficient:

s = sin(((a + d − (b + c))/(a + b + c + d)) · π/2)
(8) Angular cosine variant 1:
s = (a ∗ a/((a + b)(a + c)))1/2
(9) Angular cosine variant 2:
s = (a ∗ a ∗ d ∗ d/((a + b)(a + c)(b + d)(c + d)))1/2
12.2. Distance measures
(1) Euclidean distance:
d = ((x − y)(x − y)T )1/2 /n
(2) Manhattan distance:

d = Σ_{k=1}^{n} |xk − yk|/n

(3) Chebyshov distance:

d = max_k |xk − yk|
(4) Jaccard coefficient:
d = (bx + by)/(cx + cy − a)

where bx is the number of element pairs for which x is non-zero and y is zero; by is the number of element pairs for which y is non-zero and x is zero; cx and cy are the numbers of non-zero elements of x and y, respectively; a is the number of element pairs for which both x and y are non-zero.
The Matlab code for the measures above is as follows.

Euclidean distance:

function distance=euclideandis(x,y)
%x and y: two vectors to be tested.
%label1
if (max(size(x))~=max(size(y)))
error('Array sizes do not match.');
end
if ((min(size(x))~=1) | (min(size(y))~=1))
error('Both x and y must be vectors.');
end
%label2
distance=sqrt(sum((x-y).^2))/max(size(x));
Manhattan distance:

function distance=manhattandis(x,y)
%x and y: two vectors to be tested.
%insert here the contents between label1 and label2 in the code of Euclidean distance
distance=sum(abs(x-y))/max(size(x));
Chebyshov distance:

function distance=chebyshovdis(x,y)
%x and y: two vectors to be tested.
%insert here the contents between label1 and label2 in the code of Euclidean distance
distance=max(abs(x-y));
Jaccard coefficient:

function distance=jaccarddis(x,y)
%x and y: two vectors to be tested.
%insert here the contents between label1 and label2 in the code of Euclidean distance
bb=0; cc=0; dd=0; nn1=0; rr1=0;
for kk=1:max(size(x))
if (x(kk)~=0) nn1=nn1+1; end
if (y(kk)~=0) rr1=rr1+1; end
if ((x(kk)==0) & (y(kk)~=0)) bb=bb+1; end
if ((x(kk)~=0) & (y(kk)==0)) cc=cc+1; end
if ((x(kk)~=0) & (y(kk)~=0)) dd=dd+1; end
end
distance=(cc+bb)/(nn1+rr1-dd);
Angular cosine:

function similarity=angularcosinesim(x,y)
%x and y: two vectors to be tested.
%insert here the contents between label1 and label2 in the code
%of Euclidean distance
aa=sum(x.*y,2);
bb=sum(x.^2,2);
cc=sum(y.^2,2);
similarity=aa./sqrt(bb.*cc);
Alliance coefficient
function similarity=linkagesim(x,y)
%insert here the contents between label1 to lable2 in the codes
of Euclidean distance
pn=1; qn=1;
%label3
pp(1)=y(1); ww(1)=x(1);
for kk=1:max(size(x))
jj=0;
for ii=1:pn if (y(kk)˜=pp(ii)) jj=jj+1; end; end
if (jj==pn) pn=pn+1; pp(pn)=y(kk); end
jj=0;
for ii=1:qn if (x(kk)˜=ww(ii)) jj=jj+1; end; end
if (jj==qn) qn=qn+1; ww(qn)=x(kk); end
end
for kk=1:pn
for jj=1:qn
temp(kk,jj)=0;
for ii=1:
max(size(x)) if ((y(ii)˜=pp(kk))&(x(ii)˜=ww(jj)))
temp(kk,jj)=temp(kk,jj)+1; end; end
end
end
summ=0;
for kk=1:pn
pp(kk)=0;
for jj=1:qn pp(kk)=pp(kk)+temp(kk,jj); end
summ=summ+pp(kk);
end
for kk=1:qn
ww(kk)=0;
for jj=1:pn ww(kk)=ww(kk)+temp(jj,kk); end;
end
xsquare=0;
for kk=1:pn
for jj=1:qn
xsquare=xsquare+temp(kk,jj)*temp(kk,jj)/(pp(kk)*ww(jj)); end;
end
xsquare=summ*(xsquare-1);
%label4
similarity=sqrt(xsquare/(xsquare+summ));
Linkage coefficient 1:
function similarity=colinkage1sim(x,y)
%insert here the contents between label1 and label2 in the codes
%of Euclidean distance
%insert here the contents between label3 and label4 in the codes
%of alliance coefficient
similarity =sqrt(xsquare/(summ*max(pn-1,qn-1)));
Linkage coefficient 2:
function similarity=colinkage2sim(x,y)
%insert here the contents between label1 and label2 in the codes
%of Euclidean distance
%insert here the contents between label3 and label4 in the codes
%of alliance coefficient
similarity=sqrt(xsquare/(summ*min(pn-1,qn-1)));
Linkage coefficient 3:
function similarity = colinkage3sim(x,y)
%insert here the contents between label1 and label2 in the codes
%of Euclidean distance
%insert here the contents between label3 and label4 in the codes
%of alliance coefficient
similarity=sqrt(xsquare/(summ*sqrt((pn-1)*(qn-1))));
Point correlation coefficient:
function similarity=pointcorresim(x,y)
%insert here the contents between label1 and label2 in the codes
%of Euclidean distance
aa=0; bb=0; cc=0;
%label5
dd=0;
%label6
for kk=1:max(size(x))
if ((x(kk)==0)&(y(kk)==0)) aa=aa+1; end
if ((x(kk)==0)&(y(kk)~=0)) bb=bb+1; end
if ((x(kk)~=0)&(y(kk)==0)) cc=cc+1; end
if ((x(kk)~=0)&(y(kk)~=0)) dd=dd+1; end
%label7
end
%label8
similarity =(aa*dd-bb*cc)/sqrt((aa+bb)*(cc+dd)*(aa+cc)*(bb+dd));
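A worked example of the point correlation (phi) coefficient for binary vectors (assuming the label1 to label2 validation code has been inserted):

```matlab
x=[1 0 1 0]; y=[1 0 0 1];
s=pointcorresim(x,y)
% aa=1, bb=1, cc=1, dd=1, so s=(1*1-1*1)/sqrt(2*2*2*2)=0,
% i.e., no association between the two binary patterns
```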
Quadrant correlation coefficient:
function similarity=quadraticcorresim(x,y)
%insert here the contents between label1 and label2 in the codes
%of Euclidean distance
%insert here the contents between label5 and label8 in the codes
%of point correlation coefficient
similarity=sin((aa+dd-(bb+cc))/(aa+bb+cc+dd)*pi/2);
Angular cosine variant 1:
function similarity = angularcosine1sim(x,y)
%insert here the contents between label1 and label2 in the codes
%of Euclidean distance
%insert here the contents between label5 and label8 in the codes
%of point correlation coefficient (the dd-related lines, between
%label5 and label6 and just before label7, may be omitted)
similarity = sqrt(aa*aa/((aa+bb)*(aa+cc)));
Angular cosine variant 2:
function similarity=angularcosine2sim(x,y)
%insert here the contents between label1 and label2 in the codes
%of Euclidean distance
%insert here the contents between label5 and label8 in the codes
%of point correlation coefficient
similarity = sqrt(aa*aa*dd*dd/((aa+bb)*(aa+cc)*(bb+dd)*(cc+dd)));
CHAPTER 11
Matlab Neural Network Toolkit
Matlab is one of the most popular software packages for scientific and engineering
computation in the world. A neural network toolbox (Neural Network
Toolbox) is provided in Matlab (Mathworks, 2002). As Matlab is updated
with new versions, more neural network models are added to the
toolbox. Basic network models in Matlab include the perceptron, linear
networks, the BP network, the RBF network, self-organizing networks, the
Elman network, feedback networks, etc. Matlab provides various learning
algorithms. Moreover, it permits users to independently design their own
neural network models. Some Matlab functions for neural networks are
described in this chapter. More details can be found in Mathworks (2002)
and Fecit (2002).
1.
Functions of Perceptron
1.1. Neural network functions
(1) newp
Newp creates a perceptron used for simple classification (Mathworks,
2002; Fecit, 2002).
Syntax:
net=newp;
net=newp(mr,s,tf,lf);
where mr: r × 2 matrix of minimum and maximum values of r input
elements; s: number of neurons; tf: transfer function, i.e., hardlims,
or hardlim (default); lf: learning function, i.e., learnpn, or learnp
(default).
Default functions used in newp are adaptFcn (adapt function):
trains; gradientFcn (gradient function): calcgrad; initFcn (initialization function): initlay; performFcn (performance function): mae;
trainFcn (training function): trainc.
Examples:
net=newp([-1 1; -5 5],2,'hardlims','learnpn');
net=newp([-2 1; -3 9; -1 1],3);
(2) init
Init assigns initial parameter values for iteration.
Syntax:
net=init(net0);
net=init(net0,var,mea,sp);
where net0: original network; net: network with initial parameter values; var: variance of initial parameters (default: 1); mea: mean of initial
parameters (default: [], i.e., the parameter values of net0); sp: stability
requirement of predictor or system (sp=s (system is stable), p (predictor is
stable), or b (both system and predictor are stable); default: p, i.e., predictor
is stable).
(3) sim
Sim simulates a Simulink model with user’s parameter settings in
dialog box.
Syntax:
[tim,sta,out]=sim(net,tspn,opt,optin);
[tim,sta,out1,out2,...,outn]=sim(net,tspn,opt,optin);
where tim: time vector; sta: state (matrix or structure format); out:
output (matrix or structure format); out1,out2,...,outn: outputs of
n root-level outport blocks; net: a network model; tspn: time span (a
time, or time interval, or specified time points); opt: optional parameters;
optin: optional input.
Example:
p=[1 2 3 4 5 6 7 8 9 10];
out=sim(net,p);
(4) train
This function is used to train a neural network according to
net.trainFcn and net.trainParam.
Syntax:
[net,tr,out,err,find,flad]=train(net0,in,tar,ind,lad,strv,strt);
where net: trained network; net0: initial network; tr: training epoch and
performance; out: outputs of network; err: errors of network; find: final
conditions of input delay; flad: final conditions of layer delay;
in: inputs of network; tar: targets of network (default: zeros); ind: initial
conditions of input delay (default: zeros); lad: initial conditions of layer
delay (default: zeros); strv: structure of validation vectors (default: []);
strt: structure of test vectors (default: []).
Example:
p=[1 2 3 4 5 6 7 8 9 10];
t=[0.1 0.2 0.4 0.1 0.3 -0.2 0.5 -0.6 0.7 -0.4];
net.trainParam.epochs=1000;
net.trainParam.goal=0.001;
net=train(net,p,t);
1.2. Initialization functions
(1) initlay
Initlay initializes each layer i according to initialization function
net.layers{i}.initFcn.
Syntax:
net=initlay(net);
%returns a layer-updated network
inf=initlay(para);
%returns function information
Application:
net.layers{i}.initFcn='initlay';
%The weights and biases of layer i are initialized
(2) initwb
Layer initialization function initwb initializes a layer’s weights and
biases according to its own initialization functions. It returns the initialized network with some layer’s weights and biases updated.
Syntax:
net=initwb(net,i);
where i: layer index.
1.3. Input function
The input function netsum combines a layer’s weighted inputs and bias
to achieve a layer’s net input.
Syntax:
inp=netsum({x1,x2,...,xn},fp);
%takes x1,...,xn and optional function parameters (fp)
%and returns the elementwise sum of x1,...,xn
deri=netsum(dx,i,x,y,fp);
%returns the derivative of y with respect to xi
%Default values are used if fp is not supplied
inf=netsum(para);
%returns function information
where xi: s × q matrices; fp: function parameters.
Example:
x1=[2 1 3; 5 9 6];
x2=[7 4 5; -1 3 4];
su=netsum({x1,x2});
1.4. Weight function
(1) dotprod
The weight function dotprod yields dot product weights and applies
weights to an input to get weighted inputs.
Syntax:
dp=dotprod(w,p,fp);
%returns the s×q dot product of w and p; there
%are s neurons and q input vectors
dim=dotprod(size,s,r,fp);
%takes the layer dimension s, input dimension r, and
%function parameters, and returns the weight size s×r
dp=dotprod(dp,w,p,x,fp);
%returns the derivative of x with respect to p
dw=dotprod(dw,w,p,x,fp);
%returns the derivative of x with respect to w
inf=dotprod(para);
%returns function information
where w: s × r weight matrix; p: r × q weight matrix of q input vectors; fp:
row cell array of function parameters (optional); s: dimension of layer; r:
dimension of input.
Applications:
net.inputWeights{i,j}.weightFcn='dotprod';
%change to dotprod for input weight use
net.layerWeights{i,j}.weightFcn='dotprod';
%change to dotprod for layer weight use
(2) normprod
Normprod is the normalized dotprod.
Syntax:
dp=normprod(w,p,fp);
1.5. Transfer functions
(1) hardlim
Hard limit transfer function hardlim calculates a layer’s output from the
net input. It takes the form: y = hardlim(x) = 0, if x < 0; y = 1, if x ≥ 0.
Syntax:
out=hardlim(in);
%returns 1 if in is non-negative, and 0 if in is
%negative
inf=hardlim(para);
%returns function information
where in is a s × q matrix of s-dimensional net input vectors.
Application and example:
net.layers{i}.transferFcn='hardlim';
%Assign this transfer function to layer i of network
a=-3:0.1:8;
b=hardlim(a);
(2) hardlims
Hardlims calculates a layer's output from the net input. It takes the form:
y = hardlims(x) = −1, if x < 0; y = 1, if x ≥ 0.
Syntax:
out=hardlims(in);
%returns 1 if in is non-negative and -1 if in is
%negative
inf=hardlims(para);
%returns function information
where in: s × q matrix of s-dimensional net input vectors.
Example:
a=-3:0.1:8;
b=hardlims(a);
1.6. Learning functions
(1) learnp
Learnp is a perceptron weight/bias learning function.
Syntax:
[dw,nls]=learnp(w,in,win,nin,out,ltv,lev,gp,ogp,nd,lp,ls);
where dw: s × r weight/bias change matrix; nls: new learning state; w:
s × r weight matrix; in: r × q input vectors; win: s × q weighted input
vectors; nin: s × q net input vectors; out: s × q output vectors; ltv:
s × q layer target vectors; lev: s × q layer error vectors; gp: s × r gradient
with respect to performance; ogp: s × q output gradient with respect to
performance; nd: s × s neuron distances; lp: learning parameters (default:
[]); ls: learning state (default: []).
Applications:
net.inputWeights{i,j}.learnFcn='learnp';
net.layerWeights{i,j}.learnFcn='learnp';
net.biases{i}.learnFcn='learnp';
(2) learnpn
It is a normalized perceptron weight/bias learning function, which performs
faster learning than learnp when input vectors have widely varying
magnitudes (Mathworks, 2002).
1.7. Performance functions
(1) mae
Performance function mae is a mean absolute error function.
Syntax:
per=mae(er,x,pp);
%returns mean absolute error of network
per=mae(er,net,pp);
inf=mae(para);
%returns function information
where er: matrix or cell array of error vectors; x: vector of all weight
and bias values, which can be obtained from a network; pp: performance
parameters.
Application:
net.performFcn='mae';
%set performance function of network
(2) mse
Mse calculates mean squared error of network output.
Syntax:
per=mse(er,out,webi,fp);
deri=mse('dy',er,out,webi,per,fp);
%returns derivative of per with respect to out
deri=mse('dx',er,out,webi,per,fp);
%returns derivative of per with respect to webi
where er: matrix or cell array of error vectors; out: matrix or cell array
of output vectors; webi: vector of all weight and bias values; fp: function
parameters.
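A minimal numerical sketch, assuming the toolbox also accepts the single-argument call with only an error matrix:

```matlab
er=[0.1 -0.2 0.3];
per=mse(er)
% (0.01+0.04+0.09)/3, about 0.0467
```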
2.
Functions of Linear Neural Networks
2.1. Neural network functions
(1) newlind
Linear neural network function newlind is used to design a linear layer
that yields an output given an input with minimum sum square error.
Syntax:
net=newlind(in,tcla);
net=newlind(in,tcla,ind);
where in: r × q matrix with q input vectors; tcla: s × q matrix with q
target class vectors; ind: initial input delay states.
Example:
in={2 3 5 2 1 6};
ini={7 2};
tcla={1 1 2 1 1 2};
net=newlind(in,tcla,ini);
(2) newlin
Newlin generates a linear layer that is generally used as adaptive filters
for signal processing and prediction (Mathworks, 2002).
Syntax:
net=newlin(in,noel,ind,lr);
net=newlin(in,noel,0,inp);
%returns a linear layer with the maximum stable
%learning rate for input inp
where in: r × 2 matrix of minimum and maximum values for r input
elements; noel: size of output vector; ind: input delay vector (default:
[0]); lr: learning rate (default: 0.01); inp: matrix of input vectors.
Default functions used in newlin are adaptFcn: trains; gradientFcn: calcgrad; initFcn: initlay; performFcn: mse; trainFcn:
trainb.
Example:
net=newlin([-2 2],2,[0 1],0.001);
in={1 2 -2 0 -1 -1 2 1 0 0};
out=sim(net,in);
2.2. Learning function
(1) learnwh
Widrow–Hoff function learnwh is a weight/bias learning function, known
as the least mean squared (LMS) rule (Mathworks, 2002).
Syntax:
[dw,nls]=learnwh(w,in,win,nin,out,ltv,lev,gp,ogp,nd,lp,ls);
where dw: s × r weight/bias change matrix; nls: new learning state; w:
s × r weight matrix; in: r × q input vectors; win: s × q weighted input
vectors; nin: s × q net input vectors; out: s × q output vectors; ltv:
s × q layer target vectors; lev: s × q layer error vectors; gp: s × r gradient
with respect to performance; ogp: s × q output gradient with respect to
performance; nd: s × s neuron distances; lp: learning parameters (default:
[]); ls: learning state (default: []).
Applications:
net.inputWeights{i,j}.learnFcn='learnwh';
net.layerWeights{i,j}.learnFcn='learnwh';
net.biases{i}.learnFcn='learnwh';
2.3. Analysis function
The analysis function maxlinlr is used to calculate learning rates of
newlin (Mathworks, 2002).
Syntax:
lr=maxlinlr(inp);
%returns the maximum learning rate for a linear
%layer without a bias that is to be trained only
%on the vectors in inp.
lr=maxlinlr(inp,'bias');
Example:
inp=[2 5 3 1; -1 4 7 0];
lr=maxlinlr(inp);
3.
Functions of BP Neural Network
3.1. Neural network functions
(1) newff
Newff is used to create a feedforward backpropagation network.
Syntax:
net=newff(in,[s1 s2...sn],{tf1 tf2...tfn},btf,blf,per);
%returns n layer feed-forward backpropagation network
where in: r × 2 matrix of minimum and maximum values for r input elements; si: size of ith layer; tfi: transfer function for ith layer (tansig,
logsig, purelin. Default: tansig); btf: backpropagation training
function (trainlm, traingd, trainbfg, trainrp, etc. Default:
trainlm); blf: backpropagation weight/bias learning function (default:
learngdm); per: performance function (default: mse).
Example:
in=[1 2 3 4 5 6 7 8];
y=[5 4 3 2 1 0 -1 -2];
net=newff(minmax(in),[8 1],{'tansig' 'purelin'});
net=train(net,in,y);
out=sim(net,in);
(2) newcf
This function is used to create a cascade-forward backpropagation network.
Syntax:
net=newcf(in,[s1 s2...sn],{tf1 tf2...tfn},btf,blf,per);
%returns n layer cascade-forward backpropagation
%network
where in: r × 2 matrix of minimum and maximum values of r input elements; si: size of ith layer; tfi: transfer function for ith layer (tansig,
logsig, purelin. Default: tansig); btf: backpropagation training
function (trainlm, traingd, traingdm, trainrp, etc. Default:
trainlm); blf: backpropagation weight/bias learning function (default:
learngdm); per: performance function (default: mse).
3.2. Transfer functions
(1) purelin
The linear function purelin calculates a layer's output from its input. It
takes the form: y = purelin(x) = x.
Syntax:
a=purelin(in,fp);
%returns an s×q matrix
deri=purelin('dn',in,x,fp);
%returns s×q derivative of x with respect to in
where in: s × q matrix of s-dimensional input vectors; fp: function
parameters.
Example and application:
in=-1:0.1:1;
a=purelin(in);
net.layers{i}.transferFcn='purelin';
%assign purelin to layer i of network
(2) tansig
Tansig (hyperbolic tangent sigmoid function) calculates a layer’s output from its input. It takes the form: y = tansig(x) = (exp(x) −
exp(−x))/(exp(x) + exp(−x)).
Syntax:
a=tansig(in,fp);
%returns an s×q matrix
deri=tansig('dn',in,x,fp);
%returns s×q derivative of x with respect to in
where in: s × q matrix of s-dimensional input vectors; fp: function
parameters.
Example and application:
in=-1:0.1:1;
a=tansig(in);
%Fig. 1
net.layers{i}.transferFcn='tansig';
%assign tansig to layer i of network
(3) logsig
Logsig calculates a layer’s output from its input. It takes the form: y =
logsig(x) = 1/(1 + exp(−x)).
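A short numerical example, analogous to the tansig example above:

```matlab
in=[-1 0 1];
a=logsig(in)
% returns approximately [0.2689 0.5000 0.7311]
```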
(4) dtansig
This function is the derivative function of tansig.
Syntax:
deri=dtansig(in,fp);
Similarly, there are transfer derivative functions, dpurelin, and
dlogsig.
Figure 1. Curve of tansig.
3.3. Training functions
(1) trainlm
Trainlm is a training algorithm that updates weight and bias values
using Levenberg–Marquardt optimization. It can train any network as long
as its weight, net input, and transfer functions have derivative functions
(Mathworks, 2002).
Syntax:
[net,inf]=trainlm(net0,din,ltar,idin,size,ts,vv,tv);
where net0: original network; net: trained network; inf: training information over each epoch (inf.epoch: epoch number; inf.perf: training performance; inf.vperf: validation performance; inf.tperf: test
performance); din: delayed input vectors; ltar: target vectors of layer;
idin: initial conditions of input delay; size: batch size; ts: time steps;
vv: structure of validation vectors, or empty matrix; tv: structure of test
vectors, or empty matrix.
Application and example:
net.trainFcn='trainlm';
%assign trainlm to network
net.trainParam.epochs=100;
%maximum training epochs with default 100
net.trainParam.goal=0.001;
%performance goal with default 0
net.trainParam.min_grad=1e-15;
%minimum performance gradient with default 1e-10
net.trainParam.show=10;
%epochs between displays with default 25
net.trainParam.time=100000;
%max time for training (seconds) with default infinity
(2) traingd
Traingd is a training function, which updates weight and bias values
according to gradient descent (Mathworks, 2002).
Syntax:
[net,inf,lout,lerr]=traingd(net0,din,ltar,idin,size,ts,vv,tv);
where lout: collective layer outputs of last epoch; lerr: layer errors
of last epoch. The meanings of the remaining variables are the same as
trainlm.
(3) traingdm
Traingdm is a training function, which updates weight and bias values
according to gradient descent with momentum (Mathworks, 2002).
Syntax:
[net,inf,lout,lerr]=traingdm(net0,din,ltar,idin,size,ts,vv,tv);
The meanings of all variables are the same as traingd.
3.4. Learning functions
(1) learngdm
Learngdm is the gradient descent function with momentum weight/bias
learning.
Syntax:
[dw,nls]=learngdm(w,in,win,nin,out,ltv,lev,gp,ogp,nd,lp,ls);
where dw: s × r weight/bias change matrix; nls: new learning state; w:
s × r weight matrix; in: r × q input vectors; win: s × q weighted input
vectors; nin: s × q net input vectors; out: s × q output vectors; ltv:
s × q layer target vectors; lev: s × q layer error vectors; gp: s × r gradient
with respect to performance; ogp: s × q output gradient with respect to
performance; nd: s ×s neuron distances; lp: learning parameters (lp.lr:
learning rate with default 0.01; lp.mc: momentum constant with default
0.9; default: []); ls: learning state (default: []).
Applications:
net.inputWeights{i,j}.learnFcn='learngdm';
net.layerWeights{i,j}.learnFcn='learngdm';
net.biases{i}.learnFcn='learngdm';
(2) learngd
Learngd is the gradient descent function weight/bias learning function
(Mathworks, 2002).
Syntax:
[dw,nls]=learngd(w,in,win,nin,out,ltv,lev,gp,ogp,nd,lp,ls);
The meanings of all variables are the same as learngdm.
4.
Functions of Self-Organizing Neural Networks
4.1. Neural network functions
(1) newsom
Newsom creates a self-organizing map network, and the resultant
competitive layers are used for classification.
Syntax:
net=newsom(in,[s1,s2,...sn],tf,df,olr,st,tlr,tnd);
%returns a self-organizing map neural network
where in: r × 2 matrix of minimum and maximum values of r input
elements; si: size of ith layer (defaults: [5 8]); tf: topology function
(hextop, gridtop, or randtop, etc. Default: hextop); df: distance
function (dist, linkdist, or mandist, etc.); olr: ordering phase
learning rate (default: 0.9); st: ordering phase steps (default: 1000); tlr:
tuning phase learning rate (default: 0.02); tnd: tuning phase neighborhood
distance (default: 1).
Default functions used in newsom are adaptFcn: trains; gradientFcn:
calcgrad; initFcn: initlay; trainFcn: trainr.
Example:
in=[rand(1,50)/5;rand(1,50)/2];
net=newsom(minmax(in),[5 10]);
%distribute samples over the 2-dimensional input space
net=train(net,in);
out=sim(net,in);
(2) nnt2som
Nnt2som is an update of self-organizing map in older neural network
toolbox.
Syntax:
net=nnt2som(in,[s1,s2,...sn],w,olr,st,tlr,tnd);
%returns a self-organizing map neural network
where in: r × 2 matrix of minimum and maximum values of r input elements; si: size of ith layer; w: s × r weight matrix; olr: ordering phase
learning rate (default: 0.9); st: ordering phase steps (default: 1000); tlr:
tuning phase learning rate (default: 0.02); tnd: tuning phase neighborhood
distance (default: 1).
In nnt2som the topology function and distance function of self-organizing map are gridtop and linkdist, respectively.
(3) newc
Newc creates a competitive layer. Competitive layers are used to make
classification (Mathworks, 2002).
Syntax:
net=newc(in,n,klr,clr);
where in: r × 2 matrix of minimum and maximum values of r input elements; n: number of neurons; klr: Kohonen learning rate (default: 0.01);
clr: conscience learning rate (default: 0.001).
Example:
in=[2 4 3 7 5;5 9 0 1 0;7 1 0 6 4];
net=newc([0 1; 0 1;0 1],8);
%an eight-neuron layer with three input elements
net=train(net,in);
out=sim(net,in);
(4) nnt2c
Nnt2c is an update of self-organizing competitive network in older neural
network toolbox.
Syntax:
net=nnt2c(in,w,klr,clr);
where in: r × 2 matrix of minimum and maximum values of r input elements; w: s × r weight matrix; klr: Kohonen learning rate (default: 0.01);
clr: conscience learning rate (default: 0.001).
4.2. Topology functions
Hextop (gridtop) is used to determine the neuron positions for layers
whose neurons are arranged in an n-dimensional hexagonal pattern (grid).
(1) hextop
Syntax:
pos=hextop(d1,d2,...,dN);
%returns n×s matrix made of n coordinate vectors
%s=d1×d2× ... ×dN
where di: layer size of dimension i.
(2) gridtop
Syntax:
pos=gridtop(d1,d2,...,dN);
%returns n×s matrix made of n coordinate vectors
%s=d1×d2×...×dN
where di: layer size of dimension i.
4.3. Distance functions
Dist, mandist, and linkdist are distance weight functions that can
assign weights to inputs to yield weighted inputs, and also layer distance
functions used to determine between-neuron distances in a layer.
(1) dist
Dist is a Euclidean distance function.
Syntax:
d=dist(w,in);
%returns s×q matrix of between-vector distances
where w: s × r weight matrix; in: r × q matrix of r-dimensional vectors.
Applications:
net.inputWeights{i,j}.weightFcn='dist';
net.layers{i}.distanceFcn='dist';
%make layer i use dist in its topology
(2) mandist
Mandist is a Manhattan distance function.
Syntax:
d=mandist(w,in);
%returns s×q matrix of between-vector distances
(3) linkdist
Linkdist is a link distance function.
Syntax:
d=linkdist(w,in);
%returns s×q matrix of between-vector distances
4.4. Learning functions
(1) learnk
Learnk is a Kohonen weight learning function.
Syntax:
[dw,nls]=learnk(w,in,win,nin,out,ltv,lev,gp,ogp,nd,lp,ls);
The meanings of all variables are the same as learngdm.
(2) learnsom
Learnsom is a weight learning function for self-organizing map.
Syntax:
[dw,nls]=learnsom(w,in,win,nin,out,ltv,lev,gp,ogp,nd,lp,ls);
where the learning parameters, lp, can be parameterized as default values
[lp.order_lr (ordering phase learning rate): 0.9; lp.order_steps
(ordering phase steps): 1000; lp.tune_lr (tuning phase learning rate):
0.02; lp.tune_nd (tuning phase neighborhood distance): 1]. The meanings of all remaining variables are the same as learngdm.
4.5. Transfer function
Compet is a competitive transfer function. It takes the form: y = 1 for the
neuron with maximum x; y = 0 for the remaining neurons.
Syntax:
a=compet(in,fp);
%returns an s×q matrix
deri=compet('dn',in,x,fp);
%returns derivative of x with respect to in
where in: s × q matrix of input vectors; fp: optional function parameters.
Application:
net.layers{i}.transferFcn='compet';
4.6. Plot function
Plotsom is a function plotting self-organizing map.
Syntax:
plotsom(pos);
%plots positions of neurons and links neurons
%within a Euclidean distance of 1
plotsom(w,d,nd);
%Fig. 2
where pos: n × s matrix of n-dimensional neural positions; w: s × r weight
matrix; d: s × s distance matrix; nd: neighborhood distance (default: 1).
Figure 2. A spatial representation of plotsom (w,d,nd).
Example:
pos=hextop(20,15);
plotsom(pos);
5.
Functions of Radial Basis Neural Networks
5.1. Neural network functions
(1) newrb
The function newrb creates a radial basis network. Neurons are added
to the hidden layer of neural network by newrb until the specified mean
squared error goal is met (Mathworks, 2002).
Syntax:
[net,tr]=newrb(in,tar,goal,spr,mnn,nn);
%returns a radial basis neural network
where in: r × q matrix of q input vectors; tar: s × q matrix of q vectors
of target class; goal: mse goal (default: 0); spr: spread of radial basis
functions (default:1); mnn: maximum number of neurons (default: q); nn:
number of neurons to be added between displays (default: 25).
In newrb a larger value of spr means a smoother function approximation. Too small a value of spr means many neurons will be required to fit
a smooth function, and the network may not generalize well (Mathworks,
2002).
(2) newrbe
The function newrbe creates a radial basis network very quickly.
Syntax:
net=newrbe(in,tar,spr);
%creates an exact radial basis network
5.2. Transfer function
Radbas is a transfer function for radial basis network.
Syntax:
a=radbas(in,fp);
%returns an s×q matrix
deri=radbas(‘dn’,in,x,fp);
%returns derivative of x with respect to in
where in: s × q matrix of input vectors; fp: optional function parameters.
6.
Functions of Probabilistic Neural Network
6.1. Neural network function
The probabilistic neural network function newpnn is usually used to make
classification.
Syntax:
net=newpnn(in,tar,spr);
%creates a probabilistic neural network
where in: r × q matrix of q input vectors;
tar: s × q matrix of q vectors of target class; spr: spread of radial basis
functions (default: 0.1).
A very small value of spr will result in a network similar to a nearest
neighbor classifier.
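A small classification sketch (the input values and classes are illustrative only; ind2vec and vec2ind are described in the next subsection):

```matlab
in=[1 2 3 4 5 6 7];
cls=[1 1 2 2 3 3 3];       %class index of each input vector
tar=ind2vec(cls);          %convert class indices to target vectors
net=newpnn(in,tar);
out=vec2ind(sim(net,in))
% with the default spread, the well-separated training inputs are
% classified back into their own classes: [1 1 2 2 3 3 3]
```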
6.2. Vector-index conversion functions
(1) ind2vec
This function converts indices to vectors (Mathworks, 2002).
Syntax:
vec=ind2vec(in);
%converts row vectors of indices to sparse
%matrix of vectors
Example:
in=[2 4 8 3 10];
vec=ind2vec(in);
(2) vec2ind
Vec2ind converts vectors to indices.
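Vec2ind inverts the previous ind2vec example:

```matlab
in=[2 4 8 3 10];
vec=ind2vec(in);
ind=vec2ind(vec)
% returns the original indices [2 4 8 3 10]
```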
7.
Function of Generalized Regression
Neural Network
Generalized regression network function newgrnn is usually used for
function approximation.
Syntax:
net=newgrnn(in,tar,spr);
%returns a generalized regression neural network
where in: r × q matrix of q input vectors; tar: s × q matrix of q vectors
of target class; spr: spread of radial basis functions (default: 1).
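A minimal function-approximation sketch (the sample points and spread value are illustrative only):

```matlab
in=-1:0.25:1;
tar=sin(pi*in);
net=newgrnn(in,tar,0.2);
out=sim(net,[-0.5 0 0.5])
% smoothed approximation of sin(pi*x) at the three query points
```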
8.
Functions of Hopfield Neural Network
(1) newhop
Newhop creates a Hopfield recurrent neural network used for pattern recall.
Syntax:
net=newhop(tar);
%returns a Hopfield network with stable points
%at the vectors in tar
where tar: r × q matrix of q target vectors with 1 or −1 as elements.
Example:
tar=[1 1 -1 -1; -1 1 -1 1; 1 -1 -1 1];
net=newhop(tar);
(2) satlins
Satlins is a transfer function that calculates a layer's output from the
input. It takes the form: y = −1, if x < −1; y = x, if −1 ≤ x ≤ 1; y = 1,
if x > 1.
Syntax:
a=satlins(in,fp);
%returns s×q matrix of in’s elements truncated
%into the interval [-1,1]
deri=satlins(‘dn’,in,x,fp);
%returns s×q derivative of x with respect to in
where in: s × q matrix of input vectors; fp: optional function parameters.
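For example:

```matlab
in=[-2 -0.5 0 0.5 2];
a=satlins(in)
% returns [-1 -0.5 0 0.5 1]
```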
9.
Function of Elman Neural Network
Newelm creates an Elman network.
Syntax:
net=newelm(in,[s1,s2,...sn],[tf1,tf2,...,tfn],btf,blf,per);
%returns an Elman network
where in: r × 2 matrix of minimum and maximum values of r input elements; si: size of ith layer; tfi: transfer function of ith layer (default:
tansig); btf: backpropagation network training function (traingd,
traingdm, traingda, traingdx, etc. Default: traingdx); blf:
backpropagation weight/bias learning function (learngd, learngdm,
etc. Default: learngdm); per: performance function (default: mse).
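A minimal training sketch (the layer sizes, data, and parameter values below are illustrative only):

```matlab
p=[1 2 3 4 5 4 3 2 1 2];            %input sequence
t=[1 1 1 1 1 0 0 0 0 1];            %target sequence
net=newelm(minmax(p),[5 1],{'tansig' 'purelin'});
net.trainParam.epochs=200;
net=train(net,p,t);
out=sim(net,p);
```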
PART II
Applications of Artificial Neural
Networks in Ecology
CHAPTER 12
Dynamic Modeling of
Survival Process
Survival process refers to the survivorship-time curve of a living individual (Zhang, 2007; Zhang and Zhang, 2008). The survival process varies among
organisms. For example, an individual holometabolous insect, like the ecologically and economically important lepidopterans, coleopterans, hymenopterans, and dipterans, must survive different developmental stages (i.e.,
egg, the 1st to nth instar larvae, pupa, and adult) to complete the life cycle.
There are several static stages in the life cycle. Usually mortality declines
markedly during the transition between two developmental stages. The survival process is also affected by environmental conditions, like food resources, temperature, and humidity. The factors influencing the survival process are complex,
and the process itself is always nonlinear. Due to the lack of theoretical
background (Schultz and Wieland, 1997), a mechanistic model, even if it
usually involves specific assumptions and limitations, is hard to develop (Zhang et al., 1997). At present the most often used method to depict
the survival process is to model the mortality distribution with probability density
functions (Wagner et al., 1984; Pu, 1990).
Artificial neural networks are flexible function approximators to
describe nonlinear systems (Kuo et al., 2007; Zhang and Barrion, 2006;
Zhang et al., 2007; Zhang et al., 2008). It was reported that for ecosystems
with time-changing dynamics, neural networks were considered to be the
most suitable and robust models (Tan et al., 2006). Theoretically, neural
networks can be used to model the survival process of organisms.
In this chapter, a BP network is evaluated for its effectiveness and performance in modeling the survival process and mortality distribution of a holometabolous insect, Spodoptera litura F. (Lepidoptera: Noctuidae). Empirical models, probability density functions, etc., are used to test their modeling performance and to compare with the BP network. Details can be found in the paper by Zhang and Zhang (2008).
1. Model Description
1.1. BP neural network
See previous chapters for the principles and algorithms of the BP neural network. A complete Matlab algorithm of the BP network is developed to model the survivorship and mortality of S. litura, as indicated in the following (Mathworks, 2002; Zhang and Zhang, 2008):
%The 1st row is days and the 2nd row is survival rates or mortality.
%If both time and temperature are used as input variables, then the
%1st and 2nd rows are days and temperatures respectively
%(i.e., t=data(1:2,:)), and the 3rd row is survival rates or
%mortality (i.e., gt=data(3,:)).
t=data(1,:);
gt=data(2,:);
%Generate a two-layer BP neural network with 5 hidden neurons and
%1 output neuron. If a three-layer BP neural network is to be built,
%then the following syntax should be revised, for instance,
%net=newff(minmax(t),[3,2,1],{'tansig' 'tansig' 'purelin'},
%  'trainlm','learngd','mse');
net=newff(minmax(t),[5,1],{'tansig' 'purelin'},'trainlm','learngd','mse');
net.trainParam.epochs=1000;   %Maximum epochs
net.trainParam.goal=0.001;    %Training goal, mse=0.001
net=train(net,t,gt);
%Set a time series to be predicted. If both time and temperature are
%used as input variables, then for example,
%T=[27 28 29 30 31;20 33 26 29 34].
T=[27 28 29 30 31];
tT=[t T];
ft=sim(net,tT);
%Print the simulated output
[tT;ft]
figure;
%If there is only one input variable in the BP network, then the
%following codes may be used to draw a graph
plot(tT,ft,'-');
hold on
plot(t,gt,'*');
%Print input weights, between-layer weights, and biases
net.IW{1,1}   %Input weights
net.LW{2,1}   %Between-layer weights (layer 1 to the output neuron)
net.b{1}      %Bias of the first layer
net.b{2}      %Bias of the output neuron
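For readers without the Matlab Neural Network Toolbox, the same two-layer architecture (tansig hidden layer, purelin output) can be sketched in Python with NumPy. This is a rough illustration trained by plain gradient descent rather than the Levenberg-Marquardt algorithm used above, and the survivorship data are invented for the example:

```python
import numpy as np

# Hypothetical survivorship data (days vs. survival rates); these are
# illustrative values, not the S. litura measurements.
t = np.linspace(1.0, 26.0, 26)
gt = np.exp(-0.05 * t)

# Scale the input to [-1, 1], as minmax-based preprocessing does.
x = (2 * t - (t.max() + t.min())) / (t.max() - t.min())
X = x.reshape(1, -1)
Y = gt.reshape(1, -1)
n = X.shape[1]

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, (5, 1))      # input -> 5 hidden neurons
b1 = np.zeros((5, 1))
W2 = rng.normal(0.0, 0.5, (1, 5))      # hidden -> 1 output neuron
b2 = np.zeros((1, 1))

mse0 = float(((W2 @ np.tanh(W1 @ X + b1) + b2 - Y) ** 2).mean())

lr = 0.05
for epoch in range(2000):
    H = np.tanh(W1 @ X + b1)           # tansig hidden activations
    err = (W2 @ H + b2) - Y            # purelin output error, shape (1, n)
    dW2 = err @ H.T / n
    db2 = err.mean(axis=1, keepdims=True)
    dH = (W2.T @ err) * (1 - H ** 2)   # backpropagate through tanh
    dW1 = dH @ X.T / n
    db1 = dH.mean(axis=1, keepdims=True)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

mse = float(((W2 @ np.tanh(W1 @ X + b1) + b2 - Y) ** 2).mean())
```

With trainlm and the Matlab settings above, convergence is much faster; the sketch only shows the structure of the forward and backward passes.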
1.2. Empirical models
1.2.1. Models for survival process and mortality distribution
The following empirical models are used to model the survival process of S. litura:
f(t) = k/(a + b e^{ct}),                        (1)

f(t) = (at + b)/(ct + d),                       (2)

f(t) = a + b_1 t + b_2 t^2 + b_3 t^3,           (3)

where f(t) is the survivorship at time t (days), and k, a, b, c, d, b_1, b_2, and b_3 are model parameters.
Several probability density functions are also used to describe the mortality distribution of S. litura.

Normal function:

f(t) = 1/((2π)^{1/2} σ) exp(−(t − μ)^2/(2σ^2)),             (4)

Logarithmic normal function:

f(t) = lg e/((2π)^{1/2} σ t) exp(−(lg t − μ)^2/(2σ^2)),     (5)

Cauchy function:

f(t) = (1/π) λ/(λ^2 + (t − μ)^2),                           (6)

Chi-squared function:

f(t) = 1/(2^{n/2} Γ(n/2)) t^{n/2−1} exp(−t/2),              (7)
Weibull function:

f(t) = (m/b)(t − μ)^{m−1} exp(−(t − μ)^m/b),                (8)

where f(t) is the mortality frequency at time t (days); μ and σ in models (4) and (5), and μ and λ in model (6), are the parameters related to the mean and standard deviation of the probability distribution; n in model (7) is the mean of the chi-squared distribution; and μ, m, and b in model (8) are the parameters related to the mean and standard deviation of the Weibull distribution.

The above empirical models and density functions are fitted to the data using a nonlinear least square method with the Gauss-Newton algorithm (He, 2001).
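The Gauss-Newton fitting step can be sketched as follows. The example fits the normal density (4) to synthetic, noise-free data with a numerical Jacobian and simple step halving; the data and starting values are invented for illustration:

```python
import numpy as np

def normal_pdf(t, mu, sigma):
    # Model (4): normal probability density.
    return np.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Synthetic mortality frequencies with true mu = 12, sigma = 3
# (illustrative values, not the book's measurements).
t = np.arange(1.0, 27.0)
y = normal_pdf(t, 12.0, 3.0)

theta = np.array([10.0, 4.0])          # starting guess for (mu, sigma)
eps = 1e-6
for _ in range(100):
    f0 = normal_pdf(t, *theta)
    r = f0 - y                         # residuals
    # Numerical Jacobian of the residuals with respect to (mu, sigma).
    J = np.empty((t.size, 2))
    for j in range(2):
        d = np.zeros(2); d[j] = eps
        J[:, j] = (normal_pdf(t, *(theta + d)) - f0) / eps
    # Gauss-Newton step, solving J'J delta = -J'r, with step halving
    # so the sum of squared errors never increases.
    delta = np.linalg.solve(J.T @ J, -(J.T @ r))
    sse = float((r ** 2).sum())
    step = 1.0
    while step > 1e-8:
        trial = theta + step * delta
        if trial[1] > 0 and float(((normal_pdf(t, *trial) - y) ** 2).sum()) < sse:
            theta = trial
            break
        step /= 2

mu_hat, sigma_hat = theta
```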
The last is a dynamic model based on the multi-stage parametric models (Zhang et al., 1997):

|(du_i(t)/dt)/u_i(t)| = a_i + b_i T,                        (9)

where T is the temperature (°C); u_i(t) is the population size during developmental stage i (i = 1, egg; i = 2 ∼ 7, 1st ∼ 6th instar larvae) at time t; and a_i, b_i are parameters. The parameters a_i and b_i for the various developmental stages in model (9) are fitted using a regression method (Zhang et al., 1997). The function 1 − |(du_i(t)/dt)/u_i(t)| is the dynamic survivorship of stage i. Finally, the time-changing survivorship is computed by the following model:

f(t) = ∏_i ∏_{τ<t} (1 − |(du_i(τ)/dτ)/u_i(τ)|).             (10)
1.2.2. Model for time-temperature dependent survivorship and mortality distribution

The trend surface model (TSM; He, 2001) is used to simulate time-temperature dependent survivorship and mortality distribution:

f(t, T) = a + b_1 t + b_2 T + b_3 tT + b_4 t^2 + b_5 T^2,   (11)

where f(t, T) is the survivorship or mortality frequency at temperature T (°C) at time t (days), and a, b_1, b_2, b_3, b_4, b_5 are parameters. The parameters in model (11) were estimated using the Matlab algorithm of the TSM model (Mathworks, 2002).
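Because model (11) is linear in its parameters, the TSM fit reduces to ordinary least squares. A sketch with a synthetic survivorship surface built from made-up coefficients:

```python
import numpy as np

# Invented "true" coefficients a, b1..b5 of model (11), used only to
# generate synthetic data for the demonstration.
true = np.array([1.0, -0.01, 0.005, -0.0002, -0.0003, -0.0001])

# Grid of days (1..26) and temperatures (20..40 deg C in steps of 4).
tt, TT = np.meshgrid(np.arange(1.0, 27.0), np.arange(20.0, 41.0, 4.0))
t, T = tt.ravel(), TT.ravel()

# Design matrix of model (11): [1, t, T, tT, t^2, T^2].
X = np.column_stack([np.ones_like(t), t, T, t * T, t ** 2, T ** 2])
y = X @ true                           # synthetic survivorship surface

# Linear least squares recovers the six parameters.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```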
2. Data Description
In total, 26, 20, 13, 12, and 16 data points ((t_i, f(t_i)), i = 1, 2, ..., 26) for the temperatures 20, 24, 32, 36, and 40°C, respectively, are used in the training and simulation of survivorship and mortality distribution.
In the cross validation of the BP network, each data point is separately removed from the above data set, the remaining data are used to train the BP network, and the trained network is then used to predict the removed point. Comparisons between the predicted and observed values are made, and the Pearson correlation coefficient and its statistical significance are calculated to validate the trained BP network.
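The leave-one-out procedure just described can be sketched as follows; a cubic polynomial fit stands in for the BP network so the example stays self-contained, and the survivorship data are invented:

```python
import numpy as np

# Hypothetical survivorship data (days vs. survival rates).
t = np.arange(1.0, 27.0)
y = np.exp(-0.05 * t)

preds = np.empty_like(y)
for k in range(t.size):
    mask = np.ones(t.size, dtype=bool)
    mask[k] = False                         # remove one data point
    coef = np.polyfit(t[mask], y[mask], 3)  # train on the remaining data
    preds[k] = np.polyval(coef, t[k])       # predict the removed point

# Pearson correlation between predicted and observed values.
r = np.corrcoef(preds, y)[0, 1]
```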
In the prediction of the survival process at a certain temperature, the observed survivorship for the five predicted days (pupa stage) is the survivorship just before the larvae pupated, which is used to test the generalization performance of the models. Data are submitted to the BP network for training or testing in their natural sequence, i.e., in increasing order of dates (days) and temperatures (°C).
3. Results
3.1. Modeling survival process and mortality distribution
3.1.1. Simulation and prediction of survival process
A two-layer BP network is built to model the survival process of S. litura. The transfer functions of the hidden layer and output layer are the hyperbolic tangent sigmoid transfer function (tansig) and the linear transfer function (purelin), respectively. The training and learning functions are the Levenberg-Marquardt algorithm (trainlm) and the gradient descent weight/bias learning function (learngd). The desired performance function is the mean squared error (MSE = 0.01). The network is trained for 1000 epochs. An illustration of the trained weights of the BP network is shown in Fig. 1.
It can be found that the performance of the BP network varies with the number of hidden neurons in both the simulation and the five-day prediction (Table 1 and Fig. 2). The mean epoch decreases as the number of hidden neurons increases. The MSE is lowest when five hidden neurons are used. With constant survivorships, the BP network simulates the static stages, egg and pupa, satisfactorily. In general, the performance of the BP network in simulation and prediction is best with five hidden neurons, while the performance with eight or two hidden neurons is not satisfactory, due to over-learning or deficient learning, respectively.

Figure 1. Connection weights of two-layer BP network with 5 neurons (20°C).
The performance of empirical models (1) to (3) and model (10) is not better than that of the BP network (Figs. 3 and 4). Model (3) (averaged MSE for simulation and prediction: 0.00549 and 0.0769) shows the best performance among them, and model (2) is the second best (averaged MSE for simulation and prediction: 0.00899 and 0.0263). Nevertheless, the prediction of model (3) is impractical. Although model (10), unlike the empirical models and probability density functions, is derived directly from the survival process of the holometabolous insect, its simulation performance is the worst among all models (averaged MSE: 0.1550).
3.1.2. Simulation of mortality distribution
Mortality distributions of S. litura at various temperatures are fitted by the density functions (4) to (8) and the two-layer BP network with five hidden neurons (Fig. 5). The mortality distribution of an insect with a single developmental stage can probably be described by probability density functions because of its homogeneous morphology, physiology, and behavior, or at least the slow changes rather than sudden jumps in the developmental process. However, dramatic changes in mortality can occur across the distinct developmental stages of the holometabolous insect. The mortality distribution of the holometabolous insect is thus more complex and cannot be fitted well by probability density functions. The BP network is the best model in the simulation of the mortality distribution. The simulation performance of the Cauchy density function is similar to that of the BP network, and the chi-squared function is the worst. The Weibull function could not be fitted by the nonlinear least square method.

Table 1. Performance of BP network in the simulation (Sim.) and the five-day prediction (Pred.) of survival process of S. litura, where maximum epochs = 1000.

Temperature   Performance   2 neurons   3 neurons   5 neurons   8 neurons
20°C          Epochs        71          56          5           6
              Sim. MSE      0.00099     0.00098     0.00087     0.00095
              Pred. MSE     0           0.0001      0.0006      0.0005
24°C          Epochs        9           5           4           3
              Sim. MSE      0.00067     0.00086     0.00094     0.00095
              Pred. MSE     0.001       0.0002      0           0.0118
32°C          Epochs        1000        4           6           2
              Sim. MSE      0.00221     0.00048     0.00056     0.00098
              Pred. MSE     0.0343      0.0318      0.0155      0.2898
36°C          Epochs        29          11          6           6
              Sim. MSE      0.00049     0.00087     0.00003     0.00057
              Pred. MSE     0           0           0.0003      0.0039
40°C          Epochs        1000        109         20          3
              Sim. MSE      0.00242     0.00099     0.00098     0.00090
              Pred. MSE     0.0008      0.0017      0           0
Mean          Epochs        422         37          8           4
Simulation    Mean MSE      0.00136     0.00084     0.00068     0.00087
Prediction    Mean MSE      0.00722     0.00676     0.00328     0.06120
3.1.3. Cross validation of BP network for time-dependent relationship

Using the same settings but a different training goal (MSE = 0.001) in the BP network with five hidden neurons, cross validation shows that the BP network is robust in the prediction of unknown data, particularly for the survival process (Fig. 6).

Figure 2. Performance of BP network in the simulation and prediction of survival process of S. litura.
4. Discussion

The above study demonstrates that the BP network is able to effectively model the survival process and mortality distribution of holometabolous insects. The BP network outperforms the empirical models, trend surface model, and probability density functions. It yields reasonable and robust predictions in all situations. It has the advantages of simplified and more automated model synthesis and of analytical input-output models (Abdel-Aal, 2004).
Figure 3. Performance of empirical models (1) to (3) in the simulation and prediction of survival process of S. litura.
To build a satisfactory BP network, the following principles are suggested. (1) Set an appropriate number of hidden layers and neurons. Using a network with multiple hidden layers helps to reduce the number of neurons used. An excessive number of hidden neurons results in over-learning and reduces generalization reliability. Overall, the number of hidden layers and neurons should increase as the complexity of the ecosystem studied increases. (2) Set an appropriate maximum number of epochs and training goal. An overly demanding training goal and excessive training epochs result in an over-learned network. Over-learning can be avoided by using methods such as limiting the complexity of the model, weight decay, and training with noise (Ozesmi et al., 2006). (3) Gather a representative data set. A data set that represents the sample space well is crucial for building a neural network with strong predictive power. In addition to a good experimental or sampling design for data acquisition, data quality may also be improved by using various methods to eliminate data redundancy and enhance calculation efficiency (Kilic et al., 2007).

Figure 4. Performance of model (10) in the simulation of survival process of S. litura.
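Weight decay, one of the remedies mentioned in point (2) above, can be illustrated on a linear model, where it reduces to a ridge penalty that shrinks the weights; the data here are random and purely illustrative:

```python
import numpy as np

# Random illustrative regression problem: 20 samples, 8 inputs, of which
# only the first actually drives the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 8))
y = X[:, 0] + 0.1 * rng.normal(size=20)

lam = 1.0                                   # decay strength
w_ols = np.linalg.solve(X.T @ X, X.T @ y)   # fit without decay
# Weight decay adds lam * ||w||^2 to the loss, i.e. lam * I to the
# normal equations, pulling the weights toward zero.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(8), X.T @ y)

n_ols = float(np.linalg.norm(w_ols))
n_ridge = float(np.linalg.norm(w_ridge))
```

The same idea carries over to neural network training, where the penalty discourages large connection weights and thereby limits over-learning.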
A model describes a generation mechanism for data (Gentle, 2002). To build a neural network means training the network with raw data and then storing the mechanism of the data in the connection weights of the network. The BP network is thus an adaptive and flexible model.

Figure 5. The simulation and prediction of mortality distribution using BP network and probability density functions (4) to (7).

Figure 6. Cross validation of BP network in the prediction of survivorship and mortality distribution of S. litura. One data point is removed from the data set and the remaining data in the data set are used to train BP network and to predict the data point removed.
Compared with other engineering applications, studies on process-based neural network modeling are fewer in ecological and environmental areas (Lek and Baran, 1997; Abrahart and White, 2001; Viotti et al., 2002; Sharma et al., 2003; Abdel-Aal, 2004; Almasri and Kaluarachchi, 2005; Pastor-Barcenas et al., 2005; Nour et al., 2006; Nagendra and Khare, 2006). Further studies are desirable in the dynamic analysis of ecosystems.
CHAPTER 13
Simulation of Plant Growth Process
The plant growth process is a nonlinear system. Mechanistic models are often recommended for modeling nonlinear ecosystems. However, they involve certain assumptions and limitations and are so highly specialized that they can be manipulated only by experienced researchers who have a deep understanding of the underlying theories (Schultz and Wieland, 1997; Pastor-Barcenas et al., 2005). Because of the lack of a consistent theoretical background, it is usually hard to develop a specialized mechanistic model for a complex ecosystem like the plant growth process (Zhang et al., 2007; Zhang et al., 2008).

Artificial neural networks are flexible function approximators and system simulators that have been used to model nonlinear systems (Acharya et al., 2006; Zhang and Barrion, 2006; Zhang, 2007; Zhang et al., 2007; Zhang and Zhang, 2008). In this chapter, the Chinese cabbage growth process is modeled with an Elman neural network and a linear neural network. An ordinary differential equation is used to compare the performance of the neural networks. Sensitivity analysis is performed to assess the robustness of these models. Further details can be found in Zhang et al. (2007).
1. Model Description
1.1. Neural networks
1.1.1. Elman neural network
See Chap. 000.
1.1.2. Linear neural network
See Chap. 000.
Elman and linear neural networks are specially developed to simulate x(t + Δt) from x(t), where x(t) = (x_1(t), x_2(t), ..., x_n(t))^T and n is the dimension of the input. The Matlab codes of the Elman and linear neural networks used are as follows:
%Read multivariable time series (dyndata.*) to P. In this file,
%rows represent variables, and columns represent time series.
P=dyndata;
variables=size(P,1);   %The number of variables
times=size(P,2);       %The number of time points
%Develop a two-layer Elman neural network if it is used. The transfer
%function of the ith layer is tansig (or logsig) and purelin, i=1,2.
net=newelm(minmax(P),[30 5],{'tansig','purelin'},'traingdm','learngdm','mse');
%Develop a linear neural network if it is used. There are 5 output
%variables. Learning rate is 0.01. Time delay is 0.
net=newlin(minmax(P),5,[0],0.01);
net.trainParam.epochs=1000;     %Train 1000 epochs
net.trainParam.goal=0.000001;   %Set the training goal
for i=1:times-1;
  %x(t), x(t+dt), t=1,2,...,times-1
  R=P(:,i);
  Y=P(:,i+1);
  %Train the network with input x(t) and output x(t+dt)
  net=train(net,R,Y);
  output(:,i)=sim(net,R);       %Compute x(t+dt) from x(t)
end
output
%Print input weights and between-layer weights
net.IW{1,1}   %Input weights
net.LW{1,1}   %Between-layer weights (layer 1 to layer 1)
net.LW{2,1}   %Between-layer weights (layer 1 to layer 2)
1.2. Ordinary differential equation

Mechanistic models to describe the plant growth process are usually expressed as a nonlinear ordinary differential equation (Department of Mathematics of Nanjing University, 1978; Qi et al., 2001):

dx(t)/dt = f(x(t), t),               (1)

where x(t) is the vector of state variables, x(t) = (x_1(t), x_2(t), ..., x_n(t))^T, and t is time.
The linear forms of the nonlinear ordinary differential equation are most often used, as follows:

dx(t)/dt = A(t)x(t),                 (2)

and

dx(t)/dt = Ax(t),                    (3)

where A(t) and A are matrices with time-varying elements and time-constant elements, respectively. Models (2) and (3) require little information on the mechanism of the plant growth process. The difference equations corresponding to models (1) and (3) are as follows:

x(t + Δt) = x(t) + f(x(t), t)Δt,     (4)

and

x(t + Δt) = x(t) + Ax(t)Δt.          (5)
2. Data Source

Chinese cabbage was planted in a field. The dry matter weights of leaves, stem, and root (g/plant), the leaf area (cm^2/plant), and the water content of soil (g water/g dry soil) were measured every three days. In doing so, a multivariable data set was gathered (Qi et al., 2001).
3. Results

3.1. Model comparison
Elman neural network, linear neural network, and the linear ordinary differential equation (3) are used to simulate the Chinese cabbage growth process. The trained weights of the Elman network are illustrated in Fig. 1. The results indicate that the Elman network (training goal = 0.000001) accords with the system dynamics after training for a very short time (Fig. 2). The linear network (training goal = 0.000001, learning rate = 0.01) fits the dynamics within a certain time, while the linear ordinary differential equation yields divergent dynamics from the beginning of the simulation (Fig. 2). The Chinese cabbage growth process is a nonlinear system (Qi et al., 2001). The eigenvalues of the system matrix A of the linear ordinary differential equation are −1.3418, 0.6823, −0.4948 + 0.4333i, −0.4948 − 0.4333i, and −0.0754. Because one eigenvalue has a positive real part, the system is unstable, and this model is therefore not able to simulate the dynamics well. The dynamics simulated by the linear network diverge after 21 days, even though the linear network is able to simulate the weakly nonlinear system.

Figure 1. Distribution of connection weights of two-layer Elman network (neurons: (30,5)). Upper: 5 × 30; Middle: 30 × 30; Lower: 30 × 5.
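The stability argument can be checked directly: dx/dt = Ax is unstable whenever some eigenvalue of A has a positive real part. A sketch with a hypothetical 3 × 3 matrix (the fitted 5-variable matrix A itself is not reproduced here):

```python
import numpy as np

# Hypothetical system matrix; the 0.7 diagonal entry produces an
# eigenvalue with a positive real part, so trajectories grow.
A = np.array([[0.7, 0.2, 0.0],
              [0.0, -0.4, 0.3],
              [0.1, 0.0, -1.2]])

eigvals = np.linalg.eigvals(A)
# The linear system dx/dt = Ax is unstable if any Re(lambda) > 0.
unstable = bool((eigvals.real > 0).any())
```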
Overall, the Elman network is the best at simulating the dynamics of the multivariate nonlinear ecosystem. The linear network is only able to simulate the weakly nonlinear system, and the linear ordinary differential equation yields unstable and impractical simulations for multivariate nonlinear systems.

Figure 2. Dynamic simulation of Chinese cabbage growth process using Elman network, linear network, and the linear ordinary differential equation.
3.2. Sensitivity analysis

Transfer functions, training functions, and learning functions are changed to compare the simulation performance of the Elman network (Table 1). It is found that all transfer functions of the second layer, except for "purelin", yield poor simulation performance (Table 1). Changes of the transfer function of the first layer, the training functions, and the learning functions have little influence on simulation performance, and the influence seems to be somewhat distinct only at the beginning of the simulation. The sensitivity analysis indicates that the Elman neural network is considerably robust in the simulation of the multivariate dynamic system.

Table 1. Sensitivity analysis of network functions in Elman neural network. The benchmark settings in the Matlab codes are: net = newelm(minmax(P),[30 5],{'tansig','purelin'},'traingdm','learngdm','mse'). Training epochs = 1000; training goal = 0.000001. The five output variables in the table are leaf weight, stem weight, root weight, leaf area, and water content of soil.

By changing the learning rate and training goal of the linear network, the results show that the simulation performance varies considerably with the learning rate and training goal. In some cases, for instance, if the training goal is 0.000001 and the learning rate is 0.001, the dynamics cannot be simulated. The correct choice of learning rate and training goal is important for an ideal simulation.
3.3. Model performance using various data sets

Similar conclusions are drawn when using various data sets (i.e., different combinations of variables) or various functions and parameters to test the performance and robustness of the Elman network, linear network, and linear ordinary differential equation. The Elman network yields stable and the best results. The linear network performs well in some cases, and the linear ordinary differential equation produces the worst simulations.
4. Discussion
Between-model comparisons indicate that the Elman neural network is able to simulate the dynamics of a multivariate nonlinear ecosystem. The linear network is able to simulate a simple system with weak nonlinearity. The linear ordinary differential equation is not able to simulate the multivariate nonlinear system. Sensitivity analysis shows that the choices of network functions and training parameters influence the simulation performance of neural networks. The choice of the transfer function of the second layer in the Elman network significantly affects simulation performance, while changes of the other functions do not have a considerable influence.
Besides system simulation, the Elman network can be used to learn the system structure in both the temporal and spatial realms, and the patterns of system dynamics can be detected with this model. The number of neurons in the Elman network can also be adjusted to yield the dynamics at a certain scale. Further details can be found in engineering texts (Hagan et al., 1996; Mathworks, 2002; Fecit, 2003).
It is argued that several neural networks may be jointly used in a simulation in order to achieve more reasonable results (Sharma et al., 2003; Zhang and Barrion, 2006). The use of a single neural network would sometimes not yield a better understanding of the system (Abdel-Aal, 2004; Nour et al., 2006; Nagendra and Khare, 2006; Yu et al., 2006).
CHAPTER 14
Simulation of Food Intake Dynamics
Insect pests harm plants and cause economic losses by feeding on crops. The food intake dynamics of an insect is a function of time. A holometabolous individual needs to survive several developmental stages, e.g., egg, the 1st to nth instar larvae, pupa, and adult (Zhang and Zhang, 2008). Food intake changes considerably within a developmental stage and particularly between adjacent stages (Zhang et al., 1997). Food intake varies with insect species, food sources, and environmental conditions such as temperature and humidity. It is a system that lacks a theoretical background (Schultz and Wieland, 1997) and is a nonlinear system (Pastor-Barcenas et al., 2005). Therefore it is hard to build mechanistic models for food intake dynamics (Zhang, 2007; Zhang et al., 2008). In this chapter, an algorithm for the functional link artificial neural network (FLANN) is used to simulate an insect's food intake dynamics. Conventional models are compared with FLANN for their simulation performance. More details can be found in Zhang et al. (2008).
1. Model Description

1.1. Functional link artificial neural network

The basic algorithm of the functional link artificial neural network (FLANN) can be found in Chap. 000. In FLANN, one of the following functions may be chosen as the nonlinear function ρ(·).
Linear function: ρ(S) = a + bS
Negative exponential function: ρ(S) = a e^{−bS}
Decreasing function with lower asymptote: ρ(S) = a + b/S
Logarithmic linear function: ρ(S) = a + b ln(S)
Power function: ρ(S) = a S^b
Logistic function: ρ(S) = 1/(a + b e^{−S})
Anti-exponential function: ρ(S) = a e^{b/S}
Transcendental tangent function: ρ(S) = tanh(S) = (1 − e^{−2S})/(1 + e^{−2S})

where a and b are constants.
Given K training samples {x_k, y_k}, k = 1, 2, ..., K, where x_k = (x_1^k, x_2^k, ..., x_n^k)^T and y_k = (y_1^k, y_2^k, ..., y_m^k)^T, when the kth sample is added, its value under the inverse of the nonlinear function ρ(·) should be computed. For example, for ρ(S) = (1 − e^{−2S})/(1 + e^{−2S}), we have y_k = (1 − e^{−2s_k})/(1 + e^{−2s_k}) and s_k = ln[(1 + y_k)/(1 − y_k)]/2; for ρ(S) = a + bS, we have y_k = a + b s_k and s_k = (y_k − a)/b.
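The pairing of a nonlinear function with its inverse can be verified numerically; the sketch below round-trips a few target values through the transcendental tangent choice:

```python
import numpy as np

# Example target values y_k in (-1, 1), the range of tanh.
y = np.array([-0.9, -0.3, 0.0, 0.4, 0.8])

# Inverse mapping: s_k = ln[(1 + y_k)/(1 - y_k)]/2.
s = 0.5 * np.log((1 + y) / (1 - y))

# Forward mapping: rho(s) = (1 - e^(-2s))/(1 + e^(-2s)) = tanh(s).
y_back = (1 - np.exp(-2 * s)) / (1 + np.exp(-2 * s))
```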
The orthogonal functions used as basis functions are the Legendre, Chebyshov, Laguerre, and Hermite functions, and the trigonometric functions, as described in previous chapters.
The norm of the input vector (Rudin, 1991; Li et al., 2001) is defined as

x = ‖x‖ = (Σ_i x_i^2)^{1/2},

where the resultant x is a scalar variable and the x in ‖x‖ is a vector. The norm is normalized in order to coincide with the domain of definition of the orthogonal functions. The normalized x in the Legendre and Chebyshov functions is

x = (2x − (max x + min x))/(max x − min x).

In the Laguerre functions, the normalized x is x = x − min x, and in the trigonometric functions, the normalized x is

x = 2πx/(max x − min x) − 2π min x/(max x − min x).
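A sketch of the norm computation and the Legendre/Chebyshov normalization, with an invented input matrix:

```python
import numpy as np

# Invented input matrix: rows are samples, columns are input dimensions.
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [0.5, 0.5],
              [2.0, 2.0]])

# Euclidean norm of each input vector.
norms = np.sqrt((X ** 2).sum(axis=1))

# Map the norms onto [-1, 1], the domain of the Legendre and
# Chebyshov polynomials.
lo, hi = norms.min(), norms.max()
scaled = (2 * norms - (hi + lo)) / (hi - lo)
```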
The Matlab codes of FLANN are listed below (Zhang et al., 2008):
%Variables and data used in all of the Matlab functions are defined as follows.
%basefunselec: 1: Legendre function; 2: Chebyshov function; 3: Laguerre
%  function; 4: Hermite function; 5: trigonometric function.
%nonlinfunsele: 1: linear function (rho(S)=a+b*S); 2: negative exponential
%  function (rho(S)=a*exp(-b*S)); 3: decreasing function with lower asymptote
%  (rho(S)=a+b/S); 4: logarithmic linear function (rho(S)=a+b*log(S));
%  5: power function (rho(S)=a*S^b); 6: logistic function
%  (rho(S)=1/(a+b*exp(-S))); 7: anti-exponential function (rho(S)=a*exp(b/S));
%  8: transcendental tangent function (rho(S)=(1-exp(-2*S))/(1+exp(-2*S))).
%nodimin: the number of dimensions of the input vector.
%nodimout: the number of dimensions of the output vector.
%notrns: the number of training samples.
%nobasefun: the number of basis functions.
%a, b: the constants in the nonlinear function.
%traindata: input matrix (x) and output matrix (y) in the training sample
%  file. The number of training samples is the number of rows; the number of
%  columns is the sum of the dimensions of the input and output vectors. In
%  each row, the first values are for the input vector and the following
%  values are for the output vector.
%predidata: input matrix (xp) for samples to be predicted, in the same
%  format as x.
nodimin=2;
nodimout=1;
notrns=80;
nobasefun=50;
basefunselec=1;
nonlinfunsele=1;
%If rho(S)=tanh(S) is chosen as the nonlinear function, a and b can be set
%to arbitrary values.
a=1;
b=2;
x=traindata(:,1:nodimin);
y=traindata(:,(nodimin+1):(nodimin+nodimout));
xp=predidata;
net=flann(nodimout,notrns,nobasefun,basefunselec,nonlinfunsele,a,b,x,y)
%The simulated output vector for each training sample
disp('Simulated results:');
si=simu(net,nodimout,notrns,nobasefun,basefunselec,nonlinfunsele,a,b,x);
mse=sum(sum((si-y).^2))/(notrns*nodimout)
%The predicted output vector for each sample to be predicted
disp('Predicted results:');
simu(net,nodimout,size(xp,1),nobasefun,basefunselec,nonlinfunsele,a,b,xp);
function net=flann(nodimout,notrns,nobasefun,basefunselec,nonlinfunsele,a,b,x,y)
xmm=vectornorm(basefunselec,notrns,x);
for k=1:notrns;
  xx(k,:)=basefun(basefunselec,nobasefun,xmm(k));
  for i=1:nodimout;
    [rr,yy(k,i)]=nonlinfun(nonlinfunsele,a,b,y(k,i));
  end
end
%Least squares solution of the linear output weights
net=(inv(xx'*xx)*xx'*yy)';
function result=simu(net,nodimout,nosam,nobasefun,basefunselec,nonlinfunsele,a,b,mat)
xmm=vectornorm(basefunselec,nosam,mat);
for k=1:nosam;
  p=basefun(basefunselec,nobasefun,xmm(k));
  for j=1:nodimout;
    temp=sum(net(j,:).*p);
    [result(k,j),rt]=nonlinfun(nonlinfunsele,a,b,temp);
  end
end
result
function p=basefun(basefunselec,nobasefun,x)
switch basefunselec
case 1
p(1)=x;
for i=1:nobasefun-1;
if (i==1) p(2)=(3*x*p(1)-1)/2; continue; end
p(i+1)=((2*i+1)*x*p(i)-i*p(i-1))/(i+1);
end
%Legendre
case 2
p(1)=x;
for i=1:nobasefun-1;
if (i==1) p(2)=2*x*p(1)-1; continue; end
p(i+1)=2*x*p(i)-p(i-1);
end
%Chebyshov
case 3
p(1)=1-x;
for i=1:nobasefun-1;
if (i==1) p(2)=(3-x)*p(1)-1; continue; end
p(i+1)=(2*i+1-x)*p(i)-iˆ2*p(i-1);
end
%Laguerre
case 4
p(1)=2*x;
for i=1:nobasefun-1;
if (i==1) p(2)=2*x*p(1)-2; continue; end
p(i+1)=2*x*p(i)-2*i*p(i-1);
end
%Hermite
%Hermite
case 5
for i=1:nobasefun;
if (round(i/2)==(i/2)) p(i)=sin(i/2*x);
else p(i)=cos((i+1)/2*x);
end
end
%Trigonometric
end
function [rr,rt]=nonlinfun(nonlinfunsele,a,b,x)
%Nonlinear value, nonlinear inverse value
switch nonlinfunsele
case 1
rr=a+b*x; rt=(x-a)/b;
case 2
rr=a*exp(-b*x); rt=(log(a)-log(x))/b;
case 3
rr=a+b/x; rt=b/(x-a);
case 4
rr=a+b*log(x); rt=exp((x-a)/b);
case 5
rr=a*x^b; rt=(x/a)^(1/b);
case 6
rr=1/(a+b*exp(-x)); rt=-log((1-a*x)/(b*x));
case 7
rr=a*exp(b/x); rt=b/(log(x)-log(a));
case 8
rr=(1-exp(-2*x))/(1+exp(-2*x)); rt=0.5*log((1+x)/(1-x));
end
function xmm=vectornorm(basefunselec,nosam,mat)
for i=1:nosam;
xmm(i)=sqrt(sum((mat(i,:)).^2));
end
if (basefunselec~=4)
maxx=max(xmm);
minn=min(xmm);
if (maxx==minn)
error('Division by zero: maximum and minimum norms are equal!');
end
for i=1:nosam;
switch basefunselec
case {1,2}
xmm(i)=(2*xmm(i)-(maxx+minn))/(maxx-minn);
case 3
xmm(i)=xmm(i)-minn;
5
March 22, 2010
18:4
9in x 6in
B-922
b922-ch14
1st Reading
6 Computational Ecology
case 5
xmm(i)=2*pi*(xmm(i)-minn)/(maxx-minn); %map norms to [0, 2*pi]
end
end
end
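The batch training step in `flann` above (normalize the inputs, expand each in the chosen basis functions, then solve the normal equations for the weights) can be sketched compactly outside Matlab as well. The following is a minimal pure-Python illustration, assuming a scalar input, the Chebyshev basis of `basefun` case 2, and a linear nonlinear function; all function names here are ours, not from the book's code.

```python
def cheb(x, n):
    """Chebyshev polynomials T_1..T_n at x (case 2 of basefun: no constant term)."""
    p = [x]
    if n > 1:
        p.append(2*x*x - 1)
    for i in range(2, n):
        p.append(2*x*p[i-1] - p[i-2])
    return p

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0]*n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k]*w[k] for k in range(r + 1, n))) / M[r][r]
    return w

def flann_fit(xs, ys, nbasis):
    """Normalize inputs to [-1, 1], expand in the basis, and solve the
    normal equations (Phi' Phi) w = Phi' y, as flann does with a matrix inverse."""
    lo, hi = min(xs), max(xs)
    xn = [(2*x - (hi + lo)) / (hi - lo) for x in xs]
    Phi = [cheb(x, nbasis) for x in xn]
    A = [[sum(row[i]*row[j] for row in Phi) for j in range(nbasis)]
         for i in range(nbasis)]
    b = [sum(row[i]*y for row, y in zip(Phi, ys)) for i in range(nbasis)]
    return lo, hi, solve(A, b)

def flann_predict(model, x):
    """Evaluate the fitted expansion at a new input x."""
    lo, hi, w = model
    xn = (2*x - (hi + lo)) / (hi - lo)
    return sum(wi*pi for wi, pi in zip(w, cheb(xn, len(w))))
```

With inputs symmetric on [-1, 1] and target y = x^3, three Chebyshev terms recover the cubic exactly, since x^3 = 0.75 T1 + 0.25 T3.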
1.2. Conventional models
The following empirical models are used to model food intake dynamics:
Fractional function: f(t) = (at + b)/(ct + d)
Polynomial function: f(t) = a + b1*t + b2*t^2 + b3*t^3
Exponential function: f(t) = a*e^(ct)
Multivariate linear regression: f(t, T) = a + b1*t + b2*T
Trend surface model: f(t, T) = a + b1*t + b2*T + b3*t*T + b4*t^2 + b5*T^2
2. Data Description
Six temperatures, i.e., 20, 24, 28, 32, 36, and 40°C, were used to measure
the food intake dynamics of Spodoptera litura. After the eggs hatched, the
larvae were fed clean leaves of Chinese cabbage, and the average fresh
weight (g) of leaves consumed per larva was measured each day until all
larvae pupated. The fresh weight of leaves consumed per larva was
accumulated daily and used as the food intake dynamics. A data set was thus
gathered (Zhang et al., 1997).
3. Results
3.1. Modeling food intake dynamics of S. litura
In FLANN, 50 Legendre functions are set as basis functions. The nonlinear
function is chosen to be ρ(S) = a + bS (linear function), with parameter
values a = 1 and b = 2.
The results indicate that the averaged MSE (mean squared error) of
FLANN, the fractional function, the polynomial function, and the exponential
function is 0.01674, 0.15482, 0.07565, and 0.1289, respectively. FLANN thus
clearly outperforms the conventional models in simulating food intake
dynamics at the six temperatures (Fig. 1). Among these conventional models,
the polynomial function has the lowest error, while the fractional function
yields a larger deviation in simulation.
Figure 1. Simulation of food intake dynamics using FLANN and conventional models.
Within each stage (instar), the daily food intake of an S. litura larva rises
from zero to a maximum and then declines back to zero as the larva prepares
to develop into the next stage (instar or pupa). As a result, the food intake
dynamics of a larva is not a strictly increasing function of time. Most
conventional models tend to smooth out these fluctuations, whereas FLANN, in
theory, performs well in describing these details of the dynamics.
3.2. Modeling temperature–time relationship of food intake
In the simulation of the temperature–time dependent food intake relationship,
both multivariate linear regression (MSE = 1.8575) and the trend surface
model (MSE = 1.2238) capture the overall trend of this relationship (Fig. 2).
In total 80 data points are used in the FLANN simulation (with the same
settings as above) of the temperature–time dependent food intake
relationship. FLANN achieves a better goodness of fit than the conventional
models above (MSE = 0.3461; Fig. 3).
3.3. Sensitivity analysis
3.3.1. Basis functions
Different types and numbers of basis functions are used in FLANN to assess
the sensitivity of the simulated temperature–time dependent food intake
relationship to the choice of basis functions
Figure 2. Simulation of temperature–time dependent food intake using multivariate linear
regression and trend surface model.
Figure 3. FLANN simulation of temperature–time dependent food intake relationship.
Table 1. Mean squared error (MSE) of FLANN simulation for various types and
numbers of basis functions. In total 10, 20, 30, 40, and 50 basis functions
are used, respectively (Zhang et al., 2008).
                  10        20        30        40        50
Legendre        2.5415    1.0305    0.43109   0.3767    0.34606
Chebyshev       2.5366    0.94284   0.41592   0.37695   0.34605
Laguerre        2.5707    50.642    127.93    22.719    1.6892
Hermite         2.4913    14.396    1244.2    32.382    1837.2
Trigonometric   2.8268    0.79089   0.62204   0.564     0.50087
(Table 1 and Fig. 4). The nonlinear function is ρ(S) = a + bS, with parameter
values a = 1 and b = 2.
The results demonstrate that the simulation performance using Legendre,
Chebyshev, and trigonometric basis functions is better than that using
Laguerre and Hermite functions. The fitted error (MSE) of the Legendre,
Chebyshev, and trigonometric functions decreases as the number of basis
functions increases. In contrast, the simulation performance of the Laguerre
and Hermite functions is unstable as the number of basis functions changes.
Overall, Legendre and Chebyshev functions are the best basis functions for
FLANN modeling (Table 1).
Figure 4. Output sensitivity of FLANN to different types and parameter values
of nonlinear functions in the simulation of the temperature–time dependent
food intake relationship of S. litura, where a and b are parameters in the
nonlinear functions. Nonlinear functions 1–8 denote the functions indicated
in the mathematical description above. Fifty Legendre functions are used in
FLANN.
3.3.2. Nonlinear functions
Different nonlinear functions and parameter values are used in the FLANN
simulation (with 50 Legendre basis functions) of the temperature–time
dependent food intake relationship (Fig. 4).
Simulation performance varies with the type of nonlinear function and the
parameter values in the function. Nonlinear functions 1 (linear function),
2 (negative exponential function), and 5 (power function) are the best
functions, yielding relatively stable outputs as the parameter values change
(Fig. 4). The logarithmic linear function yields better results in most
cases. The remaining nonlinear functions, e.g., the transcendental tangent
function and the logistic function, show poor simulation performance.
4. Discussion
As indicated above, FLANN performs better than conventional models in
simulating the food intake dynamics of S. litura. The type and number of
basis functions, and the type and parameter values of nonlinear functions
in FLANN will to a certain extent influence the simulation performance.
The weight matrix W in FLANN may also be approximated using the iterative
algorithm (Yan and Zhang, 2000), that is,
W(k + 1) = W(k) + ηδ(k)ϕ(x_k)^T,
where δ(k) = (δ_1(k), δ_2(k), ..., δ_m(k))^T, and
δ_j(k) = ρ′(S_j)e_j(k),
where ρ′(S) is the derivative of ρ(S). For example, for ρ(S) = tanh(S), we
have ρ′(S) = 1 − ρ^2(S), and for ρ(S) = a + bS, we have ρ′(S) = b.
FLANN is a function generator using orthogonal series (Gentle, 2002). It is
essentially a generalized form of the radial basis function neural network
(Yan and Zhang, 2000; Zhang and Qi, 2002; Zhang and Barrion, 2006). In
addition to the choices of basis functions and nonlinear functions, a
high-quality data set is also indispensable for good simulation performance
of FLANN.
March 22, 2010
15:45
9in x 6in
B-922
b922-ch15
1st Reading
CHAPTER 15
Species Richness Estimation and Sampling Data Documentation
1. Estimation of Plant Species Richness on Grassland
Biodiversity presents vast numbers of unexploited opportunities for solving
environmental problems (Brown, 1991; Cowell, 1992). Plants account for
20% of the total number of species globally (Chen and Ma, 2001). On a
temperate grassland, plants have the largest biomass (20 000 kg/ha), followed
by microorganisms (7000 kg/ha) (Pimental et al., 1992; Chen and Ma, 2001).
Plant diversity is the basis of animal diversity (Andow, 1991; Dong et al.,
2005; Jia et al., 2006). Research on plant biodiversity usually starts with
the estimation of species richness, and a large number of studies estimating
plant species richness have been reported (Dony, 1963; Williams, 1964;
Rosenzweig, 1995; Chen and Ma, 2001).
Grassland is a natural ecosystem with high productivity. Grasslands account
for 16–30% of terrestrial land. However, global grasslands have been
degrading in recent decades due to overgrazing and reclamation for various
uses. Evaluating grassland biodiversity is therefore an important subject in
ecological research.
Various methods have been used to estimate species richness in previous
studies, many of which are nonparametric estimators (Chao, 1984; Burnham
and Overton, 1978, 1979; Smith and van Belle, 1984; Chao and Lee, 1992).
The performance of richness estimators can be assessed by comparing
estimators to the measured species richness at one or a few areas (Miller
and Wiegert, 1989) or by simulation (Smith and van Belle, 1984). According
to past studies, the performance of these estimators varies with habitats,
growing systems, and the distribution of species (Efron and Gong, 1983;
Palmer, 1990, 1991; Bunge and Fitzpatrick, 1993; Colwell and Coddington,
1994; Walther and Morand, 1998; Hellman et al., 1999; Zhang et al., 2004).
Besides nonparametric estimators, parametric models like the Arrhenius
model, the logarithmic normal model, etc., were widely applied in earlier
studies (Preston, 1960). These parametric models were usually used to
describe various richness–area relationships. Artificial neural networks are
recognized as universal function approximators (Acharya et al., 2006; Zhang
and Barrion, 2006; Zhang et al., 2007), and they have been used to predict
invertebrate species richness in rice fields (Zhang, 2007). This section aims
to estimate plant species richness on grassland using a neural network and
to compare its performance with some nonparametric estimators and
conventional models.
1.1. Model description
1.1.1. Nonparametric estimators
Seven nonparametric models (Colwell and Coddington, 1994; Zhang and
Schoenly, 1999) are used to estimate plant species richness on the grassland.
These estimators are denoted as Chao 1 and Chao 2 (Chao, 1984), Jackknife 1
and Jackknife 2 (i.e., first-order and second-order jackknife, see Burnham
and Overton, 1978, 1979), Bootstrap (Smith and van Belle, 1984), and Chao 3
and Chao 4 (Chao and Lee, 1992). See Colwell and Coddington (1994) and
Zhang and Schoenly (1999) for mathematical descriptions of these
nonparametric estimators.
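The full mathematical descriptions are in the cited references. As an orientation, the classical textbook forms of three of these estimators can be sketched as follows; this is a generic sketch of the standard Chao 1, first-order jackknife, and Bootstrap formulas, and may not match every detail of the variants used in the cited papers.

```python
def chao1(abund):
    """Chao 1 from species abundances: S_obs + F1^2/(2*F2), where F1 and F2
    are the numbers of singleton and doubleton species (bias-corrected
    form when no doubletons are present)."""
    s_obs = sum(1 for a in abund if a > 0)
    f1 = sum(1 for a in abund if a == 1)
    f2 = sum(1 for a in abund if a == 2)
    return s_obs + f1*f1/(2*f2) if f2 > 0 else s_obs + f1*(f1 - 1)/2

def jackknife1(incidence):
    """First-order jackknife from a samples-by-species 0/1 matrix:
    S_obs + Q1*(n-1)/n, where Q1 = species occurring in exactly one sample."""
    n = len(incidence)
    counts = [sum(col) for col in zip(*incidence)]
    s_obs = sum(1 for c in counts if c > 0)
    q1 = sum(1 for c in counts if c == 1)
    return s_obs + q1*(n - 1)/n

def bootstrap_richness(incidence):
    """Bootstrap estimator: S_obs + sum_k (1 - p_k)^n, where p_k is the
    fraction of the n samples containing species k."""
    n = len(incidence)
    counts = [sum(col) for col in zip(*incidence)]
    return sum(1 for c in counts if c > 0) + sum((1 - c/n)**n for c in counts if c > 0)
```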
To evaluate the performance of these models, two bias indices, i.e., the
absolute bias (AB) and the relative bias (RB), are used in this study (Zhang
and Schoenly, 1999):
AB = Σ_i |S_i − S| / sim,
RB = Σ_i (|S_i − S| / S) / sim × 100%,
where S_i denotes the estimated richness in the ith randomization and S is
the observed species richness over all samples in the data set. AB and RB
are averaged over all sim randomizations of the nonparametric estimators.
Among these indices, RB is considered the most revealing index for
evaluating estimators (Zhang and Schoenly, 1999).
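In code, the two indices amount to averaging the absolute and relative deviations of the estimates over the randomizations; a minimal sketch (the function name is ours):

```python
def bias_indices(estimates, s_observed):
    """AB = sum_i |S_i - S| / sim and RB = sum_i (|S_i - S| / S) / sim * 100%,
    where the sum runs over the sim randomizations."""
    sim = len(estimates)
    ab = sum(abs(s - s_observed) for s in estimates) / sim
    rb = sum(abs(s - s_observed) / s_observed for s in estimates) / sim * 100
    return ab, rb
```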
1.1.2. Artificial neural network
A three-layer neural network is developed for modeling the relationship
between total species richness and cumulative sample size (Fig. 1).
In this network, the input set is x_i ∈ R and the corresponding output set is
y_i ∈ R, where x_i is the cumulative sample size up to sample i and y_i is the
total number of plant species up to sample i, i = 1, 2, ..., n. Both the
first and second layers contain fifteen neurons, and a bias is used on each
layer. Transfer functions for layers 1–3 are the hyperbolic tangent sigmoid,
the logistic sigmoid, and the linear transfer function, respectively.
Initialization of the network, and of the weights and biases
for each layer, is performed by a function that initializes each layer i
(i = 1, 2, 3) according to its own initialization function (Hagan et al.,
1996; Mathworks, 2002; Fecit, 2003). Network is trained by Levenberg–
Marquardt backpropagation algorithm. Performance function is mean
Figure 1. Neural network developed in the present study.
squared error (mse). Both the first and second layers receive the same inputs
from the sample space and yield outputs for the third layer. The third layer
learns from the input space. For each layer, the net input function
calculates the layer's net input by combining its weighted inputs and biases.
Network performance is evaluated using mse, the Pearson correlation
coefficient, and the significance level of the linear relationship between
simulated and observed species richness. The artificial neural network is
developed using Matlab (Mathworks, 2002). Matlab codes of the neural network
are listed below:
clc
%Raw data dat(mm,nn) in which mm is number of plant species and
%nn is number of samples
mm=48;
nn=50;
ram=100;
numn=15; %Number of neurons in layers 1 and 2 respectively
obs=zeros(nn-1,ram);
yy=zeros(ram,ram-1);
y=yy’;
for simm=1:ram;
da=zeros(nn-1,2);
temp=2:nn;
da(:,1)=temp’;
ra=randperm(nn);
for i=2:nn;
da(i-1,2)=0;
for k=1:mm;
u(k)=0;
for j=1:i;
u(k)=u(k)+dat(k,ra(j));
end;
if u(k)~=0 da(i-1,2)=da(i-1,2)+1;end;
end;
end;
obs(:,simm)=obs(:,simm)+da(:,2);
clear net;
disp(['Simu=' num2str(simm)])
n=1;
data=da(:,1);
net=network;
net.numInputs=1;
net.numLayers=3;
net.inputs{1}.size=n;
net.biasConnect=[1;1;1];
net.inputConnect=[1;1;0];
net.layerConnect=[0 0 1;0 0 1;1 1 0];
net.outputConnect=[0 0 1];
net.targetConnect=[0 0 1];
mi=min(data);ma=max(data);
for i=1:n;
tt(i,1)=mi(i);tt(i,2)=ma(i);
end;
net.inputs{1}.range=tt;
net.layers{1}.size=numn;
net.layers{2}.size=numn;
net.layers{1}.transferFcn='tansig';
net.layers{2}.transferFcn='logsig';
net.layers{3}.transferFcn='purelin';
net.layers{1}.initFcn='initlay';
net.layers{2}.initFcn='initlay';
net.layers{3}.initFcn='initlay';
net.layerWeights{1,3}.delays=1;
net.layerWeights{2,3}.delays=1;
net.initFcn='initlay';
net.performFcn='mse';
net.trainFcn='trainlm';
net.trainParam.goal=1e-05;
net.trainParam.epochs=500;
net=train(net,data’,da(:,2)’);
pred=51:100;
yy(simm,:)=sim(net,[data’ pred]);
y=yy’;
end;
obs
y
disp('Mean Standard Dev')
[mean(y,2) std(y,0,2)]
%Print input weights and between-layer weights
net.IW{1,1}
%Input weights
net.IW{2,1}
%Input weights
net.LW{3,1}
%Between-layer weights
net.LW{3,2}
%Between-layer weights
net.LW{1,3}
%Between-layer weights
net.LW{2,3}
%Between-layer weights
1.1.3. Conventional models
The polynomial function, a flexible and adaptable model, is used to model
the above relationship (He, 2001; Mathworks, 2002). A polynomial of too low
an order cannot achieve satisfactory goodness of fit, while one of too high
an order would over-fit the curve. As a result, the second- and third-order
polynomial functions and an asymptotic function are
used to model the relationship.
y = a2*x^2 + a1*x + b,                  (1)
y = a3*x^3 + a2*x^2 + a1*x + b,         (2)
y = (ax + b)/(cx + d),                  (3)
where x is the cumulative sample size, and y is the total number of plant
species. Confidence intervals are obtained from bootstrap procedure.
The last conventional model is lognormal distribution (Krebs, 1989).
The methods of Cohen (1959, 1961) are used to fit lognormal distribution
to species abundance (cover-degree) data and total species richness for a
given cumulative sample size is estimated by means of these methods.
1.1.4. Bootstrap procedure
The bootstrap procedure is used to produce yield–effort curves in the
simulation analysis (Zhang and Schoenly, 1999). A yield–effort curve plots
the cumulative number of species, defined as the sum of the number of species
in the previous sample(s) and the number of species in the present sample
that were not observed in any previous sample. For the first sample, the
cumulative number of species is defined to equal its number of species.
The order in which samples are added to the total number of samples affects
the shape of the curve. Variation in curve shape due to sample order is
different from sampling error caused by between-sample heterogeneity (Zhang
and Schoenly, 1999). The present study bootstraps the columns of the
sample-by-species matrix. Repeating this process, e.g., with 100 or 1000
randomizations, generates a family of curves from which the mean number of
species and its standard deviation (or confidence interval) can be
calculated for each cumulative sample size in the curve.
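The curve-averaging step can be sketched as follows. This is a simplified pure-Python illustration that randomizes the sample order on each pass (randomization without replacement); the exact resampling scheme of the cited procedure is described in Zhang and Schoenly (1999), and the function names here are ours.

```python
import random

def yield_effort_curve(matrix, rng):
    """One randomized yield-effort curve: permute the sample order and record
    the cumulative number of distinct species after each added sample.
    matrix: samples-by-species incidence (0/1) rows."""
    order = list(range(len(matrix)))
    rng.shuffle(order)
    seen, curve = set(), []
    for row in order:
        seen.update(k for k, v in enumerate(matrix[row]) if v > 0)
        curve.append(len(seen))
    return curve

def mean_yield_effort(matrix, randomizations=100, seed=1):
    """Average the family of curves point-wise over many randomizations."""
    rng = random.Random(seed)
    totals = [0.0]*len(matrix)
    for _ in range(randomizations):
        for i, c in enumerate(yield_effort_curve(matrix, rng)):
            totals[i] += c
    return [t/randomizations for t in totals]
```

Whatever the sample order, the last point of every curve equals the total observed richness; only the shape of the rise varies.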
1.2. Data source
In total 50 samples, each with an area of 1 m × 1 m, were surveyed on
natural grassland in Zhuhai, China. Plant species and their cover-degrees
(%) were recorded and measured for each sample.
1.3. Results
In total 48 plant species from 17 families were found on the grassland.
1.3.1. Nonparametric estimation of species richness
1.3.1.1. Performance evaluation
Among the seven nonparametric estimators, Bootstrap yields the least absolute
bias (AB) and relative bias (RB) in the estimation of plant species richness
(Table 1). In addition, Bootstrap is basically insensitive to cumulative
sample size compared with the other estimators (Fig. 2); it yields similar
species richness estimates under various cumulative sample sizes. Overall,
Bootstrap is the best nonparametric estimator.
Chao 2 is the second most robust model after Bootstrap. Its estimate tends to
be stable after the cumulative sample size reaches 10 samples (Fig. 2).
Chao 3 and Chao 4 show a similar trend. Estimates of Chao 3, Chao 4, and
Jackknife 1 are stable when the cumulative sample size is between
approximately 20 and 40 samples. Estimates of Jackknife 2 tend to be lower
than the observed richness if the cumulative sample size is larger than
40–50 samples. Chao 1 is the most sensitive to cumulative sample size; its
estimation curve is almost the same as the observed one. Chao 1 is in this
sense the worst model.
The maximum richness of plant species on the grassland, estimated by
Chao 1, Chao 2, Jackknife 1, Jackknife 2, Bootstrap, Chao 3, and Chao 4,
was 50(±6), 54(±22), 55(±7), 58(±17), 54(±2), 60(±13), and 68(±26)
species, respectively. On average, the maximum richness was 57 species. The
cumulative sample sizes needed to reach the maximum estimates of species
richness are 41, 30, 30, 22, 19, 30, and 29 samples for the above
estimators, respectively.
1.3.1.2. Nonparametric estimation of plant species richness
Total plant species richness on the grassland is estimated using seven nonparametric models, as illustrated in Table 2. It can be found that the averaged
Table 1. Bias of seven nonparametric estimators.
          Chao 1   Chao 2   Jackknife 1   Jackknife 2   Bootstrap   Chao 3   Chao 4
AB         8.55     9.06       8.65         10.18         5.16      11.27    16.83
RB (%)    17.81    18.88      18.02         21.21        10.75      23.49    35.06
Figure 2. Performance of nonparametric models for the simulation of the
species richness vs. sample size relationship. In total 1000 randomizations
are used in the bootstrap procedure.
richness estimate is similar to the estimate of Bootstrap. On average, the
total species richness on the grassland, estimated with the nonparametric
models from all samples, is 48 to 55 species.
1.3.2. Estimation of species richness using other models
The neural network developed above (Fig. 1), the polynomial functions [Eqs.
(1) and (2)], the asymptotic function [Eq. (3)], and the lognormal function
(Cohen, 1959, 1961; Krebs, 1989) are used to model the species richness vs.
sample size curve (Figs. 3 and 4). In total 100 randomizations are used in
the bootstrap procedure. The simulation performance of the neural network
and the lognormal function is the best. Combining simulation and prediction
performance, the neural
Table 2. Estimates of plant species richness using seven nonparametric models.
              Species Richness   95% Lower Limit   95% Upper Limit
Chao 1             49                47.42             50.58
Chao 2             48.35             47.77             48.92
Jackknife 1        50.94             29.94             71.94
Jackknife 2        41.59              —                 —
Bootstrap          51.18             48.2              54.16
Chao 3             51.76             49.8              53.72
Chao 4             52.19             50.12             54.26
Average            49.28             45.54             55.59
Figure 3. Performance of neural network for modeling species richness vs. sample
size curve.
network is considered the best model, while the polynomial functions yield
the worst predictions (Fig. 3). From the neural network model, the estimated
total species richness on the grassland is about 48 to 60 species.
The asymptotic function [Eq. (3)] is fitted as follows:
y = (2.823x + 5.520)/(0.046x + 0.808).
Figure 4. Performance of Arrhenius function, lognormal function, asymptotic function,
and polynomial functions for modeling species richness vs. sample size curve.
According to this model, the total plant species richness on the grassland
is estimated to be 61 species (x → ∞).
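The 61-species figure follows directly from the asymptote of the fitted fraction: as x → ∞, y = (ax + b)/(cx + d) tends to a/c. A quick numerical check of the fitted coefficients:

```python
def fitted(x):
    # fitted asymptotic model y = (2.823x + 5.520)/(0.046x + 0.808)
    return (2.823*x + 5.520) / (0.046*x + 0.808)

asymptote = 2.823 / 0.046  # limit of y as x -> infinity, about 61.4 species
```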
1.4. Conclusions
Among the seven nonparametric estimators tested, Bootstrap yields the least
absolute bias and relative bias, and it yields similar estimates under
various cumulative sample sizes; it is therefore considered the best
nonparametric estimator. Chao 2 is the second most robust estimator among
the seven nonparametric models. Chao 1 is the most sensitive to cumulative
sample size,
and is considered the worst model for estimating species richness.
The total species richness on the grassland, estimated with the
nonparametric models, is on average 48 to 55 species.
The artificial neural network developed in the present study yields an
estimate of 48 to 60 species. The estimate of the asymptotic function is 61
species.
In general, the artificial neural network is reliable in the estimation of
species richness, and it proves more effective than the conventional models,
including the nonparametric estimators used above.
2. Documentation of Sampling Data of Invertebrates
Biodiversity and conservation studies in ecology often begin with issues of
sampling. Description of sampling data is an important topic in biodiversity
analysis (Steele et al., 1984; Miller and White, 1986; Miller and Wiegert,
1989; Moreno and Halffter, 2000). Researchers may need to know, for example,
how representative and complete their sampling of the ecological community
is. If taking a few samples in a single field captures the same abundant
taxa as taking more samples does, then future surveys can be done at a lower
cost and with a minimal loss of essential ecological information (Zhang and
Schoenly, 1999).
To measure the completeness of sampling, a curve may be drawn (Cohen, 1978;
Cohen et al., 1993; Colwell and Coddington, 1994; Zhang and Schoenly, 1999)
that plots the number of taxa sampled vs. the sample size, the number of
taxa sampled vs. the abundance, or the mean abundance of newly sampled
species vs. the sample size. These curves increase ecological understanding
of dominance–diversity relationships and spatial distributions of taxa. A
large number of studies have been dedicated to these problems (Bunge and
Fitzpatrick, 1993; Coleman et al., 1982; Colwell and Coddington, 1994;
Krebs, 1989; Schoenly et al., 1999, 2003; Shahid et al., 2003; Zhang and
Schoenly, 1999).
As an indicator for biodiversity conservation and pest management,
invertebrate diversity in farmlands has been an attractive topic in recent
years (Brown, 1991; Kremen et al., 1993). A pilot study in agro-biodiversity
might consist of a set of samples gathered from one
experimental plot or farmer's field. Sampling analysis is therefore one of
the major topics of invertebrate biodiversity.
Artificial neural networks have been widely used in numerical computation,
pattern recognition, classification, system control, etc. (Hagan et al.,
1996). This section aims to document the sampling data (i.e., to store the
sampling information in trained neural networks) using two artificial neural
network models, BP and RBF networks, based on invertebrate data sampled in
an irrigated rice field. Several ecological functions are also tested to
compare their performance with the neural networks. More details can be
found in Zhang and Barrion (2006).
2.1. Model description
2.1.1. Neural networks
2.1.1.1. BP neural network
See chap. 000.
2.1.1.2. RBF neural network
See previous chapters. The transfer functions in the hidden layer are
Gaussian kernel functions:
u_j = exp(−(x − c_j)^2/(2σ_j^2)),    j = 1, 2, ..., N,
where u_j is the output of the jth hidden neuron, x is the input, c_j is the
center of the Gaussian function, σ_j is the standardization constant, and N
is the number of neurons in the hidden layer.
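The hidden-layer mapping and the linear output layer on top of it can be sketched in a few lines; this is a generic scalar-input illustration of the Gaussian kernel formula above, with function names of our own choosing.

```python
import math

def rbf_hidden(x, centers, sigmas):
    """u_j = exp(-(x - c_j)^2 / (2*sigma_j^2)) for each hidden neuron j."""
    return [math.exp(-(x - c)**2 / (2*s*s)) for c, s in zip(centers, sigmas)]

def rbf_output(x, centers, sigmas, weights):
    """Linear output layer combining the Gaussian hidden-layer outputs."""
    return sum(w*u for w, u in zip(weights, rbf_hidden(x, centers, sigmas)))
```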
Matlab codes of RBF network and BP network used in the present study
are listed as follows:
%Read input x to variable p1 (cumulative sample size) and target y to
%variable t1 (cumulative number of species sampled), ....
p1=Data1(:,1);p2=Data2(:,1);p3=Data3(:,1);p4=Data4(:,1);
t1=Data1(:,2);t2=Data2(:,2);t3=Data3(:,2);t4=Data4(:,2);
%Develop a RBF network if it is used
eg = 0.01;
%Set square sum error
sc = 1;
%Set expansion constant
net1=newrb(p1,t1,eg,sc); net2=newrb(p2,t2,eg,sc);
net3=newrb(p3,t3,eg,sc); net4=newrb(p4,t4,eg,sc);
%Simulate input-output function with trained RBF network
Y1=sim(net1,p1);Y2=sim(net2,p2);Y3=sim(net3,p3);Y4=sim(net4,p4);
%Develop a BP networks if it is used
n=5; %Set 5 neurons in hidden layer
net1=newff(minmax(p1'),[n,1],{'tansig' 'purelin'},'trainlm');
net2=newff(minmax(p2'),[n,1],{'tansig' 'purelin'},'trainlm');
net3=newff(minmax(p3'),[n,1],{'tansig' 'purelin'},'trainlm');
net4=newff(minmax(p4'),[n,1],{'tansig' 'purelin'},'trainlm');
%Training BP network
net1.trainParam.epochs =1000; net2.trainParam.epochs = 1000;
net3.trainParam.epochs =1000; net4.trainParam.epochs = 1000;
net1.trainParam.goal = 0.01; net2.trainParam.goal = 0.01;
net3.trainParam.goal = 0.01; net4.trainParam.goal = 0.01;
%Set maximum mean square error
net1=train(net1,p1',t1'); net2=train(net2,p2',t2');
net3=train(net3,p3',t3'); net4=train(net4,p4',t4');
%Simulate input-output function with trained BP network
m=100;
Y5=sim(net1,1:m);Y6=sim(net2,1:m);Y7=sim(net3,1:m);Y8=sim(net4,1:m);
%Draw figures
figure;
subplot(4,2,1);plot(1:m,Y5,'-');hold on;plot(p1',t1','*');
legend('BP Fitted','Data1');
for i=1:m-1;
if (Y5(i+1)-Y5(i))<=0.01
plot(0:i,ones(1,i+1)*Y5(i),'b:');
plot(ones(1,i)*i,linspace(20,Y5(i),i),'b:');
disp([i Y5(i)]);
break
end
end
if i==m-1
plot(0:m,ones(1,m+1)*Y5(m),'b:');
disp([m Y5(m)]);
end
subplot(4,2,2);plot(p1,Y1,'-');hold on;plot(p1,t1,'*');
legend('RBF Fitted','Data1');
subplot(4,2,3);plot(1:m,Y6,'-');hold on;plot(p2',t2','*');
legend('BP Fitted','Data2');
for i=1:m-1;
if (Y6(i+1)-Y6(i))<=0.01
plot(0:i,ones(1,i+1)*Y6(i),'b:');
plot(ones(1,i)*i,linspace(20,Y6(i),i),'b:');
disp([i Y6(i)]);
break
end
end
if i==m-1
plot(0:m,ones(1,m+1)*Y6(m),'b:');
disp([m Y6(m)]);
end
ylabel('Cumulated Number of Species Sampled');
subplot(4,2,4);plot(p2,Y2,'-');hold on;plot(p2,t2,'*');
legend('RBF Fitted','Data2');
subplot(4,2,5);plot(1:m,Y7,'-');hold on;plot(p3',t3','*');
legend('BP Fitted','Data3');
for i=1:m-1;
if (Y7(i+1)-Y7(i))<=0.01
plot(0:i,ones(1,i+1)*Y7(i),'b:');
plot(ones(1,i)*i,linspace(20,Y7(i),i),'b:');
disp([i Y7(i)]);
break
end
end
if i==m-1
plot(0:m,ones(1,m+1)*Y7(m),'b:');
disp([m Y7(m)]);
end
subplot(4,2,6);plot(p3,Y3,'-');hold on;plot(p3,t3,'*');
legend('RBF Fitted','Data3');
subplot(4,2,7);plot(1:m,Y8,'-');hold on;plot(p4',t4','*');
xlabel('Cumulative Sample Size');
legend('BP Fitted','Data4');
for i=1:m-1;
if (Y8(i+1)-Y8(i))<=0.01
plot(0:i,ones(1,i+1)*Y8(i),'b:');
plot(ones(1,i)*i,linspace(20,Y8(i),i),'b:');
disp([i Y8(i)]);
break
end
end
if i==m-1
plot(0:m,ones(1,m+1)*Y8(m),'b:');
disp([m Y8(m)]);
end
subplot(4,2,8);plot(p4,Y4,'-');hold on;plot(p4,t4,'*');
xlabel('Cumulative Sample Size');
legend('RBF Fitted','Data4');
%Print input weights and between-layer weights
net1.IW{1,1}, net2.IW{1,1}, net3.IW{1,1}, net4.IW{1,1}
net1.LW{2,1}, net2.LW{2,1}, net3.LW{2,1}, net4.LW{2,1}
2.1.2. Conventional models
(1) The Arrhenius model is used to fit the species richness vs. sample size
curve (Preston, 1960): N = a*S^b, where N is the number of species when the
sample size is S. The species richness vs. sample size curves capture the
information that describes the relationship between sample
size (i.e., cumulative number of samples) and the number of species
sampled (Cohen, 1978; Cohen et al., 1993; Coleman et al., 1982; Colwell and Coddington, 1994).
(2) Species richness is positively correlated with abundance; samples
containing large numbers of individuals thus harbor more taxa than smaller
samples. The rarefaction curve (Sanders, 1968; Hurlbert, 1971; Simberloff,
1972; Gotelli and Graves, 1996; Schoenly and Zhang, 1999) is used to test
the null hypothesis that two or more sampled communities come from the same
parent distribution and have the same species richness.
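The Arrhenius model N = a*S^b in item (1) is linear on a log-log scale, so its parameters can be fitted by ordinary linear regression on (log S, log N). A sketch under that standard transformation (the function name is ours):

```python
import math

def fit_arrhenius(sizes, richness):
    """Fit N = a*S^b via log N = log a + b*log S (ordinary least squares)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(n) for n in richness]
    m = len(xs)
    xbar, ybar = sum(xs)/m, sum(ys)/m
    b = sum((x - xbar)*(y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar)**2 for x in xs)
    a = math.exp(ybar - b*xbar)  # back-transform the intercept
    return a, b
```

On noiseless power-law data the transformation recovers a and b exactly; on real count data it weights small samples more heavily than a direct nonlinear fit would.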
2.1.3. Bootstrap method
Bootstrap procedure is used to produce the species richness vs. the sample
size curves. The curves plot the cumulative number of species, defined as
the sum of the number of species in the previous sample(s) and the number
of species in the present sample that were not observed in any previous
sample. For the first sample, the cumulative number of species is defined
to equal the number of species found in this sample (Zhang and Schoenly,
1999; Zhang, 2007).
This study bootstraps the columns of the sample-by-species matrix, thereby
producing a different sampling pathway through the field. Repeating this
process, for instance, with 1000 randomizations, generates a family of
curves from which the mean number of species can be calculated for each
sample size in the curve. The bootstrap procedure is also used to yield
rarefaction curves from all 60 samples.
2.2. Data description
Samples of invertebrates were collected in an irrigated rice field. In total
60 samples were collected on each of four sampling dates. Invertebrates were
sorted by stage (immatures, adults) and then identified to the lowest
possible taxon. Data from the records were stored as sample-by-functional
species matrices (immatures and adults were listed separately and defined as
different functional species in the present study), and their spreadsheet
files were denoted Data 1, Data 2, Data 3, and Data 4, respectively (Zhang
and Schoenly, 2004).
March 22, 2010
16
15:45
9in x 6in
B-922
b922-ch15
1st Reading
Computational Ecology
2.3. Results
In total 99 invertebrate families are found in the four sampling sets, of which
50 families are identified as the abundant families that comprise 99% of the
total invertebrate abundance. The invertebrate faunas of these sampling sets are shown to differ from each other (Zhang and Schoenly, 2004).
2.3.1. Species richness vs. sample size curves
Both the BP network and the RBF network perform well in fitting the species richness vs. sample size data (Fig. 5). The asymptote of the BP approximation is achieved when the absolute error for the asymptote is set to 0.01. RBF networks are quickly trained to the training goal (SSE = 0.01; SSE: sum of squared errors), but more neurons (60) are needed (the number of neurons is automatically determined by the RBF network).
The total number of functional species of rice invertebrates for sampling
sets Data 1, Data 2, Data 3, and Data 4, extrapolated from the function
asymptotes of trained BP network, is 140, 149, 147 and 144, respectively
(Fig. 5), while the observed functional species richness is 126, 141, 140,
and 131 when the sample size is fixed at 60.
The MSEs (mean squared errors) of the Arrhenius model are larger than those of the BP- and RBF-fitted functions (Table 3 and Fig. 5; the MSEs of the RBF networks are zero in Table 3). Overall, the BP network and the RBF network are superior to the Arrhenius model in the function approximation of species richness vs. sample size data.
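The Arrhenius model used as the benchmark here is the classical power law S = c * n^z relating richness S to sample size n; it can be fitted by least squares on the log-log scale. A Python sketch with synthetic, illustrative data (not the chapter's):

```python
import numpy as np

def fit_arrhenius(n, S):
    """Fit S = c * n**z by least squares on the log-log scale."""
    z, logc = np.polyfit(np.log(n), np.log(S), 1)
    return np.exp(logc), z

# Synthetic accumulation data generated from S = 20 * n**0.4
n = np.arange(1, 61, dtype=float)
S = 20.0 * n ** 0.4
c, z = fit_arrhenius(n, S)
mse = np.mean((c * n ** z - S) ** 2)   # mean squared error of the fit
```

On real accumulation data the log-log fit rarely captures the approach to an asymptote, which is the shortfall the neural network fits address.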
Table 3. Comparisons of fitting goodness to curves of species richness vs. sample size. Performance functions: MSE (for BP and Arrhenius), SSE (for RBF).

         BP     RBF   Arrhenius
Data 1   0.022  0     5.2722
Data 2   0.327  0     25.501
Data 3   0.009  0     17.4749
Data 4   0.011  0     15.4121
Figure 5. Curves of species richness vs. sample size. Each point is the mean of 1000 randomizations of bootstrap procedure. Asymptotes are indicated based on BP extrapolation.
2.3.2. Rarefaction curves
In the rarefaction analysis, the BP network and the RBF network demonstrate better performance than the rarefaction method (Table 4 and Fig. 6).
2.4. Discussion
Compared to the conventional models above, both the BP network and the RBF network are better models for the documentation of sampling information. The mathematical function for the sampling data can be satisfactorily fitted
Table 4. Comparisons of fitting goodness to the rarefaction curves among various methods. Performance functions: MSE (for BP and rarefaction), SSE (for RBF).

         BP      RBF   Rarefaction
Data 1   1.9244  0     9.0187
Data 2   5.3682  0     15.1631
Data 3   3.4713  0     5.1665
Data 4   7.0201  0     8.9528
Figure 6. Curves of rarefaction method. Each point is the mean of 1000 randomizations
of bootstrap procedure.
using an artificial neural network. The total numbers of functional species of rice invertebrates, extrapolated from the trained BP network, are between 140 and 149 in the irrigated rice field.
In their regional flora research, Miller and Wiegert (1989) used different models (canonical lognormal, uniform, random, etc.) to fit sampling data. Significant errors were produced when fitting with these models (Fig. 1 in Miller and Wiegert, 1989). These models yield good fits only if certain simplifying assumptions about the source data are met (Colwell and Coddington, 1994). In contrast to such explicit mathematical functions, BP and RBF networks can learn from sampling information, mine the intrinsic mechanisms hidden in the sampling data, and require no mathematical assumptions to fit the data.
As pointed out in previous chapters, the sampling information is represented by the connection weights of the neural network. This is the essence of documentation, even though the connection weights are not explicitly listed.
CHAPTER 16
Modeling Arthropod Abundance
from Plant Composition of
Grassland Community
A large number of studies have been dedicated to the relationship between arthropod diversity and plant composition. It was reported that weeds influence insect diversity in crop–weed–insect systems (Altieri and Letourneau, 1984; Altieri, 1994, 1995). Communities with more complex plant species composition contain more diverse insects (Sheng et al., 1997). Some forest studies showed that the relationship between the plant community and the insect community is significant (Dong et al., 2005; Jia et al., 2006). Furthermore, there is a positive correlation between the plant community and the predatory and parasitic insect community, and a negative correlation between the plant community and the defoliator insect community (Dong et al., 2005). Dominant arthropod populations on farmland are generally regulated negatively by vegetational diversity, although positive regulation occurs in some cases (Andow, 1991). Many such findings reveal that significant but complex relationships exist between arthropods and plant composition. These relationships appear to be nonlinear, and the mechanisms producing them have not been clearly explained.
In research on arthropods and plant communities, artificial neural networks have been successfully used for simulation, prediction, recognition, and classification (Moisen and Frescino, 2002; Worner and Gevrey, 2006; Zhang, 2007; Zhang et al., 2007; Zhang and Wei, 2008; Zhang and Zhang, 2008). So far, however, research is lacking on modeling arthropod abundance
from plant composition on grassland. This chapter aims to identify the relationship between arthropod abundance and plant composition on grassland, to develop a neural network to model this relationship, and to compare the simulation performance of the neural network with that of conventional models.
1. Model Description
1.1. Neural network
A three-layer neural network is developed for modeling arthropod abundance from plant composition (Fig. 1). In this network, the input set is xi ∈ Rp and the corresponding output set is yi ∈ R, where p is the number of indices for plant composition, xi is the vector of plant composition for sample i, and yi is the arthropod abundance for sample i, i = 1, 2, ..., n. Thirty neurons are used in both the first and second layers. A bias is used in all of the layers. The transfer functions for layers 1 to 3 are the hyperbolic tangent sigmoid transfer function (tansig), the logistic sigmoid transfer function (logsig), and the linear transfer function (purelin), respectively. The weights and biases for each layer are initialized by the Nguyen–Widrow algorithm (Hagan et al., 1996; Mathworks, 2002; Fecit, 2003). Network initialization is performed with a function that initializes each layer according to its own initialization function (initlay). The network is trained by the Levenberg–Marquardt algorithm (trainlm). The performance function is the mean squared error function (mse). Both the first and second layers receive the same inputs from the sample space and yield outputs for the third layer. The third layer learns from the input space. For each layer, the net input function (netsum) calculates the layer's net input by combining its weighted inputs and biases.

Figure 1. Architecture of the neural network designed.

The neural network is developed using Matlab (Mathworks, 2002) in this chapter. The Matlab codes are listed as follows:
clear net;
clc
n=17; %Dimension of the input
numn=30; %Number of neurons in layers 1 and 2
net=network;
net.numInputs=1;
net.numLayers=3;
net.inputs{1}.size=n;
net.biasConnect=[1;1;1];
net.inputConnect=[1;1;0];
net.layerConnect=[0 0 1;0 0 1;1 1 1];
net.outputConnect=[0 0 1];
net.targetConnect=[0 0 1];
mi=min(data);
ma=max(data);
for i=1:n;
tt(i,1)=mi(i);
tt(i,2)=ma(i);
end;
net.inputs{1}.range=tt;
net.layers{1}.size=numn;
net.layers{2}.size=numn;
%Transfer functions: logsig,tansig,purelin,radbas,satlins,tribas
net.layers{1}.transferFcn='tansig';
net.layers{2}.transferFcn='logsig';
net.layers{3}.transferFcn='purelin';
net.layers{1}.initFcn='initnw';
net.layers{2}.initFcn='initnw';
net.layers{3}.initFcn='initnw';
net.layerWeights{1,3}.delays=1;
net.layerWeights{2,3}.delays=1;
net.layerWeights{3,3}.delays=1;
net.initFcn='initlay';
net.performFcn='mse';
net.trainFcn='trainlm';
net.trainParam.goal=1e-05;
net.trainParam.epochs=10000;
net=train(net,data(:,1:n)',data(:,n+1)');
y=sim(net,data(:,1:n)');
disp('Observed Simulated')
[data(:,n+1) y']
mmse=sum((y-data(:,n+1)').^2)/50
plot(data(:,n+1)',y,'*');
%Print input weights and between-layer weights
net.IW{1,1} %Input weights
net.IW{2,1} %Input weights
net.LW{3,1} %Between-layer weights
net.LW{3,2} %Between-layer weights
net.LW{1,3} %Between-layer weights
net.LW{2,3} %Between-layer weights
net.LW{3,3} %Between-layer weights
1.2. Conventional models
1.2.1. Multivariate model
The multivariate regression (He, 2001; Mathworks, 2002) is used for modeling arthropod abundance from plant composition: f(x) = a + b^T x, where f(x) is the arthropod abundance (individuals per sample), a is a constant, b = (b1, b2, ..., bp)^T is the parametric vector, and x = (x1, x2, ..., xp)^T is the vector of plant composition (p plant taxa, e.g., species, families, etc.).
1.2.2. Response surface model (RSM)
The response surface model (He, 2001; Mathworks, 2002) is also used in the present modeling: f(x) = a + b^T x + x^T c x, where f(x) is the arthropod abundance (individuals per sample), x = (x1, x2, ..., xp)^T is the vector of plant composition, b = (b1, b2, ..., bp)^T and c = (c1, c2, ..., cp)^T are parametric vectors, and a is a constant.
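Both conventional models reduce to ordinary least squares once the design matrix is assembled. A Python sketch (function names are illustrative; c is read here as the vector of coefficients on the squared terms, one plausible interpretation of the notation above):

```python
import numpy as np

def fit_multivariate(X, y):
    """Least-squares fit of f(x) = a + b'x.  X is n x p, y is length n."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]                     # a, b

def fit_response_surface(X, y):
    """Least-squares fit of f(x) = a + b'x + sum_i c_i * x_i**2."""
    A = np.column_stack([np.ones(len(X)), X, X ** 2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    p = X.shape[1]
    return coef[0], coef[1:1 + p], coef[1 + p:]  # a, b, c
```

The response surface model simply augments the linear design matrix with squared columns, which is why it nests the multivariate regression.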
1.2.3. Principal components extraction (PCE)
PCE is often used in data reduction to identify a small number of factors that explain most of the variance observed in a much larger number of manifest variables (SPSS, 2006). In this chapter it is used to reduce the dimensionality of the input space and to generate independent principal components from a larger number of plant taxa without significant loss of variance information.
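The projection step behind PCE can be sketched with a singular value decomposition in Python (the function name is illustrative): the centered data are projected onto the first p right singular vectors, and the retained fraction of variance is reported.

```python
import numpy as np

def principal_components(X, p):
    """Scores of the first p principal components of X (n x m),
    plus the fraction of total variance they explain."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:p].T                       # n x p component scores
    explained = float((s[:p] ** 2).sum() / (s ** 2).sum())
    return scores, explained
```

In the chapter's setting, X would be the 50 x 17 plant family matrix and p the number of retained components.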
2. Data Description
Plant composition and arthropod abundance were recorded on the natural grassland in Zhuhai, China. In total 50 samples, each with an equal
size of 1 × 1 m, were investigated. Plant species and their cover-degrees
were recorded and measured, and individuals of various arthropods were
collected and counted for each sample.
(1) Plant family data. In the modeling of arthropod abundance from plant family data, in total 17 plant families (17-dimensional input space, R17) and 50 samples (n = 50) are used to train the neural network or to build the multivariate regression. The output space is one-dimensional (arthropod abundance).
(2) PCE-based data. The PCE procedure applied to the plant family data yields p principal components. The 17-dimensional input space is thus transformed into a p-dimensional input space (Rp). Fifty samples in the p-dimensional input space are used to train the neural network, or to build the multivariate regression and response surface models. The output space is one-dimensional, i.e., the real domain R (arthropod abundance).
(3) Cross validation. Each sample is removed in turn from the input set of 50 samples, the remaining samples are used to train the model, and the trained model is used to predict the removed sample. The predicted and observed arthropod abundances are compared, and the Pearson correlation coefficient (r) and its statistical significance are calculated to validate the models.
(4) Samples are submitted to the neural network in two ways, i.e., in their natural ID order and in randomized sequences.
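The leave-one-out scheme in point (3) can be written generically: drop each sample, refit, predict the held-out sample, then correlate predictions with observations. A Python sketch (the simple linear model and all names are illustrative stand-ins, not the chapter's networks):

```python
import numpy as np

def loo_predictions(X, y, fit, predict):
    """Leave-one-out cross validation: for each i, fit on all samples
    except i and predict the held-out sample."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        model = fit(X[mask], y[mask])
        preds[i] = predict(model, X[i:i + 1])[0]
    return preds

def pearson_r(a, b):
    """Pearson correlation coefficient between two vectors."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Illustrative model: ordinary least squares with an intercept
def fit_linear(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef
```

The `fit`/`predict` pair can be swapped for any of the models compared in this chapter.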
3. Results
3.1. Complexity of plant–arthropod interactions
Using the algorithm for biological interaction networks (Zhang, 2007), an interaction network for plant families and arthropod orders is obtained. There are many direct or indirect interactions between plants and arthropods in the network, as follows: (Oxalidaceae, Araneida), (Leguminosae, Diptera), (Leguminosae, Araneida), (Gramineae, Hemiptera), (Gramineae, Diptera), (Gramineae, Coleoptera), (Gramineae, Odonata), (Apocynaceae, Lepidoptera), (Malvaceae, Diptera), (Compositae, Hemiptera), (Compositae, Diptera), (Compositae, Orthoptera), (Onagraceae, Diptera), (Connaraceae, Araneida), (Cyperaceae, Coleoptera), (Cyperaceae, Isoptera), (Lycopodiaceae, Orthoptera), (Convolvulaceae, Diptera), (Commelinaceae, Coleoptera), (Commelinaceae, Araneida). The results for the interaction network indicate that most of the plant–arthropod interactions on grassland are positive, the exceptions being the negative interactions (Leguminosae, Araneida), (Gramineae, Hemiptera), (Gramineae, Diptera), and (Compositae, Orthoptera). Theoretically, it should therefore be possible to model arthropod abundance from plant family composition.
3.2. Modeling arthropod abundance
Multivariate regression fitted with plant species data reveals that arthropod abundance cannot be reasonably described by plant species and their cover-degrees (r = 0.1995, p = 0.05). However, multivariate regression fitted with plant family data (50 samples, 17 plant families) demonstrates that arthropod abundance is significantly dependent upon plant families and their cover-degrees (r = 0.4182, p < 0.005; Fig. 2). The neural network performs better than multivariate regression in the simulation of arthropod abundance based on the plant family data, as illustrated in Fig. 2.
The p (p = 2, 3, 4, 5, 6) principal components are extracted from the plant family data using the PCE procedure. Fifty samples in the p-dimensional input space are used to train the neural network for 10 000 epochs. The results reveal that the simulation based on four principal components has the best goodness of fit. The four principal components explain about 50% of the variation observed in the plant family data.
The simulation performance of the neural network trained on the plant family data appears to be worse than that trained on the PCE-extracted data (Fig. 2), and is not even better than the performance based on two principal components (2 PCs, Table 1). The simulation performance with four PCs (regression constant ≈ 0, regression coefficient ≈ 1, p < 0.05, mse = 5.9944) is much better than that with the plant family data.
From the results above, we find that a suitable dimensionality of the input space is necessary to produce a soundly trained neural network. For situations with a large number of indices for plant composition, reduction of the indices, for example by PCE, is suggested before training the neural network. A high-dimensional input space combined with few samples in the input set would result in deficient learning of the neural network.

Figure 2. Simulation performance of neural network and multivariate regression. Family data (50 samples, 17 plant families) are used to train models. Samples are submitted to neural network in natural IDs.
With the data of the four principal components, multivariate regression and response surface models are developed. Compared with the neural network, the response surface model does not sufficiently fit arthropod abundance (Fig. 3). Multivariate regression yields the worst performance of all these models.
Different from the cases above, if samples are submitted to the neural network in randomized sequences, the neural network still functions effectively (regression constant = 2.688, regression coefficient = 0.74, p < 0.05, mse = 15.1725; Fig. 4), but its performance is worse than that of the neural network trained with samples submitted in natural IDs.

Table 1. Comparisons between simulated and observed arthropod abundance. Simulation is conducted based on p (p = 2, 3, 4, 5, 6) principal components (PCs). Samples are submitted to neural network in natural IDs.

2 PCs: Simulated = 6.5152 + 0.3674 * Observed; r = 0.6451, p < 0.05; mse = 68.7176
3 PCs: Simulated = 3.3760 + 0.6803 * Observed; r = 0.8967, p < 0.05; mse = 25.075
4 PCs: Simulated = 0.7896 + 0.9240 * Observed; r = 0.9743, p < 0.05; mse = 5.9944
5 PCs: Simulated = 2.3718 + 0.7801 * Observed; r = 0.9271, p < 0.05; mse = 17.2497
6 PCs: Simulated = 6.6830 + 0.3652 * Observed; r = 0.6733, p < 0.05; mse = 65.7586

Figure 3. Simulation performance of neural network, multivariate regression, and response surface model. Four principal components extracted from family data (50 samples, 17 plant families) are used to train models. Samples are submitted to neural network in natural IDs.
3.3. Cross validation of models
Cross validation is based on four principal components, with samples submitted to the neural network in natural IDs. In cross validation, the neural network demonstrates better robustness in predicting unknown samples (r = 0.2296, p = 0.1; Fig. 5), while multivariate regression (r = 0.0131, p = 0.9143 > 0.05) and the response surface model (r = 0.1096, p = 0.4479 > 0.05) do not effectively predict unknown samples.
Figure 4. Neural network simulation. Samples are submitted to the neural network in randomized sequences. Left: simulation (mean of ten randomizations of sample sequences); right: the observed arthropod abundance and the 95% confidence interval of the simulated values from ten randomizations of sample sequences.
If samples are submitted to the neural network in randomized sequences, the neural network functions poorly in the cross validation. About half of the observed data fall beyond the 95% confidence interval of the predicted values, as illustrated in Fig. 6.
4. Discussion
This study demonstrates that arthropod abundance on grassland is dependent upon plant families and their cover-degrees (plant composition). The neural network model is superior to multivariate regression and the response surface model in modeling arthropod abundance from plant composition.
Figure 5. Cross validation of multivariate regression, response surface model, and neural
network. Four principal components are used to train models. Samples are submitted to
neural network in natural IDs.
A reasonable dimensionality of the input space is conducive to training a better neural network. Algorithms for dimension reduction, such as PCE, are suggested for data pre-treatment in neural network modeling. A high-dimensional input space combined with few samples in the input set would result in deficient learning of the neural network.

The randomization procedure is useful for reducing sequence correlation in sample submission. Sequence correlation may thereby be eliminated, but the performance of neural network modeling is lowered. Sequential submission of samples would yield a neural network with undetermined information and thus produce impractical predictions for unknown samples. It is suggested that the randomization procedure be used in sample submission for situations with a large number of samples and a lower-dimensional input space.

Figure 6. Cross validation of neural network. Left: prediction (mean of ten randomizations of sample sequences); right: the observed arthropod abundance and the predicted 95% confidence interval from ten randomizations of sample sequences.
CHAPTER 17
Pattern Recognition and
Classification of Ecosystems
and Functional Groups
Invertebrate diversity in farmland is a natural control force for crop pests and usually serves as an indicator of agricultural environmental health (Brown, 1991; Kremen et al., 1993; Way and Heong, 1994). It has been studied in various aspects using neural network models. For instance, BP and radial basis function neural networks were used for function approximation on sampling data, and the possible richness of functional invertebrate species was predicted (Zhang and Barrion, 2006). A stream classification based on characteristic invertebrate species assemblages was also satisfactorily conducted using a self-organizing map neural network, and theoretical assemblages were suggested for defining representative or reference sites for biological surveillance (Cereghino et al., 2001). Self-organizing map neural networks have also been used to determine the risk of insect pest invasion (Worner and Gevrey, 2006; Watts and Worner, 2009) and to assess communities (Song et al., 2007). Overall, the applications of neural network models to invertebrate diversity still fall behind practical requirements. This chapter aims to present some topological functions for the self-organizing map (SOM) neural network and to evaluate the effectiveness of several neural network models in the recognition and classification of invertebrate habitat zones (ecosystems) and functional groups. Further details can be found in Zhang (2007) and Zhang and Li (2007).
1. Model Description
1.1. Neural networks
1.1.1. Probabilistic neural network
See Chap. 000.
1.1.2. Generalized regression neural network
See Chap. 000.
1.1.3. Linear neural network
See Chap. 000.
Matlab codes of the probabilistic network, generalized regression network, and linear network used in this study are as follows:
%Load sampling data file (SamplingData.*). In this file, columns
%represent samples to be recognized and rows represent variables (or
%indices, attributes, etc.) (P), and the last row is the classes these
%samples fall into (lastrow)
varis=size(SamplingData,1)-1;
samples=size(SamplingData,2);
lastrow=SamplingData(varis+1,:);
P=SamplingData(1:varis,:);
C=ind2vec(lastrow);
%Load the file (RecSamples.*) for samples to be recognized. The same
%format as the sampling data file
Q=RecSamples;
%Generate a probabilistic neural network (newpnn) and set the spread of
%radial basis functions to 0.1, if this neural network is to be used
net=newpnn(P,C,0.1);
%Generate a generalized regression neural network (newgrnn) and set the
%spread of radial basis functions to 0.2 (a smaller spread fits the data
%more closely but less smoothly), if this neural network is to be used
net=newgrnn(P,C,0.2);
%Generate a linear neural network (newlind) if this neural network is used
net=newlind(P,C);
%Make classification on trained samples
outputclass=sim(net,P)
%Make recognition on the samples with unknown classification
recognition=sim(net,Q)
1.1.4. Self-organizing map neural network
See Chap. 000.
1.1.5. Self-organizing competitive learning neural network
See Chap. 000.
Matlab codes of SOM and self-organizing competitive learning neural
networks are developed as follows:
%Load sampling data file (SamplingData.*). In this file, rows represent
%samples and columns represent variables (or indices, attributes, etc.)
da=SamplingData(:,:);
P=da(:,:)';
%Load the file (RecSamples.*) for samples to be recognized. The same
%format as the sampling data file
newsamples=RecSamples(:,:);
Q=newsamples(:,:)';
%Generate a self-organizing map neural network (newsom) and set
%5 neurons if it is used
net=newsom(minmax(P),[5]);
%Generate a self-organizing competitive learning neural network (newc)
%and set 5 neurons, Kohonen learning rate=0.01, and conscience learning
%rate=0.001 if it is used
net=newc(minmax(P),5,0.01,0.001);
%Specify the distance measure as Chebyshev distance if necessary
net.layers{1}.distanceFcn = 'chebyshovdist';
net.inputWeights{1,1}.weightFcn = 'chebyshovdist';
%Train the network. 1000 epochs is set
net.trainParam.epochs=1000;
net=init(net);
net=train(net,P);
%Obtain samples and updated weights, and make
classification on
%samples
for i=1:size(P,2);
a=vec2ind(sim(net,P(:,i)));
outputclass(1,i)=i;
outputclass(2,i)=a;
end
outputclass
%Make recognition on the samples with unknown
classification
recognition= vec2ind(sim(net,Q))
%Print input weights
net.IW{1}
1.1.6. New topological functions for self-organizing map neural network
Four topological functions are established based on the template topological function mytopf in the Matlab neural network toolbox, as indicated in the following:
cossintopf: cos(sin(cx))
sincostopf: sin(cx) + cos(cx)
acossintopf: acos(sin(cx))
expsintopf: exp(sin(cx))
(1) Source codes of topological function cossintopf
function pos=cossintopf(varargin)
dim=[varargin{:}];
%The dimensions as a row vector
size=prod(dim);
%Total number of neurons
dims=length(dim);
%Number of dimensions
pos=zeros(dims,size); %The size that POS will need to be set
len=1;
pos(1,1)=0;
for i=1:length(dim)
dimi=dim(i);
newlen=len*dimi;
pos(1:(i-1),1:newlen)=pos(1:(i-1),rem(0:(newlen-1),len)+1);
posi=0:(dimi-1);
pos(i,1:newlen)=posi(floor((0:(newlen-1))/len)+1);
len=newlen;
end
for i=1:length(dim)
pos(i,:)=pos(i,:)*0.7+cos(sin([1:size]*exp(1)/5*i))*0.3;
end
(2) Source codes of topological function sincostopf
function pos=sincostopf(varargin)
%The source codes here are the same as cossintopf
for i=1:length(dim)
pos(i,:)=pos(i,:)*0.6+sin([1:size]*exp(1)/5*i)*0.2 ...
+cos([1:size]*exp(1)/5*i)*0.2;
end
(3) Source codes of topological function acossintopf
function pos=acossintopf(varargin)
%The source codes here are the same as cossintopf
for i=1:length(dim)
pos(i,:)=pos(i,:)*0.7+acos(sin([1:size]*exp(1)/5*i))*0.3;
end
(4) Source codes of topological function expsintopf
function pos=expsintopf(varargin)
%The source codes here are the same as cossintopf
for i=1:length(dim)
pos(i,:)=pos(i,:)*0.7+exp(sin([1:size]*exp(1)/5*i))*0.3;
end
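All four functions share the same position computation: a regular grid (built exactly as in the template above) perturbed by a bounded nonlinear term. A Python cross-check of the cossintopf case (function names are illustrative):

```python
import numpy as np

def grid_positions(dims):
    """Neuron positions on a regular grid, first dimension varying fastest
    (the same layout the Matlab template builds)."""
    grids = np.meshgrid(*[np.arange(d) for d in dims], indexing="ij")
    return np.stack([g.ravel(order="F") for g in grids]).astype(float)

def cossin_positions(dims):
    """Perturb the grid with cos(sin(c*x)), as in cossintopf; the
    perturbation is bounded, so neurons stay near their grid points."""
    pos = grid_positions(dims)
    idx = np.arange(1, pos.shape[1] + 1)
    for i in range(pos.shape[0]):
        pos[i] = pos[i] * 0.7 + np.cos(np.sin(idx * np.e / 5 * (i + 1))) * 0.3
    return pos
```

Because |cos(sin(x))| ≤ 1, each neuron is displaced by at most 0.3 from 0.7 times its grid coordinate, which keeps the topology recognizable while reshaping the neighborhood structure.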
1.2. Conventional model
Linear discriminant analysis (SPSS, 2006) is conducted to evaluate the
effectiveness of neural networks.
2. Data Source
Invertebrates were recorded at sixty sites in an irrigated rice field. Invertebrates were sorted to stage (immatures, adults) and then identified to the lowest possible taxon. Invertebrate taxa were lumped into habitat zones (Schoenly and Zhang, 1999). Four sampling sets (Data1, Data2, Data3, and Data4) were obtained, and then combined into a site-by-habitat zone matrix and a sample-by-functional group matrix.
If invertebrate habitat zones are classified and recognized, they are treated as samples. In total five habitat classes were defined as follows: (1) plant canopy (terrestrial); (2) neustonic (water surface); (3) planktonic (water column); (4) benthic (bottom dwelling); (5) soil-dweller (dryland). Habitat zones 1–18 represent, in order: plant canopy, planktonic, neustonic, benthic, plant canopy, planktonic, neustonic, benthic, plant canopy, planktonic, neustonic, soil-dweller, benthic, plant canopy, planktonic, neustonic, benthic, and soil-dweller (Schoenly and Zhang, 1999).
If invertebrate functional groups are classified and recognized, they are
treated as samples to be classified or recognized.
3. Results
3.1. Supervised classification of habitat zones
3.1.1. Recognition of known habitat zones
The recognition results of the neural network models [i.e., probabilistic network (spread = 0.1), generalized regression network (spread = 0.2), and linear network] and of linear discriminant analysis (prior probabilities computed from the sizes of the habitat zones) show that the three neural network models yield 100% coincidence with the known habitat zones (Table 1). A
Table 1. Recognition of invertebrate habitat zones using probabilistic network, generalized regression network, linear network, and linear discriminant analysis.

Zone   Practical   Prob. Network   Gen. Regr. Network   Linear Network   Linear Discri. Anal.
1      1           1               1                    1                1
2      3           3               3                    3                3
3      2           2               2                    2                2
4      4           4               4                    4                4
5      1           1               1                    1                1
6      3           3               3                    3                3
7      2           2               2                    2                2
8      4           4               4                    4                4
9      1           1               1                    1                1
10     3           3               3                    3                3
11     2           2               2                    2                2
12     5           5               5                    5                5
13     4           4               4                    4                5
14     1           1               1                    1                1
15     3           3               3                    3                3
16     2           2               2                    2                2
17     4           4               4                    4                4
18     5           5               5                    5                5

*Classes 1–5: (1) plant canopy (terrestrial); (2) neustonic (water surface); (3) planktonic (water column); (4) benthic (bottom dwelling); (5) soil-dweller (dryland).
habitat zone is incorrectly recognized by linear discriminant analysis. Its
recognition coincidence is 94.4%.
3.1.2. Recognition of unknown habitat zone
An unknown habitat zone, represented by invertebrate species ID 10766 from Lepidoptera in an earlier study, can be recognized using the trained neural networks. All neural network models and linear discriminant analysis recognize this species as class 5, i.e., soil-dweller (dryland).
3.1.3. Influence of topological functions
As illustrated in Fig. 1, different topological functions yield distinctive
topological structures of neuron positions.
Figure 1. Neuron positions of different topological functions.
The topological function of the SOM network is changed to analyze the resulting changes in the network outputs. Some of the Matlab codes above may be revised, for example, as follows:
net=newsom(minmax(P),[5]);
%Set cossintopf as the topological function in the first layer
net.layers{1}.topologyFcn='cossintopf';
Using the SOM with each of the topological functions described above, together with the other default functions of the Matlab SOM, self-organizing unsupervised clustering was conducted on invertebrate orders. The results for the neural networks with the four topological functions and with the default functions are as follows (Zhang and Li, 2007):
(1) Data 1
• sincostopf: (Ephemeroptera, Orthoptera, Thysanoptera, Blattodea),
the rest of the orders are of the same category;
• cossintopf: (Ephemeroptera, Orthoptera, Thysanoptera, Blattodea),
the rest of the orders are of the same category;
• acossintopf: (Ephemeroptera, Orthoptera, Thysanoptera, Blattodea), the rest of the orders are of the same category;
• expsintopf: (Ephemeroptera, Orthoptera, Thysanoptera, Blattodea),
the rest of the orders are of the same category;
• System default topological function: (Orthoptera, Thysanoptera, Blattodea), the rest of the orders are of the same category.
(2) Data 2
• sincostopf: (Lepidoptera, Thysanoptera, undetermined order), the
rest of the orders are of the same category;
• cossintopf: (Lepidoptera, Thysanoptera, undetermined order), the
rest of the orders are of the same category;
• acossintopf: (Lepidoptera, Thysanoptera, undetermined order), the
rest of the orders are of the same category;
• expsintopf: (Lepidoptera, Thysanoptera, undetermined order), the
rest of the orders are of the same category;
• System default topological function: (Lepidoptera, Thysanoptera, undetermined order), the rest of the orders are of the same category.
(3) Data 3
• sincostopf: (Ephemeroptera, Thysanoptera), (Dermaptera, Strepsiptera), the rest of the orders are of the same category;
• cossintopf: (Ephemeroptera, Dermaptera, Strepsiptera), the rest of
the orders are of the same category;
• acossintopf: (Ephemeroptera, Odonata, Dermaptera, Strepsiptera,
Thysanoptera, Blattaria), the rest of the orders are of the same category;
• expsintopf: (Ephemeroptera, Dermaptera, Strepsiptera, Thysanoptera, Blattaria), the rest of the orders are of the same category;
• System default topological function: (Ephemeroptera, Dermaptera,
Strepsiptera, Thysanoptera, Blattaria), the rest of the orders are of the
same category.
(4) Data 4
• sincostopf: (Hymenoptera, Odonata), (Strepsiptera, Neuroptera),
the rest of the orders are of the same category;
• cossintopf: (Hymenoptera, Odonata), (Strepsiptera, Neuroptera),
the rest of the orders are of the same category;
• acossintopf: (Lepidoptera, Strepsiptera, Neuroptera, Blattodea), the
rest of the orders are of the same category;
• expsintopf: (Hymenoptera, undetermined order), (Strepsiptera, Neuroptera, Blattodea), the rest of the orders are of the same category;
• System default topological function: (Hymenoptera, Odonata), (Strepsiptera, Neuroptera, Blattodea), the rest of the orders are of the same
category.
It is found that the general trends of the various classifications are similar; nevertheless, the results for different topological functions differ somewhat.
3.2. Unsupervised classification of functional groups
3.2.1. Classification using different neural networks
Overall, SOM and self-organizing competitive learning neural networks produce similar classifications given the same distance (similarity) measure
and number of neurons. Between-model differences can also be found, particularly when more neurons are set in the networks.
For example, using Euclidean distance and two neurons, the classification of the functional group terrestrial crawler, walker, jumper or hunter and the functional group mixed differs between the two models, while the remaining functional groups are classified into the same classes. If eight neurons are used, the classifications differ for the following functional groups: external plant feeder; flying adult that is searching, ovipositing, or larvipositing; neustonic (water surface) swimmer (semi-aquatic); and shredder, chewer of coarse particulate matter; most of the functional groups belong to the same classes for the two models.
3.2.2. Classification using different distance (similarity)
measures
Using different distance (similarity) measures yields different classifications. Generally, measures of the same category, e.g., Euclidean and Chebyshev distances, tend to produce similar classifications, whereas Pearson correlation shows a different classification pattern.
Using two neurons, except for functional groups such as collector (filterer, suspension feeder), terrestrial web-builder, herbivore, predator, detritivore, and mixed, the remaining functional groups have the same classifications. Nevertheless, a unique classification arises with 2 and 8 neurons if Pearson correlation is used in the self-organizing competitive learning neural network.
3.2.3. Classification using different numbers of neurons
The number of neurons in the network represents the maximum number of classes in the classification process. Statistically, the number of classes increases with the number of neurons. Classification using two neurons yields the most consistent results.
Using Chebyshev distance and 2, 5, and 8 neurons in SOM produces 2, 4, and 5 classes of functional groups, and for Euclidean distance these numbers of neurons yield 2, 4, and 4 classes, respectively. However, various results are produced with Pearson correlation.
3.2.4. General results
SOM can learn both the topology of the sample space and the distribution of samples, whereas the self-organizing competitive learning neural network learns only the distribution of samples. SOM is therefore superior to the latter. For example, if Pearson correlation is used in the networks, the classification produced by SOM is more informative than that of the competitive learning network.
3.2.5. Classification of functional groups
If the between-functional-group difference in individual numbers is the focus of consideration, i.e., Chebyshev distance is used, then the following classifications should be acceptable:
• Two classes:
Class I: external plant feeder; flying adult that is searching, ovipositing,
or larvipositing; terrestrial crawler, walker, jumper or hunter; neustonic
(water surface) swimmer (semi-aquatic); collector (filterer, suspension
feeder); terrestrial web-builder; herbivore, predator, and detritivore;
mixed;
Class II: terrestrial blood sucker; terrestrial flyer; planktonic (water column) swimmer and diver; tourist (nonpredatory species with no known
functional role other than as prey in ecosystem); gall former; collector (gather, deposit feeder); predator and parasitoid; shredder, chewer of
coarse particulate matter; leaf miner; pollen feeder; idiobiont (acarine
ectoparasitoid); leaf roller/webber.
The two classes above represent two different invertebrate indicator systems in the rice field.
• Four classes:
Class I: external plant feeder; terrestrial crawler, walker, jumper or
hunter; mixed;
Class II: flying adult that is searching, ovipositing, or larvipositing; collector (gather, deposit feeder); collector (filterer, suspension feeder); terrestrial web-builder; herbivore, predator, and detritivore; leaf miner;
Class III: planktonic (water column) swimmer and diver; neustonic
(water surface) swimmer (semi-aquatic); shredder, chewer of coarse particulate matter;
Class IV: terrestrial blood sucker; terrestrial flyer; tourist (nonpredatory
species with no known functional role other than as prey in ecosystem);
gall former; predator and parasitoid; pollen feeder; idiobiont (acarine
ectoparasitoid); leaf roller/webber.
• Five classes:
Class I: external plant feeder;
Class II: shredder, chewer of coarse particulate matter;
Class III: mixed;
Class IV: flying adult that is searching, ovipositing, or larvipositing;
planktonic (water column) swimmer and diver; collector (gather, deposit
feeder); collector (filterer, suspension feeder); herbivore, predator, and
detritivore; leaf miner;
Class V: terrestrial blood sucker; terrestrial flyer; terrestrial crawler,
walker, jumper or hunter; tourist (nonpredatory species with no known
functional role other than as prey in ecosystem); neustonic (water surface) swimmer (semi-aquatic); gall former; predator and parasitoid; terrestrial web-builder; pollen feeder; idiobiont (acarine ectoparasitoid);
leaf roller/webber.
The distribution of connection weights of SOM for five classes is indicated
in Fig. 2.
Figure 2. Distribution of connection weights of SOM with 5 neurons.
There is a major trend of consistency between the classification results that yield different numbers of classes, and this is interpretable to some extent. For example, in the two-class classification, plant pests (external plant feeder) and natural enemies (terrestrial web-builder; flying adult that is searching, ovipositing, or larvipositing; etc.) show similar magnitudes or changes in individual numbers. Terrestrial blood sucker, terrestrial flyer, and tourist closely correlate with each other in all of the classifications.
4. Discussion
Probabilistic network, generalized regression network, linear network, and linear discriminant analysis are all capable of recognizing known and unknown habitat zones. Neural network models prove to be better than conventional linear discriminant analysis in pattern recognition; a linear neural network may have better recognition ability than linear discriminant analysis even for linearly separable problems. Previous research also showed that neural network models of demersal fish distribution outperformed linear discriminant analysis and attained better recognition and prediction performance for the distribution of those species (Maravelias et al., 2003). The generalized regression neural network model outperforms traditional regression-based models such as linear discriminant analysis. This finding supports past studies: in research on temporal predictions of functional attributes of ecosystems at regional scales, a neural network model was much better than the traditional regression model (Paruelo and Tomasel, 1997).
Both SOM and the self-organizing competitive learning network have proved to be effective models for the pattern classification and recognition of sampling information. Overall, SOM is superior to the self-organizing competitive learning neural network. Different settings of the distance (similarity) measure and topological function affect the network output to a certain extent.
Dec. 17, 2009
16:46
9in x 6in
B-922
b922-ch18
1st Reading
CHAPTER 18
Modeling Spatial
Distribution of Arthropods
Spatial distribution means the distribution of animal or plant individuals in a space, particularly on the ground. Many probability distribution functions and aggregation indices have been developed and used to describe the spatial distribution of individuals (Krebs, 1989). In such methods the number of individuals found in a sample is supposed to be a random variable that follows some probability distribution, e.g., the binomial, Poisson, or negative binomial distribution. Because such functions lack spatial variables, they cannot be used to predict abundance at a given location. On the other hand, due to the lack of theoretical background, it is also hard to construct a mechanistic model that calculates individual distribution from spatial information. Questions on spatial distribution are therefore data-driven. Like most other ecological problems, the relationship between individual distribution and spatial information is usually a nonlinear one (Gevrey et al., 2006; Moisen and Frescino, 2002; Zhang, 2007; Zhang and Barrion, 2006; Zhang et al., 2008).
This chapter aims to present several models and evaluate their effectiveness in modeling the spatial distribution of arthropods. A self-designed neural network, a BP network, an LVQ network, a linear network, a response surface model, linear discriminant analysis, a spline function, and a partial differential equation are developed or used to model the field distribution of arthropods. Models are validated and compared for their predictive power. More details can be found in Zhang et al. (2008).
1.
Model Description
1.1. Neural networks
1.1.1. Self-designed artificial neural network
The artificial neural network for modeling the spatial distribution of arthropods is a mapping from input space (with the spatial coordinates of a quadrat as the element) to output space (with the number of arthropod individuals in the quadrat as the element), u: R² → R with u(x) = v, where u ∈ U = {u | u: R² → R}. For an input set xi ∈ R² and an output set vi ∈ R, there is a mapping f that satisfies f(xi) = vi, i = 1, 2, . . . , n. A mapping u ∈ U, represented by this network, should approximate f(x) and satisfy the following condition:

|u(x) − f(x)| < ε,   x ∈ R²,

where x = (x, y)T, and ε > 0 is the known threshold for error.
A three-layer neural network is developed for modeling spatial distribution of arthropods. Both the first and second layers contain 30 neurons, and bias is used for each layer. Transfer functions for layers 1
to 3 are hyperbolic tangent sigmoid function, logistic sigmoid function,
and linear transfer function, respectively. Initialization of network, including adding weights and bias for each layer, is performed by a function
that initializes each layer i (i = 1, 2, 3) according to its own initialization function (Fecit, 2003; Hagan et al., 1996). Network is trained
using Levenberg–Marquardt algorithm. Performance function is mean
squared error (mse) function. The first and second layers receive inputs
from input space and produce outputs for the third layer. There is a
closed loop for the third layer. For each layer, the net input functions
calculate the layer’s net input by combining its weighted inputs and
biases.
Mathematically, the network output is

f(x) ≈ u(x) = Σ_{k=1}^{3} wk ak(·),    (1)
where

a1(·) = 2/(1 + exp(−2(w11 x + w21 y + b1))) − 1,
a2(·) = 1/(1 + exp(−(w12 x + w22 y + b2))),
a3(·) = Σ_{k=1}^{3} wk3 ak(·) + b3.
In Eq. (1), x = (x, y)T is the input and u = u(x) is the output; wi, i = 1, 2, 3; wij, i, j = 1, 2; wi3, i = 1, 2, 3; and bi, i = 1, 2, 3, are the parameters.
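As a hedged illustration in Python (the book's code is Matlab), the static part of Eq. (1) can be sketched directly from the formulas above. The parameter names (w11, w21, b1, . . .) mirror the symbols in the text, and the recurrent self-loop of the third layer is omitted, so this is only a feedforward approximation, not the trained network:

```python
import math

def network_output(x, y, p):
    """Static sketch of Eq. (1): u(x) = sum_k w_k * a_k(.).

    p is a dict of illustrative weights/biases named after the symbols
    in the text (w11, w21, b1, ...). The recurrent self-loop on layer 3
    is omitted, so this is a feedforward approximation only.
    """
    # Layer 1: hyperbolic tangent sigmoid (Matlab's tansig)
    a1 = 2.0 / (1.0 + math.exp(-2.0 * (p["w11"] * x + p["w21"] * y + p["b1"]))) - 1.0
    # Layer 2: logistic sigmoid (Matlab's logsig)
    a2 = 1.0 / (1.0 + math.exp(-(p["w12"] * x + p["w22"] * y + p["b2"])))
    # Layer 3: linear combination (purelin); the recurrent a3 term is dropped
    a3 = p["w13"] * a1 + p["w23"] * a2 + p["b3"]
    return p["w1"] * a1 + p["w2"] * a2 + p["w3"] * a3
```

With all weights zero the output is zero, and with only w11 = w1 = 1 the output reduces to tanh(x), matching the tansig transfer function of layer 1.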
The artificial neural network is developed using Matlab (Mathworks, 2002). Modeling performance of the neural network is represented by mse, the Pearson correlation coefficient, and the significance level of the linear regression between the simulated and observed values.
The Matlab algorithm for the neural network developed and used in this study is as follows:
clc
clear net;
data=arth;
n=30; % Number of neurons in layers 1 and 2 respectively
rows=8;
cols=8;
s=0;
for i=1:rows;
for j=1:cols;
s=s+1;
da(s,1)=j;
da(s,2)=i;
da(s,3)=data(i,j);
end;
end;
datax=da(:,1);
datay=da(:,2);
net=network;
net.numInputs=2;
net.numLayers=3;
net.inputs{1}.size=1;
net.inputs{2}.size=1;
net.biasConnect=[1;1;1];
net.inputConnect=[1 1;1 1;0 0];
net.layerConnect=[0 0 0;0 0 0;1 1 1];
net.outputConnect=[0 0 1];
net.targetConnect=[0 0 1];
mi=min(datax);
ma=max(datax);
tt(1,1)=mi;
tt(1,2)=ma;
net.inputs{1}.range=tt;
mi=min(datay);
ma=max(datay);
tt(1,1)=mi;
tt(1,2)=ma;
net.inputs{2}.range=tt;
net.layers{1}.size=n;
net.layers{2}.size=n;
%Transfer functions: logsig, tansig, purelin, radbas, satlins, tribas
net.layers{1}.transferFcn='tansig';
net.layers{2}.transferFcn='logsig';
net.layers{3}.transferFcn='purelin';
net.layers{1}.initFcn='initlay';
net.layers{2}.initFcn='initlay';
net.layers{3}.initFcn='initlay';
net.inputWeights{2,1}.delays=[0 1];
net.inputWeights{1,2}.delays=[0 1];
net.layerWeights{3,3}.delays=1;
net.initFcn='initlay';
net.performFcn='mse';
net.trainFcn='trainlm';
net.trainParam.goal=1e-05;
net.trainParam.epochs=10000;
net=train(net,[da(:,1)';da(:,2)'],da(:,3)');
y=sim(net,[da(:,1)';da(:,2)']);
y
%Print input weights and between-layer weights
net.IW{1,1} %Input weights
net.IW{2,1} %Input weights
net.LW{3,1} %Between-layer weights
net.LW{3,2} %Between-layer weights
net.LW{3,3} %Between-layer weights
1.1.2. BP neural network
See Chap. 5 Secs. 1–3.
1.1.3. LVQ neural network
See Chap. 6 Sec. 5.
1.1.4. Linear neural network
See Chap. 3.
Matlab algorithms for BP, LVQ, and linear neural networks used in the
present study are as follows:
%Load sampling data file (SamplingData.*). In this file,
%columns represent quadrats and rows represent spatial
%coordinates of input vectors (P), but the last row contains the
%classes these quadrats fall into (lastrow)
%n is the size of the input vector, i.e., the dimension of input
%space Rn
%m is the number of classes, i.e., the size of the output vector,
%or the dimension of output space Rm
n=size(SamplingData,1)-1;
samples=size(SamplingData,2);
lastrow=SamplingData(n+1,:);
P=SamplingData(1:n,:);
C=ind2vec(lastrow);
m=max(lastrow);
% Load the file (RecSamples.*) for samples to be recognized.
% It has the same format as the sampling data file
Q=RecSamples;
% Generate a BP neural network (newff) with 30 hidden neurons
% and m output neurons
net=newff(minmax(P),[30,m],{'tansig','purelin'},'trainlm','learngd','mse');
%net=newff(minmax(P),[10,10,10,m],{'tansig','tansig','tansig','purelin'},'trainlm','learngd','mse');
net.trainParam.epochs=1000;
net.trainParam.goal=0.001;
net=train(net,P,C);
%Produce an element vector of typical class percentages
%of samples that fall into each category if LVQ neural network
%is used
for i=min(lastrow):m;
percentages(i)=0;
for j=1:samples;
if lastrow(j)==i, percentages(i)=percentages(i)+1;
end
end
percentages(i)=percentages(i)/samples;
end
percentages
%Generate a LVQ neural network (newlvq) with 200 hidden neurons.
%Learning rate is 0.01. Learning function is learnlv1.
net=newlvq(minmax(P),200,percentages,0.01,'learnlv1');
%Train the network. 1000 epochs is set.
net.trainParam.epochs=1000;
net=train(net,P,C);
% Generate a linear neural network (newlind)
net=newlind(P,C);
%Make classification on trained quadrats
out=sim(net,P);
maxout=max(out);
for i=1:samples;
for j=1:m;
if (out(j,i)==maxout(i)), outputclass(1,i)=i; outputclass(2,i)=j;
break;
end
end
end
outputclass
%Make recognition on the quadrats with unknown classification
recog=sim(net,Q);
maxout=max(recog);
for i=1:size(Q,2);
for j=1:m;
if (recog(j,i)==maxout(i)), recognized(1,i)=i; recognized(2,i)=j;
break;
end
end
end
recognized
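The winner-take-all step in the Matlab loops above (assigning each quadrat to the class whose network output is largest) can be sketched in Python as a small, hypothetical helper:

```python
def assign_classes(outputs):
    """Pick, for each sample (column), the class (row) with the
    maximum network output, as in the Matlab loops above.

    outputs[j][i] is the response of class j+1 for sample i+1.
    Returns 1-based class indices, one per sample.
    """
    n_classes = len(outputs)
    n_samples = len(outputs[0])
    classes = []
    for i in range(n_samples):
        best_j, best_val = 0, outputs[0][i]
        for j in range(1, n_classes):
            if outputs[j][i] > best_val:
                best_j, best_val = j, outputs[j][i]
        classes.append(best_j + 1)
    return classes
```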
1.2. Conventional models
1.2.1. Linear discriminant analysis
See Chap. 00 Sec. 000.
1.2.2. Response surface model (RSM)
See Chap. 00 Sec. 000.
1.2.3. Spline function
The cubic spline function used in the present study is

u(x) = Mi+1(x − xi)³/(6li) + Mi(xi+1 − x)³/(6li)
     + (f(xi+1)/li − Mi+1 li/6)(x − xi)
     + (f(xi)/li − Mi li/6)(xi+1 − x),
   x ∈ [xi, xi+1],   i = 0, 1, . . . , n − 1,    (2)

where xi = i + 1, Mi = S′′(xi), i = 0, 1, . . . , n; li = xi+1 − xi, i = 0, 1, . . . , n − 1. Mi, i = 0, 1, . . . , n, are obtained from the three-bending-moment equations (Zhang, 2007b).
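As an illustration, Eq. (2) can be evaluated directly once the moments are known. The following Python sketch (a hypothetical helper, not from the book) takes the moments Mi as given; in the text they come from the three-bending-moment equations:

```python
def spline_value(x, knots, fvals, moments):
    """Evaluate the cubic spline of Eq. (2) on the interval containing x.

    knots   -- increasing abscissas x_i
    fvals   -- function values f(x_i)
    moments -- second derivatives M_i = S''(x_i), assumed precomputed
    """
    # locate the interval [x_i, x_{i+1}] containing x
    i = 0
    while i < len(knots) - 2 and x > knots[i + 1]:
        i += 1
    li = knots[i + 1] - knots[i]
    Mi, Mi1 = moments[i], moments[i + 1]
    fi, fi1 = fvals[i], fvals[i + 1]
    return (Mi1 * (x - knots[i]) ** 3 / (6 * li)
            + Mi * (knots[i + 1] - x) ** 3 / (6 * li)
            + (fi1 / li - Mi1 * li / 6) * (x - knots[i])
            + (fi / li - Mi * li / 6) * (knots[i + 1] - x))
```

With all moments zero the formula reduces to piecewise linear interpolation, which is a convenient sanity check.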
1.2.4. Partial differential equation
The following partial differential equation (PDE) is developed to model the spatial distribution of arthropods in a limited region G with boundary Γ:

∂²u/∂x² + p(x, y)∂²u/∂y² + q(x, y)∂u/∂x + v(x, y)∂u/∂y + w(x, y)u = f(x, y),   (x, y) ∈ G,
u|Γ = ϕ(x, y),   (x, y) ∈ Γ,    (3)

where u(x, y) is the number of arthropod individuals at (x, y), ϕ(x, y) gives the known values on the boundary Γ, and p(x, y), q(x, y), v(x, y), w(x, y), and f(x, y) are continuous functions on G + Γ.
Let xi = x0 + ih, yj = y0 + jτ, ui,j = u(xi, yj), pi,j = p(xi, yj), qi,j = q(xi, yj), vi,j = v(xi, yj), wi,j = w(xi, yj), and fi,j = f(xi, yj), i = 0, ±1, ±2, . . . ; j = 0, ±1, ±2, . . . ; then the differences are derived as follows:

(∂²u/∂x²)ij ≈ (ui+1,j − 2ui,j + ui−1,j)/h²,
(∂²u/∂y²)ij ≈ (ui,j+1 − 2ui,j + ui,j−1)/τ²,
(∂u/∂x)ij ≈ (ui+1,j − ui−1,j)/(2h),
(∂u/∂y)ij ≈ (ui,j+1 − ui,j−1)/(2τ),
and the difference equation for Eq. (3) is expressed as follows:

(ui+1,j − 2ui,j + ui−1,j)/h² + pij(ui,j+1 − 2ui,j + ui,j−1)/τ²
  + qij(ui+1,j − ui−1,j)/(2h) + vij(ui,j+1 − ui,j−1)/(2τ)
  + wij ui,j = fi,j.    (4)
Let h = τ = 1. The functions p(x, y), q(x, y), v(x, y), w(x, y), and
f(x, y), are expressed as the following linear functions:
p(x, y) = ap + bp x + cp y,
q(x, y) = aq + bq x + cq y,
v(x, y) = av + bv x + cv y,
w(x, y) = aw + bw x + cw y,
f(x, y) = af + bf x + cf y.
(5)
The 15 parameters in Eq. (5) are obtained by fitting spatial distribution data
with a group of linear equations based on Eqs. (4) and (5).
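To make the fitting step concrete: with h = τ = 1, each interior grid point contributes one equation that is linear in the 15 coefficients of Eq. (5). The following Python sketch (a hypothetical helper, not from the book, with quadrat coordinates taken directly as (i, j)) builds one row of that least-squares system, with parameter order (ap, bp, cp, aq, bq, cq, av, bv, cv, aw, bw, cw, af, bf, cf):

```python
def design_row(u, i, j):
    """One row of the linear system for the 15 PDE parameters.

    u[i][j] holds the abundance on the grid. With h = tau = 1, the
    central differences of Eq. (4) give
        uxx + p*uyy + q*ux + v*uy + w*u = f,
    which is linear in the 15 coefficients of Eq. (5).
    Returns (row, rhs) with rhs = -uxx moved to the right-hand side.
    """
    x, y = float(i), float(j)
    uxx = u[i + 1][j] - 2 * u[i][j] + u[i - 1][j]
    uyy = u[i][j + 1] - 2 * u[i][j] + u[i][j - 1]
    ux = (u[i + 1][j] - u[i - 1][j]) / 2.0
    uy = (u[i][j + 1] - u[i][j - 1]) / 2.0
    uc = u[i][j]
    row = [uyy, x * uyy, y * uyy,      # p = ap + bp*x + cp*y
           ux, x * ux, y * ux,         # q = aq + bq*x + cq*y
           uy, x * uy, y * uy,         # v = av + bv*x + cv*y
           uc, x * uc, y * uc,         # w = aw + bw*x + cw*y
           -1.0, -x, -y]               # f = af + bf*x + cf*y
    return row, -uxx
```

Stacking one such row per interior quadrat and solving the resulting overdetermined system in the least-squares sense yields the 15 parameters.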
2. Data Description
Arthropods were collected, identified, and counted for every quadrat (1 × 1 m² each) of an 8 × 8 grid of quadrats on the grassland. Insects were sorted and identified to order level, and the other arthropods were identified to class level.
For self-designed neural network, partial differential equation, spline
function, and response surface model, the following are noted:
(1) Training data. In the modeling of spatial distribution, in total 64 quadrats (n = 64) are used to train the neural network and the response surface model. The input space is two-dimensional [the coordinates of a quadrat, e.g., (1,2), (5,7), etc.], and the output space is one-dimensional (arthropod abundance).
(2) Cross validation. One of the cross validation methods is adopted. In this method, each quadrat is separately removed from the input set of 64 quadrats, the remaining quadrats are used to train the model, and the trained model is used to predict the removed quadrat. As a consequence, cross validation can be conducted within the data set of the same study. Comparisons between the predicted and observed arthropod abundances are made, and the Pearson correlation coefficient (r) and statistical significance are calculated to validate the models.
(3) Quadrats are submitted to the neural network in two ways, i.e., in fixed sequences and in randomized sequences of quadrats.
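A minimal Python sketch of this leave-one-out procedure follows; the `train` and `predict` callables are hypothetical stand-ins for the actual model routines, and only the Pearson correlation between predicted and observed abundances is computed:

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def leave_one_out(quadrats, train, predict):
    """Leave-one-out cross validation over the quadrats.

    Each quadrat is (x, y, abundance). train(subset) returns a fitted
    model; predict(model, quadrat) returns a predicted abundance. Both
    are caller-supplied stand-ins for the actual modeling routines.
    """
    predicted, observed = [], []
    for k, q in enumerate(quadrats):
        rest = quadrats[:k] + quadrats[k + 1:]
        model = train(rest)
        predicted.append(predict(model, q))
        observed.append(q[-1])  # last element: observed abundance
    return pearson_r(predicted, observed)
```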
For the BP neural network, LVQ neural network, linear neural network, and linear discriminant analysis, a norm of the input vector is used: z = Σ_{i=1}^{n} |zi|, where z is the total number of insects in a quadrat and n is the number of insect orders in the quadrat. Four types of classifications are designed to represent spatial distribution patterns at different ecological scales (Zhang et al., 2008):
Five classes: I: z ≤ 10; II: 11 ≤ z ≤ 20; III: 21 ≤ z ≤ 30; IV: 31 ≤ z ≤ 40; V: z > 40.
Four classes: I: z ≤ 20; II: 21 ≤ z ≤ 40; III: 41 ≤ z ≤ 60; IV: z > 60.
Three classes: I: z ≤ 30; II: 31 ≤ z ≤ 60; III: z > 60.
Two classes: I: z ≤ 10; II: z > 10.
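The class boundaries above can be applied mechanically; a small Python sketch (a hypothetical helper, not from the book) maps a quadrat's per-order counts to its class under each scheme:

```python
# Upper class boundaries from the four schemes in the text;
# z is the per-quadrat total, i.e., the norm z = sum_i |z_i|.
SCHEMES = {
    5: [10, 20, 30, 40],   # I: z<=10, ..., V: z>40
    4: [20, 40, 60],       # I: z<=20, ..., IV: z>60
    3: [30, 60],           # I: z<=30, ..., III: z>60
    2: [10],               # I: z<=10, II: z>10
}

def quadrat_class(counts, n_classes):
    """Return the 1-based class of a quadrat given its per-order counts."""
    z = sum(abs(c) for c in counts)
    for cls, upper in enumerate(SCHEMES[n_classes], start=1):
        if z <= upper:
            return cls
    return len(SCHEMES[n_classes]) + 1
```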
3. Results
3.1. Modeling population size-based spatial distribution
Most arthropods found on the grassland are insects, which belong to
the orders Homoptera (523 individuals), Orthoptera (230 individuals),
Hymenoptera (110 individuals), Coleoptera (55 individuals), and Diptera
(40 individuals), etc. Other arthropods are sparsely distributed on the
grassland.
3.1.1. Modeling spatial distribution with neural network
and response surface model
The artificial neural network developed above [Eq. (1)] is used to simulate
spatial distributions of arthropods and the most abundant orders Orthoptera,
Hymenoptera, and Homoptera. The neural network is trained for 10,000 epochs with a training goal (mse) of 0.00001. The results reveal that the
neural network has excellent simulation performance. The simulated spatial
distribution perfectly coincides with the observed data (intercept ≈ 0,
slope ≈ 1, r ≈ 1, p < 0.0001), as illustrated by Fig. 1.
Using a deviation function, stmse = mse/ū², where ū is the average number of individuals per quadrat, together with Fig. 1, it is found that lower abundance leads the neural network to yield better simulation performance [Arthropods (stmse = 6.39 × 10−3), Homoptera (stmse = 2.16 × 10−2), Orthoptera (stmse = 5.88 × 10−7), Hymenoptera (stmse = 1.46 × 10−6)].
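The deviation function is a one-line normalization; dividing mse by the squared mean abundance puts taxa of very different abundance on one scale. A Python sketch:

```python
def stmse(mse, mean_abundance):
    """Standardized mse: stmse = mse / u**2, where u is the mean number
    of individuals per quadrat. Normalizing by u**2 makes simulation
    errors comparable across taxa of different abundance."""
    return mse / mean_abundance ** 2
```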
Response surface model fits the spatial distribution of arthropods well
but the simulation performance is not as good as that of the neural network
(Table 1).
3.1.2. Cross validation of models
Cross validation, in which quadrats are submitted to neural network in
fixed sequences, demonstrates that neural network performs much better than response surface model in predicting unknown quadrats (Fig. 2).
Figure 1. Neural network simulation of spatial distribution of arthropods. Quadrats are
submitted to neural network in fixed sequences.
Table 1. Simulation performance of response surface model.

Arthropods:   Observed = 6.2978 + 0.5957 × Simulated; r = 0.9496; p < 0.0001; mse = 77.1757
Orthoptera:   Observed = 1.7310 + 0.5183 × Simulated; r = 1; p < 0.0001; mse = 7.2065
Hymenoptera:  Observed = 0.7731 + 0.5502 × Simulated; r = 0.9647; p < 0.0001; mse = 2.7761
Homoptera:    Observed = 3.4850 + 0.5735 × Simulated; r = 0.9555; p < 0.0001; mse = 47.0975
Figure 2. Cross validation of neural network and response surface model for the prediction
of spatial distribution of arthropods. Quadrats are submitted to neural network in fixed
sequences.
In most cases, the response surface model produces a negative correlation between the predicted and observed abundances. As a result, using the response surface model to predict the spatial distribution of arthropods is not recommended.
The neural network generalizes better for larger abundances than for lower abundances on the grassland (Fig. 2), which means that, compared with simulation, the neural network needs more information to train itself to produce a reasonable prediction.
Figure 3. Cross validation of neural network for the prediction of spatial distribution
of arthropods. Quadrats are submitted to neural network in randomized sequences. Five
randomizations are conducted.
In an additional cross validation of the neural network for predicting the spatial distribution of arthropods, quadrats are submitted in randomized sequences and five randomizations are used. The results show that the neural network performs better (r = 0.5323, p < 0.0001): more than 60% of quadrats are correctly predicted, falling inside the 95% confidence intervals of the predicted data (Fig. 3).
The cross validation of spline function [Eq. (2)] reveals that spline
function performs worse than both neural network and response surface
model (Table 2).
Table 2. Cross validation of spline interpolation.

Arthropods:   Observed = 15.6520 − 0.0041 × Simulated; r = −0.0100; p > 0.01 (0.9341); mse = 1164.4
Orthoptera:   Observed = 3.8052 − 0.0545 × Simulated; r = −0.0975; p > 0.01 (0.4436); mse = 37.7524
Hymenoptera:  Observed = 1.7092 + 0.0067 × Simulated; r = 0.0100; p > 0.01 (0.9289); mse = 15.4195
Homoptera:    Observed = 8.3154 − 0.0148 × Simulated; r = −0.0316; p > 0.01 (0.8072); mse = 445.9147
3.1.3. Describing spatial distribution with PDE
A group of linear equations is developed according to Eqs. (4) and (5). The coefficient matrix of the linear equations is derived from the distribution data of arthropods. For total arthropods and the three most abundant orders, Orthoptera, Hymenoptera, and Homoptera, the coefficient matrices (15 × 36) are of full rank (rank = 15). The parameters obtained are listed in Table 3, from which the partial differential equation can be obtained for a specific taxonomic group.

Table 3. Parameters in partial differential equation [Eqs. (3) to (5)].

      Arthropods   Orthoptera   Hymenoptera   Homoptera
ap    1.2535       −0.5867      1.2676        1.6496
bp    −0.0645      −0.0301      −0.0113       −0.2635
cp    −0.1102      0.1902       −0.2396       −0.0236
aq    −0.7771      −4.322       −0.8041       −1.6311
bq    0.4671       0.4873       0.1044        0.216
cq    −0.5469      0.2556       0.0068        −0.0558
av    2.8476       −0.7527      −0.0643       0.9427
bv    −0.5548      −0.1756      0.0505        −0.3421
cv    −0.1927      0.3112       −0.1443       0.0872
aw    3.8523       −0.8764      3.0785        6.1561
bw    −0.3495      0.0595       0.0512        −0.7088
cw    0.1113       0.6596       −0.3247       −0.1529
af    21.9082      −10.6057     6.0289        32.9485
bf    5.2626       1.3989       0.7485        −0.9004
cf    −0.551       2.972        −1.1149       −2.8578

For example, the partial differential equation for the spatial distribution of arthropods is the following:
∂²u/∂x² + (1.254 − 0.065x − 0.110y)∂²u/∂y²
  + (−0.777 + 0.467x − 0.547y)∂u/∂x
  + (2.848 − 0.555x − 0.193y)∂u/∂y
  + (3.852 − 0.349x + 0.111y)u
  = 21.908 + 5.263x − 0.551y,   (x, y) ∈ G,
u|Γ = ϕ(x, y),   (x, y) ∈ Γ.
This equation is used to describe the spatial distribution of arthropods, which it fits exactly. Theoretically, this equation can also be used to extrapolate the spatial distribution of arthropods.
3.2. Modeling population scaling-based spatial distribution
3.2.1. Modeling spatial distribution with neural networks
and linear discriminant model
Four fineness levels of spatial distribution of the total insect population
per quadrat are fitted using BP, LVQ, linear network, and linear discriminant model, of which BP and LVQ neural networks are designed with the
following settings:
BP: mse=0.001; training 1000 epochs
net=newff(minmax(P),[30,m],{'tansig','purelin'},'trainlm','learngd','mse');
LVQ: training 1000 epochs
net=newlvq(minmax(P),200,percentages,0.01,'learnlv1');
It is found that the BP network fits the spatial distribution patterns with zero error (Fig. 4 and Table 4). A BP network with more than two hidden layers, each with 10 neurons, also fits these patterns with zero error. This suggests that the BP network is the best model for fitting the spatial distribution patterns of grassland insects.
Figure 4. Fitting spatial distribution patterns of grassland insects using neural networks
and linear discriminant model.
Table 4. Goodness of fit of spatial distribution patterns using various models.

               BP fitted                               LVQ fitted
No. classes    Correctly    Total        Tot. diff./   Correctly    Total        Tot. diff./
               fitted (%)   differences  Tot. obs.     fitted (%)   differences  Tot. obs.
Five           100          0            1             75           25           0.198
Four           100          0            1             86           10           0.118
Three          100          0            1             91           7            0.097
Two            100          0            1             89           7            0.071

               Linear NN fitted                        Linear discri. fitted
Five           67           31           0.246         64           30           0.238
Four           89           9            0.106         88           10           0.118
Three          89           8            0.111         89           8            0.111
Two            80           13           0.131         80           13           0.131

Total observed = summation of classifications of all quadrats; total differences = summation of absolute differences between observed and fitted classifications.
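The three summaries reported in Tables 4 and 5 (correctly fitted percentage, total differences, and total differences over total observed) follow directly from the observed and fitted class labels; a Python sketch (hypothetical helper, not from the book):

```python
def fit_metrics(observed, fitted):
    """Goodness-of-fit summaries as defined in the table footnote.

    observed and fitted are lists of 1-based class labels, one per
    quadrat. Returns (percent correctly fitted, total differences,
    total differences / total observed).
    """
    correct = sum(1 for o, f in zip(observed, fitted) if o == f)
    total_diff = sum(abs(o - f) for o, f in zip(observed, fitted))
    total_obs = sum(observed)  # summation of classifications of all quadrats
    return (100.0 * correct / len(observed), total_diff,
            total_diff / total_obs)
```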
Simulated patterns of the linear network are analogous to those of the linear discriminant model at the four ecological scales (Fig. 4 and Table 4). At the finest classification, i.e., five classes, there are three different areas on the grassland. The capability of linear methods to recognize the general trends tends to weaken as the ecological scale increases (from the finer to the coarser, Fig. 4). Performance of the LVQ network lies between the BP network and the linear methods.
It is concluded that the BP network is capable of approximating the details of spatial distribution patterns, whereas the general trend is readily detected from the fitted or recognized pattern of the linear network.
3.2.2. Cross validation of models
Each of the 64 quadrats is predicted by the model trained with the remaining
63 quadrats. Combining both training time and recognition performance,
BP and LVQ neural networks are created with the following settings:
BP: mse=0.001, training 1000 epochs
net=newff(minmax(P),[10,10,10,10,10,10,m],{'tansig','purelin'},'trainlm','learngd','mse');
LVQ: training 500 epochs
net=newlvq(minmax(P),50,percentages,0.01,'learnlv1');
The results reveal that the number of correctly recognized quadrats increases, but the general trend tends to weaken, as the ecological scale expands (from the finer to the coarser; Fig. 5 and Table 5). The BP network tends to yield details, while linear methods tend to produce an overall trend; the LVQ network is again intermediate.
Recognition performance proves to depend not only on the ecological scale but also on the classification criteria.
4.
Discussion
The artificial neural network developed in this study shows excellent performance in simulating the spatial distribution of arthropods. Both the response surface model and the spline function are also capable of fitting the
Figure 5. Reconstruction of spatial distribution patterns of grassland insects using neural
networks and linear discriminant model.
Table 5. Recognition performance of spatial distribution patterns using various models.

                         BP                                  LVQ
No.        Correctly   Total        Tot. diff./  Correctly   Total        Tot. diff./
classes    fitted (%)  differences  Tot. obs.    fitted (%)  differences  Tot. obs.
Five           48          54         0.429          55          45         0.357
Four           78          15         0.176          77          20         0.235
Three          83          13         0.181          84          12         0.167
Two            66          22         0.222          70          19         0.192

                         Linear NN                           Linear discriminant
No.        Correctly   Total        Tot. diff./  Correctly   Total        Tot. diff./
classes    fitted (%)  differences  Tot. obs.    fitted (%)  differences  Tot. obs.
Five           58          39         0.309          64          34         0.269
Four           84          12         0.141          83          13         0.153
Three          89           8         0.111          89           8         0.111
Two            77          15         0.152          78          14         0.141

Total observed = summation of classifications of all quadrats; total differences =
summation of absolute differences between observed and fitted classifications.
spatial distribution of arthropods. However, the simulation performance of the response surface model is worse than that of the neural network. Cross validation shows that the neural network performs much better than the response surface model and the spline function in predicting unknown quadrats. Submitting the quadrats in randomized sequences helps to yield a confidence interval for the results of the neural network modeling, because a series of stochastic outputs is produced. The partial differential equation developed in this study may also be used to extrapolate the spatial distribution of arthropods.
Among the BP, LVQ and linear neural networks, the BP network is the best at fitting spatial distribution patterns of insects. The BP network is able to describe the spatial details of distribution patterns, while the linear neural network performs better in detecting the general trends. The performance of the LVQ network always lies between those of the BP network and the linear methods.
CHAPTER 19
Risk Assessment of Species Invasion and Establishment
To assess the establishment risk of an invasive species, the characteristics
of the invasive species and the ecosystems susceptible to establishment
need to be examined (Worner and Gevrey, 2006). Successful establishment
of a species depends on both biotic and abiotic factors in the new environment, including climate and environmental conditions of the habitat in
the area of invasion. Climate is considered to be the most important factor influencing the establishment of insect pests in new locations (Worner,
1988; Peacock et al., 2006). It is necessary to develop quantitative methods
that can detect the climatic factors influencing the establishment of individual species from existing climatic and species distribution data (Anderson,
2005; Dentener et al., 2002; Park et al., 2003; Sutherst and Maywald, 2005;
Worner, 1994).
Various models and tools have been developed and applied to predict
the establishment of invasive species in new regions (Baker et al., 2005;
Watts and Worner, 2008, 2009; Worner and Gevrey, 2006). These include
regression techniques (Eyre et al., 2005; Lehmann et al., 2002), multivariate
analysis (Peacock et al., 2006), generalized additive models (Guisan et al.,
2002), evolving computation (Soltic et al., 2004) and stochastic simulations
(Rossi et al., 1993). Because of the nonlinearity of invasion problems, it is hard for these conventional methods to perform well in predicting species invasion and establishment. For this reason, artificial
neural networks are being used in the risk assessment of species invasion
and establishment.
This chapter presents some recent applications of neural networks to the risk assessment of species invasion and establishment.
1. Invasion Risk Assessment Based on Species Assemblages
Worner and Gevrey (2006) used the SOM network to model insect pest
species assemblages for species invasion risk assessment. Their study is
described briefly here.
1.1. Data source
In this study, the data of 844 phytophagous insect pests for 459 geographical areas worldwide are examined. The presence and the absence of each
species in each geographical area constitute a data matrix with the size
844 × 459.
1.2. Model description
According to Worner and Gevrey, the SOM projection of the data onto a two-dimensional space allows global geographical areas to be classified according to the similarity of their pest species assemblages. The SOM network is composed of two layers: the input layer and the output layer (Kohonen, 1995). The output layer is represented by a map, a rectangular grid with l × m neurons. A batch learning algorithm (Kohonen and Somervuo, 1998), which is significantly faster and does not require a learning rate to be specified, is used for the SOM in this study.
The input layer has 844 neurons, one per species, so each of the 459 geographical areas yields one sample vector; in total 459 sample vectors are thus obtained. The number of neurons in the output layer is given by 5n^(1/2), where n is the number of training samples. There are connection weights between the input neurons (844 neurons) and the output neurons. Several SOM networks of different sizes are generated and compared to arrive at a final network. Balancing network error against computational limits, a SOM network with 108 neurons, arranged in a grid of 12 rows and 9 columns, is finally obtained.
Each neuron of the output layer in the trained SOM has a vector with
element values between 0 and 1, which represents a pest species assemblage.
The value of each element denotes the risk index, i.e., a species' potential to be present in, or associated with, the geographical areas mapped to that neuron. All the geographical areas associated with the same neuron have a similar pest species assemblage composition. A species present in one area but not in another within the same neuron is considered to have a high risk of invading the latter area.
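The batch-SOM training loop can be sketched as follows; this is an illustrative Python/numpy implementation on made-up presence data, not the configuration used by Worner and Gevrey:

```python
import numpy as np

def batch_som(X, rows, cols, iters=20, sigma=1.0, seed=1):
    """Minimal batch SOM: each code vector is updated to the
    neighbourhood-weighted mean of the samples (no learning rate)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    W = rng.random((rows * cols, X.shape[1]))   # code vectors
    for _ in range(iters):
        # best-matching unit (BMU) of every sample
        bmu = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
        # Gaussian neighbourhood between each unit and each sample's BMU
        d2 = ((grid[:, None, :] - grid[bmu][None]) ** 2).sum(-1)
        h = np.exp(-d2 / (2 * sigma ** 2))      # (units, samples)
        W = (h @ X) / h.sum(axis=1, keepdims=True)
    return W, bmu

X = np.random.default_rng(2).random((30, 5))    # 30 "areas" x 5 "species"
W, bmu = batch_som(X, rows=4, cols=3)           # 12-neuron output map
```

Each row of W plays the role of a pest species assemblage vector; in the real study the map had 108 neurons and 844-element vectors.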
1.3. Results
The geographical areas mapped onto the two-dimensional SOM space are further grouped using cluster analysis; neighbors on the grid are combined into clusters. The results indicate that two species, the Mediterranean
fruit fly Ceratitis capitata (Wiedemann), and the gypsy moth Lymantria
dispar L., both absent in New Zealand, are high-risk pests. C. capitata has
a higher risk (risk index = 0.73) and L. dispar has a lower risk of invasion
(risk index = 0.31).
2. Determination of Abiotic Factors Influencing Species Invasion
The MLP network was used by Watts and Worner (2008) to determine the
relative importance of abiotic factors that influence the establishment of
invasive insect species. In this method, insect pest species are divided into
two groups, those species that are recorded as being present in a region,
and those that are not. The non-established species are ranked according to the threat they pose.
2.1. Data source
In this study the climate data for each of 459 geographical areas worldwide,
and the presence and the absence of each species of 844 phytophagous insect
pests are used.
In total 135 climate variables involving temperature, rainfall, soil moisture, heat (degree-days), and their ranges, etc., are used as the input variables
of MLP. The data for each variable are linearly normalized to the range of
0 to 1.
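The linear normalization can be sketched as follows (the sample values are hypothetical):

```python
import numpy as np

def minmax_normalize(X):
    """Linearly rescale each column (climate variable) to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

# hypothetical values: column 0 = temperature, column 1 = rainfall
X = np.array([[10.0,  200.0],
              [20.0,  600.0],
              [30.0, 1000.0]])
Xn = minmax_normalize(X)
```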
The data were randomly split into two major sets. The first data set
(training and test set) contains 80% of the data. The second is the validation
set used to perform an independent evaluation of the prediction performance
for each target species.
2.2. Model description
The standard MLP, trained with the unmodified backpropagation-with-momentum learning algorithm (Rumelhart et al., 1986), is used in the study of Watts and Worner. The number of hidden neurons, the number of training epochs, and the learning rate and momentum are determined by examining training and generalization performance. The final MLP has three hidden neurons. The method
used in their study is similar to that suggested in Flexer (1996) and Prechelt
(1996).
The contributions of each input neuron to the output of the network are
calculated by the connection weight method of Olden et al. (2004).
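The connection weight method sums, over the hidden neurons, the products of input-hidden and hidden-output weights; a sketch with hypothetical weights for a 4-3-1 network:

```python
import numpy as np

# hypothetical weights of a trained 4-input, 3-hidden, 1-output MLP
W_ih = np.array([[ 0.5, -0.2,  0.1],
                 [-0.3,  0.4,  0.2],
                 [ 0.1,  0.1, -0.6],
                 [ 0.0,  0.3,  0.3]])    # input-to-hidden weights
w_ho = np.array([0.8, -0.5, 0.2])        # hidden-to-output weights

# Olden et al. (2004): contribution of input i = sum_h W_ih[i, h] * w_ho[h]
importance = W_ih @ w_ho
ranking = np.argsort(-np.abs(importance))  # most influential input first
```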
In this study the performance of MLP for prediction is evaluated by
Cohen's Kappa statistic (Cohen, 1960). In the sensitivity analysis, all input
variables but the variable investigated are given their mean values, and
the values of the input variable being investigated vary across the range
of 0 to 1.
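The sensitivity procedure — hold every input at its mean and sweep the investigated variable over [0, 1] — can be sketched as follows; the "model" here is a toy linear stand-in rather than the trained MLP:

```python
import numpy as np

def sensitivity_profile(model, X, var, n_points=11):
    """Fix all inputs at their means, sweep one variable across [0, 1]
    and record the model response."""
    base = X.mean(axis=0)
    grid = np.linspace(0.0, 1.0, n_points)
    out = []
    for v in grid:
        x = base.copy()
        x[var] = v                    # vary only the investigated input
        out.append(model(x))
    return grid, np.array(out)

model = lambda x: float(x @ np.array([0.7, -0.2, 0.1]))  # toy stand-in
X = np.random.default_rng(3).random((50, 3))             # normalized inputs
grid, response = sensitivity_profile(model, X, var=0)
```

Plotting response against grid then reveals how strongly, and in which direction, the investigated variable drives the output.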
2.3. Results
The results reveal that the MLP network is able to learn the relationships between climate variables and species presence-absence, and to predict the establishment of insect pest species from climate variables. The training performance for the non-established species is in general lower than that for the established species. Spring and summer rainfall and autumn temperature are significant positive factors influencing the establishment of the species investigated.
CHAPTER 20
Prediction of Surface Ozone
Ozone (O3 ) is a trace reactive oxidant that absorbs ultraviolet radiation and
stabilizes atmospheric temperature (Lars, 2007; Yazdanpanah et al., 2008).
Ozone is distributed over the stratosphere (20-30 km above the earth's surface) and the troposphere (0-15 km above the earth's surface). Stratospheric ozone is naturally generated and protects life from injury. Tropospheric ozone, by contrast, originates from both human activities and natural processes, and is harmful to life (Agirre-Basurko et al., 2006;
Yazdanpanah et al., 2008). Overall, global ozone concentration has been
declining at the mid-latitudes of the Northern Hemisphere during the last
two decades (Aires et al., 2002). The thickness of the stratospheric ozone
layer has decreased by about 0.5% per year for all latitudes since 1974 as a
result of ozone breakdown by chlorine released from emitted chlorofluorocarbons (Rozema et al., 2005; Yazdanpanah et al., 2008). However, ozone
concentration always changes with space and time. The prediction of daily
total ozone therefore is an important issue (Cardelino et al., 2001; Simpson
and Layton, 1983).
Surface ozone depends in a highly nonlinear way on a number of meteorological factors (Agirre-Basurko et al., 2006; Zanetti, 1990). It has been predicted by various researchers using artificial neural networks
(Agirre-Basurko et al., 2006; Ballester et al., 2002; Pastor-Barcenas et al.,
2005; Gardner and Dorling, 1998; Hornik et al., 1989; Yazdanpanah et al.,
2008). This chapter presents some of the successful case studies.
1. BP Prediction of Daily Total Ozone
A BP network has been used successfully to predict daily total ozone.
Further details can be found in Yazdanpanah et al. (2008).
1.1. Model description
In this study, a BP network trained with the standard backpropagation algorithm and the extended Delta-Bar-Delta (DBD) learning rule is used. The network has a single hidden layer. The transfer functions of the BP network are the sigmoid, hyperbolic tangent and sine functions. The sine
transfer function takes the trigonometric sine of the input modified by a
gain. Extended DBD is an extension of DBD which calculates a momentum
term for each connection. In the Delta rule, the error in the output layer
is computed as the difference between the desired output and the actual
output. The error is transformed by the derivative of the transfer function,
and is propagated to prior layers where it is accumulated (Paschalidou et al.,
2007).
The best BP architecture is determined by the selection procedure for
forecasting ozone with meteorological variables (Vahidinasab, 2008; Yazdanpanah, 2002).
1.2. Data description
Daily geopotential height, air temperature, dew point temperature, wind
direction and speed, and previous day’s total ozone are selected as input
variables (6 input variables and 1 output variable, i.e., today’s total ozone)
of BP network. Data of these variables were collected from an ozonometric
station where radio-sounding measurement was done during 1997–2004.
1.3. Results
The results reveal that the BP network with 11 neurons in the hidden layer is the best. After training the network (transfer function: sigmoid) for 50 000 epochs, the corrected R² and MSE are 0.8145 and 0.0380 for simulation, and 0.8131 and 0.0399 for prediction, respectively.
This study shows that different learning rules and transfer functions
result in different model performance.
In the sensitivity analysis, the input variables are removed one at a time and the network is retrained with the remaining 5 input variables. Yazdanpanah et al. use a weight index, wi, to evaluate the importance of the ith input variable:

Fi = [(MSEi)^(1/2) − (MSEt)^(1/2)] / (MSEt)^(1/2),    wi = Fi / Σj Fj,

where MSEi is the MSE with all input variables except variable i, and MSEt is the MSE with all input variables. The input variables, in decreasing order of importance, are geopotential height, total ozone of the previous day, air temperature, wind speed, wind direction, and dew point temperature.
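A sketch of the weight-index computation; the MSE values below are hypothetical, not those of Yazdanpanah et al.:

```python
import numpy as np

def weight_indices(mse_without, mse_all):
    """F_i = [sqrt(MSE_i) - sqrt(MSE_t)] / sqrt(MSE_t);
    w_i = F_i / sum_j F_j."""
    F = (np.sqrt(mse_without) - np.sqrt(mse_all)) / np.sqrt(mse_all)
    return F / F.sum()

# hypothetical MSEs after dropping each of the 6 input variables in turn
mse_i = np.array([0.090, 0.070, 0.055, 0.050, 0.046, 0.042])
w = weight_indices(mse_i, mse_all=0.040)   # importance weights, sum to 1
```

The variable whose removal raises the MSE the most receives the largest weight.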
Yazdanpanah et al. suggest that the predictive power of neural network
is excellent when the input variables consist of temperatures (dry and dew
point) and geopotential heights at standard levels of 100, 50, 30, 20 and
10 hPa with their wind speed and direction, together with previous day’s
total ozone.
2. MLP Prediction of Hourly Ozone Levels
MLP was used in predicting hourly ozone levels. Further details can be
found in Agirre-Basurko et al. (2006).
2.1. Model description
Both multilayer perceptron (MLP) and multiple linear regression are used
in the study of Agirre-Basurko et al. (2006).
The MLP used in their study has an input layer (N neurons), a hidden
layer (S neurons) and an output layer (1 neuron). The transfer functions are
the hyperbolic tangent (tansig) for the hidden layer and the linear function for the output layer. The scaled conjugate gradient algorithm is used to train the MLP network (Moller, 1993). A stopping rule is used to avoid overfitting (Sarle, 1995). Two variants, MLP1 and MLP2, differ in their input variables: MLP1 has 5 input variables and MLP2 has 9. The number of hidden neurons of the MLP is calculated using a simple rule of Amari et al. (1997).
2.2. Data description
The data used in the work of Agirre-Basurko et al. are hourly current (at
time t) data and historical (at time t-z, z = 1, . . . , 15) data from the air
pollution network and the traffic network of Bilbao during the years 1993
to 1994. Data from 1993 are used to build the models and data from 1994
are used to test the models.
Potential input variables include wind speed, wind direction, temperature, relative humidity, atmospheric pressure, solar radiation, thermal gradient, number of vehicles in a unit time (NV), occupation percentage (OP:
the fraction of time for which the area of road is occupied by a vehicle),
velocity (NV/OP), ozone level, etc. In multiple linear regression, NO2 level
is also used as an independent variable.
The correlation coefficient between observed and predicted values (R), the normalized MSE (NMSE), the factor of two (FA2), the fractional bias (FB), and the fractional variance (FV) are used to represent model performance.
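These statistics can be sketched as follows (one common set of definitions; FB and FV sign conventions vary between authors, and the data are made up):

```python
import numpy as np

def forecast_metrics(o, p):
    """NMSE, FB, FV, FA2 and R: a perfect forecast gives
    NMSE = FB = FV = 0 and FA2 = R = 1."""
    nmse = np.mean((o - p) ** 2) / (o.mean() * p.mean())
    fb = 2 * (o.mean() - p.mean()) / (o.mean() + p.mean())  # fractional bias
    fv = 2 * (o.std() - p.std()) / (o.std() + p.std())      # fractional variance
    fa2 = np.mean((p / o >= 0.5) & (p / o <= 2.0))          # factor of two
    r = np.corrcoef(o, p)[0, 1]                             # correlation
    return nmse, fb, fv, fa2, r

o = np.array([30.0, 45.0, 60.0, 80.0])       # made-up hourly ozone levels
nmse, fb, fv, fa2, r = forecast_metrics(o, o.copy())   # perfect "forecast"
```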
2.3. Results
A perfect forecast would have NMSE, FV and FB equal to zero, and R and FA2 equal to 1. The ozone forecasts O3(t + i), i = 1, 2, . . . , 8, show good agreement with the observed values.
In this study the ozone forecasts of MLP networks are proved to be
better than multiple linear regression.
CHAPTER 21
Modeling Dispersion and Distribution of Oxide and Nitrate Pollutants
Oxide and nitrate pollutants are important contributors to environmental pollution. The levels of oxide and nitrate pollutants vary with space and
time. Mechanistic and statistical models have been developed to forecast
pollutant levels. Mechanistic models are based on explicit mathematical
relationships that describe the processes involved in the formation of pollutants (Zanetti, 1990; Agirre-Basurko et al., 2006). A widely used mechanistic model is UAM (Urban Airshed Model; Scheffe and Morris, 1993).
However, mechanistic models are generally more suitable for large areas and require detailed data on the emission and transport of pollutants and on meteorological conditions. Statistical models are empirical models. They
are used to establish the input–output relationships without understanding
the intrinsic mechanisms of the formation of pollutants (Agirre-Basurko
et al., 2006). The examples of statistical models used in pollutant prediction include time series analysis (Simpson and Layton, 1983; Hsu, 1992)
and multiple linear regression (Cassmassi, 1998; Cardelino et al., 2001).
Oxide and nitrate levels are dependent on human activities and meteorological conditions and their relationships are highly nonlinear (Gardner
and Dorling, 1999, 2000). Neural networks are therefore used to model
these relationships (Gardner and Dorling, 1998, 1999, 2000; Elkamel et al.,
2001). For example, they have been developed to model nitrogen dioxide
dispersion (Nagendra and Khare, 2006), to assess nitrate contamination of
rural private wells (Ray and Klindworth, 2000), and simulate nitrate leaching to ground water (Kaluli et al., 1998). This chapter presents some of the
recent case studies.
1. Modeling Nitrogen Dioxide Dispersion
Nagendra and Khare (2006) used a feedforward three-layer neural network
to model nitrogen dioxide dispersion and confirmed the effectiveness of the
neural network in the pollutant prediction.
1.1. Model description
In the neural network of Nagendra and Khare, the number of neurons of
input layer is the number of input variables, i.e., meteorological and traffic
variables. The output layer contains one neuron, i.e., the 24 h average NO2
concentration. There is one hidden layer. The number of hidden neurons is
determined by training neural network and comparing the training errors.
Transfer functions of the hidden neurons are hyperbolic tangent functions; the input and output neurons use the identity function (Gardner and Dorling, 2000).
The neural network is trained by a supervised back-propagation learning algorithm (Haykin, 2001). Based on gradient descent, the back-propagation training algorithm yields an "approximation" of the trajectory in weight space (Battiti, 1992). All weights and biases are initialized as uniformly distributed random values in the range [−2.4/Fi, 2.4/Fi], where Fi is the total number of inputs. This small range reduces the probability of neuron saturation in the network and thus avoids vanishing error gradients (Wasserman, 1989).
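The initialization rule can be sketched as:

```python
import numpy as np

def init_weights(fan_in, fan_out, seed=4):
    """Uniform initialization in [-2.4/Fi, 2.4/Fi], Fi being the total
    number of inputs to the layer (Wasserman, 1989)."""
    bound = 2.4 / fan_in
    rng = np.random.default_rng(seed)
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

W = init_weights(17, 5)   # e.g. 17 input neurons feeding 5 hidden neurons
```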
The performance of the neural network is evaluated using RMSE (root mean square error), MBE (mean bias error), R² (coefficient of determination), etc. The "d" index, a measure of the degree to which the predictions are error free, is also used in this study (Willmott, 1982): d = 1 − Σ(pi − oi)² / Σ(|pi − ō| + |oi − ō|)², where pi is the predicted value, oi the observed value, i = 1, 2, . . ., n, and ō the average of the observed values.
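The d index can be computed as follows (the observed and predicted values are illustrative):

```python
import numpy as np

def willmott_d(p, o):
    """Willmott (1982) index of agreement: 1 for error-free predictions."""
    obar = o.mean()
    num = np.sum((p - o) ** 2)
    den = np.sum((np.abs(p - obar) + np.abs(o - obar)) ** 2)
    return 1.0 - num / den

o = np.array([12.0, 15.0, 20.0, 25.0])            # observed values
d_perfect = willmott_d(o.copy(), o)               # error-free predictions
d_noisy = willmott_d(o + np.array([1.0, -1.0, 2.0, -2.0]), o)
```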
For the neural modeling of NO2 with both meteorological and traffic
variables (ANNNO2A ), 11 meteorological (cloud cover, humidity, mixing
height, pressure, Pasquill stability, sun shine hour, temperature, visibility,
sine and cosine wind direction, wind speed) and 6 traffic variables [two
wheeler, three wheeler, four wheeler (gasoline), four wheeler (diesel), CO
and NO2 source strength] are used as the input variables. There are 17
input neurons, 5 hidden neurons, and 1 output neuron. For the neural modeling with only meteorological variables (ANNNO2B ), 10 meteorological
variables are used as the input variables, and for the neural modeling with
only traffic variables (ANNNO2C ), 5 traffic variables are used as the input
variables.
1.2. Data description
The data of 24 h average NO2 concentrations, meteorological and traffic
variables are the observed values for a period of three years from 1997 to
1999. Two-year data from 1997 to 1998 are used for the model training and
one-year data in 1999 are used for model test and evaluation.
1.3. Results
The results of this study indicate that ANNNO2A performs better than ANNNO2B, while ANNNO2C performs poorly in the model evaluation. The model performance improves as the time-averaging interval increases, which reveals a trend from nonlinearity toward linearity.
2. Simulation of Nitrate Distribution in Ground Water
Besides conventional neural networks, modular neural networks have also been used in pollutant prediction. For example, they were used to predict the two-dimensional nitrate distribution in ground water by Almasri and Kaluarachchi (2005).
2.1. Model description
Modular neural network (MNN) is composed of multiple expert networks
(modules) competing to learn different aspects of a problem and a gating
network (control module) (Haykin, 1994). The gating network assigns different features of the input space to the different expert networks (Neural
Ware, 2000). Each expert network yields an output corresponding to the
input vector and the output of MNN is the weighted sum of these outputs
with the weights equal to the output of the gating network. MNN is trained
by backpropagation algorithm (Rumelhart et al., 1986; Maier and Dandy,
1998). In the study of Almasri and Kaluarachchi (2005), there is only one hidden layer in the MNN, with 13 input variables, 14 hidden neurons, and 1 output variable.
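The gating mechanism can be sketched as a softmax-weighted sum of expert outputs; the experts and the gate below are toy stand-ins, not the trained modules of Almasri and Kaluarachchi:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mnn_forward(x, experts, gate):
    """Modular-network output: gating weights (which sum to 1) times
    the outputs of the expert networks."""
    g = softmax(gate(x))                        # gating network output
    outs = np.array([f(x) for f in experts])    # one output per expert
    return float(g @ outs), g

# toy experts and gate for a 3-dimensional input (illustrative only)
experts = [lambda x: x.sum(), lambda x: x.prod(), lambda x: x.max()]
gate = lambda x: np.array([1.0, 0.0, -1.0])     # fixed gating scores
y, g = mnn_forward(np.array([0.2, 0.5, 0.3]), experts, gate)
```

In a trained MNN the gate is itself a learned network, so different regions of the input space are routed to different experts.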
The MNN is developed using NeuralWorks Professional II/Plus (NeuralWare, 2000). In this software the MNN has parameters such as learning count (194850), learning rates for the output layer (0.15) and hidden layer (0.3), momentum (0.4), seed (257), transfer function (tanh), epoch (16), scaling intervals (input: [−1, 1]; output: [−0.8, 0.8]), and learning rule (the extended delta-bar-delta), etc.
Model performance is evaluated using RMSE, correlation coefficient, etc.
2.2. Data description
The study area is divided into 100 × 100 m2 of cells. The distributions of
on-ground nitrogen loadings and ground water recharge are estimated based
on practical data. The nitrate concentration data are obtained from various
agencies. A total of 665 input–output patterns are divided into training and
testing data sets.
2.3. Results
According to Almasri and Kaluarachchi, the MNN is superior when the upgradient contributing areas of nitrate receptors are considered in formulating the input-output response patterns. However, it is to some extent limited by the lack of training patterns, and it performs poorly for the extreme input-output patterns. It is found that setting the number of hidden neurons equal to the total number of input and output neurons yields the optimal MNN performance, with a shorter training time.
A comparison between MNN and artificial neural network (ANN;
setting of neurons: 13-14-1) indicates that MNN is superior to ANN.
CHAPTER 22
Modeling Terrestrial Biomass
Terrestrial earth harbors a huge amount of biomass and supports numerous organisms. On a temperate grassland, plants have the largest biomass (20 000 kg/ha), followed by microorganisms (7 000 kg/ha), arthropods (1 000 kg/ha), nematodes (120 kg/ha), mammals (1.2 kg/ha), and birds (0.3 kg/ha) (Pimental et al., 1992; Chen and Ma, 2001). Biomass estimation for various terrestrial landscapes is a major challenge (Schino et al.,
2003). Some cases of biomass estimation based on neural networks are
presented in this chapter.
1. Estimation of Aboveground Grassland Biomass
Grasslands and savannas cover nearly 40% of the terrestrial earth (Chapin
et al., 2001). Grassland biomass is determined by plant composition, soil conditions, topography, and meteorological conditions. Three types of
methods can be used to estimate grassland biomass (Moreau et al., 2003;
Xie et al., 2009): (1) the empirical relationships of spectral vegetation
indices, (2) Monteith’s efficiency model, and (3) the canopy process-based
models (van der Werf et al., 2007). In recent years artificial neural networks have been suggested for biomass estimation. This section describes the MLP approach to estimating grassland biomass; details can be found in Xie et al. (2009).
1.1. Data description
In total 568 sample sites were selected and the geographical coordinate of each site was recorded. Five quadrats (1 × 1 m2 ) of each site
were harvested and the aboveground biomass recorded. Different spectral bands of images, i.e., Band 1, Band 2, Band 3 (spectral reflection of the red band), Band 4 (spectral reflection of the near-infrared band), and NDVI [NDVI = (Band 4 − Band 3)/(Band 4 + Band 3)], together with aspect, were recorded and normalized.
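The NDVI computation can be sketched as follows (the reflectance values are hypothetical):

```python
import numpy as np

def ndvi(band4, band3):
    """NDVI = (NIR - red) / (NIR + red),
    i.e. (Band 4 - Band 3) / (Band 4 + Band 3)."""
    return (band4 - band3) / (band4 + band3)

red = np.array([0.10, 0.20, 0.05])    # Band 3 reflectance (hypothetical)
nir = np.array([0.50, 0.40, 0.45])    # Band 4 reflectance (hypothetical)
v = ndvi(nir, red)                    # higher values = denser vegetation
```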
1.2. Model description
The MLP model is developed using Statistica 6.0 neural network module.
MLP has one input layer, one hidden layer, and one output layer. The output
variable of MLP is the aboveground biomass and 7 statistically important
input variables are Band 1, Band 3, Band 4, Band 5, Band 7, NDVI, and
aspect. The input layer uses a linear function. The hidden layer is set with
5 to 10 neurons respectively and the resultant networks are tested for their
performance. Finally 7 hidden neurons are determined based on the balance
between model complexity and performance. The transfer functions of the
hidden neurons are sigmoid transfer functions.
At the beginning, a backpropagation algorithm is used to train the network and a conjugate gradient descent algorithm is then used to converge
and optimize MLP. The learning rate is 0.75 and the momentum is 0.45.
Model performance is evaluated using RMSE and the relative RMSE.
1.3. Results
According to Xie et al., Band 7 is the most sensitive variable for predicting biomass, followed by Band 1, Band 3, Band 5, Band 4, NDVI, and aspect. Overall, both MLP and multiple linear regression achieve comparable relative RMSE. However, the simulation and prediction performance of MLP is superior to that of multiple linear regression in terms of RMSE and relative RMSE. The performance of both MLP and multiple linear regression is best on the training set, intermediate on the entire dataset, and worst on the testing set.
2. Estimation of Trout Biomass
In their study, Lek and Baran (1997) analyzed relationships between macrohabitat variables and the biomass of brown trout, Salmo trutta L., in Pyrenean mountain streams.
2.1. Data description
In total 232 units are sampled. Eight habitat variables (mean Froude number,
mean depth, mean bottom velocity, mean surface velocity, surface of shelter,
surface of total cover, surface of deep water, and elevation) and two biomass
variables (total trout biomass and catchable trout biomass) are recorded for
these units.
2.2. Model description
Lek and Baran use the BP network in their study. The BP network contains
three layers, i.e., the input layer (8 neurons), the hidden layer (10 neurons),
and the output layer (1 neuron). The backpropagation learning rule is used
to train the network (Rumelhart et al., 1986; Smith, 1994).
The coefficient of determination between observed and estimated values is used to evaluate the network model.
2.3. Results
With coefficients of determination between 0.85 and 0.9, the neural network performs well in training. The network model is also superior in predicting trout biomass, with coefficients of determination between 0.74 and 0.88 for total and catchable trout.
References
Abdel-Aal RE. Hourly temperature forecasting using abductive networks. Engineering
Applications of Artificial Intelligence, 17: 543–556, 2004.
Abrahart RJ, White SM. Modelling sediment transfer in Malawi: Comparing backpropagation neural network solutions against a multiple linear regression benchmark using a small data set. Physics and Chemistry of the Earth (B), 26(1): 19–24, 2001.
Acharya C, Mohanty S, Sukla LB et al. Prediction of sulphur removal with Acidithiobacillus
sp. using artificial neural networks. Ecological Modelling, 190: 223–230, 2006.
Agirre-Basurko E, Ibarra-Berastegi G, Madariaga I. Regression and multilayer perceptronbased models to forecast hourly O3 and NO2 levels in the Bilbao area. Environmental
Modelling & Software, 21(4): 430–446, 2006.
Aires F et al. A regularized neural net approach for retrieval of atmospheric and surface
temperature with the IASI instrument. Journal of Applied Meteorology, 41: 144–159,
2002.
Albus JS. A theory of cerebellar function. Mathematical Biosciences, 10: 25–61, 1971.
Almasri MN, Kaluarachchi JJ. Modular neural networks to predict the nitrate distribution
in ground water using the on-ground nitrogen loading and recharge data. Environmental
Modelling and Software, 20: 851–871, 2005.
Altieri MA. Biodiversity and Pest Management in Agroecosystems. Haworth Press, New
York, USA, 1994.
Altieri MA. Agroecology: The Science of Sustainable Agriculture. Westview Press, Boulder,
Colorado, USA, 1995.
Altieri MA, Letourneau DK. Vegetation diversity and insect pest outbreaks. CRC Critical
Review in Plant Science, 2: 131–169, 1984.
Amari SI. Field theory of self-organizing neural nets. IEEE Transactions on Systems, Man
and Cybernetics, 13: 741–748, 1983.
Amari SI. Mathematical foundation of Neurocomputing. Proceedings of the IEEE on Neural
Networks, 78: 1443–1463, 1990.
Amari S. Differential Geometrical Methods in Statistics. Springer Lecture Notes in Statistics
Vol. 28. Springer-Verlag, Berlin, Germany, 1985.
Amari S. Differential geometry in statistical inference. Proc. ISI, 46th Session of the ISI,
52(2): 321–338, 1987.
Amari S. Differential geometry of a parametric family of invertible linear systems: Riemannian metric, dual affine connections and divergence. Mathematical Systems Theory, 20: 53–82, 1987.
Amari S. On mathematical methods in the theory of neural networks. Proc. 1st IEEE ICNN,
Vol. III, pp. IU 3-Ell 10, 1987.
Amari S. Information geometry of EM and EM algorithm for neural networks. Neural
Networks, 8(9): 1379–1408.
Amari S. The natural gradient learning algorithm for neural networks. Theoretical Aspects
of Neural Computation — A Multidisciplinary Perspective, Wong KYM, King I, Yeung
DY (Eds.). Hong Kong International Workshop. Springer, 1998.
Amari S, Murata N, Muller K, Finke M, Yang HH. Asymptotic statistical theory of overtraining and cross-validation. IEEE Transactions on Neural Networks, 8(5): 985–996, 1997.
Anderson JA. A simple neural network generating an interactive memory. Mathematical
Biosciences, 14: 197–220, 1972.
Anderson JA, Rosenfeld E. Neurocomputing: Foundations of Research. MIT Press, Cambridge, USA, 1989.
Andersen MC. Potential applications of population viability analysis to risk assessment for
invasive species. Human and Ecological Risk Assessment, 11: 1083–1095, 2005.
Andow DA. Vegetational diversity and arthropod population response. Annual Review of
Entomology, 36: 561–586, 1991.
Anthony M, Biggs N. Computational Learning Theory. Cambridge University Press, Cambridge, UK, 1992.
Baker R, Cannon R, Bartlett P, Barker I. Novel strategies for assessing and managing the
risks posed by invasive alien species to global crop production and biodiversity. Annals
of Applied Biology, 146: 177–191, 2005.
Ballester EB, Valls GCI, Carrasco-Rodriguez JL et al. Effective 1-day ahead prediction
of hourly surface ozone concentrations in eastern Spain using linear models and neural
networks. Ecological Modelling, 156: 27–41, 2002.
Battiti R. First- and second-order methods for learning: Between steepest descent and Newton's
method. Neural Computation, 4: 141–166, 1992.
Berry T, Linoff J. Data Mining Techniques. John Wiley and Sons, New York, USA, 1997.
Bian ZQ, Zhang XG. Pattern Recognition (2nd Edition). Tsinghua University Press, Beijing,
China, 2000.
Bork EW, Hudson RJ, Bailey AW. Upland plant community classification in Elk Island
National Park, Alberta, Canada, using disturbance history and physical site factors. Plant
Ecology, 130: 171–190, 1997.
Bradshaw CJA, Davis LS, Purvis M et al. Using artificial neural networks to model the
suitability of coastline for breeding by New Zealand fur seals (Arctocephalus forsteri).
Ecological Modelling, 148: 111–131, 2002.
Breiman L. Statistical modeling: The two cultures (with discussion). Statistical Science, 16:
199–231, 2001.
Broomhead DS, Lowe D. Multivariable functional interpolation and adaptive networks.
Complex Systems, 2: 321–355, 1988.
Brown KS Jr. Conservation of neotropical insects: Insects as indicators. The Conservation
of Insects and Their Habitats, Collins NM, Thomas JA (Eds.). Academic Press, London,
England, 1991.
Bunge J, Fitzpatrick M. Estimating the number of species: A review. Journal of the American
Statistical Association, 88: 364–373, 1993.
Burden R, Faires JD. Numerical Analysis (7th Edition). Thomson Learning, Inc., New York,
USA, 2001.
Burnham KP, Overton WS. Estimation of the size of a closed population when capture
probabilities vary among animals. Biometrika, 65: 623–633, 1978.
Burnham KP, Overton WS. Robust estimation of population size when capture probabilities vary
among animals. Ecology, 60: 927–936, 1979.
Cardelino C, Chang M, St. John J et al. Ozone predictions in Atlanta, Georgia: Analysis
of the 1999 ozone season. Journal of the Air and Waste Management Association, 51:
1227–1236, 2001.
Carpenter GA, Grossberg S. A massively parallel architecture for a self-organizing neural pattern
recognition machine. Computer Vision, Graphics, and Image Processing, 37: 54–115,
1987.
Carpenter GA, Grossberg S. ART2: Self-organization of stable category recognition codes
for analog input patterns. Applied Optics, 26(23): 4919–4930, 1987.
Carpenter GA, Grossberg S. ART3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Networks, 3(2): 129–152, 1990.
Carpenter GA, Grossberg S, Reynolds J. ARTMAP: Supervised real-time learning and
classification of nonstationary data by a self-organizing neural network. Neural Networks,
4(5): 169–181, 1991.
Cassmassi JC. Objective ozone forecasting in the South Coast Air Basin: Updating the objective prediction models for the late 1990s and southern California ozone study (SCOS97-NARSTO) application. Proceedings of the 12th Conference on Numerical Weather Prediction, 54–58, American Meteorological Society, Boston, MA, USA, 1998.
Castellano G, Fanelli AM, Pelillo M. An iterative pruning algorithm for feedforward neural
networks. IEEE Transactions on Neural Networks, 8(3): 519–531, 1997.
Cereghino R, Giraudel JL, Compin A. Spatial analysis of stream invertebrates distribution
in the Adour-Garonne drainage basin (France), using Kohonen self organizing maps.
Ecological Modelling, 146: 167–180, 2001.
Chao A. Non-parametric estimation of the number of classes in a population. Scandinavian
Journal of Statistics, 11: 265–270, 1984.
Chao A, Lee SM. Estimating the number of classes via sample coverage. Journal of the
American Statistical Association, 87: 210–217, 1992.
Chapin FS. Effects of plant traits on ecosystem and regional processes: A conceptual framework for predicting the consequences of global change. Annals of Botany, 91: 1–9, 2003.
Chapin FS, Sala OE, Huber-Sannwald E. Global Biodiversity in a Changing Environment:
Scenarios for the 21st Century. Springer-Verlag, New York, USA, 2001.
Charalambous C. Conjugate gradient algorithm for efficient training of artificial neural
networks. IEEE Proceedings G, 139(3): 301–310, 1992.
Chen JX. Course Notes on Algebraic Topology. Higher Education Press, Beijing, China,
1987.
Chen LZ, Ma KP. Biodiversity Science: Principles and Practices. Shanghai Science and
Technology Press, Shanghai, China, 2001.
Chen XS, Chen WH. Course Notes on Differential Geometry. Beijing University Press,
Beijing, China, 1980.
Chen XR et al. Modern Regression Analysis. Anhui Education Press, Hefei, China, 1987.
Chen K, Xu L, Chi H. Improved learning algorithms for mixture of experts in multiclass
classification. Neural Networks, 12(9): 1229–1252, 1999.
Cherkassky V, Lari-Najafi H. Constrained topological mapping for nonparametric regression
analysis. Neural Networks, 4: 27–40, 1991.
Churing Y. Backpropagation: Theory, Architecture and Applications. Lawrence Erlbaum
Publishers, New York, USA, 1995.
Cohen ACJ. Simplified estimators for the normal distribution when samples are singly
censored sample. Technometrics, 1: 217–237, 1959.
Cohen ACJ. Tables for maximum likelihood estimates: Singly truncated and singly censored
sample. Technometrics, 3: 535–541, 1961.
Cohen JE. Food webs and niche space. Monographs in Population Biology 11, Princeton
University Press, Princeton, USA, 1978.
Cohen JE et al. Improving food webs. Ecology, 74: 252–258, 1993.
Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological
Measurement, 20: 37–46, 1960.
Coleman BD, Mares MA, Willig MR, Hsieh YH. Randomness, area, and species richness.
Ecology, 63: 1121–1133, 1982.
Colwell RK. Human aspects of biodiversity: An evolutionary perspective. Biological Diversity and Global Change, Solbrig OT, van Emden HM, van Oordt PGWJ (Eds.). International Union of Biological Sciences, Monograph No. 8. IUBS Press, Paris, France,
1992.
Colwell RK, Coddington JA. Estimating terrestrial biodiversity through extrapolation. Philosophical Transactions of the Royal Society London B, 345: 101–108, 1994.
Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM
algorithm. Journal of the Royal Statistical Society (Series B), 39: 1–38, 1977.
Dentener PR, Whiting DC, Connolly PG. Thrips palmi Karny (Thysanoptera: Thripidae):
Could it survive in New Zealand? New Zealand Plant Protection, 55: 18–22, 2002.
Department of Mathematics of Nanjing University. Ordinary Differential Equations. Science
Press, Beijing, China, 1978.
Dimopoulos I, Chronopoulos J, Chronopoulou-Sereli A, Lek S. Neural network models to
study relationships between lead concentration in grasses and permanent urban descriptors
in Athens city (Greece). Ecological Modelling, 120: 157–165, 1999.
Dong BL, Ji LZ, Wei CY et al. Relationship between plant community and insect community
in Korean pine broad-leaved mixed forest of Changbai Mountain. Chinese Journal of
Ecology, 24(9): 1013–1016, 2005.
Dony JG. The expectation of plant records from prescribed areas. Watsonia, 5: 377–385,
1963.
Efron B, Gong G. A leisurely look at the bootstrap, the jackknife, and cross-validation. The
American Statistician, 37: 36–48, 1983.
Eyre MD, Rushton SP, Luff ML, Telfer MG. Investigating the relationships between the
distribution of British ground beetle species (Coleoptera, Carabidae) and temperature,
precipitation and altitude. Journal of Biogeography, 32: 973–983, 2005.
Fecit. Analysis and Design of Neural Networks in MATLAB 6.5. Electronics Industry Press,
Beijing, China, 2003.
Filippi AM, Jensen JR. Fuzzy learning vector quantization for hyperspectral coastal vegetation classification. Remote Sensing of Environment, 100: 512–530, 2006.
Flexer A. Statistical evaluation of neural network experiments: Minimum requirements and
current practice. Cybernetics and Systems ’96 Proceedings of the 13th European Meeting
on Cybernetics and Systems Research, Trappl R (Ed.). 1005–1008, Austrian Society for
Cybernetic Studies, 1996.
Frean M. The upstart algorithm: A method for constructing and training feed forward neural
networks. Neural Computation, 2(2): 198–209, 1990.
Gao XR, Yang FS. A new method for training MLP neural networks. Chinese Journal of
Computers, 19(6): 687–694, 1996.
Gardner MW, Dorling SR. Artificial neural networks (the multilayer perceptron) – a review
of applications in the atmospheric sciences. Atmospheric Environment, 32: 2627–2636,
1998.
Gardner MW, Dorling SR. Neural network modelling and prediction of hourly NOx and
NO2 concentrations in urban air in London. Atmospheric Environment, 33(5): 709–719,
1999.
Gardner MW, Dorling SR. Statistical surface ozone models: An improved methodology to
account for non-linear behaviour. Atmospheric Environment, 34(1): 21–34, 2000.
Gentle JE. Elements of Computational Statistics. Springer Science+Business Media, Inc.,
Netherlands, 2002.
Gevrey M, Dimopoulos I, Lek S. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological Modelling, 160: 249–264,
2003.
Gevrey M, Dimopoulos I, Lek S. Two-way interaction of input variables in the sensitivity
analysis of neural network models. Ecological Modelling, 195: 43–50, 2006.
Gotelli NJ, Graves GR. Null Models in Ecology. Smithsonian Institution Press, Washington
DC, USA, 1996.
Grime JP. Benefits of plant diversity to ecosystems: Immediate, filter and founder effects.
Journal of Ecology, 86: 902–910, 1998.
Grossberg S. Adaptive pattern classification and universal recoding: I. Parallel development
and coding of neural feature detectors. Biological Cybernetics, 23: 121–134, 1976.
Grossberg S. How does the brain build a cognitive code? Psychological Review, 88: 375–407,
1980.
Gu HZ, Takahashi H. Towards more practical average bounds on supervised learning. IEEE
Transactions on Neural Networks, 7(44): 953–968, 1996.
Guisan A, Edwards TC, Hastie T. Generalized linear and generalized additive models in
studies of species distributions: Setting the scene. Ecological Modelling, 157: 89–100,
2002.
Hagan MT, Demuth HB, Beale MH. Neural Network Design. PWS Publishing Company,
Boston, USA, 1996.
Hagan MT, Menhaj M. Training feedforward networks with the Marquardt algorithm. IEEE
Transactions on Neural Networks, 5(6): 989–993, 1994.
Haykin S. Neural Networks: A Comprehensive Foundation. Macmillan College Publishing
Company, New York, USA, 1994.
He RB. MATLAB 6: Engineering Computation and Applications. Chongqing University
Press, Chongqing, 2001.
Hebb DO. The Organization of Behavior. Wiley, New York, USA, 1949.
Hellmann JJ, Fowler GW. Bias, precision, and accuracy of four measures of species richness.
Ecological Applications, 9(3): 824–834, 1999.
Heltshe JF, Forrester NE. Estimating species richness using the jackknife procedure. Biometrics, 39: 1–11, 1983.
Herrick JE, Bestelmeyer BT, Archer S, Tugel AJ, Brown JR. An integrated framework for
science-based arid land management. Journal of Arid Environments, 65: 319–335, 2006.
Hinton GE, Sejnowski TJ, Ackley DH. Boltzmann Machines: Constraint Satisfaction Networks that Learn. Carnegie-Mellon University Computer Science Technical Report:
CMU-CS-84-119, Carnegie-Mellon University, Pittsburgh, USA, 1984.
Hochberg MM et al. HMM/NN training techniques for connected alphadigit speech recognition. Proc. IEEE ICASSP’91, Toronto, Canada, 109–112, 1991.
Hoffmann A. Paradigms of Artificial Intelligence: A Methodological and Computational
Analysis. Springer, Singapore, 1998.
Hopfield JJ. Neural networks and physical systems with emergent collective computational
abilities. Proceedings of the National Academy of Sciences USA, 79: 2554–2558, 1982.
Hopfield JJ. Neurons with graded response have collective computational properties like
those of two-state neurons. Proceedings of the National Academy of Sciences USA, 81:
3088–3092, 1984.
Hopfield JJ, Tank DW. Neural computation of decisions in optimization problems. Biological
Cybernetics, 52: 141–154, 1985.
Hornik KM, Stinchcombe M, White H. Multilayer feedforward networks are universal
approximators. Neural Networks, 2(5): 359–366, 1989.
Hsu KJ. Time series analysis of the interdependences among air pollutants. Atmospheric
Environment, 26(4): 491–503, 1992.
Hurt NE. Phase Retrieval and Zero Crossing. Kluwer Academic Publisher, New York, USA,
1989.
Hurlbert SH. The concept of species diversity: A critique and alternative parameters. Ecology, 52: 577–585, 1971.
Ingber L, Rosen B. Genetic algorithms and very fast simulated reannealing: A comparison.
Mathematical and Computer Modelling, 16: 87–100, 1992.
Ingber L. Simulated annealing: Practice versus theory. Mathematical and Computer Modelling, 18: 29–57, 1993.
Jacobs RA. Increased rates of convergence through learning rate adaptation. Neural Networks, 1(4): 295–308, 1988.
Jacobs RA, Jordan MI, Nowlan SJ, Hinton GE. Adaptive mixtures of local experts. Neural
Computation, 3: 79–87, 1991.
Jackson RD, Bartolome JW. A state-transition approach to understanding nonequilibrium plant community dynamics in Californian grasslands. Plant Ecology, 162: 49–65,
2002.
Jasinski JP, Payette S. The creation of alternative stable states in the southern boreal forest,
Québec, Canada. Ecological Monographs, 75: 561–583, 2005.
Jia CS, Chi DF, Hu YY. Effects of forest plant communities on forest insect communities.
Journal of Anhui Agricultural Science, 34(9): 1871–1872, 2006.
Jordan MI, Jacobs RA. Hierarchies of adaptive experts. Advances in Neural Information
Processing Systems 4. San Mateo, USA, 1992.
Jorgensen SE, Verdonschot P, Lek S. Explanation of the observed structure of functional
feeding groups of aquatic macro-invertebrates by an ecological model and the maximum
exergy principle. Ecological Modelling, 158: 223–231, 2002.
Kaluli JW, Madramootoo CA, Djebbar Y. Modeling nitrate leaching using neural networks.
Water Science and Technology, 38(7): 127–134, 1998.
Kemp SJ, Zaradic P, Hansen F. An approach for determining relative input parameter importance and significance in artificial neural networks. Ecological Modelling, 204: 326–334,
2007.
Kilic H, Soyupak S, Tuzun I et al. An automata networks based preprocessing technique for
artificial neural network modelling of primary production levels in reservoirs. Ecological
Modelling, 201: 359–368, 2007.
Kohonen T. Correlation matrix memories. IEEE Transactions on Computers, 21: 353–359,
1972.
Kohonen T. Self-organizing formation of topologically correct feature maps. Biological
Cybernetics, 43: 59–69, 1982.
Kohonen T. Self-Organization and Associative Memory (3rd Edition). Springer-Verlag, New
York, USA, 1988.
Kohonen T. The self-organizing map. Proceedings of the IEEE, 78(9): 1464–1480, 1990.
Kohonen T. Self-Organizing Maps. Springer-Verlag, Heidelberg, Germany, 1995.
Kohonen T, Somervuo P. Self-organizing maps of symbol strings. Neurocomputing, 21:
19–30, 1998.
Krebs CJ. Ecological Methodology. HarperCollins Publishers, Inc., USA, 1989.
Kremen C, Colwell RK, Erwin TL, Murphy DD. Invertebrate assemblages: Their use as
indicators in conservation planning. Conservation Biology, 7: 796–808, 1993.
Kuo JT, Hsieh MH, Lung WS et al. Using artificial neural network for reservoir eutrophication prediction. Ecological Modelling, 200: 171–177, 2007.
Laird N. The EM algorithm. Handbook of Statistics Vol 9, Rao CR (Ed.). 509–520, Elsevier
Science Publisher, The Netherlands, 1993.
Lars OB. Stratospheric ozone, ultraviolet radiation, and cryptogams. Biological Conservation, 135(3): 326–333, 2007.
Lavorel S, Garnier E. Predicting the effects of environmental change on plant community
composition and ecosystem functioning: Revising the Holy Grail. Functional Ecology,
16: 545–556, 2002.
Le Cun Y. Une procédure d'apprentissage pour réseau à seuil asymétrique. Cognitiva, 85:
599–604, 1985.
Lee WT, Tenorio MF. On an asymptotically optimal adaptive classifier design criterion.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(3): 312–318, 1991.
Lehmann A, Overton J, Leathwick JJ. GRASP: Generalized regression analysis and spatial
prediction. Ecological Modelling, 157: 189–207, 2002.
Lek S, Baran P. Estimations of trout density and biomass: A neural networks approach.
Nonlinear Analysis, Theory, Methods & Applications, 30(8): 4985–4990, 1997.
Lek S, Belaud A, Baran P, Dimopoulos I, Delacoste M. Role of some environmental variables
in trout abundance models using neural networks. Aquatic Living Resources, 9: 23–29,
1996.
Li QY, Wang NC, Yi DY. Numerical Analysis (4th Edition). Tsinghua University Press,
Springer, Beijing, China, 2001.
Liang Y, Page EW. Multiresolution learning paradigm and signal prediction. IEEE Transactions on Signal Processing, 45: 2858–2864, 1997.
Lin JK. Foundations of Topology. Science Press, Beijing, China, 1998.
Liu BC. Functional Analysis. Science Press, Beijing, China, 2000.
Loot G, Giraudel JL, Lek S. A non-destructive morphometric technique to predict Ligula
intestinalis L. plerocercoid load in roach (Rutilus rutilus L.) abdominal cavity. Ecological
Modelling, 156: 1–11, 2002.
Luo SW. Theoretical Principles of Large-scale Artificial Neural Networks. Tsinghua University Press, Northern Jiaotong University Press, 2004.
MacKay DJC. Bayesian interpolation. Neural Computation, 4: 415–447, 1992.
Maier HR, Dandy GC. Understanding the behaviour and optimising the performance of
back-propagation neural networks: An empirical study. Environmental Modelling & Software, 13(2): 179–191, 1998.
Manly BFJ. Randomization, Bootstrap and Monte Carlo Methods in Biology (2nd Edition).
Chapman & Hall, London, UK, 1997.
Maravelias CD, Haralabous J, Papaconstantinou C. Predicting demersal fish species distributions in the Mediterranean Sea using artificial neural networks. Marine Ecology-Progress
Series, 255: 249–258, 2003.
Marchand M, Golea M, Ruján P. A convergence theorem for sequential learning in two-layer perceptrons. Europhysics Letters, 11(6): 487–492, 1990.
Marchant JA, Onyango CM. Comparison of a Bayesian classifier with a multilayer feedforward neural network using the example of plant/weed/soil discrimination. Computers
and Electronics in Agriculture, 39: 3–22, 2003.
Mathworks. MATLAB 6.5: Neural Network Toolbox. Mathworks, Natick, USA, 2002.
McCulloch W, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bulletin
of Mathematical Biophysics, 5: 115–133, 1943.
McKenna JE. Application of neural networks to prediction of fish diversity and salmonid
production in the Lake Ontario basin. Transactions of The American Fisheries Society,
134(1): 28–43, 2005.
Men SP, Feng JH. Applied Functional Analysis. Science Press, Beijing, China, 2005.
Meng DJ, Liang K. Differential Geometry. Science Press, Beijing, China, 1999.
Mezard M, Nadal JP. Learning in feedforward layered network: The tiling algorithm. Journal
of Physics A, 22: 2191–2204, 1989.
Miller RJ, Wiegert RG. Documenting completeness, species-area relations, and the species-abundance distribution of a regional flora. Ecology, 70: 16–22, 1989.
Miller RJ, White PS. Considerations for preserve design based on the distribution of rare
plants in Great Smoky Mountains National Park, USA. Journal of Environmental Management, 10: 119–124, 1986.
Minsky M, Papert S. Perceptrons. MIT Press, Cambridge, USA, 1969.
Moisen GG, Frescino TS. Comparing five modelling techniques for predicting forest characteristics. Ecological Modelling, 157: 209–225, 2002.
Moller MF. A scaled conjugate gradient algorithm for fast supervised learning. Neural
Networks, 6: 525–533, 1993.
Moody JE, Darken CJ. Learning with localized receptive fields. Proceedings of the 1988
Connectionist Model Summer School. Morgan Kaufmann Publishers, San Mateo, USA,
133–143, 1988.
Moody JE, Darken CJ. Fast learning in networks of locally-tuned processing units. Neural
Computation, 1: 281–294, 1989.
Moreau S, Bosseno R, Gu X, Baret F. Assessing the biomass dynamics of Andean bofedal
and totora high-protein wetland grasses from NOAA/AVHRR. Remote Sensing of Environment, 85: 516–529, 2003.
Moreno CE, Halffter G. Assessing the completeness of bat biodiversity inventories using
species accumulation curves. Journal of Applied Ecology, 37: 149–158, 2000.
Nagendra SMS, Khare M. Artificial neural network approach for modelling nitrogen dioxide
dispersion from vehicular exhaust emissions. Ecological Modelling, 190: 99–115, 2006.
Narendra KS, Mukhopadhyay S. Adaptive control of nonlinear multivariable systems using
neural networks. IEEE Transactions on Neural Networks, 8(3): 475–485, 1997.
Narendra KS, Mukhopadhyay S. Adaptive control of nonlinear multivariable systems using
neural networks. Neural Networks, 7(5): 737–752, 1994.
NeuralWare. Neural Computing. NeuralWare, Inc., Pittsburgh, USA, 2000.
Nour MH, Smith DW, El-Din MG et al. The application of artificial neural networks to flow
and phosphorus dynamics in small streams on the Boreal Plain, with emphasis on the role
of wetlands. Ecological Modelling, 191: 19–32, 2006.
Olden JD. An artificial neural network approach for studying phytoplankton succession.
Hydrobiologia, 436: 131–143, 2000.
Olden JD, Jackson DA. Illuminating the “black box”: A randomization approach for understanding variable contributions in artificial neural networks. Ecological Modelling, 154:
135–150, 2002.
Olden JD, Joy MK, Death RG. Rediscovering the species in community-wide predictive
modeling. Ecological Applications, 16(4): 1449–1460, 2006.
Olden JD, Joy MK, Death RG. An accurate comparison of methods for quantifying variable
importance in artificial neural networks using simulated data. Ecological Modelling, 178:
389–397, 2004.
Özesmi SL, Özesmi U. An artificial neural network approach to spatial habitat modelling
with interspecific interaction. Ecological Modelling, 116: 15–31, 1999.
Özesmi SL, Tan CO, Özesmi U. Methodological issues in building, training, and testing
artificial neural networks in ecological applications. Ecological Modelling, 195: 83–93,
2006.
Pal NR, Bezdek JC, Tsao ECK. Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Transactions on Neural Networks, 4: 549–557, 1993.
Palmer MW. The estimation of species richness by extrapolation. Ecology, 71: 1195–1198,
1990.
Palmer MW. Estimating species richness: The second-order jackknife reconsidered. Ecology, 72: 1512–1513, 1991.
Pao YH et al. Neural-net computing and intelligent control systems. International Journal
of Control, 56: 263–289, 1992.
Park J et al. Universal approximations using RBF networks. Neural Computation, 3: 246–
257, 1991.
Park YS, Chang JB, Lek S, Cao WX, Brosse S. Conservation strategies for endemic fish
species threatened by the Three Gorges Dam. Conservation Biology, 17: 1748–1758,
2003.
Parker DB. Learning-logic: Casting the cortex of the human brain in silicon. Technical Report
TR-47, Center for Computational Research in Economics and Management Science, MIT,
Cambridge, USA, 1985.
Paruelo JM, Tomasel F. Prediction of functional characteristics of ecosystems: A comparison
of artificial neural networks and regression models. Ecological Modelling, 98: 173–186,
1997.
Paschalidou AK, Iliadis LS, Kassomenos P, Bezirtzoglou C. Neural modelling of the tropospheric ozone concentrations in an urban site. Proceedings of the 10th International
Conference on Engineering Applications of Neural Networks. Thessaloniki, Hellas, 29–31
Aug, 2007.
Pastor-Barcenas O, Soria-Olivas E, Martín-Guerrero JD. Unbiased sensitivity analysis and
pruning techniques in neural networks for surface ozone modeling. Ecological Modelling,
182: 149–158, 2005.
Pearson RG, Dawson TP, Berry PM et al. SPECIES: A spatial evaluation of climate impact
on the envelope of species. Ecological Modelling, 154(3): 289–300, 2002.
Peacock L, Worner SP, Sedcole R. Climate variables and their role in site discrimination of invasive insect species distributions. Environmental Entomology, 35(4): 958–963,
2006.
Peterson C, Anderson JR. A mean field theory learning algorithm for neural networks. Complex Systems,
1(5): 995–1019, 1987.
Pidgeon IM, Ashby E. Studies in applied ecology I: A statistical analysis of regeneration
following protection from grazing. Proceedings of the Linnean Society of New South Wales,
65: 123–143, 1940.
Pimentel D, Stachow U, Takacs DA, Brubaker HW. Conserving biological diversity in
agricultural/forestry systems. Bioscience, 42(5): 354–362, 1992.
Poggio T, Girosi F. Networks for approximation and learning. Proceedings of the IEEE, 78(9):
1481–1496, 1990.
Powell MJD. The theory of radial basis function approximation. University of Cambridge
Numerical Analysis Reports. University of Cambridge, Cambridge, UK, 1990.
Prechelt L. A quantitative study of experimental evaluations of neural network learning
algorithms: Current research practice. Neural Networks, 9(3): 457–462, 1996.
Preston FW. Time and space and the variation of species. Ecology, 41: 785–790, 1960.
Pu ZL. Mathematical Models and Applications in the Management of Crop Insect Pests.
Guangdong Science and Technology Publishing House, Guangzhou, China, 1990.
Quétier F, Thébault A, Lavorel S. Plant traits in a state and transition framework as markers
of ecosystem response to land-use change. Ecological Monographs, 77(1): 33–52, 2007.
Rabiner LR. A tutorial on HMM and selected applications in speech recognition. Proc.
IEEE, 77: 257–286, 1989.
Ray C, Klindworth K. Neural networks for agrichemical vulnerability assessment of rural
private wells. Journal of Hydrologic Engineering, 5(2): 162–171, 2000.
Recknagel F, French M, Harkonen P, Yabunaka K. Artificial neural network approach for
modelling and prediction of algal blooms. Ecological Modelling, 96: 11–28, 1997.
Reed R. Pruning algorithms – a survey. IEEE Transactions on Neural Networks, 4(5): 740–
747, 1993.
Reyjol Y, Lim P, Belaud A, Lek S. Modelling of microhabitat used by fish in natural and
regulated flows in the river Garonne (France). Ecological Modelling, 146: 131–142, 2001.
Rosenblatt F. The perceptron: A probabilistic model for information storage and organization
in the brain. Psychological Review, 65: 388–408, 1958.
Rosenzweig ML. Species Diversity in Space and Time. Cambridge University Press, Cambridge, UK, 1995.
Rossi RE, Borth PW, Tollefson JJ. Stochastic simulation for characterizing ecological spatial
patterns and appraising risk. Ecological Applications, 3(4): 719–735, 1993.
Rozema J, Boelen P, Blokker P. Depletion of stratospheric ozone over the Antarctic and Arctic: Responses of plants of polar terrestrial ecosystems to enhanced UV-B. Environmental
Pollution, 137(3): 428–442, 2005.
Rudin W. Functional Analysis (2nd Edition). McGraw-Hill, Columbus, USA, 1991.
Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating
errors. Nature, 323: 533–536, 1986.
Rumelhart DE, McClelland JL. Parallel Distributed Processing: Explorations in the
Microstructure of Cognition Vol. 1. MIT Press, Cambridge, USA, 1986.
Sanders HL. Marine benthic diversity: A comparative study. The American Naturalist, 102:
243–282, 1968.
Sarle WS. Stopped training and other remedies for overfitting. Proceedings of the 27th
Symposium on the Interface of Computing Science and Statistics, 352–360, 1995.
Scardi M. Artificial neural networks as empirical models for estimating phytoplankton production. Marine Ecology Progress Series, 139: 289–299, 1996.
Scardi M, Harding Jr LW. Developing an empirical model of phytoplankton primary production: A neural network case study. Ecological Modelling, 120: 213–223, 1999.
Scheffe RD, Morris RE. A review of the development and application of the urban airshed
model. Atmospheric Environment, 27b(1): 23–39, 1993.
Schino G, Iannetta M, Martini S et al. Satellite estimate of grass biomass in a mountainous
range in central Italy. Agroforestry Systems, 59: 157–162, 2003.
Schoenly KG, Cohen MB, Barrion AT, Zhang WJ, Gaolach B, Viajante VD. Effects of
Bacillus thuringiensis on non-target herbivore and natural enemy assemblages in tropical
irrigated rice. Environment and Biosafety Research, 3: 181–206, 2003.
Schoenly KG, Zhang WJ. IRRI Biodiversity Software Series. I. LUMP, LINK, AND JOIN:
Utility programs for biodiversity research. IRRI Technical Bulletin No. 1. International Rice Research Institute, Manila, Philippines, 1999.
Schoenly KG, Zhang WJ. IRRI Biodiversity Software Series. V. RARE, SPPDISS, and
SPPANK: programs for detecting between-sample difference in community structure.
IRRI Technical Bulletin No. 5. International Rice Research Institute, Manila, Philippines,
1999.
Schultz A, Wieland R. The use of neural networks in agroecological modeling. Computers
and Electronics in Agriculture, 18: 73–90, 1997.
Scott I, Mulgrew B. Nonlinear system identification and prediction using orthonormal functions. IEEE Transactions on Signal Processing, 45: 1842–1853, 1997.
Setiono R. A penalty-function approach for pruning feedforward neural networks. Neural Computation,
9(1): 185–204, 1997.
Shahid SA, Schoenly KG, Haskell NH, Hall RD, Zhang WJ. Carcass enrichment does
not alter decay rates or arthropod community structure: A test of the arthropod saturation hypothesis at the anthropology research facility in Knoxville, Tennessee. Journal of
Medical Entomology, 4: 559–569, 2003.
Shang Y, Wah BW. Global optimization for neural network training. IEEE Computer,
29(3): 45–54, 1996.
Shanno DF. Recent advances in numerical techniques for large-scale optimization. Neural
Networks for Control, Miller, Sutton, and Werbos (Eds.). MIT Press, Cambridge, USA,
1990.
Sharma V, Negi SC, Rudra RP et al. Neural networks for predicting nitrate-nitrogen in
drainage water. Agricultural Water Management, 63: 169–183, 2003.
Sheng SG, Gao BJ, Zhang YH, Wang ZW, Wang HX, Wang XJ. Studies on the structure
of the insect communities in different plant types. Journal of Agricultural University of
Hebei, 20(4): 61–65, 1997.
Simberloff D. Properties of the rarefaction diversity measurements. The American Naturalist, 106: 414–418, 1972.
Simpson RW, Layton AP. Forecasting peak ozone levels. Atmospheric Environment, 17:
1649–1654, 1983.
Smith EP, van Belle G. Non-parametric estimation of species richness. Biometrics, 40:
119–129, 1984.
Smith M. Neural Networks for Statistical Modeling. Van Nostrand Reinhold, New York,
USA, 1994.
Soltic S, Pang S, Peacock L, Worner SP. Evolving computation offers potential for estimation
of pest establishment. International Journal of Computers, Systems and Signals, 5(2): 36–
43, 2004.
Song MY, Hwang HJ, Kwark IS et al. Self-organizing mapping of benthic macroinvertebrate communities implemented to community assessment and water quality evaluation.
Ecological Modelling, 203: 18–25, 2007.
Sontag ED. VC dimension of neural networks. Neural Networks and Machine Learning,
Bishop CM (Ed.). 69–95, NATO ASI Series, Springer, 1998.
Specht DF. Probabilistic neural networks for classification, mapping and associative memory. IEEE ICNN. San Diego, USA, 1988.
Specht DF. Probabilistic neural networks. Neural Networks, 3(1): 109–118, 1990.
Specht DF. A general regression neural network. IEEE Transactions on Neural Networks,
2(6): 568–576, 1991.
SPSS Inc. SPSS 15.0 for windows release 15.0.0.0. SPSS Inc., Chicago, USA, 2006.
Steele BB, Bayn RL Jr, ValGrant C. Environmental monitoring using populations of birds
and small mammals: Analysis of sampling effort. Biological Conservation, 30: 157–172,
1984.
Sutherst RW, Maywald GA. Climate model of the red imported fire ant, Solenopsis invicta
Buren (Hymenoptera: Formicidae): Implications for invasion of new regions, particularly
Oceania. Environmental Entomology, 34(2): 317–335, 2005.
Swingler K. Applying Neural Networks: A Practical Guide. Academic Press, London, UK,
1996.
Tan CO, Ozesmi U, Beklioglu M et al. Predictive models in ecology: Comparison of
performances and assessment of applicability. Ecological Informatics, 1(2): 195–211,
2006.
Tollenaere T. SuperSAB: Fast adaptive back propagation with good scaling properties.
Neural Networks, 3(5): 561–573, 1990.
Vahidinasab V. Day-ahead price forecasting in restructured power systems using artificial
neural networks. Electric Power Systems Research, 78(8): 1332–1342, 2008.
Valiant LG. A theory of the learnable. Communications of the ACM, 27(11): 1134–1142,
1984.
van der Werf W, Keesman K, Burgess P et al. Yield-SAFE: A parameter-sparse, process-based dynamic model for predicting resource capture, growth, and production in agroforestry systems. Ecological Engineering, 29(4): 419–433, 2007.
Van Rooij AJF, Jain LC, Johnson RP. Neural Network Training Using Genetic Algorithms.
World Scientific, Singapore, 1996.
Vapnik V. The Nature of Statistical Learning Theory. Springer-Verlag, New York, USA,
1995.
Vapnik V. An overview of statistical learning theory. IEEE Transactions on Neural Networks,
10: 988–999, 1999.
Venkatesh S. Computation and learning in the context of neural network capacity. Neural
Networks for Perception, Vol. 2, Wechsler H (Ed.). 173–207, Academic Press, 1992.
Viotti P, Liuti G, Di Genova P. Atmospheric urban pollution: Applications of an artificial
neural network (ANN) to the city of Perugia. Ecological Modelling, 148(1): 27–46, 2002.
Vogl TP, Mangis JK, Rigler AK, Zink WT, Alkon DL. Accelerating the convergence of the
backpropagation method. Biological Cybernetics, 59: 256–264, 1988.
von der Malsburg C. Network self-organization. An Introduction to Neural and Electronic Networks,
Zornetzer SF et al. (Eds.). Academic Press, USA, 1990.
Wagner TL, Wu H, Sharpe PJH et al. Modeling distribution of insect development time:
A literature review and application of the Weibull function. Annals of the Entomological
Society of America, 77: 475–487, 1984.
Walther BA, Morand S. Comparative performance of species richness estimation methods.
Parasitology, 116: 395–405, 1998.
Wasserman PD. Neural Computing: Theory and Practice. Van Nostrand Reinhold, New
York, USA, 1989.
Watts MJ, Worner SP. Using artificial neural networks to determine the relative contribution of abiotic factors influencing the establishment of insect pest species. Ecological
Informatics, 3: 64–74, 2008.
Watts MJ, Worner SP. Estimating the risk of insect species invasion: Kohonen self-organising
maps versus k-means clustering. Ecological Modelling, 220: 821–829, 2009.
Way MJ, Heong KL. The role of biodiversity in the dynamics and management of insect
pests of tropical irrigated rice-a review. Bulletin of Entomological Research, 84: 567–587,
1994.
Werbos PJ. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral
Sciences. PhD Thesis, Harvard University, Cambridge, USA, 1974.
Wettschereck D et al. Improving the performance of RBF network by learning center locations. Advances in Neural Information Processing Systems. San Mateo, USA, 1992.
Widrow B. Neural networks: Applications in industry, business and science. Communications
of the ACM, 37: 93–105, 1994.
Widrow B, Hoff ME. Adaptive switching circuits. 1960 IRE WESCON Convention Record,
IRE Part 4, New York, USA, 1960.
Widrow B, Stearns SD. Adaptive Signal Processing. Prentice-Hall, New Jersey, USA, 1985.
Widrow B, Winter R. Neural nets for adaptive filtering and adaptive pattern recognition.
IEEE Computer Magazine, 3: 25–39, 1988.
Williams CB. Patterns in the Balance of Nature. Academic Press, London, England, 1964.
Willmott CJ. Some comments on the evaluation of model performance. Bulletin of the
American Meteorological Society, 63: 1309–1313, 1982.
Wilson EO. The little things that run the world. Conservation Biology, 1: 344–346, 1987.
Worner SP. Predicting the establishment of exotic pests in relation to climate. Quarantine Treatments for Pests of Food Plants, Sharp JL, Hallman GJ (Eds.). Westview Press,
Boulder, USA, 1994.
Worner SP, Gevrey M. Modelling global insect pest species assemblages to determine risk
of invasion. Journal of Applied Ecology, 43(5): 858–867, 2006.
Wu DR. Course Notes on Differential Geometry. Higher Education Press, Beijing, China,
1981.
Xie YC, Sha ZY, Yu M et al. A comparison of two models with Landsat data for estimating
above ground grassland biomass in Inner Mongolia, China. Ecological Modelling, 220:
1810–1818, 2009.
Xu L. Bayesian Ying-Yang system and theory as a unified statistical learning approach (I):
For unsupervised and semi-unsupervised learning. Brain-Like Computing and Intelligent
Information Systems, Amari S, Kasabov N (Eds.). Springer-Verlag, Germany, 241–274,
1997.
Yan PF, Zhang CS. Artificial Neural Networks and Computation of Simulated Evolution.
Tsinghua University Press, Beijing, China, 2000.
Yazdanpanah H. A neural network model to predict wheat yield. Proceedings of Map India
Conference, New Delhi, India, 2002.
Yazdanpanah H, Karimi M, Hejazizadeh Z. Forecasting of daily total atmospheric ozone in
Isfahan. Environmental Monitoring and Assessment (DOI 10.1007/s10661-008-0531-z),
2008.
Yu R, Leung PS, Bienfang P. Predicting shrimp growth: Artificial neural network versus
nonlinear regression models. Aquacultural Engineering, 34: 26–32, 2006.
Zannetti P. Air Pollution Modelling: Theories, Computational Methods and Available Software. Computational Mechanics Publications, Southampton, Boston, USA, 1990.
Zhang WJ. Computer inference of network of ecological interactions from sampling data.
Environmental Monitoring and Assessment, 124: 253–261, 2007a.
Zhang WJ. Methodology on Ecology Research. Sun Yat-Sen University Press, Guangzhou,
China, 2007b.
Zhang WJ. Supervised neural network recognition of habitat zones of rice invertebrates.
Stochastic Environmental Research and Risk Assessment, 21: 729–735, 2007c.
Zhang WJ. Pattern classification and recognition of invertebrate functional groups using self-organizing neural networks. Environmental Monitoring and Assessment, 130: 415–422,
2007d.
Zhang WJ, Bai CJ, Liu GD. Neural network modeling of ecosystems: A case study on
cabbage growth system. Ecological Modelling, 201: 317–325, 2007.
Zhang WJ, Barrion AT. Function approximation and documentation of sampling data using
artificial neural networks. Environmental Monitoring and Assessment, 122: 185–201,
2006.
Zhang WJ, Feng YJ, Schoenly KG. Performance of non-parametric richness estimators to
hierarchical invertebrate taxa in irrigated rice field. International Rice Research Notes,
29(1): 39–41, 2004.
Zhang WJ, Li QH. Development of topological functions in neural networks and their
application in SOM learning to biodiversity. The Proceedings of the China Association
for Science and Technology, 4(2): 583–586, 2007.
Zhang WJ, Liu GH, Dai HQ. Simulation of food intake dynamics of holometabolous insect
using functional link artificial neural network. Stochastic Environmental Research and
Risk Assessment, 22: 123–133, 2008.
Zhang WJ, Pang Y, Qi YH et al. Relationship between temperature and development of
Spodoptera litura F. Acta Scientiarum Naturalium Universitatis Sunyatseni, 36(2): 6–9, 1997.
Zhang WJ, Qi YH. Functional link artificial neural network and agri-biodiversity analysis.
Biodiversity Science, 3: 345–350, 2002.
Zhang WJ, Schoenly KG. IRRI Biodiversity Software Series. II. COLLECT1 and COLLECT2: Programs for calculating statistics of collectors’ curves. IRRI Technical Bulletin
No. 2. International Rice Research Institute, Manila, Philippines, 1999.
Zhang WJ, Schoenly KG. IRRI Biodiversity Software Series. IV. EXTSPP1 and EXTSPP2:
Programs for comparing and performance-testing eight extrapolation-based estimators of
total species richness. IRRI Technical Bulletin No. 4. International Rice Research Institute,
Manila, Philippines, 1999.
Zhang WJ, Schoenly KG. Lumping and correlation analyses of invertebrate taxa in tropical
irrigated rice field. International Rice Research Notes, 1: 41–43, 2004.
Zhang WJ, Wei W. Spatial succession modeling of biological communities: A multi-model
approach. Environmental Monitoring and Assessment (DOI 10.1007/s10661-008-0574-1), 2008.
Zhang WJ, Zhang XY. Neural network modeling of survival dynamics of holometabolous
insects: A case study. Ecological Modelling, 211: 433–443, 2008.
Zhang WJ, Zhong XQ, Liu GH. Recognizing spatial distribution patterns of grassland
insects: Neural network approaches. Stochastic Environmental Research and Risk Assessment, 22(2): 207–216, 2008.
Zhang YT, Fang KT. Introduction to Multivariable Statistical Analysis. Science Press, Beijing, China, 1982.