3.1. An Introduction to Genetic Algorithms
Many studies have shown that ANNs have the capability to learn the underlying mechanics of time series, or, in the case of trading applications, the market dynamics. However, it is often difficult to design good ANNs, because many of the basic principles governing information processing in ANNs are hard to understand, and the complex interactions among network units usually makes engineering techniques like divide and conquer inapplicable. When complex combinations of performance criteria (such as learning speed, compactness, generalization ability, and noise resistance) are given, and as network applications continue to grow in size and complexity, the human engineering approach will not work and a more efficient, automated solution will be needed.
GA is reminiscent of sexual reproduction in which the genes of two parents combine to form those of their children. When it is applied to problem solving, the basic premise is that we can create an initial population of individuals representing possible solutions to a problem we are trying to solve. Each of these individual has certain characteristics that make them more or less fit as members of the population. The most fit members will have a higher probability of mating than the less fit members, to produce offspring that have a significant chance of retaining the desirable characteristics of their parents. This method is very effective at finding optimal or near optimal solutions to a wide variety of problems, because it does not impose many of the limitations required by traditional methods. It is an elegant generate and test strategy that can identify and exploit regularities in the environment, and converges on solutions that were globally optimal or nearly so.
GA have been increasingly applied in ANN design in several ways: topology optimization, genetic training algorithms and control parameter optimization. In topology optimization, GA is used to select a topology (number of hidden layers, number of hidden nodes, interconnection pattern) for the ANN which in turn is trained using some training scheme, most commonly back propagation. In genetic training algorithms, the learning of a ANN is formulated as a weights optimization problem, usually using the inverse mean squared error as a fitness measure. Many of the control parameters such as learning rate, momentum rate, tolerance level, etc., can also be optimized using GA. In addition, GA have been used in many other innovative ways, for instance, creating new indicators based on existing ones, selecting good indicators, to evolve optimal trading systems and to complement other techniques such as fuzzy logic.
There are four stages in the genetic search process: initialization, evaluation and selection, crossover and mutation. In the initialization stage, a population of genetic structures which are randomly distributed in the solution space is selected as the starting point of the search.
In the second stage, each structure is evaluated using a fitness function and assigned a fitness value. On the basis of their relative fitness values, structures in the current population are selected for reproduction. A stochastic procedure ensures that the expected number of offspring associated with a given structure s is u(s)/u(P), where u(s) is the observed performance of s and u(P) is the average performance of all structures in the current population. Thus structures with high performance are more likely to be chosen for replication while poor performing structures are eventually removed from the population. In the absence of other mechanisms, such a selective process would cause the best performing structures in the initial population to occupy an increasingly larger proportion of the population over time.
The selected structures are recombined using crossover, with two complementary search functions. First, it provides new points for further testing of structures already present in the population; Secondly, it introduces instances of new structures into the population.
Generally, crossover draws only on the information present in the solutions of the current population in generating new solutions for evaluation. If specific information is missing (due to storage limitation or loss incurred during the selection process of a previous generation), then crossover is unable to produce new structures that contain this piece of information. A mutation operator, which arbitrarily alters one or more components of a selected structure, provides the means for introducing new information into the population. However, mutation functions as a background operator with a very low probability of application. The presence of mutation ensures that the probability of reaching any point in the search space is never zero.
3.2. Genetically Evolved Neural Networks
One method of automating ANN architecture design using GA is described below. It comprises two adaptive processes: genetic search through input data window, forecast horizon, network architecture space and control parameters to select the best performers, and backpropagation learning in individual networks to evaluate the selected architectures.
The method begins with an initial population of randomly generated networks which are represented by overlapped tree structures as illustrated in Fig. 1. and 2.
In Fig. 1 and 2, each rectangle represents an input node and the triangle represents an output node. The networks will grow and their hidden nodes will be inserted into the networks as the population evolves. The input nodes take a random combination of input data from an input data window which picks up data items from the training data file and supplies them to the network. The output node randomly selects a forecast horizon from an output window and uses the associated data item as target value. Both windows slide down the training data file sequentially or randomly. Fig. 3 illustrates an input window of size 3, and an output window with forecast horizons ranging from 1 to 3 steps ahead.
The initial population of networks then goes through the first evolution cycle. As in real biological systems, learning cycles are nested within cycles of evolution in populations. Each learning cycle involves the entire population of networks with the set of input output pairs provided by the input and output windows. The networks' outputs are compared with the desired target, and the connection weights are adjusted to achieve the desired input output mapping by minimizing the errors.
At the end of each learning cycle, the networks are ranked according to some pre-determined criteria such as their generalization capability. The poor-performing networks will be removed from the population, while the fitter ones are retained and selected for the crossover process to reproduce the offspring for the next generation.
There are several ways to cross the parent nets. One way is to produce new offspring by mating two most fit parents as illustrated in Fig. 4. The output nodes of the parents are used as the hidden nodes of the offspring which will then inherit the knowledge already acquired by their parents.
Occasionally mutation is introduced to ensure that networks will not be trapped in local minima during the learning process. One way to mutate is to randomize the weights of those lowly ranked networks, change their input combination in the input data window and/or their forecast horizon.
After each evolution cycle, an image of the fittest network is kept. The image includes the current input data combination, forecast horizon, interconnection patterns and weights of the fittest network, such that the fittest network can go through the next evolution and training cycles together with the rest of the newly formed population.
Fig. 5. The training cycle with the nested evolution cycle.
The image will remain intact during subsequent evolution cycles until another fittest network emerges. A complete evolution cycle, with the nested training cycle, is illustrated in Fig. 5.
3.3. An Application Example
In this example, the following fundamental data of a stock were used as training data to forecast its future average price per share. The input data window size is 3 and the range of the forecast horizon is 1 to 3 steps.
Year Avg$/Sh Sls/Sh CF/Sh Ern/Sh Div/Sh Cap$/Sh BV/Sh Avg P/E Rel P/E Div % 1979 4.61 9.21 0.69 0.53 0.23 0.48 2.68 8.7 1.26 5.1 1980 6.53 9.27 0.7 0.51 0.25 0.51 2.94 12.8 1.7 3.9 1981 7.44 10.29 0.84 0.61 0.26 0.38 3.29 12.2 1.48 3.5 1982 7.37 9.66 0.92 0.63 0.3 0.39 3.63 11.7 1.29 4.1 1983 11.03 10.32 1.05 0.74 0.35 0.25 4.05 14.9 1.26 3.2 1984 12.46 11.52 1.22 0.89 0.4 0.41 4.59 14 1.3 3.2 1985 12.72 11.42 1.13 0.8 0.43 0.39 4.99 15.9 1.29 3.4 1986 13.78 12.98 1.3 0.83 0.5 0.55 5.27 16.6 1.13 3.6 1987 15.79 14.12 1.36 0.94 0.53 0.7 5.76 16.8 1.12 3.4 1988 14.72 11.82 1.14 0.8 0.6 0.7 4.11 18.4 1.53 4.1 1989 13.75 13.28 1.26 0.87 0.62 0.58 4.4 15.8 1.2 4.5
After 30 generations, the fittest network, as shown in Fig. 6, emerged. The number of input nodes shown in Fig. 7 was less than that of Fig. 6, because some of those in Fig. 6 referred to the same input nodes in Fig. 7.
The result, as shown in Fig. 7, indicated that a window size of 2 is sufficient, and that the best forecast can be achieved with a forecast horizon of 1 step ahead. The output of the fittest network is shown in Fig. 8. The downward turn detected at the end of the training data indicates a strong sell signal which was confirmed by the actual target values.
One can also fix the desired forecast horizon to say, 2 or 3 steps ahead, but the forecast will not be as good as that of 1 step in this problem.
3.4. Conclusion
The theory of natural selection offers some compelling arguments that individuals with certain characteristics are better able to survive and pass on those characteristics to their offspring. A genetic algorithm is a general search procedure based on the ideas of genetics and natural selection, and its power lies in the fact that as members of the population mate, they produce offspring that have a significant chance of retaining the desirable characteristics of their parents, perhaps even combining the best characteristics of both parents. In this manner, the overall fitness of the population can potentially increase from generation to generation as we discover better solutions to our problem.
When applied to the optimization of ANNs for forecasting and classification problems, GAs can be used to search for the right combination of input data, the most suitable forecast horizon, the optimal or near optimal network interconnection patterns and weights among the neurons, and the control parameters (learning rate, momentum rate, tolerance level, etc.), based on the training data used and the pre-set criteria. Like ANNs, GAs do not always guarantee you a perfect solution, but in many cases, it can arrive at an acceptable solution without the time and expense of an exhaustive search.
Copyright © 1993-96, NIBS Pte Ltd.