Keras Evosearch

Authors:

Ángeles Soto Pérez
Salvador Corts Sánchez

Hyperparameter optimization of Keras fully-connected neural networks using Evolutionary Algorithms.

Introduction

Hyperparameter optimization of Neural Networks (from now on NN) is often a challenging task that requires a lot of expertise and trial and error. Grid-Search is the most common method to tune these parameters: it consists of training the NN with every possible combination of a given (predefined) set of values for each parameter.

The main problem of Grid-Search is that it is very difficult to explore the whole search space of these parameters, because the number of combinations grows multiplicatively with every parameter and every value added, so in practice it is limited to a set of "promising" values. As a solution to this problem, we assess the use of evolutionary algorithms as a tool to optimize the values of these hyperparameters.
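To make the cost of Grid-Search concrete, the sketch below enumerates a small grid with `itertools.product`. The candidate values are hypothetical (they are not the ones used in this project) and the layer structure is not even included, yet every combination would still require a full training run.

```python
import itertools

# Hypothetical grid: illustrative values, not the project's actual search space.
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "optimizer": ["Adam", "SGD", "RMSProp"],
    "activation": ["Relu", "Sigmoid", "Softmax", "Tanh"],
    "dropout": [False, True],
}

def grid_search_configs(grid):
    """Yield one configuration dict per combination (Cartesian product) of the grid values."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid_search_configs(grid))
print(len(configs))  # 3 * 3 * 4 * 2 = 72 training runs
```

Adding a single extra dimension, for example five candidate layer layouts, would multiply the count by five (360 runs), which is why Grid-Search is usually restricted to a few hand-picked values.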

Evolutionary Algorithm Design

An Evolutionary Algorithm (EA) uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators.

In this case we apply a simple generational model: it generates n offspring from a population of size n and replaces the whole population with them. The offspring are generated by repeatedly selecting 2 individuals from the population and applying a crossover method to them until n offspring have been generated. The new offspring are then optionally mutated before replacing the original population.
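The generational loop described above can be sketched as follows, assuming tournament selection (see the Genetic Operators below) and that lower fitness is better, as in the server log shown later. The toy problem, with floats as individuals, a blend crossover and a Gaussian mutation, is a hypothetical placeholder and not the project's Go implementation.

```python
import random

def tournament(population, fitness, rng, k=3):
    """Pick k random individuals and return the one with the lowest (best) fitness."""
    return min(rng.sample(population, k), key=fitness)

def next_generation(population, fitness, crossover, mutate, rng, p_mut=0.2):
    """Generational model: breed n offspring, then replace all n parents with them."""
    n = len(population)
    offspring = []
    while len(offspring) < n:
        a = tournament(population, fitness, rng)
        b = tournament(population, fitness, rng)
        for child in crossover(a, b, rng):
            # Offspring are only optionally mutated.
            offspring.append(mutate(child, rng) if rng.random() < p_mut else child)
    return offspring[:n]  # drop the extra child when n is odd

# Hypothetical toy problem: individuals are floats, fitness is (x - 3)^2, lower is better.
def toy_fitness(x):
    return (x - 3.0) ** 2

def blend_crossover(a, b, rng):
    w = rng.random()
    return [w * a + (1 - w) * b, (1 - w) * a + w * b]

def gaussian_mutation(x, rng):
    return x + rng.gauss(0.0, 0.5)

rng = random.Random(42)
population = [rng.uniform(-10.0, 10.0) for _ in range(50)]
for generation in range(30):
    population = next_generation(population, toy_fitness, blend_crossover, gaussian_mutation, rng)
best = min(population, key=toy_fitness)
print(f"best x = {best:.3f}")
```

Because the whole population is replaced, nothing guarantees that the best parent survives; keeping it (elitism) would be a small variation of this loop.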

In our particular problem, each individual represents a given set of hyperparameters consisting of:

Model ID: Unique identifier of the individual (aka model).
Learning Rate: Also known as LR.
Optimizer: one of Adam, SGD or RMSProp.
Activation Function: one of Relu, Sigmoid, Softmax or Tanh.
Layers: a list where each element contains the number of neurons of the corresponding layer. For example, [1, 2, 3] means that our NN will have three hidden layers with 1, 2 and 3 neurons respectively.
Dropout: whether or not to apply 25% dropout after each layer.
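A minimal sketch of this representation as a Python dataclass, with a random initializer for the first generation. The sampling ranges (`max_layers`, `max_neurons`, the learning-rate interval) are hypothetical, since the project's actual bounds are not stated; the UUID format for the model ID mirrors the one visible in the server log.

```python
import random
import uuid
from dataclasses import dataclass, field

OPTIMIZERS = ["Adam", "SGD", "RMSProp"]
ACTIVATIONS = ["Relu", "Sigmoid", "Softmax", "Tanh"]

@dataclass
class Individual:
    learning_rate: float
    optimizer: str
    activation: str
    layers: list   # neurons per hidden layer, e.g. [1, 2, 3]
    dropout: bool  # apply 25% dropout after each layer
    model_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def random_individual(rng, max_layers=4, max_neurons=256):
    """Sample a random individual (hypothetical ranges, for illustration only)."""
    n_layers = rng.randint(1, max_layers)
    return Individual(
        learning_rate=rng.uniform(1e-4, 0.1),
        optimizer=rng.choice(OPTIMIZERS),
        activation=rng.choice(ACTIVATIONS),
        layers=[rng.randint(1, max_neurons) for _ in range(n_layers)],
        dropout=rng.random() < 0.5,
    )

rng = random.Random(1)
population = [random_individual(rng) for _ in range(50)]
print(population[0])
```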

We have implemented the following Genetic Operators:

Selection: Tournament selection involves running several "tournaments" among a few individuals chosen at random from the population. The winner of each tournament, which is the one with the best fitness, is selected for crossover.
Crossover: With a given probability, two individuals (aka models) will:
- Swap their optimizers, learning rates and dropout.
- Apply a single-point crossover.
Mutation: With a given probability, we will:
- Increase / decrease the LR by a random delta in the range (-0.05, 0.05).
- Randomly select a different optimizer.
- Randomly select a different activation function.
- Toggle dropout.
- For each layer, randomly add or subtract up to 25% of the layer's neurons.
- Permute the order of the layers.
Evaluation: Consists of training a model with the resulting set of parameters and evaluating the trained model against a validation and a test dataset.
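The crossover and mutation operators listed above can be sketched on plain hyperparameter dicts as below. Several details are assumptions not stated in this README: the single-point crossover is taken to act on the layer lists, each mutation is applied independently with probability `p`, and the learning rate is clamped to stay positive. Both functions work on copies so parents are never modified.

```python
import copy
import random

OPTIMIZERS = ["Adam", "SGD", "RMSProp"]
ACTIVATIONS = ["Relu", "Sigmoid", "Softmax", "Tanh"]

def crossover(a, b, rng, p=0.8):
    """With probability p, recombine two hyperparameter dicts into two children."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)  # never modify the parents
    if rng.random() >= p:
        return a, b
    # Swap optimizer, learning rate and dropout between the two individuals.
    for key in ("optimizer", "learning_rate", "dropout"):
        a[key], b[key] = b[key], a[key]
    # Single-point crossover, assumed here to act on the layer lists.
    cut = rng.randint(1, min(len(a["layers"]), len(b["layers"])))
    a["layers"], b["layers"] = (a["layers"][:cut] + b["layers"][cut:],
                                b["layers"][:cut] + a["layers"][cut:])
    return a, b

def mutate(ind, rng, p=0.1):
    """Apply each of the listed mutations independently with probability p (assumed scheme)."""
    ind = copy.deepcopy(ind)
    if rng.random() < p:
        # Random delta of at most 0.05 in magnitude; clamped so the LR stays positive (assumption).
        ind["learning_rate"] = max(1e-5, ind["learning_rate"] + rng.uniform(-0.05, 0.05))
    if rng.random() < p:
        ind["optimizer"] = rng.choice([o for o in OPTIMIZERS if o != ind["optimizer"]])
    if rng.random() < p:
        ind["activation"] = rng.choice([f for f in ACTIVATIONS if f != ind["activation"]])
    if rng.random() < p:
        ind["dropout"] = not ind["dropout"]
    if rng.random() < p:
        # Add or subtract up to 25% of each layer's neurons, keeping at least one neuron.
        ind["layers"] = [max(1, n + rng.randint(-(n // 4), n // 4)) for n in ind["layers"]]
    if rng.random() < p:
        rng.shuffle(ind["layers"])  # permute the order of the layers
    return ind

parent = {"learning_rate": 0.01, "optimizer": "RMSProp", "activation": "Tanh",
          "layers": [164, 214], "dropout": False}
print(mutate(parent, random.Random(0), p=1.0))
```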

Distributed Design

The Evaluation operator is particularly compute-intensive, since every individual requires training a full NN. To speed up the algorithm, we have implemented it with a client-server distributed architecture where the server applies the Selection, Crossover and Mutation operators, and the clients evaluate the resulting individuals.

This lets us divide the workload among several clients, which makes the algorithm run considerably faster.

Note that such an architecture only makes sense for problems where the operators computed by the clients take far more time than the latency of sending and receiving results between the client and the server.
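The trade-off in the previous paragraph can be made concrete with a simplified cost model, which is our assumption rather than a measurement: k identical clients, each evaluation costs `t_eval` plus one round trip `t_comm`, and a generation ends only when all n evaluations are done.

```python
import math

def serial_time(n, t_eval):
    """Time for one generation when a single machine evaluates all n models."""
    return n * t_eval

def distributed_time(n, k, t_eval, t_comm):
    """Time for one generation with k identical clients (simplified model:
    every evaluation also pays one request/response round trip t_comm)."""
    return math.ceil(n / k) * (t_eval + t_comm)

def speedup(n, k, t_eval, t_comm):
    return serial_time(n, t_eval) / distributed_time(n, k, t_eval, t_comm)

# Hypothetical numbers: 50 individuals, 5 clients, 50 ms round trip.
print(round(speedup(50, 5, t_eval=30.0, t_comm=0.05), 2))  # training-bound: close to 5x
print(round(speedup(50, 5, t_eval=0.01, t_comm=0.05), 2))  # latency-bound: slower than serial
```

When training takes seconds, the round trip is negligible and the speedup approaches k; when evaluation is cheaper than the round trip, distributing the work makes each generation slower.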

Implementation

The server is written in Go, since it is a compiled language (hence fast) that is well suited to implementing distributed systems thanks to its built-in concurrency primitives (e.g. channels, goroutines, etc.).

The client is written in Python, since we train the neural networks with Google's TensorFlow library (through Keras). Even though Python is an interpreted language (hence slower), the heavy computation is done by TensorFlow, whose core is optimized and written in C++.

Communication is done via gRPC, with the following API:

service API {
    rpc GetModelParams(Empty) returns (ModelParameters) {}
    rpc ReturnModel(ModelResults) returns (Empty) {}
}
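The pull-based protocol above (a client asks for parameters, evaluates them, and returns the results) can be simulated in memory, as sketched below. `FakeServer`, `run_client` and `toy_evaluate` are hypothetical names; the dataclasses only mimic the message names of the API, and their fields are assumptions apart from `model_id` and `learning_rate`, which appear in the server log. The real calls go over gRPC with an `Empty` request, and a real client would likely retry rather than exit when the server has no models to evaluate.

```python
import queue
import threading
from dataclasses import dataclass

@dataclass
class ModelParameters:  # hypothetical stand-in for the protobuf message
    model_id: str
    learning_rate: float

@dataclass
class ModelResults:  # hypothetical stand-in for the protobuf message
    model_id: str
    fitness: float

class FakeServer:
    """In-memory stand-in for the Go gRPC server (not the real implementation)."""

    def __init__(self, models):
        self.pending = queue.Queue()  # thread-safe queue of models to evaluate
        for m in models:
            self.pending.put(m)
        self.results = {}
        self.lock = threading.Lock()

    def GetModelParams(self):
        # Assumption: raise queue.Empty when there is nothing left to evaluate.
        return self.pending.get_nowait()

    def ReturnModel(self, results):
        with self.lock:
            self.results[results.model_id] = results.fitness

def run_client(server, evaluate):
    """Pull a model, evaluate it and report the result; stop when none are left."""
    while True:
        try:
            params = server.GetModelParams()
        except queue.Empty:
            return
        server.ReturnModel(ModelResults(params.model_id, evaluate(params)))

def toy_evaluate(params):
    # Placeholder for "train the Keras model and compute its fitness".
    return abs(params.learning_rate - 0.01)

server = FakeServer([ModelParameters(f"m{i}", 0.001 * i) for i in range(20)])
clients = [threading.Thread(target=run_client, args=(server, toy_evaluate)) for _ in range(4)]
for t in clients:
    t.start()
for t in clients:
    t.join()
print(len(server.results))  # 20
```

Because clients pull work, faster clients simply evaluate more models. A generation, however, still waits for its slowest evaluation, which is what the pool-based variant listed under Future Work targets.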

Experiments and Results

To test our evolutionary model, we designed an experiment that runs 30 generations with 50 individuals each. The objective is to find the best neural network for a binary classification problem.

We use the Algerian Forest Fires Dataset, available at the UCI Machine Learning Repository.

In this paper we can see that the best results were achieved with an AdaBoost model that obtained a recall of 0.95 and a precision of 0.79, hence an F1-score of 0.86.
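The F1-score is the harmonic mean of precision and recall. A quick check of the AdaBoost figures quoted above:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.79, 0.95), 2))  # 0.86
```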

These are our results:

{"level":"warning","msg":"No models to evaluate","time":"2021-06-18T18:15:47Z"}
{"level":"info","msg":"Listening at 0.0.0.0:10000","time":"2021-06-18T18:16:33Z"}
2021/06/18 18:17:08 pop_id=Qut min=0.018868 max=1.000000 avg=0.240941 std=0.250306
{"level":"info","msg":"Best fitness at generation 0: 0.018868","time":"2021-06-18T18:17:08Z"}
2021/06/18 18:17:38 pop_id=Qut min=0.018868 max=1.000000 avg=0.176518 std=0.171339
{"level":"info","msg":"Best fitness at generation 1: 0.018868","time":"2021-06-18T18:17:38Z"}
{"level":"warning","msg":"No models to evaluate","time":"2021-06-18T18:17:38Z"}
2021/06/18 18:18:10 pop_id=Qut min=0.018868 max=1.000000 avg=0.182777 std=0.198851
{"level":"info","msg":"Best fitness at generation 2: 0.018868","time":"2021-06-18T18:18:10Z"}
2021/06/18 18:18:41 pop_id=Qut min=0.000000 max=1.000000 avg=0.254514 std=0.316698
{"level":"info","msg":"Best fitness at generation 3: 0.000000","time":"2021-06-18T18:18:41Z"}
{"level":"info","msg":"Best model found: model_id:\"42378d24-2be2-4cb3-9bb9-daa18ebbf0e2\" learning_rate:0.010019941 optimizer:RMSprop activation_func:Tanh layers:{num_neurons:164} layers:{num_neurons:214}","time":"2021-06-18T18:18:41Z"}

As we can see, already in the first generation (generation 0) we got a model that beats the AdaBoost baseline, with an F1-score of 0.981132 (the fitness is 1 - F1, so 1 - 0.018868), and after 3 generations we got a model that predicted all the test examples correctly (fitness 0.000000).
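Because the fitness is 1 - F1, the best model of each generation is the `min` value in the per-generation log line. The small parser below is a hypothetical helper, not part of the project. It recovers those statistics from a line copied from the log above and converts the minimum back to an F1-score:

```python
import re

LINE = "2021/06/18 18:17:08 pop_id=Qut min=0.018868 max=1.000000 avg=0.240941 std=0.250306"

def parse_stats(line):
    """Extract the min/max/avg/std fitness statistics from a per-generation log line."""
    return {k: float(v) for k, v in re.findall(r"(min|max|avg|std)=([0-9.]+)", line)}

stats = parse_stats(LINE)
best_f1 = 1 - stats["min"]  # fitness = 1 - F1, so the lowest fitness is the best model
print(round(best_f1, 6))  # 0.981132
```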

All in all, the best model is the one that uses:

Learning Rate: 0.010019941
Optimizer: RMSprop
Activation Function: Tanh
Layers: [164, 214]
Dropout: No

Future Work

Allow NN architectures other than fully connected networks.
Allow different activation functions on each layer
Allow different dropout rates
Island-based distributed evolutionary algorithm
Pool-based evolutionary algorithms so the evolution is not delayed by slower clients.
