What is Keras?
Keras is one of the world's most widely used open-source libraries for working with neural networks. It is modular, offers many easy-to-use features, and makes building models fast, which gives it an edge over many other neural network frameworks. It was developed by Google engineer François Chollet!
Keras does not handle low-level computation itself. Instead, it is designed as a high-level API that delegates that work to lower-level libraries. With the Keras high-level API, we can easily create models, define layers, and set up models with multiple inputs and outputs.
Because Keras acts as a high-level wrapper, it can run on top of Theano, CNTK, or TensorFlow. This is very convenient: the same Keras code can be used to train many kinds of deep learning models without much effort.
Following are some of the noteworthy features of Keras:
- Keras gives users an easy-to-use framework, alongside faster prototyping methods and tools.
- It works efficiently on both CPU and GPU, without any hiccups.
- Keras supports working with both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for a variety of applications such as computer vision and time series analysis, respectively.
- It also lets you combine CNNs and RNNs in a single model if needed.
- It supports arbitrary network architectures, including multi-input and multi-output models, layer sharing, and model sharing.
Who uses Keras?
Keras is so popular that it has over 250,000 users, and the number keeps growing. Researchers, engineers, and graduate students alike have made it a favorite. Organizations ranging from startups to Google, Netflix, and Microsoft use it day to day for their machine learning needs!
TensorFlow still receives the highest number of searches and users today, but Keras is the runner-up and catching up with TensorFlow quickly!
Foundational Concepts of Keras
Compared with other popular frameworks such as Caffe, Theano, and Torch, Keras stands out for four main characteristics that make it easier for developers to work with:
- User-friendly syntax
- Modular approach
- Extensibility methods
- Native support for Python
TensorFlow provides full support for low-level operations such as creating and manipulating tensors and computing derivatives. Keras does not implement these operations itself; it hands them off to a backend, a low-level library that already has its own tensor library.
Another notable mention is that, with Keras, we can use a backend engine of our choice, be it TensorFlow backend, Theano backend, or even Microsoft’s Cognitive Toolkit (CNTK) backend!
The Keras Workflow Model
To get a quick overview of what Keras can do, let's look at the typical Keras workflow and then at some code:
- Define the training data: the input tensor and the target tensor
- Build a model (a network of layers) that maps the input to the target
- Configure the learning process by choosing a loss function, an optimizer, and metrics
- Use the fit() method to iterate over the training data and train the model
Model Definition in Keras
Models in Keras can be defined in two ways. Following are the simple code snippets that cover them.
- Sequential Class: This is a linear stack of layers arranged one after the other.
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(784,)))
model.add(layers.Dense(10, activation='softmax'))
- Functional API: With the Functional API, we can define models as directed acyclic graphs (DAGs) of layers, which makes architectures with multiple inputs, multiple outputs, or shared layers possible.
input_tensor = layers.Input(shape=(784,))
x = layers.Dense(32, activation='relu')(input_tensor)
output_tensor = layers.Dense(10, activation='softmax')(x)

model = models.Model(inputs=input_tensor, outputs=output_tensor)
Implementation of Loss Function, Optimizer, and Metrics
Implementing the above-mentioned concepts in Keras is very simple and has a very straightforward syntax as shown below:
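from keras import optimizers

# Example configuration: RMSprop optimizer, categorical cross-entropy loss, accuracy metric
model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])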
Passing Input and Target Tensors
# here input_tensor and target_tensor stand for the actual training data (e.g. NumPy arrays)
model.fit(input_tensor, target_tensor, batch_size=128, epochs=10)
With this, we can check out how easy it is to build our own Deep Learning model with Keras.
Deep Learning with Keras
Deep learning is one of the most widely used techniques today. It is a subfield of machine learning, which in turn is part of artificial intelligence. In a neural network, inputs are passed through hidden layers of weighted connections to produce a prediction. During training, these weights are repeatedly adjusted so that the network picks up patterns in the data. With neural networks, users do not need to specify which patterns to look for, because the network learns them on its own!
Keras can be used for both regression and classification problems. Let's look at both in the following sections.
Regression Deep Learning Model Using Keras
Before beginning with the code: to keep things simple, the dataset used here is already preprocessed and clean. Do note that in most real cases, datasets require some preprocessing before we can work with them.
When it comes to working with any model, the first step is to read the data, which will form the input to the network. For this particular use case, we will consider the hourly wages dataset.
import pandas as pd

# read in data using pandas
train_df = pd.read_csv('data/hourly_wages_data.csv')

# check if data has been read in properly
train_df.head()
As seen above, Pandas is used to read in the data, and it sure is an amazing library to work with when considering Data Science or Machine Learning.
The 'df' here stands for DataFrame: pd.read_csv() reads the CSV file into a pandas DataFrame. The head() function then returns the first 5 rows of the DataFrame, so we can verify that the data was read correctly and see how it is structured.
Splitting up the Dataset
The dataset has to be split into the input and the target, which we call train_X and train_y, respectively. The input consists of every column in the dataset except 'wage_per_hour'. That column is left out because it is exactly what we want the model to predict, so it forms the target.
# create a dataframe with all training data except the target column
train_X = train_df.drop(columns=['wage_per_hour'])

# check if target variable has been removed
train_X.head()
As the snippet shows, the pandas drop() function removes (drops) the column from the DataFrame, and the result is stored in train_X, which forms the input.
With that done, we can store the wage_per_hour column in the target variable, train_y.
# create a dataframe with only the target column
train_y = train_df[['wage_per_hour']]

# view dataframe
train_y.head()
Building the Neural Network Model
Building the model is a simple, straightforward process, as the code below shows. We use the Sequential model because it is one of the easiest ways to build a model in Keras: layers are stacked one after another, which keeps the structure easy to follow. Each layer holds the weights that connect it to the layer before it.
from keras.models import Sequential
from keras.layers import Dense

# create model
model = Sequential()

# get number of columns (features) in training data
n_cols = train_X.shape[1]

# add model layers
model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
As the name suggests, the add() function adds layers to the model. Here we add two hidden layers (the first of which also declares the input shape) and an output layer.
Dense is the type of layer we use. It is a standard choice that works well in most cases. In a Dense (fully connected) layer, every node is connected to every node in the previous layer.
The number 10 means that each of the two hidden layers has 10 nodes. This number can be whatever the problem requires: the more nodes, the greater the model's capacity.
The activation function used is ReLU (Rectified Linear Unit), which allows the model to capture nonlinear relationships. For example, predicting diabetes for patients aged 9 to 12 is very different from predicting it for patients aged 50 and above; a relationship like this is not a straight line, and this is where the activation function helps.
One important point: the first layer needs an input shape, i.e., the number of columns in the input, which is stored in n_cols. The number of rows is left unspecified, so there is no fixed limit on how many rows (samples) the input can have.
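To make "fully connected" concrete, here is a minimal plain-Python sketch of what a Dense layer computes for a single input sample: each output node takes a weighted sum of every input plus a bias. The function name and all weight and bias values are hypothetical; in Keras the weights are learned during training and the computation is done with vectorized tensor operations.

```python
def dense_forward(inputs, weights, biases):
    """Compute one Dense layer's output for a single sample (no activation).

    weights[i][j] connects input i to output node j, matching the
    (input_dim, units) layout of a Keras Dense kernel.
    """
    outputs = []
    for j in range(len(biases)):
        # Every input contributes to every output node: "fully connected"
        total = sum(x * weights[i][j] for i, x in enumerate(inputs))
        outputs.append(total + biases[j])
    return outputs

# Hypothetical example: 3 inputs feeding a layer with 2 nodes
x = [1.0, 2.0, 3.0]
W = [[0.1, -0.2],
     [0.0, 0.5],
     [0.3, 0.1]]
b = [0.5, -1.0]
print(dense_forward(x, W, b))
```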
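ReLU itself is a very simple function: it returns its input when it is positive and 0 otherwise, i.e. relu(x) = max(0, x). The short plain-Python sketch below (an illustration, not the Keras implementation) shows that it is not linear: applying it to a sum is not the same as summing the results, which is what lets stacked layers model curved relationships.

```python
def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)

a, b = 3.0, -5.0
# A linear function f would satisfy f(a + b) == f(a) + f(b); ReLU does not.
print(relu(a + b))        # 0.0
print(relu(a) + relu(b))  # 3.0
```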
The output layer will be the last layer with only one single node, which is used for the prediction.
To compile the model, we need to pass two parameters: the optimizer and the loss function.
# compile model using mse as a measure of model performance
model.compile(optimizer='adam', loss='mean_squared_error')
The optimizer determines how the weights are updated during training, including how the learning rate is applied. A commonly used optimizer is Adam. Like Dense, it works well in most cases, because it adapts the learning rate throughout the training process.
The learning rate controls how large each weight update is, and therefore how quickly the model moves toward good weights. A smaller learning rate can produce more accurate weights, but the downside is that training takes longer.
For the loss function, we use MSE (Mean Squared Error), a very widely used choice for regression. It is calculated by squaring the difference between each predicted value and the actual value and then averaging these squared differences. The closer the loss is to zero, the better the model is performing.
Model training uses the fit() function, to which we pass five arguments: the training data, the target data, the validation split, the number of epochs, and the callbacks.
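To see what the learning rate does, here is a plain gradient-descent sketch (not Adam, which additionally adapts the step size for each weight) that minimizes a toy one-weight loss, (w - 3)², whose best weight is 3. Each update moves the weight by the learning rate times the gradient, so a smaller learning rate takes smaller steps and needs more iterations to get close. The toy loss and the function name are assumptions made for illustration.

```python
def gradient_descent(learning_rate, steps=20, w=0.0):
    """Minimize the toy loss (w - 3)**2 with plain gradient descent."""
    for _ in range(steps):
        grad = 2 * (w - 3)           # derivative of (w - 3)**2 with respect to w
        w -= learning_rate * grad    # step size is proportional to the learning rate
    return w

for lr in (0.01, 0.1):
    print(f"learning_rate={lr}: w after 20 steps = {gradient_descent(lr):.3f}")
```

With the smaller rate, the weight is still far from 3 after 20 steps; it gets there eventually, just more slowly.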
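Written as a formula, MSE = (1/n) Σ (yᵢ − ŷᵢ)²: the differences are squared first and then averaged. A minimal plain-Python version for a list of scalar targets (the wage values below are made up for illustration):

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical hourly wages and model predictions
print(mean_squared_error([10.0, 12.0, 15.0], [11.0, 12.0, 13.0]))  # 1.6666666666666667
```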
from keras.callbacks import EarlyStopping

# set early stopping monitor so the model stops training when it won't improve anymore
early_stopping_monitor = EarlyStopping(patience=3)

# train model
model.fit(train_X, train_y, validation_split=0.2, epochs=30, callbacks=[early_stopping_monitor])
The validation split sets aside a fraction of the training data as a validation set. The model does not train on this data; instead, the loss on it (the validation loss, here MSE) is reported at the end of each epoch. For example, with validation_split=0.3, 30 percent of the training data would be held out this way. Note that Keras takes the validation data from the last samples provided, before any shuffling, rather than choosing them at random.
The number of epochs is how many full passes the model makes through the training data. Up to a certain point, more epochs improve the model, but beyond that point it stops improving. To detect this and stop training, we use early stopping, which ends training before the set number of epochs if the model has stopped improving. patience=3 means that if the validation loss does not improve for 3 epochs in a row, training stops.
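The hold-out rule for validation_split (the last fraction of the samples, taken before shuffling) can be sketched in plain Python as follows. This is an illustration of the documented behavior, not Keras's own code, and split_for_validation is a hypothetical helper. Because the split is not random, it is worth shuffling the DataFrame yourself first if its rows are ordered in some meaningful way.

```python
import math

def split_for_validation(samples, validation_split):
    """Hold out the last `validation_split` fraction of samples (hypothetical helper)."""
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    split_at = int(math.floor(len(samples) * (1.0 - validation_split)))
    return samples[:split_at], samples[split_at:]

rows = list(range(10))  # stand-in for 10 training rows
train, val = split_for_validation(rows, 0.2)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(val)    # [8, 9]
```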
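The patience logic can be sketched in a few lines of plain Python. This is a simplified model of what the EarlyStopping callback does with its defaults (monitoring val_loss, with min_delta=0); the function name is hypothetical and the validation losses are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0          # improvement: reset the counter
        else:
            wait += 1         # no improvement this epoch
            if wait >= patience:
                return epoch  # patience exhausted: stop here
    return None  # ran through every epoch

# Made-up validation losses: the best value (3.0) is reached at epoch 2
losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.9]
print(early_stopping_epoch(losses, patience=3))  # 5
```

Note that the later value 2.9 is never reached, and by default (restore_best_weights=False) Keras keeps the weights from the last epoch run rather than the best one.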
Predictions on Data
Performing predictions on data is easily done by making use of the predict() function as shown below:
# example of how to use the newly trained model to make predictions on unseen data
# (we pretend that the new data is saved in a DataFrame called 'test_X')
test_y_predictions = model.predict(test_X)
With this, the model is built successfully! But with Keras, we can often make it more accurate by increasing its capacity. Let's talk about model capacity.
As mentioned previously, adding more nodes and layers increases the model's capacity, and more capacity tends to improve accuracy up to a certain limit. However, a larger model needs more computation power, and more computation power means more time to train! See the trend here?
# train a new model on the same data to show the effect of increasing model capacity

# create model
model_mc = Sequential()

# add model layers
model_mc.add(Dense(200, activation='relu', input_shape=(n_cols,)))
model_mc.add(Dense(200, activation='relu'))
model_mc.add(Dense(200, activation='relu'))
model_mc.add(Dense(1))

# compile model using mse as a measure of model performance
model_mc.compile(optimizer='adam', loss='mean_squared_error')

# train model
model_mc.fit(train_X, train_y, validation_split=0.2, epochs=30, callbacks=[early_stopping_monitor])
Here is another model trained on the same data, this time with three hidden layers of 200 nodes each. After training, the validation loss went from 32.63 to 28.06.
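One concrete way to measure the capacity difference is to count trainable parameters: a Dense layer with `units` nodes fed by `inputs` values has inputs × units weights plus units biases. The sketch below applies that formula to both architectures. The value n_cols = 9 is an assumption for illustration only (use train_X.shape[1] for your data); in Keras, model.summary() or model.count_params() report these counts directly.

```python
def dense_param_count(n_inputs, layer_units):
    """Total weights + biases for a stack of Dense layers."""
    total = 0
    prev = n_inputs
    for units in layer_units:
        total += prev * units + units  # kernel weights + one bias per node
        prev = units
    return total

n_cols = 9  # assumption: number of input features, for illustration only
small = dense_param_count(n_cols, [10, 10, 1])         # first model
large = dense_param_count(n_cols, [200, 200, 200, 1])  # model_mc
print(small, large)  # 221 82601
```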
With Keras, this is the advantage we get! Now, let us work on building the classification model.
Building a Classification Model in Keras
The advantage with Keras and its syntax in Python is that most of the steps we just did above will apply here as well. So, to keep the readability high, let’s discuss only the new concepts that we will need to predict if patients are diagnosed with diabetes or not.
Reading in the dataset and viewing it is straightforward:
# read in training data
train_df_2 = pd.read_csv('documents/data/diabetes_data.csv')

# view data structure
train_df_2.head()
Next, we remove the target column so that we can keep it as the output to train for:
# create a dataframe with all training data except the target column
train_X_2 = train_df_2.drop(columns=['diabetes'])

# check that the target variable has been removed
train_X_2.head()
A patient with no diabetes is represented by 0, while a patient with diabetes is represented by 1. The to_categorical() function performs one-hot encoding: it replaces each integer label with a vector containing one binary value per category. Here there are 2 categories, no diabetes and diabetes, so a patient with no diabetes is represented as [1 0], while a patient with diabetes is represented as [0 1].
from keras.utils import to_categorical

# one-hot encode target column
train_y_2 = to_categorical(train_df_2.diabetes)

# check that target column has been converted
train_y_2[0:5]
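Under the hood, one-hot encoding is a simple mapping. Here is a minimal plain-Python equivalent of what to_categorical produces for the 0/1 diabetes labels; Keras returns a float NumPy array, while this sketch (with the hypothetical name one_hot) returns lists of floats.

```python
def one_hot(labels, num_classes=None):
    """Turn integer class labels into one-hot vectors."""
    if num_classes is None:
        num_classes = max(labels) + 1  # infer the number of classes from the largest label
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0               # mark the position of this label's class
        encoded.append(row)
    return encoded

print(one_hot([0, 1, 1, 0]))
# [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```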
In this neural network, the last layer has two nodes, one for each outcome: the patient either has diabetes or does not.
Check out the following code snippet:
# create model
model_2 = Sequential()

# get number of columns in training data
n_cols_2 = train_X_2.shape[1]

# add layers to model
model_2.add(Dense(250, activation='relu', input_shape=(n_cols_2,)))
model_2.add(Dense(250, activation='relu'))
model_2.add(Dense(250, activation='relu'))
model_2.add(Dense(2, activation='softmax'))
As shown above, the output layer uses the softmax activation function. Softmax makes the outputs sum to 1, with each value between 0 and 1, so they can be conveniently interpreted as probabilities.
Compiling the model is straightforward as well. We use categorical cross-entropy as the loss function; it works well and is probably the most common choice for classification. As before, the lower the score, the better the model's performance.
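Softmax exponentiates each raw output score and divides by the sum of the exponentials, so the results are positive and add up to 1. A plain-Python sketch follows; subtracting the maximum score first is a standard trick to avoid overflow and does not change the result. The raw scores are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 3.0])  # hypothetical raw scores for [no diabetes, diabetes]
print(probs, sum(probs))
```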
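For a one-hot target y and predicted probabilities p, categorical cross-entropy is -Σ yₖ log(pₖ), which reduces to minus the log of the probability assigned to the correct class. The plain-Python sketch below clips probabilities away from 0 and 1 with a small epsilon to avoid log(0), as Keras also does; the exact epsilon here is an assumption, and the predictions are made up.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between one one-hot target and one probability vector."""
    return -sum(t * math.log(min(max(p, eps), 1.0 - eps))
                for t, p in zip(y_true, y_pred))

# Patient with diabetes ([0, 1]): a mostly correct prediction costs little...
print(categorical_crossentropy([0.0, 1.0], [0.2, 0.8]))
# ...while a confident wrong prediction costs much more
print(categorical_crossentropy([0.0, 1.0], [0.9, 0.1]))
```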
#compile model using accuracy to measure model performance
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
The accuracy metric reports the accuracy score at the end of every epoch, which makes the results easier and quicker to interpret.
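With one-hot targets and categorical cross-entropy as the loss, Keras treats metrics=['accuracy'] as categorical accuracy: a prediction counts as correct when its highest-probability class matches the one-hot target. A plain-Python sketch with made-up predictions:

```python
def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose highest-probability class matches the one-hot target."""
    def argmax(values):
        return max(range(len(values)), key=lambda i: values[i])
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

targets = [[1, 0], [0, 1], [0, 1], [1, 0]]
preds = [[0.7, 0.3], [0.4, 0.6], [0.8, 0.2], [0.9, 0.1]]  # third sample is wrong
print(categorical_accuracy(targets, preds))  # 0.75
```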
# train model
model_2.fit(train_X_2, train_y_2, epochs=30, validation_split=0.2, callbacks=[early_stopping_monitor])
Here, we have built models for two types of problems, regression and classification, in Keras with very little effort, and seen how powerful it can be.
As this Keras tutorial has shown throughout, working with Keras is simple. Now you can go on to build your own neural network models for various use cases; it is very straightforward and can help you solve a variety of problems.
A good point to add here is that Keras developers are in demand today. Companies are looking for certified professionals who can provide solutions to the variety of problems they face. Make sure to jump onto this demand-train to make the best use of Keras for your career!
What more would you like to see about Keras? Head to the comments section, and let us know!