ON/OFF ReLU in CNN – Inspired by retina structure

Anthony Kwok
5 min read · Aug 30, 2023


ON & OFF. Source: https://shorturl.at/duB38

Introduction

As someone enthusiastic about machine learning and deep learning, I find it a joy to read research papers and pick up new techniques for improving model performance. You can then try the same techniques in your side projects or business projects and see whether they help.

In this article, I would like to share one of the research papers I have read, along with some of my own comments at the end. The link to the paper is below.

Research Paper Link

https://www.sciencedirect.com/science/article/pii/S187705091631674X?ref=pdf_download&fr=RR-2&rr=7cbc47ee2b8f0ed8

Background of this Research Paper

1. Inspired by the biological structure of the human retina

  • Both positive (bright) and negative (dark) information are useful for the retina to identify objects

2. Problem of ReLU

  • Drops all negative information

3. Problem of Leaky ReLU

  • The vanishing gradient problem may still exist for negative information

Pain Point

Classical convolutional neural network (CNN) models contain multiple layers. After each layer, an activation function is applied to introduce non-linearity into the network. Some common activation functions are:

1. Sigmoid Function & its gradient

We seldom use the sigmoid function in CNNs because a neural network usually has many layers. During back-propagation, the gradient may vanish across layers when the derivative is close to 0 (as in the two tails of the black line). Once the gradient vanishes, the model can no longer learn. Therefore, we tend to avoid sigmoid in deep networks and use ReLU or Leaky ReLU instead, as described below.

Source: https://shorturl.at/abPQ9
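
As a quick illustration, here is a minimal NumPy sketch (not from the paper) of the sigmoid and its derivative; note how tiny the gradient becomes at the tails.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative s(x) * (1 - s(x)): at most 0.25 (at x = 0), close to 0 at the tails.
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, 0.0, 10.0):
    print(f"x = {x:>5}, sigmoid = {sigmoid(x):.5f}, gradient = {sigmoid_grad(x):.5f}")
# At x = ±10 the gradient is about 0.00005, so multiplying such factors
# across many layers drives the back-propagated signal toward zero.
```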

2. ReLU (Rectified Linear Unit)

From the graph below, you can see that the derivative is either 1 or 0, meaning a neuron is either alive (1) or dead (0). With ReLU, an alive neuron can capture and pass information between layers. However, the function drops all negative information, which may cause information loss and hurt model performance. Leaky ReLU was introduced to avoid dropping negative information, as shown in the next point.

Source: https://shorturl.at/bfntN
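
A matching sketch for ReLU and its derivative (again just an illustration, not code from the paper):

```python
import numpy as np

def relu(x):
    # Keeps positive values and zeroes out everything else.
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 where the neuron is "alive" (x > 0), 0 where it is "dead".
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # negative inputs become 0, positive inputs pass through
print(relu_grad(x))  # gradient is 0 for the negative inputs, 1 for the positive ones
```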

3. Leaky ReLU (Leaky Rectified Linear Unit)

Leaky ReLU keeps both positive and negative information by giving the neuron a small, non-zero gradient when the input is negative (i.e. x < 0).

Source: https://shorturl.at/klxBD
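
And a sketch of Leaky ReLU, using alpha = 0.01 as a typical slope for the negative side (my choice for illustration; the slope is a hyper-parameter):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Positive inputs pass through unchanged; negative inputs are scaled by a
    # small slope alpha instead of being dropped to zero.
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Gradient is 1 for x > 0 and alpha otherwise, so negative neurons still
    # receive a small learning signal.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(x))       # [-0.02, -0.005, 0.5, 2.0]
print(leaky_relu_grad(x))  # [0.01, 0.01, 1.0, 1.0]
```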

This research paper introduces a new methodology, called the “ON/OFF” ReLU structure, to tackle the problem of ReLU in a spirit similar to Leaky ReLU.

ON/OFF ReLU Structure

Instead of using one activation function in each layer, this structure uses two activation functions (a minimal sketch follows the list below):

  • Conventional ReLU (the “ON” structure). Source: Original Research Paper.
  • Reversed ReLU (the “OFF” structure). Source: Original Research Paper.
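
Here is a minimal sketch of how I read the two paths. The exact form of the reversed ReLU is my assumption (ReLU applied to the negated input, i.e. max(0, -x)); please check the paper's figures for the precise definition.

```python
import numpy as np

def on_relu(x):
    # ON path: conventional ReLU, keeps the positive (bright) information.
    return np.maximum(0.0, x)

def off_relu(x):
    # OFF path: reversed ReLU, sketched here as ReLU of the negated input so
    # that negative (dark) information is carried forward as positive values.
    return np.maximum(0.0, -x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(on_relu(x))   # [0.0, 0.0, 0.0, 0.5, 2.0]
print(off_relu(x))  # [2.0, 0.5, 0.0, 0.0, 0.0]
# Together the two paths preserve both signs of the pre-activation,
# whereas plain ReLU alone would discard everything below zero.
```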

As there are extra activation functions, the back-propagation algorithm has to be updated as well.

Updated Back-propagation Function. Source: Original Research Paper.
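
The exact rule is given in the paper's figure above. As a rough sketch of what the chain rule implies under my assumed OFF definition (max(0, -x)), the gradients flowing back from the two paths would combine roughly like this:

```python
import numpy as np

def on_off_backward(x, grad_on, grad_off):
    # Hypothetical backward pass (my reading, not the paper's exact formulation):
    #   ON  path y = max(0, x)  -> dy/dx = 1  where x > 0, else 0
    #   OFF path y = max(0, -x) -> dy/dx = -1 where x < 0, else 0
    return grad_on * (x > 0) + grad_off * -1.0 * (x < 0)

x = np.array([-2.0, -0.5, 0.5, 2.0])
ones = np.ones_like(x)
print(on_off_backward(x, ones, ones))  # [-1., -1., 1., 1.]
```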

The overall structure of the proposed model is as follows:

Overall Structure of the Proposed Model. Source: Original Research Paper.

Model Evaluation Methods

The “OFF ReLU” path generates extra features, which is a structural change, so the resulting model cannot be compared directly with a conventional ReLU model.

The research team adopted two ways to compare model performance (see the sketch after the list):

  1. Include the negative features in addition to the positive features
    > 100% positive features + 100% negative features
    > Doubles the number of output features
    > Increases the model size
  2. Replace half of the positive features with negative features
    > 50% positive features + 50% negative features
    > Loses part of the positive information
    > Performance expected to decrease
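
To make the two setups concrete, here is a small NumPy sketch; the channel counts are illustrative, not taken from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical pre-activation feature maps from a conv layer:
# (batch, channels, height, width) = (1, 64, 8, 8)
x = np.random.randn(1, 64, 8, 8)

# Setup 1: keep all 64 positive-feature channels and append 64 negative-feature
# channels -> 128 output channels, i.e. a larger model.
full = np.concatenate([relu(x), relu(-x)], axis=1)

# Setup 2: keep only the first 32 channels of each path -> 64 output channels,
# matching the conventional model's size, but half of the positive features
# have been replaced by negative features.
half = np.concatenate([relu(x[:, :32]), relu(-x[:, :32])], axis=1)

print(full.shape)  # (1, 128, 8, 8)
print(half.shape)  # (1, 64, 8, 8)
```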

Model Performance Comparison

Model Performance. Source: Original Research Paper.

From the table above, the conventional CNN with half the features performs worse, as expected.

Comparing the conventional CNN with the ON/OFF ReLU structure in both the “halved-features” and “full-features” settings, we can see that the ON/OFF structure further improves model performance by 1.7% to 2.5%.

My Comments

If you are working with a convolutional neural network, I think a CNN with ON/OFF ReLU is worth trying. All you need to do is adjust the architecture and re-train the model to see the result.

If your model relies heavily on “negative features”, the ON/OFF ReLU structure could surprise you and improve your model's performance.

Remarks

If you have spare time, I recommend reading the research paper yourself; this article only provides a brief summary and introduction. The paper also includes a medical application of the “ON/OFF ReLU” structure.

** Full credit should be given to the original research team.
** Research Paper Link: https://www.sciencedirect.com/science/article/pii/S187705091631674X?ref=pdf_download&fr=RR-2&rr=7cbc47ee2b8f0ed8
