Feature pyramid network for image classification

Object detection is one of the core problems in computer vision, and detectors often fail when objects appear at multiple scales in an image. Feature pyramids help address this problem.

Several earlier studies used different kinds of feature pyramids to improve object detection. One approach feeds multiple resized versions of the input image to the deep network so that objects are seen at different scales. This also improves detection, but it increases computational cost and processing time so much that it is inefficient in practice.

The feature pyramid network (FPN), introduced by Tsung-Yi Lin et al., improved the accuracy of deep convolutional object detectors. FPN solves the multi-scale problem by combining a bottom-up pathway and a top-down pathway, linked by lateral connections, over the features the network produces at different scales. The resulting pyramid carries strong semantics at every level, so FPN increases detection accuracy for objects of various sizes without slowing detection down.
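To make the top-down pathway concrete, here is a minimal Keras sketch of the original FPN merge scheme (with element-wise addition, as in Lin et al.). The channel depth and the placeholder feature shapes are my own assumptions for illustration, not values from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def fpn_top_down(c_features, out_channels=256):
    """Standard FPN top-down pathway over backbone features
    ordered from highest to lowest resolution (e.g. C2..C5)."""
    # 1x1 convolutions project every backbone feature map to a common depth
    laterals = [layers.Conv2D(out_channels, 1)(c) for c in c_features]

    # walk from the coarsest level downward, upsampling and merging laterally
    p = laterals[-1]
    pyramid = [p]
    for lateral in reversed(laterals[:-1]):
        p = layers.Add()([layers.UpSampling2D(2)(p), lateral])
        pyramid.insert(0, p)

    # 3x3 convolutions smooth the aliasing introduced by upsampling
    return [layers.Conv2D(out_channels, 3, padding="same")(p) for p in pyramid]

# example with placeholder feature shapes matching ResNet-style strides
c_features = [tf.keras.Input(shape=(s, s, d))
              for s, d in [(56, 256), (28, 512), (14, 1024), (7, 2048)]]
p_features = fpn_top_down(c_features)
```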

Here, I introduce a new architecture based on FPN that improves classification accuracy. This architecture was proposed in my paper.

As described, FPN extracts multi-scale features from the input image, which better represent objects of different scales. We designed an architecture that uses FPN to better capture the important parts of the image, which may appear at different sizes.

The next figure shows our proposed architecture. It was developed for classifying patient CT scan images into two classes, normal and COVID-19, and researchers can adapt it to other datasets and class sets.

This figure shows our model, which uses ResNet50V2 as the backbone and applies the feature pyramid network and the designed layers to classify the images into two classes.

The backbone network can be any deep convolutional network; here, we used ResNet50V2.
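As an illustration of this step (a sketch, not the paper's released code), the multi-scale features can be pulled out of a Keras ResNet50V2 as below. The stage names are assumptions and should be checked against `backbone.summary()` for your Keras version.

```python
import tensorflow as tf

backbone = tf.keras.applications.ResNet50V2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))

# assumed intermediate stage names; verify them with backbone.summary()
stage_names = ["conv2_block3_out", "conv3_block4_out",
               "conv4_block6_out", "post_relu"]
stages = [backbone.get_layer(name).output for name in stage_names]

# a model mapping an input image to its multi-scale backbone features
extractor = tf.keras.Model(backbone.input, stages)
```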

Note that the dense layers in this architecture have two neurons because the dataset has two classes. Other researchers should set the number of neurons in the dense layers to match the number of classes in their dataset.

The FPN we used is similar to the original version, with one difference: based on our experience, we used concatenation layers instead of addition layers inside the feature pyramid network.
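A single merge step of this modified FPN might look like the following sketch, where a Concatenate layer replaces the element-wise addition of the original design (the layer sizes are assumptions):

```python
from tensorflow.keras import layers

def merge_level(top_down, lateral, out_channels=256):
    """One FPN merge step with concatenation in place of addition."""
    up = layers.UpSampling2D(2)(top_down)           # upsample the coarser map 2x
    lat = layers.Conv2D(out_channels, 1)(lateral)   # 1x1 lateral connection
    merged = layers.Concatenate()([up, lat])        # concatenate channels, don't add
    return layers.Conv2D(out_channels, 3, padding="same")(merged)
```

Concatenation keeps both feature maps intact and lets the following 3x3 convolution learn how to mix them, at the cost of a wider intermediate tensor.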

The FPN produces five final feature maps, each representing the input image at a different scale. After that, we applied dropout layers (to avoid overfitting), followed by the first classification layers. Note that we did not use the softmax function in the first classification layers because their outputs are fed into the final classification layer; since softmax computes each output neuron relative to the other output neurons, it is not suitable here, and the ReLU activation function is a better choice.
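A per-level head in this spirit could be sketched as follows. The global-average-pooling step and the dropout rate are my assumptions, since the text only specifies dropout followed by a two-neuron ReLU layer.

```python
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 2  # normal vs. COVID-19; match this to your dataset's classes

def level_head(feature_map, rate=0.5):
    """Dropout followed by a ReLU dense layer; deliberately no softmax here."""
    x = layers.GlobalAveragePooling2D()(feature_map)  # pooling is an assumption
    x = layers.Dropout(rate)(x)                       # guards against overfitting
    return layers.Dense(num_classes, activation="relu")(x)

# placeholder tensors standing in for the five FPN output scales
pyramid = [tf.keras.Input(shape=(s, s, 256)) for s in (64, 32, 16, 8, 4)]
heads = [level_head(p) for p in pyramid]  # five outputs, two neurons each
```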

At the end of the architecture, we concatenated the five classification layers (each consisting of two neurons) into a ten-neuron layer, and connected it to the final classification layer, which applies the softmax function.
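Continuing the sketch, the tail of the architecture then fuses the five two-neuron heads into a ten-neuron vector and applies the final softmax:

```python
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 2

# stand-ins for the five per-level heads (two ReLU neurons each)
heads = [tf.keras.Input(shape=(num_classes,)) for _ in range(5)]

fused = layers.Concatenate()(heads)  # 5 x 2 neurons -> one 10-neuron layer
final = layers.Dense(num_classes, activation="softmax")(fused)

classifier_tail = tf.keras.Model(heads, final)
```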

Through this procedure, the network combines classification results derived from features at various scales. As a result, it can classify images more accurately.

Researchers can apply our proposed model to other classification tasks and datasets to improve classification results.