
DetectionSuite (Deep Learning)

GSoC Project Repository : [1]
GSoC Wiki Documentation: [2]
Project Report: [3]

DeepLearningSuite [Google Summer of Code]



A brief summary of the work done:

  • Added support for well-known benchmark datasets such as Pascal VOC, COCO and ImageNet, including read and write parsers for each of them.
  • Added and improved support for various frameworks, namely TensorFlow, Keras and Caffe.
  • Implemented smart class mapping to map class names between different datasets and make dataset format conversion more robust. This was achieved by storing class names in a tree, where siblings represent synonyms, children represent subclasses, and ancestors represent superclasses.
  • Improved the command-line tools and added an Auto Evaluator that can evaluate multiple datasets using multiple inferencers and frameworks, and outputs results in CSV format.
  • Improved the Deployer to fetch images from a web camera and over the network from ROS or ICE streams.
  • Removed JdeRobot as a dependency completely, adding inbuilt support for ROS and ICE, which previously depended on JdeRobot.
  • Shifted the config file format to YAML.
  • Improved evaluation support by adding metrics such as the COCO dataset's mAP metric and Pascal VOC's metric.
  • Added instance segmentation inferencing and evaluation support.
  • Packaged the tool as AppImages for easier usage.
  • Improved documentation and created tutorials.
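The class-name tree used for smart class mapping could be sketched roughly as below. This is a hypothetical illustration, not DetectionSuite's actual code: for brevity it stores synonyms together inside one node rather than as sibling nodes, and all names and helpers are made up.

```python
# Hypothetical sketch of the class-mapping tree -- names and helpers
# are illustrative, not DetectionSuite's actual implementation.
class ClassNode:
    def __init__(self, names, children=None):
        self.names = set(names)          # names in one node act as synonyms
        self.children = children or []   # children are subclasses

def find_path(node, name, path=None):
    """Depth-first search; returns the root-to-match path, or None."""
    path = (path or []) + [node]
    if name in node.names:
        return path
    for child in node.children:
        found = find_path(child, name, path)
        if found:
            return found
    return None

def map_class(tree, name, target_classes):
    """Map `name` onto a class the target dataset knows, falling back
    to parent (super) classes when there is no direct match."""
    path = find_path(tree, name)
    if path is None:
        return None
    for node in reversed(path):          # matched node first, then ancestors
        common = node.names & target_classes
        if common:
            return sorted(common)[0]
    return None

# A tiny fragment of such a tree:
tree = ClassNode({"vehicle"}, [
    ClassNode({"car", "automobile"}),
    ClassNode({"motorbike", "motorcycle"}),
])
```

Here `map_class(tree, "automobile", {"car"})` resolves a synonym, while `map_class(tree, "motorcycle", {"vehicle"})` falls back to the parent class, which is what makes conversion between datasets with different label granularities robust.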

Detailed Description of Work Done with Timeline

The work done on DetectionSuite during GSoC is described below week by week. Each entry gives a brief description of the work done, links to the corresponding pull requests and wiki revisions, and remarks where relevant.
  • Added support for the Keras framework in the backend.
Pull Request #29. Remark: this work was completed and submitted a week before the start of the coding period, to compensate for exams at that time.
  • Added support for the COCO and ImageNet datasets.
  • Extended support for the same in the Converter.
  • Tested these functionalities in the Viewer, Converter, Detector and Evaluator.
  • Added documentation for the Viewer and Detector in GitHub's wiki.
Pull Request #40

Pull Request #38

Wiki Revision 1
Wiki Revision 2

  • Added support for the Pascal VOC dataset.
  • Improved class conversion in the Converter, using trees for robust mapping.
  • Added an option to generate a custom class-names file in the Converter, and updated the Qt-based GUI for the same.
  • Added a command-line tool to run inference and evaluation with a network on a dataset, using a single config file.
Pull Request #43

Pull Request #42
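Such a single config file driving inference and evaluation might look roughly like the following. The keys shown are purely illustrative and are not necessarily DetectionSuite's actual schema:

```yaml
# Hypothetical example -- key names are illustrative only.
datasetPath: /opt/datasets/coco/annotations/instances_val2017.json
readerImplementation: COCO
inferencerImplementation: tensorflow
inferencerConfig:
  weights: /opt/models/ssd_mobilenet_v1_coco.pb
  classNames: /opt/names/coco.names
outputFolder: /tmp/results
```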

  • Improved the Automatic Evaluator to evaluate multiple networks on one or more datasets and write the results to CSV files.
  • Added documentation for the Automatic Evaluator.
  • Tested the Automatic Evaluator with multiple datasets and networks.
  • Researched various ways to implement Caffe support as efficiently as possible.
  • Started learning about and discussing various ways of implementing segmentation support in the tool.
Pull Request #46

Wiki Revision

Phase 1 Ends
  • Added support for fetching images from ROS/ICE streams, and also directly from a webcam, without the need for any extra dependency.
  • JdeRobot is no longer a dependency at all, because all the necessary libraries, such as comm and config, have been ported from JdeRobot into DetectionSuite.
  • Added support for fetching images directly from a ROS node, such as ROS usb_camera.
  • Added support for configuring parameters in the Qt-based GUI, as well as from a YAML file if any extra parameters are necessary.
  • Added support for the Caffe framework using OpenCV's dnn module.
  • Added support for configuring extra inferencing parameters for Caffe, such as the scaling factor, mean subtraction and input size, from the GUI itself.
Pull Request #55

Pull Request #56

  • Made OpenCV 3.4 an optional dependency, necessary only for Caffe support.
  • Shifted to the YAML configuration file format.
Pull Request #58

Pull Request #59

  • Added support for new evaluation metrics, namely Average Precision, Average Recall, mAP, etc.
  • Improved the IoU computation method, using the COCO dataset's technique for computing IoUs for crowd objects.
  • Tested the precision and speed of the above metrics computation against the COCO API; the results matched exactly.
  • The computation was also around 4.68 times faster for a set of 100 images when run on an Intel i5 quad-core machine under Ubuntu: DetectionSuite takes 194.889 ms, while the COCO API takes 913.136 ms.
Pull Request #62

Pull Request #63
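The crowd-object handling mentioned above follows the COCO convention, where the usual union in IoU is replaced by the detection's own area for crowd ground truth. A minimal sketch of the idea (plain Python, not the tool's actual C++ code):

```python
def iou(det, gt, gt_is_crowd=False):
    """IoU of two boxes given as (x1, y1, x2, y2). Following the COCO
    convention, a crowd ground-truth region uses intersection over the
    detection's own area instead of intersection over union."""
    ix1, iy1 = max(det[0], gt[0]), max(det[1], gt[1])
    ix2, iy2 = min(det[2], gt[2]), min(det[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    det_area = (det[2] - det[0]) * (det[3] - det[1])
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    denom = det_area if gt_is_crowd else det_area + gt_area - inter
    return inter / denom if denom > 0 else 0.0
```

The crowd rule avoids penalising a detection that lies entirely inside a large crowd region, which would otherwise get a very low IoU against the crowd box.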

  • Added instance segmentation inferencing support.
Pull Request #65

Pull Request #66

Phase 2 Ends
  • Added support for storing inferences in the Deployer, generated from ROS/ICE streams, a USB camera, video, or JdeRobot Recorder logs, in JSON format.
  • Added a GUI option to halt the Deployer in the middle of inferencing.
Pull Request #67
  • Added an option to change the confidence threshold live while inferencing in the Deployer.
  • Decreased memory consumption while inferencing large datasets: for COCO Val2017, memory consumption was reduced from about 4 GB to about 100 MB (run on an Intel i5 quad-core with 8 GB of RAM).
  • Added support for the Detector to write results in the reader's dataset format, if writing is enabled.
Pull Request #73
  • Improved segmentation masks with random colours and solid outlines for non-crowd objects, to improve visibility.
  • Added support for compressing binary segmentation masks or contours with a custom run-length encoding (RLE), to be more memory-efficient while inferencing.
  • Added support for evaluation metrics when using a segmentation mask as a region.
  • Tested the evaluation metrics for segmentation against the COCO API; the results were again identical.
  • Added support for building on macOS and updated the build instructions accordingly.
Pull Request #77

Pull Request #79
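Run-length encoding stores a binary mask as the lengths of its alternating runs of 0s and 1s, which is far smaller than one value per pixel for typical masks. A toy version of the idea (the exact byte format of DetectionSuite's custom RLE may differ):

```python
def rle_encode(mask):
    """Run-length encode a flat binary mask (sequence of 0/1).
    The first run counts zeros, so a mask starting with 1 begins
    with a zero-length run, as in COCO-style RLE."""
    runs, current, count = [], 0, 0
    for pixel in mask:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pixel, 1
    runs.append(count)
    return runs

def rle_decode(runs):
    """Inverse of rle_encode: expand run lengths back to a flat mask."""
    mask, value = [], 0
    for run in runs:
        mask.extend([value] * run)
        value ^= 1               # runs alternate between 0 and 1
    return mask
```

For example, a mostly empty 640×480 mask with one object collapses to a handful of integers, which is what makes inferencing over large datasets so much lighter on memory.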

  • Added build support for both OpenCV versions 2.4 and 3.4 or greater.
  • Packaged the DatasetEvaluationApp as an AppImage, which enables users to run this GUI-based app on any Linux distribution with a single command.
  • Added Travis CI support to check builds and publish AppImages to releases on every push, for both Linux and macOS.
Pull Request #83
Coding Period Ends
Final week
  • Released two AppImages, one with ROS and ICE support and one without.
  • Updated Travis CI to package both of them continuously.
  • Changed the DeepLearningSuiteLib build structure to support building with different include directories.
  • Both the command-line evaluator and the GUI evaluator now write results to a CSV file in the same directory.
Pull Request #85

Documentation and Tutorials

Wiki Page                                      Final Wiki Revision (at the end of GSoC)
Home 43c7b5e
Automatic Evaluation. 4c55faa
Beginner's Tutorial to DetectionSuite Part 1 bbf9177
ClassNames 33db4f4
Converter link title
Deployer d7fb531
Detector b12bf69
Evaluator link title
Frameworks 5706d60
Model Zoo af5ce03
Testing DetectionSuite 33db4f4
TroubleShooting link title
Viewer b12bf69

Object Detection using the ILSVRC2014 Dataset (ImageNet's Challenge)

To improve upon the previous results I used a better dataset, which has bounding boxes for individual people. Some statistics about the Person class are given below:

Class (Person)     Train Samples    Validation Samples
Annotated Images   29,700           5,792
Total Objects      60,255           12,823

Due to its scale, this dataset should improve detection accuracy, so I trained a faster_rcnn_resnet50 and tested it again on the same test images. The improvement is clearly visible.

Some curves depicting the localization and classification loss are also attached below:

Object Detection using ImageNet

To see the results of detection I used dl-objectdetector and modified it to load models at run time and then detect objects. Some pre-trained detectors are available in the model zoo, trained on datasets like COCO, KITTI and Open Images, using various architectures such as faster_rcnn_resnet101, SSD MobileNet, etc. So I had to train a network that predicts bounding boxes on an image using bounding-box regressors with ImageNet data. I therefore used the person synset to train my detector, which had annotations for around 3K images containing people. Some sample images from this synset are attached below:

As we can see, there are no individual bounding boxes; the boxes cover the area where people are present rather than the individual humans themselves. I nevertheless trained a faster_rcnn_resnet101 after converting the dataset into tf.record format with this config file. With a batch size of 1 and a maximum of 300 region proposals, it was trained for 50K iterations, resulting in the following detections.
Sample detections:

However, the trained faster_rcnn_resnet101 is too slow to detect in real time using the object detector tool, so I will be shifting to SSD MobileNet or a lighter network to perform real-time detections.

Also, the person dataset in the ILSVRC 2014 challenge is much better and has better bounding boxes, so I will be shifting to that dataset for further training.

Preliminary Detector using TensorFlow

A convolutional neural network consisting of 2 convolution layers with max pooling and a fully connected layer for binary classification of humans, using the INRIA dataset. Attached below are the curves of loss vs. iterations and log of loss (ln(loss)) vs. iterations. The network was trained with a batch size of 50 and a learning rate of 0.001 using the Adam optimizer. Since the loss decreases by small margins, the plain loss vs. iterations curve is not suitable for visualisation; the log of the loss depicts training better.

Test Results

Total Positive Samples 1126
Total Negative Samples 453
True Positives 1099
True Negatives 405
False Positives 48
False Negatives 27
Precision 0.9581
Recall 0.9760
F1 Score 0.9669
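The precision, recall and F1 figures above follow directly from the TP/FP/FN counts; a quick check in plain Python (not part of the tool):

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = prf(tp=1099, fp=48, fn=27)
# p ≈ 0.9581, r ≈ 0.9760, f1 ≈ 0.9670
```

Note that F1 simplifies to 2·TP / (2·TP + FP + FN) = 2198/2273 ≈ 0.9670, so the table's 0.9669 appears to be rounded slightly low.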

This is a preliminary classifier; it will be further improved using hard negative mining.

Using Point Cloud Library

The Point Cloud Library will play a significant role in the development of this project; below is a sample video of tracking humans on a ground plane after selecting it using 3 non-collinear points. The method uses Euclidean clustering to segment out point cloud clusters, and a Histogram of Depths (HoD) is then used on them to extract features, which are fed into a Support Vector Machine for classification.
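Euclidean clustering simply groups points that can be chained together within a distance tolerance. PCL implements this efficiently with a k-d tree; the naive O(n²) sketch below only illustrates the idea, with made-up names:

```python
def dist(a, b):
    """Euclidean distance between two points of any dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def euclidean_clusters(points, tolerance):
    """Naive Euclidean clustering: grow each cluster by repeatedly
    pulling in unvisited points within `tolerance` of the frontier."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [i for i in unvisited
                    if dist(points[current], points[i]) <= tolerance]
            for i in near:
                unvisited.remove(i)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters
```

Each resulting cluster is a candidate object (e.g. one person standing on the ground plane), which is then classified from its HoD features.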

Replication of OpenPTrack v1

OpenPTrack is an open-source library for human detection and tracking that supports various sensors and cameras, including tracking with multiple cameras or depth sensors. Below is a video of OpenPTrack v1 using solely a Microsoft Kinect v1 for input.

Getting started

Since this component will eventually be integrated with JdeRobot, the first step is to get familiar with the JdeRobot infrastructure using examples.
Jderobot Color Tuner

Jderobot Kobuki Viewer

Since the project will use RGB-D data as input, a depth sensor is a must. For this project a Microsoft Kinect v1 will be used; below is a sample video of fetching depth frames from the sensor.

As seen above, one advantage of using depth frames is that they are independent of ambient lighting: no matter how dark or bright the surroundings are, the depth frame only changes if the scene changes or the camera is moved.

Conventional Methods

Conventional methods of detecting humans generally use 2D images and a sliding window at different scales, along with a feature extractor and a classifier. For instance, this example uses Histogram of Gradients (HoG) features, which are fed into a linear SVM for classification. The preliminary linear SVM is trained with C = 0.01 and then refined with hard negative mining to reach a better classifier. Raw classification results are shown on the left; they are then pruned using non-maximum suppression (NMS), and the results are drastically improved.
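NMS keeps the highest-scoring window and discards any window that overlaps it too much, then repeats with the survivors. A minimal greedy version (illustrative only, not the cited example's code):

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the best box, drop boxes
    overlapping it above the threshold, repeat. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if box_iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

This is why the raw sliding-window detections, which fire many times around each person, collapse to roughly one box per person after NMS.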

Such a classifier works fine as long as the pedestrians are standing or walking, but it fails if someone is sitting or standing in a rather unusual posture.


The aim of this project is to detect humans in images using depth information, and also to detect different body parts in order to track them and thereby determine skeletal posture in real time. This will be an advancement over conventional RGB-image techniques, using depth data to detect humans and classify body parts.