- 1 DetectionSuite (Deep Learning)
- 1.1 DeepLearningSuite [Google Summer of Code]
- 1.2 Object Detection using ILSVRC2014 Dataset (ImageNet's Challenge)
- 1.3 Object Detection using ImageNet
- 1.4 Preliminary Detector using TensorFlow
- 1.5 Using Point Cloud Library
- 1.6 Replication of OpenPTrack v1
- 1.7 Getting started
- 1.8 Conventional Methods
- 1.9 Introduction
DeepLearningSuite [Google Summer of Code]
A brief summary of the work done:
- Added support for well-known benchmark datasets such as Pascal VOC, COCO, and ImageNet, including read and write parsers for each.
- Added and improved support for several frameworks, namely TensorFlow, Keras, and Caffe.
- Implemented smart class mapping to map class names between different datasets and make dataset-format conversion more robust. Class names are stored in a tree where siblings represent synonyms, children represent sub-classes (child classes), and ancestors represent super-classes (parent classes).
- Improved the command-line tools and added an Auto Evaluator that can evaluate multiple datasets using multiple inferencers and frameworks, outputting the results in CSV format.
- Improved the Deployer to fetch images from a webcam and over the network from ROS or ICE streams.
- Removed JdeRobot as a dependency entirely and added built-in support for ROS and ICE, which previously depended on JdeRobot.
- Shifted the config file format to YAML.
- Improved evaluation support by adding metrics such as the COCO dataset's mAP metric and the Pascal VOC metric.
- Added instance segmentation inferencing and evaluation support.
- Packaged the tool as AppImages for easier use.
- Improved documentation and created tutorials.
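The class-mapping tree described above can be illustrated with a small sketch. This is a minimal, hypothetical illustration of the idea, not the actual DetectionSuite implementation; the class and function names are invented for this example:

```python
class ClassNode:
    """A node in the class-name tree: synonyms are grouped inside one
    node, children are sub-classes, the parent is the super-class."""
    def __init__(self, names, parent=None):
        self.names = set(names)   # synonyms, e.g. {"person", "human"}
        self.parent = parent
        self.children = []

    def add_child(self, names):
        child = ClassNode(names, parent=self)
        self.children.append(child)
        return child

    def find(self, name):
        """Depth-first lookup of the node containing a class name."""
        if name in self.names:
            return self
        for child in self.children:
            found = child.find(name)
            if found:
                return found
        return None

def map_class(tree, name, target_names):
    """Map `name` to a class known to the target dataset, first via
    synonyms, then by walking up to super-classes."""
    node = tree.find(name)
    while node is not None:
        common = node.names & target_names
        if common:
            return common.pop()
        node = node.parent   # fall back to the super-class
    return None

# Example: map a sub-class to a dataset that only knows "human"
root = ClassNode({"object"})
person = root.add_child({"person", "human", "pedestrian"})
person.add_child({"child"})
print(map_class(root, "child", {"human"}))  # prints "human"
```

Storing synonyms in one node and sub-classes as children is what lets the mapping degrade gracefully: a class with no exact match in the target dataset still maps to its nearest super-class.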
Detailed Description of Work Done with Timeline
The work done during GSoC on DetectionSuite is described below week by week, with links to the corresponding pull requests.
| Weeks | Brief Description of Work Done | Links to Pull Requests and Wiki Revisions | Remarks |
|---|---|---|---|
|  |  | Pull Request #29 | This work was done and submitted a week before the start of the Coding Period, to compensate for exams at that time. |
|  |  | Pull Request #40 |  |
|  |  | Pull Request #43 |  |
|  |  | Pull Request #46 | - |
|  |  | Pull Request #55 |  |
|  |  | Pull Request #58 |  |
|  |  | Pull Request #62 |  |
|  |  | Pull Request #65 | - |
|  |  | Pull Request #67 |  |
|  |  | Pull Request #73 |  |
|  |  | Pull Request #77 |  |
|  |  | Pull Request #83 | - |
|  |  | Pull Request #85 | - |
Documentation and Tutorials
| Wiki Pages | Final Wiki Revision till GSoC |
|---|---|
| Beginner's Tutorial to DetectionSuite Part 1 | bbf9177 |
Object Detection using ILSVRC2014 Dataset (ImageNet's Challenge)
To improve upon the previous results I used a better dataset, which has bounding boxes over individual people. Some statistics for the Person class are given below:
| Class (Person) | Train Samples | Validation Samples |
|---|---|---|
Due to its scale, this dataset should improve detection accuracy, so I trained a faster_rcnn_resnet50 model and tested it again on the same test images; the improvement is clearly visible.
Curves depicting the localization and classification loss are also attached below:
Object Detection using ImageNet
To see the detection results I used dl-objectdetector and modified it to load models at runtime and then detect objects. Several pre-trained detectors are available in the model zoo, trained on datasets such as COCO, KITTI, and Open Images, using architectures like faster-rcnn-resnet101, SSD MobileNet, etc.
So I had to train a network that predicts bounding boxes on an image using bounding-box regressors, trained on ImageNet data.
I therefore used the people synset to train my detector, which had annotations for around 3K images containing people.
Some sample images from this synset are attached below:
As we can see there are no individual bounding boxes; the boxes cover the area where people are present rather than the humans themselves. Nevertheless, I trained a faster-rcnn-resnet101 after converting the dataset into the tf.record format with this config file. With a batch size of 1 and a maximum of 300 region proposals, it was trained for 50K iterations, resulting in the following detections.
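One detail of the tf.record conversion mentioned above is that the TensorFlow Object Detection API stores box corners normalized to [0, 1] by the image dimensions. A minimal sketch of that normalization step (pure Python; the actual tf.Example wrapping is omitted, and the helper names are invented for this example):

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Convert absolute pixel coordinates to the [0, 1]-normalized
    corners expected inside a tf.record."""
    return (xmin / width, ymin / height, xmax / width, ymax / height)

def boxes_to_feature_lists(boxes, width, height):
    """Split a list of (xmin, ymin, xmax, ymax) pixel boxes into the
    four per-coordinate lists stored in a tf.Example."""
    xmins, ymins, xmaxs, ymaxs = [], [], [], []
    for box in boxes:
        nx0, ny0, nx1, ny1 = normalize_box(*box, width, height)
        xmins.append(nx0)
        ymins.append(ny0)
        xmaxs.append(nx1)
        ymaxs.append(ny1)
    return xmins, ymins, xmaxs, ymaxs

# A 200x100 image with one box covering its left half
print(boxes_to_feature_lists([(0, 0, 100, 100)], width=200, height=100))
# → ([0.0], [0.0], [0.5], [1.0])
```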
However, the trained faster-rcnn-resnet101 is too slow for real-time detection with the object detector tool, so I will be shifting to SSD MobileNet or a lighter network for real-time detection.
Also, the person data in the ILSVRC 2014 challenge has better bounding boxes, so I will be shifting to that dataset for further training.
Preliminary Detector using TensorFlow
A convolutional neural network consisting of two convolution layers with max pooling and a fully connected layer for binary classification of humans, trained on the INRIA dataset. Attached below are the curves of loss vs. iterations and log of loss (ln(loss)) vs. iterations. The network was trained with a batch size of 50 and a learning rate of 0.001 using the Adam optimizer. Since the loss decreases by small margins, the raw loss curve is hard to read, and the log of the loss depicts training better.
| Total Positive Samples | 1126 |
|---|---|
| Total Negative Samples | 453 |
This is a preliminary classifier; it will be further improved using hard negative mining.
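The hard-negative-mining step planned above follows a simple loop: run the current classifier over known-negative windows, collect its false positives (the "hard" negatives), add them to the training set, and retrain. A minimal sketch, where `predict`, `retrain`, and the toy classifier are hypothetical stand-ins for the real SVM pipeline:

```python
def hard_negative_mine(classifier, negative_windows, retrain, rounds=3):
    """Collect false positives on known-negative windows and retrain
    on them, repeating for a few rounds or until no mistakes remain."""
    hard_negatives = []
    for _ in range(rounds):
        false_positives = [w for w in negative_windows
                           if classifier.predict(w) == 1]
        if not false_positives:
            break  # no mistakes left on the negative set
        hard_negatives.extend(false_positives)
        classifier = retrain(classifier, extra_negatives=false_positives)
    return classifier, hard_negatives

# Demo with a toy stand-in classifier: it misfires on windows 2 and 3
class ToyClassifier:
    def __init__(self, misfires):
        self.misfires = set(misfires)
    def predict(self, window):
        return 1 if window in self.misfires else 0

def retrain(clf, extra_negatives):
    # a real retrain would refit the SVM; the toy just stops misfiring
    return ToyClassifier(clf.misfires - set(extra_negatives))

clf, hard = hard_negative_mine(ToyClassifier({2, 3}), [1, 2, 3, 4], retrain)
print(hard)  # → [2, 3]
```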
Using Point Cloud Library
The Point Cloud Library will play a significant role in the development of this project; below is a sample video of tracking humans on a ground plane after selecting it using three non-collinear points. The method uses Euclidean clustering to segment out point-cloud clusters, then extracts HoD (Histogram of Depths) features from them, which are fed into a Support Vector Machine for classification.
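The Euclidean clustering step mentioned above can be sketched in a few lines. This is a simplified illustration of the idea only: PCL's actual implementation uses a k-d tree for the neighbor search, while this sketch brute-forces it for clarity.

```python
import math

def euclidean_clusters(points, tolerance):
    """Group 3D points into clusters: two points belong to the same
    cluster if they are connected through a chain of neighbors closer
    than `tolerance`. Brute-force neighbor search for clarity."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= tolerance]
            for j in near:
                unvisited.remove(j)
                cluster.append(j)
                frontier.append(j)
        clusters.append(sorted(cluster))
    return sorted(clusters)

# Two well-separated groups of points
pts = [(0, 0, 0), (0.1, 0, 0), (5, 5, 0), (5.1, 5, 0)]
print(euclidean_clusters(pts, tolerance=0.5))  # → [[0, 1], [2, 3]]
```

Each resulting cluster is a candidate human, which the HoD + SVM stage then classifies.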
Replication of OpenPTrack v1
OpenPTrack is an open-source library for human detection and tracking that supports various sensors and cameras, including tracking with multiple cameras or depth sensors. Below is a video of OpenPTrack v1 using only a Microsoft Kinect v1 for input.
Since this component will eventually be integrated with JdeRobot, the first step is to get familiar with the JdeRobot infrastructure through examples:
- JdeRobot Color Tuner
- JdeRobot Kobuki Viewer
Since the project will use RGB-D data as input, a depth sensor is a must. A Microsoft Kinect v1 will be used for this project, and below is a sample video of fetching depth frames from the sensor.
As seen above, one advantage of using depth frames is independence from ambient lighting: no matter how dark or bright the surroundings are, the depth frame only changes if the scene changes or the camera moves.
Conventional methods of detecting humans generally use 2D images and a sliding window at multiple scales, combined with a feature extractor and a classifier. For instance, this example uses Histogram of Oriented Gradients (HOG) features fed into a linear SVM for classification. The preliminary linear SVM was trained with C = 0.01 and then hard-negative mined to obtain a better classifier. Raw classification results are shown on the left; after filtering with Non-Maximum Suppression (NMS), the results improve drastically.
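The NMS step above can be sketched as a short greedy procedure. A minimal illustration, assuming boxes given as (xmin, ymin, xmax, ymax) pixel corners with one confidence score each:

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box that overlaps it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two overlapping detections of the same person and one distant one
boxes = [(10, 10, 60, 110), (12, 8, 62, 108), (200, 50, 250, 150)]
scores = [0.9, 0.75, 0.8]
print(nms(boxes, scores))  # → [0, 2]
```

The overlapping duplicate is suppressed while the distant detection survives, which is exactly why NMS cleans up the raw sliding-window output so drastically.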
Such a classifier works fine as long as pedestrians are standing or walking, but it fails when someone is sitting or posed in an unusual way.
The aim of this project is to detect humans in images using depth information, and also to detect different body parts in order to track them and determine skeletal posture in real time. This advances the conventional RGB-image techniques by using depth data to detect humans and classify body parts.