- 1 Project Title: Creating realistic 3D Map from visual odometry and raw images
- 2 People
- 3 Project Repository
- 4 Project Overview
- 5 Current Progress (August 11th, 2018)
- 6 Short Summary (Development Log) on Individual Parts
- 7 Future Work
- 8 Weekly Progress
Project Title: Creating realistic 3D Map from visual odometry and raw images
This project aims at implementing a complete system for creating realistic 3D maps from visual odometry and raw images.
This project also takes input from online sparse SLAM algorithms, as they provide more accurate trajectory information than visual odometry.
- Jianxiong Cai (caijx97 [AT] gmail.com)
- Eduardo Perdices (eperdices [AT] gsyc.es)
- José María Cañas Plaza (jmplaza [AT] gsyc.es)
- The main project repository is: slam-MapGenerator.
- All the code in the above GitHub repo is for this GSOC 2018 project.
- Modified DSO (to save result pointcloud) is at dso(forked).
- Refer to commit log 7d7419d4e.
- All datasets and some results (reconstructed surfaces) are on My Google Drive.
What is it?
This project is included within the computer vision field and, in particular, within the visual SLAM research field. Visual SLAM algorithms are focused on creating and updating a map of an unknown environment and simultaneously calculating the location of a robot/camera within this map. They often employ cameras (RGB or Depth cameras) as input and usually run in real time. These algorithms have many real applications, like real-time augmented reality, robot navigation or, like in this project, 3D map reconstruction and visualization.
Visual odometry (VO), however, focuses on calculating camera poses (movement) only. Compared to SLAM, VO algorithms are more lightweight; some can even run on mobile phones. However, they sometimes completely ignore map building and often produce less accurate trajectories and sparser pointclouds.
This project is built upon existing Visual Odometry algorithms (including sparse SLAM algorithms). Unlike most visual SLAM algorithms, which only take RGB (or RGB-D) images as input, it takes the trajectory and pointcloud produced by VO algorithms as well as the raw images. The goal of this project is to create maps from sparse pointclouds (with RGB images) in simple environments.
It has many potential applications, as it gives VO algorithms the capability to build maps at a relatively low cost. One possible and classical application could be estimating the table plane for AR games.
Why is it important?
- Dense mapping can't work in some conditions.
- Although some dense mapping algorithms produce really nice maps, they can't work under certain conditions. For example, considering power consumption, they might not be preferred on mobile phones.
- Sparse online SLAM algorithms only produce a sparse pointcloud, which is not ideal. Surface reconstruction on top of it would be much more useful (at least it would be nice for visualization).
- A complete system pipeline for utilizing other techniques for surface reconstruction (such as image segmentation or deep-learning based object detection).
- In the long run, it would be cool and probably useful to utilize object detection for surface reconstruction, as this could provide much useful prior knowledge.
- Even more, textureless surfaces are always a problem for SLAM. If we know a surface is a plain wall using such side-channel techniques, we can handle it.
- Note: deep learning is not part of this project; it's just for the long-term plan.
To implement a system that could reconstruct the surface from sparse online SLAM pointcloud, the major problem is that the pointcloud is sparse and often of low quality. To be specific:
- Some algorithms do not have loop closure (like DSO, SVO)
- For real-time performance, online SLAM / VO algorithms usually choose to perform windowed bundle adjustment.
- The pointclouds are sparse, especially on textureless surfaces
To solve the above problems, this project contains 3 important parts:
- Loop Closure
- Loop Detection
- Loop Correction
- Pointcloud Alignment:
- Globally optimizing 3D points
- Surface Reconstruction
- Outlier Removal
- Surface Estimation
The overall architecture of this project:
The ultimate goal in the long-term (not GSOC 2018) would be to produce such a map (3D):
This will require future work, because this is still an active research area and there is no mature solution for producing this ideal map.
Current Progress (August 11th, 2018)
In general, the project has 3 parts
- Loop Closing (Including Loop Detection and Loop Correction)
- Global Bundle Adjustment (BA)
- Surface Reconstruction
Current Progress Overview: All the coding is finished and tested. Refer to the project repo (at the top of this page) for the code.
- Loop Closing: Completed and working.
- Global BA: Completed and working.
- Surface Reconstruction: Completed and working.
Detailed Current Progress:
- Loop Closing:
It has been completed and is working now. In the following demo, I use TUM Monocular Visual Odometry Dataset Sequence 43 as raw input and SD_SLAM (a derived work of ORB2_SLAM) as the sparse SLAM algorithm to get the input trajectory. (Note: the images have been pre-processed to obtain undistorted images.)
The original trajectory is shown below. There is an obvious gap between the start and the end of the trajectory.
The optimized trajectory is shown below. The loop closing is working, as the gap has been closed.
- Global Bundle Adjustment
For bundle adjustment, it would be hard to visualize the difference between the optimized and non-optimized results, so the optimization report is provided as the demo. The test setting is exactly the same as for loop closing (note: loop closure was not enabled when producing the input data with sd-slam). It took about 45 seconds on an i7-6700 with 8 GB memory.
Solver Summary (v 2.0.0-eigen-(3.2.92)-lapack-suitesparse-(4.4.6)-cxsparse-(3.1.4)-eigensparse-openmp)
Parameter blocks: 13881
Parameters: 43611
Residual blocks: 68825
Residuals: 137650
Minimizer: TRUST_REGION
Dense linear algebra library: EIGEN
Trust region strategy: LEVENBERG_MARQUARDT
Cost: Initial 1.621155e+05, Final 1.465352e+05, Change 1.558030e+04
Time (in seconds): Preprocessor 0.087415, Residual only evaluation 0.750797 (58), Jacobian & residual evaluation 2.197656 (58), Linear solver 40.533847 (58), Minimizer 43.962356, Postprocessor 0.003032, Total 44.052804
- Surface Reconstruction
The surfaces are not consistent and contain many holes in the above video. This is because there are not enough point observations in the input data. This is similar to the problem of the textureless plane in SLAM, and there is no mature solution for that.
One possible solution is to use some prior knowledge to aid the surface reconstruction, which I would be willing to try after GSOC. (For project scope reasons, this is not part of GSOC 2018.)
Short Summary (Development Log) on Individual Parts
The Loop Closing part contains two stages, loop detection and loop correction. Loop detection uses DBoW2 with ORB descriptors; a manually set threshold is used to identify possible loop closing pairs. Loop correction uses ceres-solver for optimization. Part of the code is adapted from ceres-solver.
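As a rough illustration of the detection stage (not the actual DBoW2-based implementation), thresholding a pairwise similarity matrix might look like the sketch below; `DetectLoopPairs`, the threshold value, and the frame-gap parameter are hypothetical names introduced for this example only:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: given a pairwise image-similarity matrix (such as the
// scores a bag-of-words vocabulary would produce), report frame pairs whose
// similarity exceeds a fixed threshold, skipping temporally adjacent frames,
// which would otherwise trivially match.
std::vector<std::pair<int, int>> DetectLoopPairs(
    const std::vector<std::vector<double>>& score,
    double threshold,       // e.g. the manually set 0.575
    int min_frame_gap) {    // ignore frames that are close in time
  std::vector<std::pair<int, int>> pairs;
  for (int i = 0; i < (int)score.size(); ++i)
    for (int j = i + min_frame_gap; j < (int)score[i].size(); ++j)
      if (score[i][j] > threshold) pairs.emplace_back(i, j);
  return pairs;
}
```

In the real pipeline the scores come from DBoW2 and the threshold is the fixed 0.575 mentioned later in this page.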
I have spent quite a long time implementing this (about 2-3 weeks of continuous work, excluding loop detection). Here is a short summary of the problems I met and the final implementation.
- The main problem I encountered was that there is no suitable SFM library in C++.
- The loop closing part in ORB2_SLAM is tightly coupled with the co-visibility graph, which makes reviewing the code a bit harder.
- Besides, ORB2_SLAM is a complete framework, which makes it hard to separate out the loop closing part, even though I understand the code. I gave up on that approach when I found their loop closing requires the relative pose calculated during frame registration (only online SLAM needs that part).
- Regarding third-party libraries, I tried OpenCV and OpenMVG. They are great libraries, but their related functions assume the camera intrinsic matrix has the same fx and fy.
The current approach implemented here is:
- In contrast to ORB2_SLAM, which runs loop correction each time it detects a loop closing pair, slam-MapGen runs loop correction only once and corrects all loop closing pairs at the same time.
- The optimizer uses ceres-solver, which has nice documentation.
Global Bundle Adjustment:
Global Bundle Adjustment also uses ceres-solver as the optimization backend. Some of the code is adapted from the ceres-solver BA example, with modifications. It only optimizes the camera extrinsics and point positions; camera intrinsics are taken from the input. It uses the pinhole camera model with fx, fy, cx, cy.
Surface Reconstruction The major problem for surface reconstruction is that there are too many outliers and too few observations. To remove the outliers, we assume the surfaces to be reconstructed are planes, which is true for most indoor environments, so RANSAC with plane estimation is used for outlier removal.
Too few observations cause 2 major problems, as shown in the demo video. First, some surfaces are inconsistent, because some areas simply contain nothing. Second, planes that do not have all three 2D vertex observations in a single image are discarded to avoid false positives. These two problems are hard to solve; I may seek solutions in the future.
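The RANSAC-based outlier removal can be sketched as below. This is a self-contained toy version, not the project code (which uses PCL); the function names, tolerance, and iteration count are made up for illustration:

```cpp
#include <array>
#include <cmath>
#include <cstdlib>
#include <vector>

using Pt = std::array<double, 3>;

// Plane as (n, d) with n·p + d = 0 and n normalized.
struct Plane { Pt n; double d; };

Plane PlaneFrom3(const Pt& a, const Pt& b, const Pt& c) {
  Pt u{b[0]-a[0], b[1]-a[1], b[2]-a[2]}, v{c[0]-a[0], c[1]-a[1], c[2]-a[2]};
  Pt n{u[1]*v[2]-u[2]*v[1], u[2]*v[0]-u[0]*v[2], u[0]*v[1]-u[1]*v[0]};
  double len = std::sqrt(n[0]*n[0] + n[1]*n[1] + n[2]*n[2]);
  for (double& x : n) x /= len;  // collinear samples yield NaN -> no inliers
  return {n, -(n[0]*a[0] + n[1]*a[1] + n[2]*a[2])};
}

// Minimal RANSAC sketch: repeatedly fit a plane to 3 random points and keep
// the plane with the most inliers; points far from it are the outliers.
std::vector<int> RansacPlaneInliers(const std::vector<Pt>& pts,
                                    double tol, int iters, unsigned seed) {
  std::srand(seed);
  std::vector<int> best;
  for (int it = 0; it < iters; ++it) {
    int i = std::rand() % pts.size(), j = std::rand() % pts.size(),
        k = std::rand() % pts.size();
    if (i == j || j == k || i == k) continue;
    Plane pl = PlaneFrom3(pts[i], pts[j], pts[k]);
    std::vector<int> inliers;
    for (int p = 0; p < (int)pts.size(); ++p) {
      double dist = std::fabs(pl.n[0]*pts[p][0] + pl.n[1]*pts[p][1] +
                              pl.n[2]*pts[p][2] + pl.d);
      if (dist < tol) inliers.push_back(p);
    }
    if (inliers.size() > best.size()) best = inliers;
  }
  return best;
}
```

Running this repeatedly and removing each plane's inliers from the cloud extracts multiple planes, which is roughly how the indoor scenes with several walls are handled.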
YAML Trajectory file format A customized YAML file format for storing trajectories is widely used in this project and other JdeRobot SLAM projects (including slam-sd, slam-DSO). I have spent some time saving the trajectory in that format, both for slam-DSO and this project.
Maybe we should come up with a library for the YAML trajectory file format, so that other related projects don't have to copy, paste and modify this part of the code.
We are interested in using deep learning, especially object detection and plane estimation, to aid surface reconstruction. Thus, after GSOC 2018 (August 22nd, 2018), we would like to do some further work.
There are several directions:
- Use some other surface reconstruction technique to handle more complex environments.
- Use deep learning (object detection) to provide initial guess on object existence
- Use segmentation on raw images to estimate textureless surface
- Use deep learning (surface detection) to provide more accurate prior knowledge on surface.
- Other interesting and promising ideas
|Phase 1: Loop detection|
|1|Get familiar with dataset and tools (ORB-SLAM, SD-SLAM and slam-viewer, etc.)|DONE|May 20, 2018|
|2|Overall Project Structure and Loop Detection|DONE|May 27, 2018|
|3|Loop Detection|DONE|Jun 3, 2018|
|4|Clean up code from last phase + Loop Correction|DONE|Jun 10, 2018|
|Phase 2: Pointcloud Alignment|
|5|Skipped for exams|DONE|Jun 17, 2018|
|6|Loop closure and Surface Reconstruction|DONE|Jun 24, 2018|
|7 - 8|Surface Reconstruction|DONE|Jul 8, 2018|
|Phase 3: 3D Map Reconstruction|
|9|Surface Reconstruction|DONE|Jul 15, 2018|
|10|Surface Reconstruction & Global BA|DONE|Jul 22, 2018|
|11|Surface Reconstruction & Global BA|DONE|Jul 29, 2018|
|12|Documentation and cleanup|DONE|Aug 11, 2018|
14 May - 20 May
- Get familiar with SD-SLAM, slam-viewer, datasets and other development utility
- Get familiar with DBoW2
% Some demo about SD-SLAM and slam-viewer will be added here
21 May - 27 May
- Method: iterate through all the images to detect loop
- Why not use the co-visibility map? Although ORB2_SLAM implements the co-visibility map, there is no guarantee that all input pointclouds have that information.
28 May - 3 Jun
- The loop detector is working, it can detect loops now.
- Replace the logging with boost::logging so that the debug info can be disabled.
- Extend standard DBoW2 library by templated inheritance.
- The visual vocabulary is obtained from ORB2_SLAM, as well as the code for reading and writing it.
- Add project namespace MapGen
- The threshold for loop closure detection is fixed at 0.575 for now; maybe there should be a way to adjust it automatically.
- There was a concern that we should not perform loop closure if the camera hasn't moved enough distance (e.g. if the camera is spinning in place). ORB2_SLAM uses the co-visibility map for that, but we don't have that map in this project (so far).
- A possible solution would be to calculate the accumulated disparity and determine whether the camera has moved enough distance. Since there are only 4 loops in the following demo, it's not that urgent now. I am going to solve loop correction first.
DEMO: The loop detector detected 4 loops in the TUM Monocular Visual Odometry Dataset (sequence 43), as listed below:
[2018-06-04 14:06:16.000819] [0x00007f658f53c740] [info] detected loop closing pair: 00000.jpg & 00437.jpg
[2018-06-04 14:06:16.000841] [0x00007f658f53c740] [info] detected loop closing pair: 00437.jpg & 02121.jpg
[2018-06-04 14:06:16.000846] [0x00007f658f53c740] [info] detected loop closing pair: 00443.jpg & 01970.jpg
[2018-06-04 14:06:16.000851] [0x00007f658f53c740] [info] detected loop closing pair: 00443.jpg & 02037.jpg
Below are three frames from the dataset (00000.jpg, 00437.jpg & 02121.jpg):
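The accumulated-motion check suggested above could be sketched like this, using keyframe positions instead of image disparity; the function name and the minimum-distance threshold are hypothetical, chosen only for illustration:

```cpp
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical sketch: sum the distance travelled between the two candidate
// frames of a loop pair and only accept the pair if the camera really moved.
// This rejects "loops" produced by a camera spinning in place.
bool MovedEnough(const std::vector<Vec3>& positions,
                 int first, int last, double min_accumulated_dist) {
  double acc = 0.0;
  for (int i = first; i < last; ++i) {
    double dx = positions[i + 1][0] - positions[i][0];
    double dy = positions[i + 1][1] - positions[i][1];
    double dz = positions[i + 1][2] - positions[i][2];
    acc += std::sqrt(dx * dx + dy * dy + dz * dz);
  }
  return acc >= min_accumulated_dist;
}
```

The accumulated-disparity variant would sum average feature displacement between consecutive images instead of pose distance, which works even before poses are reliable.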
3 Jun - 10 Jun
- Add code generation in cmake to automatically set the logging level with respect to CMAKE_BUILD_TYPE
- Add config file for loop-detection
- Add Acknowledgement throughout the code
- Move code repo from old repo to slam-MapGenerator
- Convert the vocabulary to tar.gz (the standard format supported by DBoW2), so that users may use other vocabularies if they want.
- Review the ORB2_SLAM loop correction part and search for libraries on SfM (Structure from Motion)
- The loop correction in ORB_SLAM2 uses the co-visibility map, but currently we don't have that map.
- Possible Solution:
- rebuild the co-visibility map (but then this project would be highly similar to ORB2_SLAM)
- reimplement the loop correction step (using pose-graph optimization directly)
- Possible Solution:
- Maybe use OpenMVG to solve two-view geometry.
18 Jun - 24 Jun
- Use OpenCV for solving the two-view geometry
- Store keypoints and ORB features in Keyframe so that they can be used for triangulation later.
- A problem with the fisheye camera from the TUM dataset (didn't find the camera intrinsics / undistorted images)
25 Jun - 3 July
Development Log (Loop Closure)
- Loop Closure is not working, because the OpenCV API assumes fx = fy, which is not always true. Besides, g2o in ORB2_SLAM has been extended somehow. As a result, I am going to revisit loop closure and redesign it in a few weeks, as too much time has been spent on it. I am going to work on Surface Reconstruction instead.
Development Log (Surface Visualization)
- I have completed the visualization for surfaces (triangles). It uses OpenGL for drawing and Pangolin for visualization. A demo image is shown below; it's a little bit hard to see the 3D geometry. You may try the dummy demo by:
- Issues in visualization: I tried to use a buffer to draw the triangles (OpenGL 3 supports that, as in this tutorial). However, it somehow hides the menu bar created by Pangolin. The solution so far is to switch to an older API (used in OpenGL 2).
Development Log (Surface Reconstruction)
- Tried PCL fast triangulation and it works, but with the sample data I have, the results are poor. I will adjust the thresholds and see what happens next.
4 July - 8 July
Development Log (Surface Reconstruction)
- Rewrote the logging utility, dropping the Boost logging utility because it can easily cause compile errors and its usage is a bit complicated.
- Used PCL to get the following results (video); they do not look good because ORB2_SLAM provides points that are too sparse.
- Hacking DSO to get more dense pointcloud as the input
Demo (Saved DSO Pointcloud)
- DSO is a visual odometry algorithm. It produces a much denser pointcloud than ORB2_SLAM.
- I modified the code of DSO to save the resulting pointcloud. Currently, the pointcloud is saved as a PCD file.
- The PCD file is visualized with CloudCompare for now.
Demo (Surface Reconstruction using ORB2_SLAM Pointcloud) I tried surface reconstruction with fast triangulation using the ORB2_SLAM pointcloud. The result doesn't look good, because the pointcloud produced by ORB2_SLAM is too sparse to be reconstructed.
Working Demos (Surface Reconstruction using DSO PointCloud) Here are two surface reconstruction demos; a more complete demo with parameter comparison has been uploaded to the Section Current Progress.
Tested and not working demos
- Poisson Reconstruction with DSO PointCloud:
Poisson reconstruction with the DSO pointcloud does not work because this method aims at reconstructing a watertight object, i.e. it assumes the object to be reconstructed is watertight. I tried this method because we thought the whole 3D map could be viewed as one huge object. However, the demo above shows that, in our application scenario, the map doesn't fit that assumption.
- Fast triangulation with ORB2_SLAM PointCloud:
The demo below contains 2 parts: the former part is the pointcloud produced by ORB2_SLAM (the input for surface reconstruction) and the latter part is the reconstructed surface.
The pointcloud from ORB2_SLAM does not work well for surface reconstruction, because ORB2_SLAM loses too much information (including lines). Even a human can hardly recognize anything from the pointcloud produced by ORB2_SLAM.
8 July - 14 July
Development Log (Documentation and demos) For GSOC's second evaluation, I spent some time writing documentation for the code and updated the current progress with some more demos for this wiki. :)
Development Log (Loop Detection) I have successfully obtained the undistorted images from the TUM sequence 43 dataset. With the undistorted images, I am able to detect more loop closing pairs.
Several loop closing pairs are detected (using undistorted images; the rest of the experiment setup remains the same as in phase 1):
[INFO, loop_closing.cpp] detected loop closing pair: 00000.jpg & 00431.jpg
[INFO, loop_closing.cpp] detected loop closing pair: 00000.jpg & 00437.jpg
[INFO, loop_closing.cpp] detected loop closing pair: 00431.jpg & 02121.jpg
[INFO, loop_closing.cpp] detected loop closing pair: 00437.jpg & 02121.jpg
[INFO, loop_closing.cpp] detected loop closing pair: 00437.jpg & 02138.jpg
[INFO, loop_closing.cpp] detected loop closing pair: 00443.jpg & 01970.jpg
I chose 3 images to show (00000.jpg, 00437.jpg & 02121.jpg):
Development Log(Loop Closing)
- Finished the coding of Loop Closing; it is working now.
- This week's progress (Problems and Solutions) on loop closing has been added to the short summary on loop closing (In the Section 'Short Summary').
Development Log (Surface Reconstruction)
- Comparing the reconstruction results of different parameter setup.
- A more complete demo video comparing different parameter setup has been uploaded to the Section 'Current Progress'.
15 July - 22 July
Development Log (DSO Hacking):
- Save DSO keyframe poses in the JdeRobot file format, which can be visualized through slam-viewer
DSO trajectory saved in JdeRobot Fileformat:
The DSO pointcloud and observations have been saved to file (in YAML format, which can be read by slam-viewer directly)
23 July - 30 July
Development Log (Surface Reconstruction Dataset)
- Recorded a video consisting of 3 walls, but for some reason DSO refuses to take it as input.
The video can be accessed through here
Development Log (Others)
- Restructured the main() code. It comes with a nice new config format, which works with all 3 features now.
Development Log (Surface Reconstruction)
- Use RANSAC to estimate the planes in the simple environment, which consists of 5 planes.
- RANSAC usually keeps around 70% of the total points and throws away the outliers.
- This automatically improves the input pointcloud quality by removing outliers. Besides, the estimated planes can be used for further reconstruction.
- Use greedy projection triangulation for surface reconstruction (based on the RANSAC outlier removal result)
- Each plane is reconstructed independently, and the results are concatenated to get a better overall result.
- The following video shows the reconstructed surface. All surfaces are in green for now; this will be improved in later updates. There are some holes in the reconstructed surface, because there are too few points there.
- With an aggressive configuration, the following surface can be reconstructed to eliminate the holes, but it introduces false positives at the same time.
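One step behind the per-plane reconstruction described above is expressing each plane's inliers in a 2D coordinate frame on that plane, so every plane can be triangulated independently before the meshes are concatenated. A hedged sketch (not the actual PCL-based code; all names here are illustrative):

```cpp
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

static Vec3 Cross(const Vec3& a, const Vec3& b) {
  return {a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]};
}
static Vec3 Normalize(Vec3 v) {
  double l = std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
  return {v[0]/l, v[1]/l, v[2]/l};
}

// Illustrative helper: project the inliers of one RANSAC plane into (u, v)
// coordinates on that plane. A 2D triangulation of these coordinates then
// yields the mesh for this plane alone.
std::vector<std::array<double, 2>> ProjectToPlane(
    const std::vector<Vec3>& pts, const Vec3& origin, const Vec3& normal) {
  Vec3 n = Normalize(normal);
  // Build two orthonormal axes spanning the plane.
  Vec3 helper = std::fabs(n[0]) < 0.9 ? Vec3{1, 0, 0} : Vec3{0, 1, 0};
  Vec3 u = Normalize(Cross(n, helper));
  Vec3 v = Cross(n, u);
  std::vector<std::array<double, 2>> uv;
  for (const Vec3& p : pts) {
    Vec3 d{p[0]-origin[0], p[1]-origin[1], p[2]-origin[2]};
    uv.push_back({d[0]*u[0] + d[1]*u[1] + d[2]*u[2],
                  d[0]*v[0] + d[1]*v[1] + d[2]*v[2]});
  }
  return uv;
}
```

Projecting each plane separately is what lets the holes of one plane stay local instead of corrupting the triangulation of its neighbors.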
Development Log (Global BA)
- Finished the coding, and the optimization converges in the test case. I need some time to check the results and prepare the demo.
31 July - August 7
Development LOG (Global BA)
- Fix bugs in global bundle adjustment:
- The camera intrinsics should not be optimized, because in our scenario all images are taken with one camera model. It now takes the camera model from the trajectory file and uses it directly.
- Fixed a bug in projecting 3D points onto the 2D image; it converges and produces good results now.
- The BA takes around 45 seconds to optimize 13552 points and 328 keyframes on my computer (i7-6700 with 8 GB memory).
Development LOG(Code Integration)
- The 3 major parts have been integrated into one executable; usage will be updated on GitHub with the documentation soon.
- The executable supports 2 ways of visualization:
- 1) save to disk in YAML (JdeRobot format), which can be opened by slam-viewer.
- 2) visualize through pangolin
Development LOG (Surface Texture stitching)
- Finished the majority of the image stitching code
August 7 - August 11
- Surface Texture stitching is working.
- All GSOC milestones have been completed so far. ^_^
- Documentation will be available in a couple of hours.
Demo (Surface Reconstruction)
- The following demo contains some false-positive planes, but we can discard them based on other criteria in future work.
Development Log (Surface Texture Stitching)
- The quality of the surface highly depends on the observations provided by the online SLAM algorithm. It is working, but there is huge room for improvement.
- First, the online SLAM algorithm often provides too few observations for each individual 3D point. This results in many valid planes being discarded, because no single image contains all 3 vertices of the plane.
- Second, some observations are even incorrect (the observations are provided by slam-sd-slam, a derived work of ORB2-slam). You may check this in the video below: there is a straight line with some parts sliding apart in the final frames.
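The vertex-visibility rule mentioned above (a plane is kept only if some single image observes all three of its vertices) can be sketched as follows; the function and variable names are hypothetical:

```cpp
#include <set>
#include <vector>

// Sketch of the filtering rule: a triangle is textured (and kept) only if at
// least one image observes all three of its vertices; otherwise it is
// discarded to avoid stitching texture from inconsistent views.
// `observations[p]` is the set of image ids in which 3D point p was seen.
bool TriangleHasTexture(const std::vector<std::set<int>>& observations,
                        int v0, int v1, int v2) {
  for (int img : observations[v0])
    if (observations[v1].count(img) && observations[v2].count(img))
      return true;
  return false;
}
```

With the sparse observations produced by the online SLAM algorithm, this intersection is often empty, which is exactly why many valid planes end up discarded.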