Extra Material

This page contains the experiments and some source code for the approach described in the work Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation.

Darkened and Lightened Sequences

This section shows the results on the three KITTI test sequences transformed in contrast and gamma. The three sets of parameters used for these transforms are listed below (a code sketch of the transform follows the list):

  1. min contrast 0.0, max contrast 0.4, gamma 1.5, a.k.a. darkened
  2. min contrast 0.0, max contrast 0.6, gamma 5, a.k.a. darkened_2
  3. min contrast 0.2, max contrast 0.7, gamma 0.2, a.k.a. lightened
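
The exact preprocessing code is included in the archives linked at the bottom of the page. As a rough illustration, here is a minimal sketch of a contrast rescale followed by gamma correction; the function name and the normalization to [0, 1] are our assumptions, not the original implementation:

import numpy as np

def contrast_gamma(image, min_c, max_c, gamma):
    # Assumes a grayscale image with values in [0, 1]; the parameters
    # mirror the (min contrast, max contrast, gamma) triples listed above.
    img = image.astype(np.float32)
    # Compress the dynamic range into [min_c, max_c].
    img = min_c + img * (max_c - min_c)
    # Gamma > 1 darkens mid-tones, gamma < 1 lightens them.
    img = np.power(img, gamma)
    return np.clip(img, 0.0, 1.0)

# e.g. the "darkened" setting: min contrast 0.0, max contrast 0.4, gamma 1.5
# dark = contrast_gamma(frame / 255.0, 0.0, 0.4, 1.5)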

Darkened

Darkened sequences simulate dusk conditions. Lowering the contrast makes it more difficult for feature extractors to find corners, although at these contrast and gamma values crisp shadows are still recognizable. Compared to the standard sequences, the average translational error on the darkened sequences is 1.39% higher for PCNN, 6.53% higher for SVR-S and 4.99% higher for VISO2. Looking at the single trajectories, the drop in performance is larger on sequences 09 and 10. We suppose this is due to their greater scene depth and higher linear speeds, which make these sequences more challenging.

       VISO2-M                 SVR VO Sparse           SVR VO Dense            PCNN
Seq    Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]
08     26.33      0.0389       41.81      0.1114       18.06      0.0490       8.45       0.0249
09     13.64      0.0357       19.88      0.0669       34.30      0.0550       11.03      0.0338
10     22.74      0.0352       29.28      0.0670       25.49      0.0646       20.03      0.0458
Avg    23.54      0.0387       35.04      0.1005       20.34      0.0545       10.28      0.0300

Trajectories for baseline and proposed methods on sequences darkened with 0.4 maximum contrast and a gamma value of 1.5.

Average errors across length and speed on the darkened sequences.

Example of the darkened images for sequences 08, 09 and 10.
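
The translational and rotational errors in these tables, and the "average errors across length and speed" plots, follow the standard KITTI odometry protocol: relative-pose errors are computed over subsequences of fixed lengths (100 m to 800 m) and then averaged. A minimal sketch of the translational part, assuming lists of 4x4 camera-to-world pose matrices (the helper names are ours; the rotational error is computed analogously from the angle of the residual rotation):

import numpy as np

LENGTHS = [100, 200, 300, 400, 500, 600, 700, 800]  # subsequence lengths in metres

def trajectory_distances(poses):
    # Cumulative distance travelled up to each frame.
    dist = [0.0]
    for i in range(1, len(poses)):
        dist.append(dist[-1] + np.linalg.norm(poses[i][:3, 3] - poses[i - 1][:3, 3]))
    return dist

def frame_at_distance(dist, first, length):
    # Index of the first frame at least `length` metres past frame `first`.
    for i in range(first, len(dist)):
        if dist[i] > dist[first] + length:
            return i
    return -1

def avg_translational_error(gt, est, step=10):
    dist, errors = trajectory_distances(gt), []
    for first in range(0, len(gt), step):
        for length in LENGTHS:
            last = frame_at_distance(dist, first, length)
            if last < 0:
                continue
            gt_rel = np.linalg.inv(gt[first]) @ gt[last]      # true relative motion
            est_rel = np.linalg.inv(est[first]) @ est[last]   # estimated relative motion
            err = np.linalg.inv(est_rel) @ gt_rel             # residual transform
            errors.append(np.linalg.norm(err[:3, 3]) / length)
    return 100.0 * np.mean(errors)  # per cent, as in the tables above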

Darkened 2

Darkened 2 sequences simulate night conditions. At these levels of contrast and gamma the shadows in the images are very dark and many small details are lost. These transforms are clearly only an approximation of real low-light vision, but they give an insight into what the estimation algorithms do under a comparable loss of detail. On these sequences we see a stark difference between PCNN and the other methods. SVR-S has the lowest performance, probably because of its very simple Lucas-Kanade sparse feature extraction (a sketch of such a front end follows this paragraph). SVR-D and VISO2, however, have a translational error that is nearly double that of PCNN, and a rotational error that is between 30% and 40% higher.
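
For context, a sparse Lucas-Kanade front end of this kind can be sketched with OpenCV as follows. This illustrates the general pipeline, not the exact SVR-S feature extractor, and the parameter values are placeholders:

import cv2

def sparse_flow(prev_gray, curr_gray):
    # Detect Shi-Tomasi corners; low-contrast (darkened) frames yield few of them.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return None, None  # no corners found, the failure mode discussed above
    # Track the corners into the next frame with pyramidal Lucas-Kanade.
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good = status.ravel() == 1
    return pts[good], nxt[good]  # matched pairs; their differences are the flow vectors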

       VISO2-M                 SVR VO Sparse           SVR VO Dense            PCNN
Seq    Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]
08     37.82      0.0493       52.96      0.1479       30.18      0.0784       14.53      0.0366
09     30.18      0.0537       26.26      0.0842       23.66      0.0773       15.82      0.0458
10     25.97      0.1305       38.88      0.1024       24.36      0.0546       18.53      0.0464
Avg    35.28      0.0610       44.61      0.1340       28.10      0.0792       15.25      0.0413

Trajectories for baseline and proposed methods on sequences darkened with 0.6 maximum contrast and a gamma value of 5.

Average errors across length and speed on the darkened 2 sequences.

Example of the darkened 2 images for sequences 08, 09 and 10.

Lightened

Lightened sequences simulate strong-light conditions through a low gamma correction value. These images also have very low contrast (minimum 0.2, maximum 0.7), so they are particularly challenging. The largest performance issue is for VISO2 on sequence 10, where it fails to extract enough features in many frames and the error grows very large. As in the preceding examples, PCNN behaves better than the SVR variants and VISO2.

       VISO2-M                 SVR VO Sparse           SVR VO Dense            PCNN
Seq    Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]
08     32.24      0.0378       46.51      0.1072       19.90      0.0595       10.16      0.0294
09     18.71      0.0268       22.85      0.0752       24.36      0.0491       20.08      0.0391
10     91.36      0.0541       43.73      0.1294       22.47      0.0734       21.02      0.0460
Avg    36.83      0.0380       40.45      0.1059       21.31      0.0617       13.51      0.0343

Trajectories for baseline and proposed methods on sequences lightened with 0.2 minimum contrast, 0.7 maximum contrast and a gamma value of 0.2.

Average errors across length and speed on the lightened sequences.

Example of the lightened images for sequences 08, 09 and 10.

Blurred Sequences

This section shows the results on the three KITTI test sequences with blur applied. The two blur radii used are 3 and 10 pixels (a sketch of the transform follows).
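
As an illustration, a blurred frame can be produced along these lines. This is a minimal sketch assuming a Gaussian blur whose radius parameter matches the pixel radii above; the exact kernel used for the sequences may differ:

from PIL import Image, ImageFilter

def blur_frame(path, radius):
    # We assume the page's blur "radius" in pixels maps to the Gaussian
    # radius as in PIL's ImageFilter.GaussianBlur.
    img = Image.open(path)
    return img.filter(ImageFilter.GaussianBlur(radius=radius))

# blurred_3 = blur_frame("000000.png", 3)
# blurred_10 = blur_frame("000000.png", 10)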

Blurred - radius 3 pixels

These sequences are blurred with a small radius of 3 pixels. The most striking result is that this blur is slightly beneficial to PCNN's translational error, while it is not for the other methods. In detail, PCNN's error decreases by 0.33% with respect to the standard sequences, while VISO2 and SVR-S show increases of +4.99% and +8.07%. The average rotational errors change by +11%, +106% and +2.92% for PCNN, SVR-S and VISO2 respectively, showing that sparse SVR under-performs the most on these sequences.

       VISO2-M                 SVR VO Sparse           SVR VO Dense            PCNN
Seq    Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]
08     23.70      0.0431       25.23      0.0674       14.41      0.0386       7.41       0.0229
09     10.99      0.0317       12.43      0.0438       21.99      0.0332       6.74       0.0253
10     25.83      0.0454       20.09      0.0463       26.74      0.0590       19.35      0.0380
Avg    23.54      0.0387       21.88      0.0623       17.52      0.0414       8.63       0.0262

Trajectories for baseline and proposed methods on sequences blurred with radius 3 pixels.

Average errors across length and speed on the radius 3 blurred sequences.

Example of the radius 3 blurred images for sequences 08, 09 and 10.

Blurred - radius 10 pixels

These sequences are blurred with a radius of 10 pixels. The results show only a slight increase in error for the dense methods, and again PCNN performs best, showing that the features it learns are robust to high levels of blur. SVR-S and SVR-D are very similar in performance, and they do better than in the radius 3 case. This suggests that something in the SVR approach helps to reduce the effects of heavy blur. The errors of VISO2 on highly blurred images are more than double.

       VISO2-M                 SVR VO Sparse           SVR VO Dense            PCNN
Seq    Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]
08     53.25      0.0694       20.00      0.0493       14.81      0.0395       7.41       0.0229
09     38.02      0.0593       13.87      0.0503       22.06      0.0372       11.80      0.0350
10     82.37      0.2021       19.15      0.0514       26.61      0.0621       19.87      0.0416
Avg    54.32      0.0856       18.66      0.0519       17.96      0.0433       9.82       0.0286

Trajectories for baseline and proposed methods on sequences blurred with radius 10 pixels.

Average errors across length and speed on the radius 10 blurred sequences.

Example of the radius 10 blurred images for sequences 08, 09 and 10.

Code

The following links provide the tar.gz archives containing the code we used, the HDF5 datasets with the optical flow images (for the unmodified KITTI images) and the network weights (a sketch of how to inspect the HDF5 files follows the references). If you find the code useful for your research, please cite the two works related to it as:

@ARTICLE{Costante2016,
author={G. Costante and M. Mancini and P. Valigi and T. A. Ciarfuglia},
journal={IEEE Robotics and Automation Letters},
title={Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation},
year={2016},
volume={1},
number={1},
pages={18-25},
doi={10.1109/LRA.2015.2505717},
month={Jan}
}

@ARTICLE{Ciarfuglia2014,
author={Thomas A. Ciarfuglia and Gabriele Costante and Paolo Valigi and Elisa Ricci},
title={Evaluation of non-geometric methods for visual odometry},
journal={Robotics and Autonomous Systems},
volume={62},
number={12},
pages={1717-1730},
year={2014},
issn={0921-8890},
doi={10.1016/j.robot.2014.08.001}
}
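
As a convenience, here is a minimal sketch of inspecting one of the HDF5 optical-flow archives with h5py; the file name and dataset key below are placeholders, so list the real keys first:

import h5py

# Placeholder file name; use the archive downloaded from the links above.
with h5py.File("kitti_flow.h5", "r") as f:
    f.visit(print)          # list every group/dataset stored in the file
    # The key "data" is a placeholder -- check the listing above for real names.
    # flow = f["data"][0]   # first optical-flow image as a NumPy array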