Extra Material

This page contains the experiments and some source code for the approach described in the work Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation.

Darkened and Lightened Sequences

This section shows the results on the three KITTI test sequences transformed in contrast and gamma. The three sets of parameters used for these transforms are listed below (a code sketch of the transform follows the list):

  1. min contrast 0.0, max contrast 0.4, gamma 1.5, a.k.a. darkened
  2. min contrast 0.0, max contrast 0.6, gamma 5, a.k.a. darkened_2
  3. min contrast 0.2, max contrast 0.7, gamma 0.2, a.k.a. lightened
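
The exact preprocessing code is included in the archives linked at the bottom of the page. As a rough illustration, here is a minimal sketch of a contrast rescale followed by gamma correction; the function name and the normalization to [0, 1] are our assumptions, not the original implementation:

import numpy as np

def contrast_gamma(image, min_c, max_c, gamma):
    # Assumes a grayscale image with values in [0, 1]; the parameters
    # mirror the (min contrast, max contrast, gamma) triples listed above.
    img = image.astype(np.float32)
    # Compress the dynamic range into [min_c, max_c].
    img = min_c + img * (max_c - min_c)
    # Gamma > 1 darkens mid-tones, gamma < 1 lightens them.
    img = np.power(img, gamma)
    return np.clip(img, 0.0, 1.0)

# e.g. the "darkened" setting: min contrast 0.0, max contrast 0.4, gamma 1.5
# dark = contrast_gamma(frame / 255.0, 0.0, 0.4, 1.5)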

Darkened

Darkened sequences simulate dusk conditions. Lowering the contrast makes it more difficult for feature extractors to find corners, although at these contrast and gamma values crisp shadows are still recognizable. Compared to the standard sequences, the average translational error on the darkened sequences is 1.39% higher for PCNN, 6.53% higher for SVR-S and 4.99% higher for VISO2. Looking at the single trajectories, the drop in performance is larger on sequences 09 and 10. We suppose this is due to their greater scene depth and higher linear speeds, which make these sequences more challenging.

       VISO2-M                 SVR VO Sparse           SVR VO Dense            PCNN
Seq    Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]
08     26.33      0.0389       41.81      0.1114       18.06      0.0490       8.45       0.0249
09     13.64      0.0357       19.88      0.0669       34.30      0.0550       11.03      0.0338
10     22.74      0.0352       29.28      0.0670       25.49      0.0646       20.03      0.0458
Avg    23.54      0.0387       35.04      0.1005       20.34      0.0545       10.28      0.0300

Trajectories for baseline and proposed methods on sequences darkened with 0.4 maximum contrast and a gamma value of 1.5.

Average errors across length and speed on the darkened sequences.

Example of the darkened images for sequences 08, 09 and 10.
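
The translational and rotational errors in these tables, and the "average errors across length and speed" plots, follow the standard KITTI odometry protocol: relative-pose errors are computed over subsequences of fixed lengths (100 m to 800 m) and then averaged. A minimal sketch of the translational part, assuming lists of 4x4 camera-to-world pose matrices (the helper names are ours; the rotational error is computed analogously from the angle of the residual rotation):

import numpy as np

LENGTHS = [100, 200, 300, 400, 500, 600, 700, 800]  # subsequence lengths in metres

def trajectory_distances(poses):
    # Cumulative distance travelled up to each frame.
    dist = [0.0]
    for i in range(1, len(poses)):
        dist.append(dist[-1] + np.linalg.norm(poses[i][:3, 3] - poses[i - 1][:3, 3]))
    return dist

def frame_at_distance(dist, first, length):
    # Index of the first frame at least `length` metres past frame `first`.
    for i in range(first, len(dist)):
        if dist[i] > dist[first] + length:
            return i
    return -1

def avg_translational_error(gt, est, step=10):
    dist, errors = trajectory_distances(gt), []
    for first in range(0, len(gt), step):
        for length in LENGTHS:
            last = frame_at_distance(dist, first, length)
            if last < 0:
                continue
            gt_rel = np.linalg.inv(gt[first]) @ gt[last]      # true relative motion
            est_rel = np.linalg.inv(est[first]) @ est[last]   # estimated relative motion
            err = np.linalg.inv(est_rel) @ gt_rel             # residual transform
            errors.append(np.linalg.norm(err[:3, 3]) / length)
    return 100.0 * np.mean(errors)  # per cent, as in the tables above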

Darkened 2

Darkened 2 sequences simulate night conditions. At these levels of contrast and gamma the shadows in the images are very dark and many small details are lost. These transforms are clearly only an approximation of real low-light vision, but they give an insight into what the estimation algorithms do under a comparable loss of detail. On these sequences we see a stark difference between PCNN and the other methods. SVR-S has the lowest performance, probably because of its very simple Lucas-Kanade sparse feature extraction (a sketch of such a front end follows this paragraph). SVR-D and VISO2, however, have a translational error that is nearly double that of PCNN, and a rotational error that is between 30% and 40% higher.
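
For context, a sparse Lucas-Kanade front end of this kind can be sketched with OpenCV as follows. This illustrates the general pipeline, not the exact SVR-S feature extractor, and the parameter values are placeholders:

import cv2

def sparse_flow(prev_gray, curr_gray):
    # Detect Shi-Tomasi corners; low-contrast (darkened) frames yield few of them.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return None, None  # no corners found, the failure mode discussed above
    # Track the corners into the next frame with pyramidal Lucas-Kanade.
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good = status.ravel() == 1
    return pts[good], nxt[good]  # matched pairs; their differences are the flow vectors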

       VISO2-M                 SVR VO Sparse           SVR VO Dense            PCNN
Seq    Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]
08     37.82      0.0493       52.96      0.1479       30.18      0.0784       14.53      0.0366
09     30.18      0.0537       26.26      0.0842       23.66      0.0773       15.82      0.0458
10     25.97      0.1305       38.88      0.1024       24.36      0.0546       18.53      0.0464
Avg    35.28      0.0610       44.61      0.1340       28.10      0.0792       15.25      0.0413

Trajectories for baseline and proposed methods on sequences darkened with 0.6 maximum contrast and a gamma value of 5.

Average errors across length and speed on the darkened 2 sequences.

Example of the darkened 2 images for sequences 08, 09 and 10.

Lightened

Lightened sequences simulate strong-light conditions through a low gamma correction value. These images also have very low contrast (minimum 0.2, maximum 0.7), so they are particularly challenging. The largest performance issue is for VISO2 on sequence 10, where it fails to extract enough features in many frames and the error grows very large. As in the preceding examples, PCNN behaves better than the SVR variants and VISO2.

       VISO2-M                 SVR VO Sparse           SVR VO Dense            PCNN
Seq    Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]
08     32.24      0.0378       46.51      0.1072       19.90      0.0595       10.16      0.0294
09     18.71      0.0268       22.85      0.0752       24.36      0.0491       20.08      0.0391
10     91.36      0.0541       43.73      0.1294       22.47      0.0734       21.02      0.0460
Avg    36.83      0.0380       40.45      0.1059       21.31      0.0617       13.51      0.0343

Trajectories for baseline and proposed methods on sequences lightened with 0.2 minimum contrast, 0.7 maximum contrast and a gamma value of 0.2.

Average errors across length and speed on the lightened sequences.

Example of the lightened images for sequences 08, 09 and 10.

Blurred Sequences

This section shows the results on the three KITTI test sequences with blur applied. The two blur radii used are 3 and 10 pixels (a sketch of the transform follows).
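
As an illustration, a blurred frame can be produced along these lines. This is a minimal sketch assuming a Gaussian blur whose radius parameter matches the pixel radii above; the exact kernel used for the sequences may differ:

from PIL import Image, ImageFilter

def blur_frame(path, radius):
    # We assume the page's blur "radius" in pixels maps to the Gaussian
    # radius as in PIL's ImageFilter.GaussianBlur.
    img = Image.open(path)
    return img.filter(ImageFilter.GaussianBlur(radius=radius))

# blurred_3 = blur_frame("000000.png", 3)
# blurred_10 = blur_frame("000000.png", 10)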

Blurred - radius 3 pixels

These sequences are blurred with a small radius of 3 pixels. The most striking result is that this blur is slightly beneficial to PCNN's translational error, while it is not for the other methods. In detail, PCNN's error decreases by 0.33% with respect to the standard sequences, while VISO2 and SVR-S show increases of +4.99% and +8.07%. The average rotational errors change by +11%, +106% and +2.92% for PCNN, SVR-S and VISO2 respectively, showing that sparse SVR under-performs the most on these sequences.

       VISO2-M                 SVR VO Sparse           SVR VO Dense            PCNN
Seq    Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]
08     23.70      0.0431       25.23      0.0674       14.41      0.0386       7.41       0.0229
09     10.99      0.0317       12.43      0.0438       21.99      0.0332       6.74       0.0253
10     25.83      0.0454       20.09      0.0463       26.74      0.0590       19.35      0.0380
Avg    23.54      0.0387       21.88      0.0623       17.52      0.0414       8.63       0.0262

Trajectories for baseline and proposed methods on sequences blurred with radius 3 pixels.

Average errors across length and speed on the radius 3 blurred sequences.

Example of the radius 3 blurred images for sequences 08, 09 and 10.

Blurred - radius 10 pixels

These sequences are blurred with a radius of 10 pixels. The results show only a slight increase in error for the dense methods, and again PCNN performs best, showing that the features it learns are robust to high levels of blur. SVR-S and SVR-D are very similar in performance, and they do better than in the radius 3 case. This suggests that something in the SVR approach helps to reduce the effects of heavy blur. The errors of VISO2 on highly blurred images are more than double.

       VISO2-M                 SVR VO Sparse           SVR VO Dense            PCNN
Seq    Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]  Trans [%]  Rot [deg/m]
08     53.25      0.0694       20.00      0.0493       14.81      0.0395       7.41       0.0229
09     38.02      0.0593       13.87      0.0503       22.06      0.0372       11.80      0.0350
10     82.37      0.2021       19.15      0.0514       26.61      0.0621       19.87      0.0416
Avg    54.32      0.0856       18.66      0.0519       17.96      0.0433       9.82       0.0286

Trajectories for baseline and proposed methods on sequences blurred with radius 10 pixels.

Average errors across length and speed on the radius 10 blurred sequences.

Example of the radius 10 blurred images for sequences 08, 09 and 10.

Code

The following links provide the tar.gz archives containing the code we used, the HDF5 datasets with the optical flow images (for the unmodified KITTI images) and the network weights (a sketch of how to inspect the HDF5 files follows the references). If you find the code useful for your research, please cite the two works related to it as:

@ARTICLE{Costante2016,
author={G. Costante and M. Mancini and P. Valigi and T. A. Ciarfuglia},
journal={IEEE Robotics and Automation Letters},
title={Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation},
year={2016},
volume={1},
number={1},
pages={18-25},
doi={10.1109/LRA.2015.2505717},
month={Jan}
}

@ARTICLE{Ciarfuglia2014,
author={Thomas A. Ciarfuglia and Gabriele Costante and Paolo Valigi and Elisa Ricci},
title={Evaluation of non-geometric methods for visual odometry},
journal={Robotics and Autonomous Systems},
volume={62},
number={12},
pages={1717-1730},
year={2014},
issn={0921-8890},
doi={10.1016/j.robot.2014.08.001}
}
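
As a convenience, here is a minimal sketch of inspecting one of the HDF5 optical-flow archives with h5py; the file name and dataset key below are placeholders, so list the real keys first:

import h5py

# Placeholder file name; use the archive downloaded from the links above.
with h5py.File("kitti_flow.h5", "r") as f:
    f.visit(print)          # list every group/dataset stored in the file
    # The key "data" is a placeholder -- check the listing above for real names.
    # flow = f["data"][0]   # first optical-flow image as a NumPy array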