A Domain Independent Approach for Learning-Based Monocular Depth Estimation

#####REQUIREMENTS#####

Caffe, compiled with MATLAB support (refer to http://caffe.berkeleyvision.org/)

#####MODELS WIKI#####

Three models are released: MIX_FCN, MIX_LSTM and MIX_EIGEN, as referred to in the paper.

The models have been trained to estimate depth up to 39.75 meters, with ground truth labels normalized between 0 and 1 (1 = 39.75 meters). The deploy files handle the un-normalization, so all the networks return an un-normalized, metric estimation of the depth.

MIX_FCN is a fully convolutional network that can handle any input resolution and returns a non-downsampled depth estimation, provided no odd-number divisions occur in the network's pooling layers. To test on a custom resolution, change the sizes of the input (at the beginning of the file) and of the unnormalizer layer (at the end of the file) in mix_fcn.prototxt accordingly.

MIX_EIGEN and MIX_LSTM are not fully convolutional, as the size of their weight vectors depends on the input resolution. For this reason, the provided models can only handle 256x160 inputs. MIX_LSTM will return a 256x160 depth map, while MIX_EIGEN will return a 64x40 one.

#####BENCHMARKS WIKI#####

The benchmark archives contain:
-HDF5 files containing the test data
-a MATLAB evaluation script (eval_[name_benchmark]_test_set.m)

You should edit the scripts to set the correct paths for your Caffe installation (it should be [dir_to_caffe]/caffe/matlab), the network models and the HDF5 files.

#####KITTI BENCHMARK WIKI#####

Data is stored in a single HDF5 file: kitti_data/kitti_test_set.h5

Input and ground truth resolution: 256x78 pixels.

MIX_FCN will return a 256x80 depth prediction due to odd-number divisions in the pooling layers. Predictions will then be cropped to fit the given GT.

MIX_LSTM and MIX_EIGEN will be fed a padded, 256x160 version of the input. MIX_LSTM will return a 256x160 depth prediction.
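As a side note on the 256x78 -> 256x80 size change just described, the output size of an encoder-decoder like MIX_FCN can be predicted by replaying the pooling divisions. This is only a hedged sketch: the halving-with-round-up behavior matches Caffe's pooling, but the exact number of pooling stages in MIX_FCN is an assumption here (four is used for illustration).

```python
import math

def fcn_output_size(h, w, num_pools=4):
    """Predict the output size of a fully convolutional encoder-decoder.

    Each pooling layer halves the spatial size, rounding up on odd sizes
    (as Caffe pooling does); the decoder then doubles the size back the
    same number of times. If any rounding occurred, the output ends up
    slightly larger than the input.
    """
    out = []
    for dim in (h, w):
        d = dim
        for _ in range(num_pools):
            d = math.ceil(d / 2)  # odd sizes round up here
        out.append(d * 2 ** num_pools)
    return tuple(out)

# KITTI inputs are 256x78 (width x height): the height hits an odd
# division, so the network returns a 256x80 prediction, which is then
# cropped back to 256x78 to match the ground truth.
print(fcn_output_size(78, 256))  # -> (80, 256)
```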
A 256x80 crop will be extracted from the MIX_LSTM prediction, corresponding to the unpadded part of the input, and the same processing described for MIX_FCN will then be performed to fit the GT size. MIX_EIGEN will return a 64x40 depth prediction; a 64x20 crop will be extracted and processed as described above.

Returned errors are relative only to a bottom-half crop of the 256x78 ground truth, corresponding to the image area where valid laser measurements are present. Refer to the KITTI website (http://www.cvlibs.net/datasets/kitti) and the Experiments section of our paper for further details.

#####ZURICH BENCHMARK WIKI#####

Data is split into 4 sequences, saved in 4 HDF5 files: zurich_data/seq_[].h5

Once the relative path to the HDF5 files is properly set, the script will automatically test all the sequences and return an overall result.

Input and ground truth resolution: 256x160 pixels.

As this resolution matches the one required by MIX_LSTM and MIX_EIGEN, pre- and post-processing is straightforward.

#####DATASET WIKI#####

Depth GT is stored in an 8-bit PNG file, with values between 0 and 255. You can get metric depth by plain unnormalization: a pixel value of 255 corresponds to a metric depth of 39.75 meters, and a pixel value of 0 corresponds to 0 meters.
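The unnormalization above is a plain linear mapping. A minimal sketch (the function name and NumPy usage are illustrative, not part of the released code):

```python
import numpy as np

def png_to_metric_depth(png_values, max_depth=39.75):
    """Map 8-bit ground-truth pixel values (0-255) to metric depth.

    A value of 255 corresponds to max_depth meters (39.75 m by default),
    and a value of 0 corresponds to 0 meters.
    """
    return np.asarray(png_values, dtype=np.float64) / 255.0 * max_depth

# 0 -> 0.0 meters, 255 -> 39.75 meters
depths = png_to_metric_depth([0, 255])
```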