A Domain Independent Approach for Learning-Based Monocular Depth Estimation

#####REQUIREMENTS#####

Caffe, compiled with MATLAB support (refer to http://caffe.berkeleyvision.org/)

#####MODELS WIKI#####

Three models are released: MIX_FCN, MIX_LSTM and MIX_EIGEN, as referred to in the paper.

The models have been trained to estimate depth up to 39.75 meters, with ground truth labels normalized between 0 and 1 (1 = 39.75 meters). The deploy files handle the un-normalization, so all the networks return an un-normalized, metric estimation of the depth.

MIX_FCN is a fully convolutional network that can handle any input resolution and returns a non-downsampled depth estimation, provided no odd-number divisions occur in the network's pooling layers. To test on a custom resolution, change the sizes of the input (at the beginning of the file) and of the unnormalizer layer (at the end of the file) in mix_fcn.prototxt accordingly.

MIX_EIGEN and MIX_LSTM are not fully convolutional, as the size of their weight vectors depends on the input resolution. For this reason, the provided models can only handle 256x160 inputs. MIX_LSTM will return a 256x160 depth map, while MIX_EIGEN will return a 64x40 one.

#####BENCHMARKS WIKI#####

The benchmark archives contain:
-HDF5 files containing the test data
-a MATLAB evaluation script (eval_[name_benchmark]_test_set.m)

You should edit the scripts to set the correct paths for your Caffe installation (it should be [dir_to_caffe]/caffe/matlab), the network models and the HDF5 files.

#####KITTI BENCHMARK WIKI#####

Data is stored in a single HDF5 file: kitti_data/kitti_test_set.h5

Input and ground truth resolution: 256x78 pixels.

MIX_FCN will return a 256x80 depth prediction due to odd-number divisions in the pooling layers. Predictions will then be cropped to fit the given GT.

MIX_LSTM and MIX_EIGEN will be fed a padded, 256x160 version of the input. MIX_LSTM will return a 256x160 depth prediction.
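As a side note on the 256x78 -> 256x80 size change just described, the output size of an encoder-decoder like MIX_FCN can be predicted by replaying the pooling divisions. This is only a hedged sketch: the halving-with-round-up behavior matches Caffe's pooling, but the exact number of pooling stages in MIX_FCN is an assumption here (four is used for illustration).

```python
import math

def fcn_output_size(h, w, num_pools=4):
    """Predict the output size of a fully convolutional encoder-decoder.

    Each pooling layer halves the spatial size, rounding up on odd sizes
    (as Caffe pooling does); the decoder then doubles the size back the
    same number of times. If any rounding occurred, the output ends up
    slightly larger than the input.
    """
    out = []
    for dim in (h, w):
        d = dim
        for _ in range(num_pools):
            d = math.ceil(d / 2)  # odd sizes round up here
        out.append(d * 2 ** num_pools)
    return tuple(out)

# KITTI inputs are 256x78 (width x height): the height hits an odd
# division, so the network returns a 256x80 prediction, which is then
# cropped back to 256x78 to match the ground truth.
print(fcn_output_size(78, 256))  # -> (80, 256)
```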
A 256x80 crop will be extracted from the MIX_LSTM prediction, corresponding to the unpadded part of the input, and the same processing described for MIX_FCN will then be performed to fit the GT size. MIX_EIGEN will return a 64x40 depth prediction; a 64x20 crop will be extracted and processed as described above.

Returned errors are relative only to a bottom-half crop of the 256x78 ground truth, corresponding to the image area where valid laser measurements are present. Refer to the KITTI website (http://www.cvlibs.net/datasets/kitti) and the Experiments section of our paper for further details.

#####ZURICH BENCHMARK WIKI#####

Data is split into 4 sequences, saved in 4 HDF5 files: zurich_data/seq_[].h5

Once the relative path to the HDF5 files is properly set, the script will automatically test all the sequences and return an overall result.

Input and ground truth resolution: 256x160 pixels.

As this resolution matches the one required by MIX_LSTM and MIX_EIGEN, pre- and post-processing is straightforward.

#####DATASET WIKI#####

Depth GT is stored in an 8-bit PNG file, with values between 0 and 255. You can get metric depth by plain unnormalization: a pixel value of 255 corresponds to a metric depth of 39.75 meters, and a pixel value of 0 corresponds to 0 meters.
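The unnormalization above is a plain linear mapping. A minimal sketch (the function name and NumPy usage are illustrative, not part of the released code):

```python
import numpy as np

def png_to_metric_depth(png_values, max_depth=39.75):
    """Map 8-bit ground-truth pixel values (0-255) to metric depth.

    A value of 255 corresponds to max_depth meters (39.75 m by default),
    and a value of 0 corresponds to 0 meters.
    """
    return np.asarray(png_values, dtype=np.float64) / 255.0 * max_depth

# 0 -> 0.0 meters, 255 -> 39.75 meters
depths = png_to_metric_depth([0, 255])
```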