Adapting CNNs for Fisheye Cameras without Retraining
Most image processing approaches assume images are in, or can be rectified to, a perspective projection.
However, many applications benefit from non-conventional cameras, such as fisheye cameras, that offer a larger field of view (FOV).
The challenge is that these large-FOV images cannot be rectified to a perspective projection without significantly cropping the original image.
To address this we propose Rectified Convolutions (RectConv): a new approach for adapting pre-trained convolutional networks to operate on non-perspective images, without any retraining.
- We demonstrate RectConv adapting multiple pre-trained networks to perform segmentation and detection on fisheye imagery from two publicly available datasets.
- RectConv layers replace the convolutional layers of the network and allow it to see both rectified patches and the entire FOV (see the sketch after this list).
- Our approach requires no additional data or training, and operates directly on the native image as captured from the camera.
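As a rough illustration of the adaptation step, the sketch below swaps a pre-trained network's `Conv2d` layers for a stand-in `RectConv2d` wrapper that reuses the original weights. The `RectConv2d` class and its placeholder forward pass are our own illustration under assumed behaviour, not the paper's implementation; torchvision's default FCN-ResNet101 weights are used as a stand-in model.

```python
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class RectConv2d(nn.Module):
    """Hypothetical stand-in for a RectConv layer: it reuses the
    pre-trained kernel weights unchanged (no retraining)."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.weight = conv.weight          # shared pre-trained weights
        self.bias = conv.bias
        self.stride = conv.stride
        self.padding = conv.padding
        self.dilation = conv.dilation
        self.groups = conv.groups

    def forward(self, x):
        # Placeholder: a faithful implementation would first gather the
        # input at camera-aware sampling locations (see the kernel-
        # placement sketch further below) before applying the kernel.
        return F.conv2d(x, self.weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

def swap_convs(module: nn.Module):
    """Recursively replace every Conv2d with a RectConv2d wrapper."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d):
            setattr(module, name, RectConv2d(child))
        else:
            swap_convs(child)

model = torchvision.models.segmentation.fcn_resnet101(weights="DEFAULT")
swap_convs(model)
model.eval()  # adapted model, ready to run on native fisheye images
```

Because only the sampling pattern changes and the weights are shared, the adapted model needs no additional data or training.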
We believe this work is a step toward adapting the vast resources available for perspective images to operate across a broad range of camera geometries.
Publications
• R. Griffiths and D. G. Dansereau, “Adapting CNNs for Fisheye Cameras without Retraining,” arXiv preprint arXiv:2404.08187, 2024. Preprint: https://arxiv.org/abs/2404.08187
Citing
If you find this work useful, please cite:
@article{griffiths2024adapting,
  title   = {Adapting CNNs for Fisheye Cameras without Retraining},
  author  = {Ryan Griffiths and Donald G. Dansereau},
  journal = {arXiv preprint arXiv:2404.08187},
  url     = {https://arxiv.org/abs/2404.08187},
  year    = {2024},
  month   = apr
}
This work was carried out within the Robotic Imaging Group at the Australian Centre for Robotics, University of Sydney.
Gallery
Example of perspective and cylindrical camera projections applied to a wide-FOV fisheye image. Regions in red show areas that are excluded from the rectified projection. Decreasing the focal length reduces cropping but increases distortion.
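For concreteness, here is a minimal sketch of that cropping/distortion trade-off using OpenCV's fisheye camera model; the calibration values `K` and `D`, the input filename, and the focal lengths are placeholders, not values from the paper.

```python
import cv2
import numpy as np

# Placeholder fisheye calibration (K: intrinsics, D: distortion
# coefficients) -- substitute values for your own camera.
K = np.array([[280.0,   0.0, 320.0],
              [  0.0, 280.0, 240.0],
              [  0.0,   0.0,   1.0]])
D = np.array([[0.1], [0.01], [0.0], [0.0]])

img = cv2.imread("fisheye.png")  # placeholder: a native fisheye capture
h, w = img.shape[:2]

# The focal length f of the target camera matrix P controls the
# trade-off shown in the figure: smaller f keeps more of the FOV in
# frame but stretches the periphery.
for f in (280.0, 140.0):
    P = np.array([[  f, 0.0, w / 2.0],
                  [0.0,   f, h / 2.0],
                  [0.0, 0.0,     1.0]])
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(
        K, D, np.eye(3), P, (w, h), cv2.CV_16SC2)
    rectified = cv2.remap(img, map1, map2, interpolation=cv2.INTER_LINEAR)
    cv2.imwrite(f"perspective_f{int(f)}.png", rectified)
```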
For a given patch, each pixel is projected into 3D space, where a regular planar grid is sampled. This 3D grid is then projected back to image locations, which define the kernel locations for that position.
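The following is a minimal sketch of that kernel-placement idea, assuming generic `unproject`/`project` callbacks from a calibrated camera model; it is our illustration of the described mechanism, not the authors' code.

```python
import numpy as np

def rectconv_kernel_locations(u, v, k, spacing, unproject, project):
    """Compute k x k kernel sampling locations for output position (u, v).

    unproject(u, v) -> unit 3D ray through pixel (u, v)   [assumed callback]
    project(xyz)    -> (u, v) image location of 3D point  [assumed callback]
    """
    # Ray through the patch centre; the sampling plane is normal to it.
    r = unproject(u, v)

    # Orthonormal basis (a, b) spanning that plane (assumes r is not
    # parallel to the chosen 'up' vector).
    up = np.array([0.0, 1.0, 0.0])
    a = np.cross(up, r)
    a /= np.linalg.norm(a)
    b = np.cross(r, a)

    # Regular planar grid in 3D, centred on the ray.
    offsets = (np.arange(k) - (k - 1) / 2) * spacing
    gx, gy = np.meshgrid(offsets, offsets)
    grid = r[None, None, :] + gx[..., None] * a + gy[..., None] * b

    # Project the 3D grid back into the image: these are the kernel
    # locations for this position.
    uv = [project(p) for p in grid.reshape(-1, 3)]
    return np.asarray(uv).reshape(k, k, 2)
```

In practice such locations would be precomputed once per camera geometry and used to gather inputs (e.g. via bilinear sampling) before applying the unchanged pre-trained kernel weights.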
Comparison of segmentation using an FCN-ResNet101 pre-trained on Cityscapes. The unmodified pre-trained network performs poorly; pre-rectification also performs poorly and suffers from dead zones that cannot be included in the rectification; the proposed RectConv performs strongest while covering the entire image.
Detection and segmentation results for people on the PIROPO dataset using pre-trained segmentation (FCN-ResNet101) and object detection (FCOS-ResNet50) networks. RectConv significantly improves both the segmentation results and the bounding-box detections.
See paper for additional results.