Computer Vision and ROS

Basheer Subei edited this page Apr 20, 2015 · 12 revisions

ROS/Images/OpenCV

Useful links

  • image_pipeline is the stack for dealing with ROS Image messages. The packages below are part of image_pipeline:

    1. image_view
      • Package used to view ROS Image messages.
    2. image_proc
      • Package used to process images (often taking input from a camera driver or rosbag) and undistort them. See stereo_image_proc for the stereo equivalent.
    3. camera_calibration
      • Package used to calibrate the camera (to get intrinsic parameters).
      • After calibrating the camera, the camera node (i.e. usb_cam) will take care of publishing the calibration information for that camera on the /camera_info topic along with the raw image topic.
    • Intrinsic camera parameters are those that are independent of the camera's position and orientation: essentially the focal length, principal point, and distortion coefficients.
    • Extrinsic camera parameters are those that describe the "coordinate system transformations from 3D world coordinates to 3D camera coordinates." In other words, where the camera sits in the world and how it is oriented.
  • image_transport is the hidden layer underneath nodes that use ROS Image messages. Essentially, it takes care of publishing all the extra topics for the different compression formats (e.g. a compressed topic alongside the raw one), transparently for the user.

    • ROS Image messages can be in RAW format (not encoded or compressed), or they can be compressed (mostly in JPEG format), in which case they are CompressedImage ROS message types. image_transport takes care of all this compression magic transparently for nodes that use ROS Image messages (such as image_view or image_proc).
  • Notes on Images in ROS

  • Besides the image_resizer we have, you can also compress/decompress images in rosbags by playing them back and recording them at the same time using a different transport (compressed or raw). Also, check out this image_compressor node (this one uses PNG compression).
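As a sketch of the play-and-record trick above (all topic and bag names here are hypothetical; substitute your own):

```shell
# Play the original bag in one terminal:
rosbag play input.bag

# In another terminal, republish the raw topic as compressed
# (image_transport's republish node does the conversion):
rosrun image_transport republish raw in:=/camera/image_raw compressed out:=/camera/image_raw

# Record only the compressed topic into a new, smaller bag:
rosbag record -O compressed.bag /camera/image_raw/compressed
```

The same idea in reverse (republish `compressed` back to `raw`) decompresses a bag.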

Going from Image coordinates (pixels) to World coordinates (x,y,z relative to camera)

There are two approaches I can think of:

  1. This answer here. The main idea is that, given an intrinsic camera matrix (obtained from calibration), we can multiply the inverse intrinsic matrix by the homogeneous pixel coordinates to get a normalized (x, y, z) direction (think of this as the 3D ray coming out of the camera center). We then intersect that ray with the ground plane (at a fixed, known height) and solve for the actual x and y coordinates.

  2. Using the image_geometry library function projectPixelTo3dRay(uv), which gives us that 3D ray directly. We then intersect it with our ground plane.
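A minimal sketch of the first approach, assuming a made-up intrinsic matrix and a camera mounted 1 m above the ground with its optical y axis pointing down (the usual optical-frame convention):

```python
import numpy as np

# Hypothetical intrinsic matrix (fx, fy, cx, cy would come from camera_calibration).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pixel_to_ground(u, v, camera_height):
    """Back-project pixel (u, v) into a 3D ray and intersect it with a
    ground plane a fixed, known distance below the camera."""
    # Multiply the inverse intrinsic matrix by the homogeneous pixel coords
    # to get the ray direction out of the camera center:
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Scale the ray so its y component (pointing down) reaches the ground:
    t = camera_height / ray[1]
    return t * ray  # (x, y, z) in camera coordinates, in metres

point = pixel_to_ground(400.0, 300.0, camera_height=1.0)
```

The second approach is the same math, except projectPixelTo3dRay(uv) computes the ray for us from the calibrated camera model.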

Cameras and Lenses

General Overview

ignore this section for now

First Approach

  • Stereo camera driver node publishes raw image topics.
  • stereo_image_proc subscribes to these images and publishes a pointcloud of everything.
  • Some node listens to pointcloud and separates ground points and above-ground points.
  • The ground points are thrown into line_detection, which has to figure out which pixels these points came from (reconstructing an image of the ground) and then find the lines in that ground image. Once the lines are found in the image, we pick the points that correspond to these line pixels and publish those points into the costmap.
  • The above-ground points are thrown into the costmap as barrels and obstacles.

(diagram of first method)
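The ground/above-ground split above can be sketched as a simple height threshold on the pointcloud (the array, camera height, and tolerance below are made up for illustration; a real node would subscribe to the PointCloud2 topic):

```python
import numpy as np

# Hypothetical (N, 3) pointcloud of (x, y, z) in the camera frame,
# with y pointing down (so ground points sit near y = camera height).
points = np.array([[1.0, 0.00, 2.0],    # above-ground point
                   [0.5, 0.95, 3.0],    # near the ground plane
                   [2.0, 1.02, 4.0]])   # near the ground plane
GROUND_Y = 1.0    # assumed camera height above the ground, metres
TOLERANCE = 0.1   # assumed band around the ground plane

ground_mask = np.abs(points[:, 1] - GROUND_Y) < TOLERANCE
ground_points = points[ground_mask]      # would go to line_detection
obstacle_points = points[~ground_mask]   # would go to the costmap
```

A real separator would also account for the camera's tilt (e.g. via a tf transform) rather than assuming a perfectly level ground plane.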

Second Approach

  • Stereo camera driver node publishes raw image topics.
  • line_detection subscribes to the raw image and publishes an image containing only the lines. stereo_image_proc1 then subscribes to the line images and publishes the line pointcloud to the costmap (line layer).
  • In parallel, stereo_image_proc2 also subscribes to the raw images and publishes a pointcloud of everything. A ground_chopper node subscribes to this pointcloud and chops off the ground (publishing the above-ground pointcloud to the costmap obstacle layer).

(diagram of second method)
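A toy sketch of what line_detection's output looks like in the second approach: an image where everything except the line pixels is blacked out, so the downstream stereo node only triangulates line points. (A real node would use something like OpenCV's Hough transform; here we just threshold a made-up image.)

```python
import numpy as np

# Made-up grayscale image: one bright horizontal "line" on a dark ground.
image = np.zeros((4, 4), dtype=np.uint8)
image[1, :] = 255

# line_detection sketch: keep only bright pixels, zero out the rest,
# and republish this as the "image with only lines".
line_image = np.where(image > 200, image, 0).astype(np.uint8)
```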