
Computer Vision and ROS

Basheer Subei edited this page Apr 18, 2015 · 12 revisions

ROS/Images/OpenCV

Useful links

  • image_pipeline is the stack for dealing with ROS Image messages. The packages below are part of image_pipeline:

    1. image_view
      • Package used to view ROS Image messages.
    2. image_proc
      • Package used to process raw images (usually coming from a camera driver or a rosbag) and undistort them. See stereo_image_proc for the stereo equivalent.
    3. camera_calibration
      • Package used to calibrate the camera (to get intrinsic parameters).
      • After calibrating the camera, the camera node (e.g. usb_cam) will take care of publishing the calibration information for that camera on the /camera_info topic along with the raw image topic.
    • Intrinsic camera parameters are those that are independent of the camera position or orientation. They are essentially the focal length, the principal point, and the distortion coefficients.
    • Extrinsic camera parameters are those that describe the "coordinate system transformations from 3D world coordinates to 3D camera coordinates." In other words, they describe where the camera is and how it is oriented in the world (a rotation and a translation). Both sets of parameters are needed to relate pixels in an image to points in 3D space.
  • image_transport is the hidden layer underneath nodes that use ROS Image messages. Essentially, it takes care of publishing all the extra topics for the different compression formats (these are ordinary ROS topics under the base image topic, but nodes go through image_transport's own publishers and subscribers instead of the plain ROS ones, so it is mostly transparent to the user).

    • ROS Image messages can be in raw format (not encoded or compressed), or they can be compressed (mostly as JPEG), in which case they are CompressedImage messages instead. To hide all this compression magic, image_transport is used as a layer underneath nodes that use ROS Image messages (such as image_view or image_proc).
  • Notes on Images in ROS

  • Besides the image_resizer we have, you can also compress/decompress images in rosbags by playing them back and re-recording them at the same time with a different transport (compressed or raw). Also, check out this image_compressor node (this one uses PNG compression).

Going from Image coordinates (pixels) to World coordinates (x,y,z relative to camera)

There are two approaches I can think of:

  1. This answer here. The main idea is that given an intrinsic camera matrix (obtained from calibration), we can multiply the inverse of the intrinsic matrix by the homogeneous pixel coordinates (u, v, 1) to get a normalized x,y,z direction (think of this as the 3D ray coming out from the center of the camera), which we then intersect with the ground plane (at a fixed, known height). We can then solve for the actual x and y coordinates.

  2. Using the projectPixelTo3dRay(uv) function from the image_geometry library, which will give us that 3D ray. Then we intersect it with our ground plane.
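The first approach can be sketched in a few lines of plain Python. This is only a rough sketch under several assumptions that are not in the notes above: the pixel comes from an already undistorted image, the intrinsic matrix has no skew (so its inverse can be written out by hand), the camera's optical axis is parallel to the ground, and the camera uses the optical-frame convention (x right, y down, z forward). The function names and the intrinsic values are made up for illustration.

```python
def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Return K^-1 * (u, v, 1) for a pinhole camera with zero skew.

    The result is a direction (x, y, 1) in the camera optical frame.
    """
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def intersect_ground(ray, camera_height):
    """Scale the ray until it reaches the ground plane y = camera_height.

    Assumes x right, y down, z forward, with the optical axis parallel
    to a flat ground. Returns None if the ray never reaches the ground.
    """
    x, y, z = ray
    if y <= 0.0:
        return None  # pixel is at or above the horizon
    scale = camera_height / y
    return (x * scale, y * scale, z * scale)


# Hypothetical intrinsics for a 640x480 camera mounted 1 m above the ground.
ray = pixel_to_ray(320, 340, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
point = intersect_ground(ray, camera_height=1.0)
print(point)  # (x, y, z) in meters, relative to the camera
```

A tilted camera would need the ray rotated into a ground-aligned frame before the intersection step. The second approach replaces only pixel_to_ray; the ground-plane intersection stays the same.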

Cameras and Lenses

  • List of companies:

  • Edmund Optics Imaging Resource Guide

  • Read all this about lenses and here also.

  • What to look out for:

    • The lens focal length and the camera image sensor size determine the field of view you get.
    • Also watch out for aperture size and vignetting.
  • How to calculate Angle of view:

    • First find the image sensor dimensions (check link below).

    • Then find the lens focal length.

    • Then plug them into this equation:

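      angle of view: $\alpha = 2 \arctan\left(\frac{d}{2f}\right)$, where $d$ is the sensor dimension (width, height, or diagonal) and $f$ is the lens focal length, in the same units as $d$.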

  • Don't worry too much about lens mount types because there are adapters out there. Usually C-mount or CS-mount is what we'll need. (A CS-mount camera can take a C-mount lens with a spacer, but not vice versa.)

  • List of image sensor dimensions.

  • Possibly use DSLR cameras: use Python scripts to remote control cameras.

  • Global vs Rolling shutter

  • CCD vs CMOS sensor types

  • Iris features (basically a controllable aperture; auto-iris works best for changing lighting conditions but might suffer from diffraction and lens flare artifacts under bright light)
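The angle-of-view calculation described in the list above can be tried out with a small script. This is a minimal sketch using the standard rectilinear-lens relation, angle = 2 * atan(d / (2 * f)); the function name and the example sensor and lens values are illustrative assumptions, not a recommendation for a specific camera.

```python
import math


def angle_of_view_deg(sensor_dim_mm, focal_length_mm):
    """Angle of view in degrees along one sensor dimension.

    sensor_dim_mm can be the sensor width, height, or diagonal,
    giving the horizontal, vertical, or diagonal angle of view.
    """
    return math.degrees(2.0 * math.atan(sensor_dim_mm / (2.0 * focal_length_mm)))


# Illustrative values: a 4.8 mm x 3.6 mm sensor behind a 4 mm lens.
horizontal = angle_of_view_deg(4.8, 4.0)
vertical = angle_of_view_deg(3.6, 4.0)
print(round(horizontal, 1), round(vertical, 1))
```

Plugging in a shorter focal length or a larger sensor gives a wider angle, which matches the first point under "What to look out for".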

General Overview

ignore this section for now

First Approach

  • Stereo camera driver node publishes raw image topics.
  • stereo_image_proc subscribes to these images and publishes a pointcloud of everything.
  • Some node listens to pointcloud and separates ground points and above-ground points.
  • The ground points are thrown into line_detection, which has to figure out which pixels these points came from (i.e. reconstruct an image of the ground) and then find the lines in that ground image. Once the lines are found in the image, we pick the points that correspond to these line pixels and publish those points into the costmap.
  • The above-ground points are thrown into costmap as barrels and obstacles.

[Diagram of first method]
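The ground/above-ground separation step could look roughly like the sketch below. It assumes the points have already been transformed into a frame where z points up and the ground is flat at a known height, and it uses a plain list of (x, y, z) tuples rather than a real pointcloud message. The function name, the threshold, and the sample points are all hypothetical.

```python
def split_ground(points, ground_z=0.0, tolerance=0.05):
    """Split (x, y, z) points into ground and above-ground lists.

    A point counts as ground if it is no higher than ground_z + tolerance
    (meters). Assumes z is up and the ground is flat.
    """
    ground, above = [], []
    for point in points:
        if point[2] <= ground_z + tolerance:
            ground.append(point)
        else:
            above.append(point)
    return ground, above


# Made-up sample cloud: two points near the ground, two above it.
cloud = [(1.0, 0.0, 0.01), (2.0, 0.0, 0.5), (3.0, 1.0, -0.02), (4.0, 0.0, 0.06)]
ground_points, obstacle_points = split_ground(cloud)
print(len(ground_points), len(obstacle_points))
```

A fixed height threshold breaks down on slopes or with a badly calibrated camera pose; fitting a plane to the cloud would be a more robust alternative.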

Second Approach

  • Stereo camera driver node publishes raw image topics.
  • line_detection subscribes to the raw images and publishes images containing only the lines. stereo_image_proc1 then subscribes to the line images and publishes a line pointcloud to costmap (line layer).
  • In parallel, stereo_image_proc2 also subscribes to the raw images and publishes a pointcloud of everything. A ground_chopper node subscribes to this pointcloud and chops off the ground (publishes the above-ground pointcloud to the costmap obstacle layer).

[Diagram of second method]