Vision Guided Control

COMPSCI 773 S1 T

Introduction

This course introduces computational methods and techniques used in vision-based real-time control. Many topics are only overviewed, but a number of interesting theoretical and practical problems are analysed in detail. Do not expect the exciting feats found in sci-fi books or movies such as "Terminator": you will soon find that even a seemingly simple automated real-world action can present a real challenge.

Design of modern control systems involves different mathematical tools, in particular optimisation techniques, matrix analysis, and analytic 2D/3D geometry. Some of these tools will be explained briefly in the lectures; still, you are expected to learn them in detail and use them to complete the assignments.

Assessment

Assessment is based on 60% course work (30% group work, 30% individual work) and 40% open-book final examination. Course work includes a one-to-one oral test and assignments that use the hardware (digital cameras and PCs) available in the CITR Robotics Lab at Tamaki (room 731.234). For each assignment, each group will have to write a report organised as follows:
  • Each member of the group works on a distinct part of the assignment and writes an individual report
  • Each group provides a group report presenting the group's solution and achievements for each assignment. Basically, the group report should consist of an introduction to the problem and the different solutions proposed.
  • Both the individual and group reports should show students' abilities to:
    • analyse a problem
    • propose feasible solutions based on materials taught during the lectures or learnt while reading research papers
    • use statistical tools to assess their experimental results

Course work

A particular feature of the course work is its emphasis on complete system design. Therefore, instead of picking a small part of the material covered in lectures as assignment tasks, the project in this paper aims to develop a complete system that performs a specified task. The individual assignments represent intermediate steps toward this goal. At the end of the paper, there will be a competition to evaluate your project.

The equipment in the CITR Active Vision Lab consists of a number of PCs running Linux/Windows and a few digital video cameras to be used for hand-gesture recognition and user face authentication. There are also two pan-tilt digital cameras forming a stereo system for sensing and describing a 3D working space in which hand gestures should be recognised and tracked.

Nowadays, Human-Computer Interaction is a hot research topic. It consists mainly of extracting information (from audio-visual speech, facial expression, hand signs, body expression) to interact efficiently with a robot or a machine via a computer. Potential applications range from automatic speech recognition (ASR), videoconferencing, virtual reality, communication for disabled people, and user verification and recognition (audiovisual biometric features), to remote control of robots, vehicles, and devices.

This year's course projects will encompass topics such as stereo vision and 3D positioning, and feature extraction and classification, with a focus on real-time processing for efficient interaction. Basically, you will have to:

  • Design an interface to acquire images and videos from two USB cameras.
  • Find the 3D trajectory of a hand motion and designate particular objects using stereo vision and hand-gesture recognition.
    • Your task will be to track your hand movement along an arbitrary path with mandatory checkpoints in a 3D field. You will have to use a specific gesture to name each checkpoint and create a meaningful 3D description of the work field.
    • You will have to identify a set of predefined hand postures using advanced statistical analysis techniques such as PCA and LDA.
  • Authenticate faces in the lab
    • Extract faces from images and identify them using advanced statistical analysis techniques such as PCA and LDA.
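The PCA/LDA-based identification mentioned above can be sketched in a few lines. This is only an illustrative outline under simplifying assumptions, not the assignment solution: `pca_project` and `classify` are hypothetical helper names, and the gallery of flattened images is assumed to fit in memory as a NumPy array.

```python
import numpy as np

def pca_project(samples, n_components):
    """Project row-vector samples onto the top principal components.

    samples: (N, D) array, one flattened image per row (hypothetical data).
    Returns (projected_gallery, mean, basis); basis columns are the axes.
    """
    mean = samples.mean(axis=0)
    centred = samples - mean
    # SVD of the centred data gives the principal axes directly.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components].T          # shape (D, n_components)
    return centred @ basis, mean, basis

def classify(probe, gallery_proj, labels, mean, basis):
    """Nearest-neighbour match in the reduced PCA space."""
    p = (probe - mean) @ basis
    dists = np.linalg.norm(gallery_proj - p, axis=1)
    return labels[int(np.argmin(dists))]
```

For face or hand images, each row of `samples` would be one flattened, intensity-normalised image; LDA would replace the plain SVD step with a class-aware projection that maximises between-class separation.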

The work is subdivided into three assignments covering the following parts of the project:

  1. Calibration of the stereo cameras for computing the 3D position of a desired item in the cameras' field of view by intersecting optical rays, and visualisation of the item's 3D movements (in this assignment you will use the existing Tsai calibration software, but you will also complete several programming tasks in networking and client-server camera access). You will build a GUI that acquires images and video streams from the USB cameras and displays the 3D positions and trajectories of any object in the field.
  2. Hand posture and face recognition:
    • Hand-gesture recognition: hand localisation, hand-mask extraction, and real-time recognition of hand signs.
    • Face authentication: face localisation, face-mask extraction, and real-time face recognition.
  3. Whole-system testing. You will integrate your previous work to:

      • effectively recognise hand signs and use them to indicate objects
      • display the sequence of checkpoint objects visited by a moving hand

      • effectively extract faces from video-images
      • recognise faces
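The "intersecting optical rays" step of Assignment 1 can be sketched as follows. This is a minimal illustration under stated assumptions: calibration is assumed to have already turned each matched pixel into a camera centre and a ray direction in world coordinates (the back-projection itself is not shown), and `triangulate_midpoint` is a hypothetical helper name. Because noisy rays from two cameras are almost always skew and never meet exactly, a common approximation is the midpoint of the shortest segment joining the two rays.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Approximate the 3D intersection of two optical rays.

    c1, c2: camera centres (3-vectors); d1, d2: ray directions.
    Returns the midpoint of the shortest segment between the rays.
    Assumes the rays are not parallel.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = np.dot(d1, d2)
    w = c1 - c2
    # Solve for parameters s, t minimising |(c1 + s*d1) - (c2 + t*d2)|^2.
    denom = 1.0 - b * b                  # zero only for parallel rays
    s = (b * np.dot(d2, w) - np.dot(d1, w)) / denom
    t = (np.dot(d2, w) - b * np.dot(d1, w)) / denom
    p1 = c1 + s * d1                     # closest point on ray 1
    p2 = c2 + t * d2                     # closest point on ray 2
    return 0.5 * (p1 + p2)
```

With perfectly calibrated rays through the same world point, the midpoint coincides with that point; with noisy rays it is a reasonable compromise between the two lines of sight.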

The schedule of these assignments is as follows:


Assignment      Theme                                            Due date
Assignment 1    Camera calibration, line detection, and 3D GUI   April 1 (preamble) and April 15 (main part), 2005
Assignment 2    2D/3D hand posture and face recognition          TBA
Assignment 3    Whole system testing                             TBA

Basic Topics of the Course

  1. Camera calibration and projective geometry (for handouts: go to "Lectures")
    • 2D and 3D vision geometry
    • Single camera calibration
    • Stereo calibration: epipolar geometry, triangulation
  2. Low-level Image processing (for handouts: go to "Lectures")
    • Colour detection and discrimination
    • Binary image segmentation
    • Image matching
  3. Video sequence processing
    • Object tracking and image filtering
    • Real-time image processing
  4. Gesture and Face recognition
    • Feature extraction: hand and face
    • Feature classification (PCA, LDA)
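As a rough illustration of the colour detection and binary segmentation topics above, a rule-based skin-colour threshold can be written directly on the RGB channels. The thresholds below are illustrative defaults, not values taught in the course, and `skin_mask` is a hypothetical name; a robust system would normally work in a chromaticity or HSV space and clean the mask morphologically.

```python
import numpy as np

def skin_mask(rgb, r_min=95, gap=15):
    """Very rough rule-based skin-colour segmentation (thresholds illustrative).

    rgb: (H, W, 3) uint8 image. Returns a boolean mask marking pixels that
    are 'skin-like': red channel bright and clearly above green and blue.
    """
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return (r > r_min) & (r > g + gap) & (r > b + gap)
```

The resulting binary mask would be the input to the hand- or face-mask extraction stage, e.g. by keeping the largest connected component and computing its bounding box.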
