Advanced Computer Vision
I developed and taught this graduate-level course as a follow-on to the introductory computer vision course. It covered structure and motion estimation, segmentation, object detection and recognition, and tracking, using both classical and deep learning methods. I used Python, OpenCV, and PyTorch for demonstrations and assignments.
My courses emphasize hands-on work: students work in pairs during class to complete lab assignments, usually one per week. Students also complete programming assignments and an independent final project of their own choosing.
Topics
Review of image formation, transformations, edge and line detection
Mathematical methods: linear and non-linear least squares, singular value decomposition
Direct linear transform
Estimating uncertainties in derived quantities
Essential and fundamental matrix
Structure from motion
Bundle adjustment
Stereo vision
Classification using decision trees, boosting, SVM
Convolutional neural nets: architecture, training
CNNs for object detection
Transfer learning
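Several of the mathematical topics above (least squares, SVD, direct linear transform) share one core computation: solving a homogeneous linear system by taking the right singular vector with the smallest singular value. The sketch below illustrates that step on a toy line-fitting problem; the specific function and example are my own illustration, not course material.

```python
import numpy as np

def solve_homogeneous(A):
    """Minimize ||A x|| subject to ||x|| = 1 via SVD.

    The minimizer is the right singular vector of A associated with the
    smallest singular value -- the core step of the direct linear
    transform (DLT) and of homography/projection-matrix estimation.
    """
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

# Toy example: fit a 2D line a*x + b*y + c = 0 through nearly
# collinear points lying on y = 2x + 1.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 10.0, 50)
ys = 2.0 * xs + 1.0 + rng.normal(scale=0.01, size=xs.shape)
A = np.column_stack([xs, ys, np.ones_like(xs)])
a, b, c = solve_homogeneous(A)
slope = -a / b  # invariant to the overall sign of the solution
```

The same pattern, with rows of A built from point correspondences instead of raw coordinates, yields the homography and camera-matrix estimates covered under the direct linear transform topic.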
Example Assignments
Essential Matrix
In this assignment, students detected and matched features between two images, computed the essential matrix, and used it to recover the relative pose between the cameras. They also computed the true distance between the cameras, given that the window in the picture had a known width.
First image, with epipolar lines.
Second image, with epipolar lines.
Structure from Motion
This assignment was to reconstruct the camera poses from a sequence of six images and determine the true size of the box, given the known size of the $20 bill in the picture.
One of the images, with the “ground control points” marked.
Reconstructed camera poses and 3D point positions.
Object Detection
This assignment was to train a boosting classifier to recognize room signs in our campus building, using histogram of oriented gradients (HOG) features. The images below show successful detections on two test images.
Example Final Projects
Artifact Image Search
This project by Alexander Dodge and Daniela Machnik had two parts: (1) given an image of an ancient artifact such as a lamp or bowl, find similar objects in a database of artifacts; (2) place the artifact into a scene using correct perspective projection. For the first part, they used a CNN to generate a set of deep descriptors that were matched against the database. For the second part, the method finds vanishing points and computes the vertical planes in the scene; it can then warp a planar object to simulate its placement on a wall. See their slides for their class presentation.
Given the query image in the top left, the method finds the closest matching images in the database.
Detected lines are used to find vanishing points.
Artifacts are placed in the image.
Sudoku Solver
This project by Miguel Ruiz detected a Sudoku board in an arbitrary image, rectified it, and identified all cells. A neural net model then recognized the digits that were present. Once all digits had been classified, the unsolved board was fed into a Sudoku solver program, which output the solved board as a string. Finally, the output string was parsed and the original board was displayed to the user with the solution overlaid on top. See his slides for his class presentation.
Input image (left), rectified image (center), solved puzzle (right).