Object Detection from Reference Images

This was exploratory research done at Diamond Age Technology. The goal of the project was to develop a method for detecting objects using a few “reference” images of the same (or similar) objects. The intended application is an augmented reality tool that assists a person in performing a task: a subject matter expert annotates an example object with subpart labels and instructions, and another user then sees these labels mapped onto the particular object they are looking at.

To do this, I used the RoMa (Robust Matching) library to find robust features in the query and reference images. RoMa uses a transformer-based decoder to find matching pixel pairs, even when the two images differ significantly in viewpoint, illumination, or scale. It builds on pre-trained features from the DINOv2 foundation model, which work across a wide variety of object types.

I match each reference image against the query image and pick the reference image with the highest number of consistent matches, meaning feature pairs that have each other as their best match. Once I have the “warp” (or flow) field between the images, I can map the annotations from the reference image to the query image.
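The selection and mapping steps described above can be illustrated with a minimal sketch. It is not RoMa's API and not necessarily how the project implements these steps. It assumes that each reference image comes with a small feature similarity matrix (rows are reference features, columns are query features) and that the warp field is a grid giving, for each reference pixel, an integer (row, col) position in the query image. The names mutual_matches, pick_best_reference, and map_annotations are hypothetical, as is the toy data.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]  # (row, col) pixel coordinate
Matrix = Sequence[Sequence[float]]


def mutual_matches(sim: Matrix) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where reference feature i and query feature j
    are each other's best match under the similarity matrix sim[i][j]."""
    if not sim or not sim[0]:
        return []
    n_ref, n_query = len(sim), len(sim[0])
    # Best query feature for each reference feature (argmax over each row).
    best_q = [max(range(n_query), key=lambda j: sim[i][j]) for i in range(n_ref)]
    # Best reference feature for each query feature (argmax over each column).
    best_r = [max(range(n_ref), key=lambda i: sim[i][j]) for j in range(n_query)]
    # Keep a pair only if the choice agrees in both directions.
    return [(i, j) for i, j in enumerate(best_q) if best_r[j] == i]


def pick_best_reference(sims: Dict[str, Matrix]) -> str:
    """Name of the reference image with the most mutual matches."""
    return max(sims, key=lambda name: len(mutual_matches(sims[name])))


def map_annotations(
    annotations: Dict[str, Point], warp: Sequence[Sequence[Point]]
) -> Dict[str, Point]:
    """Move each labeled reference pixel to the query pixel that the
    warp field assigns to it."""
    return {label: warp[r][c] for label, (r, c) in annotations.items()}


# Toy data (assumed for illustration): two reference images, three features each.
sims = {
    "ref_front": [[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.1, 0.3, 0.7]],
    "ref_side": [[0.5, 0.6, 0.1], [0.4, 0.7, 0.2], [0.3, 0.2, 0.1]],
}
best = pick_best_reference(sims)

# A 2x2 warp field that shifts every reference pixel by one row and one column.
warp = [[(1, 1), (1, 2)], [(2, 1), (2, 2)]]
mapped = map_annotations({"handle": (0, 1)}, warp)
print(best, mapped)
```

In practice the dense matcher would supply the warp field and a per-pixel confidence, and the warp may be expressed in normalized rather than integer pixel coordinates, so a real pipeline would need a coordinate conversion step that this sketch leaves out.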

Image Examples

The images below show examples of matching a reference image of an object to a query image of a different instance of the object. Each example shows the mapping of annotations from the reference image to the query image, and also the locations of the matching features.

Brake Assembly Example



Gate Valve Example

Video Examples

The videos below show the detection of an object using images from a hand-held phone camera. As shown in the guide in the adjacent figure, the inset image in the upper left of each video frame shows which reference image was matched. The inset image in the upper right of each video frame shows the matching features.

Legend for object detection videos below.


Fire Hose Valve

In this example, the reference object is a fire hose valve on one floor of a building, and the query object is an identical fire hose valve on another floor.


In this example, the query object is of the same type as the reference object.

Fire Hydrant - Example 1

Fire Hydrant - Example 2

In this example, the query object is of a different type from the reference object.