A simple example of image retrieval: build a vocabulary of visual words, then retrieve the closest match for a test image using that vocabulary. We are given a set of test images and a set of training images. We construct the vocabulary by first computing the SIFT features of each training image, then clustering all the descriptors (pooled from every image) into a fixed number of clusters. Note that the descriptors are the data points in the clustering. Each image has a variable number of descriptors, but the size of each descriptor is fixed (128 for SIFT). Once the vocabulary is computed, we can construct the
Histogram of Visual Words for each training and test image by the following approach:
- computing the SIFT features of the training images
All the SIFT features are stored for each of the training images. Note
that if F SIFT features are detected in an image, then there are F
descriptors, each of which is a 128-dimensional vector. vl_feat returns
each descriptor as a column vector. These descriptors are used as the
data points in the next step, building the vocabulary. Additionally, each
of the F features has 4-dimensional information describing the location
(x, y), scale, and orientation of the feature frame. These are returned
in the variable f, so f is a 4xF matrix.
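The bookkeeping for this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fake_sift` is a hypothetical stand-in that produces random frames and descriptors instead of calling vl_feat, the feature counts are made up, and each descriptor is stored as a row (a Python list) rather than as a column. Indices are 0-based here.

```python
import random

DESCRIPTOR_DIM = 128  # length of one SIFT descriptor


def fake_sift(num_features, seed):
    """Hypothetical stand-in for a SIFT call; returns (frames, descriptors).

    frames:      num_features entries of [x, y, scale, orientation]
    descriptors: num_features entries of 128 values each
    """
    rng = random.Random(seed)
    frames = [[rng.uniform(0, 640), rng.uniform(0, 480),
               rng.uniform(1, 8), rng.uniform(0, 6.28)]
              for _ in range(num_features)]
    descriptors = [[rng.randint(0, 255) for _ in range(DESCRIPTOR_DIM)]
                   for _ in range(num_features)]
    return frames, descriptors


def stack_descriptors(per_image_descriptors):
    """Pool the descriptors of all images and record each image's slice."""
    training_data = []
    ranges = []  # (start, end) per image, 0-based, end exclusive
    for descriptors in per_image_descriptors:
        start = len(training_data)
        training_data.extend(descriptors)
        ranges.append((start, len(training_data)))
    return training_data, ranges


feature_counts = [683, 120, 45]  # illustrative numbers of features per image
per_image = [fake_sift(n, seed=i)[1] for i, n in enumerate(feature_counts)]
training_data, ranges = stack_descriptors(per_image)

print(len(training_data), len(training_data[0]))  # 848 128
print(ranges)  # [(0, 683), (683, 803), (803, 848)]
```

Keeping the per-image ranges is what later lets us tell which descriptors, and therefore which cluster labels, belong to which image.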
- building the vocabulary using the descriptors of the training images
training_data contains the 128-dimensional column vectors (descriptors)
of all the images. We cluster these data points into vocab_size centers.
A is a vector with one entry per data point, and the i-th entry denotes
which cluster (1 to vocab_size) the i-th point belongs to. For example,
suppose the first image has 683 SIFT features. Then its 683 descriptors
occupy indices 1 through 683 of training_data, so each of the first 683
entries of A contains a number between 1 and vocab_size that denotes the
cluster the corresponding descriptor belongs to. Our goal is then to
construct the HISTOGRAM OF VISUAL WORDS (h_vw). For each image we will
have an h_vw, which is a histogram of length vocab_size. The value at each
index is the number of that image's descriptors assigned to the
corresponding cluster.
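As a rough sketch of this step, the following Python code clusters pooled descriptors and turns the label vector A into one h_vw per training image. It is written under several assumptions: the clustering is a plain Lloyd's k-means written out by hand (not the vl_feat routine), the descriptors are random stand-ins, the feature counts are tiny so that it runs quickly, and cluster labels run from 0 to vocab_size - 1 because Python is 0-based.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)),
               key=lambda k: squared_distance(point, centers[k]))


def kmeans(points, vocab_size, iterations=10, seed=0):
    """Plain Lloyd's k-means; returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, vocab_size)]
    for _ in range(iterations):
        assignments = [nearest_center(p, centers) for p in points]
        for k in range(vocab_size):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old center if a cluster goes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    # Final labels, consistent with the final centers.
    assignments = [nearest_center(p, centers) for p in points]
    return centers, assignments


def histograms_from_assignments(assignments, ranges, vocab_size):
    """One histogram of visual words per image, from its slice of labels."""
    h_vw_all = []
    for start, end in ranges:
        h_vw = [0] * vocab_size
        for cluster in assignments[start:end]:
            h_vw[cluster] += 1
        h_vw_all.append(h_vw)
    return h_vw_all


# Synthetic stand-in data: three images with 30, 20 and 10 descriptors.
rng = random.Random(1)
feature_counts = [30, 20, 10]
training_data = [[rng.randint(0, 255) for _ in range(128)]
                 for _ in range(sum(feature_counts))]
ranges = [(0, 30), (30, 50), (50, 60)]  # 0-based, end exclusive
vocab_size = 5

centers, A = kmeans(training_data, vocab_size)
h_vw_all = histograms_from_assignments(A, ranges, vocab_size)
for h_vw in h_vw_all:
    print(h_vw, sum(h_vw))  # each histogram sums to that image's feature count
```

Because the training descriptors were themselves the clustering input, their labels come for free from A, and no extra distance computation is needed to build the training histograms.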
- compute the histogram of visual words for each test image
For a test image we construct the HISTOGRAM OF VISUAL WORDS in a
different manner. We already have the cluster centers from building our
vocabulary: there are vocab_size centers, and each one is a
128-dimensional point that denotes the center of a cluster. Running SIFT
on a test image gives us a set of SIFT features along with their
descriptors. For each descriptor we calculate the distance to every
center and remember the center that is closest to it. In this process we
also count how often each center is the closest one in the test image.
Completing the process gives us a histogram of cluster-center frequencies
for the test image.
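The nearest-center counting can be sketched as follows. To keep the numbers easy to check by hand, this toy example uses 2-dimensional points instead of 128-dimensional SIFT descriptors, and the centers and training histograms are made up. The final retrieval step is an assumption on my part: the text above does not say how histograms are compared, so the sketch simply normalizes them and picks the training image with the smallest squared Euclidean distance.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def histogram_of_visual_words(descriptors, centers):
    """Count, for each center, how many descriptors it is closest to."""
    h_vw = [0] * len(centers)
    for d in descriptors:
        closest = min(range(len(centers)),
                      key=lambda k: squared_distance(d, centers[k]))
        h_vw[closest] += 1
    return h_vw


def normalize(h):
    """Scale a histogram to sum to 1 so images with different feature counts compare fairly."""
    total = sum(h)
    return [v / total for v in h] if total else list(h)


def closest_training_image(test_h, training_hs):
    """Assumed matching rule: smallest squared distance between normalized histograms."""
    test_n = normalize(test_h)
    return min(range(len(training_hs)),
               key=lambda i: squared_distance(test_n, normalize(training_hs[i])))


# Toy vocabulary of three centers (2-D for readability; real centers are 128-D).
centers = [[0, 0], [10, 0], [0, 10]]
test_descriptors = [[1, 1], [9, 1], [0.5, 8], [11, -1], [0, 0.2]]

test_h_vw = histogram_of_visual_words(test_descriptors, centers)
print(test_h_vw)  # [2, 2, 1]

# Made-up training histograms, one per training image.
training_h_vw = [[10, 0, 0], [4, 4, 2], [0, 1, 9]]
best = closest_training_image(test_h_vw, training_h_vw)
print(best)  # 1
```

Other comparison rules (for example histogram intersection or chi-squared distance) would slot into `closest_training_image` without changing the histogram construction.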
A sample implementation can be found here.