We tested the Double Probability Model with image classification. Following the general trend, we applied the BoW (Bag-of-Words) model (Fei-Fei et al., 2007; Chatfield et al., 2011; Lazebnik et al., 2006) for the mathematical representation of the images, and we used an SVM (Support Vector Machine) (Boser et al., 1992; Cortes and Vapnik, 1995; Chatfield et al., 2011) as the classifier. We should note that the DPM can be used with any classification process, as long as it provides probability values for each possible category.
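The only interface the DPM requires from the classifier is a vector of probability values, one per known category. As a hypothetical illustration (the helper name is our own, and a softmax over raw scores stands in for the Platt-scaled SVM outputs used later in this section):

```python
import numpy as np

def class_probabilities(scores):
    """Map raw per-class classifier scores to probability values, one per
    category -- the only output the DPM needs from the classification step.
    (Hypothetical helper; a softmax stands in for a real probability estimator.)
    """
    z = scores - np.max(scores)   # shift scores for numerical stability
    e = np.exp(z)
    return e / e.sum()

p = class_probabilities(np.array([1.2, 0.3, -0.5]))  # sums to 1 over 3 classes
```

Any classifier whose outputs can be mapped this way could, in principle, replace the SVM without changing the DPM itself.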
The key idea behind the BoW model is to represent an image (based on its visual content) with so-called visual code words while ignoring their spatial distribution. This technique consists of three steps, which are the usual phases in computer vision: (i) feature detection, (ii) feature description, and (iii) image description. For feature detection we used the Harris-Laplace corner detector (Harris and Stephens, 1988; Mikolajczyk and Schmid, 2004), and SIFT (Scale Invariant Feature Transform) (Lowe, 2004) to describe the detected features. Note that we used the default parameterization of SIFT proposed by Lowe; therefore the descriptor vectors had 128 dimensions. To define the visual code words from the descriptor vectors, we used a GMM (Gaussian Mixture Model) (Reynolds, 2009; Tomasi, 2004), which is a parametric probability density function represented as a weighted sum of (in this case 256) Gaussian component densities, as can be seen in Eq. (11):

$p(x\mid \lambda )={\sum _{j=1}^{K}}{\omega _{j}}\,\mathcal{N}(x\mid {\mu _{j}},{\sigma _{j}}),$
(11)
where ${\omega _{j}}$, ${\mu _{j}}$ and ${\sigma _{j}}$ denote the weight, the expected value and the variance of the $j$th Gaussian component, respectively, and $K=256$. We calculated the $\lambda $ parameter with ML (Maximum Likelihood) estimation, using the iterative EM (Expectation Maximization) algorithm (Dempster et al., 1977; Tomasi, 2004). To obtain the initial parameter model for the EM, we performed K-means clustering (MacQueen, 1967) over all the descriptors with 256 clusters. The next step was to create a descriptor that specifies the distribution of the visual code words in an image, called the high-level descriptor. To represent an image with this high-level descriptor, we calculated the GMM-based Fisher vector (Perronnin and Dance, 2007; Reynolds, 2009), shown in Eq. (12); these vectors were the final representations (image descriptors) of the images:

$\mathcal{G}(X,\lambda )={\nabla _{\lambda }}\log p(X\mid \lambda ),$
(12)
where $\log p(X\mid \lambda )$ is the probability density function introduced in Eq. (11), $X$ denotes the SIFT descriptors of an image, and $\lambda $ represents the parameters of the GMM ($\lambda =\{{\omega _{j}},{\mu _{j}},{\sigma _{j}}\mid j=1,\dots ,K\}$).
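Under the definitions above, the GMM-based Fisher vector can be sketched in a few lines of NumPy. This is a simplified version under stated assumptions: diagonal covariances, and only the gradient with respect to the component means (the full Perronnin and Dance formulation also includes weight and variance components); all function and variable names are our own.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Simplified GMM Fisher vector of an image: gradient of
    (1/N) log p(X | lambda) w.r.t. the component means only, whitened.
    X: (N, D) local descriptors (e.g. 128-dim SIFT); weights: (K,);
    means, variances: (K, D) parameters of a diagonal-covariance GMM.
    """
    N = X.shape[0]
    diff = X[:, None, :] - means[None, :, :]                      # (N, K, D)
    # log of each weighted Gaussian component of the mixture density
    log_comp = (np.log(weights)[None, :]
                - 0.5 * np.sum(np.log(2 * np.pi * variances)
                               + diff ** 2 / variances, axis=2))  # (N, K)
    # posterior responsibility of component j for descriptor x_i
    log_comp -= log_comp.max(axis=1, keepdims=True)
    gamma = np.exp(log_comp)
    gamma /= gamma.sum(axis=1, keepdims=True)                     # (N, K)
    # gradient w.r.t. mu_j, whitened per dimension and per component weight
    grad = np.sum(gamma[:, :, None] * diff / np.sqrt(variances), axis=0)
    fv = grad / (N * np.sqrt(weights)[:, None])
    return fv.ravel()                                             # length K * D

# Toy example: K=4 components over D=8 (the paper uses K=256, D=128)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
w = np.full(4, 0.25)
mu = rng.normal(size=(4, 8))
var = np.ones((4, 8))
fv = fisher_vector(X, w, mu, var)   # image descriptor of length 4 * 8 = 32
```

The posterior responsibilities are computed in the log domain (with a max shift) to avoid underflow, which matters at the paper's scale of 256 components and 128 dimensions.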
For the classification subtask we used a variation of the SVM, the C-SVC (C-Support Vector Classification) (Boser et al., 1992; Cortes and Vapnik, 1995) with an RBF (Radial Basis Function) kernel. The one-against-all technique was applied to extend the SVM to multi-class classification. We used Platt’s approach (Platt, 2000) as the probability estimator, which is included in LIBSVM (A Library for Support Vector Machines) (Chang and Lin, 2011; Huang et al., 2006). At this point we can decide whether to use the Double Probability Model for filtering out the test samples that possibly came from a previously unseen category, or to keep the original predictions of the classifier (SVM). The CDF and reverse CDF (Eqs. (2) and (3)) can be calculated based on the class membership probabilities (in a validation set).
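Platt’s probability estimator fits a sigmoid $P(y=+1\mid f)=1/(1+\exp (Af+B))$ to the SVM decision values $f$. LIBSVM implements this with a careful Newton-type solver and regularized targets; the following is only a plain gradient-descent sketch of the idea, with hypothetical names and toy data:

```python
import numpy as np

def platt_scaling(f, y, iters=5000, lr=0.05):
    """Fit Platt's sigmoid P(y=+1 | f) = 1 / (1 + exp(A*f + B)) to SVM
    decision values f with labels y in {-1, +1}, by gradient descent on the
    cross-entropy. (Sketch only; LIBSVM uses a more robust Newton method.)
    """
    t = (y + 1) / 2.0                        # map labels {-1, +1} -> {0, 1}
    A, B = 0.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(A * f + B))  # current probability estimates
        # gradients of the mean cross-entropy w.r.t. A and B
        A -= lr * np.mean((t - p) * f)
        B -= lr * np.mean(t - p)
    return A, B

# Toy decision values: positive samples get positive margins
f = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
y = np.array([-1, -1, -1, 1, 1, 1])
A, B = platt_scaling(f, y)
prob_pos = 1.0 / (1.0 + np.exp(A * 2.0 + B))  # P(y=+1) for a margin of +2
```

Note that $A$ comes out negative, so larger decision values map to higher positive-class probabilities; it is these calibrated probabilities, computed on a validation set, from which the CDF and reverse CDF can then be estimated.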