This paper studies image alignment, the problem of learning a shape and appearance model from labeled data and efficiently fitting the model to a non-rigid object with large variations.
Given a set of images with manually labeled landmarks, the model representation consists of a shape component, represented by a point distribution model, and an appearance component, represented by a collection of local features and trained discriminatively as a two-class classifier using boosting. Images with ground-truth landmarks are the positive training samples, while those with perturbed landmarks serve as negatives. Enabled by piecewise affine warping, corresponding local feature positions across all training samples form a hypothesis space for boosting. Image alignment is performed by maximizing the boosted classifier score, which serves as the distance measure, by iteratively mapping the feature positions to the image and computing the gradient direction of the score with respect to the shape parameter. The authors apply this approach to human body alignment from surveillance-type images. They conduct experiments on the MIT pedestrian database, where the body size is approximately 110 × 46 pixels, and demonstrate real-time alignment capability. (Published abstract provided)
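The alignment step described above (a shape generated by a point distribution model, a classifier score, and gradient ascent on the shape parameter) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the trained boosted classifier is replaced by a toy score built from atan-squashed per-landmark responses, the image and the piecewise affine warp are omitted, and a numerical gradient stands in for an analytic one.

```python
import numpy as np

def pdm_shape(mean_shape, basis, p):
    """Point distribution model: mean shape plus a linear combination of basis shapes."""
    return mean_shape + basis @ p

def toy_score(shape, target):
    """Toy stand-in for a boosted classifier score (an assumption, not the trained model).

    Each hypothetical weak learner looks at one landmark and responds more
    strongly the closer that landmark lies to its toy target position.
    The score is the sum of the atan-squashed responses, so its maximum is 0.
    """
    pts = shape.reshape(-1, 2)
    tgt = target.reshape(-1, 2)
    feats = -np.sum((pts - tgt) ** 2, axis=1)  # one feature per landmark
    return float(np.sum((2.0 / np.pi) * np.arctan(feats)))

def numerical_gradient(f, p, eps=1e-5):
    """Central-difference gradient of the scalar function f at p."""
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (f(p + step) - f(p - step)) / (2 * eps)
    return grad

def align(mean_shape, basis, target, p0, step_size=0.5, n_iters=200):
    """Gradient ascent on the shape parameter p to maximize the toy score."""
    def score(p):
        return toy_score(pdm_shape(mean_shape, basis, p), target)

    p = np.asarray(p0, dtype=float).copy()
    for _ in range(n_iters):
        p = p + step_size * numerical_gradient(score, p)
    return p, score(p)
```

In this sketch, `target` plays the role that image evidence plays in the real method: the score peaks when the generated landmarks coincide with it. A faithful implementation would instead evaluate local image features at the warped feature positions and differentiate the trained classifier score with respect to the shape parameter.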
810 Seventh Street NW, Washington, DC 20531, United States
Appears in 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, 2008, pp. 1-8