Human Pose as Context for Object Detection

Perceiving Systems

Abhilash Srikantha

Perceiving Systems

Jürgen Gall

Detecting small objects in images is a challenging problem, particularly when they are often occluded by hands or other body parts. Recently, joint modelling of human pose and objects has been proposed to improve both pose estimation and object detection. These approaches, however, focus on explicit interaction with an object and lack the flexibility to combine both modalities when interaction is not obvious. We therefore propose to use human pose as additional context information for object detection. To this end, we represent an object category by a tree model and train regression forests that localize parts of an object for each modality separately. Predictions of the two modalities are then combined to detect the bounding box of the object. We evaluate our approach on three challenging datasets which vary in the amount of object interactions and the quality of automatically extracted human poses.

Author(s):	Abhilash Srikantha and Juergen Gall
Book Title:	British Machine Vision Conference
Year:	2015
Month:	September

BibTeX Type:	Conference Paper (conference)

Event Name:	British Machine Vision Conference
Event Place:	Swansea, United Kingdom

Electronic Archiving:	grant_archive

BibTeX

@conference{Srik:BMVC:2015,
  title = {Human Pose as Context for Object Detection},
  booktitle = {British Machine Vision Conference},
  abstract = {Detecting small objects in images is a challenging problem, particularly when they are often occluded by hands or other body parts.
  Recently, joint modelling of human pose and objects has been proposed to improve both pose estimation and object detection.
  These approaches, however, focus on explicit interaction with an object and lack the flexibility to combine both modalities when interaction is not obvious.
  We therefore propose to use human pose as additional context information for object detection.
  To this end, we represent an object category by a tree model and train regression forests that localize parts of an object for each modality separately. 
  Predictions of the two modalities are then combined to detect the bounding box of the object. 
  We evaluate our approach on three challenging datasets which vary in the amount of object interactions and the quality of automatically extracted human poses.},
  month = sep,
  year = {2015},
  author = {Srikantha, Abhilash and Gall, Juergen},
  month_numeric = {9}
}