Part-Aligned Bilinear Representations for Person Re-identification

Perceiving Systems

Siyu Tang

Guest Scientist

Comparing the appearance of corresponding body parts is essential for person re-identification. However, body parts are frequently misaligned between detected boxes, due to the detection errors and the pose/viewpoint changes. In this paper, we propose a network that learns a part-aligned representation for person re-identification. Our model consists of a two-stream network, which generates appearance and body part feature maps respectively, and a bilinear-pooling layer that fuses two feature maps to an image descriptor. We show that it results in a compact descriptor, where the inner product between two image descriptors is equivalent to an aggregation of the local appearance similarities of the corresponding body parts, and thereby significantly reduces the part misalignment problem. Our approach is advantageous over other pose-guided representations by learning part descriptors optimal for person re-identification. Training the network does not require any part annotation on the person re-identification dataset. Instead, we simply initialize the part sub-stream using a pre-trained sub-network of an existing pose estimation network and train the whole network to minimize the re-identification loss. We validate the effectiveness of our approach by demonstrating its superiority over the state-of-the-art methods on the standard benchmark datasets including Market-1501, CUHK03, CUHK01 and DukeMTMC, and standard video dataset MARS.
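
The claim that the inner product of two bilinear-pooled descriptors aggregates local appearance similarities weighted by part similarities can be checked numerically. The snippet below is a minimal sketch, not the authors' implementation: it assumes plain sum pooling of per-location outer products with no normalization, and the function name `bilinear_descriptor`, the toy feature-map sizes, and the random inputs are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def bilinear_descriptor(appearance, part):
    """Sum-pool the outer products of per-location appearance and part features.

    appearance: array of shape (H, W, Ca)
    part:       array of shape (H, W, Cp)
    returns:    flat descriptor of shape (Ca * Cp,)
    """
    h, w, ca = appearance.shape
    a = appearance.reshape(h * w, ca)
    p = part.reshape(h * w, -1)
    # a.T @ p equals the sum over locations of outer(a_xy, p_xy).
    return (a.T @ p).reshape(-1)


rng = np.random.default_rng(0)
H, W, CA, CP = 4, 3, 5, 2  # toy sizes, chosen only for illustration

# Random stand-ins for the two streams' feature maps of two images.
a1, p1 = rng.normal(size=(H, W, CA)), rng.normal(size=(H, W, CP))
a2, p2 = rng.normal(size=(H, W, CA)), rng.normal(size=(H, W, CP))

f1 = bilinear_descriptor(a1, p1)
f2 = bilinear_descriptor(a2, p2)
descriptor_similarity = float(f1 @ f2)

# Explicit aggregation over every pair of locations (one from each image):
# appearance similarity weighted by part similarity.
app_sim = a1.reshape(-1, CA) @ a2.reshape(-1, CA).T    # (H*W, H*W)
part_sim = p1.reshape(-1, CP) @ p2.reshape(-1, CP).T   # (H*W, H*W)
aggregated_similarity = float(np.sum(app_sim * part_sim))

identity_holds = bool(np.isclose(descriptor_similarity, aggregated_similarity))
print(identity_holds)
```

Under these assumptions, location pairs whose part features are dissimilar contribute little to the total, which is the sense in which the descriptor comparison is part-aligned.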

Author(s):	Yumin Suh and Jingdong Wang and Siyu Tang and Tao Mei and Kyoung Mu Lee
Book Title:	European Conference on Computer Vision (ECCV)
Volume:	11218
Pages:	418--437
Year:	2018
Month:	September
Publisher:	Springer, Cham

BibTeX Type:	Conference Paper (inproceedings)

DOI:	https://doi.org/10.1007/978-3-030-01264-9_25
Event Place:	Munich, Germany


BibTeX

@inproceedings{personreid:eccv:2018,
  title = {Part-Aligned Bilinear Representations for Person Re-identification},
  booktitle = {European Conference on Computer Vision (ECCV)},
  abstract = {Comparing the appearance of corresponding body parts is essential for person re-identification. However, body parts are frequently misaligned between detected boxes, due to the detection errors and the pose/viewpoint changes. In this paper, we propose a network that learns a part-aligned representation for person re-identification. Our model consists of a two-stream network, which generates appearance and body part feature maps respectively, and a bilinear-pooling layer that fuses two feature maps to an image descriptor. We show that it results in a compact descriptor, where the inner product between two image descriptors is equivalent to an aggregation of the local appearance similarities of the corresponding body parts, and thereby significantly reduces the part misalignment problem. Our approach is advantageous over other pose-guided representations by learning part descriptors optimal for person re-identification. Training the network does not require any part annotation on the person re-identification dataset. Instead, we simply initialize the part sub-stream using a pre-trained sub-network of an existing pose estimation network and train the whole network to minimize the re-identification loss. We validate the effectiveness of our approach by demonstrating its superiority over the state-of-the-art methods on the standard benchmark datasets including Market-1501, CUHK03, CUHK01 and DukeMTMC, and standard video dataset MARS.},
  volume = {11218},
  pages = {418--437},
  publisher = {Springer, Cham},
  month = sep,
  year = {2018},
  author = {Suh, Yumin and Wang, Jingdong and Tang, Siyu and Mei, Tao and Lee, Kyoung Mu},
  doi = {10.1007/978-3-030-01264-9_25},
  month_numeric = {9}
}