Part-Aligned Bilinear Representations for Person Re-identification
Comparing the appearance of corresponding body parts is essential for person re-identification. However, body parts are frequently misaligned between detected boxes, due to the detection errors and the pose/viewpoint changes. In this paper, we propose a network that learns a part-aligned representation for person re-identification. Our model consists of a two-stream network, which generates appearance and body part feature maps respectively, and a bilinear-pooling layer that fuses two feature maps to an image descriptor. We show that it results in a compact descriptor, where the inner product between two image descriptors is equivalent to an aggregation of the local appearance similarities of the corresponding body parts, and thereby significantly reduces the part misalignment problem. Our approach is advantageous over other pose-guided representations by learning part descriptors optimal for person re-identification. Training the network does not require any part annotation on the person re-identification dataset. Instead, we simply initialize the part sub-stream using a pre-trained sub-network of an existing pose estimation network and train the whole network to minimize the re-identification loss. We validate the effectiveness of our approach by demonstrating its superiority over the state-of-the-art methods on the standard benchmark datasets including Market-1501, CUHK03, CUHK01 and DukeMTMC, and standard video dataset MARS.
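The abstract's claim that the inner product of two descriptors aggregates local appearance similarities weighted by part similarities can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes plain sum-pooled bilinear pooling with no normalization, uses random vectors in place of learned feature maps, and all function names (`bilinear_descriptor`, `aggregated_similarity`, `random_map`) are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def bilinear_descriptor(appearance, part):
    """Sum the outer products a_k (x) p_k over all locations k, flattened.

    `appearance` and `part` are lists of per-location feature vectors
    (assumed stand-ins for the two streams' feature maps).
    """
    c_a, c_p = len(appearance[0]), len(part[0])
    desc = [0.0] * (c_a * c_p)
    for a, p in zip(appearance, part):
        for i in range(c_a):
            for j in range(c_p):
                desc[i * c_p + j] += a[i] * p[j]
    return desc


def aggregated_similarity(app1, part1, app2, part2):
    """Sum over all location pairs of appearance similarity times part similarity."""
    return sum(
        dot(a1, a2) * dot(p1, p2)
        for a1, p1 in zip(app1, part1)
        for a2, p2 in zip(app2, part2)
    )


def random_map(n_locations, channels, rng):
    """Random feature map: one vector of `channels` values per location."""
    return [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(n_locations)]


rng = random.Random(0)
# Two "images", each with 6 locations, 4 appearance channels, 3 part channels.
app1, part1 = random_map(6, 4, rng), random_map(6, 3, rng)
app2, part2 = random_map(6, 4, rng), random_map(6, 3, rng)

lhs = dot(bilinear_descriptor(app1, part1), bilinear_descriptor(app2, part2))
rhs = aggregated_similarity(app1, part1, app2, part2)
print(abs(lhs - rhs) < 1e-9)
```

Under these assumptions the two quantities agree up to floating-point error, because the inner product of two outer products factorizes into the product of the appearance and part inner products. Location pairs whose part features are dissimilar therefore contribute little, regardless of where they fall in the image.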
| Author(s): | Yumin Suh and Jingdong Wang and Siyu Tang and Tao Mei and Kyoung Mu Lee |
| Book Title: | European Conference on Computer Vision (ECCV) |
| Volume: | 11218 |
| Pages: | 418--437 |
| Year: | 2018 |
| Month: | September |
| Publisher: | Springer, Cham |
| BibTeX Type: | Conference Paper (inproceedings) |
| DOI: | https://doi.org/10.1007/978-3-030-01264-9_25 |
| Event Place: | Munich, Germany |
| Electronic Archiving: | grant_archive |
BibTeX
@inproceedings{personreid:eccv:2018,
title = {Part-Aligned Bilinear Representations for Person Re-identification},
booktitle = {European Conference on Computer Vision (ECCV)},
abstract = {Comparing the appearance of corresponding body parts is essential for person re-identification. However, body parts are frequently misaligned between detected boxes, due to the detection errors and the pose/viewpoint changes. In this paper, we propose a network that learns a part-aligned representation for person re-identification. Our model consists of a two-stream network, which generates appearance and body part feature maps respectively, and a bilinear-pooling layer that fuses two feature maps to an image descriptor. We show that it results in a compact descriptor, where the inner product between two image descriptors is equivalent to an aggregation of the local appearance similarities of the corresponding body parts, and thereby significantly reduces the part misalignment problem. Our approach is advantageous over other pose-guided representations by learning part descriptors optimal for person re-identification. Training the network does not require any part annotation on the person re-identification dataset. Instead, we simply initialize the part sub-stream using a pre-trained sub-network of an existing pose estimation network and train the whole network to minimize the re-identification loss. We validate the effectiveness of our approach by demonstrating its superiority over the state-of-the-art methods on the standard benchmark datasets including Market-1501, CUHK03, CUHK01 and DukeMTMC, and standard video dataset MARS.},
volume = {11218},
pages = {418--437},
publisher = {Springer, Cham},
month = sep,
year = {2018},
author = {Suh, Yumin and Wang, Jingdong and Tang, Siyu and Mei, Tao and Lee, Kyoung Mu},
doi = {10.1007/978-3-030-01264-9_25},
month_numeric = {9}
}