Towards Metrical Reconstruction of Human Faces
Face reconstruction and tracking is a building block of numerous applications in AR/VR, human-machine interaction, as well as medical applications. Most of these applications rely on a metrically correct prediction of the shape, especially, when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size). A metrical reconstruction is also needed for any application that measures distances and dimensions of the subject (e.g., to virtually fit a glasses frame). State-of-the-art methods for face reconstruction from a single image are trained on large 2D image datasets in a self-supervised fashion. However, due to the nature of a perspective projection they are not able to reconstruct the actual face dimensions, and even predicting the average human face outperforms some of these methods in a metrical sense. To learn the actual shape of a face, we argue for a supervised training scheme. Since there exists no large-scale 3D dataset for this task, we annotated and unified small- and medium-scale databases. The resulting unified dataset is still a medium-scale dataset with more than 2k identities and training purely on it would lead to overfitting. To this end, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network. Our method, which we call MICA (MetrIC fAce), outperforms the state-of-the-art reconstruction methods by a large margin, both on current non-metric benchmarks as well as on our metric benchmarks (15% and 24% lower average error on NoW, respectively). Project website: https://zielon.github.io/mica/
| Author(s): | Zielonka, Wojciech and Bolkart, Timo and Thies, Justus |
| Book Title: | Computer Vision – ECCV 2022 |
| Volume: | 13 |
| Pages: | 250--269 |
| Year: | 2022 |
| Month: | October |
| Series: | Lecture Notes in Computer Science, 13673 |
| Editors: | Avidan, Shai and Brostow, Gabriel and Cissé, Moustapha and Farinella, Giovanni Maria and Hassner, Tal |
| Publisher: | Springer |
| BibTeX Type: | Conference Paper (inproceedings) |
| Address: | Cham |
| DOI: | 10.1007/978-3-031-19778-9_15 |
| Event Name: | 17th European Conference on Computer Vision (ECCV 2022) |
| Event Place: | Tel Aviv, Israel |
| State: | Published |
| URL: | https://zielon.github.io/mica/ |
| ISBN: | 978-3-031-19777-2 |
BibTeX
@inproceedings{MICA:ECCV2022,
title = {Towards Metrical Reconstruction of Human Faces},
booktitle = {Computer Vision – ECCV 2022},
abstract = {Face reconstruction and tracking is a building block of numerous applications in AR/VR, human-machine interaction, as well as medical applications. Most of these applications rely on a metrically correct prediction of the shape, especially, when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size). A metrical reconstruction is also needed for any application that measures distances and dimensions of the subject (e.g., to virtually fit a glasses frame). State-of-the-art methods for face reconstruction from a single image are trained on large 2D image datasets in a self-supervised fashion. However, due to the nature of a perspective projection they are not able to reconstruct the actual face dimensions, and even predicting the average human face outperforms some of these methods in a metrical sense. To learn the actual shape of a face, we argue for a supervised training scheme. Since there exists no large-scale 3D dataset for this task, we annotated and unified small- and medium-scale databases. The resulting unified dataset is still a medium-scale dataset with more than 2k identities and training purely on it would lead to overfitting. To this end, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which
provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network. Our method, which we call MICA (MetrIC fAce), outperforms the state-of-the-art reconstruction methods by a large margin, both on current non-metric benchmarks as well as on our metric benchmarks (15\%\/ and 24\%\/ lower average error on NoW, respectively). Project website: \url{https://zielon.github.io/mica/}.},
volume = {13},
pages = {250--269},
series = {Lecture Notes in Computer Science, 13673},
editors = {Avidan, Shai and Brostow, Gabriel and Cissé, Moustapha and Farinella, Giovanni Maria and Hassner, Tal},
publisher = {Springer},
address = {Cham},
month = oct,
year = {2022},
author = {Zielonka, Wojciech and Bolkart, Timo and Thies, Justus},
doi = {10.1007/978-3-031-19778-9_15},
url = {https://zielon.github.io/mica/},
month_numeric = {10}
}