Towards Metrical Reconstruction of Human Faces

Neural Capture and Synthesis

Wojciech Zielonka

Doctoral Researcher

Perceiving Systems

Timo Bolkart

Research Scientist

Neural Capture and Synthesis, Perceiving Systems

Justus Thies

Max Planck Research Group Leader

Face reconstruction and tracking is a building block of numerous applications in AR/VR, human-machine interaction, as well as medical applications. Most of these applications rely on a metrically correct prediction of the shape, especially, when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size). A metrical reconstruction is also needed for any application that measures distances and dimensions of the subject (e.g., to virtually fit a glasses frame). State-of-the-art methods for face reconstruction from a single image are trained on large 2D image datasets in a self-supervised fashion. However, due to the nature of a perspective projection they are not able to reconstruct the actual face dimensions, and even predicting the average human face outperforms some of these methods in a metrical sense. To learn the actual shape of a face, we argue for a supervised training scheme. Since there exists no large-scale 3D dataset for this task, we annotated and unified small- and medium-scale databases. The resulting unified dataset is still a medium-scale dataset with more than 2k identities and training purely on it would lead to overfitting. To this end, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network. Our method, which we call MICA (MetrIC fAce), outperforms the state-of-the-art reconstruction methods by a large margin, both on current non-metric benchmarks as well as on our metric benchmarks (15% and 24% lower average error on NoW, respectively). Project website: https://zielon.github.io/mica/.
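The abstract's remark about perspective projection can be illustrated with a small sketch. It is not taken from the paper or its code: the landmark coordinates, the focal length, and the helper names `project` and `scale_about_camera` are all assumptions chosen for illustration. Under a simple pinhole model, a face that is 20% larger and 20% farther from the camera produces the same image points, so 2D supervision alone cannot determine metric size.

```python
def project(points, focal_length):
    """Pinhole projection of camera-space points (x, y, z) onto the image plane."""
    return [(focal_length * x / z, focal_length * y / z) for x, y, z in points]


def scale_about_camera(points, s):
    """Scale a point set by s about the camera center (size and distance both grow by s)."""
    return [(s * x, s * y, s * z) for x, y, z in points]


# Hypothetical landmarks in metres (two eye centers and a chin point),
# roughly half a metre in front of the camera.
face = [(-0.03, 0.02, 0.50), (0.03, 0.02, 0.50), (0.00, -0.04, 0.48)]
bigger_face = scale_about_camera(face, 1.2)  # 20% larger and 20% farther away

focal_length = 1000.0  # in pixels (assumed value)
image_a = project(face, focal_length)
image_b = project(bigger_face, focal_length)

# Largest difference between corresponding image coordinates.
max_diff = max(
    abs(a - b) for pa, pb in zip(image_a, image_b) for a, b in zip(pa, pb)
)

# Metric eye-to-eye distance of each 3D face, in metres.
eye_dist_a = face[1][0] - face[0][0]
eye_dist_b = bigger_face[1][0] - bigger_face[0][0]

print(max_diff)                # zero up to floating-point rounding
print(eye_dist_a, eye_dist_b)  # the metric sizes differ by the factor 1.2
```

The two projections agree up to rounding while the metric eye distances differ, which is the ambiguity that motivates supervision with 3D data of known scale.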

Author(s):	Zielonka, Wojciech and Bolkart, Timo and Thies, Justus
Links:	pdf project video code
Book Title:	Computer Vision – ECCV 2022
Volume:	13
Pages:	250--269
Year:	2022
Month:	October

Series:	Lecture Notes in Computer Science, 13673
Editors:	Avidan, Shai and Brostow, Gabriel and Cissé, Moustapha and Farinella, Giovanni Maria and Hassner, Tal
Publisher:	Springer

BibTeX Type:	Conference Paper (inproceedings)

Address:	Cham
DOI:	10.1007/978-3-031-19778-9_15
Event Name:	17th European Conference on Computer Vision (ECCV 2022)
Event Place:	Tel Aviv, Israel
State:	Published
URL:	https://zielon.github.io/mica/

Electronic Archiving:	grant_archive
ISBN:	978-3-031-19777-2

BibTeX

@inproceedings{MICA:ECCV2022,
  title = {Towards Metrical Reconstruction of Human Faces},
  booktitle = {Computer Vision – ECCV 2022},
  abstract = {Face reconstruction and tracking is a building block of numerous applications in AR/VR, human-machine interaction, as well as medical applications. Most of these applications rely on a metrically correct prediction of the shape, especially, when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size). A metrical reconstruction is also needed for any application that measures distances and dimensions of the subject (e.g., to virtually fit a glasses frame). State-of-the-art methods for face reconstruction from a single image are trained on large 2D image datasets in a self-supervised fashion. However, due to the nature of a perspective projection they are not able to reconstruct the actual face dimensions, and even predicting the average human face outperforms some of these methods in a metrical sense. To learn the actual shape of a face, we argue for a supervised training scheme. Since there exists no large-scale 3D dataset for this task, we annotated and unified small- and medium-scale databases. The resulting unified dataset is still a medium-scale dataset with more than 2k identities and training purely on it would lead to overfitting. To this end, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which
  provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network. Our method, which we call MICA (MetrIC fAce), outperforms the state-of-the-art reconstruction methods by a large margin, both on current non-metric benchmarks as well as on our metric benchmarks (15\%\/ and 24\%\/ lower average error on NoW, respectively). Project website: \url{https://zielon.github.io/mica/}.},
  volume = {13},
  pages = {250--269},
  series = {Lecture Notes in Computer Science, 13673},
  editor = {Avidan, Shai and Brostow, Gabriel and Cissé, Moustapha and Farinella, Giovanni Maria and Hassner, Tal},
  publisher = {Springer},
  address = {Cham},
  month = oct,
  year = {2022},
  author = {Zielonka, Wojciech and Bolkart, Timo and Thies, Justus},
  doi = {10.1007/978-3-031-19778-9_15},
  url = {https://zielon.github.io/mica/},
  month_numeric = {10}
}