My research focuses on exploring and analyzing the underlying causes behind human movement. I believe that language and movement are entwined. Hence, I aim to understand the emotions, goals, and plans of people in multimodal environments. To this end, I combine language with human movement. I am supervised by Michael Black, Director at the MPI for Intelligent Systems.
Doctor of Philosophy (Ph.D.) (September 2019 - now)
Much of the field has focused on estimating 2D joints, 3D joints, or the skeleton of the body. We focus on estimating the full 3D shape and pose. This is crucial for reasoning about interactions. Having the ability to do so from RGB images enables markerless motion capture and provides the foundation for h...
In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 5252-5262, IEEE, June 2020 (inproceedings)
Human motion is fundamental to understanding behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methods fail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose “Video Inference for Body Pose and Shape Estimation” (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a temporal network architecture and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. We perform extensive experimentation to analyze the importance of motion and demonstrate the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving state-of-the-art performance. Code and pretrained models are available at https://github.com/mkocabas/VIBE
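As a rough illustration of sequence-level adversarial training, the sketch below computes a least-squares adversarial objective from discriminator scores. The least-squares form, the function names, and the toy scores are assumptions made for illustration and are not taken from the abstract; the released code at the URL above is the authoritative implementation.

```python
import numpy as np

def discriminator_loss(real_scores, fake_scores):
    """Least-squares loss (assumed form): push scores of real motion
    sequences toward 1 and scores of regressed sequences toward 0."""
    real_scores = np.asarray(real_scores, dtype=float)
    fake_scores = np.asarray(fake_scores, dtype=float)
    return float(np.mean((real_scores - 1.0) ** 2) + np.mean(fake_scores ** 2))

def generator_adversarial_loss(fake_scores):
    """Adversarial term for the regressor: push scores of its own
    output sequences toward 1, i.e. toward looking like real motion."""
    fake_scores = np.asarray(fake_scores, dtype=float)
    return float(np.mean((fake_scores - 1.0) ** 2))

# Hypothetical discriminator outputs, one score per motion sequence.
real_scores = [0.9, 0.8, 1.0]   # sequences drawn from motion capture data
fake_scores = [0.2, 0.1, 0.3]   # sequences produced by the regressor

d_loss = discriminator_loss(real_scores, fake_scores)
g_loss = generator_adversarial_loss(fake_scores)
```

In this toy setting the discriminator already separates the two sets well, so `d_loss` is small while `g_loss` is large; training the regressor would aim to reduce `g_loss`.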
In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), June 2019 (inproceedings)
In traditional Distributional Semantic Models (DSMs) the multiple senses of a polysemous word are conflated into a single vector space representation. In this work, we propose a DSM that learns multiple distributional representations of a word based on different topics. First, a separate DSM is trained for each topic and then each of the topic-based DSMs is aligned to a common vector space. Our unsupervised mapping approach is motivated by the hypothesis that words preserving their relative distances in different topic semantic sub-spaces constitute robust semantic anchors that define the mappings between them. Aligned cross-topic representations achieve state-of-the-art results for the task of contextual word similarity. Furthermore, evaluation on NLP downstream tasks shows that multiple topic-based embeddings outperform single-prototype models.
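The alignment step can be pictured with a small sketch: given vectors for a set of anchor words in a topic-specific space and in the common space, estimate a linear map between them. The sketch below uses an orthogonal Procrustes solution, which is an assumption chosen for illustration; the abstract does not state the form of the mapping or how anchors are selected, and all names and data here are hypothetical.

```python
import numpy as np

def align_to_common_space(topic_anchors, common_anchors):
    """Return the orthogonal matrix W minimizing
    ||topic_anchors @ W - common_anchors||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(topic_anchors.T @ common_anchors)
    return u @ vt

# Synthetic anchors: the topic space is a rotated copy of the common space.
rng = np.random.default_rng(0)
common = rng.normal(size=(50, 8))                    # 50 anchor words, 8 dimensions
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
topic = common @ rotation.T

W = align_to_common_space(topic, common)
aligned = topic @ W
max_error = float(np.abs(aligned - common).max())
```

Because the synthetic topic space is an exact rotation of the common space, the recovered map brings the anchors back onto their common-space vectors up to numerical precision. Real topic-based spaces would only align approximately.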
In International Conference on Computational Linguistics (COLING), August 2018 (inproceedings)
Neural activation models that have been proposed in the literature use a set of example words for which fMRI measurements are available in order to find a mapping between word semantics and localized neural activations. Successful mappings let us expand to the full lexicon of concrete nouns using the assumption that similarity of meaning implies similar neural activation patterns. In this paper, we propose a computational model that estimates semantic similarity in the neural activation space, and we investigate the relative performance of this model for various natural language processing tasks. Despite the simplicity of the proposed model and the very small number of example words used to bootstrap it, the neural activation semantic model performs surprisingly well compared to state-of-the-art word embeddings. Specifically, the neural activation semantic model performs better than the state-of-the-art for the task of semantic similarity estimation between very similar or very dissimilar words, while performing well on other tasks such as entailment and word categorization. These are strong indications that neural activation semantic models can not only shed some light on human cognition but also contribute to computational models for certain tasks.
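To make the idea of similarity in the neural activation space concrete, the sketch below compares words by the cosine of their activation vectors. Cosine similarity, the four-dimensional vectors, and the example words are all assumptions for illustration; the abstract does not specify the similarity measure or the dimensionality of the activation patterns.

```python
import numpy as np

def activation_similarity(a, b):
    """Cosine similarity between two activation vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical predicted activation patterns (toy values, not fMRI data).
activations = {
    "dog": [0.9, 0.1, 0.4, 0.0],
    "cat": [0.8, 0.2, 0.5, 0.1],
    "hammer": [0.1, 0.9, 0.0, 0.6],
}

sim_dog_cat = activation_similarity(activations["dog"], activations["cat"])
sim_dog_hammer = activation_similarity(activations["dog"], activations["hammer"])
```

With these toy vectors, the two animal words end up closer to each other than either is to the tool word, which mirrors the assumption that similar meanings imply similar activation patterns.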
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems.