Program supports exceptional graduate students working on innovative research in computer science and related fields
Tübingen – Vivian Nastl and Ricardo Dominguez-Olmedo, both PhD students at the Max Planck Institute for Intelligent Systems in Tübingen, were announced as recipients of this year’s Google PhD Fellowship.
These fellowships recognize outstanding graduate students who are conducting exceptional and innovative research in computer science and related fields, with a focus on candidates who seek to shape the future of technology. The program provides direct financial support for their PhD studies and connects each Fellow with a dedicated Google Research Mentor, reflecting Google’s commitment to nurturing the academic community. Google welcomes this global cohort and looks forward to partnering with the Fellows as they grow into leaders in their respective fields.
See the complete list of Google PhD Fellowship recipients for 2025. Ricardo can be found in the “Machine Learning and ML Foundations” section, while Vivian is listed in the “Human-Computer Interaction” category.
Vivian is supervised by Moritz Hardt, who leads the Social Foundations of Computation Department at MPI-IS, as well as by Nicolai Meinshausen and Peter Bühlmann, who are both Professors of Statistics at ETH Zurich.
Vivian is enrolled in the doctoral program of the Max Planck ETH Center for Learning Systems (CLS), a joint academic program between ETH Zurich and the Max Planck Society. With a background in financial mathematics, she studies statistical methods for applied machine learning, with a focus on causal inference and evaluation.
Vivian’s work showcases a deep understanding of the theory of causality and its practical applications. In her first-author paper published at NeurIPS 2024, “Do causal predictors generalize better to new domains?”, she studied a recent hypothesis stating that causal features improve domain generalization. She showed that, across many models, datasets and domains, models trained on all features (regardless of causal relationships) generalize better to new domains than models trained only on causal features. Her work blends cutting-edge techniques such as causal discovery algorithms and state-of-the-art deep learning models in a principled and extensive experimental design.
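For a concrete flavor of the comparison, here is a minimal toy sketch. It is our illustration, not the paper’s experimental setup: the data-generating process, feature names and logistic-regression models are assumptions chosen to mimic the regime the paper identifies, in which non-causal features shift across domains but remain informative.

```python
# Minimal toy sketch (assumed data-generating process, not the paper's
# datasets or models): compare a predictor trained only on the causal
# parent of the label with one trained on all features, when a non-causal
# feature shifts across domains but remains informative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_domain(n, effect_scale, effect_noise):
    y = rng.integers(0, 2, n)
    x_causal = y + rng.normal(0.0, 1.0, n)  # stable causal mechanism
    x_effect = effect_scale * y + rng.normal(0.0, effect_noise, n)  # shifts by domain
    return np.column_stack([x_causal, x_effect]), y

X_tr, y_tr = sample_domain(20_000, effect_scale=2.0, effect_noise=0.5)  # train domain
X_te, y_te = sample_domain(20_000, effect_scale=1.2, effect_noise=0.8)  # new domain

causal_only = LogisticRegression().fit(X_tr[:, :1], y_tr)
all_features = LogisticRegression().fit(X_tr, y_tr)

print(f"causal-only  test accuracy: {causal_only.score(X_te[:, :1], y_te):.3f}")
print(f"all-features test accuracy: {all_features.score(X_te, y_te):.3f}")
```

In this toy, the all-features model keeps an edge on the shifted domain because the effect of the label remains predictive there, which is the intuition behind the paper’s empirical finding.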
In her recent work, “Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data”, Vivian studies the statistical limits and promises of annotations, which are increasingly a bottleneck in the evaluation of large language models. Researchers try to use models as judges (LLM-as-judge) to evaluate other models. Unfortunately, LLM judges have numerous biases that limit their usefulness in practice. Recent methods promise to debias model evaluations using only a small number of human ground-truth evaluations. In a remarkable result with Florian Dorner, Vivian showed that no such method can do better than simply collecting twice as many human evaluations. Awarded an oral presentation at ICLR 2025, the result is as surprising as it is timely.
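To make the setting concrete, here is a toy Monte Carlo sketch of one standard debiasing scheme: score a large pool with the judge, then correct the judge’s bias on a small human-labeled subset. All quantities (pass rate, judge error rate, sample sizes) are assumptions for illustration; this is a simulation consistent with the paper’s message, not its proof.

```python
# Toy Monte Carlo sketch (all quantities assumed): debias judge scores with
# a small human-labeled subset, then compare against simply collecting twice
# as many human labels.
import numpy as np

rng = np.random.default_rng(1)
theta = 0.6          # assumed true pass rate of the evaluated model
q = 0.2              # assumed probability that the judge flips the verdict
n, N, trials = 500, 20_000, 1_000   # human labels, judge pool, repetitions

errors = {"human (n)": [], "debiased (n human + N judge)": [], "human (2n)": []}
for _ in range(trials):
    h = rng.random(n) < theta                    # human labels on n items
    s = np.where(rng.random(n) < q, ~h, h)       # judge scores on the same items
    g = rng.random(N) < theta                    # latent truth on the judge pool
    s_pool = np.where(rng.random(N) < q, ~g, g)  # judge scores on the big pool
    h2 = rng.random(2 * n) < theta               # twice as many human labels

    errors["human (n)"].append(h.mean() - theta)
    # Bias-corrected estimate: judge mean on the pool, plus the average
    # human-vs-judge discrepancy measured on the small labeled subset.
    errors["debiased (n human + N judge)"].append(
        s_pool.mean() + (h.astype(float) - s).mean() - theta
    )
    errors["human (2n)"].append(h2.mean() - theta)

for name, errs in errors.items():
    print(f"{name:>28}: RMSE = {np.sqrt(np.mean(np.square(errs))):.4f}")
```

With a judge of moderate accuracy, the debiased estimate improves only slightly on n human labels and stays well short of what 2n human labels achieve, in line with the factor-of-two limit.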
In another paper, titled “Causal Inference from Competing Treatments”, Vivian and her collaborator Ana Stoica study a common yet overlooked issue in applying causal inference on digital platforms. At any point in time, multiple experimenters may be working with the same candidate pool. Consider, for example, a group of advertisers each trying to estimate the effectiveness of their own campaigns. Because ads (i.e., treatments) compete for space on screen, the treatment choices of one experimenter interfere with those of the others. Ana and Vivian characterize the optimal causal inference strategy at equilibrium when experimenters act strategically. The work addresses an important problem in experimentation on digital platforms, relevant to any team running A/B tests for online services, and it weaves together tools from economics and causality, bringing game-theoretic concepts into the standard statistical inference toolkit.
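A toy simulation conveys the core friction. Everything here is assumed for illustration (the click model, lift values and treatment probabilities are made up); it is a sketch of the shared-pool setting, not the paper’s equilibrium analysis.

```python
# Toy simulation (assumed outcome model, not the paper's analysis): two
# experimenters A and B randomize treatments over the same user pool, and
# A's naive effect estimate shifts with B's treatment probability because
# the two ads compete for attention on screen.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000  # shared candidate pool

def a_estimate(p_b):
    a = rng.random(n) < 0.5          # A shows its ad to a random half
    b = rng.random(n) < p_b          # B treats with probability p_b
    # Assumed click model: A's ad lifts clicks by 1.0 when shown alone,
    # but only by 0.3 when B's ad shares the screen; B's ad adds 0.5.
    y = 1.0 * (a & ~b) + 0.3 * (a & b) + 0.5 * b + rng.normal(0.0, 1.0, n)
    return y[a].mean() - y[~a].mean()  # A's difference-in-means estimate

for p_b in (0.1, 0.5, 0.9):
    print(f"B treats with prob {p_b}: A estimates a lift of {a_estimate(p_b):.3f}")
```

Experimenter A’s estimate shifts with B’s treatment probability, so what A measures depends on B’s strategy; this interdependence is what motivates the game-theoretic analysis.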
Vivian grew up near Stuttgart and studied financial mathematics at the University of Konstanz. During her Master’s, she shifted her focus to statistics, benefiting from an interdisciplinary course of study spanning mathematics and economics; she also earned a second Bachelor’s degree in mathematics.
Photo: Ricardo Dominguez-Olmedo (right).
Meanwhile, Ricardo is a PhD student in both the Social Foundations of Computation and the Empirical Inference Departments at the Max Planck Institute for Intelligent Systems, working with Moritz Hardt and Bernhard Schölkopf. His research focuses broadly on language models.
In one of his most recent papers, published at the Thirteenth International Conference on Learning Representations (ICLR 2025), Ricardo studies a fundamental problem in the evaluation of large language models that he and his team call “training on the test task”. Unlike improper practices such as training on the test data, leakage, or data contamination, training on the test task is not a malpractice. Rather, the term describes a growing set of practices that use knowledge about evaluation tasks at training time. Ricardo, Florian Dorner and Moritz Hardt demonstrate that training on the test task confounds both relative model evaluations and claims about emergent capabilities. They argue that the seeming superiority of one model family over another may be explained by differing degrees of training on the test task. To adjust for this effect on benchmark evaluations, Ricardo proposes a simple and effective method: fine-tune each model under comparison on the same task-relevant data prior to evaluation. The authors then show that instances of emergent behavior gradually disappear as models train on the test task. The work promotes a new perspective on the evaluation of large language models, with broad implications for benchmarking and the study of emergent capabilities.
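A sketch of that adjustment protocol might look as follows. The model names, task examples and training settings are placeholders, not the paper’s setup; the point is only that every model under comparison receives the same task-relevant fine-tuning before benchmarking.

```python
# Sketch of the adjustment (model names, task examples and training settings
# are placeholders, not the paper's setup): fine-tune every model under
# comparison on the SAME task-relevant data before benchmarking it.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

task_examples = [  # identical task-formatted data for every model (placeholder)
    "Question: What is 2 + 2?\nAnswer: 4",
    "Question: What is the capital of France?\nAnswer: Paris",
]

class TaskDataset(torch.utils.data.Dataset):
    def __init__(self, texts, tokenizer):
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        self.input_ids = enc["input_ids"]
        self.attention_mask = enc["attention_mask"]
        self.labels = self.input_ids.clone()
        self.labels[self.attention_mask == 0] = -100  # ignore padding in the loss
    def __len__(self):
        return self.input_ids.size(0)
    def __getitem__(self, i):
        return {"input_ids": self.input_ids[i],
                "attention_mask": self.attention_mask[i],
                "labels": self.labels[i]}

def adjusted_eval(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style models lack a pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"adjusted-{model_name}",
                               num_train_epochs=1,
                               per_device_train_batch_size=2,
                               report_to="none"),
        train_dataset=TaskDataset(task_examples, tokenizer),
    )
    trainer.train()          # identical task exposure for every model
    # ...then score `model` with your usual benchmark harness.

for name in ["gpt2", "distilgpt2"]:  # models under comparison (placeholders)
    adjusted_eval(name)
```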