Efficiency of Large-Scale Machine Learning Systems

Following the discovery of scaling laws for large-scale unsupervised pretraining of large language models, the size of large language models and other state-of-the-art machine learning models has grown exponentially. This growth has enabled a wealth of new applications and downstream uses beyond expectations, but it has also come with a number of challenges.
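For context, a commonly cited parameterization of these scaling laws (in the style of Hoffmann et al.'s compute-optimal analysis) writes the achievable pretraining loss as a function of parameter count N and training tokens D; we include it only as a reference point, not as the specific form underlying the work discussed below:

```latex
% A commonly cited neural scaling law: E is the irreducible loss,
% N the parameter count, D the number of training tokens, and
% A, B, \alpha, \beta are constants fitted to observed training runs.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Under this form, loss decreases predictably as either model size or data grows, which is what makes further scaling attractive despite its cost.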
The first challenge with these models is engineering them for modern accelerator hardware, which poses a number of scientific questions about how machine learning algorithms should trade off computation, communication, and parallelization. In a recent collaboration under the University of Maryland's DoE INCITE allocation, we have investigated these questions at the macro scale in "Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers" [], and at the micro scale in "Efficiently Dispatching Flash Attention For Partially Filled Attention Masks" [].
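To illustrate the micro-scale question, the sketch below shows the general shape of mask-aware attention dispatch in PyTorch: when the attention mask is trivial (absent, or exactly causal), the call can be routed to a fused flash-style kernel that never materializes the mask, and only genuinely irregular, partially filled masks fall back to the general masked path. This is a minimal sketch of the idea, not the dispatching scheme of the cited paper; the function name and its structure are our own.

```python
import torch
import torch.nn.functional as F

def dispatch_attention(q, k, v, mask=None):
    """Minimal sketch of mask-aware attention dispatch (illustrative only).

    `mask` is a boolean tensor whose True entries mark key positions a
    query is allowed to attend to.
    """
    if mask is None:
        # No mask at all: use the fused (flash-style) kernel directly.
        return F.scaled_dot_product_attention(q, k, v)
    causal = torch.tril(torch.ones_like(mask))
    if torch.equal(mask, causal):
        # The mask is exactly lower-triangular: the fused causal kernel
        # computes the same result without materializing the mask.
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # Irregular, partially filled mask: general masked attention path.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

The point of the dispatch is that the common cases (no mask, causal mask) never pay the memory and bandwidth cost of a materialized mask; only the long tail of irregular masks does.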
Aside from operationalizing existing algorithms at scale, it remains unclear in a number of domains how best to scale the algorithms themselves. For example, how can object detection or video captioning systems be trained with scaling properties as favorable as those of language models? Over the last year, we proposed new algorithms for both domains in [] and [].
There are also a number of second-order effects of large-scale training that we are researching how to mitigate. Generative models often succeed at generating novel and relevant responses, but in some cases, in both vision and language, they repeat input text or input styles verbatim. Our recent work has addressed measuring style similarity in generative models [], so that these concerns can be audited more clearly, and has addressed mitigations for text repetition through modifications of the learning process [].
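As a crude illustration of what measuring repetition can mean in the text setting, the sketch below computes the fraction of n-grams in a generated string that occur verbatim in a reference text. This is a hypothetical helper written for exposition, not the metric of the cited work.

```python
def verbatim_ngram_rate(generated: str, reference: str, n: int = 8) -> float:
    """Fraction of length-n token windows in `generated` that appear
    verbatim in `reference`. A simple proxy for verbatim repetition;
    illustrative only, not the metric used in the cited papers."""
    gen = generated.split()
    ref = reference.split()
    ref_ngrams = {tuple(ref[i:i + n]) for i in range(len(ref) - n + 1)}
    gen_ngrams = [tuple(gen[i:i + n]) for i in range(len(gen) - n + 1)]
    if not gen_ngrams:
        return 0.0
    return sum(g in ref_ngrams for g in gen_ngrams) / len(gen_ngrams)
```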
Finally, the question for the future is not necessarily how to scale existing systems further, but how to measure progress [] and how to endow scaled systems with new capabilities and mechanisms of access and manipulation [].