Xingyou (Richard) Song1,,*, Yash Akhauri2,3,,*, Jiyoun (Jen) Ha4,5,*, Bryan Lewandowski4,*,
David Smalling1, Jason Lowe-Power4, Jonathan Citrin1, David Lo4, Rami Cohen4, Julian Walker1, Lai Wei4, Subhashini Venugopalan2, Mohamed Abdelfattah3, Cheng-Hsi Lin4, Bartłomiej Wróblewski1, Suvinay Subramanian4, Daiyi Peng1,
Denny Zhou1, Ed Chi1, Quoc Le1, Jeff Dean1, Pushmeet Kohli1

1Google DeepMind      2Google Research      3Cornell University      4Google      5Stanford University

Equal Lead.      *Core Independent Contributor.

Intro

Given an observation of a complex system, what number(s) will it produce?

Figure: QIM Introduction

Historically, entire fields have resorted to traditional tabular regression, which represents all information as tables or, more precisely, as normalized fixed-dimensional vectors. But the world isn’t a table. Tabular methods can’t be applied to data of arbitrary sequence length, such as code, logs, or free-form text.

We instead represent numeric prediction as a sequence-to-sequence problem.

Figure: Method Overview

A compact encoder-decoder converts, or transduces, from the space of all observations into another: the space of all real numbers.
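To make the idea of decoding a real number as a token sequence concrete, here is a minimal sketch of one possible number tokenization: a sign token, a fixed count of mantissa digit tokens, and an exponent token. The token format and the names `encode_number` and `decode_number` are illustrative assumptions, not necessarily the scheme used by the released model.

```python
def encode_number(y: float, digits: int = 4) -> list[str]:
    """Map a float to tokens: sign, `digits` mantissa digits, exponent.

    Hypothetical token format, chosen only for illustration.
    """
    if y == 0.0:
        return ["<+>"] + ["<0>"] * digits + ["<E0>"]
    sign = "<+>" if y > 0 else "<->"
    # Scientific notation with `digits` significant digits, e.g. '1.234e+02'.
    mantissa, exponent = f"{abs(y):.{digits - 1}e}".split("e")
    digit_tokens = [f"<{d}>" for d in mantissa.replace(".", "")]
    return [sign] + digit_tokens + [f"<E{int(exponent)}>"]


def decode_number(tokens: list[str]) -> float:
    """Invert encode_number, up to the rounding applied during encoding."""
    sign = 1.0 if tokens[0] == "<+>" else -1.0
    digit_str = "".join(t[1:-1] for t in tokens[1:-1])
    exponent = int(tokens[-1][2:-1])
    # The first digit sits before the decimal point.
    mantissa = int(digit_str) / 10 ** (len(digit_str) - 1)
    return sign * mantissa * 10.0 ** exponent


tokens = encode_number(123.4)
value = decode_number(tokens)
```

Under this assumed scheme, the decoder's output vocabulary is small and fixed, and the round trip is lossy only in the digits beyond the chosen precision.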

Figure: Method Preview

By:

At inference, decoding numbers allows us to perform intuitive, or inductive, reasoning about the world.
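One way decoded numbers can yield both a point prediction and a density estimate is to sample the decoder several times for the same input and summarize the results. The sketch below assumes such samples are already available as a list of floats; `summarize_samples` is a hypothetical helper, and the histogram is a simple stand-in for whatever density estimator is actually used.

```python
import statistics


def summarize_samples(samples: list[float], bins: int = 5) -> dict:
    """Summarize repeated numeric decodes for a single input.

    Returns a point estimate (median) and a histogram-based density
    that integrates to 1 over [min(samples), max(samples)].
    """
    lo, hi = min(samples), max(samples)
    # Fall back to unit width when all samples are identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for s in samples:
        idx = min(int((s - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    density = [c / (len(samples) * width) for c in counts]
    return {
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "bin_width": width,
        "density": density,
    }


# Made-up samples standing in for repeated decodes of one input.
summary = summarize_samples([1.0, 1.1, 0.9, 1.0, 5.0])
```

The median is used as the point estimate here because it is less sensitive than the mean to an occasional outlying decode, such as the 5.0 in the made-up samples.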

Figure: Computational Approximation and Density Estimation

Applications

Across 10 different high-impact scientific and industrial problems spanning experimental design, code execution, healthcare, and physics, each application achieves at least one of:

  1. A new predictive capability not previously demonstrated.
  2. Better-than-SoTA performance without domain-specific architecture or feature engineering.
  3. Near-perfect simulation at orders-of-magnitude lower cost.
  4. Unified data scaling: Massive transfer-learning across different tasks.

Predicting ML Experiments from Code

Kaggle Experiment Scores

Hyperparameter Optimization Reduction

Up to 100x fewer experiments needed

Simplifying Neural Architecture Search

Zero expertise needed, achieve 48% against SoTA

GPU Kernel Optimization

16-100x fewer trials needed

Static Analysis for Memory

24+ different languages covered

CPU Microarchitecture Simulation

Explore $10^{20}$ hardware configurations quickly

TPU/LLM Pareto Frontier Generation

Latency + throughput tradeoffs for TPU/LLM co-design

Data Center Efficiency

Prediction from raw telemetry logs

Nuclear Fusion Surrogates

Novel inputs from raw code and configs

Cancer Survival Prediction

Combine 9+ modalities into one model

Code Availability

Code can be found in the open-source package (github.com/google-deepmind/regress-lm). The default model trains on a single H100 GPU with inputs of up to 32K tokens, and can also be made to run on consumer hardware by using single-layer encoders and decoders.

We provide the following Colabs and pretrained checkpoints for flagship result demos:

Pretraining data sources are listed in the paper.

Acknowledgements

We thank Yutian Chen, Chen Sun, Vinh Tran, Alexander Rush, Michael Brenner, Dara Bahri, Yifeng Lu, Jonathan Lai, and Zhiyu Wei for early feedback, reviewing, and support of the manuscript.

We further thank Chen Liang, Oscar Li, Fred Zhang, Xuezhi Wang, Erik Lin, Esteban Real, Bangding (Jeffrey) Yang, Jarrod Kahn, Yiding Jiang, Samuel Sokota, Yan (Bill) Huang, Victor Reis, Phitchaya Mangpo Phothilimthana, Jörg Bornschein, Tejas Karkhanis, Amir Yazdan Bakhsh, Sami Abu-El-Haija, Tung Nguyen, Eric Tang, Arissa Wongpanich, Shane Gu, Yingjie Miao, Qiuyi Zhang, Uri Alon, Shao-Hua Sun, Kuang-Huei Lee, Adrian N. Reyes, Zi Wang, Xinyun Chen, Aviral Kumar, Ke Xue, Rong-Xi Tan, Chansoo Lee, Michal Lukasik, Sagi Perel, and Daniel Golovin for relevant discussions.

We finally thank Parthasarathy Ranganathan, Amin Vahdat, Craig Donner, Martin Dixon, Shibl Mourad, Zoubin Ghahramani, and Benoit Schillings for support.

Citation

If you find this work useful, please cite:

@article{todo, title={TODO}, author={TODO}, journal={TODO}, year={TODO} }



Disclaimer: This is not an officially supported Google product.