Simulating 500 million years of evolution with a language model
More than 3 billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here, we show that language models trained at scale on evolutionary data can generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins. ESM3 can follow complex prompts combining its modalities and is highly responsive to alignment to improve its fidelity. We have prompted ESM3 to generate fluorescent proteins. Among the generations that we synthesized, we found a bright fluorescent protein at a far distance (58% sequence identity) from known fluorescent proteins, which we estimate is equivalent to simulating 500 million years of evolution.
Simulating 500 million years of evolution with a language model
Thomas Hayes, Roshan Rao, Halil Akin et al.|bioRxiv (Cold Spring Harbor Laboratory)|2024
More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained on tokens generated by evolution can act as evolutionary simulators to generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins. ESM3 can follow complex prompts combining its modalities and is highly responsive to biological alignment. We have prompted ESM3 to generate fluorescent proteins with a chain of thought. Among the generations that we synthesized, we found a bright fluorescent protein at far distance (58% identity) from known fluorescent proteins. Similarly distant natural fluorescent proteins are separated by over five hundred million years of evolution.
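The identity figure quoted in this abstract can be made concrete with a small sketch. It assumes two sequences that have already been aligned to equal length and counts matching residues over the columns where neither has a gap. The paper may use a different alignment tool or denominator; the function name `percent_identity` and the toy strings below are hypothetical and are not real fluorescent protein sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: the inputs are already aligned (equal length, gaps marked
    with `gap`). This is one common convention, not necessarily the paper's.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where those residues are identical
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up aligned strings used only to exercise the function.
toy_a = "MSKGEELFTG"
toy_b = "MSKGAELF-G"
toy_identity = percent_identity(toy_a, toy_b)
```

Under this convention, a value of 58 would mean that 58 of every 100 ungapped aligned positions carry the same residue in both proteins.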
Inducing structure in reward learning by learning features
Andreea Bobu, Marius Wiggert, Claire J. Tomlin et al.|The International Journal of Robotics Research|2022
Reward learning enables robots to learn adaptable behaviors from human input. Traditional methods model the reward as a linear function of hand-crafted features, but that requires specifying all the relevant features a priori, which is impossible for real-world tasks. To get around this issue, recent deep Inverse Reinforcement Learning (IRL) methods learn rewards directly from the raw state but this is challenging because the robot has to implicitly learn the features that are important and how to combine them, simultaneously. Instead, we propose a divide-and-conquer approach: focus human input specifically on learning the features separately, and only then learn how to combine them into a reward. We introduce a novel type of human input for teaching features and an algorithm that utilizes it to learn complex features from the raw state space. The robot can then learn how to combine them into a reward using demonstrations, corrections, or other reward learning frameworks. We demonstrate our method in settings where all features have to be learned from scratch, as well as where some of the features are known. By first focusing human input specifically on the feature(s), our method decreases sample complexity and improves generalization of the learned reward over a deep IRL baseline. We show this in experiments with a physical 7-DoF robot manipulator, and in a user study conducted in a simulated environment.
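The "reward as a linear function of features" mentioned in this abstract can be sketched as a weighted sum of feature values. This is only an illustration of that general form, not the authors' implementation: the two feature functions, the state layout `(x, y, z)`, and the weights are all hypothetical.

```python
import math
from typing import Callable, Sequence

State = Sequence[float]
Feature = Callable[[State], float]


def linear_reward(state: State, features: Sequence[Feature],
                  weights: Sequence[float]) -> float:
    """Reward as a weighted sum of feature values: sum_i w_i * phi_i(state)."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature")
    return sum(w * phi(state) for phi, w in zip(features, weights))


# Hypothetical features over a state laid out as (x, y, z).
def height(state: State) -> float:
    """Height above the table, assumed to be the z coordinate."""
    return state[2]


def planar_distance(state: State) -> float:
    """Distance from the origin in the x-y plane."""
    return math.hypot(state[0], state[1])


# Hypothetical weights: penalize height, reward planar distance.
example_reward = linear_reward((3.0, 4.0, 2.0),
                               [height, planar_distance],
                               [-1.0, 0.5])
```

In the divide-and-conquer setup the abstract describes, the feature functions would themselves be learned from human input first, and only the combination step would resemble the weighted sum above.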
Hamilton-Jacobi Multi-Time Reachability
Manan Doshi, Manmeet S. Bhabra, Marius Wiggert et al.|2022 IEEE 61st Conference on Decision and Control (CDC)|2022
For the analysis of dynamical systems, it is fundamental to determine all states that can be reached at any given time. In this work, we obtain and apply new governing equations for reachability analysis over multiple start and terminal times all at once, and for systems operating in time-varying environments with dynamic obstacles and any other relevant dynamic fields. The theory and schemes are developed for both backward and forward reachable tubes with time-varying target and start sets. The resulting value functions elegantly capture not only the reachable tubes but also time-to-reach and time-to-leave maps as well as start time vs. duration plots and other useful secondary quantities for optimal control. We discuss the numerical schemes and computational efficiency. We first verify our results in an environment with a moving target and obstacle where reachability tubes can be analytically computed. We then consider the Dubins car problem extended with a moving target and obstacle. Finally, we showcase our multi-time reachability in a non-hydrostatic bottom gravity current system. Results highlight the novel capabilities of exact multi-time reachability in dynamic environments.
RAPID-MOLT: A Meso-scale, Open-source, Low-cost Testbed for Robot Assisted Precision Irrigation and Delivery
To study the automation of plant-level precision irrigation, specifically learning-based irrigation controllers, we present a modular, open-source testbed that enables real-time, fine-grained data collection and irrigation actuation. RAPID-MOLT costs USD 600 and occupies 0.37 m² of floor space. The functionality of the platform is evaluated by measuring the correlation between plant growth (Leaf Area Index) and water stress (Crop Water Stress Index) with irrigation volume. In line with biological studies, the observed plant growth is positively correlated with irrigation volume while water stress is negatively correlated. Construction directions, experimental data, CAD models, and related software are available at github.com/BerkeleyAutomation/RAPID-MOLT.
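The evaluation described in this abstract rests on the sign of a correlation between irrigation volume and a plant measurement. A minimal sketch of that check is given here, assuming a Pearson correlation coefficient (the abstract does not say which coefficient was used). The function name `pearson_r` is hypothetical, and the numbers are synthetic placeholders chosen to be perfectly linear, not data from the testbed.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Synthetic placeholder values, NOT measurements from RAPID-MOLT.
irrigation_volume = [1.0, 2.0, 3.0, 4.0]
leaf_area_index = [2.0, 4.0, 6.0, 8.0]     # rises with irrigation
water_stress_index = [8.0, 6.0, 4.0, 2.0]  # falls with irrigation

r_growth = pearson_r(irrigation_volume, leaf_area_index)
r_stress = pearson_r(irrigation_volume, water_stress_index)
```

With real measurements the coefficients would fall strictly between -1 and 1; the pattern reported in the abstract corresponds to a positive value for growth and a negative value for water stress.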