Written by
Kristine Lloyd, Princeton Laboratory for Artificial Intelligence
May 1, 2025

As scaling on existing data becomes increasingly unsustainable, researchers must find algorithmic innovations to “bend” scaling laws and continue the rapid advancement of large language models, Yejin Choi said as part of the Princeton Laboratory for Artificial Intelligence’s ongoing Distinguished Lecture Series.

“It might be that the era of brute force scaling is over, and the era of smart scaling begins,” she said. “Therefore, we can now do what computer science is all about, which is algorithms.”

The AI Lab’s Distinguished Lecture Series aims to bring scholars to campus whose research demonstrates the transformative impact AI could have across disciplines. The final Distinguished Lecture of the semester will be held Friday, May 2, featuring DeepMind Vice President for Research Pushmeet Kohli.

Choi, an incoming professor of computer science at Stanford University, was named among the Time100 Most Influential People in AI, and in 2022 received the prestigious MacArthur Fellowship, commonly called the “genius grant.”

“She is one of the brightest stars of AI and language models,” said Sanjeev Arora, director of Princeton Language and Intelligence, which hosted the event. 

Earlier in her career, she was a professor of computer science at the University of Washington, where she was also an adjunct in the linguistics department and an affiliate of the Center for Statistics and the Social Sciences. She earned her Ph.D. from Cornell University.

The Distinguished Lecture, held April 11 in the Friend Center, was based on Choi’s new research from her time as a senior director at Nvidia, she said. She agrees with other prominent researchers who believe that retraining, the process of updating a large language model by training it again with a new dataset, will soon no longer be useful. 

“We humans are not writing internet data fast enough for [large language models] to train more,” Choi said. “So how do we cope with this situation?”

In her talk, Choi described recent research on methods for enhancing synthetic data, symbolic search algorithms for test-time reasoning, test-time training that lets a model keep learning even as it is being evaluated, and a new tokenization algorithm for better and faster inference.

“In the end, I think the scaling of intelligence will continue, but we as a community, I hope, will do it in a more exciting, smart way,” Choi said.