OpenAI’s cofounder and former chief scientist, Ilya Sutskever, says “we’ve achieved peak data and there’ll be no more” and that pre-training as we know it will end. Pre-training is the phase of AI model development in which a large language model learns patterns from vast amounts of unlabeled data from the internet, books, and other sources. The industry is tapping out on new data to train on, which will force a shift away from the way models are trained. He compared the situation to fossil fuels: just as oil is a finite resource, the internet contains a finite amount of human-generated content. He suggested that just as evolution found a new scaling pattern for hominid brains, AI might similarly discover new approaches to scaling.
OpenAI’s cofounder and former chief scientist, Ilya Sutskever, made headlines earlier this year after he left to start his own AI lab called Safe Superintelligence Inc. He has avoided the limelight since his departure but made a rare public appearance in Vancouver on Friday at the Conference on Neural Information Processing Systems (NeurIPS).
“Pre-training as we know it will unquestionably end,” Sutskever said onstage. This refers to the first phase of AI model development, when a large language model learns patterns from vast amounts of unlabeled data — typically text from the internet, books, and other sources. During his NeurIPS talk, Sutskever said that, while he believes existing data can still take AI development farther, the industry is tapping out on new data to train on. This dynamic will, he said, eventually force a shift away from the way models are trained today. He compared the situation to fossil fuels: just as oil is a finite resource, the internet contains a finite amount of human-generated content.
“We’ve achieved peak data and there’ll be no more,” according to Sutskever. “We have to deal with the data that we have. There’s only one internet.”
Next-generation models, he predicted, are going to “be agentic in real ways.” Agents have become a real buzzword in the AI field. While Sutskever didn’t define them during his talk, they are commonly understood to be autonomous AI systems that perform tasks, make decisions, and interact with software on their own.
Along with being “agentic,” he said future systems will also be able to reason. Unlike today’s AI, which mostly pattern-matches based on what a model has seen before, future AI systems will be able to work things out step-by-step in a way that is more comparable to thinking. The more a system reasons, “the more unpredictable it becomes,” according to Sutskever. He compared the unpredictability of “truly reasoning systems” to how advanced AIs that play chess “are unpredictable to the best human chess players.”
“They will understand things from limited data,” he said. “They will not get confused.”
On stage, he drew a comparison between the scaling of AI systems and evolutionary biology, citing research that shows the relationship between brain and body mass across species. He noted that while most mammals follow one scaling pattern, hominids (human ancestors) show a distinctly different slope in their brain-to-body mass ratio on logarithmic scales.
He suggested that, just as evolution found a new scaling pattern for hominid brains, AI might similarly discover new approaches to scaling beyond how pre-training works today.
After Sutskever concluded his talk, an audience member asked him how researchers can create the right incentive mechanisms for humanity to create AI in a way that gives it “the freedoms that we have as Homo sapiens.”
“I feel like in some sense those are the kind of questions that people should be reflecting on more,” Sutskever responded. He paused for a moment before saying that he doesn’t “feel confident answering questions like this” because it would require a “top down government structure.” The audience member suggested cryptocurrency, which made others in the room chuckle.
“I don’t feel like I am the right person to comment on cryptocurrency but there is a chance what you [are] describing will happen,” Sutskever said. “You know, in some sense, it’s not a bad end result if you have AIs and all they want is to coexist with us and also just to have rights. Maybe that will be fine... I think things are so incredibly unpredictable. I hesitate to comment but I encourage the speculation.”
Dec 14, 2024
OpenAI Cofounder Says 'Peak Data' Reached, AI Training 'As We Know It' Ending
While tech may well be able to discover new ways to scale AI learning as 'peak data' limits new sources, the looming question is what this may do to AI's cost structure, already considered excessive. JL
Kylie Robison reports in The Verge: