A Blog by Jonathan Low

 

Oct 5, 2024

Companies Find Gen AI Models Lack Specific Enough Knowledge To Be Useful

On top of all the money they are spending to adapt gen AI models from OpenAI, Anthropic et al., organizations are discovering that without specific domain expertise the models are of little use, requiring an even bigger financial commitment to make them effective. 

The question is how much "augmentation," as it is called, will be required, and what it will cost on top of the initial expenditure. JL

Isabelle Bousquette reports in the Wall Street Journal:

As AI projects creep from the pilot-project stage into operations, corporate users are discovering that many AI models are about as useful out of the box as a new employee entering orientation. Generative AI’s foundation models can be trained on vast troves of data from the internet and other sources, but still lack deep, specific knowledge on topics. Companies are finding it is critical to augment today’s general models, like those offered by Anthropic or OpenAI, with more industry-specific or business-specific data if they’re going to be useful. A question is how much augmentation is enough to make models accurate and reliable enough for a specific use. 

Earlier this year, the PGA Tour’s digital chief witnessed ChatGPT make the digital equivalent of a double bogey when the chatbot flubbed a question on basic golf lore: How many times has Tiger Woods won the Tour?

Generative AI’s foundation models can be trained on vast troves of data from the internet and other sources, but still lack deep, specific knowledge even on topics as mainstream as golf, Scott Gutterman, the Tour’s senior vice president of digital and broadcast technology, realized.

“There’s missing data, there’s generalized data. Those things have just kind of led to generalized responses,” Gutterman said.

The PGA Tour is not alone. As AI projects creep from the pilot-project stage into operations, corporate users are discovering that many AI models are about as useful out of the box as a new employee entering orientation.

Companies are finding it is critical to augment today’s general models, like those offered by Anthropic or OpenAI, with more industry-specific or business-specific data if they’re going to be useful. (News Corp, owner of The Wall Street Journal, has a content-licensing partnership with OpenAI.)

But that augmentation presents a spectrum of options, where higher levels of accuracy and reliability also bring more costs and complexity, said Ritu Jyoti, general manager and group vice president of AI and data as well as global AI lead at research firm International Data Corp. And the augmentation only works if companies have an impeccable handle on their data, which can be difficult, Jyoti said.

Yet another question is how much augmentation it takes to make models accurate and reliable enough for a specific use. A range of companies, including consultants, cloud providers like Amazon Web Services and model makers like OpenAI itself, are positioning themselves to help. 

How to Make the AI Models Go to Work

Broad foundation models can require their own upskilling to make it in business. CIOs have a range of options for augmenting models with company- and industry-specific data. Here are some of the key approaches, according to IDC's Ritu Jyoti.

Retrieval Augmented Generation (RAG)

This is the process of optimizing the output of a large language model so that it references an authoritative knowledge base outside its training data before generating a response. RAG can provide a 20-40% increase in accuracy.
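To make the mechanics concrete, here is a minimal sketch of the RAG pattern in Python. The knowledge-base passages are invented for illustration, and a simple word-overlap retriever stands in for the vector search a production system would use; the assembled prompt would then be sent to whatever model the company has chosen.

```python
# A toy retrieve-then-generate pipeline. A real deployment would use a vector
# database and an LLM API call; here a word-overlap score stands in for the
# retriever and the final augmented prompt is simply printed.

def score(query: str, passage: str) -> int:
    """Naive relevance score: number of words the query and passage share."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages most relevant to the query."""
    ranked = sorted(knowledge_base, key=lambda p: score(query, p), reverse=True)
    return ranked[:top_k]

def build_augmented_prompt(query: str, knowledge_base: list[str]) -> str:
    """Assemble a prompt that tells the model to answer only from the
    retrieved passages -- the 'augmentation' step of RAG."""
    context = "\n".join(f"- {p}" for p in retrieve(query, knowledge_base))
    return (
        "Answer the question using only the reference passages below.\n"
        f"Reference passages:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

if __name__ == "__main__":
    # Hypothetical internal knowledge base, e.g. passages split out of a rules PDF.
    kb = [
        "Tiger Woods has 82 PGA Tour victories, tied for the most all time.",
        "Tiger Woods has won 15 major championships.",
        "The PGA Tour season ends with the FedExCup playoffs.",
    ]
    question = "How many times has Tiger Woods won the Tour?"
    print(build_augmented_prompt(question, kb))  # This prompt would go to the chosen LLM.
```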

Fine Tuning

This involves further training a large language model on proprietary data so that it becomes better at specific tasks. This can be a lot more effective than just prompt engineering and RAG, but it’s costly and complicated, requiring financial investment as well as specialized, in-demand talent.
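As a rough illustration of the proprietary-data side of fine-tuning, the sketch below writes a small set of chat-formatted training examples to a JSONL file, a format many providers accept for fine-tuning jobs. The examples and file name are hypothetical, and the upload and job-creation steps, which vary by provider, are only noted in comments.

```python
# A minimal sketch of preparing proprietary data for supervised fine-tuning.
import json

# Domain-specific question/answer pairs drawn from internal, verified sources
# (illustrative examples only).
training_examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a golf statistics assistant."},
            {"role": "user", "content": "How many PGA Tour events has Tiger Woods won?"},
            {"role": "assistant", "content": "Tiger Woods has 82 PGA Tour wins."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a golf statistics assistant."},
            {"role": "user", "content": "How many majors has Tiger Woods won?"},
            {"role": "assistant", "content": "Tiger Woods has won 15 major championships."},
        ]
    },
]

# Write one JSON object per line (JSONL), the layout most fine-tuning
# endpoints expect for training data.
with open("train.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")

# The resulting file would then be uploaded to the chosen provider and a
# fine-tuning job started against a base model; the exact API calls differ
# from provider to provider.
```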

Custom Building

Building a brand-new model from the ground up that’s designed for a specific use in a specific business could yield the most accurate and reliable answers, but it’s also the most costly and cumbersome option. It’s also risky since any missteps could reduce the level of accuracy.

IDC estimates that worldwide spending on AI, including AI-enabled applications, infrastructure and related IT and business services, will more than double by 2028 to $632 billion.

The PGA Tour said it is now using an approach known as retrieval augmented generation, or RAG, to avoid any future AI errors. It taps into Claude on the Amazon Web Services infrastructure, then inputs organization-specific information, for instance, a 190-page document containing the Tour’s rules. That way, it can ask a query and require the model to directly refer to the information in the document, rather than information culled from the internet.
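As a hedged sketch, not the Tour's actual implementation, the generation step of such a setup might look like the following, assuming Claude is reached through Amazon Bedrock via boto3. The retrieved excerpt from the rules document is placed in the prompt and the model is told to answer only from it; the model ID, region, question and excerpt are placeholders, and running it requires AWS credentials with Bedrock access.

```python
# Sketch: ask Claude (via Amazon Bedrock) a question grounded in retrieved
# text from an internal document, rather than in whatever it learned online.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

rules_excerpt = "..."  # text a RAG pipeline pulled from the internal rules document
question = "What is the penalty for a lost ball?"

prompt = (
    "Answer using only the excerpt from the rules document below. "
    "If the answer is not in the excerpt, say you don't know.\n\n"
    f"Excerpt:\n{rules_excerpt}\n\nQuestion: {question}"
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 500,
        "messages": [{"role": "user", "content": prompt}],
    }),
)
answer = json.loads(response["body"].read())["content"][0]["text"]
print(answer)  # In the workflow described above, a human reviews outputs before delivery.
```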

All outputs are still reviewed by a human before they’re delivered to players or customers, Gutterman said.

But the RAG approach can only take you so far, according to Will McQueen, vice president and head of data assets and analytics at the agriculture division of pharmaceutical and biotech giant Bayer.

“It’s, for sure, limited,” McQueen said. 

It works well enough for certain low-stakes uses, like answering new engineers’ questions in the onboarding process, he said. The stakes are much higher, though, when giving farmers advice on tending their crops. For that, the company might go to “fine-tuning,” McQueen said, or training parts of the model with proprietary data to get a big leap in accuracy and relevance of responses.

But fine-tuning can be even more expensive, require more specialized talent—and still fall short of 100% accuracy, Jyoti said. To help, model makers like OpenAI now offer business customers assistance with model fine-tuning and customization.

The highest level of accuracy available today is only possible when a company trains its own models from the ground up, Jyoti said. But the cost and talent required for that approach are prohibitive to most enterprises, she said.


Custom-building just a small language model, for instance, could run anywhere from $500,000 to millions of dollars, said Bayer’s McQueen. Ongoing maintenance adds another layer of expense.

Even so, Rocket Companies sees potential in the approach, said Shawn Malhotra, chief technology officer. The lending giant is exploring using an AI model to automatically fill in portions of mortgage applications. 

But there’s a lot of nuance in the language of home ownership that foundation models on their own wouldn’t typically understand, Malhotra said.

“It’s not simply just the name and number,” he said. “You’re worried about: What kind of dwelling is this? Is this a detached home? Is it a non-detached home?” 

Building a new small model can help the AI understand those nuances, Malhotra said.


“You may have to give examples of ‘when a dwelling is described in the following way, it maps to this kind of property; when it’s described in a different way, it’s matched to a different kind of property’.” 
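A hypothetical sketch of the kind of labeled mappings Malhotra describes, which could serve either as training data for a small custom model or as few-shot examples in a prompt. The categories and descriptions are invented for illustration.

```python
# Illustrative mappings from free-text dwelling descriptions to standardized
# property types, formatted here as few-shot examples for a language model.
dwelling_examples = [
    {"description": "Single-family house on its own lot, no shared walls",
     "property_type": "detached home"},
    {"description": "Townhouse sharing one wall with the neighboring unit",
     "property_type": "non-detached home"},
    {"description": "Unit on the 4th floor of a 12-story building with an HOA",
     "property_type": "condominium"},
]

def as_few_shot_prompt(examples: list[dict], new_description: str) -> str:
    """Format the labeled examples as a few-shot prompt for a language model."""
    shots = "\n".join(
        f"Description: {e['description']}\nProperty type: {e['property_type']}"
        for e in examples
    )
    return f"{shots}\nDescription: {new_description}\nProperty type:"

print(as_few_shot_prompt(dwelling_examples, "Duplex sharing a wall with one other unit"))
```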

Companies are still learning how to make the various approaches work and figuring out which ones make sense for which situations.

“Most customers are using RAG in one or another fashion,” said Sri Elaprolu, director at Amazon’s AWS GenAI Innovation Center. “Some customers are starting down the fine-tuning route—we’re starting to see that volume increase, rapidly. And more and more customers are exploring what it means to pre-train a model.” 

Industry-specific models are also emerging to help solve some of the customization complexities.

Legal tech company Luminance has an AI model that’s been trained specifically on over 150 million legal documents over the last 10 years. Getting something wrong in a legal context could be disastrous, said Luminance CEO Eleanor Lightbody. “That’s why having AI you’ve just trained on legal contracts is so, so important.” 

But the overall supply of these narrow AI models is still limited, said IDC’s Jyoti. For now, the task of augmenting and customizing is falling directly to companies. 

Like the PGA Tour, which found that ChatGPT would occasionally say Tiger had 15 PGA Tour wins. In fact, he’s had 82.

“Tiger’s won 15 majors. That is not the same thing as winning 82 PGA Tour events,” said Gutterman. “Beginning to get the models to understand the difference between major wins and PGA Tour wins was something that we saw we would need to do.”
