A Blog by Jonathan Low

 

Feb 18, 2016

The Best Artificial Intelligence Still Flunks 8th Grade Science

But just because it isn't all that smart doesn't mean it isn't capable of taking jobs from humans who might also fit that description. JL

Cade Metz reports in Wired:

8th-grade science asks students to solve problems that require several steps, and combine multiple facts to show understanding. So, we’ve yet to build a machine that’s even sorta close to real intelligence.
In 2012, IBM Watson went to medical school. So said The New York Times, announcing that the tech giant’s artificially intelligent question-and-answer machine had begun a “stint as a medical student” at the Cleveland Clinic Lerner College of Medicine.
This was just a metaphor. Clinicians were helping IBM train Watson for use in medical research. But as metaphors go, it wasn’t a very good one. Three years later, our artificially intelligent machines can’t even pass an eighth-grade science test, much less go to medical school.
So says Oren Etzioni, a professor of computer science at the University of Washington and the executive director of the Allen Institute for Artificial Intelligence, the AI think tank funded by Microsoft co-founder Paul Allen. Etzioni and the not-for-profit Allen Institute recently ran a contest, inviting nearly 800 teams of researchers to build AI systems that could take an eighth-grade science test, and today, the Institute released the results: The top performers successfully answered about 60 percent of the questions. In other words, they flunked.
For Etzioni, this five-month-long contest serves as a reality check for the state of artificial intelligence. Yes, thanks to the rise of deep neural networks, networks of hardware and software that approximate the web of neurons in the human brain, companies like Google and Facebook and Microsoft have achieved human-like performance in identifying images and recognizing spoken words, among other tasks. But we’re still a long way from machines that can really think, from AI that can carry on a real conversation, even from systems that can pass a basic science test.

Whither Watson?

You might say that, way back in 2011, IBM Watson beat the best humans on Earth at Jeopardy!, the venerable TV trivia game show. And it did. Google just built a system that could top a professional at the ancient game of Go. But for a machine, these are somewhat easier tasks than taking a science test. “Jeopardy! is [about] finding a single fact, while I would imagine—and hope—that 8th-grade science asks students to solve problems that require several steps, and combine multiple facts to show understanding,” says Chris Nicholson, CEO and founder of AI startup Skymind.
The Allen Institute’s science test includes more than just trivia. It asks that machines understand basic ideas, serving up not only questions like “Which part of the eye does light hit first?” but more complex questions that revolve around concepts like evolutionary adaptation. “Some types of fish live most of their adult lives in salt water but lay their eggs in freshwater,” one question read. “The ability of these fish to survive in these different environments is an example of [what]?” These were multiple-choice questions—and the machines still couldn’t pass, despite using state-of-the-art techniques, including deep neural nets. “Natural language processing, reasoning, picking up a science textbook and understanding—this presents a host of more difficult challenges,” Etzioni says. “To get these questions right requires a lot more reasoning.”
Yes, most of the contestants were academics, independent researchers, or computer scientists outside the largest tech companies. But Etzioni isn’t sure the tech giants would perform all that much better, despite employing some of the top researchers in the field. “It’s entirely possible that the scores would have gone higher had companies like Google and others put their ‘big guns’ to work,” he says. “[But] the ‘wisdom of the crowds’ is quite powerful and there [are] some very talented folks engaged in these contests.” Chaim Linhart, an Israeli researcher who participated in the competition, agrees. “In most competitions, I think the winning models are very specific to the test dataset, so even companies that work in the same domain don’t necessarily have a significant advantage,” he says.
What about Watson? According to Etzioni, IBM declined to participate (the company says it has turned its attentions away from contests like this and towards “real world” applications). But Watson is perhaps not the best litmus test. Watson was good at Jeopardy!. That’s what it was built for. But today, Watson is really just a brand name for a wide range of AI tools offered by IBM, and those tools aren’t necessarily state of the art.

Back to Work

Etzioni’s eighth-grade science test is really a test of natural language understanding—how well a machine understands the natural way humans speak and write. IBM’s services do include natural language processing, but since Watson’s arrival, this kind of tech has received a new boost from deep neural nets. Just as you can teach a neural net to recognize a cat by feeding it myriad cat photos, you can teach it to understand natural language using mountains of digital dialogue. Google, for instance, has used neural nets to build a chatbot that debates the meaning of life.
But this chatbot wasn’t completely convincing. As it stands, the state of the art lies beyond any one technology. “So far, there is no universal method,” says Dutch researcher Benedikt Wilbertz, another participant in the Allen AI contest. “This challenge needed its own mix of machine learning and [other] AI tools.” Indeed, the top participants in the Allen AI challenge used deep learning as well as various other techniques. And the end result was still well below perfect.
Doug Lenat, who runs an AI project called Cyc, says that teaching today’s machines to take basic science tests doesn’t even make much sense. We should be striving for something more—something much further out. “If you’re talking about passing multiple choice science tests, I always felt that was not actually the test AI should be aiming to pass,” he says. “The focus on natural language understanding—science tests, and so on—is something that should follow from a program being actually intelligent. Otherwise, you end up hitting the target but producing the veneer of understanding.” In other words, a machine that passes an eighth-grade science test isn’t all that smart.
So, we’ve yet to build a machine that’s even sorta close to real intelligence. But work will continue.
