A Blog by Jonathan Low

 

Mar 13, 2015

True That: Should Google Rank Searches Based on Accuracy?

'Truth? You can't handle the truth!'

That was a line spoken by Jack Nicholson's character to a prosecuting attorney in the movie 'A Few Good Men.'

It was a film ostensibly about a Marine officer on trial for ordering the disciplining of an underperforming subordinate, a punishment that led to that subordinate's death. But it was, more broadly, about how belief systems shape human views of what is true and what is not.

Google is the dominant provider of search on the web. By a long shot. So it has been conducting research to understand the demand for those services and how it might improve them. Just the sort of investment in understanding one's customers expected of a leading enterprise.

But we can well imagine the reaction to a search algorithm designed around the notion of accuracy: whose truth, exactly, are we talking about? Beliefs shade our view of what is true or accurate, so is a proponent of anti-vaccination policies likely to be persuaded by the views of a bunch of scientists she doesn't know? Or a gun control advocate convinced by the National Rifle Association's 'data'? Yeah, right.

The benefit of accuracy-based search is that it uses data and research to lead the billions who use Google's service to the answers they seek. The danger is that Google's own credibility will suffer if the information it provides does not square with the emotional drivers of its users' beliefs.

The web has democratized the search for meaning. In so doing, it has diminished the power of experts, or at least of those who disagree with the searcher's predilections. The strategic question facing Google is whether its credibility will survive the argument about whose data are 'true.' JL

Chris Mooney reports in the Washington Post:

In the U.S. over the past decade, everyone does seem to have his own facts, at least around certain politicized topics. If Web content were rated in searches based on its actual accuracy, rather than on its link-based popularity, a lot of misleading stuff might get buried.
For some time, those of us studying the problem of misinformation in U.S. politics — and especially scientific misinformation — have wondered whether Google could come along and solve the problem in one fell swoop.
After all, if Web content were rated such that it came up in searches based on its actual accuracy — rather than based on its link-based popularity — then quite a lot of misleading stuff might effectively get buried. And maybe, just maybe, fewer parents would stumble on dangerous anti-vaccine misinformation (to list one highly pertinent example).
It always sounded like a pipe dream, but in the past week, there’s been considerable buzz that Google might indeed be considering such a thing. The reason is that a team of Google researchers recently published a mathematics-heavy paper documenting their attempts to evaluate vast numbers of Web sites based upon their accuracy. As they put it:
The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy.
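To make that contrast concrete, here is a minimal sketch of the endogenous signal in Python. Everything in it (the triple format, the function name, the smoothing) is an illustrative assumption; the paper itself builds a much richer joint probabilistic model.

```python
# Toy version of knowledge-based trust: score a source by the share of
# its checkable facts that agree with a reference set of known facts.

def trust_score(source_facts, known_facts, smoothing=1.0):
    """Estimate trustworthiness as the smoothed fraction of correct facts.

    source_facts: iterable of (subject, predicate, value) triples
    known_facts:  dict mapping (subject, predicate) -> accepted value
    """
    checkable = [f for f in source_facts if (f[0], f[1]) in known_facts]
    if not checkable:
        return 0.5  # no evidence either way
    correct = sum(1 for s, p, v in checkable if known_facts[(s, p)] == v)
    # Laplace smoothing keeps a single error from zeroing out a source.
    return (correct + smoothing) / (len(checkable) + 2 * smoothing)

known = {("Obama", "born_in"): "USA"}
print(trust_score([("Obama", "born_in", "USA")], known))    # ~0.67
print(trust_score([("Obama", "born_in", "Kenya")], known))  # ~0.33
```

A source with few false facts scores high, and popularity never enters the calculation, which is exactly what separates this from link-based ranking.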
As our friends at The Intersect note, this does not mean Google is actually going to do this or put in place such a ranking system for searches. It means it’s studying it.
Indeed, Google gave us the following statement: “This was research — we don’t have any specific plans to implement it in our products. We publish hundreds of research papers every year.”
It’s not the company’s first inquiry into the realm of automating the discovery of fact. The new paper draws on a prior Google project called the Knowledge Vault, which has compiled more than a billion facts so far by grabbing them from the Web and then comparing them with existing sources. For 271 million of these facts, the probability of actual correctness is over 90 percent, according to Google.
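A rough way to picture how a billion scraped claims turn into high-confidence facts: many weak extraction signals for the same claim can be fused into a single probability. The noisy-OR rule below is a standard toy fusion model, not necessarily the Knowledge Vault's actual classifier-based one.

```python
# Noisy-OR fusion: the fact is taken to hold unless every independent
# extraction of it is wrong.

def noisy_or(confidences):
    p_all_wrong = 1.0
    for c in confidences:
        p_all_wrong *= 1.0 - c
    return 1.0 - p_all_wrong

# Three middling extractions already push a fact past 90 percent.
print(noisy_or([0.6, 0.6, 0.5]))  # about 0.92
```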
The new study, though, goes further. It draws on the Knowledge Vault approach to actually evaluate pages across the Web and determine their accuracy. Through this method, the paper reports, an amazing 119 million Web pages were rated. One noteworthy result, the researchers note, is that gossip sites and Web forums in particular don’t do very well: they end up being ranked quite low, despite their popularity.
Indeed, when comparing this new method, dubbed “Knowledge-Based Trust,” with the traditional Google PageRank approach, which focuses on links, the researchers found that “the two signals are almost orthogonal.”
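“Almost orthogonal” here means the two scores order pages in nearly unrelated ways. One quick way to see what that looks like is a rank correlation; the scores below are invented purely for illustration.

```python
# Spearman rank correlation between two made-up score lists: a value
# near zero means the rankings carry nearly independent information.

def rank(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def spearman(xs, ys):
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

pagerank = [0.9, 0.7, 0.5, 0.3, 0.1]  # hypothetical popularity scores
kbt = [0.3, 0.9, 0.1, 0.8, 0.5]       # hypothetical accuracy scores
print(spearman(pagerank, kbt))        # about -0.1, i.e. close to zero
```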
Google’s new research didn’t explicitly mention how this approach might rank science contrarian Web sites. But media have been reporting this week that climate change skeptics seem unnerved by the direction that Google appears to be heading.
If this ever moves closer to a reality, then they should be. If you read the Google papers themselves, for instance, you’ll note that the researchers explicitly use, as a running example, a fact that has become “political.” Namely, the fact that Barack Obama was born in the United States.
From their Knowledge Vault paper, for instance:
For example, suppose an extractor returns a fact claiming that Barack Obama was born in Kenya, and suppose (for illustration purposes) that the true place of birth of Obama was not already known. … Our prior model can use related facts about Obama (such as his profession being US President) to infer that this new fact is unlikely to be true. The error could be due to mistaking Barack Obama for his father (entity resolution or co-reference resolution error), or it could be due to an erroneous statement on a spammy Web site (source error).
And now from the new paper:
In our example, there are 12 sources (i.e., extractor-webpage pairs) for USA and 12 sources for Kenya; this seems to suggest that USA and Kenya are equally likely to be true. However, intuitively this seems unreasonable: extractors E1–E3 all tend to agree with each other, and so seem to be reliable; we can therefore “explain away” the Kenya values extracted by E4–E5 as being more likely to be extraction errors.
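That “explaining away” is, at its core, an iterative reweighting: extractors that agree with the emerging consensus gain reliability, and their votes then count for more. The loop below is a toy version of that intuition, not the paper's actual model, which is fully probabilistic.

```python
# Toy truth-discovery loop: reliability and belief reinforce each other.

claims = {  # extractor -> value it extracted for Obama's birthplace
    "E1": "USA", "E2": "USA", "E3": "USA",
    "E4": "Kenya", "E5": "Kenya",
}

reliability = {e: 0.5 for e in claims}  # start out agnostic
for _ in range(10):
    # Weighted vote for each candidate value.
    votes = {}
    for e, v in claims.items():
        votes[v] = votes.get(v, 0.0) + reliability[e]
    total = sum(votes.values())
    belief = {v: w / total for v, w in votes.items()}
    # An extractor is as reliable as the values it extracts are believable.
    reliability = {e: belief[v] for e, v in claims.items()}

print(belief)  # belief in "USA" climbs well above "Kenya"
```

The agreeing majority (E1–E3) bootstraps itself into higher reliability, and the minority values get explained away as extraction errors.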
And thus, before our eyes, algorithms begin to erode politicized disinformation.
Substitute “Barack Obama was born in the United States” with “Global warming is mostly caused by human activities” or “Childhood vaccines do not cause autism,” and you can quickly see how potentially disruptive these algorithms could be. Which is precisely why, if Google really starts to look like it’s heading in this direction, the complaints will get louder and louder.
I say bring them. The late Sen. Daniel Patrick Moynihan famously observed that “Everyone is entitled to his own opinions, but not to his own facts.” The problem in the U.S. over the past decade in particular, however, is that everyone does seem to have his own facts — at least around certain politicized topics.
But if anyone can bring us back to a shared reality, well, it’s Google.
