Or sometimes they simply follow the data, as the article below explains: if white males in a data sample have higher salaries, ads for higher-paying jobs will be shown primarily to other white males. The challenge is how to eliminate such biases without undermining the advantages of the technological approach. JL
Elizabeth Dwoskin reports in the Wall Street Journal:
Software tags our photos, recommends products to buy online and serves up ads based on our interests. Increasingly, it also plays a role in more consequential decisions, such as who gets a job or loan, or who pays surge pricing in a ride-sharing app.
But computer programs that crunch immense amounts of data to render decisions or predictions can go embarrassingly, sometimes troublingly wrong.
In May, Flickr, a division of Yahoo Inc., rolled out software that recognized objects in photos uploaded by users and tagged them accordingly: car, boat, cat, dog. But what was supposed to be a snazzy feature became a public-relations nightmare when the online photo-sharing service tagged a photo of a black man with the word “ape” and a picture of a concentration camp as a “jungle gym.” Google Inc. ran into the same problem in June, when the company’s auto-tagging feature mislabeled a photo of a black man with the word “gorilla.”
Such errors can go beyond insensitivity and insult to arbitrarily limit people’s opportunities. Carnegie Mellon University researchers examining Google’s ad-targeting system recently found that male Web users were six times more likely than female users to be shown ads for high-paying jobs. Veterans have complained that they were automatically disqualified for civilian jobs because human-resources software used by the employers didn’t recognize the skills they learned in the military.
While automation is often thought to eliminate flaws in human judgment, bias—or the tendency to favor one outcome over another, in potentially unfair ways—can creep into complex computer code. Programmers may embed biases without realizing it, and they can be difficult to spot and root out. The results can alienate customers and expose companies to legal risk. Computer scientists are just starting to study the problem and devise ways to guard against it.
“Computers aren’t magically less biased than people, and people don’t know their blind spots,” said Vivienne Ming, a data scientist and entrepreneur who advises venture capitalists on artificial intelligence technology.
Many data scientists believe that the benefits of such technology outweigh the risks. As chief scientist at Gild Inc., a technology startup that makes software to help identify promising job candidates, Ms. Ming devised programs that she says helped recruiters consider a broader set of candidates than they would otherwise. Recruiters often base their decisions on sharply limited criteria—for instance, disqualifying talented candidates who didn’t attend a top school. Software that weighs more variables allowed recruiters to cast a wider net, she said.
Yet unintended negative outcomes are “definitely a risk,” said Adeyemi Ajao, vice president of technology strategy and a data scientist at Workday, a software company using complex statistical formulas for human-resources management. “I don’t think it’s possible to eliminate it 100%.”
One common error is endemic to a popular software technique called machine learning, said Andrew Selbst, co-author of “Big Data’s Disparate Impact,” a paper to be published next year by the California Law Review. Programs that are designed to “learn” begin with a limited set of training data and then refine what they’ve learned based on data they encounter in the real world, such as on the Internet. Machine-learning software adopts and often amplifies biases in either data set.
In the type of image-tagging programs used by Google and others, software learns to distinguish people in photos by finding common patterns in millions of images of people. If the training data predominantly depicts white people, the software won’t learn to recognize people who look different.
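The mechanism is easy to reproduce in miniature. The sketch below is not Google’s or Flickr’s system; it trains an off-the-shelf classifier (scikit-learn’s random forest) on synthetic two-dimensional “image features” in which one group supplies 99% of the examples, then measures accuracy for each group separately. Every number, name and decision rule is invented for illustration.

```python
# A toy illustration of training-data skew -- synthetic data, not any real system.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n, center, rule, noise=0.05):
    """Synthetic 'photos' as 2-D feature vectors; `rule` decides the true tag."""
    X = rng.normal(loc=center, scale=1.0, size=(n, 2))
    y = rule(X).astype(int)
    flip = rng.random(n) < noise            # a little labeling noise
    y[flip] = 1 - y[flip]
    return X, y

rule_a = lambda X: X[:, 0] > 0.0            # what marks the tag for group A...
rule_b = lambda X: X[:, 1] > 0.0            # ...differs for group B, who "look different"

# Skewed training set: 99% group A, 1% group B.
Xa, ya = make_group(990, center=(0.0, 0.0), rule=rule_a)
Xb, yb = make_group(10, center=(5.0, 0.0), rule=rule_b)
model = RandomForestClassifier(random_state=0)
model.fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluate each group on fresh, equal-sized samples: the group that was scarce
# in training typically comes out with noticeably lower accuracy.
for name, center, rule in [("group A", (0.0, 0.0), rule_a),
                           ("group B", (5.0, 0.0), rule_b)]:
    X_test, y_test = make_group(500, center=center, rule=rule)
    print(name, "accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 3))
```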
Google acknowledged the error in its image-tagging software and said it was working to fix the problem, but declined to comment further.
Paul Viola, a former Massachusetts Institute of Technology engineer who helped pioneer such techniques, said he encountered similar problems 15 years ago, and that they’re hard to tackle. Back then, he built a software program that would comb through images online and try to detect objects in them. The program could easily recognize white faces, but it had trouble detecting faces of Asians and blacks. Mr. Viola eventually traced the error back to the source: In his original data set of about 5,000 images, whites predominated.
The problem got worse as the program processed images it found on the Internet, he said, because the Internet, too, had more images of whites than blacks. The software’s familiarity with a larger set of pictures sharpened its knowledge of faces, but it also solidified the program’s limited understanding of human differences.
To fix the problem, Mr. Viola added more images of diverse faces into his training data, he said.
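In the toy setup above, the same remedy can be sketched: extend the training set with many more examples of the underrepresented group and retrain. This continues the earlier hypothetical snippet (reusing make_group, rule_a/rule_b and the Xa/ya, Xb/yb arrays) and illustrates the idea only, not Mr. Viola’s actual pipeline.

```python
# The data-side fix described above, in the toy setting: add many more examples
# of the underrepresented group before retraining (continues the earlier snippet).
Xb_more, yb_more = make_group(990, center=(5.0, 0.0), rule=rule_b)

model_fixed = RandomForestClassifier(random_state=0)
model_fixed.fit(np.vstack([Xa, Xb, Xb_more]),
                np.concatenate([ya, yb, yb_more]))

# With the groups roughly balanced in training, the accuracy gap typically closes.
for name, center, rule in [("group A", (0.0, 0.0), rule_a),
                           ("group B", (5.0, 0.0), rule_b)]:
    X_test, y_test = make_group(500, center=center, rule=rule)
    print(name, "accuracy:",
          round(accuracy_score(y_test, model_fixed.predict(X_test)), 3))
```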
Mr. Viola’s ability to trace the problem back to the source was unusual, said Mr. Selbst. More often than not, the culprit is hard to pinpoint. Two common reasons: The software is proprietary and not available for examination, and the formula, or algorithm, used by the computer is extremely complex. An image-detection algorithm, for instance, may use hundreds of thousands of variables.
Take recent research from Carnegie Mellon that found male Web users were far more likely than female users to be shown Google ads for high-paying jobs. The researchers couldn’t say whether this outcome was the fault of advertisers—who may have chosen to target ads for higher-paying jobs to male users—or of Google algorithms, which tend to display similar ads to similar people. If Google’s software notices men gravitating toward ads for high-paying jobs, the company’s algorithm will automatically show that type of ad to men, the researchers said.
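The researchers could only observe the system from the outside, but the loop they describe is easy to simulate. The sketch below uses entirely invented numbers, not Google’s system: two groups with nearly identical underlying interest in an ad, and a server that re-allocates impressions each round in proportion to the clicks it just observed. The small initial difference compounds.

```python
# Toy simulation of a click-driven feedback loop -- invented numbers, not Google's system.
import random

random.seed(0)
true_interest = {"men": 0.11, "women": 0.10}    # nearly identical underlying interest
impression_share = {"men": 0.5, "women": 0.5}   # the ad starts out shown equally

for _ in range(20):
    clicks = {}
    for group, interest in true_interest.items():
        shown = int(10_000 * impression_share[group])       # impressions this round
        clicks[group] = sum(random.random() < interest for _ in range(shown))
    total_clicks = sum(clicks.values())
    # Re-allocate impressions in proportion to observed clicks: each round the
    # group that clicked slightly more is shown slightly more, so the gap compounds.
    impression_share = {g: c / total_clicks for g, c in clicks.items()}

print({g: round(share, 2) for g, share in impression_share.items()})
```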
Google declined to comment.
Regardless of their source, the only way to detect subtle flaws in such complex software is to test it with a large number of users, said Markus Spiering, a former Yahoo Inc. product manager who oversaw the company’s Flickr division. To improve an image-detection program quickly enough to be competitive, he said, it is necessary to let the public use it, and that means running the risk of making public mistakes.
“Even though the technology is advanced, sometimes it makes mistakes,” Yahoo spokeswoman Anne Yeh said. “When errors occur, users have the ability to delete the inaccurate tags, and doing so enables our algorithm to learn from the feedback and the technology becomes smarter and more accurate over time.” She added that the company had removed the incorrect tags and allows users to exempt their photos from autotagging.
Data scientists say software bias can be minimized by what amounts to building affirmative action into a complex statistical model, such as Mr. Viola’s addition of more diverse faces to his training data.
“You can plan for diversity,” said T.M. Ravi, co-founder and director of the Hive, an incubator for data-analytics startups.
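The article doesn’t name a technique, but one published way to build that kind of correction into a model is to reweight the training data so that group membership and the favorable outcome are statistically independent before a scoring model is fit (often called “reweighing” in the fairness literature). The sketch below applies the weight formula w(group, outcome) = P(group) × P(outcome) / P(group, outcome) to invented hiring data; every column name and number is hypothetical.

```python
# "Reweighing" sketch on invented hiring data: weight each (group, outcome) cell
# so that group and outcome are statistically independent in the weighted data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000
group = rng.choice(["A", "B"], size=n, p=[0.7, 0.3])
hire_prob = np.where(group == "A", 0.5, 0.3)     # historical skew against group B
df = pd.DataFrame({"group": group, "hired": rng.random(n) < hire_prob})

# w(g, y) = P(g) * P(y) / P(g, y); cells that history under-selected get weight > 1.
n_group = df.groupby("group")["hired"].transform("size")
n_hired = df.groupby("hired")["hired"].transform("size")
n_cell = df.groupby(["group", "hired"])["hired"].transform("size")
df["weight"] = (n_group / n) * (n_hired / n) / (n_cell / n)

# These weights can be passed as sample_weight when fitting a scoring model, so
# that "group B and hired" examples count for more than they did in raw history.
print(df.groupby(["group", "hired"])["weight"].first().round(2))
```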
Mr. Selbst, along with the Carnegie Mellon technologists, and others are among the pioneers of an emerging discipline known as algorithmic accountability. These academics, who hail from computer science, law and sociology, try to pinpoint what causes software to produce these types of flaws, and find ways to mitigate them.
Researchers at Princeton University’s Web Transparency and Accountability Project, for example, have created software robots that surf the Web in patterns designed to make them appear to be human users who are rich or poor, male or female, or suffering from mental-health issues. The researchers are trying to determine whether search results, ads, job postings and the like differ depending on these classifications.
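Such audits end with a statistical comparison. The snippet below is not the Princeton group’s tooling; with made-up counts, it just shows the final step: given the ads served to two scripted personas, test whether the difference is larger than chance would explain.

```python
# Final step of a persona-based audit, with made-up counts: do two personas'
# ad mixes differ more than chance would explain?
from scipy.stats import chi2_contingency

ads_served = {                                   # hypothetical impression counts
    "male persona":   {"high-paying job ad": 180, "other ad": 820},
    "female persona": {"high-paying job ad": 30,  "other ad": 970},
}
table = [list(counts.values()) for counts in ads_served.values()]
chi2, p_value, _, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p_value:.3g}")   # a tiny p-value points to a real difference
```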
One of the biggest challenges, they say, is that it isn’t always clear when the powerful correlations revealed by data mining shade into bias. Xerox Corp., for example, quit looking at job applicants’ commuting time even though software showed that customer-service employees with the shortest commutes were likely to keep their jobs at Xerox longer. Xerox managers ultimately decided that the information could put applicants from minority neighborhoods at a disadvantage in the hiring process.
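A standard screen for the risk Xerox was worried about is to compare a model’s selection rates across groups and apply the “four-fifths rule” used in U.S. employment-discrimination analysis: flag the model if the lowest group’s rate falls below 80% of the highest group’s. The sketch below runs that check on hypothetical screening results; the group labels and numbers are invented.

```python
# Disparate-impact screen (four-fifths rule) on hypothetical screening results.
from collections import Counter

def disparate_impact(groups, selected):
    """Return each group's selection rate and the lowest-to-highest rate ratio."""
    totals = Counter(groups)
    picks = Counter(g for g, s in zip(groups, selected) if s)
    rates = {g: picks.get(g, 0) / count for g, count in totals.items()}
    return rates, min(rates.values()) / max(rates.values())

# Invented outcomes keyed by the applicants' neighborhood group.
groups = ["short_commute"] * 100 + ["long_commute"] * 100
selected = [True] * 60 + [False] * 40 + [True] * 35 + [False] * 65

rates, ratio = disparate_impact(groups, selected)
print(rates, "ratio:", round(ratio, 2), "flagged:", ratio < 0.8)
```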
“Algorithms reproduce old patterns of discrimination,” Mr. Selbst said, “and create new challenges.”