A Blog by Jonathan Low

 

Nov 3, 2014

Unintentional Outcomes: Big Data's Disparate Impact

Big Data is all about the user, not the provider, of the information. Of course, users want to understand the provider so that he or she can be convinced to buy more of whatever the user is selling, but the insights gleaned are not intended to improve lives so much as to influence them.

Illegal? Immoral? Well, who's to judge? That probably seems like a rarefied and largely philosophical question. Or at least it did, because data's impact is becoming an issue, as the following article explains.

The fact is that sources and uses of data, especially applications based on interpretation, are going to change lives. But not necessarily in ways that those whose lives are being changed wanted or even thought about. As awareness of these potential effects grows, however, challenges are going to arise.

The fundamental question is who owns what in this ephemeral realm, and what that means for everyone involved: those who generated the data, those who bought it in good faith assuming they were free to use it for commercial purposes and, most emphatically, those who must bear the brunt of its impact on their lives.

There are no good, universally applicable answers right now. But they are most assuredly coming. They will probably be imposed by the courts rather than reasoned out by the discussants. The result may not be satisfying to anyone, let alone everyone, but we will know where we stand, and that may make the use of such data even more widespread. JL

Cathy O'Neil comments in the Mathbabe blog:

We don’t measure the effects of our models on our users. We only see whether we have gained an edge in terms of profit.
Take a look at this paper by Solon Barocas and Andrew D. Selbst entitled Big Data’s Disparate Impact.
It deals with the question of whether current anti-discrimination law is equipped to handle the kind of unintentional discrimination and digital redlining we see emerging in some “big data” models (and that we suspect are hidden in a bunch more). See for example this post for more on this concept.
The short answer is no, our laws are not equipped.
Here’s the abstract:
This article addresses the potential for disparate impact in the data mining processes that are taking over modern-day business. Scholars and policymakers had, until recently, focused almost exclusively on data mining’s capacity to hide intentional discrimination, hoping to convince regulators to develop the tools to unmask such discrimination. Recently there has been a noted shift in the policy discussions, where some have begun to recognize that unintentional discrimination is a hidden danger that might be even more worrisome. So far, the recognition of the possibility of unintentional discrimination lacks technical and theoretical foundation, making policy recommendations difficult, where they are not simply misdirected. This article provides the necessary foundation about how data mining can give rise to discrimination and how data mining interacts with anti-discrimination law.
The article carefully steps through the technical process of data mining and points to different places within the process where a disproportionately adverse impact on protected classes may result from innocent choices on the part of the data miner. From there, the article analyzes these disproportionate impacts under Title VII. The article concludes that Title VII is largely ill-equipped to address the discrimination that results from data mining. Worse, due to problems in the internal logic of data mining as well as political and constitutional constraints, there appears to be no easy way to reform Title VII to fix these inadequacies. The article focuses on Title VII because it is the most well-developed anti-discrimination doctrine, but the conclusions apply more broadly because they are based on the general approach to anti-discrimination within American law.
I really appreciate this paper, because it’s an area I know almost nothing about: discrimination law and the standards for evidence of discrimination.
Sadly, what this paper explains to me is how very far away we are from anything resembling what we need to actually address the problems. For example, even in this paper, where the writers are well aware that training on historical data can unintentionally codify discriminatory treatment, they still seem to assume that the people who build and deploy models will “notice” this treatment. From my experience working in advertising, that’s not actually what happens. We don’t measure the effects of our models on our users. We only see whether we have gained an edge in terms of profit, which is very different.
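To make that gap concrete, here is a minimal, hypothetical sketch (not from this post or from the Barocas and Selbst paper; the data, feature names, and thresholds are all invented for illustration). It trains a model on historically biased approval decisions without ever seeing the protected attribute, then prints the kind of number a business team watches (agreement with past outcomes, standing in for profit) next to a simple disparate-impact check: the selection-rate ratio behind the “four-fifths rule” used in Title VII analysis.

# Hypothetical sketch: a model trained on biased historical outcomes can
# score well on the "profit" metric while failing a disparate-impact check.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute (never shown to the model) and a correlated proxy,
# e.g. a neighborhood code that effectively stands in for group membership.
group = rng.integers(0, 2, n)                 # 0 = group A, 1 = group B
proxy = group + rng.normal(0, 0.5, n)         # facially neutral feature
skill = rng.normal(0, 1, n)                   # legitimately predictive feature

# Historical labels reflect past biased decisions: group B was approved less
# often even at the same skill level.
past_approval = (skill - 0.8 * group + rng.normal(0, 0.5, n)) > 0

X = np.column_stack([proxy, skill])           # protected attribute excluded
model = LogisticRegression().fit(X, past_approval)
approved = model.predict(X).astype(bool)

# What the modeler usually looks at: agreement with historical outcomes,
# a stand-in for "did we gain an edge in terms of profit".
print("accuracy vs. historical decisions:", (approved == past_approval).mean())

# What almost nobody looks at: selection rates by group.
rate_a = approved[group == 0].mean()
rate_b = approved[group == 1].mean()
print("selection rate, group A:", round(rate_a, 3))
print("selection rate, group B:", round(rate_b, 3))
print("impact ratio (four-fifths rule flags < 0.8):", round(rate_b / rate_a, 3))

The point of the sketch is only that the first number can look excellent while the second quietly flags exactly the unintentional, “innocent” discrimination the paper describes.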
Essentially, as modelers, we don’t humanize the people on the other side of the transaction, which prevents us from worrying about discrimination or even being aware of it as an issue. It’s so far from “intentional” that it’s almost a ridiculous accusation to make. Even so, it may well be a real problem and I don’t know how we as a society can deal with it unless we update our laws.
