The question for society, as usual at the cross section of data and tech, is whether the public cares enough to resist the blandishments and power of those who are benefitting from the information generated. JL
Peter Aldhous reports in Buzzfeed:
It’s a numbers game: The more people who put their DNA into databases that can be search(ed), the greater the chance (researchers) will find matches. As I discovered, if you find relatives closer than second cousins, you can identify your target within hours. But from third cousins or further out, it can be a long haul. My 60% clearance rate is an overestimate of how easy it is to solve cases through genetic genealogy. Unease I had about genetic racial profiling came not when I was examining matching DNA profiles, but when I pored through people’s Facebook timelines, trying to discover something about their family relationships.
A year ago, when cops captured Joseph James DeAngelo, the suspected Golden State Killer, the world woke up to the power of genetic genealogy. DeAngelo was identified because DNA he left at the scene of a 1980 double murder partially matched the profiles that a few of his distant relatives had uploaded to a public website to research their family history. Based on those matches, a team of detectives drew up family trees that eventually led them to DeAngelo, suspected of at least 13 murders and more than 50 rapes.
In the year since, more than 50 other criminal cases have been cracked using similar methods, launching a new forensic science industry. One estimate has suggested that more than half of the US population could be found in this way — although genealogists have warned that, in practice, complications like adoptions or misunderstandings over who is the biological father of a child can throw an investigator off track.
How hard is it to crack cases in this way? And what issues does it raise, as police recruit genealogists to help them solve crimes by sifting through the perpetrators’ extended family trees?
To explore these questions, my editor Virginia Hughes and I conjured up an experiment: She would recruit BuzzFeed employees to play the role of “suspects” and get their DNA tested with a company used by genealogy enthusiasts. She’d then download their DNA profiles, containing data on hundreds of thousands of genetic markers, and send the files to me labeled with randomly chosen fake names. Then I’d play genealogy “detective” and try to figure out who they really were.
In the end, I identified 6 out of our 10 volunteers. Four of those cases I solved by tracking them down through their relatives’ family trees, much as the cops did with DeAngelo. In a twist I didn’t anticipate, I found two more not through their relatives, but simply because their ancestry indicated that their family came from a specific country, which raises uncomfortable questions about genetic racial profiling.
Investigation basics
Before I started this project, I was a novice at genealogy. So I sat down for a couple of hours of training with an expert, Leah Larkin, a genealogist in Livermore, California, who helps adoptees find their biological relatives and who blogs as the DNA Geek.
Larkin taught me how to look for partial DNA matches on GEDmatch, the free, public website used to find DeAngelo’s distant relatives. She also introduced me to another online tool that takes the amount of DNA shared by two people, a number provided by GEDmatch, and tells you the possible family relationships that might explain the match. (A child typically shares about 50% of their DNA with their mother, for example, while two first cousins have about 12.5% shared DNA.) And Larkin showed me how she uses genealogical websites, public records, and social media posts to piece together a person’s family tree.
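As a rough illustration of how such a lookup might work, the sketch below maps a shared-DNA fraction to the relationships it could plausibly indicate. The table holds theoretical averages from standard inheritance arithmetic, not figures taken from GEDmatch or from the tool described above. The function name and the factor-of-two tolerance are assumptions made purely for illustration; real matches scatter around these averages.

```python
# Theoretical average fraction of autosomal DNA shared, by relationship.
# Assumption: textbook expectations only, not data from any real tool.
EXPECTED_SHARE = {
    "parent/child": 0.5,
    "grandparent, half sibling, or aunt/uncle": 0.25,
    "first cousin or half aunt/uncle": 0.125,
    "first cousin once removed": 0.0625,
    "second cousin": 0.03125,
    "second cousin once removed": 0.015625,
    "third cousin": 0.0078125,
}


def candidate_relationships(shared, tolerance=2.0):
    """List relationships whose expected share lies within a factor of
    `tolerance` of the observed fraction `shared` (hypothetical rule)."""
    return [
        name
        for name, expected in EXPECTED_SHARE.items()
        if expected / tolerance <= shared <= expected * tolerance
    ]


# A match sharing 1.25% of its DNA with the target.
print(candidate_relationships(0.0125))
```

With these assumptions, a 1.25% share comes back as a possible second cousin once removed or third cousin, which shows why a single number rarely pins down one relationship.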
Then I got down to work. Even with a pool of suspects limited to the 1,200 or so people who work for BuzzFeed, I was prepared for a long, hard slog, especially for anyone whose matches were more distant than second cousins. For the Golden State Killer, the closest matches were third or fourth cousins. It took a team of five people, including the genealogist Barbara Rae-Venter, more than four months to identify DeAngelo as their prime suspect. In total, I spent about 60 hours trying to identify our 10 suspects. In the most frustrating case, I gave up after 30 hours. The easiest took me less than an hour.
Suspects still at large: Marina Paniagua, Rico Ray, and Katelyn Kim
The first case tested my stamina. For the profile labeled Marina Paniagua, I found two DNA matches — one that shared 1.25% of their DNA with Marina, and another about 1% — that seemed worth pursuing. For each I was able to find a family tree online that served as a starting point for my own research. Using the tools available with a subscription at Ancestry, which cost me $49 per month, I extended these trees to try and find my target.
Each match looked like they could be Marina’s third cousin. If so, I would have to go back four generations to their great-great-grandparents to find common ancestors with Marina. Then I would follow the possible branches back down four generations until I found someone who works for BuzzFeed.
The trouble is that there are many, many potential paths up and down four generations in an extended family tree. Here’s some back-of-the-envelope math: Marina would have eight sets of great-great-grandparents, so if everyone descended from them had three children down four generations, Marina would be just one of 648 (that’s 8 × 3⁴) third cousins.
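The same estimate can be reproduced in a few lines. This is only a sketch of the simplified model in the text, which assumes every descendant has the same number of children and that no branches overlap; the function name and parameters are invented for illustration.

```python
def descendants_at_generation(ancestor_couples, children_per_person, generations):
    """Count people in the target's generation under the simplified model:
    each ancestral couple's line multiplies by `children_per_person`
    at every one of `generations` steps down the tree."""
    return ancestor_couples * children_per_person ** generations


# Eight sets of great-great-grandparents, three children each, four generations down.
print(descendants_at_generation(8, 3, 4))  # 8 * 81 = 648

# The estimate is very sensitive to family size.
print(descendants_at_generation(8, 2, 4))  # 8 * 16 = 128
print(descendants_at_generation(8, 4, 4))  # 8 * 256 = 2048
```

Moving from two to four children per person raises the count sixteenfold, so the size of the search depends heavily on the families involved.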
It would have been easier if Marina’s two matches were related to each other — then I could have looked for where their family trees intersected to narrow down the range of possible branches to explore. But they weren’t, which means that one was probably from her mother’s side of the family, the other from her father’s.
After more than 30 hours of work, I’d traced one of Marina’s matches back to great-great-grandparents in southern Sweden, and another to ancestors in central Ireland. But despite exploring many descending branches back down to the present day, I’d failed to find anyone whose name was in the BuzzFeed employee directory — and there were dozens of possible branches still to explore.
It was time to give up.
When my target, Sarah Mimms, a politics editor with BuzzFeed News in Washington, DC, revealed her identity, I learned of another reason why her case would be difficult to solve. “My mother is adopted,” Mimms said. So even if I’d found one of her maternal biological grandparents, I probably would have lost the trail at that point, because public records would not link them to Mimms’ mother.
Mimms’ case was informative, however, because matches to third or fourth cousins, roughly the same as in her case, were what Rae-Venter and the detectives hunting the Golden State Killer were working with.
Viewed against the gargantuan effort needed to crack the Golden State Killer case, my fruitless week of genealogical research didn’t seem so bad. And it provided a yardstick to judge when a case would be too time-consuming to solve. So when I subsequently examined two DNA profiles with similarly distant matches, I quickly conceded defeat.
Rico Ray was a man with Western European ancestry whose closest match — again a possible third cousin — was a police officer in upstate New York. Rico was Brandon Hardin, a curation editor for BuzzFeed News in New York City.
The matches to the profile labeled Katelyn Kim were even less promising, but her ancestry was an interesting mix of Eastern, Southern, and Western European.
“I’m a European mutt,” BuzzFeed News managing editor Maggie Schultz said, when she later revealed her identity.
Giving up on finding Marina, Rico, and Katelyn might seem defeatist, but it’s the same sort of triage that the companies offering genealogy services to the cops are now doing. “That’s exactly how they’re operating,” said genealogist Larkin, when I told her about my strategy.
Indeed, Bode Technology, which launched its forensic genealogy service in February, is focusing on cases with second-cousin matches or closer, according to its director of sales and marketing, Andrew Singer.
Parabon NanoLabs, which has been using genealogy to help cops make arrests since May 2018, doesn’t use a simple cutoff. But after providing an initial assessment of how hard a case will be to solve, Parabon charges $3,500 for up to 15 hours of genealogical research. So similarly, its genealogists aren’t laboring for months to solve difficult cases.
Revealed by Facebook: Cheyenne Griggs
My first success owed a lot to Facebook. Cheyenne Griggs’ closest match was possibly a half second cousin or a second cousin once removed. But this profile had been uploaded by someone with a common name — Judith Johnson — and an email that I couldn’t track down on a simple Google search. That made her hard to identify without other clues.
To get a foothold onto Cheyenne’s family tree, I turned to Judith’s closest match, a woman in Illinois who seemed like she might be Judith’s half sister. From posts on her Facebook profile, I worked out that the Illinois woman was adopted and had recently discovered members of her biological family. One elderly woman was tagged in a profile picture of a family get-together, so I went to her Facebook profile and found she was friends with a Judith Johnson in Albuquerque, New Mexico. And when I looked at Judith’s Facebook friend list, a name that I recognized jumped out: Drusilla Moorhouse, deputy copy chief for BuzzFeed News, based in our Los Angeles bureau.
It was too much of a coincidence for Moorhouse not to be my target, but I still had to confirm the connection. That’s when I discovered from public records that Moorhouse had grown up as Jennifer Johnson. She was from New Mexico, and her mother’s name was Judith. LexisNexis, a database I often consult to track down sources for my reporting, also linked the email Judith Johnson had used on GEDmatch to Moorhouse’s mother.
But Cheyenne’s closest match on GEDmatch was way too distant to be her mother. Moorhouse’s mom, I realized, must have uploaded the DNA of another family member.
Moorhouse, a true crime fan who volunteered for our project because she was fascinated by the Golden State Killer investigation, was surprised that I’d been able to learn these details about her family — particularly because she had kept her own Facebook friend list hidden. But because her relatives had theirs in open view, that was no obstacle to tracking her down.
Still, she was convinced that the benefits of this new approach to police work outweigh any privacy concerns.
“I think it's important to remember that in most of the cases being solved through genetic genealogy, the victims are women who have been sexually assaulted,” she said. “I hope that long-forgotten rape kits will be taken off the shelves and reexamined.”
Solved within hours: Donald Adkins and Keith Herman
After I cracked Cheyenne’s case, my editor sent me a new batch of DNA profiles. As soon as I uploaded them to GEDmatch, two stood out for having reasonably close relatives.
For Donald Adkins, the best match was a woman who once worked in the fashion footwear industry and is now an Episcopal priest with a senior position in the Diocese of New York. She and Donald shared more than 11% of their DNA, so were likely as close as first cousins. To connect the priest to her half nephew took me less than two hours: Donald was BuzzFeed News Editor-in-Chief Ben Smith.
The only real complication in Smith’s family tree was that his grandfather had married three times. But a family that includes several writers and a New York Court of Appeals judge — Smith’s father — lives in public view. All I needed was a newspaper obituary for the Episcopal priest’s mother to complete the family tree.
I wondered whether Keith Herman, whose family seemed not to include public figures, would be harder to find. His closest match — a 6.7% DNA overlap, likely closer than a second cousin — was an elderly man in Pebble Beach, California. Fortunately for me, not only was this man’s wife a keen genealogist, but so too were people on Keith’s branch of the family tree.
Using the trees they had already researched and posted online, I quickly discovered that the California man’s great-grandfather lived near the border between Georgia and Alabama and had fought in the Confederate army in the Civil War.
In 1939, a granddaughter of this Confederate soldier married an Irish immigrant called Mathew Honan. And they were the grandparents of BuzzFeed’s San Francisco bureau chief, Mat Honan. Within an hour of identifying the elderly man in California — Honan’s first cousin, once removed — I had cracked the case.
Both Smith and Honan were struck by how much I’d been able to find out about their families, so quickly. And as journalists who follow debates about online privacy, they were concerned about what this meant for their children.
“I realized that my family’s genealogical hobbies were exposing my kids’ identities, without their input or consent,” Smith said.
Snared by their ancestry: Ethan Dagostino and Casandra Reed
Ethan Dagostino at first looked like a tough nut to crack. Although he had a reasonably good match on GEDmatch — possibly as close as a second cousin — I struggled to build out a family tree beyond this one man and his wife, who had lived in both Colorado and Virginia.
When I used a tool on GEDmatch that explores where in the world someone’s DNA comes from, I could see that Ethan was quite different from the other people I’d tried to find.
While other volunteers in this project had clearly descended from Europeans, Ethan’s DNA came from Northeast Africa and the area around the Red Sea. And when I started googling Ethan’s closest matches, I found that their names were typically Sudanese.
At that point, I gave up trying to identify Ethan by genealogy, realizing that it would be next to impossible to trace his ancestors back to Sudan. Sudanese naming conventions also posed a problem, because there are no surnames carried down across generations. Instead, a man takes his father’s first name as his second name.
So I took a different approach, searching BuzzFeed’s staff directory for a man with a Sudanese name. There was just one: Elamin Abdelmahmoud, a curation editor in Toronto who writes the daily BuzzFeed News newsletter.
From an essay he wrote about naming his daughter, which described his pride in his Sudanese ancestry, I learned that Abdelmahmoud had emigrated to Canada when he was 12. Later, we discussed his ancestry, and Abdelmahmoud told me he’d been surprised to learn that he was 60% African and 40% Arab — he had assumed he had more Arab ancestry.
“In Sudan, there’s quite a bit of racism against people who are seen as more African than Arab,” he said. “And then my DNA comes back, and I’m more African than I am Arab. And like, look, it's a small difference, but that's not to say that it doesn’t mess with your identity a little bit.”
I fell back on the same approach — essentially racial profiling based on overall genetic ancestry and the names of the closest matches — for the profile labeled Casandra Reed. This was clearly a woman with East Asian ancestry. Her matches were too distant for me to make progress by building family trees, but I noticed typically Vietnamese names.
So again, I went through the BuzzFeed staff directory and looked for possible suspects. After excluding one colleague who I believe is half Vietnamese — which wouldn’t be consistent with the DNA profile — I was left with four possibilities.
If this had really been a criminal investigation, I’d have then further scrutinized these four suspects — looking at their social media postings, for instance, to see if any were elsewhere at the time of the crime.
And if that couldn’t narrow it down enough to get an arrest warrant, there would still be one good option for solving the case. Even after identifying a single prime suspect, cops need to confirm that the suspect’s DNA exactly matches the crime scene evidence. And in some cases, they’ve done this by picking up an item carrying the suspect’s DNA, such as the napkin discarded by a Minnesota man who in February was charged with a 1993 murder. To identify Casandra, it wouldn’t be too much of a stretch to tail each of my suspects and apply this approach.
That wasn’t an option in our experiment, so instead I made an educated guess. I knew that people in Europe are less keen on researching their family trees through DNA testing than those in the US. Given that Casandra had only distant matches, I chose one of my suspects whose family I knew had migrated from Vietnam to Germany.
Whether by luck or judgment, I guessed right: This was Lam Thuy Vo, a data reporter with BuzzFeed News in New York City.
Although Abdelmahmoud wasn’t too disturbed by the approach I’d taken, Vo was worried about the implications for civil liberties for people from minority groups if police did anything similar.