A Blog by Jonathan Low

 

Apr 10, 2019

How A Reporter Identified Colleagues Using Publicly Available Genetic Genealogy

As more people use genetic genealogy sites like 23andMe, and those who are interested cross-reference that data with voluntary postings on social media, the ease of finding people, whether they want to be found or not, grows exponentially.

The question for society, as usual at the intersection of data and tech, is whether the public cares enough to resist the blandishments and power of those who are benefiting from the information generated. JL


Peter Aldhous reports in BuzzFeed:

It’s a numbers game: The more people who put their DNA into databases that can be search(ed), the greater the chance (researchers) will find matches. As I discovered, if you find relatives closer than second cousins, you can identify your target within hours. But from third cousins or further out, it can be a long haul. My 60% clearance rate is an overestimate of how easy it is to solve cases through genetic genealogy. Unease I had about genetic racial profiling came not when I was examining matching DNA profiles, but when I pored through people’s Facebook timelines, trying to discover something about their family relationships.
A year ago, when cops captured Joseph James DeAngelo, the suspected Golden State Killer, the world woke up to the power of genetic genealogy. DeAngelo was identified because DNA he left at the scene of a 1980 double murder partially matched the profiles that a few of his distant relatives had uploaded to a public website to research their family history. Based on those matches, a team of detectives drew up family trees that eventually led them to DeAngelo, suspected of at least 13 murders and more than 50 rapes.
In the year since, more than 50 other criminal cases have been cracked using similar methods, launching a new forensic science industry. One estimate has suggested that more than half of the US population could be found in this way — although genealogists have warned that, in practice, complications like adoptions or misunderstandings over who is the biological father of a child can throw an investigator off track.
How hard is it to crack cases in this way? And what issues does it raise, as police recruit genealogists to help them solve crimes by sifting through the perpetrators’ extended family trees?
To explore these questions, my editor Virginia Hughes and I conjured up an experiment: She would recruit BuzzFeed employees to play the role of “suspects” and get their DNA tested with a company used by genealogy enthusiasts. She’d then download their DNA profiles, containing data on hundreds of thousands of genetic markers, and send the files to me labeled with randomly chosen fake names. Then I’d play genealogy “detective” and try to figure out who they really were.
In the end, I identified 6 out of our 10 volunteers. Four of those cases I solved by tracking them down through their relatives’ family trees, much as the cops did with DeAngelo. In a twist I didn’t anticipate, I found two more not through their relatives, but simply because their ancestry indicated that their family came from a specific country, raising uncomfortable questions about genetic racial profiling.
Investigation basics
Before I started this project, I was a novice at genealogy. So I sat down for a couple of hours of training with an expert, Leah Larkin, a genealogist in Livermore, California, who helps adoptees find their biological relatives and who blogs as the DNA Geek.
Larkin taught me how to look for partial DNA matches on GEDmatch, the free, public website used to find DeAngelo’s distant relatives. She also introduced me to another online tool that takes the amount of DNA shared by two people, a number provided by GEDmatch, and tells you the possible family relationships that might explain the match. (A child typically shares about 50% of their DNA with their mother, for example, while two first cousins have about 12.5% shared DNA.) And Larkin showed me how she uses genealogical websites, public records, and social media posts to piece together a person’s family tree.
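The percentages quoted above follow from a common rule of thumb: every generation step (meiosis) on the path between two people through a shared ancestor halves the expected overlap, and the contributions of each shared ancestor add up. The Python sketch below illustrates that rule only. It is not the online tool Larkin demonstrated; the function name and the relationship table are assumptions made for illustration, and real matches scatter around these averages.

```python
def expected_shared_fraction(meioses, common_ancestors):
    """Expected fraction of autosomal DNA shared by two relatives.

    meioses: generation steps on the path linking the two people
             through one shared ancestor.
    common_ancestors: how many ancestors they share at that distance
             (2 for full cousins, 1 for half relationships).
    """
    return common_ancestors * 0.5 ** meioses


# (meioses, common_ancestors) for a few relationships. Illustrative table,
# not taken from the article. Parent/child is modeled as one step to a
# single "shared ancestor", the parent.
RELATIONSHIPS = {
    "parent/child": (1, 1),
    "first cousins": (4, 2),
    "first cousins once removed": (5, 2),
    "second cousins": (6, 2),
    "third cousins": (8, 2),
}

for name, (meioses, ancestors) in RELATIONSHIPS.items():
    pct = 100 * expected_shared_fraction(meioses, ancestors)
    print(f"{name}: about {pct:.2f}% expected")
```

Under this rule a third cousin is expected to share a little under 1% of their DNA, which is why matches around that level are hard to place on a family tree.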
Then I got down to work. Even with a pool of suspects limited to the 1,200 or so people who work for BuzzFeed, I was prepared for a long, hard slog, especially for anyone whose matches were more distant than second cousins. For the Golden State Killer, the closest matches were third or fourth cousins. It took a team of five people, including the genealogist Barbara Rae-Venter, more than four months to identify DeAngelo as their prime suspect. In total, I spent about 60 hours trying to identify our 10 suspects. In the most frustrating case, I gave up after 30 hours. The easiest took me less than an hour.

Suspects still at large: Marina Paniagua, Rico Ray, and Katelyn Kim

The first case tested my stamina. For the profile labeled Marina Paniagua, I found two DNA matches — one that shared 1.25% of their DNA with Marina, and another about 1% — that seemed worth pursuing. For each I was able to find a family tree online that served as a starting point for my own research. Using the tools available with a subscription at Ancestry, which cost me $49 per month, I extended these trees to try and find my target.
Each match looked like they could be Marina’s third cousin. If so, I would have to go back four generations to their great-great-grandparents to find common ancestors with Marina. Then I would follow the possible branches back down four generations until I found someone who works for BuzzFeed.
The trouble is that there are many, many potential paths up and down four generations in an extended family tree. Here’s some back-of-the-envelope math: Marina would have eight sets of great-great-grandparents, so if everyone descended from them had three children down four generations, Marina would be just one of 648 (that’s 8 x 3^4) third cousins.
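The back-of-the-envelope figure can be reproduced in a few lines of Python. This is a minimal sketch of the arithmetic only; the function name is hypothetical, and the assumption that every descendant has the same number of children is the article's simplification, not a demographic fact.

```python
def same_generation_descendants(ancestor_couples, children_per_person, generations):
    """Count descendants in the target's generation, assuming every
    descendant in each generation has the same number of children."""
    return ancestor_couples * children_per_person ** generations


# Eight sets of great-great-grandparents, three children each, four generations down.
print(same_generation_descendants(8, 3, 4))  # 8 * 81 = 648

# The estimate is very sensitive to family size (illustrative values).
for kids in (2, 3, 4):
    print(kids, "children per person:", same_generation_descendants(8, kids, 4))
```

Moving from two to four children per person changes the candidate pool from 128 to 2,048, which shows how quickly the search space grows at the third-cousin level.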
It would have been easier if Marina’s two matches were related to each other — then I could have looked for where their family trees intersected to narrow down the range of possible branches to explore. But they weren’t, which means that one was probably from her mother’s side of the family, the other from her father’s.

After more than 30 hours of work, I’d traced one of Marina’s matches back to great-great-grandparents in southern Sweden, and another to ancestors in central Ireland. But despite exploring many descending branches back down to the present day, I’d failed to find anyone whose name was in the BuzzFeed employee directory — and there were dozens of possible branches still to explore.
It was time to give up.
When my target, Sarah Mimms, a politics editor with BuzzFeed News in Washington, DC, revealed her identity, I learned of another reason why her case would be difficult to solve. “My mother is adopted,” Mimms said. So even if I’d found one of her maternal biological grandparents, I probably would have lost the trail at that point, because public records would not link them to Mimms’ mother.
Mimms’ case was informative, however, because matches to third or fourth cousins, roughly the same as in her case, were what Rae-Venter and the detectives hunting the Golden State Killer were working with.
Viewed against the gargantuan effort needed to crack the Golden State Killer case, my fruitless week of genealogical research didn’t seem so bad. And it provided a yardstick to judge when a case would be too time-consuming to solve. So when I subsequently examined two DNA profiles with similarly distant matches, I quickly conceded defeat.
Rico Ray was a man with Western European ancestry whose closest match — again a possible third cousin — was a police officer in upstate New York. Rico was Brandon Hardin, a curation editor for BuzzFeed News in New York City.

The matches to the profile labeled Katelyn Kim were even less promising, but her ancestry was an interesting mix of Eastern, Southern, and Western European.
“I’m a European mutt,” BuzzFeed News managing editor Maggie Schultz said, when she later revealed her identity.
Giving up on finding Marina, Rico, and Katelyn might seem defeatist, but it’s the same sort of triage that the companies offering genealogy services to the cops are now doing. “That’s exactly how they’re operating,” said genealogist Larkin, when I told her about my strategy.
Indeed, Bode Technology, which launched its forensic genealogy service in February, is focusing on cases with second-cousin matches or closer, according to its director of sales and marketing, Andrew Singer.
Parabon NanoLabs, which has been using genealogy to help cops make arrests since May 2018, doesn’t use a simple cutoff. But after providing an initial assessment of how hard a case will be to solve, Parabon charges $3,500 for up to 15 hours of genealogical research. So similarly, its genealogists aren’t laboring for months to solve difficult cases.

Revealed by Facebook: Cheyenne Griggs

My first success owed a lot to Facebook. Cheyenne Griggs’ closest match was possibly a half second cousin or a second cousin once removed. But this profile had been uploaded by someone with a common name — Judith Johnson — and an email that I couldn’t track down on a simple Google search. That made her hard to identify without other clues.

To get a foothold onto Cheyenne’s family tree, I turned to Judith’s closest match, a woman in Illinois who seemed like she might be Judith’s half sister. From posts on her Facebook profile, I worked out that the Illinois woman was adopted and had recently discovered members of her biological family. One elderly woman was tagged in a profile picture of a family get-together, so I went to her Facebook profile and found she was friends with a Judith Johnson in Albuquerque, New Mexico. And when I looked at Judith’s Facebook friend list, a name that I recognized jumped out: Drusilla Moorhouse, deputy copy chief for BuzzFeed News, based in our Los Angeles bureau.
It was too much of a coincidence for Moorhouse not to be my target, but I still had to confirm the connection. That’s when I discovered from public records that Moorhouse had grown up as Jennifer Johnson. She was from New Mexico, and her mother’s name was Judith. LexisNexis, a database I often consult to track down sources for my reporting, also linked the email Judith Johnson had used on GEDmatch to Moorhouse’s mother.
But Cheyenne’s closest match on GEDmatch was way too distant to be her mother. Moorhouse’s mom, I realized, must have uploaded the DNA of another family member.
Moorhouse, a true crime fan who volunteered for our project because she was fascinated by the Golden State Killer investigation, was surprised that I’d been able to learn these details about her family — particularly because she had kept her own Facebook friend list hidden. But because her relatives had theirs in open view, that was no obstacle to tracking her down.
Still, she was convinced that the benefits of this new approach to police work outweigh any privacy concerns.
“I think it's important to remember that in most of the cases being solved through genetic genealogy, the victims are women who have been sexually assaulted,” she said. “I hope that long-forgotten rape kits will be taken off the shelves and reexamined.”

Solved within hours: Donald Adkins and Keith Herman

After cracking Cheyenne’s case, my editor sent me a new batch of DNA profiles. As soon as I uploaded them to GEDmatch, two stood out for having reasonably close relatives on GEDmatch.
For Donald Adkins, the best match was a woman who once worked in the fashion footwear industry and is now an Episcopal priest with a senior position in the Diocese of New York. She and Donald shared more than 11% of their DNA, so were likely as close as first cousins. To connect the priest to her half nephew took me less than two hours: Donald was BuzzFeed News Editor-in-Chief Ben Smith.
The only real complication in Smith’s family tree was that his grandfather had married three times. But a family that includes several writers and a New York Court of Appeals judge — Smith’s father — lives in public view. All I needed was a newspaper obituary for the Episcopal priest’s mother to complete the family tree.

I wondered whether Keith Herman, whose family seemed not to include public figures, would be harder to find. His closest match — a 6.7% DNA overlap, likely closer than a second cousin — was an elderly man in Pebble Beach, California. Fortunately for me, not only was this man’s wife a keen genealogist, but so too were people on Keith’s branch of the family tree.
Using the trees they had already researched and posted online, I quickly discovered that the California man’s great-grandfather lived near the border between Georgia and Alabama and had fought in the Confederate army in the Civil War.
In 1939, a granddaughter of this Confederate soldier married an Irish immigrant called Mathew Honan. And they were the grandparents of BuzzFeed’s San Francisco bureau chief, Mat Honan. Within an hour of identifying the elderly man in California — Honan’s first cousin, once removed — I had cracked the case.
Both Smith and Honan were struck by how much I’d been able to find out about their families, so quickly. And as journalists who follow debates about online privacy, they were concerned about what this meant for their children.
“I realized that my family’s genealogical hobbies were exposing my kids’ identities, without their input or consent,” Smith said.

Snared by their ancestry: Ethan Dagostino and Casandra Reed

Ethan Dagostino at first looked like a tough nut to crack. Although he had a reasonably good match on GEDmatch — possibly as close as a second cousin — I struggled to build out a family tree beyond this one man and his wife, who had lived in both Colorado and Virginia.
When I used a tool on GEDmatch that explores where in the world someone’s DNA comes from, I could see that Ethan was quite different from the other people I’d tried to find.
While other volunteers in this project had clearly descended from Europeans, Ethan’s DNA came from Northeast Africa and the area around the Red Sea. And when I started googling Ethan’s closest matches, I found that their names were typically Sudanese.
At that point, I gave up trying to identify Ethan by genealogy, realizing that it would be next to impossible to trace his ancestors back to Sudan. Sudanese naming conventions also posed a problem, because there are no surnames carried down across generations. Instead, a man takes his father’s first name as his second name.
So I took a different approach, searching BuzzFeed’s staff directory for a man with a Sudanese name. There was just one: Elamin Abdelmahmoud, a curation editor in Toronto who writes the daily BuzzFeed News newsletter.

From an essay he wrote about naming his daughter, which described his pride in his Sudanese ancestry, I learned that Abdelmahmoud had emigrated to Canada when he was 12. Later, we discussed his ancestry, and Abdelmahmoud told me he’d been surprised to learn that he was 60% African and 40% Arab — he had assumed he had more Arab ancestry.
“In Sudan, there’s quite a bit of racism against people who are seen as more African than Arab,” he said. “And then my DNA comes back, and I’m more African than I am Arab. And like, look, it's a small difference, but that's not to say that it doesn’t mess with your identity a little bit.”
I fell back on the same approach — essentially racial profiling based on overall genetic ancestry and the names of the closest matches — for the profile labeled Casandra Reed. This was clearly a woman with East Asian ancestry. Her matches were too distant for me to make progress by building family trees, but I noticed typically Vietnamese names.
So again, I went through the BuzzFeed staff directory and looked for possible suspects. After excluding one colleague who I believe is half Vietnamese — which wouldn’t be consistent with the DNA profile — I was left with four possibilities.

If this had really been a criminal investigation, I’d have then further scrutinized these four suspects — looking at their social media postings, for instance, to see if any were elsewhere at the time of the crime.
And if that couldn’t narrow it down enough to get an arrest warrant, there would still be one good option to solve the case. Even after identifying a single prime suspect, cops need to confirm that the suspect’s DNA exactly matches the crime scene evidence. And in some cases, they’ve done this by picking up an item carrying the suspect’s DNA, such as the napkin discarded by a Minnesota man who in February was charged for a 1993 murder. To identify Casandra, it wouldn’t be too much of a stretch to tail each of my suspects and apply this approach.
That wasn’t an option in our experiment, so instead I made an educated guess. I knew that people in Europe are less keen on researching their family trees through DNA testing than those in the US. Given that Casandra had only distant matches, I chose one of my suspects whose family I knew had migrated from Vietnam to Germany.
Whether by luck or judgment, I guessed right: This was Lam Thuy Vo, a data reporter with BuzzFeed News in New York City.
Although Abdelmahmoud wasn’t too disturbed by the approach I’d taken, Vo was worried about the implications for civil liberties for people from minority groups if police did anything similar.
“It creates space for inherent bias to make snap decisions that may not always be in favor of minorities,” she said, adding that she intended to delete her account at FamilyTreeDNA, the company we used to test our volunteers’ DNA.
“I don’t trust for-profit companies to protect my privacy,” Vo said.
In a real criminal investigation, distinctive genetic ancestry would be less informative than it was in our experiment. I knew that my pool of potential suspects was limited to the 1,200 or so people who work for BuzzFeed. But it’s rare that a criminal case would be so tightly constrained. “I wish we had a limited pool of 1,200 for these cases!” CeCe Moore, Parabon’s lead genealogist, told me by email.
Moore also pointed out that broad genetic ancestry sometimes exonerates members of minority groups. “It can help to eliminate everyone else as a suspect who does not belong to the identified population group of origin,” she said. “That may be its greatest utility.”
What’s clear is that detectives hunting down a violent offender will use any information they can. As they closed in on DeAngelo, for instance, Rae-Venter’s team knew they were looking for a man with Italian ancestry. They also reportedly used a tool called Promethease, which can predict certain traits from a DNA profile, to learn that he likely had blue eyes.
Parabon has taken this approach even further: As well as offering a genealogy service, it makes facial reconstructions from crime scene DNA, predicting skin tone, bone structure, hair and eye color, and more.

Another twist in the tale of genetic genealogy is that the typical racial biases in the US criminal justice system can be turned on their head. Because white people have made proportionately greater use of DNA testing services, they are more likely to have close relatives in public databases than do people with African ancestry.
In a study published last year, a team led by Yaniv Erlich, chief scientist with the DNA testing company MyHeritage, based in Israel but with customers around the world, found that people with mainly North European ancestry were 30% more likely to have a third cousin or closer in the company’s database than people whose ancestry is mostly African.

Protected by his ancestry: Lucas Featherstone

While Abdelmahmoud and Vo were exposed by their racial ancestry, another volunteer was obscured by his. When I looked at the matches in GEDmatch for the profile labeled Lucas Featherstone, I was initially encouraged by a single match that looked like a potential second cousin.
But then I counted more than 75 matches that appeared to be third cousins, or similarly close relatives. This was starkly different from all of the other volunteers, none of whom had more than a handful of matches this close.

Larkin had warned me about cases like this. Her family has roots in the Cajun population of Louisiana. And for groups like the Cajuns and Ashkenazi Jews, where there has been a long history of marrying within a small community, everyone is more genetically similar to one another than in other populations. This meant that Lucas’s abundance of supposed third cousins were probably not that closely related at all.
From the names of Lucas’s matches, and in one case the title of “Rabbi,” I could see that this was an Ashkenazi man — that part of the Jewish diaspora that initially migrated to Eastern Europe.
I had to admit defeat. I made an incorrect guess, failing to identify Ken Bensinger, an investigative reporter with BuzzFeed News based in Los Angeles.
“One of my big takeaways is that while DNA can help tell a story about a person, it clearly can also provide information that can cloud issues as much as it resolves them,” Bensinger said. “The idea that there is a kind of safety in numbers, a sort of herd immunity from being tracked down, is fascinating.”

Betrayed by a second database: Lindsey Drury

That left just one more case. I could see from her overall ancestry that Lindsey Drury was a black woman. GEDmatch connected her with a woman who shared about 2.25% of her DNA.

This woman is a realtor in Seattle who is also a semiprofessional singer. But as I started to research her parents, I realized they were white, while she is not — suggesting she was adopted. To try to find her biological parents, I read through years of her postings on Facebook, and researched her closest matches in GEDmatch. Her biological parents, I discovered, were a white man and a black woman.
Knowing that my suspect was black, I focused on that side of the Seattle woman’s family. Her grandmother was married several times, but none of her descendants led me to any of my colleagues. The Seattle woman’s grandfather, however, remained a mystery. I knew his name from a 1947 Texas marriage license, but it was a common one, and I couldn’t trace his other descendants.
Just before I hit a wall with Lindsey’s case, however, BuzzFeed News revealed that the FBI was using a second database to find partial matches to crime scene DNA samples. That was the database run by FamilyTreeDNA, the same company we’d used to test our volunteers’ DNA. So I decided to see if it offered any better clues than those I’d obtained from using GEDmatch.
For the other volunteers where I’d gotten stuck, this second database didn’t help much. But for Lindsey, it cracked the case wide open. On FamilyTreeDNA, her closest match shared so much DNA that she could only be one person: Lindsey’s mother. I quickly discovered that Lindsey was Farryn Lewis, formerly BuzzFeed’s San Francisco office manager. Just a week before, she had transferred to BuzzFeed’s Los Angeles bureau.
Lewis’s case may have solved another mystery. We’re still trying to work out the details, but it’s possible that the man in Texas named on that 1947 marriage license, or a close relative of his, is Lewis’s biological grandfather — someone whose identity she hadn’t known before.

The Future

The transformation of Lindsey’s case by the addition of a second database points to the future of investigative genealogy. It’s a numbers game: The more people who put their DNA into databases that police can search, the greater the chance that cops will find close matches, and the easier crimes will become to solve.
As I discovered, if you find relatives closer than second cousins, with luck you can identify your target within hours. But from third cousins or further out, it can be a long haul.
My 60% clearance rate is probably an overestimate of how easy it is to solve cases through genetic genealogy, because I knew my suspects came from our limited pool of around 1,200 BuzzFeed employees.
Parabon’s experience may offer a better guide. As of the start of April, it had assessed 209 cases, deciding that 137 of them could potentially be solved through genealogy. So far, it has cracked 46 cases, with new ones being solved on a roughly weekly basis.
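For comparison with the 60% figure, Parabon's numbers can be turned into rates. The sketch below is one reading of those figures: it assumes the 46 solved cases all come from the 137 judged potentially solvable, which the article does not state outright, and the variable and function names are illustrative.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


assessed = 209   # cases Parabon had assessed as of the start of April
viable = 137     # cases judged potentially solvable through genealogy
solved = 46      # cases cracked so far

print(f"Judged viable:       {percent(viable, assessed):.1f}% of assessed cases")
print(f"Solved, of viable:   {percent(solved, viable):.1f}%")
print(f"Solved, of assessed: {percent(solved, assessed):.1f}%")
```

On this reading, roughly a fifth of all assessed cases and a third of the viable ones had been solved at that point, well below the 60% reached inside a closed pool of about 1,200 employees. Cases were still being solved weekly, so these rates are a snapshot, not a ceiling.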
The success rate will only improve as DNA databases grow in size. Right now, GEDmatch contains about 1.2 million DNA profiles; FamilyTreeDNA has another million or so. What cops would dearly love is to have access to the huge and fast-growing databases operated by the industry’s really big players: 23andMe’s database contains some 10 million DNA profiles, and Ancestry’s about 15 million.
Those companies have so far maintained a firm line on their customers’ privacy, however, vowing that they won’t open them for searching by the police.
There were times working on this project that I felt like a cyberstalker. But aside from the unease I had about genetic racial profiling, those moments mostly came not when I was examining matching DNA profiles, but when I pored through people’s Facebook timelines, trying to discover something about their family relationships. I happen to be adopted myself, and looking for posts in which another adoptee revealed their biological relationships — sometimes in a subtle way, probably only intended as a message to people very close to them — felt intrusive.
That’s the thing about digital privacy. With any one disclosure, we think we’re pulling back the curtain on our private lives just a little. But put all of the pieces together — especially if one of those pieces is a relative’s genetic code — and suddenly there’s a searchlight shining up and down your family tree.
As long as this power is used to find killers and rapists, it seems that most people will support it. It’s certainly hard to counter Moorhouse’s argument in favor of cops solving heinous crimes.
“Solving these cases is so important to so many people,” Curtis Rogers, one of the founders of GEDmatch, told me by email. “The reports we get of the tremendous relief each capture brings to victims, families, and others, are truly heart-warming.”
Right now, the terms and conditions of both GEDmatch and FamilyTreeDNA limit acceptable uses of their databases by law enforcement to identifying human remains and finding the perpetrators of serious violent crimes — homicides and sexual assaults (FamilyTreeDNA also includes abductions).
“We can’t apply genetic genealogy to every case brought to us,” Parabon CEO Steven Armentrout told BuzzFeed News.
Already, two cases involving abandoned babies investigated with these methods are raising ethical questions among genealogy enthusiasts. “We’ve had very polarized discussions in our genetic genealogy community,” Debbie Kennett, an honorary research associate at University College London, told BuzzFeed News.
Last month, a woman was charged in South Dakota with first-degree murder in the death of her infant son, who was abandoned in a ditch in 1981. And on April 4, a woman in Greenville, South Carolina, was charged with homicide by child abuse for the death of a baby girl who was found in 1990 in a cardboard box.
In some jurisdictions, such crimes may not be considered as murders. The UK, for example, has a separate offense of infanticide, which can be charged when a woman’s “balance of mind is disturbed” after giving birth. It is treated more leniently: Women convicted are typically not imprisoned, and may instead be hospitalized for mental health treatment.
As DNA databases grow and costs come down, it may become tempting for law enforcement to extend the approach to less serious offenses. I hope my experience helps you understand what’s involved, so that we can all decide where to draw the line.



