Late last week, a California-based AI artist who goes by the name Lapine discovered private medical record photos taken by her doctor in 2013 referenced in the LAION-5B image set, which is a scrape of publicly available images on the web. AI researchers download a subset of that data to train AI image synthesis models such as Stable Diffusion and Google Imagen.
Lapine discovered her medical photos on a site called Have I Been Trained, which lets artists see if their work is in the LAION-5B data set. Instead of doing a text search on the site, Lapine uploaded a recent photo of herself using the site's reverse image search feature. She was surprised to discover a set of two before-and-after medical photos of her face, which had only been authorized for private use by her doctor, as reflected in an authorization form Lapine tweeted and also provided to Ars.
🚩My face is in the #LAION dataset. In 2013 a doctor photographed my face as part of clinical documentation. He died in 2018 and somehow that image ended up somewhere online and then ended up in the dataset- the image that I signed a consent form for my doctor- not for a dataset. pic.twitter.com/TrvjdZtyjD
— Lapine (@LapineDeLaTerre) September 16, 2022
Lapine has a genetic condition called dyskeratosis congenita. "It affects everything from my skin to my bones and teeth," Lapine told Ars Technica in an interview. "In 2013, I underwent a small set of procedures to restore facial contours after having been through so many rounds of mouth and jaw surgeries. These pictures are from my last set of procedures with this surgeon."
The surgeon who possessed the medical photos died of cancer in 2018, according to Lapine, and she suspects that they somehow left his practice's custody after that. "It’s the digital equivalent of receiving stolen property," says Lapine. "Someone stole the image from my deceased doctor’s files and it ended up somewhere online, and then it was scraped into this dataset."
Lapine prefers to conceal her identity for medical privacy reasons. With records and photos provided by Lapine, Ars confirmed that there are medical images of her referenced in the LAION data set. During our search for Lapine's photos, we also discovered thousands of similar patient medical record photos in the data set, each of which may have a similarly questionable ethical or legal status. Many of them have likely already been integrated into popular image synthesis models that companies like Midjourney and Stability AI offer as a commercial service.