New York CNN — More than a thousand images of child sexual abuse material were found in a massive public dataset used to train popular AI image-generating models, Stanford Internet Observatory researchers said in a study published earlier this week.
The presence of these images in the training data may make it easier for AI models to produce new, realistic images of child abuse, or "deepfake" images of children being exploited.
The massive dataset the Stanford researchers examined, known as LAION-5B, contains billions of images scraped from the internet, including from social media and adult entertainment websites.
Of the more than five billion images in the dataset, the Stanford researchers said they identified at least 1,008 instances of child sexual abuse material.
Stability AI, which develops image-generating models trained on LAION data, has said that "Stability AI models were trained on a filtered subset of that dataset."