The New York Times discovered that a large AI training dataset contained links to its copyrighted content.
The media company also found its content in other AI training datasets, such as WebText.
The New York Times discovered that Common Crawl, one of the largest AI training datasets, contained millions of URLs linking to its paywalled articles and other copyrighted content.
The New York Times has found its paywalled articles and other copyrighted content in other popular AI training datasets.
It is unclear whether The New York Times has managed to have its content removed from WebText and other AI training datasets.
Persons:
OpenAI's, Google's Infiniset, Charlie Stadtlander, Masterclass, Kelly, GAI
Organizations:
New York Times, Service, The New York Times, Foundation, US, Amazon, Yorker, The Times
Locations:
Originality.ai