DIRS.INFO The most relevant business & tech news

The New York Times got its content removed from one of the biggest AI training datasets. Here's how it did it.
2023-11-08 | by Alistair Barr, Kali Hays | www.businessinsider.com | time to read: +4 min

The New York Times discovered that Common Crawl, one of the largest AI training datasets, contained millions of URLs linking to its paywalled articles and other copyrighted content. The media company also found its content in other popular AI training datasets, such as WebText. It's unclear if The New York Times has managed to get its content removed from WebText and other AI training datasets.

Persons: OpenAI's, Google's Infiniset, Charlie Stadtlander, Masterclass, Kelly, GAI
Organizations: New York Times, Service, The New York Times, Foundation, US, Amazon, Yorker, The Times
Locations: Originality.ai

Search results for: "Google's Infiniset"

1 mention found