Leading tech companies like Google, Meta, OpenAI, Anthropic, and Microsoft are all scrambling to find new sources of data.
Part of the problem is that publishers are increasingly accusing these companies of hoovering up copyrighted data.
One solution is synthetic data, which is artificially generated rather than collected from the real world, and can easily be generated by machine learning algorithms.
OpenAI has considered synthetic data as an option to train its models, but CEO Sam Altman has raised concerns about producing quality data.
"As long as you can get over the synthetic data event horizon, where the model is smart enough to make good synthetic data, everything will be fine," Altman said at a tech conference in May 2023.
Persons:
—, Simon, Schuster, OpenAI, they'll, Mother Jones, Monika Bauerlein, Axel Springer, Sam Altman, Altman
Organizations:
Service, Google, Microsoft, Meta, Business, US Copyright, Investigative, Center, Author's, New York Times, Guild, Associated Press, The, Street, New, New York Post, Prisa Media, Le Monde, Financial Times
Locations:
New York, The