DIRS.INFO The most relevant business & tech news

☆ The New York Times and other top news sites block OpenAI's new SearchGPT web crawling bot
▼ + stars: | 2024-08-02 | by ( Darius Rafieyan | ) www.businessinsider.com time to read: +5 min

The New York Times and at least 13 other news sites have blocked OAI-SearchBot. Part of the goal with new AI-powered search engines is to keep users around by showing them summaries. If publishers aren't seeing huge traffic from search engines anymore, why bother allowing their web crawling bots? The major holdout among publishers is The New York Times. "By providing Times content without The Times's permission or authorization, Defendants' tools undermine and damage The Times's relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue."

Persons: —, OpenAI, Jon Gillham, SearchGPT, Gillham, The New York Times Gillham, Axel Springer, Charlie Stadtlander, Darius Rafieyan Organizations: Service, New York Times, Business, Yorker, Vogue, The New York Times, Microsoft, OpenAI

☆ Meta's AI chatbot says it was trained on millions of YouTube videos. Meta didn't deny it but said the bot could be inaccurate.
▼ + stars: | 2024-06-04 | by ( Kali Hays | ) www.businessinsider.com time to read: +5 min

The Meta AI chatbot is more willing to share what data it was trained on than Meta is. Meta expanded Meta AI in April as a chat and image generator function across all its apps, including Instagram and WhatsApp. Meta AI told Business Insider that it was trained on large datasets of transcriptions from YouTube videos. Meta AI initially said its training data included a third-party dataset of 3.7 million transcribed YouTube videos. In responding to further queries about its YouTube training data, Meta AI said its training data included another, larger dataset of transcriptions from 6 million YouTube videos also compiled by a third party.

Persons: —, hasn't, Meta, OpenAI, Meta AI's, We'll, Meta's chatbot, Google's GoogleBot, Kali Hays Organizations: Service, Meta, Facebook, Business, TED, YouTube, NBC News, CNN, Financial Times, US Locations: khays@businessinsider.com

☆ Top websites block Google from training AI models on their data. Nowhere near as much as OpenAI, though.
▼ + stars: | 2024-03-14 | by ( Hugh Langley | ) www.businessinsider.com time to read: +4 min

Google launched a new tool that lets publishers opt out of training Google's AI models. It turns out that all this content has been stored in datasets that are the foundation for training powerful AI models, including those from OpenAI, Google, Meta, and others. Part of Google's response has been to launch a new tool that lets websites block the company from using their content for training AI models. BI asked Originality.ai CEO Jonathan Gillham why Google-Extended is being used less than other AI training data-blockers. It's unclear if the company will launch this fully in the future, or how much different it will be from the traditional Google search engine.

Persons: —, There's, Robots.txt, Jonathan Gillham, Gillham, Axel Springer Organizations: Google, Service, New York Times, CNN, BBC, Business Locations: Chicago
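
The opt-out described in the entry above works through robots.txt: a site names the `Google-Extended` token and disallows it, while leaving other crawlers alone. A minimal sketch of how such a file is interpreted, using Python's standard `urllib.robotparser`; the robots.txt body and the example.com URL are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google's AI training token,
# but leave every other crawler (including search) unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


URL = "https://example.com/article"  # placeholder URL
training_allowed = allowed(ROBOTS_TXT, "Google-Extended", URL)
search_allowed = allowed(ROBOTS_TXT, "Googlebot", URL)
print(training_allowed, search_allowed)
```

Under these assumptions the training token is refused while the search crawler is still admitted, which is the split the opt-out is meant to give publishers.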

☆ OpenAI offers a way for creators to opt out of AI training data. It's so onerous that one artist called it 'enraging.'
▼ + stars: | 2023-09-29 | by ( Kali Hays | ) www.businessinsider.com time to read: +5 min

Artists and image owners can now ask OpenAI to remove their images from DALL-E training data. OpenAI recently unveiled a new form that image owners and creators can use to request that owned or copyrighted images be removed from DALL-E training data. AI models need high-quality, human-generated training data to perform well. Toby Bartlett, an artist with a namesake consulting firm, wrote on Threads that OpenAI's DALL-E opt-out process is "enraging." Or, as OpenAI put it, its model will have "learned from their training data" and be able to "retain the concepts that they learned."

Persons: —, OpenAI, Toby Bartlett, OpenAI's, Greg Madhere, He's, it's, we've, We've, Kali Hays Organizations: Service, Georgia O'Keeffe Museum, US Copyright, Twitter Locations: khays@insider.com, @hayskali

☆ OpenAI's GPTBot and other AI web crawlers are being blocked by even more companies now
▼ + stars: | 2023-09-28 | by ( Kali Hays | ) www.businessinsider.com time to read: +7 min

Unique, high-quality data, mainly scraped from the web, is vital to the performance of AI models. More and more companies are trying to avoid having their data freely scraped and saved by web crawlers working for the benefit of AI models. Last month, OpenAI revealed its own crawler, GPTBot, saying it would respect robots.txt, a decades-old method through which a website can tell a web crawler to ignore it. Many more companies are now also blocking CCBot, a web crawler used by Common Crawl. See below for a full list of the biggest websites now blocking GPTBot and CCBot as of Sept. 22:

Blocking GPTBot: amazon.com, quora.com, nytimes.com, theguardian.com, shutterstock.com, wikihow.com, cnn.com, sciencedirect.com, usatoday.com, healthline.com, stackexchange.com, alamy.com, scribd.com, webmd.com, businessinsider.com, dictionary.com, reuters.com, washingtonpost.com, medicalnewstoday.com, npr.org, cbsnews.com, goodhousekeeping.com, amazon.co.uk, tumblr.com, latimes.com, insider.com, glassdoor.com, vocabulary.com, investopedia.com, slideshare.net, amazon.de, cosmopolitan.com, nbcnews.com, indiamart.com, stackoverflow.com, hindustantimes.com, bloomberg.com, cnbc.com, people.com, tvtropes.org, amazon.in, vimeo.com, verywellhealth.com, ikea.com, espn.com, indianexpress.com, thesaurus.com, pbs.org, 123rf.com, wattpad.com, variety.com, today.com, popsugar.com, thespruce.com, uol.com.br, amazon.fr, geeksforgeeks.org, elle.com, economictimes.com, pcmag.com, theverge.com, allrecipes.com, thoughtco.com, rollingstone.com, wired.com, nextdoor.com, hollywoodreporter.com, abc.net.au, ew.com, amazon.ca, news18.com, womenshealthmag.com, rateyourmusic.com, amazon.co.jp, techradar.com, airbnb.com, ndtv.com, lifewire.com, tomsguide.com, vulture.com, everydayhealth.com, polygon.com, theconversation.com, esquire.com, prnewswire.com, billboard.com, menshealth.com, metro.co.uk, countryliving.com, mashable.com, gamesradar.com, thehindu.com, timesofindia.com, deadline.com, harpersbazaar.com, medscape.com, nymag.com, refinery29.com, radiotimes.com, cbssports.com, tandfonline.com, theatlantic.com, trulia.com, amazon.es, pinterest.es, nationalgeographic.com, bhg.com, eater.com, southernliving.com, healthgrades.com, vice.com, picclick.com, bustle.com, newyorker.com, eonline.com, digitalspy.com, opentable.com, pinterest.de, thepioneerwoman.com, caranddriver.com, byrdie.com, livemint.com, medicinenet.com, teacherspayteachers.com, cookpad.com, thespruceeats.com, bizjournals.com, pagesjaunes.fr, liputan6.com, delish.com, masterclass.com, archiveofourown.org, vox.com, realsimple.com, aarp.org, francetvinfo.fr, pinterest.fr, kumparan.com, theathletic.com, travelandleisure.com, vogue.com, livescience.com, apartments.com, marketwatch.com, glamour.com, amazon.it, cinemablend.com, thrillist.com, amazon.com.br, pinterest.co.uk, angi.com, alamy.es, usmagazine.com, distractify.com, bbcgoodfood.com, jagran.com, mercadolibre.com.mx, androidauthority.com, city-data.com, foodandwine.com, hellomagazine.com, amazon.com.au, gq.com, ingles.com, amarujala.com, ieee.org, prevention.com, stern.de, kbb.com, edmunds.com, marthastewart.com, pcgamer.com, justanswer.com, health.com, 20minutes.fr, fortune.com, homes.com, scientificamerican.com, popularmechanics.com, verywellfit.com, vanityfair.com, chicagotribune.com, verywellmind.com, housebeautiful.com, cntraveler.com, allure.com, spanishdict.com, neverbounce.com, answers.com, moneycontrol.com, architecturaldigest.com, slate.com, lonelyplanet.com, inverse.com, corriere.it, actu.fr, self.com, tripsavvy.com, instyle.com, eatingwell.com, superuser.com, welt.de, spiegel.de, womansday.com, seventeen.com, hbr.org, oprahdaily.com, autotrader.com, bonappetit.com, sueddeutsche.de, seriouseats.com, liveabout.com, seattletimes.com, coursera.org, livehindustan.com, france24.com, townandcountrymag.com, dotesports.com, worldplaces.me, faz.net, teenvogue.com, motor1.com, nj.com, glamourmagazine.co.uk, okdiario.com, brides.com, stylecaster.com, alamyimages.fr, jagranjosh.com, theglobeandmail.com, axios.com, francebleu.fr, tabelog.com, thebalancemoney.com, nydailynews.com, sheknows.com, naomedical.com, verywellfamily.com

Blocking CCBot

Persons: —, OpenAI, GPTbot, Conde Nast, Masterclass, Kelly, robots.txt, verywellhealth.com, indianexpress.com Organizations: Service, Amazon, Guardian, NPR, CBS News, CBS Sports, NBC News, CNBC, Yorker, Hearst, New York Times Locations: USA, Europe, Originality.ai, androidauthority.com

☆ The raw materials for creating AI
▼ + stars: | 2023-09-15 | by ( Alistair Barr | Kali Hays | ) www.businessinsider.com time to read: +5 min

The AI models behind this technology are built using high-quality datasets from millions of different sources. These are the raw materials for model "training," in industry parlance. Nvidia GPUs are the main hardware required for AI model training. Over 8,000 authors, including Margaret Atwood and James Patterson, signed an open letter demanding compensation from AI companies for using their works to train AI without permission. Got a tip or insights about the leading AI companies OpenAI, Google, Microsoft and Meta?

Persons: ChatGPT, Nat Friedman, Ben Thompson, Friedman, There's, OpenAI, Reddit, Sarah Silverman, Margaret Atwood, James Patterson, JK Rowling's Harry Potter, Alistair Barr Organizations: Service, Nvidia, Tech, Amazon, LexisNexis, Meta, Google, Microsoft, Twitter Locations: Wall, Silicon, abarr@insider.com

☆ AI is killing the grand bargain at the heart of the web. 'We're in a different world.'
▼ + stars: | 2023-08-30 | by ( Kali Hays | Alistair Barr | ) www.businessinsider.com time to read: +14 min

AI is undermining the web's grand bargain, and a decades-old handshake agreement is the only thing standing in the way. Now, though, generative AI and large language models are changing the mission of web crawlers radically and rapidly. Without a supply of potential consumers, there's little incentive for content creators to let web crawlers continue to suck up free data online. It's also open to manipulation, especially given the voracious appetite for quality AI data. Because robots.txt is voluntary, web crawlers can also simply ignore the blocking instructions and siphon the information from a site anyway.

Persons: Microsoft's Bing, Joost de Valk, It's, de Valk, Nick Vincent, Valk, OpenAI, robots.txt, Jason Schultz, Catherine Stihler, Archie, NYU's Schultz, Steven Sinofsky, who's, Andreessen Horowitz, De Valk, Stihler Organizations: Big Tech, Google, Wordpress, NYU's Technology, Policy Clinic, AWS, Creative Commons, Creative, Microsoft, Nvidia, Star Wars, DC Comics, Warner Brothers, Marvel, Disney, Atlantic, Meta Locations: CCBot, EleutherAI
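
The point above about robots.txt being voluntary can be made concrete: the check runs inside the crawler, not on the website. A minimal sketch, assuming a hypothetical crawler named `ExampleAIBot` and a stand-in `fake_fetch` function in place of a real HTTP request; neither comes from the article.

```python
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser


def polite_fetch(url: str, robots_txt: str, agent: str,
                 fetch: Callable[[str], str]) -> Optional[str]:
    """Fetch `url` only if robots.txt permits `agent`; otherwise return None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        return None  # the crawler chooses to honor the rule
    return fetch(url)


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP GET (hypothetical, no network access)."""
    return f"<html>content of {url}</html>"


ROBOTS = "User-agent: ExampleAIBot\nDisallow: /\n"  # hypothetical rule
URL = "https://example.com/story"                   # placeholder URL

# A compliant crawler checks first and gets nothing back.
polite_result = polite_fetch(URL, ROBOTS, "ExampleAIBot", fake_fetch)
# A non-compliant crawler simply skips the check; nothing stops it.
impolite_result = fake_fetch(URL)
```

Nothing in the file itself enforces the rule: the second call succeeds because compliance is entirely up to the code doing the fetching.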

☆ ChatGPT is becoming a certified cash cow with OpenAI on course to generate $1 billion in annual sales, report says
▼ + stars: | 2023-08-30 | by ( Hasan Chowdhury | ) www.businessinsider.com time to read: +3 min

ChatGPT is set to become a $1 billion sales cash cow for OpenAI. The Information cited a source saying OpenAI will soon hit $1 billion in annual sales. It's a sign that AI tools like ChatGPT can be lucrative as businesses drive demand. OpenAI's prized possession ChatGPT is helping propel the company towards $1 billion in annual revenue as the boom in AI demand from businesses drives a sales bonanza, according to a new report. Developers using the AI model at the heart of ChatGPT say it's getting dumber.

Persons: OpenAI, Carlyle, Similarweb, ChatGPT, hoover Organizations: Morning, Microsoft, Enterprise, ChatGPT, Amazon, The New York Times

☆ Hurricanes could drive gas prices, and inflation, higher
▼ + stars: | 2023-08-29 | by ( Nicole Goodkind | ) edition.cnn.com time to read: +7 min

That’s potentially bad news for gas prices. What’s happening: Gas prices are already at $3.82 a gallon. Geopolitical tensions have been supporting high oil and gas prices for some time. In 2005, for example, gas prices surged by 46% between Memorial Day and Labor Day because of the landfall of Hurricane Katrina, according to Bespoke. “Energy prices have been a major contributor to persistently high inflation in the US, so the crude oil price will remain a watch-out factor for future inflation.” High oil and gas prices are one of the largest contributing factors to inflation.

Persons: “ Idalia, ”, Louis Navellier, Andrew Woods, OpenAI, Catherine Thorbecke, Estee Lauder, CNN’s Gregory Wallace Organizations: CNN Business, Bell, New York CNN, Labor, Nasdaq Advisory Services Energy Team, Navellier, Investment, Citigroup, Day, Federal Reserve, “, Exxon Mobil, BP, Chevron, Fortune, CNN, The New York Times, Reuters, Disney, Bloomberg, The Washington Post, ABC News, ESPN, American Airlines, Airlines, Department of Transportation, Fort Worth Locations: New York, Florida, China, Russia, Saudi Arabia, Ukraine, The, Texas, Dallas, American

☆ Disney, The New York Times and CNN are among a dozen major media companies blocking access to ChatGPT as they wage a cold war on A.I.
▼ + stars: | 2023-08-28 | by ( Oliver Darcy | ) edition.cnn.com time to read: +5 min

The Guardian’s Ariel Bogle reported last week that CNN, The New York Times, and Reuters had blocked GPTBot. Publishers such as Condé Nast, Hearst, and Vox Media, which all house several prominent publications, have also taken the defensive measure. The deep archives and intellectual property rights of these news organizations are immensely valuable, arguably crucial, to training A.I. “I see a heightened sense of urgency when it comes to addressing the use, and misuse, of our content,” Coffey said. News organizations might feel they’re on solid legal ground, as Coffey told me, but there has yet to be any serious action taken against OpenAI.

Persons: Ariel Bogle, Condé Nast, GPTBot, Danielle Coffey, Coffey, newsrooms “, ” Coffey, Barry Diller, OpenAI, “, ”, they’re Organizations: CNN —, CNN, The New York Times, Reuters, Disney, Bloomberg, The Washington Post, ABC News, ESPN, Hearst, Vox Media, News Media Alliance, Associated Press Locations: The, …

☆ Major websites like Amazon and the New York Times are increasingly blocking OpenAI's web crawler GPTBot
▼ + stars: | 2023-08-24 | by ( Kali Hays | ) www.businessinsider.com time to read: +3 min

The top 100 sites blocking GPTBot include bloomberg.com, scribd.com, and reuters.com, as well as insider.com and businessinsider.com. Among the top 1,000 sites blocking the bot are ikea.com, airbnb.com, nextdoor.com, nymag.com, theatlantic.com, axios.com, usmagazine.com, lonelyplanet.com, and coursera.org. "GPTBot launched 14 days ago and the percentage of Top 1,000 sites blocking it has been steadily increasing," the analysis said. How these websites block GPTBot is relatively simple, even crude, depending on your perspective. When revealing the crawler, OpenAI said it would abide by robots.txt and GPTBot would not crawl websites that deploy it.

Persons: OpenAI, GPTBot, robots.txt, Stephen King, ChatGPT Organizations: Reuters, Amazon, The New York Times Locations: ChatGPT, robots.txt
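
The kind of measurement quoted above (what percentage of a site list blocks a given crawler) can be sketched from robots.txt files alone. This is an illustrative approach, not the method used in the cited analysis; the four `.example` sites and their robots.txt bodies are made-up sample data.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, agent: str) -> bool:
    """True if the robots.txt body disallows `agent` from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


def share_blocking(robots_by_site: Dict[str, str], agent: str) -> float:
    """Percentage of sites whose robots.txt blocks `agent` at the root."""
    if not robots_by_site:
        return 0.0
    blocked = sum(blocks_agent(body, agent) for body in robots_by_site.values())
    return 100.0 * blocked / len(robots_by_site)


# Hypothetical sample: site name -> robots.txt body.
SAMPLE = {
    "news.example": "User-agent: GPTBot\nDisallow: /\n",
    "shop.example": "User-agent: *\nDisallow: /checkout\n",
    "blog.example": "",
    "wiki.example": (
        "User-agent: GPTBot\nDisallow: /\n\n"
        "User-agent: CCBot\nDisallow: /\n"
    ),
}

gptbot_share = share_blocking(SAMPLE, "GPTBot")  # 50.0 for this sample
ccbot_share = share_blocking(SAMPLE, "CCBot")    # 25.0 for this sample
```

A real survey would also need to download each site's robots.txt and decide how to treat partial blocks; this sketch only counts a full block at the root path.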

☆ OpenAI just admitted it has a bot that crawls the web to collect AI training data. If you don't block GPTbot, that's self-sabotage.
▼ + stars: | 2023-08-08 | by ( Alistair Barr | ) www.businessinsider.com time to read: +8 min

Some of these bots have been helpful because they send users to sources of original content online. The most active one is probably Googlebot, which automatically collects web information so Google can later rank and serve it up in Search results. It's called GPTbot and it's being used to scrape and collect online content for AI model training. So what is Clarke's advice for other online content creators when it comes to GPTbot? What is the incentive that OpenAI offers to have these content creators allow GPTbot to crawl and scrape their sites?

Persons: OpenAI, Prasad Dhumal, Neil Clarke, Clarkesworld, Clarke, I've, hasn't Organizations: Morning, Twitter, OpenAI, Associated Press

☆ Adding one line of code can now prevent OpenAI from accessing a website's data to train ChatGPT
▼ + stars: | 2023-08-08 | by ( Kai Xiang Teo | ) www.businessinsider.com time to read: +3 min

OpenAI launched a new web crawler called GPTBot to browse the internet and collect information. However, adding one line of code to a website will block the crawler from accessing the site's data. Adding just one line of code to a website will now block OpenAI from using the site's data to train its AI models. A web crawler is a bot that browses the internet to collect information. Search engines like Google use web crawlers to collect information for their search results, while AI companies use these crawlers to collect data to train their models.

Persons: OpenAI, Michael Veale, ChatGPT —, James Patterson, Margaret Atwood — Organizations: Morning, University College London, MIT Technology, OpenAI
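
The "one line of code" in the entry above is a robots.txt rule. In practice it is a short stanza: a `User-agent: GPTBot` line naming the crawler, followed by `Disallow: /`. A minimal sketch of checking that such a rule has the intended effect, using Python's standard `urllib.robotparser`; the URL and the `SomeOtherBot` name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rule a site adds to its robots.txt to turn GPTBot away.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/article"  # placeholder URL
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", URL)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", URL)  # hypothetical bot
print(gptbot_allowed, other_allowed)
```

Because the rule names only GPTBot, crawlers with other user-agent tokens are unaffected unless the file lists them as well.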

Search results for: "GPTbot"

13 mentions found