Google's new large language model, which the company announced last week, uses almost five times as much training data as its predecessor from 2022, allowing it to perform more advanced coding, math and creative writing tasks, CNBC has learned.
PaLM 2, the company's new general-use large language model (LLM) that was unveiled at Google I/O, is trained on 3.6 trillion tokens, according to internal documentation viewed by CNBC.
Google's previous version of PaLM, which stands for Pathways Language Model, was released in 2022 and trained on 780 billion tokens.
PaLM 2, according to internal documents, has 340 billion parameters, an indication of the model's complexity.
The initial PaLM had 540 billion parameters.