LLMs are trained on hundreds of terabytes of data to a few petabytes at most. You are off by 3 to 6 orders of magnitude in your estimate of the training data. They aren't literally trained on "all the data of the internet"; that would be a divergent nightmare, and catastrophic forgetting is still a problem for neural networks and ML algorithms in general.

Humans are probably trained on less than half an exabyte of data over a lifetime, given the ~1 Gbps of sensory data we receive. That's still ~20 petabytes by age 5. On the LLM side, a 400B-parameter model trained on ~100 tokens per parameter sees ~40 trillion tokens, which is roughly 80 TB at 2 bytes per token (or ~160 TB of raw UTF-8 text at ~4 bytes per token). That's the order of magnitude of current models.
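Quick back-of-envelope check of those numbers (the ~1 Gbps sensory bandwidth, 400B parameters, and ~100 tokens per parameter are the assumptions above; the bytes-per-token figures are my own rough guesses):

```python
# Rough sanity check of the estimates above; every constant is an assumption.

SECONDS_PER_YEAR = 365 * 24 * 3600            # ~3.15e7 s

# Human sensory input, assuming ~1 Gbps of total sensory bandwidth.
sensory_bytes_per_sec = 1e9 / 8               # 1 Gbps -> 125 MB/s
human_age_5  = sensory_bytes_per_sec * SECONDS_PER_YEAR * 5    # by age 5
human_life   = sensory_bytes_per_sec * SECONDS_PER_YEAR * 80   # ~80-year lifetime

# LLM training corpus, assuming 400B parameters and ~100 tokens per parameter.
params = 400e9
tokens = params * 100                         # ~4e13 tokens
corpus_lo = tokens * 2                        # ~2 bytes per token (compact encoding)
corpus_hi = tokens * 4                        # ~4 bytes per token (raw UTF-8 text)

TB, PB, EB = 1e12, 1e15, 1e18
print(f"human by age 5:  {human_age_5 / PB:.0f} PB")   # ~20 PB
print(f"human lifetime:  {human_life / EB:.2f} EB")    # ~0.3 EB
print(f"LLM corpus:      {corpus_lo / TB:.0f}-{corpus_hi / TB:.0f} TB")  # ~80-160 TB
```

So the human sensory stream and current LLM corpora really are a couple of orders of magnitude apart, not the other way around.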