waynenilsen 7 days ago

The only reason I'm sharing this is that there's a gem at the end. From the transcript:

44:26 ...its responses. But it's incredible, it is incredible. Related to that, and in some sense a last question: this whole effort, which was hugely expensive in terms of people and time and dollars and everything else, was an experiment to further validate that the scaling laws keep going, and why. It turns out they do, and they probably keep going for a long time. I accept scaling laws like I accept quantum mechanics or something, but I still don't know why. Why should that be a property of the universe? Why are scaling laws a property of the universe?

If you want, I can take a stab. The fact that more compression leads to more intelligence has very strong philosophical grounding. So the question is why training bigger models for longer gives you more compression, and there are a lot of theories here. The one I like is that the relevant concepts are sparse in the data of the world, and in particular it's a power law, so that the hundredth most important concept appears in one out of a hundred documents, or whatever. So there are long tails.

Does that mean that if we make a perfect dataset and figure out very data-efficient algorithms, I mean, we can all go home?

It means that there are potentially exponential compute wins on the table for being very sophisticated about your choice of data. But basically, when you just scoop up data passively, you're going to require 10x-ing your compute and your data to get the next constant number of things in that tail. And that tail keeps going; it's long. You can keep mining it, although, as you alluded to, you can probably do a lot better.

I think that's a good place to leave it. Thank you guys very much, that was fun. Yeah, thank you.
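A toy sketch of the power-law claim above, in Python. Everything here is an illustrative assumption (the exact 1/k frequency law, the threshold of sightings needed to "learn" a concept), not anything stated in the talk; it just shows why, under a Zipf-like tail, each 10x of passively scooped data buys a roughly constant additional slice of the concept mass:

    # Assume the k-th most important concept appears in about 1/k of
    # documents (the transcript's "hundredth concept in one of a hundred
    # documents"). Then each decade of ranks [k, 10k) carries roughly the
    # same total probability mass, so each 10x of data yields a roughly
    # constant gain -- a power-law-shaped scaling curve.
    NUM_CONCEPTS = 10**6
    MIN_EXAMPLES = 10  # assumed sightings needed before a concept is "learned"

    # Normalizer for the Zipf frequencies p_k = (1/k) / H_N.
    H_N = sum(1.0 / k for k in range(1, NUM_CONCEPTS + 1))

    def learned_rank(num_docs: int) -> int:
        """Highest rank k expected to have >= MIN_EXAMPLES occurrences.

        Concept k shows up in about num_docs / k documents, so we can
        "learn" concepts roughly up to rank num_docs / MIN_EXAMPLES.
        """
        return min(NUM_CONCEPTS, num_docs // MIN_EXAMPLES)

    def mass_covered(rank: int) -> float:
        """Total Zipf probability mass of the first `rank` concepts."""
        return sum(1.0 / k for k in range(1, rank + 1)) / H_N

    prev = 0.0
    for num_docs in (10**3, 10**4, 10**5, 10**6, 10**7):
        m = mass_covered(learned_rank(num_docs))
        print(f"docs={num_docs:>10,}  mass covered={m:.3f}  "
              f"gain from this 10x={m - prev:+.3f}")
        prev = m

Each 10x of documents unlocks the next decade of concept ranks, and each decade contributes about ln(10)/H_N of the total mass, so the gain per 10x stays constant until the tail is exhausted: that's the "the tail keeps going, it's long" point.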

  • belter 3 days ago

    The argument that more compression makes the model smarter, as made in the last few minutes, will make you look good and keep you in Sam Altman's good graces, and it's something he can push to his infinite-money backers.

    But if intelligence were just about compression, then zip files would be "smarter" than humans. The reliance on scaling laws is a symptom that current AI is fundamentally not intelligent; it's just very good at statistical mimicry. True AGI requires qualitative breakthroughs, not just quantitative scaling.
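    To pin down the identity both sides are gesturing at: under Shannon coding, a model that assigns probability p to the next symbol spends -log2(p) bits on it, so a model's average log-loss is literally its compressed size per symbol; a zip utility is just one fixed, shallow predictor. A minimal sketch, with an illustrative sample text and toy models (nothing here is from the talk):

        import math
        import zlib
        from collections import Counter

        text = "the cat sat on the mat and the dog sat on the log " * 200

        def bits_per_char(probs: dict, s: str) -> float:
            """Average code length (bits/char) if s were arithmetic-coded under probs."""
            return sum(-math.log2(probs[c]) for c in s) / len(s)

        # Model 1: uniform over the characters that occur (predicts nothing).
        alphabet = set(text)
        uniform = {c: 1.0 / len(alphabet) for c in alphabet}

        # Model 2: unigram character frequencies (a slightly better predictor).
        unigram = {c: n / len(text) for c, n in Counter(text).items()}

        zlib_bpc = 8 * len(zlib.compress(text.encode())) / len(text)

        print(f"uniform model : {bits_per_char(uniform, text):.2f} bits/char")
        print(f"unigram model : {bits_per_char(unigram, text):.2f} bits/char")
        print(f"zlib (DEFLATE): {zlib_bpc:.2f} bits/char")
        # Better prediction -> fewer bits; zlib wins here only because this
        # toy text is extremely repetitive. The point is the equivalence
        # (log-loss == code length), not a ranking of compressors.

    So the disputed step isn't whether compression measures prediction (it does, bit for bit), but whether ever-better prediction amounts to intelligence.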

    These researchers have fine-tuned their noses to ignore the stench of overhyped benchmarks.