This is the important part. It's not guaranteed to be accurate. They claim it "delivers essentially the same correctness as the model it imitates -- sometimes even finer detail". But can you really trust that? Especially if each frame of the simulation derives from the previous one, errors could compound.
It seems like a fantastic tool for quickly exploring hypotheses. But it also seems like once you find the result you want to publish, you'll still need the supercomputer to verify it?
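For intuition on the compounding worry, here's a toy sketch in Python; the chaotic logistic-map "simulator" and the 1e-3 per-frame bias are made-up assumptions for illustration, not anything from the paper:

    def true_step(x):
        # chaotic logistic map standing in for one frame of a simulator
        return 3.7 * x * (1.0 - x)

    def emulated_step(x):
        # hypothetical emulator that is off by a tiny amount each frame
        return true_step(x) + 1e-3

    x_true = x_emul = 0.2
    for frame in range(1, 31):
        x_true = true_step(x_true)
        x_emul = emulated_step(x_emul)
        if frame % 10 == 0:
            print(f"frame {frame:2d}: |error| = {abs(x_emul - x_true):.3f}")
    # For a chaotic system the gap reaches order one within a few dozen frames;
    # for a well-behaved system it may stay bounded -- which is exactly why
    # per-frame accuracy claims need care when outputs get fed back in.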
Has this been confirmed already? 1) Seems like the 'laws' we know are just an approximation of reality. 2) If no external intervention has been detected, it doesn't mean there was none.
Fine details. We are talking about an NN model vs an algorithm. Both are approximations, and in practice the model can fill gaps in the data that the algorithm cannot, or does not by default. A good example would be image scaling with in-painting for scratches and damaged parts.
Suppose you emulate a forward model y = F(x), by choosing a design X = {x1, ..., xN}, and making a training set T = {(x1, y1), ..., (xN, yN)}.
With T, you train an emulator G. You want to know how good y0hat = G(x0) is compared to y0 = F(x0).
If there is a stochastic element to the forward model F, there will be noise in all of the y's, including in the training set, but also including y0! (Hopefully your noise has expectation 0.)
(This would be the case for a forward model that uses any kind of Monte Carlo under the hood.)
In this case, because the trained G(x0) is averaging over (say) all the nearby x's, you can see variance reduction in y0hat compared to y0. This, for example, would apply in a very direct way to G's that are kernel methods.
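Here's a minimal sketch of that variance-reduction effect, assuming a toy stochastic F and a Nadaraya-Watson kernel smoother as G; all the functions and numbers are illustrative, not from any real pipeline:

    import numpy as np

    rng = np.random.default_rng(0)

    def F(x, sigma=0.1):
        # stochastic forward model: smooth signal plus zero-mean Monte Carlo noise
        return np.sin(x) + rng.normal(0.0, sigma, size=np.shape(x))

    def G(x0, X, Y, h=0.2):
        # emulator: kernel-weighted average of training outputs near x0
        w = np.exp(-0.5 * ((X - x0) / h) ** 2)
        return np.sum(w * Y) / np.sum(w)

    X = np.linspace(0.0, np.pi, 50)      # design x1, ..., xN
    x0 = 1.3

    y0_draws, y0hat_draws = [], []
    for _ in range(500):                 # repeat the whole experiment
        Y = F(X)                         # noisy training set T
        y0_draws.append(F(x0))           # a single noisy run of F at x0
        y0hat_draws.append(G(x0, X, Y))  # emulator prediction built from T

    print("var(y0)    =", np.var(y0_draws))     # roughly sigma^2
    print("var(y0hat) =", np.var(y0hat_draws))  # smaller: noise averaged over nearby x's

Roughly speaking, the emulator's noise variance is sigma^2 divided by the effective number of neighbours under the kernel, while y0 itself carries the full sigma^2.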
I have observed this in real emulation problems. If you're pushing for high accuracy, it's not even rare to see.
More speculatively, one can imagine settings in which (deterministic) model error, when averaged out over nearby training samples in computing y0hat, can be smaller than the single-point model error affecting y0. (For example, there are some errors in a deterministic lookup table buried in the forward model, and averaging nearby runs of F causes the errors to decrease.)
I have seen this claim credibly made, but verifying it is hard -- the minute you find the model error that explains this[*], the model will be fixed and the problem will go away.
[*] E.g., with a plot of y0hat overlaid on y0, and the people who maintain the forward model say "do you have y0 and y0hat labeled correctly?"
Physicists have been doing this sort of thing for a long time. Arguably they invented computers to do this sort of thing.
ENIAC (1945) wasn't assembled for cryptography, nor was the Difference Engine (1820s) designed for that purpose.
Between these, the Polish bomba machines (1938) were adapted from other designs to break Enigma codes, but they lacked the features of general-purpose computers like ENIAC.
Tommy Flowers' Colossus (1943–1945) was a rolling series of adaptations and upgrades purposed for cryptography, but it was programmed via switches and plugs rather than a stored program and lacked the ability to modify programs on the fly.
But for the interested: von Neumann became one of the lead developers on the ENIAC. The von Neumann architecture is based on a writeup he did of the EDVAC. Von Neumann and Stanislaw Ulam worked out Monte Carlo simulations for the Manhattan Project.
The first programmable electronic computer was developed at the same time as randomized physics simulations and with the same people playing leading roles.
Protein structure prediction is now considered to be "solved," but the way it was solved was not through physics applied to what is clearly a physics problem. Instead it was solved with lots of data, with protein language modeling, and with deep nets applied to contact maps (which are an old tool in the space), and some refinement at the end.
The end result is correct not because physics simulations are capable of doing the same thing and we could check Alphafold against it, but because we have thousands of solved crystal structures from decades of grueling lab work and electron density map reconstruction from thousands of people.
We still need that crystal structure to be sure of anything, but we can get really good first guesses with AlphaFold and the models that followed, and it has opened new avenues of research because a very very expensive certainty now has very very cheap mostly-right guesses.
When it comes to very complicated things, physics tends to fall down and we need to try non-physics modeling, and/or come up with non-physics abstraction.
"When things are complicated, if I just dream that it is not complicated and solve another problem than the one I have, I find a great solution!"
Joking apart, models that can help target a potentially very interesting sub-phase-space, much smaller than the original one, are incredibly useful. But a fundamental understanding of the underlying principles, which allows you to make very educated guesses about what can and cannot be ignored, usually wins against throwing everything at the wall...
And as you are pointing out, when complex reality comes knocking, it usually is much, much messier...
Everything is nuclear physics in the end, but trying to solve problems in, say, economics or psychology by solving a vast number of subatomic equations is only theoretically possible. Even in most of physics we have to round up and make abstractions.
In cases like this I'm always thinking of Riemann integrals and how I remember feeling my face scrunching up in distaste when they were first explained to us in class. It took a while for me to feel comfortable with the whole idea. I'm a very uh discrete kind of person.
As an aside, I consider the kind of work described in the article, where a classic symbolic system is essentially "compiled" into a neural net, to be one of the good forms of neuro-symbolic AI. Because it works, and, like I say, there are important use cases where it beats just using the symbolic system.
Neuro-symbolic AI can often feel a bit like English cuisine where stuff is like bangers (i.e. sausages) and mash or a full English (breakfast), or a Sunday roast, where a bunch of disparate ingredients are prepared essentially independently and separately and then plonked on a plate all together. Most other cuisines don't work that way: you cook all the ingredients together and you get something bigger than the sum of the parts, a gestalt, if you like. Think e.g. of Greek gemista (tomatoes, bell peppers and occasionally zucchini and aubergines stuffed with rice) or French cassoulet (a bean stew with three different kinds of meat and a crusty top).
Lots of the neuro-symbolic stuff I've seen does it the English breakfast way: there's a neural net feeding its output to a symbolic system, rarely the other way around. But what the authors have done here, which others have also done, is to train a neural net on the output of a symbolic system, thereby basically "cooking" them together and getting the best of both worlds. Not yet a gestalt, as such, but close. Kind of like souvlaki with pitta (what the French call a "sandwich Grecque").
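A minimal sketch of that "cooking them together" recipe, assuming a made-up piecewise rule as the symbolic system and a tiny numpy MLP as the student (this is not the authors' actual setup, just the general idea):

    import numpy as np

    rng = np.random.default_rng(0)

    def symbolic_system(x):
        # stand-in for an exact but expensive rule-based computation
        return np.where(x < 0.0, -x, np.where(x < 1.0, x ** 2, 2.0 * x - 1.0))

    # 1. Run the symbolic system to generate a training set.
    X = rng.uniform(-2.0, 3.0, size=(2000, 1))
    Y = symbolic_system(X)

    # 2. Fit a tiny one-hidden-layer MLP to that data with full-batch gradient descent.
    H = 32
    W1 = rng.normal(0.0, 0.5, (1, H)); b1 = np.zeros(H)
    W2 = rng.normal(0.0, 0.5, (H, 1)); b2 = np.zeros(1)
    lr = 0.05
    for _ in range(5000):
        A = np.tanh(X @ W1 + b1)           # hidden activations
        err = (A @ W2 + b2) - Y            # gradient of 0.5 * MSE w.r.t. the output
        gW2 = A.T @ err / len(X); gb2 = err.mean(0)
        dA = (err @ W2.T) * (1.0 - A ** 2)
        gW1 = X.T @ dA / len(X); gb1 = dA.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

    # 3. The net now imitates the symbolic system (approximately), cheaply and
    #    differentiably -- the "train a neural net on the output of a symbolic
    #    system" idea in miniature.
    x_test = np.array([[-1.5], [0.5], [2.0]])
    print("symbolic:", symbolic_system(x_test).ravel())
    print("neural  :", (np.tanh(x_test @ W1 + b1) @ W2 + b2).ravel())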
I'm unfortunately not (self?) important enough to have a newsletter. Thanks though, that's very sweet.
Emulators have existed in astrophysics since long before ML became part of the zeitgeist. The earliest cosmology emulator paper that I'm aware of is from 2009, here: https://arxiv.org/abs/0902.0429. IIRC the method came from fluid dynamics. It just so happens that the interpolators used under the hood (classically GPs, but lately NNs) are also popular in ML, and so the method gets lumped into the ML pile.
The key difference between emulation and typical ML is that emulation is always in an interpolation setting, whereas typical predictive ML is extrapolating in one way or another (e.g. predicting future events).
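To make that interpolation/extrapolation distinction concrete, here's a from-scratch toy GP; the squared-exponential kernel, lengthscale, and sin(2x) "simulator" are made-up assumptions, not any actual cosmology emulator:

    import numpy as np

    def rbf(a, b, ell=0.5, amp=1.0):
        # squared-exponential kernel between two 1-D arrays of inputs
        d = a[:, None] - b[None, :]
        return amp * np.exp(-0.5 * (d / ell) ** 2)

    X = np.linspace(0.0, 3.0, 20)           # design points: runs of the expensive code
    y = np.sin(2.0 * X)                     # stand-in for the simulator's output

    Xq = np.array([1.5, 2.5, 5.0, 8.0])     # first two interpolate, last two extrapolate

    K = rbf(X, X) + 1e-8 * np.eye(len(X))   # jitter for numerical stability
    Ks = rbf(Xq, X)
    mean = Ks @ np.linalg.solve(K, y)
    cov = rbf(Xq, Xq) - Ks @ np.linalg.solve(K, Ks.T)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))

    for xq, m, s in zip(Xq, mean, std):
        tag = "interpolation" if 0.0 <= xq <= 3.0 else "extrapolation"
        print(f"x={xq:4.1f} ({tag})  mean={m:+.3f}  std={s:.3f}")
    # Inside the design range the predictive std is tiny; outside it climbs back
    # toward the prior amplitude, i.e. the emulator is flagging that it doesn't know.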
Congrats to the authors!
While the work the authors do is important, in no sense does the tool they produced actually run a simulation.
A simulation implies a physical model and usually partial differential equations that are often solved on supercomputers, but here the neural network is instead interpolating some fixed simulation output in a purely data-driven way.
The simulations have not gotten faster due to neural networks, cosmologists have just gotten better at using them. Which is great!
Edit: see the sub-comment in the thread by crazygringo for the lead author’s take
More seriously though, https://www.404media.co/a-vast-cosmic-web-connects-the-universe-really-now-we-can-emulate-it/ is another nice article about this work. It has a bit less detail (i.e. fewer proper nouns in it), but is more readable for it IMO.
(It was a bit difficult to find by just scrolling)
It is such a great language, and capable of ML and AI workloads, as evidenced by this research.
I don't think Paramount would look kindly on giving it Majel Barret's voice, but it sure feels like talking to the computer on the holodeck.