Blackwell: Nvidia's GPU
pella
17 hours ago
102
28
https://chipsandcheese.com/p/blackwell-nvidias-massive-gpu
ksec13 hours ago
I heard it is still hard to buy consumer-grade Nvidia GPUs. At this point I am wondering whether it is gaming market demand, AI, or simply a supply issue.

On another note, I am waiting for Nvidia's entry into CPUs. At some point down the line I expect the CPU will be less important (relatively speaking), and Nvidia could afford to throw a CPU into the system as a bonus, especially when we are expecting ARM X930 to rival Apple's M4 in terms of IPC. CPU design has become somewhat of a commodity.

Incipientksec13 hours ago
My understanding is that it's the AI demand, and the willingness to pay crazy money for wafers, that makes consumer GPUs a significantly less attractive product to produce.

I don't have really solid evidence, just semi-anecdotal/semi-reliable internet posts:

Eg. https://www.tomshardware.com/tech-industry/more-than-251-million-gpus-shipped-in-2024-according-to-new-research

Nvidia as a whole has been fairly anti-consumer recently with pricing, so I wouldn't be banking on them for a great CPU option. Weirdly, Intel is in the position where they have to prove themselves, so hopefully they'll give us some great products in the next 2-5 years - if they survive (think of the old lead-up-to-Ryzen era for AMD).

KronisLVIncipient9 hours ago
> Nvidia as a whole has been fairly anti-consumer recently with pricing, so I wouldn't be banking on them for a great cpu option.

If they’re swimming in the AI cash and the consumer GPU segment isn’t that important (https://www.visualcapitalist.com/nvidia-revenue-by-product-line/) then why on earth couldn’t they do less price gouging?

It feels a bit like the Intel Core Ultra desktop CPU launch where the prices were the critical factor that doomed an otherwise pretty okay product. At least Intel's excuse is that they’re closer to going under than before, even if their GPUs were pretty fairly priced anyways.

It’s almost like everyone complains about their prices and the fact that they’re releasing 8 GB cards… and then still go and give them money anyways.

p_lIncipient7 hours ago
The same chip as the "proper"[1] 5090 is also used for workstation and some server cards, which easily go for higher prices. So it's just an allocation of chips to different products, taking into account that, with the power demands and the design issues in the 5090's power supply, there isn't all that much demand for the 5090 either.

[1] There are now 5090-branded cards that use the same chip as the 5080.

jonas21ksec12 hours ago
wtallisjonas2112 hours ago
Their Grace datacenter CPU is basically a chip where they put down all the LPDDR5 memory controllers (albeit curiously slow), NVLINK and PCIe IOs they needed around the perimeter, and then filled in the interior with boring off the shelf ARM cores. It's basically an IO and memory expander that happens to run Linux.

GB10 when it ships might be more interesting, since it'll go into systems that need to support use cases other than merely feeding a big GPU ML workloads. But it sounds like the CPU chiplet at least was more or less outsourced to Mediatek.

xl-brainksec11 hours ago
The Micro Center in my neighborhood has hundreds of 5090s in stock. I'm not sure it's as hard as it used to be.
enqkksec8 hours ago
I keep wondering if the yields have gone all bad with the newer processes
magicalhippoksec7 hours ago
The whole missing ROPs saga[1][2] didn't help. I bought a 5070 Ti and had to return it due to missing ROPs. Had to get another brand as replacement, as they had so little stock.

[1]: https://gamersnexus.net/gpus/investigating-nvidias-defective-gpus-rtx-5080-missing-rops-benchmarks

[2]: https://nvidia.custhelp.com/app/answers/detail/a_id/5628/~/how-to-check-the-number-of-rops-in-your-geforce-rtx-5090%2F5090d%2F5080%2F5070-ti-gpus

CalChris11 hours ago
The Nvidia technical brief says 208 billion transistors.

https://resources.nvidia.com/en-us-blackwell-architecture

Blackwell uses the TSMC 4NP process. It has two layers. A very back of the envelope estimate:

  750 mm^2 / ((208/2) * 10^9) = 7211 nm^2
  ~ 85 nm x 85 nm
NB: process feature size does not equal transistor size. Process feature size doesn't even equal process feature size.
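
For anyone who wants to redo the envelope math, here is a minimal Python sketch. The die area, transistor count, and two-layer split are simply the figures used above (the split is an assumption of the estimate, not a confirmed property of the chip), and the function name is made up for illustration.

```python
import math

# Figures from the estimate above. LAYERS = 2 is the estimate's own
# assumption, not a confirmed property of the chip.
DIE_AREA_MM2 = 750
TRANSISTORS = 208e9
LAYERS = 2

NM2_PER_MM2 = 1e12  # 1 mm = 1e6 nm, so 1 mm^2 = 1e12 nm^2


def area_per_transistor_nm2(die_area_mm2, transistors, layers=1):
    """Average die area available to each transistor, in nm^2."""
    transistors_per_layer = transistors / layers
    return die_area_mm2 * NM2_PER_MM2 / transistors_per_layer


area = area_per_transistor_nm2(DIE_AREA_MM2, TRANSISTORS, LAYERS)
side = math.sqrt(area)  # side of an equivalent square footprint
print(f"about {area:.0f} nm^2 per transistor, roughly {side:.0f} nm on a side")
```

Setting layers=1 shows how much the result depends on the two-layer assumption: the per-transistor area halves.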
dist-epochCalChris11 hours ago
You also need space for wires, ..., etc, right? It's not just transistors.
CalChrisdist-epoch11 hours ago
The wires didn't fit on the back of the envelope.
a_wild_dandanCalChris10 hours ago
I love this retort and I'm stealing it.
ameliusdist-epoch10 hours ago
The wires run over the transistors.
gchadwickdist-epoch9 hours ago
The wires sit on top of the transistors. Many layers of them in a modern process.

However, you can't always pack the transistors as densely as you would like, because you can't fit the wiring for them in above at the same density.

Plus there are various 'design rules' that constrain how things get placed. These are needed to ensure manufacturing is successful and achieves good yield. An important set of rules is the 'antenna rules', which require the insertion of antenna diodes (using up silicon and reducing transistor density) to prevent circuitry being destroyed during manufacturing: https://www.zerotoasiccourse.com/terminology/antenna-report/

gchadwickCalChris9 hours ago
> It has two layers

Where did you get that from? Pretty sure it's a single planar set of transistors. Those transistors are manufactured using multiple layers of mask.

FinFET transistors are described as 3D or non-planar, but crucially this doesn't allow transistor-on-transistor stacking; you've just got the gate structure of the FinFET poking out above the plane of the rest of the transistors.

Silicon on silicon die stacking is a possibility but limits your power and GPUs run very hot so it's not an option for them.

murderfsgchadwick6 hours ago
GPUs are not particularly hot for compute silicon, they just have ridiculously huge dies. Comparing the 5090 to a Core Ultra 285K, the GPU has a 750 mm^2 die compared to the CPU's 243 mm^2, but has a peak power of 575W compared to 250W. By those numbers the CPU draws roughly a third more power per unit area (about 1.03 W/mm^2 versus 0.77 W/mm^2), and that's before considering the fact that consumer CPUs are packaged for user installation, so there's an extra heatspreader on top of the die, whereas GPUs are sold as integrated units, so the heatsink sits directly on top of the die.
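
A quick sanity check of the power-per-area comparison, as a small Python sketch. The wattages and die areas are the ones quoted in this comment; the function name is just for illustration. With these inputs the CPU comes out around 34% higher than the GPU, which is the same thing as the GPU being about 25% lower than the CPU.

```python
def power_density_w_per_mm2(peak_power_w, die_area_mm2):
    """Peak power divided by die area, in W/mm^2."""
    return peak_power_w / die_area_mm2


# Figures as quoted in the comment (peak power in W, die area in mm^2).
gpu = power_density_w_per_mm2(575, 750)  # RTX 5090
cpu = power_density_w_per_mm2(250, 243)  # Core Ultra 285K

ratio = cpu / gpu
print(f"GPU: {gpu:.2f} W/mm^2, CPU: {cpu:.2f} W/mm^2")
print(f"CPU is {100 * (ratio - 1):.0f}% higher; GPU is {100 * (1 - 1 / ratio):.0f}% lower")
```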
kvemkonmurderfs6 hours ago
> consumer CPUs are packaged for user installation

I'd say advanced users or skilled staff.

20+ years ago, the Athlon XP for example had a small CPU die in the middle and 4 round spacers in the corners for proper heatsink installation. Even so, the CPU die wouldn't clock down, and it would go up in flames if the cooler was removed during operation.

Nowadays, with a safer CPU monitoring its own temperature, one has to risk removing the heatspreader and replacing it with "special" direct-die cooling, resulting in either a bit more performance, or 15-20 degrees lower temperatures, or a smaller or quieter cooler. One is free to choose.

Sure, even an advanced user must take more care working around the naked die. But the technology to make this safer than before could have also matured.

bgnnCalChris43 minutes ago
This assumes 100% utilization. Realistically, the utilization (active device area with respect to total die area) is 70-75% at best.
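
To put a number on that, here is a rough Python sketch that scales the parent's per-transistor area by a utilization factor. The 750 mm^2, 208 billion, and two-layer inputs are the parent's figures, the 70-75% range is the one given here, and the function name is hypothetical.

```python
import math

NM2_PER_MM2 = 1e12  # 1 mm = 1e6 nm, so 1 mm^2 = 1e12 nm^2

# Parent comment's inputs: 750 mm^2 die, 208e9 transistors, 2 layers (assumed).
raw_area_nm2 = 750 * NM2_PER_MM2 / (208e9 / 2)


def effective_side_nm(raw_area_nm2, utilization):
    """Side of the square footprint left per transistor when only a
    fraction `utilization` of the die is active device area."""
    return math.sqrt(raw_area_nm2 * utilization)


side_70 = effective_side_nm(raw_area_nm2, 0.70)
side_75 = effective_side_nm(raw_area_nm2, 0.75)
print(f"70%: {side_70:.1f} nm per side, 75%: {side_75:.1f} nm per side")
```

So the 85 nm figure shrinks to somewhere in the low 70s of nanometers per side under these assumptions.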
dist-epoch11 hours ago
Why doesn't NVIDIA also build something like Google TPU, a systolic array processor? Less programmable, but more throughput/power efficiency?

It seems there is a huge market for inference.

AlotOfReadingdist-epoch11 hours ago
Nvidia tensor cores are small systolic arrays. They'd have to throw out a lot of their ecosystem investments and backwards compatibility to make effective use of them as the main GPU compute, and there's really no need given how competitive their chips are right now.
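
For readers unfamiliar with the term, here is a toy Python timing model of an output-stationary systolic array doing a matrix multiply. It is only a sketch of the dataflow idea, not a description of how tensor cores or TPUs are actually built, and the function name is made up for illustration.

```python
def systolic_matmul(A, B):
    """Toy timing model of an output-stationary systolic array.

    A is n x k, B is k x m. Processing element (i, j) accumulates C[i][j].
    Row i of A enters from the left delayed by i cycles and column j of B
    enters from the top delayed by j cycles, so A[i][t] and B[t][j] meet
    at element (i, j) on cycle t + i + j.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    total_cycles = k + n + m - 2  # last pair meets on cycle (k-1)+(n-1)+(m-1)
    for cycle in range(total_cycles):
        for i in range(n):
            for j in range(m):
                t = cycle - i - j  # index of the operand pair arriving now
                if 0 <= t < k:
                    C[i][j] += A[i][t] * B[t][j]
    return C, total_cycles


C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, cycles)
```

The point of the structure is that each element only ever talks to its neighbours and does one multiply-accumulate per cycle, which is where the throughput and power efficiency come from, at the cost of flexibility.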
aurareturndist-epoch11 hours ago

  Less programmable, but more throughput/power efficiency?
I also wonder the same. It'd make sense to sell two categories of chips:

Traditional GPUs like Blackwell that can do anything and have backwards compatibility.

Less programmable and more ASIC-like inference chips like Google's TPUs. Inference market is going to be multiple times bigger than training soon.

ggreg847 hours ago
Chips and Cheese GPU analyses are pretty detailed, but they need to be taken with a huge grain of salt because the results only really apply to OpenCL, and nobody buying NVIDIA or AMD GPUs for compute runs OpenCL on them; it's either CUDA or HIP, which differ widely in parts of their compilation stack.

After reading the entire analysis, I'm left wondering, what observations in this analysis - if any - actually apply to CUDA?

almostgotcaughtggreg845 hours ago
> its either CUDA or HIP, which differ widely in parts of their compilation stack.

This is an ironic comment - OpenCL uses the same compiler as CUDA on NVIDIA and HIP on AMD.

JonChesterfieldalmostgotcaught4 hours ago
Sort of. Same compiler backend, mostly, but the set of intrinsics and semantic rules are different.
almostgotcaughtJonChesterfield2 hours ago
i have no idea what your point is - same compiler, different frontend, yes that's literally what i said.
nromiunggreg844 hours ago
For benchmarking code like this, CUDA, HIP and OpenCL are almost the same. You will only see the difference in big codebases, where you launch multiple kernels and move data between them.

Otherwise OpenCL is very good as well, with the added benefit of running on all GPUs.

Aissen5 hours ago
Does the comparison even make sense, considering there's (more than) an order of magnitude difference in price between AMD's desktop GPU and NVIDIA's workstation accelerator?