Went down the MI300A rabbit hole that was just casually mentioned in this post (https://chipsandcheese.com/p/inside-the-amd-radeon-instinct-...). What a fun chip! (and blog!)
Lower power consumption on a desktop monitor is an interesting technical challenge, but I do wonder “Cui bono?” Obviously I’d want my gaming machine to consume less power, but I’m not sure I’ve ever considered mouse-idle, monitor-on power consumption when comparing, e.g., AMD versus Nvidia for my gaming machine.
Don’t get me wrong, this is very interesting, AMD does great engineering, and I’m loath to throw shade on an engineering-focused company, but… is this going to convert to even a single net-gain purchase for AMD?
I’m a relatively (to myself) large AMD shareholder (colloquially: fanboy), and damn, I’d love to see more focus on hardware matmul acceleration rather than idle monitor power draw.
Power-efficient chips result in more overall performance for the same total power drawn. It's all about performance per watt.
I am not saying that this was the reason I bought it, but I recently purchased a Radeon 9070 and was surprised how little power this card uses at idle. I was seeing figures between 4 W and 10 W on Windows (sadly slightly more on Linux).
In general this generation of Radeon GPUs seems highly efficient. Radeon 9070 is a beast of a GPU.
Some people appreciate leaving the PC on for light tasks even at night, and wasting too much power doing nothing is... well, wasteful. Imagine a home server that has the GPU for AI or multimedia work.
The same architecture will also be used in mobile, so depending on where this comes from (architecturally) it could mean more power savings there, too.
Besides, lower power also means lower cooling/noise at idle, and shorter cooldown times after a burst of work.
And since AMD is slowly moving toward the (perpetually "next time") unified architecture, any gains there will also mean less idle power draw in other environments, like servers.
Nothing groundbreaking, sure, but I won't say no to all of that.
Rumors have been floating around about some kind of PS6 portable or next-gen Steam Deck with RDNA4, where power consumption matters.
There's also simply laptop battery longevity, which would be nice.
To hazard a guess, would that optimization also help push the envelope when one application needs all the power it can get while another monitor is just sitting idle?
Another angle I'm wondering about is longevity of the card. I'm not sure AMD would particularly care in the first place, but as a user, if the card didn't have to grind away on the idle parts and thus lasted a year or two longer, that would be pretty valuable.
Recent Nvidia generations also roughly doubled their idle power consumption. Those increases are probably actual baseline increases (i.e. they reduce the compute power budget), while prior RDNA generations would idle at around 80-100 W while doing video playback or driving more than one monitor, which is more indicative of problematic power management.
In terms of heat output, the difference between an idling gaming PC from 10 years ago (~30-40 W) and one today (100+ W) is very noticeable in a room. Besides, even gaming PCs are likely idle or nearly idle a significant amount of the time, and that's just power wasted. There are also commercial users of desktop GPUs, and there the cards are idle an even bigger percentage of the time.
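To put a rough number on that wasted power, here is a back-of-envelope sketch in C; the 60 W idle delta, 8 idle hours a day and 0.30 per kWh are assumptions for illustration, not figures from this thread:

    /* Back-of-envelope cost of a higher idle draw. All inputs are assumptions. */
    #include <stdio.h>

    int main(void)
    {
        double extra_watts   = 60.0;   /* assumed extra idle draw vs. an older PC */
        double idle_hours    = 8.0;    /* assumed idle hours per day              */
        double price_per_kwh = 0.30;   /* assumed electricity price               */

        double kwh_per_year = extra_watts * idle_hours * 365.0 / 1000.0;
        printf("~%.0f kWh/year extra, ~%.0f per year at the assumed rate\n",
               kwh_per_year, kwh_per_year * price_per_kwh);
        return 0;
    }

With those assumptions that is roughly 175 kWh a year of pure waste, which is why idle draw adds up even for machines that mostly sit there.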
Another area of AMD GPU R&D is the _userland_ _hardware_ [ring] buffers for near-direct hardware programming from userland.
They have started experimenting with that in Mesa and Linux ("user queues", i.e. "user hardware queues").
I don't know how they will work around the scarce VM IDs, but here we are talking about nearly zero driver involvement. Obviously, they will have to simplify/clean up a lot of the 3D pipeline programming and be very sure of its robustness, basically to have it ready for "default" rendering/usage right away.
Userland will get from the kernel something along these lines: command/event hardware ring buffers, data DMA buffers, a memory page with the read/write pointers and doorbells for those ring buffers, and an event file descriptor for an event ring buffer. Basically, what the kernel currently has.
I wonder if it will provide significant simplification over the current way, which is giving indirect command buffers to the kernel and dealing with 'sync objects'/barriers.
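To make that concrete, here is a minimal, purely hypothetical C sketch of what submitting work through such a userland ring plus a doorbell page could look like. The struct layouts, packet format and names are invented for illustration and are not the actual amdgpu user-queue interface; the "hardware" pages are plain process memory so the example runs standalone.

    /* Hypothetical userland submission through a mapped ring + doorbell page.
     * Nothing here is the real amdgpu ABI; it only illustrates the shape of it. */
    #include <stdint.h>
    #include <stdio.h>

    #define RING_ENTRIES 256u              /* assumed power-of-two ring size */

    struct doorbell_page {                 /* page the kernel would map R/W   */
        volatile uint32_t wptr;            /* userland advances this          */
        volatile uint32_t rptr;            /* hardware/firmware advances this */
    };

    struct cmd_packet {                    /* made-up packet format */
        uint32_t opcode;
        uint32_t payload;
    };

    static struct cmd_packet ring[RING_ENTRIES];
    static struct doorbell_page db;

    /* "Submit": write the packet, then bump the write pointer, which here stands
     * in for ringing the doorbell. Real code would need memory barriers and would
     * wait on the event file descriptor when the ring is full. */
    static int submit(uint32_t opcode, uint32_t payload)
    {
        uint32_t next = (db.wptr + 1u) % RING_ENTRIES;
        if (next == db.rptr)
            return -1;                     /* ring full */
        ring[db.wptr].opcode  = opcode;
        ring[db.wptr].payload = payload;
        db.wptr = next;
        return 0;
    }

    int main(void)
    {
        submit(0x10, 42);                  /* e.g. "draw"          */
        submit(0x20, 7);                   /* e.g. "signal fence"  */
        printf("queued up to wptr=%u, rptr=%u\n",
               (unsigned)db.wptr, (unsigned)db.rptr);
        return 0;
    }

The point of the design is visible even in this toy: submission becomes a couple of stores into shared memory instead of an ioctl round trip per command buffer.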
The architecture is shared between desktop and mobile. This sounds 100% like something they did to give some dual-display laptop or handheld 3 hours of extra battery life by fixing something dumb.
What I'm more curious about: does RDNA4 have native FP8 support?
I refer to the RDNA4 instruction set manual [1], page 90, Table 41 (WMMA Instructions).
They support FP8/BF8 with F32 accumulate and also IU4 with I32 accumulate. The max matrix size is 16x16. For comparison, NVIDIA Blackwell GB200 supports matrices up to 256x32 for FP8 and 256x96 for NVFP4 [2].
This matters for overall throughput, as feeding a bigger matrix unit is cheaper in terms of memory bandwidth: the number of FLOPs grows as O(n^2) when increasing the size of a systolic array, while the number of inputs/outputs grows only as O(n).
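As a quick numeric sketch of that scaling argument (the array sizes below are arbitrary examples, not the actual WMMA or Blackwell tile shapes):

    /* An n x n systolic array does ~n^2 MACs per step while streaming in only
     * ~2n new operands per step, so compute per operand fetched grows with n. */
    #include <stdio.h>

    int main(void)
    {
        const int sizes[] = { 16, 32, 64, 128, 256 };
        for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
            int n = sizes[i];
            long macs     = (long)n * n;   /* O(n^2) compute per step */
            long operands = 2L * n;        /* O(n) inputs per step    */
            printf("n=%3d: %6ld MACs/step, %4ld operands/step, %5.1f MACs/operand\n",
                   n, macs, operands, (double)macs / (double)operands);
        }
        return 0;
    }

Going from a 16-wide to a 256-wide array raises the compute done per operand fetched by 16x, which is the bandwidth advantage of feeding a bigger matrix unit.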
1. https://www.amd.com/content/dam/amd/en/documents/radeon-tech...
2. https://semianalysis.com/2025/06/23/nvidia-tensor-core-evolu...