vyor
Didn't see one of these, so I'm making one. Post your nerdy tech shit here!
I'll start with a cross post from SB. Because I can.
Ok, so, legal disclaimer: do not read the linked patent or article if you work at Intel or NVIDIA on GPU tech or are planning to in the next 4 years or so (or if you're starting a new company for that). I will spoiler my analysis to keep any possible legal trouble off your ass. Consider yourself warned.
AMD GPU Architecture Patent Shows Post Navi Features That Could Be Coming In The Future | SegmentNext
SUPER SINGLE INSTRUCTION MULTIPLE DATA (SUPER-SIMD) FOR GRAPHICS PROCESSING UNIT (GPU) COMPUTING
Now, I'm no GPU engineer, but I know some stuff and, more importantly, I know people who know a lot more than I do, and I'll say this right out: no one knows what in the everloving hell kind of drugs AMD was on to have created this abomination against good sense.
Let's start with the basics. AMD's current GPUs use SIMD (Single Instruction, Multiple Data) execution: a single instruction is issued and applied to many data elements at once, in lockstep. It is, in theory, nice for heavily parallel tasks like most compute workloads. It does have some drawbacks, though; the current batch of AMD GPUs has a stalling issue. That is, when one lane of a CU (compute unit) stalls, so do all the rest, because they have to wait for that lane to finish the instruction before the whole group can get another batch of operations. SIMD is great for things like compute workloads, where you have a bunch of tasks of the same length, but rather poor for graphics in general.
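To make the lockstep problem concrete, here's a toy cycle-counting sketch (my own illustration, not anything from AMD): every instruction costs as many cycles as its slowest lane, so one stalled lane drags the whole group along with it.

```python
# Toy model of SIMD lockstep execution. Hypothetical latencies, purely illustrative.

def simd_cycles(lane_latencies):
    """lane_latencies: one inner list per instruction, giving each lane's
    latency for that instruction. Lockstep execution means every
    instruction costs as long as its slowest lane."""
    return sum(max(lanes) for lanes in lane_latencies)

# Four lanes, three instructions. Lane 2 stalls (say, on a memory access)
# during instruction 1, so everyone waits on it.
program = [
    [1, 1, 1, 1],    # all lanes finish in 1 cycle
    [1, 1, 10, 1],   # lane 2 stalls for 10 cycles -> whole group waits
    [1, 1, 1, 1],
]
print(simd_cycles(program))  # 12 cycles, vs the 3 a stall-free run would take
```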
There is, however, another option, one AMD once used in the architecture called Terascale. That architecture used VLIW (Very Long Instruction Word) instructions: the compiler packs several independent operations into one very long instruction word, and each core executes the whole bundle in parallel. This is amazing for gaming but horrid for compute thanks to its one big drawback: the compiler. It lives and dies by how good the compiler is, and most compilers are utter and complete ass. AMD's internal testing showed that for early Terascale GPUs only 3.5 out of 5 ALUs were used per "core" on average. That's nearly a third of the ALUs sitting idle across most compute and gaming loads. Terascale, frankly, didn't scale. It scaled worse than even GCN does right now.
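The "lives and dies by the compiler" point is easier to see with a toy packer (again, my own sketch, nothing like a real Terascale compiler): an op can only share a bundle with ops it doesn't depend on, so dependent chains waste almost every slot.

```python
# Toy greedy VLIW packer. 5 slots per bundle, like a Terascale VLIW5 core.
# Illustrative only; a real compiler does far more sophisticated scheduling.

def pack_vliw(ops, slots=5):
    """ops: list of (name, depends_on) pairs, depends_on=None if independent.
    An op can't go in the same bundle as an op it depends on."""
    bundles = []
    done = set()      # names of ops already issued in earlier bundles
    pending = list(ops)
    while pending:
        bundle, rest = [], []
        for name, dep in pending:
            if len(bundle) < slots and (dep is None or dep in done):
                bundle.append(name)
            else:
                rest.append((name, dep))
        done |= set(bundle)
        bundles.append(bundle)
        pending = rest
    return bundles

# A dependent chain packs terribly: 4 ops -> 4 bundles, 1 of 5 slots used each.
chain = [("a", None), ("b", "a"), ("c", "b"), ("d", "c")]
print(pack_vliw(chain))   # [['a'], ['b'], ['c'], ['d']]

# Independent ops pack great: 5 ops -> 1 full bundle.
flat = [(n, None) for n in "abcde"]
print(pack_vliw(flat))    # [['a', 'b', 'c', 'd', 'e']]
```

Same hardware, same op count, wildly different throughput, all depending on what the compiler can find to pack. That's the Terascale story in miniature.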
Enter this patent and something AMD is calling "Super-SIMD." It uses what looks to be a Terascale-style core with 3 ALUs: one "side" ALU to do "nonessential tasks" (whatever the hell those are), plus 2 other, larger ALUs which can act as "Vector ALUs" (what's usually used in a GPU), "Core ALUs" (presumably doing core functions like fused multiply-adds), or "Transcendental ALUs" for more complex math. Those two main ALUs are driven by VLIW2, i.e. two operations packed into one instruction word.
These Super-SIMDs are grouped together to form a CU (Compute Unit). There are two forms of CU: Compact, made up of 2 Super-SIMDs, and Full, made up of 4. A Full CU has 12 ALUs; a Compact one has 6.
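Quick sanity check on those counts (structure as described in the article; the constant name is mine):

```python
# 1 "side" ALU + 2 larger configurable ALUs per Super-SIMD, per the article.
ALUS_PER_SUPER_SIMD = 3

def cu_alus(super_simds):
    """Total ALUs in a CU built from the given number of Super-SIMDs."""
    return super_simds * ALUS_PER_SUPER_SIMD

print(cu_alus(4))  # Full CU: 12 ALUs
print(cu_alus(2))  # Compact CU: 6 ALUs
```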
This all tells us something rather interesting: they're packaging VLIW instructions inside a SIMD pipeline. This is completely and utterly bonkers, and no one I've spoken with about it has any clue how well it will work. It will be faster than GCN, it should be faster than Volta... but how much faster is anyone's guess. But let's get into one of the features that screams "the engineers are on something."
Quoting from the article here. What this means is that if the ALUs need the data they just produced, they can use it again right away instead of going back through the register space, which saves a few cycles of work when it's needed. As far as anyone I've spoken to knows... only CPUs do this. Ryzen CPUs do, and Intel may or may not as well.
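This is basically result forwarding (a bypass network, in CPU terms). Here's a toy cycle count showing why it matters for a chain of dependent ops; the latencies are assumptions I made up for illustration, not numbers from the patent.

```python
# Toy model of result forwarding. All latencies are hypothetical.

WRITEBACK = 2   # assumed cycles to write a result to the register file
READBACK = 2    # assumed cycles to read that operand back out
FORWARD = 0     # a forwarded result is available on the bypass immediately

def chain_cycles(n_ops, op_latency=1, forwarding=False):
    """Cycles for a chain of n dependent ops, where each op consumes
    the previous op's result."""
    hop = FORWARD if forwarding else (WRITEBACK + READBACK)
    return n_ops * op_latency + (n_ops - 1) * hop

print(chain_cycles(4, forwarding=False))  # 16 cycles through the register file
print(chain_cycles(4, forwarding=True))   # 4 cycles on the bypass
```

Graphics shader code is full of short dependent chains (multiply, then add, then multiply...), which is presumably why AMD bothered putting this in a GPU at all.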
Taking all of this into account, it tells us that AMD is trying to get around the traditional SIMD problems and the traditional VLIW problems at the same time. This all, however, hinges on their drivers and their hardware compiler. If either of those isn't fast enough, isn't up to par, the entire system starts to fall apart.
It's insane and brilliant at the same time. Mostly insane though and makes me wonder if there's an elder thing in Radeon headquarters.