Sasha W.

Sash Rant: Some people don't understand Turing's advantages.

Okay, I am a bit triggered, so that means you get a Rant Post! How lucky you are, I mean that one person who reads my blog or randomly wandered here from the internet wasteland. :D


So, this misinformation comes from my favourite source! Disqus, of course. To be fair, VideoCardz's comment section has cultivated some excellent technology discussion about the hardware and the industry; you just have to (as with any public comment forum) wade through the BS and the people who don't know what they're talking about.


Hell, I never even claimed to really "know what I'm talking about"; I still have a positively huge amount to learn on these subjects, but I am passionate about learning them, and I feel I have the solid foundation of knowledge required to create a rant post like this one. So without further ado, I will take some comments and explain why I flat-out disagree (and not just as a matter of opinion; I will argue that the facts support my position).


 




Pascal (10-series) has higher PPT (performance per transistor) and PPA (performance per area) than Turing (16/20-series) in a lot of '3D gaming workloads'.

So, let's start by making this very clear, because it is a fact: Turing does indeed have more transistors and larger die sizes relative to Pascal for a given amount of performance in many 3D gaming workloads. That said, this is not a regression, because there are some things we need to make clear about the architectures being compared here. Also, the extent of the 'advantage' the OP is claiming is eroded when it comes to the TU106-based RTX 2060, because that product features a pretty heavily cut-down processor.


So, firstly: RTX 2060 is based on TU106, which has just over 10 billion transistors. That is a fact, and on paper it doesn't look that great in many gaming workloads versus a processor like GP104, which has just over 7 billion transistors for similar performance.


This flawed assessment firstly doesn't take into consideration that RTX 2060's TU106 GPU is heavily cut back. That is, there are a lot of transistors in functional units on that silicon that are not doing anything at all; i.e., they are laser-disabled, likely due to lithography defects, leakage, or simply product segmentation.


A 'fairer' (not fully fair, I will come to that in a moment) comparison is against the fully enabled TU106 die (RTX 2070 non-SUPER), since the GTX 1080 uses the full GP104 silicon. Running that comparison, TU106 does indeed deliver higher performance, but wait! It also has many more transistors, and the transistor increase is larger than the performance increase.
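
To put rough numbers on that, here's a quick back-of-the-envelope sketch. The transistor counts are NVIDIA's published figures for the full dies; the performance ratio is an assumed, illustrative value, not a measured benchmark:

    // Back-of-the-envelope perf-per-transistor (PPT) comparison.
    // Transistor counts are published figures; perf_ratio is an assumed,
    // illustrative value, not a measured result.
    #include <cstdio>

    int main() {
        const double gp104_transistors = 7.2e9;   // GTX 1080, full GP104
        const double tu106_transistors = 10.8e9;  // RTX 2070, full TU106
        const double perf_ratio = 1.15;           // assumed ~15% raster uplift

        double transistor_ratio = tu106_transistors / gp104_transistors;
        double ppt_ratio = perf_ratio / transistor_ratio;
        std::printf("Transistors: +%.0f%%  Perf: +%.0f%%  Relative PPT: %.2fx\n",
                    (transistor_ratio - 1.0) * 100.0,
                    (perf_ratio - 1.0) * 100.0,
                    ppt_ratio);
        return 0;
    }

Roughly 50% more transistors for roughly 15% more raster performance: that's the PPT 'regression' in a nutshell, before you account for what those extra transistors actually buy you. Which brings me to the next point in this rant.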


 



Turing incorporates architectural advancements that increase the transistor budget, but also increase flexibility and performance in certain workloads: Ray Tracing.

This is something we need to understand before judging Turing in 'Gaems' versus Pascal. Firstly, Turing isn't actually built specifically for 3D graphics; it's an extremely capable, versatile architecture with strong compute elements, but those elements can also be leveraged to significantly improve performance in certain gaming situations.


The big elephant in the room, obviously, the one everyone knows about, is that Turing has dedicated, fixed-function logic for Ray Tracing. These "RT Cores" are essentially ASIC blocks that vastly speed up the process of traversing a Bounding Volume Hierarchy (BVH), essentially a tree describing the geometry in a given scene, to determine where a ray intersects that geometry; the resulting pixel then needs to be altered depending on how the light ray interacted with the geometry. This creates extremely realistic simulations of lighting effects. Ray Tracing!
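
To give a sense of what that fixed-function block replaces, here's a heavily simplified software sketch of BVH traversal. This is my own toy CUDA code with hypothetical structures and names; real traversal handles node ordering, precision, instancing and much more. The point is that this is a divergent, pointer-chasing loop, exactly the kind of work an ASIC does far more efficiently than general shader cores:

    // Simplified BVH closest-hit traversal (illustrative sketch only).
    // RT Cores execute this kind of loop in fixed-function hardware.
    struct Aabb { float3 lo, hi; };
    struct BvhNode {
        Aabb bounds;
        int  left, right;         // child node indices; -1 marks a leaf
        int  firstTri, triCount;  // triangle range, valid on leaves
    };

    // Ray/box slab test.
    __device__ bool hitAabb(const Aabb& b, float3 o, float3 invDir, float tMax) {
        float t0 = 0.0f, t1 = tMax, a, c;
        a = (b.lo.x - o.x) * invDir.x; c = (b.hi.x - o.x) * invDir.x;
        t0 = fmaxf(t0, fminf(a, c));  t1 = fminf(t1, fmaxf(a, c));
        a = (b.lo.y - o.y) * invDir.y; c = (b.hi.y - o.y) * invDir.y;
        t0 = fmaxf(t0, fminf(a, c));  t1 = fminf(t1, fmaxf(a, c));
        a = (b.lo.z - o.z) * invDir.z; c = (b.hi.z - o.z) * invDir.z;
        t0 = fmaxf(t0, fminf(a, c));  t1 = fminf(t1, fmaxf(a, c));
        return t0 <= t1;
    }

    // Ray/triangle test, assumed implemented elsewhere (e.g. Moller-Trumbore).
    __device__ bool hitTriangle(int tri, float3 o, float3 d, float* t);

    __device__ int traceClosest(const BvhNode* nodes, float3 o, float3 d,
                                float3 invDir, float tMax) {
        int stack[64];            // traversal stack
        int top = 0, closest = -1;
        stack[top++] = 0;         // start at the root node
        while (top > 0) {
            const BvhNode& n = nodes[stack[--top]];
            if (!hitAabb(n.bounds, o, invDir, tMax)) continue; // cull subtree
            if (n.left < 0) {     // leaf: test its triangles
                for (int i = 0; i < n.triCount; ++i) {
                    float t;
                    if (hitTriangle(n.firstTri + i, o, d, &t) && t < tMax) {
                        tMax = t;
                        closest = n.firstTri + i;
                    }
                }
            } else {              // inner node: descend into both children
                stack[top++] = n.left;
                stack[top++] = n.right;
            }
        }
        return closest;           // index of nearest hit, or -1 for a miss
    }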


Those blocks are doing absolutely nothing in a 'normal 3D game', so that transistor 'bloat' isn't even being used. Consider the following:


What about GTX 1080 versus RTX 2060 with RT enabled?


I think you might find that the 2060 is quite a bit more powerful than the GTX 1080. Now, even though this is a huge performance increase when Turing's hardware RT feature is being used, it might not fully explain the increased transistor count relative to the percentage performance increase. There are further reasons, so I will create a smaller sub-heading for them. Or whatever. It's a rant, don't judge me!



 

Turing's architecture trades PPA/PPT for higher flexibility, concurrent INT/FP, RT/AI acceleration, and higher performance per watt.


Turing's core design includes dedicated hardware for integer code, floating-point code, AI/ML processing and Ray Tracing. For each 'advertised' CUDA core on a Turing-based GPU, there are actually two separate pipelines, one for INT32 and one for FP32; furthermore, the SM has additional logic dedicated to Tensor cores (FP16 matrix processors) and Ray Tracing (a BVH traversal ASIC). Much of that is not being used in normal games, so it's not entirely accurate to judge PPA/PPT between the architectures.


On the INT/FP subject: this improves shading efficiency, and very much so in certain situations, by allowing integer and FP shader code (where they are not dependent on each other, obviously) to be executed in parallel. With the same number of CUDA cores on Pascal, such code would need to wait for the CUDA core to finish doing INT or FP; it cannot do both at once.
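
A trivial illustration (my own toy CUDA kernel, not from any actual game): the index/address math below is INT32 work and the shading math is FP32 work. On Turing those two streams can issue to their separate pipes concurrently; on Pascal they take turns on the same cores:

    // Toy kernel mixing integer and floating-point instructions (illustrative).
    __global__ void shade(const float* __restrict__ albedo,
                          float* __restrict__ out,
                          int width, int height, float lightIntensity) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;  // INT32: index math
        int y = blockIdx.y * blockDim.y + threadIdx.y;  // INT32: index math
        if (x >= width || y >= height) return;
        int idx = y * width + x;                        // INT32: address calc
        float a = albedo[idx];                          // load
        out[idx] = a * lightIntensity + 0.05f;          // FP32: shading math
    }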


This advantage varies from pretty significant to essentially nil, depending on the game engine and the types of instructions used. In certain scenes, Turing can achieve around 30% higher shading efficiency per CUDA core versus Pascal.


TU116 and TU117 do not have RT or Tensor cores, so why do they still have worse PPA/PPT than Pascal?


Were you even reading? These GPUs still use dedicated INT/FP pipes. Furthermore, the TU11x chips, instead of Tensor cores, have a dedicated array of FP16x2 accumulators: dedicated logical pathways for doing two FP16 operations in one instruction, at a 2:1 rate versus FP32. That is used for Variable Rate Shading and certain other workloads.
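
In CUDA terms, that packed FP16 path looks something like this (a minimal sketch using the standard cuda_fp16.h half2 intrinsics; the kernel itself is my own toy example, not anything from NVIDIA's drivers):

    #include <cuda_fp16.h>

    // Toy packed-FP16 kernel (illustrative only). Each __hfma2 performs two
    // half-precision fused multiply-adds in a single instruction, which is
    // the 2:1-versus-FP32 rate that TU11x's dedicated FP16 units provide.
    __global__ void scaleBias(const __half2* __restrict__ in,
                              __half2* __restrict__ out,
                              __half2 scale, __half2 bias, int n2) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per half2 pair
        if (i < n2) {
            out[i] = __hfma2(in[i], scale, bias);       // 2 x FP16 FMA per op
        }
    }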


Compare the GTX 1660 Ti to the GTX 1080 when VRS is fully in use, in a game issuing INT and FP concurrently, and I think you'll see it a bit differently. Perf/watt did increase with Turing, on a process that is essentially the same.


 

Consider that Turing might not really have been intended for TSMC's 12nm FFN.

This sub-heading is really interesting. It is highly likely that Turing's die sizes are much larger than NVIDIA intended. I heard something about Turing originally being slated for Samsung's 10nm process, with higher density. I cannot confirm this, but it's worth thinking about.


Turing strikes me as an architecture that was intended for a denser node, where die sizes would have remained similar or even seen a slight reduction versus previous-generation chips. Of course, that's entirely speculation, but it does make sense.


 

Desktop/datacentre video/accelerator cards aren't as sensitive to PPA/PPT as ultra-portable mobile chips.

One of my final points is that the segments Turing serves are not as sensitive to its slightly worse PPA/PPT versus Pascal (in situations where its advanced features are not in use). The very fact that Pascal is still being used in those tiny, low-power chips (GP108) backs this claim up. As for the TU11x series, NVIDIA decided the AI/RT hardware wasn't worth it there, because those chips go into products that are historically very high volume; that shows you the reasoning.


TU102, TU104 and TU106 are powerful, versatile compute processors with Ray Tracing, AI/ML, and the best feature set and API support of any GPU architecture currently available. Perf/watt increased slightly (10-20%), and higher-end (yes, the 2060 is higher-end, fite me) video cards are not sensitive to those trade-offs.

 

Conclusion.

ADHD kicked in, and I kinda lost my train of thought towards the end. That's because I bought a Ryzen 9 3950X (lol!) and an RTX 2070 (lol again), and they arrive in, like, two hours, and I'm EXTREMELY excited.


By the way, before you start with the "Sash is defending Nvidia Turing to justify his purchase of a 2070", you can stop right there, because I have defended Turing staunchly since its launch for pushing graphics technology forward.


Unfortunately, there are 'gamers' who don't understand GPU architectures, engineering, and the trade-offs that need to be made; 'gamers' who just see FPS charts and nothing. Else. Matters. (Intel lol)


Those people are ignorant and shouldn't be criticising an architecture developed by incredibly talented engineers and millions of dollars in R&D.


TL;DR: Compare the RTX 2060 to the GTX 1080 in a game using hardware-accelerated RT and Tier 2 VRS (which Pascal can't even emulate), i.e. using the RT Cores plus the Tensor/FP16x2 hardware (the latter for VRS/FP16), in a game that also issues a lot of INT and FP instructions concurrently. The RTX 2060 is likely a lot more than 50% faster.


Thanks for reading. <3

