(GPU) NVIDIA GeForce GTX 1660

(Profile updated as of 10th July 2019)

Here is my GPU profile for the GTX 1660. In my opinion it is probably NVIDIA's best-value Turing-based graphics card, and even then it's not like "Oh my God, I want to buy it now".

TU116 is interesting to me because NVIDIA has stripped out the Tensor and RT hardware to make a smaller, leaner die for the high-volume market, which is very sensitive to die size and cost of manufacturing and probably doesn't care about RT just yet (even if this chip had RT cores, it wouldn't be fast enough to actually do anything with them; the RTX 2060 barely is).


(Picture 1) The silicon die of TU116-300. It is surrounded by 6x 1024 MB GDDR5 SDRAM chips, making up the 192-bit interface. Note the two unpopulated sets of solder points, which indicate that this PCB supports a 256-bit memory interface that goes unused on TU116-based cards.

(Picture 2) Actual silicon die shot of the TU116 GPU, taken using infrared imaging. The chip pictured is the 400 silicon from the GTX 1660 Ti, but I have annotated the single TPC, with its two SMs, that is laser-cut (disabled) on the 300 (GTX 1660). Image credit for the die shot goes to Fritzchens Fritz, and the information on the on-die GPC structure is from this highly useful tool: https://misdake.github.io/ChipAnnotationViewer

(Picture 3) The architectural block-diagram for TU116-300. Note the disabled TPC with its two Streaming Multi-Processors.

Graphics Card Information

Graphics Card: NVIDIA GeForce GTX 1660

Graphics Card Manufacturer: NVIDIA

Graphics Card Release Date: March 14, 2019

Graphics Card MSRP: $219 USD

Graphics Processor Codename: TU116-300

Graphics Processor Manufacturer: NVIDIA

Graphics Processor Implementation: Cut die

Graphics Interface: PCI-E 16x Gen3

Architecture: Turing (TU11x)

Lithography Process: TSMC 12nmFFN FinFET

Approximate die size: 284mm²

Sasha's GPU die Size Rating: small-medium

Approximate Transistor Count: 6,600 Million

Approximate Transistor Density: 23.2 Million / Square Millimetre
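
The density figure is simply the transistor count divided by the die area. A minimal sketch of that arithmetic, using the approximate figures listed above (the variable and function names are my own, for illustration only):

```python
# Approximate figures quoted in this profile.
transistors_million = 6600  # approximate transistor count, in millions
die_area_mm2 = 284          # approximate die size, in square millimetres


def transistor_density(transistors_million: float, die_area_mm2: float) -> float:
    """Return density in million transistors per square millimetre."""
    return transistors_million / die_area_mm2


density = transistor_density(transistors_million, die_area_mm2)
print(f"{density:.1f} million transistors per mm^2")
```

Since both inputs are approximate, the result should be read as a rough figure rather than a precise one.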

GPU Features

Double-speed FP16 Shading: Yes (dedicated FP16x2 pipelines)

Asynchronous Compute Capability: Full

DirectX Hardware Support: DX12.1 (FL 12_1)

Dedicated DXR Acceleration on chip: No

Variable-rate Shading: Yes (Adaptive Shading)

Adv. Geometry shading: Yes (Mesh Shading)

Adv. Geometry shading (Programmable/DX12 Mesh Shaders): Yes

AI/ML Acceleration: No

Advanced Memory Management: No

Integer and Float Shader Co-execution: Yes

Tile-based Renderer: Yes

GPU Computing Resources

GPU Substructures: 3 Graphics Processing Clusters, 11 Texture Processing Clusters

Graphics Cores: 22 Streaming Multi-processors (24 Full Chip)

Graphics Cores per Substructure: 2 per TPC, 2 x GPC with 8, 1 x GPC with 6

Total Stream Processors (ALU/Shaders): 1408 (float/Int) (1536 Full Chip) *

Stream Processors per Graphics Core: 64 Float32, 64 INT32

Graphics Core SIMD Structure: 4 x 16 Float32, 4 x 16 INT32

Total Special Execution Units: 352 Special Function Units (384 Full Chip), 352 Load/Store Units (384 Full Chip), 1408 FP16x2 CUDA Cores, 44 FP64 CUDA Cores (48 Full Chip)

Special Execution Units per Graphics Core: 16 Special Function Units, 16 Load/Store Units, 64 FP16x2 CUDA Cores, 2 FP64 CUDA Cores

Total Texturing Units: 88 (96 Full Chip)

Texturing Units per Graphics Core: 4

Pixel Pipelines (ROPs): 48 (6 x ROP Partitions with 8 Pixels per clock)

Level 2 shared on-chip cache: 1536 KB

Geometry/Tessellation Processors: 11 (12 Full Chip)

Raster Engines: 3
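
The unit totals in this section all follow from multiplying the per-SM figures by the number of enabled SMs (22 on the GTX 1660, 24 on the full chip). A small sketch of that bookkeeping, using only the numbers listed above (the dictionary keys and function name are my own labels, not NVIDIA terminology):

```python
# Per-SM unit counts as listed in this profile.
PER_SM = {
    "fp32_cores": 64,
    "int32_cores": 64,
    "special_function_units": 16,
    "load_store_units": 16,
    "fp64_cores": 2,
    "texture_units": 4,
}

SMS_PER_TPC = 2
ENABLED_TPCS = 11     # GTX 1660 (TU116-300)
FULL_CHIP_TPCS = 12   # full TU116


def unit_totals(tpc_count: int) -> dict:
    """Scale the per-SM unit counts up to a chip with the given TPC count."""
    sm_count = tpc_count * SMS_PER_TPC
    return {name: per_sm * sm_count for name, per_sm in PER_SM.items()}


gtx_1660 = unit_totals(ENABLED_TPCS)
full_chip = unit_totals(FULL_CHIP_TPCS)

for name in PER_SM:
    print(f"{name}: {gtx_1660[name]} ({full_chip[name]} Full Chip)")
```

The same SM count can be cross-checked against the GPC layout: two GPCs with 8 SMs plus one GPC with 6 SMs gives 22.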

GPU Memory Subsystem

Graphics Memory Type: GDDR5

Graphics Memory Standard Capacity: 6144 MB

Graphics Memory Composition: 6 x 1024 MB GDDR5 SDRAM Chips

Graphics Memory Access Granularity: 32-bit (4 bytes)

Graphics Memory Standard Clock Speed / Data Rate: 2000 MHz / 8000 MHz

Graphics Memory Full Interface Width: 192-bit (24 bytes per clock)

Graphics Memory Peak Memory Bandwidth: 192 GB/s
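
The peak bandwidth figure comes from the interface width and the effective data rate. A rough sketch of the calculation, assuming each of the 6 GDDR5 chips contributes a 32-bit slice of the bus (which is what 192 bits across 6 chips works out to); the names below are my own:

```python
# Memory configuration as listed in this profile.
chip_count = 6
chip_capacity_mb = 1024
bits_per_chip = 32        # assumption: 192-bit bus split evenly over 6 chips
data_rate_mhz = 8000      # effective data rate (4x the 2000 MHz clock)

bus_width_bits = chip_count * bits_per_chip
bytes_per_transfer = bus_width_bits // 8
capacity_mb = chip_count * chip_capacity_mb

# Million transfers per second x bytes per transfer = MB/s; divide by 1000 for GB/s.
peak_bandwidth_gbs = data_rate_mhz * bytes_per_transfer / 1000

print(f"Bus width: {bus_width_bits}-bit ({bytes_per_transfer} bytes per transfer)")
print(f"Capacity: {capacity_mb} MB")
print(f"Peak bandwidth: {peak_bandwidth_gbs:.0f} GB/s")
```

This is a theoretical peak; real-world throughput will be lower.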

GPU Frequency and Peak performance

Graphics Engine Clock: 1785 MHz *

GPU Computing Power FP16: 10,053,120 Million operations per second with FMA

GPU Computing Power FP32: 5,026,560 Million operations per second with FMA

GPU Computing Power FP64: 157,080 Million operations per second with FMA

GPU Texturing Rate INT8: 157,080 Million texels per second

GPU Texturing Rate FP16: 157,080 Million texels per second

GPU Pixel Rate: 85,680 Million pixels per second

GPU Primitive Rate: 5,355 Million triangles per second *
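
The peak figures in this section can be reproduced from the unit counts and the rated boost clock. The sketch below assumes an FMA counts as two operations, FP16x2 doubles the FP32 rate, and each texture unit, ROP and raster engine handles one texel, pixel or triangle per clock. Those per-clock assumptions are mine; they happen to match the numbers listed here, but they are theoretical peaks, not measurements.

```python
# Unit counts and clock as listed in this profile.
clock_mhz = 1785          # NVIDIA-spec rated boost clock
fp32_cores = 1408
fp64_cores = 44
texture_units = 88
rops = 48
raster_engines = 3

OPS_PER_FMA = 2           # assumption: one fused multiply-add = 2 operations


def peak_rate(units: int, per_unit_per_clock: int, clock_mhz: int) -> int:
    """Peak rate in millions per second: units x work per clock x clock (MHz)."""
    return units * per_unit_per_clock * clock_mhz


fp32_mops = peak_rate(fp32_cores, OPS_PER_FMA, clock_mhz)
fp16_mops = fp32_mops * 2                      # double-speed FP16 (FP16x2)
fp64_mops = peak_rate(fp64_cores, OPS_PER_FMA, clock_mhz)
texel_rate = peak_rate(texture_units, 1, clock_mhz)
pixel_rate = peak_rate(rops, 1, clock_mhz)
primitive_rate = peak_rate(raster_engines, 1, clock_mhz)

print(f"FP16: {fp16_mops:,} Million operations per second")
print(f"FP32: {fp32_mops:,} Million operations per second")
print(f"FP64: {fp64_mops:,} Million operations per second")
print(f"Texels: {texel_rate:,} Million per second")
print(f"Pixels: {pixel_rate:,} Million per second")
print(f"Triangles: {primitive_rate:,} Million per second")
```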

GPU Thermal and Power

Standard Cooling Solution: Custom designs with various heatsink types from small single-fan to large multi-fan designs

Typical Board Power: 120 W

Maximum Board Power: Varies per design

Maximum Allowed Junction Temperature (TJ Max): 95°C

Graphics Card description

GeForce GTX 1660 is a low-to-mid-range graphics card released by NVIDIA in early 2019 to provide a low-cost entry to the Turing architecture without the bloated dies caused by dedicated Ray Tracing and Tensor hardware. This card uses a cut-down TU116 processor, the same chip used in the more expensive GTX 1660 Ti but with one TPC disabled, losing 128 CUDA cores, a Tessellator and 8 Texturing units. It also trades the latest GDDR6 memory technology for cheaper, more common GDDR5. As a result the GTX 1660 can hit the sweet spot of around £200, which puts it in competition on price with AMD's incumbent Radeon RX 580. However, price cuts to that card have reduced its cost even further, so the GTX 1660 now competes with the RX 590 (due to unofficial price cuts) and offers slightly more performance at a similar price. The GTX 1660's advantage is significantly reduced thermal output and power consumption, but at the cost of less video memory (6GB vs 8GB on the RX 590).

It is interesting to note that all TU116 boards feature pinouts for 8 DRAM chips, meaning these PCBs were built to house a 256-bit GPU (TU116 is natively 192-bit, as you can see on the die shot). I think it is a cost-saving measure to reuse trace designs from the 256-bit TU106 chip used by the RTX 2070.

Graphics Card approximate 3D Performance

Sasha's gaming performance rating (2020): Great for 1080p High settings 60 FPS

GeForce GTX 1660 provides great performance when paired with a 1920x1080 monitor, running games at, or close to, maximum detail settings and 60 frames per second. It performs slightly ahead (~5%) of AMD's Radeon RX 590 but with significantly lower power consumption. Performance is around 10-15% ahead of the last-gen GTX 1060 6GB and Radeon RX 580.

Notes

Graphics Engine Clock

NVIDIA-spec rated boost is listed. Actual gaming clock will be higher due to GPU Boost. It varies per design with power limit and cooling capacity, but will likely be around 1900 MHz. As a result it is almost impossible to say what each card will run at in gaming situations.
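
To give a feel for what that means for the peak figures, here is a rough sketch that recomputes theoretical FP32 throughput at a hypothetical 1900 MHz gaming clock. The 1900 MHz value is only the ballpark estimate from this note, not a measured or guaranteed clock, and the function name is my own.

```python
FP32_CORES = 1408
OPS_PER_FMA = 2           # assumption: one fused multiply-add = 2 operations
RATED_BOOST_MHZ = 1785    # NVIDIA-spec rated boost
ESTIMATED_GAMING_MHZ = 1900  # ballpark estimate only; varies per card


def fp32_peak_mops(clock_mhz: int) -> int:
    """Theoretical FP32 peak in million operations per second."""
    return FP32_CORES * OPS_PER_FMA * clock_mhz


rated = fp32_peak_mops(RATED_BOOST_MHZ)
estimated = fp32_peak_mops(ESTIMATED_GAMING_MHZ)
uplift_percent = (estimated / rated - 1) * 100

print(f"Rated boost: {rated:,} Million operations per second")
print(f"At ~1900 MHz: {estimated:,} Million operations per second")
print(f"Difference: {uplift_percent:.1f}%")
```

Peak throughput scales linearly with clock, so any other clock can be substituted in the same way.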

GPU Primitive Rate

Raw triangle output based on my understanding of the Raster Engines. The PolyMorph engines attached to each TPC may have an effect on the total triangles rasterized.

Total Stream Processors (ALU/Shaders)

Only 32-bit precision CUDA cores are listed, and only advertised CUDA cores. You can see the SIMD structure for the full pipeline count in 32 bits. For example, the GTX 1660 actually has 1408 FP32 CUDA cores and 1408 INT32 CUDA cores, for a total of 2,816 CUDA cores, but as I just said, only half can do Floats and half can do Integers. The gaming performance uplift in shading from this design ranges from limited (<10%) to fairly significant (30-40%), depending on the types of instructions in the shader code.

Misc.

This bit is for my personal opinion on this Graphics card / Graphics processor

Sasha's Awesomeness Rating: Pretty Good