| developer = [[Nvidia]]
| released = {{Start date and age|2007|06|23}}
| latest_release_version = 12.
| latest_release_date = {{Start date and age|2024|
| operating_system = [[Windows]], [[Linux]]
| platform = [[#GPUs supported|Supported GPUs]]
| website = {{URL|https://developer.nvidia.com/cuda-zone}}
}}
In [[computing]], '''CUDA''' (originally '''Compute Unified Device Architecture''') is a proprietary parallel computing platform and [[application programming interface]] (API) created by [[Nvidia]] that allows software to use certain types of [[graphics processing unit]]s (GPUs) for accelerated general-purpose processing.
CUDA is designed to work with programming languages such as [[C (programming language)|C]], [[C++]], [[Fortran]] and [[Python (programming language)|Python]]. This accessibility makes it easier for specialists in parallel programming to use GPU resources, in contrast to prior APIs like [[Direct3D]] and [[OpenGL]], which required advanced skills in graphics programming.<ref>{{Cite news |url=https://www.videomaker.com/article/c15/19313-cuda-vs-opencl-vs-opengl |title=CUDA vs. OpenCL vs. OpenGL |last=Zunitch |first=Peter |date=2018-01-24 |work=Videomaker |access-date=2018-09-16 |language=en-US}}</ref> CUDA-powered GPUs also support programming frameworks such as [[OpenMP]], [[OpenACC]] and [[OpenCL]].<ref>{{Cite web |url=https://developer.nvidia.com/opencl |title=OpenCL |date=2013-04-24 |website=NVIDIA Developer |language=en |access-date=2019-11-04}}</ref><ref name="CUDA intro - TomsHardware" />
[[File:CUDA processing flow (En).PNG|thumb|300px|right|'''Example of CUDA processing flow''' {{ordered list |1=Copy data from main memory to GPU memory |2=CPU initiates the GPU [[compute kernel]] |3=GPU's CUDA cores execute the kernel in parallel |4=Copy the resulting data from GPU memory to main memory}}]]
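The four steps of the processing flow above can be sketched in CUDA C++. This is an illustrative example only (the kernel, buffer names and sizes are not from any particular codebase), and it requires a CUDA-capable GPU and the nvcc toolchain to run:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Step 3: each CUDA core executes one thread of this kernel in parallel,
// adding one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);

    // Step 1: copy input data from main memory to GPU memory.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Step 2: the CPU launches the compute kernel on the GPU.
    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    // Step 4: copy the result from GPU memory back to main memory.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```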
The CUDA platform is accessible to software developers through CUDA-accelerated libraries, [[Directive (programming)|compiler directives]] such as [[OpenACC]], and extensions to industry-standard programming languages including [[C (programming language)|C]], [[C++]], [[Fortran]] and [[Python (programming language)|Python]]. C/C++ programmers can use 'CUDA C/C++', compiled to [[Parallel Thread Execution|PTX]] with [[NVIDIA CUDA Compiler|nvcc]], Nvidia's [[LLVM]]-based C/C++ compiler, or by clang itself.<ref>{{cite web|url=https://developer.nvidia.com/cuda-llvm-compiler|title=CUDA LLVM Compiler|date=7 May 2012}}</ref> Fortran programmers can use 'CUDA Fortran', compiled with nvfortran, the CUDA Fortran compiler in the NVIDIA HPC SDK, which evolved from the PGI compilers of [[The Portland Group]]. Python programmers can use the cuNumeric library to accelerate applications on Nvidia GPUs.
In addition to libraries, compiler directives, CUDA C/C++ and CUDA Fortran, the CUDA platform supports other computational interfaces, including the [[Khronos Group]]'s [[OpenCL]],<ref>{{YouTube|r1sN1ELJfNo|First OpenCL demo on a GPU}}</ref> Microsoft's [[DirectCompute]], [[OpenGL]] Compute Shader and [[C++ AMP]].<ref>{{YouTube|K1I4kts5mqc|DirectCompute Ocean Demo Running on Nvidia CUDA-enabled GPU}}</ref> Third party wrappers are also available for [[Python (programming language)|Python]], [[Perl]], Fortran, [[Java (programming language)|Java]], [[Ruby (programming language)|Ruby]], [[Lua (programming language)|Lua]], [[Common Lisp (programming language)|Common Lisp]], [[Haskell (programming language)|Haskell]], [[R (programming language)|R]], [[MATLAB]], [[IDL (programming language)|IDL]], [[Julia (programming language)|Julia]], and native support in [[Mathematica]].
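As an illustration of such a wrapper, a kernel can be written directly in Python using Numba's CUDA support. This is a sketch, not a definitive implementation; it assumes the numba and numpy packages and a CUDA-capable GPU:

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(arr, factor):
    i = cuda.grid(1)          # global thread index
    if i < arr.size:
        arr[i] *= factor      # each thread scales one element

data = np.arange(1024, dtype=np.float32)
d_data = cuda.to_device(data)   # copy to GPU memory
scale[4, 256](d_data, 2.0)      # launch 4 blocks of 256 threads
result = d_data.copy_to_host()  # copy the result back
```

The decorator compiles the Python function to PTX at call time, so the same host/device copy-and-launch flow as in CUDA C/C++ applies.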
| 11.8<ref>{{cite web|url=https://docs.nvidia.com/cuda/archive/11.8.0/cuda-toolkit-release-notes/index.html|title=CUDA 11.8 Release Notes|website=NVIDIA Developer}}</ref> || || || || {{yes|3.5}} || {{yes|}} || {{yes|}} || {{yes|}} || {{yes|}} || {{yes|}} || {{yes|8.9}} || {{yes|9.0}} ||
|-
| 12.0 – 12.
|}
|-
|10.0
|rowspan="2" |[[Blackwell (microarchitecture)|Blackwell]]
|GB100
|
|
|B200, B100
|
|-
|10.x
|GB202, GB203, GB205, GB206, GB207
|GeForce RTX
|
|B40
| colspan="4" rowspan="1" {{yes}}
|-
| Uniform Datapath
| colspan="6" rowspan="1" {{no}}
| colspan="3" rowspan="1" {{yes}}
| {{yes|1.2}}
| {{yes|2.0}}
|-
| any 128-bit trivially copyable type
| general operations
| {{no}}
| atomicExch, atomicCAS
| colspan="2" {{yes|9.0}}
|-
| rowspan="2" | 16-bit floating point<br />FP16
| atomic addition
| colspan="2" {{yes|2.0}}
|-
| rowspan="1" | 32-bit floating point float2 and float4
| general operations
| {{no}}
| atomic addition
| colspan="2" {{yes|9.0}}
|-
| rowspan="1" | 64-bit floating point
|}
Note: Missing lines or empty entries reflect a lack of information on that item.<ref>{{cite web | url=https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications | title=CUDA C++ Programming Guide }}</ref>
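The per-type availability in the table translates directly into which atomic intrinsics a kernel may call. The following sketch (illustrative, not from any particular codebase) uses hardware atomicAdd on 32-bit floats, which the table lists as available since compute capability 2.0:

```cuda
// Sum all elements of `in` into a single accumulator using atomic addition.
// Requires compute capability 2.0 or later for float atomicAdd;
// an atomicAdd overload for double requires compute capability 6.0.
__global__ void sumAll(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(out, in[i]);  // hardware atomic addition on a 32-bit float
}
```

On devices predating the required compute capability, such operations are commonly emulated in software with an atomicCAS compare-and-swap loop, at a significant performance cost.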
===Tensor cores===
| colspan="1" {{yes|1024}}
|-
| 4-bit floating point FP4 (E2M1)
| colspan="2" {{yes|10.0}}
| colspan="11" {{no}}
| colspan="1" {{yes|4096}}
|-
| 6-bit floating point FP6 (E3M2 and E2M3)
| colspan="2" {{yes|10.0}}
| colspan="11" {{no}}
Note: Missing lines or empty entries reflect a lack of information on that item.<ref>{{cite web|url=https://www.nvidia.com/content/dam/en-zz/Solutions/gtcf21/jetson-orin/nvidia-jetson-agx-orin-technical-brief.pdf|title=Technical brief. NVIDIA Jetson AGX Orin Series|website=nvidia.com|access-date=5 September 2023}}</ref><ref>{{cite web|url=https://images.nvidia.com/aem-dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf|title=NVIDIA Ampere GA102 GPU Architecture|website=nvidia.com|access-date=5 September 2023}}</ref>
<ref>{{cite web|url=https://images.nvidia.com/content/Solutions/data-center/a40/nvidia-a40-datasheet.pdf|title=Datasheet NVIDIA A40|website=nvidia.com|access-date=27 April 2024}}</ref>
<ref>{{cite web | url=https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.1.pdf | title=NVIDIA AMPERE GA102 GPU ARCHITECTURE | date=27 April 2024 }}</ref>
|}
===Technical specifications===
<div style="overflow-x:auto">
{| class="wikitable" style="font-size:85%;"
</div>
===Multiprocessor architecture===
<div style="overflow-x:auto">
{| class="wikitable" style="font-size:85%;"
* Accelerated interconversion of video file formats
* Accelerated [[encryption]], [[decryption]] and [[Data compression|compression]]
* [[Bioinformatics]], e.g. [[Massive parallel sequencing|NGS]] DNA sequencing
* Distributed calculations, such as predicting the native conformation of [[proteins]]
* Medical analysis simulations, for example [[virtual reality]] based on [[X-ray computed tomography|CT]] and [[Magnetic resonance imaging|MRI]] scan images
* Physical simulations,<ref>{{Cite web|title=Part V: Physics Simulation|url=https://developer.nvidia.com/gpugems/gpugems3/part-v-physics-simulation|access-date=2020-09-11|website=NVIDIA Developer|language=en}}</ref> in particular in [[fluid dynamics]]
* [[Neural network]] training in [[machine learning]] problems
* [[Large Language Model]] inference
* [[Face recognition]]
* [[Volunteer computing]] projects, such as [[SETI@home]] and other projects using [[Berkeley Open Infrastructure for Network Computing|BOINC]] software
CUDA competes with other GPU computing stacks: [[OneAPI (compute acceleration)|Intel OneAPI]] and [[ROCm|AMD ROCm]].
=== Intel OneAPI ===
{{Main|OneAPI (compute acceleration)}}
'''oneAPI''' is an initiative based on open standards, created to support software development for multiple hardware architectures.<ref>{{Cite web |title=oneAPI Programming Model |url=https://www.oneapi.io/ |access-date=2024-07-27 |website=oneAPI.io |language=en-US}}</ref> The oneAPI libraries must implement open specifications that are discussed publicly by the Special Interest Groups, offering the possibility for any developer or organization to implement their own versions of oneAPI libraries.<ref>{{Cite web |title=Specifications {{!}} oneAPI |url=https://www.oneapi.io/spec/ |access-date=2024-07-27 |website=oneAPI.io |language=en-US}}</ref><ref>{{Cite web |title=oneAPI Specification — oneAPI Specification 1.3-rev-1 documentation |url=https://oneapi-spec.uxlfoundation.org/specifications/oneapi/v1.3-rev-1/ |access-date=2024-07-27 |website=oneapi-spec.uxlfoundation.org}}</ref>
oneAPI was originally developed by Intel, and other hardware vendors have since adopted the specification.
==== Unified Acceleration Foundation (UXL) ====
Unified Acceleration Foundation (UXL) is a technology consortium working on an open-standard accelerator software ecosystem based on the oneAPI specification, with the stated goal of offering a multi-vendor alternative to Nvidia's CUDA.
=== AMD ROCm ===