CUDA: Difference between revisions

Line 9:
| developer = [[Nvidia]]
| released = {{Start date and age|2007|06|23}}
| latest_release_version = 12.4.16
| latest_release_date = {{Start date and age|2024|04|08}}
| operating_system = [[Windows]], [[Linux]]
| platform = [[#GPUs supported|Supported GPUs]]
Line 17:
| website = {{URL|https://developer.nvidia.com/cuda-zone}}
}}
In [[computing]], '''CUDA''' (originally '''Compute Unified Device Architecture''') is a proprietary<ref name=":0">{{Cite web |last=Shah |first=Agam |title=Nvidia not totally against third parties making CUDA chips |url=https://www.theregister.com/2021/11/10/nvidia_cuda_silicon/ |access-date=2024-04-25 |website=www.theregister.com |language=en}}</ref> [[parallel computing]] platform and [[application programming interface]] (API) that allows software to use certain types of [[graphics processing units]] (GPUs) for accelerated general-purpose processing, an approach called general-purpose computing on GPUs ([[GPGPU]]). The CUDA API is an extension of the C programming language that adds the ability to specify thread-level parallelism in C and to specify GPU-device-specific operations (such as moving data between the CPU and the GPU).<ref>{{cite web |last1=Nvidia |title=What is CUDA? |url=https://nvidia.custhelp.com/app/answers/detail/a_id/2132/~/what-is-cuda%3F |website=Nvidia |access-date=21 March 2024}}</ref> CUDA is a software layer that gives direct access to the GPU's virtual [[instruction set]] and parallel computational elements for the execution of [[compute kernel]]s.<ref name="CUDA intro - TomsHardware">{{cite web |url=https://www.tomshardware.com/reviews/nvidia-cuda-gpu,1954.html |title=Nvidia's CUDA: The End of the CPU? |last=Abi-Chahla |first=Fedy |date=June 18, 2008 |publisher=Tom's Hardware |access-date=May 17, 2015}}</ref> In addition to drivers and runtime kernels, the CUDA platform includes compilers, [https://developer.nvidia.com/gpu-accelerated-libraries libraries] and [https://developer.nvidia.com/tools-overview developer tools] to help programmers accelerate their applications.
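As a minimal illustration of these C extensions (a hypothetical kernel, not taken from any official sample), the <code>__global__</code> qualifier marks a function that runs on the GPU, and the triple-angle-bracket launch syntax specifies how many parallel threads execute it:

```cuda
// __global__ declares a kernel: code compiled for and executed on the GPU.
__global__ void scale(float *data, float factor, int n) {
    // Each thread computes a unique global index from its block and thread IDs.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;  // one thread handles one array element
}

// Host-side launch (CUDA's <<<blocks, threads>>> extension to C):
// 256 threads per block, enough blocks to cover all n elements.
// scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
```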
 
CUDA is designed to work with programming languages such as [[C (programming language)|C]], [[C++]], [[Fortran]] and [[Python (programming language)|Python]]. This accessibility makes it easier for specialists in parallel programming to use GPU resources, in contrast to prior APIs like [[Direct3D]] and [[OpenGL]], which required advanced skills in graphics programming.<ref>{{Cite news |url=https://www.videomaker.com/article/c15/19313-cuda-vs-opencl-vs-opengl |title=CUDA vs. OpenCL vs. OpenGL |last=Zunitch |first=Peter |date=2018-01-24 |work=Videomaker |access-date=2018-09-16 |language=en-US}}</ref> CUDA-powered GPUs also support programming frameworks such as [[OpenMP]], [[OpenACC]] and [[OpenCL]].<ref>{{Cite web |url=https://developer.nvidia.com/opencl |title=OpenCL |date=2013-04-24 |website=NVIDIA Developer |language=en |access-date=2019-11-04}}</ref><ref name="CUDA intro - TomsHardware" />
Line 78:
[[File:CUDA processing flow (En).PNG|thumb|300px|right|'''Example of CUDA processing flow''' {{ordered list |1=Copy data from main memory to GPU memory |2=CPU initiates the GPU [[compute kernel]] |3=GPU's CUDA cores execute the kernel in parallel |4=Copy the resulting data from GPU memory to main memory}}]]
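The four-step flow shown in the figure caption can be sketched as a hypothetical host program (illustrative vector addition; error checking omitted for brevity), compiled with <code>nvcc</code>:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // step 3: CUDA cores execute this in parallel
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a = (float *)malloc(bytes), *b = (float *)malloc(bytes),
          *c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);

    // Step 1: copy data from main memory to GPU memory.
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    // Step 2: CPU initiates the GPU compute kernel.
    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    // Step 4: copy the resulting data from GPU memory back to main memory.
    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", c[0]);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return 0;
}
```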
 
The CUDA platform is accessible to software developers through CUDA-accelerated libraries, [[Directive (programming)|compiler directives]] such as [[OpenACC]], and extensions to industry-standard programming languages including [[C (programming language)|C]], [[C++]], [[Fortran]] and [[Python (programming language)|Python]]. C/C++ programmers can use 'CUDA C/C++', compiled to [[Parallel Thread Execution|PTX]] with [[NVIDIA CUDA Compiler|nvcc]], Nvidia's [[LLVM]]-based C/C++ compiler, or with clang itself.<ref>{{cite web|url=https://developer.nvidia.com/cuda-llvm-compiler|title=CUDA LLVM Compiler|date=7 May 2012}}</ref> Fortran programmers can use 'CUDA Fortran', compiled with the PGI CUDA Fortran compiler from [[The Portland Group]].{{Update inline|reason=PGI Compilers & Tools have evolved into the NVIDIA HPC SDK. The current Fortran compiler is called nvfortran.|date=December 2022}} Python programmers can use the [https://developer.nvidia.com/cunumeric cuNumeric] library to accelerate applications on Nvidia GPUs.
 
In addition to libraries, compiler directives, CUDA C/C++ and CUDA Fortran, the CUDA platform supports other computational interfaces, including the [[Khronos Group]]'s [[OpenCL]],<ref>{{YouTube|r1sN1ELJfNo|First OpenCL demo on a GPU}}</ref> Microsoft's [[DirectCompute]], [[OpenGL]] Compute Shader and [[C++ AMP]].<ref>{{YouTube|K1I4kts5mqc|DirectCompute Ocean Demo Running on Nvidia CUDA-enabled GPU}}</ref> Third party wrappers are also available for [[Python (programming language)|Python]], [[Perl]], Fortran, [[Java (programming language)|Java]], [[Ruby (programming language)|Ruby]], [[Lua (programming language)|Lua]], [[Common Lisp (programming language)|Common Lisp]], [[Haskell (programming language)|Haskell]], [[R (programming language)|R]], [[MATLAB]], [[IDL (programming language)|IDL]], [[Julia (programming language)|Julia]], and native support in [[Mathematica]].
Line 288:
| 11.8<ref>{{cite web|url=https://docs.nvidia.com/cuda/archive/11.8.0/cuda-toolkit-release-notes/index.html|title=CUDA 11.8 Release Notes|website=NVIDIA Developer}}</ref> || || || || {{yes|3.5}} || {{yes|}} || {{yes|}} || {{yes|}} || {{yes|}} || {{yes|}} || {{yes|8.9}} || {{yes|9.0}} ||
|-
| 12.0 – 12.5 || || || || || {{yes|5.0}} || {{yes|}} || {{yes|}} || {{yes|}} || {{yes|}} || {{yes|}} || {{yes|9.0}} ||
|}
 
Line 485:
|-
|10.0
|rowspan="2" |[[Blackwell (microarchitecture)|Blackwell]]
|GB100
|
|
|B200, B100
|
|-
|10.x
|[[Blackwell (microarchitecture)|Blackwell]]
|GB202, GB203, GB205, GB206, GB207
|GeForce RTX 5090, RTX 5080
|
|B40
Line 549 ⟶ 548:
| colspan="4" rowspan="1" {{yes}}
|-
| Uniform Datapath <ref>[https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9839-discovering-the-turing-t4-gpu-architecture-with-microbenchmarks.pdf Dissecting the Turing GPU Architecture through Microbenchmarking]</ref>
| colspan="6" rowspan="1" {{no}}
| colspan="3" rowspan="1" {{yes}}
Line 622 ⟶ 621:
| {{yes|1.2}}
| {{yes|2.0}}
|-
| any 128-bit trivially copyable type
| general operations
| {{no}}
| atomicExch, atomicCAS
| colspan="2" {{yes|9.0}}
|-
| rowspan="2" | 16-bit floating point<br />FP16
Line 647 ⟶ 652:
| atomic addition
| colspan="2" {{yes|2.0}}
|-
| rowspan="1" | 32-bit floating point float2 and float4
| general operations
| {{no}}
| atomic addition
| colspan="2" {{yes|9.0}}
|-
| rowspan="1" | 64-bit floating point
Line 655 ⟶ 666:
|}
Note: Missing lines or empty entries reflect a lack of information on that item.<ref>{{cite web | url=https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications | title=CUDA C++ Programming Guide }}</ref>
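As an illustrative sketch of the atomic operations tabulated above (a hypothetical histogram kernel; which types and operations are available depends on the device's compute capability), <code>atomicAdd</code> lets many concurrent threads safely update the same memory location:

```cuda
// Hypothetical example: build a 256-bin histogram of byte values.
// Many threads may hit the same bin at once; atomicAdd performs the
// read-modify-write as a single indivisible operation, so no updates are lost.
__global__ void histogram(const unsigned char *in, unsigned int *bins, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&bins[in[i]], 1u);
}
```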
 
 
===Tensor cores===
Line 709 ⟶ 719:
| colspan="1" {{yes|1024}}
|-
| 4-bit floating point FP4 (E2M1?)
| colspan="2" {{yes|10.0}}
| colspan="11" {{no}}
| colspan="1" {{yes|4096}}
|-
| 6-bit floating point FP6 (E3M2 and E2M3?)
| colspan="2" {{yes|10.0}}
| colspan="11" {{no}}
Line 780 ⟶ 790:
 
Note: Missing lines or empty entries reflect a lack of information on that item.<ref>{{cite web|url=https://www.nvidia.com/content/dam/en-zz/Solutions/gtcf21/jetson-orin/nvidia-jetson-agx-orin-technical-brief.pdf|title=Technical brief. NVIDIA Jetson AGX Orin Series|website=nvidia.com|access-date=5 September 2023}}</ref><ref>{{cite web|url=https://images.nvidia.com/aem-dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf|title=NVIDIA Ampere GA102 GPU Architecture|website=nvidia.com|access-date=5 September 2023}}</ref>
<ref>{{cite arXiv |title=Benchmarking and Dissecting the Nvidia Hopper GPU Architecture |eprint=2402.13499v1 |last1=Luo |first1=Weile |last2=Fan |first2=Ruibo |last3=Li |first3=Zeyu |last4=Du |first4=Dayou |last5=Wang |first5=Qiang |last6=Chu |first6=Xiaowen |date=2024 |class=cs.AR }}</ref>
<ref>{{cite web|url=https://images.nvidia.com/content/Solutions/data-center/a40/nvidia-a40-datasheet.pdf|title=Datasheet NVIDIA A40|website=nvidia.com|access-date=27 April 2024}}</ref>
<ref>{{cite web | url=https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.1.pdf | title=NVIDIA AMPERE GA102 GPU ARCHITECTURE | date=27 April 2024 }}</ref>
Line 889 ⟶ 899:
|}
 
===Technical specification===
<div style="overflow-x:auto">
{| class="wikitable" style="font-size:85%;"
Line 1,229 ⟶ 1,239:
</div>
 
===Multiprocessor architecture===
<div style="overflow-x:auto">
{| class="wikitable" style="font-size:85%;"
Line 1,539 ⟶ 1,549:
* Accelerated interconversion of video file formats
* Accelerated [[encryption]], [[decryption]] and [[Data compression|compression]]
* [[Bioinformatics]], e.g. [[Massive parallel sequencing|NGS]] DNA sequencing with [[sourceforge:projects/seqbarracuda/|BarraCUDA]]<ref>{{Cite web|url=https://www.biocentric.nl/biocentric/nvidia-cuda-bioinformatics-barracuda/|title=nVidia CUDA Bioinformatics: BarraCUDA|date=2019-07-19|website=BioCentric|language=en|access-date=2019-10-15}}</ref>
* Distributed calculations, such as predicting the native conformation of [[proteins]]
* Medical analysis simulations, for example [[virtual reality]] based on [[X-ray computed tomography|CT]] and [[Magnetic resonance imaging|MRI]] scan images
* Physical simulations,<ref>{{Cite web|title=Part V: Physics Simulation|url=https://developer.nvidia.com/gpugems/gpugems3/part-v-physics-simulation|access-date=2020-09-11|website=NVIDIA Developer|language=en}}</ref> in particular in [[fluid dynamics]]
* [[Neural network]] training in [[machine learning]] problems
* [[Large Language Model]] inference
* [[Face recognition]]
* [[Volunteer computing]] projects, such as [[SETI@home]] and other projects using [[Berkeley Open Infrastructure for Network Computing|BOINC]] software
Line 1,553 ⟶ 1,564:
CUDA competes with other GPU computing stacks: [[OneAPI (compute acceleration)|Intel OneAPI]] and [[ROCm|AMD ROCm]].
 
Whereas Nvidia's CUDA is closed-source, Intel's OneAPI and AMD's ROCm are open source.
 
=== Intel OneAPI ===
{{Main|OneAPI (compute acceleration)}}
 
'''oneAPI''' is an initiative based on open standards, created to support software development for multiple hardware architectures.<ref>{{Cite web |title=oneAPI Programming Model |url=https://www.oneapi.io/ |access-date=2024-07-27 |website=oneAPI.io |language=en-US}}</ref> The oneAPI libraries must implement open specifications that are discussed publicly by the Special Interest Groups, offering the possibility for any developer or organization to implement their own versions of oneAPI libraries.<ref>{{Cite web |title=Specifications {{!}} oneAPI |url=https://www.oneapi.io/spec/ |access-date=2024-07-27 |website=oneAPI.io |language=en-US}}</ref><ref>{{Cite web |title=oneAPI Specification — oneAPI Specification 1.3-rev-1 documentation |url=https://oneapi-spec.uxlfoundation.org/specifications/oneapi/v1.3-rev-1/ |access-date=2024-07-27 |website=oneapi-spec.uxlfoundation.org}}</ref>
'''oneAPI''' is open source, and all the corresponding libraries are published on its [https://github.com/oneapi-src GitHub page].
 
oneAPI was originally made by Intel; other hardware adopters include Fujitsu and Huawei.
 
==== Unified Acceleration Foundation (UXL) ====
 
Unified Acceleration Foundation (UXL) is a new technology consortium working on the continuation of the OneAPI initiative, with the goal of creating a new open standard accelerator software ecosystem and related open standards and specification projects through Working Groups and Special Interest Groups (SIGs). The goal is to offer open alternatives to Nvidia's CUDA. The main companies behind it are Intel, Google, ARM, Qualcomm, Samsung, Imagination, and VMware.<ref>{{Cite web |title=Exclusive: Behind the plot to break Nvidia's grip on AI by targeting software |website=[[Reuters]] |url=https://www.reuters.com/technology/behind-plot-break-nvidias-grip-ai-by-targeting-software-2024-03-25/ |access-date=2024-04-05}}</ref>
 
=== AMD ROCm ===