Getting Started with CUDA 01: Installation, Configuration, and Testing

This article covers installing and configuring CUDA.

First, you need an NVIDIA graphics card.

Installation

Open the official CUDA download page, select your platform, and download the latest installer. On Windows it is an .exe file; just double-click it to install. On Linux, download the offline .deb package rather than using the network installer, so the installation cannot fail halfway through due to network problems.
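On Linux, installing from the offline .deb package looks roughly like the following. This is a sketch for CUDA 10.2 on Ubuntu 18.04; the exact file and repository names depend on the version you actually downloaded, so copy the commands shown on the download page rather than these.

```shell
# Register the local repo contained in the downloaded .deb
# (file name varies with the CUDA/driver version you picked)
sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
# Trust the repo's signing key, then install the whole toolkit
sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
```

Because everything is already on disk, the apt step only resolves dependencies locally and will not stall on a flaky connection.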

Open the cuDNN download page and log in with your NVIDIA developer account; the download tool requires you to pick the cuDNN version matching your CUDA version. cuDNN is NVIDIA's deep neural network library; installing it usually amounts to copying its headers and shared libraries into the corresponding CUDA directories.
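For the tar-archive distribution of cuDNN, that copy step can be sketched as follows. It assumes the archive extracts into a directory named cuda and that CUDA is installed under /usr/local/cuda; adjust both paths to your setup.

```shell
# Unpack the cuDNN archive downloaded from the developer site
tar -xzvf cudnn-*.tgz
# Copy headers and shared libraries into the CUDA install tree
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
# Make the files readable by all users
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
```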

Linux

On Linux, CUDA can also be installed from the official package repositories.
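On Ubuntu, for instance, the distribution's own repositories package the toolkit as nvidia-cuda-toolkit (usually an older release than the one on NVIDIA's site), which keeps the install to two commands:

```shell
sudo apt-get update
sudo apt-get install nvidia-cuda-toolkit
# Verify the compiler landed on PATH
nvcc --version
```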

ARM

ARM platforms generally run Ubuntu, in which case the steps are the same as on any other Linux. If yours does not, there is not much I can do for you; even NVIDIA's own development boards ship Ubuntu for ARM.

Configuration

Configuration simply means adding the directory containing the CUDA executables to the system's environment variables (not needed on Windows, where the installer does this for you).
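On Linux this means appending a couple of export lines to your shell startup file. The sketch below assumes the default install prefix /usr/local/cuda; it also extends the library search path, which is typically needed so programs can find the CUDA shared libraries at run time.

```shell
# Append to ~/.bashrc, then run `source ~/.bashrc` or open a new shell
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

Afterwards, `nvcc --version` should print the compiler version from any directory.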

Testing

Older CUDA versions ship a samples folder inside the installation directory; the test code lives in 1_Utilities/deviceQuery. You don't need to understand the code; the point is simply to confirm that it compiles and runs.

In newer CUDA versions the sample code has been split out into the cuda-samples repository on GitHub.
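Building deviceQuery from that repository can be sketched as follows. The directory layout has changed across releases, so if the path below does not exist in the tag you check out, locate the sample with `find . -name deviceQuery` instead.

```shell
# Fetch the samples and build just the deviceQuery utility
git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/1_Utilities/deviceQuery
make            # requires nvcc on PATH
./deviceQuery   # prints the device report shown below
```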

On my 暗夜精灵5 laptop, with CUDA 10.2, NVIDIA driver 440, and cuDNN 7, the test result was:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1660 Ti"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 5942 MBytes (6230245376 bytes)
(24) Multiprocessors, ( 64) CUDA Cores/MP: 1536 CUDA Cores
GPU Max Clock rate: 1590 MHz (1.59 GHz)
Memory Clock rate: 6001 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS