Installing TensorFlow or PyTorch (GPU version) on deepin 15.11

    My setup

    Host: desktop PC
    CPU: AMD Ryzen 5 3600
    GPU: RTX 2060

    Driver installation may differ on other machines (especially dual-GPU laptops), but the steps after the driver is installed should apply everywhere.
    This article is for reference only~~

    The final versions after a successful install, for reference:

    NVIDIA driver: 430.50
    TensorFlow: 2.0
    CUDA: 10.1
    cuDNN: 7.6.4
    

    Reference links:

    deepin 15.8 + NVIDIA 390.87 + CUDA 9.0 + cuDNN 7.4 + tensorflow-gpu 1.9: a painful configuration history
    Installing Python 3.6.9 on deepin 15.10.2
    Installing Jupyter Notebook on deepin 15.10.2

    Installing the graphics driver

    This section follows "Installing the Nvidia driver on Deepin".

    Download the driver

    Find a suitable driver at https://www.geforce.cn/drivers and download it.
    Put the downloaded file in your home directory (NVIDIA-Linux-x86_64-430.50.run).

    Disable the nouveau driver

    # Install the pluma editor first (or edit the file by hand)
    sudo apt-get install pluma
    sudo pluma /etc/modprobe.d/blacklist.conf
     
    ## Alternatively, open the folder as administrator via right-click and edit the
    ## file manually (you may need to create blacklist.conf). Add the following:
    blacklist nouveau
    blacklist lbm-nouveau
    options nouveau modeset=0
    alias nouveau off
    alias lbm-nouveau off
    

    Regenerate the initramfs so the change takes effect:

    sudo update-initramfs -u
    

    Reboot and log back into the system.
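    Optionally, after the reboot you can confirm that the blacklist worked; if grep finds no hit, the fallback message is printed and nouveau is gone:

```shell
# If the blacklist took effect, `lsmod | grep nouveau` prints nothing,
# so the fallback echo runs instead.
lsmod | grep nouveau || echo "nouveau is not loaded"
```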

    Install the driver

    Stop the display manager

    sudo service lightdm stop
    

    After logging in with your account at the console, switch to text mode:

    sudo init 3
    

    Make the target NVIDIA installer executable (double-check the path):

    chmod 777 ./NVI.............run
    

    Install the driver. Many prompts will appear during installation; if you understand them, follow them as needed, otherwise choosing YES all the way through is fine.

    sudo ./NVI.............run
    

    If all goes well, the install succeeds here. If it fails, don't worry: re-enable the user interface below and look for another driver-installation guide; the steps after the driver still apply :)

    Re-enable the user interface

    sudo service lightdm start
    

    Check whether the driver installed successfully

    First way: after a successful install, the screen resolution should change to the maximum your monitor supports.
    Second way: run nvidia-smi; output similar to the following appears:

    jansora@jansora-PC:~$ nvidia-smi 
    Fri Oct 18 15:37:06 2019       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce RTX 2060    Off  | 00000000:08:00.0  On |                  N/A |
    | 34%   33C    P8    21W / 165W |     91MiB /  5931MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      4279      G   /usr/lib/xorg/Xorg                            60MiB |
    |    0      4733      G   kwin_x11                                      17MiB |
    +-----------------------------------------------------------------------------+
    
    

    Installing CUDA 10.1

    Make sure your graphics driver supports CUDA 10.1 (CUDA 10.1 requires 418.x or higher).

    Download CUDA 10.1

    Pick the runfile (local) installer from the archive page below. Note that the URL contains & characters, so quote it if you pass it to wget:

    https://developer.nvidia.com/cuda-10.1-download-archive-base?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal

    Grant execute permission

    chmod 755 cuda_10.1.243_418.87.00_linux.run
    

    Install CUDA 10.1

    On deepin 15.11 the CUDA 10.1 installer cannot be run as root with sudo; doing so throws an error related to /var/log/nvidia/.uninstallManifests (the cause is not discussed further here).
    You can work around the error by installing into your home directory and then moving the result to /usr/local, as the following steps show.

    Create the target directory

    cd ~
    mkdir cuda-10.1
    

    Run the installer, installing into the ~/cuda-10.1 directory.

    The installer first shows the EULA; press q to skip it, then type accept to start the installation.

     ./cuda_10.1.243_418.87.00_linux.run  --toolkitpath=$HOME/cuda-10.1 --defaultroot=$HOME/cuda-10.1
    

    Select only CUDA Toolkit 10.1; deselect everything else (clear the [X] marks).

    ┌──────────────────────────────────────────────────────────────────────────────┐
    │ CUDA Installer                                                               │
    │ - [ ] Driver                                                                 │
    │      [ ] 418.87.00                                                           │
    │ + [X] CUDA Toolkit 10.1                                                      │
    │   [ ] CUDA Samples 10.1                                                      │
    │   [ ] CUDA Demo Suite 10.1                                                   │
    │   [ ] CUDA Documentation 10.1                                                │
    │   Options                                                                    │
    │   Install                                                                    │
    │                                                                              │
    │ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
    └──────────────────────────────────────────────────────────────────────────────┘
    

    If all goes well, the installation succeeds here.

    Move it to /usr/local

     sudo mv cuda-10.1 /usr/local/
    

    Create the symlink

     sudo ln -sv /usr/local/cuda-10.1/ /usr/local/cuda
    

    Configure the CUDA environment variables

    Either ~/.bashrc or /etc/profile works; /etc/profile is recommended.
    Run sudo vim /etc/profile and append the following:

    export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:$LD_LIBRARY_PATH
    export PATH=/usr/local/cuda/bin:$PATH
    

    Apply the new environment variables

    source /etc/profile
    

    Verify the installation

    nvcc -V
    

    Output similar to the following means the install succeeded.

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Sun_Jul_28_19:07:16_PDT_2019
    Cuda compilation tools, release 10.1, V10.1.243

    Installing cuDNN 7.6

    Download cuDNN 7.6

    Downloading requires an NVIDIA developer account; logging in with QQ works fine.

    Download page: https://developer.nvidia.com/rdp/cudnn-download
    Choose "cuDNN Library for Linux".

    Extract

    tar xvf cudnn-*.tgz 
    

    Copy the files

    cd cuda
    sudo cp include/* /usr/local/cuda/include/ 
    sudo cp lib64/libcudnn.so.7.6.4 lib64/libcudnn_static.a /usr/local/cuda/lib64/ 
    cd /usr/lib/x86_64-linux-gnu 
    sudo ln -s libcudnn.so.7.6.4 libcudnn.so.7
    sudo ln -s libcudnn.so.7 libcudnn.so
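    The commands above build the symlink chain libcudnn.so → libcudnn.so.7 → libcudnn.so.7.6.4 that the dynamic linker expects. A minimal rehearsal in a scratch directory (no sudo needed; the touched file is just a stand-in for the real library) shows how it resolves:

```shell
# Rehearse the symlink chain in a temp dir; the real install does the
# same under /usr/lib/x86_64-linux-gnu with the actual library file.
dir=$(mktemp -d)
cd "$dir"
touch libcudnn.so.7.6.4                # stand-in for the real library
ln -s libcudnn.so.7.6.4 libcudnn.so.7
ln -s libcudnn.so.7 libcudnn.so
readlink -f libcudnn.so                # resolves to .../libcudnn.so.7.6.4
```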
    

    Configure the environment variables

    Either ~/.bashrc or /etc/profile works.
    Run sudo vim /etc/profile and append the following:

    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
    export CUDA_HOME=/usr/local/cuda
    export PATH="$CUDA_HOME/bin:$PATH"
    

    Installing NCCL 2.4.8

    Download NCCL 2.4.8

    https://developer.nvidia.com/nccl/nccl-download

    Install

    tar xvf nccl_2.4.8-1+cuda10.1_x86_64.txz
    cd nccl_2.4.8-1+cuda10.1_x86_64
    sudo mkdir -p /usr/local/cuda/nccl/lib /usr/local/cuda/nccl/include 
    sudo cp *.txt /usr/local/cuda/nccl 
    sudo cp include/*.h /usr/include/ 
    sudo cp lib/libnccl.so.2.4.8 lib/libnccl_static.a /usr/lib/x86_64-linux-gnu/ 
    sudo ln -s /usr/include/nccl.h /usr/local/cuda/nccl/include/nccl.h 
    cd /usr/lib/x86_64-linux-gnu 
    sudo ln -s libnccl.so.2.4.8 libnccl.so.2 
    sudo ln -s libnccl.so.2 libnccl.so 
    for i in libnccl*; do sudo ln -s /usr/lib/x86_64-linux-gnu/$i /usr/local/cuda/nccl/lib/$i; done
    

    If you don't need to build TensorFlow by hand, JDK and Bazel need not be installed.


    Install JDK 8

    sudo apt install openjdk-8-jdk

    Install Bazel 0.26.1

    The Bazel version must not be higher than 0.26.1, or the TensorFlow build complains:

    Please downgrade your bazel installation to version 0.26.1 or lower to build TensorFlow! To downgrade: download the installer for the old version (from https://github.com/bazelbuild/bazel/releases) then run the installer.
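    A small helper (my own sketch, not from the original article) can check a version string against the 0.26.1 ceiling before you start a long build; feed it the number printed by bazel version:

```shell
# version_le A B: succeeds when dotted version A <= B (relies on sort -V)
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Example with a literal version; on your machine substitute the output of:
#   bazel version | awk '/Build label/{print $3}'
if version_le "0.26.1" "0.26.1"; then
    echo "bazel is OK for building TensorFlow 2.0"
else
    echo "downgrade bazel to 0.26.1 or lower"
fi
```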

    Download Bazel 0.26.1

    https://github.com/bazelbuild/bazel/releases/download/0.26.1/bazel-0.26.1-installer-linux-x86_64.sh

    Install Bazel 0.26.1

    Do not run the Bazel installer from a directory whose path contains non-ASCII (e.g. Chinese) characters.

    sudo chmod 755 ./bazel-0.26.1-installer-linux-x86_64.sh 
     ./bazel-0.26.1-installer-linux-x86_64.sh --user
    

    Configure the environment variables

    1. Edit the startup script: vim ~/.bashrc
    2. Append the following:
       export PATH="$PATH:$HOME/bin" # at the end of the file
    3. Apply the configuration: source ~/.bashrc

    Verify the Bazel installation

    bazel version
    

    Output like the following means success:

    WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
    Build label: 0.26.1
    Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
    Build time: Thu Jun 6 11:05:05 2019 (1559819105)
    Build timestamp: 1559819105
    Build timestamp as int: 1559819105

    Building TensorFlow 2.0 from source

    Building the pip package by hand is not recommended: because of network conditions in China, downloading files from GitHub during the build almost always fails.

    Download TensorFlow 2.0

    https://github.com/tensorflow/tensorflow/archive/r2.0.zip

    Extract

    You may need a tool for extracting zip files; install it with sudo apt install unzip

    unzip tensorflow-r2.0.zip
    cd tensorflow-r2.0
    

    Run ./configure

    WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
    You have bazel 0.26.1 installed.
    Please specify the location of python. [Default is /usr/bin/python]: /usr/local/bin/python3
    
    Found possible Python library paths:
      /usr/local/lib/python3.8/site-packages
    Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.8/site-packages]
    
    Do you wish to build TensorFlow with XLA JIT support? [Y/n]: 
    XLA JIT support will be enabled for TensorFlow.
    
    Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
    No OpenCL SYCL support will be enabled for TensorFlow.
    
    Do you wish to build TensorFlow with ROCm support? [y/N]: 
    No ROCm support will be enabled for TensorFlow.
    
    Do you wish to build TensorFlow with CUDA support? [y/N]: y
    CUDA support will be enabled for TensorFlow.
    
    Do you wish to build TensorFlow with TensorRT support? [y/N]: 
    No TensorRT support will be enabled for TensorFlow.
    
    Found CUDA 10.1 in:
        /usr/local/cuda/lib64
        /usr/local/cuda/include
    Found cuDNN 7 in:
        /usr/local/cuda/lib64
        /usr/local/cuda/include
    
    
    Please specify a list of comma-separated CUDA compute capabilities you want to build with.
    You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
    Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 7.5
    
    
    Do you want to use clang as CUDA compiler? [y/N]: 
    nvcc will be used as CUDA compiler.
    
    Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
    
    
    Do you wish to build TensorFlow with MPI support? [y/N]: 
    No MPI support will be enabled for TensorFlow.
    
    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: 
    
    
    Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
    Not configuring the WORKSPACE for Android builds.
    
    Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
            --config=mkl            # Build with MKL support.
            --config=monolithic     # Config for mostly static monolithic build.
            --config=gdr            # Build with GDR support.
            --config=verbs          # Build with libverbs support.
            --config=ngraph         # Build with Intel nGraph support.
            --config=numa           # Build with NUMA support.
            --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
            --config=v2             # Build TensorFlow 2.x instead of 1.x.
    Preconfigured Bazel build configs to DISABLE default on features:
            --config=noaws          # Disable AWS S3 filesystem support.
            --config=nogcp          # Disable GCP support.
            --config=nohdfs         # Disable HDFS support.
            --config=noignite       # Disable Apache Ignite support.
            --config=nokafka        # Disable Apache Kafka support.
            --config=nonccl         # Disable NVIDIA NCCL support.
    Configuration finished
    

    Build the pip package by hand

    bazel build --config=opt --config=cuda --config=v2 //tensorflow/tools/pip_package:build_pip_package
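    The bazel target above only builds the wheel-packaging tool; it still has to be run to produce an installable wheel. A hedged sketch (the output directory /tmp/tensorflow_pkg is my choice, and the exact wheel filename depends on your Python version and platform):

```shell
# Produce the wheel from the bazel build output, then install it.
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip3 install /tmp/tensorflow_pkg/tensorflow-*.whl
```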

    Installing tensorflow-gpu with pip

    As of this article's publication date, TensorFlow 2.0 did not yet ship a GPU build via pip.
    pip3 install tensorflow-gpu

    Installing PyTorch with pip

    pip3 install torch torchvision
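    Once either framework is installed, a quick sanity check (my sketch; it assumes only the pip packages above) asks each one whether it can see the GPU, and degrades gracefully if a package is missing:

```shell
python3 - <<'EOF'
# Report GPU visibility for torch and tensorflow, if installed.
import importlib

for name, probe in (
    ("torch", lambda m: m.cuda.is_available()),
    ("tensorflow", lambda m: bool(m.config.experimental.list_physical_devices("GPU"))),
):
    try:
        mod = importlib.import_module(name)
        print(name, "GPU available:", probe(mod))
    except ImportError:
        print(name, "not installed")
EOF
```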

    The GPU versions of TensorFlow and PyTorch are now installed.
