基于DHCP服务器使用 PXE 引导和 kickstart (UEFI) 自动安装 ESXi 7

基于DHCP服务器使用 PXE 引导和 kickstart (UEFI) 自动安装 ESXi 7
基于UEFI的PXE安装 ESXi 7
PXE网络引导自动化安装 ESXi 7 详解
PXE+Kickstart 批量安装 ESXi 7
使用PXE 和TFTP 引导ESXi 安装ESXi 7 程序

1 基于DHCP服务器使用 PXE 引导和 kickstart (UEFI) 自动安装 ESXi 7 录制视频

2 浪潮NF5280M5服务器基于DHCP服务器使用 PXE 引导和 kickstart (UEFI) 自动安装 ESXi 7 录制视频

#CentOS 7 系统 安装dhcp服务器 、 tftp-server服务器 、 web服务器
yum install -y dhcp tftp-server httpd

1、配置DHCP服务器

守护进程 :/etc/sbin/dhcpd
脚本:/etc/init.d/dhcpd
端口:67(bootps) 服务端,68(bootpc) 客户端
配置文件:/etc/dhcp/dhcpd.conf
租约信息:/var/lib/dhcpd/dhcpd.leases
配置文件:cp /usr/share/doc/dhcp-4.2.5/dhcpd.conf.example /etc/dhcp/dhcpd.conf
修改DHCP监听特定端口 vi /etc/sysconfig/dhcpd

#编辑DHCP服务器配置文件
vi /etc/dhcp/dhcpd.conf

default-lease-time 600;
max-lease-time 7200;
log-facility local7;

subnet 10.53.220.0 netmask 255.255.255.0 {
  range 10.53.220.41 10.53.220.49;
  option domain-name-servers 10.1.0.1, 114.114.114.114;   #DNS服务器IP地址
  option routers 10.53.220.1;                             #网关地址
  next-server 10.53.220.224;                              # 指定TFTP服务器地址
  filename "/mboot.efi";                                  # 指定网络引导映像文件
}
#绑定IP地址
host zhangfangzhou {                                      #zhangfangzhou 随意指定,但必须唯一
  hardware ethernet 00:50:56:99:06:b7;
  fixed-address 10.53.220.48;
}

#DHCP的服务设置开机启动、启动DHCP服务
systemctl enable dhcpd && systemctl start dhcpd

########防火墙设置
firewall-cmd --zone=public --add-port=67/udp --permanent
firewall-cmd --zone=public --add-port=68/udp --permanent
#重新载入配置
firewall-cmd --reload
firewall-cmd --list-all #查看防火墙规则,只显示/etc/firewalld/zones/public.xml中防火墙策略

2、配置 tftp 服务器

启用 tftp 服务
vi /etc/xinetd.d/tftp

# default: off
# description: The tftp server serves files using the trivial file transfer \
#       protocol.  The tftp protocol is often used to boot diskless \
#       workstations, download configuration files to network-aware printers, \
#       and to start the installation process for some operating systems.
service tftp
{
        socket_type             = dgram
        protocol                = udp
        wait                    = yes
        user                    = root
        server                  = /usr/sbin/in.tftpd
        server_args             = -s /var/lib/tftpboot # TFTP服务器顶级目录
        disable                 = no    # 从yes修改为no
        per_source              = 11
        cps                     = 100 2
        flags                   = IPv4
}

#tftp的服务设置开机启动、启动tftp服务
systemctl enable tftp && systemctl start tftp

########防火墙设置
firewall-cmd --zone=public --add-port=69/tcp --permanent
firewall-cmd --zone=public --add-port=69/udp --permanent
#重新载入配置
firewall-cmd --reload
firewall-cmd --list-all #查看防火墙规则,只显示/etc/firewalld/zones/public.xml中防火墙策略

3、挂载ESXi 7 镜像

#创建文件夹,主要是下载ESXi 7和挂载ESXi 7使用
mkdir -p /var/lib/tftpboot/{iso,ESXi70u2}
wget -P /var/lib/tftpboot/iso  http://10.53.123.144/ISO/ESXi/VMware-VMvisor-Installer-7.0U2a-17867351.x86_64.iso
cd /var/lib/tftpboot
mount /var/lib/tftpboot/iso/VMware-VMvisor-Installer-7.0U2a-17867351.x86_64.iso /var/lib/tftpboot/ESXi70u2     #重启后需要重新mount

#1、UEFI启动 ESXi 7
将上面DHCP服务器指定的启动镜像文件复制到指定目录下。上面的设置示例名为“mboot.efi”,因此将其复制为该名称。
cd /var/lib/tftpboot/
cp -p /var/lib/tftpboot/ESXi70u2/efi/boot/bootx64.efi /var/lib/tftpboot/mboot.efi

直接在 tftpboot 下复制名为 boot.cfg 的文件,该文件描述了引导设置。
cp -p /var/lib/tftpboot/ESXi70u2/efi/boot/boot.cfg /var/lib/tftpboot/boot.cfg
 #2、传统BIOS启动 ESXi 7
如果要使用旧版 BIOS,则需要3.86版的 syslinux 软件包。从 https://www.kernel.org/pub/linux/utils/boot/syslinux/ 下载包
cd /tmp
wget https://mirrors.edge.kernel.org/pub/linux/utils/boot/syslinux/3.xx/syslinux-3.86.tar.gz
tar xvzf syslinux-3.86.tar.gz
cp -p syslinux-3.86/core/mboot.efi /var/lib/tftpboot/
#3、修改boot.cfg
删除 boot.cfg 中文件路径的“/”
sed -i 's|/||g' boot.cfg

4、检查boot.cfg

根据您的环境更改“标题”和“前缀”。为“title”使用一个描述性名称是个好主意,它是自动安装过程中显示的标题字符。“前缀prefix”指定存储 ESXi 安装程序的目录。在这种情况下,它将是“ESXi70u2”。

修改title和refix名称
prefix=ESXi70u2 (ESXi70u2就是上面ESXi ISO文件mount的文件夹)

# vi /var/lib/tftpboot/boot.cfg
bootstate=0
title=Loading ESXi installer www.zhangfangzhou.cn
timeout=5
prefix=ESXi70u2
kernel=b.b00
kernelopt=runweasel cdromBoot
modules=jumpstrt.gz --- useropts.gz --- features.gz --- k.b00 --- uc_intel.b00 --- uc_amd.b00 --- uc_hygon.b00 --- procfs.b00 --- vmx.v00 --- vim.v00 --- tpm.v00 --- sb.v00 --- s.v00 --- atlantic.v00 --- bnxtnet.v00 --- bnxtroce.v00 --- brcmfcoe.v00 --- brcmnvme.v00 --- elxiscsi.v00 --- elxnet.v00 --- i40enu.v00 --- iavmd.v00 --- icen.v00 --- igbn.v00 --- irdman.v00 --- iser.v00 --- ixgben.v00 --- lpfc.v00 --- lpnic.v00 --- lsi_mr3.v00 --- lsi_msgp.v00 --- lsi_msgp.v01 --- lsi_msgp.v02 --- mtip32xx.v00 --- ne1000.v00 --- nenic.v00 --- nfnic.v00 --- nhpsa.v00 --- nmlx4_co.v00 --- nmlx4_en.v00 --- nmlx4_rd.v00 --- nmlx5_co.v00 --- nmlx5_rd.v00 --- ntg3.v00 --- nvme_pci.v00 --- nvmerdma.v00 --- nvmxnet3.v00 --- nvmxnet3.v01 --- pvscsi.v00 --- qcnic.v00 --- qedentv.v00 --- qedrntv.v00 --- qfle3.v00 --- qfle3f.v00 --- qfle3i.v00 --- qflge.v00 --- rste.v00 --- sfvmk.v00 --- smartpqi.v00 --- vmkata.v00 --- vmkfcoe.v00 --- vmkusb.v00 --- vmw_ahci.v00 --- clusters.v00 --- crx.v00 --- elx_esx_.v00 --- btldr.v00 --- esx_dvfi.v00 --- esx_ui.v00 --- esxupdt.v00 --- tpmesxup.v00 --- weaselin.v00 --- loadesx.v00 --- lsuv2_hp.v00 --- lsuv2_in.v00 --- lsuv2_ls.v00 --- lsuv2_nv.v00 --- lsuv2_oe.v00 --- lsuv2_oe.v01 --- lsuv2_oe.v02 --- lsuv2_sm.v00 --- native_m.v00 --- qlnative.v00 --- vdfs.v00 --- vmware_e.v00 --- vsan.v00 --- vsanheal.v00 --- vsanmgmt.v00 --- tools.t00 --- xorg.v00 --- gc.v00 --- imgdb.tgz --- basemisc.tgz --- resvibs.tgz --- imgpayld.tgz
build=7.0.2-0.0.17867351
updated=0

到这一步可以实现通过PXE网络安装ESXi

5、配置web 服务器

#默认网站目录/var/www/html

systemctl enable httpd && systemctl start httpd

########防火墙设置
firewall-cmd --zone=public --add-port=80/tcp --permanent
firewall-cmd --zone=public --add-port=443/tcp --permanent
#重新载入配置
firewall-cmd --reload
firewall-cmd --list-all #查看防火墙规则,只显示/etc/firewalld/zones/public.xml中防火墙策略

6、配置 ESXi 7 的 kickstart

示例 kickstart 文件存储在 ESXi 的 /etc/vmware/weasel/ks.cfg 中。将此复制到 http 服务器上的 /var/www/html

#示例 1 kickstart 文件,给ESXi 7 配置 静态IP地址
vi /var/www/html/ks.cfg
#####################
# Accept the VMware End User License Agreement
vmaccepteula
 
# Set the root password for the DCUI and Tech Support Mode
rootpw P@ssw0rd
 
# The install media is in the CD-ROM drive
install --firstdisk --overwritevmfs
 
# Set the network on the first network adapter
network --bootproto=static --device=vmnic0 --ip=10.53.220.199 --netmask=255.255.255.0 --vlanid=0 --gateway=10.53.220.1 --hostname=10.53.220.199 --nameserver=10.1.0.1

#vmserialnum
vmserialnum --esx=5U4TK-DML1M-M8550-XK1QP-1A052
# reboot after install
reboot
 
# run the following command only on the firstboot
%firstboot --interpreter=busybox

sleep 10
# enable & start remote ESXi Shell (SSH)
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh
sleep 1
# enable & start ESXi Shell (TSM)
vim-cmd hostsvc/enable_esx_shell
vim-cmd hostsvc/start_esx_shell
sleep 1
# enable High Performance
esxcli system settings advanced set --option=/Power/CpuPolicy --string-value="High Performance"
sleep 1
#Disable ipv6
esxcli network ip set --ipv6-enabled=0
sleep 1
#不加入体验
esxcli system settings advanced set -o /UserVars/HostClientCEIPOptIn -i 2
sleep 1
#enable ntp
/bin/esxcli system ntp set --server=ntp.aliyun.com
sleep 1
/bin/chkconfig ntpd on
sleep 1
#Disable ShellWarning
esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 1
sleep 1
cat << EOF > /etc/ssh/keys-root/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAgEA6PP8/RDdYIqUq6DE3zj9Qs8AF3uzfYH5lYrB+mMxxfI8kq2NIzQlsW1KDaH/fWYTkkK120lPUu97lWic9Li3Z3iFR6Nh4q6PVTfBNt4xOx754Ipqtpefk+9sLZAYEPK8pnRP0QZv7CtDFv842tfUYIrVNnecRQTFfNtnGDcXnO1u2RE1kq6Tr5N3595PbPLDKczjOFnS+jy0MeKKHPJcZfYz7TUTSzTwTHYbPRRaQ8/0eihUwzpAmXRo9NYNle26qp6+SlRsjGSBcUr0rh0wSe6r/C2btnpOUd//aFvcl8plhyb++nivlhB71v+6I0UcPoSXOIVs/1QuHEMbv7Ircjb2emqHtZDpk8KSYhgV0ZdbAq9XOcux76eok//xtjbleKPAcDMY/KotIEh7QX4NLQxSJOm5gCLkn5kbrHfVx6nWlGzVVds1sDlcnSAWul5lFiI5ZkArXFKcnm+aCnStPpx5SSCpZpMUdZvt8ZA7vLx3xjMDFwv5HTuTFwB9mlZrdfqp5USC2mWC3eAAPE7GxDSfJv9epteIYywIP9NVT3Z4ng9z6jrcFfA4GFlfLrk8J71cnxZ/AWZXXUwp3ooE/Cp3jc473VpK7FZwjQ7Xz9PD8WQgMOO4xnGQhPWlxhRuoTYyQVa0xOBO9gh9Cuc6zq5FQgYQEcSB+/FBJ/YNDIc= www.zhangfangzhou.cn
EOF
sleep 1
date > /finished.stamp

# Restart a last time
reboot


#示例 2 kickstart 文件,ESXi 7 通过 DHCP 服务器获得IP地址
​vi /var/www/html/ks.cfg

# Accept the VMware End User License Agreement
vmaccepteula
 
# Set the root password for the DCUI and Tech Support Mode
rootpw P@ssw0rd
 
# The install media is in the CD-ROM drive
install --firstdisk --overwritevmfs
 
# Set the network on the first network adapter
network --bootproto=dhcp

#vmserialnum
vmserialnum --esx=5U4TK-DML1M-M8550-XK1QP-1A052
# reboot after install
reboot
 
# run the following command only on the firstboot
%firstboot --interpreter=busybox

sleep 10
# enable & start remote ESXi Shell (SSH)
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh
sleep 1
# enable & start ESXi Shell (TSM)
vim-cmd hostsvc/enable_esx_shell
vim-cmd hostsvc/start_esx_shell
sleep 1
# enable High Performance
esxcli system settings advanced set --option=/Power/CpuPolicy --string-value="High Performance"
sleep 1
#Disable ipv6
esxcli network ip set --ipv6-enabled=0
sleep 1
#不加入体验
esxcli system settings advanced set -o /UserVars/HostClientCEIPOptIn -i 2
sleep 1
#enable ntp
/bin/esxcli system ntp set --server=ntp.aliyun.com
sleep 1
/bin/chkconfig ntpd on
sleep 1
#Disable ShellWarning
esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 1
sleep 1
cat << EOF > /etc/ssh/keys-root/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAgEA6PP8/RDdYIqUq6DE3zj9Qs8AF3uzfYH5lYrB+mMxxfI8kq2NIzQlsW1KDaH/fWYTkkK120lPUu97lWic9Li3Z3iFR6Nh4q6PVTfBNt4xOx754Ipqtpefk+9sLZAYEPK8pnRP0QZv7CtDFv842tfUYIrVNnecRQTFfNtnGDcXnO1u2RE1kq6Tr5N3595PbPLDKczjOFnS+jy0MeKKHPJcZfYz7TUTSzTwTHYbPRRaQ8/0eihUwzpAmXRo9NYNle26qp6+SlRsjGSBcUr0rh0wSe6r/C2btnpOUd//aFvcl8plhyb++nivlhB71v+6I0UcPoSXOIVs/1QuHEMbv7Ircjb2emqHtZDpk8KSYhgV0ZdbAq9XOcux76eok//xtjbleKPAcDMY/KotIEh7QX4NLQxSJOm5gCLkn5kbrHfVx6nWlGzVVds1sDlcnSAWul5lFiI5ZkArXFKcnm+aCnStPpx5SSCpZpMUdZvt8ZA7vLx3xjMDFwv5HTuTFwB9mlZrdfqp5USC2mWC3eAAPE7GxDSfJv9epteIYywIP9NVT3Z4ng9z6jrcFfA4GFlfLrk8J71cnxZ/AWZXXUwp3ooE/Cp3jc473VpK7FZwjQ7Xz9PD8WQgMOO4xnGQhPWlxhRuoTYyQVa0xOBO9gh9Cuc6zq5FQgYQEcSB+/FBJ/YNDIc= www.zhangfangzhou.cn
EOF
sleep 1
date > /finished.stamp

# Restart a last time
reboot

7、最终boot.cfg配置文件

# vi /var/lib/tftpboot/boot.cfg
bootstate=0
title=Loading ESXi installer www.zhangfangzhou.cn
timeout=5
prefix=ESXi70u2
kernel=b.b00
kernelopt=ks=http://10.53.220.224/ks.cfg
modules=jumpstrt.gz --- useropts.gz --- features.gz --- k.b00 --- uc_intel.b00 --- uc_amd.b00 --- uc_hygon.b00 --- procfs.b00 --- vmx.v00 --- vim.v00 --- tpm.v00 --- sb.v00 --- s.v00 --- atlantic.v00 --- bnxtnet.v00 --- bnxtroce.v00 --- brcmfcoe.v00 --- brcmnvme.v00 --- elxiscsi.v00 --- elxnet.v00 --- i40enu.v00 --- iavmd.v00 --- icen.v00 --- igbn.v00 --- irdman.v00 --- iser.v00 --- ixgben.v00 --- lpfc.v00 --- lpnic.v00 --- lsi_mr3.v00 --- lsi_msgp.v00 --- lsi_msgp.v01 --- lsi_msgp.v02 --- mtip32xx.v00 --- ne1000.v00 --- nenic.v00 --- nfnic.v00 --- nhpsa.v00 --- nmlx4_co.v00 --- nmlx4_en.v00 --- nmlx4_rd.v00 --- nmlx5_co.v00 --- nmlx5_rd.v00 --- ntg3.v00 --- nvme_pci.v00 --- nvmerdma.v00 --- nvmxnet3.v00 --- nvmxnet3.v01 --- pvscsi.v00 --- qcnic.v00 --- qedentv.v00 --- qedrntv.v00 --- qfle3.v00 --- qfle3f.v00 --- qfle3i.v00 --- qflge.v00 --- rste.v00 --- sfvmk.v00 --- smartpqi.v00 --- vmkata.v00 --- vmkfcoe.v00 --- vmkusb.v00 --- vmw_ahci.v00 --- clusters.v00 --- crx.v00 --- elx_esx_.v00 --- btldr.v00 --- esx_dvfi.v00 --- esx_ui.v00 --- esxupdt.v00 --- tpmesxup.v00 --- weaselin.v00 --- loadesx.v00 --- lsuv2_hp.v00 --- lsuv2_in.v00 --- lsuv2_ls.v00 --- lsuv2_nv.v00 --- lsuv2_oe.v00 --- lsuv2_oe.v01 --- lsuv2_oe.v02 --- lsuv2_sm.v00 --- native_m.v00 --- qlnative.v00 --- vdfs.v00 --- vmware_e.v00 --- vsan.v00 --- vsanheal.v00 --- vsanmgmt.v00 --- tools.t00 --- xorg.v00 --- gc.v00 --- imgdb.tgz --- basemisc.tgz --- resvibs.tgz --- imgpayld.tgz
build=7.0.2-0.0.17867351
updated=0

到这里可以基于DHCP实现通过PXE自动安装 ESXi 7、自动设置系统密码、自动接受许可、自动添加产品密钥、自动启用SSH服务、自动启用shell服务、电源模式自动设置为高性能、禁用IPV6、不参加(CEIP)客户体验改善计划、自动配置NTP、自动添加私钥。

https://www.zhangfangzhou.cn/pxe-kickstart-install-uefi-esxi-7.html

浪潮 NF5280M5 服务器 CPU 升级 Xeon Gold 6248 实战与经验总结

充分盘活现有硬件资源,本次选取浪潮(Inspur)NF5280M5 服务器进行定向硬件升级,在不更换整机平台的前提下,将原配置的 Xeon Gold 6132 处理器升级为 Xeon Gold 6248

一、升级前确认(非常重要)

1、平台兼容性确认

浪潮 NF280M5 支持:Intel Xeon® Scalable 第一代(Skylake-SP)和 Intel Xeon® Scalable 第二代(Cascade Lake-SP)
- 插槽:LGA3647
- 芯片组:Intel C621

型号 核心 / 线程 基准频率 最大睿频 L3 缓存 TDP
Xeon Gold 6132(当前机型) 14 / 28 2.60 GHz 3.70 GHz 19.25 MB 140 W
Xeon Gold 6134 8 / 16 3.20 GHz 3.70 GHz 24.75 MB 130 W
Xeon Gold 6138 20 / 40 2.00 GHz 3.70 GHz 27.5 MB 125 W
Xeon Gold 6230 20 / 40 2.10 GHz 4.00 GHz 27.5 MB 125 W
Xeon Gold 6230R 26 / 52 2.10 GHz 4.00 GHz 35.75 MB 150 W
Xeon Gold 6248 20 / 40 2.50 GHz 3.90 GHz 27.5 MB 150 W
Xeon Gold 6252 24 / 48 2.10 GHz 3.90 GHz 35.75 MB 150 W
Xeon Platinum 8260 24 / 48 2.40 GHz 3.90 GHz 35.75 MB 205 W
Xeon Platinum 8270 26 / 52 2.70 GHz 4.00 GHz 35.75 MB 205 W
Xeon Platinum 8276L 28 / 56 2.20 GHz 4.00 GHz 38.5 MB 165 W
Xeon Platinum 8280 28 / 56 2.70 GHz 4.00 GHz 38.5 MB 205 W
Xeon Platinum 8280L(大内存版) 28 / 56 2.70 GHz 4.00 GHz 38.5 MB 205 W

2、型号推荐

高性能计算 / 大数据/虚拟化
推荐:Xeon Gold 6230R / Xeon Platinum 8260 / Platinum 8280

通用服务器负载(Web/应用/RDB)
推荐:Xeon Gold 6138 / Gold 6248 / Gold 6252

3、散热器是否支持 150W

6132 TDP = 140W
6248 TDP = 150W

二、准备工具

- T30梅花内六角螺丝刀(长柄)
- 无水酒精
- 无尘布
- 硅脂(信越7921)
- 防静电措施

三、拆卸步骤(关键)

这种是 四螺丝压框结构,必须按顺序。

第一步:断电

- 关机
- 拔电源
- 按电源键 10 秒放电

第二步:拆散热器

NF5280M5 一般是四颗弹簧螺丝:

按对角线顺序慢慢松:

4 3 2 1

每颗螺丝分几次慢慢松,不要一次拧到底,防止主板受力不均

第三步:拆 CPU 扣具(LGA3647 重点)

- 使用卡片,按照缝隙,一点一点撬起来

四、安装新 CPU(6248)

1、把旧 CPU 从扣具取下

注意方向标识(三角标)

2、放回插槽

轻轻放入定位槽

五、涂硅脂 + 装散热器

- 中间一点或十字涂抹

3、安装CPU扣具和散热

(普通手感:拧紧到停止,不要暴力)

按顺序拧紧:1 2 3 4

六、开机验证

进 BIOS 看:

- CPU 型号
- 核心数
- 频率

七、NF5280M5 升级注意事项

1. 必须双路一致

如果是双路机器:

6132 + 6248 混插是不行的

必须两颗都换成 6248

2. 内存频率变化

6132 最高 2666
6248 支持 2933

如果内存支持 2933,会自动提升

3.Xeon Gold 6132 和 Xeon Gold 6248 参数对比

CPU 型号参数对比

项目 Xeon Gold 6132 Xeon Gold 6248
架构 Skylake Cascade Lake
制程工艺 14 nm 14 nm
核心数 14 核 20 核
线程数 28 线程 40 线程
基础频率 2.60 GHz 2.50 GHz
最大睿频 3.70 GHz 3.90 GHz
L3 缓存 19.25 MB 27.5 MB
TDP 140 W 150 W
内存支持 DDR4-2666 ECC(6 通道) DDR4-2933 ECC(6 通道)
插槽 LGA3647 LGA3647

浪潮 NF5280M4 服务器 CPU 升级 Xeon E5-2690 v4 实战与经验总结

一、背景说明

为搭建 KVM 虚拟化测试平台,充分盘活现有硬件资源,本次选取浪潮(Inspur)NF5280M4 服务器进行定向硬件升级,在不更换整机平台的前提下,将原配置的 Intel Xeon E5-2630 v4 处理器升级为 Intel Xeon E5-2690 v4。

二、CPU 型号参数对比

项目 Xeon E5-2630 v4 Xeon E5-2690 v4
架构 Broadwell-EP Broadwell-EP
制程工艺 14 nm 14 nm
核心数 10 核 14 核
线程数 20 线程 28 线程
基础频率 2.2 GHz 2.6 GHz
最大睿频 3.1 GHz 3.5 GHz
三级缓存 25 MB 35 MB
TDP 85 W 135 W
内存支持 DDR4-2400 DDR4-2400

三、准备工具与材料

防静电手环;
十字螺丝刀;
酒精棉片/无尘布;
导热硅脂。

四、服务器下电与拆机

正常关闭操作系统;
断开服务器电源线及所有外设连接;
按压静电释放按钮(如有),佩戴防静电手环;
打开服务器上盖。

五、CPU 拆卸顺序说明

拆卸顺序原则:
先拆 CPU2(副 CPU,通常靠近扩展槽一侧);
再拆 CPU1(主 CPU,通常靠近内存通道 0)。

原因:
避免在操作过程中对主 CPU 区域造成误触;
便于空间操作和散热器取放。

六、拆卸原有 CPU(E5-2630 v4));垂直取出旧 CPU,放入防静电盒中保存。

七、安装新 CPU(E5-2690 v4)对齐 CPU 金色三角标识与插槽标识;垂直放入 CPU,确认完全贴合;按顺序压下插槽压杆并锁定;清理散热器底部旧硅脂,重新均匀涂器拧紧螺丝。

双路服务器需 CPU1、CPU2 同步更换为同型号处理器,避免兼容及性能问题。

八、上电检查与验证

盖好服务器上盖,连接电源,登录操作系统;

思杰云桌面 Citrix XenDesktop 基于 Citrix Gateway 的定时登录访问控制配置

一、思杰云桌面整体架构说明

思杰云桌面(Citrix Virtual Apps and Desktops)采用分层、集中管控的架构设计,整体包括预控与安全接入层、控制层、资源层及基础设施层,为平台的高可用性、安全性及可扩展性提供系统性保障。

在预控与安全接入层,部署Citrix ADC(Citrix Gateway)作为统一访问入口,承担外部用户接入控制、流量调度与安全策略执行等关键职能。ADC 通过负载均衡、访问策略控制、身份认证集成(如 LDAP、双因子认证)、会话安全防护等能力,对用户访问行为进行集中管理,是云桌面对外服务的第一道安全防线。

在控制层,Delivery Controller(DDC) 负责虚拟桌面与应用的发布和会话调度;StoreFront(SF) 提供统一的用户访问入口;Director 用于系统运行监控、会话管理及故障分析。这些核心组件依赖后端数据库集群存储站点配置及运行数据,其稳定性直接关系到云桌面整体服务能力。

在资源与基础设施层,云桌面运行于统一的虚拟化平台(如 VMware、Hyper-V),并由集中存储系统提供桌面镜像、用户数据及配置文件的持久化支撑,确保性能稳定与数据安全。


二、限制登录时间的必要性

综合云桌面系统的架构特点及实际安全运行情况,通过 Citrix ADC(Citrix Gateway) 实施登录时间限制,非办公时段关闭登录入口,可显著降低被漏洞扫描、暴力破解和自动化攻击命中的概率,从源头减少安全风险。

1、登录Citrix ADC(Citrix Gateway)找到LDAP策略

一个脚本搞定Ubuntu\Rocky登录后自动展示 CPU/内存/多盘使用情况

登录Linux服务器后,自动展示平时关注的系统信息,一目了然。
1、效果展示

2、脚本内容
vi /etc/profile.d/sysinfo.sh

#!/usr/bin/env bash

# ================== Colors ==================
GREEN="\033[1;32m"
YELLOW="\033[1;33m"
CYAN="\033[1;36m"
RESET="\033[0m"

# ================== Separator ==================
LINE_LEN=60
LINE=$(printf '%*s' "$LINE_LEN" '' | tr ' ' '-')

# ================== OS & Kernel ==================
if [ -f /etc/os-release ]; then
    OS_NAME=$(awk -F= '/^PRETTY_NAME/ {gsub(/"/,""); print $2}' /etc/os-release)
else
    OS_NAME=$(uname -s)
fi

KERNEL_VER=$(uname -r)

# ================== Basic Info ==================
HOSTNAME=$(hostname)

UPTIME=$(uptime -p 2>/dev/null | sed 's/^up //')
[ -z "$UPTIME" ] && UPTIME=$(uptime | awk -F',' '{print $1}')

LOADAVG=$(awk '{print $1", "$2", "$3}' /proc/loadavg)

# ================== Memory ==================
read MEM_TOTAL MEM_USED <<<$(free -m | awk '/^Mem:/ {print $2, $3}')
MEM_PCT=$(( MEM_USED * 100 / MEM_TOTAL ))

# ================== IP Address ==================
IP_ADDR=$(hostname -I 2>/dev/null | awk '{print $1}')
IP_ADDR=${IP_ADDR:-"N/A"}

# ================== CPU Usage (top, fast) ==================
CPU_IDLE=$(top -bn1 2>/dev/null \
  | grep -E "Cpu\(s\)|%Cpu\(s\)" \
  | sed 's/,/\n/g' \
  | awk '/id/ {print $1}')

CPU_USAGE=$(awk "BEGIN {printf \"%.0f\", 100 - $CPU_IDLE}")

# ================== Output ==================
echo -e "\n${GREEN}Welcome, system information overview (系统信息概览)${RESET}"
echo -e "${YELLOW}${LINE}${RESET}"

printf "| %-16s | %-38s |\n" "Resource (资源)" "Status (使用情况)"
printf "|------------------|----------------------------------------|\n"
printf "| %-16s | %-38s |\n" "Hostname" "$HOSTNAME"
printf "| %-16s | %-38s |\n" "OS Version" "$OS_NAME"
printf "| %-16s | %-38s |\n" "Kernel Version" "$KERNEL_VER"
printf "| %-16s | %-38s |\n" "IP Address" "$IP_ADDR"
printf "| %-16s | %-38s |\n" "CPU Usage" "${CPU_USAGE}%"
printf "| %-16s | %-38s |\n" "Memory" "${MEM_USED}MB / ${MEM_TOTAL}MB (${MEM_PCT}%)"
printf "| %-16s | %-38s |\n" "Load Average" "$LOADAVG"
printf "| %-16s | %-38s |\n" "Uptime" "$UPTIME"

echo -e "${YELLOW}${LINE}${RESET}"
echo -e "${CYAN}Disk Usage (磁盘使用情况)${RESET}"
echo -e "${YELLOW}${LINE}${RESET}"

printf "| %-20s | %-10s | %-10s | %-6s |\n" \
       "Mount Point" "Used" "Total" "Use%"
printf "|----------------------|------------|------------|--------|\n"

df -h -x tmpfs -x devtmpfs | awk 'NR>1 {
    printf "| %-20s | %-10s | %-10s | %-6s |\n", $6, $3, $2, $5
}'

echo -e "${YELLOW}${LINE}${RESET}"
echo -e "${GREEN}Operate carefully. (谨慎操作)${RESET}\n"

编译安装 Nginx 1.20.2,并增加 Sticky 和 nginx_upstream_check_module 模块

nginx_upstream_check_module 模块的作用主要是为 Nginx 的 upstream(上游服务器) 提供主动健康检查 和一些负载均衡增强功能。

1、具体来说,它的功能和作用如下:
主动健康检查(Active Health Check)
默认 Nginx 的 upstream 只能被动发现后端宕机(通过连接失败或响应超时)。
安装了 nginx_upstream_check_module 后,可以:
定期向后端服务器发送 HTTP 请求或 TCP 探针。
判断后端服务器是否可用。
动将不可用的服务器标记为 down,从负载均衡池中剔除。

2、支持多种类型:
type=http → 用于 HTTP 服务。
type=https → 用于 HTTPS 服务。
type=tcp → 用于 TCP 服务(非 HTTP)。

interval:检查间隔,例如 `interval=5000` 表示每 5 秒检查一次。
rise:连续成功多少次后,判定节点恢复正常。
fall:连续失败多少次后,判定节点不可用。
timeout:检查请求超时时间(毫秒)。

3、可自定义 HTTP 请求报文。
判断响应状态码是否正常,例如 2xx 或 3xx。

4、设置可用更新源

wget -O /etc/yum.repos.d/epel.repo https://mirrors.aliyun.com/repo/epel-7.repo
wget -4 --no-check-certificate -O /etc/yum.repos.d/CentOS-Base.repo https://www.zhangfangzhou.cn/third/Centos-7.repo

5、下载源码

### Nginx 源码
wget https://nginx.org/download/nginx-1.20.2.tar.gz
tar -zxvf nginx-1.20.2.tar.gz

### zlib(可选,用于数据压缩)
wget http://www.zlib.net/zlib-1.2.13.tar.gz
tar -zxf zlib-1.2.13.tar.gz

### PCRE(支持正则表达式匹配)
wget --no-check-certificate https://ftp.pcre.org/pub/pcre/pcre-8.45.tar.gz
tar -zxf pcre-8.45.tar.gz

### OpenSSL(支持 HTTPS)
wget https://www.openssl.org/source/openssl-1.1.1t.tar.gz
tar -zxf openssl-1.1.1t.tar.gz
mv openssl-1.1.1t openssl

### Sticky 模块
wget https://bitbucket.org/nginx-goodies/nginx-sticky-module-ng/get/08a395c66e42.zip -O sticky.zip
unzip sticky.zip
mv nginx-goodies-nginx-sticky* nginx-sticky

### nginx_upstream_check_module
wget https://github.com/yaoweibin/nginx_upstream_check_module/archive/master.zip -O nginx_upstream_check_module-master.zip
unzip nginx_upstream_check_module-master.zip

6、编译安装

yum -y install libtool  
#zlib 是提供数据压缩之用的库 (非必要编译安装)
cd ~
cd zlib-1.2.13
./configure --prefix=/usr/local/zlib
make && make install
echo "/usr/local/zlib/lib" > /etc/ld.so.conf.d/zlib.conf
ldconfig

#pcre PCRE库是一组函数,它们使用与Perl 5相同的语法和语义实现正则表达式模式匹配.(非必要编译安装)
cd ~
sudo yum install gcc-c++ -y
tar -zxf pcre-8.45.tar.gz
cd pcre-8.45
./configure --prefix=/usr/local/pcre --enable-utf8
make && make install
~/pcre-8.45/libtool --finish  /usr/local/pcre/lib/
echo "/usr/local/pcre/lib/" > /etc/ld.so.conf.d/pcre.conf
ldconfig

## 安装 Patch 工具(为 nginx_upstream_check_module 打补丁)
yum install -y patch

ngstable=1.20.2
#Install Nginx
cd ~
yum -y install gzip man
tar -zxf nginx-${ngstable}.tar.gz
#
#Custom nginx name
sed -i 's@^#define NGINX_VER          "nginx/" NGINX_VERSION@#define NGINX_VER          "Microsoft-IIS/10.0/" NGINX_VERSION@g'  ~/nginx-${ngstable}/src/core/nginx.h
sed -i 's@^#define NGINX_VAR          "NGINX"@#define NGINX_VAR          "Microsoft-IIS"@g'  ~/nginx-${ngstable}/src/core/nginx.h
sed -i '30,40s@nginx@Microsoft-IIS@g'  ~/nginx-${ngstable}/src/http/ngx_http_special_response.c
sed -i '45,50s@nginx@Microsoft-IIS@g' ~/nginx-${ngstable}/src/http/ngx_http_header_filter_module.c
#
#Nginx shows the file name length of a static directory file
sed -i 's/^#define NGX_HTTP_AUTOINDEX_PREALLOCATE  50/#define NGX_HTTP_AUTOINDEX_PREALLOCATE  150/'  ~/nginx-${ngstable}/src/http/modules/ngx_http_autoindex_module.c
sed -i 's/^#define NGX_HTTP_AUTOINDEX_NAME_LEN     50/#define NGX_HTTP_AUTOINDEX_NAME_LEN     150/'  ~/nginx-${ngstable}/src/http/modules/ngx_http_autoindex_module.c
#
yum install -y patch
cd ~/nginx-${ngstable}
patch -p1 < ../nginx_upstream_check_module-master/check_1.20.1+.patch
# patch -p1 < ../nginx_upstream_check_module-master/check_1.20.1+.patch
patching file src/http/modules/ngx_http_upstream_hash_module.c
patching file src/http/modules/ngx_http_upstream_ip_hash_module.c
patching file src/http/modules/ngx_http_upstream_least_conn_module.c
patching file src/http/ngx_http_upstream_round_robin.c
patching file src/http/ngx_http_upstream_round_robin.h

#Copy NGINX manual page to /usr/share/man/man8:
cp -f ~/nginx-${ngstable}/man/nginx.8 /usr/share/man/man8
gzip /usr/share/man/man8/nginx.8

cd ~/nginx-${ngstable}
./configure --prefix=/usr/local/nginx --user=www --group=www \
--build=CentOS \
--modules-path=/usr/local/nginx/modules \
--with-openssl=/root/openssl \
--with-pcre=/root/pcre-8.45 \
--with-zlib=/root/zlib-1.2.13 \
--add-module=/root/nginx-sticky \
--add-module=/root/nginx_upstream_check_module-master \
--with-http_stub_status_module \
--with-http_secure_link_module \
--with-threads \
--with-file-aio \
--with-http_v2_module \
--with-http_ssl_module \
--with-http_gzip_static_module \
--with-http_gunzip_module \
--with-http_realip_module \
--with-http_flv_module \
--with-http_mp4_module \
--with-http_sub_module \
--with-http_dav_module \
--with-stream \
--with-stream=dynamic \
--with-stream_ssl_module \
--with-stream_realip_module \
--with-stream_ssl_preread_module
make -j$(nproc)
make install

7、平滑升级nginx

mv /usr/local/nginx/sbin/nginx /usr/local/nginx/sbin/nginx.old

#然后拷贝一份新编译的二进制文件:
cp ~/nginx-${ngstable}/objs/nginx /usr/local/nginx/sbin/

#检测配置
nginx -t
kill -USR2 `cat /var/run/nginx.pid`
kill -HUP `cat /var/run/nginx.pid`

8、修改配置

upstream oah {
sticky; # 会话保持
server 10.53.121.51:8080;
server 10.53.121.52:8080;
server 10.53.121.53:8080;
server 10.53.121.66:8080;
server 10.53.121.67:8080;
# TCP 健康检查,只检测端口是否可用
check interval=5000 rise=2 fall=3 timeout=3000 type=tcp;
}

9、添加status页面
location /status {
check_status;
}

如何在Ubuntu 22.04上安装GeForce RTX 4090和GeForce RTX 2080TI驱动程序和CUDA

如何在Ubuntu 22.04上安装GeForce RTX 4090驱动程序和CUDA
How to Install GeForce RTX 4090 Drivers and CUDA on Ubuntu 22.04

如何在Ubuntu 22.04上安装GeForce RTX 2080TI驱动程序和CUD
How to Install GeForce RTX 2080 TI Drivers and CUDA on Ubuntu 22.04

1. 打开终端并更新
使用sudo更新apt软件包列表并使用sudo升级apt软件包

sudo apt update && sudo apt upgrade -y  
sudo apt install build-essential dkms linux-headers-$(uname -r) -y

2. 确定可用驱动
使用sudo命令列出ubuntu驱动程序列表

sudo ubuntu-drivers list
......
nvidia-driver-565, (kernel modules provided by nvidia-dkms-565)
nvidia-driver-535, (kernel modules provided by linux-modules-nvidia-535-generic)
nvidia-driver-575, (kernel modules provided by nvidia-dkms-575)
nvidia-driver-555-open, (kernel modules provided by nvidia-dkms-555-open)
nvidia-driver-575-open, (kernel modules provided by nvidia-dkms-575-open)
nvidia-driver-570-server, (kernel modules provided by linux-modules-nvidia-570-server-generic)
nvidia-driver-555, (kernel modules provided by nvidia-dkms-555)
nvidia-driver-535-server-open, (kernel modules provided by linux-modules-nvidia-535-server-open-generic)
nvidia-driver-560, (kernel modules provided by nvidia-dkms-560)
nvidia-driver-570-server-open, (kernel modules provided by linux-modules-nvidia-570-server-open-generic)
nvidia-driver-570-open, (kernel modules provided by linux-modules-nvidia-570-open-generic)
nvidia-driver-535-server, (kernel modules provided by linux-modules-nvidia-535-server-generic)
nvidia-driver-560-open, (kernel modules provided by nvidia-dkms-560-open)
nvidia-driver-565-open, (kernel modules provided by nvidia-dkms-565-open)
nvidia-driver-550, (kernel modules provided by linux-modules-nvidia-550-generic)
nvidia-driver-550-open, (kernel modules provided by linux-modules-nvidia-550-open-generic)
nvidia-driver-570, (kernel modules provided by linux-modules-nvidia-570-generic)
nvidia-driver-535-open, (kernel modules provided by linux-modules-nvidia-535-open-generic)
open-vm-tools-desktop

3.禁用 Nouveau 驱动:创建配置文件并更新,重启系统

sudo cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nvidia.conf
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist rivafb
EOF
sudo update-initramfs -u
sudo reboot

4.安装驱动

sudo apt install nvidia-driver-570 -y    #sudo apt install nvidia-driver-580 -y

5.验证GeForce RTX 4090驱动是否正常加载

root@lenovo-ThinkStation-PX:~# nvidia-smi 
Fri Jul 18 17:13:35 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.169                Driver Version: 570.169        CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:2A:00.0 Off |                  Off |
| 32%   42C    P8             34W /  450W |      87MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:3D:00.0 Off |                  Off |
| 30%   34C    P8             16W /  450W |      15MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        Off |   00000000:BD:00.0 Off |                  Off |
| 32%   32C    P8              6W /  450W |      15MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 4090        Off |   00000000:E1:00.0 Off |                  Off |
| 32%   33C    P8             15W /  450W |      15MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2453      G   /usr/lib/xorg/Xorg                       46MiB |
|    0   N/A  N/A            2564      G   /usr/bin/gnome-shell                     13MiB |
|    1   N/A  N/A            2453      G   /usr/lib/xorg/Xorg                        4MiB |
|    2   N/A  N/A            2453      G   /usr/lib/xorg/Xorg                        4MiB |
|    3   N/A  N/A            2453      G   /usr/lib/xorg/Xorg                        4MiB |
+-----------------------------------------------------------------------------------------+

验证GeForce RTX 2080 TI驱动是否正常加载

root@su:~# nvidia-smi 
Sat Nov  1 12:04:44 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:00:0B.0 Off |                  N/A |
| 16%   35C    P0             86W /  250W |       1MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

6.安装 CUDA

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8

用 nvcc --version 确认cuda的版本,如果显示Command nvcc not found,则编辑~/.bashrc

vim ~/.bashrc
export PATH=/usr/local/cuda-12.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

#更新变量
source ~/.bashrc

root@su:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

7.锁定驱动版本防止升级冲突

sudo apt-mark hold nvidia-driver-570
sudo apt-mark hold cuda-toolkit-12-8

GeForce RTX 4090

GeForce RTX 2080TI

如何在UUbuntu 24.04 安装 NVIDIA L40 显卡驱动与 CUDA(12.8 / 13.1)

To install the NVIDIA L40 driver and cuda on Ubuntu 24.04
适用于 NVIDIA L40 / L40S 数据中心显卡,在 Ubuntu 24.04 LTS 系统下部署深度学习、AI 推理、CUDA 计算环境。

一、环境说明
操作系统:Ubuntu 24.04 LTS(x86_64 架构)
GPU 型号:NVIDIA L40(数据中心级显卡)
NVIDIA 驱动
nvidia-driver-570-server(推荐,生产环境稳定)
nvidia-driver-590-server(可选,用于支持 CUDA 13.1)
CUDA Toolkit
CUDA 12.8(长期稳定版本,推荐)
CUDA 13.1(新特性版本,测试/验证使用)
BIOS 启动模式:UEFI (不要开启EFI安全引导,Secure Boot 会阻止未签名或非官方签名的内核模块加载,而 NVIDIA 驱动、VFIO 及 DKMS 模块往往不满足 Secure Boot 的签名要求)
Above 4G Decoding:已开启(GPU 透传 / 大显存 BAR 必需)

二、打开终端并更新
使用sudo更新apt软件包列表并使用sudo升级apt软件包

sudo apt update && sudo apt upgrade -y

三、查看 Ubuntu 可用的 NVIDIA 驱动版本
使用sudo命令列出ubuntu驱动程序列表

sudo ubuntu-drivers list
......
nvidia-driver-565, (kernel modules provided by nvidia-dkms-565)
nvidia-driver-535, (kernel modules provided by linux-modules-nvidia-535-generic)
nvidia-driver-575, (kernel modules provided by nvidia-dkms-575)
nvidia-driver-555-open, (kernel modules provided by nvidia-dkms-555-open)
nvidia-driver-575-open, (kernel modules provided by nvidia-dkms-575-open)
nvidia-driver-570-server, (kernel modules provided by linux-modules-nvidia-570-server-generic)
nvidia-driver-590-server, (kernel modules provided by linux-modules-nvidia-590-server-generic)
nvidia-driver-555, (kernel modules provided by nvidia-dkms-555)
nvidia-driver-535-server-open, (kernel modules provided by linux-modules-nvidia-535-server-open-generic)
nvidia-driver-560, (kernel modules provided by nvidia-dkms-560)
nvidia-driver-570-server-open, (kernel modules provided by linux-modules-nvidia-570-server-open-generic)
nvidia-driver-570-open, (kernel modules provided by linux-modules-nvidia-570-open-generic)
nvidia-driver-535-server, (kernel modules provided by linux-modules-nvidia-535-server-generic)
nvidia-driver-560-open, (kernel modules provided by nvidia-dkms-560-open)
nvidia-driver-565-open, (kernel modules provided by nvidia-dkms-565-open)
nvidia-driver-550, (kernel modules provided by linux-modules-nvidia-550-generic)
nvidia-driver-550-open, (kernel modules provided by linux-modules-nvidia-550-open-generic)
nvidia-driver-570, (kernel modules provided by linux-modules-nvidia-570-generic)
nvidia-driver-535-open, (kernel modules provided by linux-modules-nvidia-535-open-generic)
open-vm-tools-desktop

四、安装驱动
L40 / A100 / H100 等数据中心卡,Server 驱动比普通驱动更适合 L40 / 多卡服务器 / 无显示环境

sudo apt install nvidia-driver-570-server    #数据中心卡(如L40)专用驱动,稳定性更好,支持 MIG、多用户多实例等特性

如果需要新版可以安装
sudo apt install nvidia-driver-590-server

五、重启系统

sudo reboot

六、验证驱动是否正常加载

root@su:~# nvidia-smi
Thu Jun 12 06:14:12 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40                     Off |   00000000:13:00.0 Off |                    0 |
| N/A   29C    P0             79W /  300W |       0MiB /  46068MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

7、安装 CUDA
安装CUDA12.8

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8

用 nvcc --version 确认cuda的版本,如果显示Command nvcc not found,则编辑~/.bashrc,配置 CUDA 环境变量(解决 nvcc not found)

vim ~/.bashrc
export PATH=/usr/local/cuda-12.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

#更新变量
source ~/.bashrc

root@su:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

安装CUDA 13.1

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-ubuntu2404.pin
sudo mv cuda-ubuntu2404.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/13.1.1/local_installers/cuda-repo-ubuntu2404-13-1-local_13.1.1-590.48.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2404-13-1-local_13.1.1-590.48.01-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2404-13-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-1


用 nvcc --version 确认cuda的版本,如果显示Command nvcc not found,则编辑~/.bashrc,配置 CUDA 环境变量(解决 nvcc not found)

vim ~/.bashrc
export PATH=/usr/local/cuda-13.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-13.1/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

#更新变量
source ~/.bashrc

root@su:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

八、锁定驱动版本防止升级冲突

sudo apt-mark hold nvidia-driver-570-server
sudo apt-mark hold cuda-toolkit-12-8



Q1:CUDA 版本一定要和 nvidia-smi 显示一致吗?
不需要完全一致,只要 驱动版本 ≥ CUDA 需求版本 即可。

Q2:可以同时安装多个 CUDA 版本吗?
可以,需手动切换 /usr/local/cuda 软链接或 PATH。

基于VMware 组织唯一标识符(OUI)和vCenter Server ID的虚拟机MAC地址规划

VMware vCenter Server MAC 地址分配与冲突避免

1. vCenter Server MAC 地址分配机制

在企业级 VMware 虚拟化环境中,多个 vCenter Server 可能共存。默认情况下,vCenter Server 使用 VMware 组织唯一标识符 (OUI) 和 vCenter Server ID 生成虚拟机 MAC 地址。

MAC 地址格式

VMware OUI:00:50:56
完整格式:00:50:56:XX:YY:ZZ

  • XX 计算方式80 + vCenter Server ID(十六进制表示)

  • YY、ZZ:随机分配的十六进制值

地址分配范围

  • 适用于 0-63 之间的 vCenter Server 实例 ID

  • 每个 vCenter Server 最多可分配 64,000 个唯一 MAC 地址

  • 地址范围:

    • 00:50:56:80:YY:ZZ(vCenter ID = 0)

    • 00:50:56:81:YY:ZZ(vCenter ID = 1)

    • ...

    • 00:50:56:BF:YY:ZZ(vCenter ID = 63)

分配方式 vCenter Server

ID

MAC地址

范围

OUI 0 00:50:56:80:YY:ZZ
OUI 1 00:50:56:81:YY:ZZ
OUI 2 00:50:56:82:YY:ZZ
OUI 3 00:50:56:83:YY:ZZ
OUI 4 00:50:56:84:YY:ZZ
OUI 5 00:50:56:85:YY:ZZ
OUI 6 00:50:56:86:YY:ZZ
OUI 7 00:50:56:87:YY:ZZ
OUI 8 00:50:56:88:YY:ZZ
OUI 9 00:50:56:89:YY:ZZ
OUI 10 00:50:56:8A:YY:ZZ
OUI 11 00:50:56:8B:YY:ZZ
OUI 12 00:50:56:8C:YY:ZZ
OUI 13 00:50:56:8D:YY:ZZ
OUI 14 00:50:56:8E:YY:ZZ
OUI 15 00:50:56:8F:YY:ZZ
OUI 16 00:50:56:90:YY:ZZ
OUI 17 00:50:56:91:YY:ZZ
OUI 18 00:50:56:92:YY:ZZ
OUI 19 00:50:56:93:YY:ZZ
OUI 20 00:50:56:94:YY:ZZ
OUI 21 00:50:56:95:YY:ZZ
OUI 22 00:50:56:96:YY:ZZ
OUI 23 00:50:56:97:YY:ZZ
OUI 24 00:50:56:98:YY:ZZ
OUI 25 00:50:56:99:YY:ZZ
OUI 26 00:50:56:9A:YY:ZZ
OUI 27 00:50:56:9B:YY:ZZ
OUI 28 00:50:56:9C:YY:ZZ
OUI 29 00:50:56:9D:YY:ZZ
OUI 30 00:50:56:9E:YY:ZZ
OUI 31 00:50:56:9F:YY:ZZ
OUI 32 00:50:56:A0:YY:ZZ
OUI 33 00:50:56:A1:YY:ZZ
OUI 34 00:50:56:A2:YY:ZZ
OUI 35 00:50:56:A3:YY:ZZ
OUI 36 00:50:56:A4:YY:ZZ
OUI 37 00:50:56:A5:YY:ZZ
OUI 38 00:50:56:A6:YY:ZZ
OUI 39 00:50:56:A7:YY:ZZ
OUI 40 00:50:56:A8:YY:ZZ
OUI 41 00:50:56:A9:YY:ZZ
OUI 42 00:50:56:AA:YY:ZZ
OUI 43 00:50:56:AB:YY:ZZ
OUI 44 00:50:56:AC:YY:ZZ
OUI 45 00:50:56:AD:YY:ZZ
OUI 46 00:50:56:AE:YY:ZZ
OUI 47 00:50:56:AF:YY:ZZ
OUI 48 00:50:56:B0:YY:ZZ
OUI 49 00:50:56:B1:YY:ZZ
OUI 50 00:50:56:B2:YY:ZZ
OUI 51 00:50:56:B3:YY:ZZ
OUI 52 00:50:56:B4:YY:ZZ
OUI 53 00:50:56:B5:YY:ZZ
OUI 54 00:50:56:B6:YY:ZZ
OUI 55 00:50:56:B7:YY:ZZ
OUI 56 00:50:56:B8:YY:ZZ
OUI 57 00:50:56:B9:YY:ZZ
OUI 58 00:50:56:BA:YY:ZZ
OUI 59 00:50:56:BB:YY:ZZ
OUI 60 00:50:56:BC:YY:ZZ
OUI 61 00:50:56:BD:YY:ZZ
OUI 62 00:50:56:BE:YY:ZZ
OUI 63 00:50:56:BF:YY:ZZ

潜在冲突风险

如果两个 vCenter Server 具有相同的实例 ID,可能会生成相同的 MAC 地址,导致:

  • 网络冲突(数据包丢失、通信异常)

  • IP 绑定错误(DHCP 服务器误分配 IP)


2. 修改 vCenter Server 实例 ID 以避免冲突

修改步骤(vSphere Client)

  1. 导航vCenter Server > 配置

  2. 选择 常规 > 编辑

  3. 调整 运行时设置

    • vCenter Server 唯一 ID 中输入 0-63 范围内的唯一值

    • 设置 vCenter Server 受管地址(IPv4/IPv6/FQDN)

    • 确保 vCenter Server 名称 正确

  4. 保存更改并重启 vCenter Server

重要提示

  • 修改实例 ID 不会影响已分配的 MAC 地址

  • 若需重新分配 MAC 地址,可将虚拟机网卡设置为手动 MAC,然后改回自动分配

www.zhangfangzhou.cn