Getting Started with PyTorch, Part 1

This article is adapted, with edits, from the official PyTorch tutorial on Microsoft Learn.

```python
import torch
```

```python
!python --version
```

Python 3.10.4

Data

Import the packages, then download and load the dataset

```python
%matplotlib inline
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
import matplotlib.pyplot as plt

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)
```
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data\FashionMNIST\raw\train-images-idx3-ubyte.gz
Extracting data\FashionMNIST\raw\train-images-idx3-ubyte.gz to data\FashionMNIST\raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data\FashionMNIST\raw\train-labels-idx1-ubyte.gz
Extracting data\FashionMNIST\raw\train-labels-idx1-ubyte.gz to data\FashionMNIST\raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz
Extracting data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz to data\FashionMNIST\raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz
Extracting data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz to data\FashionMNIST\raw

Define the label map labels_map and display a few samples

```python
labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()
```

(figure: a 3×3 grid of random Fashion-MNIST samples with their labels)

```python
len(training_data)
```

60000

Load the data with DataLoader

```python
from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
```
```python
# Display image and label.
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.show()
print(f"Label: {label}")
```
Feature batch shape: torch.Size([64, 1, 28, 28])
Labels batch shape: torch.Size([64])

(figure: a single grayscale Fashion-MNIST sample)

Label: 9

Image preprocessing

  • torchvision.transforms.ToTensor converts a numpy ndarray or a PIL.Image into a Tensor of shape (C, H, W) and divides by 255, normalizing values into [0, 1.0].
    Note that when converting a numpy array with ToTensor(), the [0, 255] → [0, 1.0] scaling is only applied when the array's elements are of uint8 type; otherwise the values are left as-is.
    Also, transforms.Normalize must come after ToTensor.

    After skimage.transform.resize, element values are likewise normalized to [0, 1].

  • Use Lambda() to define a custom transform.

```python
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    # one-hot encode the integer label into a 10-dimensional float tensor
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)
```

Model

```python
%matplotlib inline
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
```

Check the hardware

```python
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))
```
Using cuda device

Define the model class

```python
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU(),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
```

Move the model to the GPU and inspect its structure

Unlike TensorFlow, PyTorch uses

```python
print(model)
```

rather than

```python
model.summary()
```

```python
model = NeuralNetwork().to(device)
print(model)
```
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)

Using the model

To use the model, we pass it the input data. This executes the model's forward method, along with some background operations. However, do not call model.forward() directly! Calling the model on the input returns a 10-dimensional tensor with raw predicted values for each class.

We get the prediction probabilities by passing it through an instance of nn.Softmax.

```python
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")
```
Predicted class: tensor([9], device='cuda:0')

Exploring the model

Weights and biases

The nn.Linear module randomly initializes the weight and bias for each layer and stores them in internal tensors.

```python
print(f"First Linear weights: {model.linear_relu_stack[0].weight} \n")

print(f"First Linear bias: {model.linear_relu_stack[0].bias} \n")
```
First Linear weights: Parameter containing:
tensor([[ 0.0320,  0.0144,  0.0278,  ...,  0.0041,  0.0231, -0.0273],
        [ 0.0017,  0.0234, -0.0171,  ...,  0.0075, -0.0174, -0.0120],
        [-0.0065,  0.0199, -0.0121,  ...,  0.0015,  0.0163,  0.0158],
        ...,
        [-0.0306,  0.0066, -0.0202,  ..., -0.0013,  0.0297, -0.0129],
        [ 0.0177,  0.0163, -0.0178,  ...,  0.0225,  0.0012,  0.0096],
        [ 0.0216, -0.0191,  0.0158,  ..., -0.0197, -0.0265, -0.0107]],
       device='cuda:0', requires_grad=True) 

First Linear bias: Parameter containing:
tensor([-2.6006e-02,  1.1282e-02, -1.9284e-02, -2.9585e-02, -2.5711e-02,
        -3.3497e-02, -1.1521e-02, -2.1348e-03,  1.7615e-02,  1.3028e-02,
        ...
        -2.5526e-02, -3.0481e-02], device='cuda:0', requires_grad=True) 

Layer by layer

Let's trace what happens when 3 images of size 28×28 pass through the model

```python
# simulated input images
input_image = torch.rand(3,28,28)
print(input_image.size())
```
torch.Size([3, 28, 28])

nn.Flatten

We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension at dim=0 is maintained). Each of these pixels is passed to the input layer of the neural network.

```python
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
```
torch.Size([3, 784])

nn.Linear

The linear layer applies a linear transformation to its input using its stored weights and bias:
$$\text{hidden} = x W^T + b$$

```python
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
```
torch.Size([3, 20])

nn.ReLU

Linear output: $x = \text{weight} \times \text{input} + \text{bias}$.
ReLU:
$$f(x)=
\begin{cases}
0, & \text{if } x < 0 \\
x, & \text{if } x \geq 0
\end{cases}$$

```python
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
```
Before ReLU: tensor([[ 0.1734, -0.2238,  0.0592, -0.2199, -0.2259, -0.1257,  0.0083, -0.1266,
         -0.1586,  0.9050, -0.1587, -0.3437,  0.2649, -0.6128, -0.3549,  0.1719,
          0.5616, -0.5121,  0.1277, -0.4033],
        [ 0.0958, -0.0541,  0.1040, -0.0807, -0.0630, -0.3321,  0.2598, -0.0122,
         -0.0934,  0.3702, -0.0027,  0.0125, -0.1347, -0.7110, -0.4810, -0.1182,
          0.4265, -0.4620,  0.2368, -0.3566],
        [-0.1361, -0.0456,  0.0091, -0.1966, -0.1325, -0.2158,  0.1892,  0.0219,
         -0.4065,  0.5589, -0.0490, -0.4669,  0.0042, -0.3410, -0.4415, -0.0838,
          0.6105, -0.5981,  0.3086,  0.0171]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.1734, 0.0000, 0.0592, 0.0000, 0.0000, 0.0000, 0.0083, 0.0000, 0.0000,
         0.9050, 0.0000, 0.0000, 0.2649, 0.0000, 0.0000, 0.1719, 0.5616, 0.0000,
         0.1277, 0.0000],
        [0.0958, 0.0000, 0.1040, 0.0000, 0.0000, 0.0000, 0.2598, 0.0000, 0.0000,
         0.3702, 0.0000, 0.0125, 0.0000, 0.0000, 0.0000, 0.0000, 0.4265, 0.0000,
         0.2368, 0.0000],
        [0.0000, 0.0000, 0.0091, 0.0000, 0.0000, 0.0000, 0.1892, 0.0219, 0.0000,
         0.5589, 0.0000, 0.0000, 0.0042, 0.0000, 0.0000, 0.0000, 0.6105, 0.0000,
         0.3086, 0.0171]], grad_fn=<ReluBackward0>)

nn.Sequential

nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like seq_modules.

```python
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
```

nn.Softmax

The nn.Softmax module converts the logits into a probability distribution over the network's output classes.

```python
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
```
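Each row of the result is a probability distribution that sums to 1; a small self-contained check:

```python
import torch
from torch import nn

logits = torch.randn(3, 10)              # a batch of 3 raw score vectors
pred_probab = nn.Softmax(dim=1)(logits)  # normalize each row over the classes

# every row now sums to 1 and all entries lie in [0, 1]
print(pred_probab.sum(dim=1))
```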

Inspecting the model's parameters

```python
print("Model structure: ", model, "\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
```
Model structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
) 


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0320,  0.0144,  0.0278,  ...,  0.0041,  0.0231, -0.0273],
        [ 0.0017,  0.0234, -0.0171,  ...,  0.0075, -0.0174, -0.0120]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0260,  0.0113], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0056, -0.0027,  0.0008,  ..., -0.0214, -0.0375,  0.0119],
        [-0.0288, -0.0196,  0.0303,  ...,  0.0117, -0.0301, -0.0124]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([0.0163, 0.0376], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[ 4.3638e-02,  1.2320e-02, -4.0249e-02,  ...,  3.4624e-02,
         -3.5368e-05,  1.0406e-02],
        [-2.1222e-02, -1.9338e-03, -2.5966e-02,  ..., -8.9450e-04,
          2.5315e-02,  9.4413e-03]], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([ 0.0015, -0.0346], device='cuda:0', grad_fn=<SliceBackward0>) 

Automatic differentiation

```python
%matplotlib inline
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
```

Note: You can set the value of requires_grad when creating a tensor, or later by using x.requires_grad_(True) method.
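A minimal sketch of the second option, switching gradient tracking on after a tensor is created:

```python
import torch

# created without gradient tracking
t = torch.ones(3)
print(t.requires_grad)   # False

# enable tracking in place
t.requires_grad_(True)
print(t.requires_grad)   # True
```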

```python
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)
```
Gradient function for z = <AddBackward0 object at 0x0000019149F7A620>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x0000019149F7AFE0>

Computing gradients

To compute $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$, we first call loss.backward() and then read w.grad and b.grad:

```python
loss.backward()
print(w.grad)
print(b.grad)
```
tensor([[0.1775, 0.2345, 0.2753],
        [0.1775, 0.2345, 0.2753],
        [0.1775, 0.2345, 0.2753],
        [0.1775, 0.2345, 0.2753],
        [0.1775, 0.2345, 0.2753]])
tensor([0.1775, 0.2345, 0.2753])

Pausing gradient tracking with torch.no_grad()

```python
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)
```
True
False

The detach() method has the same effect. Reasons you might want to disable gradient tracking:

  • To mark some parameters in your neural network as frozen parameters. This is
    a very common scenario for fine-tuning a pre-trained network.
  • To speed up computations when you only do the forward pass, because computations on tensors that do
    not track gradients are more efficient.
```python
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)
```
False
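As a sketch of the first use case, a common fine-tuning pattern is to freeze every parameter and then swap in a fresh output layer (the small architecture below is a made-up stand-in, not the tutorial's model):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))

# freeze all existing parameters
for param in model.parameters():
    param.requires_grad_(False)

# replace the head; the new layer's parameters track gradients again
model[2] = nn.Linear(512, 10)

trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))  # 2 (the new head's weight and bias)
```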

Setting hyperparameters

```python
learning_rate = 1e-3
batch_size = 64
epochs = 5
```

Loss function

Common loss functions include:

  • nn.MSELoss (Mean Square Error) used for regression tasks
  • nn.NLLLoss (Negative Log Likelihood) used for classification
  • nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss
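The last bullet can be checked directly; a small self-contained sketch comparing the two formulations on random logits:

```python
import torch
from torch import nn

logits = torch.randn(4, 10)           # batch of 4 samples, 10 classes
targets = torch.tensor([0, 3, 7, 9])  # ground-truth class indices

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

print(torch.allclose(ce, nll))  # True
```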

We pass our model’s output logits to nn.CrossEntropyLoss, which will normalize the logits and compute the prediction error.

```python
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
```

Optimization pass

```python
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
```

The full pipeline

```python
%matplotlib inline
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()
```
```python
learning_rate = 1e-3
batch_size = 64
epochs = 5
```
```python
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    # note: the summed batch losses are divided by the dataset size, so the
    # "Avg loss" reported below is a per-sample figure
    test_loss /= size
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
```
```python
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")
```
Epoch 1
-------------------------------
loss: 2.305028  [    0/60000]
loss: 2.295792  [ 6400/60000]
loss: 2.285109  [12800/60000]
loss: 2.275857  [19200/60000]
loss: 2.271758  [25600/60000]
loss: 2.267347  [32000/60000]
loss: 2.255360  [38400/60000]
loss: 2.243975  [44800/60000]
loss: 2.241143  [51200/60000]
loss: 2.210880  [57600/60000]
Test Error: 
 Accuracy: 48.5%, Avg loss: 0.034781 

Epoch 2
-------------------------------
loss: 2.240807  [    0/60000]
loss: 2.216887  [ 6400/60000]
loss: 2.176663  [12800/60000]
loss: 2.162243  [19200/60000]
loss: 2.164471  [25600/60000]
loss: 2.142355  [32000/60000]
loss: 2.152825  [38400/60000]
loss: 2.125795  [44800/60000]
loss: 2.128952  [51200/60000]
loss: 2.059253  [57600/60000]
Test Error: 
 Accuracy: 47.8%, Avg loss: 0.032641 

Epoch 3
-------------------------------
loss: 2.142435  [    0/60000]
loss: 2.092333  [ 6400/60000]
loss: 2.004931  [12800/60000]
loss: 1.985219  [19200/60000]
loss: 2.005092  [25600/60000]
loss: 1.968048  [32000/60000]
loss: 1.987880  [38400/60000]
loss: 1.951957  [44800/60000]
loss: 1.979424  [51200/60000]
loss: 1.846356  [57600/60000]
Test Error: 
 Accuracy: 47.4%, Avg loss: 0.029790 

Epoch 4
-------------------------------
loss: 2.017921  [    0/60000]
loss: 1.933817  [ 6400/60000]
loss: 1.803801  [12800/60000]
loss: 1.774853  [19200/60000]
loss: 1.840858  [25600/60000]
loss: 1.785817  [32000/60000]
loss: 1.810488  [38400/60000]
loss: 1.791834  [44800/60000]
loss: 1.847734  [51200/60000]
loss: 1.647881  [57600/60000]
Test Error: 
 Accuracy: 50.1%, Avg loss: 0.027340 

Epoch 5
-------------------------------
loss: 1.908676  [    0/60000]
loss: 1.803817  [ 6400/60000]
loss: 1.650390  [12800/60000]
loss: 1.613460  [19200/60000]
loss: 1.725269  [25600/60000]
loss: 1.653432  [32000/60000]
loss: 1.667764  [38400/60000]
loss: 1.672491  [44800/60000]
loss: 1.728529  [51200/60000]
loss: 1.499958  [57600/60000]
Test Error: 
 Accuracy: 52.2%, Avg loss: 0.025272 

Epoch 6
-------------------------------
loss: 1.795644  [    0/60000]
loss: 1.684765  [ 6400/60000]
loss: 1.509682  [12800/60000]
loss: 1.476312  [19200/60000]
loss: 1.554773  [25600/60000]
loss: 1.534349  [32000/60000]
loss: 1.533391  [38400/60000]
loss: 1.567757  [44800/60000]
loss: 1.567240  [51200/60000]
loss: 1.352881  [57600/60000]
Test Error: 
 Accuracy: 53.6%, Avg loss: 0.022756 

Epoch 7
-------------------------------
loss: 1.623559  [    0/60000]
loss: 1.567390  [ 6400/60000]
loss: 1.337601  [12800/60000]
loss: 1.331376  [19200/60000]
loss: 1.414052  [25600/60000]
loss: 1.330582  [32000/60000]
loss: 1.367225  [38400/60000]
loss: 1.350859  [44800/60000]
loss: 1.408106  [51200/60000]
loss: 1.184717  [57600/60000]
Test Error: 
 Accuracy: 56.9%, Avg loss: 0.020195 

Epoch 8
-------------------------------
loss: 1.439290  [    0/60000]
loss: 1.431444  [ 6400/60000]
loss: 1.135230  [12800/60000]
loss: 1.170522  [19200/60000]
loss: 1.268394  [25600/60000]
loss: 1.203035  [32000/60000]
loss: 1.240419  [38400/60000]
loss: 1.259241  [44800/60000]
loss: 1.313567  [51200/60000]
loss: 1.086417  [57600/60000]
Test Error: 
 Accuracy: 59.5%, Avg loss: 0.018757 

Epoch 9
-------------------------------
loss: 1.336656  [    0/60000]
loss: 1.359270  [ 6400/60000]
loss: 1.037303  [12800/60000]
loss: 1.085977  [19200/60000]
loss: 1.198249  [25600/60000]
loss: 1.127438  [32000/60000]
loss: 1.178556  [38400/60000]
loss: 1.200027  [44800/60000]
loss: 1.243823  [51200/60000]
loss: 1.034128  [57600/60000]
Test Error: 
 Accuracy: 61.0%, Avg loss: 0.017827 

Epoch 10
-------------------------------
loss: 1.262514  [    0/60000]
loss: 1.308815  [ 6400/60000]
loss: 0.970262  [12800/60000]
loss: 1.031770  [19200/60000]
loss: 1.144329  [25600/60000]
loss: 1.069766  [32000/60000]
loss: 1.131326  [38400/60000]
loss: 1.153146  [44800/60000]
loss: 1.187189  [51200/60000]
loss: 0.994985  [57600/60000]
Test Error: 
 Accuracy: 62.0%, Avg loss: 0.017097 

Done!

Saving the model

```python
torch.save(model.state_dict(), "data/model.pth")

print("Saved PyTorch Model State to model.pth")
```
Saved PyTorch Model State to model.pth
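An alternative worth knowing (a sketch, not part of the original tutorial): torch.save can also pickle the entire module object, so loading then needs no separate class definition. The model and the "model_full.pth" filename below are made up for this example:

```python
import torch
from torch import nn

# a stand-in model for illustration
model = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 10))

# serialize the whole module object, not just its state_dict
torch.save(model, "model_full.pth")

# weights_only=False is required to unpickle a full module object
model2 = torch.load("model_full.pth", weights_only=False)
print(type(model2).__name__)  # Sequential
```

The trade-off is that this ties the checkpoint to the exact class and file layout used when saving, which is why the state_dict approach above is usually preferred.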

Loading the model

```python
%matplotlib inline
import torch
import onnxruntime  # needed later for the ONNX inference session
from torch import nn
import torch.onnx as onnx
import torchvision.models as models
from torchvision import datasets
from torchvision.transforms import ToTensor
```

Before loading the weights, we need to redefine the model class and instantiate it:

```python
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
```
```python
# instantiate the model
model = NeuralNetwork()
# load the saved weights
model.load_state_dict(torch.load('data/model.pth'))
# put `dropout` and `BN` layers into evaluation mode
model.eval()
```
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)
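model.eval() matters because layers such as Dropout behave differently in training and evaluation mode; a small self-contained sketch:

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()
print(drop(x))  # roughly half the entries zeroed, survivors scaled by 2

drop.eval()
print(drop(x))  # identity: the input passes through unchanged
```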

Prediction

```python
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')
```
Predicted: "Ankle boot", Actual: "Ankle boot"

Exporting the model to ONNX format

```python
input_image = torch.zeros((1,28,28))
onnx_model = 'data/model.onnx'
onnx.export(model, input_image, onnx_model)
```
```python
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]
x, y = test_data[0][0], test_data[0][1]
```

We need to create an inference session with onnxruntime.InferenceSession. To run inference on the ONNX model, call run and pass in the list of outputs you want returned (leave it empty if you want all of them) and a map of the input values. The result is a list of the outputs.

```python
session = onnxruntime.InferenceSession(onnx_model, None)
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

result = session.run([output_name], {input_name: x.numpy()})
predicted, actual = classes[result[0][0].argmax(0)], classes[y]
print(f'Predicted: "{predicted}", Actual: "{actual}"')
```