PyTorch implementations of LeNet, AlexNet, VGG, NiN, GoogLeNet, and ResNet.
## LeNet

```python
import torch
from torch import nn

net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10))
```
```python
X = torch.rand(size=(1, 1, 28, 28), dtype=torch.float32)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape: \t', X.shape)
```
```
Conv2d output shape: torch.Size([1, 6, 28, 28])
Sigmoid output shape: torch.Size([1, 6, 28, 28])
AvgPool2d output shape: torch.Size([1, 6, 14, 14])
Conv2d output shape: torch.Size([1, 16, 10, 10])
Sigmoid output shape: torch.Size([1, 16, 10, 10])
AvgPool2d output shape: torch.Size([1, 16, 5, 5])
Flatten output shape: torch.Size([1, 400])
Linear output shape: torch.Size([1, 120])
Sigmoid output shape: torch.Size([1, 120])
Linear output shape: torch.Size([1, 84])
Sigmoid output shape: torch.Size([1, 84])
Linear output shape: torch.Size([1, 10])
```
Note that throughout the convolutional block, the height and width of the representation at each layer shrink compared with the previous layer. The first convolutional layer uses 2 pixels of padding to compensate for the reduction caused by the $5 \times 5$ kernel, while the second convolutional layer uses no padding, so its height and width each shrink by 4 pixels. Going up the stack, the number of channels grows from 1 at the input to 6 after the first convolutional layer and 16 after the second, while each pooling layer halves the height and width. Finally, each fully connected layer reduces the dimensionality until the output matches the number of classes.
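The shape arithmetic follows the usual output-size formula for convolution and pooling, output = ⌊(input + 2·padding − kernel) / stride⌋ + 1. The small helper below (not part of LeNet itself, just a sketch) reproduces the numbers above:

```python
def conv_out_size(n, kernel, padding=0, stride=1):
    """Output height/width of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

print(conv_out_size(28, kernel=5, padding=2))   # 28: padding offsets the 5x5 kernel
print(conv_out_size(28, kernel=2, stride=2))    # 14: 2x2 average pooling halves it
print(conv_out_size(14, kernel=5))              # 10: no padding loses 4 pixels
print(conv_out_size(10, kernel=2, stride=2))    # 5:  final 16 * 5 * 5 = 400 features
```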
## Batch Norm

The LeNet implementation with batch normalization is shown below. The main change is inserting `nn.BatchNorm2d()` (for the convolutional layers) and `nn.BatchNorm1d()` (for the fully connected layers) between each layer and its activation function.
```python
import torch
from torch import nn
from torch.nn import functional as F

net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.BatchNorm2d(6), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.BatchNorm2d(16), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(256, 120), nn.BatchNorm1d(120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.BatchNorm1d(84), nn.Sigmoid(),
    nn.Linear(84, 10))
```
```python
X = torch.rand(size=(2, 1, 28, 28))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)
```
```
Conv2d output shape: torch.Size([2, 6, 24, 24])
BatchNorm2d output shape: torch.Size([2, 6, 24, 24])
Sigmoid output shape: torch.Size([2, 6, 24, 24])
AvgPool2d output shape: torch.Size([2, 6, 12, 12])
Conv2d output shape: torch.Size([2, 16, 8, 8])
BatchNorm2d output shape: torch.Size([2, 16, 8, 8])
Sigmoid output shape: torch.Size([2, 16, 8, 8])
AvgPool2d output shape: torch.Size([2, 16, 4, 4])
Flatten output shape: torch.Size([2, 256])
Linear output shape: torch.Size([2, 120])
BatchNorm1d output shape: torch.Size([2, 120])
Sigmoid output shape: torch.Size([2, 120])
Linear output shape: torch.Size([2, 84])
BatchNorm1d output shape: torch.Size([2, 84])
Sigmoid output shape: torch.Size([2, 84])
Linear output shape: torch.Size([2, 10])
```
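One practical note: the shapes above were computed in training mode, where `nn.BatchNorm2d`/`nn.BatchNorm1d` normalize with per-batch statistics (which is also why a batch of 2 is used). At inference time the network should be switched to evaluation mode so the accumulated running statistics are used instead; a minimal illustration:

```python
net.eval()                    # use running mean/variance instead of batch statistics
with torch.no_grad():
    single = torch.rand(1, 1, 28, 28)
    print(net(single).shape)  # torch.Size([1, 10]); a batch of one now works, too
net.train()                   # switch back before further training
```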
## AlexNet

```python
import torch
from torch import nn

net = nn.Sequential(
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(6400, 4096), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 10))
```
```python
X = torch.randn(1, 1, 224, 224)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)
```
```
Conv2d output shape: torch.Size([1, 96, 54, 54])
ReLU output shape: torch.Size([1, 96, 54, 54])
MaxPool2d output shape: torch.Size([1, 96, 26, 26])
Conv2d output shape: torch.Size([1, 256, 26, 26])
ReLU output shape: torch.Size([1, 256, 26, 26])
MaxPool2d output shape: torch.Size([1, 256, 12, 12])
Conv2d output shape: torch.Size([1, 384, 12, 12])
ReLU output shape: torch.Size([1, 384, 12, 12])
Conv2d output shape: torch.Size([1, 384, 12, 12])
ReLU output shape: torch.Size([1, 384, 12, 12])
Conv2d output shape: torch.Size([1, 256, 12, 12])
ReLU output shape: torch.Size([1, 256, 12, 12])
MaxPool2d output shape: torch.Size([1, 256, 5, 5])
Flatten output shape: torch.Size([1, 6400])
Linear output shape: torch.Size([1, 4096])
ReLU output shape: torch.Size([1, 4096])
Dropout output shape: torch.Size([1, 4096])
Linear output shape: torch.Size([1, 4096])
ReLU output shape: torch.Size([1, 4096])
Dropout output shape: torch.Size([1, 4096])
Linear output shape: torch.Size([1, 10])
```
## VGG

The original VGG network has five convolutional blocks: the first two blocks contain one convolutional layer each, and the last three blocks contain two convolutional layers each. The first block has 64 output channels, and each subsequent block doubles the number of output channels until it reaches 512. Since the network uses 8 convolutional layers and 3 fully connected layers, it is commonly called VGG-11.

The code below implements VGG-11. It can be built simply by looping over `conv_arch`.
```python
import torch
from torch import nn

def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels,
                                kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)
```
```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
```
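As a quick check that `conv_arch` matches the name VGG-11 (8 convolutional layers, plus the 3 fully connected layers added in `vgg` below):

```python
# 1 + 1 + 2 + 2 + 2 = 8 convolutional layers; the classifier adds 3 linear layers
print(sum(num_convs for num_convs, _ in conv_arch))  # 8
```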
```python
def vgg(conv_arch):
    conv_blks = []
    in_channels = 1
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    return nn.Sequential(
        *conv_blks, nn.Flatten(),
        nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 10))

net = vgg(conv_arch)
```
```python
X = torch.randn(size=(1, 1, 224, 224))
for blk in net:
    X = blk(X)
    print(blk.__class__.__name__, 'output shape:\t', X.shape)
```
```
Sequential output shape: torch.Size([1, 64, 112, 112])
Sequential output shape: torch.Size([1, 128, 56, 56])
Sequential output shape: torch.Size([1, 256, 28, 28])
Sequential output shape: torch.Size([1, 512, 14, 14])
Sequential output shape: torch.Size([1, 512, 7, 7])
Flatten output shape: torch.Size([1, 25088])
Linear output shape: torch.Size([1, 4096])
ReLU output shape: torch.Size([1, 4096])
Dropout output shape: torch.Size([1, 4096])
Linear output shape: torch.Size([1, 4096])
ReLU output shape: torch.Size([1, 4096])
Dropout output shape: torch.Size([1, 4096])
Linear output shape: torch.Size([1, 10])
```
## NiN

Recall that the inputs and outputs of convolutional layers are four-dimensional tensors whose axes correspond to the example, channel, height, and width, whereas the inputs and outputs of fully connected layers are typically two-dimensional tensors corresponding to the example and feature. The idea behind NiN is to apply a fully connected layer at each pixel location (for every height and width). If we tie the weights across the spatial locations, we can view this either as a 1×1 convolutional layer (as described in Section 6.4) or as a fully connected layer acting independently on each pixel location. Put another way, each pixel in the spatial dimensions is treated as a single example and the channel dimension as its features.
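This equivalence between a 1×1 convolution and a per-pixel fully connected layer can be checked numerically. Below is a minimal sketch (the tensor sizes are arbitrary, chosen only for illustration):

```python
import torch
from torch import nn

# A 1x1 convolution and a per-pixel fully connected layer compute the same thing
# when they share weights: each output channel is a linear combination of the
# input channels at that pixel.
x = torch.rand(1, 3, 4, 4)                       # (batch, channels, height, width)
conv = nn.Conv2d(3, 2, kernel_size=1)            # 1x1 conv: 3 -> 2 channels
fc = nn.Linear(3, 2)                             # fully connected layer on channels
fc.weight.data = conv.weight.data.reshape(2, 3)  # copy the same weights
fc.bias.data = conv.bias.data

y_conv = conv(x)                                        # (1, 2, 4, 4)
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)    # treat each pixel as an example
print(torch.allclose(y_conv, y_fc, atol=1e-6))          # True
```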
```python
import torch
from torch import nn

def nin_block(in_channels, out_channels, kernel_size, strides, padding):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU())
```
```python
net = nn.Sequential(
    nin_block(1, 96, kernel_size=11, strides=4, padding=0),
    nn.MaxPool2d(3, stride=2),
    nin_block(96, 256, kernel_size=5, strides=1, padding=2),
    nn.MaxPool2d(3, stride=2),
    nin_block(256, 384, kernel_size=3, strides=1, padding=1),
    nn.MaxPool2d(3, stride=2),
    nn.Dropout(0.5),
    nin_block(384, 10, kernel_size=3, strides=1, padding=1),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten())
```
```python
X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)
```
```
Sequential output shape: torch.Size([1, 96, 54, 54])
MaxPool2d output shape: torch.Size([1, 96, 26, 26])
Sequential output shape: torch.Size([1, 256, 26, 26])
MaxPool2d output shape: torch.Size([1, 256, 12, 12])
Sequential output shape: torch.Size([1, 384, 12, 12])
MaxPool2d output shape: torch.Size([1, 384, 5, 5])
Dropout output shape: torch.Size([1, 384, 5, 5])
Sequential output shape: torch.Size([1, 10, 5, 5])
AdaptiveAvgPool2d output shape: torch.Size([1, 10, 1, 1])
Flatten output shape: torch.Size([1, 10])
```
## GoogLeNet

### Inception block

```python
import torch
from torch import nn
from torch.nn import functional as F

class Inception(nn.Module):
    # c1--c4 are the numbers of output channels for each path
    def __init__(self, in_channels, c1, c2, c3, c4, **kwargs):
        super(Inception, self).__init__(**kwargs)
        # Path 1: a single 1x1 convolutional layer
        self.p1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        # Path 2: a 1x1 convolutional layer followed by a 3x3 convolutional layer
        self.p2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
        # Path 3: a 1x1 convolutional layer followed by a 5x5 convolutional layer
        self.p3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
        # Path 4: a 3x3 max-pooling layer followed by a 1x1 convolutional layer
        self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

    def forward(self, x):
        p1 = F.relu(self.p1_1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        # Concatenate the outputs of the four paths along the channel dimension
        return torch.cat((p1, p2, p3, p4), dim=1)
```
### Inception network

The first module uses a 64-channel 7×7 convolutional layer.
```python
b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
```
The second module uses two convolutional layers: first a 64-channel 1×1 convolutional layer, then a 3×3 convolutional layer that triples the number of channels. This corresponds to the second path in the Inception block.
```python
b2 = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(),
    nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
```
The third module connects two complete Inception blocks in series. The first Inception block has 64+128+32+32=256 output channels, with a 64:128:32:32 = 2:4:1:1 ratio among the four paths; its second and third paths first reduce the number of input channels to 96/192 = 1/2 and 16/192 = 1/12, respectively, before the second convolutional layer. The second Inception block increases the number of output channels to 128+192+96+64=480, with a 128:192:96:64 = 4:6:3:2 ratio among the four paths; here the second and third paths first reduce the number of input channels to 128/256 = 1/2 and 32/256 = 1/8, respectively.
```python
b3 = nn.Sequential(
    Inception(192, 64, (96, 128), (16, 32), 32),
    Inception(256, 128, (128, 192), (32, 96), 64),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
```
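As a quick sanity check on the channel arithmetic above, the two Inception blocks in `b3` can be probed with a dummy input (the 28×28 spatial size below is arbitrary):

```python
# Verify the output channels of the two Inception blocks in b3:
# 64 + 128 + 32 + 32 = 256 and 128 + 192 + 96 + 64 = 480
X = torch.rand(1, 192, 28, 28)
blk1 = Inception(192, 64, (96, 128), (16, 32), 32)
blk2 = Inception(256, 128, (128, 192), (32, 96), 64)
print(blk1(X).shape)        # torch.Size([1, 256, 28, 28])
print(blk2(blk1(X)).shape)  # torch.Size([1, 480, 28, 28])
```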
The fourth module is more complex. It connects five Inception blocks in series, with 192+208+48+64=512, 160+224+64+64=512, 128+256+64+64=512, 112+288+64+64=528, and 256+320+128+128=832 output channels, respectively. The channel allocation among the paths is similar to the third module: the second path, with the 3×3 convolutional layer, outputs the most channels, followed by the first path with only a 1×1 convolutional layer, then the third path with the 5×5 convolutional layer, and finally the fourth path with the 3×3 max-pooling layer. The second and third paths first reduce the number of channels by some ratio, and these ratios differ slightly from one Inception block to the next.
```python
b4 = nn.Sequential(
    Inception(480, 192, (96, 208), (16, 48), 64),
    Inception(512, 160, (112, 224), (24, 64), 64),
    Inception(512, 128, (128, 256), (24, 64), 64),
    Inception(512, 112, (144, 288), (32, 64), 64),
    Inception(528, 256, (160, 320), (32, 128), 128),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
```
The fifth module contains two Inception blocks with 256+320+128+128=832 and 384+384+128+128=1024 output channels. The channel allocation in each path follows the same logic as in the third and fourth modules, only with different concrete values. Note that the fifth module is immediately followed by the output layer: like NiN, it uses a global average pooling layer to reduce the height and width of every channel to 1. Finally, the output is flattened into a two-dimensional array and fed into a fully connected layer whose number of outputs equals the number of label classes.
```python
b5 = nn.Sequential(
    Inception(832, 256, (160, 320), (32, 128), 128),
    Inception(832, 384, (192, 384), (48, 128), 128),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten())

net = nn.Sequential(b1, b2, b3, b4, b5, nn.Linear(1024, 10))
```
```python
X = torch.rand(size=(1, 1, 96, 96))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)
```
```
Sequential output shape: torch.Size([1, 64, 24, 24])
Sequential output shape: torch.Size([1, 192, 12, 12])
Sequential output shape: torch.Size([1, 480, 6, 6])
Sequential output shape: torch.Size([1, 832, 3, 3])
Sequential output shape: torch.Size([1, 1024])
Linear output shape: torch.Size([1, 10])
```
## ResNet

### Residual block

```python
import torch
from torch import nn
from torch.nn import functional as F

class Residual(nn.Module):
    def __init__(self, input_channels, num_channels,
                 use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels,
                               kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels,
                               kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels,
                                   kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)
```
Check the case where the input and output shapes are the same:
```python
blk = Residual(3, 3)
X = torch.rand(4, 3, 6, 6)
Y = blk(X)
Y.shape
```
```
torch.Size([4, 3, 6, 6])
```
We can also increase the number of output channels while halving the output height and width:
```python
blk = Residual(3, 6, use_1x1conv=True, strides=2)
blk(X).shape
```
```
torch.Size([4, 6, 3, 3])
```
### The ResNet model

```python
b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
```
ResNet then uses four modules built from residual blocks, where each module contains several residual blocks with the same number of output channels. The number of channels in the first module equals the number of input channels; since a max-pooling layer with stride 2 has already been applied, there is no need to reduce the height and width there. Each subsequent module doubles the number of channels of the previous module in its first residual block and halves the height and width.
```python
def resnet_block(input_channels, num_channels, num_residuals,
                 first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(Residual(input_channels, num_channels,
                                use_1x1conv=True, strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk
```
```python
b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))
```
```python
net = nn.Sequential(b1, b2, b3, b4, b5,
                    nn.AdaptiveAvgPool2d((1, 1)),
                    nn.Flatten(), nn.Linear(512, 10))
```
Each module has 4 convolutional layers (not counting the 1×1 convolutional layers in the shortcut connections). Together with the first 7×7 convolutional layer and the final fully connected layer, this gives 18 layers in total, which is why this model is commonly called ResNet-18. Different ResNet models, such as the much deeper 152-layer ResNet-152, can be obtained by configuring different numbers of channels and residual blocks per module. Although the overall architecture of ResNet is similar to GoogLeNet's, it is simpler and easier to modify, which contributed to its rapid and widespread adoption.
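The 18-layer count can be checked directly by counting the weight-carrying layers while excluding the 1×1 shortcut convolutions; a minimal sketch:

```python
# Count the 7x7/3x3 convolutional layers plus the final fully connected layer,
# excluding the 1x1 convolutions used in the shortcut connections.
n_conv = sum(1 for m in net.modules()
             if isinstance(m, nn.Conv2d) and m.kernel_size != (1, 1))
n_fc = sum(1 for m in net.modules() if isinstance(m, nn.Linear))
print(n_conv + n_fc)  # 18
```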
```python
X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:\t', X.shape)
```
```
Sequential output shape: torch.Size([1, 64, 56, 56])
Sequential output shape: torch.Size([1, 64, 56, 56])
Sequential output shape: torch.Size([1, 128, 28, 28])
Sequential output shape: torch.Size([1, 256, 14, 14])
Sequential output shape: torch.Size([1, 512, 7, 7])
AdaptiveAvgPool2d output shape: torch.Size([1, 512, 1, 1])
Flatten output shape: torch.Size([1, 512])
Linear output shape: torch.Size([1, 10])
```
## Miscellaneous

For evaluation, we need to make a slight modification to the `evaluate_accuracy` function described in :numref:`sec_softmax_scratch`. Since the full dataset resides in main memory, we need to copy it to GPU memory before the model computes on it with the GPU.
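A minimal sketch of such a GPU-aware evaluation is given below. It assumes `data_iter` yields `(X, y)` mini-batches; the helper name `evaluate_accuracy_gpu` and its exact signature are illustrative rather than fixed by this post.

```python
import torch
from torch import nn

def evaluate_accuracy_gpu(net, data_iter, device=None):
    """Compute the accuracy of a model on a dataset using a GPU (sketch)."""
    if isinstance(net, nn.Module):
        net.eval()  # evaluation mode: disables dropout, uses running BN statistics
        if device is None:
            device = next(iter(net.parameters())).device
    num_correct, num_total = 0, 0
    with torch.no_grad():
        for X, y in data_iter:
            X, y = X.to(device), y.to(device)  # copy the mini-batch to GPU memory
            y_hat = net(X)
            num_correct += (y_hat.argmax(dim=1) == y).sum().item()
            num_total += y.numel()
    return num_correct / num_total
```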