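VGG is short for Visual Geometry Group, the lab that proposed the network; the 16 refers to its 16 weight layers. In this article we implement VGG16 in PyTorch for a cat-vs-dog binary classification task, modifying the network structure where needed to fit the task.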
1. VGG16
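1.1 Introduction to VGG16
Deep learning has achieved great success in computer vision, particularly in image classification. VGG16 is one of the classic convolutional neural networks (CNNs), proposed in 2014 by Karen Simonyan and Andrew Zisserman of the University of Oxford. Known for its depth and simplicity, it is an important milestone in image classification.
The name VGG comes from the Visual Geometry Group, the lab that proposed the network. VGG16 was designed to improve image classification performance by increasing network depth, and it demonstrated how much depth matters for the task. Its main characteristic is that it stacks multiple small convolution kernels to form a deeper network. Two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution while using fewer weights and adding an extra non-linearity.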
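To make the benefit of stacking small kernels concrete, here is a minimal sketch in plain Python (no PyTorch required) that compares the weight count and receptive field of stacked 3x3 convolutions with a single larger kernel. The channel count of 64 and the helper names are illustrative assumptions, and biases are ignored.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights (biases ignored) in one square 2D convolution."""
    return kernel_size * kernel_size * in_channels * out_channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # illustrative assumption: same channel count in and out

two_3x3 = 2 * conv_weights(3, channels, channels)
one_5x5 = conv_weights(5, channels, channels)
three_3x3 = 3 * conv_weights(3, channels, channels)
one_7x7 = conv_weights(7, channels, channels)

# Same receptive field, fewer weights for the stacked variant.
print(stacked_receptive_field(3, 2), two_3x3, one_5x5)    # 5 73728 102400
print(stacked_receptive_field(3, 3), three_3x3, one_7x7)  # 7 110592 200704
```

The stacked variant also applies a ReLU after every 3x3 layer, so it is more expressive than the single large kernel it replaces.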
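1.1.1 Overall structure of the VGG16 network
VGG16 consists of convolutional layers followed by fully connected layers. The overall structure is simple: every convolutional layer uses a small 3x3 kernel with stride 1 and padding 1, so a convolution preserves the spatial size of its input, and each convolution is followed by a ReLU activation to introduce non-linearity.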
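The effect of the 3x3 kernel, stride 1, padding 1 setting can be checked with the standard output-size formula. The sketch below is plain Python; the function name is a hypothetical helper, and the 2x2 pooling with stride 2 is the setting used in the code later in this article.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


# A 3x3 convolution with stride 1 and padding 1 keeps the size unchanged.
after_conv = conv_output_size(224, kernel_size=3, stride=1, padding=1)

# A 2x2 max pooling with stride 2 halves it.
after_pool = conv_output_size(after_conv, kernel_size=2, stride=2)

print(after_conv, after_pool)  # 224 112
```

Because the convolutions never shrink the feature map, only the pooling layers control the spatial resolution.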
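VGG16 has three main parts:
- Input layer: accepts the input image, usually a 224x224 color (RGB) image.
- Convolutional layers: 13 convolutional layers organized into five convolutional blocks, each block ending with 2x2 max pooling.
- Fully connected layers: 3 fully connected layers after the convolutional blocks, used for the final classification.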
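The layer counts above can be written down as a compact configuration list. This is a minimal sketch in plain Python; the list format (channel counts with "M" for pooling) and the variable names are assumptions made for illustration, while the channel counts themselves follow the walkthrough in this section.

```python
# Output channels of each convolution; "M" marks a 2x2 max-pooling layer.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]
NUM_FC_LAYERS = 3

num_conv_layers = sum(1 for item in VGG16_CONFIG if item != "M")
num_blocks = VGG16_CONFIG.count("M")  # each block ends with one pooling layer

# 13 convolutional + 3 fully connected layers = the 16 in "VGG16".
print(num_conv_layers, num_blocks, num_conv_layers + NUM_FC_LAYERS)  # 13 5 16
```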
The VGG16 network structure is as follows:
[Figure: VGG16 network structure]
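1. The original image is resized to (224, 224, 3).
2. conv1: two 3x3 convolutions with 64 output channels give (224, 224, 64); 2x2 max pooling then gives (112, 112, 64).
3. conv2: two 3x3 convolutions with 128 output channels give (112, 112, 128); 2x2 max pooling then gives (56, 56, 128).
4. conv3: three 3x3 convolutions with 256 output channels give (56, 56, 256); 2x2 max pooling then gives (28, 28, 256).
5. conv4: three 3x3 convolutions with 512 output channels give (28, 28, 512); 2x2 max pooling then gives (14, 14, 512).
6. conv5: three 3x3 convolutions with 512 output channels give (14, 14, 512); 2x2 max pooling then gives (7, 7, 512).
7. Two fully connected layers, implemented here as convolutions with equivalent effect, each give an output of (1, 1, 4096).
8. A final fully connected layer, again implemented as a convolution, gives an output of (1, 1, 1000).
The final output is the prediction score for each class (1000 classes in the original network).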
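The shape walkthrough above can be reproduced with a few lines of plain Python. This is a sketch under the stated layer settings (size-preserving 3x3 convolutions, 2x2 pooling that halves height and width); the names `VGG16_BLOCKS` and `trace_shapes` are hypothetical.

```python
# (number of 3x3 convolutions, output channels) for the five blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(height=224, width=224):
    """Return (block name, shape after convs, shape after pooling) per block."""
    rows = []
    for index, (num_convs, channels) in enumerate(VGG16_BLOCKS, start=1):
        # The convolutions keep height and width; only channels change.
        after_convs = (height, width, channels)
        # 2x2 max pooling with stride 2 halves height and width.
        height, width = height // 2, width // 2
        name = f"conv{index} ({num_convs} convs)"
        rows.append((name, after_convs, (height, width, channels)))
    return rows


rows = trace_shapes()
for name, after_convs, after_pool in rows:
    print(f"{name}: {after_convs} -> {after_pool}")

final_height, final_width, final_channels = rows[-1][2]
flattened = final_height * final_width * final_channels
print(flattened)  # 25088 values feed the first fully connected layer
```

A 7x7 convolution with 4096 filters applied to the (7, 7, 512) feature map produces exactly one (1, 1, 4096) output, which is why the fully connected layers in steps 7 and 8 can be expressed as convolutions.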
1.2 Cat-vs-dog binary classification with VGG16 in PyTorch
In this part we implement VGG16 in PyTorch for a cat-vs-dog binary classification task, modifying the network structure to fit the task.
1.2.1 Preparing the dataset
First we need a dataset for cat-vs-dog classification. One can be downloaded from Kaggle and contains a large number of cat and dog images. After downloading it, split the data into a training set and a test set. Name the training folder train and create two subfolders in it, cat and dog, each holding the images of that class. Name the test folder test and give it the same layout.
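One way to produce this folder layout is sketched below using only the standard library. It assumes the downloaded images sit in one flat folder and are named with the class as a prefix (for example `cat.0.jpg` and `dog.0.jpg`); that naming, the 20% test fraction, and the function name `split_cats_and_dogs` are all assumptions, so adjust them to the actual download.

```python
import random
import shutil
from pathlib import Path


def split_cats_and_dogs(source_dir, output_dir, test_fraction=0.2, seed=0):
    """Copy images into output_dir/{train,test}/{cat,dog}/ and return counts."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for label in ("cat", "dog"):
        # Assumption: file names start with the class name, e.g. "cat.0.jpg".
        files = sorted(source_dir.glob(f"{label}.*.jpg"))
        rng.shuffle(files)
        num_test = int(len(files) * test_fraction)
        subsets = (("test", files[:num_test]), ("train", files[num_test:]))
        for split, subset in subsets:
            target = output_dir / split / label
            target.mkdir(parents=True, exist_ok=True)
            for path in subset:
                shutil.copy2(path, target / path.name)
            counts[(split, label)] = len(subset)
    return counts
```

With this layout, `ImageFolder` assigns labels by sorted subfolder name, so cat becomes class 0 and dog becomes class 1.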
```python
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

batch_size = 32  # same value as in the training hyperparameters

# Define the preprocessing pipeline
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # VGG16 expects 224x224 inputs
    transforms.ToTensor(),          # PIL image -> float tensor in [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # -> [-1, 1]
])

# Load the datasets; ImageFolder derives the labels from the subfolder names
train_dataset = ImageFolder("train", transform=transform)
test_dataset = ImageFolder("test", transform=transform)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)
```
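The two numeric steps of the transform pipeline are simple arithmetic. The plain-Python sketch below mirrors what `ToTensor` and `Normalize` with mean 0.5 and standard deviation 0.5 do to a single 8-bit pixel value; the helper names are hypothetical and the sketch ignores the tensor layout change that `ToTensor` also performs.

```python
def to_unit_range(pixel_value):
    """Scale an 8-bit pixel value (0..255) to [0, 1], as ToTensor does."""
    return pixel_value / 255


def normalize(value, mean=0.5, std=0.5):
    """Per-channel normalization: (value - mean) / std."""
    return (value - mean) / std


for pixel_value in (0, 255):
    print(pixel_value, "->", normalize(to_unit_range(pixel_value)))
# 0 -> -1.0
# 255 -> 1.0
```

The network therefore sees inputs in the range [-1, 1], centered on zero.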
1.2.2 Building the VGG network
```python
import torch
import torch.nn as nn


class VGG16(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 2
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 3
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 4
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Block 5
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 2),  # output layer for the binary task
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # flatten the feature maps, keep the batch dim
        x = self.classifier(x)
        return x


# Instantiate the VGG16 model
vgg16 = VGG16()
```
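In the code above we define a VGG16 class: self.features contains the 5 convolutional blocks and self.classifier contains the 3 fully connected layers. The only change from the original network is the last layer, which outputs 2 classes instead of 1000.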
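To get a feel for the size of this model, the parameter count can be worked out by hand. The sketch below is plain Python and mirrors the layer sizes of the class above; the helper names are hypothetical.

```python
# (in_channels, out_channels) of the 13 convolutions, in order
CONV_CHANNELS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def conv_params(in_channels, out_channels, kernel_size=3):
    """Weights plus biases of one 2D convolution."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def vgg16_param_count(num_classes):
    """Return (feature extractor parameters, classifier parameters)."""
    features = sum(conv_params(i, o) for i, o in CONV_CHANNELS)
    classifier = (linear_params(512 * 7 * 7, 4096)
                  + linear_params(4096, 4096)
                  + linear_params(4096, num_classes))
    return features, classifier


print(vgg16_param_count(2))     # (14714688, 119554050)
print(vgg16_param_count(1000))  # (14714688, 123642856)
```

Most parameters sit in the first fully connected layer, so replacing the 1000-way output with a 2-way output removes only about 4 million of roughly 138 million parameters.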
1.2.3 Training and evaluating the model
```python
import os

import torch
import torch.nn as nn
import torch.optim as optim

# Hyperparameters
batch_size = 32  # used by the DataLoaders
learning_rate = 0.001
num_epochs = 10

model = VGG16()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)

# Train the model
model.train()  # enable dropout during training
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}")

# Save the whole model; create the target folder first so torch.save does not fail
os.makedirs("model", exist_ok=True)
torch.save(model, "model/vgg16.pth")

# Evaluate the model
model.eval()  # disable dropout during evaluation
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the largest logit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print(f"Accuracy on test images: {(correct / total) * 100:.2f}%")
```
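For training we use cross-entropy loss (CrossEntropyLoss) as the classification loss and stochastic gradient descent (SGD) with momentum 0.9 as the optimizer. The model is also moved to the GPU, if one is available, to speed up training.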
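For readers who want to see what the loss and the optimizer compute, here is a minimal sketch in plain Python for a single sample and a single scalar parameter. It follows the usual definitions (log-softmax followed by negative log-likelihood, and a momentum buffer that accumulates gradients); the function names are hypothetical and the real PyTorch classes operate on whole tensors and batches.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update for a scalar parameter."""
    velocity = momentum * velocity + grad  # accumulate past gradients
    param = param - lr * velocity          # step against the accumulated direction
    return param, velocity


# Equal logits for cat and dog: the model is undecided, loss is log(2).
print(cross_entropy([0.0, 0.0], target=0))

# A confident correct prediction gives a much smaller loss than a wrong one.
print(cross_entropy([5.0, -5.0], target=0) < cross_entropy([5.0, -5.0], target=1))

param, velocity = sgd_momentum_step(param=1.0, grad=2.0, velocity=0.0)
print(param, velocity)
```

With its default settings, CrossEntropyLoss averages this per-sample loss over the batch, which is the value printed every 100 steps in the training loop.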
This concludes the walkthrough of cat-vs-dog binary classification with the VGG16 model in PyTorch.