A Beginner's Guide to Convolutional Neural Networks (CNN): From Basic Principles to a Hands-On Project

hwyzw · Posted on 2024-12-25 22:40:23

Selected from:

Author:

Contributors: Han Fang, Wang Shuting

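Convolutional neural networks are among the most popular kinds of deep neural network. Starting from the basics, this article introduces the fundamental principles of convolutional networks and several related techniques, then builds a simple project with a convolutional network as a reference example. If you want to get started with CNNs, don't miss it.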

First, take a look at the photo below:


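This is not a real photo. You can open it in a new window and zoom in to see the mosaic-like artifacts.

This photo was actually generated by AI. Doesn't it look real?

It has been only seven years since Alex and his colleagues unveiled the technology behind it at the ImageNet competition. ImageNet is a large annual image recognition competition that covers 1,000 categories, from Alaskan malamutes to toilet paper. They built a network called AlexNet and won the competition far ahead of the runner-up.

That technology is the convolutional neural network. It is a branch of deep neural networks that is particularly good at processing images.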


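The chart above shows the error rates of the software that won the challenge over the years. By 2016 the error rate had fallen to under 5%, which already beats human-level performance.

The introduction of deep learning did not merely change the game; it broke it.

Convolutional Neural Network Architecture

So how does a convolutional neural network work?

Convolutional neural networks outperform other deep neural networks on images because of their special operation, the convolution. Rather than processing one pixel at a time, a CNN combines information from a group of neighboring pixels (for example, the 3*3 pixels in the figure above), so it can capture spatial patterns.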
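To make the idea of combining neighboring pixels concrete, here is a minimal pure-Python sketch of a 3*3 convolution (strictly speaking a cross-correlation, which is what deep learning libraries typically compute). The tiny image, the edge-detecting kernel, and the function name `convolve2d` are illustrative assumptions, not taken from any particular library.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            # Combine the k*k neighborhood into a single output value.
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 5*5 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The output is large where the window straddles the edge and zero where the window covers a uniform region, which is the sense in which a convolution "sees" a local pattern rather than a single pixel.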

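In addition, a CNN can "see" that a group of pixels forms a straight line or a curve. Because a deep neural network is usually a stack of many convolutional layers, once one layer has detected lines and curves, the next layer no longer combines pixels; it combines lines into shapes, and so on layer by layer until the whole picture is represented.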
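One way to see why stacked layers move from pixels to lines to shapes is to track how many input pixels a single output value depends on (its receptive field). The sketch below assumes stride-1 convolutions with no pooling; `receptive_field` is a hypothetical helper name used only for illustration.

```python
def receptive_field(num_layers, kernel_size=3):
    """Width (in input pixels) seen by one output value after stacking layers.

    Assumes every layer is a stride-1 convolution with the same kernel size.
    """
    width = 1  # a single pixel before any convolution
    for _ in range(num_layers):
        width += kernel_size - 1  # each layer widens the view by kernel_size - 1
    return width


for depth in (1, 2, 3, 4):
    size = receptive_field(depth)
    print(f"{depth} layer(s): each output sees a {size}*{size} patch")
```

Under these assumptions, one layer sees 3*3 pixels, two layers see 5*5, and three layers see 7*7, so deeper layers can respond to progressively larger structures.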

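Diagram of a deep convolutional neural network

To understand CNNs in depth, you need to learn a lot of fundamentals, such as the key concepts and the different kinds of layers. But there are now many excellent open-source projects that you can learn from and build on directly.

This brings us to another technique: transfer learning.

Transfer Learning

Transfer learning takes a deep learning model that has already been trained and adapts it to a specific task.

For example, suppose you work at a train dispatching company and want to predict whether a train will be late without hiring more staff.

You can simply take a convolutional neural network model available online, such as the 2015 champion, and retrain it on pictures of trains. Trust me, you won't be disappointed by the results.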
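The pattern behind retraining a pretrained network can be sketched without any deep learning library: keep the already-trained feature extractor frozen and train only a small new "head" on the new task. Everything in this toy is an assumption made for illustration: `pretrained_features` stands in for a real pretrained CNN, the two-pixel "images" are made up, and the head is trained with a simple perceptron rule rather than the gradient-based training a real framework would use.

```python
def pretrained_features(pixels):
    """Stand-in for a frozen, already-trained feature extractor (never updated)."""
    total = pixels[0] + pixels[1]
    contrast = pixels[0] - pixels[1]
    return [total, contrast]


def predict(weights, bias, features):
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_head(samples, max_epochs=100):
    """Train only the new classification head; the feature extractor stays fixed."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for pixels, label in samples:
            features = pretrained_features(pixels)  # frozen: used, not trained
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + error * f for w, f in zip(weights, features)]
                bias += error
        if mistakes == 0:  # every sample classified correctly
            break
    return weights, bias


# Toy "new task": the label is 1 when the two pixels are bright overall.
samples = [((0.0, 0.0), 0), ((0.2, 0.3), 0), ((1.0, 1.0), 1), ((0.9, 0.8), 1)]
weights, bias = train_head(samples)
predictions = [predict(weights, bias, pretrained_features(p)) for p, _ in samples]
print(predictions)
```

Because only the two head weights and the bias are learned, very little data and very few passes over it are needed, which is the practical appeal of reusing a trained model.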

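Transfer learning has two main advantages:

From Image Classification to Image Generation

Transfer learning has inspired many interesting ideas. If we can process images and recognize what is in them, why not generate images ourselves?

Interesting!

This is how generative adversarial networks came about.

Proposed by Jun-Yan Zhu et al.

Given a certain input, this technique can generate a corresponding image.

As shown in the figure above, it can generate a realistic photo from a painting, generate a photo of a backpack from a sketch, and even perform super-resolution reconstruction.

Super-resolution generative adversarial network

Amazing, right?

Of course, you can learn to build these networks yourself. But where do you start?

Convolutional Neural Network Tutorial

First, be aware that getting started is easy, but mastery is not.

Let's start with the basics.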


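Aerial Cactus Identification

In this learning project, your task is to identify whether an aerial image contains a columnar cactus.

Looks simple, doesn't it?

The competition provides 17,500 images, 4,000 of which are unlabeled and form the test set. If your model labels all 4,000 images correctly, it earns a perfect score of 1, or 100%.

After a long search, I finally found this project, which is very well suited to beginners.

Cactus

The image is similar to the ones above. It is 32*32 pixels and either contains a columnar cactus or not. Because the images are aerial photos, the cacti appear at all sorts of angles.

So what do you need?

Building a Convolutional Neural Network

Yes, Python, the most popular language in deep learning. As for deep learning frameworks, you have many options, and you can try them one by one: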

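TensorFlow, the most popular deep learning framework, built by Google engineers, with the most contributors and fans. Because the community is large, it is easy to find a solution when you run into a problem. Its high-level API, Keras, is very popular among beginners.

PyTorch, my favorite deep learning framework. It is a pure Python implementation, so it inherits Python's strengths and weaknesses, and Python developers can pick it up easily. It also has a library that provides abstractions, fastai, much as Keras does for TensorFlow.

MXNet, a deep learning framework developed by Apache.

[…], the predecessor of […].

CNTK, a deep learning framework developed by Microsoft.

This tutorial uses my favorite, PyTorch, together with fastai.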

Before you begin, you need to install Python. Go to the official website and download the version you need. Make sure you use version 3.6 or later; otherwise some of the libraries you need will not be supported.

Now open a command line or terminal and install the following libraries:

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">pip install numpy
pip install pandas
pip install jupyter
</code></pre>
NumPy is used to store the input images, pandas to handle the CSV files, and Jupyter Notebook to write and run the code.

Next, go to the PyTorch website and download the version you need. If you want faster training, install the CUDA build. The PyTorch version should be at least 1.0.

Once that is done, install torchvision and fastai:

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">pip install torchvision
# Note: the code in this tutorial uses the fastai v1 API (fastai.vision, cnn_learner)
pip install fastai
</code></pre>
Run the command jupyter notebook to launch Jupyter; it will open a browser window.

The environment is now set up, so let's begin.

Preparing the Data

Import the required libraries:

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">import numpy as np
import pandas as pd
from pathlib import Path
from fastai import *
from fastai.vision import *
import torch
# Jupyter magic: render charts inside the notebook
%matplotlib inline
</code></pre>
NumPy and pandas are needed for almost any task. fastai and Torch are your deep learning libraries. Matplotlib is used to display charts.

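You can download the data from the competition's official page on Kaggle.

Unzip the zip file and put its contents into a folder.

Suppose the folder is named cactus. Your folder structure will look like this:

The train folder contains all the training images.

The test folder holds the test images for submission.

The train CSV file contains the training labels: it maps each image name to the has_cactus column, whose value is 1 if the image contains a cactus and 0 otherwise.

The sample submission CSV shows the required submission format. Its file names correspond to the images in the test folder.

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">train_df = pd.read_csv("train.csv")
</code></pre>
This loads the training CSV file into a data frame.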
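To show what such a label file looks like once parsed, here is a small sketch that uses only the standard library. The file names, the four rows, and the `id` column name are assumptions made up for illustration; only `has_cactus` appears in this tutorial's own code.

```python
import csv
import io

# Hypothetical excerpt in the style of train.csv (names and labels are made up).
sample_csv = """id,has_cactus
img_0001.jpg,1
img_0002.jpg,0
img_0003.jpg,1
img_0004.jpg,1
"""

# Parse each row into a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Map every image name to its integer label (1 = cactus, 0 = no cactus).
labels = {row["id"]: int(row["has_cactus"]) for row in rows}
num_with_cactus = sum(labels.values())

print(f"{len(labels)} images, {num_with_cactus} with a cactus")
```

Checking the balance between the two classes this way is a quick sanity test before training.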

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">data_folder = Path(".")
# Named train_img because the later steps refer to train_img
train_img = ImageList.from_df(train_df, path=data_folder, folder='train')
</code></pre>
Use the ImageList.from_df method to create a loader that maps the data frame to the images in the train folder.

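Data Augmentation

This is a technique for creating more data from the data you already have. A picture of a cat flipped horizontally is still a picture of a cat. By doing this, you can expand your data two, four, or even 16 times.

If you don't have much data, give this method a try.

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">transformations = get_transforms(
    do_flip=True,      # random horizontal flips
    flip_vert=True,    # also allow vertical flips
    max_rotate=10.0,
    max_zoom=1.1,
    max_lighting=0.2,
    max_warp=0.2,
    p_affine=0.75,
    p_lighting=0.75,
)
</code></pre>
fastai provides the get_transforms function to do this. You can augment the data by flipping horizontally, flipping vertically, rotating, zooming, adjusting the lighting, or applying warp transformations.

You can use the parameters I give above and see how the pictures look. Or you can read the official documentation in detail.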
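The claim that flips multiply your data can be checked with a tiny pure-Python sketch. It is only an illustration of the idea, not how fastai implements its transforms; the 2*2 "image" and the helper names are made up.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top to bottom)."""
    return image[::-1]


# Hypothetical 2*2 grayscale image.
image = [
    [1, 2],
    [3, 4],
]

# One original image yields four distinct training examples.
variants = [
    image,
    flip_horizontal(image),
    flip_vertical(image),
    flip_horizontal(flip_vertical(image)),
]

for variant in variants:
    print(variant)
```

All four variants show the same object, so the label stays the same while the network sees four different pixel arrangements. For aerial photos, which have no natural "up", vertical flips are just as plausible as horizontal ones.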

Then apply the preprocessing above to the image list.

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">train_img = train_img.transform(transformations, size=128)
</code></pre>
The size parameter scales the input up or down to match the neural network you will use. The network I use is DenseNet, which won the Best Paper Award at CVPR 2017, and it is fed images of size 128*128.

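Preparing for Training

With the data prepared, we arrive at the most crucial step in deep learning: training. This is also where the "learning" in deep learning comes from. The network learns from your data and adjusts its parameters according to what it has learned until it produces better results.

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">test_df = pd.read_csv("sample_submission.csv")
test_img = ImageList.from_df(test_df, path=data_folder, folder='test')

# Parentheses let the method chain span several lines
train_img = (train_img
             .split_by_rand_pct(0.01)   # hold out 1% as the validation set
             .label_from_df()           # labels come from the data frame
             .add_test(test_img)        # attach the unlabeled test images
             .databunch(path='.', bs=64, device=torch.device('cuda:0'))
             .normalize(imagenet_stats))
</code></pre>

For the training step, you need to set aside a small part of the training data as a validation set. You cannot train on this data, because it is used only for validation. When your convolutional neural network performs well on the validation set, it will most likely also give good results on the test set.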
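As a rough illustration of what holding out a random 1% means for this dataset, here is a standard-library sketch. It is not fastai's implementation; the helper name, the seed, and the file names are assumptions. The count of 13,500 labeled images follows from the figures given earlier (17,500 images minus 4,000 unlabeled test images).

```python
import random


def split_by_random_fraction(items, valid_fraction, seed=0):
    """Shuffle a copy of `items` and hold out a fraction as the validation set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    num_valid = round(len(shuffled) * valid_fraction)
    return shuffled[num_valid:], shuffled[:num_valid]


# Hypothetical file names for the 13,500 labeled training images.
labeled_images = [f"img_{i:05d}.jpg" for i in range(13_500)]

train_items, valid_items = split_by_random_fraction(labeled_images, 0.01)
print(len(train_items), "training images,", len(valid_items), "validation images")
```

With only 135 validation images, a single misclassified image moves the validation accuracy by about 0.7 percentage points, which is worth keeping in mind when reading the validation score.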

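fastai provides the split_by_rand_pct function, which makes this split easy.

The databunch function handles batching. Because of GPU memory limits, my batch size is 64. If you don't have a GPU, leave out the device parameter.

After that, because you are using a pretrained network, call the normalize function with imagenet_stats to normalize the images. It standardizes the input images in the same way as the data the pretrained model was trained on.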
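Normalizing with ImageNet statistics boils down to subtracting a per-channel mean and dividing by a per-channel standard deviation. The sketch below applies that to a single RGB pixel with values in the range 0 to 1, using the commonly published ImageNet channel statistics; the helper name `normalize_pixel` is hypothetical.

```python
# Commonly used ImageNet per-channel statistics (R, G, B), for pixels scaled to [0, 1].
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Standardize one RGB pixel channel by channel: (value - mean) / std."""
    return [
        (value - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    ]


at_mean = normalize_pixel(IMAGENET_MEAN)    # a pixel equal to the mean
mid_gray = normalize_pixel([0.5, 0.5, 0.5])  # a mid-gray pixel

print(at_mean)
print([round(channel, 3) for channel in mid_gray])
```

A pretrained network only ever saw inputs on this standardized scale, so feeding it raw pixel values would put its first layers outside the range they were trained for.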

Adding the test data to the training data list makes prediction easier later, with no extra preprocessing needed. Keep in mind that these images are not used for training or validation. This only ensures that the training images and the test images are preprocessed in exactly the same way.

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">learn = cnn_learner(train_img, models.densenet161, metrics=[error_rate, accuracy])
</code></pre>
The data preparation is now complete. Next, use cnn_learner to create a learner. As mentioned above, I use DenseNet as the pretrained network. Of course, you can also choose one of the other available networks.

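The One-Cycle Technique

Now you can start training. However, a big problem in deep learning training, convolutional neural networks included, is how to choose the right learning rate. The learning rate determines how large a step is taken when the parameters are updated during gradient descent.

As the figure above shows, a larger learning rate makes training faster, but it is also more likely to overshoot the minimum of the error, or even blow up and fail to converge. With a somewhat smaller learning rate, training is slower but does not diverge.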
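The trade-off can be reproduced on the simplest possible problem. The sketch below runs plain gradient descent on f(x) = x^2, whose minimum is at x = 0; the function, the starting point, and the three learning rates are assumptions chosen for illustration only.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2 * x              # derivative of x**2
        x -= learning_rate * gradient  # step against the gradient
    return x


too_small = gradient_descent(0.01)   # stable but slow: still far from 0
reasonable = gradient_descent(0.1)   # converges close to 0
too_large = gradient_descent(1.1)    # overshoots further each step and diverges

print(f"lr=0.01 -> x={too_small:.4f}")
print(f"lr=0.1  -> x={reasonable:.4f}")
print(f"lr=1.1  -> x={too_large:.4f}")
```

After the same 20 steps, the smallest rate has barely moved, the middle one is near the minimum, and the largest one has moved further away than where it started.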

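So choosing a suitable learning rate is very important. We want the largest learning rate that does not make training diverge.

But that is easier said than done.

That is why Leslie Smith proposed the one-cycle policy.

Put simply, you first brute-force search over several different learning rates, then pick the one that is close to the minimum error but still has room for improvement. The code is as follows: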
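The search described here can be imitated on a toy problem: keep training while the learning rate grows exponentially, record the loss at each rate, then pick a rate somewhat below the one where the loss bottomed out. This is a sketch of the general idea under simple assumptions (the loss is w^2, and "somewhat below" is taken to be a factor of 10); it is not fastai's implementation of lr_find.

```python
def lr_range_test(start_lr=1e-4, end_lr=10.0, num_steps=51):
    """Train on loss = w**2 while raising the learning rate exponentially."""
    weight = 1.0
    history = []
    for step in range(num_steps):
        # Interpolate the learning rate on a log scale between start_lr and end_lr.
        fraction = step / (num_steps - 1)
        lr = start_lr * (end_lr / start_lr) ** fraction
        weight -= lr * 2 * weight         # one gradient step at this rate
        history.append((lr, weight ** 2))  # record (learning rate, loss)
    return history


history = lr_range_test()

# Learning rate at which the recorded loss was lowest.
lr_at_min_loss, min_loss = min(history, key=lambda pair: pair[1])

# Assumed rule of thumb: back off by a factor of 10 to stay clear of divergence.
suggested_lr = lr_at_min_loss / 10

print(f"loss bottoms out near lr={lr_at_min_loss:.3f}")
print(f"suggested lr={suggested_lr:.3f}")
```

In this toy run the loss keeps falling until the learning rate approaches 1 and then climbs rapidly, so the suggestion lands a little below 0.1. On a real network the curve is noisier, which is why it is usually read off a plot rather than computed from a single minimum.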

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">learn.lr_find()          # try a range of learning rates
learn.recorder.plot()    # plot the loss against the learning rate
</code></pre>
You will get the following output:

The minimum error is at 10^-1, so we can use a learning rate slightly smaller than that, such as 3*10^-2.

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">lr = 3e-02
learn.fit_one_cycle(5, slice(lr))   # train for 5 epochs
</code></pre>
Train for a few epochs (I chose 5 here, neither too many nor too few) and look at the results.
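The shape of a one-cycle schedule can be sketched in a few lines: the learning rate warms up from a small value to the chosen maximum, then decays back down for the rest of training. The version below is a simplification with linear ramps; the warm-up fraction, the starting divisor, and the function name are assumptions, and a real implementation may use a different annealing curve and different defaults.

```python
def one_cycle_lr(step, total_steps, max_lr, warmup_fraction=0.3, start_divisor=25.0):
    """Learning rate at `step` for a simplified, piecewise-linear one-cycle schedule."""
    warmup_steps = int(total_steps * warmup_fraction)
    start_lr = max_lr / start_divisor
    if step < warmup_steps:
        # Warm-up: climb linearly from start_lr to max_lr.
        return start_lr + (max_lr - start_lr) * step / warmup_steps
    # Cool-down: fall linearly from max_lr towards zero.
    return max_lr * (total_steps - step) / (total_steps - warmup_steps)


total_steps = 100
schedule = [one_cycle_lr(step, total_steps, max_lr=3e-2) for step in range(total_steps)]

peak_step = schedule.index(max(schedule))
print(f"start={schedule[0]:.4f}, peak={schedule[peak_step]:.4f} at step {peak_step}, "
      f"end={schedule[-1]:.4f}")
```

Starting low keeps the early updates stable, the high middle section makes fast progress, and the low tail lets the weights settle.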

Wait, what is going on?!

The validation accuracy reached 100%! Training was also very efficient and took only six minutes. What a blessing! In practice, you may need many iterations before you find the right approach.

I can't wait to submit! Haha. Let's predict on the test set and submit the results.

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;"># Predict on the test images that were attached with add_test
preds, _ = learn.get_preds(ds_type=DatasetType.Test)
test_df.has_cactus = preds.numpy()[:, 0]
</code></pre>
Because the test images were added to the training image list earlier, they need no separate preprocessing.

 <pre style="box-sizing: border-box;font-size: 16px;color: rgb(62, 62, 62);line-height: inherit;text-align: start;background-color: rgb(255, 255, 255);"><code class="python language-python hljs" style="box-sizing: border-box;margin-right: 2px;margin-left: 2px;padding: 0.5em;font-size: 14px;color: rgb(169, 183, 198);line-height: 18px;border-radius: 0px;background: rgb(40, 43, 46);font-family: Consolas, Inconsolata, Courier, monospace;display: block;overflow-x: auto;letter-spacing: 0px;overflow-wrap: normal !important;word-break: normal !important;overflow-y: auto !important;">test_df.to_csv('submission.csv', index=False)
</code></pre>
The line of code above creates a CSV file containing the names of the 4,000 test images and the predicted cactus label for each.
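For reference, a submission file of this shape can also be produced with the standard library alone. In this sketch the image names and predicted values are made up, the `id` column name is an assumption, and an in-memory buffer stands in for the real submission.csv file.

```python
import csv
import io

# Hypothetical predictions: image name -> predicted value for has_cactus.
predictions = {
    "img_0001.jpg": 0.98,
    "img_0002.jpg": 0.03,
    "img_0003.jpg": 0.71,
}

# An in-memory buffer stands in for open("submission.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["id", "has_cactus"])  # header row
for image_name, value in predictions.items():
    writer.writerow([image_name, value])

submission_text = buffer.getvalue()
print(submission_text)
```

Opening the generated file and checking the header and the row count before uploading is a cheap way to catch formatting mistakes.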

When I tried to submit, I found that the CSV has to be submitted through a Kaggle kernel, which I had not noticed before.


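Fortunately, working in a kernel is very similar. You can copy and paste everything you created here and submit.

And then, ta-da, done!

Oh my! The score is 0.9999, which is already very good. Of course, even with such a good result on the first try, there should still be room for improvement.

So I adjusted the network architecture and tried again.

A score of 1! I did it!! So you can too; it really isn't that hard.

(By the way, this ranking is as of April 13, so my rank may well have dropped by now...)

What I Learned

This project is very simple, and you won't run into any odd challenges while solving it, so it is a great project to start with.

And since many people have already achieved a perfect score, I think the organizers should create another test set for submissions, preferably a harder one.

In any case, there is basically no difficulty in getting started with this project. You can try it right away and get a high score.

Source: 马里奥·姆拉德

Convolutional neural networks are effective for a wide range of tasks, from image recognition to image generation. Analyzing images is no longer as hard as it used to be. And of course, if you try, you can do it too.

So pick a good convolutional neural network project, prepare high-quality data, and get started!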
