How to Train an Image Classifier and Teach Your Computer Japanese

Introduction

Hi. Hello. こんにちは

Those squiggly characters you just saw are from a language called Japanese. You’ve probably heard of it if you’ve ever watched Dragon Ball Z.

Here’s the problem though: you know those ancient Japanese scrolls that make you look like you’re going to unleash an ultimate samurai ninja overlord super combo move?


Yeah, those. I can’t exactly read them, and it turns out that very few people can.

Luckily, a bunch of smart people understand how important it is that I master the Bijudama-Rasenshuriken, so they invented this thing called deep learning.


So pack your ramen and get ready. In this article, I’ll show you how to train a neural network that can accurately predict Japanese characters from  their images.

To ensure that we get good results, I’m going to use an incredible deep learning library called fastAI, which is a wrapper around PyTorch that makes it easy to implement best practices from modern research. You can read more about it on their .

With that said, let’s get started.



OK, so before we can create anime subtitles, we’re going to need a dataset. Today we’re going to focus on KMNIST.

This dataset takes examples of characters from the Japanese Kuzushiji script and organizes them into 10 labeled classes. The images measure 28x28 pixels, and there are 70,000 images in total, mirroring the structure of MNIST.

But why KMNIST? Well firstly, it has “MNIST” in its name, and we all know how much people in machine learning love MNIST.

So  in theory, you could just change a few lines of that Keras code that  you copy-pasted from Stack Overflow and BOOM! You now have computer code  that can .

Of  course, in practice, it isn’t that simple. For starters, the cute  little model that you trained on MNIST probably won’t do that well.  Because, you know, figuring out whether a number is a 2 or a 5 is just a  tad easier than deciphering a forgotten cursive script that only a  handful of people on earth know how to read.

Apart from that, I guess I should point out that Kuzushiji, which is what the “K” in KMNIST stands for, is not just 10 characters long. Unfortunately, I’m NOT one of the handful of experts who can read the language, so I can’t describe in intricate detail how it works.

But  here’s what I do know: There are actually three variants of these  Kuzushiji character datasets — KMNIST, Kuzushiji-49, and  Kuzushiji-Kanji.


Kuzushiji-49 is a variant with 49 classes instead of 10. Kuzushiji-Kanji is even more insane, with a whopping 3832 classes.


Yep, you read that right. That’s more than three times as many classes as the 1,000-class ImageNet benchmark.

How to Not Mess Up Your Dataset

To  keep things as MNIST-y as possible, it looks like the researchers who  put out the KMNIST dataset kept it in the original format (man, they  really took that whole MNIST thing to heart, didn’t they).


If you take a look at the KMNIST GitHub repo, you’ll see that the dataset is served in two formats: the original MNIST format, and a bunch of NumPy arrays.

Of course, I know you were probably too lazy to click that link. So here you go. You can thank me later.

Personally, I found the NumPy array format easier to work with when using fastai, but the choice is yours. If you’re using PyTorch, KMNIST comes for free as a part of torchvision.
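In case you’re wondering what “the original MNIST thing” actually is: it’s a tiny binary layout usually called IDX. There’s a 4-byte header (two zero bytes, a type code, and the number of dimensions), then one big-endian 32-bit size per dimension, then the raw pixel bytes. Here’s a minimal sketch of a parser using only the standard library. The names `parse_idx` and `image_at` are mine, not part of any library, and I’m assuming the file has already been un-gzipped and holds unsigned bytes, as MNIST-style files do.

```python
import struct


def parse_idx(raw: bytes):
    """Parse an uncompressed IDX file (the original MNIST layout).

    Returns (dims, data): a tuple of dimension sizes and the flat
    payload of unsigned 8-bit values.
    """
    # Header: two zero bytes, a type code, and the number of dimensions.
    zero, dtype, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype != 0x08:  # 0x08 means "unsigned byte"
        raise ValueError("not an unsigned-byte IDX file")
    # Each dimension size is a big-endian 32-bit unsigned integer.
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    data = raw[4 + 4 * ndim:]
    expected = 1
    for d in dims:
        expected *= d
    if len(data) != expected:
        raise ValueError("payload size does not match the header")
    return dims, data


def image_at(dims, data, index):
    """Return image number `index` as a list of rows of pixel values."""
    _, rows, cols = dims  # assumes an image file: (count, rows, cols)
    start = index * rows * cols
    return [
        list(data[start + r * cols: start + (r + 1) * cols])
        for r in range(rows)
    ]
```

Feed `parse_idx` the bytes of a decompressed KMNIST image file and `dims` should come back as (number of images, 28, 28).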

The  next challenge is actually getting those 10,000-year-old brush strokes  onto your notebook (or IDE, who am I to judge). Luckily, the GitHub repo  mentions that there’s this handy script called download_data.py that’ll  do all the work for us. Yay!

From here, it’ll probably start getting awkward if I continue talking  about how to pre-process your data without actual code. So check out if you want to dive deeper.

Moving on…


Should I Use a Hyper Ultra Inception ResNet XXXL?

Short Answer

Probably not. A regular ResNet should be fine.

A Little Less Short Answer

Ok, look. By now, you’re probably thinking, “KMNIST big. KMNIST hard. Me need to use very new, very fancy model.”

Did I overdo the Bizarro voice?


The point is, you DON’T need a shiny new model to do well on these image classification tasks.  At best, you’ll probably get a marginal accuracy improvement at the cost  of a whole lot of time and money.

Most of the time, you’ll just waste a whole lot of time and money.


So heed my advice: just stick to good old-fashioned ResNets. They work really well, they’re relatively fast and lightweight (compared to some of the other memory hogs like Inception and DenseNet), and best of all, people have been using them for a while, so they shouldn’t be too hard to fine-tune.

If the  dataset you’re working with is simple like MNIST, use ResNet18. If it’s  medium-difficulty, like CIFAR10, use ResNet34. If it’s really hard,  like ImageNet, use ResNet50. If it’s harder than that, you can probably  afford to use something better than a ResNet.

Don’t believe me? Check out my leading entry for the Stanford DAWNBench competition from April 2019:

What do you see? ResNets everywhere! Now come on, there’s got to be a reason for that.

Hyperparameters Galore

A few months ago, I wrote an article on .  If you’re interested in a more general solution to this herculean task,  go check that out. Here, I’m going to walk you through my process of  picking good-enough hyperparameters to get good-enough results on  KMNIST.

To start off, let’s go over what hyperparameters we need to tune.


We’ve  already decided to use a ResNet34, so that’s that. We don’t need to  figure out the number of layers, filter size, number of filters, etc.  since that comes baked into our model.

See, I told you it would save time.


So  what’s remaining is the big three: learning rate, batch size, and the  number of epochs (plus stuff like dropout probability for which we can  just use the default values).


Let’s go over them one by one.


Number of Epochs

Let’s  start with the number of epochs. As you’ll come to see when you play  around with the model in the notebook, our training is pretty efficient.  We can easily cross 90% accuracy within a few minutes.

So  given that our training is so fast in the first place, it seems  extremely unlikely that we would use too many epochs and overfit. I’ve  seen other KMNIST models train for over 50 epochs without any issues, so  staying in the 0-30 range should be absolutely fine.

That  means within the scope of the restrictions we’ve put on the model when  it comes to epochs, the more, the merrier. In my experiments, I found  that 10 epochs strike a good balance between model accuracy and training  time.

Learning Rate

What  I’m about to say is going to piss a lot of people off. But I’ll say it  anyway — We don’t need to pay too much attention to the learning rate.

Yep, you heard me right. But give me a chance to explain.

Instead of going “Hmm… that doesn’t seem to work, let’s try again with lr=3e-3,” we’re going to use a much more systematic and disciplined approach to finding a good learning rate.

We’re going to use the learning rate finder, a revolutionary idea proposed by Leslie Smith in his .

Here’s how it works:


  • First,  we set up our model and prepare to train it for one epoch. As the model  is training, we’ll gradually increase the learning rate.

  • Along the way, we’ll keep track of the loss at every iteration.

  • Finally, we select the learning rate that corresponds to the lowest loss.
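To make those steps concrete, here’s a toy version of the range test in plain Python. It’s a sketch, not fastai’s implementation: the “model” is a single weight fitting y = 3x, and the function name, the starting and ending learning rates, and the stop-when-the-loss-blows-up rule are all my own assumptions.

```python
import math
import random


def lr_range_test(start_lr=1e-5, end_lr=10.0, num_steps=100, seed=0):
    """Train briefly while growing the learning rate; return (lr, loss) pairs."""
    rng = random.Random(seed)
    # Toy dataset: y = 3x, so the ideal weight is 3.
    data = [(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(256))]
    w = 0.0
    # Multiply the learning rate by a constant factor at every step,
    # so it sweeps from start_lr to end_lr on a log scale.
    factor = (end_lr / start_lr) ** (1 / (num_steps - 1))
    lr = start_lr
    best = math.inf
    history = []
    for _ in range(num_steps):
        batch = rng.sample(data, 16)
        # Mean squared error on this mini-batch.
        loss = sum((w * x - y) ** 2 for x, y in batch) / len(batch)
        # Stop early once the loss is clearly diverging.
        if not math.isfinite(loss) or loss > 4 * best:
            break
        best = min(best, loss)
        history.append((lr, loss))
        # One plain SGD step, then bump the learning rate.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
        lr *= factor
    return history


history = lr_range_test()
lowest_lr, lowest_loss = min(history, key=lambda pair: pair[1])
```

Plot the losses in `history` against the learning rates (on a log axis) and you get the kind of curve described next.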


When all is said and done, and you plot the loss against the learning rate, you should see something like this:


Now, before you get all giddy and pick 1e-01 as the learning rate, I’ll have you know that it’s NOT the best choice.


That’s because fastai implements a smoothing technique called exponentially weighted averages, which is the deep learning researcher version of an Instagram filter. It prevents our plots from looking like the result of giving your neighbor’s kid too much time with a blue crayon.

Since  we’re using a form of averaging to make the plot look smooth, the  “minimum” point that you’re looking at on the learning rate finder isn’t  actually a minimum. It’s an average.

Instead, to actually find the learning rate, a good rule of thumb is to pick the learning rate that’s an order of magnitude lower than the minimum point on the smoothed plot. That tends to work really well in practice.
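Here’s a rough sketch of both ideas: smoothing the recorded losses with an exponentially weighted average, then backing off by a factor of 10 from the smoothed minimum. The function names and the `beta=0.98` default are assumptions for illustration, not a copy of fastai’s internals.

```python
def smooth(losses, beta=0.98):
    """Exponentially weighted average of `losses`, with bias correction."""
    avg = 0.0
    smoothed = []
    for i, loss in enumerate(losses, start=1):
        # Blend the running average with the newest loss.
        avg = beta * avg + (1 - beta) * loss
        # Early averages are biased toward 0; dividing corrects that.
        smoothed.append(avg / (1 - beta ** i))
    return smoothed


def suggest_lr(lrs, losses, beta=0.98):
    """Pick a learning rate 10x lower than the smoothed curve's minimum."""
    smoothed = smooth(losses, beta)
    i_min = min(range(len(smoothed)), key=smoothed.__getitem__)
    return lrs[i_min] / 10
```

So if the smoothed curve bottoms out around 1e-01, this hands you back something around 1e-02 instead.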

I understand that all this plotting and averaging might seem weird if you’ve been brute-forcing learning rate values all your life. So I’d advise you to check out to learn more.

Batch Size

OK, you caught me red-handed here. My initial experiments used a batch size of 128 since that’s what the top submission used.

I  know, I know. Not very creative. But it’s what I did. Afterward, I  experimented with a few other batch sizes, and I couldn’t get better  results. So 128 it is!

In general, batch size can be a weird thing to optimize, since it partially depends on the computer you’re using. If you have a GPU with more VRAM, you can train on larger batch sizes.

So  if I tell you to use a batch size of 2048, for example, instead of  getting that coveted top spot on Kaggle and eternal fame and glory for  life, you might just end up with a CUDA: out of memory error.


So  it’s hard to recommend a perfect batch size because, in practice, there  are clearly computational limits. The best way to pick it is to try out  values that work for you.

But how would you pick a random number from the vast sea of positive integers?


Well, you actually don’t. Since GPU memory is organized in bits, it’s a good idea to choose a batch size that’s a power of 2 so that your mini-batches fit snugly in memory.

Here’s what I would do: start off with a moderately large batch size like 512. Then, if you find that your model starts acting weird and the loss is not on a clear downward trend, halve it. Next, repeat the training process with a batch size of 256, and see if it behaves this time.

If it doesn’t, wash, rinse, and repeat.
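That loop is simple enough to write down. In the sketch below, `trains_ok` is a hypothetical callback that you’d supply yourself: it should run a short training job at the given batch size and return True if the loss is heading down (and nothing ran out of memory). The function name and defaults are my own.

```python
def find_batch_size(trains_ok, start=512, minimum=1):
    """Halve the batch size until `trains_ok(batch_size)` returns True.

    Returns the first batch size that works, or None if none does.
    """
    # Insist on a power of 2 so every halving stays a power of 2.
    if start < 1 or start & (start - 1):
        raise ValueError("start must be a power of 2")
    batch_size = start
    while batch_size >= minimum:
        if trains_ok(batch_size):
            return batch_size
        batch_size //= 2  # wash, rinse, and repeat
    return None
```

For example, if your setup only behaves at 128 or below, `find_batch_size(lambda bs: bs <= 128)` tries 512, then 256, and settles on 128.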


A Few Pretty Pictures

With  the optimizations going on here, it’s going to be pretty challenging to  keep track of this giant mess of models, metrics, and hyperparameters  that we’ve created.


To ensure that we all remain sane human beings while climbing the accuracy mountain, we’re going to use wandb.

So what does wandb actually do?


It automatically keeps track of a whole lot of statistics about your model and how it’s performing. But what’s really cool is that it also provides instant charts and visualizations to keep track of critical metrics like accuracy and loss, all in real time!

If  that wasn’t enough, it also stores all of those charts, visualizations,  and statistics in the cloud, so you can access them anytime anywhere.


Your days of staring at a black terminal screen and fiddling around with matplotlib are over.


for this article has a straightforward introduction to how it works seamlessly with fastai. You can also check out , where you can take a look at all the stuff I mentioned without writing any code.

Conclusion



That means “this is the end.”


But  you didn't need me to tell you that, did you? Not after you went  through the trouble of getting a Japanese character dataset, using the  learning rate finder, training a ResNet using modern best practices, and  watching your model rise to glory using real-time monitoring in the  cloud.

Yep, in about 20 minutes, you actually did all of that! Give yourself a pat on the back.

And please, go watch some Dragon Ball.




