
On the interpretability of models

A common criticism of deep learning models is that they are 'black boxes'. You put data in one end as your inputs, the argument goes, and you get some predictions or results out the other end, but you have no idea why the model gave you those predictions.

Ways of interpreting learning in computer vision models - credit https://thedatascientist.com/what-deep-learning-is-and-isnt/

This has something to do with how neural networks work: you often have many layers doing the 'learning', and each successive layer can recognise more features or greater levels of abstraction. In the above image, you can get a sense of how the earlier layers (on the left) learn basic contour features, and these then get abstracted together into more general face features and so on.

Some of this also has to do with the fact that when you train your model, you do so assuming that the model will be used on data that the model hasn't seen. In this (common) use case, it becomes a bit harder to say exactly why a certain prediction was made, though there are a lot of ways we can start to open up the black box.
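
One simple way to start opening the box is to look at intermediate activations directly. Here's a minimal sketch using PyTorch forward hooks (the model choice, the layer names, and the recent-torchvision weights syntax are just assumptions for illustration, not a prescribed method):

    import torch
    from torchvision import models

    # Grab a small pretrained CNN (the 'weights' argument assumes a
    # recent torchvision version).
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.eval()

    activations = {}

    def save_activation(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    # Hook an early and a late layer so we can compare what they produce.
    model.layer1.register_forward_hook(save_activation("early"))
    model.layer4.register_forward_hook(save_activation("late"))

    # A random tensor stands in for a real photo here.
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        model(x)

    for name, act in activations.items():
        print(name, tuple(act.shape))
    # 'early': many high-resolution, low-level feature maps
    # 'late': fewer, coarser, more abstract feature maps

Plotting those feature maps, rather than just printing their shapes, is roughly how visualisations like the one above get made.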

FastAI Lesson Zero: video notes

[These are mainly notes for myself, based on Jeremy Howard's 'Lesson 0' video that was recently posted. They don't capture the entirety of what was said, but they include the pieces that felt relevant to me now or that might be relevant to me in the future.]

  • decide when you’re studying
    • be precise about how much time you’re going to spend
    • think about how it’s going to fit into your day or your life
    • give yourself deadlines and goals, perhaps, but also don't worry if disruptions happen.
    • If something does come up, get back on the horse and keep going. (Tenacity counts for a lot.)
  • Finish. The. Course.
    • make a commitment to complete the course, and then actually follow through on it all the way to the end.
  • Finish a Project
    • build a project and make it really great.
    • You’ll probably have several projects here and there that you work on during the course of the fastai course, but at a minimum make sure you pick one of those and make it really great.
    • (It doesn’t have to be unique or world-changing. Even replicating something that’s already in existence can still be worth it).
  • Find good role models
  • Learn by doing. Use what you learn and figure out the rest as you go. (Don't get paralyzed by trying to learn 'pre-requisites' like complex mathematics topics, especially since most of them aren't actually needed to become a practitioner.)
  • Share and communicate your work
    • (Jeremy doesn’t mention the book, but I’ll insert here that the book “Show Your Work” by Austin Kleon is a great starter on this point).
    • If you consistently blog during your studies, at the end of it you’ll likely have a huge collection of artefacts of that study, showing what you’ve learned and accomplished.
    • Alongside that, being a good citizen and contributing in the forums etc is also a really solid way to extend whatever knowledge you have to others, and quite possibly cement things in your own mind as you reply.
  • How to do a lesson
    • watch the video / read the chapter
    • Run the notebook & experiment — play around with things + make sure you actually understand what’s happening
    • Reproduce the notebook from scratch — (and really start with nothing here, and try to reimplement whatever was done during the lesson. From previous experience, this work will be hard, but it’s super worth it. Recall learning is the best kind of learning)
    • Repeat with a different dataset — use the techniques you learned in the course on a dataset of your own / or solve some related problem using these techniques
  • Using a notebook server vs a full linux server
    • the notebook server allows you to get going much faster
    • A full linux server is more ‘your own’ and you get to also practice a bunch of other not-specifically-deep-learning skills along the way
    • With the fastsetup library, Jeremy has made getting going with an EC2 instance pretty easy.
    • the video spends a fair amount of time showing how to do this with Colab Notebooks and an AWS EC2 instance. Refer to the FastAI website and the full video for more details.
  • Get better as a developer
    • just doing the course, you’ll also work on your development skills along the way
    • Two important things to do to help with this:
      • Read a lot of code
      • Write a lot of code
  • Start with a simple baseline & get a basic end-to-end solution up and running
    • When you’re working on a project, it’s a really good idea to start with a naive / super-basic baseline so that you know whether you’re making progress or whether you’re achieving anything with the work you’re doing.
    • Successful ML projects that Jeremy has seen start with the simplest possible end-to-end solution and then incrementally grow from there.
    • The work of getting your pipeline working / your data imported etc will take a bit of time, and if you get that all sorted upfront it’ll help you focus on the actual work you want to be focused on.
  • (At some point during the course) join a Kaggle competition and make a serious attempt to do your best
    • just getting a model on the leaderboard tests your knowledge and your skills
    • just work regularly on things, show up every day, try to make your model a little better each day
  • For getting a job in the space
    • having a public-facing portfolio of writings and projects will take you a really long way
    • Some companies care more about candidates having the right credentials and will never choose you without them.
    • Startups are a great place where this matters less.
  • Try to take the second course
    • The first course gets you going as a practitioner of deep learning, but the second course allows you to implement algorithms and models from scratch and digs far more into the depths of the subject.
    • Jeremy wishes more people would take part two + encourages them to do so.
  • The fastsetup library is great for installing everything on an Ubuntu machine (like an AWS EC2 instance)
  • Experiment tracking software
    • The two big players are TensorBoard and Weights & Biases.
    • Jeremy doesn't use them; he finds it too tempting to spend time watching models train instead of doing something else that is probably more valuable.
    • There are some cases where it might help to use this software.
    • Weights & Biases seems like a good company to work for & they’ve hired FastAI grads in the past.

Arthur Samuel and the 'Frontier of Automation'

Neural networks are a powerful pattern, but it's worth remembering that they are part of the broader category of machine learning. (You can think of 'deep learning' as a rebranding of neural networks, or of what was once more commonly referred to as connectionism.)

In a classic essay published in 1962, an IBM researcher called Arthur Samuel proposed a way to have computers 'learn', a different process from how we normally code things up imperatively (see my previous post for more on this):

"Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment so as to maximise the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programmed would "learn" from its experience."

Within this essay and this quote specifically, we can find some of the key building blocks of machine learning:

[Diagram: Samuel's training loop, with inputs and weights feeding a model whose results are scored by a performance measure that updates the weights]

We have our inputs (our data) and our weights. Our weights (or the weight assignments) are variables that allow for different configurations and behaviours of our model. Our results are what the model produces from the inputs and the current weights, and we have some kind of metric (our performance) to judge whether the model was accurate or not. The computer then updates the weights based on that performance, tweaking them to try to get better performance.

Here is a slightly amended version using the language and jargon more commonly found today. As you might expect, the terms used in the 1960s are in many cases different from those used now:

[Diagram: the same loop using modern terminology, with labels, predictions, and a loss in place of results and performance]

The main difference here is that we have some labels which are used to know whether the predictions are correct or not. The loss is a way of measuring the performance of our model that is suited to updating our parameters (what used to be called weights).
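
To make that concrete, here's a minimal sketch of the loop in PyTorch, fitting a toy linear model (the data, starting weights, and learning rate are all made up for illustration):

    import torch

    # Toy data: points from the line y = 3x + 1 (made up for illustration).
    xs = torch.linspace(0, 1, 20)
    ys = 3 * xs + 1

    # The 'weight assignment' (Samuel's weights, today's parameters).
    params = torch.tensor([1.0, 0.0], requires_grad=True)

    for step in range(1000):
        w, b = params
        preds = w * xs + b                   # results / predictions
        loss = ((preds - ys) ** 2).mean()    # performance measure / loss
        loss.backward()                      # how should each weight change?
        with torch.no_grad():
            params -= 0.1 * params.grad      # alter the weight assignment
            params.grad.zero_()

    print(params)  # ends up very close to (3.0, 1.0)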

Telling Cats from Dogs

One of the main ways that training neural networks differs from traditional (imperative) programming can be illustrated with a specific task: let's say you want a computer to tell you whether any particular photo you give it shows a cat or a dog.

An imperative approach might be to make a mega list of the features that cats and dogs have, and try to encode the differences as a set of logical rules. But even knowing how to tell the computer to recognise those features is a potentially massive project. How would you even go about that?

With neural network layers, we turn that pattern on its head. Instead of this:

[Diagram: the traditional approach, where inputs go into a hand-written program and results come out]

We have something closer to this:

[Diagram: inputs and weights go into a model, and results come out]

So the neural network learns on the basis of the data provided to it: you give it a bunch of images which you've pre-labelled to say 'this one is a cat and that one is a dog' and so on.

If you use transfer learning, too, you can even use a pretrained model (which is already pretty good at recognising features in images). You can then fine-tune that model to get really good at the specific task you need it to do. (Note that this is exactly what you do in the first chapter of Howard/Gugger's Deep Learning for Coders.)
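
For reference, the chapter-one version of this looks roughly like the following (quoted from memory, so exact names may vary: vision_learner was called cnn_learner in older fastai releases):

    from fastai.vision.all import *

    # Oxford-IIIT Pets: cat filenames start with an uppercase letter.
    path = untar_data(URLs.PETS)/'images'

    def is_cat(x):
        return x[0].isupper()

    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2, seed=42,
        label_func=is_cat, item_tfms=Resize(224))

    # Start from a model pretrained on ImageNet, then fine-tune it
    # for one epoch on the cats-vs-dogs task.
    learn = vision_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(1)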

Small, unexpectedly powerful boxes

Graphics Processing Units, or GPUs, are what your computer uses to quickly render your screen. Most computers (desktop or laptop) have one, used to keep the screen refreshed and display everything at effectively realtime speed. The world of gaming is also, perhaps unsurprisingly, quite dependent on fast GPU performance, with Nvidia the leading provider of this hardware.


It was discovered a while back that GPUs are also pretty great at performing certain kinds of computation at incredible speed: above all the big matrix multiplications at the heart of neural network training, which parallelise extremely well. Calculations that would take ages on a standard CPU are much faster when run on a GPU. For this reason, GPUs are the hardware of choice for training deep learning models.
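
A quick, illustrative way to see the difference for yourself in PyTorch (this assumes a CUDA-capable machine, and timings will vary wildly by hardware):

    import time
    import torch

    # Two largish matrices; the sizes are arbitrary.
    a = torch.randn(4096, 4096)
    b = torch.randn(4096, 4096)

    start = time.perf_counter()
    a @ b
    print(f"CPU: {time.perf_counter() - start:.3f}s")

    if torch.cuda.is_available():
        a_gpu, b_gpu = a.cuda(), b.cuda()
        torch.cuda.synchronize()      # make sure the copies are done
        start = time.perf_counter()
        a_gpu @ b_gpu
        torch.cuda.synchronize()      # wait for the multiply to finish
        print(f"GPU: {time.perf_counter() - start:.3f}s")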

GPUs also happen to be heavily used (for similar reasons) for cryptocurrency mining and accordingly there has been a worldwide shortage for some time. Between the crypto bros and the deep learning practitioners, the price got inflated for a while. Nvidia has made some attempts to limit crypto miners from using their hardware, but to inconclusive effect.

Held back by misunderstanding

The field of deep learning seems to have had a rough journey into public consciousness and adoption. In particular, two theoretical misunderstandings led to funding being pulled and energy and attention moving away from the field:

  1. Minsky/Papert's book Perceptrons showed how a neural network using only one layer was unable to learn some critical functions like XOR. Later in the same book, they showed how using more layers addresses this problem completely, but for some reason the 'fix' was ignored and people fixated on the single layer and its drawbacks. (See the sketch after this list.)
  2. By the 1980s, many people were using two layers in their neural networks, which did solve the problems identified in Perceptrons, and neural networks were being used to solve real problems. But the approach was unwieldy in that form: you could theoretically approximate any mathematical function with two layers, yet doing so was impractical and slow. People took this to mean the principle itself was broken, when really two layers were just not enough, and the number of layers needed to keep increasing.
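
Both halves of that story are easy to verify in a few lines of PyTorch. A minimal sketch, with layer sizes and training settings picked arbitrarily:

    import torch
    from torch import nn

    # The XOR truth table.
    X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = torch.tensor([[0.], [1.], [1.], [0.]])

    def train(model, steps=5000):
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = nn.BCEWithLogitsLoss()
        for _ in range(steps):
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            opt.step()
        return (model(X).sigmoid() > 0.5).float().flatten()

    # A single layer is purely linear: it can't separate XOR.
    print(train(nn.Linear(2, 1)))          # never matches [0, 1, 1, 0]

    # Add a hidden layer and a nonlinearity, and it (usually) learns XOR fine.
    print(train(nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))))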

These are two key misunderstandings identified by the Howard/Gugger short introduction, and I'm sure I'll read more about them in Genius Makers. It's amazing, but not entirely surprising, that an ungenerous and unimaginative misreading of the literature could be responsible for such an effective trashing of a research path.

PDP: a precursor to modern neural networks?

Parallel Distributed Processing: Explorations in the Microstructure of Cognition, a multi-volume publication by David Rumelhart, James McClelland and the PDP Research Group, was released in 1986 and is recognised as one of the most important works relating to neural networks.

PDP (1986)

They lay out eight features necessary to perform what they called 'parallel distributed processing' (which I suppose you can think of as a sort of precursor to modern-day deep learning):

  • processing units
  • a state of activation
  • an output function for each processing unit
  • a pattern of connectivity among units
  • a propagation rule (for propagating patterns of activity through the network)
  • an activation rule
  • a learning rule (where 'patterns of connectivity are modified by experience')
  • an environment in which the system operates

I haven't read the book, and I don't fully understand all these different pieces, but it isn't particularly hard to see in these features the pattern of what modern-day neural networks would later handle. The vocabulary used to describe it is slightly different, but you have the connectivity between neurons, and you have a process through which you update the layers…
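
As a rough illustration of that mapping (my own reading, not the book's), most of those eight features already have an obvious home in a few lines of modern code:

    import numpy as np

    rng = np.random.default_rng(0)

    n_in, n_out = 4, 3                        # processing units
    W = rng.normal(size=(n_in, n_out))        # pattern of connectivity

    x = rng.normal(size=n_in)                 # input from the 'environment'
    a = x @ W                                 # propagation rule -> state of activation
    out = np.tanh(a)                          # activation/output function

    # Learning rule: connectivity 'modified by experience', here via a
    # simple error-driven update (made-up target and learning rate).
    target = np.ones(n_out)
    W += 0.1 * np.outer(x, target - out)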

This feels like a book that would reward returning to for a proper in-depth read later on in my studies.

Rosenblatt's Mark I Perceptron

I've now read a little about Rosenblatt's Perceptron in two different places: in the Howard/Gugger Deep Learning book, and also in Cade Metz' Genius Makers.

The Mark I Perceptron

Built in 1958, it is usually described as the first machine based on the principle of the artificial neuron. It used a single layer in this initial configuration, and even in that simple form you could already see glimpses of where it might go.
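
For flavour, the perceptron's learning rule fits in a few lines. Here's a sketch on a toy, linearly separable task (AND), the kind of thing a single layer can learn:

    import numpy as np

    # AND is linearly separable, so a single-layer perceptron can learn it.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])

    w = np.zeros(2)
    b = 0.0

    for epoch in range(10):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            # Rosenblatt's rule: nudge the weights only when wrong.
            w += (target - pred) * xi
            b += (target - pred)

    print(w, b)  # a line that separates (1, 1) from the rest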

Unfortunately, Marvin Minsky and Seymour Papert's seemingly perceptive but damning assessment of the perceptron as a technology without a future ushered in the first of the so-called 'AI winters', and the idea of using neural networks was buried for years.

Thankfully, some ignored the herd and stuck with it.

Deep Learning: Best in Show?

Deep Learning is an incredibly powerful technology and there are a number of (focused / specific) areas where it already surpasses human-level abilities. Here are some examples:

  1. Translation: If you haven't been watching closely, the quality of Google Translate translations has really improved in recent years. This 2016 story is a little dated, but it explains how Google made a big push a few years back, and things continue to improve as the technology does.
  2. X-ray interpretation: In a matter of a few years, the performance of Deep Learning in reading and making diagnoses from x-rays has surpassed top radiology practitioners. See how DeepMind raised the bar on identifying breast cancer.
  3. Playing Go: Watch the AlphaGo documentary if you haven't already.
  4. Protein Folding: Check out AlphaFold from last November, where DeepMind blasted through a notoriously complicated problem in biology.
  5. Colourising images: A former fast.ai student, Jason Antic, made great progress with his work on DeOldify.

The really great thing about the fastai course is how successfully it has managed to democratise deep learning as a technology. I always enjoy reading about niche areas where specific burning problems were solved because someone took the opportunity to educate themselves.

Removing Barriers: Deep Learning Edition

I've been re-reading Jeremy Howard & Sylvain Gugger's Deep Learning for Coders with Fastai and PyTorch and I really appreciate the reminder that a lot of barriers to entry into the Deep Learning space can be productively put to one side.

Gatekeepers make four big claims:

  1. You need lots of maths to use Deep Learning to solve problems
  2. You need lots of data (think prodigious, Google-sized quantities) to use Deep Learning
  3. You need lots of expensive computers and custom hardware to use Deep Learning
  4. You need a PhD, preferably in Maths or Physics or some computation-heavy science

Needless to say, it's not that maths, more data, or better hardware won't help or improve your experience. But to say that you shouldn't start without those things is inaccurate and unhelpful.

If you are a domain expert in something that has nothing to do with Deep Learning or data science, you probably have a lot of problems that are like low-hanging fruit in terms of your ability to use powerful techniques like Deep Learning to solve them.