Technology

Automating social media posting for my new blogposts

I love blogging and I've benefitted a lot from it ever since I started my first Geocities page in the mid-1990s. I maintain a technical blog at mlops.systems and a somewhat less technical blog at alexstrick.com/blog, though I hope at some point to merge the two.

In the past I would have been content to make sure my blog published an RSS feed, knowing that anyone who wanted to follow what I was writing could do so simply by connecting their feed reader and subscribing. In recent years, though, I've become more conscious of a healthy brew of ambivalence, ignorance or even outright hostility towards the very idea of RSS feeds and readers. It seems many people no longer have RSS as an essential part of their informational hygiene. (I'll put my sadness / confusion about this to one side for now.)

And if I love blogging, I really dislike having to post my new blog posts to social media one by one, coming up with some catchy yet not overly breathless summary of what I wrote, since this is apparently what many people use instead of RSS.

I've been grumbling under my breath about this situation for a few years now, but when ChatGPT came out it seemed like an obvious use case: summarise my blogpost and repost it to all my social media accounts, taking into account their particular needs. (Mastodon uses hashtags more than the others, LinkedIn posts can be a bit longer, Twitter needs to be a bit shorter, and so on.)

I held off, thinking I'd want to set up some system fully under my control involving serverless function calls and so on, but then I was reminded that I already use Zapier for some other administrative tasks. So this afternoon I set up and turned on some automation for social media posting to my Mastodon, Twitter and LinkedIn accounts. Posting happens at one step removed since I queue my posts in Buffer so that they go out at a time when people are more likely to see them. I apologise / don't apologise for this. My blog writings remain wholly un-automated; it would completely defeat the point of 'learning through writing' if I were to automate the things that I blog about. My social media postings (just one post per blogpost so as not to spam you all) are from now on automated. As an additional courtesy / discourtesy, I've tweaked the prompt such that the social media posts should always read just slightly 'off' and will be labelled with an #automated hashtag.
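
For the curious, here's a rough Python sketch of the kind of per-platform formatting this involves. The real thing lives in Zapier's UI, so every name, limit and hashtag below is illustrative rather than my actual configuration.

```python
# Hypothetical sketch of the per-platform formatting logic; the real setup
# lives in Zapier's UI, so names, limits and hashtags here are illustrative.

PLATFORM_RULES = {
    "mastodon": {"max_chars": 500, "extra_hashtags": "#mlops #blogging"},
    "linkedin": {"max_chars": 1300, "extra_hashtags": ""},
    "twitter": {"max_chars": 280, "extra_hashtags": ""},
}


def build_post(summary: str, url: str, platform: str) -> str:
    """Turn an LLM-generated summary into a platform-appropriate post."""
    rules = PLATFORM_RULES[platform]
    tail_parts = [url, rules["extra_hashtags"], "#automated"]  # always label automated posts
    tail = " " + " ".join(part for part in tail_parts if part)
    room = rules["max_chars"] - len(tail)  # trim the summary to fit the platform
    return summary[:room].rstrip() + tail


print(build_post(
    "Wrote up how I automated posting my new blogposts to social media.",
    "https://mlops.systems/posts/example",
    "mastodon",
))
```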

On the interpretability of models

A common criticism of deep learning models is that they are 'black boxes'. You put data in one end as your inputs, the argument goes, and you get some predictions or results out the other end, but you have no idea why the model gave you those predictions.

Ways of interpreting learning in computer vision models - credit https://thedatascientist.com/what-deep-learning-is-and-isnt/

This has something to do with how neural networks work: you often have many layers doing the 'learning', and each successive layer may be able to interpret or recognise more features or greater levels of abstraction. In the above image, you can get a sense of how the earlier layers (on the left) learn basic contour features, which then get abstracted together into more general face features and so on.

Some of this also has to do with the fact that when you train your model, you do so assuming that the model will be used on data that the model hasn't seen. In this (common) use case, it becomes a bit harder to say exactly why a certain prediction was made, though there are a lot of ways we can start to open up the black box.
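
One concrete way to start opening that box is simply to look at the intermediate activations. Here's a minimal PyTorch sketch, assuming a recent torchvision and using resnet18 purely as a stand-in model, which registers forward hooks on an early and a late layer and prints the shape of what each one 'sees'.

```python
import torch
from torchvision.models import resnet18

# Register forward hooks on an early and a late layer, then run a dummy
# image through the model and inspect what each layer produced.
model = resnet18(weights=None).eval()
activations = {}


def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook


model.layer1.register_forward_hook(save_activation("layer1"))  # early: fine detail
model.layer4.register_forward_hook(save_activation("layer4"))  # late: more abstract

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # a random stand-in 'image'

for name, act in activations.items():
    print(name, tuple(act.shape))
```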

Arthur Samuel and the 'Frontier of Automation'

The use of neural network architectures is a powerful pattern, but it's worth remembering that this pattern is part of the broader category of machine learning. (You can think of 'deep learning' as a rebranding of neural networks, or what was once more commonly referred to as connectionism.)

In a classic essay published in 1962, an IBM researcher called Arthur Samuel proposed a way to have computers 'learn', a different process from how we normally code things up imperatively (see my previous post for more on this):

"Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment so as to maximise the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programmed would "learn" from its experience"

Within this essay and this quote specifically, we can find some of the key building blocks of machine learning:

Diagram: Samuel's basic training loop, in which inputs and weights feed into a model, producing results whose performance is measured and used to update the weights

We have our inputs (our data) and our weights. Our weights (or the weight assignments) are variables that allow for different configurations and behaviours of our model. Our results are what the model produces based on the inputs and the current weights, and we have some kind of metric (our performance) to judge whether the model was accurate or not. The computer then updates the weights based on that performance, tweaking them so that performance improves.

Below is a slightly amended version with the language and jargon more commonly found today. As you might expect, the terminology used in the 1960s is in many cases different from what gets used now:

Diagram: the same loop in modern terms, in which inputs and parameters feed into a model, producing predictions that are compared with labels via a loss, which is used to update the parameters

The main difference here is that we have some labels, which are used to check whether the predictions are correct or not. The loss is a way of measuring the performance of our model that is suited to updating our parameters (which used to be referred to as weights).
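
To make the loop concrete, here is a minimal sketch in PyTorch with made-up data. It isn't anything from Samuel's essay, just the same cycle spelled out in code: inputs and parameters produce predictions, a loss compares them with the labels, and the parameters get nudged to improve performance.

```python
import torch

# Made-up data with a known 'true' relationship for the model to recover.
inputs = torch.randn(100, 3)
labels = inputs @ torch.tensor([2.0, -1.0, 0.5]) + 0.3

params = torch.zeros(3, requires_grad=True)  # what Samuel called the 'weight assignment'
bias = torch.zeros(1, requires_grad=True)

for step in range(200):
    predictions = inputs @ params + bias          # the 'results'
    loss = ((predictions - labels) ** 2).mean()   # the measure of 'performance'
    loss.backward()                               # work out how to alter the weights
    with torch.no_grad():                         # the 'automatic means' of updating them
        params -= 0.1 * params.grad
        bias -= 0.1 * bias.grad
        params.grad.zero_()
        bias.grad.zero_()

print(params, bias, loss.item())  # params should end up close to [2.0, -1.0, 0.5]
```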

Telling Cats from Dogs

One of the main ways in which training models with neural networks differs from traditional (imperative) programming can be illustrated with a specific task: let's say you want a computer to tell you whether any particular photo you give it shows a cat or a dog.

An imperative approach might be to make a mega list of the kinds of features that cats and dogs have, and try to encode the differences as some kind of list of logical rules. But even knowing how to tell the computer to recognise those features is a potentially massive project. How would you even go about that?

With neural network layers, we turn that pattern on its side. Instead of this:

Diagram: the traditional programming pattern, where a hand-written program plus inputs produces the results

We have something closer to this:

The neural network learns on the basis of the data provided to it — you give it a bunch of images which you've pre-labelled to say 'this one is a cat and that one is a dog', and so on.

If you use transfer learning, too, you can even use a pretrained model (which is already pretty good at recognising features from images). You can then fine-tune that model to get really good at the specific task you need it to do. (Note: that's exactly what you do in the first chapter of Howard/Gugger's Deep Learning for Coders.)
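
For reference, that chapter 1 exercise looks roughly like the following. This is written from memory, so treat the details as approximate; depending on your fastai version the learner constructor is called cnn_learner or vision_learner.

```python
from fastai.vision.all import *

# Download the Oxford-IIIT Pets images, label them cat/dog by filename
# convention, then fine-tune a pretrained resnet34 on that task.
path = untar_data(URLs.PETS)/'images'


def is_cat(x):
    return x[0].isupper()  # in this dataset, cat images have capitalised filenames


dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = vision_learner(dls, resnet34, metrics=error_rate)  # start from a pretrained model
learn.fine_tune(1)                                         # adapt it to cats vs dogs
```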

Small, unexpectedly powerful boxes

Graphics Processing Units, or GPUs, are what your computer uses to quickly display your screen. Most computers (desktop or laptop) have one, and they are used to good effect to keep the screen refreshed and display everything at effectively real-time speed. The world of gaming is also, perhaps unsurprisingly, quite dependent on fast GPU performance, with Nvidia the leading provider of this hardware.

An Nvidia GPU

It was discovered a while back that GPUs are also pretty great at performing certain kinds of computation at incredible speed. Certain calculations which would take ages to complete on a standard CPU run much faster on a GPU. For this reason, they're the hardware of choice for training deep learning models.
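
If you have PyTorch installed, a rough way to see the difference for yourself is to time the same big matrix multiplication on the CPU and (if one is available) on the GPU. The timing approach here is deliberately crude, just enough to show the gap.

```python
import time

import torch

a = torch.randn(4000, 4000)
b = torch.randn(4000, 4000)

start = time.time()
a @ b                                      # one big matrix multiplication on the CPU
print(f"CPU: {time.time() - start:.3f}s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()               # don't let async kernel launches skew the timing
    start = time.time()
    a_gpu @ b_gpu
    torch.cuda.synchronize()
    print(f"GPU: {time.time() - start:.3f}s")
else:
    print("No GPU available on this machine.")
```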

GPUs also happen to be heavily used (for similar reasons) for cryptocurrency mining and accordingly there has been a worldwide shortage for some time. Between the crypto bros and the deep learning practitioners, the price got inflated for a while. Nvidia has made some attempts to limit crypto miners from using their hardware, but to inconclusive effect.

Held back by misunderstanding

The field of deep learning seems to have had a rough journey into public consciousness and adoption. In particular, two theoretical misunderstandings led to funding being pulled and to energy and attention moving away from the field:

  1. Minsky/Papert's book Perceptrons showed how a neural network using only a single layer was unable to learn some critical functions like XOR. Later in the same book, they showed how using more layers addresses this problem completely, but for some reason the 'fix' was ignored and people fixated on the drawbacks of the single-layer case. (There's a small sketch of the XOR point after this list.)
  2. By the 1980s, many people were using two layers in their neural networks, and while this did solve the problems identified in Perceptrons and people were using neural networks on real problems, the approach was unwieldy in that form. Yes, you could theoretically approximate any mathematical function with two layers, but in practice it was slow and impractical. People took this to mean that the whole approach was broken, whereas the real issue was simply that two layers were not enough and that the number of layers needed to keep growing.
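
Here's the XOR sketch promised above: a tiny PyTorch network with one hidden layer learning the XOR truth table, which a single-layer perceptron provably cannot do. The layer sizes and training settings are just ones that happen to work, nothing canonical.

```python
import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])  # the XOR truth table

model = nn.Sequential(
    nn.Linear(2, 8),   # the hidden layer is what makes XOR learnable
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),
)

opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.BCELoss()

for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

print(model(X).round().squeeze())  # should come out as tensor([0., 1., 1., 0.])
```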

These are two key misunderstandings identified in the Howard/Gugger short introduction, and I'm sure I'll read more about them in Genius Makers. It's amazing, but not entirely surprising, that an ungenerous and unimaginative misreading of the literature could be responsible for such an effective trashing of a research path.

PDP: a precursor to modern neural networks?

Parallel Distributed Processing: Explorations in the Microstructure of Cognition, a multi-volume publication by David Rumelhart, James McClelland and the PDP Research Group, was released in 1986 and is recognised as one of the most important works relating to neural networks.

PDP (1986)

They lay out eight features necessary to perform what they called 'parallel distributed processing' (which I suppose you can think of as a sort of precursor to modern-day deep learning):

  • processing units
  • a state of activation
  • an output function for each processing unit
  • a pattern of connectivity among units
  • a propagation rule (for propagating what is learned through the network)
  • an activation rule
  • a learning rule (where 'patterns of connectivity are modified by experience')
  • an environment in which the system operates

I haven't read the book, and I don't fully understand all these different pieces, but it isn't particularly hard to see in these features the pattern of what would later be handled by modern-day neural networks. The vocabulary used to describe it is slightly different, but you have the connectivity between neurons, and you have a process through which the network gets updated…
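
As an exercise for myself, here's a loose mapping of some of that vocabulary onto a single modern-style layer in numpy. This is my own attempt at the correspondence, not anything taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

inputs = rng.normal(size=4)                  # a stand-in for the 'environment'
weights = rng.normal(size=(3, 4))            # the 'pattern of connectivity among units'

pre_activation = weights @ inputs            # the 'propagation rule'
activation = np.maximum(pre_activation, 0)   # the 'activation rule' (here, a ReLU)
output = activation                          # the 'output function' (here, the identity)

# The 'learning rule' would be whatever modifies `weights` from experience;
# in modern networks that's gradient descent on a loss.
print(output)
```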

This feels like a book that would reward returning to for a proper in-depth read later on in my studies.

Rosenblatt's Mark I Perceptron

I've now read a little about Rosenblatt's Perceptron in two different places: in the Howard/Gugger Deep Learning book, and also in Cade Metz' Genius Makers.

The Mark I Perceptron

Built in 1958, it is usually described as the first machine based on the principle of the artificial neuron. It used a single layer in this initial configuration, and even in that simple form you could already see glimpses of where the idea might go.

Unfortunately, Marvin Minsky and Seymour Papert's apparently perceptive but also damning assessment of the perceptron as a technology without a future ushered in the first of the so-called 'AI winters', and the idea of using neural networks was buried for several years.

Thankfully, some ignored the herd and stuck with it.

Deep Learning: Best in Show?

Deep Learning is an incredibly powerful technology and there are a number of (focused / specific) areas where it already surpasses human-level abilities. Here are some examples:

  1. Translation: If you haven't been watching closely, the quality of Google Translate has really improved in recent years. This 2016 story is a little dated, but it explains how they made a big push a few years back, and the quality continues to improve as the technology does.
  2. X-ray interpretation: In a matter of a few years, the performance of Deep Learning in reading and making diagnoses from x-rays has surpassed that of top radiology practitioners. See how DeepMind raised the bar on identifying breast cancer.
  3. Playing Go: Watch the AlphaGo documentary if you haven't already.
  4. Protein Folding: Check out AlphaFold from last November, where DeepMind blasted through a notoriously complicated problem in biology.
  5. Colourising images: A former fast.ai student, Jason Antic, made great progress with his work on DeOldify.

The really great thing about the fastai course is how successfully it has managed to democratise Deep Learning as a technology. I always enjoy reading about niche areas where specific burning problems were solved because someone took the opportunity to educate themselves.

Removing Barriers: Deep Learning Edition

I've been re-reading Jeremy Howard & Sylvain Gugger's Deep Learning for Coders with Fastai and PyTorch and I really appreciate the reminder that a lot of barriers to entry into the Deep Learning space can be productively put to one side.

Gatekeepers make four big claims:

  1. You need lots of maths to use Deep Learning to solve problems
  2. You need lots of data (think prodigious, Google-sized quantities) to use Deep Learning
  3. You need lots of expensive computers and custom hardware to use Deep Learning
  4. You need a PhD, preferably in Maths or Physics or some computation-heavy science

Needless to say, it's not that maths or more data or better hardware won't help or improve your experience. But to say that you shouldn't start if you don't have those things is inaccurate and unhelpful.

If you are a domain expert in something that has nothing to do with Deep Learning or data science, you probably have a lot of problems that are low-hanging fruit for powerful techniques like Deep Learning.

Using APIs to make things happen on the web

Towards the end of the Launch School core syllabus, we start to work with API calls. These allow us to pass information between servers. It turns out this is a really useful thing to be able to do.

Many if not most of the things you do online involve API calls. That little widget on the side of the web page that shows today's weather: an API call. Even things like Siri, which aren't exactly web pages, use API calls.
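
A weather widget like that boils down to a few lines of code. Here's a sketch in Python using the requests library; the endpoint and parameters are made up for illustration, since any real weather service will have its own URL, parameters and authentication.

```python
import requests

# The endpoint and parameters below are invented for illustration only.
response = requests.get(
    "https://api.example-weather.com/v1/current",
    params={"city": "London", "units": "metric"},
    timeout=10,
)
response.raise_for_status()

data = response.json()  # the response body is just JSON text, parsed into a dict
print(data)
```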

Being able to make those calls and to interact with the services available through the internet gives users all sorts of power. The creativity comes, then, in how these things are combined.

That said, the dreams of the connected web have been somewhat oversold in the past. What are we on now, web 4.0?

From a technical perspective, in this part of the course I enjoyed seeing how simple text, in the form of JSON objects, is behind so much of our communication and interactivity online these days. (All the more reason to have more secure ways of exchanging that plain-text data.)
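
To underline the 'simple text' point, here's a tiny example of a Python dictionary being serialised to a JSON string and back. The string in the middle is all that actually travels between servers.

```python
import json

payload = {"title": "New blog post", "tags": ["python", "apis"], "published": True}

as_text = json.dumps(payload)     # this plain string is what goes over the wire
print(as_text)

round_tripped = json.loads(as_text)
print(round_tripped == payload)   # True: nothing was lost along the way
```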

I was also conscious of how much creativity and widened thinking has gone into expanding the possibilities of what HTML, CSS and a bit of JavaScript can do over the internet. Some of these capabilities mean that we're straining the possibilities in one area or another, but above all I take away some inspiration from how people made do with what they had instead of feeling like they needed to reinvent the wheel.

Using CSS selectors with JavaScript DOM methods

I've been using JavaScript methods that interact with the DOM all week as part of my Launch School course. Among them, document.querySelector() and document.querySelectorAll() seem really useful. I realised I didn't fully understand the selectors that you're supposed to pass in as arguments, so I'm writing up some of what I discovered about them here. (See here for the documentation.)

The simple part is when you have a single selector. Three important selectors are:

  • id selectors (#) — so if the HTML contains an id attribute of 'some-id', then you can write #some-id.
  • class selectors (.) — so if the class is 'special-style', then you can write .special-style.
  • tag selectors — for these, you just write the tag itself (e.g. p or div).

When combining selectors, there is a complex set of options depending on whether things are siblings or descendants or children. For the most part that is TMI. The important ones to remember are:

  • descendant selector combinations — place a space between elements — so if you want to select all <div> tags that descend (however far down in the tree) from a <p> tag, then you can write p div.
  • child selector — place a > between the elements — this allows you to find elements that are direct children (i.e. no intermediary levels) of other elements. p > div will find all div elements that are the direct children of paragraph elements.
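
The course pairs these selectors with JavaScript's DOM methods in the browser, but the same selector strings can be used from Python too. As a quick way to experiment outside the browser, here's a sketch using BeautifulSoup's select() with some made-up HTML.

```python
from bs4 import BeautifulSoup

# Made-up HTML for trying out the selector syntax described above.
html = """
<div id="some-id" class="special-style">
  <p>a direct child paragraph</p>
  <section><p>a nested paragraph</p></section>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.select("#some-id"))        # id selector
print(soup.select(".special-style"))  # class selector
print(soup.select("p"))               # tag selector: both paragraphs
print(soup.select("div p"))           # descendant combination: both paragraphs
print(soup.select("div > p"))         # child selector: only the direct child
```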

(For a more complete exploration of this topic, specifically the combination of selectors, read this blogpost.)