Tech

Automating social media posting for my new blogposts

I love blogging and I've benefitted a lot from what it's done for me ever since I started my first Geocities page in the mid 1990s. I maintain a technical blog at mlops.systems and a somewhat less technical blog at alexstrick.com/blog, though I hope at some point to merge the two.

In the past I would have been content with ensuring that my blog published an RSS feed, knowing that anyone who wanted to follow what I was writing could do so simply by connecting their feed reader and subscribing. In recent years, though, I've become more conscious of a healthy brew of ambivalence, ignorance or even outright hostility towards the very idea of RSS feeds and readers. It seems many people no longer treat RSS as an essential part of their informational hygiene. (I'll put my sadness / confusion about this to one side for now.)

And as much as I love blogging, I really dislike having to post my new blog posts to social media one by one, coming up with some catchy yet not overly breathless summary of what I wrote, since this is apparently what many people use instead of RSS.

I've been grumbling under my breath about this situation for a few years now, but when ChatGPT came out it seemed like an obvious use case: summarise my blogpost and repost it to all my social media accounts, taking into account their particular needs. (Mastodon uses hashtags more than the others, LinkedIn posts can be a bit longer, whereas Twitter needs to be a bit shorter, and so on.)

I held off, thinking I'd want to set up some system fully under my control involving serverless function calls and so on, but then I was reminded that I already use Zapier for some other administrative tasks. So this afternoon I set up and turned on some automation for social media posting to my Mastodon, Twitter and LinkedIn accounts. Posting happens at one step removed since I queue my posts in Buffer so that they go out at a time when people are more likely to see them. I apologise / don't apologise for this. My blog writings remain wholly un-automated; it would completely remove the point of 'learning through writing' if I were to automate the things that I blog about. My social media postings (just one post per blogpost so as not to spam you all) are from now on automated. As an additional courtesy / discourtesy, I've tweaked the prompt such that the social media posts should always read just slightly 'off' and will be labelled with an #automated hashtag.
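
For the curious, here's a rough sketch of the idea in plain Python rather than Zapier. It isn't what I actually run: the feed URL is an assumption, it only handles plain RSS 2.0 feeds, and the final hand-off to an LLM and to Buffer is left as a comment.

```python
# A rough sketch of the idea, not my actual Zapier setup.
# The feed URL is an assumed location; queueing to Buffer / calling an LLM
# is left as a comment at the end.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://mlops.systems/index.xml"  # assumed RSS 2.0 feed location

PLATFORM_NOTES = {
    "mastodon": "Lean on hashtags; keep it under 500 characters.",
    "twitter": "Keep it short and punchy; under 280 characters.",
    "linkedin": "A slightly longer, more descriptive summary is fine.",
}

def latest_post(feed_url: str) -> dict:
    """Grab the title and link of the newest item in an RSS 2.0 feed."""
    with urllib.request.urlopen(feed_url) as response:
        root = ET.fromstring(response.read())
    item = root.find("./channel/item")  # first <item> is the newest post
    return {"title": item.findtext("title"), "link": item.findtext("link")}

def build_prompt(post: dict, platform: str) -> str:
    """Compose the per-platform summarisation prompt to send to an LLM."""
    return (
        f"Summarise the blog post '{post['title']}' ({post['link']}) "
        f"for {platform}. {PLATFORM_NOTES[platform]} "
        "Make it read slightly 'off' and end with the #automated hashtag."
    )

if __name__ == "__main__":
    post = latest_post(FEED_URL)
    for platform in PLATFORM_NOTES:
        prompt = build_prompt(post, platform)
        # In the real pipeline this prompt would go to an LLM and the result
        # would be queued in Buffer; here we just print it.
        print(prompt, "\n")
```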

On the interpretability of models

A common criticism of deep learning models is that they are 'black boxes'. You put data in one end as your inputs, the argument goes, and you get some predictions or results out the other end, but you have no idea why the model gave you those predictions.

Ways of interpreting learning in computer vision models - credit https://thedatascientist.com/what-deep-learning-is-and-isnt/

This has something to do with how neural networks work: you often have many layers that are busy with the 'learning', and each successive layer may be able to interpret or recognise more features or greater levels of abstraction. In the above image, you can get a sense of how the earlier layers (on the left) learn basic contour features, and how these then get abstracted together into more general face features and so on.

Some of this also has to do with the fact that when you train your model, you do so assuming that the model will be used on data that the model hasn't seen. In this (common) use case, it becomes a bit harder to say exactly why a certain prediction was made, though there are a lot of ways we can start to open up the black box.
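
As one small example of prying the box open, here's a minimal sketch using PyTorch forward hooks to capture intermediate activations. The choice of ResNet-18 and of the two layers is arbitrary and just for illustration.

```python
# A minimal sketch of inspecting intermediate activations with forward hooks.
# Assumes torch and torchvision (>= 0.13 for the weights= argument) are
# installed; the model and layer choices are arbitrary.
import torch
from torchvision import models

model = models.resnet18(weights=None)  # untrained weights are fine for a demo
model.eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Watch an early convolutional block and a late one.
model.layer1.register_forward_hook(save_activation("layer1"))
model.layer4.register_forward_hook(save_activation("layer4"))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # a fake 'image'

for name, act in activations.items():
    print(name, tuple(act.shape))  # layer1 keeps more spatial detail than layer4
```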

Small, unexpectedly powerful boxes

Graphics Processing Units, or GPUs, are what your computer uses to render what you see on screen. Most computers (desktop or laptop) have one, and they are used to good effect to keep the screen refreshed and display everything at effectively realtime speed. The world of gaming is also, perhaps unsurprisingly, quite dependent on fast GPU performance, with Nvidia as the leading provider of this hardware.

nvidia gpu

It was discovered a while back that GPUs are also pretty great at performing certain kinds of computation at incredible speed. Calculations which would take ages to complete on a standard CPU can run far faster on a GPU, because GPUs are built to run huge numbers of simple operations in parallel, which is exactly what the matrix multiplications at the heart of neural networks require. For this reason, they're the hardware of choice for training deep learning models.
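
If you have PyTorch and a CUDA-capable GPU to hand, a rough (and unscientific) timing sketch like the one below makes the difference tangible; the matrix size is an arbitrary choice.

```python
# A rough timing sketch: the same matrix multiplication on CPU vs GPU.
# Assumes PyTorch is installed; the GPU branch only runs if CUDA is available.
import time
import torch

n = 4096
a_cpu = torch.randn(n, n)
b_cpu = torch.randn(n, n)

start = time.perf_counter()
_ = a_cpu @ b_cpu
cpu_seconds = time.perf_counter() - start

if torch.cuda.is_available():
    a_gpu, b_gpu = a_cpu.cuda(), b_cpu.cuda()
    torch.cuda.synchronize()           # make sure the copy has finished
    start = time.perf_counter()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()           # wait for the kernel to complete
    gpu_seconds = time.perf_counter() - start
    print(f"CPU: {cpu_seconds:.3f}s, GPU: {gpu_seconds:.3f}s")
else:
    print(f"CPU: {cpu_seconds:.3f}s (no GPU available to compare)")
```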

GPUs also happen to be heavily used (for similar reasons) for cryptocurrency mining and accordingly there has been a worldwide shortage for some time. Between the crypto bros and the deep learning practitioners, the price got inflated for a while. Nvidia has made some attempts to limit crypto miners from using their hardware, but to inconclusive effect.

Held back by misunderstanding

The field of deep learning seems to have had a rough journey into public consciousness and adoption. In particular, two theoretical misunderstandings led to funding being pulled and to energy and attention moving away from the field:

  1. Minsky and Papert's book Perceptrons showed how a neural network using only a single layer was unable to learn some critical functions like XOR (see the sketch below). Later in the same book, they show how using more layers addresses this problem completely, but for some reason the 'fix' was ignored and people fixated on the drawbacks of the single-layer case.
  2. By the 1980s, many people were using two layers in their neural networks, and while this did solve the problems identified in Perceptrons and people were using neural networks to solve real problems, it was unwieldy in that form. Yes, you could theoretically approximate any mathematical function with two layers, but it was impractical and slow to do so. People took this to mean that the principle was broken, whereas really the problem was that two layers were just not enough; the number of layers needed to keep increasing.

These are the two key misunderstandings identified in the Howard/Gugger short introduction, and I'm sure I'll read about more of them in Genius Makers. It's amazing, but not entirely surprising, that an ungenerous and unimaginative misreading of the literature could be responsible for such an effective trashing of a research path.
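
To make the XOR point from the first misunderstanding concrete, here's a small NumPy sketch: a single linear unit trained on XOR gets stuck, while the same data is easily fit once a hidden layer is added. The layer sizes, learning rate and step count are arbitrary choices for illustration.

```python
# A small sketch of the XOR point: a single linear unit cannot fit XOR,
# while one hidden layer can. Layer sizes, learning rate and step count
# are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # the XOR truth table

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr, steps = 1.0, 20_000

# --- One layer: logistic regression, standing in for the single-layer perceptron.
w, b = rng.normal(size=(2, 1)), np.zeros((1,))
for _ in range(steps):
    p = sigmoid(X @ w + b)
    w -= lr * X.T @ (p - y) / len(X)      # cross-entropy gradient
    b -= lr * (p - y).mean()
print("one layer: ", np.round(sigmoid(X @ w + b).ravel(), 2))  # stuck near 0.5

# --- Two layers: a tiny MLP with one hidden layer of 8 units.
W1, b1 = rng.normal(size=(2, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))
for _ in range(steps):
    h = sigmoid(X @ W1 + b1)              # hidden activations
    p = sigmoid(h @ W2 + b2)              # output
    dz2 = (p - y) / len(X)                # cross-entropy gradient at the output
    dh = dz2 @ W2.T * h * (1 - h)         # backprop through the hidden layer
    W2 -= lr * h.T @ dz2; b2 -= lr * dz2.sum(axis=0)
    W1 -= lr * X.T @ dh;  b1 -= lr * dh.sum(axis=0)
print("two layers:", np.round(p.ravel(), 2))  # should end up near [0, 1, 1, 0]
```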

PDP: a precursor to modern neural networks?

Parallel Distributed Processing: Explorations in the Microstructure of Cognition, a multi-volume publication by David Rumelhart, James McClelland and the PDP Research Group, was released in 1986 and is recognised as one of the most important works relating to neural networks.

PDP (1986)

They lay out eight features necessary to perform what they called 'parallel distributed processing' (which I suppose you can think of as a sort of precursor to modern-day deep learning):

  • processing units
  • a state of activation
  • an output function for each processing unit
  • a pattern of connectivity among units
  • a propagation rule (for propagating patterns of activation through the network)
  • an activation rule
  • a learning rule (where 'patterns of connectivity are modified by experience')
  • an environment in which the system operates

I haven't read the book, and I don't fully understand all these different pieces, but it isn't particularly hard to see in these features the outline of what modern-day neural networks would later handle. The vocabulary used to describe it is slightly different, but you have the connectivity between neurons, and you have a process through which you update the layers…
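
Just to make the parallel concrete, here's a loose mapping (entirely my own, not the book's) of those eight features onto a modern PyTorch training step; the network shape and data are made up for illustration.

```python
# A loose, hedged mapping of the eight PDP features onto a modern (PyTorch)
# training step. The correspondence is my own reading, not the book's.
import torch
import torch.nn as nn

# 'an environment in which the system operates' -> the training data
X = torch.randn(32, 10)
y = torch.randint(0, 2, (32,)).float()

model = nn.Sequential(        # 'processing units' -> the neurons in each layer
    nn.Linear(10, 16),        # 'pattern of connectivity' -> the weight matrices
    nn.ReLU(),                # 'activation rule' / 'output function' -> the nonlinearity
    nn.Linear(16, 1),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

# 'state of activation' + 'propagation rule' -> the forward pass
activations = model(X).squeeze(1)
loss = loss_fn(activations, y)

loss.backward()               # 'learning rule' -> 'patterns of connectivity
optimizer.step()              #   are modified by experience' via backprop + SGD
```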

This feels like a book that would reward returning to for a proper in-depth read later on in my studies.

Rosenblatt's Mark I Perceptron

I've now read a little about Rosenblatt's Perceptron in two different places: in the Howard/Gugger Deep Learning book, and also in Cade Metz' Genius Makers.

The Mark I Perceptron

Built in 1958, it is usually described as the first machine based on the principle of the artificial neuron. It used a single layer in this initial configuration, and even in that simple form you could already see glimpses of where it might go.

Unfortunately, Marvin Minsky and Seymour Papert's apparently perceptive but also damning assessment of the perceptron as a technology without a future ushered in the first of the so-called 'AI winters', and the idea of using neural networks was buried for several years.

Thankfully, some ignored the herd and stuck with it.

Deep Learning: Best in Show?

Deep Learning is an incredibly powerful technology and there are a number of (focused / specific) areas where it already surpasses human-level abilities. Here are some examples:

  1. Translation: If you haven't been watching closely, the quality of Google Translate's translations has really improved in recent years. This 2016 story is a little dated, but it explains how they made a big push a few years back, and the service continues to improve as the technology does.
  2. X-ray interpretation: In a matter of a few years, the performance of Deep Learning in reading and making diagnoses from x-rays has surpassed top radiology practitioners. See how DeepMind raised the bar on identifying breast cancer.
  3. Playing Go: Watch the AlphaGo documentary if you haven't already.
  4. Protein Folding: Check out AlphaFold from last November, where DeepMind blasted through a notoriously complicated problem in biology.
  5. Colourising images: A former fast.ai student, Jason Antic, made great progress with his work on DeOldify.

The really great thing about the fastai course is how successfully it has managed to democratise Deep Learning as a technology. I always enjoy reading about niche areas where specific burning problems were solved because someone took the opportunity to educate themselves.

Removing Barriers: Deep Learning Edition

I've been re-reading Jeremy Howard & Sylvain Gugger's Deep Learning for Coders with Fastai and PyTorch and I really appreciate the reminder that a lot of barriers to entry into the Deep Learning space can be productively put to one side.

Gatekeepers make four big claims:

  1. You need lots of maths to use Deep Learning to solve problems
  2. You need lots of data (think prodigious, Google-sized quantities) to use Deep Learning
  3. You need lots of expensive computers and custom hardware to use Deep Learning
  4. You need a PhD, preferably in Maths or Physics or some computation-heavy science

Needless to say, it's not that maths, more data or better hardware won't help or improve your experience. But to say that you shouldn't start if you don't have those things is inaccurate and unhelpful.

If you are a domain expert in something that has nothing to do with Deep Learning or data science, you probably have a lot of problems that are low-hanging fruit for powerful techniques like Deep Learning.

How the Internet Works

The internet. It just works. Understanding exactly how it works, though, is more complicated than for many other pieces of engineering. The more you examine the different aspects and parts that make it up, the more complexity you see concealed under the surface.

Visiting this website, for instance: it feels like a trivial thing to do, but there are many different parts making that happen, from the parts that actually transport the bits of data across the physical infrastructure, to the pieces that serve it all to you on a secure connection (ensuring that what I've written hasn't been altered by a third-party).

I've just finished Launch School's LS170 module which takes you a decent way down in the weeds to explain exactly how all of these pieces fit together to make up 'the internet'. So today I thought I'd retransmit some of that as a way of cementing it in my own mind.

At a very abstract level, the internet can be thought of as a network of networks. A network itself is a set of two or more computers which are able to communicate with each other. This could be the computers attached to a home network, or the computers that connect through a central server to a particular Internet Service Provider or ISP.

The internet makes use of a series of 'protocols': shared rules and understandings which have been developed or accreted over time. These protocols allow computers on opposite sides of the planet to communicate in a mutually comprehensible manner. (If these shared sets of rules didn't exist, communicating with strangers or sending messages from one server to another would be a lot more difficult.)

So once we have this top-down understanding of the internet as a bunch of networks that interact with each other, what, then, is the process by which a web browser in the United Kingdom communicates with a web server in China? Or in other words, if I want to access a website hosted on a Chinese webserver, how does that series of communication steps work to make that happen?

At this point, it's useful to make use of another abstraction: communication across the internet happens across a series of layers. There are several different models for these various layers. Two of the more common models — the OSI model and the TCP/IP model — are represented below:

layered-system-osi-tcp-ip-comparison.png

At the top level — "Application" — you have your website or whatever the user comes into contact with that is being served up to your web browser, let's say. All the layers below that are progressively more and more specialised, which is another way of saying that they become progressively less comprehensible if you were to eavesdrop on the data as it passed over the wire or through the fibre-optic cable.

Let's move through the big pieces of how information is communicated, then, starting at the bottom. (I'll mostly follow the TCP/IP model since it's a bit less granular and allows me to split things up in a way that makes sense.) This chart will help keep all the pieces in your mind:

layersofinternet.png

Note that each layer has something known as a 'protocol data unit' or PDU. A PDU is usually made up of a header, a payload (the chunk of data itself) and an optional footer or trailer. The header and footer contain metadata which allows for the appropriate transmission, decoding and so on of the data payload.

The PDU of one layer is encapsulated as the data payload of the PDU in the layer below it (and unwrapped again by the corresponding layer at the receiving end). See the following diagram as an illustration:

encapsulation.png
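
A toy sketch of that nesting, with entirely made-up field values, might look something like this in Python:

```python
# A toy illustration of encapsulation: each layer wraps the PDU of the layer
# above with its own header (and, for Ethernet, a trailer). The field values
# are made up; real headers are binary and far more detailed.
http_request = "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"

tcp_segment = {
    "header": {"src_port": 54321, "dst_port": 80, "seq": 1},
    "payload": http_request,                      # application data
}

ip_packet = {
    "header": {"src_ip": "203.0.113.5", "dst_ip": "198.51.100.7", "ttl": 64},
    "payload": tcp_segment,                       # the whole TCP segment
}

ethernet_frame = {
    "header": {"src_mac": "aa:bb:cc:dd:ee:ff", "dst_mac": "11:22:33:44:55:66"},
    "payload": ip_packet,                         # the whole IP packet
    "trailer": "frame check sequence",
}

# Unwrapping at the receiving end just walks back down through the payloads.
print(ethernet_frame["payload"]["payload"]["payload"])
```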

Physical Layer

Before we get into the realm of protocols, it's worth reminding ourselves that there is a physical layer on which all the subsequent layers rely. There are constraints on the speed and latency with which data can be transmitted over a network that come down to fixed laws of physics, most notably the speed of light.

Link Layer — Ethernet

Ethernet is the protocol that enables communication between devices on a single network. (These devices are also known as 'nodes'.) The link layer is the interface between the physical network (i.e. the cables and routers) and the more logical layers above it.

The protocols for this layer are mostly concerned with identifying devices on the network, and moving the data among those devices. On this layer, devices are identified by something called a MAC (Media Access Control) address, which is a permanent address burned into every device at the time of manufacturing.

The PDU of the Ethernet layer is known as a 'frame'. Each frame contains a header (made up of a source address and a destination address), a payload of data, and a footer.

Internet Layer — The Internet Protocol (IPv4 or IPv6)

Moving up a layer, the payload of the Ethernet frame carries the PDU of the internet or network layer: the packet.

This internet layer uses the Internet Protocol, which facilitates communication between hosts (i.e. different computers) on different networks. The two main versions of this protocol are known as IPv4 and IPv6. They handle the routing of data via IP addressing, and the encapsulation of data into packets.

IPv4 was the de facto standard for addresses on the internet until relatively recently. There are around 4.3 billion possible addresses using this protocol, but we are close to having used up all those addresses now. IPv6 was created for this reason and it allows (through the use of 128-bit addresses) for a massive 340 undecillion (billion billion billion billion) different addresses.

Adoption of IPv6 is increasing, but still slow.
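
Python's standard ipaddress module makes the difference in scale easy to see; the addresses below are from the documentation-reserved ranges:

```python
# A quick look at the two address spaces using Python's standard library.
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")       # a documentation-range IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")     # a documentation-range IPv6 address

print(type(v4).__name__, type(v6).__name__)  # IPv4Address IPv6Address
print(f"IPv4 addresses: {2**32:,}")          # ~4.3 billion
print(f"IPv6 addresses: {2**128:.3e}")       # ~3.4 x 10^38, i.e. 340 undecillion
```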

There is a complex system governing how data makes its way from one end node, through several other networks, and on to the destination node. When the data is first transmitted, no full plan of how to reach the destination is formulated before the journey starts. Rather, the route is constructed ad hoc as the journey progresses.

Transport Layer — TCP/UDP

There are a number of different problems that the transport layer exists to solve. Primarily, we want to make sure our data is passed reliably and speedily from one node to another through the network.

TCP and UDP are two protocols which are good at different kinds of communication. If the reliability of data transmission is important to us and we need to make sure that every piece of information is transmitted, then TCP (Transmission Control Protocol) is a good choice. If we don't care about every single piece of information — in the case of streaming a video call, perhaps, or watching a film on Netflix — but rather about the speed and the ability to continuously keep that data stream going, then UDP (User Datagram Protocol) is a better choice.

There are differences between the protocols beyond simply their functionality. We can distinguish between so-called 'connection-oriented' and 'connectionless' protocols. For connection-oriented protocols, a dedicated connection is created for each process or strand of communication, and the receiving node listens with its undivided attention. With a connectionless protocol, a single port listens to all incoming communication and has to disambiguate between all the incoming conversations.

TCP is a connection-oriented protocol. It first performs a three-way handshake to establish the connection, then sends the data, and finally performs a four-way handshake to close the connection. The overhead of these handshakes at the beginning and end makes it a fairly costly process in terms of performance, but for many kinds of internet communication we really do need every piece of information to arrive. Just think about an email, for example: it wouldn't be acceptable to receive only 70% of the words, would it?

UDP is a connectionless protocol. It is in some ways a simpler protocol compared to TCP, and this simplicity gives it speed and flexibility; you don't need to make a handshake to start transmitting data. On the negative side, though, it doesn't guarantee message delivery, or provide any kind of congestion avoidance or flow control to stop your receiver from being overwhelmed by the data that's being transmitted.
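
A minimal sketch with Python's standard socket module shows the difference in ceremony: the UDP datagram just gets fired off, while the TCP client has to connect (triggering the three-way handshake) before any data flows. Everything here stays on the local machine.

```python
# UDP is connectionless: we can fire a datagram at an address with no handshake.
# TCP is connection-oriented: connect() triggers the three-way handshake before
# any data is sent. Both demos run entirely over the loopback interface.
import socket

# --- UDP: bind a receiving socket, send it a datagram, read it back.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))              # port 0 = let the OS pick one
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over UDP", addr)       # no connection set-up at all
data, _ = receiver.recvfrom(1024)
print(data)

# --- TCP: a listening socket plus a client that connects to it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())         # three-way handshake happens here
conn, _ = server.accept()
client.sendall(b"hello over TCP")
print(conn.recv(1024))

for s in (receiver, sender, client, conn, server):
    s.close()
```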

Application Layer — HTTP

HTTP is the primary communication protocol used on the internet. At the application layer, HTTP provides communication of information to applications. This protocol focuses on the structure of requests and responses rather than just how the underlying data gets delivered. (The HTML documents it typically carries have their own syntax rules, with elements enclosed in tags using the < and > symbols.)

Communication using HTTP takes the form of request and response pairs. A client makes a 'request' and (barring some communication failure) receives a 'response'. HTTP is known as a 'stateless' protocol in that each request and response is completely independent of the previous one. Web applications have many tricks up their sleeve to make it seem like the web is stateful, but the underlying infrastructure is stateless.

When you make an HTTP request, you must supply a path (i.e. the location of the thing or resource you want to request or access) and a request method. Two of the most common request methods are GET and POST, for requesting and amending things from/on the server respectively. You can also send optional request 'headers', which are bits of metadata that allow for more complicated requests.

The server is obliged to send an HTTP status code in reply. This code tells you whether the request was completed as expected, or whether there were any errors along the way. You'll likely have come across a so-called '404' page: this refers to the 404 status code, which indicates that a resource or page wasn't found on the server. If the request was successful, the response may have a payload or body of data (perhaps a chunk of HTML website text, or an image) alongside some other response headers.
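
Here's a minimal sketch of one request/response cycle using Python's standard http.client module; it assumes outbound network access and uses example.com, the IANA-reserved demonstration domain.

```python
# A minimal GET request, just to see the moving parts: a method, a path,
# optional headers, then a status code, response headers and a body coming back.
import http.client

conn = http.client.HTTPConnection("example.com", 80)
conn.request("GET", "/", headers={"User-Agent": "toy-client/0.1"})

response = conn.getresponse()
print(response.status, response.reason)     # e.g. 200 OK, or 404 Not Found
print(response.getheader("Content-Type"))   # one of the response headers
body = response.read()                      # the payload: HTML in this case
print(body[:80])
conn.close()
```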

Note that all this information is sent as unencrypted plain text. When you're browsing a vanilla http:// website, all the data sent back and forth is just plain text such that anyone (or any government) can read it. This wasn't such a big issue in the early days of the internet, perhaps, but quite soon it became more of a problem, especially when it came to buying things online, or communicating securely. This is where TLS comes in.

TLS, or Transport Layer Security, is sometimes still referred to by the name of its predecessor, SSL. It provides a way to exchange messages securely over an unsecured channel. We can conceptually think of it as occupying the space between the TCP and HTTP protocols (at the session layer of the OSI framework above). TLS offers:

  • encryption (encoding a message so only authorised people can decode it)
  • authentication (verifying the identity of a message sender)
  • integrity (checking whether a message has been interfered with)

Not all three are necessarily needed or used at any one time. We're currently on version 1.3 of TLS.
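
To see TLS in action, here's a small sketch that wraps an ordinary TCP socket using Python's standard ssl module before speaking HTTP over it; again it assumes outbound network access to example.com.

```python
# Wrap a plain TCP socket in TLS, then speak HTTP over the encrypted channel.
import socket
import ssl

context = ssl.create_default_context()      # sensible defaults: verify certificates

with socket.create_connection(("example.com", 443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="example.com") as tls_sock:
        print(tls_sock.version())            # e.g. 'TLSv1.3'
        cert = tls_sock.getpeercert()        # authentication: the server's certificate
        print(cert["subject"])
        # From here on, anything sent is encrypted and integrity-checked.
        tls_sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        print(tls_sock.recv(80))
```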


Whew! That was a lot. There are some really good videos which make the topic slightly less dry. Each of these separate areas is extremely complex, but having a broad overview is useful for disentangling what's going on when you use the internet.

Mastery-based Learning with Launch School

It’s a new week, so we have a new podcast episode for you. Matt and I spoke with Chris Lee of Launch School about his approach to online education. We discuss the different tradeoffs and considerations that come into play when a curriculum is being put together.

To my mind, mastery-based learning — where you don’t advance to the next stage until you’ve really mastered the topic at hand — really shines for things where you have some kind of longer-term goal. Just because it’s a good approach doesn’t mean it’s easy, though. In Chris’ words:

We started this to try to figure out education. It was not a money making endeavor. So to us, teaching became the engineering problem to solve. I was not a proponent of Mastery Based Learning before Launch School. Mastery Based Learning or Competency Based Learning is not unique to Launch School, it’s a well known pedagogy in academic papers. But it’s really hard to implement.

Think about a physical classroom. Mastery Based Learning means that a student gets to occupy a seat in that classroom for an indefinite amount of time. That’s a really hard promise to make when our schools are tasked to usher through students. It’s not about training students and making sure they understand every topic, but getting people through.

You can download and listen to the episode over on the main Sources & Methods website here.