Tech

Automating social media posting for my new blogposts

I love blogging and I've benefitted a lot from what it's done for me ever since I started my first Geocities page in the mid 1990s. I maintain a technical blog at mlops.systems and a somewhat less technical blog at alexstrick.com/blog, though I hope at some point to merge the two.

In the past I would have been content with ensuring that my blog published an RSS feed, knowing that anyone who wanted to follow what I was writing could do so simply by connecting their feed reader and subscribing. In recent years, though, I've become more conscious of a healthy brew of ambivalence, ignorance or even outright hostility towards the very idea of RSS feeds and readers. It seems many people no longer treat RSS as an essential part of their informational hygiene. (I'll put my sadness / confusion about this to one side for now.)

And as much as I love blogging, I really dislike having to post my new blog posts to social media one by one, coming up with some catchy yet not overly breathless summary of what I wrote, since this is apparently what many people use instead of RSS.

I've been grumbling under my breath about this situation for a few years now, but when ChatGPT came out it seemed like an obvious use case: summarise my blogpost and repost it to all my social media accounts, taking into account their particular needs. (Mastodon uses hashtags more than the others, LinkedIn posts can be a bit longer, whereas Twitter needs to be a bit shorter, and so on.)

I held off, thinking I'd want to set up some system fully under my control involving serverless function calls and so on, but then I was reminded that I already use Zapier for some other administrative tasks. So this afternoon I set up and turned on some automation for social media posting to my Mastodon, Twitter and LinkedIn accounts. Posting happens at one step removed since I queue my posts in Buffer so that they go out at a time when people are more likely to see them. I apologise / don't apologise for this. My blog writings remain wholly un-automated; it would completely remove the point of 'learning through writing' if I were to automate the things that I blog about. My social media postings (just one post per blogpost so as not to spam you all) are from now on automated. As an additional courtesy / discourtesy, I've tweaked the prompt such that the social media posts should always read just slightly 'off' and will be labelled with an #automated hashtag.
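
For the curious, here's a rough sketch of the idea in plain Python rather than Zapier. It isn't what I actually run: the feed URL is an assumption, it only handles plain RSS 2.0 feeds, and the final hand-off to an LLM and to Buffer is left as a comment.

```python
# A rough sketch of the idea, not my actual Zapier setup.
# The feed URL is an assumed location; queueing to Buffer / calling an LLM
# is left as a comment at the end.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://mlops.systems/index.xml"  # assumed RSS 2.0 feed location

PLATFORM_NOTES = {
    "mastodon": "Lean on hashtags; keep it under 500 characters.",
    "twitter": "Keep it short and punchy; under 280 characters.",
    "linkedin": "A slightly longer, more descriptive summary is fine.",
}

def latest_post(feed_url: str) -> dict:
    """Grab the title and link of the newest item in an RSS 2.0 feed."""
    with urllib.request.urlopen(feed_url) as response:
        root = ET.fromstring(response.read())
    item = root.find("./channel/item")  # first <item> is the newest post
    return {"title": item.findtext("title"), "link": item.findtext("link")}

def build_prompt(post: dict, platform: str) -> str:
    """Compose the per-platform summarisation prompt to send to an LLM."""
    return (
        f"Summarise the blog post '{post['title']}' ({post['link']}) "
        f"for {platform}. {PLATFORM_NOTES[platform]} "
        "Make it read slightly 'off' and end with the #automated hashtag."
    )

if __name__ == "__main__":
    post = latest_post(FEED_URL)
    for platform in PLATFORM_NOTES:
        prompt = build_prompt(post, platform)
        # In the real pipeline this prompt would go to an LLM and the result
        # would be queued in Buffer; here we just print it.
        print(prompt, "\n")
```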

On the interpretability of models

A common criticism of deep learning models is that they are 'black boxes'. You put data in one end as your inputs, the argument goes, and you get some predictions or results out the other end, but you have no idea why the model gave you those predictions.

Ways of interpreting learning in computer vision models - credit https://thedatascientist.com/what-deep-learning-is-and-isnt/

This has something to do with how neural networks work: you often have many layers that are busy with the 'learning', and each successive layer may be able to interpret or recognise more features or greater levels of abstraction. In the above image, you can get a sense of how the earlier layers (on the left) learn basic contour features, and how these then get abstracted together into more general face features and so on.

Some of this also has to do with the fact that when you train your model, you do so assuming that the model will be used on data that the model hasn't seen. In this (common) use case, it becomes a bit harder to say exactly why a certain prediction was made, though there are a lot of ways we can start to open up the black box.
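
As one small example of prying the box open, here's a minimal sketch using PyTorch forward hooks to capture intermediate activations. The choice of ResNet-18 and of the two layers is arbitrary and just for illustration.

```python
# A minimal sketch of inspecting intermediate activations with forward hooks.
# Assumes torch and torchvision (>= 0.13 for the weights= argument) are
# installed; the model and layer choices are arbitrary.
import torch
from torchvision import models

model = models.resnet18(weights=None)  # untrained weights are fine for a demo
model.eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Watch an early convolutional block and a late one.
model.layer1.register_forward_hook(save_activation("layer1"))
model.layer4.register_forward_hook(save_activation("layer4"))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))  # a fake 'image'

for name, act in activations.items():
    print(name, tuple(act.shape))  # layer1 keeps more spatial detail than layer4
```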

Small, unexpectedly powerful boxes

Graphics Processing Units, or GPUs, are what your computer uses to render what you see on screen. Most computers (desktop or laptop) have one, and they are used to good effect to keep the screen refreshed and display everything at effectively realtime speed. The world of gaming is also, perhaps unsurprisingly, quite dependent on fast GPU performance, with Nvidia as the leading provider of this hardware.

nvidia gpu

It was discovered a while back that GPUs are also pretty great at performing certain kinds of computation at incredible speed. Calculations which would take ages to complete on a standard CPU can run far faster on a GPU, because GPUs are built to run huge numbers of simple operations in parallel, which is exactly what the matrix multiplications at the heart of neural networks require. For this reason, they're the hardware of choice for training deep learning models.
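
If you have PyTorch and a CUDA-capable GPU to hand, a rough (and unscientific) timing sketch like the one below makes the difference tangible; the matrix size is an arbitrary choice.

```python
# A rough timing sketch: the same matrix multiplication on CPU vs GPU.
# Assumes PyTorch is installed; the GPU branch only runs if CUDA is available.
import time
import torch

n = 4096
a_cpu = torch.randn(n, n)
b_cpu = torch.randn(n, n)

start = time.perf_counter()
_ = a_cpu @ b_cpu
cpu_seconds = time.perf_counter() - start

if torch.cuda.is_available():
    a_gpu, b_gpu = a_cpu.cuda(), b_cpu.cuda()
    torch.cuda.synchronize()           # make sure the copy has finished
    start = time.perf_counter()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()           # wait for the kernel to complete
    gpu_seconds = time.perf_counter() - start
    print(f"CPU: {cpu_seconds:.3f}s, GPU: {gpu_seconds:.3f}s")
else:
    print(f"CPU: {cpu_seconds:.3f}s (no GPU available to compare)")
```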

GPUs also happen to be heavily used (for similar reasons) for cryptocurrency mining and accordingly there has been a worldwide shortage for some time. Between the crypto bros and the deep learning practitioners, the price got inflated for a while. Nvidia has made some attempts to limit crypto miners from using their hardware, but to inconclusive effect.

Held back by misunderstanding

The field of deep learning seems to have had a rough journey into public consciousness and adoption. In particular, two theoretical misunderstandings led to funding being pulled and to energy and attention moving away from the field:

  1. Minsky and Papert's book Perceptrons showed how a neural network using only a single layer was unable to learn some critical functions like XOR (see the sketch below). Later in the same book, they show how using more layers addresses this problem completely, but for some reason the 'fix' was ignored and people fixated on the drawbacks of the single-layer case.
  2. By the 1980s, many people were using two layers in their neural networks, and while this did solve the problems identified in Perceptrons and people were using neural networks to solve real problems, it was unwieldy in that form. Yes, you could theoretically approximate any mathematical function with two layers, but it was impractical and slow to do so. People took this to mean that the principle was broken, whereas really the problem was that two layers were just not enough; the number of layers needed to keep increasing.

These are the two key misunderstandings identified in the Howard/Gugger short introduction, and I'm sure I'll read about more of them in Genius Makers. It's amazing, but not entirely surprising, that an ungenerous and unimaginative misreading of the literature could be responsible for such an effective trashing of a research path.
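
To make the XOR point from the first misunderstanding concrete, here's a small NumPy sketch: a single linear unit trained on XOR gets stuck, while the same data is easily fit once a hidden layer is added. The layer sizes, learning rate and step count are arbitrary choices for illustration.

```python
# A small sketch of the XOR point: a single linear unit cannot fit XOR,
# while one hidden layer can. Layer sizes, learning rate and step count
# are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # the XOR truth table

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr, steps = 1.0, 20_000

# --- One layer: logistic regression, standing in for the single-layer perceptron.
w, b = rng.normal(size=(2, 1)), np.zeros((1,))
for _ in range(steps):
    p = sigmoid(X @ w + b)
    w -= lr * X.T @ (p - y) / len(X)      # cross-entropy gradient
    b -= lr * (p - y).mean()
print("one layer: ", np.round(sigmoid(X @ w + b).ravel(), 2))  # stuck near 0.5

# --- Two layers: a tiny MLP with one hidden layer of 8 units.
W1, b1 = rng.normal(size=(2, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))
for _ in range(steps):
    h = sigmoid(X @ W1 + b1)              # hidden activations
    p = sigmoid(h @ W2 + b2)              # output
    dz2 = (p - y) / len(X)                # cross-entropy gradient at the output
    dh = dz2 @ W2.T * h * (1 - h)         # backprop through the hidden layer
    W2 -= lr * h.T @ dz2; b2 -= lr * dz2.sum(axis=0)
    W1 -= lr * X.T @ dh;  b1 -= lr * dh.sum(axis=0)
print("two layers:", np.round(p.ravel(), 2))  # should end up near [0, 1, 1, 0]
```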

PDP: a precursor to modern neural networks?

Parallel Distributed Processing: Explorations in the Microstructure of Cognition, a multi-volume publication by David Rumelhart, James McClelland and the PDP Research Group, was released in 1986 and is recognised as one of the most important works relating to neural networks.

PDP (1986)

They lay out eight features necessary to perform what they called 'parallel distributed processing' (which I suppose you can think of as a sort of precursor to modern-day deep learning):

  • processing units
  • a state of activation
  • an output function for each processing unit
  • a pattern of connectivity among units
  • a propagation rule (for propagating patterns of activation through the network)
  • an activation rule
  • a learning rule (where 'patterns of connectivity are modified by experience')
  • an environment in which the system operates

I haven't read the book, and I don't fully understand all these different pieces, but it isn't particularly hard to see in these features the outline of what modern-day neural networks would later handle. The vocabulary used to describe it is slightly different, but you have the connectivity between neurons, and you have a process through which you update the layers…
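
Just to make the parallel concrete, here's a loose mapping (entirely my own, not the book's) of those eight features onto a modern PyTorch training step; the network shape and data are made up for illustration.

```python
# A loose, hedged mapping of the eight PDP features onto a modern (PyTorch)
# training step. The correspondence is my own reading, not the book's.
import torch
import torch.nn as nn

# 'an environment in which the system operates' -> the training data
X = torch.randn(32, 10)
y = torch.randint(0, 2, (32,)).float()

model = nn.Sequential(        # 'processing units' -> the neurons in each layer
    nn.Linear(10, 16),        # 'pattern of connectivity' -> the weight matrices
    nn.ReLU(),                # 'activation rule' / 'output function' -> the nonlinearity
    nn.Linear(16, 1),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

# 'state of activation' + 'propagation rule' -> the forward pass
activations = model(X).squeeze(1)
loss = loss_fn(activations, y)

loss.backward()               # 'learning rule' -> 'patterns of connectivity
optimizer.step()              #   are modified by experience' via backprop + SGD
```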

This feels like a book that would reward returning to for a proper in-depth read later on in my studies.

Rosenblatt's Mark I Perceptron

I've now read a little about Rosenblatt's Perceptron in two different places: in the Howard/Gugger Deep Learning book, and also in Cade Metz' Genius Makers.

The Mark I Perceptron

Built in 1958, it is usually described as the first machine based on the principle of the artificial neuron. It used a single layer in this initial configuration, and even in that simple form you could already see glimpses of where it might go.

Unfortunately, Marvin Minsky and Seymour Papert's apparently perceptive but also damning assessment of the perceptron as a technology without a future ushered in the first of the so-called 'AI winters', and the idea of using neural networks was buried for several years.

Thankfully, some ignored the herd and stuck with it.

Deep Learning: Best in Show?

Deep Learning is an incredibly powerful technology and there are a number of (focused / specific) areas where it already surpasses human-level abilities. Here are some examples:

  1. Translation: If you haven't been watching closely, the quality of Google Translate's translations has really improved in recent years. This 2016 story is a little dated, but it explains how they made a big push a few years back, and the service continues to improve as the technology does.
  2. X-ray interpretation: In a matter of a few years, the performance of Deep Learning in reading and making diagnoses from x-rays has surpassed top radiology practitioners. See how DeepMind raised the bar on identifying breast cancer.
  3. Playing Go: Watch the AlphaGo documentary if you haven't already.
  4. Protein Folding: Check out AlphaFold from last November, where DeepMind blasted through a notoriously complicated problem in biology.
  5. Colourising images: A former fast.ai student, Jason Antic, made great progress with his work on DeOldify.

The really great thing about the fastai course is how successfully it has managed to democratise Deep Learning as a technology. I always enjoy reading about niche areas where specific burning problems were solved because someone took the opportunity to educate themselves.

Removing Barriers: Deep Learning Edition

I've been re-reading Jeremy Howard & Sylvain Gugger's Deep Learning for Coders with Fastai and PyTorch and I really appreciate the reminder that a lot of barriers to entry into the Deep Learning space can be productively put to one side.

Gatekeepers make four big claims:

  1. You need lots of maths to use Deep Learning to solve problems
  2. You need lots of data (think prodigious, Google-sized quantities) to use Deep Learning
  3. You need lots of expensive computers and custom hardware to use Deep Learning
  4. You need a PhD, preferably in Maths or Physics or some computation-heavy science

Needless to say, it's not that maths, more data or better hardware won't help or improve your experience. But to say that you shouldn't start if you don't have those things is inaccurate and unhelpful.

If you are a domain expert in something that has nothing to do with Deep Learning or data science, you probably have a lot of problems that are low-hanging fruit for powerful techniques like Deep Learning.

How the Internet Works

The internet. It just works. Understanding exactly how it works, though, is more complicated than for many other pieces of engineering. The more you examine the different aspects and parts that make it up, the more complexity you see concealed under the surface.

Visiting this website, for instance: it feels like a trivial thing to do, but there are many different parts making that happen, from the parts that actually transport the bits of data across the physical infrastructure, to the pieces that serve it all to you on a secure connection (ensuring that what I've written hasn't been altered by a third-party).

I've just finished Launch School's LS170 module which takes you a decent way down in the weeds to explain exactly how all of these pieces fit together to make up 'the internet'. So today I thought I'd retransmit some of that as a way of cementing it in my own mind.

At a very abstract level, the internet can be thought of as a network of networks. A network itself is a set of two or more computers which are able to communicate with each other. This could be the computers attached to a home network, or the computers that connect through a central server to a particular Internet Service Provider or ISP.

The internet makes use of a series of 'protocols': shared rules and understandings which have been developed or accreted over time. These protocols allow computers on opposite sides of the planet to communicate in a mutually comprehensible manner. (If these shared sets of rules didn't exist, communicating with strangers or sending messages from one server to another would be a lot more difficult.)

So once we have this top-down understanding of the internet as a bunch of networks that interact with each other, what, then, is the process by which a web browser in the United Kingdom communicates with a web server in China? Or in other words, if I want to access a website hosted on a Chinese webserver, how does that series of communication steps work to make that happen?

At this point, it's useful to make use of another abstraction: communication across the internet happens across a series of layers. There are several different models for these various layers. Two of the more common models — the OSI model and the TCP/IP model — are represented below:

layered-system-osi-tcp-ip-comparison.png

At the top level — "Application" — you have your website or whatever the user comes into contact with that is being served up to your web browser, let's say. All the layers below that are progressively more and more specialised, which is another way of saying that they become progressively less comprehensible if you were to eavesdrop on the data as it passed over the wire or through the fibre-optic cable.

Let's move through the big pieces of how information is communicated, then, starting at the bottom. (I'll mostly follow the TCP/IP model since it's a bit less granular and allows me to split things up in a way that makes sense.) This chart will help keep all the pieces in your mind:

layersofinternet.png

Note that each layer has something known as a 'protocol data unit' or PDU. A PDU is usually made up of a header, a payload (the chunk of data itself) and an optional footer or trailer. The header and footer contain metadata which allows for the appropriate transmission, decoding and so on of the data payload.

The PDU of one layer is encapsulated as the data payload of the PDU in the layer below it (and unwrapped again by the corresponding layer at the receiving end). See the following diagram as an illustration:

encapsulation.png
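
A toy sketch of that nesting, with entirely made-up field values, might look something like this in Python:

```python
# A toy illustration of encapsulation: each layer wraps the PDU of the layer
# above with its own header (and, for Ethernet, a trailer). The field values
# are made up; real headers are binary and far more detailed.
http_request = "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"

tcp_segment = {
    "header": {"src_port": 54321, "dst_port": 80, "seq": 1},
    "payload": http_request,                      # application data
}

ip_packet = {
    "header": {"src_ip": "203.0.113.5", "dst_ip": "198.51.100.7", "ttl": 64},
    "payload": tcp_segment,                       # the whole TCP segment
}

ethernet_frame = {
    "header": {"src_mac": "aa:bb:cc:dd:ee:ff", "dst_mac": "11:22:33:44:55:66"},
    "payload": ip_packet,                         # the whole IP packet
    "trailer": "frame check sequence",
}

# Unwrapping at the receiving end just walks back down through the payloads.
print(ethernet_frame["payload"]["payload"]["payload"])
```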

Physical Layer

Before we get into the realm of protocols, it's worth reminding ourselves that there is a physical layer on which all the subsequent layers rely. There are constraints on the speed and latency with which data can be transmitted over a network that come down to fixed laws of physics, most notably the speed of light.

Link Layer — Ethernet

Ethernet is the protocol that enables communication between devices on a single network. (These devices are also known as 'nodes'.) The link layer is the interface between the physical network (i.e. the cables and routers) and the more logical layers above it.

The protocols for this layer are mostly concerned with identifying devices on the network, and moving the data among those devices. On this layer, devices are identified by something called a MAC (Media Access Control) address, which is a permanent address burned into every device at the time of manufacturing.

The PDU of the Ethernet layer is known as a 'frame'. Each frame contains a header (made up of a source address and a destination address), a payload of data, and a footer.

Internet Layer — The Internet Protocol (IPv4 or IPv6)

Moving up a layer, the payload of the Ethernet frame carries the PDU of the internet or network layer: the packet.

This internet layer uses the Internet Protocol, which facilitates communication between hosts (i.e. different computers) on different networks. The two main versions of this protocol are known as IPv4 and IPv6. They handle the routing of data via IP addressing, and the encapsulation of data into packets.

IPv4 was the de facto standard for addresses on the internet until relatively recently. There are around 4.3 billion possible addresses using this protocol, but we are close to having used up all those addresses now. IPv6 was created for this reason and it allows (through the use of 128-bit addresses) for a massive 340 undecillion (billion billion billion billion) different addresses.

Adoption of IPv6 is increasing, but still slow.
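
Python's standard ipaddress module makes the difference in scale easy to see; the addresses below are from the documentation-reserved ranges:

```python
# A quick look at the two address spaces using Python's standard library.
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")       # a documentation-range IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")     # a documentation-range IPv6 address

print(type(v4).__name__, type(v6).__name__)  # IPv4Address IPv6Address
print(f"IPv4 addresses: {2**32:,}")          # ~4.3 billion
print(f"IPv6 addresses: {2**128:.3e}")       # ~3.4 x 10^38, i.e. 340 undecillion
```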

There is a complex system governing how data makes its way from one end node, through several other networks, and on to the destination node. When the data is first transmitted, no full plan of how to reach the destination is formulated before the journey starts. Rather, the route is constructed ad hoc as the journey progresses.

Transport Layer — TCP/UDP

There are a number of different problems that the transport layer exists to solve. Primarily, we want to make sure our data is passed reliably and speedily from one node to another through the network.

TCP and UDP are two protocols which are good at different kinds of communication. If the reliability of data transmission is important to us and we need to make sure that every piece of information is transmitted, then TCP (Transmission Control Protocol) is a good choice. If we don't care about every single piece of information — in the case of streaming a video call, perhaps, or watching a film on Netflix — but rather about the speed and the ability to continuously keep that data stream going, then UDP (User Datagram Protocol) is a better choice.

There are differences between the protocols beyond simply their functionality. We can distinguish between so-called 'connection-oriented' and 'connectionless' protocols. For connection-oriented protocols, a dedicated connection is created for each process or strand of communication, and the receiving node listens with its undivided attention. With a connectionless protocol, a single port listens to all incoming communication and has to disambiguate between all the incoming conversations.

TCP is a connection-oriented protocol. It first performs a three-way handshake to establish the connection, then sends the data, and finally performs a four-way handshake to close the connection. The overhead of these handshakes at the beginning and end makes it a fairly costly process in terms of performance, but for many kinds of internet communication we really do need every piece of information to arrive. Just think about an email, for example: it wouldn't be acceptable to receive only 70% of the words, would it?

UDP is a connectionless protocol. It is in some ways a simpler protocol compared to TCP, and this simplicity gives it speed and flexibility; you don't need to make a handshake to start transmitting data. On the negative side, though, it doesn't guarantee message delivery, or provide any kind of congestion avoidance or flow control to stop your receiver from being overwhelmed by the data that's being transmitted.
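
A minimal sketch with Python's standard socket module shows the difference in ceremony: the UDP datagram just gets fired off, while the TCP client has to connect (triggering the three-way handshake) before any data flows. Everything here stays on the local machine.

```python
# UDP is connectionless: we can fire a datagram at an address with no handshake.
# TCP is connection-oriented: connect() triggers the three-way handshake before
# any data is sent. Both demos run entirely over the loopback interface.
import socket

# --- UDP: bind a receiving socket, send it a datagram, read it back.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))              # port 0 = let the OS pick one
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over UDP", addr)       # no connection set-up at all
data, _ = receiver.recvfrom(1024)
print(data)

# --- TCP: a listening socket plus a client that connects to it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())         # three-way handshake happens here
conn, _ = server.accept()
client.sendall(b"hello over TCP")
print(conn.recv(1024))

for s in (receiver, sender, client, conn, server):
    s.close()
```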

Application Layer — HTTP

HTTP is the primary communication protocol used on the internet. At the application layer, HTTP provides communication of information to applications. This protocol focuses on the structure of requests and responses rather than just how the underlying data gets delivered. (The HTML documents it typically carries have their own syntax rules, with elements enclosed in tags using the < and > symbols.)

Communication using HTTP takes the form of request and response pairs. A client makes a 'request' and (barring some communication failure) receives a 'response'. HTTP is known as a 'stateless' protocol in that each request and response is completely independent of the previous one. Web applications have many tricks up their sleeve to make it seem like the web is stateful, but the underlying infrastructure is stateless.

When you make an HTTP request, you must supply a path (i.e. the location of the thing or resource you want to request or access) and a request method. Two of the most common request methods are GET and POST, for requesting and amending things from/on the server respectively. You can also send optional request 'headers', which are bits of metadata that allow for more complicated requests.

The server is obliged to send an HTTP status code in reply. This code tells you whether the request was completed as expected, or whether there were any errors along the way. You'll likely have come across a so-called '404' page: this refers to the 404 status code, which indicates that a resource or page wasn't found on the server. If the request was successful, the response may have a payload or body of data (perhaps a chunk of HTML website text, or an image) alongside some other response headers.
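
Here's a minimal sketch of one request/response cycle using Python's standard http.client module; it assumes outbound network access and uses example.com, the IANA-reserved demonstration domain.

```python
# A minimal GET request, just to see the moving parts: a method, a path,
# optional headers, then a status code, response headers and a body coming back.
import http.client

conn = http.client.HTTPConnection("example.com", 80)
conn.request("GET", "/", headers={"User-Agent": "toy-client/0.1"})

response = conn.getresponse()
print(response.status, response.reason)     # e.g. 200 OK, or 404 Not Found
print(response.getheader("Content-Type"))   # one of the response headers
body = response.read()                      # the payload: HTML in this case
print(body[:80])
conn.close()
```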

Note that all this information is sent as unencrypted plain text. When you're browsing a vanilla http:// website, all the data sent back and forth is just plain text such that anyone (or any government) can read it. This wasn't such a big issue in the early days of the internet, perhaps, but quite soon it became more of a problem, especially when it came to buying things online, or communicating securely. This is where TLS comes in.

TLS, or Transport Layer Security, is sometimes still referred to by the name of its predecessor, SSL. It provides a way to exchange messages securely over an unsecured channel. We can conceptually think of it as occupying the space between the TCP and HTTP protocols (at the session layer of the OSI framework above). TLS offers:

  • encryption (encoding a message so only authorised people can decode it)
  • authentication (verifying the identity of a message sender)
  • integrity (checking whether a message has been interfered with)

Not all three are necessarily needed or used at any one time. We're currently on version 1.3 of TLS.
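
To see TLS in action, here's a small sketch that wraps an ordinary TCP socket using Python's standard ssl module before speaking HTTP over it; again it assumes outbound network access to example.com.

```python
# Wrap a plain TCP socket in TLS, then speak HTTP over the encrypted channel.
import socket
import ssl

context = ssl.create_default_context()      # sensible defaults: verify certificates

with socket.create_connection(("example.com", 443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="example.com") as tls_sock:
        print(tls_sock.version())            # e.g. 'TLSv1.3'
        cert = tls_sock.getpeercert()        # authentication: the server's certificate
        print(cert["subject"])
        # From here on, anything sent is encrypted and integrity-checked.
        tls_sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
        print(tls_sock.recv(80))
```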


Whew! That was a lot. There are some really good videos which make the topic slightly less dry. Each of these separate areas is extremely complex, but having a broad overview is useful for disentangling what's going on when you use the internet.

Mastery-based Learning with Launch School

It’s a new week, so we have a new podcast episode for you. Matt and I spoke with Chris Lee of Launch School about his approach to online education. We discuss the different tradeoffs and considerations that come into play when a curriculum is being put together.

To my mind, mastery-based learning — where you don’t advance to the next stage until you’ve really mastered the topic at hand — really shines for things where you have some kind of longer-term goal. Just because it’s a good approach doesn’t mean it’s easy, though. In Chris’ words:

We started this to try to figure out education. It was not a money making endeavor. So to us, teaching became the engineering problem to solve. I was not a proponent of Mastery Based Learning before Launch School. Mastery Based Learning or Competency Based Learning is not unique to Launch School, it’s a well known pedagogy in academic papers. But it’s really hard to implement.

Think about a physical classroom. Mastery Based Learning means that a student gets to occupy a seat in that classroom for an indefinite amount of time. That’s a really hard promise to make when our schools are tasked to usher through students. It’s not about training students and making sure they understand every topic, but getting people through.

You can download and listen to the episode over on the main Sources & Methods website here.