Coding

Two Ruby patterns around map and reduce

When you're going to choose a method to work to transform a group of n things in Ruby, there are two broad patterns you can choose: work with a map function or work with a reduce / inject function.

This pattern choice was recently explained to me as part of my Ruby education at Launch School. I hadn't fully grokked that the choice around how you transform a bundle of things (an Array, perhaps) really can be summarised by those two options.

You use a map method when you want to transfer your array into another array of (transformed) things. (For example, you wanted to transform an array of lowercase names into an array of uppercase names).

example_array = ['alex', 'john', 'terry', 'gill']
transformed_array = example_array.map(&:upcase) # => ["ALEX", "JOHN", "TERRY", "GILL"]

You use a reduce method when you want to transform n number of things (inside your array) into a single thing. (For example, you wanted to sum up the values of an array).

example_array = [1, 3, 5, 7, 9]
transformed_array = example_array.reduce(:+) # => 25

How the Internet Works

The internet. It just works. Understanding exactly how is a bit more complicated than many pieces of engineering. The more you examine the different aspects and parts that make it up, the more you see complexity concealed under the surface.

Visiting this website, for instance: it feels like a trivial thing to do, but there are many different parts making that happen, from the parts that actually transport the bits of data across the physical infrastructure, to the pieces that serve it all to you on a secure connection (ensuring that what I've written hasn't been altered by a third-party).

I've just finished Launch School's LS170 module which takes you a decent way down in the weeds to explain exactly how all of these pieces fit together to make up 'the internet'. So today I thought I'd retransmit some of that as a way of cementing it in my own mind.

At a very abstract level, the internet can be thought of as a network of networks. A network itself is a set of two or more computers which are able to communicate between each other. This could be the computers attached to a home network, or the computers that connect through a central server to a particular Internet Service Provider or ISP.

The internet makes use of a series of 'protocols', shared rules and understandings which have been developed or accreted over time. These protocols allow a computer on the other side of the planet to communicate in a mutually comprehensible manner. (If these shared sets of rules didn't exist, communicating with strangers or sending messages from one server to another would be a lot more difficult).

So once we have this top-down understanding of the internet as a bunch of networks that interact with each other, what, then, is the process by which a web browser in the United Kingdom communicates with a web server in China? Or in other words, if I want to access a website hosted on a Chinese webserver, how does that series of communication steps work to make that happen?

At this point, it's useful to make use of another abstraction: communication across the internet happens across a series of layers. There are several different models for these various layers. Two of the more common models — the OSI model and the TCP/IP model — are represented below:

layered-system-osi-tcp-ip-comparison.png

At the top level — "Application" — you have your website or whatever the user comes into contact with that is being served up to your web browser, let's say. All the layers below that are progressively more and more specialised, which is another way of saying that they become progressively less comprehensible if you were to eavesdrop on the data as it were passing over the wire or through the fibre optic cable.

Let's move through the big pieces of how information is communicated, then, starting at the bottom. (I'll mostly follow the TCP/IP model since it's a bit less granular and allows me to split things up in a way that make sense). This chart will help keep all the pieces in your mind:

layersofinternet.png

Note that each layer has something known as a 'protocol data unit' or PDU. A PDU is usually made up of a combination of a header, payload or chunk of data and an optional footer or trailer. The header and footer contain metadata which allows for the appropriate transmission / decoding etc of the data payload.

The PDU of one layer is used by the layer below or above it to make up its own separate PDU. See the following diagram as an illustration:

encapsulation.png

Physical Layer

Before we get into the realm of protocols, it's worth remembering and reminding ourselves that there is a physical layer on which all the subsequent layers rely. There are some constraints relating to the speed or latency with which data can be transmitted over a network which relate to fixed laws of physics. The most notable of those constraints is the fact of the speed of light.

Ethernet is the protocol that enables communication between devices on a single network. (These devices are also known as 'nodes'). The link layer is the interface between the physical network (i.e. the cables and routers) and the more logical layers above it.

The protocols for this layer are mostly concerned with identifying devices on the network, and moving the data among those devices. On this layer, devices are identified by something called a MAC (Media Access Control) address, which is a permanent address burned into every device at the time of manufacturing.

The PDU of the Ethernet layer is known as a 'frame'. Each frame contains a header (made up of a source address and a destination address), a payload of data, and a footer.

Internet Layer — The Internet Protocol (IPv4 or IPv6)

Moving up a layer, part of the Ethernet frame is what becomes the PDU for the internet or network layer, i.e. a packet.

This internet layer uses something known as the internet protocol which facilitates communication between hosts (i.e. different computers) on different networks. The two main versions of this protocol are known as IPv4 and IPv6. They handle routing of data via IP addressing, and the encapsulation of data into packets.

IPv4 was the de facto standard for addresses on the internet until relatively recently. There are around 4.3 billion possible addresses using this protocol, but we are close to having used up all those addresses now. IPv6 was created for this reason and it allows (through the use of 128-bit addresses) for a massive 340 undecillion (billion billion billion billion) different addresses.

Adoption of IPv6 is increasing, but still slow.

There is a complex system of how data makes its way from one end node on the network, through several other networks, and then on to the destination node. When the data is first transmitted, a full plan of how to reach that destination is not formulated before starting the journey. Rather, the journey is constructed ad hoc as it progresses.

Transport Layer — TCP/UDP

There are a number of different problems that the transport layer exists to solve. Primarily, we want to make sure our data is passed reliably and speedily from one node to another through the network.

TCP and UDP are two protocols which are good at different kinds of communication. If the reliability of data transmission is important to us and we need to make sure that every piece of information is transmitted, then TCP (Transmission Control Protocol) is a good choice. If we don't care about every single piece of information — in the case of streaming a video call, perhaps, or watching a film on Netflix — but rather about the speed and the ability to continuously keep that data stream going, then UDP (User Datagram Protocol) is a better choice.

There are differences between the protocols beyond simply their functionality. We can distinguish between so-called 'connection-oriented' and 'connectionless' protocols. For connection-oriented protocols, a dedicated connection is created for each process or strand of communication. The receiving node or computer listens with its undivided attention. With a connectionless protocol, a single port listens to all incoming communication and has do disambiguate between all the incoming conversations.

TCP is a connection-oriented protocol. It first sends a three-way handshake to establish the connection, then sends the data, and sends a four-way handshake to end the connection. The overhead of having to make these handshakes at the beginning and at the end, it's a fairly costly process in terms of performance, but in many parts of internet communication we really do need all the pieces of information. Just think about an email, for example: it wouldn't be acceptable to receive only 70% of the words, would it?

UDP is a connectionless protocol. It is in some ways a simpler protocol compared to TCP, and this simplicity gives it speed and flexibility; you don't need to make a handshake to start transmitting data. On the negative side, though, it doesn't guarantee message delivery, or provide any kind of congestion avoidance or flow control to stop your receiver from being overwhelmed by the data that's being transmitted.

Application Layer — HTTP

HTTP is the primary communication protocol used on the internet. At the application layer, HTTP provides communication of information to applications. This protocol focuses on the structure of the data rather than just how to deliver it. HTTP has its own syntax rules, where you enclose elements in tags using the < data-preserve-html-node="true" and > symbols.

Communication using HTTP takes the form of response and request pairs. A client will make a 'request' and it'll receive (barring some communication barrier) a 'response'. HTTP is known as a 'stateless' protocol in that each request and response is completely independent of the previous one. Web applications have many tricks up their sleeve to make it seem like the web is stateful, but actually the underlying infrastructure is stateless.

When you make an HTTP request, you must supply a path (e.g. the location of the thing or resource you want to request / access) and a request method. Two of the most common request methods are GET and POST, for requesting and amending things from/on the server respectively. You can also send optional request 'headers' which are bits of meta-data which allow for more complicated requests.

The server is obliged to send a HTTP status code in reply. This code tells you whether the request was completed as expected, or if there were any errors along the way. You'll likely have come across a so-called '404' page. This is referring to the 404 status code indicating that a resource or page wasn't found on the server. If the request was successful, then the response may have a payload or body of data (perhaps a chunk of HTML website text, or an image) alongside some other response headers.

Note that all this information is sent as unencrypted plain text. When you're browsing a vanilla http:// website, all the data sent back and forth is just plain text such that anyone (or any government) can read it. This wasn't such a big issue in the early days of the internet, perhaps, but quite soon it became more of a problem, especially when it came to buying things online, or communicating securely. This is where TLS comes in.

TLS or Transport Layer Security is sometimes also known as SSL. It provides a way to exchange messages securely over an unsecured channel. We can conceptually think of it as occupying the space between the TCP and HTTP protocols (at the session layer of the OSI framework above). TLS offers:

  • encryption (encoding a message so only authorised people can decode it)
  • authentication (verifying the identity of a message sender)
  • integrity (checking whether a message has been interfered with)

Not all three are necessarily needed or used at any one time. We're currently on version 1.3 of TLS.


Whew! That was a lot. There are some really good videos which make the topic slightly less dry. Each of these separate sections are extremely complex, but having a broad overview is useful to be able to disambiguate what's going on when you use the internet.

Mastery-based Learning with Launch School

It’s a new week, so we have a new podcast episode for you. Matt and I spoke with Chris Lee of Launch School about his approach to online education. We discuss the different tradeoffs and considerations that come into play when a curriculum is being put together.

To my mind, mastery-based learning — where you don’t advance to the next stage until you’ve really mastered the topic at hand — really shines for things where you have some kind of longer-term goal. Just because it’s a good approach doesn’t mean it’s easy, though. In Chris’ words:

We started this to try to figure out education. It was not a money making endeavor. So to us, teaching became the engineering problem to solve. I was not a proponent of Mastery Based Learning before Launch School. Mastery Based Learning or Competency Based Learning is not unique to Launch School, it’s a well known pedagogy in academic papers. But it’s really hard to implement.

Think about a physical classroom. Mastery Based Learning means that a student gets to occupy a seat in that classroom for an indefinite amount of time. That’s a really hard promise to make when our schools are tasked to usher through students. It’s not about training students and making sure they understand every topic, but getting people through.

You can download and listen to the episode over on the main Sources & Methods website here.

Using Ruby's .digits method

I discovered the .digits method in Ruby the other day. As a quick illustration, it extracts the digits of a method into an array, reverse sorted.

12345.digits #=> [5, 4, 3, 2, 1]

You can optionally specify what base you’d like it to use to calculate the digits using, i.e. the same calculation as above but in base 100 would give you the following:

12345.digits(100) #=> [45, 23, 1]

Reading around a little bit, it seems that if you’re trying to get hold of the digits of a number, simply doing a .digits.reverse is perhaps an ok solution if the number is small, but at a certain point it starts to get slow. This is because .digits isn’t just ‘splitting’ the number.

For that reason, perhaps using .to_s.chars might be a better alternative. You can then use a .map function to convert the characters into integers:

12345.to_s.chars.map { |digit| digit.to_i }

I’m not entirely sure what .digits is actually used / useful for, given the speed issues.

Using Ruby's .zip method to combine arrays

In Ruby, zip is a way to combine elements from two different arrays, albeit in a way that is slightly difficult to understand at first glance.

The documentation is a bit opaque, at least to my eyes, and the examples given take a bit of time to get your head around.

Let’s say you had an array of fruits that you wanted to distribute to your friends. You’re organised, so you have a list of your friends as well.

fruits = ["mango", "orange", "pomegranate"]
friends = ["Rob", "Mary", "Holly"]

Using multiple methods and loops, it’d be fairly trivial to conjure up something to combine these two into a new array, but luckily for us .zip exists to save the day.

friends.zip(fruits)

will return:

[["Rob", "mango"], ["Mary", "orange"], ["Holly", "pomegranate"]]

This way everyone will know what fruit they’re getting.

Note that if one of the two arrays is longer / shorter than the other, the missing space(s) will be filled with nil values.

Python Virtual Environments, Testing Environments and Markdown Strikethrough

I spent part of this afternoon fixing things in my PDF splitter code.

  • I learnt about virtual environments and the various choices available in Python. This was the most useful overview. I ended up choosing pipenv which is also outlined here. It installs a Pipfile in your directory which is an equivalent to the old requirements.txt that was previously used. This means that whenever you use pip to install a new package, it’ll remember and update the file accordingly.
  • For testing, I ended up holding off for the moment. It wasn’t immediately apparent which of the various testing suites I should be using and the examples given in places like this used strange syntax. I’ll have to tackle this later, but for now I’m putting it on hold.
  • I learnt that you can make some text strikethrough (EXAMPLE) in Markdown by enclosing the text in two tildes (~~).
  • I read about application layouts / structures and made some initial decisions about what files to include in my project. Some of this is overkill for where I am currently, but soon enough this project will expand and I’ll need a standard structure format.

Tomorrow I want to start working on my regex search-and-rename function. I’ll start by figuring out the right regex string to use for my search, then I’ll figure out how to add in re.search into my script.

Table Tests in Go

 
ScreenShot 2018-06-18 at 21.47.05.png
 

Today I wanted to stretch my use of test scenarios with Go. The example I described a couple of days ago basically had me running individual tests for specific values. What I wanted was a way to test a bunch of different values for the same function. Enter: table tests.

You can see the code I ended up with partly in the image above but also on Github here. It took a while to get there.

I started with some notes I’d taken during Todd McLeod’s excellent GreaterCommons Go course. Those notes were enough to get a framework up and running. I understood the principle: you create a struct to store all the different values, loop over them all to check whether the test fails in any particular scenario.

When I ran go fmt at the end to format my code, it gave me an error as it refused to build:

I could see that it wanted two ints and I was giving it a slice of ints. Basically this turned into a hunt for fixing my loop and which values I was spitting out at various iterations of the loop.

I ended up isolating the part of the code that was causing the problems, putting it up on the Go Playground so as to isolate exactly what was going wrong. Once I’d figured out exactly how to handle the loop, I could then bring that logic back into my main_test.go file.

Now I know how implement table tests in Go. My next exploration will be around functions that aren’t located in the same file. So far I’ve been mainly using the same main.go file for all the code I’ve written, but a step up in the complexity will be to interact with different files.