Python Virtual Environments, Testing Environments and Markdown Strikethrough

I spent part of this afternoon fixing things in my PDF splitter code.

  • I learnt about virtual environments and the various choices available in Python. This was the most useful overview. I ended up choosing pipenv which is also outlined here. It installs a Pipfile in your directory which is an equivalent to the old requirements.txt that was previously used. This means that whenever you use pip to install a new package, it’ll remember and update the file accordingly.
  • For testing, I ended up holding off for the moment. It wasn’t immediately apparent which of the various testing suites I should be using and the examples given in places like this used strange syntax. I’ll have to tackle this later, but for now I’m putting it on hold.
  • I learnt that you can make some text strikethrough (EXAMPLE) in Markdown by enclosing the text in two tildes (~~).
  • I read about application layouts / structures and made some initial decisions about what files to include in my project. Some of this is overkill for where I am currently, but soon enough this project will expand and I’ll need a standard structure format.

Tomorrow I want to start working on my regex search-and-rename function. I’ll start by figuring out the right regex string to use for my search, then I’ll figure out how to add in re.search into my script.

Table Tests in Go

 
ScreenShot 2018-06-18 at 21.47.05.png
 

Today I wanted to stretch my use of test scenarios with Go. The example I described a couple of days ago basically had me running individual tests for specific values. What I wanted was a way to test a bunch of different values for the same function. Enter: table tests.

You can see the code I ended up with partly in the image above but also on Github here. It took a while to get there.

I started with some notes I’d taken during Todd McLeod’s excellent GreaterCommons Go course. Those notes were enough to get a framework up and running. I understood the principle: you create a struct to store all the different values, loop over them all to check whether the test fails in any particular scenario.

When I ran go fmt at the end to format my code, it gave me an error as it refused to build:

I could see that it wanted two ints and I was giving it a slice of ints. Basically this turned into a hunt for fixing my loop and which values I was spitting out at various iterations of the loop.

I ended up isolating the part of the code that was causing the problems, putting it up on the Go Playground so as to isolate exactly what was going wrong. Once I’d figured out exactly how to handle the loop, I could then bring that logic back into my main_test.go file.

Now I know how implement table tests in Go. My next exploration will be around functions that aren’t located in the same file. So far I’ve been mainly using the same main.go file for all the code I’ve written, but a step up in the complexity will be to interact with different files.

My new book: The Taliban Reader

 
DeXH1VMW0AAN60p.jpg:large.jpg
 

My new book is out (finally). The Taliban Reader is somehow the culmination of years of work to drive studies of the Taliban back to primary sources. Some of this work was accidental; more recently it was more purposeful. The book I produced (together with Felix Kuehn) is long and detailed.

Comments and feedback prior to publication were extremely positive. It'll presumably take readers a while to start getting some real independent reviews in, but I look forward to feedback and whatever conversation is generated off the back of it all.

You can pick up a copy at any good bookshop or from Amazon here.

My First Arabic Book Translation

 
ScreenShot 2018-05-06 at 07.13.47.png
 

I’ve been sitting on this for a while. 

Early in 2017, I was lucky to get put in touch with Sami Al-Hajj and al-Jazeera by a good friend. They were looking for someone to translate Sami’s memoirs from Arabic into English. I had done similar work in the past, working Zaeef’s memoirs from his Guantánamo time that later expanded into My Life With the Taliban. I had never done any serious translation from Arabic, however, particularly of this length.

I finished translating a while back. It was both harder and more enjoyable than I had expected. Harder in that it requires an intense focus that can’t really let up while the translation is happening. More enjoyable since I realised the process was something that brought in a lot of creativity to get the language ‘just right’.

Today, Sami’s memoirs have been published. You can download the PDF version via Al-Jazeera here. It looks like Kindle and iBooks versions will be made available in due course as well.

I hope you get a chance to give this book a read. Not only is it some work that I spent a good chunk of time working on, but it’s a really useful and moving account of someone who passed through Guantánamo. In particular, Sami was often on a hunger strike so there is a lot of detail on his mental state and coping mechanisms, as well as the way the medical authorities at Guantánamo attempted to 'deal' with the problem.

Fuzzy Searching and Foreign Name Recognition

Here's something that happens fairly often: I'll be reading something in a book and someone's name is mentioned. I'll think to myself that it'd be useful at this point to get a bit of extra information before I continue reading. I hop over to DevonThink to do a full-text search over all my databases. I let the search compute for a short while, but nothing comes up. I tweak the name slightly to see if a slightly different spelling brings more results. That works a bit better, but I have to tweak the spelling several times until I can really claim the search has been exhaustively performed.

Anyone who's done work in and on a place where a lot of material is generated without fixed spellings for transliteration. In Afghanistan, this ranges from people's names -- Muhammad, Mohammad, Muhammed, Mohammed etc -- to place and province names -- Kunduz, Konduz, Kondoz, Qonduz, Qhunduz etc.

DevonThink actually has a 'fuzzy search' option that you can toggle but it isn't clear to me how it works or whether it's reliable as a replacement for a more systematic approach.

As I'm currently doing more and more work using Python, I was considering what my options would be for making my own fuzzy search emulator.

My first thought was to be prescriptive about the various rules and transformations that happen when people make different spelling choices. The Kunduz example from above reveals that vowels are a key point of contention: the 'u' can also be spelt 'o'. The 'K' at the beginning could also, in certain circumstances, become 'Q' or 'Qh'. These various rules could then be coded in a system that would collect all the possible spelling variations of a particular string and then search the database for all the different variations.

Following a bit of duckduckgo-ing around, I've since learnt that there are quite extensive discussions of this problem as well as approaches to solution that have been proposed. One, commonly referenced, is a Python package called 'FuzzyWuzzy'; it uses a mathematical metric called the Levenshtein distance to measure how similar or not two strings are. I imagine that there are many other possible metrics that one could use to detect how much two strings resemble one another.

I imagine the most accurate solution is a mixture of both approaches. You want something that is agnostic about content in the case of situations where you don't have domain knowledge. (I happen to have read a lot of the materials relating to Afghanistan, so I know that these variations of names exist and that there is a single entity that unites the various spellings of Kunduz, for example). But you probably want to code in some common rules for things which come up often. (See this article, for example, on the confusion over spellings of Muslim names and how this leads to law enforcement mistakes).

I may end up coding up a version that has high accuracy on Afghan names because it's a scenario in which I often find myself, but I'll have to explore the other more mathematically-driven options to see if I can find a happy medium.

Tweeting to the Void

I've previously written about how I turned off Facebook's news feed. I keep an account with Facebook because people occasionally contact me there. It is also an unfortunate truth that many companies in Jordan (where I live) or in the wider Middle East only have representation on Facebook instead of their own website. (Why they insist on doing this baffles me and is perhaps a topic for a future post).

I have long preferred Twitter as a medium for filtering through or touching -- however obliquely -- things going on at any particular moment. I have no pretensions to actively follow every single tweet to pass through my feed. Rather, it's something I dip into every now and then.

Increasingly in recent months, I found myself growing dissatisfied with the pull it often has on me. It has become something of a truism to state that 'twitter isn't what it once was', but there's less and less long-term benefit in following discussions as and when they happen.

RescueTime tells me that I spent 86 hours and 16 minutes on Twitter in 2017 -- just under quarter of an hour each day. That feels like a lot to me.

ScreenShot 2018-01-25 at 19.13.15.png

Enter 'Tweet to the Void'. This is a Chrome extension. (For Firefox and other browsers, I have to imagine things like this exist.) When I visit twitter.com, the feed is not visible. All I see is somewhere to post a tweet if that's what I want to do. (There is still some value in posting blogposts and articles there, since I know some people don't use RSS). Of course, I can always turn off the extension with ease, but adding this extra step has effectively neutralised Twitter for me. 

Try it; see how you feel about having something standing in the way of your social media fix. Let me know how you get on.