Afghanistan

New book, new ways to order


“I was around three or four years old when the Communists led the bloodiest coup in Afghanistan. KhAD personnel were arresting the faithful. One day, a few ugly moustached men knocked on our door. My father left with them and then he never came back. We never saw him again.

“After a year, I began to understand that this kind person was no longer with me. Poverty, a cold fireplace, and my old clothes made it evident – I was an orphan. Every man with a moustache looked like my father’s murderer. My uncle took us with him to another village, and we no longer had a home of our own.”

In this way Abdul Hai Mutma’in begins his memoir of his time alongside the senior leadership of the Afghan Taliban movement. First published in Afghanistan a couple of years ago, Taliban: A Critical History from Within is now available for pre-order in an English translation.

Mutma’in served as a political advisor to Taliban leader Mullah Muhammad Omar and as a spokesperson. He worked in the media section of Kandahar’s Culture and Information Ministry and, from 2013 onwards, served as a political and humanitarian affairs advisor to Mullah Akhtar Mansour. In short: he spent a good deal of time around the senior leadership and was privy to the internal workings and machinations of the Taliban movement at its highest levels.

At First Draft Publishing, the small publishing house I started five years ago together with Felix Kuehn, our explicit agenda is to publish books that will help “give researchers, professionals and the interested public access to primary and secondary sources”. This book falls firmly within this remit. The list of primary sources relating to the Taliban (or primary-source-adjacent material) is exceedingly thin, even now, all these years after the movement first burst onto the national and international stage. From our perspective as researchers, the more such memoirs get written, the more we are able to attempt a critical unpicking of the narratives and myths that have driven both conflict and efforts towards integration. Without these raw materials, it is impossible to begin the slow and methodical work of scholarship: triangulation, verification, context, synthesis and so on.

A bit of additional housekeeping: if you want to (pre-)order Mutma’in’s book, we have made some changes to how we’re producing and delivering books. We’re moving away from Amazon as the delivery system for our content and will simply process orders manually. For hardcopy purchases, we’ll be printing copies on demand. For ebooks, we’ll distribute DRM-free copies upon receipt of payment. If you’re interested in purchasing any of our books, please visit our website to learn more about our titles and email us to place an order.

My new book: The Taliban Reader


My new book is out (finally). The Taliban Reader is in many ways the culmination of years of work to drive studies of the Taliban back to primary sources. Some of this work was accidental; more recently it has been more purposeful. The book, which I produced together with Felix Kuehn, is long and detailed.

Comments and feedback prior to publication were extremely positive. It will presumably take a while for real independent reviews to come in, but I look forward to the feedback and whatever conversation the book generates.

You can pick up a copy at any good bookshop or from Amazon here.

Fuzzy Searching and Foreign Name Recognition

Here's something that happens fairly often: I'll be reading something in a book and someone's name is mentioned. I'll think to myself that it'd be useful at this point to get a bit of extra information before I continue reading. I hop over to DevonThink to do a full-text search over all my databases. I let the search run for a short while, but nothing comes up. I tweak the name slightly to see if a different spelling brings more results. That works a bit better, but I have to tweak the spelling several times before I can really claim the search has been exhaustive.

Anyone who's done work in and on a place where a lot of material is produced without fixed transliteration conventions will recognise this problem. In Afghanistan, it ranges from people's names -- Muhammad, Mohammad, Muhammed, Mohammed etc. -- to place and province names -- Kunduz, Konduz, Kondoz, Qonduz, Qhunduz etc.

DevonThink actually has a 'fuzzy search' option that you can toggle but it isn't clear to me how it works or whether it's reliable as a replacement for a more systematic approach.

As I'm currently doing more and more work using Python, I was considering what my options would be for making my own fuzzy search emulator.

My first thought was to be prescriptive about the various rules and transformations that happen when people make different spelling choices. The Kunduz example from above reveals that vowels are a key point of contention: the 'u' can also be spelt 'o'. The 'K' at the beginning could also, in certain circumstances, become 'Q' or 'Qh'. These various rules could then be coded in a system that would collect all the possible spelling variations of a particular string and then search the database for all the different variations.
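To make the rule-based idea concrete, here is a minimal Python sketch that expands a name into every spelling a handful of substitution rules allow. The `RULES` table and the `spelling_variants` function are hypothetical, built only from the Kunduz example above; a real rule set would need far more entries, and context too, since 'K' only becomes 'Q' in certain circumstances.

```python
from itertools import product

# Hypothetical substitution rules, based only on the Kunduz example above.
# Each key is a single lowercase letter; the values are the spellings it
# might be transliterated as. Illustrative, not a complete rule set.
RULES = {
    "k": ["k", "q", "qh"],
    "u": ["u", "o"],
    "o": ["o", "u"],
}


def spelling_variants(name: str, rules: dict[str, list[str]] = RULES) -> set[str]:
    """Return every spelling produced by applying the rules independently
    at each character position (letters without a rule stay as they are)."""
    options = [rules.get(ch, [ch]) for ch in name.lower()]
    # product() yields one choice per position; joining gives one variant.
    return {"".join(combo) for combo in product(*options)}
```

Here `spelling_variants("Kunduz")` yields 12 spellings, including kunduz, konduz, kondoz, qonduz and qhunduz, each of which could then be searched for in turn. The count grows multiplicatively with every position a rule applies to, which is one reason a purely rule-based approach gets unwieldy for longer names.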

Following a bit of DuckDuckGo-ing around, I've since learnt that this problem has been discussed quite extensively and that several approaches to solving it have been proposed. One commonly referenced option is a Python package called 'FuzzyWuzzy', which uses a metric called the Levenshtein distance to measure how similar two strings are. There are presumably many other metrics one could use to measure how closely two strings resemble one another.
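For comparison, the Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a plain-Python implementation using the standard dynamic-programming approach, plus a hypothetical `similarity` helper that rescales the distance to a 0-100 score. It is not FuzzyWuzzy's code, and FuzzyWuzzy computes its scores differently, so the numbers won't match that package exactly.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b (two-row dynamic programming).
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> int:
    """Case-insensitive distance rescaled to 0-100 (100 means identical)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1  # avoid dividing by zero for two empty strings
    return round(100 * (1 - levenshtein(a, b) / longest))
```

With this, `levenshtein("Kunduz", "Konduz")` is 1 (one substitution), `levenshtein("Kunduz", "Qhunduz")` is 2 (one substitution plus one insertion), and `similarity("Kunduz", "Konduz")` is 83.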

I suspect the most accurate solution is a mixture of both approaches. You want something that is agnostic about content for situations where you don't have domain knowledge. (I happen to have read a lot of the material relating to Afghanistan, so I know that these variant spellings exist and that the various spellings of Kunduz all refer to a single place, for example.) But you probably also want to code in rules for variations that come up often. (See this article, for example, on the confusion over spellings of Muslim names and how this leads to law enforcement mistakes.)
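A minimal sketch of such a hybrid, under stated assumptions: names are first "folded" onto a canonical key using a few hypothetical rules drawn from the Muhammad and Kunduz examples, and only then compared with a content-agnostic similarity score. For brevity it swaps in the standard library's `difflib.SequenceMatcher` ratio rather than the Levenshtein distance; `fold`, `best_matches`, `FOLDING_RULES` and the 0.8 threshold are all illustrative. The 'e' to 'a' rule in particular is aggressive and would merge some genuinely different names, and the approach only handles Latin-script transliterations.

```python
import difflib
import unicodedata

# Hypothetical folding rules drawn only from the examples in this post
# (Muhammad/Mohammed, Kunduz/Qhunduz). Applied in order, so "qh" is
# rewritten before the bare "q" rule can touch it.
FOLDING_RULES = [
    ("qh", "k"),
    ("q", "k"),
    ("o", "u"),
    ("e", "a"),
]


def fold(name: str) -> str:
    """Collapse known transliteration variants onto a single comparison key."""
    # NFKD splits accented letters (e.g. 'ū') into a base letter plus a
    # combining mark; keeping only a-z drops the marks, spaces and apostrophes.
    decomposed = unicodedata.normalize("NFKD", name.lower())
    key = "".join(ch for ch in decomposed if "a" <= ch <= "z")
    for old, new in FOLDING_RULES:
        key = key.replace(old, new)
    return key


def best_matches(query: str, candidates: list[str], threshold: float = 0.8) -> list[tuple[str, float]]:
    """Rank candidates by difflib similarity to the query after folding both,
    keeping only those scoring at or above the threshold."""
    folded_query = fold(query)
    scored = [
        (candidate, difflib.SequenceMatcher(None, folded_query, fold(candidate)).ratio())
        for candidate in candidates
    ]
    return sorted(
        (pair for pair in scored if pair[1] >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

For example, `best_matches("Kunduz", ["Qonduz", "Kandahar", "Kondoz"])` keeps Qonduz and Kondoz (both fold to 'kunduz') and drops Kandahar. The rules catch the variations you already know about, while the similarity score still gives partial credit to spellings no rule anticipated.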

I may end up coding up a version that has high accuracy on Afghan names because it's a scenario in which I often find myself, but I'll have to explore the other more mathematically-driven options to see if I can find a happy medium.

Tabula for extracting table data from PDFs

Have you ever come across a PDF filled with useful data that you wanted to play around with yourself? In the past, when I had that problem, I'd type the table out manually. This has some disadvantages:

  • it is extremely boring
  • it's likely that mistakes will get made, especially if the table is long and extends over several pages
  • it takes a long time

I recently discovered a tool that solves this problem: Tabula. It works on Windows and Mac and is very easy and intuitive to use. Simply take your page of data:

A page listing Kandahar's provincial council election polling stations from a few years back. Note the use of English and Dari scripts. Tabula handles all this without problems.

Then import the file into Tabula's web interface. It's surprisingly good at autodetecting where tables and table borders are, but you can do it manually if need be:


Then check that the data has been correctly scraped and select an export format (CSV, JSON, etc.):


And there you have it, all your data in a CSV file ready for use in R or Python or just a simple Excel spreadsheet:

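Once Tabula has produced a CSV, loading it in Python needs only the standard library. The sketch below assumes the first row of the export is a header row and that the file is UTF-8, which matters here because the table mixes English and Dari; `load_tabula_csv`, `clean_rows` and the filename in the usage line are hypothetical. It also strips any stray whitespace around headers and cell values.

```python
import csv
from typing import Iterable


def clean_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse CSV lines into dictionaries keyed by the header row,
    stripping whitespace from headers and cell values."""
    rows = []
    for row in csv.DictReader(lines):
        rows.append({
            key.strip(): (value or "").strip()  # missing cells come back as None
            for key, value in row.items()
            if key is not None  # DictReader files surplus cells under None; skip them
        })
    return rows


def load_tabula_csv(path: str) -> list[dict[str, str]]:
    """Read a CSV file exported from Tabula."""
    # Explicit UTF-8 keeps Dari text intact; "utf-8-sig" also tolerates a
    # byte-order mark at the start of the file if one is present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return clean_rows(f)
```

For example, `rows = load_tabula_csv("polling-stations.csv")` (a hypothetical filename) returns one dictionary per table row. If you already use pandas, `pandas.read_csv` will do the same job.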

Note that even though the interface runs in a browser, none of your data touches external servers. All the extraction of data from your PDFs is done locally on your computer rather than being sent off to cloud servers. This is a really nice feature and I'm glad they wrote the software this way.

I haven't had any problems using Tabula so far. It's a great time saver. Highly recommended.

Kael Weston on Sources and Methods


Matt and I put out a new episode of the Sources and Methods podcast today. We spoke to Kael Weston about his time living in Fallujah, the importance of speaking the language of the place in which you work, and the political systems that countries like the USA employ in far-off places like Iraq and Afghanistan. He recently wrote a book, The Mirror Test, which is worth reading. You can find the episode on iTunes or listen directly on the Sources and Methods website.

Learn all the districts of Afghanistan with Anki!

A friend was asking about using Anki to learn to recognise the districts of Afghanistan, so I made her a deck that tests you in the following way:

On the front of the card the question is presented along with a computer-generated audio pronunciation of the district name:

Then, if you know it, you'll answer 'Badakhshan' and click or tap through to the next screen to see whether you got it right. You'll see this:


Then you can mark whether you got it right or not. There are around 400 districts to learn, so if you learn 13-15 new cards each day you'll finish the whole lot in a month.
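If you want to build a similar deck yourself, one route is to generate a tab-separated text file, which Anki can import via File > Import; the import dialog lets you map the columns to the card's front and back. This is an assumption about workflow, not necessarily how this deck was made (it doesn't reproduce the audio, for instance). The function names and question wording are hypothetical, and you would supply your own list of district and province pairs. The `days_to_finish` helper just makes the pacing arithmetic explicit.

```python
import csv
import math


def write_anki_import_file(pairs, path):
    """Write (district, province) pairs as a tab-separated text file for
    Anki's text importer: first column = front, second column = back."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for district, province in pairs:
            # Hypothetical question wording; the original deck's card text may differ.
            writer.writerow([f"Which province is {district} district in?", province])


def days_to_finish(total_cards: int, new_per_day: int) -> int:
    """Days needed to introduce every card once at a fixed number of new cards per day."""
    return math.ceil(total_cards / new_per_day)
```

With around 400 cards, `days_to_finish(400, 13)` is 31 and `days_to_finish(400, 15)` is 27, so roughly a month either way.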

Why learn all the districts of Afghanistan? Sometimes you'll hear someone talking about a particular place or part of the country, and without knowing which province they're talking about you might not understand the context or the conversation. Plus, a little bit of geography never hurt anyone.

Give it a try. And let me know if you manage to complete the deck. You can download the full Anki file here. Enjoy!