Tech

PhD Tools: Beeminder

[This is part of a series on the tools I used to write my PhD. Check out the other parts here.]

I feel like I've mentioned the end of the PhD several times in recent posts (PHD IS OVER!). It occurred to me that it might be useful to go through some of the tools and principles that I found most useful in completing the doctoral thesis, the research and the work in general. Part of this is by way of giving thanks to the application or methodological creators, and the other part is me thinking that others (future / current PhD students?) might find this useful.

It took me many years to finally settle on these tools. It would probably be unwise to adopt my entire writing style and process for yourself, because everyone's unique. I read a lot of books and blog posts, and discussed things in forums and at meetings with others. This is all the product of a lot of procrastination (some active, some just resulting from hanging out on Twitter or subscribing to a bunch of productivity-related blogs in my RSS reader).

Each post will vary in size. For some I'll go into a bit more detail because the principle will be somewhat unknown. Others are mega-players in the tech world so I'll just tip my hat in their direction.

Minding the Bees

My first pick is, of course, Beeminder. (I've written about Beeminder before here.) The principle behind this service is pretty simple: you commit to doing a certain thing (or things) by a certain date (or regularly each day, etc.) and if you don't do them, you're penalised with money taken from your credit/debit card. The amount taken depends on whether you're a first-time offender (free, or $5); after that it increases exponentially. Pretty soon you'll be facing $270 or even higher fines.

Needless to say, this is a pretty strong motivator. You can hear about some of the nitty-gritty details in a podcast interview I did with Matt Trevithick and the founders of Beeminder, Bethany Soule and Daniel Reeves.

I have used Beeminder for a really wide variety of things -- not just for my work but for my personal life, too -- but in terms of my PhD, I had three main goals it supported:

1) Tracking the time I spent writing. You can hook up RescueTime (a passive activity tracker on your laptop) to feed into Beeminder. I can then say that I want to make sure I do a minimum of 1 hour of writing in Scrivener each day (for example), and Beeminder keeps track of the rest. This is a good thing to track because, ultimately, the PhD is all about keeping on writing. You can get lost in the research, but after a certain point you just have to deliver it and ship the damn thing. This keeps you honest about the writing part, the sitting down in the chair and putting words on the page.

2) Words drafted. This one's a bit more delicate, since often when you're starting out, drafting a new section or chapter, the words that come out are useless drivel (or replace with a far less charitable way of describing their quality, and say hi to my inner voice while you're there!). At the beginning, doing basic drafting, it's hard to get started because you feel everything has to be perfect. The best antidote to this is to work on a 'shitty first draft'. Here, the idea is simply to churn out enough thoughts to fill the blank space in the outline, or book, or chapter or wherever.
A specific example: I flew to Karachi in late 2012 to hammer out the first draft of my dissertation. I set up a Beeminder goal of having 100,000 words of text (approximately the maximum word count allowed for submission to the university) and a date 6 weeks in the future, and I got writing. Beeminder calculates and tells you how many words you have to get done each day in order to stay ahead of the curve. (There are graphs. They are awesome.) As long as you keep writing, you're ok. And I did it. Most of it was horrible, and some of it was inner conversations between myself and myself about the subject under consideration, almost all of which I had to rewrite in some shape or form later on. But... it was words on the page, and it was me thinking through the issues. It was essential.
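
Incidentally, the daily-rate arithmetic Beeminder does here is simple enough to sketch. This is just an illustration of the idea in Python, not anything Beeminder actually runs:

```python
from math import ceil

def words_per_day(target_words, current_words, days_left):
    """Minimum words per day needed to stay ahead of the curve."""
    remaining = max(target_words - current_words, 0)
    return ceil(remaining / days_left)

# 100,000 words in 6 weeks (42 days), starting from zero:
print(words_per_day(100_000, 0, 42))  # 2381
```

Beeminder recomputes this as you log progress, so a productive day today lowers tomorrow's minimum and builds you a safety buffer.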

3) Sources read. This might be unique to me, but at some point I had to return to the newly-gathered sources of the Taliban Sources Project. I looked in my DevonThink database (about which, more to come in a future post) and saw I'd flagged 1000+ articles to reread, catalogue/tag and integrate into the main thesis argument. So I plugged those numbers into Beeminder, gave myself a workable daily rate (50 or 100, I think) and then it calculated the rest and kept me honest.

So, to sum up:

  • Beeminder forces you to think backwards from your goal if you have a specific endpoint in mind. This is extremely valuable as it makes sure you're not being overambitious.
  • Beeminder gives you accountability. It keeps you honest. This is what I initially found was most valuable, but later on I needed this less. YMMV.
  • The community of Beeminder users is wonderful. The forum is a great place to get ideas, discuss approaches / failures etc.
  • It works! Many people have had great results using Beeminder.

I'm not going to say I couldn't have written my PhD without Beeminder, but I'm almost saying it. Go check it out!

Learn all the districts of Afghanistan with Anki!

A friend was asking about using Anki to learn to recognise the districts of Afghanistan, so I made her a deck that tests you in the following way:

On the front of the card the question is presented along with a computer-generated audio pronunciation of the district name:

Then if you know it, you'll answer Badakhshan and then you'll click/tap through to the next screen to see if you got it right. You'll see this:


Then you can mark whether you got it right or not. There are around 400 districts to learn, so if you learn 13-15 new cards each day you'll finish the whole lot in a month.

Why learn all the districts of Afghanistan? Sometimes you'll hear someone talking about a particular place or part of the country, and without knowing which province they're talking about you might not understand the context or the conversation. Plus, a little bit of geography never hurt anyone.

Give it a try. And let me know if you manage to complete the deck. You can download the full Anki file here. Enjoy!

Walking Amman


I’ve been walking around Amman a little in the past couple of days. My poor sense of direction, combined with the city’s somewhat haphazard street layout, means I make use of digital GPS maps on a regular basis. In Europe or North America, Google Maps is my service of choice, with due acknowledgement of their general creepiness.

But I discovered yesterday that Google Maps is pretty atrocious when walking around Amman. Either their data is old and of poor quality, or the algorithm for calculating time/distance between two points is not properly calibrated for a city with many hills. If you look on Google Maps’ display, you’ll see what looks like a flat terrain. Everything can seem very close. If you look out of the window, or walk on the streets, you’ll see that hills and a highly variable topography are very much a part of the experience of the city. (This gives some idea of it).

Google Maps knows how to deal with hills or variable terrain. After all, San Francisco, close to their centre of operations, is a pretty hilly city and I found the maps and the estimated timings worked pretty well when I was there last year. Which suggests to me that the problem isn’t that Google forgot to take into account topography but rather that the data is poor.

I’m studying data science-y things these days, so I thought a bit about how they might improve this data. Some possible solutions:

  1. They’re already monitoring all the data coming from app usage etc, so why not track whether its estimations match up with how long people actually take to walk certain streets/routes. Mix that in with the topography data, and average it all out.
  2. They could send out more cars. I don’t know how accurate the map data for driving in Amman is, but some anecdotal accounts suggest that it suffers from similar problems. This is probably too expensive, and I’m assuming it’d be preferable to find a solution that doesn’t require custom data collecting of this kind. Maybe something for when the world has millions of driverless cars all powered by Google’s software, but for now it’s impractical as a solution.
  3. Find some abstract solution based on satellite-acquired topographic data which takes better account of gradients of roads etc.
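
To give a flavour of what option 3 might involve: Tobler's hiking function is a well-known empirical formula that estimates walking speed from slope. I have no idea what Google actually uses, so treat this Python sketch as purely illustrative:

```python
from math import exp

def tobler_speed_kmh(slope):
    """Walking speed (km/h) from Tobler's hiking function; slope is rise/run."""
    return 6 * exp(-3.5 * abs(slope + 0.05))

def walk_minutes(distance_km, rise_m):
    """Estimated minutes to walk a segment, accounting for its gradient."""
    slope = (rise_m / 1000) / distance_km
    return distance_km / tobler_speed_kmh(slope) * 60

# A flat kilometre vs. the same kilometre with a 60 m climb:
print(round(walk_minutes(1.0, 0), 1))   # ≈ 11.9 minutes
print(round(walk_minutes(1.0, 60), 1))  # ≈ 14.7 minutes
```

Even a modest 6% grade adds almost three minutes per kilometre, which compounds quickly in a city built on hills.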

For the moment, Google Maps offers a pretty poor user experience for pedestrians. Yesterday evening I was walking back home from the centre of town. The walk would, Google told me, take only 12 minutes. 40+ minutes later I arrived home.

Others have noted this same problem and suggested an alternative: OpenStreetMap data. The data isn’t tied to a particular app, but I downloaded one alongside the offline mapping data for Jordan/Amman. It seems pretty good at first glance, and I’ll be testing it out in the coming days. I’m interested to learn why it seems to perform better. My initial hypothesis is that its data is simply better than what Google Maps is using.

On Untangling Syria's Socially Mediated War

 
Some old photos from when I used to live in Damascus


How do we figure out what is going on in a country like Syria, when journalists, researchers and civilians alike are targeted with frustrating ease? Is it enough to track what is being posted on social media outlets? These two questions are at the core of a fascinating recent(ish) study published by the United States Institute for Peace (USIP).

Syria’s Socially Mediated Civil War – by Marc Lynch, Deen Freelon and Sean Aday – came out in January 2014 and analyses an Arabic-and-English-language data set spanning a few years. It offers a useful overview of the social media trends as they relate to the ongoing conflict in Syria. It is especially relevant for those of us who aren’t inside Syria right now, and who are trying to understand things at one remove, whether that is through following social media output or talking to those who have left the country. (This means journalists, researchers and the like.)

Some stark conclusions emerge from the report. The ones I’m choosing to highlight here relate to how international media and research outlets have often been blind to structural issues that obscure their ability to understand Syria from outside the country.

“Social media create a dangerous illusion of unmediated information flows.” [5]

The role of translation or the importance of having research teams that are competent in both English and Arabic comes out very strongly from the research.

“The rapid growth in Arabic social media use poses serious problems for any research that draws only on English-language sources.” [3]

The report details how tweets about Syria in Arabic and English came to be different universes, how the discourse rarely overlapped between the two and that to monitor one was to have no idea of what was going on in the other:

“Arabic-language tweets quickly came to dominate the online discourse. Early in the Arab Spring, English-language social media played a crucial role in transmitting the regional uprisings to a Western audience. By June 2011, Arabic had overtaken English as the dominant language, and social media increasingly focused inward on local and identity-based communities. Studies using English-only datasets can no longer be considered acceptable.” [6]

Also:

“The English-language Twitter conversation about Syria is particularly insular and increasingly interacts only with itself, creating a badly skewed impression of the broader Arabic discourse. It focused on different topics, emphasized different themes, and circulated different imagery. This has important implications for understanding mainstream media’s limitations in covering Syria and other non-Western foreign crises and raises troubling questions about the skewed image that coverage might be presenting to audiences.” [6]

Also:

“researchers using only English-language tweets would be significantly misreading the content and nature of the online Twitter discourse.” [17]

And:

“These findings demonstrate once again the insularity of English-language journalists and the rapid growth of the Arabic-speaking networks. Both findings are potentially troubling for at least two reasons. First, they imply a journalistic community whose coverage may be influenced more by its cultural and professional biases than by the myriad constituencies within Syria and across the region. Second, they point to the power of social media to draw people into like-minded networks that interpret the news through the prism of their own information bubbles.” [26]

The general ideas in here won’t necessarily come as a surprise but I found it instructive to see just how different those two discourse universes are in the report.

In a separate but not-unrelated note, I have been thinking of ways that I can stay engaged in what’s going on in Syria beyond just consuming reports at one step removed. I’m working with a beta-testing team using a piece of software called Bridge – made by the lovely team at Meedan – which allows for the translation of social media and the use of those translations as part of an embedded presentation online. I will be translating strands and snippets from certain parts of Syria’s social media universe in Arabic. More on this soon, I hope.

How I use Goodreads to pick what I read

So far this year, I have read 35 books. I'm trying something new for 2015 so I thought I'd write up the outline in case someone else finds it useful. As I wrote at the end of last year, I'll be reading 150 books over the course of 2015. That's fifty books more than I read in 2014. The point of it is to expose myself to lots of different ideas, different styles, different perspectives. I've found that 150 probably isn't an impossible amount to be reading (less than three a week) and I really relish brushing up against interesting authors and ideas.

I've used Goodreads as a way of tracking what I read for a long time now. I'm lucky enough to have an interesting group of 'friends' who also use it (more or less regularly) so there's usually a decent amount of new or niche books that I discover that way. I also use it as a way of noting down the books I want to read in the future. (Incidentally, I've never really had a problem in finding something new to read. The list of books I want to read will always be larger than the time I have to read them. That's just life.)

Goodreads offers a 'list' function whereby you can not only state that you 'want to read' a book, but where you can categorise things to your heart's content. Each year I set up a list ("2015toread" and so on) so I can see which books I think I'm more motivated to read that year. I'll usually take 5 or 10 minutes each week checking over the list to make sure the things I added are actually things I still want to read (versus things I added in the heat of the moment, after reading a particularly persuasive review, for example, but which I probably don't need to spend my time on).

Previously, I was generally following my gut with what I wanted to read next. Unfortunately, this often meant I went with the easiest option, or the path of least resistance. Long books (weighty histories, or more abstruse theoretical texts) would be passed up for the latest *it* novel or someone's entirely forgettable memoir about their time in Afghanistan that I'll feel obliged to read.

This year I've been trying a different approach. Goodreads allows you to sort lists by various bits of metadata attached to each book (author name, date added etc) but you can also sort by "average rating". This is the average rating given to a particular book by the entire Goodreads user base (20+ million users). You can see how this pans out in my current set of 'up next' books:


This "average rating" isn't in any way a guarantee of anything resembling quality. It's not that hard for authors to game the system, and books with few reviews (common for niche subjects like Afghanistan or Islam) often have either really high or really low ratings. But I'm finding this approach pushes me to read far more books outside my path-of-least-resistance choices and often brings me into contact with some real gems.

Needless to say, this method of discovery is only a little better than putting all the names in a hat and picking one at random, but I am still finding some benefit. It does mess with my desire to read fewer male authors (you'll note in the picture above that only book number seven is by a woman; the rest are men) but everything in life is a tradeoff of some sort, I suppose.

Let me know if you find some use to this or if you have any other ways you pick what books to read next.

Note-Taking Jujitsu, Or How I Make Sense Of What I Read

Note-taking is a problem. It's an interesting problem, but still a problem. Many people have switched over from paper books to digital copies. I am certainly one of the early adopters in this trend, having wrangled Graeme Smith and his sister into facilitating a first iteration of Amazon's Kindle to be delivered to my house in Kandahar.

My colleague Felix Kuehn and I used Kindle versions of books heavily in our research for An Enemy We Created. Using those references in footnotes was difficult at the time: the format was so new that established footnoting styles (APA/Chicago etc) hadn’t developed the standards for referencing kindle documents. All this was made harder by the fact that Kindle copies of books added a whole new problem into the mix by abandoning page numbers for ‘Kindle location numbers’. This changed a few years later, and current users probably won’t have this problem, but if you go look at the footnotes for An Enemy We Created, you’ll still find that many, if not most, of the references are to Kindle locations and not page numbers. In fact, I think our book was probably the first serious history work to rely so extensively on digital Kindle references in the footnotes; I remember having discussions with our publisher about it.


All this isn’t to say paper copies don't have their uses. But some books just aren't available in digital format. I'll get into the workaround for that later. The best way to make this less of a problem is to gently nudge publishers to issue their books in a Kindle format.1 But I am already getting off track.

All this seemed to come to a head this past week, when a podcast I hosted together with Matt Trevithick took up the topic of notes and note-taking. Mark Bernstein, our guest on the show, wrote a really excellent book on the topic some years ago entitled The Tinderbox Way. I’d strongly recommend you read it if you’re involved in knowledge work in any way. Here’s a short excerpt defining the importance and use patterns for notes:

“Notes play three distinct roles in our daily work:

• Notes are records, reminding us of ideas and observations that we might otherwise forget.

• Shared notes are a medium, an efficient communication channel between colleagues and collaborators.

• Notes are a process for clarifying thinking and for refining inchoate ideas.

Understanding often emerges gradually from the accumulation of factual detail and from our growing comprehension of the relationships among isolated details. Only after examining the data, laying it out and handling it, can we feel comfortable in reaching complex decisions.”2

Later in the week, Maria Popova (of Brainpickings fame) was on Tim Ferriss’ podcast to talk about her website, her reading and her workflow. Both Tim and Maria expressed frustration over the lack of tools for people wanting to download and interact with their Kindle clippings:

“I highlight in the kindle app on the iPad, and then Amazon has this function that you can basically see your kindle notes on the desktop on your computer. I go to those, I copy them from that page, and I paste them into an Evernote file to have all my notes on a specific book in one place. But sometimes I will also take a screengrab of a kindle page with my highlighted passage, and then email that screengrab into my Evernote email, because Evernote has, as you know, Optical Character Recognition, so when I search within it, it’s also going to search the text in that image. I don’t have to wait till I’ve finished the book.

The formatting is kind of shitty in the kindle notes on the desktop(…) if you copy them, they paste into Evernote with this really weird formatting. (…) It’s awful. If you want to fix it you have to do it manually within Evernote. (…) There is no viable solution that I know.”3

She then goes on to some more detailed points of how this doesn’t work, and Tim commiserates, suggesting that maybe they should hire some people to fix this problem. But the good thing is that there are solutions. The problems Maria and Tim bemoan are things that every other Kindle user has had to deal with since day one, so thankfully there are a number of workarounds that simplify the process of reading, annotating and sifting within one’s notes of a book or document.4

So notes are important, we get that. But how do we use them to their utmost? How do we even gather them together and store them? How do we use them for our writing, for our thinking? These are all important questions which I don’t feel have been properly answered, and where those answers have been given, they’re buried or hidden somewhere out on the internet.

I want this post to get into the weeds about how to get your materials off a Kindle device, how to store it usefully on a Mac (my apologies, PC/Linux users), and how to repurpose those notes to be creative, to write, and to think.

This post has three parts:

  1. Storage
  2. Clipping & Splitting
  3. Discovery & Meaning

It will by necessity be an overview of some useful tools and options for researchers, but if you leave comments I can probably expand on individual points/sections in follow-up posts if needed.

1. Storage

This is a problem that wasn’t explicitly raised in the things that motivated this post, but it’s something I get asked frequently. Maria and Tim both seem to be avid Evernote users, and I know many others also use this, but there are other options. It’s worth starting here because the tools will determine what you can do with your notes.

I’ve offered advice to other Mac users on what software to use for research projects that require a certain deftness in handling large quantities of sometimes disparate materials. The same applies to people who are just trying to keep track of the things they read, trying to draw together connections, and to derive meaning from it all. I’ll get into the meaning-creation in the final section, but for the moment, let me briefly describe the four options for file/note storage as I see them.5

  1. Finder/PathFinder. This is the lowest-tech option. Basically, once you split your files up (see section two) you store them in folders and refer to them that way. I don’t find this option very attractive or useful, because it’s like a filing cabinet. Your ability to discover connections and to remember what’s in those folders is pretty limited. I don’t recommend this at all, but from conversations with other researchers and writers, it seems this is the default option.
  2. Evernote. I include this here because it’s part of a workflow that we’ll cover later on. Evernote is great for all the reasons you can read about on their site. It syncs across all your mobile and desktop devices, it OCRs images so you can search for text captured inside photos you upload into your library of notes.
  3. DevonThink. This is my default ‘bucket’ for information, documents and notes. You can read up on the many (MANY) things that DevonThink Pro Office or DTPO (the version you should get, if you’re getting this software) does. Not only does DTPO store your documents, but it allows you to access that information in a number of extremely useful formats. There is a mobile app, too, though it could do with a bit more work. The most interesting feature of DTPO is its search and discovery functionality (using some magic sauce algorithms). They don’t make as much of this on their website as they used to, but I’d strongly recommend you check out these two articles (one, and two) by Steve Berlin Johnson which explain a little of the wonderful things DevonThink can do for your notes. As with the next recommendation, it’s not cheap. But powerful doesn’t always come cheap. It’s a solid investment if you spend the time getting to know this piece of software.
  4. Tinderbox. I discussed this at some length on the Sources & Methods podcast with Mark Bernstein, so I’d recommend you listen to that as your first port of call. Tinderbox is not an everything-bucket in the way that Evernote and DevonThink are, and I use it slightly differently, but it’s a great place to actually do the work of thinking, organising and writing once you have something (i.e. a project of some sort) for which you want to use all your notes. I’ll explain more about this in section three.

I’d recommend getting to know the different bits of software to get a sense of what they can do. DevonThink has a handy section of their website where you can see how people use it in their work lives. Tinderbox has something similar, with some case studies of usage.

For DevonThink, it’s generally good to keep your ‘buckets’/databases of files separated by topic. I have a mix of these kinds of databases (50 in total): some are country-specific, some are project-specific (i.e. to contain the research that goes into a book or a long report), and some are topic-specific (i.e. I have one for clippings and notes relating to Mathematics, one for things relating to Cardiology etc). I’d also recommend you give Steve Berlin Johnson’s book Where Good Ideas Come From a read, particularly chapter 4.

Given the learning curve with some aspects of the workflow that follows, you might want to consider introducing these pieces of software one-by-one, or as needed. That way you’re using only what you understand and can implement things without being too overwhelmed by the novelty of the systems. It took me years (almost a decade) to implement and iterate the systems described below, and I’m still not finished modifying as the tools change.

2. Clipping & Splitting

This section is all about getting materials off mobile devices and onto your computer where you can put them into some sort of overarching database.

Accessing Your Amazon Kindle Clippings

First let’s sort out how best to get notes from a kindle onto your Mac. Don’t use Amazon’s website. It’s going to create all sorts of problems for you in terms of formatting.

First things first: sync your kindle to the cloud. Just turn on the wifi/3G and select the “Sync” option. This will ensure all your highlights are backed up to the cloud.

Then plug your Kindle into your computer via USB. Then go into the “Documents” folder, and search for a file called “My Clippings.txt”. If you’ve been using your kindle for a while, it’s probably going to be quite large. Nevertheless, copy that file to your desktop. Feel free to eject your Kindle from your laptop now. We won’t be needing it any more.

 

An example of what you might see when you open your "My Clippings.txt" file

 

If you open the txt file that is now saved to your desktop, you’ll find all your clippings and annotations preserved in a useful plaintext format. This may solve your problems straightaway, in which case, congratulations: you now have all your annotations in a useful format that you can use however you wish.

If you want to take it to the next level, though, you’ll want to split this file up. At the moment, you have a very large plaintext file which contains all your notes. You’re likely to have notes from a wide variety of topics and books in here, so it doesn’t make sense for you to keep them all in a single location. The ideal solution is for you to have a single file for every clipping, a single file for every annotation.6

This is where Split-ter.scpt comes in. I’m afraid I don’t know who to credit for this wonderful piece of code. I downloaded it somewhere on the internet some years back and can’t seem to find a link to the author either in the code or elsewhere online. (Whoever you are, thank you!)

This script works with another piece of software mentioned above, DevonThink Pro Office. For now, I’ll ask you to ignore that bit, and focus on what’s happening to the file. I use the script to convert our “My Clippings.txt” file into multiple files. It goes in, finds a delimiter (any piece of text or syntax that repeats itself in the original file) and creates a new note/file every time it comes across this delimiter. In this way, you’ll quite quickly go from the file shown above to something like this:

Now you have a note for every annotation and/or clipping. This is then something you can dump into Evernote, or keep in DevonThink. Again, more about the difference between these programmes in the next section. (Note that you can use Tinderbox to split up the “MyClippings.txt” file as well, using its “Explode” tool.)

UPDATE (a little later on Friday night): Seb Pearce has just let me know that there are other options available for dealing with the 'My Clippings.txt' file. Check them out on his site.

The second problem raised on the Tim Ferriss podcast was Amazon’s limit on clippings. This differs from publisher to publisher, it seems, so there’s no way of predicting it. An unfortunate rule of thumb: the more useful the book, the more likely the publisher has locked it down. When you’re making clippings inside the book, Amazon gives you no notification that you’ve reached the book’s limit. But when you go to check your “My Clippings.txt” file to start using your notes, you may find the note says:

"<You have reached the clipping limit set by the publishers>"

All the work you’ve done selecting pieces of text is for nothing, it would seem. The publisher has prevented you from using your book.

One solution is to remove the DRM from the book before you put it on your kindle. This is legal so long as you’re not sharing the book with other people (as this process would theoretically allow you to do).7 Follow this link to find out how to de-DRM your Kindle and iBooks documents. You can also visit libgen.org to download an already-DRM-free copy of the book you’ve purchased. These will often be in .epub format, so you’ll have to convert them over to .mobi format if you want to use them on your kindle device. (To convert from .epub to .mobi, use the free Calibre cross-platform software.)

If you read a de-DRMed copy of a kindle book on your kindle device, there will be no limitations on how much you can annotate. The publisher’s limitations will all be gone. So that’s one option.

For those who aren’t comfortable removing the DRM on your books, you can get all your annotations out, but it comes with a little bit of hassle.

Here’s an example of what I mean (screenshot from my DevonThink library). I was reading in Hegghammer’s excellent Jihad in Saudi Arabia and making highlights (at 4:06am, apparently) but at some point I hit the limit imposed by the publisher.


The workaround to bypass this publisher limit is to first export all your notes out of your “MyClippings.txt” file, so all your clippings are saved, even though some of them may not work. Let’s say, for the sake of argument, that the final three notes aren’t working because of the publisher’s limitations. That’s the case in the screenshot above. What you do is (again, once you’ve backed up the clippings txt file) delete three of the earlier clippings that you already have. Then you sync your Kindle to the server and it will think that you have clipped three fewer quotes, so these will then become available (both in the myclippings.txt file and on the website). Like I said, it’s a bit fiddly. I would much rather remove the DRM completely and not have this hassle at all, though when you do that Amazon will not sync your clippings to the cloud and to their kindle.amazon.com database. You’ll have to export them using the tools I mentioned above.

Keeping Up With The Joneses, or How to Use Instapaper to Clip Web Articles

This may be something completely idiosyncratic to my own workflow, but I don’t enjoy reading articles in a web browser. I’d also prefer not to be hijacked into reading all these articles. For instance, when I’m in Tweetbot/Twitter or Facebook and I see a link that I like, I will almost never read that article then and there. Rather, I’ll send it to my Instapaper queue.

First, a quick word about Instapaper vs Pocket. I use Instapaper. I started off with them, switched over to Pocket for about two years, and now I'm back with Instapaper. They're both more or less the same. Instapaper happens to be what I've chosen for myself because of their handy Kindle service: if you have articles in your queue, you can have Instapaper send the most recent ones to your Kindle at a particular time (e.g. first thing in the morning), which you can then clip and archive to your heart's content. Both Pocket and Instapaper work with what follows, so just pick one and stick with it. I'd recommend Instapaper because they allow for sharing the full text of articles and because of the Kindle digest feature.

I find I have so much to stay on top of and keep track of online that I can't just click around and read things as and when I see them. I set time apart for reading my Instapaper queue (and for reading books on my Kindle) and only read during those times. (I do the same with email, only checking and responding to it between 12-1pm and 5-6pm each day. The rest of the day email is off and disabled. I even deleted my email account on my iPhone, as inspired by this medium.com post.)

My workflow with web articles is to follow as much as possible via RSS. I prune the sites I’m following every three months, but in general the number is stable around 650. I use Newsblur as my RSS reader, and every time I find an article I’d like to read (later), I use the handy ‘send to instapaper’ bookmarklet. This sends the article to my Instapaper queue.
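(For the technically curious: pulling article links out of an RSS feed takes only a few lines of standard-library Python. The feed below is a made-up example, but real feeds share the same channel/item/link structure.)

```python
import xml.etree.ElementTree as ET

def feed_links(rss_xml):
    """Extract (title, link) pairs from an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

# Invented feed for demonstration; a real one would be fetched over HTTP
sample_feed = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

for title, link in feed_links(sample_feed):
    print(title, "->", link)
```

In practice the bookmarklet does all of this for you; this is just to show there's no magic involved.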

The same goes for Twitter. I follow enough people on Twitter for it to be impossible to read every post that passes through my stream. I will dip in once or twice a day, however, to see what people are saying. I use two services to monitor my Twitter and Facebook streams and pick out the most-shared articles, to ensure that I don't miss the big sharks of the day. They're both free, and I'd strongly recommend signing up and getting their daily summaries of what people were talking about on Twitter that day. News.me has been around for a while and I trust their article selection. Nuzzel is newer, but it seems to have a few more options. You could probably do with picking only one of the two.

After reading an article on my Kindle (or sometimes on a mobile device like my iPad or iPhone), I can clip it if I want to save it (just like making a clipping inside a book, only the entire article is saved).


This is what you see in an article when you click "Clip This Article" on a Kindle...


Then your clippings will be captured in the ‘MyClippings.txt’ file as explained above and you can export them directly to DevonThink or Evernote or Tinderbox. (The main downside to doing things this way is that when the kindle clips it, all formatting is lost (including paragraph breaks)).

Alternatively, you can ‘Favourite’ the article. I use this setting because it then sends the article and URL to my @stricklinks twitter account, something I created to share the best things I was reading. It also saves the full text of the article to Pinboard (a service I’ve already written about on my blog here) and to Evernote. (I use If This Then That to facilitate this.)

Once I’m done reading, I can go into Evernote and all my articles are waiting for me to be sorted through. Because I use DevonThink as my everything-bucket, and because all the sorting and discoverability features are there, I have a separate stage of exporting my notes out of Evernote into DevonThink. I’ve already probably taken you a little too far down the rabbit-hole of my workflow, but this is an important stage because otherwise you can’t do anything with your notes.

Luckily, someone has written a script which makes this possible. Many, many thanks to the good people at Veritrope for updating the script every time new versions of the software are released. It's fairly self-explanatory: you select the notes that you want to export, choose which DevonThink folder you want to export to, and then it goes to work. It can occasionally be buggy and stop half-way through, but usually a little trial and error will let you pinpoint which Evernote note is causing the problem, and you can transfer that one over manually.

I usually do an export session to bring everything from Evernote into my DevonThink inbox once a week. This way the number of clippings doesn't get too out of control, and I'm not constantly playing around with this during the week. You might find all this overkill, but it has become an essential part of my workflow for storing the various things I'm reading on a daily basis.

Pillaging Your Hard Copies, AKA Living the Paperless Dream

You may have hard copies of books that you want to use as part of this system. One way to use them is to scan the books into your DevonThink library. DevonThink Pro Office comes with an OCR package (via ABBYY FineReader), so whatever you scan can then become searchable and useful.

In the past, particularly with books I've purchased in Afghanistan and Pakistan that are unlikely (read: never) to be made available as electronic versions, I have taken a Stanley knife to the bindings and fed the pages into my ScanSnap scanner, which scans both sides and compiles all the scans into a single PDF document that is searchable on my laptop. The whole process destroys the book, but it gives the text inside a new life. Given how fast the new ScanSnap models work (around 25 pages per minute, both sides), this is an attractive way to get digital access to materials that are only available in paper form.

You can highlight text within the resulting PDFs and then later export your clippings from those PDFs as notes into DevonThink. There’s another useful script to help with that. It only works with the free Skim PDF reader, but that’s my default PDF reader so it works out well.

For more on paperless workflows, check out David Sparks’ Field Guide on the topic.

3. Discovery & Meaning

If you made it this far, congratulations. This is the section where all the fiddling with export starts to take on some meaning. After all, we’re not reading and exporting these notes purely because we are hoarders or to fetishise the art of collection (though in some cases, that may be what’s going on). No, we are taking notes because we are trying to understand difficult topics, because we are trying to solve important problems.

Discovering Links and Connections

The Steven Berlin Johnson articles referenced earlier are an essential first stop, particularly in demonstrating how DevonThink can add some serendipity into how you use your individual notes. To give you an example of how this works, here's a screenshot from my 'TalQaeda' database that I put together while working on An Enemy We Created:


In the upper part you can see a bunch of notes relating to the Haqqani family. The lower left part is the contents of a note (Note: exported from Instapaper). The bottom right list of documents (under “See Also”) is a list of notes that may be related to this particular quote. This is the magic algorithmic sauce I mentioned earlier that makes DevonThink so powerful.

If I click through to some of those suggested notes, I'm taken to similar quotes on the same topics: two PDFs of reports (of dubious analytic value, but that's a separate issue) and three clippings from Kindle books where people make reference to the relationship between the Haqqanis and al-Qaeda (the subject of the original note). Note that I didn't have to pre-tag documents for this 'See Also' functionality to work its magic. It analyses the text itself and makes its suggestions based on the similarities it identifies. (Needless to say, it's not simply a matter of matching individual words. Some of the suggested notes don't mention al-Qaeda or the Haqqanis by name, but they are implied; DevonThink catches all of this.)

Once you start to build up a decent database of notes (my Afghanistan database has just under 65 million words of notes, including 12,800+ PDFs) this ‘See Also’ functionality really allows for some unexpected links to be made, especially when you’re at the stage of writing up a project/book. One note will lead to another note, which will lead to another note. If you follow these trails of notes (like breadcrumbs) you can develop a pretty idiosyncratic picture.

I do not know of a manual method which allows for this kind of process.

DevonThink has an extremely robust search function which allows you to find things along similar principles (including a very useful ‘fuzzy spelling’ option, perfect when checking my database for notes on someone whose first name could be spelt Mohammad, Muhammad, Mohammed or any of the other variations).

Figuring Out What It All Means

Once you have an idea of the outlines of the topic, once you’ve been taking notes for a while, your database in DevonThink is probably starting to fill with useful information.

If you’re writing a book, though, you’ll want to start writing alongside this gathering process. (Check out Michael Lopp’s overview of the process of writing a large research book, which, to my mind, is fairly accurate.)

I don’t find DevonThink a particularly pleasant place to write, so I do that elsewhere. Before I write things out in long form, I usually do some outlining, particularly if it’s something where the dense collection of factual detail is important to the development of the argument (as was the case with An Enemy We Created). For this, I find Tinderbox indispensable for working up an overview of what I know, for figuring out how I’m going to structure it, and for helping me put together my first draft.

Tinderbox can display notes in a number of different ways. You can view your documents as outlines, as maps, or even as timelines:


In this image you can see the information arranged as an outline, but here (below) you see the same information organised as a map (mirroring the actual layout of the map of those districts in a particular part of Kandahar):


Just to show you that it can handle complexity, here’s a map created by Felix to help him figure out how people involved in militant Islamism were/are connected across different geographical sectors:

It's complicated...

I’ll often use Tinderbox maps to store outlines for how I’ll write a particular section or chapter, making notes inside the document, dragging quotes in from DevonThink to supplement the argument that’s being constructed.

Getting to the point where you can actually start writing on the basis of your notes is the whole point of all of this. Technology is useful, but mainly when directed at a specific problem or goal. All the tips, tricks and software described in this post have helped me write books, reports and (coming soon!) even my doctoral thesis. I have encountered only a few (barely a handful) researchers who use their computers for this collation, sifting and discovery process. There's no way to keep it all in your head. Here's hoping more people start adopting these tools…

Footnotes:

  1. For many years, Amazon offered users the ability to let publishers know that you wanted to see a given title in Kindle format, but they failed to make this interaction useful by not keeping track of what you'd requested of publishers (so as then to be able to let you know when it was finally released in Kindle format).
  2. Excerpt From: Mark Bernstein. “The Tinderbox Way.” iBooks.
  3. Selective transcript from around the 50-minute mark in the podcast audio. Needless to say, the rest of this blogpost constitutes a 'viable solution'.
  4. Most of these are derived from other people, I should say. I try to give credit where I can, but sometimes I can’t remember where I first read something or who first recommended a particular tool or trick.
  5. Yes yes, I know, I’m going to leave out some mentions for useful software here. This is an overview, and I’m just trying to describe some options for what might work in certain situations.
  6. A clipping is when you have selected and copied a passage from the book for safe-keeping, and an annotation is when you yourself write a note connected to a particular passage.
  7. Needless to say, don’t take legal advice from me.

Learning to Code

Previous posts have been about languages and how to learn them. Not all languages are for communication with other people, though. It is a truism that more and more of our lives are lived through various technologies -- be it computers, 'smart' phones or other appliances -- but we often aren't too good at understanding how those things work. I've been trying to remedy this by getting a better understanding of the back end through programming languages. Not only has it been an interesting intellectual exercise, but I have found practical applications for the skills I have learnt. Recently, for example, I wrote a piece of code that crawled through webpages, saving only the parts of text that I needed to a separate database.
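That crawler was specific to my project, but the core of any such script is short. Here's a minimal sketch using only Python's standard library: it pulls the text out of paragraph tags, which was roughly the "only the parts I needed" step. A real crawler would also fetch the pages (with urllib, say) and write the results to a database; the HTML here is a stand-in.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> tags, ignoring everything else."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")  # start a new paragraph

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data  # accumulate text within the <p>

# Stand-in page; a real crawler would download this
page = "<html><body><h1>Title</h1><p>Keep this.</p><div>skip</div><p>And this.</p></body></html>"
extractor = ParagraphExtractor()
extractor.feed(page)
print(extractor.paragraphs)
```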

There is a huge variety of things that you can try out here, so I'll just offer some suggestions for things that I've found useful along the way. Most of this is aimed at complete beginners. I'll assume that's where you are as well.

Python is considered by many (if not most) to be the best place to start as a novice programmer. It teaches lots of transferable skills that can be applied to other languages you might want to pick up.

My top recommendation would be to enrol in Udacity's CS101 course. Udacity is a relative newcomer to the scene, but I've found the parts of this course that I've done (I'm still working my way through) to be excellent. It has LOTS of practice, frequent testing of your ability to solve problems along the way, and is not dull to watch at all (as some of these courses are). What's more, by the end of the course you will have built your own search engine using the skills you've learnt. It's free, so you have no excuse. Go sign up.

(If that's not for you, although I'm not sure why that would be the case, then visit Codecademy or Learn Python The Hard Way for alternatives.)

In case you're interested in learning Ruby, you can try the following, many of which are designed for young children to be able to use and follow along, so, again, no excuses...

RubyMonk

TryRuby

Hackety Hack

Rails for Zombies

Codecademy Ruby

LearnStreet.com Ruby

Now go try some of those out...

Useful Tools: Pinboard

This is the first in a series of posts I'll be doing on this blog detailing some software or web services that I use. I'll try to end each post with two examples of things I've used the software for recently. Pinboard is an online bookmarking service. I save all the articles I read online there with a handy bookmarklet, and everything I read in Instapaper and via Twitter also gets saved there. Even better, if you upgrade to a premium subscription, Pinboard's servers will make an archive copy of each page, so even if a site is taken offline you'll still have a copy. And before you say that other services do it better, read this.

It's handy for sharing collated link collections with people and it's useful just as an archive of everything you've been reading.

Two recent uses of Pinboard:

  1. I keep a rolling list of all the reviews and comments on my recent edited book, Poetry of the Taliban. You can see this here. These kinds of lists are great for sharing with other people.
  2. The other day I remembered I had read something online, but couldn't quite remember where, so I searched within my Pinboard archive (including the text of all the websites I'd visited and read articles from in the past two weeks), finding the article within seconds.

#talibantwitterfight: The News Story That Wasn't

The Taliban twitter account (sic) is back in the news again, this time courtesy of the US Senate:

"Senators want to stop feeds which boast of insurgent attacks on Nato forces in Afghanistan and the casualties they inflict. Aides for Joe Lieberman, chair of the Senate Homeland Security Committee, said the move was part of a wider attempt to eliminate violent Islamist extremist propaganda from the internet and social media." (link)


The article then goes on to restate some of the usual assumptions and apparently unchecked facts of the story that have been mentioned in the more recent slew of press. I've rounded up links to most of these articles here for you; but seriously, don't waste your time.

I'll leave it to others to explain why the Senate getting excited about 'the Taliban twitter account' doesn't seem to make a lot of sense -- I don't claim any particular understanding of that world -- but I really hope reporting on the matter starts to improve. By way of example, more from that article by Ben Farmer:

"The Taliban movement has embraced the social network as part of its propaganda effort and regularly tweets about attacks or posts links to its statements. The information has ranged from highly accurate, up-to-the-minute accounts of unfolding spectacular attacks, to often completely fabricated or wildly exaggerated reports of American and British casualties."


I'm not sure what it takes for the Taliban to 'embrace' social media, but apparently not much. The Taliban set up some official twitter accounts back as far as 2009 and these accounts have been autoposting since then (more below). That's it. It would be more accurate to say that media reports have enthusiastically embraced reporting on the Taliban's activities on Twitter.

"Twitter feeds including @ABalkhi, which has more than 4,100 followers, and @alemarahweb, which has more than 6,200 followers, regularly feature tweeted boasts about the deaths of "cowardly invaders" and "puppet" Afghan government forces. Taliban spokesmen also frequently spar with Nato press officers on Twitter, as they challenge and rebut each other's statements."


No. Just no. The account @abalkhi appears to have nothing to do with the Taliban (see below). I'd also be interested to see the evidence for the statement that 'Taliban spokesmen also frequently spar with Nato press officers'. I have not seen a single instance of this. Every other story on these accounts repeats this claim. And it's presumably quite an important distinction: an official spokesman (we might assume it is a man) engaged in verbal attacks on the official ISAF account is a different thing from some fanboy in his bedroom doing the same thing.

So, in the hope that this story can die the death it should have MONTHS ago, here are some facts.

The following is a list of the Twitter accounts most frequently associated with the Taliban, presented in the order they were first created:

@alsomood

started: June 3, 2009 // regularity of tweets: once or twice per week // language: Arabic // name: Majallat al-Somood

following: 10 // followers: 574 // number of tweets: 379

This was the very first twitter account that the Taliban seem to have set up. (Or, if they set accounts up earlier, they have not been used). @alsomood is the official account for one of the Taliban's magazines, al-Somood. This is an Arabic-language magazine that caters to audiences in the Gulf, for the most part. Printed copies of the magazine have even shown up from time to time. For the most part, however, it's just a PDF edition, released once every month. It mostly includes longer articles and commentary not found elsewhere on the Taliban's main site, although one or two articles are usually translated from al-Somood and shared via the main web outlet. The @alsomood account tweets once or twice a week in Arabic, and every single time these tweets are automated by twitterfeed. Twitterfeed is a site that allows you to automatically post a tweet every time something changes on your website, for example. You give it an RSS feed to follow, and every time there's a new link it autoposts. Which is to say, there does not ever have to be anyone operating this account. It is fully automated.

@alemarah3

started: October 22, 2010 // regularity of tweets: stopped // language: English // name: Islamic Emirate of Afghanistan

following: 3 // followers: 50 // number of tweets: 6

This appears to have been an experimental account. It was only used for 6 tweets, and stopped on October 27, 2010.

@alemarahweb

started: December 19, 2010 // regularity of tweets: daily // language: English // name: Mostafa Ahmedi

following: 4 // followers: 6420 // number of tweets: 3014

This is one of the accounts that is followed by journalists. It is exclusively posted to by twitterfeed. There appears to be no direct manual tweeting on this account. This is an official account. It is also one of the two accounts that @ISAFmedia believe to "have some tie to the Taliban."

@alemarah222

started: February 18, 2011 // regularity of tweets: stopped // language: English // name: Ahmad

following: 8 // followers: 14 // number of tweets: 4

This seems to have been another experimental account. It only tweeted 4 times, each time posting a Taliban video.

@alemarahmedia

started: February 21, 2011 // regularity of tweets: Irregular // language: English // name: Alemarah Media

following: 2 // followers: 30 // number of tweets: 36

This account was abandoned on March 11, 2011. There was some manual tweeting, including a mix of videos from the Pakistani Taliban. It appears to be unofficial.

@hanif_hamad

started: May 3, 2011 // regularity of tweets: Daily // language: English // name: Afghanistan news

following: 296 // followers: 78 // number of tweets: 1090

This is another official account that runs off twitterfeed. There is no manual tweeting from this account. Moreover, when @alemarahweb updates, @hanif_hamad updates simultaneously with the same message. This means that they are running off the same RSS feed (and probably the same twitterfeed account). It was started the day after bin Laden was killed.

@ABalkhi

started: May 12, 2011 // regularity of tweets: Daily // language: English // name: Abdulqahar Balkhi

following: 24 // followers: 4293 // number of tweets: 865

This is the most well-known of the alleged 'Taliban' accounts, yet everything seems to suggest that @Abalkhi (and the account later created, @Abalkhii with two 'i's) is unofficial. He never tweets any material which isn't already up on the Taliban's website. He seems to speak Pashtu and/or Dari (translating material from the news section of the Pashtu site before it has been translated and uploaded on the English site). This might be (at a stretch) one reason why journalists continue to refer to his account as being 'official'. He also tweets completely manually -- presumably because he has no access to the official site's RSS stream (which is not provided to normal users of the website). He set up his account a week after the death of bin Laden, and my hunch is that the operator of this account probably doesn't even live in Afghanistan (or Pakistan).

@MuhammadZabiull

started: September 20, 2011 // regularity of tweets: Irregular // language: English // name: Muhammad Zabiullah

following: 137 // followers: 46 // number of tweets: 27

This also seems to be an unofficial account. The user states their location as 'Paktia Afghanistan', but some of his tweets imply that he is outside the country (although they suggest that he is Afghan by nationality). He tweets a fair amount manually, sometimes only providing links, and other times corresponding with @Abalkhi. This account was set up relatively recently.

@ABalkhii

started: September 20, 2011 // regularity of tweets: Weekly // language: English // name: Abdulqahar Balkhi

following: 54 // followers: 54 // number of tweets: 80

This user has a very similar name to @Abalkhi, and it is possible both accounts are operated by the same person. The only difference is that @Abalkhi seems to update almost exclusively through the web app, and @Abalkhii seems to update mostly on his/(her?) iPhone. While @Abalkhi almost exclusively tweets stories from the Taliban website, @Abalkhii (created in September 2011) tweets stories from the international news media and engages in a fair amount of discussion with other twitter users. Moreover, the language style @Abalkhii uses is quite different from that of @Abalkhi.


You can view a timeline of when these accounts were created here.

At any rate, I hope this puts to rest the whole 'Taliban spokesmen are on the internet engaged in big twitter discussions with ISAF' story. The truth is that they are not. There is one account which occasionally responds to @ISAFmedia, but (for reasons outlined above) it does not seem to be official.

In fact, the only people who seem to really be enjoying this all are @ISAFmedia themselves and the media outlets covering the story. Almost every day now, @ISAFmedia puts out a tweet to @Alemarahweb saying that something that was posted was wrong. This is one example:

And, in a way, it sort of represents the futility of a lot of what goes on in Afghanistan these days: someone sitting behind a desk in ISAF headquarters, tweeting away at a Taliban twitter account, hoping to goad someone into responding, but there is nobody to respond, since @Alemarahweb tweets automatically without anyone needing to run the account.

More data on 'Kill-Capture' Raids

Last week I charted the numbers of press releases that ISAF have put out relating to "security operations" of one kind or another. I promised to split this data up into numbers of incidents mentioned in those press releases (since some press releases contain multiple incidents). The data from last week showed a decrease in the numbers of press releases (particularly in 2011). When these numbers are split up into individual incidents, however, you can see that there hasn't been a decrease in incident numbers. Actually, April 2011 had almost as many operations as September 2010 (the highest for the data set). Of course, all of this is just a picture as presented by ISAF, but since they don't release these figures in aggregate form to the public/media it's all we have to go on. Here's the new chart:


A word on the title: the incidents here are collected from ISAF press releases that refer to an event where an Afghan (or a 'foreign fighter') was killed or captured. Sometimes this happened while troops were on patrol, but more often was the result of a targeted raid/operation.
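(For anyone wanting to replicate the chart: once the press releases are parsed into dated incidents, the monthly aggregation is a few lines of Python. The dates below are invented placeholders, not my actual data set.)

```python
from collections import Counter

# Hypothetical incident dates parsed from press releases (YYYY-MM-DD)
incidents = ["2010-09-03", "2010-09-17", "2011-04-02", "2011-04-21", "2011-04-30"]

# Count incidents per month by truncating each date to YYYY-MM
per_month = Counter(date[:7] for date in incidents)
print(sorted(per_month.items()))
```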

I'm working on something longer that will go into the specifics of this data set in more detail (incident data by type/province/month etc) so this will probably be my last post on the subject until the report is released.

Kandahar Timeline 1979-2010

Many of you have already visited my previous post and downloaded the PDF version of a chronology of events in Kandahar from September 2001 up to the present day. For various other projects in the past (most of all, for work in connection with Mullah Zaeef's My Life With the Taliban) I have found it useful to put together event data of varying levels of granularity.

Other projects made it difficult for me to find time to compile these chronologies and event lists, but I finally finished the job this week. Accordingly, please visit http://www.alexstrick.com/timeline/ for a more or less complete listing of events that took place in or relating to Kandahar from 1979-2010. Some years are less thoroughly covered than others, but this will change as I incrementally update the timeline over the next few months while simultaneously going through the final stages of editing (together with Felix Kuehn) Mullah Zaeef's second, forthcoming book.

I hope, also, to be able to find time to explain how I put the raw data together and was able to present it in this format. In short, I used an extremely nifty piece of software called Tinderbox (Mac only, apologies...) and was given a lot of help by some people who understand its ins and outs far better than I currently do. So special thanks to Mark Anderson for that, and to Mark Bernstein for writing the software in the first place. I use Tinderbox for almost all of my work these days (data gathering, data sorting, data organisation... the list goes on) and strongly recommend that others with high-volume, complex data projects give it a try.

Anyway, find the timeline here and please don't hesitate to get in touch with comments/corrections.