PhD Tools: Backup Systems for Staving off Sadness

[This is part of a series on the tools I used to write my PhD. Check out the other parts here.]

Having some kind of backup system is essential for all PhD students (and probably for anyone else who uses a computer for writing of one kind or another). The less friction in your backup system, the better. If you have to plug in a USB or FireWire external hard drive to start a backup, you probably won't do it often enough, and you'll probably lose files and data.

I've learnt the hard way that hard drives fail. A few years ago, I lost roughly a decade's worth of digital photos when my backup system failed. My work files were fine, because I'd taken steps to check that their backups were working, but for whatever reason I hadn't taken the same care with my non-work files. Cue sadness.

I use several types of backup, and you should use at least two. One should be a regular backup to a local hard drive (something like Apple's Time Machine with an external disk), and the other should be a cloud backup.

I use Backblaze and SpiderOak for my cloud backups. Two separate cloud backup services may seem like overkill, but storage space and the services themselves are cheap enough to make it feasible. In fact, if I lived somewhere with faster internet, I'd probably add AWS Glacier as a third.

I also use SuperDuper to make a clone of my hard drive. I've been burnt by Apple's Time Machine in the past (see above), so I lost my trust in it and no longer use it, though I hear it's better now. Caveat emptor.

Programmes like Scrivener (see earlier blogpost) have built-in automatic backups. Use them, and test them to make sure they're doing what they say they're doing. You don't want to find out after something has gone wrong.

In fact, I encourage you to set up a recurring calendar appointment with yourself to stress-test your backup systems every two or three months. Some scenarios to try out: your hard drive fails, so can you retrieve your main PhD working draft from your backups? Or, another good one, your laptop is stolen: can you still access all your files, and eventually (once you've replaced the computer) restore your system to the way it was before the theft? Actually do these tests! I've often found that a system I thought was working properly was in fact failing in some small but essential way.
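
If you want the "can I get my draft back?" test to be more than a glance at a file name, one option is to compare checksums. What follows is a minimal Python sketch, not a feature of any of the backup tools mentioned here: it walks the original folder and a copy you've restored from backup to a separate location, and lists every file that is missing or whose contents differ. A Scrivener project is itself a folder on disk, so you can point it at the whole project. It assumes you restore to somewhere other than the original, and files you've edited since the last backup will naturally show up as changed.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of one file, read in chunks so large PDFs don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_restore(original: Path, restored: Path) -> list[str]:
    """List every file that is missing from, or differs in, the restored copy."""
    want = tree_digests(original)
    got = tree_digests(restored)
    problems = []
    for rel, digest in sorted(want.items()):
        if rel not in got:
            problems.append(f"missing: {rel}")
        elif got[rel] != digest:
            problems.append(f"changed: {rel}")
    return problems
```

For example, `compare_restore(Path("~/PhD/Thesis.scriv").expanduser(), Path("/Volumes/Restored/Thesis.scriv"))` should return an empty list if the restore is complete. Both paths are hypothetical; substitute your own project and restore location.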

Towards the end of the writeup, your paranoia about losing files is likely to become intense enough to inspire all sorts of manual backup routines. Earlier this year, as I was nearing that point myself, I would email myself zipped copies of the Scrivener file and store copies on Evernote, Google Drive and Dropbox. This, note, was in addition to the other backups I had running.

A lot of this is common sense. Backups are important. We all know it. But it's good to have a system that you know and can be confident works. Don't tarry! Take steps to set something up today, even if it's just a background cloud backup service like Backblaze.

PhD Tools: DevonThink for File Storage and Discovery

Discovering similar notes in one of my DevonThink databases

I first heard about DevonThink in the same breath as Tinderbox. They go together, though they serve different purposes. Some people treat it as an either/or decision. I see them as different enough that each should be assessed on its own merits and against your own usage scenario.

As with all tools, you should come to the decision with a list of the features you're looking for. Don't shop around for new things for the sake of newness, or just for the sake of owning a really great set of tools. These programmes are not cheap. Luckily, almost all of them come with generous trial versions or trial periods, but I don't consider novelty a feature of any particular merit.

DevonThink (I use the Pro Office version) is a place to store your files and notes. It can, I think, take any file you throw at it. It comes with OCR software that turns PDFs into fully searchable documents, which is part of the reason the licence for the Pro Office version is so expensive.

If you're anything like me, you're drowning in PDF documents. They all come with helpful names like "afghanistan_final_report_02_16.pdf", and unless you have a rigorous file hierarchy and sorting system, you'll probably struggle to find the one you need. A basic folder hierarchy also can't handle a file that belongs in several folders (e.g. a report about both Afghanistan and Tunisia). DevonThink has a feature that lets you store a file in multiple locations without saving two copies of it; any changes or annotations you make in one location automatically appear in the others.

You might ask why you would need both DevonThink and Tinderbox (see this post for more). The short answer is that they store different kinds of files and data, and that DevonThink is, to a certain extent, less about thinking than about storage and discovery.

One of the key features of DevonThink Pro Office is its smart search: its ability to suggest similar texts based on the contents of whatever you're looking at. It does this with a proprietary algorithm, so I can't really tell you how it works, only that it does. It works best on smaller chunks of text. For example, I was reading through a particular source from the 3-million-word Taliban Sources Project database, clicked the "See also" button, and it found a source on the same topic that I would never otherwise have read, even though it didn't use any of the keywords I would have searched with. It uses semantic webs of words to figure this stuff out. Beyond a certain database size, this power becomes really useful. It can also archive websites, store any kind of file including plain text, search inside e-books, and so on. (Read more on how I use DevonThink for research in general here.)

I also used it a little as an archive for substantive drafts and iterations of the writeup. That's another important part of the process: keeping backups of many different kinds. I never found any use for them, but at least they were there, just in case.

If you're a data and document hoarder at heart, like me, you'll soon have a DevonThink database (or several databases, split up by topic) that is too big to fully comprehend, full of files whose contents you no longer remember. At that point, search becomes really important. Not just straightforward search: the ability to use 'fuzzy' terms (e.g. a search for "Afghanistan" will also find instances where it's misspelt "Afgahistan") and Boolean operators in your queries is really powerful. DevonThink is an amazing search tool. The company that developed it also makes something called DevonAgent, which is basically a power-user search tool for the internet. Google on steroids, if you will. Fully customisable, scriptable... you can really go crazy with this stuff. I use it, but my PhD wasn't really about searching the internet, so I didn't use it much for my research or writeup. But it's a great tool, too.
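
To give a feel for what "fuzzy" means, here is a minimal Python sketch of one common approach, Levenshtein edit distance. This is a generic illustration under my own assumptions, not how DevonThink actually implements its fuzzy search: it counts the single-character insertions, deletions and substitutions needed to turn the query into each word, and treats any word within a small threshold as a match. The `fuzzy_find` helper and its `max_edits=2` threshold are hypothetical choices made for the example.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_find(query: str, text: str, max_edits: int = 2) -> list[str]:
    """Return the words in text within max_edits of query (case-insensitive).
    max_edits=2 is an arbitrary, hypothetical threshold."""
    q = query.lower()
    return [w for w in re.findall(r"\w+", text)
            if edit_distance(q, w.lower()) <= max_edits]
```

With this sketch, `fuzzy_find("Afghanistan", "Reports from Afgahistan and Pakistan")` returns `["Afgahistan"]`: the misspelling is two edits away (drop the "h", then change "n" to "h"), while "Pakistan" is at least three away. A fixed threshold like this is too loose for short queries, so you'd want a tighter one there.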

In short, DevonThink is a research database tool that will help you store and find the documents that relate to your research, and do smart things to help you find sources and texts that maybe you'd forgotten you'd saved. Highly recommended for anyone working with large numbers of documents.