
Telling Cats from Dogs

One of the main ways that training models with neural networks differs from traditional (imperative) programming can be illustrated with a specific task: say you want a computer to tell you whether a given photo shows a cat or a dog.

An imperative approach might be to draw up a mega list of the features that distinguish cats from dogs and try to encode those differences as explicit rules. But even telling the computer how it should recognise those features is a potentially massive project. How would you even go about that?
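To make the contrast concrete, here's a purely hypothetical sketch of what those hand-written rules might look like; every helper function in it is imaginary, which is rather the point:

    # Purely illustrative: a rule-based classifier. None of these
    # feature-detection helpers exist, and writing any one of them
    # (how do you detect pointed ears in pixels?) is itself a huge project.
    def classify(photo):
        if has_pointed_ears(photo) and has_whiskers(photo):
            if has_vertical_pupils(photo):
                return "cat"
        if has_floppy_ears(photo) or is_panting(photo):
            return "dog"
        # ...hundreds more rules, each with its own exceptions
        return "unknown"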

Instead, with neural networks, we turn that pattern on its head. Instead of this:

[Diagram: the traditional pattern, where a hand-written program turns inputs into results.]

We have something closer to this:

[Diagram: the machine-learning pattern, where the model learns its behaviour from inputs plus labelled examples.]

So the neural network learns from the data you provide: you give it a bunch of images which you've pre-labelled to say 'this one is a cat and that one is a dog' and so on.

If you use transfer learning, too, you can even start from a pretrained model (one that's already pretty good at recognising features in images) and fine-tune it to get really good at the specific task you need it to do. (That's exactly what you do in the first chapter of Howard and Gugger's Deep Learning for Coders.)
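In fastai (the library the book uses), that whole fine-tuning workflow fits in a handful of lines. Here's a sketch along the lines of the book's first-chapter example, using the Oxford-IIIT Pets dataset; exact API names may differ between fastai versions:

    # A sketch of transfer learning with fastai, close to the book's
    # first-chapter example; assumes a recent fastai version.
    from fastai.vision.all import *

    # Download the Oxford-IIIT Pets dataset
    path = untar_data(URLs.PETS) / 'images'

    # In this dataset, cat images have capitalised filenames
    def is_cat(filename):
        return filename[0].isupper()

    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2, seed=42,
        label_func=is_cat, item_tfms=Resize(224))

    # Start from a ResNet pretrained on ImageNet, then fine-tune it
    # on the cat/dog labels for one epoch
    learn = vision_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(1)

That's the whole thing: a pretrained model, your labelled images, and one fine-tuning pass.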

Removing Barriers: Deep Learning Edition

I've been re-reading Jeremy Howard & Sylvain Gugger's Deep Learning for Coders with fastai and PyTorch and I really appreciate the reminder that a lot of barriers to entry into the Deep Learning space can be productively put to one side.

Gatekeepers make four big claims:

  1. You need lots of maths to use Deep Learning to solve problems
  2. You need lots of data (think prodigious, Google-sized quantities) to use Deep Learning
  3. You need lots of expensive computers and custom hardware to use Deep Learning
  4. You need a PhD, preferably in Maths or Physics or some computation-heavy science

Needless to say, it's not that maths or more data or better hardware won't help or improve your experience. But saying that you shouldn't start without those things is inaccurate and unhelpful.

If you are a domain expert in something that has nothing to do with Deep Learning or data science, you probably have a lot of low-hanging-fruit problems that powerful techniques like Deep Learning could solve.

Tabula for extracting table data from PDFs

Have you ever come across a PDF filled with useful data and wanted to play around with that data yourself? In the past, when I had that problem, I'd type the table out manually. This has some disadvantages:

  • it is extremely boring
  • mistakes are likely, especially if the table is long and extends over several pages
  • it takes a long time

I recently discovered a tool that solves this problem: Tabula. It works on Windows and Mac and is very easy and intuitive to use. Simply take your page of data:

A page listing Kandahar's provincial council election polling stations from a few years back. Note the use of English and Dari scripts. Tabula handles all this without problems.

Then import the file into Tabula's web interface. It's surprisingly good at autodetecting where tables and table borders are, but you can do it manually if need be:

[Screenshot: Tabula's web interface auto-detecting the table boundaries.]

Then check that the data has been correctly scraped and select a format for export (CSV, JSON, and so on):

[Screenshot: previewing the extracted data and choosing an export format.]

And there you have it, all your data in a CSV file ready for use in R or Python or just a simple Excel spreadsheet:

[Screenshot: the extracted data open in a spreadsheet.]
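If you'd rather script the extraction than click through the web interface, the same engine is also available via a Python wrapper, tabula-py (a separate install that needs Java). A minimal sketch, with hypothetical filenames:

    # Minimal sketch using tabula-py (pip install tabula-py; needs Java).
    # The filenames are hypothetical.
    import tabula

    # Read every table Tabula detects into a list of pandas DataFrames
    tables = tabula.read_pdf("polling_stations.pdf", pages="all")
    print(tables[0].head())

    # Or convert straight to CSV, mirroring the export step above
    tabula.convert_into("polling_stations.pdf", "polling_stations.csv",
                        output_format="csv", pages="all")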

Note that even though the interface runs through a browser, none of your data touches external servers. All the processing and extraction of data from PDFs happens on your computer; nothing is sent to cloud servers. This is a really nice feature and I'm glad they wrote the software this way.

I haven't had any problems using Tabula so far. It's a great time saver. Highly recommended.

Pet Peeve: Tech Switching

I read a decent amount of tech media/press. Barely a day goes by without someone in my RSS feed explaining how they dropped application X for application Y. This seems to happen most often for frequently-used applications or workflows like scheduling/calendars or email.

I won't call out the specific blog post that set me writing this one, but suffice it to say that I wish there were a clause (in the contract of life) forcing tech writers and bloggers to state why the application they're singing the praises of is better than the one they were using up to now. Specifically, are there any new features, or does it just look shinier? And have you been using it for longer than a day or two?

I'm pretty solid and stable in the applications I use. It'll take something pretty seismic to pry me away from DevonThink or Tinderbox or MailMate. But if you catch me flip-flopping in my tech-related writing, please call me out on it.

DevonThink Resurgent

There has never been a better time to get into DevonThink and Tinderbox. Winterfest 2016 is on, and you can get 25% reductions on both those apps, as well as a number of other really useful pieces of software like Scrivener, TaskPaper, Bookends, Scapple and PDFPen.

If you’re unsure whether DevonThink is something you’d be interested in, they offer a 150-hours-of-use free trial for all their different apps. The Mac Power Users podcast just released a useful overview of the current state of the app, an interview with Stuart Ingram. ScreenCastsOnline also published the first part of a trilogy of video tutorials on DevonThink.

If you’re a Mac user who is perhaps uncomfortable with Evernote’s privacy policies or just seeking to get more out of the data you’ve stored on your hard drive, give DevonThink a try.

Highlights + DevonThink = Pretty Great

I’m late to the Highlights party, but I’m glad I got here.

Like many readers of this blog, I get sent (and occasionally read) a lot of PDFs. In fact, I did a quick search in DevonThink, and I am informed that I have 52,244 PDFs in my library. These are a mix of reports, archived copies of websites, scanned-and-OCRed photos and a thousand-and-one things in between.

Thus far, my workflow has been to read PDFs on my Mac. Any notes I took while reading the file were written up manually in separate files. I would laboriously copy and paste whatever text snippet or quotation I wanted to preserve along with its page reference. These would be fed into DevonThink’s AI engine and magic would happen.

Now, post-Highlights-installation, my workflow is much less laborious. I can take highlights in-app, export all the quotations as separate text or HTML files, and have DevonThink go do its thing without all the intermediary hassle. If you’re a professional researcher or writer using DevonThink as your notes database (and quite frankly, if not, why not?) the Highlights app will probably please you.