Alex Strick van Linschoten

Using Regex with Python to Find Strings

July 08, 2018 in Coding, Useful Tools, Productivity

The next step in my data processing project is to find strings matching certain patterns in the PDF data. Today I worked my way through the relevant chapter (chapter 7) of Al Sweigart's excellent and useful Automate the Boring Stuff with Python.

I've left some sample code above as a reminder (mainly for myself) of the basic pattern and syntax you can use. I saw a slightly more concise pattern for running the search in Data Wrangling with Python, which I may experiment with in the future. It has you running something like:

search_result = re.search(word, fulltext)

I guess one of the two approaches will have a speed advantage, especially when multiplied over hundreds of thousands of pieces of text.
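For reference, here is a minimal sketch of the two approaches side by side. It assumes the first is the compile-then-search style taught in Automate the Boring Stuff; the sample text and the phone-number pattern are made up purely for illustration.

```python
import re

# Hypothetical sample text and pattern, for illustration only.
fulltext = "Call 415-555-1234 before Friday, or 415-555-9999 after."
word = r"\d{3}-\d{3}-\d{4}"

# Approach 1: compile the pattern once, then search with the compiled object.
phone_regex = re.compile(word)
compiled_result = phone_regex.search(fulltext)

# Approach 2: the more concise module-level call.
search_result = re.search(word, fulltext)

# Both return a match object for the first hit (or None if nothing matches).
print(compiled_result.group())
print(search_result.group())
```

As far as I understand it, the re module caches recently compiled patterns, so the gap between the two is likely to be small when only a handful of patterns are in play. Timing both on real data (for example with timeit) would settle it.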

The next step with this project will be to connect this regex function with the file-splitting script. That way, when I split the file, I can rename each piece at the same time with a string that I've extracted using a regex search.
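A rough sketch of how that connection might look, assuming the text of each split file is already available as a string. The reference-code pattern, the function names, and the fallback name are all hypothetical; the real pattern depends on what the PDFs contain.

```python
import re
from pathlib import Path

# Hypothetical pattern: a reference code such as "ABC-2018-0042".
REFERENCE_REGEX = re.compile(r"[A-Z]{3}-\d{4}-\d{4}")


def filename_from_text(text, default="unnamed"):
    """Return the first regex match in text, or a fallback name if none."""
    match = REFERENCE_REGEX.search(text)
    return match.group() if match else default


def rename_with_match(path, text):
    """Rename the file at path using the string extracted from text."""
    path = Path(path)
    new_path = path.with_name(filename_from_text(text) + path.suffix)
    path.rename(new_path)
    return new_path


# Example (hypothetical file name):
# rename_with_match("split_001.pdf", "Report ABC-2018-0042 attached")
# would rename the file to "ABC-2018-0042.pdf" in the same folder.
```

Falling back to a default name keeps the split from failing when a page has no match, though in practice the fallback would need a counter so that unmatched files don't overwrite each other.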

If you've read this far and you don't know what I'm talking about, there's an interesting article by Cory Doctorow in which he argues that regular expressions should probably be taught to children as a foundational skill:

"Knowing regexp can mean the difference between solving a problem in three steps and solving it in 3,000 steps. When you're a nerd, you forget that the problems you solve with a couple keystrokes can take other people days of tedious, error-prone work to slog through."

Tags: python, coding, regex