Making and shuffling lists in Python

I discovered some useful functions the other day while trying to solve one of the Dataquest guided projects. They all relate to lists in some way and use NumPy. I'm listing them here mainly as a note for my future self.

import numpy as np

# this code returns an array of n integers starting at 0
np.arange(3)
---- returns array([0, 1, 2])

# this variation takes a start and a stop; the stop value is excluded
np.arange(3,7)
---- returns array([3, 4, 5, 6])

# this adds a step between values
np.arange(2,9,2)
---- returns array([2, 4, 6, 8])

# these are slightly different; they shuffle rather than build a range
# if you want a list of numbers in random order:

np.random.permutation(10)
---- returns the numbers 0-9 in an array, in random order

# you can also pass non-numeric lists into `permutation`
letters = ['a', 'b', 'c']
np.random.permutation(letters)
---- returns something like ['b', 'a', 'c']
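
For my own reference, here is a small runnable sketch that pulls the notes above together. Two things in it are my additions rather than part of the original notes: it uses NumPy's newer Generator interface (`np.random.default_rng`) instead of `np.random.permutation`, and the seed value is arbitrary, chosen only so that repeated runs give the same shuffle.

```python
import numpy as np

# Seeded generator so the shuffle is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(seed=0)

# Even numbers from 2 up to (but not including) 9.
numbers = np.arange(2, 9, 2)

# permutation returns a shuffled copy and leaves `numbers` untouched.
shuffled = rng.permutation(numbers)

# Non-numeric lists work too; the result is a NumPy array of strings.
letters = ["a", "b", "c"]
shuffled_letters = rng.permutation(letters)

print(numbers.tolist())  # [2, 4, 6, 8]
print(shuffled.tolist())
print(shuffled_letters.tolist())
```

The point to remember is that `permutation` hands back a new array, so the original stays in its original order.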

Tabula for extracting table data from PDFs

Have you ever come across a PDF filled with useful data, but wanted to play around with that data yourself? In the past if I had that problem, I'd type the table out manually. This has some disadvantages:

  • it is extremely boring
  • it's likely that mistakes will get made, especially if the table is long and extends over several pages
  • it takes a long time

I recently discovered a tool that solves this problem: Tabula. It works on Windows and Mac and is very easy and intuitive to use. Simply take your page of data:

A page listing Kandahar's provincial council election polling stations from a few years back. Note the use of English and Dari scripts. Tabula handles all this without problems.

Then import the file into Tabula's web interface. It's surprisingly good at autodetecting where tables and table borders are, but you can do it manually if need be:

ScreenShot 2018-01-17 at 15.56.25.png

Then check that the data has been correctly scraped and select a format for export (CSV, JSON, etc.):

ScreenShot 2018-01-17 at 15.57.19.png

And there you have it, all your data in a CSV file ready for use in R or Python or just a simple Excel spreadsheet:

ScreenShot 2018-01-17 at 15.57.50.png
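
As a rough sketch of the "ready for use in Python" step, this is one way to load an exported CSV using only the standard library. The file name, column names and rows below are invented stand-ins so that the example runs on its own; a real Tabula export will have whatever columns your PDF table had.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical stand-in for a Tabula export; columns and rows are invented.
sample_csv = (
    "station_id,station_name,district\n"
    "101,Example School,District 1\n"
    "102,Example Clinic,District 2\n"
)


def load_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    # utf-8 keeps non-Latin scripts intact, assuming the file was saved as UTF-8.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the sample to disk so load_rows reads it like a real export.
    export_path = Path(tmp_dir) / "tabula-export.csv"
    export_path.write_text(sample_csv, encoding="utf-8")
    rows = load_rows(export_path)

print(len(rows))
print(rows[0]["station_name"])
```

With a real export you would skip the temporary file and pass the path of the downloaded CSV straight to `load_rows`.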

Note that even though the interface runs through a browser, none of your data touches external servers. All the processing and stripping of data from PDFs is done on your own computer rather than in the cloud. This is a really nice feature and I'm glad they wrote the software this way.

I haven't had any problems using Tabula so far. It's a great time saver. Highly recommended.

Language Learner's Journal: Meaningful Leisure

[This is a continuation of Taylor's blog series where she details some of the week-in-week-out lessons that she learns through her Arabic studies and coaching work together with me. For other posts in the series, click here.] 

If the first phase of my Arabic study in Jordan was intensive textbook fusha and the second was track-switching ammiya classes, this third and current phase could be called meaningful leisure, or, hanging out around town a lot and making friends.

When I went to Bombay for an extended stay in 2010, a journalism colleague gave me a piece of advice: "Take everyone up on their offer to hang out with you." It may sound "duh," but over the years living abroad, I've seen how foreigners spend their free time in ways that often diverge from how residents in a given city do so. When we, as gringos in Rio, may have wanted to go to foreign film festivals or paragliding over the beach, many of our Brazilian peers would be going to baby showers, a classmate's thesis defense, or Outback Steakhouse. All of those activities are great ones, and I think the spirit of my colleague's advice was: If you want to get to know a culture, let your host take the lead and show you how they spend their free time.

That means over the past few weeks, I've sat on the sidewalk in front of a gift shop with a delightful young sculptor and a store clerk, my partners in very unstructured language exchanges that break when one of them needs to pop into the shop to attend to a client. I went for a 6:30 a.m. workout with two of the fastest runners in Amman, a pair of brothers I met at a sunset race in Wadi Rum as we waited in the dunes watching for headlamps of other runners finishing. I went to a capoeira performance at Jadal cafe that was held in commemoration of the nakba; I was pleased with how accessible the discussion after the performance was for me, particularly when an older man in the audience vigorously questioned the capoeiristas as to why they needed to do someone else's sport when they could do dabke.

Alex often talks about "islands" of vocabulary, and I've thought about that as I spend more time with the same people and can make good guesses about the words they're using. (As I crossed the finish line at the race, other runners asked me ايش كان مركزك؟ though I certainly hadn't run fast enough to place. It was satisfying, though, to deduce what they were saying.) The store clerk and I talk often about money and salaries, since she hustles to work two jobs to help her family out.

I could be more purist; I speak plenty of English in these interactions. I'm still searching for the point of equilibrium between taking advantage of each opportunity I get to speak in Arabic and (of course!) having genuine friendships with peers with whom I share interests (running, yoga, current events, feminism, vegetarianism, pets). Plenty of the vocabulary and references regarding those topics are in English, not to mention that the people who are interested in them often read and speak in English about them. I don't believe every friendship needs to be instrumentalized for one's language-learning goals (though I believe even more strongly that such an attitude should not be a lofty cover for native English speakers kicking back and relaxing). When I told Alex about my happy sidewalk sessions, which qualify more as bilingual shooting-the-shit than a proper language exchange, he said: You're doing the real thing, rather than practicing for it.

Some working notes, now, on practice:

I've been happy with my second time around testing out language exchanges; I've used the website Conversation Exchange, which I had suspected could be out of use, judging by its retro web design, but is actually popping. I'm pretty strict about where I meet the person, i.e., it needs to be as quiet as possible (a first exchange at Indoor cafe across from the University of Jordan was really hard to decipher and, from my point of view, turned into disjointed monologues rather than a conversation because I couldn't hear her well).

I think the exchanges, for my current level, are less experimental zones and more consolidation ones. That is to say, I don't take risks and reach for vocabulary I'm shaky on but work with what I know decently. That's why I like coupling the exchanges with private classes, which I go to twice a week and which are a better place for reaching and experimenting. I also think that in a language exchange it is useful to ask my partner "is the way I said that correct?" but not productive to ask "why?" I save those questions for my teacher.

Alex encouraged me to discover certain transition phrases (على فكرة... على كل حال... بالرغم من) and put them into practice in my speech, which gives the impression of being more fluent and conversant than I am. This has been a fun exercise with my private teacher, since I take the English phrases I want and try to describe to her a situation in which I might use them.

I'm on board with the many lines of criticism telling us that we need to make an active effort to start unplugging our lives before we turn into cyborgs; that said, having a round of friends here I chat with on Facebook or Whatsapp has indeed been great practice for seeing spelled out how people are saying what I hear each day. In conversations, I still feel like I rarely could repeat back word-for-word what someone has said to me, even if I usually get the message through key words and context.

I bought Diwan Baladna, an ammiya vocabulary book organized by subject matter. I really like it – my hope is that it will help me turn a lot of passive vocabulary into active vocabulary. I have a quibble with the audio component: it is read too fast, in long audio files that make it tedious to isolate the word I want. (And having sample sentences is far better than English translations!)

And finally, as per Alex's encouragement, I continue to avoid dictionaries and translation apps. I make ample use of Reverso Context, but only after I've read a message or passage several times through, and usually I'm using it to confirm that my guess at a word's meaning is correct. Especially when it comes to WhatsApp and chatting, the majority of messages I am receiving are ones that involve words I know well (Want to meet at this time? How far did you run today? I have foul and rice my mom made, want some? It's veg.)

On Reading in Arabic: The Evidence

[This is the second post in a series on the importance of reading when studying Arabic (or any other language). Read the first post here.]

It is notoriously difficult to study and show which methods of learning a second language are the most efficient. For starters, everyone is slightly different, so it's hard to compare between individuals. Learning a language is also such an involved pursuit (taking place over all hours of the day, and in the mind, where microscope or dictaphone can't usefully reach) that it is impractical to follow the student for all twenty-four hours of the day.

Having given the pitch for why I think reading is so important for students of Arabic, today I wanted to summarise a study that was carried out from 1970 to 1977. This study, by ElSaid Badawi, is entitled "In the quest for the Level 4+ in Arabic: training Level 2–3 learners in independent reading" and can be found as an article in Betty Lou Leaver and Boris Shekhtman's fascinating (and underrated / underread) edited volume, Developing Professional-Level Language Proficiency. Given its somewhat obscure provenance, it's unlikely you'd come across this article in the normal course of your day, hence my interest in summarising it for you here.

Badawi offers an overview of his experience running the CASA (Center for Arabic Study Abroad) programme between 1970 and 1977. This programme was originally started in 1967 for advanced-level students, and the idea was to give a year of intensive study in order to catapult students into real competency in reading, speaking and using Arabic in a professional capacity. (Badawi begins his article with a justification for reading, but I'll skip those details since there is a great deal of overlap with what I've already written.)

The original CASA curriculum in the 1967-era programme was established around a 3000-word vocabulary list, reading of some short passages using those words in context, a grammar book and two long 'authentic' texts that would be covered over the course of the year. The students found this dull and unrewarding, however, so CASA's administrators decided to design a new course based around familiarising students with a 'language domain of their interests'. In other words: allowing them to read things that were related to their interests and professional trajectory.

Students taking part in the programme were assessed (prior to joining) as being at a high level, but their vocabulary was generally limited to political subjects. They had a poor understanding of morphology and little to no facility with semantics. They had, Badawi writes, bad reading habits in Arabic: too much focus on sentence structure, a tendency towards 'parsing-based reading' and only a minimal grasp of the "semantic role of punctuation". That last phrase refers to the way Arabic uses words, phrasing and sentence constructions to signify the meaning of a sentence, whereas in English a lot of those meaning structures are conveyed through punctuation. Most of all, students suffered from an 'excessive / crippling' use of the Arabic-English dictionary, which was identified as an obstacle to spontaneous and contextualised language learning; words were quickly forgotten.

The programme sought to encourage a switch in its students: "a change of attitude toward Arabic from that of a language they are being taught to one which they should start learning". The responsibility, at this level, generally should switch from the teacher to the students.

The programme was split up into three semesters / terms:

  • Semester 1: 8-week summer programme

This was made up of introductory cultural classes (based around Cairo, Egypt, where students were living). It offered classes to bring students up to a competent level in functional colloquial Arabic. (After the course, students could solve all their problems and interact with Egyptians in a functional way.) There was also a component of media Arabic where students would become familiar with the formalised language used in printed and spoken contexts.

  • Semester 2: 14-week autumn programme

This semester was for allowing students to gain a higher competence in MSA. Reading was one of the core elements here (news reading became effortless and there was some inclusion of classical language as well). Colloquial Arabic was encouraged through the reading of plays (which often used colloquial/dialect expressions and language). An intensive reading programme was added alongside this to boost confidence.

  • Semester 3: 14-week spring programme

The final semester included three graduate-level courses in subjects of the students' interest / choice. There was also some training in 'Educated Spoken Arabic' (i.e. the discussion of high-culture topics).

The Intensive Reading Course

The core belief behind the programme was that reading was important to the students' knowledge of Arabic in a fundamental way. All the other skills would benefit and develop alongside the reading done as part of the programme. There were different kinds of texts available and selection criteria for what kinds of reading took place:

Finding materials for intensive / analytic reading was easy. The harder issue was finding materials suitable for extensive reading, i.e. the kind of wide reading that students are able to do with some level of ease. Arabic poses a particular problem in this regard, given its 'wide range of active vocabulary in use', and the 'complexity of the morpho-semantic system'.

Plays were believed to be the best for extensive reading. They carried a "high degree of word and sentence redundancy", usually had only a single theme and were of moderate length. (It was found that reading two 200-page books was much more satisfying than reading a single 400-page book). Plays also lend themselves to real-life activities. There is also the possibility of watching the plays being performed (or, now, on YouTube).

Novels were also considered useful, but the fact that dialogue is used only minimally means that they were kept for later in the semester. Short stories were denser in meaning and language use and thus harder. They were included in the programme, though, for the sake of variety.

Overall, texts were chosen for the language structures used rather than for their literary value / content.

Reading Texts

The course had students reading three items each week: usually one novel or play (a long item) plus a short story and a one-act play (two short items). These were generally from the same author, and difficulty would escalate over time. All texts were authentic and unabridged. Ideally they were selected from leading literary figures and were texts for which no English translation existed. Selecting these texts was hard at the beginning, but over the years they settled into a broad pattern, escalating in difficulty:

  • Group 1 (first three weeks)

Plays by Tawfiq al-Hakim (short and long). These were good because he uses a lot of redundant vocabulary, draws on thematic sources with which students would have been familiar, writes lively dialogue and generally uses "straightforward language".

  • Group 2 (5 weeks)

This consisted of works by Ihsan Abdul Quddus, a journalist, novelist and short story writer. These works tackled themes from social phenomena and thus were appropriate to a young audience. They referenced local customs and expressions. They included fewer dialogues in the novels and short stories. They had a lucid structure and controlled range of vocabulary.

  • Group 3

This consisted of works by Yusuf Idris, blending MSA with colloquial idioms, Qur'anic citations and quotations from the hadith literature. These were at a higher difficulty level.

  • Group 4

This was a mix of items chosen for special topical interest or artistic value. For example, in the final week, students read Fathy Ghanem's 1958 novel Al-Gabal. They also tackled some of the lesser-known novels by Nagib Mahfouz.

Mixed in these various groups were shorter items: one-act plays and short stories. There was generally a balance between length of a text and its linguistic difficulty.

Reading Instructions

I found this section of the article the most interesting / instructive. Students were told the following:

  • The beginning of a story / text is always the hardest. You don't know what's going on, who the characters are and what the context / scene is. Bear with it. A lot of this will be scene-setting. You can always return to it later on.
  • Arabic has a lot of redundancy. Compare what you are stuck on with what follows and check if you can figure out the meaning that way.
  • Continue reading as long as you can make out a story or theme for yourself. Don't worry or second-guess yourself as to whether what you understand from the story is the same thing as what the author intended you to understand.
  • If you find a word or part of the structure you don't understand and stop, DON'T look the word up in the dictionary unless:
    • you have failed to guess the meaning
    • there is nobody around to ask the meaning
  • Mark / highlight the words you were able to guess in the text. Mark the words you were able to do without understanding.
  • Make a list of cultural features that you'd like to be addressed in class.
  • Mark and make a list of any expressions and grammatical features or constructions that you want addressed in class.

Classes

Class sessions were essentially there to ensure that students were keeping up with the reading volume. Students would narrate their understanding of the texts they had read, and would raise any issues they wanted to learn more about.

Classes were also a good time to increase students' semantic understanding -- allowing students to identify shared roots and usages in different contexts and forums.

Students submitted written responses / follow-ups to the text in the class with the teacher present. A weekly conference with students gathered feedback on the choices of texts, allowing teachers to adapt the programme depending on the ease/difficulty perceived by each individual cohort of students.

Results

By the end of the 14-week programme, students had read an average of 2500 pages of authentic Arabic texts. Graded text levels showed that their language was improving. They were encouraged by managing to review words and structures that had been marked as 'hard' earlier on in the semester. (Usually 25-40% of these words had become intelligible to them, despite no vocabulary learning strategy specifically targeted at learning these words.) The graduate-level courses (all taught in Arabic, obviously) of the final semester were also a proving ground for students.

This reading programme increased students' competence, and the gains were transferable to their other skills. (Yes, even their spoken Arabic.) Reading helped with writing. Reading 'complete texts' did a lot for the morale of the students at the intermediate level, too. And the literary focus of the content was useful for students even if their interests didn't lie in that particular area.

My next post about reading Arabic will detail some options that are available to the intermediate-level student of Arabic, and some practical considerations resulting from this article.