Generating frequency lists of vocabulary words for study

When it comes to word frequency, languages follow the Pareto principle. It’s been said that the top 1,000 most frequent words in the English language make up 85% of speech, and the top 5,000 make up 80% of writing. The same is probably true for most languages, including Portuguese. This means that we can optimize our study of vocabulary by focusing on learning the most common, useful words, the ones that we are likely to see and hear over and over again. One way to do this is to use a frequency list.
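To make the coverage idea concrete, here is a minimal Python sketch of how you could check what share of all word occurrences the top N words account for. The function name `coverage_of_top` and the counts are made up for illustration; they are not real figures from any corpus.

```python
from collections import Counter


def coverage_of_top(counts, n):
    """Return the share of all occurrences covered by the n most frequent words."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(freq for _, freq in counts.most_common(n))
    return top / total


# Toy counts, invented for illustration only (not real corpus data).
toy_counts = Counter(
    {"ser": 50, "ter": 30, "estar": 10, "poder": 6, "coçar": 3, "balbuciar": 1}
)

# The two most frequent words cover 80 of the 100 occurrences.
print(coverage_of_top(toy_counts, 2))  # 0.8
```

With real counts from a corpus, the same calculation would show how quickly coverage flattens out as you go further down the list.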

A frequency list of words is a list sorted by how common a word is (realistically, how often it appears in a database like a corpus). Using the online Corpus do Português, it’s possible to make our own customized lists of words. I’ve gotten you started with a list of the 1,000 most common verbs and another for nouns.

If working with the Corpus is too complicated or too much trouble, another way to study frequency lists is to purchase A Frequency Dictionary of Portuguese, which is based on the same data set as the Corpus. It contains the top 5,000 most common words already defined for you and sorted by part of speech, plus example sentences.

Using the Corpus

Let’s say we want to make a list of the 1,000 most used verbs in the Corpus. The screencast and the instructions below will show you step-by-step how to do it.

  1. In the WORD(S) field, type [v*] to tell it to search for all forms of all verbs. Alternatively, you can click inside the WORD(S) text box, then click POS List to show the drop-down parts-of-speech list, and then choose VERB from the list. It will insert the proper code in the WORD(S) box.
  2. Make sure Display is set to LIST and that the COLLOCATES field is blank.
  3. Now check the Sections box. By default, 1900s and 1800s are highlighted. Click on 1900s because we only want to search twentieth-century sources.
  4. Under SORTING, choose FREQUENCY.
  5. Click CLICK TO SEE OPTIONS, then under FREQ type in 1000, which is the number of results we want it to give us. 1,000 results is the maximum available to non-academic users.
  6. Under SAVE LISTS, choose YES.
  7. Finally, click the SEARCH button.

Here’s how everything should look:

If everything was set up properly, you should now see a list of verbs in the upper-right pane. The TOT column shows how many times each verb occurs in the Corpus (in any form), and because we sorted by frequency, they should be in decreasing order. As you might expect, the top of the list contains extremely common verbs like ser, ter, estar, poder. The first 30 verbs are an excellent place for beginners to start, and the first 100 would make an excellent foundation for low-intermediate learners.

The end of the list, meanwhile, contains verbs like coçar, viabilizar, afogar that only high-intermediate students might know, plus some really obscure verbs like balbuciar (to stammer, babble, mumble) and empatar (to break even, to end in a draw, to hinder).

My advice is to start at the top and scroll down until you start to see verbs that you don’t know. Depending on how long you’ve been studying Portuguese, you might have to scroll down quite a ways. One neat thing is that this can give you a rough estimate of how many verbs you know.

Be on the lookout for unknown verbs that appear high in the list, surrounded by verbs that you do know – these are the low-hanging fruit, the big payoff verbs, the ones that somehow escaped your attention but are common enough that it would be very worth your while to go and learn them.
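If you keep your own list of verbs you already know, the scan described above can be sketched in a few lines of Python. This is only a rough illustration: `review_list`, the `window` parameter, and the sample words are hypothetical, and the sample order is not the actual ranking from the Corpus.

```python
def review_list(ranked_words, known_words, window=2):
    """Return (number of known words, list of (rank, word) payoff candidates).

    A payoff candidate is an unknown word whose neighbours in the ranking
    (up to `window` on each side) are all known.
    """
    known = [word in known_words for word in ranked_words]
    payoff = []
    for i, word in enumerate(ranked_words):
        if known[i]:
            continue
        neighbours = known[max(0, i - window):i] + known[i + 1:i + 1 + window]
        if neighbours and all(neighbours):
            payoff.append((i + 1, word))  # ranks start at 1
    return sum(known), payoff


# Illustrative data only; not the real Corpus ranking.
ranked = ["ser", "ter", "estar", "poder", "fazer", "dizer", "coçar", "balbuciar"]
my_known = {"ser", "ter", "poder", "fazer", "dizer"}

known_count, low_hanging = review_list(ranked, my_known)
print(known_count)   # 5
print(low_hanging)   # [(3, 'estar')]
```

A larger `window` is stricter, since more neighbours have to be known before a word counts as a gap worth filling.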

Making lists of other types of words

If you want to make a list of nouns, adjectives, whatever, the procedure is nearly the same. The only search field that will change is the WORD(S) box, where you will type [nn*] for nouns or [j*] for adjectives – consult the POS LIST for other options. Note that unless you’re a researcher or a linguistics student, your account will be limited to lists of no more than 1,000 words.

Saving your list of words

Saving a list is only possible if you’re registered for a free account. Assuming you set SAVE LISTS to YES before running the search, you should see a field at the top of the wordlist that says NAME OF LIST. Type in a name like “top 1000 verbs”, click the check box to the left of the CONTEXT button to check all the verbs in the list, and click the SUBMIT button.

The bottom right pane will now show the new list you just made. Click on the green M and the list will appear in the green box on the right. You can now use your mouse to highlight all the words, copy them, and paste them into Word, Excel, or a text editor to make a printable list for yourself.
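If you would rather end up with a numbered spreadsheet than a plain list, a short Python script can turn the pasted words into a CSV file. This is a minimal sketch that assumes the copied text has one word per line; the exact layout you get from the site may differ, and the function name `pasted_list_to_csv` is made up for this example.

```python
import csv
import io


def pasted_list_to_csv(pasted_text, out_file):
    """Write one (rank, word) row per non-empty line; return the word count."""
    # Assumption: one word per line, possibly with stray spaces or blank lines.
    words = [line.strip() for line in pasted_text.splitlines() if line.strip()]
    writer = csv.writer(out_file)
    writer.writerow(["rank", "word"])
    for rank, word in enumerate(words, start=1):
        writer.writerow([rank, word])
    return len(words)


# Small sample standing in for the text copied from the site.
sample = "ser\nter\n\n  estar \npoder\n"
buffer = io.StringIO()
count = pasted_list_to_csv(sample, buffer)
print(count)  # 4
```

To write a real file instead of an in-memory buffer, pass something like `open("top_1000_verbs.csv", "w", newline="", encoding="utf-8")` as `out_file`; the resulting file opens directly in Excel.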

6 Responses to Generating frequency lists of vocabulary words for study

  1. THANK YOU for mentioning this!! I have been trying to Google this, and found that most people are not thinking strategically, so they don’t mention this. There is so much material out there for “quick phrases,” or giant lists of random vocabulary, and neither of those are so helpful!

  2. Amber says:

    Thanks so much for this! I’m actually a student of Spanish, and I’ve been trying to build my own frequency lists, but had no idea how to go about it. Googling the Spanish corpus paid off!

  3. Broken Link says:

    The link to Corpus do Português is formatted incorrectly so it doesn’t work without editing.

  4. Alex says:

    Hi,

    Thank you very much for this. Extremely helpful!

    Would you by chance know how to make a frequency list of all types of words? (ie, verbs+nouns+everything else) I can’t find a way … maybe it is not possible.

    Cheers,
    A

    • Ben says:

      Yes, if you enter * into the palavra/word input, it will search for all words, and by default sort by frequency. I note that some punctuation also appears in this case, such as commas, semi-colons etc.
