Saturday, May 22, 2021

What's in Your Vocabulary Deck?

I am something of a student of New Testament Greek and made my beginnings a few years ago with Mounce's Basics of Biblical Greek. Along with the textbook came a flash card program called FlashWorks, and it is a solid way to learn your vocab. The vocab was indexed by frequency, chapter, and a per-user difficulty level (based on how often you answered correctly), which was almost everything I wanted to help me keep track of which words I should be working on.

There was one thing that I felt was missing, though I wasn't sure how to define it or to decide what could be done about it. Something like, "how do I get a balanced randomness to my vocab selection if I only want to do a few right now?"

I don't want to randomly select 10 words out of 500. I won't get much repetition if I do that every day. On the other hand, if I focus only on the most frequent, hard words and repeat them every day, the risk is that I "cram" them and don't really get them into long-term memory. So maybe the thing to do is to start with a selection of the 30 most difficult, frequent words and randomly select 10 from that list. I call the ratio a "pool factor"; in this case the pool factor is 3 (30 / 10). In other words: make a selection of the most difficult, frequent words up to 30 long (with a random sorting factor to break ties among contenders for the last spots in the 30), then randomly select 10 out of the 30.

The 30 words are a kind of "working set" that I am working toward mastering, although I don't need to decide on them specifically; they get selected based on criteria. I do 10 words today out of those 30. As I do the daily work, some of those 30 words drop out because I got them right a bunch of times, and new words enter the selection of 30.
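To make the two-stage pick concrete, here is a minimal in-memory sketch in C#. It is only an illustration under stated assumptions: the `Word` record and the `PoolSelector.Pick` helper are hypothetical names made up for this example, and the pool size is rounded up with `Math.Ceiling`, which is my choice rather than anything required by the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a vocabulary entry.
public record Word(string Text, int Score, int Frequency);

public static class PoolSelector
{
    // Builds a pool of the hardest, most frequent words, then draws
    // numberOfWords of them at random.
    public static List<Word> Pick(
        IEnumerable<Word> deck, int numberOfWords, double poolFactor, Random rng)
    {
        // Assumption: round the pool size up so a fractional factor never shrinks it.
        int poolSize = (int)Math.Ceiling(numberOfWords * poolFactor);

        // Stage 1: highest score first, then highest frequency; a random key
        // breaks ties among contenders for the last spots in the pool.
        var pool = deck
            .Select(w => (Word: w, Tie: rng.Next()))
            .OrderByDescending(x => x.Word.Score)
            .ThenByDescending(x => x.Word.Frequency)
            .ThenBy(x => x.Tie)
            .Take(poolSize)
            .Select(x => x.Word)
            .ToList();

        // Stage 2: Fisher-Yates shuffle of the pool, then keep the first few.
        for (int i = pool.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (pool[i], pool[j]) = (pool[j], pool[i]);
        }

        return pool.Take(numberOfWords).ToList();
    }
}
```

Calling `PoolSelector.Pick(deck, 10, 3.0, new Random())` on a 500-word deck returns 10 words, all drawn from the 30 that currently sort to the top.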

The next issue is to decide what is meant by "hardest, frequent" words. If I use a score that starts high for words that have never been marked correct, then I can do a descending sort on score, then on frequency. The whole deck starts at the highest score, so initially I get the most frequent words. To prevent words from disappearing from this set too soon, keep track of not just a score but also the number of times a word has been marked correct. Only decrease the score after getting the word right, say, 3 consecutive times. (Since the score of a word doesn't change until you have reached a level of mastery with it, you don't run into the scenario of interchanging sets of words that you then never master.)
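The scoring rule could be sketched roughly as follows. Everything named here is hypothetical (`WordProgress`, `StartingScore`, `StreakNeeded`), and a few behaviors are my assumptions rather than anything settled above: a wrong answer only resets the streak and leaves the score alone, the streak restarts after each decrease, and the score never drops below zero.

```csharp
using System;

// Hypothetical per-word progress tracker.
public sealed class WordProgress
{
    public const int StartingScore = 10; // assumption: every word starts at the same high score
    public const int StreakNeeded = 3;   // consecutive correct answers needed before the score drops

    public int Score { get; private set; } = StartingScore;
    public int ConsecutiveCorrect { get; private set; }

    // Records one review of the word.
    public void Record(bool correct)
    {
        if (!correct)
        {
            // Assumption: a miss breaks the streak but does not raise the score.
            ConsecutiveCorrect = 0;
            return;
        }

        ConsecutiveCorrect++;
        if (ConsecutiveCorrect >= StreakNeeded && Score > 0)
        {
            Score--;
            // Assumption: the streak restarts, so the next drop takes another full streak.
            ConsecutiveCorrect = 0;
        }
    }
}
```

With these settings, two correct answers followed by a miss leave the score unchanged; the score only drops once three correct answers come in a row.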

Note that you can probably apply this strategy to other types of vocabulary that don't reference a fixed body of literature, but you have to come up with some kind of importance rating on each word in your database that serves the same purpose as the frequency field here.

The code below belongs to a C# project of mine that is using SQLiteAsyncConnection with some fields that are still missing data (hence, ifnull):

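        // Picks numberOfWords at random from a pool of the hardest, most frequent words.
        // The pool holds numberOfWords * poolFactor words.
        public Task<List<Vocab>> GetNRandomHardFrequent(int numberOfWords, double poolFactor)
        {
            int poolSize = Convert.ToInt32(numberOfWords * poolFactor);

            // Inner query: build the pool (highest score first, then highest frequency,
            // with RANDOM() breaking ties for the last spots).
            // Outer query: shuffle the pool and keep numberOfWords of it.
            string query = @"
                    select 
                        sub.*
                    from
                        (select 
                            v.*
                        from
                            Vocab v
                        order by
                            ifnull(v.score, 0) desc, ifnull(v.frequency, 0) desc, RANDOM()
                        limit ?) sub
                    order by
                        RANDOM()
                    limit ?;";

            // The arguments bind to the two ? placeholders in order.
            return database.QueryAsync<Vocab>(query, poolSize, numberOfWords);
        }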
