
Saturday, May 22, 2021

What's in Your Vocabulary Deck?

I am something of a student of New Testament Greek and made my beginnings a few years ago with Mounce's Basics of Biblical Greek. Along with the textbook came a flash card program called FlashWorks, which is a solid way to learn your vocab. The vocab was indexed by frequency, chapter, and user-level difficulty (based on how often you answered correctly), which was almost all that I wanted to help me keep track of which words I should be working on.

There was one thing that I felt was missing, though I wasn't sure how to define it or to decide what could be done about it. Something like, "how do I get a balanced randomness to my vocab selection if I only want to do a few right now?"

I don't want to randomly select 10 words out of 500. I won't get much repetition if I do that every day. On the other hand, if I focus only on the most frequent, hard words and repeat them every day, the risk is that I "cram" them and don't really get them into long-term memory. So, maybe the thing to do is to start with a selection of the 30 most difficult/frequent words and randomly select 10 from that list. I call the ratio of the two a "pool factor"; in this case, the pool factor would be 3. Make a selection of the most difficult, frequent words up to 30 long (and include a random sorting factor in case there are contenders for the last spots in the 30) and then randomly select 10 out of the 30.
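The selection described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the code from my project: the `Word` record, its `score` and `frequency` fields, and the function name `pick_words` are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    score: int       # higher = harder (assumed convention)
    frequency: int   # how often the word occurs in the corpus


def pick_words(deck, number_of_words, pool_factor, rng=random):
    """Pick number_of_words at random from the hardest, most frequent pool."""
    pool_size = round(number_of_words * pool_factor)
    # Sort by score, then frequency (both descending); the random key
    # breaks ties among contenders for the last places in the pool.
    ranked = sorted(
        deck,
        key=lambda w: (-w.score, -w.frequency, rng.random()),
    )
    pool = ranked[:pool_size]
    return rng.sample(pool, min(number_of_words, len(pool)))


# A made-up deck of 500 words that all start at the same score.
deck = [Word(f"word{i}", score=5, frequency=500 - i) for i in range(500)]
today = pick_words(deck, number_of_words=10, pool_factor=3)
```

With every score equal, the pool is simply the 30 most frequent words, and `today` holds 10 of them in random order.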

The 30 words are a kind of "working set" that I am working toward mastering, although I don't need to decide on them specifically; they get selected based on criteria. I do 10 words today out of those 30. As I do the daily work, some of those 30 words drop out because I got them right a bunch of times, and new words enter the selection of 30.

The next issue is to decide what is meant by hardest, frequent words. If I use a score that starts high for words that have never been marked correct, then I can do a descending sort on score, then on frequency. The whole deck will start at the highest score, so initially I am getting the most frequent words. In order to prevent words from disappearing from this set too soon, keep track of not just a score, but also the number of times marked correct. Only decrease the score after getting the word correct, say, 3 consecutive times. (Since the score of a word doesn't change until you have reached a level of mastery with it, you don't run into the scenario of interchanging sets of words that you then never master.)
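One way to read the scoring rule above is the following Python sketch. The names `Progress`, `record_answer`, and `MASTERY_STREAK` are hypothetical, as is the starting score of 5. Two details are my assumptions rather than part of the rule as stated: a wrong answer resets the streak, and the streak starts over after each decrease.

```python
from dataclasses import dataclass

MASTERY_STREAK = 3  # consecutive correct answers needed before the score drops


@dataclass
class Progress:
    score: int = 5            # assumed starting score for an unseen word
    correct_streak: int = 0   # consecutive correct answers so far


def record_answer(p: Progress, correct: bool) -> None:
    """Update a word's progress after one flash card answer."""
    if not correct:
        p.correct_streak = 0  # a miss restarts the streak; the score is unchanged
        return
    p.correct_streak += 1
    if p.correct_streak >= MASTERY_STREAK:
        p.score = max(0, p.score - 1)
        p.correct_streak = 0  # the next decrease needs a fresh streak


p = Progress()
for answer in (True, True, False, True, True, True):
    record_answer(p, answer)
```

In this run the first two correct answers are wiped out by the miss, so the score only drops once, after the final three correct answers in a row.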

Note that you can probably apply this strategy to other types of vocabulary that don't reference a fixed body of literature, but you have to come up with some kind of importance rating on each word in your database that serves the same purpose as the frequency field here.

The code below belongs to a C# project of mine that is using SQLiteAsyncConnection with some fields that are still missing data (hence, ifnull):

        public Task<List<Vocab>> GetNRandomHardFrequent(int numberOfWords, double poolFactor)
        {
            // Size of the "working set" that the final words are drawn from.
            int poolSize = Convert.ToInt32(numberOfWords * poolFactor);

            // Inner query: the poolSize hardest, most frequent words, with RANDOM()
            // breaking ties for the last spots in the pool.
            // Outer query: a random draw of numberOfWords from that pool.
            string query = @"
                    select 
                        sub.*
                    from
                        (select 
                            v.*
                        from
                            Vocab v
                        order by
                            ifnull(v.score, 0) desc, ifnull(v.frequency, 0) desc, RANDOM()
                        limit ?) sub
                    order by
                        RANDOM()
                    limit ?;";

            return database.QueryAsync<Vocab>(query, poolSize, numberOfWords);
        }
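
To try the query outside the C# project, it can be run against a throwaway in-memory database. The sketch below uses Python's standard `sqlite3` module. The table layout and the sample rows are made up for illustration; only the `Vocab` table name and the `score` and `frequency` column names come from the query above.

```python
import sqlite3

QUERY = """
select sub.*
from (select v.*
      from Vocab v
      order by ifnull(v.score, 0) desc, ifnull(v.frequency, 0) desc, random()
      limit ?) sub
order by random()
limit ?;
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table Vocab (word text, score integer, frequency integer)")
# 100 made-up rows: frequency falls as i grows; every 10th row has no score yet.
conn.executemany(
    "insert into Vocab values (?, ?, ?)",
    [(f"w{i}", None if i % 10 == 0 else 5, 100 - i) for i in range(100)],
)

number_of_words, pool_factor = 10, 3.0
pool_size = round(number_of_words * pool_factor)
rows = conn.execute(QUERY, (pool_size, number_of_words)).fetchall()
```

One thing this makes visible: `ifnull(v.score, 0)` sorts rows with a missing score last, so a word with no score recorded only reaches the pool once the scored words have dropped below it (or once it is given an explicit high starting score).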

Friday, April 19, 2019

"and" and "or" in Set Theory and English

Relating the English words "and" and "or" to concepts in set theory and logic can be done by starting with basic examples and reasoning by analogy to more complex cases. It must also be remembered that human language is generally less precise than formal logic. In natural language, we recognize a range of meaning for a word, so we must look not for one definition but for a few. Below, I attempt to list the use-cases of the words "and" and "or".

and:

  1. between two objects, P and Q means the set \(\{P, Q\}\).
  2. between two sets, P and Q means the set \(P\cup Q\).
  3. between two propositions, P and Q means that \(\{P, Q\}\) is contained in the set of all true propositions.
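
The three use-cases of "and" map directly onto Python's built-in sets. This is only an illustrative sketch: the course names and the sample propositions are made up, and propositions are represented as plain strings.

```python
# Use-case 1: "and" between two objects builds the set {P, Q}.
p, q = "apple", "pear"
pair = {p, q}

# Use-case 2: "and" between two sets is their union.
math_courses = {"Calculus", "Linear Algebra"}
cs_courses = {"Algorithms", "Databases"}
department = math_courses | cs_courses

# Use-case 3: "P and Q" holds when both propositions lie in the set T
# of true propositions, that is, when {P, Q} is a subset of T.
T = {"2 + 2 = 4", "7 is prime"}
both_true = {"2 + 2 = 4", "7 is prime"} <= T
one_false = {"2 + 2 = 4", "8 is prime"} <= T
```

Here `department` contains all four courses, `both_true` is `True`, and `one_false` is `False`.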

A possible objection to use-case 2 for "and" is that the conjunction "and" is used in the definition of intersection, and so P and Q on sets should refer to set intersection, not union. However, a set, a list, or the designation of a category refers to objects (conceptual ones or otherwise). Our best analogy is with use-case 1, where we have both items included in the set. This conforms to customary usage. For example, in the designation The Department of Math and Computer Science, customary usage indicates this means "the department which contains math courses and computer science courses", or, more expansively, "the department which contains math courses and contains computer science courses". We recognize this rendering as use-case 3, which serves as the argument for use-case 2. If we called it The Department of Math or Computer Science or The Department of Math and/or Computer Science, this would imply that the department name would be valid if the department contained math courses only, computer science courses only, or both types of courses. The name of such a department would not inspire confidence in an applicant interested in one of these areas specifically that the department offered the kind of courses he/she was interested in.

or:

  1. between two objects, exclusively, either P or Q means exactly one element of the set \(\{P, Q\}\).
  2. between two objects, inclusively, P or Q means any element of the set \(\{P, Q\}\).
  3. between two sets, exclusively, x is in either P or Q means \(x\in (P\cup Q) - (P\cap Q)\).
  4. between two sets, inclusively, x is in P or Q means \(x\in P\cup Q\).
  5. between propositions, exclusively, either P or Q means \(|\{P, Q\}\cap T| = 1\), where \(T\) is the set of all true statements.
  6. between propositions, inclusively, P or Q means \(\{P, Q\}\cap T\ne \emptyset\).

At least some of the above inclusive cases are sometimes expressed in English using "and/or" in place of "or" as a means of indicating the inclusive nature of the conjunction intended. The exclusive uses are sometimes emphasized by using the word "either", as above.
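
Use-cases 3 through 6 of "or" can be written as small Python functions over built-in sets. The function names are hypothetical, propositions are represented as strings, and `T` stands for the set of true statements as in the list above.

```python
def exclusive_or_sets(P, Q):
    """Use-case 3: elements in exactly one of P and Q."""
    return (P | Q) - (P & Q)   # the symmetric difference, also written P ^ Q


def inclusive_or_sets(P, Q):
    """Use-case 4: elements in at least one of P and Q."""
    return P | Q


def exclusive_or_props(P, Q, T):
    """Use-case 5: exactly one of the two propositions is in T."""
    return len({P, Q} & T) == 1


def inclusive_or_props(P, Q, T):
    """Use-case 6: at least one of the two propositions is in T."""
    return len({P, Q} & T) > 0


P, Q = {1, 2, 3}, {3, 4}
only_one = exclusive_or_sets(P, Q)    # 3 is dropped because it is in both
at_least_one = inclusive_or_sets(P, Q)

T = {"it is raining", "it is cold"}
both_inclusive = inclusive_or_props("it is raining", "it is cold", T)
both_exclusive = exclusive_or_props("it is raining", "it is cold", T)
```

When both propositions are true, the inclusive reading is satisfied and the exclusive reading is not, which is exactly the gap that "and/or" and "either" try to close in English.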

We can take our example of The Department of Math and Computer Science and express it in a way that uses set intersection. Let M be the set of departments in a university offering the majority of mathematics courses and let C be the set of departments offering the majority of computer science courses. Then \(M\cap C\) will either be a one-element set containing The Department of Math and Computer Science or the empty set, meaning there is no such department at the given university. However, this is forcing the issue, and use-case 2 for "and" is the more natural expression.

We can also see set intersection in the exclusive use-cases of "or". Use-case 3, in particular, can be expressed as \((P\cup Q) - (P\cap Q)\). Depending on the context, this might appear redundant. For example, we might say that a washroom is made for either males or females. We take M as washrooms intended for male occupancy and F as washrooms intended for female occupancy. This can be seen as use-case 3, but if the context of the conversation is public, multiple-occupancy washrooms, the intersection is empty. In fact, the nature of the statement may be to emphasize the fact that the intersection is empty, and the speaker is really saying \(M\cup F = (M\cup F) - (M\cap F)\), that is, "all of the washrooms I have in mind are for single-sex occupancy; none of the washrooms I have in mind are intended for both males and females". There may be other washrooms where the intersection is non-empty (e.g., single occupancy), but not the washrooms the speaker has in mind.
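
The washroom example can be checked with a small Python sketch. The washroom labels are made up; the point is only that the exclusive and inclusive readings coincide exactly when the intersection is empty.

```python
# Multiple-occupancy washrooms the speaker has in mind: no overlap.
M = {"1st floor men's", "2nd floor men's"}
F = {"1st floor women's", "2nd floor women's"}
disjoint = (M & F) == set()
same_when_disjoint = (M | F) - (M & F) == (M | F)

# Adding a single-occupancy room open to both makes the intersection non-empty,
# and the exclusive reading now leaves that room out.
shared = "single-occupancy room"
M2, F2 = M | {shared}, F | {shared}
same_when_overlapping = (M2 | F2) - (M2 & F2) == (M2 | F2)
```

Here `same_when_disjoint` is `True` and `same_when_overlapping` is `False`.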