Additional Information
Lubukusu
Lubukusu, or Bukusu, is a spoken language in the Bantu language family. Many speakers use it in informal written contexts, such as texting or posting on social media, but Lubukusu currently has no standardized spelling system. As a result, many words in the database have more than one spelling provided by participants. Although Lubukusu is spoken by about 1.2 million people in Kenya, it remains relatively understudied, with few existing language resources.
Information in the database
Each word in the database is represented by a unique numeric word key, which is associated with information such as the word's spellings, translations, category, and frequency.
A search always returns the singular Lubukusu word, the plural Lubukusu word, and the English translation for each matching word key. These are the spellings or translations that survey participants provided most often. If several spellings or translations tie for the most occurrences, all of them are returned, separated by a '/'.
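The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the function name summarize_responses and the placeholder strings "formA", "formB", and "formC" are hypothetical, and the order in which tied responses are joined (first seen, first listed) is an assumption, since the source does not specify it. The same counts also give the descending-frequency ordering used for alternate spellings.

```python
from collections import Counter


def summarize_responses(responses):
    """Split responses into the displayed form and its alternates.

    Returns (displayed, alternates): `displayed` joins every response tied
    for the highest count with '/', and `alternates` lists the remaining
    responses in descending order of count, comma-separated.
    """
    counts = Counter(responses)
    if not counts:
        return "", ""
    top = max(counts.values())
    # most_common() sorts by count, highest first; responses with equal
    # counts keep the order in which they were first seen (an assumption
    # about tie order, not something the source states).
    ranked = counts.most_common()
    displayed = "/".join(r for r, n in ranked if n == top)
    alternates = ", ".join(r for r, n in ranked if n < top)
    return displayed, alternates


# Hypothetical placeholder responses: two forms tie, one trails.
print(summarize_responses(["formA", "formB", "formA", "formB", "formC"]))
```

Here the two tied forms would be shown together as 'formA/formB', while 'formC' would be listed as an alternate spelling.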
The other information associated with each word is described below:
- Alternate spellings: alternate spellings of the singular or plural Lubukusu word are given in a comma-separated list in descending order of frequency, so the alternate spelling that was provided most often appears first.
- Category: this refers to the semantic category or categories in the survey for which participants provided the word.
- Occurrences: this is the number of times the word was provided by survey participants. Because not all categories were shown to participants equally often, comparing the occurrences of words in different categories may be misleading.
- Frequency: this is calculated by dividing the number of occurrences of the word by the total number of times its category (or all of its categories) was shown to participants.
- Noun classes: the singular and plural noun classes are determined by comparing a list of prefixes to the first few letters of each word. If the word doesn't match any prefixes, the noun class is left blank.
- Number of characters: the number of characters in the singular or plural Lubukusu word is calculated from the most frequent singular or plural spelling only. If several spellings tie as the most frequent, their lengths are averaged and the result is rounded to the nearest whole number.
- Modifications: this is a list of descriptions of the changes that were made to a word during manual processing of the data, arranged in reverse chronological order. Such changes include correcting typos and switching the singular and plural Lubukusu words when they were swapped. A word only shows up as 'modified' (has a tilde ~ next to it in the search results) if something other than its noun class or word key was changed; however, changes to the noun class or word key are still logged and will show up alongside any other changes that were made.
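As a rough illustration of the Frequency, Noun classes, and Number of characters fields described above, the calculations might look like the following Python sketch. Everything named here is hypothetical: the functions, the TOY_PREFIXES table (a two-entry stand-in, not the project's prefix list), and the placeholder spellings. It also makes three assumptions that the descriptions leave open: times shown are summed across a word's categories, longer prefixes are tried before shorter ones, and an average ending in .5 is rounded up.

```python
import math


def frequency(occurrences, times_shown_per_category):
    """Occurrences divided by the total times the word's categories were shown."""
    # Assumption: for a word in several categories, the counts are summed.
    total_shown = sum(times_shown_per_category)
    if total_shown == 0:
        return 0.0
    return occurrences / total_shown


def noun_class(word, prefix_table):
    """Return the class of the first matching prefix, or '' if none matches."""
    # Assumption: longer prefixes are tried first so that a short prefix
    # cannot shadow a longer one that also matches.
    for prefix in sorted(prefix_table, key=len, reverse=True):
        if word.lower().startswith(prefix):
            return prefix_table[prefix]
    return ""


def character_count(displayed):
    """Length of the displayed spelling; tied spellings ('/') are averaged."""
    spellings = displayed.split("/")
    mean_length = sum(len(s) for s in spellings) / len(spellings)
    # Assumption: halves round up. Python's built-in round() would instead
    # round 4.5 down to 4 (round-half-to-even).
    return math.floor(mean_length + 0.5)


# Illustrative stand-in only; the project's real prefix list is not shown here.
TOY_PREFIXES = {"omu": "1", "kumu": "3"}

print(frequency(12, [30, 18]))
print(noun_class("omundu", TOY_PREFIXES))
print(character_count("abcd/abcde"))
```

Dividing by the number of times a category was shown is what makes frequencies comparable across categories in a way that raw occurrence counts are not.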
About the project
This database was first created in 2024 by a team consisting of one researcher from Moi University in Kenya and two from the University of Southern California. It is meant to be an up-to-date resource for Lubukusu that can serve as a basis for future research on the language.
Further information on the project's background and how the database was constructed can be found in our Annual Conference on African Linguistics (ACAL) 56 proceedings paper.