Expert in copywriting, admin duties, data entry and research
October 9th, 2018
Writing & Translation
Transcription (audio to text), English Punctuation, Translation, Copywriting, English Spelling, Data Entry, Cover Letter Writing, English (UK), English Grammar, Business Proposal Writing, English (US), English Proofreading, Academic Writing
Admin & Support
General Office Skills, Research, Typing, Administrative Support, Christian theology
Master of Information Science
University of Ibadan, Nigeria
This was a master's degree.
Microsoft Language Program on Localization of Windows 7
I was awarded this as a member of the localization team for the Windows project.
The Automation and Compilation of a Hausa Corpus
The International Journal of Science and Technology
A spell checker is an indispensable tool for text editing: it can compensate for gaps in a writer's language skills
and identify and correct inevitable typing errors. With over 40 million speakers, the Hausa
language is the second most widely spoken language in Africa, yet it has no standard spell checker.
To create a Hausa spell checker, a Hausa corpus was built by data entry and web crawling. The wordlist was cleaned to
remove non-Hausa words as well as to correct typographical and other errors. Also, in order to determine the extent to
which the modest corpus used for the spell checker covers the Hausa language, the rate of increase in the size of the wordlist
in relation to corpus size was determined. A modest 2 million-word Hausa corpus was realized. The corpus was then
tokenized to produce a wordlist of about 30,000 Hausa tokens. After cleaning, the wordlist was reduced to 23,306 tokens.
Using Hausa morphology, the wordlist was compressed to 12,569 stems and 62 affix rules, which together made up the
spell checker files. In addition, a 700,000-word corpus drawn from the Hausa corpus was tokenized into separate files in
successive increments of 20,000 words per file.
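The stem-and-affix compression described above can be illustrated with a minimal sketch. It is not the method used in the publication: the function names `compress` and `expand`, the one-letter rule flags, and the toy English words are all hypothetical stand-ins, and only suffix rules are handled. The idea is that a word form is dropped from the list whenever it can be rebuilt from a shorter stem plus a rule.

```python
def compress(words, suffix_rules):
    """Fold suffixed word forms into their stems.

    words: iterable of word forms (toy data, not real Hausa).
    suffix_rules: dict mapping a hypothetical flag to a suffix string.
    Returns a dict {stem: set of flags that apply to it}.
    """
    stems = {}
    # Shorter words first, so a stem is always seen before its longer forms.
    for word in sorted(set(words), key=lambda w: (len(w), w)):
        for flag, suffix in suffix_rules.items():
            if suffix and word.endswith(suffix):
                base = word[: -len(suffix)]
                if base in stems:
                    stems[base].add(flag)  # word = base + suffix, so store a flag
                    break
        else:
            stems[word] = set()  # no rule applied: keep the word as its own stem
    return stems


def expand(stems, suffix_rules):
    """Rebuild the full wordlist from stems and rule flags."""
    forms = set()
    for stem, flags in stems.items():
        forms.add(stem)
        for flag in flags:
            forms.add(stem + suffix_rules[flag])
    return forms


toy_words = ["walk", "walks", "walked", "talk", "talks", "jump"]
toy_rules = {"S": "s", "D": "ed"}  # hypothetical rules for illustration only
toy_stems = compress(toy_words, toy_rules)
```

In this toy case six word forms reduce to three stems plus two rules, and `expand(toy_stems, toy_rules)` recovers the original six forms.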
Results showed that Hausa morphology was effective for information compression, as expected, and a rudimentary spell
checker was produced. Furthermore, the corpus study showed that a corpus of 20,000 words produces an
average of about 3,000 tokens, and that the number of new tokens decreases with each additional file
until it levels off at a point where adding corpus text of any size produces few or no new tokens. The number of
new tokens realised with each addition decreased from 2,000 to 1,000 and then to fewer than that.
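The corpus-coverage measurement can be sketched as follows, assuming that the "new tokens" counted per file are distinct word forms not seen in any earlier file. The function names `tokenize` and `new_types_per_chunk` and the simple letter-based tokenizer are assumptions made for illustration, not the tooling used in the study.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into runs of Unicode letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def new_types_per_chunk(tokens, chunk_size):
    """Count previously unseen word forms in each successive chunk.

    tokens: list of word tokens in corpus order.
    chunk_size: number of running words per chunk (20,000 in the study).
    Returns one count per chunk.
    """
    seen = set()
    growth = []
    for start in range(0, len(tokens), chunk_size):
        chunk = set(tokens[start:start + chunk_size])
        fresh = chunk - seen  # forms that no earlier chunk contained
        growth.append(len(fresh))
        seen |= fresh
    return growth


# Tiny demonstration with placeholder "words" and a chunk size of 4.
demo_growth = new_types_per_chunk(tokenize("a b c a b d a b c e a a a a"), 4)
```

On a real corpus the returned counts would be expected to fall from chunk to chunk, which is the flattening trend the results describe.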
This work is recommended to individuals, institutions and organisations as a guide for designing a standard spell checker for all agglutinative languages.