Khmer Unicode creator developing ‘AI’ spell check
In 2001, Dahn Hong developed Khmer Unicode, which remains the basis for most Khmer-language fonts in use today. He designed the Unicode encoding – a standardised system which assigns a unique number to each character – in response to frustration with the Limon font, an earlier system developed in 1994. Limon was difficult to employ, particularly in the age of smartphones.
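To illustrate what "a unique number for each character" means in practice, the short Python sketch below lists the code points behind the word ខ្មែរ ("Khmer"). The helper name `describe` is purely illustrative and is not taken from Hong's work; the character names come from Python's standard `unicodedata` module.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


# The word "Khmer" written in Khmer script, spelled out as escapes
# so the individual code points are visible in the source.
word = "\u1781\u17d2\u1798\u17c2\u179a"

for ch, code_point, name in describe(word):
    print(code_point, name)
```

Each character, including the invisible sign that stacks one consonant beneath another, has its own number, which is why the same text can be displayed correctly by any Unicode font.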
Not content with providing a clean, dependable source for nearly all of the Khmer typography in use, he has set his sights on reliable spellchecking software.
Hong, a former law student, was inspired to devise the program after observing a rise in misspelled words in the Kingdom’s national language.
He began working on the Nextspell app in 2019.
Most spellchecking software uses optical character recognition – the process that converts an image of text into a machine-readable text format. This meant that a system needed to be developed that would recognise Khmer text.
“English correction is simple in most well-known programs, such as Microsoft’s Office suite. A person can make corrections with a click of a button, misspelled words are underlined automatically, and new words can be added to the program’s dictionary. Nextspell is similar in some ways, but it employs AI to store common corrections in a large central database,” said Hong.
The more people use the program, the more accurate it will become. Google Translate is an excellent example of this kind of progress. As little as 10 years ago, its Khmer-to-English translations were often unintelligible. Now, it is far more accurate.
“As long as data is uploaded on a regular basis, it will continue to improve,” he said.
Ultimately, his goal is to enable the app to automatically correct misspelled words, although this depends on the amount of data that is added to the app’s database.
“Misspelled words are underlined in red, and if the database contains an alternative, it will make the correction, and underline it in blue. If there is no alternative available in the database, all the program can do is highlight it,” said Hong.
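The behaviour Hong describes can be sketched in a few lines of Python. This is only an illustration of the red and blue logic under stated assumptions, not Nextspell's actual code: the names `Flag` and `check` are hypothetical, the text is assumed to be already split into words, and placeholder tokens stand in for Khmer words.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flag:
    word: str                  # the word as typed
    suggestion: Optional[str]  # replacement from the database, if any
    colour: Optional[str]      # None = accepted, "blue" = corrected, "red" = flagged only


def check(tokens, dictionary, corrections):
    """Classify each token against a word list and a table of known corrections."""
    results = []
    for token in tokens:
        if token in dictionary:
            # Correctly spelled: nothing to underline.
            results.append(Flag(token, None, None))
        elif token in corrections:
            # Misspelled, and the database holds an alternative: correct it, mark blue.
            results.append(Flag(token, corrections[token], "blue"))
        else:
            # Misspelled, with no alternative available: highlight in red only.
            results.append(Flag(token, None, "red"))
    return results


# Hypothetical data: placeholder tokens instead of real Khmer words.
dictionary = {"alpha", "beta"}
corrections = {"alhpa": "alpha"}

for flag in check(["alpha", "alhpa", "zzzz"], dictionary, corrections):
    print(flag)
```

In this sketch, adding an entry to `corrections` is what turns a red highlight into a blue correction, which mirrors Hong's point that the tool improves as more data is added.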
He explained that the Khmer script is one of the most complicated in the world, and regular updates are necessary.
“We depend on the dictionary devised by supreme patriarch Chuon Nath in 1938, but there are modern terms that need to be added, and some words that still need to be standardised,” he said.
Hong does not know exactly how many people use his app, but explained that the free version limits each user to 200 words. Paying users can access up to 3,000 words for $12 a year.
“There are many more free users than professionals, but this is common with most apps,” he said.
In addition to Android and Apple apps, Nextspell can be accessed for free via any browser.
Users simply visit the Nextspell website, register for a free account, and then copy and paste Khmer text onto the page.
“The program is becoming more accurate every day. Obviously, the most common words are the ones it identifies most quickly, but its database is constantly improving its base of knowledge,” said Hong.
Although his Unicode remains in wide use, he did not profit from it. Nextspell provides a modest income from app sales, and his work with the Khmer language led to him assisting the government of Laos in creating its own Unicode.
“Khmer vowels and syllables are far more complex than English ones, so this program is essential. I am also working on developing fonts for children. The Khmer language must remain relevant in the digital age,” he said.