The Punjabi spell checker has the following main
features:
- Unicode/ISCII compliant: To keep pace with current
standards, the spell checker works on ISCII- and
Unicode-encoded Punjabi files, as well as on text
stored in common file formats such as DOC, RTF
and HTML.
- Support for popular Punjabi fonts and keyboards:
The spell checker has been designed to support
Punjabi text encoded in any of the popular Punjabi
fonts, such as AnanpurSahib, AmritLipi, Jasmine,
Punjabi and Satluj. More than one hundred Punjabi
fonts and thirty-two Punjabi keyboard layouts are
supported, so the user is not restricted to typing
in a few predefined fonts. The built-in font
converter automatically converts the text to be
checked into one of the standard fonts in which the
dictionary is encoded.
- Provision for additional dictionaries: The main
dictionary of the spell checker has around 1.5 lakh
(150,000) words in its database. It contains the
most common words, but it might not include proper
names, technical terms, acronyms, words from Gurbani
and so on. To prevent the spell checker from
flagging such words, it provides a built-in custom
dictionary, and the user can also create their own
custom dictionaries, for example separate ones for
legal or medical terms. In addition, a dictionary of
all the words occurring in Guru Granth Sahib has
been developed and can be used along with the main
dictionary to check text quoted from Gurbani.
- Powerful suggestion list: The real power and
utility of a spell checker depend on the suggestion
list it provides. To design an efficient spell
checker, a detailed error pattern analysis of
Punjabi has been carried out. It covered the
various types of errors (insertion, deletion,
transposition, substitution, run-on and split-word
errors), positional analysis, word length effects,
phonetic errors, first-position error analysis,
keyboard effects etc. The analysis is based on
around 20,000 carefully collected misspellings
drawn from more than 12 lakh (1.2 million) words. To
collect the candidates for the suggestion list, a
reverse minimum edit distance technique is used: a
candidate set is produced by first generating every
possible single-error permutation of the misspelled
string and then checking the dictionary to see
whether any of them is a valid word. Knowledge of
the spelling error patterns is then applied to the
selected words, and the list is sorted on that
basis. As a result, we get a very powerful
suggestion list, in which the correct word is
usually placed at the top.
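
The candidate generation described in the last feature can be sketched roughly as follows. This is only an illustration under stated assumptions, not the spell checker's actual code: the function names, the `TYPE_RANK` ordering and the toy Latin-alphabet dictionary are all hypothetical, chosen to keep the example readable. A real system would use the Gurmukhi character set and rankings derived from the error pattern analysis, and would also handle run-on and split-word errors, which this sketch leaves out.

```python
# Hypothetical ranking of error types (lower = shown first). A real system
# would derive this from the collected misspelling statistics.
TYPE_RANK = {"substitution": 0, "deletion": 1, "insertion": 2, "transposition": 3}

# Toy alphabet for the demo; an actual Punjabi checker would use Gurmukhi.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def single_edit_variants(word, alphabet):
    """Yield (variant, error_type) for every string one edit away from word.

    error_type names the mistake the typist would have made, so the
    variant is produced by undoing that mistake.
    """
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        if right:
            # An extra character was typed: undo by removing it.
            yield left + right[1:], "insertion"
        if len(right) > 1:
            # Two adjacent characters were swapped: swap them back.
            yield left + right[1] + right[0] + right[2:], "transposition"
        for ch in alphabet:
            if right and ch != right[0]:
                # A wrong character was typed: try every replacement.
                yield left + ch + right[1:], "substitution"
            # A character was dropped: try putting every character back.
            yield left + ch + right, "deletion"


def suggest(misspelled, dictionary, alphabet=ALPHABET, type_rank=TYPE_RANK):
    """Return dictionary words one edit away, ordered by error-type rank."""
    best = {}
    for variant, error_type in single_edit_variants(misspelled, alphabet):
        if variant in dictionary:
            rank = type_rank[error_type]
            # Keep the best (lowest) rank if a word is reachable several ways.
            if variant not in best or rank < best[variant]:
                best[variant] = rank
    # Sort by rank, then alphabetically so the order is deterministic.
    return sorted(best, key=lambda w: (best[w], w))


if __name__ == "__main__":
    toy_dictionary = {"word", "ward", "sword", "wood", "lord"}
    print(suggest("wrod", toy_dictionary))
    print(suggest("wrd", toy_dictionary))
```

Generating variants from the misspelling and looking them up keeps the work proportional to the word length and alphabet size, rather than to the size of the dictionary, which is presumably why the reverse approach suits a dictionary of this scale.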