Other Utilities

Text analyser
A text analysis utility is also bundled with Akhar. It is a useful tool for researchers working in computational linguistics, translation, natural language processing, text processing, lexicography, optical character recognition, information retrieval, speech recognition and related fields. It performs quantitative analysis of text and generates word-frequency lists, character-frequency lists, concordances and other statistics, such as the count of running words and of unique words in a text, the token-to-type ratio, the mean word length and the percentage frequency of each word length.

These statistics have applications in linguistic and stylistic analysis. As one researcher has proposed, counts of the number of words of various lengths (one-letter words, two-letter words, and so on) could be graphed to produce a consistent fingerprint for a writer, as long as the samples are sufficiently large. A comparison of novels written by the same author shows few and small differences in word lengths, while a comparison of novels written by different authors shows significant differences.

Automatic statistical routines can be applied to very large bodies of text to uncover facts about language that no amount of manual searching could reveal. Computer analysis of electronic texts makes it easy to answer questions that could otherwise be answered only by intuition, guesswork or mind-numbing research. The ten most frequently occurring words in Sri Guru Granth Sahib are displayed below.
[Figure: the ten most frequently occurring words in Sri Guru Granth Sahib]
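The statistics described above can be illustrated with a short Python sketch. This is not Akhar's implementation, which is not documented here. The names `tokenize`, `text_statistics` and `PUNCT` are hypothetical, and the tokenisation rule (split on whitespace, strip surrounding punctuation) is an assumption chosen so that Gurmukhi vowel signs stay attached to their words.

```python
import string
from collections import Counter

# Assumption: punctuation to strip from word edges, including the danda
# marks used in Punjabi text. Akhar's own rules may differ.
PUNCT = string.punctuation + "।॥“”‘’"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation."""
    tokens = (raw.strip(PUNCT).lower() for raw in text.split())
    return [t for t in tokens if t]


def text_statistics(text):
    """Return the basic quantitative measures for a text."""
    words = tokenize(text)
    word_freq = Counter(words)
    char_freq = Counter(ch for w in words for ch in w)
    # Word length is counted in code points here, so a Gurmukhi letter
    # plus its vowel sign counts as two.
    length_freq = Counter(len(w) for w in words)
    tokens = len(words)        # running words
    types = len(word_freq)     # unique words
    return {
        "running_words": tokens,
        "unique_words": types,
        "token_type_ratio": tokens / types if types else 0.0,
        "mean_word_length": sum(len(w) for w in words) / tokens if tokens else 0.0,
        "length_percent": {
            n: 100 * c / tokens for n, c in sorted(length_freq.items())
        },
        "word_freq": word_freq,
        "char_freq": char_freq,
    }


if __name__ == "__main__":
    stats = text_statistics("The cat saw the dog. The dog ran.")
    print(stats["running_words"], stats["unique_words"])
    print(stats["word_freq"].most_common(3))
```

A list such as the ten most frequent words in a text would then be `stats["word_freq"].most_common(10)`.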

Some of the main features of the text analysis utility are:

  • It performs quantitative analysis of text and generates word-frequency lists, character-frequency lists and other statistics, such as the count of running words and of unique words in a text, the token-to-type ratio, the mean word length and the percentage frequency of each word length.

  • It can analyse text encoded in Unicode or ISCII, as well as font-encoded files stored in RTF, DOC or HTML format and plain text files. Multiple files can be selected at once.

  • The word lists can be sorted alphabetically, by order of occurrence in the text, by frequency or by word length, in either ascending or descending order.

  • It can analyse both English and Punjabi text and arrange the words in alphabetical order according to the language of the text. Font-encoded Punjabi text may be in any of the popular fonts.

  • A concordance utility is also provided (Fig. 6), which can prove very useful for context analysis. The user can search for any word in a document, and all occurrences of the word are displayed in KWIC (Key Word in Context) format, where the search word is highlighted and placed in the centre along with its neighbouring text.

  • The frequency lists and statistics can be stored for future reference.
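The KWIC display mentioned in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Akhar's code: the function name `kwic`, the `width` parameter and the use of square brackets to mark the highlighted word are all hypothetical choices.

```python
import string

# Assumption: punctuation ignored when matching, including Punjabi dandas.
PUNCT = string.punctuation + "।॥“”‘’"


def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword, aligned in a centre column.

    `width` is the number of context characters kept on each side.
    """
    tokens = text.split()
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.strip(PUNCT).lower() == target:
            # Keep the last `width` characters before the keyword and the
            # first `width` characters after it.
            left = " ".join(tokens[:i])[-width:]
            right = " ".join(tokens[i + 1:])[:width]
            # Right-justify the left context so every keyword starts in
            # the same column.
            lines.append(f"{left:>{width}} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat too."
    for line in kwic(sample, "sat", width=10):
        print(line)
```

Padding the left context to a fixed width is what keeps the search word in a single column, so the neighbouring text on both sides can be scanned at a glance.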


 
 

 

 