text analysis with visualization, co-word maps

These two programs are specialized derivatives of Ti.Exe and FullText.Exe. They construct the file words.dbf without producing the cosine-normalized matrix for analysis in Pajek. The file words.dbf contains:

A variable named “Chi_Sq” which provides the chi-square contribution of each of the variables; for word_i this is defined as χ²_i = Σ_j (Observed_ij – Expected_ij)² / Expected_ij. In other words, the value in each row is the sum of the contributions over the column for that variable (Mogoutov et al., 2008);
A variable named “ObsExp” which provides the sum of |Observed – Expected| for the word as a variable summed over the column;
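A variable named “TfIdf” which uses Salton & McGill’s (1983: 63) TermFrequency-InverseDocumentFrequency measure, defined as follows: WEIGHT_ik = FREQ_ik * [log₂ (n) – log₂ (DOCFREQ_k)], where FREQ_ik is the frequency of term k in document i, n is the number of documents in the collection, and DOCFREQ_k is the number of documents in which term k occurs. This function assigns a high degree of importance to terms occurring in only a few documents in the collection;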
The word frequency within the set.
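
The measures listed above can be illustrated with a minimal Python sketch. It is not the code of the programs themselves. It assumes a small document-by-word count matrix, takes the expected value of a cell as (row total × column total) / grand total, and aggregates the per-document TfIdf weights into one value per word by summing them over the documents. The function name word_statistics and the sample data are hypothetical.

```python
import math


def word_statistics(matrix, words):
    """Return per-word statistics for a document-by-word count matrix.

    matrix[j][k] is the frequency of words[k] in document j.
    """
    n_docs = len(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(row[k] for row in matrix) for k in range(len(words))]
    grand_total = sum(row_totals)

    stats = {}
    for k, word in enumerate(words):
        chi_sq = 0.0
        obs_exp = 0.0
        for j in range(n_docs):
            observed = matrix[j][k]
            # Assumption: expected value under independence of rows and columns.
            if grand_total:
                expected = row_totals[j] * col_totals[k] / grand_total
            else:
                expected = 0.0
            if expected > 0:
                chi_sq += (observed - expected) ** 2 / expected
            obs_exp += abs(observed - expected)

        # Number of documents in which the word occurs at least once.
        doc_freq = sum(1 for row in matrix if row[k] > 0)
        if doc_freq > 0:
            idf = math.log2(n_docs) - math.log2(doc_freq)
        else:
            idf = 0.0
        # Assumption: summing FREQ_ik * idf over all documents i gives
        # (total frequency of the word) * idf as the per-word value.
        tf_idf = col_totals[k] * idf

        stats[word] = {
            "Chi_Sq": chi_sq,
            "ObsExp": obs_exp,
            "TfIdf": tf_idf,
            "Freq": col_totals[k],
        }
    return stats


# Hypothetical example: three documents, two words.
example = word_statistics([[2, 0], [0, 2], [1, 1]], ["alpha", "beta"])
for word, values in example.items():
    print(word, values)
```

In this sketch a word that occurs in every document receives a TfIdf of zero, because log₂ (n) – log₂ (DOCFREQ_k) is then zero; this reflects the emphasis the measure places on terms occurring in only a few documents.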

References:

- Magerman, T., Van Looy, B., & Song, X. (2007). Exploring the feasibility and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications. Paper presented at the 6th Triple Helix Conference, 16-19 May 2007, Singapore.

- Mogoutov, A., Cambrosio, A., Keating, P., & Mustar, P. (2008). Biomedical innovation at the laboratory, clinical and commercial interface: A new method for mapping research projects, publications and patents in the field of microarrays. Journal of Informetrics (In print); doi:10.1016/j.joi.2008.06.005.

- Salton, G., & McGill, M. J. (1983). Introduction to Modern Information Retrieval. Auckland, etc.: McGraw-Hill.

Links to programs for (Porter’s) stemming:

http://maya.cs.depaul.edu/~classes/ds575/porter.html

http://snowball.tartarus.org/demo.php

Links to programs for parsing:

http://l2r.cs.uiuc.edu/~cogcomp/eoh/posdemo.html

http://l2r.cs.uiuc.edu/~cogcomp/shallow_parse_demo.php

http://nlp.stanford.edu:8080/parser/

http://alias-i.com/lingpipe/web/demos.html

PHP versions of Porter’s stemmer:

http://www.chuggnutt.com/stemmer-source.php

http://www.phpguru.org/downloads/PorterStemmer/PorterStemmer.phps

http://webscripts.softpedia.com/scriptDownload/Porter-Stemming-Algorithm-Download-46193.html
