In my linguistics classes, we’re working with text processing and analysis software (like WordCruncher) that’s Windows-only, and it’s a shame. How hard would it be to write a text analysis engine in Perl or Python with a frontend in PyObjC? It can’t be that hard… Perhaps I’ll use that as my learning project for Python: start small and build up. The next question, then, is what text analysis software ought to do. There have got to be a ton of different ways to look at a text computationally: wordprint analyses, statistics of various types, etc. The engine would also have to support tagging the text, so you could say “This word is a verb, 3rd person singular present active indicative” or “This is a conjunction” or whatever. I really only have experience with WordCruncher, but in my research class a month or two ago we looked at some Oxford tools that seemed to be the same sort of thing.
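To make the tagging idea a bit more concrete, here is a minimal sketch of how it might look in Python. Everything in it is an assumption for illustration: the `Token` class, the `tag_tokens` helper, and the hand-built `lexicon` dictionary are hypothetical names, not part of WordCruncher or any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One word of the text plus whatever grammatical tags are attached to it."""
    text: str
    tags: dict = field(default_factory=dict)


def tag_tokens(words, lexicon):
    """Attach tags from a (hypothetical) lexicon to each word.

    Lookup is case-insensitive; words missing from the lexicon get no tags.
    """
    return [Token(w, dict(lexicon.get(w.lower(), {}))) for w in words]


# A tiny hand-built lexicon, purely for illustration.
lexicon = {
    "runs": {
        "pos": "verb", "person": 3, "number": "singular",
        "tense": "present", "voice": "active", "mood": "indicative",
    },
    "and": {"pos": "conjunction"},
}

tokens = tag_tokens("She runs and jumps".split(), lexicon)
for token in tokens:
    print(token.text, token.tags)
```

Keeping the tags in a plain dictionary means the engine wouldn’t have to know in advance which grammatical categories a given language needs.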
But in all reality, to make this project worth my time (and to keep my interest), it has to be something I care about. Piling on statistics without end won’t cut it. So, what does it need to do to be useful? For me, it’d be nice to make a list of the top x (50 or 100 or 1000 or whatnot) most frequent words in a text. Foreign language texts are important to me as well, and I can see this tool (let’s call it Wordsmith for now) being of some use in preparing texts for Riverglen Press “publication.” In fact, that’s a good way to ensure that it will be useful, to me at least: focus it on helping with production for Riverglen Press. Excellent…
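The top-x word list is probably the easiest place to start. Here is a rough sketch in Python, assuming that “word” means a run of letters (with internal apostrophes allowed) and that case doesn’t matter. The function name `top_words` is made up for this example, and the regular expression is only one of many reasonable ways to split a text into words.

```python
import re
import unicodedata
from collections import Counter

# A "word" here is a run of Unicode letters, optionally joined by apostrophes
# (so "don't" stays one word). This is an assumption, not a linguistic standard.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def top_words(text, n=50):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Normalize to NFC so accented letters in foreign-language texts are
    # single code points before matching, then ignore case.
    normalized = unicodedata.normalize("NFC", text).lower()
    return Counter(WORD_RE.findall(normalized)).most_common(n)


print(top_words("The cat and the hat and the bat", 2))
```

Because Python 3 patterns match Unicode letters by default, the same function should cope with accented text, though languages without spaces between words would need a different tokenizer.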
[tags]text processing, Mac, WordCruncher, Perl, Python, PyObjC[/tags]
