Recently I needed tf-idf for a personal project. I couldn't find a suitable Python library online that didn't have complicated dependencies, so I ended up writing a simple one. Here it is, in case anyone else finds it useful.
http://code.google.com/p/tfidf/
No n-grams or stemming, but it computes basic tf-idf. Thanks to Alex for reviewing.
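For anyone wondering what "basic tf-idf" boils down to, here is a minimal sketch in plain Python with no dependencies. It is not the code or API of the project linked above; the names `tokenize` and `tfidf` are hypothetical. It assumes term frequency = raw count divided by document length, and unsmoothed idf = log(N / df), where N is the number of documents and df is how many of them contain the term. That is only one of several common variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on runs of word characters: no stemming, no n-grams.
    return re.findall(r"\w+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document in `docs` (hypothetical helper)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf = count / document length; idf = natural log of N / df.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    for w in tfidf(["the cat sat", "the dog sat", "the dog barked"]):
        print(w)
```

In that example "the" appears in every document, so its idf is log(1) = 0 and it gets a weight of 0.0 everywhere, while rarer words such as "cat" score highest in the documents that contain them.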
A girl phoned me
From the moon;
Asking me for
Fork and spoon.
I have been happy with Xapian and its Python bindings, but they add an external dependency on the Xapian C++ library.
I recently heard about http://whoosh.ca/ but haven't tried it yet.
-- Dave
PS
Can I give a shout out? Hi Carl.
From the wiki:
The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining.
I wonder: how would you pronounce the term in conversation?
"And would you believe, I spent half the friggin night rewriting those damned tee-eff-eye-dee-eff routines? (laughter) Wtf? You know, the uh, tiff-i-diff stuff from last night? (roaring laughter) Damn it! Fine, have it your way: Cool Whip."
@ Quinn: I usually just pronounce it tee-eff-eye-dee-eff, letter by letter. It's too difficult to get your point across if you try to make a word out of it :).
Niniane, thanks for putting this library together! I'm working on a web mining undergrad research project and this is exactly what I was looking for! Nice work!