Saturday, February 28, 2009

tf-idf library

Recently I needed tf-idf for a personal project. I couldn't find a suitable Python library online that didn't pull in complicated dependencies, so I ended up writing a simple one. Here it is, in case anyone else finds it useful.

http://code.google.com/p/tfidf/

No n-grams or stemming, but it computes basic tf-idf. Thanks to Alex for reviewing.
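For anyone wondering what "basic tf-idf" amounts to, here is a rough sketch of the idea. This is not the library's code or its API; the function names `tokenize` and `tf_idf` are made up for illustration, and I'm assuming the common definition where tf is a term's share of the document's tokens and idf is log(N / document frequency). Other variants (smoothing, different log bases) exist.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (no stemming, no n-grams)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per input document.

    Hypothetical helper for illustration only, assuming
    tf = count / tokens_in_document and idf = log(N / doc_freq).
    """
    tokenized = [tokenize(doc) for doc in documents]
    num_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(num_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog barked"]
weights = tf_idf(docs)
```

A term that shows up in every document (like "the" here) gets a weight of zero, since log(N / N) = 0, while terms unique to one document get the highest weights.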

4 comments:

  1. A girl phoned me
    From the moon;
    Asking me for
    Fork and spoon.

  2. I have been happy with Xapian and its Python bindings, but it adds an external dependency on the Xapian C package.

    I recently heard about http://whoosh.ca/ but haven't tried it yet.

    -- Dave

    PS
    Can I give a shout out? Hi Carl.

  3. From the wiki:

    The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining.

    I wonder, how would you pronounce the term in conversation with others?

    "And would you believe, I spent half the friggin night rewriting those damned tee-eff-eye-dee-eff routines? (laughter) Wtf? You know, the uh, tiff-i-diff stuff from last night? (roaring laughter) Damn it! Fine, have it your way: Cool Whip."

  4. @ Quinn: I usually just pronounce it tee-eff-eye-dee-eff, just like it sounds. It's too difficult to get your point across when you try to make a word out of it :).

    Niniane, thanks for putting this library together! I'm working on a web mining undergrad research project and this is exactly what I was looking for! Nice work!

