Autotagging

Classification accuracy

I’ve been looking at the data we’ve collected from MajorMiner recently and I’m quite excited by the results. The idea is to use all of the clips that have been tagged with a given word, say “rap”, to train a classifier to recognize that tag. When we do this for all of the tags with 50 or more clips on MajorMiner, we get 36 “autotaggers”, to use the term from Eck et al.’s NIPS paper.
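As a rough illustration of the setup, here is a minimal Python sketch of picking the tags with enough clips and assembling examples for one of them. The function names, the `(clip_id, tag)` input format, and the random sampling of negatives are assumptions made for the example, not the code actually used; the equal number of positive and negative examples follows the 50/50 split described in the comments below.

```python
import random
from collections import defaultdict

MIN_CLIPS = 50  # tags with fewer tagged clips are skipped, as in the post


def clips_by_tag(tag_pairs):
    """Group (clip_id, tag) pairs into {tag: set of clip ids}."""
    by_tag = defaultdict(set)
    for clip_id, tag in tag_pairs:
        by_tag[tag].add(clip_id)
    return by_tag


def trainable_tags(by_tag, min_clips=MIN_CLIPS):
    """Return the tags that have at least min_clips tagged clips, sorted."""
    return sorted(tag for tag, clips in by_tag.items() if len(clips) >= min_clips)


def balanced_examples(by_tag, tag, all_clips, rng):
    """Return (positives, negatives) of equal size for one tag.

    Negatives are sampled at random from clips that do not carry the tag
    (an assumption; the post does not say how negatives were chosen).
    """
    positives = sorted(by_tag[tag])
    candidates = sorted(set(all_clips) - by_tag[tag])
    n = min(len(positives), len(candidates))
    negatives = rng.sample(candidates, n)
    return positives[:n], negatives
```

Each trainable tag would then get its own binary classifier trained on its `(positives, negatives)` pair, which is what makes one “autotagger” per tag.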

This picture shows how well the classifiers were able to learn each of the tags. Random guessing will get you an accuracy of 0.5, and 0.59 is statistically significantly better than that. The little dots are the accuracy achievable when there’s only a little data to train on, and the dots get bigger as we add more data. It’s interesting that genre words are the easiest to classify and instrument words are the hardest. This is almost certainly because of the features we’re using, which capture the holistic timbre of a clip. Last year I ran some experiments with the MajorMiner data which gave me the impression that it wasn’t any good for training autotaggers, but I think there was a bug in my code, because now it’s working great.
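To make the “significantly better than 0.5” claim concrete, here is a small sketch of one way to check it: a one-sided binomial test against chance. The choice of test, the 0.05 level, and the function names are assumptions for illustration; the post does not say which test or test-set size was used.

```python
from math import comb


def p_value_above_chance(correct, total):
    """One-sided binomial p-value: P(X >= correct) for X ~ Binomial(total, 0.5)."""
    tail = sum(comb(total, k) for k in range(correct, total + 1))
    return tail / 2 ** total


def min_significant_correct(total, alpha=0.05):
    """Smallest number of correct answers whose p-value is below alpha.

    Returns None when even a perfect score is not significant.
    """
    for correct in range(total + 1):
        if p_value_above_chance(correct, total) < alpha:
            return correct
    return None
```

For a hypothetical balanced test set of 100 clips, `min_significant_correct(100)` comes out at 59 correct, which is an accuracy of 0.59. With a different test-set size or significance level the threshold would move.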


4 Responses to Autotagging

  1. Todd Lipcon says:

    Hey Michael,

    You mention that random guessing gives an accuracy of 0.5. Does this mean that you are testing the classifiers on sets that are designed to be split 50/50 between the two classes?

    Do you have graphs of the ROC curves for these autotagging classifiers on randomly selected test examples? This is something we’re interested in too, but haven’t gotten to the point of results yet.

    -Todd from Amie Street

  2. mim says:

    Hey Todd,

    That’s right, for each class we’re testing and training on sets split 50/50 between positive and negative examples of that class. I haven’t generated any ROC curves yet, but they wouldn’t be too hard to whip up. When I’ve evaluated using precision-at-10 with all examples instead of classification accuracy on balanced sets, the performance has also been good, although it is correlated with the number of examples in a class.

  3. Douglas Eck says:

    Hi Michael,

    What software generated the plot? I like the antialiased fonts, etc. More generally I like how you did the visualization. Good work.
    -Doug from University of Montreal / Sun Labs

  4. mim says:

    I actually did it in MATLAB, believe it or not. It’s amazing what you can get it to do with a lot of set(gca, …) calls. To get the anti-aliasing I just exported it as an EPS and then converted it to PNG with ImageMagick. But I am happy with the way it turned out, glad you like it.
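For readers following the evaluation discussion in the comments above, here is a minimal sketch of precision-at-10: rank every clip by the classifier’s score for a tag and count how many of the top 10 really carry that tag. The function name and the score and label inputs are hypothetical; this is not the evaluation code behind the results.

```python
def precision_at_k(scores, labels, k=10):
    """Fraction of the k highest-scoring clips that are true positives.

    scores: classifier outputs, higher means more likely to have the tag.
    labels: 1 if the clip really has the tag, 0 otherwise (same order).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Sort clip indices by score, best first, and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    if not top:
        raise ValueError("need at least one scored clip")
    return sum(labels[i] for i in top) / len(top)
```

Unlike accuracy on balanced sets, this is computed over all examples, so a tag with only a few positive clips has fewer chances to fill the top 10. That fits the observation that precision-at-10 correlates with the number of examples in a class.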
