Autotagging

I’ve been looking at the data we’ve collected from MajorMiner recently and I’m quite excited by the results. The idea is to use all of the clips that have been tagged “rap” to train a classifier to recognize rap. When we do this for all of the tags with 50 or more clips on MajorMiner, we get 36 “autotaggers”, to use the term from Eck et al.’s NIPS paper.
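As a rough sketch of the tag selection step: assuming the tag data is available as a mapping from clip id to the set of tags applied to that clip (a hypothetical layout, not necessarily how MajorMiner stores it), the tags with enough clips to train on could be picked out like this. The function name and the `clip_tags` structure are made up for illustration.

```python
from collections import defaultdict

MIN_CLIPS = 50  # threshold mentioned in the post


def eligible_tags(clip_tags, min_clips=MIN_CLIPS):
    """Return {tag: set of clip ids} for tags applied to at least min_clips clips.

    clip_tags is a hypothetical {clip_id: set_of_tags} mapping.
    """
    clips_by_tag = defaultdict(set)
    for clip_id, tags in clip_tags.items():
        for tag in tags:
            clips_by_tag[tag].add(clip_id)
    # Keep only the tags that have enough positive examples to train on.
    return {
        tag: clips
        for tag, clips in clips_by_tag.items()
        if len(clips) >= min_clips
    }
```

Each tag that survives the filter then gets its own binary classifier, with the clips in its set serving as positive examples.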
This picture shows how well the classifiers were able to learn each of the tags. Random guessing will get you an accuracy of 0.5, and 0.59 is statistically significantly better than that. The little dots are the accuracy achievable when there’s only a little data to train on, and the dots get bigger as we add more data. It’s interesting that genre words are the easiest to classify and instrument words are the hardest. This is almost certainly because of the features we’re using, which capture the holistic timbre of a clip. Last year I ran some experiments with the MajorMiner data which gave me the impression that it wasn’t any good for training autotaggers, but I think there was a bug in my code, because now it’s working great.
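One way to check whether an accuracy is better than the 0.5 chance level on a balanced test set is an exact one-sided binomial test. The post does not say which test was actually used, so this is only a sketch of one reasonable choice, and the test-set size in the usage note is hypothetical.

```python
from math import comb


def binomial_p_value(n_correct, n_total, chance=0.5):
    """One-sided exact binomial tail: P(X >= n_correct) under chance guessing.

    Intended for test sets of up to a few hundred clips; very large
    n_total would overflow the float conversion of the binomial coefficient.
    """
    return sum(
        comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )
```

For example, `binomial_p_value(118, 200)` gives the probability of getting at least 118 of 200 test clips right (an accuracy of 0.59) by guessing alone. The accuracy needed for significance depends on how many test clips there are, so the threshold shifts with the size of the test set.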
March 8th, 2008 at 7:11 pm
Hey Michael,
You mention that random guessing gives an accuracy of 0.5. Does this mean that you are testing the classifiers on sets that are designed to be split 50/50 between the two classes?
Do you have graphs of the ROC curves for these autotagging classifiers on randomly selected test examples? This is something we’re interested in too, but haven’t gotten to the point of results yet.
-Todd from Amie Street
March 10th, 2008 at 11:02 am
Hey Todd,
That’s right, for each class we’re testing and training on sets split 50/50 between positive and negative examples of that class. I haven’t generated any ROC curves yet, but they wouldn’t be too hard to whip up. When I’ve evaluated using precision-at-10 with all examples instead of classification accuracy on balanced sets, the performance has also been good, although it is correlated with the number of examples in a class.
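For reference, precision-at-10 can be computed as below. This is a minimal sketch that assumes each clip has a classifier score (higher meaning more likely to carry the tag) and a true/false label; the function and argument names are illustrative.

```python
def precision_at_k(scores, labels, k=10):
    """Fraction of the k highest-scoring clips that truly carry the tag."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    # Rank clips from highest to lowest classifier score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(1 for _, is_positive in top if is_positive) / len(top)
```

When all examples are ranked rather than a balanced subset, a random ranking has an expected precision equal to the fraction of clips that carry the tag. Tags with more examples therefore start from a higher baseline, which may account for part of the correlation with class size.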
April 1st, 2008 at 7:47 am
Hi Michael,
What software generated the plot? I like the antialiased fonts, etc. More generally I like how you did the visualization. Good work.
-Doug from University of Montreal / Sun Labs
April 2nd, 2008 at 4:14 pm
I actually did it in MATLAB, believe it or not. It’s amazing what you can get it to do with a lot of set(gca, …) calls. To get the anti-aliasing I just exported it as an EPS and then converted it to PNG with ImageMagick. But I am happy with the way it turned out, glad you like it.