ISMIR 2008
My paper was accepted to ISMIR this year in Philadelphia. It uses the MajorMiner data we’ve collected to explore the relationship between different granularities of music metadata. Specifically, we compare how accurately clip-level audio classifiers can be trained when the ground truth is supplied at the artist, album, track, or clip level. Lots of music metadata is only available at one of the coarser granularities, e.g. from Pandora, Last.fm, and the All Music Guide, which have described a much greater fraction of the artists out there than of the tracks. This paper looks at the feasibility of using such data to train clip-level classifiers. Since the MajorMiner data is collected at the clip level, it’s easy to blur it out to tracks, albums, or artists (a track, album, or artist simply gets every tag applied to any of its clips), and just as easy to evaluate the accuracy of the final clip classifiers against the original clip tags. Here’s the abstract:
Multiple-instance learning algorithms train classifiers from lightly supervised data, i.e. labeled collections of items, rather than labeled items. We compare the multiple-instance learners mi-SVM and MILES on the task of classifying 10-second song clips. These classifiers are trained on tags at the track, album, and artist levels, or granularities, that have been derived from tags at the clip granularity, allowing us to test the effectiveness of the learners at recovering the clip labeling in the training set and predicting the clip labeling for a held-out test set. We find that mi-SVM is better than a control at the recovery task on training clips, with an average classification accuracy as high as 87% over 43 tags; on test clips, it is comparable to the control with an average classification accuracy of up to 68%. MILES performed adequately on the recovery task, but poorly on the test clips.
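To make the “blurring” step concrete, here is a minimal Python sketch of how clip-level tags could be pooled up to the track, album, or artist level, and how a clip labeling can then be scored against the original clip tags. Everything in it is an assumption for illustration: the toy clip records, the function names, and the naive baseline in which every clip inherits its bag’s label. That baseline is not necessarily the control used in the paper, and the sketch does not implement mi-SVM or MILES.

```python
from collections import defaultdict

# Hypothetical toy data: (artist, album, track, clip_id, tags on that clip).
CLIPS = [
    ("A", "A1", "t1", "c1", {"guitar"}),
    ("A", "A1", "t1", "c2", set()),
    ("A", "A1", "t2", "c3", {"piano"}),
    ("A", "A2", "t3", "c4", set()),
    ("B", "B1", "t4", "c5", set()),
]

# How to identify the bag a clip belongs to at each granularity.
GRANULARITY_KEYS = {
    "clip": lambda c: (c[0], c[1], c[2], c[3]),
    "track": lambda c: (c[0], c[1], c[2]),
    "album": lambda c: (c[0], c[1]),
    "artist": lambda c: (c[0],),
}


def blur_tags(clips, granularity):
    """Pool clip tags into bags: a bag carries a tag if any of its clips does."""
    key = GRANULARITY_KEYS[granularity]
    bags = defaultdict(set)
    for clip in clips:
        bags[key(clip)] |= clip[4]
    return dict(bags)


def inherited_clip_labels(clips, granularity, tag):
    """Naive baseline (an assumption): each clip inherits its bag's label."""
    key = GRANULARITY_KEYS[granularity]
    bags = blur_tags(clips, granularity)
    return {clip[3]: tag in bags[key(clip)] for clip in clips}


def recovery_accuracy(clips, granularity, tag):
    """Fraction of clips whose inherited label matches the true clip label."""
    predicted = inherited_clip_labels(clips, granularity, tag)
    correct = sum(predicted[c[3]] == (tag in c[4]) for c in clips)
    return correct / len(clips)


if __name__ == "__main__":
    for level in ("clip", "track", "album", "artist"):
        print(level, recovery_accuracy(CLIPS, level, "guitar"))
```

On this toy data the inherited labels get less accurate as the granularity coarsens, because more untagged clips end up in a positive bag. A multiple-instance learner is meant to work out which clips in a positive bag actually deserve the tag.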