The funny thing about my research, whether it’s music classification, source separation, or any other machine learning task I can think of, is the difference between developing an algorithm and deploying it. It’s actually harder to develop an algorithm than it is to deploy it. To deploy an algorithm, if you’re shooting from the hip, you just need to build it and run it on the data you want to analyze. So if I want to deploy a music classifier, I extract some features, train a classifier, and classify some music. To develop the algorithm, however, you need to do everything you would do to deploy it, but you also need ground truth. That is to say, you need to know what answer you’re expecting before you get it, so you can tell how well you’re doing. So, paradoxically, I need to have my music already classified in order to see how well my classifier can classify it.
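To make that asymmetry concrete, here’s a minimal sketch in plain Python. Everything in it is a stand-in: the nearest-centroid classifier, the two-dimensional "features," and the genre labels are all toy assumptions, not anything from a real music system. The point is only that `classify` (deployment) needs no labels, while `accuracy` (development) can’t run without ground truth.

```python
def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(features, labels):
    """Fit a nearest-centroid classifier from labeled data (the ground truth)."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_class.items()}

def classify(model, x):
    """'Deployment': just run the model on new, unlabeled data."""
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist(model[y]))

def accuracy(model, features, labels):
    """'Development': scoring is impossible without already-known answers."""
    hits = sum(classify(model, x) == y for x, y in zip(features, labels))
    return hits / len(labels)

# Toy 2-D features: pretend dimension 0 is tempo-like, dimension 1 is brightness.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_y = ["ambient", "ambient", "dance", "dance"]

model = train(train_x, train_y)
print(classify(model, [0.15, 0.15]))      # deploy: no labels needed
print(accuracy(model, train_x, train_y))  # develop: labels required
```

Nothing about the classifier itself matters here; any model slots into the same two roles, and only the second role demands that the problem already be solved for some of the data.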
It was always clear to me that if you can get new ground truth data, you can do cool new things, but it took a while for it to sink in that you really can’t develop a system to solve a problem without having the problem already solved, in some sense. Of course, the power of machine learning comes from being able to extrapolate results from a small subset of labeled data to an effectively unlimited amount of as-yet-unlabeled data. I can develop music classifiers (and pick the one that does best) using a small set of already-classified music and then use the winner to classify as much music as I want. The question is, is that as-yet-unlabeled data really so similar to the test set? When you have enough data to know the answer to that question, you probably have enough data to do pretty well with a basic classifier.
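The "pick the one that does best" step can itself be sketched in a few lines. Again, everything here is an illustrative assumption: two deliberately trivial rule-based "classifiers," a tiny hand-made labeled set, and a naive even/odd split standing in for a proper held-out set. The shape is what matters: selection happens on labeled data, and the winner is then trusted on unlabeled data we merely hope resembles it.

```python
# Two competing "classifiers": trivial threshold rules on one feature each.
def rule_a(x):
    return "dance" if x[0] > 0.5 else "ambient"

def rule_b(x):
    return "dance" if x[1] > 0.5 else "ambient"

def score(rule, features, labels):
    """Fraction of labeled examples the rule gets right."""
    return sum(rule(x) == y for x, y in zip(features, labels)) / len(labels)

# The small set of already-classified music (ground truth).
labeled_x = [[0.2, 0.9], [0.3, 0.8], [0.7, 0.1], [0.9, 0.2]]
labeled_y = ["ambient", "ambient", "dance", "dance"]

# Hold out part of the labeled data to estimate generalization,
# and pick whichever classifier does best on it.
dev_x, dev_y = labeled_x[::2], labeled_y[::2]
best = max([rule_a, rule_b], key=lambda r: score(r, dev_x, dev_y))

# "Deploy" the winner on as-yet-unlabeled data -- the leap of faith is
# that this data looks like the labeled set we selected on.
print(best([0.8, 0.3]))
```

The held-out score is only a good estimate insofar as the unlabeled data is drawn from the same well as the labeled data, which is exactly the question the paragraph above is raising.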
As an aside, I’m always highly doubtful of claims that computers can latch onto things beyond human perception. For watermarking, sure, it’s designed so that machines can perceive it but people can’t. But when it comes to very human-grounded ideas like similarity, I think it is impossible to circumvent human “subjectivity”. There really is no objective measure of whether two sounds are similar besides the consistency in subjective ratings of human listeners. I think much of the trick of developing (provably) useful algorithms is defining problems that have objective solutions and then solving them objectively.