Off to Montreal

May 15th, 2009

Just when you thought I’d given up on this blog…big news! The grant I wrote to the Quebec government came through. I’m going to do a postdoc with Doug Eck at the Université de Montréal. It should be very cool: I’ll be building up my machine learning chops working on autotagging and other music problems. I move up there in the fall. Now all I have to do is write my dissertation and learn French…

Outliers

January 11th, 2009

As I’ve said many times, I’m a big fan of Malcolm Gladwell’s essays, so when Uncle Wayne and Aunt Jane got me a gift certificate to Barnes and Noble for Hanukkah, I bought a copy of his new book, Outliers. I couldn’t put it down and it was a quick read, so I finished it in a few days. The book is broken into two halves. The first half explores the idea that successful people are successful because firstly they get lucky and secondly they work hard to take advantage of that luck. The second half explores the idea that different cultures are, in fact, different and that those differences have real effects over many generations. It’s linked to the first half in that these differences are intertwined with the lucky breaks people get.

While I enjoyed it, the book seemed a bit padded at times. There were tangential tables that took up multiple pages, and the epilogue, an account of the lucky occurrences throughout Gladwell’s family tree, was a nice anecdote but didn’t really bring any more support to the thesis. A couple of the citations came from Wikipedia. It’s certainly not worth the $29 list price, or even the $20 discount price; I’d recommend waiting for the paperback edition or a secondhand copy. There were, however, a number of choice Gladwellian factoids, which I will relate.

In the first half of the book, there were some interesting anecdotes about the lucky breaks that Bill Gates and Bill Joy got on their ways to the top, but the most interesting idea was that certain birth months and years are better than others for success in various fields. The first example of this is in sports, where a national birthday cutoff for kids leads to a disproportionate number of the best adult athletes being born just after that cutoff. The explanation is that the kids who are the oldest for their age group are the biggest and best and they get put on traveling teams, get more practice, play more, and get better coaching, which eventually leads to a significant advantage when they grow up. In English football, the cutoff is September 1st and, at one point in the mid 90s, the Premier League had twice as many players born in the three months after the cutoff as in the three before it (Nature article). This is apparently also true in elementary school, where older 4th graders score better on math tests than younger classmates. This advantage even continues through college, where “students belonging to the relatively youngest group in their class are under-represented by about 11.6 percent.”

Gladwell also describes two cases where birth year gave people a leg up. The first was in the software industry, the titans of which were disproportionately born in 1954 and 1955. Gladwell’s argument is that these people were 20 or 21 in 1975 when the Altair 8800, the first personally-attainable computer, was released. The second is among the lawyers specializing in hostile takeovers in the 1970s, who were disproportionately born during the Great Depression in the 1930s, when birth rates dropped significantly. This gave them better access to schools, colleges, and law schools, which had just been expanded to accommodate the previous generation.

The second half of the book focused on the measurable effects of differences between cultures. In an interesting but less convincing argument, Gladwell claims that rice cultivation in paddies leads to more entrepreneurial farmers, while wheat and corn cultivation lead to stronger feudal hierarchies. Apparently rice cultivation is quite a tricky endeavor and yields are increased by leveling the ground in the paddy, maintaining the correct water level, using the right combination of rice strains, weeding thoroughly, and fertilizing properly. Rice landlords charged fixed rent, allowing rice farmers to profit from larger harvests, while wheat landlords paid fixed wages regardless of yield.

Chinese rice farmers were able to grow rice all year round, harvesting and planting new seedlings two or three times a year. French peasants, on the other hand, planted in the spring, harvested in the fall, and hibernated through the winter. Rice paddies, furthermore, are enriched by nutrients in the irrigation and can be used continuously. Wheat and corn fields, on the other hand, are exhausted by agriculture and need to lie fallow every few years to recover. Gladwell suggests that this difference in farming practices led to opposing cultural analogies for human mental growth, and to differences in national school schedules: the American school year is on average 180 days long, while the Japanese school year is 243 days long.

His slightly more convincing argument in this section was about plane crashes. Apparently, plane crashes are generally caused by the compounding of a number of small errors, a condition that is best mitigated by sharing responsibilities between the captain and the first officer. In cultures that have a great deal of respect for authority, such as Korea, the deference the first officers showed to captains tended to cause more crashes. Between 1988 and 1998, Korean Air lost 4.79 planes in accidents for every million departures. Compare that to United Airlines, which in the same period lost 0.27 planes in accidents for every million departures. By training its first officers to be more assertive when they noticed a problem, Korean Air has gotten these numbers in line with other carriers. An IBM psychologist, Geert Hofstede, surveyed employees around the globe and used their answers to assemble a set of dimensions for measuring how cultures differ from each other, now known as Hofstede’s dimensions. Korea is apparently second from the top of Hofstede’s list in deference towards authority.

Overall it was a fun read, but I think I’m a bigger fan of Gladwell’s shorter writings.

Ground truth

December 1st, 2008

The funny thing about my research, whether it’s music classification, source separation, or any other sort of machine learning task I can think of, is the difference between developing an algorithm and deploying it. It’s actually harder to develop an algorithm than it is to deploy it. To deploy an algorithm, if you’re shooting from the hip, you just need to build it and run it on the data you want to analyze. So if I want to deploy a music classifier, I extract some features, train a classifier, and classify some music. To develop the algorithm, however, you need to do everything you would need to do to deploy it, but then you also need ground truth. That is to say that you need to know what answer you’re expecting before you get it, so you can tell how well you’re doing. So, paradoxically, I need to have my music already classified in order to see how well my classifier can classify it.
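A minimal sketch of that distinction, with made-up two-dimensional features, made-up labels, and a toy nearest-centroid classifier standing in for a real one. Deploying only needs the predictions; developing needs the held-out labels as well, so that the predictions can be scored.

```python
# Toy illustration only: the features, labels, and nearest-centroid
# classifier are hypothetical stand-ins, not a real music classifier.

def train_centroids(features, labels):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def classify(centroids, x):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

def accuracy(centroids, features, labels):
    """Fraction of examples whose prediction matches the ground truth."""
    hits = sum(classify(centroids, x) == y for x, y in zip(features, labels))
    return hits / len(labels)

train_x = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]]
train_y = ["jazz", "jazz", "rock", "rock"]
test_x = [[0.15, 0.7], [0.7, 0.2], [0.6, 0.5]]
test_y = ["jazz", "rock", "jazz"]  # ground truth for the held-out clips

model = train_centroids(train_x, train_y)

# Deployment: just run the classifier on the data of interest.
deployed = [classify(model, x) for x in test_x]

# Development: the same predictions, scored against ground truth.
held_out_accuracy = accuracy(model, test_x, test_y)
```

Without `test_y` the last line cannot be computed at all, which is the whole problem.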

It was always clear to me that if you can get new ground truth data, you can do cool new things, but it took a while for it to sink in that you really can’t develop a system to solve a problem without having the problem already solved, in some sense. Of course, the power of machine learning comes from being able to extrapolate results from a small subset of labeled data to an infinite amount of as-yet-unlabeled data. I can develop music classifiers (and pick the one that does best) using a small set of already-classified music and then use it to classify as much music as I want. The question is, is that as-yet-unlabeled data really that similar to the test set? When you have enough data to know the answer to that question, you probably have enough data to do pretty well with a basic classifier.

As an aside, I’m always highly doubtful of claims that computers can latch onto things beyond human perception. For watermarking, sure, it’s designed so that machines can perceive it, but people can’t. But when it comes to very human-grounded ideas like similarity, I think it is impossible to try to circumvent human “subjectivity”. There really is no objective measure of whether two sounds are similar besides the consistency in subjective ratings of human listeners. I think much of the trick of developing (provably) useful algorithms is defining problems that have objective solutions and then solving them objectively.

ISMIR 2008

October 3rd, 2008

ISMIR was fun this year. I was pleasantly surprised by the quality of the papers; there were many solid experiments. I added a lot of them to my to-read list. I enjoyed hanging out with the ISMIR crowd, people I only get to see a few times a year or less. While I got a lot out of the conference, I regret not expanding my social circle more and not showing off my MajorMiner search demo more. I did, however, get to spend some quality time with my dad, my sister, and Joanne, and I probably spent more time in West Philly than ever before.

There was an interesting panel on Commercial Music Discovery and Recommendation, which I found a little bit discouraging. The message seemed to be that academic research and corporate development are different things and shouldn’t be confused. Elias said something to the effect that given the choice between 10x more data and an algorithm that was a few percent better, he’d take the data. Brian said that companies do what they do pretty well and academics should focus on doing things that companies can’t. Anthony didn’t foresee developments in MIR affecting him very much, predicting instead that user interface was the area that could improve the online music experience the most.

For more thorough coverage of the conference, take a look at some of the other ISMIR attendees’ blogs. Google blog search pointed me to a number of posts (in pagerank order): Paul Lamere (one of many), Elias Pampalk, Michael Good, Justin Donaldson, Jeremy ?, Kris West, Luke Barrington, Matthias Mauch, and Karin Dressler (in German).

MajorMiner good news and bad news

September 10th, 2008

The bad news is that I’m having some DNS issues with majorminer.com. The good news is that you can now access the same great MajorMiner game and search at majorminer.org.

The other good news is that there’s a new and improved search page. Hopefully it’s easier to understand what’s going on; I’ve even added a FAQ. If you look hard enough, you might also notice a new feature I’ve introduced: similarity browsing. For each clip that was autotagged, I computed a similarity value between its autotag vector and all of the others, finding the thirty or so nearest neighbors. You can follow this web of similarity around to some fun stuff, even though it’s not a huge collection of music. Here’s a random starting place.
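The nearest-neighbor step can be sketched roughly as follows. The post does not say which similarity measure is used, so cosine similarity is an assumption here, and the clip names and three-element autotag vectors are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(vectors, clip_id, k=30):
    """Return the k clips whose autotag vectors are most similar to clip_id's."""
    query = vectors[clip_id]
    scored = [
        (cosine(query, vec), other)
        for other, vec in vectors.items()
        if other != clip_id
    ]
    scored.sort(reverse=True)  # most similar first
    return [other for _, other in scored[:k]]

# Hypothetical autotag vectors: one score per tag, for four clips.
autotags = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.8, 0.2, 0.1],
    "clip_c": [0.0, 0.1, 0.9],
    "clip_d": [0.1, 0.9, 0.1],
}

neighbors = nearest_neighbors(autotags, "clip_a", k=2)
```

Precomputing this list for every clip is what turns the collection into a web that can be browsed link by link.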

This similarity is a semantic similarity, as Doug Turnbull likes to call it. The clips might not have an exactly similar sound, the rhythms might not match, or they might differ in an instrument or two, but for the most part you would describe them with the same sorts of words. If you want to know what kinds of words we’re talking about, take a look at the autotagging results. Have a look and let me know what you think (you can leave a link to any exceptional results you find in the comments).

NYC Triathlon 2008

August 24th, 2008

[Photo: crossing the finish line]

I did the New York City Triathlon again this year. It was fun, but tough. I didn’t train as much for it this time, especially the swim, but I did ok. This year Joanne, Mom, and Dad all came to the race to cheer me on and I’m really glad they did. You can see from this picture the effect a fan club can have on a runner. I was also very afraid of bonking this year, so I ate a lot, probably too much, in the days leading up to the race, but I did survive in the end.

The swim was pretty rough. I took a calculated risk and didn’t worry too much about training for the swim. It takes a lot of pool time to shave a minute or two off of a mile swim, which doesn’t matter so much in a two and a half hour race. As a result, I was passed by people from at least four different waves. It didn’t help that the current was much weaker this year than last (start time 7:15am), that there were big tentacly red jellyfish in the Hudson, or that my un-wetsuited jersey unzipped at some point. I only realized that my jersey had turned into a parachute as I got out of the water, at which point I also noticed the stinging pain in my nipples.

The bike was better than the swim. My glutes didn’t die and I managed to put up a pretty decent time without aero bars, even though it seemed like everyone else was riding a Cervélo. They would pass me on the downhills, but I would pass them right back on the climbs, of which there were quite a few, as I’d learned last year. My glutes did start to heat up towards the end of the ride, but they kept working the whole time, perhaps because of the hill repeats I’d done in training or the extra effort I went through to properly fuel during the race.

On the run, I thought I was way behind. I was told around mile 2 that it was 9:30am, which I thought meant that my time was already 2:45. In reality, it meant that my time was 2:15, which I didn’t realize until around mile 5, when I passed another Columbian who had started in the wave before me. I felt fine running across the finish line, but as soon as I stopped running I nearly passed out. After I finished, Joanne told me the athlete-tracking text message system said I finished in 2:41 with a 28 minute swim. But my time kept improving after I’d finished: my official time was 2:37:19 with a 24 minute swim, the discrepancy being attributable to a late start. This is 2 minutes worse than last year, but the heat and humidity set the field back even more and I moved up from 95th to 52nd in my age group. Graham did quite well, even with an injured leg, and his friend Matt Worstell finished 4th in our age group again this year.

For my reference later, there are lots of stats online from this year and last, some pictures taken by professionals, some pictures taken by my mom, and a video of me crossing the finish line.

Journal of New Music Research

August 14th, 2008

The journal version of last year’s ISMIR paper is ready to be published. The main addition is an analysis of the tags we’ve collected with the game, including a comparison with tags for the same music from Last.fm. In these experiments, we compared the accuracy of classifiers trained on different tag corpora, which was a bit tricky. Since the Last.fm tags and the MajorMiner tags were not the same, we could only compare seven of them directly. For an overall comparison, we used the mean accuracy across tags, which is useful, but not terribly sophisticated. Here’s the abstract:

We have designed a web-based game, MajorMiner, that makes collecting descriptions of musical excerpts fun, easy, useful, and objective. Participants describe 10 second clips of songs and score points when their descriptions match those of other participants. The rules were designed to encourage players to be thorough and the clip length was chosen to make judgments objective and specific. To analyze the data, we measured the degree to which binary classifiers could be trained to spot popular tags. We also compared the performance of clip classifiers trained with MajorMiner’s tag data to those trained with social tag data from a popular website. On the top 25 tags from each source, MajorMiner’s tags were classified correctly 67.2% of the time, while the social tags were classified correctly 62.6% of the time.
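The "mean accuracy across tags" summary is simple enough to sketch: score one binary classifier per tag, then average the per-tag accuracies. The tags, labels, and predictions below are made up for illustration and have nothing to do with the paper's actual numbers.

```python
def tag_accuracy(predicted, truth):
    """Fraction of clips on which a binary tag classifier was right."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)

def mean_accuracy(per_tag_predictions, per_tag_truth):
    """Unweighted mean of the per-tag accuracies."""
    accuracies = [
        tag_accuracy(per_tag_predictions[tag], per_tag_truth[tag])
        for tag in per_tag_truth
    ]
    return sum(accuracies) / len(accuracies)

# Hypothetical ground truth and predictions for two tags over four clips
# (1 = tag applies to the clip, 0 = it does not).
truth = {"guitar": [1, 0, 1, 0], "drums": [1, 1, 0, 0]}
predictions = {"guitar": [1, 0, 0, 0], "drums": [1, 1, 0, 0]}

overall = mean_accuracy(predictions, truth)  # (0.75 + 1.0) / 2
```

Because every tag counts equally, a rare tag moves the average as much as a common one, which is part of why the measure is useful but not terribly sophisticated.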

MIREX Audio Tag Classification

August 13th, 2008

Every time I write about MajorMiner, people ask when I’m going to make the data publicly available. Well, I’m starting to do that by building a MIREX task around it. The task is officially called the Audio Tag Classification task and you can take a look at the details on its MIREX wiki page. As emails and conversations bounced around, it became not just a classification task, but also a retrieval task. Doug Turnbull formulated it well by breaking it down into three related tasks:

  1. Clip-Tag classification: determine whether each tag applies to each clip or not
  2. Clip retrieval: for each tag, rank the clips by their relevance
  3. Tag retrieval: for each clip, rank the tags by their relevance
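All three tasks can be read off the same clip-by-tag score matrix, which is roughly what makes them related. A minimal sketch, assuming a system that outputs one relevance score per clip and tag; the scores, the clip and tag names, and the 0.5 threshold are hypothetical and not part of the MIREX task definition.

```python
# Hypothetical relevance scores: scores[clip][tag], higher = more relevant.
scores = {
    "clip1": {"guitar": 0.9, "drums": 0.4, "vocal": 0.2},
    "clip2": {"guitar": 0.3, "drums": 0.8, "vocal": 0.6},
    "clip3": {"guitar": 0.6, "drums": 0.1, "vocal": 0.7},
}

def classify(scores, threshold=0.5):
    """Task 1: decide whether each tag applies to each clip (assumed threshold)."""
    return {
        clip: {tag: score >= threshold for tag, score in tags.items()}
        for clip, tags in scores.items()
    }

def rank_clips(scores, tag):
    """Task 2: for one tag, order the clips from most to least relevant."""
    return sorted(scores, key=lambda clip: scores[clip][tag], reverse=True)

def rank_tags(scores, clip):
    """Task 3: for one clip, order the tags from most to least relevant."""
    return sorted(scores[clip], key=scores[clip].get, reverse=True)

decisions = classify(scores)
guitar_ranking = rank_clips(scores, "guitar")
clip2_tags = rank_tags(scores, "clip2")
```

Classification thresholds the matrix, clip retrieval sorts down a column, and tag retrieval sorts across a row.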

There are only a few days left until the submission deadline, but if you want to throw something together, more submissions would be great. In case you’re wondering, the main contributors to the design of this task have been Kris West, Thierry Bertin-Mahieux, Doug Turnbull, and Greg Tsoumakas. Mert Bay is running things at IMIRSEL.

Interspeech 2008

August 12th, 2008

Ron had a paper accepted to Interspeech this year about adding speech models (source priors) to MESSL. It is entitled, “Source separation based on binaural cues and source model constraints.” As much as I’d like to go, Brisbane, Australia is a bit farther than Pittsburgh was. Here’s the abstract:

We describe a system for separating multiple sources from a two-channel recording based on interaural cues and known characteristics of the source signals. We combine a probabilistic model of the observed interaural level and phase differences with a prior model of the source statistics and derive an EM algorithm for finding the maximum likelihood parameters of the joint model. The system is able to separate more sound sources than there are observed channels. In simulated reverberant mixtures of three speakers the proposed algorithm gives a signal-to-noise ratio improvement of 2.1 dB over a baseline algorithm using only interaural cues.

ISMIR 2008

August 11th, 2008

My paper was accepted to ISMIR this year in Philadelphia. It uses the MajorMiner data we’ve collected to explore the relationship between different granularities of music metadata. That is to say, we compare the accuracy with which clip-level audio classifiers can be trained using ground truth data that is supplied at the artist, album, track, or clip level. There’s lots of music metadata supplied at one of these coarser granularities, e.g. Pandora, Last.fm, and the All Music Guide, which have described a much greater fraction of the artists out there than of the tracks. This paper looks at the feasibility of using such data to train clip-level classifiers. Since the MajorMiner data is collected at the clip level, it’s easy to blur it out to tracks, albums, or artists, and also easy to evaluate the accuracy of the final clip classifiers. Here’s the abstract:

Multiple-instance learning algorithms train classifiers from lightly supervised data, i.e. labeled collections of items, rather than labeled items. We compare the multiple-instance learners mi-SVM and MILES on the task of classifying 10-second song clips. These classifiers are trained on tags at the track, album, and artist levels, or granularities, that have been derived from tags at the clip granularity, allowing us to test the effectiveness of the learners at recovering the clip labeling in the training set and predicting the clip labeling for a held-out test set. We find that mi-SVM is better than a control at the recovery task on training clips, with an average classification accuracy as high as 87% over 43 tags; on test clips, it is comparable to the control with an average classification accuracy of up to 68%. MILES performed adequately on the recovery task, but poorly on the test clips.
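The "blurring" of clip-level tags out to tracks, albums, or artists can be sketched as below. It assumes the usual multiple-instance convention that a collection carries a tag if any of its clips does; the paper may derive the coarser labels differently, and the clip names, tags, and metadata are made up for illustration.

```python
from collections import defaultdict

# Hypothetical clip-level tags and clip metadata.
clip_tags = {
    "clip1": {"guitar", "rock"},
    "clip2": {"vocal"},
    "clip3": {"piano"},
}
clip_meta = {
    "clip1": {"track": "t1", "album": "al1", "artist": "ar1"},
    "clip2": {"track": "t1", "album": "al1", "artist": "ar1"},
    "clip3": {"track": "t2", "album": "al1", "artist": "ar1"},
}

def blur(clip_tags, clip_meta, level):
    """Pool clip tags into bags at the given level ('track', 'album', 'artist').

    Assumption: a bag has a tag if at least one of its clips has it.
    """
    bags = defaultdict(set)
    for clip, tags in clip_tags.items():
        bags[clip_meta[clip][level]] |= tags
    return dict(bags)

track_tags = blur(clip_tags, clip_meta, "track")
album_tags = blur(clip_tags, clip_meta, "album")
```

After blurring, "t1" is tagged "vocal" even though only one of its two clips is, and that ambiguity is what a multiple-instance learner has to undo when it tries to recover the clip labeling.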