NYC Triathlon 2008

August 24th, 2008

[Photo: Crossing the finish line]

I did the New York City Triathlon again this year. It was fun, but tough. I didn’t train as much for it this time, especially the swim, but I did OK. This year Joanne, Mom, and Dad all came to the race to cheer me on, and I’m really glad they did. You can see from this picture the effect a fan club can have on a runner. I was also very afraid of bonking this year, so I ate a lot, probably too much, in the days leading up to the race, but I did survive in the end.

The swim was pretty rough. I took a calculated risk and didn’t worry too much about training for the swim. It takes a lot of pool time to shave a minute or two off of a mile swim, which doesn’t matter so much in a two and a half hour race. As a result, I was passed by people from at least four different waves. It didn’t help that the current was much weaker this year than last (start time 7:15am), that there were big tentacly red jellyfish in the Hudson, or that my un-wetsuited jersey unzipped at some point. I only realized that my jersey had turned into a parachute as I got out of the water, at which point I also noticed the stinging pain in my nipples.

The bike was better than the swim. My glutes didn’t die and I managed to put up a pretty decent time without aero bars, even though it seemed like everyone else was riding a Cervélo. They would pass me on the downhills, but I would pass them right back on the climbs, of which there were quite a few, as I’d learned last year. My glutes did start to heat up towards the end of the ride, but they kept working the whole time, perhaps because of the hill repeats I’d done in training or the extra effort I went through to properly fuel during the race.

On the run, I thought I was way behind. I was told around mile 2 that it was 9:30am, which I thought meant that my time was already 2:45. In reality, it meant that my time was 2:15, which I didn’t realize until around mile 5, when I passed another Columbian who had started in the wave before me. I felt fine running across the finish line, but as soon as I stopped running I nearly passed out. After I finished, Joanne told me the athlete-tracking text message system said I finished in 2:41 with a 28 minute swim. But my time kept improving after I’d finished: my official time was 2:37:19 with a 24 minute swim, the discrepancy being attributable to a late start. This is 2 minutes worse than last year, but the heat and humidity set the field back even more and I moved up from 95th to 52nd in my age group. Graham did quite well, even with an injured leg, and his friend Matt Worstell finished 4th in our age group again this year.

For my reference later, there are lots of stats online from this year and last, some pictures taken by professionals, some pictures taken by my mom, and a video of me crossing the finish line.

Journal of New Music Research

August 14th, 2008

The journal version of last year’s ISMIR paper is ready to be published. The main addition is an analysis of the tags we’ve collected with the game, including a comparison with tags for the same music from Last.fm. In these experiments, we compared the accuracy of classifiers trained on different tag corpora, which was a bit tricky. Since the Last.fm tags and the MajorMiner tags were not the same, we could only compare seven of them directly. For an overall comparison, we used the mean accuracy across tags, which is useful, but not terribly sophisticated. Here’s the abstract:

We have designed a web-based game, MajorMiner, that makes collecting descriptions of musical excerpts fun, easy, useful, and objective. Participants describe 10 second clips of songs and score points when their descriptions match those of other participants. The rules were designed to encourage players to be thorough and the clip length was chosen to make judgments objective and specific. To analyze the data, we measured the degree to which binary classifiers could be trained to spot popular tags. We also compared the performance of clip classifiers trained with MajorMiner’s tag data to those trained with social tag data from a popular website. On the top 25 tags from each source, MajorMiner’s tags were classified correctly 67.2% of the time, while the social tags were classified correctly 62.6% of the time.

MIREX Audio Tag Classification

August 13th, 2008

Every time I write about MajorMiner, people ask when I’m going to make the data publicly available. Well, I’m starting to do that by building a MIREX task around it. The task is officially called the Audio Tag Classification task and you can take a look at the details on its MIREX wiki page. As emails and conversations bounced around, it became not just a classification task, but also a retrieval task. Doug Turnbull formulated it well by breaking it down into three related tasks:

  1. Clip-Tag classification: determine whether each tag applies to each clip or not
  2. Clip retrieval: for each tag, rank the clips by their relevance
  3. Tag retrieval: for each clip, rank the tags by their relevance
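To make the distinction between the three tasks concrete, here is a minimal sketch in Python. It assumes a system that outputs a relevance score between 0 and 1 for every clip-tag pair; the score table, the clip and tag names, and the 0.5 threshold are all made up for illustration and are not part of the MIREX task definition.

```python
# Hypothetical relevance scores: scores[clip][tag] in [0, 1].
scores = {
    "clip_a": {"guitar": 0.9, "drums": 0.2, "vocal": 0.6},
    "clip_b": {"guitar": 0.1, "drums": 0.8, "vocal": 0.7},
    "clip_c": {"guitar": 0.5, "drums": 0.4, "vocal": 0.3},
}


def classify(scores, threshold=0.5):
    """Task 1: decide for each clip-tag pair whether the tag applies."""
    return {
        clip: {tag: score >= threshold for tag, score in tags.items()}
        for clip, tags in scores.items()
    }


def rank_clips(scores, tag):
    """Task 2: for one tag, order the clips from most to least relevant."""
    return sorted(scores, key=lambda clip: scores[clip][tag], reverse=True)


def rank_tags(scores, clip):
    """Task 3: for one clip, order the tags from most to least relevant."""
    return sorted(scores[clip], key=scores[clip].get, reverse=True)


decisions = classify(scores)
guitar_ranking = rank_clips(scores, "guitar")
clip_b_tags = rank_tags(scores, "clip_b")
```

All three views come from the same score table: the first thresholds it, the second sorts down a column, and the third sorts across a row.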

There are only a few days left until the submission deadline, but if you want to throw something together, more submissions would be great. In case you’re wondering, the main contributors to the design of this task have been Kris West, Thierry Bertin-Mahieux, Doug Turnbull, and Greg Tsoumakas. Mert Bay is running things at IMIRSEL.

Interspeech 2008

August 12th, 2008

Ron had a paper accepted to Interspeech this year about adding speech models (source priors) to MESSL. It is entitled, “Source separation based on binaural cues and source model constraints.” As much as I’d like to go, Brisbane, Australia is a bit farther than Pittsburgh was. Here’s the abstract:

We describe a system for separating multiple sources from a two-channel recording based on interaural cues and known characteristics of the source signals. We combine a probabilistic model of the observed interaural level and phase differences with a prior model of the source statistics and derive an EM algorithm for finding the maximum likelihood parameters of the joint model. The system is able to separate more sound sources than there are observed channels. In simulated reverberant mixtures of three speakers the proposed algorithm gives a signal-to-noise ratio improvement of 2.1 dB over a baseline algorithm using only interaural cues.

ISMIR 2008

August 11th, 2008

My paper was accepted to ISMIR this year in Philadelphia. It uses the MajorMiner data we’ve collected to explore the relationship between different granularities of music metadata. That is to say, we compare the accuracy with which clip-level audio classifiers can be trained using ground truth data that is supplied at the artist, album, track, or clip level. There’s lots of music metadata supplied at one of these coarser granularities, e.g. Pandora, Last.fm, and the All Music Guide, which have described a much greater fraction of the artists out there than the tracks. This paper looks at the feasibility of using such data to train clip-level classifiers. Since the MajorMiner data is collected at the clip level, it’s easy to blur it out to tracks, albums, or artists, and also easy to evaluate the accuracy of the final clip classifiers. Here’s the abstract:

Multiple-instance learning algorithms train classifiers from lightly supervised data, i.e. labeled collections of items, rather than labeled items. We compare the multiple-instance learners mi-SVM and MILES on the task of classifying 10-second song clips. These classifiers are trained on tags at the track, album, and artist levels, or granularities, that have been derived from tags at the clip granularity, allowing us to test the effectiveness of the learners at recovering the clip labeling in the training set and predicting the clip labeling for a held-out test set. We find that mi-SVM is better than a control at the recovery task on training clips, with an average classification accuracy as high as 87% over 43 tags; on test clips, it is comparable to the control with an average classification accuracy of up to 68%. MILES performed adequately on the recovery task, but poorly on the test clips.
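As a rough illustration of what blurring clip-level tags out to a coarser granularity might look like, here is a small Python sketch. It assumes the usual multiple-instance convention that a collection (a track, album, or artist) carries a tag if at least one of its clips does; the paper’s exact derivation may differ, and the function names and example data are hypothetical.

```python
from collections import defaultdict


def blur_tags(clip_tags, clip_to_group):
    """Derive group-level tags: a group has a tag if any of its clips has it."""
    group_tags = defaultdict(set)
    for clip, tags in clip_tags.items():
        group_tags[clip_to_group[clip]] |= set(tags)
    return dict(group_tags)


def propagate(group_tags, clip_to_group):
    """Naive baseline: give every clip all of the tags of its group."""
    return {clip: set(group_tags[group]) for clip, group in clip_to_group.items()}


# Hypothetical data: three clips from two tracks.
clip_tags = {"c1": {"guitar"}, "c2": {"vocal"}, "c3": {"drums"}}
clip_to_track = {"c1": "track_1", "c2": "track_1", "c3": "track_2"}

track_tags = blur_tags(clip_tags, clip_to_track)
naive_clip_tags = propagate(track_tags, clip_to_track)
```

The naive propagation labels clip c1 with “vocal” even though only c2 had that tag. Recovering the original clip labeling from such over-generous labels is the kind of problem the multiple-instance learners are asked to solve.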

Underworld

July 29th, 2008

I’ve wanted to read one of Don DeLillo’s books for a while and finally got around to reading Underworld. It’s long, but I’d say it’s worth it. For a masculine, post-modern book with a backwards-flowing timeline, I found it quite easy to follow, mainly because dates are included and there’s plenty of repetition to piece the stories together.

In a sentence, it’s a story about a baseball, a guy from the Bronx, America during the Cold War, and waste. The first chapter is most of the story about the baseball, and it’s excellent. Either it was turned into a novella after Underworld was published or it was a novella that Underworld was built around. It describes in astonishing detail the home run hit by Bobby Thomson to win the National League pennant in 1951. I couldn’t put it down.

That first chapter exemplifies my impression of the book, that it is extremely well-constructed. The characters are interesting and deep, and the descriptions are rich if sometimes over the top. It’s a solid novel: you see where it’s going and it takes you there with the inevitability of sunrise after dusk, the methodicalness of a marathoner.

This inevitability, however, eventually became ponderousness and the backwards timeline ended up detracting from the novel. DeLillo fell victim to his own writing talent. The beginning sections of the book were so well written and so engrossing, I wanted to know what happened next. But instead, I had to read 700 pages of why it happened, the backstory. There’s something of a reprieve in the epilogue, set in the present, but it’s not much of a payoff. While it is interesting to see the years of a character’s life peeled away, it certainly wasn’t as engrossing as it might have been.

Technical difficulties over

July 27th, 2008

You may have noticed a few weeks ago that mr-pc.org was up and down, mostly down, for a number of reasons. First, my subletter turned off the computer, and when it was turned back on, it had a different internal IP address and I had to talk my roommate through changing the settings. Then, electricians were installing new sockets and lights and things in the apartment and they needed to turn off the power for two days from 9 to 5. The second day it came back up, it was in need of an fsck and I was still away for a week. When I got back I fixed it, but it started crashing every day or two and I knew it was on its last legs.

So I ordered a new Dell desktop with Ubuntu pre-installed. It’s up and running now and all of my data is transferred over. It’s got a Pentium Dual-Core, 2 GB of RAM, and a 320 GB hard drive, so it can serve as my general compute server as well. And it only cost about $400 with tax and shipping. The old machine has served me well, but will have to be retired, although the hard drive lives on (for now) in an external enclosure. Hopefully this one will last me eight years as well.

Confessions of an Economic Hit Man

June 23rd, 2008

I’d been eying John Perkins’ Confessions of an Economic Hit Man for a long time, so I decided to swallow hard and buy it new. It would have been worth it to get it used. It’s mostly an autobiography, the writing is not great, and Perkins has gone a bit new age-y since renouncing his former career. But the glimpses inside the international “aid” business and the bits of lesser-known Latin American history are the interesting parts.

As he describes it, the IMF and World Bank provide loans to developing countries to bring them into the US’s sphere of influence. These loans pay for infrastructure projects like dams, improvements to electrical grids, highways, etc. These projects are forecast (by an economic hit man) to spur massive, sustained GDP growth, which would allow the country to pay off a big loan. When the country gets the loan, it immediately uses it to pay American contracting firms like Bechtel to actually build the project. The local elite-run utilities get their upgrades and the growth forecast turns out to be much too optimistic, leaving the country with an unsustainable debt that must be paid back in favors. Countries and rulers amenable to such a system are termed “friendly”; rulers who resist always manage to be on the plane that crashes in the rain forest. The whole story sounds believable to me and agrees with the Latin American history he describes.

Broken sound card

May 21st, 2008

My desktop is getting a little old. I got it the summer before I went to college, which means its 8th birthday is coming up in August. It’s still plugging along, though, hosting this blog, among other things, with a whopping 384 MB of Rambus RAM. Aside from the memory, it’s starting to show its age in other ways, like the sound card going on the fritz.

It all started when I went to check out guitarati.com. I clicked on a “play” link and garbage came out the speakers. It wasn’t complete garbage; it sounded like music in the most tenuous of ways, but it wasn’t supposed to sound like that. “Too bad that website’s sound is broken,” I thought, except that everything I played after that sounded garbled in the same way. Working with sound and computers for a living, I figured I should investigate the problem. My first thought was that something was getting the endianness of the 16-bit samples wrong, like MATLAB did (does?) on Linux. Trying a variety of endianness styles in playback, however, didn’t solve the problem.
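For reference, this is roughly what an endianness mix-up does to 16-bit samples, sketched in Python with the standard struct module. The function name is made up, and this only illustrates the byte swap, not the playback tools actually used for the test.

```python
import struct


def swap_endianness_16(raw):
    """Reinterpret little-endian 16-bit PCM bytes as big-endian (swap each byte pair)."""
    if len(raw) % 2:
        raise ValueError("16-bit PCM data must contain an even number of bytes")
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw)   # read as little-endian int16
    return struct.pack(">%dh" % count, *samples)   # write back as big-endian int16


# A quiet sample read with the wrong byte order becomes a much louder one.
quiet = struct.pack("<h", 1)                 # the value 1, stored little-endian
misread = struct.unpack(">h", quiet)[0]      # the same bytes read as big-endian
```

Because the low-order byte ends up in the high-order position, a byte-order error tends to sound like loud noise rather than like distorted music, which fits with the endianness experiments not fixing anything.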

[Figures: spectrogram of the original chirp (left) and of the recorded chirp (right)]

My next thought was to record the weirdness, and the mic seemed to be working fine (although I couldn’t hear the playback to be certain). I constructed a simple chirp in the linear algebra program Octave, the spectrogram of which can be seen at the left. What I recorded coming out of my speakers was very different from that, as can be seen from the spectrogram on the right. Disregard the gradual changes in color; the important thing to notice is that instead of just one line sloping up in the low frequencies, there are suddenly lines every 1380 Hz sloping up and down from common beginnings. The multiple horizontal repetitions are just repeated playbacks.
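A linear chirp like the test signal can be generated in a few lines. The original was built in Octave; this is an equivalent sketch in Python, and the start frequency, end frequency, duration, and sample rate below are placeholder values, not the ones actually used.

```python
import math


def linear_chirp(f0, f1, duration, sample_rate):
    """Sine sweep whose instantaneous frequency rises linearly from f0 to f1 (Hz)."""
    num_samples = int(round(duration * sample_rate))
    sweep_rate = (f1 - f0) / duration  # change in frequency, in Hz per second
    samples = []
    for i in range(num_samples):
        t = i / sample_rate
        # The phase is the integral of the instantaneous frequency f0 + sweep_rate * t.
        phase = 2.0 * math.pi * (f0 * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples


# Placeholder parameters: a one-second sweep from 100 Hz to 2 kHz at 44.1 kHz.
chirp = linear_chirp(100.0, 2000.0, 1.0, 44100)
```

On a spectrogram this shows up as a single straight line sloping upward, which is what makes any extra copies so easy to spot.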

This repetition in frequency indicated that the sound coming out of the speakers had a bandwidth of only 690 Hz, instead of the requested 22050 Hz. Furthermore, playing the sound back at different sampling rates changed the bandwidth of the signal coming out of the speakers. The ratio of requested bandwidth to actual bandwidth turned out to be almost exactly 32:1. This sort of replication happens when upsampling a signal (pictures from the time and frequency domains), and seems to indicate that 31 of every 32 samples were being set to 0, with the horrible distortion coming from the resulting aliasing.
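The zeroed-samples hypothesis can be checked numerically. The Python sketch below uses a short test tone and a direct DFT at a few hand-picked bins (the tone frequency and signal length are arbitrary choices) to show that keeping only every 32nd sample of a 44100 Hz signal produces equally strong copies of the tone spaced 44100 / 32 ≈ 1378 Hz apart, close to the 1380 Hz spacing read off the spectrogram.

```python
import cmath
import math

SAMPLE_RATE = 44100   # Hz, giving the requested bandwidth of 22050 Hz
FACTOR = 32           # keep 1 of every 32 samples, zero the other 31
NUM_SAMPLES = 3200    # a multiple of FACTOR so the images land on exact DFT bins
TONE_BIN = 20         # test tone at 20 * 44100 / 3200 = 275.625 Hz


def dft_magnitude(signal, k):
    """Magnitude of the k-th DFT bin, computed directly from the definition."""
    n = len(signal)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(signal)))


tone = [math.cos(2.0 * math.pi * TONE_BIN * i / NUM_SAMPLES)
        for i in range(NUM_SAMPLES)]
masked = [x if i % FACTOR == 0 else 0.0 for i, x in enumerate(tone)]

image_spacing_hz = SAMPLE_RATE / FACTOR       # 1378.125 Hz between copies
bins_per_image = NUM_SAMPLES // FACTOR        # the same spacing in DFT bins

# The original tone has energy in one bin only; the masked one repeats it.
original_peaks = [dft_magnitude(tone, TONE_BIN + r * bins_per_image) for r in range(3)]
masked_peaks = [dft_magnitude(masked, TONE_BIN + r * bins_per_image) for r in range(3)]
```

Any content above half the image spacing (about 689 Hz) folds into its neighboring copies, so ordinary music would be aliased heavily.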

Even with a pretty good idea of what was happening, I still couldn’t figure out why it was happening. I tried resetting the various sound drivers, restarting the computer, even booting off a LiveCD, and nothing worked. I purchased a USB sound card (for $7 + $6 shipping), plugged it in, and could hear again. It’s a cute little device, basically just a USB plug on one end and a headphone and mic jack on the other. It was quite a relief to be able to watch my backlog of YouTube videos, listen to music, and stream NPR while I was working out. Sound is pretty handy, as it turns out.

The question still remains whether what happened to my internal sound card was a hardware failure or just a bad setting. There aren’t that many reports online of the hardware on a sound card failing; it’s not like there are moving parts. If you have any insight, I’m all ears.

Recti-Linear room simulator

May 18th, 2008

[Figure: Spectrogram of simulated impulse response]

For studying auditory localization and separation, it’s very important to have realistic spatial recordings. These can come from a number of sources; what I’ve been using in the past is a collection of impulse responses recorded through a KEMAR dummy in a real classroom. Each impulse response allows a sound to be simulated for one particular listener and source position in the room. While these are very realistic, they are time-consuming to record. The particular binaural impulse responses I’ve been using were recorded by Tim Streeter in Barbara Shinn-Cunningham’s lab at BU.

The next-best way to create spatial recordings is to simulate these binaural impulse responses. There are a couple of decent packages out there for doing this. Stephen McGovern’s rir is very fast, but only creates a bare-bones impulse response. Douglas Campbell et al.’s roomsim includes lots of features to make the impulse responses realistic, but it is very slow and can only be driven through a GUI.

I’ve written a new room simulator in the same spirit, but combining the best features of both of these. It’s called rlrs, short for “recti-linear room simulator” and you can download it here (3.6 MB). I’m releasing it under the GPLv3. Here’s the intro from the README:

This code will generate binaural impulse responses from a simulation of the acoustics of a rectilinear room using the image method. It has a number of features that improve the realism and speed of the simulation. It can generate a pair of 680 ms impulse responses sampled at 22050 Hz in 75 seconds on a 1.8 GHz Intel Xeon. It’s easy to run from within scripts to generate a large set of impulse responses programmatically.

To improve the realism, it applies anechoic head-related transfer functions to each incoming reflection, allows fractional delays, includes frequency-dependent absorption due to walls, includes frequency- and humidity-dependent absorption due to air, and varies the speed of sound with temperature. It also randomly perturbs sources in proportion to their distance to the listener to simulate imperfections in the alignment of the walls.

To improve simulation speed, it performs all calculations in the frequency domain and the complex exponential generation code is written in C, it only calculates the Fourier transforms of anechoic HRTFs as it needs them, and then it caches them, and it culls sources that are beyond the desired impulse response length or are significantly quieter than the direct path.
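For readers unfamiliar with the image method, here is a bare-bones Python sketch of its core idea. It is not the rlrs code: it works in the time domain, assumes a single frequency-independent reflection coefficient for every wall, and leaves out the HRTFs, fractional delays, air absorption, and source perturbation described above. All function names and parameter values are illustrative.

```python
import itertools
import math


def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def axis_images(src, length, max_n):
    """Image coordinates along one axis, each paired with its reflection count."""
    images = []
    for n in range(-max_n, max_n + 1):
        images.append((2 * n * length + src, abs(2 * n)))      # even number of bounces
        images.append((2 * n * length - src, abs(2 * n - 1)))  # odd number of bounces
    return images


def image_sources(room, source, listener, max_delay_s, temp_c=20.0, reflection=0.8):
    """Return sorted (delay in seconds, amplitude) pairs for a rectilinear room.

    room, source, and listener are (x, y, z) tuples in metres, with one corner
    of the room at the origin.
    """
    c = speed_of_sound(temp_c)
    max_dist = c * max_delay_s
    per_axis = [axis_images(s, size, int(max_dist // (2 * size)) + 1)
                for s, size in zip(source, room)]
    arrivals = []
    for (x, rx), (y, ry), (z, rz) in itertools.product(*per_axis):
        dist = math.sqrt((x - listener[0]) ** 2 +
                         (y - listener[1]) ** 2 +
                         (z - listener[2]) ** 2)
        if dist > max_dist:
            continue  # cull images that arrive after the end of the impulse response
        # Each bounce scales the amplitude; spreading loss goes as 1 / distance.
        arrivals.append((dist / c, reflection ** (rx + ry + rz) / dist))
    return sorted(arrivals)


# Example: a 5 x 4 x 3 m room and the first 50 ms of reflections.
arrivals = image_sources((5.0, 4.0, 3.0), (1.0, 1.5, 1.2), (3.5, 2.0, 1.5), 0.05)
```

Each pair could then be written into an impulse response at its delay. The number of images grows with the cube of the response length, which is why culling and caching matter for a 680 ms response.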