Broken sound card

May 21st, 2008

My desktop is getting a little old. I got it the summer before I went to college, which means its 8th birthday is coming up in August. It’s still plugging along, though, hosting this blog, among other things, with a whopping 384 MB of Rambus RAM. Aside from the memory, it’s starting to show its age in other ways, like the sound card going on the fritz.

It all started when I went to check out guitarati.com. I clicked on a “play” link and garbage came out the speakers. It wasn’t complete garbage; it sounded like music in the most tenuous of ways, but it wasn’t supposed to sound like that. “Too bad that website’s sound is broken,” I thought, except that everything I played after that sounded garbled in the same way. Working with sound and computers for a living, I figured I should investigate the problem. My first thought was that something was getting the endianness of the 16-bit samples wrong, like MATLAB did (does?) on Linux. Trying a variety of endianness styles in playback, however, didn’t solve the problem.

[Images: spectrogram of the original chirp (left) and of the recorded chirp (right)]

My next thought was to record the weirdness, and the mic seemed to be working fine (although I couldn’t hear the playback to be certain). I constructed a simple chirp in the linear algebra program Octave, the spectrogram of which can be seen at the left. What I recorded coming out of my speakers was very different from that, as can be seen from the spectrogram on the right. Disregard the gradual changes in color; the important thing to notice is that instead of just one line sloping up in the low frequencies, there are suddenly lines every 1380 Hz sloping up and down from common beginnings. The multiple horizontal repetitions are just repeated playbacks.
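For anyone who wants to build a similar test signal, here is a minimal sketch in Python with NumPy rather than Octave. The sweep range, duration, sampling rate, and frame sizes are assumptions picked for illustration, not the values used for the pictures above.

```python
import numpy as np

def linear_chirp(f0, f1, duration, fs):
    """Sine sweep whose instantaneous frequency rises linearly from f0 to f1 (Hz)."""
    t = np.arange(int(duration * fs)) / fs
    # Phase is the integral of the instantaneous frequency f0 + (f1 - f0) * t / duration.
    phase = 2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * duration))
    return np.sin(phase)

def spectrogram(x, frame_len=2048, hop=1024):
    """Magnitude STFT: one row per frame, one column per frequency bin."""
    window = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    frames = np.array([x[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 44100                                  # assumed sampling rate
x = linear_chirp(500.0, 2000.0, 2.0, fs)    # assumed sweep: 500 Hz to 2 kHz over 2 s
S = spectrogram(x)
peak_hz = S.argmax(axis=1) * fs / 2048      # strongest frequency in each frame
```

A healthy playback chain should show a single rising line, so `peak_hz` climbs steadily from frame to frame.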

This repetition in frequency indicated that the sound coming out of the speakers had a bandwidth of only 690 Hz, instead of the requested 22050 Hz. Furthermore, playing the sound back at different sampling rates changed the bandwidth of the signal coming out of the speakers. The ratio of requested bandwidth to actual bandwidth turned out to be almost exactly 32:1. This sort of replication happens when upsampling a signal (pictures from the time and frequency domains), and seems to indicate that 31 of every 32 samples were being set to 0, with the horrible distortion coming from the resulting aliasing.
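The replication is easy to reproduce numerically. The sketch below is a toy model of the suspected fault, not a measurement of the actual card: it keeps one sample in 32 of a pure tone, zeros the rest, and compares the spectra. The signal length and tone frequency are assumptions chosen so that everything lands exactly on FFT bins.

```python
import numpy as np

N = 32768          # number of samples (assumed)
factor = 32        # keep 1 sample in 32, zero the other 31
tone_bin = 200     # test tone placed exactly on an FFT bin

n = np.arange(N)
x = np.cos(2 * np.pi * tone_bin * n / N)

y = np.zeros(N)
y[::factor] = x[::factor]          # 31 of every 32 samples set to 0

X = np.abs(np.fft.rfft(x))         # clean tone: a single peak
Y = np.abs(np.fft.rfft(y))         # damaged tone: many equal peaks

spacing = N // factor              # copies repeat every fs/32 (1024 bins here)
peaks = np.flatnonzero(Y > 0.5 * Y.max())
```

The damaged signal has peaks at `spacing * m + tone_bin` and `spacing * m - tone_bin` for every integer `m`, which is the pattern of lines sloping up and down from common starting points. At a 44100 Hz sampling rate, the spacing fs/32 works out to about 1378 Hz, in line with the roughly 1380 Hz seen in the recording.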

Even with a pretty good idea of what was happening, I still couldn’t figure out why it was happening. I tried resetting the various sound drivers, restarting the computer, even booting off a LiveCD, and nothing worked. I purchased a USB sound card (for $7 + $6 shipping), plugged it in, and could hear again. It’s a cute little device, basically just a USB plug on one end and a headphone and mic jack on the other. It was quite a relief to be able to watch my backlog of youtube videos, listen to music, and stream NPR while I was working out. Sound is pretty handy, as it turns out.

The question still remains whether what happened to my internal sound card was a hardware failure or just a bad setting. There aren’t that many reports online of the hardware on a sound card failing; it’s not like there are moving parts. If you have any insight, I’m all ears.

Recti-Linear room simulator

May 18th, 2008

[Image: Spectrogram of simulated impulse response]

For studying auditory localization and separation, it’s very important to have realistic spatial recordings. These can come from a number of sources; what I’ve been using in the past is a collection of impulse responses recorded through a KEMAR dummy in a real classroom. Each impulse response allows a sound to be simulated for one particular listener and source position in the room. While these are very realistic, they are time-consuming to record. The particular binaural impulse responses I’ve been using were recorded by Tim Streeter in Barbara Shinn-Cunningham’s lab at BU.

The next-best way to create spatial recordings is to simulate these binaural impulse responses. There are a couple of decent packages out there for doing this. Stephen McGovern’s rir is very fast, but only creates a bare-bones impulse response. Douglas Campbell et al.’s roomsim includes lots of features to make the impulse responses realistic, but it is very slow and can only be driven through a GUI.

I’ve written a new room simulator in the same spirit, but combining the best features of both of these. It’s called rlrs, short for “recti-linear room simulator” and you can download it here (3.6 MB). I’m releasing it under the GPLv3. Here’s the intro from the README:

This code will generate binaural impulse responses from a simulation of the acoustics of a rectilinear room using the image method. It has a number of features that improve the realism and speed of the simulation. It can generate a pair of 680 ms impulse responses sampled at 22050 Hz in 75 seconds on a 1.8 GHz Intel Xeon. It’s easy to run from within scripts to generate a large set of impulse responses programmatically.

To improve the realism, it applies anechoic head-related transfer functions to each incoming reflection, allows fractional delays, includes frequency-dependent absorption due to walls, includes frequency- and humidity-dependent absorption due to air, and varies the speed of sound with temperature. It also randomly perturbs sources in proportion to their distance to the listener to simulate imperfections in the alignment of the walls.

To improve simulation speed, it performs all calculations in the frequency domain, with the complex exponential generation code written in C; it calculates the Fourier transforms of anechoic HRTFs only as it needs them and then caches them; and it culls sources that are beyond the desired impulse response length or are significantly quieter than the direct path.
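For readers unfamiliar with the image method, here is a bare-bones sketch of the idea in Python. It is not the rlrs code and leaves out nearly everything the README lists: there are no HRTFs, fractional delays, or air absorption, the reflection coefficient `beta` is one frequency-independent number, and the work is done in the time domain. The function names, default values, and the linear speed-of-sound formula are all assumptions for illustration.

```python
import itertools
import numpy as np

def speed_of_sound(temp_c):
    """Common linear approximation for air, in m/s (an assumption, not rlrs's formula)."""
    return 331.4 + 0.6 * temp_c

def image_method_rir(room, src, mic, fs=22050, length_s=0.1,
                     beta=0.8, max_index=3, temp_c=20.0):
    """Monaural impulse response of a box-shaped room with one corner at the origin."""
    c = speed_of_sound(temp_c)
    h = np.zeros(int(round(length_s * fs)))
    mic = np.asarray(mic, dtype=float)
    # Along each axis, mirror images of the source sit at 2*n*L + s and 2*n*L - s,
    # reached after |2n| and |2n - 1| wall reflections respectively.
    per_axis = []
    for wall, s in zip(room, src):
        images = []
        for n in range(-max_index, max_index + 1):
            images.append((2 * n * wall + s, abs(2 * n)))
            images.append((2 * n * wall - s, abs(2 * n - 1)))
        per_axis.append(images)
    for (x, rx), (y, ry), (z, rz) in itertools.product(*per_axis):
        dist = np.linalg.norm(np.array([x, y, z]) - mic)
        delay = int(round(dist / c * fs))        # integer-sample delay only
        if delay >= len(h):
            continue                             # cull images past the requested length
        h[delay] += beta ** (rx + ry + rz) / max(dist, 1e-3)   # 1/r spreading loss
    return h

h = image_method_rir(room=(5.0, 4.0, 3.0), src=(1.0, 1.0, 1.0), mic=(3.0, 1.0, 1.0))
```

The first nonzero sample of `h` is the direct path, and every later spike is one image source. Each of the realism features above amounts to replacing one of these shortcuts, for example swapping the single spike for an HRTF-filtered, fractionally delayed copy.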

Thinking about ideas

May 17th, 2008

Marios pointed me to this article from the New Yorker, again by Malcolm Gladwell. I’ve been thinking about many of these ideas lately, and Gladwell seems to have come up with them at the same time…

The article is both about coincidences in simultaneous, independent inventions and a company called Intellectual Ventures (IV) with the business model of generating, patenting, and licensing lots of ideas. It’s a story that appeals to everybody who thinks, “I could do that, I have lots of ideas.” My problem with this presentation of the business model is that it’s misleading. Everyone loves coming up with new ideas, so everyone does, and as a result, ideas are cheap. The real business, however, is being able to turn an idea into something that people can and do use. I would guess that if IV is going to succeed, it will be because they’re able to make ideas useful and used and not just because they’re able to come up with new inventions. They appear to have all of the accompanying things necessary to get ideas off of the ground: funding to pay lawyers to file 500 patents a year, research before and after brainstorming sessions, and connections to people who want the ideas. It doesn’t hurt if Bill Gates is pushing your patent, either.

It seems like many people believe that there is a person called an inventor, and this person comes up with ideas, which they send out into the world. These ideas then supply money without any additional work required of the inventor and the only thing that keeps the inventor inventing is a need to tinker or to fatten the royalty stream. This seems very naive to me. As Marios has said many times before, ideas are cheap, what’s expensive is the execution and the follow-through, and what’s risky is whether people will find the invention useful and actually use it.

It seems like a person with answers or potential answers can get around some of these issues by going to a person with a problem. That a person has a problem indicates that a solution of that problem ought to be useful and used. A person with a problem has probably looked around for solutions. The more they need their problem solved, the more thorough they’ve been in their search. If you’re interested in science, then a person with a problem indicates a problem that probably hasn’t been solved yet and would be worth solving.

When I come up with ideas, just about all of them have been thought of before. I enjoy the feeling of elation that comes from discovering something new to me and running through the implications of it, and it’s always disappointing when the internet tells me it’s already been invented. Even so, it’s fun to find out when the idea was invented. “Oh, that was a 1962 idea, that one was a 1998 idea, this one’s a 2005 idea.” I find that as I’ve gone through school, my ideas have been catching up with the present. In a class, it’s a matter of extrapolating from one lecture to come up with the idea that might be presented in the next lecture. Of course, the next lecture will present it with more of the implications worked out, because other people have been thinking about it for more than a week. This process has also made me think about the relationship between my personal history of learning and the collective history of science and how something that’s new to me generally isn’t new to science. Every once in a while, though, it will be, especially in areas where I’m “caught up” with science, whatever that means.

As an aside, Malcolm Gladwell has obviously never experienced the Pfaffian, or he would have included Pfaff in the section on eponymous inventions and not just the section on second-tier scientists.

MajorMiner music search

April 28th, 2008

We’ve started using the data that we collected through the MajorMiner game. We’re using it in two ways: making it searchable directly, and training autotaggers with it. The human search finds all of the clips that have had a particular tag applied to them by at least two people, sorted by the number of times it’s been applied. You can type a search directly into the search box, or browse through the top few. People are pretty good at finding things in music, as it turns out; check out british, u2, tambourine, and scratch. This search also takes advantage of the newly introduced canonicalization of tags, so that funk matches funky. But there are always ambiguity issues, e.g. club as lyric vs genre.
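The human search boils down to a filter and a sort. The sketch below shows one way it could look in Python; it is not the MajorMiner code. The `CANONICAL` table, the `(clip, user, tag)` record layout, and the sample data are all made up, and it counts distinct users as a stand-in for the number of times a tag was applied.

```python
from collections import defaultdict

# Hypothetical canonicalization table; the real rules are not described here.
CANONICAL = {"funky": "funk"}

def canonical(tag):
    tag = tag.strip().lower()
    return CANONICAL.get(tag, tag)

def search(labels, query, min_users=2):
    """labels: iterable of (clip_id, user_id, tag). Returns [(clip_id, n_users), ...]."""
    users = defaultdict(set)
    for clip, user, tag in labels:
        if canonical(tag) == canonical(query):
            users[clip].add(user)
    hits = [(clip, len(u)) for clip, u in users.items() if len(u) >= min_users]
    # Most-agreed-upon clips first; ties broken by clip id for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

labels = [
    ("clip1", "ann", "funk"), ("clip1", "bob", "funky"), ("clip1", "ann", "rap"),
    ("clip2", "ann", "Funk"), ("clip2", "bob", "funk"), ("clip2", "cat", "funky"),
    ("clip3", "ann", "funk"),
]
results = search(labels, "funky")
```

Here `clip3` drops out because only one person called it funk, and a query for funky finds the clips tagged funk.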

The machine search is a little more involved. We took all of the tags that had been applied to enough (35) clips and used them to train classifiers. Actually, we only used clips from half of the artists in our collection to train the classifiers, then we ranked all of the clips from the rest of the artists by each classifier’s output. This means we can look at all of those clips sorted by how much they appeal to the rap classifier, the saxophone classifier, the house classifier, and so on. I like how the guitar classifier catches Outkast’s acoustic guitar (!), but also the Jesus and Mary Chain’s fuzzed out guitar. For those of you interested in the details, we have a couple of papers that we’ve submitted recently describing them, but the gist is that we’re using the features from last MIREX and the usual SVM classifier.

Some thought went into the ranking of the tags on the main search page as well. Since we know the answers for some of the clips in the test set, we ranked the tags by how well their classifiers were able to learn them. Actually, we used a Bayesian estimate of the classification accuracy from the beta-binomial model to do the ranking more intelligently. The basic idea is that test accuracy is measured more accurately for tags with a lot of test examples, and less accurately for tags with few test examples. The measured accuracies of the tags are then shrunk towards the overall mean accuracy in proportion to how poorly the model thinks they are estimated. So even though club has a better raw accuracy than rap, it was tested on many fewer examples, so it ends up below rap in the final ranking, i.e. its raw accuracy is more likely a random fluctuation than a meaningful result.
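The shrinkage is easiest to see with numbers. The sketch below uses the posterior mean of a binomial accuracy under a Beta prior, which is the simplest form of the idea. The prior here is set by hand instead of being fit to all of the tags, and the counts for club and rap are invented for illustration, so treat it as a cartoon of the ranking and not the actual computation.

```python
def shrunk_accuracy(correct, total, prior_mean, prior_strength):
    """Posterior mean accuracy under a Beta(a, b) prior with a + b = prior_strength."""
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (correct + a) / (total + a + b)

# Made-up test results: (number correct, number of test examples).
tags = {"club": (9, 10), "rap": (85, 100)}

raw = {t: c / n for t, (c, n) in tags.items()}
shrunk = {t: shrunk_accuracy(c, n, prior_mean=0.6, prior_strength=20)
          for t, (c, n) in tags.items()}
ranking = sorted(tags, key=shrunk.get, reverse=True)
```

With these numbers club has the higher raw accuracy (0.90 against 0.85), but its ten test examples carry little weight against the prior, so its estimate falls to 0.70 while rap only falls to about 0.81, and rap ends up ranked first.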

So go check out some of the creative ways our players have found to describe music, and describe some music yourself!

No Logo

April 27th, 2008

For Christmas, Joanne gave me Naomi Klein’s book No Logo. She described it as a bible for a generation of activists, and I can see why. It’s very well researched, reasoned, and written. Its one major flaw is that being about trends, the examples that it uses are a bit dated even just eight years after its publication. Discussions about the potential for the internet to empower citizens against megabrands sound very web-1.0. The book itself is put together in a very brand-savvy way, with the punchy section headings: no space, no choice, no jobs, and no logo. There’s even a picture of a toddler wearing a No Logo sweatshirt. The FAQ offers an explanation.

The book is about the creation of megabrand companies in the 80s and 90s and the lengths that they have gone to in order to increase their visibility while simultaneously cutting costs. While it includes powerful stories about the working conditions in the factories where these products are made, they are confined to only one of the four sections. Workers were subject to forced unpaid overtime, public humiliation, sexual harassment, a lack of security, a lack of safety, the constant threat of jobs moving away, and union busting. I expected that to take up most of the book, but the rest of the book tells the rest of the story, including analysis and arguments that were well-reasoned, thorough, and less radical than I expected.

The premise of the megabrand strategy, although not explicitly stated in the book, is that companies like the Gap, Nike, McDonald’s, Starbucks, etc. will apply as much marketing as it takes to propel their commodities out of the market of commodities. More precisely, since they have generally stopped manufacturing their own products, they purchase commodities, but sell them at premium branded prices. The message seems to be “raise your product above market forces, but force your suppliers to compete.” After all, free markets are only great when they happen to other people.

The idea of commoditization permeates these companies, not just in buying the products that they will transmute, but in their relationships with employees, and with the countries hosting their commoditized production firms. By threatening countries with the loss of factories, they earn concessions in tax abatements, lax labor laws, and negligent enforcement. By threatening manufacturers with competition from other manufacturers, they squeeze everyone else’s margins as thin as possible. By threatening employees with replacement by temps, they depress wages. Of course, the savvy employee can turn the tables and make a brand out of him or herself. If one is no longer a commodity code monkey, but the expert in a particular problem, solution, language, etc, then one can command a brand name premium. This is a great way to look out for number one, but seems less sustainable for everyone.

Although mentioned only briefly, one interesting point she raised was the difference between consumerism and citizenship. By her definition, consumers judge and criticize, knowing the minutest details that distinguish two products, yet always remaining within the bounds defined by the marketing of those products. Citizens, in her terminology, create original thoughts, processes, works, etc. I’d recommend this book to any citizen that wants to read a well researched, but slightly dated account of megabrands.

The Puzzling Nature of Success in Cultural Markets

April 26th, 2008

Matthew Salganik gave a talk in the EE department with this title. He got his PhD in sociology last year under Duncan Watts, studying the (un)predictability of hits, blockbusters, best-sellers, etc. You probably read about it. If not, the basic idea is that they set up a website where people would come and listen to music and examined the influence of popularity on people’s listening habits. We’re not talking millions of songs here, just 48 chosen pretty much at random from unknown bands on PureVolume. Users could listen to any song and then after listening to it had the opportunity to download it.

As users arrived, they were assigned to one of eight completely separate “worlds”. In seven of these worlds, the users could see how many times each song had been downloaded; the last world served as a control in that users couldn’t see how popular each song was. The punchline is that in the worlds where people could influence each other, popular songs were downloaded a lot, but different songs became popular in each world. In the control group, some songs were still downloaded more than others, but the difference wasn’t as striking.

[Image: Popularity vs quality]

The graph from the talk that really stuck with me was this one, taken from their Science paper. It shows the marketshare of each song in the control world versus its marketshare in each of the seven influence worlds. The marketshare in the control world is taken as an un-influenced measure of quality, while the marketshares in the influence worlds are taken as measures of popularity. What you can see is a triangular shape indicating that the “bad” songs were unpopular in all worlds, while the “good” songs were only popular in some of the worlds. Salganik said that this agreed with what people in hit-based industries told them, that it’s easy to predict what won’t be a hit, but hard to predict what will.

Ormia ochracea

April 7th, 2008

[Image: O. ochracea operates a trackball]

I was reading Fay and Popper’s book Sound source localization and came across an awesome chapter about the auditory localization abilities of insects by Daniel Robert. With a distance of just millimeters between their ears of the same size, they’re able to hear and localize sounds with wavelengths of centimeters. That’s pretty amazing in itself, but one little fly can do even better. Ormia ochracea is a parasitoid fly that lays its eggs in crickets. Crickets make a lot of noise, so O. ochracea uses that to find them, dive bomb them, and spray them with eggs.

Müller and Robert were intrigued and wanted to see how well O. ochracea could do just with its ears. They put it in a room, set up some infrared cameras to track it, and turned off the lights. When they played cricket sounds out of a speaker four meters away from the fly, it took off, flew to just over the speaker, then spiraled down and landed right on it, slightly miffed. Then they repeated the experiment, but shut the speaker off while the fly was in mid-air. The incredible thing was, it was still able to land right on target. This means that the fly can not only judge the three-dimensional position of the sound source, but it can remember where the sound source is and control its flight well enough to land right on the mark. It’s amazing what you can do with only a few hundred auditory neurons.

Auditory transduction video

April 6th, 2008

[Image: the inner ear]

This semester I’m co-teaching Dan’s class ELEN 6820, Speech and Audio Processing and Recognition. It’s fun to finally get to lecture and to improve the course a little. One thing I’m happy I brought to the course was a video on auditory physiology that Ian Shipsey showed in his talk on cochlear implants. The video is entitled “Auditory Transduction” and it was made by medical illustrator Brandon Pletsch as a final project at the Medical College of Georgia. Apparently he dissected cadaver ears to get a sense of how everything is set up.

It starts at the outer ear, flies up to the ear drum, shows how it works, and then flies in to the middle ear. It shows the middle ear being driven by the ear drum and driving the inner ear. It shows waves traveling on the cochlea, and then zooms in on the organ of Corti, where it shows the basilar membrane and the tectorial membrane. Then it zooms in on the inner and outer hair cells and describes how they work. While I had a pretty good idea of the physiology of hearing before I saw the video, seeing it in 3D really cleared a few things up, mainly in the relative shapes and positions of structures. The illustrations of fluid motion were great; the only thing I didn’t find convincing was the wave motion of the basilar membrane. When Beethoven’s 9th was piped in, the membrane responded like it was reading the score as opposed to the spectrogram.

The video won first place in the 2003 science and engineering visualization challenge from the NSF and Science magazine. We found Mr Pletsch on LinkedIn and Dan emailed him asking if we could show the video in class. He agreed and sent us the link. It’s not on youtube, but it seems like it could be very popular. I wish it was easier to find, especially since it won this contest sponsored by the NSF, but who knows what the copyright arrangements are. There are a few pictures and short clips on the NSF website; I suppose it was harder to post videos online when they put it together in 2003.

What is this I’m listening to?

April 4th, 2008

I’ve been listening to last.fm a lot recently, especially while I’m working. My goal has been mostly to listen to music, but also to be exposed to new music. The problem is that if I hear something new that I like, I need to stop working, flip desktops to firefox, flip tabs to last.fm, and look down the page for the artist’s name. I could use the pop-out player, but then I’d still have to wait for the little scroll-y display to come around to the artist. I have the same experience with my iPod, having to dig it out of my pocket to see what’s playing.

Since I’m already listening to the music, what I’d like is an auditory display of what’s playing, that is to say I’d like a DJ to tell me what I’m listening to. It could say only the artist’s name to minimize interruptions or it could include the album and track name if I’m in a more interested mood. It could be machine generated, or it could be pre-recorded, maybe by the bands themselves. It would help me learn to pronounce band names as well, e.g. !!!. It would be much less annoying than listening to DJs on the radio without the clearchannel audio logos, station IDs, or whatever you call them (cue Family Guy joke).

Betwittered

March 12th, 2008

A bunch of my friends have started using twitter, seemingly all at once. I think the whole status update part of it is pretty dumb, I just use it for posting interesting tidbits in under 160 characters. It’s nice because I text them in as well, which I guess would be less notable if I could send email with my phone, but I can’t. Adrian asked me how I got my twitter updates into my sidebar. When my response was longer than I expected it to be, I realized that it hadn’t been trivial and other people might be interested.

As of version 2.2, wordpress has sidebar widgets. You get to them by going to the “presentation” tab and then the “widgets” sub-tab. You can move around all the stuff on the sidebar, take things out, etc, as long as your theme is compatible. I think my theme was compatible, but if you have a custom theme or are a theme author, there are instructions. There’s an RSS widget that shows RSS feeds which I think it checks every hour or so. Getting that running is the first step.

The only problem is that the wordpress RSS reader doesn’t like certain RSS feeds. Or at least that was the case with my version of it, maybe it’s been fixed since 2.2.2. It likes RSS feeds from feedburner, so I set up all of the feeds I wanted it to display on feedburner and have the RSS widget read them.

The third thing, which is still sub-optimal, is getting the right rss feed from twitter. I wanted it to be the last N things I’ve posted, but unfortunately twitter has some sort of time limit, so things that are too old don’t show up in the feed even if they’re in the N most recent posts. It would be nice to get the rss feed from the twitter archives, but it’s not immediately obvious how to do that. Another tricky thing was getting the feed for just me, not for me+friends. Also on the wishlist is some way for people to comment on the twitters, which might involve getting a different feed from twitter again, but I’m not sure.