Music Information Retrieval
Over the weekend, I really got into music information retrieval (MIR). It's basically extracting meta-information about an audio file by analyzing its waveform. This type of information is really valuable, especially for a music company (e.g., Grooveshark). If I ever have time, this would be a really fun side project. A really good source of information about this topic is this bibliography page (too bad it hasn't been updated since August 2007). A list of up-and-running MIR systems can be found here.
What makes MIR systems so important is that, for music sites, they can generate a lot of useful data without anyone having to enter it by hand. For iTunes, this is not a problem because labels give them all the information they need, but for sites where song files can come from anywhere and anyone, there's no practical way to manually handle the variability in data quality and availability. With a system that automatically extracts the required info, within certain bounds of error, you build a vast collection of information that you can use to generate recommendations, provide more accurate searches, and better categorize all that music.
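To make "analyzing the waveform" a bit more concrete, here is a minimal sketch of one classic trick: estimating the pitch of a signal with autocorrelation. This is only an illustration I picked, not necessarily what any real MIR system does, and the names `synth_sine` and `estimate_pitch` are made up for the example. It assumes a clean, single-note signal stored as a plain list of samples; real songs are far messier.

```python
import math


def synth_sine(freq_hz, sample_rate, duration_s):
    """Generate a pure sine tone as a list of float samples (stand-in for real audio)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_pitch(samples, sample_rate, min_hz=50.0, max_hz=1000.0):
    """Estimate the fundamental frequency by finding the lag at which
    the signal correlates best with a shifted copy of itself."""
    min_lag = int(sample_rate / max_hz)  # shortest period we consider
    max_lag = int(sample_rate / min_hz)  # longest period we consider
    if len(samples) <= max_lag:
        raise ValueError("not enough samples for the requested pitch range")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    # The best lag is the period in samples; convert it to Hz.
    return sample_rate / best_lag


# Usage: a quarter second of a 440 Hz tone sampled at 8 kHz.
tone = synth_sine(440.0, 8000, 0.25)
print(estimate_pitch(tone, 8000))  # close to 440, limited by whole-sample lags
```

Note that the estimate is only accurate "within certain bounds of error": the lag has to be a whole number of samples, so the result is near 440 Hz rather than exact. Also notice the nested loop over samples and lags, which hints at why doing this kind of analysis on millions of songs gets expensive.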
The problem with MIR systems is that they require large amounts of storage space and processing power. The costs of both storage and processing are dropping every day, which is great for the future of MIR systems. Processing power is the biggest limiting factor, especially when you try to analyze millions of songs. The only companies that could probably do a project like this on a large scale would be Google, Amazon, and their ilk. Still, I'm very hopeful that a startup with the right mix of programmers, hardware, and music can compete with the big boys.