Archive for the ‘Netflix’ Category

Netflix Entry 1: Test runs

December 8th, 2008

For my first two entries into the Netflix Prize, I decided to see just how good or bad the current scores are. For those who have never heard of the Netflix Prize, see my post here.

First, I predicted a 3 for every rating, just to see how much deviation there really is from that average rating. For my second submission, I used a random number between 1 and 5 to see how a random prediction fares against the rest of the leaderboard. After setting those two scripts up and submitting, I never got back a score. I’m not that disappointed, but maybe they saw that I was just sending them random/constant data and decided I was not worth their time. Made me feel kinda sad…
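The two baselines can be sketched in a few lines of Python. This is a minimal sketch, not the original script: it assumes the qualifying file lists each movie as a line ending in a colon (such as `1:`) followed by one `customer_id,date` line per rating to predict, and that a submission echoes the movie lines and replaces every other line with a predicted rating. The function names and the sample input are hypothetical.

```python
import random

def constant_prediction(_line):
    """Baseline 1: predict a 3 for every rating."""
    return "3"

def random_prediction(_line, rng=random):
    """Baseline 2: predict a random whole number between 1 and 5."""
    return str(rng.randint(1, 5))

def build_submission(qualifying_lines, predict):
    """Echo movie header lines and replace every other line with a prediction."""
    output = []
    for raw in qualifying_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            output.append(line)           # movie header, e.g. "1:"
        else:
            output.append(predict(line))  # one prediction per customer line
    return "\n".join(output) + "\n"

# Made-up sample in the assumed format (not contest data).
sample = ["1:", "111,2005-01-01", "222,2005-01-02", "2:", "333,2005-01-03"]
print(build_submission(sample, constant_prediction))
```

Swapping `constant_prediction` for `random_prediction` gives the second submission; everything else (string building and writing the result out) stays the same.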

Also, I initially had some difficulty writing my script in Python, so I decided to just hurry up and do it in PHP and then port it to Python later. The first thing I noticed between the two scripts was just how much faster Python is than PHP. I ran each script three times and calculated the averages. Below are the results:

Language  Script    Time (seconds)
Python    Constant  30.037
PHP       Constant  40.388
Python    Random     9.312
PHP       Random    19.464

I know I’m probably beating a dead horse here, but Python is WAY faster. And nothing really intensive or complex is going on here: string concatenation, random number generation, and writing to a file. That is it. Going by the averages above, Python finishes the constant value prediction in ~26% less time than PHP and the random value prediction in ~52% less time.
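"Faster" can be measured two ways: time saved relative to the slower script, or extra time relative to the faster one. Here is a small sketch of that arithmetic using the averages from the table; the helper names are hypothetical and only there for illustration.

```python
# Average run times in seconds, copied from the table above.
times = {
    "Constant": {"Python": 30.037, "PHP": 40.388},
    "Random": {"Python": 9.312, "PHP": 19.464},
}

def percent_less_time(fast, slow):
    """Percentage of the slower run time saved by the faster run."""
    return (slow - fast) / slow * 100

def percent_extra_time(fast, slow):
    """How much longer the slower run takes, relative to the faster run."""
    return (slow - fast) / fast * 100

for script, t in times.items():
    saved = percent_less_time(t["Python"], t["PHP"])
    extra = percent_extra_time(t["Python"], t["PHP"])
    print(f"{script}: Python takes {saved:.1f}% less time; PHP takes {extra:.1f}% longer")
```

The two measures diverge as the gap grows, so it helps to say which baseline a percentage refers to.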

As I get more comfortable with Python and its vast collection of libraries, it will definitely become my go-to scripting language. For all you Ruby enthusiasts, especially you, Travis: the Ruby language is very well thought out, but until it has something that comes close to NumPy/SciPy, I will stick to Python for now.

Netflix, linux, math, programming

Netflix Prize: Joining the Rat Race

November 19th, 2008

I decided a couple of nights ago to see how well I would do in the Netflix Prize. This is a competition from Netflix, an online movie rental site, which has released two gigs’ worth of user rating data to see if anyone can improve on its own recommendation algorithm, Cinematch, by at least 10%. Many have tried, but few have come close.

Recommendations are a fairly difficult problem. At Grooveshark, we have our own recommendation system using various statistical techniques that have been fine-tuned over the years. They are not perfect, but they do come close to what Pandora and Last.fm have to offer in certain instances. I’m sure the techniques Grooveshark uses are nowhere near as sophisticated as those of Google or Amazon, but we try our best. Early Google showed that simple algorithms using the right data can be more successful than advanced statistical tools. But even at Google, the algorithms and tools have grown more sophisticated over time. In my opinion, simple tools using the correct insights can be very powerful, as proven by a psychologist who has jumped very high on the leaderboard (Just a guy in a garage).

Overall, this project earns Netflix a lot of goodwill for being so open and providing a great competition for researchers and joe-schmoes alike. I really just want to apply some of the new techniques I have learned in an environment other than music (not surprising when you spend 60 hours a week thinking about music). Here are some of the books I have read or am currently reading:

On Intelligence by Jeff Hawkins: hierarchical Markov Models FTW

Programming Collective Intelligence by Toby Segaran: leveraging simple statistical tools to add intelligence to web applications

Predictably Irrational by Dan Ariely: more psychology than statistics/intelligence

Pattern Recognition by Theodoridis and Koutroumbas: never finished – a little over my head for right now

Probabilistic Reasoning in Intelligent Systems by Judea Pearl: not finished, but I find the language more understandable than “Pattern Recognition”

Along with these books, I have kept up a large collection of recommendation and music information retrieval papers. I have read a lot of them, but most of them are on my to-read list. If you would like, check out my document subversion repository at: svn://cmunezero.com/docs.

Also, here’s a pretty good presentation by somebody at Netflix talking about the challenges and issues they face. Now for some muzik:

Grooveshark, Netflix, music, programming