Archive for November, 2008

Netflix Prize: Joining the Rat Race

November 19th, 2008

I decided a couple of nights ago to see how well I would do in the Netflix Prize. This is a competition from Netflix, an online movie rental company, which has released two gigs' worth of user rating data to see if anyone can improve on its own recommendation algorithm, Cinematch, by at least 10% (measured by RMSE on held-out ratings). Many have tried, but few have come close.

Recommendations are a fairly difficult problem. At Grooveshark, we have our own recommendation system using various statistical techniques that have been fine-tuned over the years. They are not perfect, but in certain instances they come close to what Pandora and Last.fm have to offer. I’m sure the techniques Grooveshark uses are nowhere near as sophisticated as those of Google or Amazon, but we try our best. Early Google showed that simple algorithms using the right data can be more successful than advanced statistical tools, though even Google’s algorithms and tools have grown more sophisticated over time. In my opinion, simple tools built on the right insights can be very powerful, as shown by a psychologist who has jumped very high on the leaderboard (“Just a guy in a garage”).

Overall, this project earns Netflix a lot of goodwill for being so open and providing a great competition for researchers and joe-schmoes alike. I really just want to apply some of the new techniques I have learned in an environment other than music (not surprising when you spend 60 hours a week thinking about music). Here are some of the books I have read or am currently reading:

On Intelligence by Jeff Hawkins: hierarchical Markov Models FTW

Programming Collective Intelligence by Toby Segaran: leveraging simple statistical tools to add intelligence to web applications

Predictably Irrational by Dan Ariely: more psychology than statistics/intelligence

Pattern Recognition by Theodoridis and Koutroumbas: never finished – a little over my head for right now

Probabilistic Reasoning in Intelligent Systems by Judea Pearl: not finished, but I find the language more understandable than “Pattern Recognition”

Along with these books, I have kept up a large collection of recommendation and music information retrieval papers. I have read a lot of them, but most of them are on my to-read list. If you would like, check out my document subversion repository at: svn://cmunezero.com/docs.

Also, here’s a pretty good presentation by somebody at Netflix talking about the challenges and issues they face. Now for some muzik:

Grooveshark, Netflix, music, programming

Replacing Eclipse as My IDE With Vim

November 15th, 2008

Code editors are among the most important applications for a programmer. They are also the source of some of the most heated debates online. Whether you use a full-fledged IDE like Eclipse or Visual Studio, or a souped-up text editor like Emacs or Vim, everybody has a favorite. My first IDE was Borland Turbo C, and after a couple of months I “purchased” a free copy of Visual C++ (later bundled into Visual Studio). I finally paid for a student version when my parents found out I was really interested in programming.

Visual Studio was my main IDE until I started web development and moved to Linux. For more than a year now, I’ve been using Eclipse because its plugin system has enabled people to create really good editors for Java, PHP, JavaScript, Flex and HTML/CSS. Because it’s all in one program, the editor is really heavy and fairly bug-prone. Recently, I’ve been switching from using Eclipse exclusively to doing most of my development in Vim.

Vim is a very good text editor, and I chose it over Emacs because, in my opinion, its commands are simpler. Most of the common functionality of Eclipse can be found in Vim: search/replace, syntax highlighting, debugging with Xdebug, and much more. The debate between GUI editors and Vim/Emacs is basically moot. They are just tools, and certain people’s thinking patterns are simply better suited to one tool than the other. Since I use the command line for almost everything I do, using Vim gives me a much easier transition between editing files, writing scripts and interacting with remote hosts. For me, Vim, along with all of the standard Linux apps (find, grep, tail, ssh, scp, etc.), lets me work more efficiently than any other tool, so that is what I use.
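As a small illustration of that command-line workflow, here is a minimal sketch that combines find and grep to collect files to open in Vim. The function name files_matching, the directory, the extension and the search pattern are all made up for the example and not part of any particular setup.

```shell
#!/bin/bash
# Hypothetical helper: list files under a directory whose contents match
# a pattern, restricted to one file extension.
files_matching() {
    local dir=$1 ext=$2 pattern=$3
    # grep -l prints only the names of files that contain the pattern
    find "$dir" -type f -name "*.${ext}" -exec grep -l -- "$pattern" {} +
}

# Typical use from a shell prompt (opens every matching file in Vim);
# note that file names containing spaces would need extra care here:
#   vim $(files_matching ./src php 'mysql_query')
```

Because the result is just a list of paths on standard output, the same function can feed other tools (tail, scp and so on) as easily as Vim.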

linux, programming

Memcached Pool Bash Start Script

November 14th, 2008

If you installed Memcached using Yum on a Red Hat flavor of Linux, you get a really nice init.d script for starting and stopping Memcached. I modified it to support starting several Memcached instances on contiguous ports. What’s great is that only the “start” function has to be modified, since the “stop” function uses a Red Hat helper, killproc, which accepts a program name or path and kills all instances of that program. I’m still a noob at bash scripts, but here are my only changes:

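NUMBUCKETS=3 # only new value needed: number of memcached instances to start
start() {
    RETVAL=0
    for ((i=1; i<=NUMBUCKETS; i++)); do
        # PORT acts as a prefix: the instance number is appended to it as a digit
        FULLPORT=${PORT}${i}
        echo -n $"Starting $FULLPORT ($prog): "
        daemon $prog -d -p $FULLPORT -u $USER -c $MAXCONN -m $CACHESIZE $OPTIONS
        # remember a failure from any instance, not just the last one started
        [ $? -eq 0 ] || RETVAL=1
        echo
    done
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog
    return $RETVAL
}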
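To make the port scheme concrete, here is a small sketch that prints the host:port pairs such a pool would listen on, for example to paste into a client's server list. The helper name pool_servers and the prefix 1121 are assumptions for illustration only; the sketch just mirrors the FULLPORT=${PORT}${i} concatenation from the start function.

```shell
#!/bin/bash
# Hypothetical helper: list the host:port pairs for a pool whose ports are
# built by appending the instance number to a port prefix.
pool_servers() {
    local host=$1 port_prefix=$2 count=$3 i
    for ((i = 1; i <= count; i++)); do
        echo "${host}:${port_prefix}${i}"
    done
}

# Example: three instances with an assumed prefix of 1121
pool_servers 127.0.0.1 1121 3
```

Since the instance number is concatenated rather than added, this scheme only stays within the valid port range for up to nine instances with a four-digit prefix. Computing the port arithmetically, as in $((BASEPORT + i)), would be one way around that limit.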

linux, programming

jqUploader: Flash-based File Uploader and Progress

November 13th, 2008

For the longest time, I’ve been trying to find a fairly simple solution for showing the real progress of file uploads. Most of the ones I have found involve patching PHP to allow this. What I think is going on is that during a file upload, PHP lets the web server (most of the time, Apache) handle the actual upload, and only after the file has finished uploading does PHP get access to it.

Instead of HTML-based file uploads, Flash can be used. With jqUploader, people with Flash-enabled browsers get real-time progress of their uploads. This is really handy for large images, zip files, or any file larger than 3 MB.

Update: I have to give some props to John David at Conceptual Arts for initially telling me about jqUploader.

javascript, jquery, web

Long Time Coming: Widgets and New Work

November 12th, 2008

It has been a really long time since my last post. A lot has been happening at Grooveshark. We’ve created a widget site for sharing music across various social networks using Clearspring. Here’s an example below (you can also find it here):

I started part-time work with Conceptual Arts, a local Gainesville web shop. The people are really cool and smart, so I’m really excited to work on lots of cool projects.

I’m also going to start tracking the personal projects I work on on my Projects page. Hopefully, posting them online will force me to stop abandoning projects and to build a complete prototype of each.

Grooveshark, web