Posts Tagged ‘stats’

Football and Statistics Unite!

December 4th, 2008

Being a huge football fan and nerd, I love it when the two actually combine. There’s no need for brawn vs brain, or jocks vs geeks. In a perfect world, they can live together. Here are some sites that I’ve found that do a great job marrying the two:

Saurian Sagacity: With a major Florida Gator bias, this blog always has an interesting way of looking at football, with great stats to back it up. My favorite post is when he absolutely calls out Texas Tech, and every other team, for not having a defense worthy of stopping anybody.

Smart Football: Of all these sites, this one is the most focused on X’s and O’s. It really goes into how and why certain formations, plays, or schemes succeed. The latest post is an absolute gem detailing exactly why the Florida Gators offense is so successful. But it also p0wnz my hometown Auburn Tigers for being the exact opposite of good on offense this year.

Advanced NFL Stats: Geared towards the NFL, they always have really statistically oriented posts. This post on signal vs noise in stats and how they correlate to wins is especially intriguing.

BCS Guru: Though not very mathematically or schematically involved, this site always stays on top of the lunacy that is the BCS. This mailbag-style post really goes into the sheer wackiness of the BCS, and the Big 12 South this year.

Dr. Saturday: This is the most entertaining site and also contains the best writing. This Yahoo! blog started out as the best college football blog in the world: Sunday Morning Quarterback. But like most people in his position, he decided to take more money and a larger audience and moved on to bigger and better things, leaving behind his absolutely hilarious, yet informative, pre-season previews. Despite not having as much of a statistical bent in his stories nowadays, Dr. Saturday still produces great articles.

Now here’s some football music for your listening pleasure (thanks to Grooveshark):

auburn, college football, math

Netflix Prize: Joining the Rat Race

November 19th, 2008

I decided a couple of nights ago to see how well I would do in the Netflix Prize. This is a competition from Netflix, an online movie rental site, which has released two gigs’ worth of user rating data to see if anyone can improve on its own recommendation algorithm, Cinematch, by at least 10%. Many have tried, but few have come close.
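To make that 10% target a bit more concrete: the contest scores predictions by root mean squared error (RMSE) against held-out ratings, and the goal is to cut Cinematch's RMSE by a tenth. Here is a rough sketch of that arithmetic in Python. The function names `rmse` and `improvement` and the toy ratings are my own, made up for illustration; they are not Netflix data or anything from the contest's tooling.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def improvement(baseline_rmse, candidate_rmse):
    """Fractional reduction in RMSE relative to the baseline (0.10 means 10%)."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Toy ratings on a 1-5 star scale, invented purely for this example.
actual = [4, 3, 5, 2]
baseline_preds = [3, 3, 3, 3]   # hypothetical "guess the middle" baseline
candidate_preds = [4, 3, 4, 3]  # hypothetical better model

baseline = rmse(baseline_preds, actual)
candidate = rmse(candidate_preds, actual)

# True when the candidate beats the baseline by at least 10%.
print(improvement(baseline, candidate) >= 0.10)
```

On a tiny toy set like this a big jump is easy. The hard part of the real contest is squeezing out that last fraction of a star across millions of ratings.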

Recommendations are a fairly difficult problem. At Grooveshark, we have our own recommendation system using various statistical techniques that have been fine-tuned over the years. They are not perfect, but in certain instances they do come close to what Pandora and Last.fm have to offer. I’m sure the techniques Grooveshark uses are nowhere near as sophisticated as Google’s or Amazon’s, but we try our best. Early Google showed that simple algorithms using the right data can be more successful than advanced statistical tools. But even at Google, the algorithms and tools have grown more sophisticated over time. In my opinion, simple tools using the correct insights can be very powerful, as proven by a psychologist who has jumped very high in the leaderboard (Just a guy in a garage).

Overall, this project earns Netflix a lot of goodwill for being so open and providing a great competition for researchers and joe-schmoes alike. I really just want to apply some of the new techniques I have learned in an environment other than music (not surprising when you spend 60 hours a week thinking about music). Here are some of the books I have read or am currently reading:

On Intelligence by Jeff Hawkins: hierarchical Markov Models FTW

Programming Collective Intelligence by Toby Segaran: leveraging simple statistical tools to add intelligence to web applications

Predictably Irrational by Dan Ariely: more psychology than statistics/intelligence

Pattern Recognition by Theodoridis and Koutroumbas: never finished – a little over my head for right now

Probabilistic Reasoning in Intelligent Systems by Judea Pearl: not finished, but I find the language more understandable than “Pattern Recognition”

Along with these books, I have kept up a large collection of papers on recommendation and music information retrieval. I have read a lot of them, but most are still on my to-read list. If you would like, check out my document subversion repository at: svn://cmunezero.com/docs.

Also, here’s a pretty good presentation by somebody at Netflix talking about the challenges and issues they face. Now for some muzik:

Grooveshark, Netflix, music, programming