IOTop: Monitor Program’s Disk Input/Output

January 20th, 2010

Top and especially HTop are really useful tools for monitoring system resources. The thing I always found lacking was the amount of disk I/O (input/output) attributed to a particular process. Recently, I found a pretty cool tool that does just that: IOTop. It displays the amount of disk reads and writes per second for each process, and also the swap and I/O percentages if the CONFIG_TASK_DELAY_ACCT setting is enabled in the kernel.
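For a rough feel of what per-process I/O counters look like, here is a minimal Python sketch. It assumes a Linux kernel that exposes /proc/<pid>/io with "key: value" lines such as read_bytes and write_bytes; it is not necessarily how IOTop gathers its numbers, and the function names parse_proc_io and read_io_counters are made up for illustration.

```python
import os


def parse_proc_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip anything that is not a simple 'name: number' line.
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_io_counters(pid):
    """Read the cumulative I/O counters for one process (Linux only, assumed path)."""
    with open(f"/proc/{pid}/io") as handle:
        return parse_proc_io(handle.read())


if __name__ == "__main__":
    try:
        counters = read_io_counters(os.getpid())
        print(counters.get("read_bytes"), counters.get("write_bytes"))
    except OSError:
        # Not Linux, or the kernel does not expose the file.
        print("/proc/<pid>/io is not available here")
```

These counters are cumulative, so a per-second rate would need two samples and a subtraction.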

misc

Pure Linux DVD Ripping

April 10th, 2009

Whenever I wanted to rip DVDs for long-term storage, I’ve always used Wine + DVDShrink. I just now discovered a pure Linux program, Handbrake, for ripping DVDs, and so far it has worked out really well. This article does a really good job explaining it. Merry Ripping!

linux

Ubiquity: the web made cool again

March 11th, 2009

If you haven’t heard about Ubiquity, I advise you to check it out. It’s a great Firefox plugin that allows basically any service to be at your disposal. The basic premise is you highlight some words, start typing an action, and it will find a service that meets your needs. Check out the video below for a better explanation. Watch out for a Grooveshark command later on.

programming, technology, video, web

Recursions in the Sky

December 13th, 2008

The most fascinating concept in computer science has to be recursion. Even when it’s not in its pure form, recursion is just a beautiful concept:

A programming technique in which a program or routine calls itself to perform successive steps in an operation, with each step using the output of the preceding step.

It might not be the most efficient or the fastest way to do things, but it’s just damn cool. Some cool uses for them are quines, computing Fibonacci numbers and fractals.
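As a small illustration of the definition above, here is a minimal Python sketch of a recursive Fibonacci function. The name fib and the use of functools.lru_cache to remember earlier results are choices made for this example; without the cache, the same function still works but repeats a lot of work.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    if n < 2:
        # Base cases stop the recursion.
        return n
    # Each step is built from the output of the two preceding steps.
    return fib(n - 1) + fib(n - 2)


print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```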

Fractals and Fibonacci numbers, along with the golden ratio, can be found in numerous places in nature. Tree branches and roots grow in a very fractal manner. The spiral of a sea shell is dominated by the golden ratio. If you look closely, recursive properties are found all over the universe.

I’ve known about this link between recursion and nature for a while, but this past summer a co-worker at Grooveshark, Chris, and I had many discussions about math and science. He revealed to me an idea of his that the universe is made of many self-similar structures: sub-atomic particles, atoms, solar systems, galaxies and even the mega structures of the universe. His goal is to describe the basic laws of physics using a fractal model. While this idea seems really crazy, I really like it because fractals have shown that very simple equations can describe very complex behavior (see Chaos theory).

A while back, I read an article describing evidence that the universe appears to be fractal at certain scales. Currently, there is no theory to describe why this occurs, and due to the limitations of modern technology, scientists cannot confirm this observation at larger scales. Despite these limitations, I really do believe there might be something “fractal” behind the basic structures of the universe. Here’s a really uncanny picture illustrating this point. While there are many theories trying to bridge the gap between quantum theory and relativity, only string theory and its variations have come close. While I wait for Chris to finish his fractal model of physics, here’s some cool recursive and fractal music to check out:

math, science

Netflix Entry 1: Test runs

December 8th, 2008

I decided for my first 2 entries into the Netflix prize, I would see just how good/bad current scores are. For those who have never heard of the Netflix prize, see my post here.

First, I predicted a 3 for every rating just to see how much deviation there really is from that average rating. For my second submission, I just used a random number between 1 and 5 to see how well a random prediction fares against the rest of the leaderboard. After setting those two scripts up and submitting, I never got back a score. I’m not that disappointed, but maybe they saw that I was just sending them random/constant data and decided I was not worth their time. Made me feel kinda sad…
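The two baselines described above can be sketched in a few lines of Python. This is not the script that was submitted: the function names and the toy ratings are made up for illustration, and the score is computed as a root mean squared error (the metric the Netflix Prize leaderboard uses) against those made-up ratings rather than against real Netflix data.

```python
import math
import random


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def constant_predictions(count, value=3):
    """Baseline 1: predict the same rating for every entry."""
    return [value] * count


def random_predictions(count, seed=None):
    """Baseline 2: predict a uniformly random integer rating from 1 to 5."""
    rng = random.Random(seed)
    return [rng.randint(1, 5) for _ in range(count)]


# Made-up ratings, purely for illustration (not Netflix data).
toy_ratings = [4, 3, 5, 1, 4, 2, 3, 5]

print("constant:", rmse(constant_predictions(len(toy_ratings)), toy_ratings))
print("random:  ", rmse(random_predictions(len(toy_ratings), seed=0), toy_ratings))
```

A lower RMSE is better, so these two numbers give a floor that any real model should beat.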

Also, I was having some difficulties initially writing my script in Python, so I decided to just hurry up and do it in PHP and then port it to Python later. The first thing I noticed between the two scripts was just how much faster Python is than PHP. I ran each script three times and calculated the averages. Below are the results:

Language   Script     Time
Python     Constant   30.037 seconds
PHP        Constant   40.388 seconds
Python     Random      9.312 seconds
PHP        Random     19.464 seconds

I know I’m probably beating a dead horse here, but Python is WAY faster. And nothing really intensive or complex is going on here: string concatenation, random number generation, and writing to a file. That is it. Going by the timings above, Python finishes the constant value prediction in ~25% less time than PHP and the random value prediction in ~50% less time.
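The relative difference can be worked out directly from the timings in the table. Here is a small Python sketch of that arithmetic; the function name percent_less_time is just for illustration, and "faster" is measured here as the share of PHP's run time that Python saves.

```python
def percent_less_time(fast_seconds, slow_seconds):
    """Percentage by which the faster run undercuts the slower run's time."""
    return (slow_seconds - fast_seconds) / slow_seconds * 100


# (Python seconds, PHP seconds), copied from the table of averages.
timings = {
    "Constant": (30.037, 40.388),
    "Random": (9.312, 19.464),
}

for script, (python_s, php_s) in timings.items():
    saved = percent_less_time(python_s, php_s)
    print(f"{script}: Python took {saved:.1f}% less time than PHP")
```

This comes out to roughly 26% for the constant script and roughly 52% for the random script.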

As I get more comfortable with Python and its vast collection of libraries, it will definitely become my go-to scripting language. For all you Ruby enthusiasts, especially you, Travis: the Ruby language is very well thought out, but until anything comes close to NumPy/SciPy, I will stick to Python for now.

Netflix, linux, math, programming

SEC Championship: Bring It On!

December 5th, 2008

I just received this “question” from the Grooveshark contact page:

Name: Big Al
Question: Roll Tide!!!

Now, being the good sport that I am, I really love the enthusiasm, especially the fact that they actually thought it was worth their time to send me that message. This Saturday at 8pm, the SEC college football championship game will be played in Atlanta. I went the last time UF won it in 2006, and Atlanta is a really fun city, especially Peachtree Rd in Buckhead (funny enough, it’s really near the EspnZone).

misc

Football and Statistics Unite!

December 4th, 2008

Being a huge football fan and nerd, I love it when the two actually combine. There’s no need for brawn vs brain, or jocks vs geeks. In a perfect world, they can live together. Here are some sites that I’ve found that do a great job marrying the two:

Saurian Sagacity: With a major Florida Gator bias, this blog always has an interesting way of looking at football with great stats to back it up. My favorite post is when he absolutely calls out Texas Tech, and every other team, for not having a defense worthy of stopping anybody.

Smart Football: They are the most focused on X’s and O’s of any of the sites. They really go into how and why certain formations, plays or schemes succeed. The latest post is an absolute gem detailing exactly why the Florida Gators offense is so successful. But he also p0wnz my hometown Auburn Tigers for being the exact opposite of good on offense this year.

Advanced NFL Stats: Geared towards the NFL, they always have really statistically oriented posts. This post on signal vs noise in stats and how they correlate to wins is especially intriguing.

BCS Guru: For not being very mathematically or schematically involved, this site always stays on top of the lunacy that is the BCS. This mailbag-style post really goes into the sheer wackiness of the BCS, and the Big 12 South this year.

Dr Saturday: This is the most entertaining site and also contains the best writing. This Yahoo! blog started out as the best college football blog in the world: Sunday Morning Quarterback. But like most people in his position, he decided to take more money and a larger audience and moved on to bigger and better things leaving behind his absolutely hilarious, yet informative pre-season previews. Despite not having as much of a statistical bent towards his stories nowadays, Dr. Saturday still produces great articles.

Now here’s some football music for your listening pleasure (thanks to Grooveshark):

auburn, college football, math

Netflix Prize: Joining the Rat Race

November 19th, 2008

I decided a couple of nights ago to see how well I would do in the Netflix Prize. This is a competition from the company Netflix, an online movie rental site, that gives out two gigs’ worth of user rating data to see if anyone can improve on their own recommendation algorithm, Cinematch, by at least 10%. Many have tried, but few have come close.

Recommendations are a fairly difficult problem. At Grooveshark, we have our own recommendation system using various statistical techniques that have been fine-tuned over the years. They are not perfect, but they do come close to what Pandora and Last.fm have to offer in certain instances. I’m sure the techniques Grooveshark uses are nowhere near as sophisticated as those of Google or Amazon, but we try our best. Early Google showed that simple algorithms using the right data can be more successful than advanced statistical tools. But even at Google, the algorithms and tools have grown more sophisticated over time. In my opinion, simple tools using the correct insights can be very powerful, as proven by a psychologist who has jumped very high in the leaderboard (Just a guy in a garage).

Overall, this project gives a lot of goodwill to Netflix for being so open and providing a great competition for researchers and joe-schmoes alike. I really just want to apply some of the new techniques I have learned in an environment other than music (not surprising when you spend 60 hours a week thinking about music). Here are some of the books I have read or am currently reading:

On Intelligence by Jeff Hawkins: hierarchical Markov Models FTW

Programming Collective Intelligence by Toby Segaran: leveraging simple statistical tools to add intelligence to web applications

Predictably Irrational by Dan Ariely: more psychology than statistics/intelligence

Pattern Recognition by Theodoridis and Koutroumbas: never finished - a little over my head for right now

Probabilistic Reasoning in Intelligent Systems by Judea Pearl: not finished, but find the language more understandable than “Pattern Recognition”

Along with these books, I have kept up a large collection of recommendation and music information retrieval papers. I have read a lot of them, but most of them are on my to-read list. If you would like, check out my document subversion repository at: svn://cmunezero.com/docs.

Also, here’s a pretty good presentation by somebody at Netflix talking about the challenges and issues they face. Now for some muzik:

Grooveshark, Netflix, music, programming

Replacing Eclipse as My IDE With Vim

November 15th, 2008

Code editors are among the most important applications for programmers. They are also the source of some of the most heated debates online. Whether you use a full-fledged IDE like Eclipse or Visual Studio, or a souped-up text editor like Emacs or Vim, everybody has a favorite. My first IDE was Borland Turbo C, and after a couple of months I “purchased” a free copy of Visual C++ (later known as Visual Studio). I finally paid for a student version when my parents found out I was really interested in programming.

Visual Studio was my main IDE until I started web development and moved to Linux. For more than a year now, I’ve been using Eclipse because its plugin system has enabled people to create really good editors for Java, PHP, JavaScript, Flex and HTML/CSS. Because it’s all in one program, the editor is really heavy and fairly bug prone. Recently, I’ve been making the switch from using Eclipse exclusively to moving most of my development to Vim.

Vim is a very good text editor, and I chose it ahead of Emacs because, in my opinion, the commands are simpler. Most of the common functionality of Eclipse can be found in Vim: search/replace, syntax highlighting, XDebug-ing, and much more. The debate between GUI editors and Vim/Emacs is basically a moot point. They are just tools. Certain people’s thinking patterns are just more suited to one tool than the other. Since I use the command line for almost everything I do, using Vim allows a much easier transition between editing files, writing scripts and interacting with remote hosts. For me, Vim, along with all of the standard Linux apps (find, grep, tail, ssh, scp, etc…), allows me to work more efficiently than any other tool, so that is what I use.

linux, programming

Memcached Pool Bash Start Script

November 14th, 2008

If you installed Memcached using Yum under the RedHat flavor of Linux, it comes with a really nice init.d script for starting and stopping Memcached. I modified it to support creating a bunch of memcached instances using contiguous ports. What’s great is that only the “start” function has to be modified, since the “stop” function uses a special RedHat function, killproc, which can accept a program name or path and kill all instances of that program. I’m still a noob at bash scripts, but here are my only changes:

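NUMBUCKETS=3 # only new value needed: number of memcached instances to start
start() {
    RETVAL=0
    for ((i=1; i<=NUMBUCKETS; i+=1)); do
        # Append the instance number to the base port, so the ports are contiguous
        FULLPORT=${PORT}${i}
        echo -n $"Starting $FULLPORT ($prog): "
        daemon $prog -d -p $FULLPORT -u $USER -c $MAXCONN -m $CACHESIZE $OPTIONS
        # Remember a failure from any instance, not just the last one
        rc=$?
        [ $rc -ne 0 ] && RETVAL=$rc
        echo
    done
    # Only create the lock file if every instance started
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog
    return $RETVAL
}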

linux, programming