Archive for the ‘misc’ Category

Consistent Memcache Hashing and Failover with PHP

Monday, August 11th, 2008

I’ve written about Memcache before because it’s one of the best pieces of software ever written. Its power lies in its simplicity and how easily you can plug it into any application. One of the things I really wanted was consistent hashing and failover.

After checking out the PHP Memcache documentation, I found that the code to achieve this is fairly simple:

ini_set('memcache.allow_failover', true); // default is usually true
ini_set('memcache.hash_strategy', 'consistent');

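Somewhere in your configuration/initialization script, make sure these memcache settings are in place. memcache.allow_failover tells the Memcache client to try another server if it cannot connect to a particular memcache daemon. Setting memcache.hash_strategy to consistent enables consistent hashing, so servers can be added to or removed from the pool without remapping most keys; the other value, standard, selects the older mapping strategy. Now onto the actual server code: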

$failCount = 0;
$realInstance = new Memcache;
$testInstance = new Memcache;
 
$servers = array("server1", "server2", "server3");
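$defaultPort = 11211; // memcached's standard port
foreach($servers as $host) {
    if($testInstance->connect($host, $defaultPort)) {
        $realInstance->addServer($host, $defaultPort);
        $testInstance->close(); // only close if the connection succeeded
    } else {
        // keep the dead server in the pool: persistent, weight 1, timeout 1s,
        // no automatic retry (-1) and status false, so key distribution is unchanged
        $realInstance->addServer($host, $defaultPort, true, 1, 1, -1, false);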
        $failCount++;
    }
}
$isConnected = true;
if($failCount == count($servers)) {
    // set false if every server is marked as failed
    $isConnected = false;
}

I use two instances of Memcache (probably not the most efficient solution): one to test whether each server is available, and one for the real pool. If a server is available, it is added to the pool with the default options. If it is not, it is still added, but flagged as offline with automatic retry disabled, so requests for it fail over while it keeps its position in the server pool. These options come from this part of the PHP documentation:

bool Memcache::addServer ( string $host [, int $port [, bool $persistent [, int $weight [, int $timeout [, int $retry_interval [, bool $status [, callback $failure_callback ]]]]]]] )

retry_interval: Controls how often a failed server will be retried, the default value is 15 seconds. Setting this parameter to -1 disables automatic retry.

status: Controls if the server should be flagged as online. Setting this parameter to FALSE and retry_interval to -1 allows a failed server to be kept in the pool so as not to affect the key distribution algorithm. Requests for this server will then failover or fail immediately depending on the memcache.allow_failover setting.

The code also checks that at least one server is online: if every server failed the connection test, $isConnected is set to false, meaning there is no usable memcache pool.
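To see why keeping a failed server in the pool matters, here is a minimal pure-PHP sketch of a consistent-hash ring. It is an illustration under assumptions (crc32 as the hash, 100 points per server, a clockwise walk to the next online server as failover), not the pecl/memcache extension’s actual implementation; the HashRing class and its methods are hypothetical names.

```php
<?php
// Minimal consistent-hash ring, written only to illustrate the idea. It is NOT
// the pecl/memcache implementation: the hash (crc32), the replica count and
// the failover walk are assumptions.
class HashRing
{
    /** @var array<int, string> ring position => host */
    private array $points = [];
    /** @var array<string, bool> host => online flag */
    private array $online = [];
    private bool $sorted = true;

    public function __construct(private int $replicas = 100) {}

    // Place a server on the ring at $replicas points. A server added with
    // $online = false still owns its points, like addServer(..., $status = false).
    public function addServer(string $host, bool $online = true): void
    {
        for ($i = 0; $i < $this->replicas; $i++) {
            $this->points[crc32($host . '#' . $i)] = $host;
        }
        $this->online[$host] = $online;
        $this->sorted = false;
    }

    public function setOnline(string $host, bool $online): void
    {
        $this->online[$host] = $online;
    }

    // Walk clockwise from the key's hash to the first point owned by an
    // online server, wrapping around to the start of the ring if needed.
    // Returns null when every server is offline.
    public function getServer(string $key): ?string
    {
        if (!$this->sorted) {
            ksort($this->points);
            $this->sorted = true;
        }
        $hash = crc32($key);
        $firstOnline = null;
        foreach ($this->points as $pos => $host) {
            if (!$this->online[$host]) {
                continue; // failover: skip points owned by offline servers
            }
            if ($pos >= $hash) {
                return $host;
            }
            $firstOnline ??= $host; // used if we have to wrap around
        }
        return $firstOnline;
    }
}

$ring = new HashRing();
foreach (['server1', 'server2', 'server3'] as $host) {
    $ring->addServer($host);
}
$ring->setOnline('server2', false); // e.g. the connect() test failed
echo $ring->getServer('user:42'), "\n"; // server1 or server3, never server2
```

In this sketch, marking server2 offline only moves the keys it owned; every other key stays on its original server. With a modulo mapping such as crc32($key) % count($servers), removing server2 from the list would send most keys to a different server, which is the effect the status and retry_interval settings above are meant to avoid.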

jQuery: I thought I knew you so well

Friday, July 25th, 2008

I just recently found out that jQuery’s “get” method returns the underlying DOM elements of a jQuery object. For the longest time, I’ve been using my own “dom” method to do the same thing. For some reason, I never use their documentation site. The search isn’t that great, and something about the site itself just makes me not want to use it. Maybe it’s because I loved the old Visual jQuery site so much. If only someone would update it for jQuery 1.2.

Sphinx: MySQL Full-text Search Replacement

Thursday, May 29th, 2008

The cornerstone of the entire internet is search. The internet is so vast and sparse that the only way to make it truly useful is to let people easily find whatever they are looking for. That is why search engines and web portals have been so successful. At Grooveshark, search, along with recommendations, is the fundamental way people find music.

In the past, using MySQL’s full-text search was convenient and rather fast. As the amount of information grew, it became very apparent that MySQL would not be able to handle the size of the data and the number of searches. Looking around for replacements, the two best solutions I found were Lucene and Sphinx. Lucene is a nice tool that integrates with a bunch of other Apache projects, but Sphinx is small, fast and really easy to use.

Setting up Sphinx is a cinch. Using the official documentation and this IBM article, I was able to get Sphinx running in less than 30 minutes. You have to compile it from source, and getting the data into Sphinx can take a while depending on your data source (MySQL, PostgreSQL or XML). The Sphinx download also includes PHP and Python examples to help you get started with its really easy-to-use API. For international support, you can modify the charset_table option in your configuration file using Sphinx’s Unicode character mapping.

With Grooveshark, even Sphinx is not the perfect solution because we don’t have “perfect” data. After we get results back from Sphinx (on our slow test machine, we never had a query go over 0.3 seconds!), we put the results through a filter and reorder them accordingly. For example, the filter keeps a song whose metadata repeats the artist name in multiple places from being considered a more “relevant” result than the same song with correct metadata. The current solution is definitely not perfect and there is still more work to do, but now, searches are quicker and more relevant than they ever were before.
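A filtering step like this could look something like the following PHP sketch. It is hypothetical: the result fields (title, artist, album, score), the 0.5 penalty and the rule of counting how often the artist name repeats in the title and album are assumptions for illustration, not Grooveshark’s actual code.

```php
<?php
// Hypothetical re-ranking filter for search results. The field names and the
// flat penalty are assumptions for illustration only.
function rerankResults(array $results, float $penalty = 0.5): array
{
    foreach ($results as &$song) {
        $artist = strtolower(trim($song['artist']));
        $repeats = 0;
        // Count the other fields that repeat the artist name, a common sign
        // of messy metadata such as "Artist - Title" stuffed into the title.
        foreach (['title', 'album'] as $field) {
            if ($artist !== '' && str_contains(strtolower($song[$field]), $artist)) {
                $repeats++;
            }
        }
        $song['adjusted'] = $song['score'] - $penalty * $repeats;
    }
    unset($song); // break the reference left over from foreach

    // Highest adjusted score first. usort is stable as of PHP 8.0, so ties
    // keep the order the search engine returned them in.
    usort($results, fn(array $a, array $b): int => $b['adjusted'] <=> $a['adjusted']);
    return $results;
}
```

A flat penalty like this is crude; for instance, it would also demote a legitimately self-titled album, so a real filter would need more signals than this one.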

Smarty: A Great PHP Template Engine

Monday, March 24th, 2008

For any website, using the Model-view-controller (MVC) design pattern is a must. One way to achieve this is by using templates. Within the PHP community, there is a large divide over whether a formal template system is necessary. Many developers argue that PHP itself is a template system (see WordPress and its countless themes). Lately I have come to really like Smarty, a PHP template engine.

Over at Grooveshark, we’ve been making A LOT of changes. Basically, the brains of Grooveshark are being improved with a different database design and new backend code while the face of the site stays the same. This is where Smarty has made my life so much easier: all I do is make sure that the same variables are assigned with the same information, and Smarty handles the rest.

Smarty has other handy features like caching to compensate for the extra overhead of processing the templates. For really dynamic sites, Smarty provides fine-grained control of the cache so nothing is ever stale. Smarty is also adaptable enough to produce your feeds (swap the HTML for XML and you are done). As of right now, I’m really liking Smarty.

Extendable And Adaptable Code That Will Stand The Test Of Time

Thursday, March 6th, 2008

All coders have heard the saying that the best code is highly cohesive and loosely coupled. It’s something so fundamental that all of us sometimes lose sight of it. Everybody has had a moment while refactoring old code when you just want to pull out a gun and ask yourself, “Why?!? Why did you write it like that?” Along with laziness, it’s simply impossible to make external problems like time constraints, lack of personnel or dealing with users go away. But even in a perfect world, most of the code that you produce will be rewritten.

“If all code will be rewritten, then why write code at all?” you might ask. One answer is that you might get fired, but a more appropriate response is that any program you produce is also a lesson for producing the next iteration (if there is no next iteration, then you seriously did something wrong). What sucks about this view is that everything you do is merely practice for your next version or project. Of course, this doesn’t always hold when the code cannot be updated, as with embedded applications or software installed on an unreachable device (NASA: silly metric system). These cases are very rare, and for most people, making changes is just a server update or a patch away.

The reason I started thinking about writing long-term code is that at Grooveshark, the underlying server code is being overhauled. But over the last couple of days, I started noticing that almost every other development group is also rewriting or refactoring a major portion of their code. Whether it’s JavaScript, Java, PHP or SQL (maybe HTML/CSS?), the underlying principles are pretty much the same.

This problem has been discussed and worked on by academics and hackers much smarter than me, so I really don’t have anything to add at the moment. Just go out there, make some mistakes and do better next time.

Gullah Gullah Island!

Tuesday, March 4th, 2008

I found this article about the silence of Justice Clarence Thomas:
“In the past, the Georgia-born Thomas has chalked up his silence to his struggle as a teenager to master standard English after having grown up speaking geechee, a dialect that thrived among descendants of former slaves on the islands off the South Carolina, Georgia and Florida coasts.”

One word instantly caught my attention and I had to figure out what it meant: geechee. I hopped over to Wikipedia to find out.

From the Wikipedia article:
“The Gullah people are also called Geechee, especially in Georgia.”

That’s right, Supreme Court Justice Clarence Thomas is basically from Gullah Island… sort of.

GULLAH! GULLAH! ISLAND!!!

If you have never heard of Gullah Gullah Island, it’s a fairly old Nickelodeon TV show from “back in the day” set in the unique community of Gullah Island.

By the way, what does “Binya Binya!” mean anyway? It seems that television taught kids to trust giant, costumed creatures regardless of what they said. See Barney and Big Bird.