My ad-hoc mini-cluster
Well, I’ve been doing quite a bit of Java programming recently for a new project I’ve joined. Until now I’ve mostly done web application development with PHP, plus some personal C projects. I must say, I am fast becoming a fan of Java. It was relatively easy to jump into after having done OO PHP work, so that was a plus. I am rather disappointed that my Java applications feel slower than my PHP ones. However, I can’t definitively say that Java is slow. I’m only a newbie and perhaps I haven’t picked up all the nuances in Java that might allow me to write faster applications. Anywho, this project I’ve been working on is hosted on Google’s App Engine using the HRD (High Replication Datastore). This may also be a cause of the slowness I’ve been experiencing in my web applications. I’ve always been a relational database man and denormalization pains my soul.
This High Replication Datastore thing got me thinking about Google’s infrastructure and also led me to start playing around with Hadoop. Hadoop is pretty neat from what I’ve played with so far. I’ve got a little mini cluster set up in the office (8 nodes), built from some commodity hardware on which I’ve installed Hadoop. HDFS seems to work pretty well, and after reading up on MapReduce, I began running some test programs through the cluster. It’s definitely an interesting concept and something I look forward to learning more about. Look for some example Java M/R programs in the future.
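In the meantime, here’s a rough sketch of the map/reduce idea in plain Java. It doesn’t touch Hadoop’s API at all: the class and method names (WordCountSketch, map, reduce, wordCount) are made up for illustration, and everything runs in a single JVM instead of across the cluster. Treat it as a picture of the concept (emit key/value pairs, then combine the values per key), not as a real Hadoop job.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-machine illustration of map/reduce; not Hadoop code.
class WordCountSketch {

    // "Map" step: turn one line of text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: add up the counts for each distinct word.
    // A TreeMap keeps the words sorted so the output is predictable.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Glue: map every line, pool the pairs, then reduce them.
    // On a real cluster the framework does this shuffling between nodes.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        return reduce(allPairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "the quick brown fox", "the lazy dog", "The fox");
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(wordCount(lines));
    }
}
```

The appeal is that each map call only looks at its own line, so in principle the lines could be handed out to different nodes and the per-word sums merged afterwards.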
Cheers.

