Everything Andrew Knight & More


Google Sorts 1 Petabyte in 6 Hours

Google, the largest search engine in the world, revealed on the 29th of November that it had sorted 1 petabyte of data (1024 terabytes), writing the results across more than 48,000 hard drives, in just 6 hours and 2 minutes. In a separate test, it sorted 1 terabyte of data (1024 gigabytes) in 68 seconds, crushing the previous record of 1 terabyte across 910 computers in 209 seconds.
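To put the petabyte figure in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes the binary units used above (1 PB = 1024 TB, 1 TB = 1024 GB), treats "over 48,000" drives as exactly 48,000, and the function name is made up for illustration.

```python
# Rough arithmetic on the petabyte sort figures quoted in the article.
# Assumptions (not from Google): binary units, and exactly 48,000 drives
# as a lower bound for "over 48,000".

TB_PER_PB = 1024
GB_PER_TB = 1024
DRIVES = 48_000


def petabyte_run_stats(hours=6, minutes=2):
    """Return average rates for sorting 1 PB in the given wall-clock time."""
    elapsed_seconds = hours * 3600 + minutes * 60
    total_gb = TB_PER_PB * GB_PER_TB
    return {
        "elapsed_seconds": elapsed_seconds,
        # Average time spent per terabyte during the petabyte run.
        "seconds_per_tb": elapsed_seconds / TB_PER_PB,
        # Aggregate rate across the whole cluster.
        "gb_per_second": total_gb / elapsed_seconds,
        # Average rate per drive, in megabytes (1 GB = 1024 MB).
        "mb_per_second_per_drive": total_gb * 1024 / elapsed_seconds / DRIVES,
    }


if __name__ == "__main__":
    for name, value in petabyte_run_stats().items():
        print(f"{name}: {value:.2f}")
```

By this estimate the petabyte run averaged roughly 21 seconds per terabyte, which works out to about 48 GB sorted every second across the cluster, or around 1 MB per second per drive.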

With so much data being moved so quickly, there is always a chance of something going wrong, and in this case Google was on the ball. They used triple redundancy to guard against failures; fortunately, nothing did go wrong. You may ask whether triple redundancy would have slowed things down. Not necessarily, since storing the extra copies may not have been part of the sort that was timed (and the copies could still have been written after the sort had completed). This is purely speculation, though, as we don't know the full details of Google's setup.

All of this was a milestone for the technology world, demonstrating how effective our technology is becoming and how quickly data can be moved with the right hardware and knowledge. Google pushed the boundaries for various reasons, the main one being to further our knowledge so that we can build better systems that make feats like this seem small in the future.

Written based on: http://www.web-app.net/?Google_Breaks_Speed_Record_Sorting_1PB_Data_In_6_hours_%2F&action=viewnews&id=43

One Response to “Google Sorts 1 Petabyte in 6 Hours”

  1. Richard Walker on Dec 1st 2008 at 9:37 pm

    That really is a lot of data.

    I don’t think physical technology is the only thing that is progressing; our understanding of data, and how it is applied, is also progressing. That understanding is what drives the innovations that allow a company like Google to process a MONUMENTAL quantity of data into a state where it can be utilised in ways that really matter. It’s possible to waste an enormous amount of resources crunching data and make it only 2% more useful, but what’s even better is knowing exactly what to do with that data, using half the resources and making it a hundred times more useful.

    I think you’re on the money using “hardware” and “knowledge” in the same sentence… any idiot can throw a million dollars’ worth of hardware at a problem and then rejoice when the speed increases, but true genius will try to find a way to increase the efficiency of such a process without touching the infrastructure… and then dance all over the arrogant sod’s head when they get an extra hundred-fold increase using the same million dollars’ worth of tech.
