Monday 26 November 2012

Using Jigsaw Puzzles to explain Big Data




In Paul Zikopoulos et al.'s book Harness the Power of Big Data, jigsaw puzzles are used to explain how Big Data is used to solve problems. I found the analogy very effective and have summarized below what I consider to be the key points.

Big Data is like a puzzle that has been taken out of its box. Analytics is required to complete the picture. 

As you work on the puzzle, framing its edges, it starts to take shape and you gain a lot more context and understanding. Although the data has not changed (the number of pieces is the same), there is now metadata (relationships and patterns) to manage. So, as the puzzle gets resolved, there is more data associated with it, yet the problem becomes a lot easier to solve. Even if someone threw in pieces that did not belong to this set, the foreign pieces would be easier to identify, thanks to what we have learned about the puzzle's pattern and the relationships between its pieces.

How can we solve this puzzle more effectively?

We could split the work among a number of people (processes), asking each to perform a complementary but discrete task, e.g. finding all the pieces that contain part of a face. This is how Big Data platforms scale out work instead of relying on a limited number of processors. Because we can't anticipate how many pieces will make up the Big Data puzzle, the scale-out, machine learning and massively parallel processing capabilities of analytics systems are needed to frame the edges, sort and group the pieces, and discover the patterns.
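To make the scale-out idea concrete, here is a minimal Python sketch. It assumes pieces are modelled as a hypothetical `Piece` record carrying a set of visual features; the names `Piece`, `find_face_pieces`, `partition` and `scale_out_search` are illustrative and do not come from the book. The pile is partitioned into chunks, every worker runs the same discrete task on its own chunk, and the partial results are merged at the end. A thread pool stands in for the separate processes or machines a real Big Data platform would use.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Piece:
    # Hypothetical puzzle piece: an id plus a set of visual features.
    piece_id: int
    features: frozenset = field(default_factory=frozenset)


def find_face_pieces(chunk):
    """One worker's discrete task: return ids of pieces showing part of a face."""
    return [p.piece_id for p in chunk if "face" in p.features]


def partition(items, n_workers):
    """Split the pile into n roughly equal chunks, one per worker."""
    return [items[i::n_workers] for i in range(n_workers)]


def scale_out_search(pieces, n_workers=4):
    chunks = partition(pieces, n_workers)
    # Each worker scans only its own chunk; the pool runs the chunks concurrently.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = pool.map(find_face_pieces, chunks)
    # Merge step: combine the partial lists into one sorted answer.
    return sorted(pid for part in partial_results for pid in part)


# Usage: every third piece shows part of a face, the rest show sky.
pieces = [Piece(i, frozenset({"face"} if i % 3 == 0 else {"sky"})) for i in range(10)]
print(scale_out_search(pieces))  # [0, 3, 6, 9]
```

Adding more workers does not change the answer, only how the pile is divided, which is the property that lets this kind of work scale out as the number of pieces grows.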

How is this different from data in a warehouse?

Assume for a moment that all the border pieces came in a separate bag, all pieces of a certain color in their own bag, and so on. This is what data in a warehouse might look like, and it can be analyzed with traditional reporting techniques, which focus on data that is organized and has a known structure.
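The contrast can be sketched in a few lines of Python, under the assumption that each raw piece is a hypothetical record with a count of flat edges and a colour (the field names and helper functions are illustrative only). With warehouse-style bags, a report is a direct lookup on a known structure; with a single unsorted pile, every piece has to be inspected before the same bags even exist.

```python
from collections import defaultdict

# Warehouse-style data: the pieces arrive already sorted into labelled bags,
# so answering a question is a direct lookup on a known structure.
warehouse_bags = {
    "border": ["p1", "p4"],
    "blue": ["p2"],
    "red": ["p3"],
}


def warehouse_report(bags, bag_name):
    """Count the pieces in one pre-sorted bag (a simple, structured query)."""
    return len(bags.get(bag_name, []))


# Big Data-style input: one unsorted pile of raw pieces. Nothing is labelled,
# so every piece must be inspected and its group inferred from its attributes.
raw_pile = [
    {"id": "p1", "flat_edges": 1, "colour": "blue"},
    {"id": "p2", "flat_edges": 0, "colour": "blue"},
    {"id": "p3", "flat_edges": 0, "colour": "red"},
    {"id": "p4", "flat_edges": 2, "colour": "red"},
]


def sort_pile(pile):
    """Infer each piece's bag: any flat edge means border, otherwise group by colour."""
    bags = defaultdict(list)
    for piece in pile:
        bag = "border" if piece["flat_edges"] > 0 else piece["colour"]
        bags[bag].append(piece["id"])
    return dict(bags)


print(warehouse_report(warehouse_bags, "border"))  # 2
print(sort_pile(raw_pile) == warehouse_bags)  # True
```

Only after `sort_pile` has done the discovery work does the raw pile look like the warehouse bags, at which point the same simple report applies to both.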
