- Adopt Big Data platforms that enable data discovery and exploration. A Big Data platform should enable users to understand the variety of data and its sources, along with its quality and relationships to other data elements, while the data is in place. This process enables data scientists to create the right analytic model and computational strategy.
- Run analytics against the whole population of data, irrespective of the variety, structured or unstructured. The key point here is that organizational processes do not make a distinction between underlying data structures.
- Gain competitive advantage by running analytics faster than others. To do this, it might be necessary to run analytics on the same platform as data processing.
- The ability to run analytics in real time enables organizations to react faster to change and gain competitive advantage.
- Taking the first point further, it is useful to have access to a rich library of analytical functions and tools. This will enable organizations to analyze and publish results much faster.
- Tools that enable organizations to establish trust in data are critical for success. These tools should be able to address policy, security, governance, integration and lifecycle management requirements.
Tuesday, 27 November 2012
How can organizations boost their Big Data IQ?
Monday, 26 November 2012
Examples of Big Data Use Cases
IT for IT: e.g. using log analytics to gain better insights into how IT systems are running, and when and how they might break down.
Social Media Analytics: e.g. using Big Data to figure out what customers are saying about your brand (or your competitor's brand) and why they are saying it.
Customer Sentiments: e.g. minimizing customer churn by detecting loyalty decay and suggesting the next best action before contact is made with the customer.
Fraud Detection: e.g. using Big Data to detect cyclical fraud patterns.
Using Jigsaw Puzzles to explain Big Data
In the book Harness the Power of Big Data by Paul Zikopoulos et al., puzzles are used to explain how Big Data is used to solve problems. I found the analogy very effective and have summarized below what I consider to be the key points.
Big Data is like a puzzle that has been taken out of its box. Analytics is required to complete the picture.
As you work on the puzzle and frame its edges, the puzzle takes shape and you gain a lot more context and understanding. Although the data has not changed (same number of pieces), there is now metadata (relationships and patterns) to manage. So, as the puzzle gets resolved, we have more data associated with it, but the problem is a lot easier to solve. Even if someone threw in puzzle pieces that did not belong to this set, it would be easier to identify the foreign pieces because of the knowledge we have gained about the pattern of the puzzle and the relationships between its pieces.
How can we solve this puzzle more effectively?
We could split the work among a number of people (processes), asking each to perform a complementary, but discrete task, e.g. finding all pieces that have an element of a face. This is how Big Data platforms are able to scale out work, instead of relying on a limited number of processors. Because we can't anticipate how many pieces will make up the Big Data puzzle, the scale-out, machine learning and massively parallel processing capabilities of analytics systems are needed to frame the edge, sort and group the pieces, and discover the patterns.
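To make the idea of splitting the work concrete, here is a minimal sketch in Python. It is only an illustration of the scale-out pattern, not how any particular Big Data platform is implemented. The piece data, the function names and the use of threads as stand-ins for separate workers are all my own assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical puzzle pieces: (piece_id, is_edge, dominant_color)
PIECES = [
    (1, True, "blue"), (2, False, "green"), (3, True, "green"),
    (4, False, "blue"), (5, False, "skin"), (6, True, "blue"),
]

def classify_chunk(chunk):
    """One worker's discrete task: group its share of pieces by (position, color)."""
    groups = defaultdict(list)
    for piece_id, is_edge, color in chunk:
        groups[("edge" if is_edge else "inner", color)].append(piece_id)
    return groups

def split(items, n_workers):
    """Deal the pieces round-robin into n_workers chunks."""
    return [items[i::n_workers] for i in range(n_workers)]

def sort_pieces(pieces, n_workers=3):
    """Fan the chunks out to workers, then merge their partial groupings."""
    merged = defaultdict(list)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(classify_chunk, split(pieces, n_workers)):
            for key, ids in partial.items():
                merged[key].extend(ids)
    return {key: sorted(ids) for key, ids in merged.items()}

groups = sort_pieces(PIECES)
```

The result does not depend on how many workers share the job, which is the property that lets a platform add more workers as the puzzle grows.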
How is this different from data in a warehouse?
Assume for a moment that all border pieces came in a separate bag, all pieces of a certain color in their own bag, etc. This is what data in a warehouse might look like, and it can be analyzed with traditional reporting techniques, which focus on data that is organized and of a known structure.
When does a Big Data solution make sense?
1) Current platform capabilities are insufficient to process the amount of data needed
2) A new data structure is inconsistent with the schema of the existing analytics datastore, resulting in a mix of information types
3) Data transmission rates are too high for the analytics platform, or there is a need to ingest data quickly without knowing its schema ahead of time
4) There is a need to move from analysis of a short history to several years of data
Sunday, 25 November 2012
Data Warehouse in a Big Data World. What is the use case?
I like the analogy in the IBM Big Data Platform book titled "Harness the Power of Big Data".
It tells the story of the days long gone, when miners could easily spot nuggets or veins of gold with the naked eye. This made investing easier, as the value of the gold could be seen and the resources required to extract it could therefore be weighed against its perceived value. Using a Big Data analogy, we can consider this gold to be "high value-per-byte of data".
Assume for a moment that there is more gold nearby, but it is just not visible to the naked eye. Trying to find this gold is a bit more of a gamble and potentially more expensive. This would be "low value-per-byte of data" due to the challenges associated with finding gold not visible to the naked eye.
With the right equipment, however, it might be possible to economically process lots of dirt and keep the flakes of gold found. These flakes can be taken for processing and combined to make gold bars.
Back to our Big Data analogy...
In this scenario, it would make sense to keep all the dirt we could find (in a Big Data System), so that as new, economical dirt processing techniques emerge (Big Data Analytics on commodity systems) we would have an opportunity to extract the flakes of gold (value / insights) and store it for processing into gold bars (in our Data Warehousing system).
Hadoop is a Big Data batch system that allows users to store all data in its native business object format and get value out of it through massive parallel processing on commodity components.
A Data Warehouse is characterized by "speed-of-thought response times" requirements, where sustainable data with proven value is stored and delivered interactively.
It is therefore clear to see that in a Big Data world, there is value and a place for both Hadoop (Big Data) and Data Warehouse systems.
IBM's Hadoop system is InfoSphere BigInsights. For simplified Big Data Analytics, look no further than IBM PureData for Analytics powered by Netezza, and InfoSphere Warehouse for your Data Warehousing needs.
TerraEchos..."the next generation big-data analytics company"
I came across TerraEchos recently, an IBM business partner with a set of capabilities that illustrates the potential Big Data Analytics provides.
TerraEchos describes itself as a company that extracts meaningful information from massive amounts of complex streaming data on the fly, and simultaneously delivers insights, decisions and actions at the precise moment they are needed. This, in my view, is the promise of Big Data Analytics.
As you can see from the diagram to the right, their Streaming Analytics capability requires significantly less time to analyze data.
The Big Data Analytics capabilities the TerraEchos platform exhibits include the ability to analyze data irrespective of the amount, speed, or source of digital data, including input from any kind of cyber or physical sensor, in both structured and unstructured form. It is being positioned as suitable for organizations that require the processing, analysis, and visualization of multiple or complex streaming data sources.
One use case that caught my attention was a sophisticated sound classification system that can be used for real-time perimeter security control. Thousands of sensors buried underground can be used to collect and classify sounds. The system can differentiate between a whisper of the wind and a human voice, or between a human footstep and a running deer. It can even identify or affirm sounds that are difficult for humans to pick up.
TerraEchos has partnered with IBM to deliver these capabilities. IBM's PureData platform has been designed to simplify systems for delivering data services, making the deployment and analysis of Big Data more accessible.
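To illustrate what analyzing data "on the fly" means in practice, here is a minimal Python sketch of a streaming check. It is not TerraEchos' method, which I have no details of. The function name, the window size, the threshold and the sample readings are all assumptions chosen for illustration. Each reading is compared against the average of the last few readings as it arrives, so nothing has to be stored and queried later.

```python
from collections import deque

def rolling_alerts(readings, window=3, threshold=2.0):
    """Yield (index, value) for readings well above the recent average.

    Only the last `window` readings are kept in memory, so the check
    runs as the data arrives rather than after it has been stored.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if value > threshold * baseline:
                yield i, value
        recent.append(value)

# Hypothetical sensor amplitudes; the spike at index 4 stands out.
stream = [1.0, 1.2, 0.9, 1.1, 5.0, 1.0, 1.1]
alerts = list(rolling_alerts(stream))
```

Because `rolling_alerts` is a generator, it works equally well on an endless feed of readings; a real classifier would replace the simple threshold with something far more sophisticated.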
What are some of the trends creating opportunities for Big Data enriched analytics?
As I seek to understand Big Data and what it means to me as an IT practitioner, and to my clients as a consultant, I have found it useful to identify some of the trends that underpin this opportunity:
- The number of RFID tags used in supply chains, for tracking conference attendees, tracking luggage at airports, monitoring the temperature of food, monitoring structures, etc. has increased from about 1.3 billion in 2005 to over 30 billion by the end of 2011. Prices are predicted to drop below 1 US cent, making it possible to instrument even more systems.
- A flight from London to New York generates about 650 TB of data which could be proactively analyzed to gain new insights that could lead to improvements in safety and other efficiencies.
- Capturing every user's online clickstream would generate TBs of data that can be used to analyze and optimize the shopping experience.
- Data generated from smart meters can be used to better understand customer behaviour, align supply better to demand, and enable customers to make more informed decisions about their energy usage patterns.
- Take Facebook. The ability to analyze the whole data population, taking into account intents and sentiments, can offer tremendous value. Doing this is not without its challenges. Facebook, for example, experiences over 2.5 billion likes and more than 300 million photo uploads each day.
- ...and Twitter. Twitter's limit of 140 characters or fewer allows users to provide precise commentaries on a variety of subjects. The value to be derived from analyzing this data for sentiments and intents is significant.
- What about location based services (LBS)? Apparently, the average commuter in London has their photo taken about 150 times as they travel to work. Most of the mobile devices we carry with us have LBS enabled. This information can be used to further personalize interactions.
So, there is a lot of data being generated, and this will increase over time. Most of the data is not analyzed at all. Imagine being able to analyze not only data at rest, but also data in motion, as it hits the enterprise. Herein lie the tremendous opportunities of Big Data Analytics. The PureData System from IBM simplifies today's data requirements and enables clients to develop capabilities that help them gain insights that create a competitive advantage.
Big Data... What is it?
Without a doubt, Big Data is considered one of the most important IT trends of recent times, and many would say a business imperative. But what really is it?
In this article, I summarize insights from a text I have been reading, Harness the Power of Big Data.
- Big Data has nothing to do with the size of data, but rather with the ability to perform analytics on a broader spectrum of data, and gain a competitive advantage from the insights gained.
- The ability to instrument and capture data, not only data that is stored (at rest) but also data in motion (data being generated in real time), and perform analytics on the whole population of this data, is a key requirement.
- Big Data is typically defined using the 3 Vs: volume, variety and velocity. IBM added an additional V, veracity.
- Volume; Data growth between 2009 and 2011 is estimated at 80%. Six years from now, data is estimated to be around 35 zettabytes, equivalent to about 4 trillion 8GB iPods.
- Variety; This relates to the need to capture all data that could be useful for the decision making process, structured or otherwise.
- Velocity; This relates to the ability to analyze, process and gain insights from data as it becomes available.
- Veracity; Big Data can contain a lot of inaccurate / untrustworthy data. Veracity relates to the process of transforming Big Data into trustworthy data and discarding the noise.
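The iPod comparison in the Volume point is easy to check. The short Python sketch below does the arithmetic, assuming decimal units (1 zettabyte = 10^21 bytes and 1 GB = 10^9 bytes); the variable names are mine.

```python
ZETTABYTE = 10**21  # bytes, decimal (SI) definition
GIGABYTE = 10**9    # bytes, decimal (SI) definition

total_bytes = 35 * ZETTABYTE  # the estimated data volume
ipod_bytes = 8 * GIGABYTE     # capacity of one 8GB iPod

ipods = total_bytes // ipod_bytes
print(f"{ipods:,} iPods")  # 4,375,000,000,000 iPods
```

That works out at roughly 4.4 trillion devices, in line with the "about 4 trillion" figure quoted above.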
How my world has changed...
Within 3 years, I had graduated from the University of Manchester, and was now faced with the decision all graduates need to make. What next? Times were not so dissimilar to today's climate. In 1992, we were in the midst of a recession, and the construction industry was badly hit. I remember applying for hundreds of jobs, and the rejection letters kept coming in. I still have most of them in storage, although in retrospect, I would have found it difficult offering me a job from the resume I had submitted. Back then, we had to go to the library and/or the career service and solicit assistance. There was no Internet, no way of gaining advice or opinions online.
In any case, I was fortunate to have my scholarship extended. Faced with the realities of a challenging business environment, I had to make a tough decision: persist and follow my dream, or be pragmatic and ride the trend.
Back then, it was clear that Information Technology was becoming more and more important. I was not aware of email or the Internet at that time. I recall, in our lab assignments, being fascinated by the ability to send a message (can't recall the protocol used) to other lab computers on a local network. I made a decision to pursue a course in Computational Modeling and Finite Elements in Engineering Mechanics. There was another motivation for pursuing this. I believed I needed to select something that sounded "hard", in order to differentiate myself. This took me to Swansea in Wales. I thoroughly enjoyed my time in Swansea. It was a smaller city, by the sea, and had a university town feeling to it. Manchester was great, but a bit too overwhelming.
I subsequently spent 4 years in Swansea, having received a scholarship from the University to pursue a PhD in Computational Modeling. My research was focussed on developing software for modeling the casting process, a process used by foundries to manufacture metallic components. The Institute of Numerical Methods in Engineering at Swansea was famous for having invented key aspects of the Finite Element Method. You can find some of my papers on Google Scholar.
Upon graduating from the University of Wales in Swansea, I joined CD-adapco. CD-adapco is the world's largest independent CFD-focused provider of engineering simulation software, support and services. I was part of the development team delivering industrial-strength engineering simulation. This was mathematical optimization in Engineering. CD-adapco was a very exciting company to work for, with great engineers solving some of the most fascinating engineering challenges. While the work was exciting, my attention was fast shifting to the Internet, and a new technology called Java that held a lot of promise. Apparently, Java would be everywhere. You write once and run anywhere. It seems so long ago, but in my view, Java has indeed delivered on much of that promise. This was the trend that ultimately led me to IBM, where I took on the role of Consulting IT Specialist.
Today, with Big Data and Analytics, I find myself closer to my base, the place I began my professional career. I left the world of mathematical optimization in Engineering encouraged by an urge to be connected more with real people and with technologies that had a direct and visible impact on our daily lives. The advances made in Internet connectivity (bandwidth and speed) and the reduction in the cost of storage are enabling organizations to capture potentially useful data. Analytics is now considered a business imperative, providing organizations able to harness its potential a significant competitive advantage over their peers.
Having taken a break from blogging, I have returned, intending to use this blog to engage in conversations on Cloud Computing, Big Data, Mobile, Social Business, Enterprise Marketing Management and Commerce.