"Big Data" is here, and it is even more confusing than the "cloud." Incomplete and obsolete definitions of Big Data are in circulation, which confuses customers and vendors alike. How can we get past the confusion in the market and identify opportunities to successfully help companies implement Big Data?
Let's start with the definition of Big Data. More than a decade ago, META Group analyst Doug Laney introduced the challenges of data growth with the three Vs: volume, velocity and variety. This description is still the usual starting point for describing Big Data. However, as data has evolved over the past decade, we now have to ask ourselves what each of these Vs really means:
- Does "volume" refer to the size of a database? Or the size of a data object? Or the cumulative size of all data within an organization?
- Does "velocity" refer to the speed of data coming in? The speed of data acquisition? The speed of data processing? The speed of visualizing and charting data?
- And does "variety" refer to a variety of data types? Or a variety of sources? Or a variety of applications supported?
Although the answer could be "All of the Above," this isn't a helpful way to think about Big Data or to start designing potential product suites to help companies with their business needs. To be less clever and more straightforward, let's just say that Big Data is any sort of data that can't be stored or analyzed by a business's existing database or analytics solution. And let's start breaking out the concept of Big Data into different types of use cases based on the type of Big Data being used and the tools involved in supporting Big Data.
The increasing size of data volumes we can call Expanding Big Data. This refers to data that is outgrowing its original database or data warehouse. As this data grows from gigabytes to terabytes, companies typically seek Big Data appliances that combine hardware with either a Hadoop distribution (open-source software for distributed computing) or a data warehouse. Examples include the Dell Apache Hadoop solution, EMC Greenplum Data Computing Appliance, IBM Netezza, NetApp Hadoopler, Oracle Big Data Appliance and Teradata Extreme Data Appliance.
Networked and correlated data on a massive scale is Social Big Data. The most obvious examples of Social Big Data come from social media monitoring, where marketing and service professionals seek to monitor Facebook, Twitter and other social networks to identify brand sentiment and business opportunities. Although social media monitoring solutions such as Salesforce's Radian6, Marketwire's Sysomos and Crimson Hexagon can provide social monitoring and analytics, companies seeking to integrate social information with existing product and customer information need to bring this information in-house. In this case, data integration technologies become important in combining social data with CRM and other traditional enterprise data. This data integration consists of three steps, often abbreviated as ETL: