By Sid Probstein
Many people say the Big Data movement is about "unstructured data," but they are missing an important point. Log files and click streams are not really unstructured; their structures are just relatively unfamiliar and sometimes variable. What about the other sources that contain important information about customers and buying habits? These offer a wealth of value that is largely untapped today. Emails, open-ended survey questions, Web forms, call logs, discussion boards, SharePoint and Wiki sites: this is the true "unstructured content" that completes the picture of customer perception. It is also the best source for building a useful internal view of, for example, employee and partner behavior.
Unstructured data is not that different from structured data. It tells you what happened, and probably where. Unstructured content, on the other hand, explains WHY things happen. The inability to process and analyze this unstructured content is what prevents most of the Big Data players from presenting a comprehensive view. Aggregating and analyzing unstructured content is challenging for the following reasons:
- Variety. Human expression is shockingly diverse, varies by location and changes over time. Assembling the elements needed to analyze and mine unstructured content takes a lot of expertise and software.
- Velocity and volume. The rate, or "velocity," at which this information arrives is astounding and growing, and the speed at which it is expected to be analyzed is just as demanding.
- Complexity. Not all data is equal. When data systems consider only a single "slice" of data in isolation, they miss complex relationships between data types and produce more questions than answers. A complete informational picture requires analysis and correlation across the entire set.
Vendors in the unified information access (UIA) space have been focused on aggregating, enriching and analyzing unstructured content — as well as data — for years. These vendors provide technology that complements Big Data infrastructures by bringing unstructured content into the analysis framework and by presenting Big Data in context to remove information blind spots in business applications and automated business processes.
Recent IT market consolidation makes the Big Data arena an even more difficult space to compete in. Partnering with a UIA expert gives companies an advantage over the big-stack mega-vendors. A $3.5 billion market in software and services, UIA is one of the fastest-growing areas of IT, with a 24 percent CAGR according to IDC. UIA technology includes essential text analytics capabilities, such as entity, concept, key phrase and sentiment analysis, that help transform unstructured content into meaningful insight. This provides the critical "why" that allows companies to act swiftly and decisively, gaining a competitive edge with significantly faster time-to-value.
Sid Probstein is co-founder and CTO of Attivio, responsible for product and technology strategy, implementation and delivery. He has more than 20 years of experience in managing R&D organizations and delivering award-winning, high-value enterprise software and solutions.