Thursday, March 27, 2008

What’s all the hype around unstructured data?

Check out the DM Review article by Michael Gonzales, “Comprehensive Insight: Structured and Unstructured Analysis,” for an introduction to the topic of unstructured data and how to release its potential.

Michael offers insight into the challenges organizations face in dealing with unstructured data, as opposed to structured data, and into how technology has been evolving to help them better leverage such information assets.

Given the estimate that “more than 85 percent of all business information exists as unstructured data,” it is no wonder that technology vendors are putting more focus on extending their products to make this information more accessible and usable.

Although the article gives some interesting insight into the evolution of the technologies, it doesn’t explain how to actually integrate and store this data in a traditional data warehouse. How does one integrate unstructured data in the form of documents, images, video content, and other multimedia formats? Is such data even relevant to data warehouses and CPM processes? Perhaps not the content itself, but perhaps the metadata associated with it (e.g., the number of documents of type x, the average occurrence of y in videos of type z, the number of emails on subject w).
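To make the metadata idea more concrete, here is a minimal Python sketch that reduces a collection of documents to the kind of structured counts a warehouse could store. It is an illustration only, not an approach described in the article: the `Document` record, its fields, and the `summarize` function are hypothetical, and it assumes each item already has a type, a subject, and some extracted text (for video, that would mean something like a transcript).

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical record: in practice these fields would come from
    # whatever system captures and tags the unstructured content.
    doc_type: str  # e.g. "email", "report", "video"
    subject: str
    text: str      # extracted text, e.g. an email body or a video transcript


def summarize(docs, term):
    """Reduce documents to structured metadata instead of storing raw content."""
    # Number of documents of each type.
    docs_by_type = Counter(d.doc_type for d in docs)

    # Number of emails on each subject.
    emails_by_subject = Counter(d.subject for d in docs if d.doc_type == "email")

    # Average occurrences of `term` per document, grouped by document type.
    # Tokenization is a naive word match, assumed only for illustration.
    term_totals = defaultdict(int)
    for d in docs:
        words = re.findall(r"\w+", d.text.lower())
        term_totals[d.doc_type] += words.count(term.lower())
    avg_term_per_type = {t: term_totals[t] / n for t, n in docs_by_type.items()}

    return {
        "documents_by_type": dict(docs_by_type),
        "emails_by_subject": dict(emails_by_subject),
        "avg_term_per_type": avg_term_per_type,
    }
```

Each resulting dictionary could be loaded as rows of a small fact table keyed by document type or subject, while the raw content stays in a document store. How the text is extracted and tagged in the first place is the hard part, and this sketch simply assumes it has already been done.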

The vendors beginning to address the storage and integration of unstructured data into existing solutions are primarily the large database vendors. IBM (“Big Blue”), for one, touts support for analysis of unstructured data in its DB2 Warehouse 9.5 offering, and Microsoft promotes SQL Server 2008 as providing “a flexible solution for storing and searching unstructured data”.

Although advances in vendor technologies are providing ways to store such information so that it is accessible, is the typical organization ready to focus its resources on doing so? When so many organizations have yet to fully realize the benefits of provisioning traditional structured data (e.g., financial, operational, and customer data) to the business, one has to ask whether this should be a high priority yet.

What is your organization doing? Have you implemented any creative solutions? What are the demands from the business?
