Thursday, March 6, 2008

Building an enterprise semantic layer

I recently read a blog post on FastForward by Paula Thornton mentioning a Reuters technology infrastructure called Calais. Put simply, the purpose of Calais is to provide a service that automatically adds context to unstructured data. The unstructured data could be news articles, blog postings, or any other text-based content. The Calais web service identifies the entities, facts, and events described in the natural language of the text and returns a descriptive model of that content.
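To make the idea of "returning a descriptive model" concrete, here is a minimal sketch of the input and output shape. This is not the Calais API, and it does not do natural language processing: the `GAZETTEER` table, the `Entity` type, and the `extract_entities` function are all hypothetical, and entities are found by simple whole-word lookup against a hand-maintained list.

```python
import re
from dataclasses import dataclass, field

# Hypothetical gazetteer: a hand-maintained table of known entities and their types.
# A real service like Calais infers entities with NLP rather than a fixed lookup table.
GAZETTEER = {
    "Reuters": "Company",
    "Calais": "Technology",
    "FastForward": "Blog",
}

@dataclass
class Entity:
    name: str
    type: str
    offsets: list = field(default_factory=list)  # character offset of each mention

def extract_entities(text: str) -> list:
    """Return a simple descriptive model of `text`: each known entity, its type, and where it appears."""
    entities = []
    for name, etype in GAZETTEER.items():
        # \b anchors the match to whole words so "Calaisian" would not count as "Calais"
        offsets = [m.start() for m in re.finditer(r"\b" + re.escape(name) + r"\b", text)]
        if offsets:
            entities.append(Entity(name, etype, offsets))
    return entities

for e in extract_entities("Reuters built Calais. Calais tags news for Reuters."):
    print(e.name, e.type, e.offsets)
```

The point is the output: unstructured text goes in, and structured, typed entities come out, which downstream systems can index, link, and query.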

This matters, or at least is generating excitement, because many believe Web 3.0 will be built on this kind of automated inference from content. Web 2.0, by contrast, depends heavily on individuals applying context themselves through tagging, social sharing, and broader collaboration.

Most of the excitement and focus is on the user base currently driving the Web 2.0 movement. I have done most of my professional work inside the four walls of an enterprise, in areas such as data architecture, data integration, business intelligence, and corporate performance management, and from that vantage point I see incredible opportunity in something like Calais. Much of the effort in building internal measurement systems such as dashboards and management reporting applications goes into developing a single semantic layer of metadata. Even after all that effort, the semantic layer is typically inconsistent, because the organization cannot agree on a common business taxonomy that describes the enterprise.
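As a rough illustration of what a semantic layer does, the sketch below maps agreed business terms to the physical columns that carry them in different functional silos. Every system, table, and column name here (`sales`, `ar_ledger`, `net_rev`, and so on) is hypothetical, as are the `resolve` and `terms_for` helpers. The code only shows the structure; the hard part described above, getting the organization to agree on the terms in the first place, happens outside the code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceColumn:
    system: str  # the silo that owns the data, e.g. finance or sales
    table: str
    column: str

# Hypothetical semantic layer: each agreed business term maps to the physical
# columns that carry it in each silo. All names are illustrative only.
SEMANTIC_LAYER = {
    "Customer": [
        SourceColumn("sales", "accounts", "acct_name"),
        SourceColumn("finance", "ar_ledger", "bill_to"),
    ],
    "Revenue": [
        SourceColumn("finance", "gl_summary", "net_rev"),
    ],
}

def resolve(term: str) -> list:
    """Return the physical columns behind a business term, or fail if the taxonomy lacks it."""
    try:
        return SEMANTIC_LAYER[term]
    except KeyError:
        raise KeyError(f"'{term}' is not in the shared business taxonomy") from None

def terms_for(system: str) -> list:
    """List the business terms a silo contributes to, i.e. the context its raw columns carry."""
    return sorted(t for t, cols in SEMANTIC_LAYER.items()
                  if any(c.system == system for c in cols))

print(terms_for("finance"))
```

Keeping the mapping in one shared structure, rather than in each report's query logic, is what lets a dashboard in finance and a report in sales mean the same thing by "Customer".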

When we think about bringing Web 2.0 technologies (social networks, wikis, prediction markets, blogs) to the enterprise, the critical first step is building a metadata layer that puts descriptors on data, turning it into information and beginning to establish context. Plenty of unstructured data floats around organizations in the form of documents, internal web sites, email, and traditional knowledge centers. There is also an inordinate amount of structured data which, I would argue, has very little value when it lacks context: making sense of a report, dashboard, or data set often requires finding an info-worker who can explain it. This structured data typically lives, and is often trapped, within the silos of functions such as finance, marketing, sales, and operations. If an organization can infer the entities, facts, and events across all the structured and unstructured information its disparate groups capture, it will start to build the context needed to make better-informed decisions. Add the ability to link people through the organization's social network to share and disseminate that information, and the value of implementing a Web 2.0 platform across the enterprise becomes clear.
