Talk DIG: 2008-04-20

Friday, April 25, 2008

Follow Up On DW Appliance Post

There has been a response in the marketplace to the continued rise of the DW appliance. Last week, I posted on the DW and BI appliance market and some of the upstart players that have created this new market. Well, based on this article, the DW market leader has responded. Teradata is now providing entry-level products to compete with the upstarts. Even more interesting, they are now providing their database software for free for test purposes! Netezza had some interesting responses to the Teradata strategy, including that they are glad to see Teradata finally recognize the DW appliance space.

An interesting side note is that Teradata's marketing slogan on their website is "The Power of Choice". It is very Keanu Reeves from the Matrix trilogy. I wonder if there is some subliminal message there involving HP's DW solution, Neoview.

The Data Integration Challenges and BI (Part Two)

In Part One of this topic Brian introduced some of the key data integration challenges for a typical BI engagement and left off by highlighting some of the specific data integration challenges that included:

(i) Transformation of data that does not meet expected rules (contents of data elements and the validation of referential integrity relationships for example)

(ii) Mapping of data elements to some standard or common value

(iii) Cleansing of data to improve the data content (for example to cleanse and standardize name and address data) that extends the data transformation process a step further

(iv) Determining what action to take when those integration rules fail

(v) Ensuring proper ownership of the data quality process

In this second part of the article he goes a little deeper into several of these components.

Data transformations may be as simple as replacing one attribute value with another or validating that a piece of reference data exists. The extent of this data validation effort is dependent on the extent of the data quality issues and may require a detailed data quality initiative to understand exactly what data quality issues exist. At a minimum the data model that supports the data integration effort should be designed to enforce data integrity across the data model components and to enforce data quality on any component of that model that contains important business content. The solution must have a process in place to determine what actions to take when a data integration issue is encountered and should provide a method for the communication and ultimate resolution of those issues (typically enforced by implementing a solid technical solution that meets each of these requirements).
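As a rough illustration of the "what action to take when a rule fails" idea above, here is a minimal Python sketch. The rule set, the field names (`account`, `amount`) and the reference list of valid accounts are all assumptions made up for this example; a real solution would load its rules and reference data from the integration model.

```python
from dataclasses import dataclass, field

# Hypothetical reference data: the set of account codes we consider valid.
VALID_ACCOUNTS = {"1000", "2000", "4000"}


@dataclass
class ValidationResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def validate(records):
    """Split records into accepted rows and rejected rows with a reason.

    Rejected rows are kept, not dropped, so they can be reported to the
    data owner and recycled once the underlying issue is fixed.
    """
    result = ValidationResult()
    for rec in records:
        if rec.get("amount") is None:
            result.rejected.append((rec, "missing amount"))
        elif rec.get("account") not in VALID_ACCOUNTS:
            result.rejected.append((rec, "unknown account"))
        else:
            result.accepted.append(rec)
    return result


records = [
    {"account": "1000", "amount": 250.0},
    {"account": "9999", "amount": 10.0},   # fails the reference-data check
    {"account": "2000", "amount": None},   # fails the content check
]
result = validate(records)
for rec, reason in result.rejected:
    print(f"rejected {rec}: {reason}")
```

Keeping the reason alongside each rejected record is one simple way to support the communication and resolution step; other designs (a separate exception table, severity levels) would work just as well.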

As organizations grow via mergers and/or acquisitions, so too does the number of data sources, and eventually insight into overall corporate performance is lost. Integrating these systems upstream may not be feasible, so the BI application may be tasked with this integration dilemma. A typical example is the integration of financial data from what used to be multiple organizations or the integration of data from different geographical systems.

This integration is a challenge. It must consider (i) the number of sources to be integrated, (ii) commonality and differences across the different sources, (iii) requirements to conform attributes (such as accounts) to a common value while retaining visibility of the original data values and (iv) how to model this information to support future integration efforts as well as downstream applications. All attributes of all sources must be analyzed to determine what is needed and what can be thrown away. Common attribute domains must be understood and translated to common values. Transformation rules and templates must be developed and maintained. The data usage must be clearly understood, especially if the transformation is expected to lose visibility into the original data (for example, when translating financial data to a common chart of accounts).
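To make the "conform but retain the original" requirement concrete, here is a small Python sketch. The source system names (`ERP_A`, `ERP_B`), the account codes and the mapping table are invented for illustration only; in practice the mapping would be maintained as a transformation rule set rather than hard-coded.

```python
# Hypothetical mapping from (source system, local account) to a common account.
ACCOUNT_MAP = {
    ("ERP_A", "4010"): "REVENUE",
    ("ERP_B", "R-100"): "REVENUE",
    ("ERP_B", "E-200"): "TRAVEL_EXPENSE",
}


def conform(record):
    """Return a copy of the record with a conformed account added.

    The original account value is kept in its own field, so the
    translation never hides what the source system actually sent.
    """
    key = (record["source"], record["account"])
    conformed = dict(record)
    conformed["source_account"] = record["account"]
    conformed["common_account"] = ACCOUNT_MAP.get(key)  # None if unmapped
    return conformed


rows = [
    {"source": "ERP_A", "account": "4010", "amount": 100.0},
    {"source": "ERP_B", "account": "R-100", "amount": 40.0},
    {"source": "ERP_B", "account": "X-999", "amount": 5.0},  # no mapping yet
]
conformed_rows = [conform(r) for r in rows]

# Unmapped rows are candidates for the exception process, not silent loss.
unmapped = [r for r in conformed_rows if r["common_account"] is None]
print(len(unmapped), "row(s) need a mapping rule")
```

Two different source accounts roll up to the same common value here, yet each row still carries its own `source_account`, which is the visibility the paragraph above asks for.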

Making Information Accessible to Downstream Applications

With this data integration effort in place, it is important to understand the eventual usage for this information (downstream applications and data marts) and to ensure that downstream applications can extract data efficiently. The data integration process should be designed to support the requirements for integrating data, that is to support the data acquisition and data validation/data quality processes (validation, reporting, recycling, etc), to be flexible to support future data integration requirements and to support historical data changes (regardless of any reporting expectations that may require a subset of this functionality requirement). The data integration process should also be designed to support both the push and the pull of data. With that in mind, the data integration model should provide metadata that can assist downstream processes (for example, timestamps that indicate when data elements are added or modified), partition large data sets (to enable efficient extraction of data), apply reliable effective dating to model entities (to allow simple point-in-time identification) and be designed consistently.
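Effective dating is easier to see with a tiny example. The Python sketch below assumes each version of an entity carries a `valid_from` date and an exclusive `valid_to` date (with `None` meaning "still current"); those field names, the customer and the regions are hypothetical, and real models often use a far-future end date instead of `None`.

```python
from datetime import date

# Hypothetical history of one customer's sales region.
# valid_to is exclusive; None marks the current version.
HISTORY = [
    {"customer": "C1", "region": "East",
     "valid_from": date(2007, 1, 1), "valid_to": date(2008, 3, 1)},
    {"customer": "C1", "region": "West",
     "valid_from": date(2008, 3, 1), "valid_to": None},
]


def as_of(history, customer, when):
    """Return the version of the customer row in effect on a given date."""
    for row in history:
        if row["customer"] != customer:
            continue
        started = row["valid_from"] <= when
        not_ended = row["valid_to"] is None or when < row["valid_to"]
        if started and not_ended:
            return row
    return None  # no version was in effect on that date


print(as_of(HISTORY, "C1", date(2008, 2, 29))["region"])
print(as_of(HISTORY, "C1", date(2008, 3, 1))["region"])
```

Because the date ranges do not overlap and leave no gaps, a downstream process can ask "what did this look like on date X" with a single range test, which is the simple point-in-time identification mentioned above.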

The data integration process may at first seem a daunting process. But by breaking the BI architecture into its core components (data acquisition, data integration, information access), developing a consistent data model to support the data integration effort, establishing a robust exception handling and data quality initiative and finally implementing processes to manage the data transformation and integration rules, the goal of creating a solid foundation for data integration can be met.

Thursday, April 24, 2008

Seeing What You Want To See

Before reading further, please watch this 1-minute video:

I first saw this video in an article on “car vs. bicycle” traffic accidents, which noted that motorists almost always say “I never saw him” or “She came out of nowhere” after snapping the bike and/or rider like a twig. The video, produced as a public-service message by Transport for London, is a brilliant illustration of how people often fail to see a change in their surroundings because their attention is elsewhere.

I’ll save my post on bicycle safety laws for another day, and instead ask whether this same phenomenon applies in BI or Performance Management applications – do your reports, scorecards, and dashboards show you “what you want to see” or are they designed so that you can spot the “moonwalking bear” in your company’s performance?

Here’s just one example, from an article in USA Today, where data was potentially mis-interpreted and mis-used with disastrous results. Documents from Vioxx lawsuits indicate that Merck & Co. apparently downplayed evidence showing the pain-killer tripled the risk of death in Alzheimer’s-prone patients. Was Merck so anxious for the clinical trials to be successful that they “saw what they wanted to see” in the results? The company claims they did nothing wrong; we’ll see what the lawsuits ultimately determine.

Sometimes the data is good, but the visualization of that data is bad. Dashboards that look like this

are useful if you’re interested in variance analysis of high-level metrics. But the visualization (essentially a hardcopy report with traffic-lights) doesn’t help with the really interesting stuff, which are the drivers underneath those high-level metrics.

Advanced visualization methods are becoming more prevalent in dashboard designs. Over the next few weeks, we’ll look at some examples of visualization methods that can improve awareness of underlying data and help spot the moonwalking bears.

In the meantime, do you have examples of good techniques you’ve used or situations where better visualization of data would’ve helped improve performance?

Boston Globe and Central Intelligence Agency to Speak on Data Architecture at DIG 2008

I am pleased to announce that the Boston Globe and Central Intelligence Agency will be speaking on the topic of Creating One Version of the Truth at DIG 2008. In addition to these two organizations, Dan Power, an industry guru, will be presenting on master data management.

Dennis Newman will present how the Boston Globe has established an Enterprise Information Management (EIM) initiative to address data integrity challenges to support an enhanced customer reporting platform. The Globe established a set of common definitions for customer-centric metrics to deliver sales and marketing analytics.

David Roberts will discuss the Central Intelligence Agency’s approach to maximizing value from enterprise data assets. The CIA is highly dependent on quality data and information to drive decisions. David will present the CIA’s enterprise data architecture and the value that the intelligence community has gained by having a robust data platform.

Dan Power from Hub Solutions design will present on the importance of establishing a master data management initiative and platform. Dan has over 20 years in enterprise technology with a specialization in master data management (MDM), customer data integration (CDI) and enterprise data architecture.

Wednesday, April 23, 2008

Top 10 Reasons to Attend DIG 2008

For those either already registered or considering attending the DIG event in May, here are some serious and not-so-serious reasons to attend:

Architect your enterprise data warehouse and create your personal “one version of the truth” to rationalize your missing expense receipts from your DIG conference trip.

Have Jeffrey Ma sign your copy of “Bringing Down the House” after he talks about harnessing the power of rational, quantitative analysis to make smarter business decisions.

See case studies from Reliant Energy, Kelley Blue Book and Infosys on how their respective organizations are embedding advanced analytic techniques into their management processes to make better decisions.

Post to your blog, co-create your wiki, join the DIG social network and Twitter your impressions of DIG in real-time to become part of the Enterprise 2.0 phenomenon.

Meet Andrew McAfee, who coined the term “Enterprise 2.0”, as he discusses the value creation that organizations are realizing through Web 2.0 concepts and technologies.

Hear the Boston Globe and Central Intelligence Agency speak about leading practices to capture, organize and establish a common set of information assets to create one version of the truth.

Test your driver tree analysis techniques and advanced analytic dashboards at the card tables to pay for your conference registration.

Listen to Charles Fishman, award-winning journalist at Fast Company and author of “The Wal-Mart Effect”, speak about leading organizations that are using information to uncover insights about their customers and what it means to be a “fast company”.

Understand how Google, AT&T and the BBC are all leveraging Web 2.0 technologies such as blogs, wikis, social networks, tagging and prediction markets to drive mass collaboration inside and outside the organization.

Attend the only conference that combines the theories, concepts and real world practical examples of data architecture, analytics and Enterprise 2.0 in a single agenda.

100% committed

When thinking about E2.o and social computing, the question we must ask is: Are we getting the most that we can from the people in our business and from its community? Do we have 100% of their energy and imagination? Mobilizing people and teams is the aim of E2.o.

Several articles/posts from the past few weeks push on this very topic. The McKinsey Quarterly (registration required) published an article on Innovation lessons from Pixar which highlights how Brad Bird, Pixar’s Oscar-winning director, motivates his people by including them in the dialog. Fast Company spoke with Gartner researcher Tom Austin about how IT’s Not about the Technology but rather information technology is about leveraging the people. And Susan Scrupski posted her comments in SocialMediaToday on Corporate Antisocial Behavior: the Enemy is Us.

Each of these articles pushes us, in one way or another, to focus on the key driver behind business success – motivated teams of people. People are paramount to making things happen. E2.o tools are technologies that magnify and broadcast the culture that empowers people.

I see three opportunities in E2.o
1) Get people talking about the business - E2.o can highlight and build conversation around the “social objects” of a business. In my last post, I spoke about social objects. These are the things that allow people to connect and be in dialog. In business, these objects are things like business goals, customer wants, or new innovations.
2) Get the facts to the people - E2.o can reduce what I call the perception gap between what you think is happening in the business and what is actually happening in the business. Once facts are clear, true dialog and problem solving begins to occur. E2.o tools can integrate business intelligence into the mainstream business conversations. These same tools can then be used to solve problems collaboratively by tapping both experts' thoughts and front line operators' experiences into creative solutions.
3) Equip people with a contextual understanding of the business - E2.o can provide a more holistic understanding of a business. Through these tools, people are exposed to and vicariously taught about tangential yet pertinent topics beyond their specialized skills. This broader knowledge gives these folks the insight to act or respond with a systems-thinking mindset that is coherent with the overall business. In this way, people are more naturally prepared to act in a manner that supports the business and adapts it to its changing marketplace.

All three of these benefits of E2.o foster a more pronounced business culture - good or bad.

Do you agree?

Tuesday, April 22, 2008

Example Prediction Market for IT Projects

A colleague of mine forwarded me this great research paper on an example internal prediction market for an IT project. The research is not fully complete, but there were a few interesting nuggets that support the usage of internal markets for accurate predictions. This is the topic that Bo Cowgill from Google will be presenting at DIG next month.

The research paper highlights 4 key needs for an accurate prediction market

Ability to aggregate information and knowledge from individuals
Incentives to encourage active participation
Feedback to participants based on market prices
Anonymous trading

The results from the case study were quite positive. Acxiom Corporation was the test case and used the Inkling Markets software to host the market to predict 26 milestone events of an internal IT project. Two results jumped out at me. The market was 92% accurate on the milestone events (24 for 26) and had an 87% participation rate (33 participants). There was also a higher perceived level of collaboration as a project team, which had positive impact on the outcome.

The authors of the paper are Herbert Remidez, Jr. Ph.D. and Curtis Joslin from the University of Arkansas. Looking forward to seeing further output from the research.

The Price You Pay When Your Data is Questioned

I read an article in yesterday’s Wall Street Journal that I found interesting and relevant to DIG. There is always a debate on the value of having “one version of the truth” and the necessity of accuracy in corporate data. To date, that accuracy hasn’t been demanded of certain types of performance measurement, especially website visits. Well, comScore is paying the price through shareholder value and their stock price. The issue stems from the accuracy of the “clickstream” data that comScore, like their competitor Nielsen, collects to track the popularity of websites. Google announced that advertising clicks grew by 20%, while comScore reported only 1.8% growth. Well, who is right?

This data is critical for marketers when deciding where to spend their ad dollars. You should read the full article to gain a full appreciation of the entire story, but here are a few snippets that are relevant to the importance of having “one version of the truth”.

Sarah Fay, chief executive of both Carat and Isobar US, ad companies owned by Aegis Group, said “We have not expected the numbers to be 100%”. It’s good to see that no expectations were being set out of the gate. Not sure this would fly when discussing something like revenue for an organization.

The article goes on to point out that comScore and Nielsen data don’t always match up. “To complicate matters, disparities between comScore and Nielsen data are common, as the two companies use different methodologies to measure their audience panels.” This isn’t something we don’t hear inside the four walls of a corporation for something like a measure’s calculation rule.

Brad Bortner, an analyst with Forrester Research points out “There is no truth on the Internet, but you have two companies vying to say they are the truth of the Internet, and they disagree.”

And finally, my favorite quote in the article came from Sean Muzzy, senior partner and media director at digital ad agency http://www.ogilvy.com/neo/. “We are not going to look at comScore to determine the effectiveness of Google. We are going to look at our own campaign-performance measures”. This would be the equivalent of “if you don’t like the results, try a different measure.”

I have always wavered on the need for accurate data for certain types of measurement, especially something like clickstream analysis. I guess that wavering has now ended, and clickstream data falls into the same camp as the other types of data that require precision and accuracy.

Monday, April 21, 2008

DIG Bits & Bytes

I had a couple of interesting articles come across the “ether” that I felt were newsworthy enough to post and comment on. They each hit on the themes of DIG: Data IN, Information OUT and Knowledge AROUND (The articles are not listed in that order):

Outsourcing your data warehouse

In this article on TWeb by Jannie Strydom, the idea of outsourcing an organization’s data warehouse is proposed. The primary drivers are around lack of skills to properly maintain and keep the warehouse relevant to the business. As much as I agree with outsourcing the components of a warehouse that are repetitive and process oriented (loading data, maintaining production processes, fixing errors), it is a slippery slope to outsource aspects that are critical to meet the business needs. A strong understanding of an organization’s business model and needs should be weighed heavily against the value gained (typically cost savings) by outsourcing certain aspects of a data warehouse, especially those that can help facilitate better management of performance.

More on Enterprise Mashups

I made a post a few weeks back on BI and enterprise mashups. This news story out of the O’Reilly Web 2.0 Expo caught my eye because of the mention of integration with Excel. In particular, the article discusses going beyond the geographic mashups being done with Google Maps and “mashing” multiple external data sources for enhanced analytics inside of Excel. For example, pulling competitor data directly into your own organization’s performance (I am working at a client where modeling this in their data mart has become a bit of a challenge). Two software companies mentioned in the article that provide these types of mashup services and software are Kapow Technologies and JackBe Corporation. I have not kicked the tires on these two products but they sound extremely valuable to a business user trying to consume multiple and different types of data sources into a single “view”. I would like to understand how these products might fit into an overall information architecture from a consistency and “one version of the truth” perspective.

Business Intelligence and My Carbon Footprint

This one builds on the “everything must be environmental” and “green movement”. At the WTTC Global Travel & Tourism Summit in Dubai, Travelport announced their new Carbon Tracker reporting tool. It is designed for travel agencies and corporations to track their carbon footprint when it comes to corporate travel. It provides different analytic views using standard environmental calculations. The reporting tool includes travel budget and environmental impact analysis and comparison to other modes of travel (car, bus, train, flying). There is a slick product overview with screenshots on the Travelport website. I am considering using this tool to calculate the carbon footprint of DIG in Las Vegas versus another location in the US. I may need to recommend that the speakers ride bicycles to the event to reduce our environmental impact. I know Mark Lorence would be up for it (Mark is an avid bicycling enthusiast who continues to educate me on the nuances of professional cycling…we have a prediction market already established on his first post that links Lance Armstrong to Business Intelligence). I may need to start referring to the first theme of DIG as "An Inconvenient One Version of the Truth".