Tuesday, April 22, 2008
The Price You Pay When Your Data is Questioned
This data is critical for marketers when deciding where to spend their ad dollars. You should read the full article to gain a full appreciation of the entire story, but here are a few snippets that are relevant to the importance of having “one version of the truth”.
Sarah Fay, chief executive of both Carat and Isobar US, ad companies owned by Aegis Group, said, “We have not expected the numbers to be 100%”. It’s good to see that no expectations were being set out of the gate. I am not sure this would fly when discussing something like revenue for an organization.
The article goes on to point out that comScore and Nielsen data don’t always match up. “To complicate matters, disparities between comScore and Nielsen data are common, as the two companies use different methodologies to measure their audience panels.” We hear the same thing inside the four walls of a corporation about something like a measure’s calculation rule.
Brad Bortner, an analyst with Forrester Research, points out, “There is no truth on the Internet, but you have two companies vying to say they are the truth of the Internet, and they disagree.”
And finally, my favorite quote in the article came from Sean Muzzy, senior partner and media director at digital ad agency http://www.ogilvy.com/neo/. “We are not going to look at comScore to determine the effectiveness of Google. We are going to look at our own campaign-performance measures”. This would be the equivalent of “if you don’t like the results, try a different measure.”
I have always wavered on the need for accurate data for certain types of measurement, especially something like clickstream analysis. I guess that wavering has now fallen to the side of the camp with the other types of data that require precision and accuracy.
Monday, April 21, 2008
DIG Bits & Bytes
Outsourcing your data warehouse
In this article on TWeb by Jannie Strydom, the idea of outsourcing an organization’s data warehouse is proposed. The primary drivers are around lack of skills to properly maintain and keep the warehouse relevant to the business. As much as I agree with outsourcing the components of a warehouse that are repetitive and process oriented (loading data, maintaining production processes, fixing errors), it is a slippery slope to outsource aspects that are critical to meet the business needs. A strong understanding of an organization’s business model and needs should be weighed heavily against the value gained (typically cost savings) by outsourcing certain aspects of a data warehouse, especially those that can help facilitate better management of performance.
More on Enterprise Mashups
I made a post a few weeks back on BI and enterprise mashups. This news story out of the O’Reilly Web 2.0 Expo caught my eye because of the mention of integration with Excel. In particular, the article discusses going beyond the geographic mashups being done with Google Maps and “mashing” multiple external data sources for enhanced analytics inside of Excel. For example, pulling competitor data directly into your own organization’s performance reporting (I am working at a client where modeling this in their data mart has become a bit of a challenge). Two software companies mentioned in the article that provide these types of mashup services and software are Kapow Technologies and JackBe Corporation. I have not kicked the tires on these two products, but they sound extremely valuable to a business user trying to consume multiple and different types of data sources into a single “view”. I would like to understand how these products might fit into an overall information architecture from a consistency and “one version of the truth” perspective.
Business Intelligence and My Carbon Footprint
This one builds on the “everything must be environmental” and “green movement” themes. At the WTTC Global Travel & Tourism Summit in Dubai, Travelport announced their new Carbon Tracker reporting tool. It is designed for travel agencies and corporations to track their carbon footprint when it comes to corporate travel. It provides different analytic views using standard environmental calculations. The reporting tool includes travel budget and environmental impact analysis and comparison to other modes of travel (car, bus, train, flying). There is a slick product overview with screenshots on the Travelport website. I am considering using this tool to calculate the carbon footprint of DIG in Las Vegas versus another location in the US. I may need to recommend that the speakers ride bicycles to the event to reduce our environmental impact. I know Mark Lorence would be up for it (Mark is an avid bicycling enthusiast who continues to educate me on the nuances of professional cycling…we have a prediction market already established on his first post that links Lance Armstrong to Business Intelligence). I may need to start referring to the first theme of DIG as "An Inconvenient One Version of the Truth".
Friday, April 18, 2008
The Data Integration Challenge and BI (Part One)

The goal of any BI solution should be to provide accurate and timely information to the User organization. The User must be shielded from any complexities related to data sourcing and data integration. It is up to the development team to ensure that they deliver a robust architecture that meets these expectations.
The most important aspect of any BI solution is the design of the overall BI framework that encompasses data acquisition, data integration and information access. There are challenges in designing each of these components correctly but often data integration is the one that is the most complex yet important component of the BI solution that must be developed. A solid architecture is required to support the data integration effort (see Claudia Imhoff’s article on why a Data Integration Architecture is needed).
So what are some of the key challenges and considerations that should be addressed when thinking about data integration?
First, unless your project is tasked with building “one off” or departmental type solutions, it is important to separate the integration component of the architecture from the analytical component (this is the point where some readers may disagree, but separation of these components allows for a more flexible and scalable architecture over time, a must for any Enterprise solution today). With this rule in place, the data integration team can focus on what they do best (data integration) and the analytical team can focus on what they do best (designing for reporting and analytics).
With this structure in place the data integration team has some tough challenges ahead of them that must be addressed:
(i) Identifying the correct data sources of information
(ii) Identifying and addressing data quality and integration challenges
(iii) Making information accessible to downstream applications
Before data can be integrated it must be identified and sourced. As simple as this sounds, it is not unusual for an Organization to have multiple sources of the same data. It is important to identify the data source that is the true ‘system of record’ for that information, contains the elements that support current information requirements and can extend to support future information requirements. Choose the data source that makes the most sense and not the one that is the easiest to get to.
Once the appropriate sources of information have been identified, the integration team must then determine how best to access that information. The team must identify how often the data needs to be extracted (once a day, once a week, etc.) and how the data will be extracted (push or pull, direct or indirect). The frequency should be based on future as well as current requirements for information. It is easier to build only for what is required today, but the design should also account for what may be planned or needed tomorrow. Data volumes should be a consideration when determining the optimum acquisition method, and often a more frequent data sourcing process may be beneficial irrespective of the final reporting expectations (this is a good example of where separation of integration and analytics has merit, since the data integration layer can be designed for optimum integration without impact to the requirements of the analytic environment).
Getting at the data itself is often more politically challenging than technically challenging. Source data may exist in internally developed as well as packaged and externally supported applications.
Pull paradigms are good when:
(a) Tools are available that can connect directly to the source systems (that’s a given) and when needed provide options for change data capture mechanisms
(b) Access to the systems is allowed; just because you can connect to a source system does not mean that the IT organization will allow that to happen – these solutions can be invasive and direct access may not be welcomed or allowed (so make sure you consider this)
(c) Source volumes are small and all data is being extracted in full or there is a means to identify new or changed records. The latter is a definite consideration when data volumes are large but there must be a means to identify these changes and it must be reliable and efficient else source invasiveness becomes a concern (especially if the source system must perform tuning to support these downstream processes)
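As a rough illustration of point (c), here is a minimal sketch of one common way to identify new or changed records on a pull: keep a "high-water mark" and only ask the source for rows modified after it. Everything named here is hypothetical (the customers table, the updated_at column, the in-memory SQLite database standing in for a source system), and the approach assumes the source reliably maintains a last-modified timestamp.

```python
import sqlite3


def pull_changes(conn, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark only if something was actually extracted.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# Hypothetical source table; ISO-8601 timestamps compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Acme", "2008-04-16T09:00:00"),
        (2, "Globex", "2008-04-17T14:30:00"),
        (3, "Initech", "2008-04-18T08:15:00"),
    ],
)

# Only rows changed since the previous extract are pulled.
changed, watermark = pull_changes(conn, "2008-04-17T00:00:00")
print([row[0] for row in changed], watermark)
```

A filter like this is only as reliable as the timestamp behind it, and it will not see rows that were physically deleted at the source, which is part of why the reliability and efficiency of the change-identification mechanism matters so much.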
Push paradigms (even when enterprise tools for pulling data are available) are good options when:
(a) Data with the desired granularity, frequency and content is readily available in a different format and can be leveraged
(b) Direct access to source systems is not an option and/or IT prefers to source the data that is needed. In this scenario a solution for change data capture may need to be developed
(c) It is easier for IT to identify the data to be pulled and provide it instead of downstream applications pulling the data directly
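When data is pushed as full extracts, one way the change data capture mentioned in (b) might be developed is by comparing the newly delivered snapshot against the previous one. The sketch below is only an illustration under that assumption; the record layout, the "id" business key and the function names are all made up for the example.

```python
import hashlib


def row_hash(row):
    """Fingerprint a record so changed content can be detected cheaply."""
    payload = repr(tuple(sorted(row.items())))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current, key="id"):
    """Classify business keys as inserted, updated or deleted between two full extracts."""
    prev = {r[key]: row_hash(r) for r in previous}
    curr = {r[key]: row_hash(r) for r in current}
    return {
        "inserted": sorted(k for k in curr if k not in prev),
        "updated": sorted(k for k in curr if k in prev and curr[k] != prev[k]),
        "deleted": sorted(k for k in prev if k not in curr),
    }


# Hypothetical extracts delivered on two consecutive days.
yesterday = [
    {"id": 1, "name": "Acme", "city": "Boston"},
    {"id": 2, "name": "Globex", "city": "Chicago"},
]
today = [
    {"id": 1, "name": "Acme", "city": "Denver"},
    {"id": 3, "name": "Initech", "city": "Austin"},
]

changes = diff_snapshots(yesterday, today)
print(changes)
```

The trade-off is that the integration layer has to retain the prior snapshot (or at least its keys and hashes), but nothing is asked of the source system beyond delivering the file.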
Before determining the best choice for your project you also need to consider the limitations of the tools available within your environment.
Identifying and Addressing Data Integration Challenges
Once the method for data acquisition has been addressed, data must be cleansed, transformed and integrated to support downstream applications such as data marts. So what does this mean and what are the potential challenges?
The size of the data integration effort is dependent on several factors: (i) the number of data sources being integrated and the number of source systems from which data is provided, (ii) the quality of data within each of those systems, (iii) the quality of data and integration across those source systems, and (iv) the Organization’s priority for improving data quality in general. When integrating data the Organization has the choice of enforcing data quality during the integration process or ignoring it.
So what are some of the key challenges for a typical data integration effort? These typically include:
(i) Transformation of data that does not meet expected rules (contents of data elements and the validation of referential integrity relationships for example)
(ii) Mapping of data elements to some standard or common value
(iii) Cleansing of data to improve the data content (for example to cleanse and standardize name and address data) that extends the data transformation process a step further
(iv) Determining what action to take when those integration rules fail
(v) Ensuring proper ownership of the data quality process
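To make a few of these concrete, here is a small hypothetical sketch that touches (i), (ii) and (iv): it maps a data element to a standard value, checks a referential integrity rule, and routes records that fail to a reject list rather than loading them. The lookup table, the set of known customer keys and the choice to reject (instead of, say, defaulting or halting the load) are assumptions made for illustration, not a recommendation.

```python
# Hypothetical lookup that maps source spellings to a standard state code.
STATE_MAP = {
    "calif.": "CA", "california": "CA", "ca": "CA",
    "n.y.": "NY", "new york": "NY", "ny": "NY",
}

# Hypothetical set of customer keys already present in the integrated layer.
KNOWN_CUSTOMER_IDS = {100, 101}


def integrate(records):
    """Standardize and validate records; failures are set aside with reasons."""
    loaded, rejected = [], []
    for rec in records:
        errors = []
        state = STATE_MAP.get(str(rec.get("state", "")).strip().lower())
        if state is None:
            errors.append("unmapped state")
        if rec.get("customer_id") not in KNOWN_CUSTOMER_IDS:
            errors.append("unknown customer_id")
        if errors:
            # Action on rule failure: keep the record and why it failed,
            # so a data owner can review it.
            rejected.append({"record": rec, "errors": errors})
        else:
            loaded.append({**rec, "state": state})
    return loaded, rejected


orders = [
    {"order_id": 1, "customer_id": 100, "state": "Calif."},
    {"order_id": 2, "customer_id": 999, "state": "NY"},
    {"order_id": 3, "customer_id": 101, "state": "Texas"},
]
loaded, rejected = integrate(orders)
print(loaded)
print(rejected)
```

Keeping the failure reasons next to each rejected record also gives whoever owns the data quality process, item (v), something specific to act on.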
So what are some of the challenges and considerations within each of these areas? Tune in to Part Two of this article when we will address some of these considerations as well as addressing the need for making information easily accessible downstream of the integration process.
Thursday, April 17, 2008
Politics: There's No "I" in "DIG"
What do sports, politics and DIG have in common? Well, of course, it’s prediction markets. There’s Protrade, and Tradesports and the Iowa Electronic Markets and, well,
In the last few years, some individuals and organizations have begun to make a dent in this space; notable among them is Get Out the Vote: How to Increase Voter Turnout, by a couple of Yale professors who base their recommendations on actual research. More recently, Brendan Nyhan at Duke reports on his blog the founding of “The Analyst Institute,” which states as its mission “for all voter contact to be informed by evidence-based best practices. To ensure that the progressive community becomes more effective with every election, we facilitate and support organizations in building evaluation into their election plans.”
It’s not as if there isn’t incentive to win, and it’s not as if there’s a lack of interested funding. So why is politics behind the curve on data and analytics? Is there a rational (or irrational) belief that politics need to be managed by gut? Or are there structural reasons? Or am I mistaken in thinking politics is late to the game, and that McCain is hiding the next Billy Beane somewhere on the Straight Talk Express?
Wednesday, April 16, 2008
Can I get the Consumer Reports for these Appliances?
I wanted to pull together a quick summary of the Data Warehouse and Business Intelligence Appliance space. It is a continually maturing space with a set of strong vendors still fighting for market share. Teradata, DATAllegro, Netezza, NeoView from HP and Dataupia all provide solutions that combine hardware, operating system and database software into a single unit. Calpont, Kognitio, Vertica and ParAccel provide software-only, platform-independent solutions. The benefits of these appliance solutions include a reduced total cost of ownership, increased performance through massively parallel systems, reduced administration (and fewer database administrators), and high availability and scalability. Where these solutions typically sell is through a proof of concept, where a customer has a very specific performance issue for which the vendor can show proven results. Industries that collect massive amounts of transaction data, such as retail or web clickstream, are inherent sweet spots for DW appliances. If you aren’t familiar with the DW Appliance space, I would recommend taking a look at a series of articles from Krish Krishnan (intro, part 1, part 2) on the topic. I also came across this blog posting, fact or fiction, that unwinds some of the misconceptions about the DW appliance space.
Another interesting area that has followed the DW Appliance trend is Business Intelligence. I have come across fewer vendors here, but Celequest (acquired by Cognos) and Ingres Icebreaker are two that provide bundled hardware, operating system, database software and reporting tools. Business Objects has also partnered with Netezza to provide a single point solution in data warehousing and business intelligence. All the solutions adhere to standards, which allows for integration with the majority of SQL-based BI vendor tools.
Tuesday, April 15, 2008
Deregulation of Utility Computing and my Gmail account
Google last week announced their foray into utility computing with their Google App Engine. Google is opening up their computing horsepower to allow scalable, web-based application development for anyone. And it’s free. They aren’t the only ones providing utility computing. Amazon has been providing a similar platform called Elastic Compute Cloud (EC2) for a “resizable” computing capacity cloud and their Simple Storage Service (S3) for inexpensive storage services.
The idea of utility computing has been batted around as an idea for a while, but Nicholas Carr’s book “The Big Switch” makes an interesting correlation with the switch manufacturers made 100 years ago from providing their own electricity to tapping into the expanding power grids. Carr makes a compelling case that this is the direction of computing for businesses and consumers.
If you aren’t familiar with Carr, he is a bit of a lightning rod in the IT industry based on his controversial point of view of IT. He was just named #93 on the Ziff Davis Most Influential People in IT. Not everyone necessarily agrees with Carr’s view of IT, but he has forced the industry to take a look in the mirror and question the value being provided.
So you may be asking yourself, what does this post have to do with DIG and why did I start on the topic of utility computing? Honestly, there is no direct relationship beyond the fact that I have been having some “constructive” budgetary discussions with a client around disk storage sizing. When I got home tonight I said to myself, “This has to be easier”, thus my research on utility computing. Why is it that I can get 6.6 gigabytes of free storage from Google for my email but not enough storage for a data mart? (btw, this is a hypothetical question and doesn’t need to be answered via a comment).
(sigh)
Monday, April 14, 2008
Social Objects for Business Conversation
MacLeod defines Social Objects this way: “The Social Object, in a nutshell, is the reason two people are talking to each other, as opposed to talking to somebody else. Human beings are social animals. We like to socialize. But if you think about it, there needs to be a reason for it to happen in the first place. That reason, that "node" in the social network, is what we call the Social Object.”
Honestly, I don’t know if I had ever thought about the idea before seeing these posts – but it seems to make a lot of sense. It’s certainly true for me. There exists some social object in the mix most every time I talk to my friends or colleagues. It might be a movie. It could be a baseball game. It could be a friend’s job situation. It could be a finance report. The bottom line is that “social objects” are the basis for most all of my conversations. It’s a fascinating concept if you think about it.
The idea got me thinking. What are the typical social objects within business conversation? What social objects attract the most attention and could become the basis for a robust conversation? It would seem to me that these objects could be the obvious building blocks of a productive corporate social network or social computing application. Here’s my informal list from a 5 minute brainstorm. I have put an “x” next to the Top 10 from my perspective!
Variable compensation plans
Performance objectives
Market factors
Executive Leaders (x)
Management (x)
Strategy
Mission statement
Values
Culture (x)
Norms (x)
Office environment (x)
Corporate Communications (x)
Brand
Public Advertising
Performance Review Process (x)
Benefits Package
Finance function
IT function (x)
Budgeting Process (x)
Key initiatives (x)
Budget variance explanations
Forecasting assumptions
Customer needs
Customer experience
Parking
Lunch destination
MacLeod goes on to say, “The thing to remember is, Human beings do not socialize in a completely random way. There’s a tangible reason for us being together, that ties us together. Again, that reason is called the Social Object. Social Networks form around Social Objects, not the other way around.”
I wonder. I just wonder what it takes to influence and/or transform the core social objects within our business conversations? At first glance, I would suspect that we would be better off if the top few objects in our business dialog were the following:
Customer needs
Customer experience
Strategy
Performance objectives
As an aside, in thinking back to my many years of consulting, I must say that there is only one client where I remember hearing this last set of “social objects” integrated into almost every conversation. It was WalMart! I wonder what that is saying?
Can you think of other social objects that I have missed? What are your thoughts on the topic? Please drop me your comments.