There has been a response in the marketplace to the continued rise of the DW appliance. Last week, I made a post on the DW and BI appliance market and some of the upstart players that have created this new market. Well, based on this article, the DW market leader has responded. Teradata is now providing entry-level products to compete with the upstarts. Even more interesting, they are now providing their database software for free for test purposes! Netezza had some interesting responses to the Teradata strategy, including that they are glad to see Teradata finally recognize the DW appliance space.
An interesting side note is that Teradata's marketing slogan on their website is "The Power of Choice". It is very Keanu Reeves from the Matrix Trilogy. I wonder if there is some subliminal message there aimed at HP's DW solution, Neoview.
Friday, April 25, 2008
The Data Integration Challenges and BI (Part Two)
In Part One of this topic, Brian introduced some of the key data integration challenges for a typical BI engagement and left off by highlighting several specific challenges:
(i) Transformation of data that does not meet expected rules (for example, the contents of data elements and the validation of referential integrity relationships)
(ii) Mapping of data elements to some standard or common value
(iii) Cleansing of data to improve the data content (for example, cleansing and standardizing name and address data), which extends the data transformation process a step further
(iv) Determining what action to take when those integration rules fail
(v) Ensuring proper ownership of the data quality process

In this second part of the article, he digs a little deeper into several of these components.
Data transformations may be as simple as replacing one attribute value with another or validating that a piece of reference data exists. The extent of this data validation effort depends on the extent of the data quality issues and may require a detailed data quality initiative to understand exactly what issues exist. At a minimum, the data model that supports the data integration effort should be designed to enforce data integrity across the data model components and to enforce data quality on any component of that model that contains important business content. The solution must have a process in place to determine what actions to take when a data integration issue is encountered, and it should provide a method for the communication and ultimate resolution of those issues (typically enforced by implementing a solid technical solution that meets each of these requirements).
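As a rough illustration of this kind of rule-based transformation and exception handling (not taken from the article; the record layout, reference data, and replacement rules below are hypothetical), a minimal Python sketch might look like this:

```python
# Minimal sketch of rule-based transformation and validation.
# The record layout, reference data, and replacement rules are hypothetical.

valid_country_codes = {"US", "CA", "GB"}                 # reference data
status_replacements = {"A": "ACTIVE", "I": "INACTIVE"}   # simple value mapping

def transform(record):
    """Apply simple transformations and validations to one source record.

    Returns (clean_record, errors). A non-empty error list means the record
    should be routed to an exception process for review and resolution.
    """
    errors = []
    clean = dict(record)

    # Transformation: replace one attribute value with another.
    clean["status"] = status_replacements.get(record.get("status"), record.get("status"))

    # Validation: confirm that a piece of reference data exists.
    if record.get("country_code") not in valid_country_codes:
        errors.append(f"Unknown country_code: {record.get('country_code')!r}")

    return clean, errors

records = [
    {"id": 1, "status": "A", "country_code": "US"},
    {"id": 2, "status": "I", "country_code": "ZZ"},  # fails the reference check
]

loaded, exceptions = [], []
for rec in records:
    clean, errors = transform(rec)
    if errors:
        exceptions.append({"record": rec, "errors": errors})  # communicate and resolve
    else:
        loaded.append(clean)

print(loaded)
print(exceptions)
```

The design point is that records failing a rule are not silently dropped or loaded; they are routed to an exception structure where they can be reported, communicated, and eventually recycled once resolved.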
As organizations grow through mergers and acquisitions, the number of data sources grows as well, and with it the lack of insight into overall corporate performance. Integrating these systems upstream may not be feasible, so the BI application may be tasked with this integration dilemma. A typical example is the integration of financial data from what used to be separate organizations, or the integration of data from systems in different geographies.
This integration is a challenge. It must consider (i) the number of sources to be integrated, (ii) the commonality and differences across those sources, (iii) requirements to conform attributes (such as accounts) to a common value while retaining visibility into the original data values, and (iv) how to model this information to support future integration efforts as well as downstream applications. All attributes of all sources must be analyzed to determine what is needed and what can be discarded. Common attribute domains must be understood and translated to common values. Transformation rules and templates must be developed and maintained. The intended usage of the data must be clearly understood, especially if the transformation is expected to lose visibility into the original values (for example, when translating financial data to a common chart of accounts).
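To make the chart-of-accounts example concrete, here is a minimal, hypothetical Python sketch of conforming source account codes to a common value while retaining the original values (the source systems and account mappings are invented for illustration):

```python
# Minimal sketch of conforming source account codes to a common chart of
# accounts while retaining the original values. The source systems and
# mappings are hypothetical.

CHART_OF_ACCOUNTS_MAP = {
    ("ERP_EU", "7001"): "TRAVEL_EXPENSE",
    ("ERP_US", "640"):  "TRAVEL_EXPENSE",
    ("ERP_US", "610"):  "OFFICE_SUPPLIES",
}

def conform(row):
    """Map a source-specific account to the common chart of accounts.

    The original source system and account code stay on the row, so the
    integrated data never loses visibility into the values it came from.
    """
    key = (row["source_system"], row["source_account"])
    conformed = CHART_OF_ACCOUNTS_MAP.get(key)
    return {
        **row,
        "conformed_account": conformed,       # common value for reporting
        "needs_review": conformed is None,    # unmapped codes go to a work queue
    }

rows = [
    {"source_system": "ERP_EU", "source_account": "7001", "amount": 1200.0},
    {"source_system": "ERP_US", "source_account": "640",  "amount": 300.0},
    {"source_system": "ERP_US", "source_account": "999",  "amount": 50.0},  # unmapped
]

for row in rows:
    print(conform(row))
```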
Making Information Accessible to Downstream Applications
With this data integration effort in place, it is important to understand the eventual usage of this information (downstream applications and data marts) and to ensure that downstream applications can extract data efficiently. The data integration process should be designed to support the requirements for integrating data, that is, to support the data acquisition and data validation/data quality processes (validation, reporting, recycling, etc.), to be flexible enough to support future data integration requirements, and to support historical data changes (regardless of any reporting expectations that may require only a subset of this functionality). It should also be designed to support both the push and pull of data. With that in mind, the data integration model should provide metadata that can assist downstream processes (for example, timestamps that indicate when data elements are added or modified), partition large data sets to enable efficient extraction, provide reliable effective dating of model entities to allow simple point-in-time identification, and be designed consistently.
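Here is a small, hypothetical sketch of the kind of metadata that makes downstream extraction easier: timestamps for incremental pulls and effective dating for point-in-time lookups. The entity and column names are assumptions for illustration, not from the article:

```python
# Minimal sketch of integration-model metadata that helps downstream
# applications: update timestamps for incremental extracts and effective
# dating for point-in-time identification. Names and data are hypothetical.

from datetime import date, datetime

customer_versions = [
    {"customer_id": 42, "segment": "SMB",        "effective_from": date(2007, 1, 1),
     "effective_to": date(2008, 3, 31), "updated_at": datetime(2008, 4, 1, 2, 0)},
    {"customer_id": 42, "segment": "ENTERPRISE", "effective_from": date(2008, 4, 1),
     "effective_to": date(9999, 12, 31), "updated_at": datetime(2008, 4, 1, 2, 0)},
]

def as_of(versions, customer_id, as_of_date):
    """Return the version of a customer that was effective on a given date."""
    for v in versions:
        if (v["customer_id"] == customer_id
                and v["effective_from"] <= as_of_date <= v["effective_to"]):
            return v
    return None

def changed_since(versions, since):
    """Return rows added or modified since a timestamp (for incremental extracts)."""
    return [v for v in versions if v["updated_at"] > since]

print(as_of(customer_versions, 42, date(2008, 2, 15)))          # point-in-time view
print(changed_since(customer_versions, datetime(2008, 3, 31)))  # downstream delta pull
```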
The data integration process may at first seem daunting. But by breaking the BI architecture into its core components (data acquisition, data integration, information access), developing a consistent data model to support the data integration effort, establishing a robust exception handling and data quality initiative, and finally implementing processes to manage the data transformation and integration rules, the goal of creating a solid foundation for data integration can be met.
Monday, April 21, 2008
DIG Bits & Bytes
I had a few interesting articles come across the “ether” that I felt were newsworthy enough to post and comment on. They each hit on one of the themes of DIG: Data IN, Information OUT and Knowledge AROUND (the articles are not listed in that order):
Outsourcing your data warehouse
In this article on TWeb by Jannie Strydom, the idea of outsourcing an organization’s data warehouse is proposed. The primary drivers are a lack of skills to properly maintain the warehouse and keep it relevant to the business. As much as I agree with outsourcing the components of a warehouse that are repetitive and process oriented (loading data, maintaining production processes, fixing errors), it is a slippery slope to outsource aspects that are critical to meeting the business needs. A strong understanding of an organization’s business model and needs should be weighed heavily against the value gained (typically cost savings) by outsourcing certain aspects of a data warehouse, especially those that can help facilitate better management of performance.
More on Enterprise Mashups
I made a post a few weeks back on BI and enterprise mashups. This news story out of the O’Reilly Web 2.0 Expo caught my eye because of the mention of integration with Excel. In particular, the article discusses going beyond the geographic mashups being done with Google Maps and toward “mashing” multiple external data sources for enhanced analytics inside of Excel. For example, pulling competitor data directly into your own organization’s performance reporting (I am working at a client where modeling this in their data mart has become a bit of a challenge). Two software companies mentioned in the article that provide this type of mashup service and software are Kapow Technologies and JackBe Corporation. I have not kicked the tires on either product, but they sound extremely valuable to a business user trying to consume multiple and different types of data sources in a single “view”. I would like to understand how these products might fit into an overall information architecture from a consistency and “one version of the truth” perspective.
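As a toy illustration of the idea (this is not how Kapow or JackBe actually work, and the data and field names are made up), the basic “mashup” pattern is simply joining an external feed with internal performance data on a common key:

```python
# Rough illustration of the "mashup" idea: combining an external competitor
# feed with internal performance data into one view. All data is invented.

internal_sales = {"2008-Q1": 4.2, "2008-Q2": 4.8}   # our revenue by quarter, $M

competitor_feed = [                                   # hypothetical external source
    {"period": "2008-Q1", "competitor": "Acme Corp", "revenue_musd": 5.1},
    {"period": "2008-Q2", "competitor": "Acme Corp", "revenue_musd": 4.9},
]

# "Mash" the two sources together on the reporting period.
combined_view = [
    {
        "period": row["period"],
        "our_revenue_musd": internal_sales.get(row["period"]),
        "competitor": row["competitor"],
        "competitor_revenue_musd": row["revenue_musd"],
    }
    for row in competitor_feed
]

for row in combined_view:
    print(row)
```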
Business Intelligence and My Carbon Footprint
This one builds on the “everything must be environmental” green movement. At the WTTC Global Travel & Tourism Summit in Dubai, Travelport announced their new Carbon Tracker reporting tool. It is designed for travel agencies and corporations to track their carbon footprint when it comes to corporate travel. It provides different analytic views using standard environmental calculations. The reporting tool includes travel budget and environmental impact analysis, along with comparisons to other modes of travel (car, bus, train, flying). There is a slick product overview with screenshots on the Travelport website. I am considering using this tool to calculate the carbon footprint of DIG in Las Vegas versus another location in the US. I may need to recommend that the speakers ride bicycles to the event to reduce our environmental impact. I know Mark Lorence would be up for it (Mark is an avid cycling enthusiast who continues to educate me on the nuances of professional cycling…we have a prediction market already established on his first post that links Lance Armstrong to Business Intelligence). I may need to start referring to the first theme of DIG as "An Inconvenient One Version of the Truth".
Labels:
business intelligence,
data warehousing,
graham,
JackBe,
Kapow,
mashups,
web 2.0
Wednesday, April 16, 2008
Can I get the Consumer Reports for these Appliances?

If you aren’t familiar with the DW Appliance space, I would recommend taking a look at a series of articles from Krish Krishnan (intro, part 1, part 2) on the topic. I also came across this blog post, “fact or fiction”, which clears up some of the misconceptions about the DW appliance space.
Another interesting area that has followed the DW appliance trend is business intelligence. I have come across fewer vendors here, but Celequest (acquired by Cognos) and Ingres Icebreaker are two that provide bundled hardware, an operating system, database software and reporting tools. Business Objects has also partnered with Netezza to provide a single point solution for data warehousing and business intelligence. All of these solutions adhere to standards, which allows for integration with the majority of BI vendor software that is SQL based.
Labels:
appliances,
business intelligence,
data warehousing,
graham