
Tuesday, May 6, 2008

Quality Data helps us go GREEN!

Yesterday was another day of coming home from work late and having to force the door open to get past all the junk mail inside. After picking it up, carrying it into the kitchen and spending 10 minutes going through it, I was yet again presented with a fine example of poor data quality (i.e. the majority of organizations really don’t have a grip on their customer data, let alone the ability to household it).

Three copies of a newsletter from the same software company (no names mentioned!), the exact same letter from a state insurance agency addressed separately to my wife and to me, and then two copies of Crate & Barrel’s latest summer catalog addressed to me (how on earth I became registered on their list I’ll never know!).

I wonder what the impact on the environment would be if organizations simply got a better understanding of their customer data and improved their marketing functions alone.

So once I finished my nightly chore of “shredding” I did some quick research to see what sort of impact junk mail has on the environment today. Check out the following facts listed by New American Dream:
  • More than 100 million trees’ worth of bulk mail arrives in American mailboxes each year – that’s the equivalent of deforesting the entire Rocky Mountain National Park every four months. (New American Dream calculation from Conservatree and U.S. Forest Service statistics)

  • In 2005, 5.8 million tons of catalogs and other direct mailings ended up in the U.S. municipal solid waste stream – enough to fill over 450,000 garbage trucks. Parked bumper to bumper these garbage trucks would extend from Atlanta to Albuquerque. Less than 36% of this ad mail was recycled. (U.S. Environmental Protection Agency)

  • The production and disposal of direct mail consumes more energy than 3 million cars. (New American Dream calculation from U.S. Department of Energy and the Paper Task Force statistics)

  • Citizens and local governments spend hundreds of millions of dollars per year to collect and dispose of all the bulk mail that doesn’t get recycled. (New American Dream estimate from EPA statistics)

  • California's state and local governments spend $500,000 each year collecting and disposing of AOL’s direct mail disks alone. (California State Assembly)

With companies trying to put on a more “green” face, you would think this would be a nice eco-friendly place to start. Imagine the impact of cutting bulk/junk mail in half simply by knowing who your customers are and recognizing that several of them may live at the same address.

Even though the challenges surrounding customer data are not new, the industry is talking more and more about Customer Data Integration. Check out Tony Fisher’s article on TDWI for an introduction to Data Quality and the Emergence of Customer Data Integration, and go directly to vendor sites such as DataFlux and Trillium Software for innovative solutions that address data quality challenges, deduplication and relationship identification.
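
To make the householding idea a little more concrete, here is a minimal sketch in Python (with made-up field names and records) of grouping mailing-list entries that share an address so a household gets one mailing instead of several. Real data quality tools do this with proper address standardization and fuzzy matching, but the principle is the same.

```python
# Minimal householding sketch: group mailing-list records that share an address
# so a household receives one catalog instead of several. Field names are
# illustrative; real tools use address standardization and fuzzy matching.
from collections import defaultdict

def normalize_address(record):
    """Crude address key: lowercase, strip punctuation and extra spaces."""
    raw = f"{record['street']} {record['zip']}"
    return " ".join(raw.lower().replace(".", "").replace(",", "").split())

def household(records):
    """Return one mailing per address, listing everyone at that address."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_address(rec)].append(rec["name"])
    return [{"address": addr, "recipients": sorted(set(names))}
            for addr, names in groups.items()]

mailing_list = [
    {"name": "G. Heatley",   "street": "12 Oak St.", "zip": "60614"},
    {"name": "Glyn Heatley", "street": "12 Oak St",  "zip": "60614"},
    {"name": "Mrs. Heatley", "street": "12 Oak St.", "zip": "60614"},
]
print(household(mailing_list))   # one mailing instead of three
```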

Lastly, while I’m not one to solicit an audience, if you do have any interest in helping the environment and stopping all that junk mail, take a look at GreenDimes. I signed up last night… I’ll let you know how it works out!

Friday, April 25, 2008

The Data Integration Challenges and BI (Part Two)

In Part One of this topic Brian introduced some of the key data integration challenges for a typical BI engagement and left off by highlighting some of the specific data integration challenges that included:

(i) Transformation of data that does not meet expected rules (contents of data elements and the validation of referential integrity relationships for example)

(ii) Mapping of data elements to some standard or common value

(iii) Cleansing of data to improve the data content (for example to cleanse and standardize name and address data) that extends the data transformation process a step further

(iv) Determining what action to take when those integration rules fail

(v) Ensuring proper ownership of the data quality process

In this second part of the article, he takes a deeper look at several of these components.

Data transformations may be as simple as replacing one attribute value with another or validating that a piece of reference data exists. The extent of this data validation effort depends on the extent of the data quality issues and may require a detailed data quality initiative to understand exactly what issues exist. At a minimum, the data model that supports the data integration effort should be designed to enforce data integrity across the data model components and to enforce data quality on any component of that model that contains important business content. The solution must have a process in place to determine what actions to take when a data integration issue is encountered, and should provide a method for communicating and ultimately resolving those issues (typically enforced by implementing a solid technical solution that meets each of these requirements).
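
As a rough illustration of the kind of rule enforcement described above, the following sketch (Python, with hypothetical rules and reference sets) validates incoming records against reference data and content rules and routes failures to an exception list rather than loading them silently. In practice this logic would typically live in your ETL tool or database layer.

```python
# Sketch of a data validation step: records that fail referential-integrity
# or content rules are routed to an exception list for communication and
# resolution instead of being loaded. Rules and field names are hypothetical.
VALID_CURRENCIES = {"USD", "EUR", "GBP"}       # reference data
KNOWN_CUSTOMER_IDS = {"C001", "C002", "C003"}  # reference data

def validate(record):
    """Return a list of rule violations for one incoming record."""
    errors = []
    if record.get("customer_id") not in KNOWN_CUSTOMER_IDS:
        errors.append("unknown customer_id (referential integrity)")
    if record.get("currency") not in VALID_CURRENCIES:
        errors.append("invalid currency code (content rule)")
    if record.get("amount") is None or record["amount"] < 0:
        errors.append("amount missing or negative (content rule)")
    return errors

def load(records):
    accepted, exceptions = [], []
    for rec in records:
        errors = validate(rec)
        if errors:
            exceptions.append({"record": rec, "errors": errors})
        else:
            accepted.append(rec)
    return accepted, exceptions   # exceptions feed the resolution process
```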

As organizations grow via mergers and/or acquisitions, so too does the number of data sources, and with it the lack of insight into overall corporate performance. Integrating these systems upstream may not be feasible, so the BI application may be tasked with this integration dilemma. A typical example is the integration of financial data from what used to be multiple organizations, or the integration of data from different geographical systems.

This integration is a challenge. It must consider (i) the number of sources to be integrated, (ii) commonality and differences across the different sources, (iii) requirements to conform attributes [such as accounts] to a common value while retaining visibility to the original data values and (iv) how to model this information to support future integration efforts as well as downstream applications. All attributes of all sources must be analyzed to determine what is needed and what can be thrown away. Common attribute domains must be understood and translated to common values. Transformation rules and templates must be developed and maintained. Data usage must be clearly understood, especially if the transformation is expected to lose visibility into the original values (for example, when translating financial data to a common chart of accounts).
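
One simple way to picture the conformance requirement is a cross-reference mapping that translates each source system’s account codes to a common chart of accounts while keeping the original value alongside the conformed one for drill-back. The sketch below uses invented system names and account codes.

```python
# Conform source-specific account codes to a common chart of accounts while
# retaining the original value for drill-back. Codes are invented examples.
ACCOUNT_XREF = {
    ("ERP_EUROPE", "4000"):  "REVENUE",
    ("ERP_US",     "R-100"): "REVENUE",
    ("ERP_EUROPE", "5000"):  "COST_OF_SALES",
    ("ERP_US",     "C-200"): "COST_OF_SALES",
}

def conform(source_system, source_account, amount):
    common = ACCOUNT_XREF.get((source_system, source_account))
    return {
        "source_system": source_system,
        "source_account": source_account,  # original value retained
        "common_account": common,          # None -> unmapped, route to exceptions
        "amount": amount,
    }

print(conform("ERP_US", "R-100", 1250.00))
```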

Making Information Accessible to Downstream Applications

With this data integration effort in place, it is important to understand the eventual usage of this information (downstream applications and data marts) and to ensure that downstream applications can extract data efficiently. The data integration process should be designed to support the requirements for integrating data, that is, to support the data acquisition and data validation/data quality processes (validation, reporting, recycling, etc.), to be flexible enough to support future data integration requirements, and to support historical data changes (regardless of any reporting expectations that may require only a subset of this functionality). It should also be designed to support both the push and the pull of data. With that in mind, the data integration model should provide metadata that can assist downstream processes (for example, timestamps that indicate when data elements are added or modified), partition large data sets to enable efficient extraction, provide reliable effective dating of model entities to allow simple point-in-time identification, and be designed consistently.
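
The sketch below illustrates, with made-up column names, the kind of housekeeping metadata described above: load timestamps that let downstream processes pull only what has changed, and effective dating that allows simple point-in-time queries.

```python
# Illustrative housekeeping metadata on an integration-layer record:
# load/update timestamps support incremental downstream extracts, and
# effective dating supports point-in-time queries. Column names are examples.
from datetime import datetime, date

def integration_record(business_key, attributes, effective_from,
                       effective_to=date(9999, 12, 31)):
    return {
        "business_key": business_key,
        **attributes,
        "effective_from": effective_from,    # when this version became valid
        "effective_to": effective_to,        # open-ended until superseded
        "load_timestamp": datetime.utcnow()  # lets downstream pull deltas
    }

def as_of(records, business_key, as_of_date):
    """Point-in-time lookup: the version valid on as_of_date."""
    return next(r for r in records
                if r["business_key"] == business_key
                and r["effective_from"] <= as_of_date < r["effective_to"])
```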

The data integration process may at first seem daunting. But by breaking the BI architecture into its core components (data acquisition, data integration, information access), developing a consistent data model to support the data integration effort, establishing a robust exception handling and data quality initiative, and finally implementing processes to manage the data transformation and integration rules, the goal of creating a solid foundation for data integration can be met.

Sunday, April 13, 2008

Immelman crowned Masters champion!

Congratulations to South African Trevor Immelman who secured a maiden major title win today with a three-shot victory at the 72nd Masters Golf tournament at Augusta.

If you are a golf fan it was for sure a quality weekend in front of the TV.

If you were immersed in the golf, waiting for Tiger to yet again make a run for the championship, did you ever find yourself wondering where on earth the commentators get all those performance statistics they continuously feed you during their commentary?

Well, check out the article “How the PGA Tour Manages Its Data” to see how “IT Chief Steve Evans relies on legions of golf-crazed volunteers, high-tech lasers and the input of golf pros to help him identify, manage and display the Tour's most critical data.”

Thomas Wailgum’s article provides interesting insight into the efforts made to capture data in real time from the field and then translate it into interesting statistics, both for the general public, to enhance their golf-watching experience, and for the players on the course, to help them not just track their progress but also evaluate their risk exposure when contemplating their next shot.

The system used is ShotLink, a revolutionary system that “tracks every shot at every event—where a player's golf ball starts and lands, and all the ground covered in between.” ShotLink relies on over 1,000 volunteers out on the course to help “capture” the required data on over 32,000 shots, which are fed into the system.

Impressively, Mr. Evans states in the article that through some minor modifications to ShotLink they have been able to ensure a very high level of data quality in the statistics that they produce: “Our goal is to have any data corrections made inside of one minute, and we consistently meet that metric.”

How accurate is the data that you use? What data quality management process do you have in place? Do you tend to resolve data quality issues upstream at the source, do you cleanse within the applications you report from, or do you "tweak" the actual reports?

Sunday, April 6, 2008

Information Quality & Master Data Management?

Master Data Management is the process used to create and maintain a “system of record” for core sets of data elements and their associated dimensions, hierarchies and properties which typically span business units and IT systems.

Master Data, often referred to as “Reference Data”, may in your organization take the form of the Chart of Accounts, the Product Catalogue, the Stores Organization, or Supplier and Vendor Lists, to name but a few.

In his article “Demystifying Master Data Management”, Tony Fisher uses Customer as an example of Master Data and shows how, if it is not understood and managed appropriately, it can cause all sorts of headaches for a company, in this case for the CEO himself!

“Years ago, a global manufacturing company lost a key distribution plant to a fire. The CEO, eager to maintain profitable relationships with customers, decided to send a letter to key distributors letting them know why their shipments were delayed—and when service would return to normal.

He wrote the letter and asked his executive team to "make it happen." So, they went to their CRM, ERP, billing and logistics systems to find a list of customers. The result? Each application returned a different list, and no single system held a true view of the customer. The CEO learned of this confusion and was understandably irate. What kind of company doesn't understand who its customers are?”

So what are the typical barriers that hinder organizations from addressing their master data management problem? My colleagues and I typically encounter four primary barriers:

Multiple Sources and Targets: Reference data is created, stored and updated in multiple transactional and analytic systems, causing inaccuracies and synchronization challenges between disparate systems.

Ability to Standardize: Most organizations cannot agree on a standardized view of master data, and audit policies that comply with federal regulations are lacking.

Organizational Ownership: There is disagreement within the organization as to who takes ownership of master data management, the business or IT, and assigning accountability across cross-functional processes is difficult.

Centralization of Master Data: There is organizational resistance to centralizing master data, since there is a sense that control will be lost, along with the challenge of finding a technology solution that supports existing systems and the lifecycle of master data management.


Organizations that are addressing such barriers typically have a successful master data management process in place that contains the following components:

Data Quality: Focus on the accuracy, correctness, completeness and relevance of data. Incorporate validation processes and checkpoints. Effort is highest at the beginning of an MDM initiative, when quality issues are being corrected.

Governance: A cross-functional team is formed to establish organizational standards for MDM related to ownership, change control, validation and audit policies. The focus includes establishing a standard meeting process to discuss standards, large changes and organizational issues.

Stewardship: Ongoing ownership of MDM stewardship is assigned. Typically MDM stewards are business users, accountable for implementing the standards established through MDM governance.

Technology: Create an architectural foundation that aligns with the other three components. Implement a technology that centralizes reference data, and align processes with the technology solution to synchronize master data across source and analytic systems (a simplified example of the central “golden record” idea follows).
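
To make the technology component a little more concrete, here is a deliberately simplified sketch of merging matched customer records from several systems into a single “golden record” using a basic survivorship rule (most recently updated value wins, gaps filled from other sources). The field names and the rule itself are illustrative; commercial MDM hubs provide configurable match and survivorship engines.

```python
# Simplified "golden record" merge: for each attribute, take the value from
# the most recently updated source record that actually has one. Field names
# and the survivorship rule are illustrative, not any product's actual logic.
def golden_record(matched_records):
    """matched_records: records from different systems already matched to the
    same real-world customer, each carrying a 'last_updated' date."""
    ordered = sorted(matched_records, key=lambda r: r["last_updated"], reverse=True)
    golden = {}
    for attribute in ("name", "address", "phone", "email"):
        golden[attribute] = next(
            (r[attribute] for r in ordered if r.get(attribute)), None)
    golden["source_ids"] = [r["source_id"] for r in matched_records]  # lineage
    return golden
```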


As we can see, master data management is not a one-time initiative but rather a long-term program that runs continuously within the organization. To be successful, organizations need to instill an iterative approach that helps develop a program that continuously monitors, evaluates, validates and creates master data in a consistent, meaningful and well-communicated way.

What is your organization doing about Master Data Management? Have you had success in establishing a Data Governance program? Who owns the process in your organization, IT or the business?

Thursday, March 27, 2008

What’s all the hype around unstructured data?

Check out the DM Review article by Michael Gonzales, “Comprehensive Insight: Structured and Unstructured Analysis”, for an introduction to the topic of unstructured data and releasing its potential.

Michael provides some insight into the challenges organizations face in dealing with unstructured data vs. structured data, and how technology has been evolving to help better leverage such information assets.

With an estimate that “more than 85 percent of all business information exists as unstructured data” it is no wonder that technology vendors are putting more focus on how to extend their products to make this information more accessible and usable.

Although the article gives some interesting insight into the evolution of the technologies, it doesn’t provide any insight into how to actually integrate and store this data in the traditional data warehouse. How does one integrate such unstructured data in the form of documents, images, video content and other multimedia formats? Is such data actually relevant to data warehouses and CPM processes? Perhaps not the actual content, but perhaps the metadata associated with the content (e.g. the number of documents of type x, the average occurrence of y in videos of type z, the number of emails on subject w, etc.).
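
If it is the metadata rather than the content that ends up in the warehouse, the capture step can be as simple as the sketch below (the attributes chosen are hypothetical examples): derive a small structured record per document or email and load that alongside the traditional facts.

```python
# Sketch: derive a small structured metadata record from unstructured content
# (a document or email) so it can be counted and analyzed in the warehouse.
# The attributes chosen here are hypothetical examples.
import os
from datetime import datetime

def document_metadata(path, subject=None):
    return {
        "file_name": os.path.basename(path),
        "doc_type": os.path.splitext(path)[1].lstrip(".").lower(),  # e.g. pdf, docx
        "size_bytes": os.path.getsize(path),
        "modified_at": datetime.fromtimestamp(os.path.getmtime(path)),
        "subject": subject,   # e.g. an email subject line, if available
    }

# Loaded into a simple table, such records support questions like
# "how many documents of each type?" or "how many emails on subject w?".
```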

Vendors that are beginning to address the storage and integration of such unstructured data into existing solutions are primarily the large database vendors. Certainly “Big Blue” (IBM) boasts support for analysis of unstructured data with its DB2 Warehouse 9.5 product offering and Microsoft SQL Server 2008 is touted by Microsoft to “provide a flexible solution for storing and searching unstructured data”.

Although the advances in vendor technologies are providing a means of storing such information in a manner that makes it accessible, is the typical organization ready to focus its resources on doing so? When so many have yet to fully realize the benefits of provisioning traditional structured data (e.g. financial, operational, customer) to the business, you have to ask whether this should yet be a high priority.

What is your organization doing? Have you implemented any creative solutions? What are the demands from the business?

Wednesday, March 19, 2008

Real-time Data Usage

This past weekend was a major weekend for me, one that would determine my core happiness for the rest of the year!

You see I’m Rugby mad. I’ve played it, I’ve coached it, and now I watch it, incessantly.

This past weekend was the concluding weekend of the European 6 Nations Rugby Championship, an annual international tournament between the home nations of Europe (England, Ireland, Scotland, Wales, France and Italy). A tournament that started in 1871 and every year since has been an excuse for all to bring out their nationalistic pride and cheer for their ancestral team! For me it’s Wales, the land of my forefathers, the land of daffodils and of course sheep. Wales had the opportunity to win the tournament in style, beat France at home and raise the championship trophy undefeated, Grand Slam winners. And did they do it? They sure did!

Rugby, just like most American sports is now a professional sport and with it many changes have come. Dragged out of the traditions of amateur pastimes where the local butcher was your star player, teams today are forced to continually explore all possible avenues in an attempt to better themselves and obtain competitive advantage over their rivals. No longer is it good enough to just employ the best players and coaching staff, teams are looking elsewhere.

One interesting field of innovation that we were given a brief insight into during one of the games was the Welsh coaches’ real-time use of statistical information, which allowed them to make real-time adjustments to how their team and players were approaching the game. Using technology from Sportstec, the Welsh team was actively making adjustments that helped provide a competitive advantage over the opposing team.

Around the field, “spotters” were employed to feed information into a database application on specific events as they happened: the number of times a player passed to his left vs. his right, how many carries an individual had with the ball, the number of times the ball was kicked from a certain place vs. passed, the number of missed tackles by each player, the success rate of a particular move, and so on. By providing such detailed information on actions performed by their team as well as the opposition, the coaches were able to react and make tactical real-time changes, for example adjustments to the team’s strategy on the field, instructions to focus on and improve certain aspects of the game, and instructions to target identified weaknesses in the opposing team.
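
As a toy illustration of what those spotters are effectively doing, the sketch below tallies logged match events per player so a coach could query the running counts mid-game. The event types and player labels are invented, and the real Sportstec system obviously does far more.

```python
# Toy version of in-game event capture: spotters log events, coaches query
# running tallies mid-game. Event types and player labels are invented.
from collections import Counter
from typing import List, Tuple

def tally(events: List[Tuple[str, str]]) -> dict:
    """events: (player, event_type) pairs logged by spotters."""
    counts = Counter(events)
    return {f"{player}:{event}": n for (player, event), n in counts.items()}

match_log = [
    ("No. 11", "carry"), ("No. 11", "pass_left"),
    ("No. 11", "carry"), ("No. 7", "missed_tackle"),
]
print(tally(match_log))
# {'No. 11:carry': 2, 'No. 11:pass_left': 1, 'No. 7:missed_tackle': 1}
```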

Did having this level of information access have a direct result on the outcome of the game? Who knows, but one thing is for sure: Wales beat Italy 47-8 in this game, when on average the other teams that beat Italy did so by only 6 points! The other thing, did I mention, Wales won the Championship, undefeated!

So where next? If teams are able to get hold of and make use of such real-time data I just wonder how much further they could go.

What if players began to wear RFID tags in their shirts so we knew where they were at any one time? The ability to understand in real time how much time they have spent in one location, how long it took them to reposition, the average distance they cover while running with the ball, the efficiency of the path travelled by each player when covering a kick-off. What about collecting information from body skins that can sense applied pressure? Could we measure the level of impact endured in a tackle, giving us the ability to predict the point at which fatigue begins to degrade performance, an opportune time for a tactical substitution perhaps?

For more detailed commentary on how the Welsh team and others are finding innovative ways to capture and use information check out the videos @ http://www.sportstec.com/videos.htm

Wednesday, March 12, 2008

Top 5 mistakes in Data Warehousing

Top 5 reasons why many data warehouse managers fail to deliver successful data warehouse initiatives:

Data Quality: Quality of source system data that is to be integrated into the data warehouse is “overrated” and thus time to resolve is “underestimated”

  • Bad information in means bad information out. The CPM applications that will source data from the warehouse will suffer diminishing adoption if not addressed upstream
  • Data integration strategy must include methodology to address erroneous data
  • A significant level of involvement from both business and IT is needed to help resolve challenges (both the decisions and their execution)

Data Integration: Lack of robust data integration design results in incomplete and erroneous data and unacceptable load times

  • What happens when you are in the process of loading data and you start receiving exceptions to what is expected? Is data rejected, leaving you faced with the dilemma of partial data loads? How do you avoid manual intervention?
  • What checks and balances do you have in place to ensure that what you are extracting from source systems is being populated into the target? Can you audit your data movement processes to ensure completeness as well as satisfy regulatory obligations? (A minimal reconciliation check is sketched after this list.)
  • Your processes can handle the data volumes you are dealing with today but can they handle the data volumes of tomorrow? How easy is it to reuse existing processes when adding additional source systems/subject areas to your Warehouse?

Data Architecture: Creating a solution that is not able to scale after an initial success will result in a redesign of the architecture

  • After the first success the business will quickly want to extend the usage of the solution to a greater number of users, will the performance continue to live up to expectations?
  • As users mature and adoption improves so will the complexity of information usage, i.e. more advanced queries, can the design continue to perform as expected?
  • Increased usage and maturity results in the demand to integrate into the solution additional data sources/subject areas. Is the architecture easily extensible?

Data Governance & Stewardship: With no controls established around data usage, its management and adherence to definitions, data silos and erroneous reporting begin to reappear

  • Stakeholders must be identified and given decision rights to help improve the quality and accuracy of your common data
  • Practices around managing standard definitions of common data and the business rules applied must be established
  • Understand who is responsible for the data and hold them accountable

Change Management: Not preparing the organization to utilize what is being built results in the investment in the data warehouse not being fully realized, and thus it is deemed a failure due to low user adoption

  • “Build it and they will come”; providing information access does not necessarily equate to information usage.
  • Helping the business understand how they can leverage these newly available data often results in changes to the way that they work. “Day in the life of” today vs. “day in the life of” tomorrow
  • Education and training programs are required
  • Integrated project teams (business and IT) are essential to the success of data warehouse initiatives, with individuals becoming champions within the organization for change and adoption
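
As promised above, here is a minimal sketch of the kind of reconciliation check that catches partial loads: compare row counts and a control total between what was extracted and what was loaded, and record the result for audit purposes. The field names and the shape of the audit output are illustrative only.

```python
# Minimal load reconciliation sketch: compare row counts and a summed amount
# between what was extracted from the source and what landed in the target,
# and flag a discrepancy instead of silently accepting a partial load.
def reconcile(extracted_rows, loaded_rows, amount_field="amount"):
    checks = {
        "row_count_source": len(extracted_rows),
        "row_count_target": len(loaded_rows),
        "amount_source": round(sum(r[amount_field] for r in extracted_rows), 2),
        "amount_target": round(sum(r[amount_field] for r in loaded_rows), 2),
    }
    checks["balanced"] = (
        checks["row_count_source"] == checks["row_count_target"]
        and checks["amount_source"] == checks["amount_target"]
    )
    return checks   # persist to an audit table to satisfy regulatory review

audit = reconcile(
    extracted_rows=[{"amount": 100.0}, {"amount": 250.5}],
    loaded_rows=[{"amount": 100.0}],
)
print(audit)   # balanced == False -> alert and investigate before sign-off
```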

Tuesday, March 4, 2008

Welcome to the Data Theme

How often do you sit back and spend a moment to think about how much data we actually generate and use on a day-to-day basis? Whether it be in the office sending an email to a colleague, or updating a financial spreadsheet for your CFO, we are using and generating data. Do you ever think that when you simply stop on the way home to fill up your car with gas, or scroll through the TV guide looking for your favorite show to TiVo, you are generating and using data?

With advances in processor speeds, data storage capabilities, and application technologies our capacity to generate and capture data is forever increasing. Many organizations are leveraging this data, turning it into usable, sustainable information that can be used as an asset to help gain competitive advantage. The majority of organizations however are simply overwhelmed as to what to do and where to start.

Over the next couple of months I welcome you to join me as we explore the theme of Data. We will look to discuss a variety of topics that relate to the challenges faced by organizations working to develop an "information architecture for the 21st century". Not only will we discuss the traditional well-trodden topics such as "Data Strategy", "Data Quality" and "Data Integration", but also others that, due to lack of prioritization or simple underestimation, often influence the success of organizational initiatives; such topics may include "Data Governance", "Master & Meta Data Management", and "Managing Unstructured vs Structured Data", to name a few. Of course, if there is a topic of interest that you wish to discuss, I fully encourage the suggestion.

***

Glyn D. Heatley bio - I'm a Director and Leader in the Information Strategy and Architecture Practice at The Palladium Group. I bring over 13 years of experience in delivering large scale Corporate Performance Management solutions with a primary focus in Data Management and Business Intelligence. Over the years I've gained experience in all areas of the Data Warehouse Life Cycle including Requirements Gathering, Solutions Architecture, Data Architecture, Data Integration, Business Intelligence and Project Management.