Business 21C - Published by UTS Business

Sep 14th 2010

The data mining revolution

We’re generating terabytes of data every minute. Figuring out what it all means will be the next gold rush, says Daniel Poon.

One of my clients, a large US conglomerate, had just acquired and integrated several gardening-equipment businesses. Due diligence had been undertaken and due process followed, but it wasn’t until much later that the company discovered that, thanks to product misclassifications, it was now the proud owner of no fewer than four million plastic planting pots, stacked in warehouses across the country.

Another case: a telco giant, while analysing call data, discovered rampant misuse of an unlimited calling plan through calling cards. When it looked a bit closer, it found that the Mexican mafia was using the product to wholesale telephone capacity to retail customers at substantial discounts. Then, when it came time for the resellers to pay their bills, they disappeared.

These are just two examples of the major opportunity and challenge for large organisations today: how to sort and analyse the increasingly large volumes of data they collect in ways that reveal the company’s real situation.

At the moment, business sits at the bottom of the business intelligence curve. The technology for gathering and storing data has accelerated over the past decade. Examples of the tidal waves of data engulfing the business world are abundant: Walmart handles more than one million customer transactions each hour and its databases are estimated to contain 165 times the information contained in America’s Library of Congress. Facebook is home to 40 billion photos and counting. Decoding the human genome involves analysing three billion base pairs; the first time it was done, in 2003, it took 10 years. Today it takes a week.

We can now store more data than we ever imagined possible, and our analytical capabilities are way beyond where they were just five years ago. The problem for organisations is mobilising those capabilities into actionable intelligence. We have solved the scaling problem – how to store terabytes of data in easily accessible, classifiable ways – but solving the intelligence issue is an ongoing project with huge implications for the efficiency of business.

Multiple versions of the truth

Most organisations haven’t got to square one with an integrated data management and analysis strategy, or what is known as master data management (MDM). At a basic level, MDM seeks to ensure that an organisation does not use multiple (potentially inconsistent) versions of the same master data (eg, product definition, business unit hierarchy, cost-centre business rules and the like) in different parts of its operations.

A simple but common example of poor MDM is when the chief financial officer asks a simple question: ‘What is the year-to-date product revenue of our company?’ The corporate controller at headquarters reports $12.1 million. The vice-president of sales reports $14,001,234 and the regional controller reports $10,800,678.

How could there be three such different answers?

The answer lies in the fact that each finance group is measuring results in a different way. The corporate controller may have reported a rounded GAAP (generally accepted accounting principles) number prepared for the Securities and Exchange Commission (SEC). The vice-president of sales’ figure included intercompany revenue, which should be eliminated. The regional controller took into account some of the deals that will be closed for the quarter but for which product has yet to be shipped. Although a single version of the truth is assumed, it rarely exists. Reality depends on perspective and context:

  • Varied sources: product revenue may be reported from the booking or order entry system as opposed to the billing or invoicing system.
  • Varied business rules: business entities are rolled up differently. For example, headquarters often roll up entities differently than the functional business.
  • Varied assumptions: different groups tend to include or exclude certain business events, therefore generating different information.
  • Varied time of reporting: information reported at different times can produce different results.
  • Varied accuracy: not all information is measured at the same level of granularity.

As a result, while data quality at data-entry level may be relatively easy to control, interpretation and analysis of the information provided by different groups without any common guidelines leads to incorrect reporting and, eventually, incorrect decisions. MDM will be a hot topic for the coming decade.
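The effect of varied business rules can be shown with a minimal sketch. Everything in it is an assumption made for illustration: the `Deal` record, the four invented deals and the three rule functions are hypothetical, and the totals do not reproduce the figures quoted above. It only demonstrates that the same ledger gives three different "revenue" answers when each group applies its own inclusion rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deal:
    amount: int          # deal value in dollars (invented figures)
    intercompany: bool   # True if sold to another unit of the same company
    shipped: bool        # True if the product has already shipped


# Hypothetical ledger, for illustration only.
DEALS = [
    Deal(5_000_000, intercompany=False, shipped=True),
    Deal(4_000_000, intercompany=False, shipped=True),
    Deal(2_000_000, intercompany=True, shipped=True),
    Deal(1_500_000, intercompany=False, shipped=False),
]


def external_shipped_revenue(deals):
    """Assumed master definition: external customers, shipped product only."""
    return sum(d.amount for d in deals if d.shipped and not d.intercompany)


def sales_view_revenue(deals):
    """Shipped revenue with intercompany sales left in."""
    return sum(d.amount for d in deals if d.shipped)


def regional_view_revenue(deals):
    """External revenue that also counts deals not yet shipped."""
    return sum(d.amount for d in deals if not d.intercompany)


if __name__ == "__main__":
    print("Master definition:", external_shipped_revenue(DEALS))
    print("Sales view:       ", sales_view_revenue(DEALS))
    print("Regional view:    ", regional_view_revenue(DEALS))
```

Under an MDM approach, one of these definitions would be agreed as the master rule and the others would be reported as labelled variations of it, rather than as competing answers to the same question.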

The development of business processes that actually enable analysts to gain useful insight from the mountains of numbers sitting in our data warehouses remains in its infancy. The evolution of data mining will be one of the great productivity revolutions of the next decade.

It will first take the form of an expansion in the number of standard-use cases from which companies will be able to draw. Back in the 1980s, if you wanted to analyse what your gross margin, divisional profitability or overhead attributable was in relation to a particular product line, the project might take weeks or months. Today, it is done at the press of a button.

Other more complex queries, such as profitability per customer, may still take weeks or require a complex project of their own to answer. Very soon, however, even the most complex business analysis will be standardised, with the information scraped out of the data warehouse and put together by business intelligence bots that come standard with any of the major enterprise systems.

Already, web-based business dashboards, combined with social networking tools and strategic communication technologies, are spreading business numeracy and stimulating people who wouldn’t otherwise be thinking about business intelligence to develop queries and refine their business strategies using real data.

But there is still much to be done and huge productivity potential remains. The business intelligence revolution is here.


  1. Edwin Morris says:
    Good article and very perceptive of the common reporting problems faced by so many organisations. Coming from Finance, I would suggest that all divisions/departments in the organisation should report, and be evaluated, on what the company reports externally to stakeholders. At the end of the day, that is the only benchmark that matters as investors and lenders decide the cost of equity and debt respectively. This may entail an education exercise so all front-line staff understand the set of financial metrics that really matter to the company, and therefore to themselves.
  2. Interesting point of view on business intelligence. Business analysis has been a major part of a company's financial activity, and thankfully business intelligence specifically addresses this gap from an IT point of view. However, BI adoption alone might not be able to address the different revenue calculation standards mentioned in the article. This is because each division in a company needs a different kind of information, not just the information relevant to C-level management.
  3. Hey, thanks Edwin and Aylwin for your perspectives. Edwin is right about benchmarking on what matters. Aylwin is right about more diverse info for all internal divisions. What's interesting with these 2 comments is that both are equally valid but may not be equally compatible. Externally reported numbers are often not useful enough for internal customers. Internal information often has so little governance that it serves more for political use. I have seen companies where, if you add up the total revenue for each division, it's bigger than the company's total revenue. It's crazy but true. Externally reported info is governed by GAAP and the SEC and can be trusted more. So you see the gap here. My perspective is that through Data Governance and Master Data Management, we can build information that can be federated and be as trusted as SEC / GAAP reporting. Although this is part of BI, I'd argue it's rarely practiced correctly. This is because it's often easier for someone to build silo systems and silo spreadsheets for their own needs than to build a real trusted ecosystem for the company. IT, although it plays an important role, often lacks the business acumen or the motivation for these goals. C-level management is often too ignorant of these issues. What's proposed here is innovative enough that only a handful of organizations will adopt it. And those that do will succeed in the long term. As for the rest, we'll see them in another 15-20 years, looking back and thinking what a great idea this was.
