09 Feb Five Data Myths
Every one of us works with Data all the time – our own data. Because it’s our own, we know it well, and it’s usually easy for us to manage. We fill in forms, we answer questions over the telephone, we use the web to order stuff, and to make enquiries. Even tasks as time consuming as reviewing financial statements or tax returns are manageable, because the data is ours and we understand it.
It becomes more complex when we have to work with other people’s data – typically in our workplace. This is much more challenging. We don’t always understand other people’s data. The volume of data is high. We can’t ask questions about it, because we can’t ask the people who created it (or changed it).
Seed Analytics are specialists at working with data. It’s our job, and data is all we do. So we have a lot of experience with data, and we are called upon daily to help less experienced people be effective and successful when they need to work with data.
What I will discuss here are five common ideas we see in our work that are actually not correct – we might call them myths. On the face of it, these ideas make sense, and so they are commonly accepted. But we know from our years of delivery that you have to look deeper to see the truth.
Myth 1: Tools will solve our data problems
So many of us who work with data have a background in Information Technology. So when we think of solving a problem, our thoughts naturally turn to an IT solution. This is why spreadsheets are so widespread in the enterprise – people see them as the solution to every problem. We look for software, apps, platforms, appliances, and suites of tools to deliver the fix to whatever problem is put in front of us. We have a hammer, and we see nails everywhere…
Then there are the Vendors who have invested a whole chunk of time, resources and money in building products, and who tell us that their product is The One True Way ™ to Data Heaven ©. If only it were that easy!
You can migrate data using MS-Excel, or you can use a million-dollar custom-built proprietary tool – and you can end up with the same result. Whatever outcome you get, it’s going to largely depend on the discipline and repeatability of the process you use. The tool helps, but it doesn’t solve the entire data problem by itself.
From my point of view, a tool serves as an amplifier for your approach to data. If you don’t know what you are doing, a tool is going to make that clear. If your approach to managing data is rock solid, the tool is going to show how good your process is.
Truth 1: Focus on building a quality process for working with Data. Then find a tool that supports your process, and that also fits your budget (dollar budget, and time budget, and resource availability). There is no Silver Bullet.
Myth 2: Our Data is different
I wish I had a dollar for every time I have walked into a client and heard “Our data is different”. It’s true, some companies do have something “special” about their approach, but that is usually not applicable to everything they do.
Every company buys goods and services. Every company sells goods or services. Every company hires and pays staff. Every company has to file taxes. So every company has business processes in place to do these things, and every company has the corresponding data that is captured as the processes are executed.
Your company buys widgets? It’s not hard to find out which Vendor you bought them from, or the Purchase order you used to buy them. It’s not hard to find the inventory that shows how many widgets you have on hand, or to find the financial transactions that accompany the purchase. It doesn’t matter which company you work for, or which systems you run, there will always be Vendor, Purchase Order, Inventory, and GL Balance data. And the attributes of the data will largely be the same.
Truth be told, your data is not that different from most other companies’ data. Volumes may vary, organisation of data in a system may change, but the underlying data is similar.
Truth 2: Basic business processes are very similar across the business world. Every company has a process for Buying, Selling, Hiring, Maintaining, Disposing, etc., and the same process is likely to vary only in small ways from one company to the next. The data captured as part of those business processes is largely the same. Your data isn’t so different.
Myth 3: Data Quality is the same for everyone
When a Consumer Products company wants to improve the quality of their Customer data, they have a huge volume of Customers, and they are going to focus on ensuring they have the information necessary to support whatever their current marketing campaign focus is – be it Facebook, or SMS, or whatever. It’s highly unlikely that they are going to have reliable home address information for every Customer, and neither do they need this – they just need reliable information for the particular touch point – such as telephone number, or e-mail address, or Facebook ID.
When a Manufacturing company wants to improve the quality of their data, they are going to have a relatively small volume of customers, and they are definitely going to want reliable business address information so they can invoice the customer, and so they can complete the delivery of all goods ordered by the Customer.
Both companies have Customer data. Both want to improve data quality. But both have a different idea of what Data Quality is, and different expectations for how to measure it. No matter which data domain we are talking about – Fixed Assets, Vendors, Equipment, Materials/Products/Items, Employees, etc., – the lens you view the quality of that data through is shaped by how you work with that data.
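To make the point concrete, here is a minimal sketch of how the same Customer records can score very differently depending on the lens. Everything in it is an assumption made for illustration: the `Customer` fields, the two rule sets in `REQUIRED_FIELDS`, and the sample records are hypothetical, and real quality rules would be far richer than "is the field populated".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    address: Optional[str] = None


# Hypothetical rule sets: the fields each kind of business needs populated.
REQUIRED_FIELDS = {
    "consumer_marketing": ["email"],   # the current campaign touch point
    "manufacturing": ["address"],      # needed for invoicing and delivery
}


def completeness(customers, lens):
    """Share of records where every field required by the lens is populated."""
    required = REQUIRED_FIELDS[lens]
    if not customers:
        return 0.0
    ok = sum(1 for c in customers if all(getattr(c, f) for f in required))
    return ok / len(customers)


# Made-up sample records, for illustration only.
customers = [
    Customer("Ann", email="ann@example.com"),
    Customer("Bob", email="bob@example.com", address="1 High St"),
    Customer("Cy", phone="555-0100"),
    Customer("Di", email="di@example.com"),
]

consumer_score = completeness(customers, "consumer_marketing")
manufacturing_score = completeness(customers, "manufacturing")
print(consumer_score, manufacturing_score)
```

The same four records look reasonably healthy through the marketing lens and poor through the manufacturing lens, which is why a single "Data Quality score" means little without knowing who is asking.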
Truth 3: Data Quality is in the eye of the beholder. The important point is that the value of data degrades over time. When data is created, you have an expectation of trust, and the older the data becomes, the less trusted it is. If you don’t trust your data, how can you make trusted decisions?
Myth 4: Governance is only for Big Companies…
When people think “Governance”, they start to think “difficult, costly, slow, and too hard”. It’s natural to think this way when we think of all the governance that has been previously put in place across the enterprise. All the rules, all those checks-and-balances, all that hierarchy.
Data Governance is about bringing discipline to the management of data, that’s true. Personally, I think it’s long overdue that we think about managing data carefully rather than just ignoring how data is treated. If you don’t manage, how can you measure? If you don’t measure, how can you improve?
When implementing Data Governance, the Seed Analytics approach is to be pragmatic. Sure, we could push our clients to implement “Full” data governance from start to finish, phased over two years, for the sake of data governance. We will leave that to the Big Systems Integrators. Our pragmatic approach is to implement enough Data Governance to support the landscape you have, and allow the changes to take hold. The changes around data management have to be implemented in order for them to be effective.
Management of data lives and dies in Ownership, and so that’s where all Data Governance has to start. Owners of data will have the understanding of the lifecycle of their data, and so are best equipped to make decisions on how to manage the data. No ownership means nothing will change; too many owners means that changes will get bogged down in disputes.
Truth 4: Implementing Data Governance may sound like a huge undertaking, but it is possible to implement it in bite-sized chunks rather than “big bang”. Start by understanding the life cycle of your data, and establishing unequivocal ownership for the data. When your business accepts ownership, you can move onto the management of data through process, standards and measurement, and on to areas like metadata management and modelling.
Myth 5: You don’t need to see ALL my data
When I offer a Data Deep Dive (Seed’s Data Quality assessment) to prospects, I am sometimes asked “Can I give you a subset of my <domain> data to analyse?” My answer is “Sure, as long as you are happy to see a subset of your <domain> data quality issues. If you want to see the entire set of issues you have and measure their impact, you need to give me an entire set of data”.
Assessing a subset of data and then blindly extrapolating the results to the entire set is dangerous. You want the truth? You can handle the truth? Then we need to analyse all your data in a domain, not a subset.
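A small sketch shows why a subset only reveals a subset of the issues. It is illustrative only: the vendor records, the three checks in `find_issues`, and the choice of "the first three rows" as the subset are all hypothetical, and this is not how a Data Deep Dive is actually implemented.

```python
def find_issues(records):
    """Return the set of issue labels found in a list of vendor records."""
    issues = set()
    seen_names = set()
    for r in records:
        if not r.get("tax_id"):
            issues.add("missing tax_id")
        if not r.get("country"):
            issues.add("missing country")
        # Normalise the name so "Acme" and "acme " count as the same vendor.
        key = r["name"].strip().lower()
        if key in seen_names:
            issues.add("duplicate name")
        seen_names.add(key)
    return issues


# Made-up vendor records, for illustration only.
vendors = [
    {"name": "Acme", "tax_id": "T1", "country": "AU"},
    {"name": "Bolt", "tax_id": "", "country": "AU"},
    {"name": "Cog", "tax_id": "T3", "country": "AU"},
    {"name": "Dyno", "tax_id": "T4", "country": ""},
    {"name": "acme ", "tax_id": "T5", "country": "NZ"},
]

subset_issues = find_issues(vendors[:3])   # what a subset would reveal
full_issues = find_issues(vendors)         # what the whole domain reveals
missed = full_issues - subset_issues

print(sorted(subset_issues))
print(sorted(missed))
```

Note that the duplicate can never be found in the subset, because one half of the pair sits outside it. Issues that depend on relationships between records are exactly the ones a subset tends to hide.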
How does this apply to “Big Data”? The exciting opportunity offered by Big Data is that you can use the data to do predictive modelling – which often uses statistical regression analysis techniques – where the idea is that past behaviour will predict future behaviour too.
Would you stake your job on providing accurate predictive modelling, based on a subset of data? I wouldn’t. There are often so many variables to consider, and it’s not until you do your analysis that you learn which are important, and which are not. Again, you need all the data in order to figure out where you need to focus your attention.
Truth 5: Don’t be scared by the volume of data you have. Computers are excellent at processing massive volumes of data, now more than any time in the past. In order to understand your data, in order to understand your data quality problems, in order to use data and turn it into actionable insights, you need to start by looking at all of it – not a subset.
When you consider data as an entity that is separate from IT systems and from business process, you may realise that the governance that has been applied to both systems and process is not effective in managing data. Let’s face it, there is a long history of managing IT and business process carefully, and these are considered to be well-governed in well-run organisations. But despite these well-governed domains, we all know that data quality in systems and in business processes is less than optimal, and continues to degrade over time.
It is time for the Governance of Data to be considered as essential to a well-run business. If you are new to data, perhaps the discussion of these 5 myths-and-truths will help you understand data better. Feel free to e-mail me with your views, and I’ll air them in a future blog.
Michael is the Practice Lead for Seed Analytics, an SAP Services Partner, and global provider of solutions for SAP Data Governance and SAP Data Migration. An approachable and easy-going fellow, Michael is occasionally known to get up on his Soapbox and shout forth his views on topics he is passionate about.
You can contact Michael via e-mail (michael dot curran at seedanalyticsgroup dot com), find him on LinkedIn (au.linkedin.com/pub/michael-curran/0/771/31a/), and follow him on Twitter using @MichaelBCurran.