As Barclays recently discovered, losing customer data is bad news and a guaranteed way to attract the wrong sort of media attention. Articles describing the breach have broken out of the IT press and into the mainstream: a recent Google search returned hundreds of articles from sources including The Register, This is Money, The Daily Mail, the FT, Reuters and The Street (NY).
As a performance tester, I have worked for many organisations that use production data for performance testing. Performance testing frequently calls for large amounts of customer information with which to simulate a business process. In a financial institution, for example, this could be applying for a mortgage, a loan or a bank account. For many of these business processes it is possible to use random or jumbled data rather than true production data. You may need a list of valid addresses, postcodes and dates of birth, for instance, but be able to use randomly generated names.
However, some business processes don’t lend themselves to random or made-up data. Here are just a few reasons why you may want to use production or true-to-life data in testing.
Test the data as much as you test the application
I’ve worked with systems where surnames such as O’Neil, O’Donnel and O’Brien caused problems because of the apostrophe. Had I used only the most common surnames, I would not have encountered these problems, and a defect would have made it through to production. Hyphenated surnames, though less common, can also trip up a badly written program.
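The apostrophe problem usually comes from code that builds SQL by string concatenation, and the hyphen problem from over-strict input validation. The minimal Python sketch below, assuming an in-memory SQLite database, shows how a short list of edge-case surnames exposes both faults, while a parameterised query stores every name correctly. The surname list, the validator pattern, the `customers` table and the helper names are all hypothetical, chosen for illustration rather than taken from any real system.

```python
import re
import sqlite3

# Hypothetical edge-case surnames worth including in any test data set.
EDGE_CASE_SURNAMES = ["O'Neil", "O'Donnel", "O'Brien", "Smith-Jones"]

# A deliberately naive validator of the kind sometimes found in legacy code:
# it accepts only plain letters, so apostrophes and hyphens are rejected.
NAIVE_NAME_PATTERN = re.compile(r"^[A-Za-z]+$")


def naive_insert(conn: sqlite3.Connection, surname: str) -> None:
    # BAD: string-built SQL; an apostrophe closes the literal early.
    conn.execute(f"INSERT INTO customers (surname) VALUES ('{surname}')")


def safe_insert(conn: sqlite3.Connection, surname: str) -> None:
    # GOOD: a parameterised query leaves quoting to the driver.
    conn.execute("INSERT INTO customers (surname) VALUES (?)", (surname,))


def find_problem_surnames(surnames: list[str]) -> list[tuple[str, str]]:
    """Run each surname through the naive code paths and record what breaks."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (surname TEXT)")
    problems = []
    for surname in surnames:
        if not NAIVE_NAME_PATTERN.match(surname):
            problems.append((surname, "rejected by naive validator"))
        try:
            naive_insert(conn, surname)
        except sqlite3.OperationalError:
            problems.append((surname, "naive insert raised a SQL error"))
    conn.close()
    return problems


def safe_round_trip(surnames: list[str]) -> list[str]:
    """Store the surnames with parameterised inserts and read them back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (surname TEXT)")
    for surname in surnames:
        safe_insert(conn, surname)
    stored = [row[0] for row in conn.execute("SELECT surname FROM customers ORDER BY rowid")]
    conn.close()
    return stored


if __name__ == "__main__":
    for surname, problem in find_problem_surnames(EDGE_CASE_SURNAMES):
        print(f"{surname}: {problem}")
    print(safe_round_trip(EDGE_CASE_SURNAMES))
```

Each apostrophe name fails both naive checks, while the hyphenated name passes the naive insert and is caught only by the validator, which is why a test set drawn solely from the most common surnames would have passed both.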
Database size matters
Inefficient operations, such as full table scans, may complete quickly against a small database. As the database grows, however, the same operations can take far longer and exceed acceptable performance limits.
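One way to spot a scan before the data grows is to ask the database how it intends to run a query. The sketch below assumes SQLite via Python's built-in `sqlite3` module, with a hypothetical `customers` table, lookup query and index name. It uses `EXPLAIN QUERY PLAN` to show a full scan becoming an index search once an index exists. The exact plan text varies between SQLite versions (for example `SCAN customers` versus `SCAN TABLE customers`).

```python
import sqlite3

# Hypothetical lookup used by a "find customer by surname" business process.
LOOKUP_SQL = "SELECT * FROM customers WHERE surname = ?"


def build_customers(rows: int) -> sqlite3.Connection:
    """Create an in-memory customers table filled with synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, surname TEXT, postcode TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (surname, postcode) VALUES (?, ?)",
        ((f"Surname{i}", f"PC{i % 1000}") for i in range(rows)),
    )
    return conn


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the 'detail' column (index 3) of EXPLAIN QUERY PLAN for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def compare_plans(rows: int) -> tuple[list[str], list[str]]:
    """Return the lookup's plan before and after indexing the surname column."""
    conn = build_customers(rows)
    before = query_plan(conn, LOOKUP_SQL, ("Surname42",))
    conn.execute("CREATE INDEX idx_customers_surname ON customers (surname)")
    after = query_plan(conn, LOOKUP_SQL, ("Surname42",))
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans(10_000)
    print("without index:", before)  # a full scan of customers
    print("with index:   ", after)   # a search using idx_customers_surname
```

The plan is the same whether the table holds a hundred rows or ten million, but a scan reads every row, so its cost grows roughly in proportion to table size, whereas a B-tree index lookup grows roughly logarithmically. That is why a small test database can hide the problem.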
Most large organisations have multiple databases containing customer information. It is common to have customer databases holding name and address information, separate accounting systems holding details of transactions, and CRM systems holding marketing and sales information. If your application touches several of these systems, you may need “true” data to ensure that the customer ID held in one database matches the transaction details held in another. Where the relationships between datasets are complex, it is often easier to use real data than to generate your own.
OK, so I need to use production data, but how can I protect my customers?
If you are forced down the route of using “real data”, it is sensible to use some form of data obfuscation. This is particularly important when third-party contractors have access to your test data, whether by accident or by design. This is a key point made by @JoeHarris_UK in his feature article in the November 2013 issue of Digital Forensics Magazine. Whilst Joe’s article describes how easily people inside an organisation can get access to data through misconfigured network storage devices, the points he makes apply equally to test data that testers can legitimately access.
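As an illustration of what obfuscation has to preserve, here is a minimal sketch of one common technique, keyed-hash pseudonymisation, using only Python's standard library. It is not how DataVantage or any particular product works; the key, the substitute surname pool and the function names are hypothetical. Because the same customer ID always maps to the same token, the link between a customer database and a separate transaction system survives masking, which matters when real data was chosen precisely for those relationships.

```python
import hashlib
import hmac

# Hypothetical masking key. In practice it would be held outside the test
# environment so that testers cannot re-create or reverse the mapping.
MASKING_KEY = b"example-key-keep-out-of-test"

# Hypothetical substitute surnames, deliberately including apostrophe and
# hyphen cases so masked data still exercises those code paths.
SUBSTITUTE_SURNAMES = ["Smith", "O'Neil", "Jones", "O'Brien", "Smith-Jones", "Taylor"]


def _keyed_digest(value: str, key: bytes) -> bytes:
    """HMAC-SHA256 of the value: the same input and key give the same digest."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


def pseudonymise_id(customer_id: str, key: bytes = MASKING_KEY) -> str:
    """Replace a customer ID with a stable token so cross-system joins still work."""
    return "CUST-" + _keyed_digest(customer_id, key).hex()[:16]


def mask_surname(surname: str, key: bytes = MASKING_KEY) -> str:
    """Swap a real surname for a substitute chosen deterministically from the pool."""
    digest = _keyed_digest(surname, key)
    return SUBSTITUTE_SURNAMES[digest[0] % len(SUBSTITUTE_SURNAMES)]


def mask_customers(rows: list[dict]) -> list[dict]:
    """Mask customer records (reduced here to an ID and a surname)."""
    return [
        {"customer_id": pseudonymise_id(r["customer_id"]), "surname": mask_surname(r["surname"])}
        for r in rows
    ]


def mask_transactions(rows: list[dict]) -> list[dict]:
    """Mask transaction records; amounts are kept so volumes stay realistic."""
    return [
        {"customer_id": pseudonymise_id(r["customer_id"]), "amount": r["amount"]}
        for r in rows
    ]


if __name__ == "__main__":
    customers = [
        {"customer_id": "C001", "surname": "Patel"},
        {"customer_id": "C002", "surname": "O'Brien"},
    ]
    transactions = [{"customer_id": "C002", "amount": 250.0}]
    print(mask_customers(customers))
    print(mask_transactions(transactions))
```

This is pseudonymisation rather than full anonymisation: fields left untouched, such as amounts, dates or postcodes, may still identify someone, and anyone holding the key can re-create the mapping, so a real masking process would need to consider those fields and protect the key.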
A recent Dilbert cartoon on “Corporate Security” highlighted the incentives poorly paid staff (like Wally and Dilbert) may have to leak data. For people who aren’t even on your payroll, both the incentive and the risk of a leak are greater still.
To avoid data leaks, the best way to use production-like data in testing is data obfuscation. Data obfuscation, such as that offered by our partner DataVantage, can help mask, obfuscate and remove identifying data from test data, helping to prevent data breaches.
For more information about DataVantage and how it can help to protect one of the core assets of your business, please read our DataVantage data sheet or get in touch. We’d be happy to share our experiences in this area.