HP LoadRunner is the industry-leading performance testing tool.
Personally, I'm not that happy about that. I'm a big believer in open source, and I think it's a shame we're so dependent on a proprietary tool. But, the facts are the facts, and LoadRunner is the market leader.
Now, one of the reasons I like open source is that if you find a bug, and you've got the technical skills to fix it, then you can. On the other hand, LoadRunner's got lots of bugs that could use fixing, but only HP are in a position to fix them, and they seem more concerned with cosmetic improvements and trendy new features.
Still, it'd be nice if we could do something about some of LoadRunner's bugs, particularly the ones that destroy data.
A while back, LoadRunner managed to eat some particularly valuable data. We'd just run an important test, and we wouldn't have time to run another, but the results were corrupted. When we tried to analyse them, LoadRunner gave us an error.
We couldn't afford to lose this data, so we had to do something.
The Bug (The technical bit)
As it happened, I'd recently spent some time poking around in LoadRunner's binary data files, trying to reverse engineer them. Reverse engineering's hard work, and there's still a lot to learn, but in this case, I was able to figure out the cause of the issue.
The bulk of the data in a LoadRunner results directory is in two key files: a ".eve" file and a ".map" file, one of each per injector. The ".eve" file contains a list of events, with numerical identifiers, and the ".map" file maps numerical identifiers to text.
Now, I noticed that the numerical identifiers didn't all match up. Some of the identifiers in the ".map" file were negative, but all the identifiers in the ".eve" file were positive. Moreover, if I assumed there was an integer handling bug in the LoadRunner code (so the ".map" file generator was treating numbers as int32_t, and the ".eve" file generator was treating numbers as uint32_t), then the negative numbers matched up with the positive numbers.
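To make the mismatch concrete, here's a minimal sketch of how the same 32 bits read as two different numbers depending on whether they're treated as signed or unsigned. The helper names `as_uint32` and `as_int32` and the sample identifier are made up for illustration; they aren't taken from LoadRunner or from a real results file.

```python
import struct


def as_uint32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned (same bit pattern)."""
    return value & 0xFFFFFFFF


def as_int32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as signed (same bit pattern)."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


# Illustrative value only: a negative identifier as it might appear in a ".map" file.
signed_id = -2147483000

# The same bits, read the way the ".eve" side would read them.
unsigned_id = as_uint32(signed_id)

print(signed_id, "->", unsigned_id)
print(unsigned_id, "->", as_int32(unsigned_id))
```

Identifiers below 2^31 come out the same either way, which would fit the bug only showing up once the identifiers have grown large.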
This bug seems to manifest itself most often on testware that hasn't been restarted in a while - which would make sense, if identifiers aren't recycled between test runs. This might explain the tendency for tests to run more reliably after a testware reboot.
The bug is definitely present in LoadRunner 9.5. I've heard anecdotally that it's still there in 11.5, but I can't confirm this.
It's a damn shame LoadRunner isn't open source. If it were, I reckon I could have found and fixed the cause of the bug, rather than writing a workaround.
From this, it was easy to write a Python script that fixed corrupted results. The script is on GitHub, at https://github.com/jamespic/lr_uint32_bug. Note: It worked for us, but use at your own risk.
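The idea behind a repair like this can be sketched on in-memory data. To be clear, this is not the script from the repository, and it doesn't read or write the real ".map" or ".eve" formats, which aren't described here. The function name `normalise_map_ids` and the sample identifiers and transaction names are hypothetical; the sketch only shows the remapping step, assuming the identifiers have already been parsed into a dictionary.

```python
def normalise_map_ids(id_to_name):
    """Return a copy of the mapping with every key as an unsigned 32-bit value."""
    fixed = {}
    for raw_id, name in id_to_name.items():
        # Negative keys are assumed to be uint32 values misread as int32.
        unsigned_id = raw_id & 0xFFFFFFFF
        if unsigned_id in fixed and fixed[unsigned_id] != name:
            raise ValueError(f"conflicting names for identifier {unsigned_id}")
        fixed[unsigned_id] = name
    return fixed


# Hypothetical parsed contents of a ".map" file (one key has gone negative).
map_ids = {12: "login", -2147483000: "checkout"}

# Hypothetical identifiers seen in the matching ".eve" file (all positive).
event_ids = [12, 2147484296, 12]

fixed = normalise_map_ids(map_ids)
names = [fixed[event_id] for event_id in event_ids]
print(names)
```

Raising an error on conflicting names is a cautious choice: if two different entries collapse onto the same identifier, it's safer to stop than to guess which one is right.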
The script repaired our results, and put the project back on track.