Over the past decade we have seen a phenomenal rise in the use of high-end business-to-business information and regulatory compliance services. We predicted early on that these data-driven applications would demand highly accurate data inputs, and our growth since then has borne out that analysis. Along the way we learned a lot about tactics and methodologies, and one of the most deceptively simple lessons came from our crowdsourcing work: the use of ‘confidence indexes’.
A ‘confidence index’ is a simple numerical representation of how confident the researcher is that the data gathered is complete and accurate. These values (typically on a 1-3 or 1-10 scale) can be applied at both the field level and the record level. Assigning the value is quick and easy, so it doesn’t appreciably affect processing times, but it opens up a world of opportunities when it comes to data quality.
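To make that concrete, here is a minimal sketch in Python of how such metadata might be stored alongside a record; the field names, the 1-3 scale, and the min() aggregation rule are all illustrative assumptions rather than a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class ScoredField:
    value: object
    confidence: int  # 1 = low, 3 = high (illustrative 1-3 scale)

@dataclass
class CompanyRecord:
    fields: dict  # field name -> ScoredField

    @property
    def record_confidence(self) -> int:
        # One conservative convention: a record is only as confident
        # as its weakest field. Other aggregations (mean, weighted)
        # would work equally well.
        return min(f.confidence for f in self.fields.values())

record = CompanyRecord(fields={
    "legal_name": ScoredField("Acme Holdings LLC", confidence=3),
    "revenue_usd": ScoredField(12_500_000, confidence=1),  # estimated
    "headcount": ScoredField(85, confidence=2),
})
print(record.record_confidence)  # -> 1
```

Taking the minimum is deliberately pessimistic: it ensures a single weak field cannot hide behind several strong ones.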
The main opportunity is that the research firm, the customer, or both can focus QA resources on the records that require the most scrutiny. These are typically ‘edge cases’ (small pockets of records with rare combinations of attributes), so the targeted review process can also yield more granular guidance that can be incorporated into the written process guidelines.
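As a rough sketch of that triage step (the record layout and threshold below are hypothetical), a QA queue can be nothing more than the records whose confidence falls below a cutoff, ordered worst-first:

```python
# Hypothetical record layout: each record carries a record-level
# confidence score on a 1-3 scale.
records = [
    {"id": "A-101", "record_confidence": 3},
    {"id": "B-202", "record_confidence": 1},
    {"id": "C-303", "record_confidence": 2},
]

def qa_queue(records, threshold=2):
    """Records scored below the threshold, worst first, so reviewers
    start with the entries that need the most scrutiny."""
    flagged = [r for r in records if r["record_confidence"] < threshold]
    return sorted(flagged, key=lambda r: r["record_confidence"])

print(qa_queue(records))  # -> [{'id': 'B-202', 'record_confidence': 1}]
```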
Using confidence metadata at the field level also allows sophisticated layering of QA and research escalation processes. For instance, core data about a firm may be assessed as high-confidence, whereas the company’s firmographic data (revenues for private firms, headcounts for firms with multiple offices, etc.) may have been derived from less reliable sources and thus carry lower confidence.
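One way this layering might look in practice (the field names and per-field minimums below are invented for illustration) is a per-field confidence floor, with anything below the floor routed back to research:

```python
# Hypothetical escalation rules: core identity fields must meet a higher
# bar than derived firmographics before a record can skip re-research.
MIN_CONFIDENCE = {
    "legal_name": 3,   # core data: only the highest score passes
    "address": 3,
    "revenue_usd": 2,  # firmographics: lower-confidence estimates tolerated
    "headcount": 2,
}

def fields_to_escalate(record_fields):
    """Return the fields whose confidence falls below the minimum set
    for that field class; unknown fields default to the strictest bar."""
    return [
        name
        for name, (value, confidence) in record_fields.items()
        if confidence < MIN_CONFIDENCE.get(name, 3)
    ]

record = {
    "legal_name": ("Acme Holdings LLC", 3),
    "revenue_usd": (12_500_000, 1),  # modeled from a secondary source
    "headcount": (85, 2),
}
print(fields_to_escalate(record))  # -> ['revenue_usd']
```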
The confidence metadata can also be shared with end-users so they have better context for the data when they use it. Alternatively, as is the case with certain financial-sector applications that have a very low tolerance for error, records can be suppressed entirely unless they carry the very highest confidence index value.
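A minimal sketch of that suppression rule, again with invented names and assuming the 1-3 scale used above:

```python
# Hypothetical delivery filter: for consumers with no tolerance for
# error, suppress anything below the top of the confidence scale.
TOP_CONFIDENCE = 3

def deliverable(records, strict=False):
    """With strict=True, release only records at the highest confidence
    value; otherwise ship everything with its score attached so the
    end-user can weigh the data in context."""
    if strict:
        return [r for r in records if r["record_confidence"] == TOP_CONFIDENCE]
    return records

records = [
    {"id": "A-101", "record_confidence": 3},
    {"id": "B-202", "record_confidence": 2},
]
print(deliverable(records, strict=True))  # -> only A-101 remains
```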
So, in broad strokes, this tool quantifies human researchers’ judgments in a simple way that lets managers focus quality control resources where they are most needed and reduce errors. Beyond that, it gives end-users and government regulators more transparency into data provenance, in a useful, accurate, and responsible way.