Estimated data gets a bad rap. AI runs on the likelihood that something is accurate (based on semantic analysis, training, and probability), and nobody seems to mind drinking that Kool-Aid. But when it comes to the commercial data that powers enterprise SaaS products, product managers are very reluctant to include data that has not been recently verified from a primary source.
There are good reasons for this, not least the need to defend the source of the data on which B2B customers make decisions with significant financial consequences. Data provenance is extremely important, of course, but it is not the only factor that determines the value of software to the end user.
Take a minute, put yourself behind the desk of your end-user, and imagine yourself performing their daily tasks. Perhaps they are searching for data on
- the names of CTOs at manufacturers of a certain size, or
- the comparative number of employees at school districts, or
- the interest rates offered by municipal bonds in various jurisdictions.
In all these cases, a missing or sparsely populated key field could yield incomplete results that skew their analyses. And a bad analysis by your customer could have painfully real consequences for both their careers and your annual license renewal rates.
The solution? Guesswork—educated guesswork—otherwise known as estimation.
You have CTO contact data but are missing annual revenue for their employers? Use available employee headcount data and industry revenue-per-employee ratios to estimate annual revenues. You have headcount data on 70% of the schools in a district but not for the district as a whole? Calculate the average headcount from the schools you do have and apply it to the schools you are missing. You’re missing the interest rates and terms for a county’s bond offering? Model them on similar jurisdictions, make an educated guess, and fill in the blanks in your database with these average values.
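The first two estimates above come down to simple arithmetic. Here is a minimal Python sketch of one way they might look. The names `Value`, `annual_revenue`, and `district_headcount` are hypothetical, and the $250,000 revenue-per-employee ratio and the school headcounts are made-up inputs for illustration, not real figures.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class Value:
    amount: float
    estimated: bool  # True when the figure was inferred rather than sourced

    def display(self) -> str:
        # Mark inferred figures with an asterisk so users can tell them apart.
        return f"{self.amount:,.0f}{'*' if self.estimated else ''}"


def annual_revenue(reported: Optional[float], headcount: int,
                   revenue_per_employee: float) -> Value:
    """Return reported revenue if present, else headcount x industry ratio."""
    if reported is not None:
        return Value(reported, estimated=False)
    return Value(headcount * revenue_per_employee, estimated=True)


def district_headcount(known: Sequence[int], total_schools: int) -> Value:
    """Extrapolate a district total from the schools that did report."""
    if not known or total_schools < len(known):
        raise ValueError("need 1..total_schools reported headcounts")
    missing = total_schools - len(known)
    # Fill each missing school with the average of the reported ones.
    return Value(sum(known) + mean(known) * missing, estimated=missing > 0)


# Hypothetical inputs: 120 employees, an assumed $250,000 per employee.
print(annual_revenue(None, 120, 250_000).display())  # 30,000,000*
# Seven of ten schools reported; the other three get the average (48).
print(district_headcount([40, 55, 38, 62, 47, 51, 43], 10).display())  # 480*
```

Carrying the `estimated` flag alongside the number, rather than overwriting the blank silently, is what lets the product show the value with its asterisk and still defend where every figure came from.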
These methods work (almost uncannily so: see “The Wisdom of Crowds” for more on why) and they meet your customers’ unmet needs. So take the leap and embrace the unknown. You’re already doing it with the opaque ‘black box’ of most AI solutions, so why not jump in and reap the rewards of having more satisfied customers?
That little asterisk next to the hitherto absent value in your database (i.e., *estimated) could be the difference between an 85% renewal rate and a 95% renewal rate.