We built Information Evolution to pursue the holy grail of database ownership: true real-time accuracy in a world where databases degrade at a rapid pace. A lofty goal, but today we do see cases where large-scale monitoring systems come within spitting distance of making this goal a reality.
The methodology (harvesting, monitoring, updating) is conceptually very simple. It does, however, take significant upfront investment and offshore resources to pull off. First, you need to identify the sources of the data to harvest. Then you need to harvest the desired data accurately, parsing it into fields as you go. Next comes setting up the “pinging” mechanism that monitors the sources for changes and triggers alerts. Finally, each alert is analyzed and either used to update the database or, if it turns out to be a false positive, used to train the monitoring mechanism so it avoids similar false positives in the future.
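As a rough illustration of the monitoring and training steps, here is a minimal Python sketch. It is not the actual system described here: the class name ChangeMonitor, its methods, and the idea of representing a learned false positive as a regular expression to ignore are all assumptions made for the example. Each "ping" compares the latest harvested text for a source against the previous harvest and raises an alert only when something other than known noise has changed.

```python
import hashlib
import re


class ChangeMonitor:
    """Hypothetical sketch of a ping-and-alert monitor with false-positive training."""

    def __init__(self):
        self.last_seen = {}        # source id -> raw text from the previous harvest
        self.ignore_patterns = []  # compiled regexes for noise learned from false positives

    def _fingerprint(self, content):
        # Strip known noise, collapse whitespace, then hash what remains.
        for pattern in self.ignore_patterns:
            content = pattern.sub("", content)
        normalized = " ".join(content.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ping(self, source_id, content):
        """Return True (an alert) if the source changed since the previous harvest."""
        previous = self.last_seen.get(source_id)
        self.last_seen[source_id] = content
        if previous is None:
            return False  # first harvest only establishes the baseline
        # Both versions are fingerprinted with the current ignore rules, so a
        # newly learned rule applies to the stored baseline as well.
        return self._fingerprint(previous) != self._fingerprint(content)

    def learn_false_positive(self, regex):
        """Record a pattern whose changes should no longer trigger alerts."""
        self.ignore_patterns.append(re.compile(regex))


monitor = ChangeMonitor()
monitor.ping("school-1", "Grade 1: Ms. Smith. Updated 2024-01-01")   # baseline, no alert
alert = monitor.ping("school-1", "Grade 1: Ms. Smith. Updated 2024-01-02")
# The alert fired only because a date stamp changed, so a researcher
# marks it as a false positive and the monitor learns to ignore it.
monitor.learn_false_positive(r"Updated \d{4}-\d{2}-\d{2}")
```

In this sketch the raw text of the previous harvest is stored rather than only its hash, so that a rule learned later can be applied to both sides of the comparison. A production system would more likely persist this state and harvest over the network, neither of which is shown here.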
Once it’s set up, though, ongoing maintenance costs are modest because research is extremely fast and targeted. The majority of alerts lead to timely updates, the system gets better over time, and the cost per update becomes far lower than with alternative approaches.
Event-driven Updates
Data changes when an event happens, for example when someone is hired or fired, a product is offered for sale for the first time, a price changes, or an organization acts in a certain way (signing deals, relocating, opening a new office, going into a new line of business, filing a government document, etc.). To be aware of an event that has just happened, there needs to be a way to “see” the source data indicating the event. This can be a new or changed web page, a mention in an RSS feed, an update to a LinkedIn profile, or a Twitter, blog, or Facebook post.
Almost all of these event indicators occur after the event itself has taken place. The key is the speed with which these indicators can be gathered and filtered for true changes. The recent hubbub around Dataminr, a Twitter-driven alert service, clearly showed how valuable a 1-hour “jump” on traditional news services can be to users for whom speed means money in the bank.
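One way to picture "filtering for true changes" is a field-by-field comparison between the record already in the database and the record just harvested from an indicator. The Python sketch below is only an assumption about how such a filter might look; the function name field_changes and the field names in the sample records are hypothetical. Differences in capitalization or spacing are treated as noise, so only substantive changes come back.

```python
def _normalize(value):
    # Treat None as empty, collapse whitespace, and ignore letter case.
    return " ".join(str(value or "").split()).casefold()


def field_changes(stored, harvested):
    """Return {field: (old_value, new_value)} for fields that truly differ."""
    changes = {}
    for field, new_value in harvested.items():
        old_value = stored.get(field)
        if _normalize(old_value) != _normalize(new_value):
            changes[field] = (old_value, new_value)
    return changes


# Hypothetical records: the stored version and a freshly harvested one.
stored = {"name": "Jane Doe", "title": "VP Sales", "company": "Acme"}
harvested = {"name": "jane  doe", "title": "Chief Revenue Officer", "company": "Acme"}

changes = field_changes(stored, harvested)
# Only the title differs in substance; the name differs only in case and spacing.
```

An empty result means the indicator carried no real change and can be dropped without taking up a researcher's time.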
Fresh Data
In the world of data publishing, the effect of near-real-time updating is no less striking. A major school data provider, for instance, uses Connotate* to monitor tens of thousands of schools in the US to ensure that, say, every replacement of a first grade teacher is captured within days of the event. This may not mean much to folks interested in breaking news, but for those selling to teachers, a fresh list of newly appointed fifth grade math teachers can easily be the difference between earning a bonus and having to find a new job.
* Information Evolution is Connotate’s small- and medium-sized business partner.