In the 1990s, large manufacturers spent millions on software to streamline their supply chains, and they continue to invest in refining processes to make those supply chains as efficient as possible. For those companies whose business is the manufacturing and sale of information, the redesign of “data supply chains” came more slowly, but it is now in high gear.
A typical data supply chain has five parts:
- Collection
- Cleanup
- Appending/enrichment/overlay
- Storage
- Delivery
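To make the flow concrete, the five stages can be sketched as a pipeline of functions, each handing its output to the next. The stage names mirror the list above; the record fields and stage bodies are placeholders for illustration, not anyone's actual implementation.

```python
# A rough sketch of the five stages as a pipeline of functions, each handing
# its output to the next. Stage names mirror the list above; the record
# fields and stage bodies are placeholders, not a real implementation.

def collect():
    """Gather raw records from primary sources (placeholder data)."""
    return [{"name": "  acme corp ", "phone": "(555) 010-0100"}]

def cleanup(records):
    """Normalize and standardize fields."""
    return [{k: " ".join(v.split()) for k, v in r.items()} for r in records]

def enrich(records):
    """Associate each record with related information."""
    return [dict(r, segment="unknown") for r in records]

def store(records):
    """Persist records (an in-memory list stands in for cloud storage)."""
    return list(records)

def deliver(stored):
    """Package stored records for downstream channels."""
    return {"records": stored}

print(deliver(store(enrich(cleanup(collect())))))
```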
Each of these stages typically involves both human and machine resources and a mix of custom, off-the-shelf, SaaS, and open source technologies. Tools, almost always custom-built, handle the hand-offs between the stages and shape the design of the interconnected process in order to:
- Push the average age of the data to near real-time
- Allow flexibility (i.e., easy integration with third parties)
- Enable rapid, cost-effective scalability
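One common way to build such hand-off tooling, though the text does not prescribe a particular design, is to decouple adjacent stages with queues so that records move downstream as soon as they are collected and each stage can be scaled or swapped independently. The sketch below uses Python's standard-library queue and threading modules purely as an illustration.

```python
# Sketch of a queue-based hand-off between two adjacent stages
# (collection -> cleanup). Decoupling the stages keeps records moving as
# soon as they arrive rather than in nightly batches, and lets each stage
# scale on its own. All names here are illustrative.
import queue
import threading

handoff = queue.Queue()   # buffer between collection and cleanup
SENTINEL = None           # signals that collection is finished

def collection_stage():
    """Push raw records onto the hand-off queue as they are acquired."""
    for raw in [" acme corp ", "Globex  LLC", "initech"]:
        handoff.put(raw)
    handoff.put(SENTINEL)

def cleanup_stage():
    """Consume records as they arrive and normalize them."""
    while True:
        raw = handoff.get()
        if raw is SENTINEL:
            break
        print(" ".join(raw.split()).title())   # crude normalization

producer = threading.Thread(target=collection_stage)
consumer = threading.Thread(target=cleanup_stage)
producer.start(); consumer.start()
producer.join(); consumer.join()
```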
The current best practices for each stage of the data supply chain are:
- Collection:
• Reduced dependence on telephone verification
• Self-updating mechanisms
• Managed crowdsourcing
• Built-in data validation to keep out “bad” data (see the validation sketch after this list)
• Outsourced collection teams
• Real-time data acquisition directly from primary sources (government filings, news articles, social chatter, etc.)
- Cleanup: ETL routines and many other tools for normalization and standardization (the validation sketch after this list also shows a simple normalization pass).
- Appending/enrichment/overlay: Increased reliance on associating database records with related information rather than “hard-wiring” the related information into the database (see the enrichment sketch after this list).
- Storage: The cloud.
- Delivery: Outputs have multiplied to include iOS, Android, and Windows 8.
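To make the collection and cleanup practices above concrete, the sketch below shows one way built-in validation can keep malformed records out at the point of collection, followed by a simple normalization pass of the kind an ETL routine might perform. The field names and rules are invented for illustration.

```python
# Illustrative validation-at-collection and ETL-style normalization.
# Field names and rules are invented; a production system would use far
# richer schemas and reference data.
import re

def is_valid(record):
    """Reject 'bad' data at the door: require a name and a plausible phone."""
    has_name = bool(record.get("name", "").strip())
    has_phone = bool(re.fullmatch(r"\+?[\d\-\s()]{7,15}", record.get("phone", "")))
    return has_name and has_phone

def normalize(record):
    """Standardize casing, whitespace, and phone formatting."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "phone": re.sub(r"[^\d+]", "", record["phone"]),
    }

incoming = [
    {"name": "  acme   corp ", "phone": "(555) 010-0100"},
    {"name": "", "phone": "n/a"},   # fails validation, never stored
]
accepted = [normalize(r) for r in incoming if is_valid(r)]
print(accepted)   # [{'name': 'Acme Corp', 'phone': '5550100100'}]
```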
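The enrichment practice, associating a record with related information rather than hard-wiring it in, can be illustrated by storing only a stable code on the core record and resolving the descriptive attributes from a separate reference table at read time. The tables, codes, and labels below are hypothetical.

```python
# Sketch of enrichment by association: the core record carries only a key
# (a hypothetical industry code), while the descriptive attributes live in a
# separate reference table that can be updated without touching the record.

INDUSTRY_CODES = {   # reference/overlay data, maintained independently
    "5045": {"label": "Computers and Software", "sector": "Wholesale"},
    "7372": {"label": "Prepackaged Software", "sector": "Services"},
}

company = {"name": "Acme Corp", "industry_code": "7372"}   # core record

def enrich(record, reference=INDUSTRY_CODES):
    """Overlay related attributes at read time instead of hard-wiring them."""
    extra = reference.get(record.get("industry_code"), {})
    return {**record, **extra}

print(enrich(company))
# {'name': 'Acme Corp', 'industry_code': '7372',
#  'label': 'Prepackaged Software', 'sector': 'Services'}
```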
As these new supply chains are deployed and refined, the industry gets ever closer to the promised land of fully automated processes to gather and deliver data.