The explosion of data-driven reporting, recent data privacy legislation, and the unfortunate loss of one of the Internet’s leading lights have brought the definition of “public” data back to the forefront of discussions about the information industry.
At this late stage of the game, it is astounding that there is so little understanding of “public” content and the rules that govern its use. A consensus seems to be emerging, however, that most of the data created and curated by the government and its employees should be accessible to the public. That sounds simple enough, but the devil is in the definitions of the words in that short declarative sentence.
“Most” Data
Personal financial and health data, specific data on children, and national security information are just a small sample of the types of public data available only to government employees with the appropriate mandates. So we may all look to the Feist decision on copyright to cover access to address data on government facilities, but location data on sensitive military and nuclear facilities is sometimes suppressed from publicly available databases. Another de facto consensual compromise involves the purveyors of consumer data, who routinely sell “presence of children,” average household incomes, and other aggregated data.
Created v. Curated Data
The government creates and gathers information (political contributions, contact info for various license holders, etc.) and allows access to this essentially user-generated data under a variety of models. In the company data space alone, there is IRS data, which is 100% off-limits, and SEC filing data, which is both licensed and made public on demand. Overall, the expectation of privacy for data supplied to the government is undefined and wildly inconsistent.
Accessibility
Publishers of accurate public information often face jail time and death threats. Just a few recent examples include Bradley Manning and Julian Assange (WikiLeaks); Kostas Vaxevanis (list of tax-dodging Greeks); the White Plains “Journal News” (handgun owners in Westchester); and Gawker (licensed gun owners in NYC). So, if you publish what appears to be public data, you had better be sure that you obtained it through “proper” channels and that you are on the popular side of the current rough consensus about “fair use.” That is not a very hospitable environment for those who would make more data accessible, and it is only going to get more confusing over time. It is small wonder, then, that growing firms like Cortera have been aggregating “inferential” data from private, not public, sources.