News & Insights

Data Sycophancy

AI models are subject to a common unintended consequence: data sycophancy. This term means that an AI query of a specific dataset will amplify any biases already inherent in that dataset.

Here are examples of common sycophancy effects:

  • Resumes are inherently effusive and positive, so an analysis of millions of CVs will skew toward an overestimate of everyone’s actual skill sets. Basing a decision on that inflated value could further amplify the bias. For example, suppose you used AI to calculate the average level of AI tool proficiency for managers in a given industry from a resume corpus, then used that value to extrapolate the size of the market for a software training tool serving those managers. Your original overestimate could lead to a costly product launch miscalculation.
  • Marketing materials and product descriptions are inherently biased. Words and phrases like ‘leading’, ‘best’, ‘powerful’, ‘innovative’, and ‘market-leading’ can cause semantic analysis to get the ‘sentiment’ wrong and attribute more positive qualities to a product than it truly has. Queries against large product review datasets can similarly skew analyses toward negative product characterizations.
  • Legal settlement amounts are notoriously opaque because so few are made public. If you queried the dataset of all disclosed US settlement amounts to find how much, for instance, a meat-packing plant pays for a severed hand in Oklahoma, that data would be too old and spotty to be useful for analysis. Those queries can still be answered, but the statistical methodology involves using inferential data, defensible averages, geographic weighting, and other mechanisms to fill in the “blanks” required for an accurate calculation.
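
To make the resume example concrete, here is a minimal Python sketch of how an inflated proficiency figure can carry through to a market-size estimate. Every number in it is hypothetical, and so are the function names and the `inflation_factor` parameter. It also assumes, purely for illustration, that the training tool is aimed at managers who are not yet proficient; a different targeting assumption would change the direction of the error but not the fact that it propagates.

```python
# Hypothetical illustration only: none of these figures come from real data.

def estimate_market_size(total_managers: int, proficiency_rate: float) -> int:
    """Assumed addressable market: managers who are NOT yet proficient."""
    return round(total_managers * (1.0 - proficiency_rate))


def calibrate(self_reported_rate: float, inflation_factor: float) -> float:
    """Scale a self-reported rate down by an assumed inflation factor.

    inflation_factor is a made-up correction (e.g. 1.5 means resumes
    overstate proficiency by 50%); in practice it would have to be
    estimated from an independent source such as skills assessments.
    """
    return self_reported_rate / inflation_factor


total_managers = 100_000        # hypothetical industry head count
resume_rate = 0.60              # hypothetical proficiency rate from resumes

naive = estimate_market_size(total_managers, resume_rate)
adjusted = estimate_market_size(total_managers, calibrate(resume_rate, 1.5))

print(f"Naive estimate from resumes:  {naive}")
print(f"Estimate after calibration:   {adjusted}")
print(f"Gap caused by resume inflation: {adjusted - naive}")
```

The point of the sketch is only that a bias in the input rate is not averaged away by the size of the corpus: it passes straight through the extrapolation, so the correction has to be applied before the decision is made.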

This sycophancy effect, and the risks it poses, can be counteracted with more carefully structured queries and with compensations for the skewing aspects of the source materials in a given dataset or LLM. If your DevOps team needs help auditing your AI-assisted decision-making processes, Information Evolution can help. We work with data supply chains whose AI components are used for all kinds of processes, so we’ve likely seen and solved problems similar to the ones your team may now be facing.

Keep on top of the information industry with our ‘Data Content Best Practices’ newsletter: