The Validated Synthetic Data Principle
Why your MVP compliance scores should use validated synthetic data — not invented numbers
Every SaaS product starts with synthetic data. Demo dashboards, sample scores, placeholder charts. There is nothing wrong with that — you need to ship before you have real users.
But there is a massive difference between invented synthetic data and validated synthetic data. One builds trust. The other destroys it the moment a compliance officer asks: "Where does this number come from?"
The Problem with Fake Scores
In EU regulatory compliance, numbers carry legal weight. When your dashboard shows a company's CSRD readiness at 72%, that number implies a methodology. When an auditor asks for the source, Math.random() * 100 is not an acceptable answer.
What most MVPs do:
// "Looks about right"
const csrdScore = 72;
const etsScore = 85;
const cbamScore = 45;
const overallScore = Math.round((csrdScore + etsScore + cbamScore) / 3);
No methodology. No source. No way to defend these numbers in a meeting.
This approach has three failure modes:
- Trust collapse — The first enterprise buyer who asks "where does 72% come from?" gets no answer. Deal lost.
- Migration debt — When you finally connect real data, every number changes. Users think the product is broken.
- Legal risk — In regulated industries, showing compliance scores without methodology can be classified as misleading representation.
The Validated Synthetic Data Principle
MVP features may use synthetic data, but every value must cite a real, verifiable source.
Validated synthetic data is production-ready data that has been structured, bounded, and sourced before it is personalized.
Instead of inventing numbers, you build every synthetic value from the same official data that the real system will eventually use. The result: your demo data is synthetic in form but real in provenance — sector-bounded, regulation-grounded, and ready to upgrade to company-specific values.
What validated synthetic data looks like:
// EU ETS product benchmark, Commission Implementing Regulation (EU) 2021/447
const hotMetalBenchmark = {
  sector: 'steel',
  product: 'Hot metal',
  factor: 1.288, // benchmark value for the 2021-2025 period
  unit: 'tCO2e/t product',
  source: 'EU ETS Benchmark 2021-2025',
  methodology: 'ets-benchmark',
  year: 2024
};
Exact source. Auditable. Upgrades to company-specific data with zero migration.
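One way to enforce this is a small guard that rejects any value arriving without its provenance fields. The sketch below is an assumption for illustration, not the DWS IQ implementation: the function name `assertValidated` and the list of required fields are hypothetical, modelled on the benchmark record shown above.

```javascript
// Hypothetical guard. Field names mirror the benchmark record above;
// the function name and error messages are assumptions for illustration.
const REQUIRED_FIELDS = ["factor", "unit", "source", "methodology", "year"];

function assertValidated(value) {
  // Every provenance field must be present and non-empty.
  for (const field of REQUIRED_FIELDS) {
    const v = value[field];
    if (v === undefined || v === null || v === "") {
      throw new Error(`Synthetic value is missing required field "${field}"`);
    }
  }
  // The numeric value itself must be a finite number.
  if (typeof value.factor !== "number" || !Number.isFinite(value.factor)) {
    throw new Error("factor must be a finite number");
  }
  return value;
}

// A bare number with no provenance is rejected:
// assertValidated({ factor: 72 }) throws because "unit" is missing.
```

Running every synthetic record through a check like this at load time means an unsourced number fails fast instead of reaching a dashboard.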
How We Apply It at DWS IQ
Our compliance scoring engine applies this principle at every layer: each of its six data categories is backed by a validated source.
The Upgrade Path Is Built In
The beauty of validated synthetic data is that upgrading to real data is not a migration; it is a configuration change. The scoring engine does not care whether the input came from a sector benchmark or a company's actual ESG report.
Same scoring engine. Same regulatory thresholds. Same deterministic logic. The only thing that changes is the input source — and each layer is strictly better than the last.
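A minimal sketch of that idea follows. Everything in it is an assumption made for illustration: the `origin` tag, the `resolveInput` and `scoreIntensity` names, the scoring formula, and the numbers, which are placeholders rather than real benchmark or company figures.

```javascript
// Hypothetical upgrade path: the scoring function never looks at where
// the input came from. Only resolveInput() knows about data origins.
const ORIGIN_PRIORITY = ["company", "benchmark"]; // best source first

function resolveInput(candidates) {
  // Return the highest-priority record that is available.
  for (const origin of ORIGIN_PRIORITY) {
    const found = candidates.find((c) => c.origin === origin);
    if (found) return found;
  }
  throw new Error("No sourced input available");
}

// Illustrative formula only: threshold divided by emission intensity,
// expressed as a percentage and capped at 100. Pure and deterministic.
function scoreIntensity(input, threshold) {
  if (!(input.factor > 0)) throw new Error("factor must be positive");
  return Math.min(100, Math.round((threshold / input.factor) * 100));
}

// Placeholder records, not real data.
const benchmark = { origin: "benchmark", factor: 2.0, source: "sector benchmark (placeholder)" };
const company = { origin: "company", factor: 1.2, source: "company ESG report (placeholder)" };

const before = scoreIntensity(resolveInput([benchmark]), 1.5);          // benchmark only
const after = scoreIntensity(resolveInput([benchmark, company]), 1.5);  // company data added
```

Adding the company record changes which input is selected, while `scoreIntensity` and the threshold stay untouched.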
Five Rules for Validated Synthetic Data
1. Every synthetic value has a source field. If you cannot cite the source, the number does not belong in your codebase.
2. Use the same data structures as production. Synthetic data flows through the same scoring engine, the same types, and the same validation. No separate "demo mode."
3. Sector benchmarks are the right default. EEA and Eurostat publish free, publicly available sector-level data. Prefer median values where available, since they are less skewed by outliers than means.
4. Show the data source in the UI. Users must always know whether they are seeing benchmark data or their own data. A green dot for live, amber for benchmark.
5. Deterministic scoring, no randomness. The same input must always produce the same output. No Math.random(), no timestamps in scoring logic, no LLM in the critical path.
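The rules on medians and on showing the data source are easy to sketch in a few lines. The helper names `median` and `provenanceIndicator`, the `origin` values, and the label strings below are assumptions for illustration; the green and amber colors follow the convention described above.

```javascript
// Median of a list of sector values. Sorting a copy keeps the function
// free of side effects, so repeated calls give the same result.
function median(values) {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Maps a record's origin to the dot shown next to the score in the UI.
// The origin names and labels are hypothetical.
function provenanceIndicator(origin) {
  if (origin === "company") return { color: "green", label: "Live data" };
  if (origin === "benchmark") return { color: "amber", label: "Sector benchmark" };
  throw new Error(`Unknown data origin: ${origin}`);
}

// With one outlier, the median stays near the typical value:
// median([10, 12, 14, 90]) is 13, while the mean would be 31.5.
```

Throwing on an unknown origin is a deliberate choice in this sketch: a score with no known provenance should not be rendered at all.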
Bottom Line
Fake compliance scores are worse than no scores. They create false confidence, fail under scrutiny, and require painful migration when real data arrives.
Validated synthetic data costs the same effort to build — you are just sourcing from EU regulatory databases instead of your imagination. And when a compliance officer asks "where does this 42% CSRD readiness come from?", you can answer: "EFRAG ESRS 2023 final set, 82 mandatory disclosures, cross-referenced with EEA sector averages for steel (NACE C24). Here is the breakdown."
That answer closes deals. The other one loses them.