Data Strategy

Why 'Perfect Data' Is the Wrong Goal (And What to Aim for Instead)

Prexisio · 12 min read

"We can't start the data project yet; our data is too messy."

I hear this at least once a month. A company has budget approved, leadership buy-in, and a clear business need. But they're waiting.

Waiting to clean the data first. Waiting until the CRM is "fixed." Waiting until they migrate to the new accounting system. Waiting until everything is perfect.

Here's the hard truth: that day never comes.

Messy data is permanent. It's not a problem to solve before you start; it's a reality to design around.

The Perfect Data Myth

The logic sounds reasonable:

"If we build automated reporting on messy data, we'll just be automating mess. Let's clean it up first, then automate."


Why this fails:

  1. Cleaning data is a never-ending project - By the time you "finish," new mess has accumulated
  2. You don't know what needs cleaning until you try to use it - Cleaning in the abstract yields abstract results; real requirements only surface in use
  3. Business needs don't wait - You're making decisions with no data while you wait for perfect data
  4. Perfect data doesn't exist - Even the best organizations have data quirks

The result:

Companies spend 6-12 months "cleaning data" and never start the actual project. Or they start, realize the data isn't as clean as they thought, and restart the cleaning process.

This cycle can continue for years.

Why Data Will Always Be Messy

Reason 1: Your business is constantly changing

What this means:

  • New products launch (new data structures)
  • Processes evolve (old fields become obsolete)
  • Systems get replaced (data moves, fields map imperfectly)
  • Business rules change (definitions shift)
  • Teams reorganize (ownership changes)

Reason 2: Multiple systems mean multiple versions of truth

The reality:

You have:

  • Accounting system (QuickBooks, NetSuite, Sage)
  • CRM (Salesforce, HubSpot, Pipedrive)
  • Operations tools (custom systems, spreadsheets)
  • Support platform (Zendesk, Intercom)
  • Project management (Asana, Jira, Monday)

Each system:

  • Has its own customer ID structure
  • Defines "customer" differently
  • Updates at different times
  • Has different data quality standards

Example of inevitable mess:

In your CRM:

  • Customer name: "ABC Corp"
  • Status: Active
  • Owner: Sarah

In your accounting system:

  • Customer name: "ABC Corporation"
  • Status: Current
  • Sales rep: S. Johnson

In your support system:

  • Customer name: "ABC"
  • Status: Premium
  • CSM: Sarah J.

Same customer. Three different names. Three different status fields.

This isn't bad data management; it's reality.

Each system serves a different purpose. Forcing perfect consistency across all of them is expensive, brittle, and often counterproductive.

Reason 3: Human beings enter data

The problem:

Humans are:

  • Inconsistent (ABC Corp vs ABC Corporation)
  • Creative (using fields for unintended purposes)
  • Busy (skipping optional fields)
  • Imperfect (typos happen)

Reason 4: Legacy decisions haunt you

The reality:

Five years ago, someone made a decision about how to structure a field. It made sense at the time.

Now that decision is embedded in:

  • Hundreds of reports
  • Dozens of integrations
  • Automated workflows
  • Historical data

Changing it would break everything.

The Medallion Architecture: Embracing Messy Data

Modern data organizations use a three-tier approach:

Bronze Layer: Raw, Messy Data

  • Data exactly as it comes from source systems
  • No transformations, no cleaning
  • Preserves everything, including the mess
  • "This is what we actually have"

Silver Layer: Lightly Cleaned Data

  • Basic standardization (dates, names, IDs)
  • Obvious errors fixed
  • Still pretty close to source
  • "This is what we can reasonably work with"

Gold Layer: Business-Ready Data

  • Cleaned for specific use cases
  • Definitions standardized
  • Validated and tested
  • "This is what our reports use"

The key insight:

You maintain all three layers. You don't wait until everything is gold-level before you start.

You build systems that work with messy data, not systems that require perfect data.
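
Here's a minimal sketch of what the three layers can look like in practice, assuming a pandas workflow and a hypothetical crm_export.csv; the column names are illustrative, not prescriptive:

import pandas as pd

# Bronze: the raw export, exactly as it arrived - nothing cleaned, everything kept.
bronze = pd.read_csv("crm_export.csv", dtype=str)

# Silver: light standardization - trim IDs, parse dates, coerce amounts to numbers.
silver = bronze.copy()
silver["customer_id"] = silver["customer_id"].str.strip().str.upper()
silver["close_date"] = pd.to_datetime(silver["close_date"], errors="coerce")
silver["amount"] = pd.to_numeric(silver["amount"], errors="coerce")

# Gold: business-ready - drop rows that fail validation and apply the agreed
# definition of "active" (an order in the last 90 days), ready for reports.
cutoff = pd.Timestamp.today() - pd.Timedelta(days=90)
gold = (
    silver.dropna(subset=["customer_id", "close_date", "amount"])
          .query("close_date >= @cutoff")
          .groupby("customer_id", as_index=False)["amount"]
          .sum()
)

Notice that bronze stays messy forever. The cleaning lives in the transformations, so when the source data shifts, you update a few lines of code instead of launching another cleanup project.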

What "Good Enough" Data Actually Looks Like

Forget perfect. Here's what you actually need:

Standard 1: Consistent Enough for Your Core Metrics

Not: Every field is perfectly clean
But: The fields that drive key decisions are reliable


Example:

Don't worry about:

  • Whether customer names are perfectly formatted
  • If phone numbers have dashes or not
  • Optional fields that are spotty

Do care that:

  • Revenue numbers are accurate
  • Customer counts are consistent
  • Cost data reconciles

80% of your decisions come from 20% of your data.

Make that 20% clean. Live with mess in the rest.

Standard 2: Documented Quirks

Not: No data quirks exist
But: Everyone knows what the quirks are


Example of good documentation:

Customer Count Definition: We count "active customers" as anyone who placed an order in the last 90 days. Note: Due to a CRM limitation, customers who bought exclusively through Partner Channel prior to 2023 may not appear in this count. We estimate this affects ~40 customers. For board reporting, we manually add 40 to the automated count.

This isn't perfect data. But it's useful data with documented limitations.
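
You can even encode the workaround directly in your reporting code. A minimal sketch, assuming a hypothetical automated count coming out of your reporting layer:

# Documented CRM gap: pre-2023 partner-channel customers (~40) never made it in.
PARTNER_CHANNEL_ADJUSTMENT = 40

def board_active_customers(automated_count: int) -> int:
    # Board-reporting figure: automated count plus the documented adjustment.
    return automated_count + PARTNER_CHANNEL_ADJUSTMENT

The quirk now lives in one named constant next to a comment, instead of in someone's head.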

Standard 3: Reliable Enough to Act On

The test:

Would you make a $50k decision based on this data?

If yes: It's clean enough
If no: It needs more work


Example:

Scenario 1: Hiring Decision

"Our data shows revenue per employee is 15% below industry average. Should we hire?"

If this is based on:

  • Accurate revenue numbers
  • Correct employee count
  • Valid industry benchmarks

Then act on it, even if:

  • Employee start dates are imprecise
  • Department assignments are inconsistent
  • Job titles aren't standardized

Scenario 2: Pricing Decision

"Should we increase prices on Product A?"

If this is based on:

  • Accurate product costs
  • Reliable margin calculations
  • Valid demand data

Then act on it, even if:

  • Product descriptions have typos
  • Product categories overlap
  • SKU naming is inconsistent

"Reliable enough to act on" is not the same as "perfect."

The Right Approach: Build Infrastructure That Handles Mess

Instead of cleaning all your data before building infrastructure, build infrastructure that can handle messy data.

Strategy 1: Standardize at the Reporting Layer, Not the Source

Don't: Try to make every system perfectly consistent
Do: Create a reporting layer that standardizes on the fly


Example:

In your three systems:

  • CRM: "ABC Corp"
  • Accounting: "ABC Corporation"
  • Support: "ABC"

In your reporting layer:

  • Map all three to "ABC Corporation"
  • Keep source systems unchanged
  • Standardization happens during data extraction

Why this works:

  • Source systems continue working as they always have
  • No disruption to daily operations
  • Reporting gets consistent data
  • Changes are easy (update mapping, not source systems)
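
A minimal sketch of that mapping, assuming a hypothetical lookup table maintained in the reporting layer; unmapped names pass through unchanged so nothing silently disappears:

# Reporting-layer lookup: source-system spellings -> one canonical name.
CANONICAL_NAMES = {
    "ABC Corp": "ABC Corporation",         # CRM spelling
    "ABC Corporation": "ABC Corporation",  # accounting spelling
    "ABC": "ABC Corporation",              # support spelling
}

def standardize_customer(raw_name: str) -> str:
    # Applied during extraction; source systems are never touched.
    name = raw_name.strip()
    return CANONICAL_NAMES.get(name, name)

When a new variant shows up, you add one line to the mapping - no source-system changes, no broken workflows.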

Strategy 2: Document Known Issues Instead of Fixing Everything

Don't: Spend 6 months fixing every data quirk
Do: Document the quirks that matter and work around them


Example documentation:

KNOWN DATA ISSUES - Last updated: June 2025

1. Customer Count Quirks:
   - Partner channel customers pre-2023 not in CRM (~40 customers)
   - Workaround: Manually add 40 to automated count for board reports
   
2. Revenue Timing:
   - CRM records deal close date
   - Accounting records invoice date
   - These can differ by 15-30 days
   - Workaround: Use accounting date for financial reports, CRM date for sales metrics

3. Product Categories:
   - Some products in multiple categories
   - Causes ~2% double-counting in category reports
   - Workaround: Noted in all category reports, acceptable margin of error

This is useful. This is actionable. This is realistic.

Strategy 3: Prioritize Data Quality for High-Impact Decisions

The 80/20 rule:

20% of your data drives 80% of your decisions.

Focus cleaning efforts there.


Example priority list:

Priority 1 (Clean aggressively):

  • Revenue data
  • Cost data
  • Customer counts
  • Key operational metrics

Priority 2 (Clean opportunistically):

  • Product data
  • Employee data
  • Lead source tracking

Priority 3 (Live with the mess):

  • Descriptive fields
  • Optional fields
  • Historical data that doesn't drive decisions

Time allocation:

  • 70% on Priority 1
  • 25% on Priority 2
  • 5% on Priority 3

Strategy 4: Build Monitoring, Not Perfection

Don't: Try to prevent all bad data from entering
Do: Detect and flag bad data quickly


Example monitoring:

Red flags to monitor:

  • Revenue suddenly drops 50% (likely data issue)
  • Customer count changes by 100+ overnight (likely import error)
  • Margins fall outside their normal range (likely data entry error)

Automated alerts: "Revenue for Product A is $0 this week. Last week it was $45k. Likely data issue - please investigate."


Why this works:

You catch issues fast, before they affect major decisions. You don't wait for perfect data; you monitor for broken data.
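
A minimal sketch of such a check, assuming weekly revenue figures already come out of your reporting layer; the 50% threshold is illustrative and worth tuning to your own volatility:

def check_revenue(product: str, this_week: float, last_week: float) -> str | None:
    # Flag sudden drops that are more likely data issues than real trends.
    if last_week > 0 and this_week < last_week * 0.5:
        return (f"Revenue for {product} is ${this_week:,.0f} this week. "
                f"Last week it was ${last_week:,.0f}. "
                "Likely data issue - please investigate.")
    return None  # nothing suspicious

# Mirrors the alert above: a drop from $45k to $0 triggers the warning.
print(check_revenue("Product A", 0, 45_000))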

The Cost of Waiting for Perfect Data

While you wait for clean data, you're paying for:

Cost 1: Decision Delay

Real cost example:

A 95-person company spent 8 months "cleaning data" before building reporting infrastructure.

During those 8 months:

  • They missed a declining trend in customer renewals (down 12%)
  • They over-hired in a department that was actually performing well
  • They continued manual reporting that cost 120 hours/month

Total cost of waiting: ~$85,000 in opportunity cost and wasted effort


If they had started with "good enough" data:

  • Would have spotted renewal trend in Month 2
  • Could have course-corrected hiring in Month 3
  • Would have saved 960 hours of manual work

Cost 2: Perpetual Preparation

The trap:

Month 1: "We need to clean the data first"
Month 3: "We're 60% done cleaning, need another 2 months"
Month 5: "We found more issues, need to restart"
Month 8: "The business changed, data is messy again"
Month 12: "We should really clean this data before starting..."

Reality:

Companies can spend years preparing to start and never actually start.

Cost 3: Perfect Becomes the Enemy of Good

The opportunity cost:

You could have had:

  • 80% automated reporting 6 months ago
  • Quick answers to most questions
  • Reliable data for most decisions
  • Momentum to tackle the remaining 20%

Instead you have:

  • 0% automated reporting
  • Still manually pulling data
  • Still making decisions with delayed information
  • No momentum, just exhaustion

What to Do Instead

Step 1: Start with Your Core Metrics (Week 1-2)

Identify the 5-10 metrics that drive major decisions:

  • Monthly revenue
  • Customer count
  • Gross margin
  • Cash position
  • Key operational metrics

Just these. Not everything.

Step 2: Assess Data Quality for Those Specific Metrics (Week 2-3)

For each core metric, ask:

  1. Where does this data come from?
  2. How accurate is it?
  3. What are the known issues?
  4. Is it reliable enough to act on?

Document the answers.

Step 3: Clean Only What's Necessary (Week 3-6)

For Priority 1 metrics:

If data quality is below 90% accuracy → Clean it
If data quality is 90-95% → Document quirks and proceed
If data quality is 95%+ → Proceed as-is

For everything else:

Document known issues and move forward.
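
If you want the triage rule to be unambiguous, write it down as code. A minimal sketch, assuming accuracy is measured as the share of records passing your validation checks:

def triage(accuracy: float) -> str:
    # Accuracy is a fraction between 0.0 and 1.0 for a single Priority 1 metric.
    if accuracy < 0.90:
        return "clean it"
    if accuracy < 0.95:
        return "document quirks and proceed"
    return "proceed as-is"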

Step 4: Build Infrastructure That Handles Imperfection (Week 6-12)

Design your system to:

  • Standardize at the reporting layer
  • Flag anomalies automatically
  • Document known issues
  • Be transparent about limitations

Don't wait for perfect source data.

Step 5: Improve Iteratively (Ongoing)

After the system is running:

  • Monitor for issues
  • Fix the biggest problems first
  • Improve data quality over time
  • But never stop delivering value while you improve

Progress, not perfection.

The Mindset Shift

Old mindset: "We can't build anything until the data is perfect"

New mindset: "We'll build with the data we have, document its limitations, and improve it iteratively"

Old mindset: "Messy data is a problem to solve"

New mindset: "Messy data is a reality to design around"

Old mindset: "We need 6 months to clean data before starting"

New mindset: "We need 6 weeks to assess data and start building"

The Bottom Line

Messy data will never be fully solved.

Your business is too dynamic. Your systems are too numerous. Your humans are too human.

Waiting for perfect data means waiting forever.

What actually works:

  1. Identify your core metrics - The 20% that drive 80% of decisions
  2. Assess data quality there - Is it reliable enough to act on?
  3. Clean what matters most - Priority 1 metrics only
  4. Document the rest - Known issues, limitations, workarounds
  5. Build infrastructure that handles imperfection - Design for messy data, not perfect data
  6. Improve iteratively - Get better over time, but deliver value now

The goal isn't perfect data. The goal is reliable enough data, documented limitations, and systems that work in the real world.

Stop waiting. Start building.


Stuck waiting for "clean data" before building infrastructure? We help mid-sized companies build reporting systems that work with real-world data: imperfect, messy, but good enough to drive decisions.

Let's talk →