Transparency
Our Methodology
How Water Utility Report sources, normalizes, interprets, and publishes U.S. drinking water data — and where we draw hard lines on what we will and won't do.
Core Philosophy
Water Utility Report is built on a simple premise: the most useful thing we can do is take hard-to-navigate official public data and make it genuinely understandable.
We do not manufacture water quality claims. We do not republish third-party commercial databases. We do not publish pages that exist only to capture search traffic. Every page that goes live must answer a real user question with real, source-backed information.
What We Use
Stage 1 uses only official U.S. government datasets and public records where terms clearly allow normalization, summarization, and republication of derived facts.
EPA SDWIS (Safe Drinking Water Information System)
Core utility identity, system IDs, violation records, population served
EPA ECHO (Enforcement and Compliance History Online)
Compliance history, enforcement actions, detailed violation data
Consumer Confidence Reports (CCRs)
Annual utility-published water quality reports; source of contaminant level data
EPA Water Quality Portal
Supporting sampling and monitoring data from federal and state agencies
State drinking water program datasets
Where terms permit public use; used for service area and utility detail
EPA and CDC public guidance documents
Health-effect interpretations and treatment guidance references
U.S. Census Bureau
Population and geography data for service area normalization
What We Won't Use
Stage 1 explicitly prohibits the following sources unless written permission or a license is obtained.
WQA Member Directory
Licensed commercial data — requires explicit authorization
WQA / NSF Certified Product Datasets
Commercial certification databases — bulk reproduction prohibited
EWG Tap Water Database content
Nonprofit competitive database — bulk extraction not permitted
Competitor or third-party directories
No bulk scraping, copying, or republication of third-party databases
Logos, seals, and certification marks
Third-party trust marks — not reproduced without explicit license
How Pages Are Built
No page goes from an automated pipeline directly to the public. Every page passes through a review and publish workflow with human checkpoints.
Data ingested
Official datasets downloaded from EPA or state sources. Source URL, ingestion date, and dataset version recorded at row level.
Records normalized
Utility names, system IDs, geographic references, and contaminant names standardized. Duplicate and incomplete records filtered.
Draft page generated
Page templates populated from normalized data. AI-assisted plain-English summaries drafted for human review.
Human review
Reviewer checks factual accuracy, legal compliance flags, internal link logic, and content quality. No page is published directly from automated generation.
Approved and published
Page assigned publish status. Cohort controls allow staged rollout by state, city, or entity group.
Refresh cycle
Annual ingestion refresh. Utilities that publish new CCRs trigger re-review of affected pages.
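The workflow above can be read as a small state machine with no path from a generated draft to a published page that skips human review. The following is a minimal sketch of that idea, not our production code: the status names, the `ALLOWED` table, and the `advance` function are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Optional


class PageStatus(Enum):
    # Hypothetical status names; the real system may label these differently.
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    PUBLISHED = "published"


# Allowed transitions. There is deliberately no edge from DRAFT to PUBLISHED.
ALLOWED = {
    PageStatus.DRAFT: {PageStatus.IN_REVIEW},
    PageStatus.IN_REVIEW: {PageStatus.APPROVED, PageStatus.DRAFT},
    PageStatus.APPROVED: {PageStatus.PUBLISHED, PageStatus.IN_REVIEW},
    # A published page can be sent back, e.g. when a utility publishes a new CCR.
    PageStatus.PUBLISHED: {PageStatus.IN_REVIEW},
}


def advance(current: PageStatus, target: PageStatus,
            reviewer: Optional[str] = None) -> PageStatus:
    """Return the new status, or raise if the transition is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    # Assumption: approval must be attributed to a named human reviewer.
    if target is PageStatus.APPROVED and not reviewer:
        raise ValueError("approval requires a named human reviewer")
    return target
```

Encoding the rule as a transition table means an attempt to publish a draft directly fails loudly instead of relying on convention.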
Confidence Levels
Every data record in our system carries a confidence score. Pages display the confidence level whenever it is below high, so readers understand how certain the underlying data is.
High
Data sourced directly from EPA SDWIS, ECHO, or utility CCRs with a verified ingestion date. Utility identity confirmed against the official system ID.
Medium
Data sourced from state datasets or derived from official data through normalization steps. Core facts verified; some derived fields may carry uncertainty.
Low
Data that is modeled, inferred, or drawn from a third-party source that has not been fully verified. Flagged for review before publication. Pages with low-confidence data carry explicit warnings.
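As an illustration of how these levels could drive what a page displays, here is a hedged sketch. It assumes the three descriptions above correspond to high, medium, and low, and that the `source_type` values are `official`, `state`, `derived`, and `modeled`. The `classify` and `page_notice` functions, the notice wording, and the rule that unverified records fall to low are assumptions made for the example.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Confidence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def classify(source_type: str, verified: bool) -> Confidence:
    """Map a record's source type and verification flag to a level."""
    # Assumption: anything not yet verified is treated as low confidence.
    if not verified:
        return Confidence.LOW
    if source_type == "official":
        return Confidence.HIGH
    if source_type in ("state", "derived"):
        return Confidence.MEDIUM
    # Modeled, inferred, or unrecognized source types.
    return Confidence.LOW


def page_notice(levels: Iterable[Confidence]) -> Optional[str]:
    """Return the notice a page should carry, based on its weakest record."""
    # Assumption: a page with no records is treated as low confidence.
    weakest = min(levels, default=Confidence.LOW)
    if weakest is Confidence.HIGH:
        return None  # nothing is shown when confidence is high
    if weakest is Confidence.MEDIUM:
        return "Note: some values on this page are derived and may carry uncertainty."
    return "Warning: this page includes modeled or unverified data."
```

Taking the minimum across records is one conservative choice: a single low-confidence value is enough to put the warning on the page.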
Legal Safeguards
Likely match disclosure
Utility-to-address matching is disclosed as 'likely' where service area mapping relies on modeled boundaries. We never claim certainty we don't have.
Regulatory vs. health interpretation
We clearly separate what regulatory data shows from what health guidance recommends. These are often different — we do not conflate them.
No medical claims
Water quality information is informational only. We do not make medical, diagnostic, or treatment recommendations beyond linking to official health guidance.
Source-first
Every factual claim links to or identifies its source. Data without a source attribution is not published on entity pages.
Data Provenance Standards
Every ingested record in our system stores the following provenance fields at the row level:
| Field | Purpose |
|---|---|
| source_type | Identifies whether data is official, state, derived, or modeled |
| source_url | Direct URL to the source document or dataset |
| ingestion_date | Date this record was pulled from the source |
| last_verification_date | Last date a human or automated check confirmed the record |
| transform_version | Which version of our normalization pipeline processed this record |
| confidence_score | Numerical confidence score for derived or inferred values |
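To make the table concrete, the fields could be carried on each row as a small immutable record. This is a sketch under stated assumptions rather than our actual schema: the field names come from the table, but the Python types, the 0 to 1 range for `confidence_score`, the validation rules, and the placeholder URL are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Source types named in the table above.
SOURCE_TYPES = {"official", "state", "derived", "modeled"}


@dataclass(frozen=True)
class Provenance:
    source_type: str
    source_url: str
    ingestion_date: date
    last_verification_date: Optional[date]  # None until a first check runs (assumption)
    transform_version: str
    confidence_score: float  # assumed to be normalized to the range 0..1

    def __post_init__(self) -> None:
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source_type: {self.source_type!r}")
        # Source-first: a record without a source URL is rejected outright.
        if not self.source_url.strip():
            raise ValueError("source_url is required")
        if not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError("confidence_score must be between 0 and 1")


# Usage with placeholder values (the URL is not a real dataset location).
record = Provenance(
    source_type="official",
    source_url="https://example.org/sdwis-extract.csv",
    ingestion_date=date(2024, 1, 15),
    last_verification_date=None,
    transform_version="1.0.0",
    confidence_score=0.95,
)
```

Validating at construction time keeps records without a source attribution from entering the pipeline at all, which is simpler than filtering them out at publish time.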
Site-Wide Disclaimer
Water Utility Report provides informational content derived from official U.S. government and public datasets. All content is published for informational purposes only.
- This site is not a substitute for professional water testing by a certified laboratory.
- Utility-to-address matches are likely but not guaranteed to be correct for every address; confirm your provider with your utility or on your water bill.
- Where data is modeled or derived, this is disclosed on the relevant page.
- This site is not a substitute for medical advice. Consult a healthcare provider for health concerns related to water quality.
- Contaminant data reflects the most recent Consumer Confidence Report or regulatory data available to us, which may not reflect real-time conditions.