Getting Started
You have read the book. Now try the technique. Here is the shortest path from zero to a working profile.
Option 1: DataRadar in Your Browser (30 seconds)
- Open dataradar.co.uk.
- Drop a CSV, Excel, or JSON file onto the page — or paste a URL to an open data endpoint.
- Click Profile. The masks appear immediately.
- Switch between LU and HU grain to explore structural patterns at different resolutions.
- Click Export Flat Enhanced to download the NDJSON output with
.raw,.HU,.LU, and.Rulescolumns.
No installation. No sign-up. No data leaves your machine.
Option 2: bytefreq on the Command Line (5 minutes)
Install Rust and bytefreq:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
cargo install --git https://github.com/minkymorgan/bytefreq
Profile a file:
cat your_data.csv | bytefreq -d ','
Generate flat enhanced output:
cat your_data.csv | bytefreq -d ',' -E > enhanced.ndjson
Query the output with DuckDB:
duckdb -c "SELECT * FROM read_ndjson_auto('enhanced.ndjson') LIMIT 10;"
Sample Data
If you do not have a dataset to hand, try the UK government's Electric Vehicle Chargepoint Registry — a real open dataset with messy postcodes, mixed formats, and international address data:
curl -sL 'https://www.gov.uk/guidance/find-and-use-data-on-public-electric-vehicle-chargepoints' -o chargepoints.csv
cat chargepoints.csv | bytefreq -d ','
Or paste the URL directly into DataRadar.
Links
- DataRadar: dataradar.co.uk
- bytefreq source: github.com/minkymorgan/bytefreq
- This book: github.com/minkymorgan/DataQualityOnRead
- Enterprise support: andrew@gamakon.ai
What to Do Next
- Profile a dataset you are working with right now. Look at the masks. What surprises you?
- Export the flat enhanced format and load it into your tool of choice (Pandas, Polars, DuckDB, Excel).
- Identify the top three mask patterns per column — those are your "expected" formats.
- Look at the long tail — the rare masks. Those are your quality issues.
- Write treatment functions for the issues you find, keyed by mask (as described in Chapter 8).
- Repeat. Profile early. Profile often.