About the Author

Andrew Morgan is a data engineer, architect, and consultant with nearly two decades of experience building data platforms and data quality tools across financial services, government, and telecoms.

He is the author of Mastering Spark for Data Science (Packt Publishing, 2017), a 542-page reference covering exploratory data analysis, data quality profiling, graph analytics, natural language processing, and large-scale machine learning with Apache Spark. Chapter 4 of that book introduced the mask-based profiling techniques that form the foundation of the DQOR framework described here.

Andrew is the creator of two tools: bytefreq, originally written in awk in 2007 and now reimplemented in Rust, and DataRadar, a browser-based WASM profiler for locked-down environments where installing software is not possible. His current work at Gamakon focuses on data quality tooling, data platform architecture, and consulting.

He believes that data quality should not require a PhD or an enterprise licence, just a clear idea and a good tool.