Anurag Agrawal

97 Things Every Data Engineer Should Know, edited by Tobias Macey

March 2024

A collection of short essays from industry experts covering best practices, architectural patterns, and essential knowledge for modern data engineering.

Key Concepts

Personal Takeaways

The book's format, 97 short and self-contained pieces of advice, makes it particularly valuable as a reference. When designing our data platform at 5X, I frequently returned to specific essays for guidance on challenges like managing schema evolution or implementing effective data catalogs. Reading so many different perspectives has given me a more well-rounded approach to data engineering.

Most Valuable Insights

The essays on data lineage tracking have been especially influential in my work. Implementing comprehensive lineage tracking in our system has made it much easier to troubleshoot issues and understand data dependencies. I was also particularly impressed by the sections on treating data pipelines as software: applying traditional software engineering practices like testing, version control, and CI/CD to data workflows.
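To make the "pipelines as software" idea concrete, here is a minimal sketch of my own, not code from the book. It assumes a hypothetical transformation step, `normalize_orders`, written as a pure function with no I/O so that a plain unit test can exercise it in CI. The `Order` type, the field names, and the `USD` default are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical record type, used only for illustration.
    order_id: str
    amount_cents: int
    currency: str


def normalize_orders(rows: list[dict]) -> list[Order]:
    """Pure transformation step: no I/O, so it can be unit-tested in CI."""
    orders = []
    for row in rows:
        # Skip rows missing required fields rather than failing the whole batch.
        if not row.get("order_id") or row.get("amount") is None:
            continue
        orders.append(
            Order(
                order_id=str(row["order_id"]).strip(),
                amount_cents=round(float(row["amount"]) * 100),
                currency=str(row.get("currency", "USD")).upper(),
            )
        )
    return orders


def test_normalize_orders() -> None:
    raw = [
        {"order_id": " A1 ", "amount": "12.50", "currency": "usd"},
        {"order_id": None, "amount": "3.00"},  # dropped: no id
        {"order_id": "A2", "amount": 4},       # currency defaults to USD
    ]
    assert normalize_orders(raw) == [
        Order("A1", 1250, "USD"),
        Order("A2", 400, "USD"),
    ]


if __name__ == "__main__":
    test_normalize_orders()
```

Keeping the transformation free of reads and writes is the design choice that matters here: the same function can run inside the pipeline and inside a test runner, and the test lives in version control next to the code it checks.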

Recommendation

Essential reading for data engineers at all levels, from beginners to experts. The book's structure makes it easy to consume in small chunks, and the range of topics ensures there's valuable information regardless of your specific focus area. It's one of the few technical books I keep within arm's reach at my desk.