97 Things Every Data Engineer Should Know, edited by Tobias Macey
March 2024
A collection of short essays from industry experts covering best practices, architectural patterns, and essential knowledge for modern data engineering.
Key Concepts
- Data Quality Management: Approaches to ensuring and maintaining high-quality data
- Scalable Architecture: Designing data systems that can grow with increasing volumes and complexity
- Observability: Building monitoring and debugging capabilities into data pipelines
- Data Governance: Establishing practices for data security, privacy, and compliance
Personal Takeaways
The book's format (97 discrete pieces of advice) makes it particularly valuable as a reference. When designing our data platform at 5X, I frequently returned to specific essays for guidance on challenges like managing schema evolution or implementing effective data catalogs. The diversity of perspectives has helped me build a more well-rounded approach to data engineering.
Most Valuable Insights
The essays on data lineage tracking have been especially influential in my work. Implementing comprehensive lineage tracking in our system has dramatically improved our ability to troubleshoot issues and understand data dependencies. I was also impressed by the essays on treating data pipelines as software, which means applying established software engineering practices such as testing, version control, and CI/CD to data workflows.
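To make the lineage point concrete, here is a minimal sketch of the idea, not how the book or our platform implements it. It assumes lineage is stored as a simple mapping from each dataset to the datasets it is built from. The dataset names, the `LINEAGE` mapping, and the `upstream` and `downstream` helpers are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is built from.
LINEAGE = {
    "raw_orders": [],
    "raw_customers": [],
    "clean_orders": ["raw_orders"],
    "orders_enriched": ["clean_orders", "raw_customers"],
    "daily_revenue": ["orders_enriched"],
}


def upstream(dataset, lineage=LINEAGE):
    """Return every dataset that `dataset` depends on, directly or transitively."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


def downstream(dataset, lineage=LINEAGE):
    """Return every dataset that would be affected if `dataset` were wrong or late."""
    return {name for name in lineage if dataset in upstream(name, lineage)}


if __name__ == "__main__":
    # Troubleshooting: where could a bad number in daily_revenue have come from?
    print(sorted(upstream("daily_revenue")))
    # Impact analysis: what breaks if raw_orders is late?
    print(sorted(downstream("raw_orders")))
```

Even a toy graph like this answers the two questions that come up most when something breaks: which inputs could have caused the problem, and which outputs are affected by it. Because the helpers are plain functions, they can also be covered by ordinary unit tests, which fits the pipelines-as-software theme.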
Recommendation
Essential reading for data engineers at every level. The book's structure makes it easy to read in small chunks, and the range of topics means there is something valuable whatever your focus area. It's one of the few technical books I keep within arm's reach at my desk.