Captain’s Log, Stardate 2024.03.22

March 22, 2024

Study Progress This Week

The Enterprise Big Data Lake – Delivering the Promise of Big Data and Data Science
- Chapter 1: Introduction to Data Lakes

Findings/Notes

I’m really liking this new book I’m reading about enterprise data lakes. There’s a lot of great information in it. The first chapter has been very informative, and I’m trying to be thorough as I work through it. I’ve also started using Obsidian for note-taking and am moving away from paper.

Data puddles, data ponds, data lakes, data oceans, data swamps, and data silos
- I’ve learned the differences between these. The ultimate goal of a data lake is to support self-service, and it is not limited to the scope of a particular project.
Schema on write/read
- I know that schema on write is the standard relational database approach, whereas schema on read, which is common in big data technologies, allows frictionless ingestion.
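To check my understanding of the two approaches, here’s a minimal Python sketch I put together (not from the book). The `Order` schema, the field names, and the sample records are all hypothetical. Schema on write validates each record before storing it and rejects what doesn’t fit; schema on read stores the raw text untouched and only applies the schema when someone reads it.

```python
import json
from dataclasses import dataclass

# Hypothetical schema for illustration: each order has an integer id and a float amount.
@dataclass
class Order:
    order_id: int
    amount: float

def parse_order(raw: dict) -> Order:
    """Apply the schema: raises KeyError/ValueError/TypeError if a record doesn't fit."""
    return Order(order_id=int(raw["order_id"]), amount=float(raw["amount"]))

def ingest_schema_on_write(lines, table):
    """Schema on write: validate every record up front; bad records never get stored."""
    rejected = 0
    for line in lines:
        try:
            table.append(parse_order(json.loads(line)))
        except (ValueError, KeyError, TypeError):
            rejected += 1
    return rejected

def ingest_schema_on_read(lines, lake):
    """Schema on read: store the raw text as-is (frictionless ingestion)."""
    lake.extend(lines)

def read_with_schema(lake):
    """Apply the schema at query time; this reader simply skips records that don't fit."""
    orders = []
    for line in lake:
        try:
            orders.append(parse_order(json.loads(line)))
        except (ValueError, KeyError, TypeError):
            continue
    return orders

incoming = [
    '{"order_id": 1, "amount": "19.99"}',
    '{"order_id": 2}',                                    # missing amount
    '{"order_id": 3, "amount": 5, "coupon": "SPRING"}',   # extra field
]

table = []
rejected = ingest_schema_on_write(incoming, table)
lake = []
ingest_schema_on_read(incoming, lake)

print(len(table), rejected)          # 2 1
print(len(lake))                     # 3
print(len(read_with_schema(lake)))   # 2
print(json.loads(lake[2])["coupon"]) # SPRING (the raw copy keeps fields the schema ignores)
```

The last line is the part that made it click for me: because the lake keeps the raw record, a future reader with a different schema could still use the `coupon` field, while the schema-on-write table dropped it at ingestion.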
Data preferences
- I realize that data analysts prefer their data harmonized, whereas data scientists want much more granular data so they can discover relationships in it.
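A small Python sketch helped me see why the two groups want different things. The purchase events and the "evening spend" question are made up for illustration. The harmonized roll-up gives a clean per-region total, but it throws away the hour of each purchase, so a pattern only visible in the granular rows can no longer be found.

```python
from collections import defaultdict

# Hypothetical granular records: one row per purchase event.
events = [
    {"customer": "A", "region": "east", "hour": 9,  "amount": 20.0},
    {"customer": "A", "region": "east", "hour": 21, "amount": 35.0},
    {"customer": "B", "region": "west", "hour": 22, "amount": 15.0},
    {"customer": "C", "region": "west", "hour": 23, "amount": 40.0},
]

def harmonize(rows):
    """Roll events up to one total per region: the tidy view an analyst might want."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def evening_share(rows, start_hour=20):
    """A question only the granular rows can answer: what fraction of spend is in the evening?"""
    total = sum(r["amount"] for r in rows)
    evening = sum(r["amount"] for r in rows if r["hour"] >= start_hour)
    return evening / total

print(harmonize(events))                # {'east': 55.0, 'west': 55.0}
print(round(evening_share(events), 2))  # 0.82
```

The two regions look identical in the harmonized view, while the granular data shows most of the spending happens after 8 p.m.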
Data Lake Roadmaps
- I know that a good data lake roadmap should include standing up the infrastructure, organizing the data lake into zones and ingesting data, setting the data lake up for self-service, and managing user access.
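To picture the zones and user-access steps together, here’s a rough Python sketch. The zone names, the `/lake` root path, the roles, and the permission mapping are all my own hypothetical choices for illustration, not the book’s design; a real lake would enforce access in its storage or catalog layer rather than in a function like this.

```python
from pathlib import PurePosixPath

# Hypothetical zone layout for illustration; real lakes name and split zones differently.
ZONES = {
    "raw": "Data as ingested, unmodified",
    "gold": "Cleaned, harmonized data for analysts",
    "work": "Scratch space for data science projects",
    "sensitive": "Restricted data such as PII",
}

# Hypothetical role-to-zone read permissions (the "managing user access" step).
ACCESS = {
    "analyst": {"gold"},
    "data_scientist": {"raw", "gold", "work"},
    "data_steward": set(ZONES),  # every zone
}

def landing_path(zone: str, source: str, filename: str) -> PurePosixPath:
    """Build a path like /lake/raw/crm/orders.json for a file ingested into a zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return PurePosixPath("/lake") / zone / source / filename

def can_read(role: str, path: PurePosixPath) -> bool:
    """Allow a read only if the role has access to the zone the path lives in."""
    zone = path.parts[2]  # parts look like ('/', 'lake', zone, source, filename)
    return zone in ACCESS.get(role, set())

p = landing_path("raw", "crm", "orders.json")
print(p)                              # /lake/raw/crm/orders.json
print(can_read("analyst", p))         # False
print(can_read("data_scientist", p))  # True
print(can_read("data_steward", landing_path("sensitive", "hr", "salaries.csv")))  # True
```

Keeping the zone in a fixed position of the path makes the access check a simple lookup, which is one way a lake can support self-service without giving everyone access to everything.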