Captain’s Log Star Date 2024.03.22

Study Progress This Week

  • The Enterprise Big Data Lake – Delivering the Promise of Big Data and Data Science
    • Chapter 1: Introduction To Data Lakes

Findings/Notes

I’m really liking the new book I’m reading about enterprise data lakes. There’s a lot of great information in it. The first chapter has been super informative, and I’m trying to be thorough in working through it. I’ve also started using Obsidian as a note-taking tool and am moving away from paper.

  • Data puddles, data ponds, data lakes, data oceans, data swamps, and data silos
    • I’ve learned the differences between these. The ultimate goal of a data lake is to support self-service, and it is not limited by a particular project’s scope
  • Schema on write/read
    • I know that schema on write is the standard relational database method, whereas schema on read, which provides frictionless ingestion, is common in big data technologies
  • Data preferences
    • I realize that data analysts prefer their data harmonized, whereas data scientists want much more granular data so they can find relationships in it
  • Data Lake Roadmaps
    • I know that a good data lake roadmap should include standing up infrastructure, organizing the data lake by creating zones and ingesting data, setting the data lake up for self-service, and managing user access
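
To make the schema on write/read note above more concrete, here is a toy Python sketch I put together. It is not from the book: the `SCHEMA` fields, the two store classes, and the sample records are all hypothetical, and real systems are far more involved. The idea is only that a schema-on-write store validates each record at ingestion and rejects what does not fit, while a schema-on-read store accepts anything and applies the schema only when the data is read.

```python
import json

# Hypothetical schema for illustration: field name -> type converter.
SCHEMA = {"user_id": int, "amount": float}


def apply_schema(record):
    """Coerce a raw dict to SCHEMA; raises KeyError/ValueError/TypeError if it does not fit."""
    return {field: cast(record[field]) for field, cast in SCHEMA.items()}


class SchemaOnWriteStore:
    """Validates at ingestion time, so only conforming rows are ever stored."""

    def __init__(self):
        self.rows = []

    def ingest(self, raw_line):
        # Any parsing or schema error surfaces here, at write time.
        self.rows.append(apply_schema(json.loads(raw_line)))

    def read(self):
        return list(self.rows)


class SchemaOnReadStore:
    """Stores raw text untouched; the schema is applied only when reading."""

    def __init__(self):
        self.raw = []

    def ingest(self, raw_line):
        # Frictionless ingestion: nothing is checked or rejected.
        self.raw.append(raw_line)

    def read(self):
        rows = []
        for line in self.raw:
            try:
                rows.append(apply_schema(json.loads(line)))
            except (KeyError, ValueError, TypeError):
                # This reader skips records that do not fit its schema;
                # the raw line stays available for other readers.
                continue
        return rows


good = '{"user_id": "1", "amount": "9.5"}'
bad = '{"user_id": 2}'  # missing "amount"

on_write = SchemaOnWriteStore()
on_write.ingest(good)
try:
    on_write.ingest(bad)
    write_rejected = False
except KeyError:
    write_rejected = True

on_read = SchemaOnReadStore()
on_read.ingest(good)
on_read.ingest(bad)
```

In this sketch the bad record is rejected by the schema-on-write store at ingestion, while the schema-on-read store keeps both raw lines and only skips the bad one when `read()` applies the schema. That trade-off (easy ingestion now, interpretation work later) is how I currently understand the distinction.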

 

DALL·E 2024-03-24 21.38.11 - Envision a data lake depicted as a serene mountain lake, but with a distinct computer theme integrated into its landscape.