Captain’s Log Star Date 2024.03.08

Study Progress This Week

  • A Cloud Guru – AWS Certified Cloud Practitioner (CLF-C02)
    • Chapter 4 – Storage Technology and Services
    • Chapter 5 – Content Delivery and Networking Technology and Services
  • Data Pipelines Pocket Reference – Moving and Processing Data for Analytics
    • Chapter 4 – Data Ingestion: Extracting Data

Findings/Notes

  • Data Types
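    • While reading about data ingestion, I came across a few data types and wanted to understand them in more depth and know when to use each one. The choice mostly comes down to a trade-off between memory use and precision. Here, precision means significant decimal digits, not decimal places.
      • FLOAT – usually a 32-bit (4-byte) single-precision floating-point number, reliable to about 6 significant digits (in some SQL dialects, though, FLOAT defaults to double precision)
      • SINGLE – typically another name for the same 32-bit single-precision type, usually quoted as about 7 significant digits (6 digits are guaranteed to survive a round trip; about 7 are represented)
      • DOUBLE – a 64-bit (8-byte) double-precision floating-point number, reliable to about 15 significant digits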
  • TCP versus UDP
    • I once took a networking class where these concepts came up, but not in enough depth for me to tell them apart in any useful way. I now know that TCP is connection-oriented, whereas UDP is not. TCP delivers data to the receiving application in the same order it was sent and retransmits any lost packets, so it is used when reliable, accurate delivery matters.
    • UDP, on the other hand, is connectionless. Each datagram is sent independently of the others, with no guarantee that it arrives or that datagrams arrive in order. The result is less overhead and lower latency, so UDP is often used where speed matters more than perfect delivery, such as online gaming and live video streaming.
  • Security Groups versus NACLs
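    • A security group is a stateful firewall. It keeps track of connections in a state table, so once a connection is allowed, its return traffic is allowed automatically. A NACL (network access control list) is stateless. It filters each packet individually based on its header information (such as IP addresses, ports, and protocol), so return traffic needs its own explicit rule. Stateful firewalls are generally considered more secure because they understand connection context, while stateless firewalls are more lightweight.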
  • Accessing data from an API (data pipelines)
    • The example API used in the Data Pipelines book has been turned off, but I was still able to retrieve JSON from api.open-notify.org/iss-now.json. This endpoint returns the current position of the International Space Station (ISS).
  • CDC – Change Data Capture (data pipelines)
    • CDC is the process by which a database tracks changes made to the data in its tables, such as which records were inserted, updated, or deleted. Using CDC data makes ingestion more efficient because only the changes are extracted incrementally, rather than performing a full extraction each time. This is especially important for large databases. In MySQL, for example, changes are read from the binary log, so the approach is often called binlog replication. Kafka and Debezium are recommended tools for this process.
  • PWA – Progressive Web App
    • A PWA is a type of web app that uses modern web technologies to provide a user experience similar to a native mobile app. It seems to be a popular approach, but Apple has moved to limit its use, which makes its future uncertain.
  • Miscellaneous
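    • SELECT * INTO new_table FROM original_table creates a new table and inserts the query's rows into it (SQL Server and PostgreSQL syntax). It is handy for making a backup copy of a table, but it does not copy primary keys, indexes, or constraints such as foreign keys.
    • The || (double pipes) operator in Java is the conditional (short-circuit) logical OR: if the left operand is true, the right operand is not evaluated.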
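To see what the precision figures in the Data Types note mean in practice, here is a small Python sketch. Python's own float is a 64-bit double, and the standard struct module can squeeze a value into 32 bits and back. That round trip roughly approximates storing the value in a single-precision (FLOAT/SINGLE) column. The helper name to_float32 is my own, not something from either course.

```python
import struct
import sys

def to_float32(x: float) -> float:
    """Round-trip a 64-bit Python float through a 4-byte IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

value = 3.14159265358979     # stored as a 64-bit double
single = to_float32(value)   # the same number squeezed into 32 bits

# Storage cost per value: 4 bytes vs 8 bytes
print(struct.calcsize("<f"), struct.calcsize("<d"))   # 4 8

# At 7 significant digits the two still agree...
print(f"{single:.7g}", f"{value:.7g}")   # 3.141593 3.141593
# ...but by 9 significant digits the single-precision copy has drifted
print(f"{single:.9g}", f"{value:.9g}")   # 3.14159274 3.14159265

# Decimal digits a double is guaranteed to preserve (15 on IEEE 754 platforms)
print(sys.float_info.dig)
```

The takeaway is the trade-off from the note: half the memory per value in exchange for losing accuracy somewhere past the 6th or 7th significant digit.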
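The TCP versus UDP difference shows up directly in Python's standard socket module. This sketch runs entirely on the loopback interface (127.0.0.1), so it says nothing about packet loss on a real network. It only shows the shape of each protocol. TCP needs listen/connect/accept before any data moves, while UDP just fires a datagram at an address with sendto. The function names are made up for illustration.

```python
import socket

def tcp_send_once(message: bytes) -> bytes:
    """Send bytes over a TCP connection on localhost; return what the server side read."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
        server.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(server.getsockname())   # handshake: a connection exists before data flows
            conn, _ = server.accept()
            with conn:
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)    # tell the server no more data is coming
                received = bytearray()
                while chunk := conn.recv(1024):    # TCP is an ordered byte stream; read until EOF
                    received.extend(chunk)
                return bytes(received)

def udp_send_once(message: bytes) -> bytes:
    """Send one UDP datagram on localhost; return what the receiver got."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(1.0)   # UDP has no delivery guarantee, so never wait forever
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(message, receiver.getsockname())   # no connection: just address and send
            data, _addr = receiver.recvfrom(65535)
            return data

print(tcp_send_once(b"hello over TCP"))
print(udp_send_once(b"hello over UDP"))
```

On a real network, the UDP receiver's timeout is what protects it from a datagram that never arrives. TCP would instead retransmit behind the scenes.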
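The Security Groups versus NACLs note is easiest to see with a toy model. This is not how AWS implements either feature, and the classes below are hypothetical. They model only the behavior described in the note:

- A stateless filter judges each packet by its header alone, so the reply to an allowed outbound request is dropped unless an inbound rule covers it.
- A stateful filter remembers the outbound flow in a connection table and lets the reply back in.

The addresses are a private IP and a documentation-range IP.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src: tuple[str, int]   # (ip, port)
    dst: tuple[str, int]
    direction: str         # "in" or "out"

class StatelessFilter:
    """NACL-like toy: each packet is checked against port rules using only its own header."""
    def __init__(self, inbound_ports: set[int], outbound_ports: set[int]):
        self.inbound_ports = inbound_ports
        self.outbound_ports = outbound_ports

    def allow(self, pkt: Packet) -> bool:
        ports = self.outbound_ports if pkt.direction == "out" else self.inbound_ports
        return pkt.dst[1] in ports

class StatefulFilter(StatelessFilter):
    """Security-group-like toy: same rules, plus a table of flows it has already allowed."""
    def __init__(self, inbound_ports: set[int], outbound_ports: set[int]):
        super().__init__(inbound_ports, outbound_ports)
        self.connections = set()   # state table of (src, dst) pairs

    def allow(self, pkt: Packet) -> bool:
        # A packet whose reversed addresses match a tracked flow is a reply: let it through.
        if (pkt.dst, pkt.src) in self.connections:
            return True
        allowed = super().allow(pkt)
        if allowed:
            self.connections.add((pkt.src, pkt.dst))   # remember the flow for its replies
        return allowed

# An outbound HTTPS request and its reply, which returns to an ephemeral client port.
request = Packet(src=("10.0.0.5", 50000), dst=("203.0.113.10", 443), direction="out")
reply = Packet(src=("203.0.113.10", 443), dst=("10.0.0.5", 50000), direction="in")

nacl = StatelessFilter(inbound_ports=set(), outbound_ports={443})
sg = StatefulFilter(inbound_ports=set(), outbound_ports={443})

print(nacl.allow(request), nacl.allow(reply))   # True False
print(sg.allow(request), sg.allow(reply))       # True True
```

To make the stateless filter pass the reply, you would have to open the ephemeral port range inbound. That extra rule is exactly the work the stateful table saves.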
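The ISS API note can be reproduced with only the Python standard library. This sketch assumes the response looks like {"message": ..., "timestamp": ..., "iss_position": {"latitude": ..., "longitude": ...}}; if the API changes, the key lookups will need adjusting. The function names are my own. Parsing is kept separate from fetching so it can be checked without a network connection.

```python
import json
import urllib.request

ISS_NOW_URL = "http://api.open-notify.org/iss-now.json"

def parse_iss_position(payload: dict) -> tuple[float, float, int]:
    """Extract (latitude, longitude, unix_timestamp) from an iss-now.json response body."""
    position = payload["iss_position"]
    # Coordinates may arrive as strings, so convert them explicitly.
    return float(position["latitude"]), float(position["longitude"]), int(payload["timestamp"])

def fetch_iss_position(url: str = ISS_NOW_URL) -> tuple[float, float, int]:
    """Make a live HTTP GET request and parse the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    return parse_iss_position(payload)
```

Calling fetch_iss_position() performs a live request. The parser alone can be exercised with a made-up payload such as {"timestamp": 1700000000, "iss_position": {"latitude": "12.5", "longitude": "-45.25"}}.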
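This is a rough sketch of why CDC makes ingestion incremental. Instead of re-extracting the whole table, a consumer applies only the row-level change events, in order, to its copy. The event format here (op, id, row) is hypothetical and much simpler than what a real CDC tool such as Debezium emits. It only illustrates the insert/update/delete idea from the note.

```python
from typing import Any

Row = dict[str, Any]

def apply_changes(replica: dict[int, Row], events: list[dict[str, Any]]) -> dict[int, Row]:
    """Apply row-level change events (in commit order) to a dict keyed by primary key."""
    for event in events:
        op, key = event["op"], event["id"]
        if op in ("insert", "update"):
            replica[key] = event["row"]      # upsert: the event carries the row's new state
        elif op == "delete":
            replica.pop(key, None)           # the row no longer exists at the source
        else:
            raise ValueError(f"unknown change type: {op!r}")
    return replica

# Destination copy left over from an earlier full extraction.
replica = {1: {"name": "alpha", "qty": 5}, 2: {"name": "beta", "qty": 3}}

# Three changes since then; only these need to travel, not the whole table.
changes = [
    {"op": "update", "id": 1, "row": {"name": "alpha", "qty": 7}},
    {"op": "delete", "id": 2},
    {"op": "insert", "id": 3, "row": {"name": "gamma", "qty": 1}},
]

print(apply_changes(replica, changes))
# {1: {'name': 'alpha', 'qty': 7}, 3: {'name': 'gamma', 'qty': 1}}
```

Order matters here. Applying the same events out of order (for example, a delete before its insert) would leave the copy wrong. That is one reason CDC tools read changes from an ordered log.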
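SQLite, which ships with Python as sqlite3, does not support SELECT ... INTO new_table. Its closest equivalent is CREATE TABLE ... AS SELECT, so this sketch swaps that in. It copies the rows of a hypothetical original_table but, like SELECT INTO, leaves the primary key behind. PRAGMA table_info makes that visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO original_table (id, name) VALUES (?, ?)", [(1, "alpha"), (2, "beta")])

# SQLite's stand-in for SELECT * INTO new_table FROM original_table
conn.execute("CREATE TABLE new_table AS SELECT * FROM original_table")

def primary_key_columns(table: str) -> list[str]:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) for each column
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5] > 0]

rows_copied = conn.execute("SELECT * FROM new_table ORDER BY id").fetchall()
print(rows_copied)                              # [(1, 'alpha'), (2, 'beta')]
print(primary_key_columns("original_table"))    # ['id']
print(primary_key_columns("new_table"))         # []
```

The data comes across intact, but the backup table would need its keys and indexes recreated by hand before it could stand in for the original.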