As a new data engineering student, there are a number of concepts you need to grasp. These concepts will guide you in knowing exactly what to learn with respect to data engineering. A good first step is to create a Notion page and gather all the resources available so you can track your progress while learning.
i) Batch versus Streaming Ingestion
A Data Engineer implements the ETL (Extract, Transform, Load) process for their organization. Before extracting data, they should identify its sources and have a procedure for collecting it. Data ingestion is that procedure: the steps a data engineer takes to collect data from all the different sources and organize it so it can be processed for their specific organization.
Batch ingestion is when data is collected over a period of time, for example a minute, a week or a month, and once it has all been gathered, it is processed together in one run. It is well suited to very large datasets that do not need to be acted on immediately, for example collecting all the sales data of an e-commerce store at the end of each day, as in the sketch below.
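To make the idea concrete, here is a minimal batch-ingestion sketch in Python. It loads a day's worth of sales CSV files into a local SQLite table in a single run; the file layout, column names, and `warehouse.db` database are illustrative assumptions, not a standard.

```python
import csv
import sqlite3
from pathlib import Path

def ingest_daily_sales(csv_dir: str, db_path: str = "warehouse.db") -> int:
    """Load every sales CSV collected over the day into one table in a single run."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id TEXT, amount REAL, sold_at TEXT)"
    )
    rows_loaded = 0
    for csv_file in Path(csv_dir).glob("*.csv"):
        with open(csv_file, newline="") as f:
            reader = csv.DictReader(f)
            rows = [(r["order_id"], float(r["amount"]), r["sold_at"]) for r in reader]
        # The whole batch for this file is inserted at once, not record by record.
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        rows_loaded += len(rows)
    conn.commit()
    conn.close()
    return rows_loaded

# Example: run once at the end of the day, e.g. triggered by a scheduler such as cron.
# ingest_daily_sales("exports/2024-01-15/")
```

The key point is that nothing is processed until the scheduled run, at which point everything gathered so far is loaded together.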
Stream ingestion is when data is processed as soon as it is collected, record by record, rather than waiting for a batch to accumulate. It is recommended for critical data that requires immediate decision making. For example, a fraud detection system can use stream ingestion to identify fraud as soon as it happens, as in the sketch below.
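Here is a minimal streaming-ingestion sketch. The event source is a simple in-memory generator standing in for a real stream (such as Kafka or Kinesis), and the fraud rule and field names are illustrative assumptions; the point is that each event is handled the moment it arrives.

```python
import time
from typing import Dict, Iterator

def transaction_stream() -> Iterator[Dict]:
    """Stand-in for a live feed of payment events."""
    sample_events = [
        {"card_id": "c-1", "amount": 25.00},
        {"card_id": "c-2", "amount": 9_900.00},
        {"card_id": "c-1", "amount": 12.50},
    ]
    for event in sample_events:
        yield event
        time.sleep(0.1)  # simulate events trickling in over time

def process_event(event: Dict) -> None:
    # The decision is made immediately, per event, which is why streaming
    # suits time-critical use cases such as fraud detection.
    if event["amount"] > 5_000:
        print(f"ALERT: possible fraud on {event['card_id']}: {event['amount']}")
    else:
        print(f"ok: {event}")

for event in transaction_stream():
    process_event(event)
```

Contrast this with the batch sketch above: there, records wait for the end-of-day run; here, every record triggers processing as soon as it appears.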
ii) Change Data Capture (CDC)
This is a technique for identifying and capturing changes made to records in a database (inserts, updates, and deletes) so that those changes can be propagated to downstream systems, such as a data warehouse or a replica, in near real-time. When a record changes in the source database, only that change is captured and applied downstream, which keeps the copies synchronized with low latency. The sketch below illustrates the idea.
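A toy CDC sketch, assuming a simple list of change events: each event from the source is replayed against an in-memory replica so it stays in sync. In practice, CDC tools such as Debezium capture these events from the database's transaction log; the event shape used here is an illustrative assumption.

```python
from typing import Dict

replica: Dict[int, Dict] = {}  # downstream copy, keyed by primary key

def apply_change(event: Dict) -> None:
    """Apply a single captured change event to the replica as soon as it arrives."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)

# Hypothetical stream of changes captured from the source database.
change_log = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "email": "ada@example.com"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "email": "ada@new.com"}},
    {"op": "delete", "key": 1},
]

for event in change_log:
    apply_change(event)
    print(replica)  # the replica tracks the source change by change
```

Because only the changes are shipped rather than full copies of the data, the downstream systems stay up to date with low latency.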