Cloud Composer
What is Cloud Composer?
Cloud Composer is a fully managed workflow orchestration service that helps you create, schedule, monitor, and manage workflows across services and environments.
Cloud Composer is based on Apache Airflow, an open-source workflow orchestration tool.
Advantages
- Fully managed: Google Cloud manages the infrastructure, so you can focus on your workflows.
- Scalable: You can easily scale your workflows up and down.
- Portable: You write your workflows in Python as Airflow DAGs. Because Airflow is open source, the same DAGs can also run on other Airflow deployments.
- Observability: You can monitor your workflows with the Cloud Composer monitoring interface.
Airflow
Apache Airflow is an open-source workflow orchestration tool that allows you to create, schedule, and monitor workflows.
Airflow works with DAGs
A DAG in Airflow is a representation of your workflow as a Directed Acyclic Graph:
- Directed: Tasks run in a specific order (one direction).
- Acyclic: Tasks cannot form loops (no cycles).
- Graph: A collection of tasks and their dependencies.
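The three properties above can be illustrated without Airflow itself. The following is a minimal sketch that uses only Python's standard `graphlib` module; the task names (`extract`, `transform`, `load`, `report`) and the helper functions are hypothetical and are not part of Airflow or Cloud Composer.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency (directed)."""
    return list(TopologicalSorter(dependencies).static_order())


def is_acyclic(dependencies):
    """Return True if the dependencies contain no loop (acyclic)."""
    try:
        execution_order(dependencies)
        return True
    except CycleError:
        return False


order = execution_order(workflow)
print(order)  # ['extract', 'transform', 'load', 'report']

# Two tasks that depend on each other form a cycle, so this is not a valid DAG.
print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # False
```

In a real environment, Airflow performs an equivalent check when it parses a DAG file and rejects a workflow whose task dependencies form a cycle.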
Cloud Composer Environments
Cloud Composer environments are composed of the following resources:
- GKE cluster: The GKE cluster runs the Airflow scheduler, web server, and workers.
- Airflow web server: The Airflow web server provides the user interface for Airflow.
- Airflow database: The Airflow database stores metadata for Airflow.
- Cloud Storage bucket: The Cloud Storage bucket stores DAGs, logs, and plugins.
You can create multiple Cloud Composer environments in the same project, and each environment can contain one or more DAGs.
Workflow Scheduling
There are two types of scheduling in Cloud Composer:
- Periodic: You can schedule your workflow to run at specific intervals (every day, every hour, etc.).
- Triggered: You can trigger your workflow to run when a specific event occurs (file uploaded, message received, etc.), for example by using Cloud Functions to start the DAG.
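As a rough illustration of the triggered case, the sketch below builds the HTTP request that a function could send to start a DAG run. It assumes an Airflow 2 environment that exposes the stable REST API endpoint `POST /api/v1/dags/{dag_id}/dagRuns`. The web server URL, the DAG ID `process_uploaded_file`, the `conf` values, and the helper `build_trigger_request` are all hypothetical. Authentication, which a real Cloud Composer environment requires, is deliberately left out, and the request is only built, not sent.

```python
import json
import urllib.request
from urllib.parse import quote


def build_trigger_request(web_server_url, dag_id, conf):
    """Build (but do not send) a request asking Airflow to start a DAG run.

    Assumption: the environment exposes the Airflow 2 stable REST API.
    A real call would also need an authenticated session or token.
    """
    base = web_server_url.rstrip("/")
    url = f"{base}/api/v1/dags/{quote(dag_id, safe='')}/dagRuns"
    body = json.dumps({"conf": conf}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical event: a file was uploaded to a bucket.
request = build_trigger_request(
    "https://example-airflow-web-server.invalid",
    "process_uploaded_file",
    {"bucket": "example-bucket", "name": "data/input.csv"},
)
print(request.get_method(), request.full_url)
```

The values passed in `conf` become available to the tasks of the triggered DAG run, which is how the workflow can learn which file or message caused it to start.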