
tags: project/sriphat, airflow, workflow, etl
created: 2026-05-07
status: active
folder: 05-airflow

Apache Airflow (05-airflow)

Docker Compose: 05-airflow/docker-compose.yaml
Env File: 05-airflow/.env
Version: Apache Airflow 3.1.5

Overview

Apache Airflow is used for workflow orchestration:

  • Run DAGs (Directed Acyclic Graphs) on a schedule
  • Process Excel/CSV files from Finance
  • ETL pipeline orchestration
  • Integration with the API Service

Executor: CeleryExecutor (uses Redis as the broker)


Services

| Container | Role | Port |
|---|---|---|
| airflow-apiserver | REST API + Web UI | 8200:8080 |
| airflow-scheduler | DAG scheduling | internal |
| airflow-dag-processor | DAG file parsing | internal |
| airflow-worker | Task execution (Celery) | internal |
| airflow-triggerer | Deferred task triggering | internal |
| airflow-init | Database migration (one-time) | |
| airflow-cli | CLI tool (debug profile) | |
| flower | Celery monitoring (optional) | 5555:5555 |

Architecture

                 ┌─────────────────┐
                 │  airflow-       │
                 │  apiserver      │  ← Web UI + REST API (port 8200)
                 │  (port 8080)    │
                 └────────┬────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
   ┌──────▼──────┐ ┌──────▼──────┐ ┌─────▼──────┐
   │  airflow-   │ │  airflow-   │ │  airflow-  │
   │  scheduler  │ │  dag-       │ │  triggerer │
   │             │ │  processor  │ │            │
   └──────┬──────┘ └─────────────┘ └────────────┘
          │
          ▼  (Celery tasks via Redis)
   ┌──────────────┐
   │  airflow-    │
   │  worker      │  ← executes the actual tasks
   └──────────────┘
          │
          ▼
   ┌──────────────┐
   │  PostgreSQL  │  (Airflow metadata DB)
   │  Redis       │  (Celery broker)
   └──────────────┘

Database Configuration

Airflow uses PostgreSQL on the Infra server:

# Connection string (each assignment must stay on a single line)
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://${AIRFLOW_DB_USER}:${AIRFLOW_DB_PASSWD}@${AIRFLOW_DB_HOST}:${AIRFLOW_DB_PORT}/${AIRFLOW_DB_NAME}

AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://${AIRFLOW_DB_USER}:${AIRFLOW_DB_PASSWD}@${AIRFLOW_DB_HOST}:${AIRFLOW_DB_PORT}/${AIRFLOW_DB_NAME}

# Redis broker
AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
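
The two database URLs above are assembled by plain variable substitution, so a password containing characters such as `@` or `/` would produce an invalid URL unless it is percent-encoded first. A minimal sketch of building both URLs with encoding applied is shown here; the helper name `build_airflow_db_urls` is hypothetical and not part of the project.

```python
from urllib.parse import quote_plus


def build_airflow_db_urls(env):
    """Return (sql_alchemy_conn, celery_result_backend) built from AIRFLOW_DB_* values."""
    user = quote_plus(env["AIRFLOW_DB_USER"])
    # Percent-encode the password so characters such as '@' or '/' do not break the URL.
    password = quote_plus(env["AIRFLOW_DB_PASSWD"])
    host = env["AIRFLOW_DB_HOST"]
    port = env.get("AIRFLOW_DB_PORT", "5432")
    name = env["AIRFLOW_DB_NAME"]
    location = f"{user}:{password}@{host}:{port}/{name}"
    return (
        f"postgresql+psycopg2://{location}",
        f"db+postgresql://{location}",
    )


# Illustrative values only; real values come from 05-airflow/.env.
example_env = {
    "AIRFLOW_DB_USER": "airflow",
    "AIRFLOW_DB_PASSWD": "p@ss/word",
    "AIRFLOW_DB_HOST": "postgres-host",
    "AIRFLOW_DB_PORT": "5432",
    "AIRFLOW_DB_NAME": "airflow",
}
sql_alchemy_conn, result_backend = build_airflow_db_urls(example_env)
print(sql_alchemy_conn)
print(result_backend)
```

Both URLs share the same credentials and database, so building them from one `location` string keeps them from drifting apart.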

Volume Mounts

05-airflow/
├── dags/        → /opt/airflow/dags      (DAG files)
├── logs/        → /opt/airflow/logs      (Task logs)
├── config/      → /opt/airflow/config    (airflow.cfg)
│   └── airflow.cfg
└── plugins/     → /opt/airflow/plugins   (Custom plugins)
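
When a bind-mounted host directory is missing, Docker typically creates it owned by root, which can lead to permission errors when Airflow writes logs. A small sketch that creates the four directories listed above before the first start follows; the helper name `ensure_mount_dirs` is hypothetical.

```python
from pathlib import Path

# Host directory -> container path, as listed in the mount tree above.
VOLUME_MOUNTS = {
    "dags": "/opt/airflow/dags",
    "logs": "/opt/airflow/logs",
    "config": "/opt/airflow/config",
    "plugins": "/opt/airflow/plugins",
}


def ensure_mount_dirs(project_root):
    """Create any missing host directories and return the names that were created."""
    created = []
    for name in VOLUME_MOUNTS:
        path = Path(project_root) / name
        if not path.is_dir():
            path.mkdir(parents=True)
            created.append(name)
    return created
```

Calling `ensure_mount_dirs("05-airflow")` from the repository root would create only the directories that do not exist yet and leave existing ones untouched.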

Web UI

URL: http://localhost:8200 or https://ai.sriphat.com/airflow

# Config
AIRFLOW__WEBSERVER__BASE_URL=https://ai.sriphat.com/airflow
AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8080

Default credentials (if not changed):

  • Username: airflow
  • Password: airflow

Existing DAGs

| DAG ID | Role | Triggered by |
|---|---|---|
| process_finance_excel | Processes Finance Excel files | API Service |
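
This document does not describe how the API Service triggers `process_finance_excel`. One plausible route, assuming the Airflow 3 stable REST API under `/api/v2` with a bearer token, is sketched here. The helper name `build_trigger_request`, the `object_key` entry in `conf`, and the token placeholder are all hypothetical; the request is only built, not sent.

```python
import json
from urllib.request import Request


def build_trigger_request(base_url, dag_id, token, conf=None):
    """Build (but do not send) a POST request that asks Airflow to start a DAG run."""
    url = f"{base_url.rstrip('/')}/api/v2/dags/{dag_id}/dagRuns"
    # Assumption: a null logical_date is accepted for a manually triggered run.
    body = {"logical_date": None, "conf": conf or {}}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


trigger_request = build_trigger_request(
    "http://localhost:8200",
    "process_finance_excel",
    token="<jwt-token>",
    conf={"object_key": "uploads/example.xlsx"},  # hypothetical payload
)
print(trigger_request.get_method(), trigger_request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. Because `dags_are_paused_at_creation = True`, the DAG would also need to be unpaused before a triggered run starts executing.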

Airflow Configuration (airflow.cfg)

Path: 05-airflow/config/airflow.cfg

Key settings:

[core]
executor = CeleryExecutor
load_examples = False
dags_are_paused_at_creation = True

[webserver]
base_url = https://ai.sriphat.com/airflow

[execution_api]
execution_api_server_url = http://airflow-apiserver:8080/execution/
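
Since `airflow.cfg` is an INI-style file, the key settings above can be checked with Python's standard `configparser`. The sketch below is illustrative; `EXPECTED` and `find_mismatches` are hypothetical names, and the check only reads the file, whereas environment variables of the form `AIRFLOW__SECTION__KEY` take precedence over it at runtime.

```python
from configparser import ConfigParser

# Settings taken from the [core] section shown above.
EXPECTED = {
    ("core", "executor"): "CeleryExecutor",
    ("core", "load_examples"): "False",
    ("core", "dags_are_paused_at_creation"): "True",
}


def find_mismatches(cfg_text, expected=EXPECTED):
    """Return {(section, key): actual_value} for settings that differ from `expected`."""
    # interpolation=None keeps literal '%' characters in values from raising errors.
    parser = ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    mismatches = {}
    for (section, key), wanted in expected.items():
        actual = parser.get(section, key, fallback=None)
        if actual != wanted:
            mismatches[(section, key)] = actual
    return mismatches
```

A typical call would be `find_mismatches(open("05-airflow/config/airflow.cfg").read())`; an empty dictionary means every listed setting matches, and a missing key is reported with the value `None`.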

Environment Variables

# Airflow image
AIRFLOW_IMAGE_NAME=apache/airflow:3.1.5

# Database
AIRFLOW_DB_USER=<user>
AIRFLOW_DB_PASSWD=<password>
AIRFLOW_DB_HOST=<postgres-host>
AIRFLOW_DB_PORT=5432
AIRFLOW_DB_NAME=airflow

# Security
AIRFLOW__CORE__FERNET_KEY=<fernet-key>

# Admin user
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=<password>

# Optional pip packages
_PIP_ADDITIONAL_REQUIREMENTS=
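
The `<fernet-key>` placeholder needs a real value: a Fernet key is the URL-safe base64 encoding of 32 random bytes. The sketch below generates one with the standard library only, which mirrors what `Fernet.generate_key()` in the `cryptography` package does; the function names are hypothetical.

```python
import base64
import os


def generate_fernet_key():
    """Return a URL-safe base64 encoding of 32 random bytes, the format Fernet keys use."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def looks_like_fernet_key(value):
    """Check that `value` decodes to exactly 32 bytes of URL-safe base64."""
    try:
        return len(base64.urlsafe_b64decode(value.encode("ascii"))) == 32
    except (ValueError, UnicodeEncodeError):
        return False
```

The same key must be kept across restarts and shared by all Airflow containers, because connections and variables encrypted with one key cannot be decrypted with another.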

Deploy Commands

cd 05-airflow

# Initialize (first time only)
docker compose up airflow-init

# Start all services
docker compose up -d

# View logs (by service name, so this also works with scaled workers)
docker compose logs -f airflow-apiserver
docker compose logs -f airflow-scheduler
docker compose logs -f airflow-worker

# Run Celery Flower monitoring
docker compose --profile flower up -d

# Scale workers (add more workers)
docker compose up -d --scale airflow-worker=3

System Requirements

Airflow requires at least the following resources:

  • RAM: ≥ 4 GB
  • CPU: ≥ 2 cores
  • Disk: ≥ 10 GB
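
The minimums above can be checked on a host before deploying. The following is a rough sketch using only the standard library; `MINIMUMS`, `find_shortfalls`, and `measure_host` are hypothetical names, and the RAM probe assumes a Linux host where these `sysconf` names are available.

```python
import os
import shutil

# Minimum resources from the list above.
MINIMUMS = {"ram_gb": 4, "cpu_cores": 2, "disk_gb": 10}


def find_shortfalls(measured, minimums=MINIMUMS):
    """Return {name: (measured, required)} for each resource below its minimum."""
    return {
        name: (measured[name], required)
        for name, required in minimums.items()
        if measured[name] < required
    }


def measure_host(path="."):
    """Probe the local machine; the RAM lookup relies on sysconf names available on Linux."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "ram_gb": ram_bytes / 1024**3,
        "cpu_cores": os.cpu_count() or 0,
        "disk_gb": shutil.disk_usage(path).free / 1024**3,
    }
```

Running `find_shortfalls(measure_host())` on the target server returns an empty dictionary when all three minimums are met. Keeping the comparison separate from the probing makes the check easy to exercise with fixed numbers.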

Ingestion Layer (04-ingestion / Airbyte)

Note: 04-ingestion/docker-compose.yml is currently commented out in its entirety. Airbyte is deployed separately (via abctl or standalone).

Airbyte sources specified in the plan

| Source | Data type |
|---|---|
| SQL Server (HIS) | Patient data, OPD |
| Oracle (Lab) | Laboratory test results |
| REST API | External data |
| Excel/CSV | Finance, reports |

Destination: PostgreSQL raw_data schema

Port: 8030 (once deployed)