---
tags:
- project/sriphat
- airflow
- workflow
- etl
created: 2026-05-07
status: active
folder: 05-airflow
---
# Apache Airflow (05-airflow)
> **Docker Compose:** `05-airflow/docker-compose.yaml`
> **Env File:** `05-airflow/.env`
> **Version:** Apache Airflow 3.1.5
## Overview
Apache Airflow handles workflow orchestration:
- Runs DAGs (Directed Acyclic Graphs) on a schedule
- Processes Excel/CSV files from Finance
- Orchestrates ETL pipelines
- Integrates with the API Service
**Executor:** CeleryExecutor (with Redis as the broker)
---
## Services
| Container | Role | Port |
|-----------|--------|------|
| `airflow-apiserver` | REST API + Web UI | `8200:8080` |
| `airflow-scheduler` | DAG scheduling | internal |
| `airflow-dag-processor` | DAG file parsing | internal |
| `airflow-worker` | Task execution (Celery) | internal |
| `airflow-triggerer` | Deferred task triggering | internal |
| `airflow-init` | Database migration (one-time) | — |
| `airflow-cli` | CLI tool (debug profile) | — |
| `flower` | Celery monitoring (optional) | `5555:5555` |
---
## Architecture
```
              ┌─────────────────┐
              │    airflow-     │
              │    apiserver    │  ← Web UI + REST API (port 8200)
              │   (port 8080)   │
              └────────┬────────┘
       ┌───────────────┼───────────────┐
       │               │               │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼─────┐
│  airflow-   │ │  airflow-   │ │  airflow-  │
│  scheduler  │ │  dag-       │ │  triggerer │
│             │ │  processor  │ │            │
└──────┬──────┘ └─────────────┘ └────────────┘
       ▼  (Celery tasks via Redis)
┌──────────────┐
│   airflow-   │
│   worker     │  ← runs the actual tasks
└──────────────┘
┌──────────────┐
│  PostgreSQL  │  (Airflow metadata DB)
│  Redis       │  (Celery broker)
└──────────────┘
```
---
## Database Configuration
Airflow uses PostgreSQL on the Infra server:
```bash
# Connection strings (metadata DB and Celery result backend)
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://${AIRFLOW_DB_USER}:${AIRFLOW_DB_PASSWD}@${AIRFLOW_DB_HOST}:${AIRFLOW_DB_PORT}/${AIRFLOW_DB_NAME}
AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://${AIRFLOW_DB_USER}:${AIRFLOW_DB_PASSWD}@${AIRFLOW_DB_HOST}:${AIRFLOW_DB_PORT}/${AIRFLOW_DB_NAME}
# Redis broker
AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
```
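As a rough illustration of how the two connection strings are assembled from the `AIRFLOW_DB_*` variables, the sketch below rebuilds them in Python. The function name `build_airflow_db_urls` is hypothetical, and the defaults for port and database name are assumptions taken from the example values in this note. Unlike plain `${VAR}` substitution in Compose, the sketch percent-encodes the user and password, which matters if the password contains characters such as `@` or `/`.

```python
import os
from urllib.parse import quote_plus


def build_airflow_db_urls(env=None):
    """Build the metadata-DB and Celery result-backend URLs from AIRFLOW_DB_* values.

    `env` is any mapping; it defaults to the process environment.
    """
    env = os.environ if env is None else env
    # Percent-encode credentials so special characters do not break the URL.
    user = quote_plus(env["AIRFLOW_DB_USER"])
    password = quote_plus(env["AIRFLOW_DB_PASSWD"])
    host = env["AIRFLOW_DB_HOST"]
    port = env.get("AIRFLOW_DB_PORT", "5432")    # assumed default
    name = env.get("AIRFLOW_DB_NAME", "airflow")  # assumed default
    location = f"{user}:{password}@{host}:{port}/{name}"
    return {
        "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN": f"postgresql+psycopg2://{location}",
        "AIRFLOW__CELERY__RESULT_BACKEND": f"db+postgresql://{location}",
    }


if __name__ == "__main__":
    sample = {
        "AIRFLOW_DB_USER": "airflow",
        "AIRFLOW_DB_PASSWD": "example-password",
        "AIRFLOW_DB_HOST": "postgres-host",
    }
    for key, value in build_airflow_db_urls(sample).items():
        print(f"{key}={value}")
```

If a password with special characters is placed directly in `.env`, it would need the same percent-encoding before being substituted into the connection string.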
---
## Volume Mounts
```
05-airflow/
├── dags/ → /opt/airflow/dags (DAG files)
├── logs/ → /opt/airflow/logs (Task logs)
├── config/ → /opt/airflow/config (airflow.cfg)
│ └── airflow.cfg
└── plugins/ → /opt/airflow/plugins (Custom plugins)
```
---
## Web UI
**URL:** `http://localhost:8200` or `https://ai.sriphat.com/airflow`
```bash
# Config
AIRFLOW__WEBSERVER__BASE_URL=https://ai.sriphat.com/airflow
AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8080
```
Default credentials (if left unchanged):
- Username: `airflow`
- Password: `airflow`
---
## Existing DAGs
| DAG ID | Role | Triggered by |
|--------|--------|----------------|
| `process_finance_excel` | Processes Finance Excel files | API Service |
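
How the API Service actually triggers `process_finance_excel` is not shown in this note. One plausible route is the Airflow 3 REST API, which creates a DAG run via `POST /api/v2/dags/{dag_id}/dagRuns` with a bearer token. The sketch below only builds such a request so it can be inspected before sending. The function name, the `conf` keys (`bucket`, `object_name`), and the token value are hypothetical, and the base URL assumes the internal `airflow-apiserver:8080` address; a deployment served under a path prefix may need that prefix in the base URL.

```python
import json
from urllib.request import Request


def build_trigger_request(base_url, token, dag_id="process_finance_excel", conf=None):
    """Build (but do not send) a request that creates a DAG run."""
    url = f"{base_url.rstrip('/')}/api/v2/dags/{dag_id}/dagRuns"
    # logical_date is sent explicitly as null so Airflow treats this as a
    # manually triggered run without a fixed data interval.
    body = {"logical_date": None, "conf": conf or {}}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_trigger_request(
        "http://airflow-apiserver:8080",
        token="<jwt-token>",  # placeholder, obtained from Airflow's auth endpoint
        conf={"bucket": "finance", "object_name": "report.xlsx"},  # hypothetical keys
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it. Because `dags_are_paused_at_creation = True`, the DAG has to be unpaused before a triggered run will execute.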
---
## Airflow Configuration (airflow.cfg)
**Path:** `05-airflow/config/airflow.cfg`
Key settings:
```ini
[core]
executor = CeleryExecutor
load_examples = False
dags_are_paused_at_creation = True
[webserver]
base_url = https://ai.sriphat.com/airflow
[execution_api]
execution_api_server_url = http://airflow-apiserver:8080/execution/
```
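Every `airflow.cfg` option can also be set through an environment variable named `AIRFLOW__{SECTION}__{KEY}`, which is why the same settings appear in both forms in this note. The sketch below derives those names from a cfg fragment; the helper names `airflow_env_name` and `cfg_to_env` are hypothetical and the fragment is a shortened copy of the settings above.

```python
import configparser

# Shortened copy of the key settings, used only for illustration.
CFG_TEXT = """
[core]
executor = CeleryExecutor
load_examples = False
dags_are_paused_at_creation = True

[webserver]
base_url = https://ai.sriphat.com/airflow
"""


def airflow_env_name(section, key):
    """Return the environment variable name for a cfg section/key pair."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def cfg_to_env(cfg_text):
    """Map every option in an airflow.cfg fragment to its env-var form."""
    # Interpolation is disabled so '%' characters in values are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {
        airflow_env_name(section, key): value
        for section in parser.sections()
        for key, value in parser.items(section)
    }


if __name__ == "__main__":
    for name, value in cfg_to_env(CFG_TEXT).items():
        print(f"{name}={value}")
```

When both are present, Airflow gives the environment variable precedence over the value in `airflow.cfg`.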
---
## Environment Variables
```bash
# Airflow image
AIRFLOW_IMAGE_NAME=apache/airflow:3.1.5
# Database
AIRFLOW_DB_USER=<user>
AIRFLOW_DB_PASSWD=<password>
AIRFLOW_DB_HOST=<postgres-host>
AIRFLOW_DB_PORT=5432
AIRFLOW_DB_NAME=airflow
# Security
AIRFLOW__CORE__FERNET_KEY=<fernet-key>
# Admin user
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=<password>
# Optional pip packages
_PIP_ADDITIONAL_REQUIREMENTS=
```
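The `<fernet-key>` placeholder expects a Fernet key, which is 32 random bytes encoded as URL-safe base64. A minimal sketch for generating one with only the standard library follows; the function name `generate_fernet_key` is hypothetical, and `cryptography.fernet.Fernet.generate_key()` produces a key of the same format if that package is installed.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a new Fernet key: 32 random bytes, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed line into 05-airflow/.env
    print(f"AIRFLOW__CORE__FERNET_KEY={generate_fernet_key()}")
```

The key encrypts connection passwords and variables in the metadata DB, so replacing it later without a rotation step would leave existing encrypted values unreadable.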
---
## Deploy Commands
```bash
cd 05-airflow
# Initialize (first time only)
docker compose up airflow-init
# Start all services
docker compose up -d
# View logs
docker logs airflow-apiserver -f
docker logs airflow-scheduler -f
docker logs airflow-worker -f
# Run Celery Flower monitoring
docker compose --profile flower up -d
# Scale workers (add more workers)
docker compose up -d --scale airflow-worker=3
```
---
## System Requirements
Airflow requires at least the following resources:
- **RAM:** ≥ 4 GB
- **CPU:** ≥ 2 cores
- **Disk:** ≥ 10 GB
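
A quick pre-flight check against these minimums could look like the sketch below. The function names are hypothetical, the RAM lookup relies on `os.sysconf` names that are available on Linux only, and the disk figure is free space on the given path, not total capacity.

```python
import os
import shutil

MIN_RAM_GB = 4
MIN_CPU_CORES = 2
MIN_DISK_GB = 10


def find_shortfalls(ram_gb, cpu_cores, disk_gb):
    """Return a message for each resource that is below the minimum."""
    shortfalls = []
    if ram_gb < MIN_RAM_GB:
        shortfalls.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if cpu_cores < MIN_CPU_CORES:
        shortfalls.append(f"CPU: {cpu_cores} cores < {MIN_CPU_CORES} cores")
    if disk_gb < MIN_DISK_GB:
        shortfalls.append(f"Disk: {disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return shortfalls


def read_host_resources(path="/"):
    """Read RAM, CPU count, and free disk space of the current host (Linux)."""
    gib = 1024 ** 3
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_free = shutil.disk_usage(path).free
    return ram_bytes / gib, os.cpu_count() or 1, disk_free / gib


if __name__ == "__main__":
    problems = find_shortfalls(*read_host_resources())
    print("\n".join(problems) if problems else "Host meets the minimum requirements")
```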
---
## Ingestion Layer (04-ingestion / Airbyte)
> **Note:** `04-ingestion/docker-compose.yml` is currently **entirely commented out**.
> Airbyte is deployed separately (via `abctl` or standalone).
### Airbyte sources listed in the plan
| Source | Data type |
|--------|----------|
| SQL Server (HIS) | Patient data, OPD |
| Oracle (Lab) | Laboratory test results |
| REST API | External data |
| Excel/CSV | Finance, reports |
**Destination:** PostgreSQL `raw_data` schema
**Port:** `8030` (once deployed)
---
## Related
- [[00-Project-Overview]]
- [[01-Infrastructure]]
- [[03-API-Service]]
- [[08-Operations-Runbook]]