---
tags:
  - project/sriphat
  - airflow
  - workflow
  - etl
created: 2026-05-07
status: active
folder: 05-airflow
---

# Apache Airflow (05-airflow)

> **Docker Compose:** `05-airflow/docker-compose.yaml`
> **Env File:** `05-airflow/.env`
> **Version:** Apache Airflow 3.1.5

## Overview

Apache Airflow is used for workflow orchestration:
- Running scheduled DAGs (Directed Acyclic Graphs)
- Processing Excel/CSV files from Finance
- Orchestrating ETL pipelines
- Integrating with the API Service

**Executor:** CeleryExecutor (with Redis as the broker)

---

## Services

| Container | Role | Port (host:container) |
|-----------|------|-----------------------|
| `airflow-apiserver` | REST API + Web UI | `8200:8080` |
| `airflow-scheduler` | DAG scheduling | internal |
| `airflow-dag-processor` | DAG file parsing | internal |
| `airflow-worker` | Task execution (Celery) | internal |
| `airflow-triggerer` | Deferred task triggering | internal |
| `airflow-init` | Database migration (one-time) | none |
| `airflow-cli` | CLI tool (`debug` profile) | none |
| `flower` | Celery monitoring (optional, `flower` profile) | `5555:5555` |

---

## Architecture

```
                 ┌─────────────────┐
                 │    airflow-     │
                 │    apiserver    │ ← Web UI + REST API (port 8200)
                 │   (port 8080)   │
                 └────────┬────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
   ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
   │  airflow-   │ │  airflow-   │ │  airflow-   │
   │  scheduler  │ │    dag-     │ │  triggerer  │
   │             │ │  processor  │ │             │
   └──────┬──────┘ └─────────────┘ └─────────────┘
          │
          ▼ (Celery tasks via Redis)
   ┌─────────────┐
   │  airflow-   │
   │   worker    │ ← executes the actual tasks
   └──────┬──────┘
          ▼
   ┌─────────────┐
   │ PostgreSQL  │ (Airflow metadata DB)
   │    Redis    │ (Celery broker)
   └─────────────┘
```

---

## Database Configuration

Airflow uses PostgreSQL on the Infra server:

```bash
# Metadata database connection string
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://${AIRFLOW_DB_USER}:${AIRFLOW_DB_PASSWD}@${AIRFLOW_DB_HOST}:${AIRFLOW_DB_PORT}/${AIRFLOW_DB_NAME}

# Celery result backend (same PostgreSQL database)
AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://${AIRFLOW_DB_USER}:${AIRFLOW_DB_PASSWD}@${AIRFLOW_DB_HOST}:${AIRFLOW_DB_PORT}/${AIRFLOW_DB_NAME}

# Redis broker (empty password)
AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
```
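
Both connection strings above embed `AIRFLOW_DB_PASSWD` directly in a URL. A password containing characters such as `@`, `:`, `/` or `#` will therefore break the URL unless it is percent-encoded. The sketch below is a minimal pure-bash encoder. `urlencode` is a hypothetical helper name, not part of Airflow. It assumes `AIRFLOW_DB_PASSWD` is only consumed by these two URLs, so the encoded value can be stored in `.env` once.

```bash
# Hypothetical helper: percent-encode a string for use inside a URL.
# RFC 3986 unreserved characters (A-Z a-z 0-9 . _ ~ -) are kept as-is.
urlencode() {
  local LC_ALL=C            # iterate over bytes, not multibyte characters
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;   # byte value as %XX
    esac
  done
  printf '%s\n' "$out"
}

# Usage (placeholder password):
#   urlencode 'p@ss:w/rd#1'   # p%40ss%3Aw%2Frd%231
```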

---

## Volume Mounts

```
05-airflow/
├── dags/     → /opt/airflow/dags     (DAG files)
├── logs/     → /opt/airflow/logs     (Task logs)
├── config/   → /opt/airflow/config   (airflow.cfg)
│   └── airflow.cfg
└── plugins/  → /opt/airflow/plugins  (Custom plugins)
```
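
The mounted folders should exist on the host before the first `docker compose up`. On Linux, the official Airflow Docker Compose guide also recommends setting `AIRFLOW_UID` to the host user's ID; otherwise files the containers write into these folders end up owned by root. The sketch below assumes this project's `docker-compose.yaml` follows the official template, which reads `AIRFLOW_UID`. `prepare_airflow_dirs` is a hypothetical helper name.

```bash
# Hypothetical helper: create the bind-mounted folders and record the host UID.
prepare_airflow_dirs() {
  local root="${1:-.}"    # path to the 05-airflow folder (defaults to current dir)
  mkdir -p "$root"/{dags,logs,config,plugins}
  touch "$root/.env"
  # Make sure .env ends with a newline before appending to it
  if [ -s "$root/.env" ] && [ -n "$(tail -c 1 "$root/.env")" ]; then
    echo >> "$root/.env"
  fi
  # Append AIRFLOW_UID only once, so re-running does not duplicate it
  if ! grep -q '^AIRFLOW_UID=' "$root/.env"; then
    echo "AIRFLOW_UID=$(id -u)" >> "$root/.env"
  fi
}

# Usage:
#   prepare_airflow_dirs 05-airflow
```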
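
---

## Web UI

**URL:** `http://localhost:8200` or `https://ai.sriphat.com/airflow`

```bash
# Config
AIRFLOW__WEBSERVER__BASE_URL=https://ai.sriphat.com/airflow
AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8080
```

In Airflow 3 the standalone webserver was replaced by the API server, and these options moved from `[webserver]` to `[api]` (`AIRFLOW__API__BASE_URL`, `AIRFLOW__API__PORT`). The `WEBSERVER` names above may be ignored on 3.1.5, so check them against the deployed configuration.

Default credentials (if not changed):
- Username: `airflow`
- Password: `airflow`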
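
To confirm the API server is reachable, for example behind the reverse proxy, Airflow 3's REST API exposes a health endpoint at `api/v2/monitor/health`. Its JSON response typically reports the status of components such as the metadata database and the scheduler. The sketch below assumes the API is served under the same path prefix as the configured base URL and that `curl` is installed. `airflow_health_url` and `check_airflow_health` are hypothetical helper names.

```bash
# Hypothetical helper: build the health-check URL from a base URL
# (a trailing slash on the base URL is optional).
airflow_health_url() {
  local base="${1:-https://ai.sriphat.com/airflow}"
  printf '%s/api/v2/monitor/health\n' "${base%/}"
}

# Hypothetical helper: query the endpoint; -f makes curl fail on HTTP errors.
check_airflow_health() {
  curl -fsS "$(airflow_health_url "$1")"
}

# Usage:
#   check_airflow_health    # uses the public base URL by default
```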

---

## Available DAGs

| DAG ID | Purpose | Triggered by |
|--------|---------|--------------|
| `process_finance_excel` | Processes Finance Excel files | API Service |
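
One way the API Service could trigger `process_finance_excel` is through Airflow's REST API. The sketch below makes the equivalent calls with `curl` and rests on several assumptions:
- Airflow 3's token endpoint is `POST /auth/token`.
- The v2 DAG-run endpoint is `POST /api/v2/dags/{dag_id}/dagRuns`.
- The request body carries a `logical_date` field, where `null` lets Airflow assign one.

The `conf` key `object_name` and all function names are hypothetical. Match the key to whatever the DAG actually reads.

```bash
AIRFLOW_URL="${AIRFLOW_URL:-https://ai.sriphat.com/airflow}"

# Hypothetical helper: pull "access_token" out of the /auth/token JSON reply (no jq).
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: JSON body for a new DAG run.
# Assumes the value contains no double quotes or backslashes.
trigger_payload() {
  printf '{"logical_date": null, "conf": {"object_name": "%s"}}\n' "$1"
}

# Exchange username/password for a JWT (assumed Airflow 3 auth endpoint).
airflow_token() {
  curl -fsS -X POST "${AIRFLOW_URL%/}/auth/token" \
    -H 'Content-Type: application/json' \
    -d "{\"username\": \"$1\", \"password\": \"$2\"}" | extract_access_token
}

# Trigger a run of the given DAG with the payload above.
trigger_dag() {
  local token="$1" dag_id="$2" object_name="$3"
  curl -fsS -X POST "${AIRFLOW_URL%/}/api/v2/dags/${dag_id}/dagRuns" \
    -H "Authorization: Bearer ${token}" \
    -H 'Content-Type: application/json' \
    -d "$(trigger_payload "$object_name")"
}

# Usage (credentials and file name are placeholders):
#   token=$(airflow_token airflow '<password>')
#   trigger_dag "$token" process_finance_excel 'finance/report.xlsx'
```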
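
---

## Airflow Configuration (airflow.cfg)

**Path:** `05-airflow/config/airflow.cfg`

Key settings:

```ini
[core]
executor = CeleryExecutor
load_examples = False
dags_are_paused_at_creation = True

[webserver]
base_url = https://ai.sriphat.com/airflow

[execution_api]
execution_api_server_url = http://airflow-apiserver:8080/execution/
```

In Airflow 3, `execution_api_server_url` is read from the `[core]` section; the official Docker Compose file sets it as `AIRFLOW__CORE__EXECUTION_API_SERVER_URL`. Under `[execution_api]` it may have no effect, so verify the effective value with `airflow config get-value core execution_api_server_url`.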
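
Any `airflow.cfg` option can also be set through an environment variable named `AIRFLOW__{SECTION}__{KEY}` (upper case, double underscores), and the environment variable takes precedence over the file. This is how the `AIRFLOW__...` variables elsewhere on this page relate to the settings above. The small helper below derives the variable name; `airflow_env_name` is a hypothetical name and requires bash 4+ for `${var^^}`.

```bash
# Hypothetical helper: map an airflow.cfg section/key pair to its env-var override name.
airflow_env_name() {
  local section="$1" key="$2"
  printf 'AIRFLOW__%s__%s\n' "${section^^}" "${key^^}"
}

# Usage:
#   airflow_env_name core dags_are_paused_at_creation
#   # AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION
#   docker compose exec airflow-scheduler airflow config get-value core executor
```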

---

## Environment Variables

```bash
# Airflow image
AIRFLOW_IMAGE_NAME=apache/airflow:3.1.5

# Database
AIRFLOW_DB_USER=<user>
AIRFLOW_DB_PASSWD=<password>
AIRFLOW_DB_HOST=<postgres-host>
AIRFLOW_DB_PORT=5432
AIRFLOW_DB_NAME=airflow

# Security
AIRFLOW__CORE__FERNET_KEY=<fernet-key>

# Admin user
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=<password>

# Optional pip packages
_PIP_ADDITIONAL_REQUIREMENTS=
```
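
`AIRFLOW__CORE__FERNET_KEY` must be a URL-safe base64 encoding of 32 random bytes. The Airflow docs generate it with `python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"`. The sketch below is a shell-only equivalent, assuming `head`, `base64` and `tr` are available. `gen_fernet_key` is a hypothetical name.

```bash
# Hypothetical helper: 32 random bytes -> standard base64 -> URL-safe alphabet
# ("/" -> "_", "+" -> "-"), which is the format Fernet keys use.
gen_fernet_key() {
  head -c 32 /dev/urandom | base64 | tr '/+' '_-'
}

# Usage: paste the output into .env as AIRFLOW__CORE__FERNET_KEY=<value>
#   gen_fernet_key
```

Generate the key once and keep it stable. Connections and variables already encrypted with an old key cannot be decrypted after the key is replaced.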

---

## Deploy Commands

```bash
cd 05-airflow

# Initialize (first time only)
docker compose up airflow-init

# Start all services
docker compose up -d

# View logs (by Compose service name)
docker compose logs -f airflow-apiserver
docker compose logs -f airflow-scheduler
docker compose logs -f airflow-worker

# Run Celery Flower monitoring
docker compose --profile flower up -d

# Scale workers (add more workers; fails if airflow-worker sets a fixed container_name)
docker compose up -d --scale airflow-worker=3
```

---

## System Requirements

Minimum resources required by Airflow:
- **RAM:** ≥ 4 GB
- **CPU:** ≥ 2 cores
- **Disk:** ≥ 10 GB
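
The sketch below is a rough pre-flight check of a Linux host against these minimums. It assumes GNU coreutils (`nproc`) and a `getconf` that supports `_PHYS_PAGES`, and it reads "GB" as GiB. The function name `meets_airflow_minimums` is hypothetical. The kernel reports slightly less memory than is physically installed, so a machine with exactly 4 GB may fall just below the RAM threshold.

```bash
# Hypothetical helper: return 0 if the host meets the RAM/CPU/disk minimums,
# 1 otherwise (reasons are printed to stderr).
meets_airflow_minimums() {
  local min_ram_gib="${1:-4}" min_cpus="${2:-2}" min_disk_gib="${3:-10}" dir="${4:-.}"
  local ram_bytes cpus disk_kib status=0

  ram_bytes=$(( $(getconf _PHYS_PAGES) * $(getconf PAGE_SIZE) ))
  cpus=$(nproc)
  # Free space (KiB) on the filesystem holding $dir, POSIX output format
  disk_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')

  if (( ram_bytes < min_ram_gib * 1024 * 1024 * 1024 )); then
    echo "RAM: $(( ram_bytes / 1024 / 1024 )) MiB < ${min_ram_gib} GiB" >&2
    status=1
  fi
  if (( cpus < min_cpus )); then
    echo "CPU: ${cpus} cores < ${min_cpus}" >&2
    status=1
  fi
  if (( disk_kib < min_disk_gib * 1024 * 1024 )); then
    echo "Disk: $(( disk_kib / 1024 / 1024 )) GiB free < ${min_disk_gib} GiB" >&2
    status=1
  fi
  return "$status"
}

# Usage (defaults: 4 GiB RAM, 2 CPUs, 10 GiB free in the current directory):
#   meets_airflow_minimums && echo "OK"
```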

---

## Ingestion Layer (04-ingestion / Airbyte)

> **Note:** `04-ingestion/docker-compose.yml` is currently **entirely commented out**.
> Airbyte is deployed separately (via `abctl` or standalone).

### Planned Airbyte Sources

| Source | Data |
|--------|------|
| SQL Server (HIS) | Patient data, OPD |
| Oracle (Lab) | Laboratory test results |
| REST API | External data |
| Excel/CSV | Finance, reports |

**Destination:** PostgreSQL `raw_data` schema

**Port:** `8030` (once deployed)

---

## Related

- [[00-Project-Overview]]
- [[01-Infrastructure]]
- [[03-API-Service]]
- [[08-Operations-Runbook]]