add superset airbyte setup and merge md file

jigoong
2026-03-02 21:58:51 +07:00
parent 550d926139
commit 6f6009d63e
15 changed files with 1220 additions and 19 deletions


@@ -1,30 +1,235 @@
# 04-ingestion: Airbyte Data Ingestion
Airbyte OSS for data ingestion and ETL (multi-container deployment).
Airbyte OSS for data ingestion and ETL using `abctl` CLI tool.
## Services
## Overview
- **airbyte-proxy**: Public entrypoint (UI/API gateway)
- **server**: Airbyte backend
- **worker**: Runs sync jobs and launches connector containers
- **webapp**: Airbyte UI
- **airbyte-temporal**: Workflow engine
This deployment uses Airbyte's official `abctl` command-line tool for easy installation and management. It's configured to use shared infrastructure from `01-infra`:
## Run
- **PostgreSQL**: Shared database for Airbyte metadata
- **Nginx Proxy Manager**: Shared reverse proxy for external access
- **Network**: `shared_data_network` for inter-service communication
**Note**: `abctl` deploys an internal `airbyte-proxy` container that routes traffic between Airbyte microservices. External access is handled by the existing Nginx Proxy Manager in `01-infra`, so no additional nginx is needed in this folder.
## Prerequisites
1. Docker Desktop installed and running
2. Infrastructure services running (PostgreSQL from `01-infra`)
3. Linux or macOS (for Windows, install abctl manually)
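The first two prerequisites can be checked from a shell before installing. The following is a minimal sketch, not the actual logic of `setup-airbyte.sh`; the function name `check_prereqs` is hypothetical, and it assumes the PostgreSQL container is named `postgres`, as in the `docker exec` commands elsewhere in this README.

```shell
# Hypothetical helper: succeeds only if the Docker daemon responds and a
# container named exactly "postgres" is currently running.
check_prereqs() {
  if ! docker info >/dev/null 2>&1; then
    echo "Docker is not running" >&2
    return 1
  fi
  # --format prints one container name per line; grep -x matches the whole line
  if ! docker ps --format '{{.Names}}' | grep -qx 'postgres'; then
    echo "PostgreSQL container is not running" >&2
    return 1
  fi
  echo "prerequisites OK"
}
```

Calling `check_prereqs && ./setup-airbyte.sh` would then run the setup only when both checks pass.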
## Installation
### First Time Setup
Run the automated setup script:
```bash
docker compose --env-file ../.env.global up -d
cd 04-ingestion
chmod +x *.sh
./setup-airbyte.sh
```
This script will:
- Check prerequisites (Docker, PostgreSQL)
- Install `abctl` if not present
- Create required databases (airbyte, temporal, temporal_visibility)
- Install Airbyte with custom configuration
- Configure port mapping (8030 instead of default 8000)
Installation takes approximately 10-30 minutes depending on internet speed.
### Manual Installation
If you prefer manual installation:
1. Install abctl:
```bash
curl -LsfS https://get.airbyte.com | bash -
```
2. Create databases:
```bash
docker exec postgres psql -U postgres -c "CREATE DATABASE airbyte;"
docker exec postgres psql -U postgres -c "CREATE DATABASE temporal;"
docker exec postgres psql -U postgres -c "CREATE DATABASE temporal_visibility;"
```
3. Install Airbyte:
```bash
abctl local install --port 8030 --insecure-cookies
```
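`CREATE DATABASE` in step 2 returns an error if the database already exists, which matters when the manual steps are re-run. The sketch below shows one way to make that step idempotent by checking `pg_database` first. The function name `create_db_if_missing` is hypothetical, and the container name `postgres` and user `postgres` are taken from the commands above.

```shell
# Hypothetical helper: create a database only if it is not already present.
create_db_if_missing() {
  db="$1"
  # -t (tuples only) and -A (unaligned) make psql print a bare "1" or nothing
  exists=$(docker exec postgres psql -U postgres -tAc \
    "SELECT 1 FROM pg_database WHERE datname = '$db'")
  if [ "$exists" = "1" ]; then
    echo "$db already exists"
  else
    docker exec postgres psql -U postgres -c "CREATE DATABASE $db;"
  fi
}
```

A loop such as `for db in airbyte temporal temporal_visibility; do create_db_if_missing "$db"; done` would cover all three databases.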
## Usage
### Start Airbyte
```bash
./start-airbyte.sh
```
### Stop Airbyte
```bash
./stop-airbyte.sh
```
### Uninstall Airbyte
```bash
./uninstall-airbyte.sh
```
## Access
- Web UI: http://localhost:8000
- Configure in Nginx to route domain to `airbyte-proxy:8000`
### Production (via Nginx Proxy Manager)
- **Domain**: https://ai.sriphat.com/airbyte
- **Authentication**: Configured via Nginx (see NGINX-SETUP.md)
- **SSL**: Enabled with Let's Encrypt
## Note
### Development/Local
- **Localhost**: http://localhost:8030
- **Direct IP**: http://[SERVER_IP]:8030
- **No authentication** required for local access
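Because installation can take a while, it may help to poll the local URL until the UI responds. This is a minimal sketch that only assumes `curl` is installed. The function name `wait_for_url` and the default retry values are illustrative, not part of the setup scripts.

```shell
# Hypothetical helper: poll a URL until it answers with a non-error status.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors such as 503
    if curl -fsS -o /dev/null "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:8030` would retry for about five minutes with the defaults shown.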
This deployment pins Airbyte images to avoid `:latest` tag issues.
## Configuration
## First Time Setup
1. Create database: `docker exec postgres psql -U postgres -c "CREATE DATABASE airbyte;"`
2. Access webapp and configure sources/destinations
Edit `.airbyte.env` to customize:
- `AIRBYTE_PORT`: External port (default: 8030)
- `AIRBYTE_HOST`: Domain name for external access (ai.sriphat.com)
- `LOW_RESOURCE_MODE`: **Enabled by default** for systems with <4 CPU cores
- `AIRBYTE_VERSION`: Uses latest stable version
- `ENABLE_BACKUP`: Automated backup configuration (enabled)
- Database connection settings (uses shared PostgreSQL)
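As an illustration, a `.airbyte.env` using the defaults listed above might look like the fragment below. The plain `KEY=value` syntax and the `true` values are assumptions, since the file itself is not shown here. `AIRBYTE_VERSION` and the database settings are omitted because their exact values are not given in this README.

```shell
# .airbyte.env (illustrative values only; exact syntax is an assumption)
AIRBYTE_PORT=8030
AIRBYTE_HOST=ai.sriphat.com
LOW_RESOURCE_MODE=true
ENABLE_BACKUP=true
```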
### Authentication
Airbyte does not natively support Keycloak. Authentication is handled via **Nginx Proxy Manager**:
1. **Recommended**: OAuth2 Proxy with Keycloak integration
2. **Alternative**: Basic Authentication via nginx
3. **Simple**: IP whitelist for internal access
See `NGINX-SETUP.md` for detailed configuration instructions.
## Database
Airbyte uses three databases in the shared PostgreSQL instance:
- `airbyte`: Main application database
- `temporal`: Workflow engine database
- `temporal_visibility`: Temporal visibility database
All databases are automatically created during setup and backed up with the main PostgreSQL instance.
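To confirm that all three databases exist, the list in `pg_database` can be compared against the expected names. This is a sketch under the same assumptions as the other commands in this README (container `postgres`, user `postgres`). The function name `missing_dbs` is hypothetical.

```shell
# Hypothetical helper: print the name of each expected database that is absent.
missing_dbs() {
  existing=$(docker exec postgres psql -U postgres -tAc \
    "SELECT datname FROM pg_database")
  for db in airbyte temporal temporal_visibility; do
    # grep -x requires a whole-line match, so "temporal" does not
    # accidentally match "temporal_visibility"
    echo "$existing" | grep -qx "$db" || echo "$db"
  done
}
```

An empty output means all three databases are present.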
## Architecture
### Services Deployed by abctl
- **airbyte-server**: Backend API and business logic
- **airbyte-worker**: Executes sync jobs and manages connectors
- **airbyte-webapp**: Web UI
- **airbyte-temporal**: Workflow orchestration engine
- **airbyte-proxy**: Nginx reverse proxy (entrypoint for the Airbyte services; external traffic reaches it through Nginx Proxy Manager)
- **airbyte-cron**: Scheduled job runner
- **airbyte-connector-builder-server**: Custom connector development
- **airbyte-api-server**: REST API server
### Network Architecture
**All services connect to `shared_data_network`:**
```
Internet → Nginx Proxy Manager (01-infra) → airbyte-proxy (internal) → Airbyte Services
           ai.sriphat.com/airbyte            port 8000
```
**Shared Resources from 01-infra:**
- **Nginx Proxy Manager**: External reverse proxy (handles SSL, auth, routing)
- **PostgreSQL**: Database server (airbyte, temporal, temporal_visibility)
- **Keycloak**: Identity provider (optional, via OAuth2 Proxy)
**Airbyte Components (deployed by abctl):**
- **airbyte-proxy**: Internal nginx for microservice routing (NOT for external access)
- **Airbyte services**: server, worker, webapp, temporal, etc.
See `ARCHITECTURE.md` for detailed network flow diagram.
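Whether a given container is attached to `shared_data_network` can be checked with `docker network inspect`. The sketch below is an assumption-laden example: the function name `on_shared_network` is hypothetical, and the container names actually created by `abctl` may differ from the service names listed above.

```shell
# Hypothetical helper: succeed if the named container is attached to the network.
# Usage: on_shared_network CONTAINER [NETWORK]
on_shared_network() {
  name="$1"
  network="${2:-shared_data_network}"
  # The Go template prints one attached container name per line
  docker network inspect "$network" \
    --format '{{range .Containers}}{{println .Name}}{{end}}' \
    | grep -qx "$name"
}
```

For example, `on_shared_network postgres && echo attached` reports whether the PostgreSQL container is on the shared network.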
## Troubleshooting
### Installation Issues
**Error: "Readiness probe failed: HTTP probe failed with statuscode: 503"**
- This is normal while services are still starting. Allow the installation to continue.
- If the error persists, allocate more CPU and memory to Docker Desktop.
**Error: "PostgreSQL container is not running"**
- Start infrastructure first: `cd ../01-infra && docker compose --env-file ../.env.global up -d`
**Error: "abctl: command not found"**
- The setup script will install it automatically on Linux/macOS
- For Windows, download from: https://github.com/airbytehq/abctl/releases
### Runtime Issues
**Cannot access UI at localhost:8030**
- Check if Airbyte is running: `abctl local status`
- Check Docker containers: `docker ps | grep airbyte`
- View logs: `abctl local logs`
**Sync jobs failing**
- Check worker logs: `docker logs airbyte-worker`
- Verify database connectivity
- Ensure sufficient disk space and memory
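The disk-space check can be scripted with `df`. This is a minimal sketch. The function names and the 10 GiB default threshold are illustrative assumptions, not documented Airbyte requirements. It covers disk only, since memory checks differ between Linux and macOS.

```shell
# Hypothetical helper: print available space (in KB) for the filesystem
# that holds the given path. -P forces the portable one-line-per-filesystem format.
free_kb() {
  df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: fail if free space is below a threshold in KB.
# Usage: check_disk [PATH] [MIN_KB]
check_disk() {
  min_kb="${2:-10485760}"   # 10 GiB, an assumed threshold
  avail=$(free_kb "${1:-.}")
  if [ "$avail" -lt "$min_kb" ]; then
    echo "low disk space: ${avail} KB free" >&2
    return 1
  fi
  echo "disk OK: ${avail} KB free"
}
```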
**Low resource environments**
- Enable low-resource mode in `.airbyte.env`: `LOW_RESOURCE_MODE=true`
- Note: Connector Builder will be disabled in low-resource mode
## Upgrading
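To upgrade Airbyte to a newer version (check `abctl local --help` first; in `abctl` releases without an `upgrade` subcommand, re-running `abctl local install` with the original flags performs the upgrade):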
```bash
abctl local upgrade
```
## Backup
### Automated Backups
The setup script creates `backup-airbyte.sh` which backs up all Airbyte databases:
- `airbyte` - Main application database
- `temporal` - Workflow engine database
- `temporal_visibility` - Temporal visibility database
**Manual Backup:**
```bash
./backup-airbyte.sh
```
**Automated Schedule:**
Add to crontab for daily backups at 2 AM:
```bash
crontab -e
# Add this line:
0 2 * * * cd /path/to/04-ingestion && ./backup-airbyte.sh
```
**Backup Location:**
- Directory: `./backups/`
- Format: `airbyte_backup_YYYYMMDD_HHMMSS.tar.gz`
- Retention: Last 7 days (older backups auto-deleted)
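The behavior described above (dump three databases, archive them, keep seven days) can be sketched as follows. This is not the generated `backup-airbyte.sh`, whose contents are not shown here. The function name `backup_airbyte` is hypothetical, and the file naming follows the pattern used in the restore example in this README.

```shell
# Hypothetical sketch of a backup routine. Usage: backup_airbyte [BACKUP_DIR]
backup_airbyte() {
  backup_dir="${1:-./backups}"
  ts=$(date +%Y%m%d_%H%M%S)
  mkdir -p "$backup_dir" || return 1
  tmp=$(mktemp -d) || return 1
  for db in airbyte temporal temporal_visibility; do
    # pg_dump writes plain SQL to stdout; one file per database
    if ! docker exec postgres pg_dump -U postgres "$db" > "$tmp/${db}_${ts}.sql"; then
      rm -rf "$tmp"
      return 1
    fi
  done
  archive="$backup_dir/airbyte_backup_${ts}.tar.gz"
  tar -czf "$archive" -C "$tmp" . || { rm -rf "$tmp"; return 1; }
  rm -rf "$tmp"
  # Retention: remove archives last modified more than 7 days ago
  find "$backup_dir" -name 'airbyte_backup_*.tar.gz' -mtime +7 -exec rm -f {} +
  echo "$archive"
}
```

The function prints the path of the new archive, so a cron wrapper could log it or copy it elsewhere.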
**Restore from Backup:**
```bash
# Extract backup
tar -xzf backups/airbyte_backup_20260227_020000.tar.gz
# Restore databases
docker exec -i postgres psql -U postgres airbyte < airbyte_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal < temporal_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal_visibility < temporal_visibility_20260227_020000.sql
```
## Additional Resources
- [Airbyte Documentation](https://docs.airbyte.com/)
- [Airbyte OSS Quickstart](https://docs.airbyte.com/platform/using-airbyte/getting-started/oss-quickstart)
- [abctl CLI Reference](https://docs.airbyte.com/platform/deploying-airbyte/abctl)