add superset airbyte setup and merge md file
@@ -1,30 +1,235 @@
# 04-ingestion: Airbyte Data Ingestion

Airbyte OSS for data ingestion and ETL (multi-container deployment).
Airbyte OSS for data ingestion and ETL using `abctl` CLI tool.

## Services
## Overview

- **airbyte-proxy**: Public entrypoint (UI/API gateway)
- **server**: Airbyte backend
- **worker**: Runs sync jobs and launches connector containers
- **webapp**: Airbyte UI
- **airbyte-temporal**: Workflow engine

This deployment uses Airbyte's official `abctl` command-line tool for easy installation and management. It's configured to use shared infrastructure from `01-infra`:

## Run
- **PostgreSQL**: Shared database for Airbyte metadata
- **Nginx Proxy Manager**: Shared reverse proxy for external access
- **Network**: `shared_data_network` for inter-service communication

**Note**: `abctl` deploys an internal `airbyte-proxy` container for routing between Airbyte microservices. External access is handled by the existing Nginx Proxy Manager in `01-infra` - no additional nginx needed in this folder.

## Prerequisites

1. Docker Desktop installed and running
2. Infrastructure services running (PostgreSQL from `01-infra`)
3. Linux or macOS (for Windows, install abctl manually)

## Installation

### First Time Setup

Run the automated setup script:

```bash
docker compose --env-file ../.env.global up -d
cd 04-ingestion
chmod +x *.sh
./setup-airbyte.sh
```

This script will:
- Check prerequisites (Docker, PostgreSQL)
- Install `abctl` if not present
- Create required databases (airbyte, temporal, temporal_visibility)
- Install Airbyte with custom configuration
- Configure port mapping (8030 instead of default 8000)

Installation takes approximately 10-30 minutes depending on internet speed.
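
The prerequisite checks listed above could look roughly like the following. This is a minimal sketch, not the contents of `setup-airbyte.sh`: the function names `have_cmd`, `container_running`, and `check_prereqs` are hypothetical, and the container name `postgres` is taken from the `docker exec postgres` commands used elsewhere in this README.

```bash
# Sketch only: hypothetical helpers, not the actual setup-airbyte.sh.

# Succeeds if the named command is available in the current shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if a container with exactly this name is running.
container_running() {
  docker ps --format '{{.Names}}' | grep -qx "$1"
}

# Checks Docker and PostgreSQL; only reports whether abctl is missing.
check_prereqs() {
  have_cmd docker || { echo "Docker is not installed" >&2; return 1; }
  container_running postgres || { echo "PostgreSQL container is not running" >&2; return 1; }
  have_cmd abctl || echo "abctl not found; it would be installed next"
  return 0
}
```

Calling `check_prereqs` before anything else lets a setup script stop early with a clear message instead of failing halfway through the install.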

### Manual Installation

If you prefer manual installation:

1. Install abctl:
```bash
curl -LsfS https://get.airbyte.com | bash -
```

2. Create databases:
```bash
docker exec postgres psql -U postgres -c "CREATE DATABASE airbyte;"
docker exec postgres psql -U postgres -c "CREATE DATABASE temporal;"
docker exec postgres psql -U postgres -c "CREATE DATABASE temporal_visibility;"
```

3. Install Airbyte:
```bash
abctl local install --port 8030 --insecure-cookies
```
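
`CREATE DATABASE` fails when the database already exists, so rerunning step 2 reports errors. A minimal sketch of an idempotent variant follows. `psql_run` and `ensure_db` are hypothetical helper names; the container name and user are the ones used in the commands above.

```bash
# Sketch only: hypothetical helpers wrapping the commands from step 2.

# Runs psql inside the shared PostgreSQL container.
psql_run() {
  docker exec postgres psql -U postgres "$@"
}

# Creates the database only if pg_database has no entry for it.
ensure_db() {
  db="$1"
  if psql_run -tAc "SELECT 1 FROM pg_database WHERE datname = '${db}'" | grep -qx 1; then
    echo "exists: ${db}"
  else
    psql_run -c "CREATE DATABASE ${db};" >/dev/null && echo "created: ${db}"
  fi
}

# Usage:
#   for db in airbyte temporal temporal_visibility; do ensure_db "$db"; done
```

The lookup in `pg_database` is used because PostgreSQL has no `CREATE DATABASE IF NOT EXISTS`.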

## Usage

### Start Airbyte
```bash
./start-airbyte.sh
```

### Stop Airbyte
```bash
./stop-airbyte.sh
```

### Uninstall Airbyte
```bash
./uninstall-airbyte.sh
```

## Access

- Web UI: http://localhost:8000
- Configure in Nginx to route domain to `airbyte-proxy:8000`
### Production (via Nginx Proxy Manager)
- **Domain**: https://ai.sriphat.com/airbyte
- **Authentication**: Configured via Nginx (see NGINX-SETUP.md)
- **SSL**: Enabled with Let's Encrypt

## Note
### Development/Local
- **Localhost**: http://localhost:8030
- **Direct IP**: http://[SERVER_IP]:8030
- **No authentication** required for local access
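
After a start, the UI can take a while to respond on port 8030. The following is a rough sketch of a readiness loop. `probe` and `wait_for_airbyte` are hypothetical names, and the `/api/v1/health` path is an assumption about the Airbyte API that should be checked against the installed version.

```bash
# Succeeds if the URL answers without an HTTP error status.
probe() {
  curl -fsS -o /dev/null "$1"
}

# Polls the URL until it answers or the attempts run out.
# Arguments: URL, number of attempts, delay in seconds between attempts.
wait_for_airbyte() {
  url="${1:-http://localhost:8030/api/v1/health}"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if probe "$url"; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after ${attempts} attempt(s)" >&2
  return 1
}
```

For example, `wait_for_airbyte http://localhost:8030/api/v1/health 60 5` would poll for up to five minutes.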

This deployment pins Airbyte images to avoid `:latest` tag issues.
## Configuration

## First Time Setup
1. Create database: `docker exec postgres psql -U postgres -c "CREATE DATABASE airbyte;"`
2. Access webapp and configure sources/destinations

Edit `.airbyte.env` to customize:
- `AIRBYTE_PORT`: External port (default: 8030)
- `AIRBYTE_HOST`: Domain name for external access (ai.sriphat.com)
- `LOW_RESOURCE_MODE`: **Enabled by default** for systems with <4 CPU cores
- `AIRBYTE_VERSION`: Uses latest stable version
- `ENABLE_BACKUP`: Automated backup configuration (enabled)
- Database connection settings (uses shared PostgreSQL)
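
To illustrate how these settings might feed into the install command, here is a small sketch. It is not taken from `setup-airbyte.sh`: `load_airbyte_env` and `install_command` are hypothetical names, the defaults mirror the list above, and mapping `LOW_RESOURCE_MODE=true` onto abctl's `--low-resource-mode` flag is an assumption about how the script behaves.

```bash
# Sources KEY=value lines from the env file (if present), then applies defaults.
load_airbyte_env() {
  env_file="${1:-./.airbyte.env}"
  if [ -f "$env_file" ]; then
    set -a            # export every variable the file defines
    . "$env_file"
    set +a
  fi
  : "${AIRBYTE_PORT:=8030}"
  : "${LOW_RESOURCE_MODE:=true}"
}

# Prints the abctl command that matches the loaded settings.
install_command() {
  cmd="abctl local install --port ${AIRBYTE_PORT} --insecure-cookies"
  if [ "$LOW_RESOURCE_MODE" = "true" ]; then
    cmd="${cmd} --low-resource-mode"
  fi
  echo "$cmd"
}
```

Printing the command instead of running it makes it easy to review what a given `.airbyte.env` would do before installing.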

### Authentication

Airbyte does not natively support Keycloak. Authentication is handled via **Nginx Proxy Manager**:

1. **Recommended**: OAuth2 Proxy with Keycloak integration
2. **Alternative**: Basic Authentication via nginx
3. **Simple**: IP whitelist for internal access

See `NGINX-SETUP.md` for detailed configuration instructions.

## Database

Airbyte uses three databases in the shared PostgreSQL instance:
- `airbyte`: Main application database
- `temporal`: Workflow engine database
- `temporal_visibility`: Temporal visibility database

All databases are automatically created during setup and backed up with the main PostgreSQL instance.

## Architecture

### Services Deployed by abctl

- **airbyte-server**: Backend API and business logic
- **airbyte-worker**: Executes sync jobs and manages connectors
- **airbyte-webapp**: Web UI
- **airbyte-temporal**: Workflow orchestration engine
- **airbyte-proxy**: Nginx reverse proxy (public entrypoint)
- **airbyte-cron**: Scheduled job runner
- **airbyte-connector-builder-server**: Custom connector development
- **airbyte-api-server**: REST API server

### Network Architecture

**All services connect to `shared_data_network`:**

```
Internet → Nginx Proxy Manager (01-infra) → airbyte-proxy (internal) → Airbyte Services
           ai.sriphat.com/airbyte           port 8000
```

**Shared Resources from 01-infra:**
- **Nginx Proxy Manager**: External reverse proxy (handles SSL, auth, routing)
- **PostgreSQL**: Database server (airbyte, temporal, temporal_visibility)
- **Keycloak**: Identity provider (optional, via OAuth2 Proxy)

**Airbyte Components (deployed by abctl):**
- **airbyte-proxy**: Internal nginx for microservice routing (NOT for external access)
- **Airbyte services**: server, worker, webapp, temporal, etc.

See `ARCHITECTURE.md` for detailed network flow diagram.
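
One way to confirm the layout described above is to list which containers are attached to `shared_data_network`. The sketch below uses `docker network inspect`. `network_members` and `missing_from_network` are hypothetical helper names, and the container names in the usage line are examples; substitute the names your deployment actually uses.

```bash
# Prints the names of containers attached to the network, one per line.
network_members() {
  docker network inspect "$1" \
    --format '{{range .Containers}}{{println .Name}}{{end}}'
}

# Prints each expected container that is NOT attached to the network.
# Returns non-zero if at least one is missing.
missing_from_network() {
  network="$1"
  shift
  members="$(network_members "$network")"
  status=0
  for name in "$@"; do
    if ! printf '%s\n' "$members" | grep -qx "$name"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   missing_from_network shared_data_network postgres airbyte-proxy
```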

## Troubleshooting

### Installation Issues

**Error: "Readiness probe failed: HTTP probe failed with statuscode: 503"**
- This is normal during installation. Allow the installation to continue.
- If the error persists, you may need to allocate more resources to Docker Desktop.

**Error: "PostgreSQL container is not running"**
- Start infrastructure first: `cd ../01-infra && docker compose --env-file ../.env.global up -d`

**Error: "abctl: command not found"**
- The setup script will install it automatically on Linux/macOS
- For Windows, download from: https://github.com/airbytehq/abctl/releases

### Runtime Issues

**Cannot access UI at localhost:8030**
- Check if Airbyte is running: `abctl local status`
- Check Docker containers: `docker ps | grep airbyte`
- View logs: `abctl local logs`

**Sync jobs failing**
- Check worker logs: `docker logs airbyte-worker`
- Verify database connectivity
- Ensure sufficient disk space and memory

**Low resource environments**
- Enable low-resource mode in `.airbyte.env`: `LOW_RESOURCE_MODE=true`
- Note: Connector Builder will be disabled in low-resource mode

## Upgrading

To upgrade Airbyte to a newer version:

```bash
abctl local upgrade
```

## Backup

### Automated Backups

The setup script creates `backup-airbyte.sh` which backs up all Airbyte databases:
- `airbyte` - Main application database
- `temporal` - Workflow engine database
- `temporal_visibility` - Temporal visibility database

**Manual Backup:**
```bash
./backup-airbyte.sh
```

**Automated Schedule:**
Add to crontab for daily backups at 2 AM:
```bash
crontab -e
# Add this line:
0 2 * * * cd /path/to/04-ingestion && ./backup-airbyte.sh
```

**Backup Location:**
- Directory: `./backups/`
- Format: `airbyte_backup_YYYYMMDD_HHMMSS.tar.gz`
- Retention: Last 7 days (older backups auto-deleted)
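
The naming and retention rules above can be sketched as follows. This is an illustration, not the generated `backup-airbyte.sh`: `backup_name` and `prune_backups` are hypothetical names, and `find -mtime +7` only approximates "last 7 days" because it counts whole 24-hour periods.

```bash
# Prints an archive name in the airbyte_backup_YYYYMMDD_HHMMSS.tar.gz format.
backup_name() {
  echo "airbyte_backup_$(date +%Y%m%d_%H%M%S).tar.gz"
}

# Deletes backup archives in the directory that are older than the given
# number of days. Files that do not match the backup pattern are left alone.
prune_backups() {
  dir="${1:-./backups}"
  days="${2:-7}"
  find "$dir" -maxdepth 1 -type f -name 'airbyte_backup_*.tar.gz' \
    -mtime "+${days}" -delete
}
```

Restricting the match to the `airbyte_backup_*.tar.gz` pattern keeps the cleanup from touching anything else stored in `./backups/`.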

**Restore from Backup:**
```bash
# Extract backup
tar -xzf backups/airbyte_backup_20260227_020000.tar.gz

# Restore databases
docker exec -i postgres psql -U postgres airbyte < airbyte_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal < temporal_20260227_020000.sql
docker exec -i postgres psql -U postgres temporal_visibility < temporal_visibility_20260227_020000.sql
```
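
Since the three restore commands differ only in the database name, they can be generated from the backup timestamp. This sketch assumes the extracted files follow the `<database>_<timestamp>.sql` pattern shown above. `restore_commands` is a hypothetical helper that only prints the commands, so they can be reviewed before running.

```bash
# Prints one restore command per Airbyte database for the given timestamp.
restore_commands() {
  ts="$1"
  for db in airbyte temporal temporal_visibility; do
    echo "docker exec -i postgres psql -U postgres ${db} < ${db}_${ts}.sql"
  done
}

# Usage: review the output first, then pipe it to a shell to execute.
#   restore_commands 20260227_020000
#   restore_commands 20260227_020000 | sh
```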

## Additional Resources

- [Airbyte Documentation](https://docs.airbyte.com/)
- [Airbyte OSS Quickstart](https://docs.airbyte.com/platform/using-airbyte/getting-started/oss-quickstart)
- [abctl CLI Reference](https://docs.airbyte.com/platform/deploying-airbyte/abctl)