Files
sriphat-dataplatform/03-apiservice/README-PAGES.md

6.2 KiB

Web Pages Documentation

Overview

The API service now includes web pages for easy access to all platform services and data management functionality.

Pages

1. Landing Page (/)

Main dashboard with navigation to all services:

  • Supabase: PostgreSQL database with REST API
  • API Docs: Interactive API documentation
  • Airflow: Workflow orchestration
  • Airbyte: Data integration and ETL
  • Data Management: File upload and processing (with submenu)
  • DBT: Data transformation
  • Superset: Business intelligence and visualization

2. Finance Excel Upload (/data-management/finance)

Upload and process financial Excel files with the following features:

Upload Form:

  • Drag-and-drop or click to select Excel files (.xlsx, .xls)
  • Optional description field
  • File validation and size display

Upload Processing:

  • Files saved to /data/uploads/ with timestamp prefix
  • Unique filename generation to prevent conflicts
  • Automatic Airflow job triggering (when configured)

Upload History:

  • Real-time list of all uploaded files
  • Status tracking: pending, processing, success, error
  • Expandable accordion for logs and error details
  • Auto-refresh every 10 seconds

API Endpoints

Page Routes

GET  /                              - Landing page
GET  /data-management/finance       - Finance upload page
POST /data-management/finance/upload - Upload file endpoint
GET  /data-management/finance/uploads - Get all uploads
GET  /data-management/finance/uploads/{upload_id} - Get specific upload

Upload Endpoint

POST /data-management/finance/upload

Form Data:

  • file: Excel file (.xlsx or .xls)
  • description: Optional description text

Response:

{
  "success": true,
  "message": "File 'example.xlsx' uploaded successfully",
  "upload_id": "upload_20260306_174500",
  "filename": "20260306_174500_example.xlsx"
}

Error Response:

{
  "detail": "Invalid file type. Only .xlsx and .xls files are allowed."
}

File Storage

Uploaded files are stored in:

/data/uploads/YYYYMMDD_HHMMSS_original_filename.xlsx

Example:

/data/uploads/20260306_174530_finance_report.xlsx

Airflow Integration

The upload endpoint includes a placeholder for Airflow integration. To enable:

  1. Configure Airflow endpoint in app/routes/pages.py:

    airflow_url = "http://airflow-webserver:8080/api/v1/dags/{dag_id}/dagRuns"
    
  2. Set DAG ID for the finance processing job

  3. Uncomment the Airflow trigger code in trigger_airflow_job() function

  4. Configure authentication (username/password or API token)

Example Airflow Integration

async def trigger_airflow_job(filepath: str, upload_id: str) -> str:
    import httpx
    
    airflow_url = "http://airflow-webserver:8080/api/v1/dags/finance_processing/dagRuns"
    headers = {"Content-Type": "application/json"}
    auth = ("airflow", "airflow")
    
    payload = {
        "conf": {
            "filepath": filepath,
            "upload_id": upload_id
        }
    }
    
    async with httpx.AsyncClient() as client:
        response = await client.post(
            airflow_url,
            json=payload,
            headers=headers,
            auth=auth
        )
        response.raise_for_status()
        result = response.json()
        return result["dag_run_id"]

Upload Status Tracking

Upload records include:

  • id: Unique identifier
  • filename: Original filename
  • filepath: Full path to saved file
  • uploaded_at: ISO timestamp
  • description: User-provided description
  • status: pending | processing | success | error
  • job_id: Airflow DAG run ID (when triggered)
  • logs: Error logs or processing information

Database Storage

Currently uses in-memory storage (uploads_db list). For production:

  1. Create database table:

    CREATE TABLE uploads (
        id VARCHAR(100) PRIMARY KEY,
        filename VARCHAR(255) NOT NULL,
        filepath VARCHAR(500) NOT NULL,
        uploaded_at TIMESTAMP NOT NULL,
        description TEXT,
        status VARCHAR(20) NOT NULL,
        job_id VARCHAR(100),
        logs TEXT,
        created_at TIMESTAMP DEFAULT NOW()
    );
    
  2. Replace in-memory storage with SQLAlchemy model

  3. Add database queries in route handlers

Access URLs

Security Considerations

  1. File validation: Only .xlsx and .xls files allowed
  2. Filename sanitization: Spaces replaced with underscores
  3. Unique filenames: Timestamp prefix prevents conflicts
  4. File size limits: Configure in nginx/FastAPI if needed
  5. Authentication: Add authentication middleware for production

Future Enhancements

  1. Database persistence: Replace in-memory storage
  2. File preview: Show Excel data preview before processing
  3. Batch upload: Support multiple file uploads
  4. Download processed files: Allow downloading results
  5. User management: Track uploads by user
  6. Email notifications: Notify on processing completion
  7. Scheduled jobs: Automatic periodic processing
  8. Data validation: Validate Excel structure before processing

Troubleshooting

Upload fails with 500 error

  • Check /data/uploads/ directory exists and is writable
  • Verify volume mount in docker-compose.yml
  • Check container logs: docker logs apiservice

Files not persisting after container restart

  • Ensure volume mount is configured: ./data/uploads:/data/uploads
  • Check file permissions on host directory

Airflow job not triggering

  • Verify Airflow is accessible from container
  • Check network connectivity: docker network inspect shared_data_network
  • Verify Airflow credentials and DAG ID
  • Check logs in upload history

Development

To add new data management pages:

  1. Create HTML template in app/templates/
  2. Add route in app/routes/pages.py
  3. Update landing page menu in index.html
  4. Add submenu item if needed

Example:

@router.get("/data-management/hr", response_class=HTMLResponse)
async def hr_page(request: Request):
    return templates.TemplateResponse(
        "data_management_hr.html",
        {"request": request, "root_path": settings.ROOT_PATH}
    )