# Web Pages Documentation ## Overview The API service now includes web pages for easy access to all platform services and data management functionality. ## Pages ### 1. Landing Page (`/`) Main dashboard with navigation to all services: - **Supabase**: PostgreSQL database with REST API - **API Docs**: Interactive API documentation - **Airflow**: Workflow orchestration - **Airbyte**: Data integration and ETL - **Data Management**: File upload and processing (with submenu) - **DBT**: Data transformation - **Superset**: Business intelligence and visualization ### 2. Finance Excel Upload (`/data-management/finance`) Upload and process financial Excel files with the following features: **Upload Form:** - Drag-and-drop or click to select Excel files (.xlsx, .xls) - Optional description field - File validation and size display **Upload Processing:** - Files saved to `/data/uploads/` with timestamp prefix - Unique filename generation to prevent conflicts - Automatic Airflow job triggering (when configured) **Upload History:** - Real-time list of all uploaded files - Status tracking: pending, processing, success, error - Expandable accordion for logs and error details - Auto-refresh every 10 seconds ## API Endpoints ### Page Routes ``` GET / - Landing page GET /data-management/finance - Finance upload page POST /data-management/finance/upload - Upload file endpoint GET /data-management/finance/uploads - Get all uploads GET /data-management/finance/uploads/{upload_id} - Get specific upload ``` ### Upload Endpoint **POST** `/data-management/finance/upload` **Form Data:** - `file`: Excel file (.xlsx or .xls) - `description`: Optional description text **Response:** ```json { "success": true, "message": "File 'example.xlsx' uploaded successfully", "upload_id": "upload_20260306_174500", "filename": "20260306_174500_example.xlsx" } ``` **Error Response:** ```json { "detail": "Invalid file type. Only .xlsx and .xls files are allowed." } ``` ## File Storage Uploaded files are stored in: ``` /data/uploads/YYYYMMDD_HHMMSS_original_filename.xlsx ``` Example: ``` /data/uploads/20260306_174530_finance_report.xlsx ``` ## Airflow Integration The upload endpoint includes a placeholder for Airflow integration. To enable: 1. **Configure Airflow endpoint** in `app/routes/pages.py`: ```python airflow_url = "http://airflow-webserver:8080/api/v1/dags/{dag_id}/dagRuns" ``` 2. **Set DAG ID** for the finance processing job 3. **Uncomment the Airflow trigger code** in `trigger_airflow_job()` function 4. **Configure authentication** (username/password or API token) ### Example Airflow Integration ```python async def trigger_airflow_job(filepath: str, upload_id: str) -> str: import httpx airflow_url = "http://airflow-webserver:8080/api/v1/dags/finance_processing/dagRuns" headers = {"Content-Type": "application/json"} auth = ("airflow", "airflow") payload = { "conf": { "filepath": filepath, "upload_id": upload_id } } async with httpx.AsyncClient() as client: response = await client.post( airflow_url, json=payload, headers=headers, auth=auth ) response.raise_for_status() result = response.json() return result["dag_run_id"] ``` ## Upload Status Tracking Upload records include: - `id`: Unique identifier - `filename`: Original filename - `filepath`: Full path to saved file - `uploaded_at`: ISO timestamp - `description`: User-provided description - `status`: pending | processing | success | error - `job_id`: Airflow DAG run ID (when triggered) - `logs`: Error logs or processing information ## Database Storage Currently uses in-memory storage (`uploads_db` list). For production: 1. **Create database table:** ```sql CREATE TABLE uploads ( id VARCHAR(100) PRIMARY KEY, filename VARCHAR(255) NOT NULL, filepath VARCHAR(500) NOT NULL, uploaded_at TIMESTAMP NOT NULL, description TEXT, status VARCHAR(20) NOT NULL, job_id VARCHAR(100), logs TEXT, created_at TIMESTAMP DEFAULT NOW() ); ``` 2. **Replace in-memory storage** with SQLAlchemy model 3. **Add database queries** in route handlers ## Access URLs - **Local**: http://localhost:8040/apiservice/ - **Production**: https://ai.sriphat.com/apiservice/ ## Security Considerations 1. **File validation**: Only .xlsx and .xls files allowed 2. **Filename sanitization**: Spaces replaced with underscores 3. **Unique filenames**: Timestamp prefix prevents conflicts 4. **File size limits**: Configure in nginx/FastAPI if needed 5. **Authentication**: Add authentication middleware for production ## Future Enhancements 1. **Database persistence**: Replace in-memory storage 2. **File preview**: Show Excel data preview before processing 3. **Batch upload**: Support multiple file uploads 4. **Download processed files**: Allow downloading results 5. **User management**: Track uploads by user 6. **Email notifications**: Notify on processing completion 7. **Scheduled jobs**: Automatic periodic processing 8. **Data validation**: Validate Excel structure before processing ## Troubleshooting ### Upload fails with 500 error - Check `/data/uploads/` directory exists and is writable - Verify volume mount in docker-compose.yml - Check container logs: `docker logs apiservice` ### Files not persisting after container restart - Ensure volume mount is configured: `./data/uploads:/data/uploads` - Check file permissions on host directory ### Airflow job not triggering - Verify Airflow is accessible from container - Check network connectivity: `docker network inspect shared_data_network` - Verify Airflow credentials and DAG ID - Check logs in upload history ## Development To add new data management pages: 1. Create HTML template in `app/templates/` 2. Add route in `app/routes/pages.py` 3. Update landing page menu in `index.html` 4. Add submenu item if needed Example: ```python @router.get("/data-management/hr", response_class=HTMLResponse) async def hr_page(request: Request): return templates.TemplateResponse( "data_management_hr.html", {"request": request, "root_path": settings.ROOT_PATH} ) ```