# Edge GitOps - KServe on k3s with GPU
GitOps setup for deploying ML models using KServe on a k3s cluster with GPU support (DGX Spark).
## Prerequisites
- k3s cluster with GPU support
- kubectl configured to access the cluster
- Gitea instance for GitOps repository
- FluxCD CLI installed
## Architecture
```
edge-gitops/
├── clusters/
│   └── k3s-dgx/
│       ├── flux-system/     # FluxCD installation
│       ├── gpu-support/     # NVIDIA GPU Operator
│       ├── kserve/          # KServe installation
│       └── apps/            # ML model deployments
├── apps/                    # Reusable app manifests
└── infrastructure/          # Base infrastructure
```
## Setup Instructions
### 1. Bootstrap FluxCD
```bash
flux bootstrap git \
  --url=ssh://git@gitea.example.com/edge-gitops/edge-gitops.git \
  --branch=main \
  --path=clusters/k3s-dgx \
  --components=source-controller,kustomize-controller,helm-controller,notification-controller
```
### 2. Configure Gitea SSH Key
Generate an SSH key for FluxCD:
```bash
ssh-keygen -t ed25519 -N "" -f flux-gitea-key
```
Add the public key to your Gitea repository as a deploy key.
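If FluxCD was bootstrapped before this key existed, one option is to store the private key in the cluster yourself with the Flux CLI. A sketch, assuming the default secret name `flux-system` (which the bootstrap-generated `GitRepository` references) and the example Gitea URL from step 1:

```shell
# Store the SSH private key as the Git credentials secret Flux uses.
# "flux-system" is the secret name the bootstrapped GitRepository expects by default.
flux create secret git flux-system \
  --url=ssh://git@gitea.example.com/edge-gitops/edge-gitops.git \
  --private-key-file=flux-gitea-key \
  --namespace=flux-system
```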
### 3. Update Repository Configuration
Edit `clusters/k3s-dgx/flux-system/gotk-sync.yaml` to match your Gitea URL:
```yaml
url: ssh://git@your-gitea-instance.com/edge-gitops/edge-gitops.git
```
### 4. Deploy the Stack
Commit and push the changes:
```bash
git add .
git commit -m "Initial GitOps setup for KServe on k3s"
git push origin main
```
FluxCD will automatically sync the changes to your cluster.
## Components
### GPU Support
- NVIDIA GPU Operator (v23.9.1)
- NVIDIA Device Plugin
- DCGM Exporter for monitoring
- GPU Node Feature Discovery
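To confirm the GPU stack is healthy after Flux reconciles it, a quick check (assuming the operator runs in the conventional `gpu-operator` namespace; adjust if your kustomization installs it elsewhere):

```shell
# Operator pods (device plugin, DCGM exporter, NFD workers) should all be Running
kubectl get pods -n gpu-operator

# The node should now advertise allocatable nvidia.com/gpu resources
kubectl describe node | grep nvidia.com/gpu
```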
### KServe
- KServe Core (v0.12.0)
- GPU-enabled Serving Runtime
- Istio Gateway for networking
- Model Storage (PVC)
### Example Model
- Huihui-granite-4.1-30b-abliterated (Hugging Face)
- GPU-accelerated inference
- REST API endpoint
## Usage
### Deploy a New Model
1. Create a new InferenceService in `clusters/k3s-dgx/apps/`:
```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: your-model
  namespace: kserve
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: "hf://your-org/your-model"
      resources:
        limits:
          nvidia.com/gpu: "1"
```
2. Commit and push changes
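Once Flux has reconciled the commit, you can watch the new service come up. A sketch, using the placeholder name `your-model` from the manifest above:

```shell
# Wait for the InferenceService to report READY=True
kubectl get inferenceservice your-model -n kserve -w

# If the predictor pod never appears, inspect the events
kubectl describe inferenceservice your-model -n kserve
```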
### Test the Model
```bash
# Get the service URL
kubectl get inferenceservice huihui-granite -n kserve

# Test inference
curl -X POST http://your-service-url/v1/models/huihui-granite:predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "text", "shape": [1], "datatype": "BYTES", "data": ["Hello world"]}]}'
```
## Monitoring
Check FluxCD status:
```bash
flux get all --all-namespaces
```
Check GPU status:
```bash
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'
```
Check KServe services:
```bash
kubectl get inferenceservices -n kserve
```
## Troubleshooting
### GPU Not Available
```bash
kubectl describe node | grep -A 5 nvidia.com/gpu
```
### KServe Pods Not Starting
```bash
kubectl logs -n kserve deployment/kserve-controller-manager
kubectl get pods -n kserve
```
### FluxCD Sync Issues
```bash
flux reconcile kustomization flux-system --with-source
flux logs
```
## Customization
### GPU Resources
Edit `clusters/k3s-dgx/apps/huihui-granite-inference.yaml` to adjust GPU allocation.
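For example, to pin the predictor to two GPUs, adjust the `resources` block of the InferenceService. A sketch (field names follow the KServe `InferenceService` schema; the values are illustrative):

```yaml
spec:
  predictor:
    model:
      resources:
        requests:
          nvidia.com/gpu: "2"   # request and limit should match for GPUs
        limits:
          nvidia.com/gpu: "2"
```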
### Storage
Modify `clusters/k3s-dgx/kserve/model-storage-pvc.yaml` for different storage requirements.
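A typical change is resizing the volume or switching the storage class. A sketch with illustrative values (`local-path` is the default provisioner shipped with k3s; large LLM weights need generous headroom):

```yaml
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
```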
### Networking
Update `clusters/k3s-dgx/kserve/istio-gateway.yaml` for custom ingress configuration.
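For instance, to serve models on a dedicated hostname, the Gateway's `hosts` list can be changed. A sketch (the hostname is hypothetical; the selector matches the default Istio ingress gateway labels):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: kserve-gateway
  namespace: kserve
spec:
  selector:
    istio: ingressgateway   # default Istio ingress gateway deployment
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "models.example.com"   # hypothetical hostname, replace with yours
```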