# Edge GitOps - KServe on k3s with GPU

GitOps setup for deploying ML models using KServe on a k3s cluster with GPU support (DGX Spark).

## Prerequisites

- k3s cluster with GPU support
- kubectl configured to access the cluster
- Gitea instance for the GitOps repository
- FluxCD CLI installed

## Architecture

```
edge-gitops/
├── clusters/
│   └── k3s-dgx/
│       ├── flux-system/     # FluxCD installation
│       ├── gpu-support/     # NVIDIA GPU Operator
│       ├── kserve/          # KServe installation
│       └── apps/            # ML model deployments
├── apps/                    # Reusable app manifests
└── infrastructure/          # Base infrastructure
```

## Setup Instructions

### 1. Bootstrap FluxCD

```bash
flux bootstrap git \
  --url=ssh://git@gitea.example.com/edge-gitops/edge-gitops.git \
  --branch=main \
  --path=clusters/k3s-dgx \
  --components=source-controller,kustomize-controller,helm-controller,notification-controller
```

### 2. Configure Gitea SSH Key

Generate an SSH key for FluxCD:

```bash
ssh-keygen -t ed25519 -N "" -f flux-gitea-key
```

Add the public key to your Gitea repository as a deploy key. Bootstrap commits the FluxCD manifests back into the repository, so grant the key write access.

### 3. Update Repository Configuration

Edit `clusters/k3s-dgx/flux-system/gotk-sync.yaml` to match your Gitea URL:

```yaml
url: ssh://git@your-gitea-instance.com/edge-gitops/edge-gitops.git
```

### 4. Deploy the Stack

Commit and push the changes:

```bash
git add .
git commit -m "Initial GitOps setup for KServe on k3s"
git push origin main
```

FluxCD will automatically sync the changes to your cluster.

## Components

### GPU Support

- NVIDIA GPU Operator (v23.9.1)
- NVIDIA Device Plugin
- DCGM Exporter for monitoring
- GPU Node Feature Discovery

### KServe

- KServe Core (v0.12.0)
- GPU-enabled serving runtime
- Istio Gateway for networking
- Model storage (PVC)

### Example Model

- Huihui-granite-4.1-30b-abliterated (Hugging Face)
- GPU-accelerated inference
- REST API endpoint

## Usage

### Deploy a New Model
1. Create a new `InferenceService` in `clusters/k3s-dgx/apps/`:

   ```yaml
   apiVersion: serving.kserve.io/v1beta1
   kind: InferenceService
   metadata:
     name: your-model
     namespace: kserve
   spec:
     predictor:
       model:
         modelFormat:
           name: huggingface
         storageUri: "hf://your-org/your-model"
         resources:
           limits:
             nvidia.com/gpu: "1"
   ```

2. Commit and push the changes.

### Test the Model

```bash
# Get the service URL
kubectl get inferenceservice huihui-granite -n kserve

# Test inference
curl -X POST http://your-service-url/v1/models/huihui-granite:predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "text", "shape": [1], "datatype": "BYTES", "data": ["Hello world"]}]}'
```

## Monitoring

Check FluxCD status:

```bash
flux get all --all-namespaces
```

Check GPU status:

```bash
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'
```

Check KServe services:

```bash
kubectl get inferenceservices -n kserve
```

## Troubleshooting

### GPU Not Available

```bash
kubectl describe node | grep -A 5 nvidia.com/gpu
```

### KServe Pods Not Starting

```bash
kubectl logs -n kserve deployment/kserve-controller-manager
kubectl get pods -n kserve
```

### FluxCD Sync Issues

```bash
flux reconcile kustomization flux-system --with-source
flux logs
```

## Customization

### GPU Resources

Edit `clusters/k3s-dgx/apps/huihui-granite-inference.yaml` to adjust GPU allocation.

### Storage

Modify `clusters/k3s-dgx/kserve/model-storage-pvc.yaml` for different storage requirements.

### Networking

Update `clusters/k3s-dgx/kserve/istio-gateway.yaml` for custom ingress configuration.
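## Example Client

The curl-based inference test above can be wrapped in a small Python client. This is a minimal sketch, not part of the deployed stack: it mirrors the `inputs` request body from the curl example, and the service URL and model name are placeholders you must replace with the values reported by `kubectl get inferenceservice`.

```python
import json
import urllib.request


def build_predict_payload(texts):
    """Build the JSON body from the curl example: a single BYTES
    tensor named "text" holding the input strings."""
    return {
        "inputs": [
            {
                "name": "text",
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": list(texts),
            }
        ]
    }


def predict(base_url, model_name, texts, timeout=60):
    """POST the payload to <base_url>/v1/models/<model_name>:predict
    and return the decoded JSON response. base_url and model_name are
    assumptions -- take them from `kubectl get inferenceservice`."""
    req = urllib.request.Request(
        f"{base_url}/v1/models/{model_name}:predict",
        data=json.dumps(build_predict_payload(texts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


# Payload construction alone needs no cluster access:
payload = build_predict_payload(["Hello world"])
print(json.dumps(payload, indent=2))
```

Calling `predict("http://your-service-url", "huihui-granite", ["Hello world"])` then performs the same request as the curl command, against a reachable cluster.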