Edge GitOps - KServe on k3s with GPU
GitOps setup for deploying ML models using KServe on a k3s cluster with GPU support (DGX Spark).
Prerequisites
- k3s cluster with GPU support
- kubectl configured to access the cluster
- Gitea instance for GitOps repository
- FluxCD CLI installed
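A quick sanity check before starting; this is a minimal sketch that assumes kubectl and the flux CLI are already on your PATH and that you have shell access to the DGX node:
# Confirm cluster access and run FluxCD's pre-flight checks
kubectl get nodes -o wide
flux check --pre
# On the GPU node itself, confirm the NVIDIA driver is healthy
nvidia-smi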
Architecture
edge-gitops/
├── clusters/
│   └── k3s-dgx/
│       ├── flux-system/   # FluxCD installation
│       ├── gpu-support/   # NVIDIA GPU Operator
│       ├── kserve/        # KServe installation
│       └── apps/          # ML model deployments
├── apps/                  # Reusable app manifests
└── infrastructure/        # Base infrastructure
Setup Instructions
1. Bootstrap FluxCD
flux bootstrap git \
  --url=ssh://git@gitea.example.com/edge-gitops/edge-gitops.git \
  --branch=main \
  --path=clusters/k3s-dgx \
  --components=source-controller,kustomize-controller,helm-controller,notification-controller
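Once the bootstrap completes, verify that the controllers came up cleanly:
flux check
kubectl get pods -n flux-system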
2. Configure Gitea SSH Key
Generate an SSH key for FluxCD:
ssh-keygen -t ed25519 -N "" -f flux-gitea-key
Add the public key to your Gitea repository as a deploy key.
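If you want FluxCD to authenticate with this specific key rather than one it generates itself, the bootstrap command from step 1 also accepts it directly (same flags as above, plus the key file):
flux bootstrap git \
  --url=ssh://git@gitea.example.com/edge-gitops/edge-gitops.git \
  --branch=main \
  --path=clusters/k3s-dgx \
  --private-key-file=./flux-gitea-key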
3. Update Repository Configuration
Edit clusters/k3s-dgx/flux-system/gotk-sync.yaml to match your Gitea URL:
url: ssh://git@your-gitea-instance.com/edge-gitops/edge-gitops.git
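For orientation, gotk-sync.yaml pairs a GitRepository (where to pull from) with a Kustomization (which path to apply). The sketch below reflects what flux bootstrap typically generates; the URL is the only field that normally needs editing, and intervals and API versions assume a current Flux release:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: main
  secretRef:
    name: flux-system
  url: ssh://git@your-gitea-instance.com/edge-gitops/edge-gitops.git
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./clusters/k3s-dgx
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system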
4. Deploy the Stack
Commit and push the changes:
git add .
git commit -m "Initial GitOps setup for KServe on k3s"
git push origin main
FluxCD will automatically sync the changes to your cluster.
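You can follow the initial reconciliation from the CLI:
flux get kustomizations --watch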
Components
GPU Support
- NVIDIA GPU Operator (v23.9.1)
- NVIDIA Device Plugin
- DCGM Exporter for monitoring
- GPU Node Feature Discovery
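Under the hood this is typically expressed as a HelmRepository plus HelmRelease that FluxCD reconciles. A minimal sketch, including the k3s-specific containerd paths the GPU Operator needs (names, namespace, and values are illustrative):
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: nvidia
  namespace: gpu-operator
spec:
  interval: 1h
  url: https://helm.ngc.nvidia.com/nvidia
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: gpu-operator
  namespace: gpu-operator
spec:
  interval: 30m
  chart:
    spec:
      chart: gpu-operator
      version: v23.9.1
      sourceRef:
        kind: HelmRepository
        name: nvidia
  values:
    # k3s ships its own containerd, so point the toolkit at it
    toolkit:
      env:
        - name: CONTAINERD_CONFIG
          value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml
        - name: CONTAINERD_SOCKET
          value: /run/k3s/containerd/containerd.sock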
KServe
- KServe Core (v0.12.0)
- GPU-enabled Serving Runtime
- Istio Gateway for networking
- Model Storage (PVC)
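One way to express the KServe install declaratively is a HelmRelease against the upstream OCI charts. Treat the sketch below as illustrative, since the repository may pin KServe differently; the CRD chart (kserve-crd) is assumed to be installed by a sibling HelmRelease:
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: kserve-charts
  namespace: kserve
spec:
  type: oci
  interval: 1h
  url: oci://ghcr.io/kserve/charts
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kserve
  namespace: kserve
spec:
  interval: 30m
  chart:
    spec:
      chart: kserve
      version: v0.12.0
      sourceRef:
        kind: HelmRepository
        name: kserve-charts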
Example Model
- Huihui-granite-4.1-30b-abliterated (Hugging Face)
- GPU-accelerated inference
- REST API endpoint
Usage
Deploy a New Model
- Create a new InferenceService in clusters/k3s-dgx/apps/:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: your-model
  namespace: kserve
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: "hf://your-org/your-model"
      resources:
        limits:
          nvidia.com/gpu: "1"
- Commit and push the changes, then watch the rollout as shown below
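After the push, either wait for the next sync interval or trigger one, then watch the new InferenceService come up (the name matches the manifest above):
flux reconcile kustomization flux-system --with-source
kubectl get inferenceservice your-model -n kserve -w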
Test the Model
# Get the service URL
kubectl get inferenceservice huihui-granite -n kserve
# Test inference
curl -X POST http://your-service-url/v2/models/huihui-granite/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "text", "shape": [1], "datatype": "BYTES", "data": ["Hello world"]}]}'
Monitoring
Check FluxCD status:
flux get all --all-namespaces
Check GPU status:
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'
Check KServe services:
kubectl get inferenceservices -n kserve
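The DCGM Exporter listed under GPU Support serves Prometheus metrics on port 9400; a quick way to eyeball GPU utilization without a full monitoring stack (service name and namespace may differ in your install):
kubectl port-forward -n gpu-operator svc/nvidia-dcgm-exporter 9400:9400
curl -s http://localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL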
Troubleshooting
GPU Not Available
kubectl describe node | grep -A 5 nvidia.com/gpu
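If the resource is missing, the GPU Operator pods are the usual suspects (namespace assumed to be gpu-operator):
kubectl get pods -n gpu-operator
kubectl logs -n gpu-operator -l app=nvidia-device-plugin-daemonset --tail=50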
KServe Pods Not Starting
kubectl logs -n kserve deployment/kserve-controller-manager
kubectl get pods -n kserve
FluxCD Sync Issues
flux reconcile kustomization flux-system --with-source
flux logs
Customization
GPU Resources
Edit clusters/k3s-dgx/apps/huihui-granite-inference.yaml to adjust GPU allocation.
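The relevant block is the predictor's resources; for example, to pin CPU and memory alongside the GPU (values are illustrative):
resources:
  requests:
    cpu: "4"
    memory: 32Gi
    nvidia.com/gpu: "1"
  limits:
    cpu: "8"
    memory: 64Gi
    nvidia.com/gpu: "1"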
Storage
Modify clusters/k3s-dgx/kserve/model-storage-pvc.yaml for different storage requirements.
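A minimal sketch of such a PVC, assuming the k3s default local-path storage class (name, size, and class are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-storage
  namespace: kserve
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 200Gi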
Networking
Update clusters/k3s-dgx/kserve/istio-gateway.yaml for custom ingress configuration.
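For reference, a basic Istio Gateway that exposes plain HTTP through the default ingress gateway looks roughly like this (hosts and TLS intentionally simplified):
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: kserve-gateway
  namespace: kserve
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"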