
Edge GitOps - KServe on k3s with GPU

GitOps setup for deploying ML models using KServe on a k3s cluster with GPU support (DGX Spark).

Prerequisites

  • k3s cluster with GPU support
  • kubectl configured to access the cluster
  • Gitea instance for GitOps repository
  • FluxCD CLI installed

Architecture

edge-gitops/
├── clusters/
│   └── k3s-dgx/
│       ├── flux-system/          # FluxCD installation
│       ├── gpu-support/          # NVIDIA GPU Operator
│       ├── kserve/               # KServe installation
│       └── apps/                 # ML model deployments
├── apps/                        # Reusable app manifests
└── infrastructure/              # Base infrastructure

Setup Instructions

1. Bootstrap FluxCD

flux bootstrap git \
  --url=ssh://git@gitea.example.com/edge-gitops/edge-gitops.git \
  --branch=main \
  --path=clusters/k3s-dgx \
  --components=source-controller,kustomize-controller,helm-controller,notification-controller

2. Configure Gitea SSH Key

Generate SSH key for FluxCD:

ssh-keygen -t ed25519 -N "" -f flux-gitea-key

Add the public key (flux-gitea-key.pub) to your Gitea repository as a deploy key. Enable write access on the key: the bootstrap process pushes the FluxCD manifests back to the repository.

3. Update Repository Configuration

Edit clusters/k3s-dgx/flux-system/gotk-sync.yaml to match your Gitea URL:

url: ssh://git@your-gitea-instance.com/edge-gitops/edge-gitops.git
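For orientation, this is roughly what the GitRepository object in gotk-sync.yaml looks like after editing. flux bootstrap generates it for you; the interval and secret name below are the usual defaults, so keep whatever your bootstrap wrote:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: main
  secretRef:
    name: flux-system   # SSH key secret created during bootstrap
  url: ssh://git@your-gitea-instance.com/edge-gitops/edge-gitops.git
```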

4. Deploy the Stack

Commit and push the changes:

git add .
git commit -m "Initial GitOps setup for KServe on k3s"
git push origin main

FluxCD will automatically sync the changes to your cluster.
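The sync is driven by the Flux Kustomization that points at the cluster path. A sketch of that object as flux bootstrap typically generates it (interval and prune setting assumed):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./clusters/k3s-dgx   # everything under this path is applied
  prune: true                # delete cluster objects removed from Git
  sourceRef:
    kind: GitRepository
    name: flux-system
```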

Components

GPU Support

  • NVIDIA GPU Operator (v23.9.1)
  • NVIDIA Device Plugin
  • DCGM Exporter for monitoring
  • GPU Node Feature Discovery
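A minimal sketch of how the GPU Operator could be declared under clusters/k3s-dgx/gpu-support/, assuming it is installed via a Flux HelmRelease from NVIDIA's official chart repository. The namespace and values are assumptions; check your Flux version for the exact HelmRelease API version:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: nvidia
  namespace: gpu-operator
spec:
  interval: 1h
  url: https://helm.ngc.nvidia.com/nvidia
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: gpu-operator
  namespace: gpu-operator
spec:
  interval: 30m
  chart:
    spec:
      chart: gpu-operator
      version: v23.9.1
      sourceRef:
        kind: HelmRepository
        name: nvidia
  values:
    driver:
      enabled: false   # DGX images typically ship the NVIDIA driver already
```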

KServe

  • KServe Core (v0.12.0)
  • GPU-enabled Serving Runtime
  • Istio Gateway for networking
  • Model Storage (PVC)
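
A sketch of the model storage claim referenced above (clusters/k3s-dgx/kserve/model-storage-pvc.yaml). The size and storage class are assumptions; k3s ships a local-path provisioner by default:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-storage
  namespace: kserve
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 200Gi   # size this for your model weights
```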

Example Model

  • Huihui-granite-4.1-30b-abliterated (Hugging Face)
  • GPU-accelerated inference
  • REST API endpoint

Usage

Deploy a New Model

  1. Create a new InferenceService in clusters/k3s-dgx/apps/:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: your-model
  namespace: kserve
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: "hf://your-org/your-model"
      resources:
        limits:
          nvidia.com/gpu: "1"

  2. Commit and push the changes.
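
If the apps/ directory is assembled by a kustomization.yaml, the new manifest also needs an entry there before Flux will apply it. A sketch, with file names assumed (huihui-granite-inference.yaml matches the file mentioned under Customization below):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - huihui-granite-inference.yaml
  - your-model.yaml   # the InferenceService added in step 1
```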

Test the Model

# Get the service URL
kubectl get inferenceservice huihui-granite -n kserve

# Test inference
curl -X POST http://your-service-url/v1/models/huihui-granite:predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "text", "shape": [1], "datatype": "BYTES", "data": ["Hello world"]}]}'

Monitoring

Check FluxCD status:

flux get all --all-namespaces

Check GPU status:

kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'

Check KServe services:

kubectl get inferenceservices -n kserve

Troubleshooting

GPU Not Available

kubectl describe node | grep -A 5 nvidia.com/gpu
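
To confirm the scheduler can actually place GPU workloads, a throwaway smoke-test pod can help. The image tag is an assumption; any CUDA base image that includes nvidia-smi will do:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: nvidia-smi
      image: nvcr.io/nvidia/cuda:12.3.2-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: "1"
```

Apply it with kubectl, then check kubectl logs gpu-smoke-test: if the device plugin is healthy, the log shows the GPU table; if the pod stays Pending, the node is not advertising nvidia.com/gpu.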

KServe Pods Not Starting

kubectl logs -n kserve deployment/kserve-controller-manager
kubectl get pods -n kserve

FluxCD Sync Issues

flux reconcile kustomization flux-system --with-source
flux logs

Customization

GPU Resources

Edit clusters/k3s-dgx/apps/huihui-granite-inference.yaml to adjust GPU allocation.

Storage

Modify clusters/k3s-dgx/kserve/model-storage-pvc.yaml for different storage requirements.

Networking

Update clusters/k3s-dgx/kserve/istio-gateway.yaml for custom ingress configuration.
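
A sketch of what that Gateway might look like; the gateway name, selector, and host pattern are assumptions to adapt to your Istio installation:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: kserve-gateway
  namespace: kserve
spec:
  selector:
    istio: ingressgateway   # matches the default Istio ingress deployment
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*.models.example.com"
```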
