#Docker #Kubernetes #Cloud #AI/ML #DevOps

Docker Kanvas and Kubernetes 1.33: Bridging Local Dev to AI/ML Production

webhani

Two infrastructure developments are worth paying attention to this month: Docker launched Kanvas, a platform designed to automate the transition from Docker Compose to Kubernetes, and Kubernetes 1.33 shipped with meaningful improvements to GPU resource management for AI/ML workloads.

Neither is a silver bullet, but both address real friction points that infrastructure teams deal with regularly.

Docker Kanvas: Compose → Kubernetes Without the Manual Work

The problem Kanvas targets is familiar: you run docker compose up locally and everything works. Then you deploy to Kubernetes and spend hours reconciling configuration differences. Compose and Kubernetes have fundamentally different design philosophies — single-host orchestration versus multi-node distributed systems — and the translation between them has always been manual, error-prone work.

Tools like kompose have existed for years, but they produce rough output that requires significant cleanup. Kanvas takes a more opinionated approach, generating production-ready Kubernetes artifacts and providing a visual interface for managing the configuration.

# Starting point: docker-compose.yml
services:
  api:
    build: ./api
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://postgres:password@db:5432/myapp
      - NODE_ENV=production
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
 
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: password
      POSTGRES_DB: myapp
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      retries: 5
 
volumes:
  pgdata:

Kanvas parses this and generates a Deployment, Service, ConfigMap, Secret, and PersistentVolumeClaim. The generated output is a starting point, not a final answer:

# Kanvas output: Deployment (abbreviated)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: myregistry/api:latest
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 30
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
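Alongside the Deployment, Kanvas emits a Service for each Compose service that publishes ports. A sketch of what that Service plausibly looks like for this example (field values are inferred from the Compose file above, not confirmed Kanvas output):

# Kanvas output: Service (sketch — values inferred from the example)
apiVersion: v1
kind: Service
metadata:
  name: api
  labels:
    app: api
spec:
  type: ClusterIP
  selector:
    app: api
  ports:
    - name: http
      port: 3000
      targetPort: 3000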

What you'll still need to configure manually: security context (non-root user), network policies, pod disruption budgets, and proper secret management (Sealed Secrets or External Secrets Operator rather than plain Kubernetes Secrets). Kanvas reduces the boilerplate substantially but doesn't replace Kubernetes expertise.
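The security-context hardening, for instance, is a small patch layered onto the generated container spec. A minimal sketch (the UID and specific settings are illustrative choices, not Kanvas defaults):

# Manual hardening to add to the generated container spec (illustrative values)
securityContext:
  runAsNonRoot: true
  runAsUser: 10001
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]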

Kubernetes 1.33: GPU Sharing for AI Workloads

The more immediately impactful change for AI/ML teams is Kubernetes 1.33's GPU resource management improvements. The core addition is proper support for GPU sharing — multiple workloads splitting a single GPU rather than requiring exclusive access.

Under the traditional model, a small inference service that uses 10% of a GPU's capacity still reserves the entire device. For teams running dozens of inference endpoints, that's significant waste. Kubernetes 1.33 with NVIDIA's updated device plugin addresses this:

# Kubernetes 1.33: GPU time-slicing configuration
# Applied via ConfigMap to the NVIDIA device plugin
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4  # Expose each GPU as 4 virtual GPUs
---
# Inference pod requesting 1/4 of a GPU
apiVersion: v1
kind: Pod
metadata:
  name: text-embedder
spec:
  containers:
    - name: embedder
      image: myapp/embeddings:latest
      resources:
        limits:
          nvidia.com/gpu: 1  # Gets 1 virtual GPU = 1/4 physical GPU
        requests:
          memory: "4Gi"
          cpu: "2"

The 60% cost reduction figure cited in benchmark reports assumes you're currently running many small inference workloads on dedicated GPUs. Your actual savings depend on workload profiles — bursty batch jobs benefit less from time-slicing than steady low-utilization services.

GPU Observability with Standard Metrics

Kubernetes 1.33 also improves GPU metrics exposure. Paired with NVIDIA's DCGM Exporter, GPU utilization data becomes available in standard Prometheus format:

# ServiceMonitor for GPU metrics collection
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: dcgm-exporter
  endpoints:
    - port: metrics
      interval: 15s

With this in place, you can identify underutilized GPUs and make data-driven scaling decisions:

# Find inference pods with GPU utilization below 20%
# (candidates for consolidation or spot instance migration)
avg by (pod, namespace) (
  DCGM_FI_DEV_GPU_UTIL{namespace=~"ml-.*"}
) < 20
 
# Alert when GPU memory pressure is high
avg by (node) (
  DCGM_FI_DEV_FB_USED / DCGM_FI_DEV_FB_TOTAL * 100
) > 85
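Wrapped in a PrometheusRule, the memory-pressure query above becomes an actual alert. A sketch (rule names, the 10-minute window, and severity label are illustrative choices):

# PrometheusRule firing on sustained GPU memory pressure (illustrative)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-alerts
  namespace: monitoring
spec:
  groups:
    - name: gpu
      rules:
        - alert: GPUMemoryPressure
          expr: avg by (node) (DCGM_FI_DEV_FB_USED / DCGM_FI_DEV_FB_TOTAL * 100) > 85
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "GPU framebuffer usage above 85% on {{ $labels.node }}"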

Kanvas and the Helm vs. Kustomize Question

Docker Kanvas generates raw Kubernetes manifests. How you manage those manifests post-generation is a separate decision:

Kanvas + Kustomize works well for teams that want environment-specific overlays without a templating language. The generated manifests become the base layer; overlays handle staging vs. production differences:

k8s/
├── base/           # Kanvas output
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── staging/
    │   └── kustomization.yaml  # patch replicas, image tag
    └── production/
        └── kustomization.yaml  # patch resources, add HPA
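A staging overlay in the tree above stays small. A minimal kustomization.yaml sketch (the image tag and replica count are placeholder values):

# overlays/staging/kustomization.yaml (sketch — tag and count are placeholders)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: myregistry/api
    newTag: staging-latest
replicas:
  - name: api
    count: 1

Rendering with kubectl kustomize overlays/staging (or kustomize build) shows the merged manifests before anything is applied.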

Kanvas + Helm makes sense if you're packaging infrastructure for reuse across multiple projects or distributing it as a chart. The conversion from generated manifests to Helm templates adds overhead but pays off at scale.
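Converting generated manifests to a chart mostly means lifting hard-coded values into templates. A sketch of the kind of substitution involved (the values-file layout is an assumption, not Kanvas output):

# templates/deployment.yaml excerpt after Helm conversion (illustrative)
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: api
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          resources:
            {{- toYaml .Values.resources | nindent 12 }}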

For most internal applications, start with Kustomize — it's simpler and Kanvas's output fits naturally into the base/overlay model.

Practical Adoption Guidance

Consider Docker Kanvas if:

  • Your team uses Docker Compose for local development and Kubernetes for production, and the gap causes regular deployment friction
  • Kubernetes YAML is currently maintained by one or two people and hasn't scaled to the broader team

Leverage Kubernetes 1.33 GPU sharing if:

  • You're running multiple small-to-medium inference services and GPU costs are a visible budget line
  • Your workloads have predictable, steady utilization profiles (time-slicing works poorly for highly variable GPU demand)

Skip for now if:

  • Kanvas: your Kubernetes setup is already well-managed with Helm/Kustomize — adding another layer won't help
  • GPU sharing: your inference workloads are large and GPU-saturating — sharing adds overhead without benefit

Both tools address real infrastructure problems. The value depends almost entirely on whether the specific problem they solve matches your current pain points.