CI/CD & DevOps

This document outlines the current state and roadmap for CI/CD pipelines, deployment automation, and developer experience tools.


Current CI/CD Status

Pipeline Infrastructure:

  • GitHub Actions - 12 workflow files orchestrating build, test, deploy
  • Automated Testing - C# unit/integration tests, frontend tests with coverage reporting
  • Automated Deployment - .NET Aspire deployment to Azure Container Apps
  • Release Automation - Automatic GitHub Releases with artifacts and API changelogs
  • Security Scanning - Weekly dependency vulnerability scans (.NET + npm)
  • Dependency Management - Dependabot for NuGet packages (npm not configured)

Workflows Overview:

| Workflow | Trigger | Purpose | Status |
|---|---|---|---|
| ci.yml | Push to main | Full CI pipeline: build, test, package, release | ✅ Operational |
| pr-validation.yml | Pull requests | PR validation with OpenAPI diff, tests, linting | ✅ Operational |
| deploy-release.yml | Release/manual | Deploy to Azure (staging/production) | ✅ Operational |
| utl-security-scan.yml | Weekly/push/PR | Dependency vulnerability scanning | ✅ Operational |
| _dotnet-build.yml | Reusable | Shared .NET build workflow | ✅ Operational |
| _dotnet-test.yml | Reusable | Shared .NET test workflow | ✅ Operational |
| pr-labeler.yml | PR events | Auto-label PRs by file paths | ✅ Operational |
| todo-to-issue.yml | Push | Convert TODO comments to GitHub issues | ✅ Operational |
| utl-delete-old-workflow-runs.yml | Schedule | Clean up old workflow runs | ✅ Operational |
| utl-sync-develop.yml | Utility | Sync develop branch with main | ✅ Operational |
| publish-release.yml | Utility | Publish release to GitHub | ✅ Operational |

CI Pipeline (ci.yml) - Main Branch

Build Jobs (Parallel Execution)

  1. Platform API Build (~10 min):
    • .NET 10 build with Nerdbank.GitVersioning
    • OpenAPI spec generation validation
    • Artifacts: platform build output, OpenAPI spec
  2. Edge Gateway Build & Test (~10 min):
    • Gateway service build
    • Gateway unit + behavior tests
  3. Frontend Build & Test (~10 min):
    • platform-client-js build (TypeScript SDK)
    • iot-frontend type check, unit tests (Vitest), build
    • Test coverage upload
  4. Documentation Build (~10 min):
    • VitePress documentation site build
    • Linting (runs with continue-on-error, so lint failures do not fail the job)

Test Jobs (Sequential after Platform Build)

  • Platform Unit Tests (~15 min): Full platform test suite
  • Platform Integration Tests (Parallel, ~20 min each):
    • Data layer (Testcontainers.PostgreSql)
    • Functions (Azure Functions)
    • Migration service

Code Generation & Validation

  • TypeScript Client Generation:
    • Auto-generates TypeScript SDK from OpenAPI spec
    • Checks if generated client is in sync with repository
    • Creates PR if out of sync (automated/update-api-clients branch)
  • npm Package Build:
    • Versions package with Nerdbank.GitVersioning
    • Builds and packs @voltimax/platform-client-js
    • Uploads .tgz to GitHub Releases

API Changelog Generation

  • OpenAPI Spec Comparison:
    • Compares current spec with previous release tag
    • Uses oasdiff to detect breaking changes
    • Generates markdown changelog and breaking changes report
    • Attaches to GitHub Release
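
oasdiff does the real comparison. As a rough illustration of one class of breaking change it reports, the TypeScript sketch below flags operations that exist in the previous spec but are gone from the current one. The `OpenApiDoc` type, `findRemovedOperations`, and the `/sites` paths are hypothetical; oasdiff also detects parameter, schema, and response changes that this sketch ignores.

```typescript
// Only the part of an OpenAPI document this sketch reads: path → path item.
type OpenApiDoc = {
  paths: Record<string, Record<string, unknown>>;
};

// HTTP methods that can appear as operations on an OpenAPI 3.x path item.
const HTTP_METHODS = ["get", "put", "post", "delete", "options", "head", "patch", "trace"];

// Returns "METHOD /path" for each operation present in `previous` but missing in `current`.
function findRemovedOperations(previous: OpenApiDoc, current: OpenApiDoc): string[] {
  const removed: string[] = [];
  for (const path of Object.keys(previous.paths)) {
    const oldItem = previous.paths[path];
    const newItem = current.paths[path];
    for (const method of HTTP_METHODS) {
      if (oldItem === undefined || !(method in oldItem)) continue;
      if (newItem === undefined || !(method in newItem)) {
        removed.push(`${method.toUpperCase()} ${path}`);
      }
    }
  }
  return removed;
}

// Hypothetical specs: POST /sites and GET /sites/{id} were removed.
const previousSpec: OpenApiDoc = { paths: { "/sites": { get: {}, post: {} }, "/sites/{id}": { get: {} } } };
const currentSpec: OpenApiDoc = { paths: { "/sites": { get: {} } } };
const removedOps = findRemovedOperations(previousSpec, currentSpec); // ["POST /sites", "GET /sites/{id}"]
```

A non-empty result is what would be surfaced in the "breaking changes" part of the changelog.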

Release Creation

  • GitHub Release (triggered on success):
    • Tags version (e.g., v1.0.0-beta.1) using GitVersioning
    • Creates draft release with:
      • API changelog (breaking changes + full diff)
      • OpenAPI spec JSON files
      • npm package (.tgz)
      • Frontend dist (.tar.gz)
      • Docs dist (.tar.gz)
    • Publishes release automatically

Deployment

  • Staging Deployment (after release):
    • Calls deploy-release.yml workflow
    • Deploys to Azure staging environment

Total CI Time: ~25-30 minutes (with parallel execution)


PR Validation (pr-validation.yml)

Change Detection

  • Uses dorny/paths-filter to detect changed files
  • Selective job execution based on changed paths:
    • Platform changes → run platform jobs
    • Gateway changes → run gateway jobs
    • Frontend changes → run frontend jobs
    • OpenAPI changes → run client validation
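
The routing rule is simple: a job group runs if at least one changed file falls under one of its paths. The sketch below models that rule in TypeScript with plain path prefixes instead of the glob patterns dorny/paths-filter actually evaluates. The gateway and frontend prefixes follow the Dockerfile locations listed under Docker & Containerization; the `platform` and `openapi` prefixes are hypothetical placeholders.

```typescript
// Job group → path prefixes that should trigger it.
// Gateway/frontend prefixes follow the Dockerfile locations in this repo;
// the platform and openapi prefixes are hypothetical placeholders.
const FILTERS: Record<string, string[]> = {
  platform: ["src/Voltimax.Platform"],
  gateway: ["src/Voltimax.Edge.Gateway.Service/"],
  frontend: ["src/@voltimax/iot-frontend/"],
  openapi: ["openapi/"],
};

// Returns the job groups with at least one changed file under one of their prefixes.
function selectJobs(changedFiles: string[]): string[] {
  return Object.keys(FILTERS).filter((group) =>
    changedFiles.some((file) => (FILTERS[group] ?? []).some((prefix) => file.startsWith(prefix))),
  );
}

// A Vue component change plus a README edit triggers only the frontend jobs.
const jobs = selectJobs(["src/@voltimax/iot-frontend/src/App.vue", "README.md"]); // ["frontend"]
```

Files matching no filter (such as README.md) select nothing, which is what keeps documentation-only PRs cheap.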

Linting & Code Quality

  • dotnet format check on changed .NET files (fail on violations)
  • ESLint for frontend TypeScript/Vue files
  • Enforces formatting standards before merge

Platform Validation

  • Build Platform API with OpenAPI spec generation
  • Verify OpenAPI spec is up to date (fail if uncommitted changes)
  • Compare OpenAPI spec with main branch using oasdiff
  • Comment PR with breaking changes and API diff

Client Validation

  • TypeScript Client:
    • Regenerate from OpenAPI spec
    • Verify generated client is in sync (fail if not)
    • Instructions in PR summary to fix sync issues
  • .NET Client:
    • Regenerate using Refitter
    • Verify generated client is in sync
    • Run client tests
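
Both sync checks reduce to comparing the committed client against a fresh regeneration. In a workflow this is often done by regenerating and running `git diff --exit-code`; the TypeScript sketch below expresses the same comparison as a pure function over two in-memory file maps, so the three kinds of drift (added, modified, removed files) are explicit. `FileTree` and `findOutOfSyncFiles` are hypothetical names, not part of the repository.

```typescript
// Relative file path → file contents.
type FileTree = Map<string, string>;

// Lists every difference between the committed client and the regenerated one.
// An empty result means the committed client is in sync with the OpenAPI spec.
function findOutOfSyncFiles(committed: FileTree, regenerated: FileTree): string[] {
  const problems: string[] = [];
  regenerated.forEach((content, path) => {
    const existing = committed.get(path);
    if (existing === undefined) problems.push(`added: ${path}`);
    else if (existing !== content) problems.push(`modified: ${path}`);
  });
  committed.forEach((_content, path) => {
    if (!regenerated.has(path)) problems.push(`removed: ${path}`);
  });
  return problems;
}

// Hypothetical example: one generated file changed and one is new.
const committedClient: FileTree = new Map([["api.ts", "v1"], ["models.ts", "a"]]);
const regeneratedClient: FileTree = new Map([["api.ts", "v1"], ["models.ts", "b"], ["sites.ts", "c"]]);
const drift = findOutOfSyncFiles(committedClient, regeneratedClient); // ["modified: models.ts", "added: sites.ts"]
```

In CI, a non-empty list would fail the job and could be echoed into the PR summary next to the fix instructions.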

Testing

  • Platform unit tests
  • Platform integration tests (Data, Functions, Migration) - sequential after unit tests
  • Gateway unit + behavior tests
  • Frontend unit tests with coverage

Test Results & Coverage

  • Aggregates test results from all jobs
  • Combines coverage reports using ReportGenerator
  • Publishes unified test results to PR (EnricoMi/publish-unit-test-result-action)
  • Posts code coverage summary as PR comment (irongut/CodeCoverageSummary)
  • Uploads combined coverage report artifacts
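
Combining coverage across jobs is more than averaging percentages: a line exercised only by the integration tests must still count as covered in the unified report. ReportGenerator handles this when merging the per-job reports; the TypeScript sketch below only illustrates the principle by summing per-line hit counts before computing a line rate. The `LineHits` shape and file names are hypothetical and much simpler than the Cobertura-style reports the jobs actually produce.

```typescript
// file → (line number → hit count), as recorded by one test job.
type LineHits = Map<string, Map<number, number>>;

// Merges reports by summing hits per line, so a line counts as covered
// if any job executed it.
function mergeCoverage(reports: LineHits[]): LineHits {
  const merged: LineHits = new Map();
  for (const report of reports) {
    report.forEach((lines, file) => {
      const target = merged.get(file) ?? new Map<number, number>();
      lines.forEach((hits, line) => target.set(line, (target.get(line) ?? 0) + hits));
      merged.set(file, target);
    });
  }
  return merged;
}

// Fraction of instrumented lines with at least one hit (0 when nothing is instrumented).
function lineRate(coverage: LineHits): number {
  let total = 0;
  let covered = 0;
  coverage.forEach((lines) =>
    lines.forEach((hits) => {
      total += 1;
      if (hits > 0) covered += 1;
    }),
  );
  return total === 0 ? 0 : covered / total;
}

// Each job alone covers half the lines; together they cover all of them.
const unitReport: LineHits = new Map([["Site.cs", new Map([[1, 1], [2, 0]])]]);
const integrationReport: LineHits = new Map([["Site.cs", new Map([[1, 0], [2, 4]])]]);
const combinedRate = lineRate(mergeCoverage([unitReport, integrationReport])); // 1
```

Averaging the two per-job rates would report 50%, while the merged report correctly shows 100%.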

Total PR Validation Time: ~20-25 minutes (with selective execution)


Deployment (deploy-release.yml)

Triggers

  • Workflow call from CI pipeline (staging)
  • GitHub Release published (production)
  • Manual workflow dispatch

Azure Deployment (Platform + Backend)

  • Uses .NET Aspire CLI for deployment
  • Targets Azure Container Apps, authenticating via federated credentials (no stored client secrets)
  • Deploys:
    • Migration service
    • Platform API
    • Azure Functions (pricing, telemetry)
  • Configuration:
    • Elastic APM enabled
    • Elasticsearch namespacing by environment
    • CORS origins configured
    • Keycloak auth server URL
    • ENTSO-E pricing for Netherlands (10YNL----------L)

Frontend Deployment

  • Downloads frontend tarball from GitHub Release
  • Deploys to Azure Static Web Apps
  • Retrieves deployment token from Azure Key Vault (no secrets in repo)

Documentation Deployment

  • Downloads docs tarball from GitHub Release
  • Deploys to separate Azure Static Web Apps instance

Environments

  • Staging: Auto-deploy on main branch push
  • Production: Manual approval required (GitHub environment protection)

Total Deployment Time: ~25 minutes


Security & Compliance

Dependency Scanning (utl-security-scan.yml)

  • Schedule: Weekly on Mondays at 2 AM UTC
  • Triggers: Push to main/develop, PRs, manual dispatch
  • .NET Vulnerability Scan:
    • dotnet list package --vulnerable --include-transitive (covers transitive dependencies)
    • Fails CI if vulnerabilities found
    • Also checks for deprecated packages
  • npm Vulnerability Scan:
    • Runs npm audit --audit-level=moderate on:
      • iot-frontend
      • platform-client-js
    • Uploads JSON audit results as artifacts
    • Fails CI if moderate+ vulnerabilities found
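
`npm audit --audit-level=moderate` already exits non-zero when the threshold is crossed, so no extra code is needed in the workflow. The TypeScript sketch below only spells out the threshold rule, for example for post-processing the uploaded JSON artifacts. It assumes the npm 7+ `--json` report layout, in which `metadata.vulnerabilities` holds a count per severity; `failsAuditLevel` is a hypothetical helper.

```typescript
// npm audit severities, lowest to highest.
const SEVERITIES = ["info", "low", "moderate", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];

// Assumed subset of the `npm audit --json` report (npm 7+); the real
// metadata.vulnerabilities object also carries a `total` field, ignored here.
type AuditReport = { metadata: { vulnerabilities: Record<Severity, number> } };

// True if any vulnerability is at or above `level`, mirroring --audit-level.
function failsAuditLevel(report: AuditReport, level: Severity): boolean {
  const threshold = SEVERITIES.indexOf(level);
  return SEVERITIES.some(
    (severity, index) => index >= threshold && report.metadata.vulnerabilities[severity] > 0,
  );
}

// Hypothetical report: three low findings and one high finding.
const sampleReport: AuditReport = {
  metadata: { vulnerabilities: { info: 0, low: 3, moderate: 0, high: 1, critical: 0 } },
};
const shouldFail = failsAuditLevel(sampleReport, "moderate"); // true, because of the high finding
```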

Dependabot

  • ✅ Configured for NuGet packages (weekly)
  • ❌ Not configured for npm packages (no npm package-ecosystem entry)

API Security

  • OpenAPI spec diffing catches accidental breaking changes
  • Breaking changes highlighted in PR comments
  • Semantic versioning enforced via Nerdbank.GitVersioning

Docker & Containerization

Dockerfiles

  • Edge Gateway (src/Voltimax.Edge.Gateway.Service/Dockerfile):
    • Multi-stage: base → build → publish → final
    • ASP.NET Core 9.0 runtime
    • Non-root user execution
    • Port 8080 exposed
  • Frontend (src/@voltimax/iot-frontend/Dockerfile):
    • Multi-stage: Node.js 24 builder → YARP 2.3-preview
    • Serves static build via YARP reverse proxy
  • AOT Gateway (src/Voltimax.Edge.Gateway.Service/Dockerfile.aot):
    • Native AOT compilation for smaller binaries and faster startup

Container Registry

  • No explicit container registry push in current workflows
  • .NET Aspire handles container builds during deployment
  • Gateway containers not automatically published to registry

Infrastructure as Code

Bicep Templates

  • ⚠️ Limited coverage: Only 2 Bicep files found
    • infra/shared/main.bicep
    • infra/shared/modules/postgresql.bicep
  • Missing:
    • Azure Container Apps definitions
    • Azure Functions definitions
    • Azure Service Bus definitions
    • Elasticsearch/Keycloak infrastructure
    • Networking/VNet configurations
    • Azure Static Web Apps definitions

Deployment Approach

  • ✅ .NET Aspire CLI handles infrastructure provisioning
  • ⚠️ Limited visibility into infrastructure state (no separate IaC repo)
  • ⚠️ Manual configuration for some resources (CORS, Keycloak URLs hardcoded)

Performance & Optimization

Build Optimization

  • ✅ NuGet package caching (GitHub Actions cache)
  • ✅ npm cache (custom action: .github/actions/setup-node-cached)
  • ✅ .NET SDK caching (custom action: .github/actions/setup-dotnet-cached)
  • ✅ Parallel job execution where possible
  • ✅ Build artifact reuse (platform build output shared with test jobs)
  • ✅ Selective job execution in PRs (paths-filter)

Workflow Concurrency

  • ✅ Cancel-in-progress for PR validation (no wasted CI time)
  • ✅ Cancel-in-progress for CI pipeline (main branch)

Artifact Retention

  • Build artifacts: 1 day (short-lived, for test reuse)
  • Test results: 7 days
  • Coverage reports: 7-30 days
  • Release assets: 30 days
  • npm packages: 30 days

Gaps & Missing Features

| Feature | Status | Priority | Description |
|---|---|---|---|
| E2E tests in CI | ❌ Not Implemented | High | Playwright tests not executed in CI (only unit/integration) |
| Docker image publishing | ❌ Not Implemented | High | Gateway/Frontend containers not pushed to registry |
| Production deployment | ⚠️ Manual | Medium | No automated production deployment (requires manual approval) |
| Performance benchmarks | ❌ Not Implemented | Medium | No automated performance testing or regression detection |
| Load testing | ❌ Not Implemented | Medium | No load/stress testing in CI |
| Deployment health checks | ❌ Not Implemented | Medium | No automated smoke tests post-deployment |
| Rollback automation | ❌ Not Implemented | Medium | Manual rollback required on deployment failures |
| Blue-green deployments | ❌ Not Implemented | Low | Zero-downtime deployments not automated |
| Canary releases | ❌ Not Implemented | Low | Gradual rollouts not supported |
| Dependabot for npm | ❌ Not Configured | Medium | npm packages not auto-updated (only NuGet) |
| IaC completeness | ⚠️ Partial | Medium | Bicep templates incomplete, relying on Aspire CLI |
| Multi-region deployment | ❌ Not Implemented | Low | Single-region deployment only (North Europe) |
| Secrets rotation | ❌ Not Implemented | Low | Manual secret rotation (Key Vault secrets) |
| Cost monitoring | ❌ Not Implemented | Low | No automated cost alerts or budget tracking |

Recommendations

High Priority

  1. Add E2E tests to CI pipeline:
    • Run Playwright tests in ci.yml after frontend build
    • Use Testcontainers for Keycloak + PostgreSQL + Elasticsearch
    • Generate E2E test report in PR comments
  2. Publish Docker images:
    • Push Gateway images to GitHub Container Registry
    • Version images with GitVersioning tags
    • Enable multi-arch builds (linux/amd64, linux/arm64)
  3. Configure Renovate for npm:
    • Replace/augment Dependabot with Renovate
    • Auto-merge patch updates
    • Grouped dependency updates
    • Better monorepo support for npm workspaces (platform-client-js, iot-frontend)
  4. Performance regression testing with k6:
    • .NET Aspire k6 integration for load testing
    • Baseline API response times
    • Detect performance regressions in PRs
    • Query Elasticsearch metrics for validation
    • Target: P95 <200ms for simple queries and <500ms for complex aggregations
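
A P95 target is checked against a percentile of measured latencies, not the mean. k6 computes percentiles itself (for example in threshold expressions such as `p(95)<200`), so this TypeScript sketch only makes the arithmetic concrete: it uses the nearest-rank method with the 200 ms / 500 ms budgets above. The `simple`/`complex` labels and the function names are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] as number;
}

// P95 budgets from the roadmap target, in milliseconds.
const P95_BUDGET_MS = { simple: 200, complex: 500 };

function meetsP95Budget(samplesMs: number[], kind: keyof typeof P95_BUDGET_MS): boolean {
  return percentile(samplesMs, 95) < P95_BUDGET_MS[kind];
}

// 20 hypothetical samples of 10, 20, ..., 200 ms: the P95 is the 19th value, 190 ms.
const latencies = Array.from({ length: 20 }, (_, i) => (i + 1) * 10);
const simpleOk = meetsP95Budget(latencies, "simple"); // true
```

Two slow outliers in twenty requests are enough to push the P95 over budget, which is why the percentile rather than the average is used to gate regressions.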

Medium Priority

  1. Add deployment smoke tests:
    • Health check endpoints after Azure deployment
    • Basic API tests against staging environment
    • Fail deployment if smoke tests fail
  2. Static code analysis (SonarCloud):
    • Integrate into PR validation with quality gates
    • Track technical debt metrics over time
    • Security vulnerability detection beyond dependency scanning
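
The smoke-test step from item 1 can be as small as probing a few URLs and failing unless each returns a 2xx status. The TypeScript sketch below takes the HTTP call as a parameter so the pass/fail rule can be tested without a network; in CI, the global `fetch` (Node 18+) would be passed in. The `/health` and `/api/version` routes and the staging URL are hypothetical, and a production version would also need timeouts and retries.

```typescript
// Outcome of probing one URL; status is null when the request itself failed.
type ProbeResult = { url: string; status: number | null; error?: string };

// Minimal fetch-like signature; the global fetch (Node 18+) satisfies it.
type Fetcher = (url: string) => Promise<{ status: number }>;

async function probe(fetcher: Fetcher, url: string): Promise<ProbeResult> {
  try {
    const response = await fetcher(url);
    return { url, status: response.status };
  } catch (err) {
    return { url, status: null, error: String(err) };
  }
}

// The deployment passes only if every probe returned a 2xx status.
function smokeTestPassed(results: ProbeResult[]): boolean {
  return results.every((r) => r.status !== null && r.status >= 200 && r.status < 300);
}

// Hypothetical routes; substitute the real health and API endpoints.
async function runSmokeTests(fetcher: Fetcher, baseUrl: string): Promise<boolean> {
  const urls = [`${baseUrl}/health`, `${baseUrl}/api/version`];
  const results = await Promise.all(urls.map((url) => probe(fetcher, url)));
  return smokeTestPassed(results);
}

// In a CI script (hypothetical staging URL):
// runSmokeTests(fetch, "https://staging.example.com").then((ok) => { if (!ok) process.exitCode = 1; });
```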

Low Priority

  1. Blue-green deployments:
    • Use Azure Container Apps revisions
    • Traffic splitting for gradual rollouts
  2. Multi-environment management:
    • Add dev/qa environments
    • Environment-specific configuration management
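
Container Apps expresses a traffic split as per-revision percentage weights that sum to 100, so both a blue-green cutover and a gradual rollout reduce to a sequence of weight pairs. The TypeScript sketch below generates such a schedule; the revision names are hypothetical, and applying each step (and gating it on smoke tests) is left to the deployment workflow.

```typescript
type TrafficSplit = { revision: string; weight: number }[];

// Builds a rollout that moves traffic from `current` to `next` in percentage steps.
// A blue-green cutover is the single step [100]; a canary uses several smaller steps.
function rolloutSchedule(current: string, next: string, steps: number[]): TrafficSplit[] {
  return steps.map((pct) => {
    if (!Number.isInteger(pct) || pct < 0 || pct > 100) {
      throw new RangeError(`invalid traffic step: ${pct}`);
    }
    // The weights in every step sum to 100.
    return [
      { revision: current, weight: 100 - pct },
      { revision: next, weight: pct },
    ];
  });
}

// Hypothetical revisions: 10% canary, then 50/50, then full cutover.
const schedule = rolloutSchedule("platform-api--v41", "platform-api--v42", [10, 50, 100]);
// schedule[0] → [{ revision: "platform-api--v41", weight: 90 }, { revision: "platform-api--v42", weight: 10 }]
```

Rolling back is simply the step 0, which returns all traffic to the previous revision while keeping the new one deployed for inspection.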

Developer Experience Tools

Remote Development & Debugging

  • Telepresence / Mirrord:
    • Local development against remote Azure environment
    • Debug gateway locally while connected to staging Keycloak/PostgreSQL/Elasticsearch
    • Faster iteration than deploy-test-debug cycle
    • Bridge local code with remote services
    • Test locally with production-like data
  • Teleport:
    • Secure access to edge gateways (SSH, HTTPS, database tunnels)
    • Certificate-based authentication (no password management)
    • Audit logging for all gateway access
    • Alternative to VPN for field gateway debugging
    • Session recording for troubleshooting and training
    • RBAC for team access to production gateways
    • Integrates with SSO (Keycloak)
    • Use case: Debug gateway logs on customer site without VPN

Code Quality & Testing

  • NCrunch (Visual Studio/Rider):
    • Continuous test runner with real-time feedback
    • Coverage visualization inline in editor
    • Speeds up TDD across the existing test suite (65+ test files)
    • Identifies slow tests automatically
  • MQTT Explorer:
    • Essential for debugging gateway telemetry publishing
    • Visualize topic structure (gateway-{id}/telemetry)
    • Test the Service Bus → MQTT migration path (relevant once gateway count exceeds 50)
    • Monitor real-time telemetry from development gateways
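
The gateway-{id}/telemetry layout gives each gateway its own topic, so subscribing to the single-level wildcard `+/telemetry` in MQTT Explorer shows every gateway's stream side by side. The TypeScript sketch below builds and parses such topics; it rejects ids containing `/`, `+`, or `#`, since those are the MQTT level separator and wildcard characters. The function names and the example gateway id are hypothetical.

```typescript
// Matches gateway-{id}/telemetry, capturing the id (no "/", "+" or "#").
const TELEMETRY_TOPIC = /^gateway-([^\/+#]+)\/telemetry$/;

// Builds the telemetry topic a gateway publishes to.
function telemetryTopic(gatewayId: string): string {
  // "/" separates topic levels; "+" and "#" are wildcards and are invalid in publish topics.
  if (gatewayId.length === 0 || /[\/+#]/.test(gatewayId)) {
    throw new Error(`invalid gateway id: "${gatewayId}"`);
  }
  return `gateway-${gatewayId}/telemetry`;
}

// Extracts the gateway id from a received topic, or returns null if it is not a telemetry topic.
function parseTelemetryTopic(topic: string): string | null {
  const match = TELEMETRY_TOPIC.exec(topic);
  return match ? (match[1] ?? null) : null;
}

// Hypothetical gateway id.
const topic = telemetryTopic("nl-ams-042"); // "gateway-nl-ams-042/telemetry"
```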

Protocol Development

  • Modbus Slave Simulator:
    • Test Modbus TCP/RTU without physical hardware
    • Simulate CNTE Battery, Growatt Inverter, Peblar chargers
    • Faster development iteration than real devices
    • Create test scenarios (fault conditions, edge cases)
  • Wireshark:
    • OCPP 2.1 WebSocket debugging (charge station messages)
    • Modbus protocol validation
    • Essential for protocol compliance testing (OCPP conformance)
    • Capture and replay protocol sequences
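
When scripting scenarios against a Modbus slave simulator, or checking a capture in Wireshark, it helps to know the exact bytes of a request. The TypeScript sketch below builds a Modbus TCP Read Holding Registers request (function code 0x03): a 7-byte MBAP header followed by a 5-byte PDU, all big-endian. The unit id and register address in the example are hypothetical and are not taken from the CNTE, Growatt, or Peblar register maps.

```typescript
// Builds a Modbus TCP "Read Holding Registers" (function code 0x03) request frame.
// Layout: MBAP header (7 bytes) followed by the PDU (5 bytes), all big-endian.
function readHoldingRegistersRequest(
  transactionId: number,
  unitId: number,
  startAddress: number,
  quantity: number,
): Uint8Array {
  if (quantity < 1 || quantity > 125) throw new RangeError("quantity must be 1..125");
  const frame = new Uint8Array(12);
  const view = new DataView(frame.buffer); // DataView writes big-endian by default
  view.setUint16(0, transactionId); // transaction identifier, echoed back by the server
  view.setUint16(2, 0); // protocol identifier: always 0 for Modbus
  view.setUint16(4, 6); // length of what follows: unit id (1) + PDU (5)
  view.setUint8(6, unitId); // unit identifier (slave address)
  view.setUint8(7, 0x03); // function code: read holding registers
  view.setUint16(8, startAddress); // starting register address (0-based)
  view.setUint16(10, quantity); // number of registers to read
  return frame;
}

// Hypothetical request: transaction 1, unit 1, two registers from address 0.
const request = readHoldingRegistersRequest(1, 1, 0, 2);
const hex = Array.from(request, (b) => b.toString(16).padStart(2, "0")).join(" ");
// hex === "00 01 00 00 00 06 01 03 00 00 00 02"
```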

API Development

  • Stoplight Studio:
    • OpenAPI spec editor with visual designer
    • Generate mock servers from OpenAPI specs
    • API documentation with try-it-out (complement to Scalar)
    • Validate spec changes before committing
  • Hoppscotch / Insomnia:
    • API testing with OpenAPI import (requests generated from the project's specs)
    • Request collections for common workflows
    • Export to Playwright/k6 for automation
    • Team collaboration with shared collections

Project Management

  • Linear:
    • Modern issue tracking (better UX than GitHub Issues)
    • Integrates with GitHub PRs and commits
    • Roadmap visualization
    • Cycle planning (matches phase-based roadmap structure)
    • Automatic status updates from PR merges

Collaboration

  • Miro / Excalidraw:
    • System architecture diagrams (C4 model for docs)
    • Collaborative whiteboarding for design sessions
    • Real-time collaboration during architecture reviews
    • Export to documentation

Success Metrics

Current CI/CD Health

  • ✅ Build success rate: High (anecdotal; no metrics collected)
  • ✅ PR validation time: ~20-25 minutes (acceptable)
  • ✅ Main branch CI time: ~25-30 minutes (acceptable)
  • ✅ Deployment time: ~25 minutes (acceptable)
  • ⚠️ Test coverage: Moderate (see Technical Debt & Testing)
  • ❌ E2E test coverage: 0% (not executed)

Target Metrics

  • Build success rate: >95%
  • PR validation time: <15 minutes
  • Main branch CI time: <20 minutes
  • Deployment time: <15 minutes
  • Test coverage: >80% for critical paths
  • E2E test coverage: 100% of critical user journeys
  • Mean time to deployment: <30 minutes from commit
  • Mean time to recovery (MTTR): <15 minutes from incident detection
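
Several of these targets can only be tracked once the underlying data is collected. The TypeScript sketch below shows one way to compute two of them from exported records. The record shapes are hypothetical; the conclusion values are a subset of those GitHub Actions reports for workflow runs. Cancelled runs are excluded from the success rate because cancel-in-progress deliberately cancels superseded runs, which are not build failures.

```typescript
// Conclusions GitHub Actions reports for completed workflow runs (subset used here).
type RunConclusion = "success" | "failure" | "cancelled" | "skipped";

// Hypothetical incident record, e.g. exported from an alerting tool.
type Incident = { detectedAt: Date; resolvedAt: Date };

// Share of decided runs that succeeded; null when there is no data yet.
// Cancelled and skipped runs are excluded rather than counted as failures.
function buildSuccessRate(conclusions: RunConclusion[]): number | null {
  const decided = conclusions.filter((c) => c === "success" || c === "failure");
  if (decided.length === 0) return null;
  return decided.filter((c) => c === "success").length / decided.length;
}

// Mean time to recovery in minutes, from detection to resolution; null without incidents.
function mttrMinutes(incidents: Incident[]): number | null {
  if (incidents.length === 0) return null;
  const totalMs = incidents.reduce(
    (sum, i) => sum + (i.resolvedAt.getTime() - i.detectedAt.getTime()),
    0,
  );
  return totalMs / incidents.length / 60000;
}

// Hypothetical data: 20 successful runs, 1 failure, 2 cancelled by cancel-in-progress.
const conclusions: RunConclusion[] = [
  ...Array.from({ length: 20 }, (): RunConclusion => "success"),
  "failure",
  "cancelled",
  "cancelled",
];
const rate = buildSuccessRate(conclusions); // 20 / 21 ≈ 0.952, above the 95% target
```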