CI/CD & DevOps
This document outlines the current state and roadmap for CI/CD pipelines, deployment automation, and developer experience tools.
Current CI/CD Status
Pipeline Infrastructure:
- ✅ GitHub Actions - 12 workflow files orchestrating build, test, deploy
- ✅ Automated Testing - C# unit/integration tests, frontend tests with coverage reporting
- ✅ Automated Deployment - .NET Aspire deployment to Azure Container Apps
- ✅ Release Automation - Automatic GitHub Releases with artifacts and API changelogs
- ✅ Security Scanning - Weekly dependency vulnerability scans (.NET + npm)
- ✅ Dependency Management - Dependabot for NuGet packages (npm not configured)
Workflows Overview:
| Workflow | Trigger | Purpose | Status |
|---|---|---|---|
| ci.yml | Push to main | Full CI pipeline: build, test, package, release | ✅ Operational |
| pr-validation.yml | Pull requests | PR validation with OpenAPI diff, tests, linting | ✅ Operational |
| deploy-release.yml | Release/manual | Deploy to Azure (staging/production) | ✅ Operational |
| utl-security-scan.yml | Weekly/push/PR | Dependency vulnerability scanning | ✅ Operational |
| _dotnet-build.yml | Reusable | Shared .NET build workflow | ✅ Operational |
| _dotnet-test.yml | Reusable | Shared .NET test workflow | ✅ Operational |
| pr-labeler.yml | PR events | Auto-label PRs by file paths | ✅ Operational |
| todo-to-issue.yml | Push | Convert TODO comments to GitHub issues | ✅ Operational |
| utl-delete-old-workflow-runs.yml | Schedule | Cleanup old workflow runs | ✅ Operational |
| utl-sync-develop.yml | Utility | Sync develop branch with main | ✅ Operational |
| publish-release.yml | Utility | Publish release to GitHub | ✅ Operational |
CI Pipeline (ci.yml) - Main Branch
Build Jobs (Parallel Execution)
- Platform API Build (~10 min):
- .NET 10 build with Nerdbank.GitVersioning
- OpenAPI spec generation validation
- Artifacts: platform build output, OpenAPI spec
- Edge Gateway Build & Test (~10 min):
- Gateway service build
- Gateway unit + behavior tests
- Frontend Build & Test (~10 min):
- platform-client-js build (TypeScript SDK)
- iot-frontend type check, unit tests (Vitest), build
- Test coverage upload
- Documentation Build (~10 min):
- VitePress documentation site build
- Linting with continue-on-error
Test Jobs (Sequential after Platform Build)
- Platform Unit Tests (~15 min): Full platform test suite
- Platform Integration Tests (Parallel, ~20 min each):
- Data layer (Testcontainers.PostgreSql)
- Functions (Azure Functions)
- Migration service
Code Generation & Validation
- TypeScript Client Generation:
- Auto-generates TypeScript SDK from OpenAPI spec
- Checks if generated client is in sync with repository
- Creates PR if out of sync (automated/update-api-clients branch)
- npm Package Build:
- Versions package with Nerdbank.GitVersioning
- Builds and packs @voltimax/platform-client-js
- Uploads .tgz to GitHub Releases
API Changelog Generation
- OpenAPI Spec Comparison:
- Compares current spec with previous release tag
- Uses `oasdiff` to detect breaking changes
- Generates Markdown changelog and breaking changes report
- Attaches to GitHub Release
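
The comparison step could look roughly like the workflow fragment below. It is only a sketch: it assumes the community `oasdiff/oasdiff-action` wrappers rather than whatever invocation `ci.yml` actually uses, and both spec paths are placeholders.

```yaml
# Hypothetical fragment; the spec paths are placeholders, not the repository's real layout.
steps:
  - name: Detect breaking API changes
    uses: oasdiff/oasdiff-action/breaking@main
    with:
      base: previous/openapi.json      # spec taken from the previous release tag
      revision: current/openapi.json   # spec generated by this build
  - name: Generate API changelog
    uses: oasdiff/oasdiff-action/changelog@main
    with:
      base: previous/openapi.json
      revision: current/openapi.json
```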
Release Creation
- GitHub Release (triggered on success):
- Tags version (e.g., v1.0.0-beta.1) using GitVersioning
- Creates draft release with:
- API changelog (breaking changes + full diff)
- OpenAPI spec JSON files
- npm package (.tgz)
- Frontend dist (.tar.gz)
- Docs dist (.tar.gz)
- Publishes release automatically
Deployment
- Staging Deployment (after release):
- Calls `deploy-release.yml` workflow
- Deploys to Azure staging environment
Total CI Time: ~25-30 minutes (with parallel execution)
PR Validation (pr-validation.yml)
Change Detection
- Uses `dorny/paths-filter` to detect changed files
- Selective job execution based on changed paths:
- Platform changes > run platform jobs
- Gateway changes > run gateway jobs
- Frontend changes > run frontend jobs
- OpenAPI changes > run client validation
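
A minimal sketch of this pattern follows. The job names and filter names are assumptions, and the glob patterns are guesses based on the Dockerfile paths mentioned elsewhere in this document.

```yaml
# Sketch only: job names, filter names, and globs are illustrative assumptions.
jobs:
  changes:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: read
    outputs:
      gateway: ${{ steps.filter.outputs.gateway }}
      frontend: ${{ steps.filter.outputs.frontend }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            gateway:
              - 'src/Voltimax.Edge.Gateway.Service/**'
            frontend:
              - 'src/@voltimax/iot-frontend/**'

  frontend-tests:
    needs: changes
    # Each filter output is the string 'true' when a matching file changed.
    if: needs.changes.outputs.frontend == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "frontend jobs run only when frontend files changed"
```

Downstream jobs that are skipped this way do not consume runner minutes, which is where the selective-execution saving comes from.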
Linting & Code Quality
- dotnet format check on changed .NET files (fail on violations)
- ESLint for frontend TypeScript/Vue files
- Enforces formatting standards before merge
Platform Validation
- Build Platform API with OpenAPI spec generation
- Verify OpenAPI spec is up to date (fail if uncommitted changes)
- Compare OpenAPI spec with main branch using `oasdiff`
- Comment PR with breaking changes and API diff
Client Validation
- TypeScript Client:
- Regenerate from OpenAPI spec
- Verify generated client is in sync (fail if not)
- Instructions in PR summary to fix sync issues
- .NET Client:
- Regenerate using Refitter
- Verify generated client is in sync
- Run client tests
Testing
- Platform unit tests
- Platform integration tests (Data, Functions, Migration) - sequential after unit tests
- Gateway unit + behavior tests
- Frontend unit tests with coverage
Test Results & Coverage
- Aggregates test results from all jobs
- Combines coverage reports using ReportGenerator
- Publishes unified test results to PR (EnricoMi/publish-unit-test-result-action)
- Posts code coverage summary as PR comment (irongut/CodeCoverageSummary)
- Uploads combined coverage report artifacts
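
The aggregation job might be wired together along these lines. This is a sketch under assumptions: the job name, the glob patterns, and the `coverage` output directory are illustrative, and ReportGenerator is installed here as a .NET global tool, which may differ from the real workflow.

```yaml
# Sketch only: job name, globs, and output paths are assumptions.
jobs:
  report:
    runs-on: ubuntu-latest
    permissions:
      checks: write
      pull-requests: write
    steps:
      # Test result and coverage files are assumed to have been downloaded
      # from the earlier test jobs before these steps run.
      - name: Merge coverage reports
        run: |
          dotnet tool install --global dotnet-reportgenerator-globaltool
          reportgenerator -reports:"**/coverage.cobertura.xml" -targetdir:coverage -reporttypes:Cobertura
      - name: Publish unified test results
        uses: EnricoMi/publish-unit-test-result-action@v2
        if: always()
        with:
          files: "**/*.trx"
      - name: Summarise coverage
        uses: irongut/CodeCoverageSummary@v1.3.0
        with:
          filename: coverage/Cobertura.xml
          format: markdown
          output: both   # writes to the log and to code-coverage-results.md
```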
Total PR Validation Time: ~20-25 minutes (with selective execution)
Deployment (deploy-release.yml)
Triggers
- Workflow call from CI pipeline (staging)
- GitHub Release published (production)
- Manual workflow dispatch
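
The three triggers map onto an `on:` block of roughly this shape. The `environment` input name and its options are assumptions for illustration.

```yaml
# Sketch of the trigger block; the `environment` input is a hypothetical name.
on:
  workflow_call:            # invoked by the CI pipeline for staging
    inputs:
      environment:
        type: string
        required: true
  release:
    types: [published]      # production path
  workflow_dispatch:        # manual runs from the Actions tab
    inputs:
      environment:
        type: choice
        options: [staging, production]
        default: staging
```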
Azure Deployment (Platform + Backend)
- Uses .NET Aspire CLI for deployment
- Azure Container Apps with federated credentials (no secrets)
- Deploys:
- Migration service
- Platform API
- Azure Functions (pricing, telemetry)
- Configuration:
- Elastic APM enabled
- Elasticsearch namespacing by environment
- CORS origins configured
- Keycloak auth server URL
- ENTSO-E pricing for Netherlands (10YNL----------L)
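
Federated (OIDC) login to Azure typically needs only the fragment below. The variable names are hypothetical, and the identifiers are read from repository variables here because none of them is a secret.

```yaml
# Sketch only: the AZURE_* variable names are assumptions.
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # allows the job to request an OIDC token
      contents: read
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
```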
Frontend Deployment
- Downloads frontend tarball from GitHub Release
- Deploys to Azure Static Web Apps
- Retrieves deployment token from Azure Key Vault (no secrets in repo)
Documentation Deployment
- Downloads docs tarball from GitHub Release
- Deploys to separate Azure Static Web Apps instance
Environments
- Staging: Auto-deploy on main branch push
- Production: Manual approval required (GitHub environment protection)
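
In workflow terms, the approval gate comes from binding the job to a protected environment. A minimal sketch, with a hypothetical job name:

```yaml
# Sketch only: the job name is an assumption.
jobs:
  deploy-production:
    runs-on: ubuntu-latest
    # Required reviewers are configured on the environment in the repository settings,
    # not in the workflow file itself.
    environment: production
    steps:
      - run: echo "runs only after a reviewer approves the deployment"
```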
Total Deployment Time: ~25 minutes
Security & Compliance
Dependency Scanning (utl-security-scan.yml)
- Schedule: Weekly on Mondays at 2 AM UTC
- Triggers: Push to main/develop, PRs, manual dispatch
- .NET Vulnerability Scan:
- `dotnet list package --vulnerable --include-transitive`
- Checks transitive dependencies
- Fails CI if vulnerabilities found
- Also checks for deprecated packages
- npm Vulnerability Scan:
- Runs `npm audit --audit-level=moderate` on:
- iot-frontend
- platform-client-js
- Uploads JSON audit results as artifacts
- Fails CI if moderate+ vulnerabilities found
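
A condensed sketch of such a scan workflow is shown below. The job layout, the SDK and Node versions, the working directory, and the grep-based failure check are all assumptions, not a copy of `utl-security-scan.yml`.

```yaml
# Sketch only: layout, versions, and paths are assumptions.
on:
  schedule:
    - cron: "0 2 * * 1"   # Mondays at 02:00 UTC
  push:
    branches: [main, develop]
  pull_request:
  workflow_dispatch:

jobs:
  dotnet-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: "10.0.x"
      - name: Scan NuGet dependencies
        run: |
          dotnet restore
          dotnet list package --vulnerable --include-transitive 2>&1 | tee scan.log
          # The exit code may not reflect findings, so check the output explicitly.
          if grep -q "has the following vulnerable packages" scan.log; then exit 1; fi

  npm-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 24
      - name: Audit iot-frontend
        working-directory: src/@voltimax/iot-frontend
        # npm audit exits non-zero when findings at or above the given level exist.
        run: npm audit --audit-level=moderate
```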
Dependabot
- ✅ Configured for NuGet packages (weekly)
- ❌ NOT configured for npm packages (missing npm ecosystem)
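
Closing this gap would mean adding npm entries to `.github/dependabot.yml`, roughly as sketched here. The NuGet entry stands in for the existing configuration, and the `platform-client-js` directory is an assumed path.

```yaml
# .github/dependabot.yml (sketch; the platform-client-js directory is an assumption)
version: 2
updates:
  - package-ecosystem: nuget
    directory: "/"
    schedule:
      interval: weekly
  - package-ecosystem: npm
    directory: "/src/@voltimax/iot-frontend"
    schedule:
      interval: weekly
  - package-ecosystem: npm
    directory: "/src/@voltimax/platform-client-js"
    schedule:
      interval: weekly
```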
API Security
- OpenAPI spec diffing catches accidental breaking changes
- Breaking changes highlighted in PR comments
- Semantic versioning enforced via Nerdbank.GitVersioning
Docker & Containerization
Dockerfiles
- ✅ Edge Gateway (`src/Voltimax.Edge.Gateway.Service/Dockerfile`):
- Multi-stage: base > build > publish > final
- ASP.NET Core 9.0 runtime
- Non-root user execution
- Port 8080 exposed
- ✅ Frontend (`src/@voltimax/iot-frontend/Dockerfile`):
- Multi-stage: Node.js 24 builder > YARP 2.3-preview
- Serves static build via YARP reverse proxy
- ✅ AOT Gateway (`src/Voltimax.Edge.Gateway.Service/Dockerfile.aot`):
- Native AOT compilation for smaller binaries and faster startup
Container Registry
- No explicit container registry push in current workflows
- .NET Aspire handles container builds during deployment
- Gateway containers not automatically published to registry
Infrastructure as Code
Bicep Templates
- ⚠️ Limited coverage: only 2 Bicep files found:
- `infra/shared/main.bicep`
- `infra/shared/modules/postgresql.bicep`
- ❌ Missing:
- Azure Container Apps definitions
- Azure Functions definitions
- Azure Service Bus definitions
- Elasticsearch/Keycloak infrastructure
- Networking/VNet configurations
- Azure Static Web Apps definitions
Deployment Approach
- ✅ .NET Aspire CLI handles infrastructure provisioning
- ⚠️ Limited visibility into infrastructure state (no separate IaC repo)
- ⚠️ Manual configuration for some resources (CORS, Keycloak URLs hardcoded)
Performance & Optimization
Build Optimization
- ✅ NuGet package caching (GitHub Actions cache)
- ✅ npm cache (custom action: `.github/actions/setup-node-cached`)
- ✅ .NET SDK caching (custom action: `.github/actions/setup-dotnet-cached`)
- ✅ Parallel job execution where possible
- ✅ Build artifact reuse (platform build output shared with test jobs)
- ✅ Selective job execution in PRs (paths-filter)
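
For reference, NuGet package caching with the stock `actions/cache` action usually looks like the fragment below. The repository's custom actions may wrap something similar, but their contents are not shown in this document, so the cache key scheme here is an assumption.

```yaml
# Sketch only: the cache key scheme is an assumption.
steps:
  - uses: actions/checkout@v4
  - name: Cache NuGet packages
    uses: actions/cache@v4
    with:
      path: ~/.nuget/packages
      # A new cache entry is created whenever any project file changes.
      key: nuget-${{ runner.os }}-${{ hashFiles('**/*.csproj') }}
      restore-keys: |
        nuget-${{ runner.os }}-
```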
Workflow Concurrency
- ✅ Cancel-in-progress for PR validation (no wasted CI time)
- ✅ Cancel-in-progress for CI pipeline (main branch)
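
Cancel-in-progress is a top-level workflow setting. A common form, assuming one concurrency group per workflow and ref, is:

```yaml
# Sketch: the group expression is a common convention, not necessarily the one in use.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true   # a newer run on the same ref cancels the older one
```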
Artifact Retention
- Build artifacts: 1 day (short-lived, for test reuse)
- Test results: 7 days
- Coverage reports: 7-30 days
- Release assets: 30 days
- npm packages: 30 days
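
Retention is set per upload through the `retention-days` input of `actions/upload-artifact`. The artifact names and paths in this sketch are placeholders.

```yaml
# Sketch only: artifact names and paths are placeholders.
steps:
  - name: Upload platform build output
    uses: actions/upload-artifact@v4
    with:
      name: platform-build
      path: artifacts/platform/
      retention-days: 1   # only needed by the test jobs in the same run
  - name: Upload test results
    uses: actions/upload-artifact@v4
    with:
      name: test-results
      path: "**/TestResults/*.trx"
      retention-days: 7
```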
Gaps & Missing Features
| Feature | Status | Priority | Description |
|---|---|---|---|
| E2E tests in CI | ❌ Not Implemented | High | Playwright tests not executed in CI (only unit/integration) |
| Docker image publishing | ❌ Not Implemented | High | Gateway/Frontend containers not pushed to registry |
| Production deployment | ⚠️ Manual | Medium | No automated production deployment (requires manual approval) |
| Performance benchmarks | ❌ Not Implemented | Medium | No automated performance testing or regression detection |
| Load testing | ❌ Not Implemented | Medium | No load/stress testing in CI |
| Deployment health checks | ❌ Not Implemented | Medium | No automated smoke tests post-deployment |
| Rollback automation | ❌ Not Implemented | Medium | Manual rollback required on deployment failures |
| Blue-green deployments | ❌ Not Implemented | Low | Zero-downtime deployments not automated |
| Canary releases | ❌ Not Implemented | Low | Gradual rollouts not supported |
| Dependabot for npm | ❌ Not Configured | Medium | npm packages not auto-updated (only NuGet) |
| IaC completeness | ⚠️ Partial | Medium | Bicep templates incomplete, relying on Aspire CLI |
| Multi-region deployment | ❌ Not Implemented | Low | Single-region deployment only (North Europe) |
| Secrets rotation | ❌ Not Implemented | Low | Manual secret rotation (Key Vault secrets) |
| Cost monitoring | ❌ Not Implemented | Low | No automated cost alerts or budget tracking |
Recommendations
High Priority
- Add E2E tests to CI pipeline:
- Run Playwright tests in `ci.yml` after frontend build
- Use Testcontainers for Keycloak + PostgreSQL + Elasticsearch
- Generate E2E test report in PR comments
- Publish Docker images:
- Push Gateway images to GitHub Container Registry
- Version images with GitVersioning tags
- Enable multi-arch builds (linux/amd64, linux/arm64)
- Configure Renovate for npm:
- Replace/augment Dependabot with Renovate
- Auto-merge patch updates
- Grouped dependency updates
- Better monorepo support for npm workspaces (platform-client-js, iot-frontend)
- Performance regression testing with k6:
- .NET Aspire k6 integration for load testing
- Baseline API response times
- Detect performance regressions in PRs
- Query Elasticsearch metrics for validation
- Target: <200ms P95 for simple queries, <500ms for complex aggregations
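
For the Docker image publishing recommendation above, a multi-arch push to GitHub Container Registry could be sketched as follows. The job name and image name are placeholders, and the `latest` tag stands in for a GitVersioning-derived version tag.

```yaml
# Sketch only: job name, image name, and tag are placeholders.
jobs:
  publish-gateway-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # required to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3     # emulation for the arm64 build
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: src/Voltimax.Edge.Gateway.Service/Dockerfile
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ghcr.io/example-org/edge-gateway:latest
```

Image names on ghcr.io must be lowercase, which is why the sketch uses a literal placeholder instead of deriving the name from the repository.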
Medium Priority
- Add deployment smoke tests:
- Health check endpoints after Azure deployment
- Basic API tests against staging environment
- Fail deployment if smoke tests fail
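
A smoke test step can be as small as the sketch below. The `/health` path and the `STAGING_API_URL` variable are hypothetical.

```yaml
# Sketch only: the variable name and the /health endpoint are assumptions.
steps:
  - name: Smoke test staging
    env:
      BASE_URL: ${{ vars.STAGING_API_URL }}
    run: |
      # --fail turns HTTP errors into a non-zero exit code, which fails the job.
      curl --fail --silent --show-error \
        --retry 5 --retry-delay 10 --retry-all-errors \
        "$BASE_URL/health"
```

The retries give a freshly deployed revision time to start before the deployment is declared failed.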
- Static code analysis (SonarCloud):
- Integrate with GitHub Actions PR validation
- Quality gates in PR validation
- Track technical debt metrics over time
- Security vulnerability detection beyond dependency scanning
Low Priority
- Blue-green deployments:
- Use Azure Container Apps revisions
- Traffic splitting for gradual rollouts
- Multi-environment management:
- Add dev/qa environments
- Environment-specific configuration management
Developer Experience Tools
Remote Development & Debugging
- Telepresence / Mirrord:
- Local development against remote Azure environment
- Debug gateway locally while connected to staging Keycloak/PostgreSQL/Elasticsearch
- Faster iteration than deploy-test-debug cycle
- Bridge local code with remote services
- Test locally with production-like data
- Teleport:
- Secure access to edge gateways (SSH, HTTPS, database tunnels)
- Certificate-based authentication (no password management)
- Audit logging for all gateway access
- Alternative to VPN for field gateway debugging
- Session recording for troubleshooting and training
- RBAC for team access to production gateways
- Integrates with SSO (Keycloak)
- Use case: Debug gateway logs on customer site without VPN
Code Quality & Testing
- NCrunch (Visual Studio/Rider):
- Continuous test runner with real-time feedback
- Coverage visualization inline in editor
- Productivity boost for TDD with 65+ test files
- Identifies slow tests automatically
- MQTT Explorer:
- Essential for debugging gateway telemetry publishing
- Visualize topic structure (gateway-{id}/telemetry)
- Test Service Bus > MQTT migration path (when gateway count > 50)
- Monitor real-time telemetry from development gateways
Protocol Development
- Modbus Slave Simulator:
- Test Modbus TCP/RTU without physical hardware
- Simulate CNTE Battery, Growatt Inverter, Peblar chargers
- Faster development iteration than real devices
- Create test scenarios (fault conditions, edge cases)
- Wireshark:
- OCPP 2.1 WebSocket debugging (charge station messages)
- Modbus protocol validation
- Essential for protocol compliance testing (OCPP conformance)
- Capture and replay protocol sequences
API Development
- Stoplight Studio:
- OpenAPI spec editor with visual designer
- Generate mock servers from OpenAPI specs
- API documentation with try-it-out (complement to Scalar)
- Validate spec changes before committing
- Hoppscotch / Insomnia:
- API testing with OpenAPI import (auto-generate from your specs)
- Request collections for common workflows
- Export to Playwright/k6 for automation
- Team collaboration with shared collections
Project Management
- Linear:
- Modern issue tracking (better UX than GitHub Issues)
- Integrates with GitHub PRs and commits
- Roadmap visualization
- Cycle planning (matches phase-based roadmap structure)
- Automatic status updates from PR merges
Collaboration
- Miro / Excalidraw:
- System architecture diagrams (C4 model for docs)
- Collaborative whiteboarding for design sessions
- Real-time collaboration during architecture reviews
- Export to documentation
Success Metrics
Current CI/CD Health
- ⚠️ Build success rate: believed to be high, but not measured (no data available)
- ✅ PR validation time: ~20-25 minutes (acceptable)
- ✅ Main branch CI time: ~25-30 minutes (acceptable)
- ✅ Deployment time: ~25 minutes (acceptable)
- ⚠️ Test coverage: Moderate (see Technical Debt & Testing)
- ❌ E2E test coverage: 0% (not executed)
Target Metrics
- Build success rate: >95%
- PR validation time: <15 minutes
- Main branch CI time: <20 minutes
- Deployment time: <15 minutes
- Test coverage: >80% for critical paths
- E2E test coverage: 100% of critical user journeys
- Mean time to deployment (MTTD): <30 minutes from commit
- Mean time to recovery (MTTR): <15 minutes from incident detection