- Updated .gitignore to include environment-specific secrets and Helm deployment value files. - Added MySQL PersistentVolumeClaim and ConfigMap for configuration management. - Introduced MySQL Deployment and Service templates for better database management. - Enhanced jobs-manager-deployment with security context, resource requests, and environment variable management. - Updated RBAC roles to include service account creation and improved access control. - Added resource monitor DaemonSet configuration and associated settings. - Improved documentation in NOTES.txt for deployment verification and component overview.
- Introduced a new Helm CI workflow for automated linting, templating, and validation across multiple platforms. - Added deprecation notices for legacy charts (aks, bm, eks, oc) in favor of the unified tracebloc chart. - Updated templates to utilize a centralized registry secret name for improved consistency. - Enhanced values schema and default values for better configuration management. - Added tests for jobs manager and MySQL deployment to ensure template integrity and functionality. - Improved documentation in values.yaml and NOTES.txt for clarity on deployment configurations.
…m Helm chart - Eliminated `imageRegistry` and `nodeSelector` settings from various configuration files to streamline deployment. - Updated templates to default to `docker.io` for image references, ensuring consistency across environments. - Added PodDisruptionBudgets for jobs manager and MySQL deployments to enhance availability during maintenance. - Improved deployment configurations with termination grace periods for better resource management. - Enhanced documentation in NOTES.txt for clearer deployment instructions and component overview.
- Updated `MIGRATION.md` to reflect changes in cluster role configuration, replacing `useClusterScope` with `clusterScope`. - Revised `values.schema.json` to streamline required properties and enhance descriptions for environment variables, PVCs, and RBAC settings. - Modified `values.yaml` to consolidate environment variable settings and clarify PVC configurations. - Enhanced template files to utilize new helper functions for PVC names and storage sizes, ensuring consistency across deployments. - Improved test configurations to align with updated schema and values, ensuring robust validation of deployments.
either: pvc-585af5b7-6652-42d4-98d2-90a8fecb987e
this: pvc-e4e62729-740e-46e4-817f-6d36387ddabf
Accidentally committed PVC identifiers in README
High Severity
The `README.md` contains what appear to be scratch notes with specific PVC volume identifiers (`pvc-9b5ea50f-...`, `pvc-585af5b7-...`, `pvc-e4e62729-...`) labeled `curent`, `either`, and `this`. These look like personal debugging notes or environment-specific PVC names that were accidentally included in the public-facing README.
  labels:
    {{- include "tracebloc.labels" . | nindent 4 }}
spec:
  minAvailable: 1
PDB blocks node drains on single-replica deployments
Medium Severity
Both new PodDisruptionBudgets set `minAvailable: 1` for deployments that have exactly `replicas: 1`. This prevents the Kubernetes eviction API from ever evicting these pods, effectively blocking `kubectl drain`, node maintenance, and cluster upgrades indefinitely. Using `maxUnavailable: 1` instead would allow voluntary disruptions while still preventing multiple simultaneous evictions.
Additional Locations (1)
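The arithmetic behind this finding can be sketched in a few lines of shell. The function names are hypothetical and the model assumes every replica is currently healthy; it illustrates how a PodDisruptionBudget's allowed disruptions are derived and is not code from this PR.

```shell
#!/usr/bin/env bash
# Sketch only: how many pods the eviction API may remove at once,
# assuming all replicas are healthy. Function names are hypothetical.

# allowed_with_min_available REPLICAS MIN_AVAILABLE
allowed_with_min_available() {
  local replicas=$1 min_available=$2
  local allowed=$(( replicas - min_available ))
  if (( allowed < 0 )); then allowed=0; fi
  echo "$allowed"
}

# allowed_with_max_unavailable REPLICAS MAX_UNAVAILABLE
allowed_with_max_unavailable() {
  local replicas=$1 max_unavailable=$2
  local allowed=$max_unavailable
  if (( allowed > replicas )); then allowed=$replicas; fi
  echo "$allowed"
}

# Single-replica deployment, as in the two PDBs flagged above.
echo "minAvailable: 1   -> $(allowed_with_min_available 1 1) eviction(s) allowed"
echo "maxUnavailable: 1 -> $(allowed_with_max_unavailable 1 1) eviction(s) allowed"
```

With one replica, `minAvailable: 1` leaves zero allowed evictions, which is why a drain would wait indefinitely, while `maxUnavailable: 1` leaves one.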
- Introduced `install-k8s.sh` for a one-command Kubernetes installation on macOS and Linux, utilizing k3d for lightweight clusters. - Added various helper scripts in the `lib` directory for GPU detection, driver installation, and cluster management. - Enhanced `.gitignore` to exclude unnecessary files while including the `scripts/lib` directory. - Implemented logging and utility functions for better user feedback during installation and setup processes. - Provided a summary script to display cluster status and common commands post-installation.
- Added creation of host data directory in `_create_new_cluster` if it doesn't exist. - Updated `HOST_DATA_DIR` default value to `$HOME/.tracebloc` for better user experience. - Modified summary output to include volume mount information for clarity. - Revised `MIGRATION.md` to reflect new host path structure for data, logs, and MySQL. - Updated `values.schema.json` to change property names from `dataPath`, `logsPath`, and `mysqlPath` to `dataDir`, `logsDir`, and `mysqlDir` for consistency. - Enhanced template files to utilize new helper functions for host path management in PVC configurations.
…treamlined deployment
…ved clarity in configuration
- Added functions to detect Ubuntu codename and RHEL version for dynamic package retrieval. - Implemented a method to scrape the ROCm repository for the latest .deb and .rpm files. - Improved error handling for unsupported distributions and missing packages. - Updated installation logic to support both Ubuntu and RHEL/CentOS systems more effectively.
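A minimal sketch of the distribution detection described in this commit, assuming the helpers read the standard `/etc/os-release` file. The names `os_release_field` and `rhel_major_version` are hypothetical and are not the functions added by the PR.

```shell
#!/usr/bin/env bash
# Sketch only: read one field from an os-release style file.
# The file is sourced in a subshell so its variables do not leak into the caller.
os_release_field() {
  local file=$1 key=$2
  (
    . "$file" 2>/dev/null || exit 1
    printf '%s\n' "${!key:-}"
  )
}

# Reduce a VERSION_ID such as "9.3" to its major component.
rhel_major_version() {
  local version_id
  version_id=$(os_release_field "$1" VERSION_ID) || return 1
  printf '%s\n' "${version_id%%.*}"
}

# Typical use on a real host (file path per the os-release convention):
#   os_release_field /etc/os-release VERSION_CODENAME   # Ubuntu codename
#   rhel_major_version /etc/os-release                  # RHEL/CentOS major version
```

Taking the file path as an argument keeps the helpers testable against a fixture file instead of the host's own `/etc/os-release`.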
- name: GPU_REQUESTS
  value: {{ if hasKey .Values.env "GPU_REQUESTS" }}{{ .Values.env.GPU_REQUESTS | quote }}{{ else }}"nvidia.com/gpu=1"{{ end }}
- name: GPU_LIMITS
  value: {{ if hasKey .Values.env "GPU_LIMITS" }}{{ .Values.env.GPU_LIMITS | quote }}{{ else }}"nvidia.com/gpu=1"{{ end }}
GPU resources default changed, breaks non-GPU clusters
High Severity
When `env.GPU_REQUESTS` and `env.GPU_LIMITS` are not provided (the default, since `env` is `{}`), the template falls back to `"nvidia.com/gpu=1"` for both. All four legacy charts defaulted to empty strings for these values, meaning no GPU was requested. The unified chart now causes every spawned job to request an NVIDIA GPU by default. On clusters without GPU nodes (which is the common case), spawned jobs will be stuck in `Pending` forever because the scheduler can't satisfy the `nvidia.com/gpu` resource request.
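The fallback in this template hinges on `hasKey`, which distinguishes a missing key from a key set to an empty string. The shell sketch below models that behaviour with a hypothetical `render_gpu_value` helper (not part of the chart): a value that is passed, even an empty one, is rendered as-is, and only a missing value falls back to the default.

```shell
#!/usr/bin/env bash
# Sketch only: models the template's hasKey fallback in shell.
# render_gpu_value DEFAULT [VALUE]
#   - VALUE present (even if empty): print VALUE
#   - VALUE absent:                  print DEFAULT
render_gpu_value() {
  local default=$1
  if (( $# >= 2 )); then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$default"
  fi
}

# Unified chart as reviewed: key absent, so the GPU default applies.
echo "unset key      -> '$(render_gpu_value "nvidia.com/gpu=1")'"
# Key explicitly set to an empty string: the empty value is rendered.
echo "empty override -> '$(render_gpu_value "nvidia.com/gpu=1" "")'"
```

Under that reading, explicitly setting `env.GPU_REQUESTS` and `env.GPU_LIMITS` to empty strings would bypass the `nvidia.com/gpu=1` fallback. Whether the jobs manager then treats an empty value as "no GPU", as the legacy charts did, is an assumption worth verifying.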
- Updated package manager installation commands to include options for handling configuration file changes during upgrades. - Set environment variables to ensure non-interactive installations and manage restart behavior during the setup process.
…onality - Introduced `_merge_kubeconfig` to handle kubeconfig updates and context switching. - Added `_wait_for_api` to ensure the Kubernetes API server is reachable before proceeding. - Streamlined the `create_cluster` function by delegating kubeconfig merging and API readiness checks to the new helper functions.
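A readiness check of this kind is usually a bounded retry loop. The sketch below is an assumption about the general shape, not the PR's `_wait_for_api`: the helper name `wait_until`, the attempt count, and the delay are all hypothetical, and the probe command is passed in so the loop can be exercised without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: retry a probe command until it succeeds or attempts run out.
# wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # The probe's output is discarded; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if (( attempt < max_attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (assumes kubectl is installed and a context is selected):
#   wait_until 30 2 kubectl cluster-info || echo "API server not reachable" >&2
```

Bounding the loop means a cluster that never comes up produces a clear failure instead of a hung installer.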
- Introduced claimRef with name and namespace fields in logs-pvc.yaml, mysql-storage-pvc.yaml, and shared-images-pvc.yaml to enhance the association of PersistentVolumeClaims with their respective resources.
- Revised the README to include a quick installation guide for setting up a local Kubernetes cluster with GPU support using a single command. - Introduced a new `install.sh` script that downloads necessary installation scripts and executes the Kubernetes installation process. - Updated the `install-k8s.sh` script reference in the comments to point to the new bootstrap installer for clarity.
- Modified the `install.sh` script to allow users to specify a branch for downloading the installer, defaulting to 'main'. - Enhanced the download message to indicate the selected branch for better user clarity.
…r dynamic downloads
…ce for dynamic downloads
- Updated the Docker installation process to force install the application if not already present. - Improved waiting mechanism for Docker engine startup with a configurable maximum wait time. - Added user guidance for accepting the Docker license agreement on first launch to prevent errors during setup.
- Introduced a new `install.ps1` script for Windows that downloads and executes the `install-k8s.ps1` script. - Updated `install.sh` to include instructions for Windows users, directing them to use the new PowerShell installer. - Enhanced error handling and retry logic in various scripts to improve robustness during downloads. - Improved help messages and usage instructions across installation scripts for better user guidance.
- Standardized comment formatting and improved clarity in usage instructions. - Enhanced error messages for better user feedback during installation. - Updated logging and banner display for a more consistent user experience. - Refined GPU detection logic and added error handling for missing NVIDIA drivers.
…er guidance - Added WSL update command with error handling to ensure the latest version is used. - Improved feedback messages for setting WSL2 as the default version, including instructions for manual updates if necessary.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
… experience - Implemented asynchronous WSL update with progress feedback to enhance user guidance during installation. - Improved logic for selecting a WSL2 distribution, prioritizing Ubuntu and providing fallback options. - Updated commands to ensure compatibility and clarity in executing WSL commands within the script.
…8s.ps1 - Adjusted console encoding to properly handle WSL output, ensuring accurate retrieval of available distributions. - Enhanced logic for selecting a WSL2 distribution, improving fallback behavior when no Ubuntu distro is found. - Updated command execution for kubectl and helm to ensure compatibility and clearer output handling.
- Added logic to determine the real hardware architecture on macOS, ensuring the correct version of Docker Desktop is installed. - Updated the installation script to download the appropriate Docker DMG based on the detected architecture. - Implemented verification to warn users if the installed Docker binary does not match the hardware architecture.
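One way to detect the real hardware architecture is to prefer a hardware capability flag over `uname -m`, since a shell running under Rosetta reports `x86_64` on Apple Silicon. The sketch below assumes that approach. The function name `resolve_mac_arch` and the `arm64`/`amd64` labels are illustrative assumptions, not the script's actual logic.

```shell
#!/usr/bin/env bash
# Sketch only: decide which Docker Desktop build to fetch.
# resolve_mac_arch UNAME_M [HW_ARM64_FLAG]
#   UNAME_M        output of `uname -m` (may be x86_64 under Rosetta)
#   HW_ARM64_FLAG  "1" when the hardware itself is ARM64
resolve_mac_arch() {
  local uname_m=$1 hw_arm64=${2:-0}
  if [ "$hw_arm64" = "1" ] || [ "$uname_m" = "arm64" ]; then
    echo arm64
  else
    echo amd64
  fi
}

# On a real Mac the inputs could be gathered like this; the sysctl key is
# absent on Intel hardware, hence the fallback to 0:
#   resolve_mac_arch "$(uname -m)" "$(sysctl -n hw.optional.arm64 2>/dev/null || echo 0)"
```

Keeping the decision in a pure function lets the Rosetta case be checked on any machine by passing `x86_64` together with a flag of `1`.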
- Added a spinner utility to provide visual feedback during long-running commands in the installation scripts. - Updated Docker, kubectl, k3d, and helm installation processes to use the new spinner functionality, enhancing user experience by indicating progress. - Refactored existing command execution to improve clarity and maintainability.
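A spinner of this kind typically runs the command in the background and redraws a frame while the process is alive. The following is a minimal sketch under that assumption. `with_spinner` is a hypothetical name, and the frame set and refresh interval are arbitrary choices rather than the PR's implementation.

```shell
#!/usr/bin/env bash
# Sketch only: show a spinner on stderr while a command runs,
# then return that command's exit status.
# with_spinner MESSAGE COMMAND [ARGS...]
with_spinner() {
  local message=$1
  shift
  local frames='|/-\' i=0 status=0

  # Run the command in the background and remember its PID.
  "$@" >/dev/null 2>&1 &
  local pid=$!

  # Redraw one frame per tick for as long as the process exists.
  while kill -0 "$pid" 2>/dev/null; do
    printf '\r%s %s' "${frames:i:1}" "$message" >&2
    i=$(( (i + 1) % ${#frames} ))
    sleep 0.1
  done

  # Collect the exit status without tripping `set -e` in the caller.
  wait "$pid" || status=$?
  printf '\r\033[K' >&2   # clear the spinner line
  return "$status"
}

# Example:
#   with_spinner "Installing helm" sleep 2
```

Writing the frames to stderr keeps stdout clean, and returning the wrapped command's status lets callers keep their existing error handling.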
- Introduced a check for fresh Docker installations, prompting users with setup instructions on first launch. - Improved architecture verification to warn users if the installed Docker binary does not match their hardware. - Adjusted waiting mechanism for Docker engine startup, reducing maximum wait time for improved responsiveness.
- Introduced a new `preflight_sudo` function to warm the credential cache for sudo, preventing interactive prompts during installations. - Integrated the `preflight_sudo` call at the start of both `install_macos` and `install_linux` functions to ensure necessary privileges are obtained before proceeding with installations. - Enhanced user experience by providing clear messaging regarding administrator privileges required for the installation process.
- Introduced a new `download_with_progress` function to provide a visual progress bar during the download of Docker Desktop, enhancing user experience. - Updated the `install_docker_desktop` function to utilize the new download function, replacing the previous download command with a retry mechanism for improved reliability. - The progress bar displays percentage and size information, working seamlessly on both macOS and Linux.
- Improved the waiting mechanism for Docker Desktop to include a visual spinner, providing real-time feedback while waiting for the Docker engine to start. - Added informative messages to guide users if Docker is not responding, ensuring clarity on what to check during the startup process. - Enhanced user experience by replacing static messages with dynamic updates during the waiting period.


Note
Medium Risk
Medium risk because this introduces a new unified Helm chart and significantly changes Kubernetes manifests/values, RBAC, secret naming, and storage behavior; rollout issues could impact existing deployments. The new bootstrap/installer scripts and CI checks are additive but touch developer setup paths.
Overview
Introduces a unified Helm chart (`tracebloc/`) for AKS/EKS/bare-metal/OpenShift, including schema validation, platform-specific CI values, `helm-unittest` tests, and a migration guide.
Hardens and standardizes deployment manifests across charts: consistent Kubernetes labels, dedicated ServiceAccounts, stricter container security contexts, required secret values, PVC `helm.sh/resource-policy: keep`, split MySQL resources (Deployment/ConfigMap/Service) with probes and resource limits, and optional node selectors/resource monitor toggles (plus OpenShift SCC support in the unified chart).
Adds automation and setup tooling: a new GitHub Actions workflow to lint/template/validate the Helm chart (kubeconform + unit tests), plus cross-platform one-command Kubernetes (k3d) + GPU installers (modular bash + Windows PowerShell). Also marks legacy per-platform charts as deprecated and updates `.gitignore`/README accordingly.
Written by Cursor Bugbot for commit 6a966b9. This will update automatically on new commits.