# Bugs in µPaaS

## 1. Potential Race Condition in Log Writing

### Description

In the deployment service, when a deployment fails, the `failDeployment` function calls `writeLogsToFile`, which can run concurrently with the async log writer's flush operations. This can lead to partial or corrupted log files.

### Location

`internal/service/deploy/deploy.go:1169` in the `failDeployment` function

### Proposed Fix

1. Add synchronization so that only one log write operation occurs at a time
2. Modify the `deploymentLogWriter` to track completion status and prevent concurrent writes
3. Add a wait mechanism in `failDeployment` so that any in-flight flush operations complete before logs are written to file

```go
// Add a mutex to deploymentLogWriter
type deploymentLogWriter struct {
	// existing fields...
	mu      sync.Mutex
	writeMu sync.Mutex // Add this for file writing synchronization
	done    chan struct{}
	flushed sync.WaitGroup
}

// In writeLogsToFile, ensure exclusive access
func (svc *Service) writeLogsToFile(app *models.App, deployment *models.Deployment) {
	svc.writeMu.Lock() // Add this mutex to Service struct
	defer svc.writeMu.Unlock()

	// existing code...
}
```

## 2. Incomplete Error Handling in Container Operations

### Description

In the Docker client's `performClone` function, if `createGitContainer` fails, the SSH key file created earlier is not cleaned up, leaving the private key on disk and creating a potential security risk.

### Location

`internal/docker/client.go:597` in the `performClone` function

### Proposed Fix

Add proper cleanup using `defer` immediately after creating the SSH key file:

```go
// After writing the SSH key file (line 578)
err = os.WriteFile(cfg.keyFile, []byte(cfg.sshPrivateKey), sshKeyPermissions)
if err != nil {
	return nil, fmt.Errorf("failed to write SSH key: %w", err)
}
defer func() {
	if removeErr := os.Remove(cfg.keyFile); removeErr != nil {
		c.log.Error("failed to remove SSH key file", "error", removeErr)
	}
}()
```

## 3. Missing Context Cancellation Check During Build

### Description

In the Docker client's `streamBuildOutput` function, long-running Docker build operations may not properly respond to context cancellation, causing deployments to hang even when cancelled.

### Location

`internal/docker/client.go:542` in the `streamBuildOutput` function

### Proposed Fix

Add context checking in the scanner loop:

```go
for scanner.Scan() {
	select {
	case <-ctx.Done():
		return ctx.Err()
	default:
	}
	line := scanner.Bytes()
	// existing code...
}
```
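Note that the per-line check only helps while build output keeps arriving; if the stream stalls, `scanner.Scan` can block with no opportunity to observe the cancelled context. A common complementary pattern is sketched below: close the underlying reader when the context is cancelled so the blocked read returns. The `buildOutput` name is an assumption for whatever `io.ReadCloser` the scanner wraps (for example, the Docker build response body), not a name taken from the actual code.

```go
// Sketch only; buildOutput stands in for the io.ReadCloser the scanner reads from.
stopWatch := make(chan struct{})
defer close(stopWatch)
go func() {
	select {
	case <-ctx.Done():
		buildOutput.Close() // unblocks a scanner.Scan that is waiting for input
	case <-stopWatch: // loop finished normally; stop the watcher goroutine
	}
}()

// ...scanner loop from the fix above...

if scanErr := scanner.Err(); scanErr != nil && ctx.Err() != nil {
	return ctx.Err() // report cancellation rather than the read error caused by Close
}
```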
## 4. Inconsistent Container Removal in Error Cases

### Description

When a deployment fails after the container has been created (for example, when starting it fails), the already-created container is not removed, leaving orphaned containers that consume resources.

### Location

`internal/service/deploy/deploy.go:969` in the `createAndStartContainer` function

### Proposed Fix

Add cleanup of the created container on start failure:

```go
containerID, err := svc.docker.CreateContainer(ctx, containerOpts)
if err != nil {
	svc.notify.NotifyDeployFailed(ctx, app, deployment, err)
	svc.failDeployment(ctx, app, deployment, fmt.Errorf("failed to create container: %w", err))
	return "", fmt.Errorf("failed to create container: %w", err)
}

// Add cleanup defer for error cases
defer func() {
	if err != nil {
		// If we have a container ID but are returning an error, clean it up
		_ = svc.docker.RemoveContainer(context.Background(), containerID, true)
	}
}()

startErr := svc.docker.StartContainer(ctx, containerID)
if startErr != nil {
	svc.notify.NotifyDeployFailed(ctx, app, deployment, startErr)
	svc.failDeployment(ctx, app, deployment, fmt.Errorf("failed to start container: %w", startErr))
	err = startErr // Set err so the deferred cleanup runs
	return "", fmt.Errorf("failed to start container: %w", startErr)
}
```

## 5. Potential Data Race in Active Deployments Tracking

### Description

The `activeDeploys` sync.Map in the deployment service may have race conditions when multiple concurrent deployments access the same app's deployment state: individual map operations are safe, but the cancel-then-acquire sequence is not atomic.

### Location

`internal/service/deploy/deploy.go:226` and related functions

### Proposed Fix

Add proper locking around active deploy operations (a sketch of the `tryLockApp`/`unlockApp` helpers appears at the end of this document):

```go
// Add a mutex for active deploy operations
type Service struct {
	// existing fields...
	activeDeployMu sync.Mutex
}

// In Deploy function
func (svc *Service) Deploy(ctx context.Context, app *models.App, webhookEventID *int64, cancelExisting bool) error {
	svc.activeDeployMu.Lock()
	if cancelExisting {
		svc.cancelActiveDeploy(app.ID)
	}
	// Try to acquire the per-app deployment lock
	if !svc.tryLockApp(app.ID) {
		svc.activeDeployMu.Unlock()
		svc.log.Warn("deployment already in progress", "app", app.Name)
		return ErrDeploymentInProgress
	}
	svc.activeDeployMu.Unlock()
	defer svc.unlockApp(app.ID)

	// rest of function...
}
```

## 6. Incomplete Error Propagation in Git Clone

### Description

In the Docker client's `runGitClone` function, if `ContainerLogs` fails, the error is silently ignored, which can hide important debugging information.

### Location

`internal/docker/client.go:679` in the `runGitClone` function

### Proposed Fix

Handle the `ContainerLogs` error properly:

```go
// Always capture logs for the result
logs, logErr := c.ContainerLogs(ctx, containerID, "100")
if logErr != nil {
	c.log.Warn("failed to get git clone logs", "error", logErr)
	logs = "Failed to retrieve logs: " + logErr.Error()
}
```
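For reference, here is a minimal sketch of the `tryLockApp`/`unlockApp` helpers referenced in fix #5. It assumes `activeDeploys` is a `sync.Map` keyed by app ID; the key type and the stored value are assumptions (the real code presumably stores whatever `cancelActiveDeploy` needs, such as a cancel function, rather than `struct{}{}`).

```go
// Sketch only: acquire and release a per-app deployment slot backed by a sync.Map.
func (svc *Service) tryLockApp(appID int64) bool {
	// LoadOrStore reports loaded == true if another deployment already holds the slot.
	_, alreadyRunning := svc.activeDeploys.LoadOrStore(appID, struct{}{})
	return !alreadyRunning
}

func (svc *Service) unlockApp(appID int64) {
	svc.activeDeploys.Delete(appID)
}
```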