upaas/BUGS.md
sneak 4e6542badf Consolidate database schema into two files: init migrations table and complete schema
Since there is no existing installed base, we can consolidate all migrations into a single complete schema file plus the migrations table initialization. This simplifies the database setup for new installations.
2026-02-16 14:51:33 +01:00

180 lines
5.5 KiB
Markdown

# Bugs in µPaaS
## 1. Potential Race Condition in Log Writing
### Description
In the deployment service, when a deployment fails, the `failDeployment` function calls `writeLogsToFile` which may be called concurrently with the async log writer's flush operations. This could lead to partial or corrupted log files.
### Location
`internal/service/deploy/deploy.go:1169` in `failDeployment` function
### Proposed Fix
1. Add synchronization to ensure only one log write operation occurs at a time
2. Modify the `deploymentLogWriter` to track completion status and prevent concurrent writes
3. Add a wait mechanism in `failDeployment` to ensure any ongoing flush operations complete before writing logs to file
```go
// Add a mutex to deploymentLogWriter
type deploymentLogWriter struct {
// existing fields...
mu sync.Mutex
writeMu sync.Mutex // Add this for file writing synchronization
done chan struct{}
flushed sync.WaitGroup
}
// In writeLogsToFile, ensure exclusive access
func (svc *Service) writeLogsToFile(app *models.App, deployment *models.Deployment) {
svc.writeMu.Lock() // Add this mutex to Service struct
defer svc.writeMu.Unlock()
// existing code...
}
```
## 2. Incomplete Error Handling in Container Operations
### Description
In the Docker client's `performClone` function, if `createGitContainer` fails, the SSH key file created earlier is not cleaned up, causing a potential security risk.
### Location
`internal/docker/client.go:597` in `performClone` function
### Proposed Fix
Add proper cleanup using `defer` immediately after creating the SSH key file:
```go
// After writing SSH key file (line 578)
keyFileCreated := false
err = os.WriteFile(cfg.keyFile, []byte(cfg.sshPrivateKey), sshKeyPermissions)
if err != nil {
return nil, fmt.Errorf("failed to write SSH key: %w", err)
}
keyFileCreated = true
defer func() {
if keyFileCreated {
removeErr := os.Remove(cfg.keyFile)
if removeErr != nil {
c.log.Error("failed to remove SSH key file", "error", removeErr)
}
}
}()
```
## 3. Missing Context Cancellation Check During Build
### Description
In the deployment service's `streamBuildOutput` function, long-running Docker build operations may not properly respond to context cancellation, causing deployments to hang even when cancelled.
### Location
`internal/docker/client.go:542` in `streamBuildOutput` function
### Proposed Fix
Add context checking in the scanner loop:
```go
for scanner.Scan() {
select {
case <-ctx.Done():
return ctx.Err()
default:
}
line := scanner.Bytes()
// existing code...
}
```
## 4. Inconsistent Container Removal in Error Cases
### Description
When deployment fails during container creation, the already-created container is not removed, leading to orphaned containers that consume resources.
### Location
`internal/service/deploy/deploy.go:969` in `createAndStartContainer` function
### Proposed Fix
Add cleanup of created container on start failure:
```go
containerID, err := svc.docker.CreateContainer(ctx, containerOpts)
if err != nil {
svc.notify.NotifyDeployFailed(ctx, app, deployment, err)
svc.failDeployment(ctx, app, deployment, fmt.Errorf("failed to create container: %w", err))
return "", fmt.Errorf("failed to create container: %w", err)
}
// Add cleanup defer for error cases
defer func() {
if err != nil {
// If we have a container ID but returning an error, clean it up
_ = svc.docker.RemoveContainer(context.Background(), containerID, true)
}
}()
startErr := svc.docker.StartContainer(ctx, containerID)
if startErr != nil {
svc.notify.NotifyDeployFailed(ctx, app, deployment, startErr)
svc.failDeployment(ctx, app, deployment, fmt.Errorf("failed to start container: %w", startErr))
err = startErr // Set err so defer cleanup runs
return "", fmt.Errorf("failed to start container: %w", startErr)
}
```
## 5. Potential Data Race in Active Deployments Tracking
### Description
The `activeDeploys` sync.Map in the deployment service may have race conditions when multiple concurrent deployments try to access the same app's deployment state.
### Location
`internal/service/deploy/deploy.go:226` and related functions
### Proposed Fix
Add proper locking around active deploy operations:
```go
// Add a mutex for active deploy operations
type Service struct {
// existing fields...
activeDeployMu sync.Mutex
}
// In Deploy function
func (svc *Service) Deploy(ctx context.Context, app *models.App, webhookEventID *int64, cancelExisting bool) error {
svc.activeDeployMu.Lock()
if cancelExisting {
svc.cancelActiveDeploy(app.ID)
}
// Try to acquire per-app deployment lock
if !svc.tryLockApp(app.ID) {
svc.activeDeployMu.Unlock()
svc.log.Warn("deployment already in progress", "app", app.Name)
return ErrDeploymentInProgress
}
svc.activeDeployMu.Unlock()
defer svc.unlockApp(app.ID)
// rest of function...
}
```
## 6. Incomplete Error Propagation in Git Clone
### Description
In the Docker client's `runGitClone` function, if `ContainerLogs` fails, the error is silently ignored, which could hide important debugging information.
### Location
`internal/docker/client.go:679` in `runGitClone` function
### Proposed Fix
Handle the ContainerLogs error properly:
```go
// Always capture logs for the result
logs, logErr := c.ContainerLogs(ctx, containerID, "100")
if logErr != nil {
c.log.Warn("failed to get git clone logs", "error", logErr)
logs = "Failed to retrieve logs: " + logErr.Error()
}
```