156 changes: 156 additions & 0 deletions README.md
@@ -239,6 +239,13 @@ client_id = "your-client-id"
[account]
admin_email = "admin@example.com"
admin_password = "your-password"

# mTLS for Manager → Decision Maker communication (optional, default: disabled)
[mtls]
enable = false
cert_pem = "..." # Manager's client certificate (signed by private CA)
key_pem = "..." # Manager's client private key
ca_pem = "..." # Private CA certificate (to verify Decision Maker's server cert)
```

#### Decision Maker Configuration (`config/dm_config.toml`)
@@ -252,6 +259,13 @@ level = "info"
[token]
rsa_private_key_pem = "..."
token_duration_hr = 24

# mTLS server for Manager → Decision Maker communication (optional, default: disabled)
[mtls]
enable = false
cert_pem = "..." # Decision Maker's server certificate (signed by private CA)
key_pem = "..." # Decision Maker's server private key
ca_pem = "..." # Private CA certificate (to verify Manager's client cert)
```

### 3. Start Services
@@ -276,6 +290,148 @@ go run main.go decisionmaker -c dm_config -d /path/to/config

Please refer to https://github.com/Gthulhu/chart?tab=readme-ov-file#testing for testing the API endpoints using curl.

## mTLS Setup: Manager ↔ Decision Maker

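When `enable = true` is set in the `[mtls]` section on both sides, the Manager communicates with every Decision Maker node over **mutual TLS (mTLS)**. Both sides authenticate each other with certificates signed by a shared **private CA**, so the Decision Maker accepts neither plain-text traffic nor clients without a CA-signed certificate. mTLS is optional and disabled by default.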

> **Note**: The Manager's external HTTP API (web GUI, `/api/v1/…`) intentionally remains plain HTTP. In a production cluster this endpoint is typically exposed through a Kubernetes Ingress with TLS termination.

### Why mTLS?

Scheduling decisions affect the Linux kernel scheduler on every node. A compromised connection between the Manager and a Decision Maker could allow an attacker to manipulate per-process CPU priorities. mTLS provides:

- **Server authentication** – the Manager verifies it is talking to a genuine Decision Maker.
- **Client authentication** – the Decision Maker verifies only the authorised Manager can push intents.
- **Encrypted channel** – all scheduling intents are protected in transit.
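
On the Manager side, the three properties above map onto a client `tls.Config` roughly as sketched below. The Manager's actual client code is not shown in this section, so `newMTLSClient` and the 10-second timeout are hypothetical; only the standard-library calls are real.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// newMTLSClient is a hypothetical helper (not taken from the project).
// It builds an HTTP client that presents certPEM/keyPEM to the server
// (client authentication) and trusts only servers signed by caPEM
// (server authentication).
func newMTLSClient(certPEM, keyPEM, caPEM []byte) (*http.Client, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("load client certificate: %w", err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in CA PEM")
	}
	return &http.Client{
		Timeout: 10 * time.Second, // assumed value, tune as needed
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert},
				RootCAs:      roots,
				MinVersion:   tls.VersionTLS12,
			},
		},
	}, nil
}

func main() {
	// With empty input the helper fails fast instead of silently
	// falling back to an unauthenticated client.
	if _, err := newMTLSClient(nil, nil, nil); err != nil {
		fmt.Println("client not created:", err)
	}
}
```

Setting `RootCAs` to the private CA only (rather than the system pool) means a certificate from a public CA would not be accepted for a Decision Maker.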

### Step-by-step: Generate certificates with a private CA

The commands below use only the OpenSSL CLI and assume a Bash shell, because the `-extfile <(...)` arguments rely on process substitution. Replace `<DM_IP>` with the actual IP or hostname of each Decision Maker node.

#### 1. Create the private CA

```bash
# Generate CA private key (EC P-256 recommended; RSA-4096 also works)
openssl ecparam -name prime256v1 -genkey -noout -out ca.key

# Self-signed CA certificate (10-year validity)
openssl req -new -x509 -days 3650 \
-key ca.key \
-out ca.crt \
-subj "/CN=Gthulhu-Private-CA"
```

#### 2. Generate the Manager client certificate

```bash
# Manager private key
openssl ecparam -name prime256v1 -genkey -noout -out manager.key

# Certificate signing request
openssl req -new \
-key manager.key \
-out manager.csr \
-subj "/CN=gthulhu-manager"

# Sign with the private CA (2-year validity, client-auth EKU)
openssl x509 -req -days 730 \
-in manager.csr \
-CA ca.crt -CAkey ca.key -CAcreateserial \
-extfile <(printf "extendedKeyUsage=clientAuth") \
-out manager.crt
```

#### 3. Generate a Decision Maker server certificate

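Repeat for each DM node. The `subjectAltName` must match the address the Manager uses to reach that node: write `IP:<address>` for an IP address (as in the command below) or `DNS:<hostname>` for a hostname.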

```bash
# Decision Maker private key
openssl ecparam -name prime256v1 -genkey -noout -out dm.key

# CSR
openssl req -new \
-key dm.key \
-out dm.csr \
-subj "/CN=gthulhu-decisionmaker"

# Sign with the private CA (2-year validity, server-auth + client-auth EKUs + SAN)
openssl x509 -req -days 730 \
-in dm.csr \
-CA ca.crt -CAkey ca.key -CAcreateserial \
-extfile <(printf "subjectAltName=IP:<DM_IP>\nextendedKeyUsage=serverAuth,clientAuth") \
-out dm.crt
```
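
Before pasting the files into the configuration, it can be worth checking that each leaf certificate chains to the private CA and carries the intended EKU. The sketch below does this with Go's `crypto/x509`; `verifyLeaf` and the command-line layout are hypothetical. It does not check the SAN against a host name or IP.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// verifyLeaf (hypothetical helper) reports whether the first certificate
// in leafPEM chains to a CA in caPEM and permits the given usage.
func verifyLeaf(caPEM, leafPEM []byte, usage x509.ExtKeyUsage) error {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return errors.New("no CA certificates found in CA PEM")
	}
	block, _ := pem.Decode(leafPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return errors.New("leaf PEM does not contain a certificate")
	}
	leaf, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return fmt.Errorf("parse leaf certificate: %w", err)
	}
	_, err = leaf.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{usage},
	})
	return err
}

func main() {
	if len(os.Args) != 4 {
		fmt.Println("usage: verifyleaf <ca.crt> <leaf.crt> <client|server>")
		return
	}
	caPEM, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("read CA:", err)
		return
	}
	leafPEM, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Println("read leaf:", err)
		return
	}
	usage := x509.ExtKeyUsageServerAuth
	if os.Args[3] == "client" {
		usage = x509.ExtKeyUsageClientAuth
	}
	if err := verifyLeaf(caPEM, leafPEM, usage); err != nil {
		fmt.Println("verification failed:", err)
		return
	}
	fmt.Println("certificate chains to the CA and permits the requested usage")
}
```

For example, `go run . ca.crt dm.crt server` checks the Decision Maker certificate and `go run . ca.crt manager.crt client` checks the Manager certificate.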

#### 4. Embed certificates in configuration

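Paste the contents of each PEM file into the respective config file as a TOML multi-line string. The `-----BEGIN …-----` and `-----END …-----` lines are part of each file and must appear exactly once per value.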

**`config/manager_config.toml`**

```toml
[mtls]
enable = true
cert_pem = """
-----BEGIN CERTIFICATE-----
<contents of manager.crt>
-----END CERTIFICATE-----
"""
key_pem = """
-----BEGIN EC PRIVATE KEY-----
<contents of manager.key>
-----END EC PRIVATE KEY-----
"""
ca_pem = """
-----BEGIN CERTIFICATE-----
<contents of ca.crt>
-----END CERTIFICATE-----
"""
```

**`config/dm_config.toml`**

```toml
[mtls]
enable = true
cert_pem = """
-----BEGIN CERTIFICATE-----
<contents of dm.crt>
-----END CERTIFICATE-----
"""
key_pem = """
-----BEGIN EC PRIVATE KEY-----
<contents of dm.key>
-----END EC PRIVATE KEY-----
"""
ca_pem = """
-----BEGIN CERTIFICATE-----
<contents of ca.crt>
-----END CERTIFICATE-----
"""
```

#### 5. Verify

Start both services and confirm the Decision Maker log contains a line like the following (the port reflects the Decision Maker's configured listen address):

```
starting dm server with mTLS on port :8080
```

The Manager log should then show successful intent reconciliation without TLS errors.

### Kubernetes: mounting certificates as Secrets

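In a production deployment, store the PEM content in Kubernetes Secrets, expose it to the pods as environment variables or mounted files, and reference it from the TOML config. The example below creates the Secret for the Manager; each Decision Maker needs an equivalent Secret containing `ca.crt`, `dm.crt`, and `dm.key`.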

```bash
kubectl create secret generic gthulhu-mtls-certs \
--from-file=ca.crt \
--from-file=manager.crt \
--from-file=manager.key
```
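
If the Secret is mounted as files, the PEM material can be read at start-up along the lines of the sketch below. How the project wires mounted files into its TOML config is not shown here, so `loadMTLSFromDir`, `mtlsMaterial`, and the mount path `/etc/gthulhu/mtls` are all hypothetical; the file names match the Secret keys created above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// mtlsMaterial is a hypothetical container for the three PEM blobs.
type mtlsMaterial struct {
	CertPEM []byte
	KeyPEM  []byte
	CAPEM   []byte
}

// loadMTLSFromDir reads manager.crt, manager.key and ca.crt from dir,
// the directory where the Secret is assumed to be mounted.
func loadMTLSFromDir(dir string) (mtlsMaterial, error) {
	var m mtlsMaterial
	files := []struct {
		name string
		dst  *[]byte
	}{
		{"manager.crt", &m.CertPEM},
		{"manager.key", &m.KeyPEM},
		{"ca.crt", &m.CAPEM},
	}
	for _, f := range files {
		data, err := os.ReadFile(filepath.Join(dir, f.name))
		if err != nil {
			return mtlsMaterial{}, fmt.Errorf("read %s: %w", f.name, err)
		}
		if len(data) == 0 {
			return mtlsMaterial{}, fmt.Errorf("%s is empty", f.name)
		}
		*f.dst = data
	}
	return m, nil
}

func main() {
	dir := "/etc/gthulhu/mtls" // hypothetical mount path
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	m, err := loadMTLSFromDir(dir)
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Printf("loaded cert (%d bytes), key (%d bytes), CA (%d bytes)\n",
		len(m.CertPEM), len(m.KeyPEM), len(m.CAPEM))
}
```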

## Kubernetes Deployment

### Deployment Architecture
20 changes: 19 additions & 1 deletion config/dm_config.default.toml
@@ -61,4 +61,22 @@ spxkAVwlY6g1ZER7IFXlzhz6wYuDBayRhA/2zBPgtGesfpTd7H24AlJ6qB+mThHs
X6m7Mp9nAMhRyXhULslO3trWFbFCa2dbQkDSyBRvsb2HZtztoLVyo1mtUg==
-----END RSA PRIVATE KEY-----
"""
token_duration_hr = 24

[mtls]
enable = false
cert_pem = """
-----BEGIN CERTIFICATE-----
YOUR_DM_CERTIFICATE_HERE
-----END CERTIFICATE-----
"""
key_pem = """
-----BEGIN EC PRIVATE KEY-----
YOUR_DM_PRIVATE_KEY_HERE
-----END EC PRIVATE KEY-----
"""
ca_pem = """
-----BEGIN CERTIFICATE-----
YOUR_CA_CERTIFICATE_HERE
-----END CERTIFICATE-----
"""
1 change: 1 addition & 0 deletions config/dm_config.go
@@ -10,6 +10,7 @@ type DecisionMakerConfig struct {
	Server  ServerConfig  `mapstructure:"server"`
	Logging LoggingConfig `mapstructure:"logging"`
	Token   TokenConfig   `mapstructure:"token"`
	MTLS    MTLSConfig    `mapstructure:"mtls"`
}

var (
20 changes: 19 additions & 1 deletion config/manager_config.default.toml
@@ -98,4 +98,22 @@ admin_password = "your-password-here"

[k8s]
kube_config_path = "/path/to/kubeconfig"
in_cluster = false

[mtls]
enable = false
cert_pem = """
-----BEGIN CERTIFICATE-----
YOUR_MANAGER_CERTIFICATE_HERE
-----END CERTIFICATE-----
"""
key_pem = """
-----BEGIN EC PRIVATE KEY-----
YOUR_MANAGER_PRIVATE_KEY_HERE
-----END EC PRIVATE KEY-----
"""
ca_pem = """
-----BEGIN CERTIFICATE-----
YOUR_CA_CERTIFICATE_HERE
-----END CERTIFICATE-----
"""
11 changes: 11 additions & 0 deletions config/manager_config.go
@@ -36,6 +36,17 @@ type ManageConfig struct {
	Key     KeyConfig     `mapstructure:"key"`
	Account AccountConfig `mapstructure:"account"`
	K8S     K8SConfig     `mapstructure:"k8s"`
	MTLS    MTLSConfig    `mapstructure:"mtls"`
}

// MTLSConfig holds the mutual TLS configuration used for Manager ↔ Decision Maker communication.
// CertPem and KeyPem are the service's own certificate/key pair signed by the private CA.
// CAPem is the private CA certificate used to verify the peer's certificate.
type MTLSConfig struct {
	Enable  bool        `mapstructure:"enable"`
	CertPem SecretValue `mapstructure:"cert_pem"`
	KeyPem  SecretValue `mapstructure:"key_pem"`
	CAPem   SecretValue `mapstructure:"ca_pem"`
}
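
A configuration like this could be validated once at start-up so that a placeholder or missing PEM value is reported before any connection is attempted. The sketch below is not part of the change: `mtlsSettings` is a hypothetical stand-in for `MTLSConfig` that uses plain strings, since the `SecretValue` type is not shown here.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// mtlsSettings is a hypothetical stand-in for config.MTLSConfig.
type mtlsSettings struct {
	Enable  bool
	CertPEM string
	KeyPEM  string
	CAPEM   string
}

// validate returns nil when mTLS is disabled, and otherwise checks that
// all three values are present and parse as a key pair and a CA bundle.
func (s mtlsSettings) validate() error {
	if !s.Enable {
		return nil
	}
	if s.CertPEM == "" || s.KeyPEM == "" || s.CAPEM == "" {
		return errors.New("mtls: cert_pem, key_pem and ca_pem are all required when enable = true")
	}
	if _, err := tls.X509KeyPair([]byte(s.CertPEM), []byte(s.KeyPEM)); err != nil {
		return fmt.Errorf("mtls: certificate/key pair: %w", err)
	}
	if !x509.NewCertPool().AppendCertsFromPEM([]byte(s.CAPEM)) {
		return errors.New("mtls: ca_pem contains no parsable certificate")
	}
	return nil
}

func main() {
	// The default templates ship with placeholder text between the
	// BEGIN/END lines; enabling mTLS without replacing it should fail.
	placeholder := mtlsSettings{
		Enable:  true,
		CertPEM: "-----BEGIN CERTIFICATE-----\nYOUR_DM_CERTIFICATE_HERE\n-----END CERTIFICATE-----\n",
		KeyPEM:  "-----BEGIN EC PRIVATE KEY-----\nYOUR_DM_PRIVATE_KEY_HERE\n-----END EC PRIVATE KEY-----\n",
		CAPEM:   "-----BEGIN CERTIFICATE-----\nYOUR_CA_CERTIFICATE_HERE\n-----END CERTIFICATE-----\n",
	}
	fmt.Println("disabled config, error:", mtlsSettings{}.validate())
	fmt.Println("placeholder config rejected:", placeholder.validate() != nil)
}
```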

type MongoDBConfig struct {
3 changes: 3 additions & 0 deletions decisionmaker/app/module.go
@@ -19,6 +19,9 @@ func ConfigModule(cfg config.DecisionMakerConfig) (fx.Option, error) {
		fx.Provide(func(dmCfg config.DecisionMakerConfig) config.TokenConfig {
			return dmCfg.Token
		}),
		fx.Provide(func(dmCfg config.DecisionMakerConfig) config.MTLSConfig {
			return dmCfg.MTLS
		}),
	), nil
}

59 changes: 52 additions & 7 deletions decisionmaker/app/rest_app.go
@@ -2,6 +2,10 @@ package app

 import (
 	"context"
+	"crypto/tls"
+	"crypto/x509"
+	"fmt"
+	"net"

 	"github.com/Gthulhu/api/config"
 	"github.com/Gthulhu/api/decisionmaker/rest"
@@ -32,11 +36,11 @@ func NewRestApp(configName string, configDirPath string) (*fx.App, error) {
 	return app, nil
 }

-func StartRestApp(lc fx.Lifecycle, cfg config.ServerConfig, handler *rest.Handler) error {
+func StartRestApp(lc fx.Lifecycle, cfg config.ServerConfig, mtlsCfg config.MTLSConfig, handler *rest.Handler) error {
 	engine := echo.New()
-	handler.SetupRoutes(engine)
-
-	// TODO: setup middleware, logging, etc.
+	if err := handler.SetupRoutes(engine); err != nil {
+		return err
+	}

 	lc.Append(fx.Hook{
 		OnStart: func(ctx context.Context) error {
@@ -45,9 +49,15 @@ func StartRestApp(lc fx.Lifecycle, cfg config.ServerConfig, handler *rest.Handle
 				serverHost = ":8082"
 			}
 			go func() {
-				logger.Logger(ctx).Info().Msgf("starting dm server on port %s", serverHost)
-				if err := engine.Start(serverHost); err != nil {
-					logger.Logger(ctx).Fatal().Err(err).Msgf("start rest server fail on port %s", serverHost)
+				if mtlsCfg.Enable {
+					if err := startTLSServer(ctx, engine, serverHost, mtlsCfg); err != nil {
+						logger.Logger(ctx).Fatal().Err(err).Msgf("start dm rest server with mTLS fail on port %s", serverHost)
+					}
+				} else {
+					logger.Logger(ctx).Info().Msgf("starting dm server on port %s", serverHost)
+					if err := engine.Start(serverHost); err != nil {
+						logger.Logger(ctx).Fatal().Err(err).Msgf("start rest server fail on port %s", serverHost)
+					}
 				}
 			}()
 			return nil
@@ -60,3 +70,38 @@ func StartRestApp(lc fx.Lifecycle, cfg config.ServerConfig, handler *rest.Handle

 	return nil
 }
+
+// startTLSServer starts the Echo server with mTLS: the server presents its own certificate and
+// requires the connecting client (Manager) to present a certificate signed by the shared CA.
+func startTLSServer(ctx context.Context, engine *echo.Echo, addr string, mtlsCfg config.MTLSConfig) error {
+	cert, err := tls.X509KeyPair([]byte(mtlsCfg.CertPem.Value()), []byte(mtlsCfg.KeyPem.Value()))
+	if err != nil {
+		return fmt.Errorf("load mTLS server certificate: %w", err)
+	}
+
+	caPool := x509.NewCertPool()
+	caPEM := mtlsCfg.CAPem.Value()
+	if caPEM == "" {
+		return fmt.Errorf("mTLS server CA PEM is empty; cannot configure client certificate validation")
+	}
+	if !caPool.AppendCertsFromPEM([]byte(caPEM)) {
+		return fmt.Errorf("no CA certificates found in mTLS server CA PEM; failed to parse CA bundle")
+	}
+
+	tlsCfg := &tls.Config{
+		Certificates: []tls.Certificate{cert},
+		ClientAuth:   tls.RequireAndVerifyClientCert,
+		ClientCAs:    caPool,
+		MinVersion:   tls.VersionTLS12,
+	}
+
+	ln, err := net.Listen("tcp", addr)
+	if err != nil {
+		return fmt.Errorf("create listener: %w", err)
+	}
+	tlsListener := tls.NewListener(ln, tlsCfg)
+	engine.Listener = tlsListener
+
+	logger.Logger(ctx).Info().Msgf("starting dm server with mTLS on port %s", addr)
+	return engine.Start("")
+}
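
The effect of `tls.RequireAndVerifyClientCert` in the server configuration can be reproduced in-process. The sketch below is illustrative only: it uses the standard library's `net/http/httptest` rather than Echo, generates a throwaway CA and leaf certificates in memory, and all helper names (`issue`, `get`, `demo`) are hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// issue creates a P-256 key and a one-hour certificate valid for loopback.
// With a nil parent the certificate is a self-signed CA. Panics on error (demo only).
func issue(cn string, serial int64, parent *x509.Certificate, parentKey *ecdsa.PrivateKey, eku ...x509.ExtKeyUsage) (*x509.Certificate, *ecdsa.PrivateKey) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           eku,
		IPAddresses:           []net.IP{net.IPv4(127, 0, 0, 1), net.IPv6loopback},
		BasicConstraintsValid: true,
	}
	if parent == nil {
		tmpl.IsCA = true
		tmpl.KeyUsage = x509.KeyUsageCertSign
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert, key
}

// get performs one GET with the given client TLS settings.
func get(url string, cfg *tls.Config) error {
	client := &http.Client{Timeout: 5 * time.Second, Transport: &http.Transport{TLSClientConfig: cfg}}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// demo starts an mTLS test server and returns the request error observed
// with and without a client certificate.
func demo() (withCert, withoutCert error) {
	ca, caKey := issue("demo-ca", 1, nil, nil)
	srvCert, srvKey := issue("demo-dm", 2, ca, caKey, x509.ExtKeyUsageServerAuth)
	cliCert, cliKey := issue("demo-manager", 3, ca, caKey, x509.ExtKeyUsageClientAuth)
	pool := x509.NewCertPool()
	pool.AddCert(ca)

	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	srv.TLS = &tls.Config{
		Certificates: []tls.Certificate{{Certificate: [][]byte{srvCert.Raw}, PrivateKey: srvKey}},
		ClientAuth:   tls.RequireAndVerifyClientCert,
		ClientCAs:    pool,
		MinVersion:   tls.VersionTLS12,
	}
	srv.StartTLS()
	defer srv.Close()

	withCert = get(srv.URL, &tls.Config{
		RootCAs:      pool,
		Certificates: []tls.Certificate{{Certificate: [][]byte{cliCert.Raw}, PrivateKey: cliKey}},
	})
	withoutCert = get(srv.URL, &tls.Config{RootCAs: pool})
	return withCert, withoutCert
}

func main() {
	withCert, withoutCert := demo()
	fmt.Println("request with client certificate, error:", withCert)
	fmt.Println("request without client certificate, error:", withoutCert)
}
```

The request that presents a CA-signed client certificate should complete, while the one without a certificate should fail during or right after the TLS handshake.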
3 changes: 3 additions & 0 deletions manager/app/module.go
@@ -33,6 +33,9 @@ func ConfigModule(cfg config.ManageConfig) (fx.Option, error) {
		fx.Provide(func(managerCfg config.ManageConfig) config.K8SConfig {
			return managerCfg.K8S
		}),
		fx.Provide(func(managerCfg config.ManageConfig) config.MTLSConfig {
			return managerCfg.MTLS
		}),
	), nil
}
