Skip to content

Fix crash upon reading empty config for notifier#4979

Open
SoloJacobs wants to merge 1 commit intoprometheus:mainfrom
SoloJacobs:fix-fuzz-findings
Open

Fix crash upon reading empty config for notifier#4979
SoloJacobs wants to merge 1 commit intoprometheus:mainfrom
SoloJacobs:fix-fuzz-findings

Conversation

@SoloJacobs
Copy link
Contributor

Previously, alertmanager and amtool would dereference a null pointer upon encountering a configuration such as

receivers:
- name: "example"
  webhook_configs:
  -

Here is an example traceback.

$ ./amtool check-config config/testdata/conf.crash.yml
Checking 'config/testdata/conf.crash.yml'panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xba8d25]

goroutine 1 [running]:
gopkg.in/yaml%2ev2.handleErr(0xc000239a28)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:249 +0x68
panic({0xea0000?, 0x19a8260?})
	/usr/local/go/src/runtime/panic.go:783 +0x132
github.com/prometheus/alertmanager/config.(*Config).UnmarshalYAML(0xc0004281e0, 0x773fed0393b0?)
	/home/solomonjacobs/get/f-alertmanager/config/config.go:523 +0xce5
gopkg.in/yaml%2ev2.(*decoder).callUnmarshaler(0xc0002c71a0, 0xc00042a070, {0x773fed0393b0, 0xc0004281e0})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:270 +0x94
gopkg.in/yaml%2ev2.(*decoder).prepare(0xc0002c71a0, 0xc00042a070, {0xfb4200?, 0xc0004281e0?, 0xc000072e08?})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:313 +0x256
gopkg.in/yaml%2ev2.(*decoder).unmarshal(0xc0002c71a0, 0xc00042a070, {0xfb4200?, 0xc0004281e0?, 0xaa506b?})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:364 +0x13b
gopkg.in/yaml%2ev2.(*decoder).document(...)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:384
gopkg.in/yaml%2ev2.(*decoder).unmarshal(0xea7a00?, 0xc0004281e0?, {0xfb4200?, 0xc0004281e0?, 0x46467e?})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:360 +0x110
gopkg.in/yaml%2ev2.unmarshal({0xc0003ea800, 0x34, 0x40}, {0xea7a00, 0xc0004281e0}, 0x1)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:148 +0x371
gopkg.in/yaml%2ev2.UnmarshalStrict(...)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:89
github.com/prometheus/alertmanager/config.Load({0xc0003ea7c0, 0x34})
	/home/solomonjacobs/get/f-alertmanager/config/config.go:287 +0x55
github.com/prometheus/alertmanager/config.LoadFile({0x7fff2ce3c111, 0x1e})
	/home/solomonjacobs/get/f-alertmanager/config/config.go:313 +0x37
github.com/prometheus/alertmanager/cli.CheckConfig({0xc0003c1e80?, 0xc000239ca8?, 0x419c1f?})
	/home/solomonjacobs/get/f-alertmanager/cli/check_config.go:67 +0x1d4
github.com/prometheus/alertmanager/cli.(*checkConfigCmd).checkConfig(...)
	/home/solomonjacobs/get/f-alertmanager/cli/check_config.go:48
github.com/alecthomas/kingpin/v2.(*actionMixin).applyActions(...)
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/actions.go:28
github.com/alecthomas/kingpin/v2.(*Application).applyActions(0xc0001a4e00?, 0xc0002bc7e0)
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:568 +0xd0
github.com/alecthomas/kingpin/v2.(*Application).execute(0xc0001a4e00, 0xc0002bc7e0, {0xc0003c1e50, 0x1, 0x1})
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:398 +0x65
github.com/alecthomas/kingpin/v2.(*Application).Parse(0xc0001a4e00, {0xc000034370?, 0xc000034370?, 0xc00040ea80?})
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:230 +0x13f
github.com/prometheus/alertmanager/cli.Execute()
	/home/solomonjacobs/get/f-alertmanager/cli/root.go:184 +0x12c9
main.main()
	/home/solomonjacobs/get/f-alertmanager/cmd/amtool/main.go:19 +0xf

Previously, `alertmanager` and `amtool` would dereference a null
pointer upon encountering a configuration such as:

```yaml
receivers:
- name: "example"
  webhook_configs:
  -
```

Here is an example traceback:

```sh
$ ./amtool check-config config/testdata/conf.crash.yml
Checking 'config/testdata/conf.crash.yml'panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xba8d25]

goroutine 1 [running]:
gopkg.in/yaml%2ev2.handleErr(0xc000239a28)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:249 +0x68
panic({0xea0000?, 0x19a8260?})
	/usr/local/go/src/runtime/panic.go:783 +0x132
github.com/prometheus/alertmanager/config.(*Config).UnmarshalYAML(0xc0004281e0, 0x773fed0393b0?)
	/home/solomonjacobs/get/f-alertmanager/config/config.go:523 +0xce5
gopkg.in/yaml%2ev2.(*decoder).callUnmarshaler(0xc0002c71a0, 0xc00042a070, {0x773fed0393b0, 0xc0004281e0})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:270 +0x94
gopkg.in/yaml%2ev2.(*decoder).prepare(0xc0002c71a0, 0xc00042a070, {0xfb4200?, 0xc0004281e0?, 0xc000072e08?})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:313 +0x256
gopkg.in/yaml%2ev2.(*decoder).unmarshal(0xc0002c71a0, 0xc00042a070, {0xfb4200?, 0xc0004281e0?, 0xaa506b?})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:364 +0x13b
gopkg.in/yaml%2ev2.(*decoder).document(...)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:384
gopkg.in/yaml%2ev2.(*decoder).unmarshal(0xea7a00?, 0xc0004281e0?, {0xfb4200?, 0xc0004281e0?, 0x46467e?})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:360 +0x110
gopkg.in/yaml%2ev2.unmarshal({0xc0003ea800, 0x34, 0x40}, {0xea7a00, 0xc0004281e0}, 0x1)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:148 +0x371
gopkg.in/yaml%2ev2.UnmarshalStrict(...)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:89
github.com/prometheus/alertmanager/config.Load({0xc0003ea7c0, 0x34})
	/home/solomonjacobs/get/f-alertmanager/config/config.go:287 +0x55
github.com/prometheus/alertmanager/config.LoadFile({0x7fff2ce3c111, 0x1e})
	/home/solomonjacobs/get/f-alertmanager/config/config.go:313 +0x37
github.com/prometheus/alertmanager/cli.CheckConfig({0xc0003c1e80?, 0xc000239ca8?, 0x419c1f?})
	/home/solomonjacobs/get/f-alertmanager/cli/check_config.go:67 +0x1d4
github.com/prometheus/alertmanager/cli.(*checkConfigCmd).checkConfig(...)
	/home/solomonjacobs/get/f-alertmanager/cli/check_config.go:48
github.com/alecthomas/kingpin/v2.(*actionMixin).applyActions(...)
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/actions.go:28
github.com/alecthomas/kingpin/v2.(*Application).applyActions(0xc0001a4e00?, 0xc0002bc7e0)
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:568 +0xd0
github.com/alecthomas/kingpin/v2.(*Application).execute(0xc0001a4e00, 0xc0002bc7e0, {0xc0003c1e50, 0x1, 0x1})
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:398 +0x65
github.com/alecthomas/kingpin/v2.(*Application).Parse(0xc0001a4e00, {0xc000034370?, 0xc000034370?, 0xc00040ea80?})
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:230 +0x13f
github.com/prometheus/alertmanager/cli.Execute()
	/home/solomonjacobs/get/f-alertmanager/cli/root.go:184 +0x12c9
main.main()
	/home/solomonjacobs/get/f-alertmanager/cmd/amtool/main.go:19 +0xf
```

Related-to: prometheus#4912
Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>
for _, wh := range rcv.WebhookConfigs {
if wh == nil {
wh = &WebhookConfig{}
return wh.UnmarshalYAML(func(any) error { return nil })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to call wh.UnmarshalYAML() and return?
Doesn't this mean we simply stop processing the rest of the config?

Also since most of these fields are required, should we just log an error and exit the process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to call wh.UnmarshalYAML() and return?

The primary motivation for me was to preserve the errors, that it would have encountered, if it did parse an empty config. A different way to accomplish this is by defining an error, and share it between the UnmarshalYAML and this call site.

Doesn't this mean we simply stop processing the rest of the config?

For this specific notifier, this call always terminates here. In the case of some of the error is only exposed later.

Also since most of these fields are required, should we just log an error and exit the process?

We should definitely just return the error: If we exit the process, this might break the reloading mechanism. On initial startup, the process will log a message and then exit. I will not touch the error handling in that function in this PR, although I do have some cleanups in mind. I imagine you meant something different, tough.

I hope that clears up your questions. I would then propose the following: Have a validation function per WebhookConfig, EmailConfig etc. This will be source of truth, rather then logic being scattered across (c *Config) UnmarshalYAML , (c *WebhookConfig) UnmarshalYAML. @siavashs do you like this proposal?
@TheMeier Would such a refactoring interfere with your current work?

In the future, I would like to

  • move the validation out of the UnmarshalYAML,
  • use a parse, don't validate approach,
  • have consistent error messages, if required fields are missing.

But that is for different PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I thought about validation functions per notifier-config too, would be a nice addtion. I don't think it would interfer too much wih my experiment. But maybe it might be worthwhile to wait a bit more and see if a path emerges for splitting out notifier-specific code before adding more per-notifier code.

Copy link
Contributor

@siavashs siavashs Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary motivation for me was to preserve the errors, that it would have encountered, if it did parse an empty config. A different way to accomplish this is by defining an error, and share it between the UnmarshalYAML and this call site.

We must use either an existing error type or create a custom one without needing to make unnecessary method calls to produce an error.
For example you can construct TypeErrors, or just a plain err.Error which says what failed.

For this specific notifier, this call always terminates here. In the case of some of the error is only exposed later.

There are following code blocks which need to process the rest of the config, an early return here means that we never process those.
I think we need to gather all errors and return them to the caller, then the caller decides what to do with them.
Otherwise the user will become aware of the errors one by one, forcing them to go through iterations rather than receiving a full validation result from the config.

I would then propose the following: Have a validation function per WebhookConfig, EmailConfig etc. This will be source of truth, rather then logic being scattered across (c *Config) UnmarshalYAML , (c *WebhookConfig) UnmarshalYAML.

During the maintainers meet up we discussed about breaking up the config file into multiple parts and having independent load and validations per part, for example:

  • if you have 10 receivers and you add a new broken one we just skip loading that and show an error in the logs, ui, api, etc.

This should be done both during startup and reload, which should help Alertmanager to start and let the user fix the broken part of the config rather than having complete downtime.

Reloads are similarly handled, in a scenario when multiple users inject their receiver configs, a broken new receiver should not block the receiver added by another user to load, etc.

So I like the idea of decoupling the config load mechanism and also braking up the config file into smaller pieces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are following code blocks which need to process the rest of the config, an early return here means that we never process those.

Yes, but this behavior was already present, it is not introduced in this change. We can collect errors, but that is a much larger change, which affects the control in many places. For now, all this change should do is change crashes into error messages, the same way it is done everywhere else. I will do follow-up with other suggestions of yours.

I created an issue to track the other findings, which we have discussed here. @TheMeier @siavashs Let me know what you think: #4989

Copy link
Contributor

@siavashs siavashs Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a missing <section> config error and return that instead of calling unmarshal on empty config.
Otherwise LGTM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to delay error collection to a later point in time. Creating a missing <section> config error also sounds good to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants