Fix crash upon reading empty config for notifier by SoloJacobs · Pull Request #4979 · prometheus/alertmanager

SoloJacobs · 2026-02-08T18:04:45Z

Previously, alertmanager and amtool would dereference a null pointer upon encountering a configuration such as

receivers:
- name: "example"
  webhook_configs:
  -

Here is an example traceback.

$ ./amtool check-config config/testdata/conf.crash.yml
Checking 'config/testdata/conf.crash.yml'panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xba8d25]

goroutine 1 [running]:
gopkg.in/yaml%2ev2.handleErr(0xc000239a28)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:249 +0x68
panic({0xea0000?, 0x19a8260?})
	/usr/local/go/src/runtime/panic.go:783 +0x132
github.com/prometheus/alertmanager/config.(*Config).UnmarshalYAML(0xc0004281e0, 0x773fed0393b0?)
	/home/solomonjacobs/get/f-alertmanager/config/config.go:523 +0xce5
gopkg.in/yaml%2ev2.(*decoder).callUnmarshaler(0xc0002c71a0, 0xc00042a070, {0x773fed0393b0, 0xc0004281e0})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:270 +0x94
gopkg.in/yaml%2ev2.(*decoder).prepare(0xc0002c71a0, 0xc00042a070, {0xfb4200?, 0xc0004281e0?, 0xc000072e08?})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:313 +0x256
gopkg.in/yaml%2ev2.(*decoder).unmarshal(0xc0002c71a0, 0xc00042a070, {0xfb4200?, 0xc0004281e0?, 0xaa506b?})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:364 +0x13b
gopkg.in/yaml%2ev2.(*decoder).document(...)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:384
gopkg.in/yaml%2ev2.(*decoder).unmarshal(0xea7a00?, 0xc0004281e0?, {0xfb4200?, 0xc0004281e0?, 0x46467e?})
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:360 +0x110
gopkg.in/yaml%2ev2.unmarshal({0xc0003ea800, 0x34, 0x40}, {0xea7a00, 0xc0004281e0}, 0x1)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:148 +0x371
gopkg.in/yaml%2ev2.UnmarshalStrict(...)
	/home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:89
github.com/prometheus/alertmanager/config.Load({0xc0003ea7c0, 0x34})
	/home/solomonjacobs/get/f-alertmanager/config/config.go:287 +0x55
github.com/prometheus/alertmanager/config.LoadFile({0x7fff2ce3c111, 0x1e})
	/home/solomonjacobs/get/f-alertmanager/config/config.go:313 +0x37
github.com/prometheus/alertmanager/cli.CheckConfig({0xc0003c1e80?, 0xc000239ca8?, 0x419c1f?})
	/home/solomonjacobs/get/f-alertmanager/cli/check_config.go:67 +0x1d4
github.com/prometheus/alertmanager/cli.(*checkConfigCmd).checkConfig(...)
	/home/solomonjacobs/get/f-alertmanager/cli/check_config.go:48
github.com/alecthomas/kingpin/v2.(*actionMixin).applyActions(...)
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/actions.go:28
github.com/alecthomas/kingpin/v2.(*Application).applyActions(0xc0001a4e00?, 0xc0002bc7e0)
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:568 +0xd0
github.com/alecthomas/kingpin/v2.(*Application).execute(0xc0001a4e00, 0xc0002bc7e0, {0xc0003c1e50, 0x1, 0x1})
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:398 +0x65
github.com/alecthomas/kingpin/v2.(*Application).Parse(0xc0001a4e00, {0xc000034370?, 0xc000034370?, 0xc00040ea80?})
	/home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:230 +0x13f
github.com/prometheus/alertmanager/cli.Execute()
	/home/solomonjacobs/get/f-alertmanager/cli/root.go:184 +0x12c9
main.main()
	/home/solomonjacobs/get/f-alertmanager/cmd/amtool/main.go:19 +0xf

Previously, `alertmanager` and `amtool` would dereference a null pointer upon encountering a configuration such as: ```yaml receivers: - name: "example" webhook_configs: - ``` Here is an example traceback: ```sh $ ./amtool check-config config/testdata/conf.crash.yml Checking 'config/testdata/conf.crash.yml'panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked] [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xba8d25] goroutine 1 [running]: gopkg.in/yaml%2ev2.handleErr(0xc000239a28) /home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:249 +0x68 panic({0xea0000?, 0x19a8260?}) /usr/local/go/src/runtime/panic.go:783 +0x132 github.com/prometheus/alertmanager/config.(*Config).UnmarshalYAML(0xc0004281e0, 0x773fed0393b0?) /home/solomonjacobs/get/f-alertmanager/config/config.go:523 +0xce5 gopkg.in/yaml%2ev2.(*decoder).callUnmarshaler(0xc0002c71a0, 0xc00042a070, {0x773fed0393b0, 0xc0004281e0}) /home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:270 +0x94 gopkg.in/yaml%2ev2.(*decoder).prepare(0xc0002c71a0, 0xc00042a070, {0xfb4200?, 0xc0004281e0?, 0xc000072e08?}) /home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:313 +0x256 gopkg.in/yaml%2ev2.(*decoder).unmarshal(0xc0002c71a0, 0xc00042a070, {0xfb4200?, 0xc0004281e0?, 0xaa506b?}) /home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:364 +0x13b gopkg.in/yaml%2ev2.(*decoder).document(...) /home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:384 gopkg.in/yaml%2ev2.(*decoder).unmarshal(0xea7a00?, 0xc0004281e0?, {0xfb4200?, 0xc0004281e0?, 0x46467e?}) /home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/decode.go:360 +0x110 gopkg.in/yaml%2ev2.unmarshal({0xc0003ea800, 0x34, 0x40}, {0xea7a00, 0xc0004281e0}, 0x1) /home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:148 +0x371 gopkg.in/yaml%2ev2.UnmarshalStrict(...) /home/solomonjacobs/go/pkg/mod/gopkg.in/yaml.v2@v2.4.0/yaml.go:89 github.com/prometheus/alertmanager/config.Load({0xc0003ea7c0, 0x34}) /home/solomonjacobs/get/f-alertmanager/config/config.go:287 +0x55 github.com/prometheus/alertmanager/config.LoadFile({0x7fff2ce3c111, 0x1e}) /home/solomonjacobs/get/f-alertmanager/config/config.go:313 +0x37 github.com/prometheus/alertmanager/cli.CheckConfig({0xc0003c1e80?, 0xc000239ca8?, 0x419c1f?}) /home/solomonjacobs/get/f-alertmanager/cli/check_config.go:67 +0x1d4 github.com/prometheus/alertmanager/cli.(*checkConfigCmd).checkConfig(...) /home/solomonjacobs/get/f-alertmanager/cli/check_config.go:48 github.com/alecthomas/kingpin/v2.(*actionMixin).applyActions(...) /home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/actions.go:28 github.com/alecthomas/kingpin/v2.(*Application).applyActions(0xc0001a4e00?, 0xc0002bc7e0) /home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:568 +0xd0 github.com/alecthomas/kingpin/v2.(*Application).execute(0xc0001a4e00, 0xc0002bc7e0, {0xc0003c1e50, 0x1, 0x1}) /home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:398 +0x65 github.com/alecthomas/kingpin/v2.(*Application).Parse(0xc0001a4e00, {0xc000034370?, 0xc000034370?, 0xc00040ea80?}) /home/solomonjacobs/go/pkg/mod/github.com/alecthomas/kingpin/v2@v2.4.0/app.go:230 +0x13f github.com/prometheus/alertmanager/cli.Execute() /home/solomonjacobs/get/f-alertmanager/cli/root.go:184 +0x12c9 main.main() /home/solomonjacobs/get/f-alertmanager/cmd/amtool/main.go:19 +0xf ``` Related-to: prometheus#4912 Signed-off-by: Solomon Jacobs <solomonjacobs@protonmail.com>

siavashs · 2026-02-08T22:19:40Z

config/config.go

 		for _, wh := range rcv.WebhookConfigs {
+			if wh == nil {
+				wh = &WebhookConfig{}
+				return wh.UnmarshalYAML(func(any) error { return nil })


Why do we need to call wh.UnmarshalYAML() and return?
Doesn't this mean we simply stop processing the rest of the config?

Also since most of these fields are required, should we just log an error and exit the process?

Why do we need to call wh.UnmarshalYAML() and return?

The primary motivation for me was to preserve the errors, that it would have encountered, if it did parse an empty config. A different way to accomplish this is by defining an error, and share it between the UnmarshalYAML and this call site.

Doesn't this mean we simply stop processing the rest of the config?

For this specific notifier, this call always terminates here. In the case of some of the error is only exposed later.

Also since most of these fields are required, should we just log an error and exit the process?

We should definitely just return the error: If we exit the process, this might break the reloading mechanism. On initial startup, the process will log a message and then exit. I will not touch the error handling in that function in this PR, although I do have some cleanups in mind. I imagine you meant something different, tough.

I hope that clears up your questions. I would then propose the following: Have a validation function per WebhookConfig, EmailConfig etc. This will be source of truth, rather then logic being scattered across (c *Config) UnmarshalYAML , (c *WebhookConfig) UnmarshalYAML. @siavashs do you like this proposal?
@TheMeier Would such a refactoring interfere with your current work?

In the future, I would like to

move the validation out of the UnmarshalYAML,

use a parse, don't validate approach,

have consistent error messages, if required fields are missing.

But that is for different PR.

Actually I thought about validation functions per notifier-config too, would be a nice addtion. I don't think it would interfer too much wih my experiment. But maybe it might be worthwhile to wait a bit more and see if a path emerges for splitting out notifier-specific code before adding more per-notifier code.

The primary motivation for me was to preserve the errors, that it would have encountered, if it did parse an empty config. A different way to accomplish this is by defining an error, and share it between the UnmarshalYAML and this call site.

We must use either an existing error type or create a custom one without needing to make unnecessary method calls to produce an error.
For example you can construct TypeErrors, or just a plain err.Error which says what failed.

For this specific notifier, this call always terminates here. In the case of some of the error is only exposed later.

There are following code blocks which need to process the rest of the config, an early return here means that we never process those.
I think we need to gather all errors and return them to the caller, then the caller decides what to do with them.
Otherwise the user will become aware of the errors one by one, forcing them to go through iterations rather than receiving a full validation result from the config.

I would then propose the following: Have a validation function per WebhookConfig, EmailConfig etc. This will be source of truth, rather then logic being scattered across (c *Config) UnmarshalYAML , (c *WebhookConfig) UnmarshalYAML.

During the maintainers meet up we discussed about breaking up the config file into multiple parts and having independent load and validations per part, for example:

if you have 10 receivers and you add a new broken one we just skip loading that and show an error in the logs, ui, api, etc.

This should be done both during startup and reload, which should help Alertmanager to start and let the user fix the broken part of the config rather than having complete downtime.

Reloads are similarly handled, in a scenario when multiple users inject their receiver configs, a broken new receiver should not block the receiver added by another user to load, etc.

So I like the idea of decoupling the config load mechanism and also braking up the config file into smaller pieces.

There are following code blocks which need to process the rest of the config, an early return here means that we never process those.

Yes, but this behavior was already present, it is not introduced in this change. We can collect errors, but that is a much larger change, which affects the control in many places. For now, all this change should do is change crashes into error messages, the same way it is done everywhere else. I will do follow-up with other suggestions of yours.

I created an issue to track the other findings, which we have discussed here. @TheMeier @siavashs Let me know what you think: #4989

Let's create a missing <section> config error and return that instead of calling unmarshal on empty config.
Otherwise LGTM.

I think it is fine to delay error collection to a later point in time. Creating a missing <section> config error also sounds good to me

SoloJacobs force-pushed the fix-fuzz-findings branch from 0bb15fd to c92f9e3 Compare February 8, 2026 18:06

siavashs reviewed Feb 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix crash upon reading empty config for notifier#4979

Fix crash upon reading empty config for notifier#4979
SoloJacobs wants to merge 1 commit intoprometheus:mainfrom
SoloJacobs:fix-fuzz-findings

SoloJacobs commented Feb 8, 2026

Uh oh!

siavashs Feb 8, 2026

Uh oh!

SoloJacobs Feb 9, 2026

Uh oh!

TheMeier Feb 9, 2026

Uh oh!

siavashs Feb 9, 2026 •

edited

Loading

Uh oh!

SoloJacobs Feb 10, 2026

Uh oh!

siavashs Feb 10, 2026 •

edited

Loading

Uh oh!

TheMeier Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

SoloJacobs commented Feb 8, 2026

Uh oh!

siavashs Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

SoloJacobs Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

TheMeier Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

siavashs Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SoloJacobs Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

siavashs Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TheMeier Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

siavashs Feb 9, 2026 •

edited

Loading

siavashs Feb 10, 2026 •

edited

Loading