diff --git a/scripts/mitigation-scripts/stale-lb-dsr-rules/README.md b/scripts/mitigation-scripts/stale-lb-dsr-rules/README.md new file mode 100644 index 0000000..81e59f7 --- /dev/null +++ b/scripts/mitigation-scripts/stale-lb-dsr-rules/README.md @@ -0,0 +1,132 @@ +# Stale LB DSR Rules Cleanup + +## Overview + +This mitigation script automatically detects and removes stale Load Balancer Direct Server Return (LB DSR) rules from VFP (Virtual Filtering Platform) that reference non-existent backend endpoints. It runs continuously to maintain network health by cleaning up orphaned rules that can cause connectivity issues. + +## Problem Statement + +When backend endpoints are removed or become unavailable, the corresponding LB DSR rules in VFP may not be cleaned up properly. These stale rules can: +- Cause packet routing failures +- Lead to connection timeouts +- Create unnecessary overhead in the networking stack +- Result in traffic being sent to non-existent endpoints + +## Solution + +The `cleanup-stale-lb-rules.ps1` script: +1. Checks and sets the required registry configuration for LB DSR feature management +2. Continuously monitors VFP LB DSR rules (both IPv4 and IPv6) +3. Compares rule destination IPs (DIPs) against active HNS endpoints +4. Automatically removes rules that reference non-existent endpoints + +## Prerequisites + +- Windows Server with HNS (Host Network Service) enabled +- VFP control utilities (`vfpctrl.exe`) available +- PowerShell with administrator privileges +- HNS PowerShell module + +## Usage + +### Running the Script on a Single Node + +```powershell +.\cleanup-stale-lb-rules.ps1 +``` + +The script will: +1. Check registry key `HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides\140377743` +2. If the key value is 1, set it to 0 and restart the node (this disables PR 13179278 which is causing delete LB RPC calls from KubeProxy to fail with Invalid IP Error - ICM: 719903780) +3. 
Start a continuous monitoring loop with 10-second intervals +4. Clean up any stale LB DSR rules found + +**Note:** This approach fixes issues on a single node. If the issue is widespread across the cluster, deploy the solution using a DaemonSet: + +```powershell +kubectl create -f cleanup-stale-lb-rules.yaml +``` + +This will run the mitigation script as HPC pods on all affected nodes. + +### Configuration + +You can modify these parameters at the top of the script: + +- **`$groups`**: VFP groups to monitor (default: `LB_DSR_IPv4_OUT`, `LB_DSR_IPv6_OUT`) +- **`$refreshIntervalSeconds`**: Time between cleanup iterations (default: 10 seconds) + +## How It Works + +### 1. Registry Check +The script first ensures the feature flag registry key (140377743) is set to 0. If not, it sets the value and restarts the node. + +### 2. Endpoint Collection +- Retrieves all HNS policies +- Extracts endpoint references +- Builds a dictionary of valid endpoint IP addresses + +### 3. Rule Validation +For each VFP port and LB DSR group: +- Lists all rules in the `LB_DSR` layer +- Extracts DIP (Destination IP) ranges from each rule +- Compares DIPs against the valid endpoint dictionary + +### 4. Cleanup +- Rules with DIPs not found in active endpoints are flagged as stale +- Stale rules are automatically deleted using `vfpctrl /remove-rule` + +## Output Examples + +### Healthy State +``` +All DIP ranges are present in the dictionary. 
+``` + +### Stale Rules Detected +``` +Missing DIP ranges: + - 10.244.0.25 + - fdf5:5d67:b9ce:b28f::13f +Deleting rule : ruleId: ABC123, port: Port1, group: LB_DSR_IPv4_OUT +``` + +## Monitoring + +The script provides color-coded output: +- **Green**: Healthy state, all rules valid +- **Yellow**: Configuration changes or rule deletion in progress +- **Red**: Stale rules detected +- **Cyan**: Status updates and iteration markers + +## Important Notes + +- The script runs indefinitely until manually stopped (Ctrl+C) +- Node restart may occur on first run if registry configuration is incorrect +- Ensure no legitimate endpoint updates are in progress during cleanup to avoid false positives +- The script requires elevated privileges to modify VFP rules and registry settings + +## Troubleshooting + +### Script doesn't detect stale rules +- Verify VFP and HNS are functioning correctly +- Check that `vfpctrl.exe` is accessible in the system PATH +- Ensure HNS endpoints are properly registered + +### Node restarts unexpectedly +- This is expected behavior if the registry key is not set to 0 +- After restart, the script will continue normal operation + +### Permission errors +- Run PowerShell as Administrator +- Verify account has rights to modify VFP rules and registry + +## Related Documentation + +- [VFP Documentation](../../helper/VFP.psm1) +- [HNS Module](../HNS/) +- [Network Health Monitoring](../../networkhealth/) + +## Support + +For issues or questions, please refer to the main repository documentation or open an issue. 
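A minimal sketch of the rule-validation step described under "How It Works", assuming a hashtable of endpoint IPs and a list of DIPs already parsed from one rule. The sample addresses and the names `$validEndpointIps`, `$ruleDips`, and `$missingDips` are illustrative only and are not taken from the script; no `vfpctrl` or HNS calls are made.

```powershell
# Illustrative data only: stands in for the endpoint dictionary built from HNS.
$validEndpointIps = @{
    '10.244.0.12'             = $true
    'fdf5:5d67:b9ce:b28f::a1' = $true
}

# Illustrative DIPs, as they might be parsed from a single LB DSR rule.
$ruleDips = @('10.244.0.12', '10.244.0.25')

# A DIP with no matching endpoint IP marks the rule as stale.
$missingDips = @($ruleDips | Where-Object { -not $validEndpointIps.ContainsKey($_) })

if ($missingDips.Count -gt 0) {
    Write-Output "Stale rule, missing DIPs: $($missingDips -join ', ')"
} else {
    Write-Output 'All DIP ranges are present in the dictionary.'
}
```

With the sample data above, only `10.244.0.25` is reported as missing. In `cleanup-stale-lb-rules.ps1` a rule flagged this way is removed only if it is flagged again on a second pass.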
diff --git a/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-orphan-rules.ps1 b/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-orphan-rules.ps1 new file mode 100644 index 0000000..2a322e2 --- /dev/null +++ b/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-orphan-rules.ps1 @@ -0,0 +1,195 @@ +function Write-KustoLog { + param( + [string]$Level, + [string]$Event, + [string]$Message, + [hashtable]$Properties = @{} + ) + $log = [ordered]@{ + Timestamp = (Get-Date -Format 'o') + Level = $Level + Event = $Event + Message = $Message + Computer = $env:COMPUTERNAME + } + foreach ($k in $Properties.Keys) { + $log[$k] = $Properties[$k] + } + $log | ConvertTo-Json -Compress | Write-Output +} + +$nw = Get-HnsNetwork | Where Name -Eq azure +$nodeIpv4 = $nw.ManagementIP +$nodeIpv6 = $nw.ManagementIPv6 + +Write-KustoLog -Level 'Info' -Event 'ScriptStarted' -Message 'Cleanup orphan rules script started.' -Properties @{ + NodeIPv4 = $nodeIpv4 + NodeIPv6 = $nodeIpv6 +} + + +function Get-ExpectedVfpRuleIds { + param( + [string]$nodeIPv4, + [string]$nodeIPv6, + [array]$hnsPolicies + ) + + $expVfpRuleIds = @{} + + foreach ($policy in $hnsPolicies) { + $pol = $policy.Policies[0] + $alloc = $policy.Resources.Allocators[0] + + # Skip non-DSR policies + if (-not $alloc.IsDSR) { continue } + + $extPort = $pol.ExternalPort + $intPort = $pol.InternalPort + $protocol = $pol.Protocol + $idPrefix = $alloc.ID.Substring(0,5).ToUpper() + + # Determine nodeIP based on IPv6 flag + if ($pol.IPv6 -or $alloc.IsIpv6) { + if (-not $nodeIPv6) { continue } + $nodeIP = $nodeIPv6 + } else { + $nodeIP = $nodeIPv4 + } + + # VIP: use VIPs[0] if present, otherwise nodeIP + if ($pol.VIPs -and $pol.VIPs.Count -gt 0) { + $vip = $pol.VIPs[0] + } else { + $vip = $nodeIP + } + + $expVfpRuleId = "LB_DSR_${nodeIP}_${vip}_${extPort}_${intPort}_${protocol}_${idPrefix}" + $expVfpRuleIds[$expVfpRuleId] = $true + } + + return $expVfpRuleIds +} + +function Get-StaleVfpRuleIds { + param( + 
[hashtable]$expVfpRuleIds + ) + + $staleVfpRuleIds = @{} + $groups = @("LB_DSR_IPv4_OUT", "LB_DSR_IPv6_OUT") + $ports = (vfpctrl.exe /list-vmswitch-port /format 1 | ConvertFrom-Json).Ports.Name + foreach ($port in $ports) { + foreach ($group in $groups) { + $rules = (vfpctrl /port $port /layer LB_DSR /group $group /list-rule /format 1 | ConvertFrom-Json).Rules + foreach ($rule in $rules) { + $ruleId = $rule.Id + Write-Host "Checking VFP rule: Port=$port, Group=$group, RuleId=$ruleId" -ForegroundColor Gray + if ($expVfpRuleIds[$ruleId] -eq $null) { + $key = "${port}::${ruleId}" + $staleVfpRuleIds[$key] = @{ + Port = $port + Layer = "LB_DSR" + Group = $group + RuleId = $ruleId + } + } + } + } + } + + return $staleVfpRuleIds +} + +function Test-VfpRuleHasMatchingPolicy { + param( + [string]$port, + [string]$layer, + [string]$group, + [string]$ruleId, + [array]$hnsPolicies + ) + + $obj = (vfpctrl /get-rule-info /port $port /layer $layer /group $group /rule $ruleId /format 1 | ConvertFrom-Json).Rules + + $protocol = $obj.Conditions | Where-Object { $_.ProtocolList } | ForEach-Object { $_.ProtocolList[0] } + + $destIP = $obj.Conditions | Where-Object { $_.DestinationIPv4RangeList -or $_.DestinationIPv6RangeList } | ForEach-Object { + if ($_.DestinationIPv4RangeList) { $_.DestinationIPv4RangeList[0].H } + elseif ($_.DestinationIPv6RangeList) { $_.DestinationIPv6RangeList[0].H } + } + + $destPort = $obj.Conditions | Where-Object { $_.DestinationPortList } | ForEach-Object { $_.DestinationPortList[0] } + + $found = $hnsPolicies | Where-Object { + $_.Policies[0].VIPs -contains $destIP -and + $_.Policies[0].ExternalPort -eq $destPort -and + $_.Policies[0].Protocol -eq $protocol + } + + return [bool]$found +} + +$iterationIntervalSeconds = if ($env:ITERATION_INTERVAL_SECONDS) { [int]$env:ITERATION_INTERVAL_SECONDS } else { 30 } +$pass2DelaySeconds = if ($env:PASS2_DELAY_SECONDS) { [int]$env:PASS2_DELAY_SECONDS } else { 10 } + +Write-KustoLog -Level 'Info' -Event 'ConfigLoaded' 
-Message 'Configuration loaded.' -Properties @{ + IterationIntervalSeconds = $iterationIntervalSeconds + Pass2DelaySeconds = $pass2DelaySeconds +} + +while ($true) { + $hnsPolicies = Get-HnsPolicyList + Write-KustoLog -Level 'Info' -Event 'Pass1Started' -Message 'First pass: collecting expected and stale VFP rule IDs.' -Properties @{ PolicyCount = $hnsPolicies.Count } + $expVfpRuleIdsBefore = Get-ExpectedVfpRuleIds -nodeIPv4 $nodeIpv4 -nodeIPv6 $nodeIpv6 -hnsPolicies $hnsPolicies + $staleVfpRuleIdsBefore = Get-StaleVfpRuleIds -expVfpRuleIds $expVfpRuleIdsBefore + Write-KustoLog -Level 'Info' -Event 'Pass1Completed' -Message 'First pass completed.' -Properties @{ + ExpectedRuleCount = $expVfpRuleIdsBefore.Count + StaleRuleCount = $staleVfpRuleIdsBefore.Count + } + + Start-Sleep -Seconds $pass2DelaySeconds + + $hnsPolicies = Get-HnsPolicyList + Write-KustoLog -Level 'Info' -Event 'Pass2Started' -Message 'Second pass: collecting expected and stale VFP rule IDs.' -Properties @{ PolicyCount = $hnsPolicies.Count } + $expVfpRuleIdsAfter = Get-ExpectedVfpRuleIds -nodeIPv4 $nodeIpv4 -nodeIPv6 $nodeIpv6 -hnsPolicies $hnsPolicies + $staleVfpRuleIdsAfter = Get-StaleVfpRuleIds -expVfpRuleIds $expVfpRuleIdsAfter + Write-KustoLog -Level 'Info' -Event 'Pass2Completed' -Message 'Second pass completed.' -Properties @{ + ExpectedRuleCount = $expVfpRuleIdsAfter.Count + StaleRuleCount = $staleVfpRuleIdsAfter.Count + } + + $orphanVfpRules = @{} + foreach ($key in $staleVfpRuleIdsBefore.Keys) { + if ($staleVfpRuleIdsAfter.ContainsKey($key)) { + $orphanVfpRules[$key] = $staleVfpRuleIdsAfter[$key] + } + } + + Write-KustoLog -Level 'Info' -Event 'OrphanDetection' -Message 'Orphan detection completed.' -Properties @{ OrphanRuleCount = $orphanVfpRules.Count } + + if ($orphanVfpRules.Count -eq 0) { + Write-KustoLog -Level 'Info' -Event 'NoOrphanRules' -Message 'No orphan VFP rules found.' 
+ } else { + foreach ($entry in $orphanVfpRules.Values) { + $ruleProps = @{ + Port = $entry.Port + Layer = $entry.Layer + Group = $entry.Group + RuleId = $entry.RuleId + } + Write-KustoLog -Level 'Warning' -Event 'OrphanRuleDetected' -Message "Orphan VFP rule detected." -Properties $ruleProps + + if (-not (Test-VfpRuleHasMatchingPolicy -port $entry.Port -layer $entry.Layer -group $entry.Group -ruleId $entry.RuleId -hnsPolicies $hnsPolicies)) { + Write-KustoLog -Level 'Warning' -Event 'OrphanRuleNoPolicy' -Message "Rule has no matching HNS policy and is a candidate for removal." -Properties $ruleProps + # Remove the orphan rule + # vfpctrl.exe /port $entry.Port /layer $entry.Layer /group $entry.Group /delete-rule /rule $entry.RuleId + } else { + Write-KustoLog -Level 'Info' -Event 'OrphanRuleHasPolicy' -Message "Rule has a matching HNS policy and should not be removed." -Properties $ruleProps + } + } + } + + Write-KustoLog -Level 'Info' -Event 'IterationCompleted' -Message "Iteration completed. Sleeping for $iterationIntervalSeconds seconds." 
+ Start-Sleep -Seconds $iterationIntervalSeconds +} \ No newline at end of file diff --git a/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-orphan-rules.yaml b/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-orphan-rules.yaml new file mode 100644 index 0000000..879745c --- /dev/null +++ b/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-orphan-rules.yaml @@ -0,0 +1,264 @@ +apiVersion: v1 +kind: Namespace +metadata: + name: demo + namespace: demo +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: cleanup-orphan-rules + namespace: demo +data: + cleanup-orphan-rules.ps1: | + function Write-KustoLog { + param( + [string]$Level, + [string]$Event, + [string]$Message, + [hashtable]$Properties = @{} + ) + $log = [ordered]@{ + Timestamp = (Get-Date -Format 'o') + Level = $Level + Event = $Event + Message = $Message + Computer = $env:COMPUTERNAME + } + foreach ($k in $Properties.Keys) { + $log[$k] = $Properties[$k] + } + $log | ConvertTo-Json -Compress | Write-Output + } + + $nw = Get-HnsNetwork | Where Name -Eq azure + $nodeIpv4 = $nw.ManagementIP + $nodeIpv6 = $nw.ManagementIPv6 + + Write-KustoLog -Level 'Info' -Event 'ScriptStarted' -Message 'Cleanup orphan rules script started.' 
-Properties @{ + NodeIPv4 = $nodeIpv4 + NodeIPv6 = $nodeIpv6 + } + + + function Get-ExpectedVfpRuleIds { + param( + [string]$nodeIPv4, + [string]$nodeIPv6, + [array]$hnsPolicies + ) + + $expVfpRuleIds = @{} + + foreach ($policy in $hnsPolicies) { + $pol = $policy.Policies[0] + $alloc = $policy.Resources.Allocators[0] + + # Skip non-DSR policies + if (-not $alloc.IsDSR) { continue } + + $extPort = $pol.ExternalPort + $intPort = $pol.InternalPort + $protocol = $pol.Protocol + $idPrefix = $alloc.ID.Substring(0,5).ToUpper() + + # Determine nodeIP based on IPv6 flag + if ($pol.IPv6 -or $alloc.IsIpv6) { + if (-not $nodeIPv6) { continue } + $nodeIP = $nodeIPv6 + } else { + $nodeIP = $nodeIPv4 + } + + # VIP: use VIPs[0] if present, otherwise nodeIP + if ($pol.VIPs -and $pol.VIPs.Count -gt 0) { + $vip = $pol.VIPs[0] + } else { + $vip = $nodeIP + } + + $expVfpRuleId = "LB_DSR_${nodeIP}_${vip}_${extPort}_${intPort}_${protocol}_${idPrefix}" + $expVfpRuleIds[$expVfpRuleId] = $true + } + + return $expVfpRuleIds + } + + function Get-StaleVfpRuleIds { + param( + [hashtable]$expVfpRuleIds + ) + + $staleVfpRuleIds = @{} + $groups = @("LB_DSR_IPv4_OUT", "LB_DSR_IPv6_OUT") + $ports = (vfpctrl.exe /list-vmswitch-port /format 1 | ConvertFrom-Json).Ports.Name + foreach ($port in $ports) { + foreach ($group in $groups) { + $rules = (vfpctrl /port $port /layer LB_DSR /group $group /list-rule /format 1 | ConvertFrom-Json).Rules + foreach ($rule in $rules) { + $ruleId = $rule.Id + if ($expVfpRuleIds[$ruleId] -eq $null) { + $key = "${port}::${ruleId}" + $staleVfpRuleIds[$key] = @{ + Port = $port + Layer = "LB_DSR" + Group = $group + RuleId = $ruleId + } + } + } + } + } + + return $staleVfpRuleIds + } + + function Test-VfpRuleHasMatchingPolicy { + param( + [string]$port, + [string]$layer, + [string]$group, + [string]$ruleId, + [array]$hnsPolicies + ) + + $obj = (vfpctrl /get-rule-info /port $port /layer $layer /group $group /rule $ruleId /format 1 | ConvertFrom-Json).Rules + + $protocol = 
$obj.Conditions | Where-Object { $_.ProtocolList } | ForEach-Object { $_.ProtocolList[0] } + + $destIP = $obj.Conditions | Where-Object { $_.DestinationIPv4RangeList -or $_.DestinationIPv6RangeList } | ForEach-Object { + if ($_.DestinationIPv4RangeList) { $_.DestinationIPv4RangeList[0].H } + elseif ($_.DestinationIPv6RangeList) { $_.DestinationIPv6RangeList[0].H } + } + + $destPort = $obj.Conditions | Where-Object { $_.DestinationPortList } | ForEach-Object { $_.DestinationPortList[0] } + + $found = $hnsPolicies | Where-Object { + $_.Policies[0].VIPs -contains $destIP -and + $_.Policies[0].ExternalPort -eq $destPort -and + $_.Policies[0].Protocol -eq $protocol + } + + return [bool]$found + } + + $iterationIntervalSeconds = if ($env:ITERATION_INTERVAL_SECONDS) { [int]$env:ITERATION_INTERVAL_SECONDS } else { 60 } + $pass2DelaySeconds = if ($env:PASS2_DELAY_SECONDS) { [int]$env:PASS2_DELAY_SECONDS } else { 120 } + + Write-KustoLog -Level 'Info' -Event 'ConfigLoaded' -Message 'Configuration loaded.' -Properties @{ + IterationIntervalSeconds = $iterationIntervalSeconds + Pass2DelaySeconds = $pass2DelaySeconds + } + + while ($true) { + $hnsPolicies = Get-HnsPolicyList + Write-KustoLog -Level 'Info' -Event 'Pass1Started' -Message 'First pass: collecting expected and stale VFP rule IDs.' -Properties @{ PolicyCount = $hnsPolicies.Count } + $expVfpRuleIdsBefore = Get-ExpectedVfpRuleIds -nodeIPv4 $nodeIpv4 -nodeIPv6 $nodeIpv6 -hnsPolicies $hnsPolicies + $staleVfpRuleIdsBefore = Get-StaleVfpRuleIds -expVfpRuleIds $expVfpRuleIdsBefore + Write-KustoLog -Level 'Info' -Event 'Pass1Completed' -Message 'First pass completed.' -Properties @{ + ExpectedRuleCount = $expVfpRuleIdsBefore.Count + StaleRuleCount = $staleVfpRuleIdsBefore.Count + } + + Start-Sleep -Seconds $pass2DelaySeconds + + $hnsPolicies = Get-HnsPolicyList + Write-KustoLog -Level 'Info' -Event 'Pass2Started' -Message 'Second pass: collecting expected and stale VFP rule IDs.' 
-Properties @{ PolicyCount = $hnsPolicies.Count } + $expVfpRuleIdsAfter = Get-ExpectedVfpRuleIds -nodeIPv4 $nodeIpv4 -nodeIPv6 $nodeIpv6 -hnsPolicies $hnsPolicies + $staleVfpRuleIdsAfter = Get-StaleVfpRuleIds -expVfpRuleIds $expVfpRuleIdsAfter + Write-KustoLog -Level 'Info' -Event 'Pass2Completed' -Message 'Second pass completed.' -Properties @{ + ExpectedRuleCount = $expVfpRuleIdsAfter.Count + StaleRuleCount = $staleVfpRuleIdsAfter.Count + } + + $orphanVfpRules = @{} + foreach ($key in $staleVfpRuleIdsBefore.Keys) { + if ($staleVfpRuleIdsAfter.ContainsKey($key)) { + $orphanVfpRules[$key] = $staleVfpRuleIdsAfter[$key] + } + } + + Write-KustoLog -Level 'Info' -Event 'OrphanDetection' -Message 'Orphan detection completed.' -Properties @{ OrphanRuleCount = $orphanVfpRules.Count } + + if ($orphanVfpRules.Count -eq 0) { + Write-KustoLog -Level 'Info' -Event 'NoOrphanRules' -Message 'No orphan VFP rules found.' + } else { + foreach ($entry in $orphanVfpRules.Values) { + $ruleProps = @{ + Port = $entry.Port + Layer = $entry.Layer + Group = $entry.Group + RuleId = $entry.RuleId + } + Write-KustoLog -Level 'Warning' -Event 'OrphanRuleDetected' -Message "Orphan VFP rule detected." -Properties $ruleProps + + if (-not (Test-VfpRuleHasMatchingPolicy -port $entry.Port -layer $entry.Layer -group $entry.Group -ruleId $entry.RuleId -hnsPolicies $hnsPolicies)) { + Write-KustoLog -Level 'Warning' -Event 'OrphanRuleNoPolicy' -Message "Rule has no matching HNS policy and is a candidate for removal." -Properties $ruleProps + # Remove the orphan rule + vfpctrl.exe /port $entry.Port /layer $entry.Layer /group $entry.Group /delete-rule /rule $entry.RuleId + } else { + Write-KustoLog -Level 'Info' -Event 'OrphanRuleHasPolicy' -Message "Rule has a matching HNS policy and should not be removed." -Properties $ruleProps + } + } + } + + Write-KustoLog -Level 'Info' -Event 'IterationCompleted' -Message "Iteration completed. Sleeping for $iterationIntervalSeconds seconds." 
+ Start-Sleep -Seconds $iterationIntervalSeconds + } +--- +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: cleanup-orphan-rules + namespace: demo + labels: + app: cleanup-orphan-rules +spec: + selector: + matchLabels: + app: cleanup-orphan-rules + template: + metadata: + labels: + app: cleanup-orphan-rules + spec: + securityContext: + windowsOptions: + hostProcess: true + runAsUserName: 'NT AUTHORITY\SYSTEM' + hostNetwork: true + affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: kubernetes.io/os + operator: In + values: + - windows + containers: + - name: cleanup-orphan-rules + image: mcr.microsoft.com/dotnet/framework/samples:aspnetapp + imagePullPolicy: IfNotPresent + env: + - name: ITERATION_INTERVAL_SECONDS + value: "30" + - name: PASS2_DELAY_SECONDS + value: "10" + command: + - powershell.exe + - -File + - C:\scripts\cleanup-orphan-rules.ps1 + volumeMounts: + - name: script + mountPath: C:\scripts + - name: kube-path + mountPath: C:\k + terminationGracePeriodSeconds: 60 + volumes: + - name: script + configMap: + name: cleanup-orphan-rules + - name: kube-path + hostPath: + path: C:\k diff --git a/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-stale-lb-rules.ps1 b/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-stale-lb-rules.ps1 new file mode 100644 index 0000000..6b888db --- /dev/null +++ b/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-stale-lb-rules.ps1 @@ -0,0 +1,135 @@ +$groups = @("LB_DSR_IPv4_OUT", "LB_DSR_IPv6_OUT") +$refreshIntervalSeconds = 10 + +function Get-EndpointIpDictionary { + $dict = @{} + + $policies = Get-HnsPolicyList + + $endpointIds = $policies.References | + Where-Object { $_ -like "/endpoints/*" } | + ForEach-Object { ($_ -split "/")[-1] } | + Sort-Object -Unique + + $endpointIds | ForEach-Object { + try { + $endpoint = Get-HnsEndpoint -Id $_ + } catch { + Write-Host "Failed to get HNS endpoint $_`: $($_.Exception.Message)" 
-ForegroundColor Yellow + # Use return, not continue: inside ForEach-Object, return moves on to the next item + return + } + + if ($null -eq $endpoint) { + Write-Host "HNS endpoint $_ not found, skipping." -ForegroundColor Yellow + return + } + + if ($null -ne $endpoint.IPAddress) { + $dict[$endpoint.IPAddress] = $true + } + if ($null -ne $endpoint.IPv6Address) { + $dict[$endpoint.IPv6Address] = $true + } + } + + return $dict +} + +function Get-StaleRuleCommands { + param( + [string[]]$Groups + ) + + $dictDstIPs = Get-EndpointIpDictionary + $staleRuleCommands = [System.Collections.Generic.List[string]]::new() + + $ports = (vfpctrl.exe /list-vmswitch-port /format 1 | ConvertFrom-Json).Ports.Name + foreach ($port in $ports) { + foreach ($group in $Groups) { + $rules = (vfpctrl /port $port /layer LB_DSR /group $group /list-rule /format 1 | ConvertFrom-Json).Rules + foreach ($rule in $rules) { + $ruleId = $rule.Id + $ruleText = vfpctrl /get-rule-info /port $port /layer LB_DSR /group $group /rule $ruleId 2>&1 + if (-not $ruleText) { + Write-Host "No output from vfpctrl" + continue + } + + $dips = Get-DipRangesFromRuleText -RuleText $ruleText + # Check which DIPs are missing in the dictionary + $missingDIPs = $dips | Where-Object { -not $dictDstIPs.ContainsKey($_) } + + if ($missingDIPs.Count -eq 0) { + # Write-Host "All DIP ranges are present in the dictionary." 
-ForegroundColor Green + } else { + # Write-Host "Missing DIP ranges:" -ForegroundColor Red + # $missingDIPs | ForEach-Object { Write-Host " - $_" } + $staleRuleCommands.Add("vfpctrl /remove-rule /port $port /layer LB_DSR /group $group /rule $ruleId") + } + } + } + } + + return $staleRuleCommands +} + +function Get-DipRangesFromRuleText { + param([string[]]$RuleText) + + $collect = $false + $dips = @() + + foreach ($line in $RuleText) { + + # Detect beginning of DIP Range block + if ($line -match "DIP Range") { + $collect = $true + continue + } + + # Stop collecting when the FlagsEx line is reached + if ($collect -and $line -match "FlagsEx") { + break + } + + # Process lines like: + # { 10.244.0.25 : 53 } + # { fdf5:5d67:b9ce:b28f::13f : 4445 } + if ($collect -and $line.Trim().StartsWith("{")) { + + # Remove surrounding { } then trim + $clean = $line.Trim().Trim('{','}').Trim() + # Use regex to extract IP before last " : " + if ($clean -match '(.+)\s*:\s*\d+$') { + $ip = $matches[1].Trim() + $dips += $ip + } + } + } + + return $dips +} + +While($true) { + Write-Host "##========== Waiting for $refreshIntervalSeconds seconds for the next iteration..." -ForegroundColor Cyan + Start-Sleep -Seconds $refreshIntervalSeconds + Write-Host "##========== Starting new iteration to check for stale LB DSR rules..." -ForegroundColor Cyan + $staleRuleCommands_1 = Get-StaleRuleCommands -Groups $groups + Start-Sleep -Seconds 60 # Wait 60 seconds before the second pass so only rules stale in both passes are removed + $staleRuleCommands_2 = Get-StaleRuleCommands -Groups $groups + + # Rules present in both passes (consistently stale) + $inBothPasses = $staleRuleCommands_1 | Where-Object { $staleRuleCommands_2 -contains $_ } + + if ($inBothPasses.Count -gt 0) { + Write-Host "##========== Found $($inBothPasses.Count) stale rule(s) to remove." -ForegroundColor Yellow + } else { + Write-Host "##========== No stale rules found." 
-ForegroundColor Green + } + + # Execute only rules that appeared in both passes (consistently stale) + foreach ($cmd in $inBothPasses) { + Write-Host "##========== Executing Delete Command: $cmd" -ForegroundColor Yellow + Invoke-Expression $cmd + } +} \ No newline at end of file diff --git a/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-stale-lb-rules.yaml b/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-stale-lb-rules.yaml new file mode 100644 index 0000000..5aee063 --- /dev/null +++ b/scripts/mitigation-scripts/stale-lb-dsr-rules/cleanup-stale-lb-rules.yaml @@ -0,0 +1,202 @@ +apiVersion: v1 +kind: Namespace +metadata: + name: demo + namespace: demo +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: check-hns-issue + namespace: demo +data: + mitigate-del-lb-issue.ps1: | + + $groups = @("LB_DSR_IPv4_OUT", "LB_DSR_IPv6_OUT") + $refreshIntervalSeconds = 10 + + function Get-EndpointIpDictionary { + $dict = @{} + + $policies = Get-HnsPolicyList + + $endpointIds = $policies.References | + Where-Object { $_ -like "/endpoints/*" } | + ForEach-Object { ($_ -split "/")[-1] } | + Sort-Object -Unique + + $endpointIds | ForEach-Object { + try { + $endpoint = Get-HnsEndpoint -Id $_ + } catch { + Write-Host "Failed to get HNS endpoint $_`: $($_.Exception.Message)" -ForegroundColor Yellow + continue + } + + if ($null -eq $endpoint) { + Write-Host "HNS endpoint $_ not found, skipping." 
-ForegroundColor Yellow + # Use return, not continue: inside ForEach-Object, return moves on to the next item + return + } + + if ($null -eq $endpoint) { + Write-Host "HNS endpoint $_ not found, skipping." -ForegroundColor Yellow + return + } + + if ($null -ne $endpoint.IPAddress) { + $dict[$endpoint.IPAddress] = $true + } + if ($null -ne $endpoint.IPv6Address) { + $dict[$endpoint.IPv6Address] = $true + } + } + + return $dict + } + + function Get-StaleRuleCommands { + param( + [string[]]$Groups + ) + + $dictDstIPs = Get-EndpointIpDictionary + $staleRuleCommands = [System.Collections.Generic.List[string]]::new() + + $ports = (vfpctrl.exe /list-vmswitch-port /format 1 | ConvertFrom-Json).Ports.Name + foreach ($port in $ports) { + foreach ($group in $Groups) { + $rules = (vfpctrl /port $port /layer LB_DSR /group $group /list-rule /format 1 | ConvertFrom-Json).Rules + foreach ($rule in $rules) { + $ruleId = $rule.Id + $ruleText = vfpctrl /get-rule-info /port $port /layer LB_DSR /group $group /rule $ruleId 2>&1 + if (-not $ruleText) { + Write-Host "No output from vfpctrl" + continue + } + + $dips = Get-DipRangesFromRuleText -RuleText $ruleText + # Check which DIPs are missing in the dictionary + $missingDIPs = $dips | Where-Object { -not $dictDstIPs.ContainsKey($_) } + + if ($missingDIPs.Count -eq 0) { + # Write-Host "All DIP ranges are present in the dictionary." 
-ForegroundColor Green + } else { + # Write-Host "Missing DIP ranges:" -ForegroundColor Red + # $missingDIPs | ForEach-Object { Write-Host " - $_" } + $staleRuleCommands.Add("vfpctrl /remove-rule /port $port /layer LB_DSR /group $group /rule $ruleId") + } + } + } + } + + return $staleRuleCommands + } + + function Get-DipRangesFromRuleText { + param([string[]]$RuleText) + + $collect = $false + $dips = @() + + foreach ($line in $RuleText) { + + # Detect beginning of DIP Range block + if ($line -match "DIP Range") { + $collect = $true + continue + } + + # Stop collecting when the FlagsEx line is reached + if ($collect -and $line -match "FlagsEx") { + break + } + + # Process lines like: + # { 10.244.0.25 : 53 } + # { fdf5:5d67:b9ce:b28f::13f : 4445 } + if ($collect -and $line.Trim().StartsWith("{")) { + + # Remove surrounding { } then trim + $clean = $line.Trim().Trim('{','}').Trim() + # Use regex to extract IP before last " : " + if ($clean -match '(.+)\s*:\s*\d+$') { + $ip = $matches[1].Trim() + $dips += $ip + } + } + } + + return $dips + } + + While($true) { + Write-Host "##========== Waiting for $refreshIntervalSeconds seconds for the next iteration..." -ForegroundColor Cyan + Start-Sleep -Seconds $refreshIntervalSeconds + Write-Host "##========== Starting new iteration to check for stale LB DSR rules..." -ForegroundColor Cyan + $staleRuleCommands_1 = Get-StaleRuleCommands -Groups $groups + Start-Sleep -Seconds 60 # Wait 60 seconds before the second pass so only rules stale in both passes are removed + $staleRuleCommands_2 = Get-StaleRuleCommands -Groups $groups + + # Rules present in both passes (consistently stale) + $inBothPasses = $staleRuleCommands_1 | Where-Object { $staleRuleCommands_2 -contains $_ } + + if ($inBothPasses.Count -gt 0) { + Write-Host "##========== Found $($inBothPasses.Count) stale rule(s) to remove." -ForegroundColor Yellow + } else { + Write-Host "##========== No stale rules found." 
-ForegroundColor Green + } + + # Execute only rules that appeared in both passes (consistently stale) + foreach ($cmd in $inBothPasses) { + Write-Host "##========== Executing Delete Command: $cmd" -ForegroundColor Yellow + Invoke-Expression $cmd + } + } +--- +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: stale-lb-rules-mitigator + namespace: demo + labels: + app: stale-lb-rules-mitigator +spec: + selector: + matchLabels: + app: stale-lb-rules-mitigator + template: + metadata: + labels: + app: stale-lb-rules-mitigator + spec: + securityContext: + windowsOptions: + hostProcess: true + runAsUserName: 'NT AUTHORITY\SYSTEM' + hostNetwork: true + affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: kubernetes.io/os + operator: In + values: + - windows + containers: + - name: stale-lb-rules-mitigator + image: mcr.microsoft.com/dotnet/framework/samples:aspnetapp + imagePullPolicy: IfNotPresent + command: + - powershell.exe + - -File + - C:\scripts\mitigate-del-lb-issue.ps1 + volumeMounts: + - name: script + mountPath: C:\scripts + - name: kube-path + mountPath: C:\k + terminationGracePeriodSeconds: 60 + volumes: + - name: script + configMap: + name: check-hns-issue + - name: kube-path + hostPath: + path: C:\k + type: DirectoryOrCreate \ No newline at end of file diff --git a/scripts/mitigation-scripts/stale-lb-dsr-rules/fix-del-lb-issue.yaml b/scripts/mitigation-scripts/stale-lb-dsr-rules/fix-del-lb-issue.yaml new file mode 100644 index 0000000..57c2ffe --- /dev/null +++ b/scripts/mitigation-scripts/stale-lb-dsr-rules/fix-del-lb-issue.yaml @@ -0,0 +1,81 @@ +apiVersion: v1 +kind: Namespace +metadata: + name: demo + namespace: demo +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: check-hns-issue + namespace: demo +data: + mitigate-del-lb-issue.ps1: | + + $regKeyVal = (Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides" -Name 
140377743).140377743 + if ($regKeyVal -eq 1) { + Write-Host "Registry keys are not zero. Setting reg key to 0 and restarting the node." -ForegroundColor Yellow + Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides" -Name 140377743 -Value 0 -Type DWORD + Restart-Computer -Force + Start-Sleep -Seconds 30 + } else { + Write-Host "Registry keys are zero. Continuing the script." -ForegroundColor Green + } + + While($true) { + Write-Host "Registry keys are disabled...." -ForegroundColor Cyan + Start-Sleep -Seconds 36000 + } +--- +apiVersion: apps/v1 +kind: DaemonSet +metadata: + name: del-lb-fail-mitigator + namespace: demo + labels: + app: del-lb-fail-mitigator +spec: + selector: + matchLabels: + app: del-lb-fail-mitigator + template: + metadata: + labels: + app: del-lb-fail-mitigator + spec: + securityContext: + windowsOptions: + hostProcess: true + runAsUserName: 'NT AUTHORITY\SYSTEM' + hostNetwork: true + affinity: + nodeAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + nodeSelectorTerms: + - matchExpressions: + - key: kubernetes.io/os + operator: In + values: + - windows + containers: + - name: del-lb-fail-mitigator + image: mcr.microsoft.com/dotnet/framework/samples:aspnetapp + imagePullPolicy: IfNotPresent + command: + - powershell.exe + - -File + - C:\scripts\mitigate-del-lb-issue.ps1 + volumeMounts: + - name: script + mountPath: C:\scripts + - name: kube-path + mountPath: C:\k + terminationGracePeriodSeconds: 60 + volumes: + - name: script + configMap: + name: check-hns-issue + - name: kube-path + hostPath: + path: C:\k + type: DirectoryOrCreate \ No newline at end of file