
fix: strict RFC 2397 regex in _parse_base64_data_uri to reject SSE data #1524

Open
MoonSangJin wants to merge 4 commits into langfuse:main from MoonSangJin:fix/5659-sse-bug



MoonSangJin commented Feb 13, 2026

Summary

  • _parse_base64_data_uri previously used a loose startswith("data:") check, which misidentified SSE data (e.g., "data: {'foo': 'bar'}") as base64 data URIs and produced spurious "Data is not base64 encoded" error logs
  • Replace manual parsing with a strict RFC 2397 regex requiring the full data:[<mediatype>][;params];base64,<data> format
  • Non-matching inputs now return (None, None) with an error log for debugging, while SSE data is no longer falsely decoded as media
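A minimal sketch of the strict-parsing approach, written as a standalone function rather than the SDK's private method. The name parse_base64_data_uri, the exact regex, and the (bytes, MIME type) return order are assumptions for illustration; the pattern actually used in this PR may differ, for example in how it treats a missing MIME type.

```python
import base64
import binascii
import logging
import re
from typing import Optional, Tuple

logger = logging.getLogger(__name__)

# Hypothetical pattern for data:[<type>/<subtype>][;attr=value]*;base64,<data>.
# re.ASCII keeps \w from matching non-ASCII letters.
_DATA_URI_RE = re.compile(
    r"data:"
    r"(?P<mime>[\w.+-]+/[\w.+-]+)?"       # optional type/subtype; no spaces allowed
    r"(?:;[\w.+-]+=[^;,]+)*"             # optional ;attribute=value parameters
    r";base64,"                          # the base64 marker is mandatory
    r"(?P<data>[A-Za-z0-9+/]*={0,2})",   # standard base64 alphabet plus padding
    re.ASCII,
)


def parse_base64_data_uri(data: object) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (decoded_bytes, mime_type), or (None, None) for anything else."""
    if not isinstance(data, str) or not data:
        logger.error("Expected a non-empty data URI string, got %r", data)
        return None, None

    # fullmatch: the whole string must be a data URI, not just its prefix.
    match = _DATA_URI_RE.fullmatch(data)
    if match is None:
        # SSE payloads such as 'data: {...}' end up here instead of being decoded.
        logger.error("Not a base64 data URI: %.50r", data)
        return None, None

    try:
        decoded = base64.b64decode(match.group("data"), validate=True)
    except binascii.Error:
        # e.g. a bad padding length that the regex alone does not catch
        logger.error("Malformed base64 payload in data URI")
        return None, None

    # mime is None when the URI omits it (RFC 2397 then implies text/plain).
    return decoded, match.group("mime")
```

Because the check is a full match rather than a prefix test, the space after "data:" in an SSE line is enough to reject it before any decoding is attempted, while the error log still gives callers feedback on malformed input.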

Test plan

  • Added 9 test cases in tests/test_issue_5659.py covering SSE data, valid data URIs, MIME parameters, a missing MIME type, and empty or invalid strings
  • All existing tests/test_media.py unit tests pass
  • ruff check, ruff format, and mypy all pass

Closes langfuse/langfuse#5659


CLAassistant commented Feb 13, 2026

CLA assistant check
All committers have signed the CLA.

greptile-apps bot left a comment

2 files reviewed, no comments

…identifying SSE data

_parse_base64_data_uri previously used a loose startswith("data:") check,
which caused SSE data (e.g., "data: {...}") to be incorrectly processed
as base64 data URIs, resulting in spurious error logs.

Replace the manual parsing with a strict regex that requires the full
data:[<mediatype>][;params];base64,<data> format. Non-matching inputs
now return (None, None) cleanly without error logging.

Closes langfuse/langfuse#5659
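The failure mode described in this commit message can be reproduced with a small, hypothetical approximation of a startswith-based parser (not the SDK's actual code). An SSE line passes the prefix test and only fails later, with the same message quoted in the linked issue:

```python
from typing import Tuple


def loose_parse(data: str) -> Tuple[str, str]:
    """Hypothetical prefix-based parser illustrating the pre-fix failure mode."""
    if not data.startswith("data:"):
        raise ValueError("Data URI does not start with 'data:'")
    # Everything between "data:" and the first comma is treated as the header.
    header, _, payload = data[len("data:"):].partition(",")
    if "base64" not in header.split(";"):
        raise ValueError("Data is not base64 encoded")
    return header, payload


sse_line = 'data: {"foo": "bar", "n": 1}'
print(sse_line.startswith("data:"))  # True: the loose check accepts the SSE line
try:
    loose_parse(sse_line)
except ValueError as exc:
    print(exc)  # Data is not base64 encoded
```

Per the commit message, the visible symptom was a spurious error log rather than a crash; the strict regex avoids it by rejecting the SSE line up front.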
MoonSangJin and others added 3 commits February 18, 2026 13:34
Add back error log when _parse_base64_data_uri receives a string that
does not match the RFC 2397 regex, so callers get feedback on malformed
input. Update tests to focus on the core fix (no false decoding) rather
than log suppression.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

bug: python: "Data is not base64 encoded" on server sent events

2 participants