fix: handle escaped backslashes in SQL string literals #838
base: master
Add an escaped-backslash alternative (\\) to the regex patterns for single- and double-quoted strings so that SQL strings containing escaped backslashes are tokenized correctly. Previously, a string such as '\\' was tokenized incorrectly, causing the tokens that follow it to be parsed as errors. Fixes andialbrecht#814
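To make the failure concrete, here is a minimal sketch using only Python's re module. It assumes the pre-fix single-quoted pattern mirrors the double-quoted one shown in the review diff, and it matches from the first quote directly rather than running sqlparse's lexer. LATE_SINGLE is a hypothetical variant, included only to show where the new branch has to sit.

```python
import re

# Assumed pre-fix single-quoted pattern (mirrors the double-quoted one in the diff).
OLD_SINGLE = re.compile(r"'(''|\\'|[^'])*'")
# Same pattern with an escaped-backslash branch placed before the catch-all [^'].
NEW_SINGLE = re.compile(r"'(''|\\\\|\\'|[^'])*'")
# Hypothetical variant: the new branch placed after [^'] instead.
LATE_SINGLE = re.compile(r"'(''|\\'|[^']|\\\\)*'")

sql = r"SELECT '\\', '\\'"   # raw string: each literal holds two backslashes
start = sql.index("'")       # offset of the first opening quote

# With OLD_SINGLE, [^'] takes the first backslash, then \\' pairs the second
# backslash with the closing quote as if it were an escaped quote, so the
# match runs on to the next opening quote.
for name, pattern in [("old", OLD_SINGLE), ("new", NEW_SINGLE), ("late", LATE_SINGLE)]:
    print(f"{name:5}{pattern.match(sql, start).group()}")
# old  '\\', '
# new  '\\'
# late '\\', '
```

Python's re tries alternatives left to right, so the escaped-backslash branch only helps when it precedes [^']; the LATE_SINGLE line reproduces the original bug.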
Pull request overview
Fixes sqlparse tokenization of SQL string literals that contain escaped backslashes (e.g. '\\'). Previously the lexer ended such a literal at the wrong position, causing downstream tokens to be mis-parsed (Fixes #814).
Changes:
- Extend the single-quoted and double-quoted string regexes to correctly match escaped backslashes (\\\\).
- Add a regression test to validate correct tokenization of escaped backslashes in single-quoted strings.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| sqlparse/keywords.py | Updates string-literal regex patterns to properly handle escaped backslashes. |
| tests/test_tokenize.py | Adds a regression test that verifies escaped backslashes in single-quoted strings tokenize correctly. |

    import sqlparse
    from sqlparse import tokens as T

Copilot (AI), Feb 8, 2026
This test file already imports sqlparse and tokens as T at module scope; re-importing them inside this test is redundant and inconsistent with the rest of the file. Prefer using the existing module-level imports to keep test style consistent.
Suggested change (delete both lines):

    import sqlparse
    from sqlparse import tokens as T


    tokens = list(sqlparse.parse(sql)[0].flatten())
    token_types = [t.ttype for t in tokens]

    # Should be: SELECT, whitespace, ',', ,, whitespace, ',', (6 tokens after keyword)

Copilot (AI), Feb 8, 2026
The inline comment describing the expected token sequence is incorrect/garbled (it mentions commas/quotes in a way that doesn’t match the SQL). Please update it to reflect the actual expected flattened token order for SELECT '\\', '\\' to avoid misleading future readers.
Suggested change:

    - # Should be: SELECT, whitespace, ',', ,, whitespace, ',', (6 tokens after keyword)
    + # Expected flattened token order: SELECT, <WS>, "'\\'", ',', <WS>, "'\\'"

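The suggested comment describes six flattened tokens. As a rough illustration of that order, the toy lexer below uses a handful of hypothetical rules (not sqlparse's SQL_REGEX or its token types) together with the fixed single-quoted pattern. At each position it tries the rules in order and emits a one-character Error token when none matches.

```python
import re

# Hypothetical, heavily simplified rules; only the string pattern mirrors the fix.
RULES = [
    (re.compile(r"\s+"), "Whitespace"),
    (re.compile(r"'(''|\\\\|\\'|[^'])*'"), "String.Single"),
    (re.compile(r"[,;()]"), "Punctuation"),
    (re.compile(r"[A-Za-z_]\w*"), "Word"),
]

def tokenize(sql):
    """Return (type, value) pairs; unmatched characters become Error tokens."""
    pos, out = 0, []
    while pos < len(sql):
        for rex, name in RULES:
            m = rex.match(sql, pos)
            if m:
                out.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched at this position: emit a one-character Error token.
            out.append(("Error", sql[pos]))
            pos += 1
    return out

tokens = tokenize(r"SELECT '\\', '\\'")
for ttype, value in tokens:
    print(f"{ttype:14}{value!r}")
```

With the assumed pre-fix pattern r"'(''|\\'|[^'])*'" in the String.Single rule instead, the same input yields one over-long string ('\\', ') followed by three Error tokens, which matches the symptom described in #814.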

    (r'"(""|\\\\|\\"|[^"])*"', tokens.String.Symbol),
    (r'(""|".*?[^\\]")', tokens.String.Symbol),

Copilot (AI), Feb 8, 2026
This PR also changes the double-quoted pattern, but the added test only covers single-quoted strings. Please add a regression test exercising a double-quoted value containing escaped backslashes (and verifying tokenization doesn’t produce T.Error / doesn’t merge tokens) so the \\ addition here is covered.
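A plain-re approximation of the requested check might look like the sketch below. It runs the updated double-quoted pattern from the diff against a few inputs and confirms that each literal ends where expected. It is not the sqlparse test the reviewer asks for (that would go through sqlparse.parse and look for T.Error); the escaped-quote and doubled-quote cases are illustrative additions meant to confirm that the new branch leaves the existing escapes intact.

```python
import re

# Updated double-quoted pattern from the diff (the String.Symbol rule).
NEW_DOUBLE = re.compile(r'"(""|\\\\|\\"|[^"])*"')

# (input, literal expected to start at the first double quote)
CASES = [
    (r'SELECT "\\", "\\"', r'"\\"'),   # escaped backslash, as in #814
    (r'SELECT "x\\", "y"', r'"x\\"'),  # escaped backslash right before the closing quote
    (r'SELECT "a\"b", 1', r'"a\"b"'),  # escaped quote still consumed
    (r'SELECT "a""b", 1', r'"a""b"'),  # doubled quote still consumed
]

def first_literal(sql):
    """Return the text NEW_DOUBLE matches at the first double quote, or None."""
    m = NEW_DOUBLE.match(sql, sql.index('"'))
    return m.group() if m else None

for sql, expected in CASES:
    got = first_literal(sql)
    print("ok  " if got == expected else "FAIL", sql)
```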
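Problem
SQL strings containing escaped backslashes (e.g., '\\') were incorrectly tokenized, causing subsequent tokens to be parsed as errors.

Root Cause
The regex patterns for string literals in keywords.py didn't include \\\\ to match escaped backslashes. As a result, the catch-all branch consumed the first backslash on its own, and the escaped-quote branch then paired the second backslash with the closing quote, so the literal ran past its real end.

Fix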
Updated SQL_REGEX patterns:
- Single-quoted strings: added \\\\ to the pattern
- Double-quoted strings: added \\\\ to the pattern

Testing
Added test_tokenize_escaped_backslash() to verify correct tokenization.

Fixes #814