-
Notifications
You must be signed in to change notification settings - Fork 145
Enforce DataFrame display memory limits with max_rows + min_rows constraint (deprecate repr_rows)
#1367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+474
−88
Merged
Enforce DataFrame display memory limits with max_rows + min_rows constraint (deprecate repr_rows)
#1367
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
fa9f257
Update DataFrameHtmlFormatter to enforce min_rows_display constraint …
kosiew 0563f6c
Refactor DataFrame formatter to replace repr_rows with max_rows and u…
kosiew 168eda8
Add validation for formatter parameters and deprecate repr_rows alias
kosiew 0ad2621
Add boundary condition tests for HTML formatter memory limits and res…
kosiew a7dfd3f
Remove repr_rows handling in max_rows resolution in Rust
kosiew 61db037
Refactor whitespace in parameter validation and update test for HTML …
kosiew 69bcf6f
ruff fix
kosiew 399e3e2
Rename min_rows_display to min_rows in formatter configuration and up…
kosiew af3ef4b
Refactor function parameter handling and documentation
kosiew af11824
Update HTML formatter memory boundary tests for large datasets
kosiew 1085992
Enhance memory boundary tests in HTML formatter for large datasets
kosiew 6f92b3d
Add fixture for multi-batch DataFrame and test early stream terminati…
kosiew 894bac0
Add backward compatibility tests for deprecated formatter attributes
kosiew a835ceb
ruff fix
kosiew 97e2b6d
Remove deprecation timeline comments from HTML formatter backward com…
kosiew File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,6 +18,7 @@ | |
|
|
||
| from __future__ import annotations | ||
|
|
||
| import warnings | ||
| from typing import ( | ||
| TYPE_CHECKING, | ||
| Any, | ||
|
|
@@ -61,6 +62,93 @@ def _validate_bool(value: Any, param_name: str) -> None: | |
| raise TypeError(msg) | ||
|
|
||
|
|
||
| def _validate_formatter_parameters( | ||
| max_cell_length: int, | ||
| max_width: int, | ||
| max_height: int, | ||
| max_memory_bytes: int, | ||
| min_rows: int, | ||
| max_rows: int | None, | ||
| repr_rows: int | None, | ||
| enable_cell_expansion: bool, | ||
| show_truncation_message: bool, | ||
| use_shared_styles: bool, | ||
| custom_css: str | None, | ||
| style_provider: Any, | ||
| ) -> int: | ||
| """Validate all formatter parameters and return resolved max_rows value. | ||
|
|
||
| Args: | ||
| max_cell_length: Maximum cell length value to validate | ||
| max_width: Maximum width value to validate | ||
| max_height: Maximum height value to validate | ||
| max_memory_bytes: Maximum memory bytes value to validate | ||
| min_rows: Minimum rows to display value to validate | ||
| max_rows: Maximum rows value to validate (None means use default) | ||
| repr_rows: Deprecated repr_rows value to validate | ||
| enable_cell_expansion: Boolean expansion flag to validate | ||
| show_truncation_message: Boolean message flag to validate | ||
| use_shared_styles: Boolean styles flag to validate | ||
| custom_css: Custom CSS string to validate | ||
| style_provider: Style provider object to validate | ||
|
|
||
| Returns: | ||
| The resolved max_rows value after handling repr_rows deprecation | ||
|
|
||
| Raises: | ||
| ValueError: If any numeric parameter is invalid or constraints are violated | ||
| TypeError: If any parameter has invalid type | ||
| DeprecationWarning: If repr_rows parameter is used | ||
| """ | ||
| # Validate numeric parameters | ||
| _validate_positive_int(max_cell_length, "max_cell_length") | ||
| _validate_positive_int(max_width, "max_width") | ||
| _validate_positive_int(max_height, "max_height") | ||
| _validate_positive_int(max_memory_bytes, "max_memory_bytes") | ||
| _validate_positive_int(min_rows, "min_rows") | ||
|
|
||
| # Handle deprecated repr_rows parameter | ||
| if repr_rows is not None: | ||
| warnings.warn( | ||
| "repr_rows parameter is deprecated, use max_rows instead", | ||
| DeprecationWarning, | ||
| stacklevel=4, | ||
| ) | ||
| _validate_positive_int(repr_rows, "repr_rows") | ||
| if max_rows is not None and repr_rows != max_rows: | ||
| msg = "Cannot specify both repr_rows and max_rows; use max_rows only" | ||
| raise ValueError(msg) | ||
| max_rows = repr_rows | ||
|
|
||
| # Use default if max_rows was not provided | ||
| if max_rows is None: | ||
| max_rows = 10 | ||
|
|
||
| _validate_positive_int(max_rows, "max_rows") | ||
|
|
||
| # Validate constraint: min_rows <= max_rows | ||
| if min_rows > max_rows: | ||
| msg = "min_rows must be less than or equal to max_rows" | ||
| raise ValueError(msg) | ||
|
|
||
| # Validate boolean parameters | ||
| _validate_bool(enable_cell_expansion, "enable_cell_expansion") | ||
| _validate_bool(show_truncation_message, "show_truncation_message") | ||
| _validate_bool(use_shared_styles, "use_shared_styles") | ||
|
|
||
| # Validate custom_css | ||
| if custom_css is not None and not isinstance(custom_css, str): | ||
| msg = "custom_css must be None or a string" | ||
| raise TypeError(msg) | ||
|
|
||
| # Validate style_provider | ||
| if style_provider is not None and not isinstance(style_provider, StyleProvider): | ||
| msg = "style_provider must implement the StyleProvider protocol" | ||
| raise TypeError(msg) | ||
|
|
||
| return max_rows | ||
|
|
||
|
|
||
| @runtime_checkable | ||
| class CellFormatter(Protocol): | ||
| """Protocol for cell value formatters.""" | ||
|
|
@@ -126,8 +214,9 @@ class DataFrameHtmlFormatter: | |
| max_width: Maximum width of the HTML table in pixels | ||
| max_height: Maximum height of the HTML table in pixels | ||
| max_memory_bytes: Maximum memory in bytes for rendered data (default: 2MB) | ||
| min_rows_display: Minimum number of rows to display | ||
| repr_rows: Default number of rows to display in repr output | ||
| min_rows: Minimum number of rows to display (must be <= max_rows) | ||
| max_rows: Maximum number of rows to display in repr output | ||
| repr_rows: Deprecated alias for max_rows | ||
| enable_cell_expansion: Whether to add expand/collapse buttons for long cell | ||
| values | ||
| custom_css: Additional CSS to include in the HTML output | ||
|
|
@@ -143,8 +232,9 @@ def __init__( | |
| max_width: int = 1000, | ||
| max_height: int = 300, | ||
| max_memory_bytes: int = 2 * 1024 * 1024, # 2 MB | ||
| min_rows_display: int = 20, | ||
| repr_rows: int = 10, | ||
| min_rows: int = 10, | ||
| max_rows: int | None = None, | ||
| repr_rows: int | None = None, | ||
| enable_cell_expansion: bool = True, | ||
| custom_css: str | None = None, | ||
| show_truncation_message: bool = True, | ||
|
|
@@ -155,71 +245,70 @@ def __init__( | |
|
|
||
| Parameters | ||
| ---------- | ||
| max_cell_length : int, default 25 | ||
| max_cell_length | ||
| Maximum length of cell content before truncation. | ||
| max_width : int, default 1000 | ||
| max_width | ||
| Maximum width of the displayed table in pixels. | ||
| max_height : int, default 300 | ||
| max_height | ||
| Maximum height of the displayed table in pixels. | ||
| max_memory_bytes : int, default 2097152 (2MB) | ||
| Maximum memory in bytes for rendered data. | ||
| min_rows_display : int, default 20 | ||
| Minimum number of rows to display. | ||
| repr_rows : int, default 10 | ||
| Default number of rows to display in repr output. | ||
| enable_cell_expansion : bool, default True | ||
| max_memory_bytes | ||
| Maximum memory in bytes for rendered data. Helps prevent performance | ||
| issues with large datasets. | ||
| min_rows | ||
| Minimum number of rows to display even if memory limit is reached. | ||
| Must not exceed ``max_rows``. | ||
| max_rows | ||
| Maximum number of rows to display. Takes precedence over memory limits | ||
| when fewer rows are requested. | ||
| repr_rows | ||
| Deprecated alias for ``max_rows``. Use ``max_rows`` instead. | ||
| enable_cell_expansion | ||
| Whether to allow cells to expand when clicked. | ||
| custom_css : str, optional | ||
| custom_css | ||
| Custom CSS to apply to the HTML table. | ||
| show_truncation_message : bool, default True | ||
| show_truncation_message | ||
| Whether to show a message indicating that content has been truncated. | ||
| style_provider : StyleProvider, optional | ||
| style_provider | ||
| Provider of CSS styles for the HTML table. If None, DefaultStyleProvider | ||
| is used. | ||
| use_shared_styles : bool, default True | ||
| Whether to use shared styles across multiple tables. | ||
| use_shared_styles | ||
| Whether to use shared styles across multiple tables. This improves | ||
| performance when displaying many DataFrames in a single notebook. | ||
|
|
||
| Raises: | ||
| ------ | ||
| ValueError | ||
| If max_cell_length, max_width, max_height, max_memory_bytes, | ||
| min_rows_display, or repr_rows is not a positive integer. | ||
| min_rows or max_rows is not a positive integer, or if min_rows | ||
| exceeds max_rows. | ||
| TypeError | ||
| If enable_cell_expansion, show_truncation_message, or use_shared_styles is | ||
| not a boolean, | ||
| or if custom_css is provided but is not a string, | ||
| or if style_provider is provided but does not implement the StyleProvider | ||
| not a boolean, or if custom_css is provided but is not a string, or if | ||
| style_provider is provided but does not implement the StyleProvider | ||
| protocol. | ||
| """ | ||
| # Validate numeric parameters | ||
| _validate_positive_int(max_cell_length, "max_cell_length") | ||
| _validate_positive_int(max_width, "max_width") | ||
| _validate_positive_int(max_height, "max_height") | ||
| _validate_positive_int(max_memory_bytes, "max_memory_bytes") | ||
| _validate_positive_int(min_rows_display, "min_rows_display") | ||
| _validate_positive_int(repr_rows, "repr_rows") | ||
|
|
||
| # Validate boolean parameters | ||
| _validate_bool(enable_cell_expansion, "enable_cell_expansion") | ||
| _validate_bool(show_truncation_message, "show_truncation_message") | ||
| _validate_bool(use_shared_styles, "use_shared_styles") | ||
|
|
||
| # Validate custom_css | ||
| if custom_css is not None and not isinstance(custom_css, str): | ||
| msg = "custom_css must be None or a string" | ||
| raise TypeError(msg) | ||
|
|
||
| # Validate style_provider | ||
| if style_provider is not None and not isinstance(style_provider, StyleProvider): | ||
| msg = "style_provider must implement the StyleProvider protocol" | ||
| raise TypeError(msg) | ||
| # Validate all parameters and get resolved max_rows | ||
| resolved_max_rows = _validate_formatter_parameters( | ||
| max_cell_length, | ||
| max_width, | ||
| max_height, | ||
| max_memory_bytes, | ||
| min_rows, | ||
| max_rows, | ||
| repr_rows, | ||
| enable_cell_expansion, | ||
| show_truncation_message, | ||
| use_shared_styles, | ||
| custom_css, | ||
| style_provider, | ||
| ) | ||
|
|
||
| self.max_cell_length = max_cell_length | ||
| self.max_width = max_width | ||
| self.max_height = max_height | ||
| self.max_memory_bytes = max_memory_bytes | ||
| self.min_rows_display = min_rows_display | ||
| self.repr_rows = repr_rows | ||
| self.min_rows = min_rows | ||
| self._max_rows = resolved_max_rows | ||
| self.enable_cell_expansion = enable_cell_expansion | ||
| self.custom_css = custom_css | ||
| self.show_truncation_message = show_truncation_message | ||
|
|
@@ -231,6 +320,55 @@ def __init__( | |
| self._custom_cell_builder: Callable[[Any, int, int, str], str] | None = None | ||
| self._custom_header_builder: Callable[[Any], str] | None = None | ||
|
|
||
| @property | ||
| def max_rows(self) -> int: | ||
| """Get the maximum number of rows to display. | ||
|
|
||
| Returns: | ||
| The maximum number of rows to display in repr output | ||
| """ | ||
| return self._max_rows | ||
|
|
||
| @max_rows.setter | ||
| def max_rows(self, value: int) -> None: | ||
| """Set the maximum number of rows to display. | ||
|
|
||
| Args: | ||
| value: The maximum number of rows | ||
| """ | ||
| self._max_rows = value | ||
|
|
||
| @property | ||
| def repr_rows(self) -> int: | ||
| """Get the maximum number of rows (deprecated name). | ||
|
|
||
| .. deprecated:: | ||
| Use :attr:`max_rows` instead. This property is provided for | ||
| backward compatibility. | ||
|
|
||
| Returns: | ||
| The maximum number of rows to display | ||
| """ | ||
| return self._max_rows | ||
|
|
||
| @repr_rows.setter | ||
| def repr_rows(self, value: int) -> None: | ||
| """Set the maximum number of rows using deprecated name. | ||
|
|
||
|
Comment on lines
+354
to
+357
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same, why add for deprecated? |
||
| .. deprecated:: | ||
| Use :attr:`max_rows` setter instead. This property is provided for | ||
| backward compatibility. | ||
|
|
||
| Args: | ||
| value: The maximum number of rows | ||
| """ | ||
| warnings.warn( | ||
| "repr_rows is deprecated, use max_rows instead", | ||
| DeprecationWarning, | ||
| stacklevel=2, | ||
| ) | ||
| self._max_rows = value | ||
|
|
||
| def register_formatter(self, type_class: type, formatter: CellFormatter) -> None: | ||
| """Register a custom formatter for a specific data type. | ||
|
|
||
|
|
@@ -659,7 +797,8 @@ def configure_formatter(**kwargs: Any) -> None: | |
| "max_width", | ||
| "max_height", | ||
| "max_memory_bytes", | ||
| "min_rows_display", | ||
| "min_rows", | ||
| "max_rows", | ||
| "repr_rows", | ||
| "enable_cell_expansion", | ||
| "custom_css", | ||
|
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
repr_rowsis being deprecated, why add an accessor?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the accessors for backward compatibility during the deprecation period:
Rationale:
formatter.repr_rows = 20continue working during the deprecation periodrepr_rowsdirectly will continue to workShall we keep the accessors for now with the deprecation warnings, plan removal in next major version?