Merged
- Add graphrag.graphs package with compute_degree operating directly on relationships DataFrames instead of building NetworkX graphs
- Update finalize_entities and finalize_relationships to use the new utility, eliminating NX graph construction in those paths
- Remove the old compute_degree operation from index/operations
- Add side-by-side tests validating parity with NetworkX degree output
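As an illustration of computing degree straight from a relationships DataFrame, here is a minimal sketch. It is not the code from this PR: the function name `degree_from_edges`, the `source`/`target` column names, and the `title`/`degree` output columns are assumptions made for the example. It also folds in the edge-direction normalisation that a later commit in this PR describes.

```python
import pandas as pd


def degree_from_edges(
    relationships: pd.DataFrame,
    source_column: str = "source",  # assumed column name
    target_column: str = "target",  # assumed column name
) -> pd.DataFrame:
    """Return one row per node with its undirected degree (hypothetical helper)."""
    src = relationships[source_column]
    tgt = relationships[target_column]
    # Normalise direction so (A, B) and (B, A) collapse into one undirected edge.
    lo = src.where(src <= tgt, tgt)
    hi = tgt.where(src <= tgt, src)
    edges = pd.DataFrame({"lo": lo, "hi": hi}).drop_duplicates()
    # Every unique edge adds one to the degree of each endpoint.
    endpoints = pd.concat([edges["lo"], edges["hi"]], ignore_index=True)
    counts = endpoints.value_counts()
    return (
        counts.rename_axis("title")
        .reset_index(name="degree")
        .sort_values("title")
        .reset_index(drop=True)
    )
```

No graph object is built at any point; the only intermediate structures are the deduplicated edge frame and a single Series of endpoints.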
- Add connected_components and largest_connected_component to graphrag.graphs using union-find on edge list DataFrames
- Fix compute_degree to normalise edge direction so (A,B) and (B,A) are treated as the same undirected edge
- Replace NX largest_connected_component in prune_graph operation with the new DataFrame utility via graph_to_dataframes
- Add realistic A Christmas Carol graph fixture (529 nodes, 978 edges) converted from verb test parquet data
- Add side-by-side tests for connected components and fixture-based test for compute_degree, all validated against NetworkX
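The union-find approach mentioned above can be sketched roughly as follows. This is an illustrative version under stated assumptions, not the PR's implementation: the names `connected_components_sketch` and `largest_component_sketch`, and the `source`/`target` column names, are hypothetical.

```python
import pandas as pd


def connected_components_sketch(
    relationships: pd.DataFrame,
    source_column: str = "source",  # assumed column name
    target_column: str = "target",  # assumed column name
) -> list[set[str]]:
    """Group nodes into connected components with union-find (hypothetical helper)."""
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        # Walk up to the root of this node's tree.
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited node directly at the root.
        while parent[node] != root:
            parent[node], node = root, parent[node]
        return root

    # Union the two endpoints of every edge.
    for a, b in zip(relationships[source_column], relationships[target_column]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect nodes by their final root.
    groups: dict[str, set[str]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)


def largest_component_sketch(relationships: pd.DataFrame) -> set[str]:
    """Return the node set of the largest component, or an empty set."""
    components = connected_components_sketch(relationships)
    return components[0] if components else set()
```

Because only a node-to-parent dictionary is kept, memory grows with the number of nodes rather than with a full adjacency structure.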
- cluster_graph now accepts a DataFrame instead of nx.Graph
- hierarchical_leiden now accepts list[tuple[str, str, float]] edge list
- create_communities passes relationships DataFrame directly, removing create_graph dependency
- Edge direction normalization and deduplication (keep='last') replaces implicit NX dedup behavior
- Modularity helper callers convert to edge list via _nx_to_edge_list
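The normalisation and deduplication step could look something like the sketch below. The function name `to_undirected_edge_list` and the `source`/`target`/`weight` column names are assumptions for illustration. Keeping the last duplicate presumably mirrors NetworkX, where adding the same edge twice lets the later attributes overwrite the earlier ones.

```python
import pandas as pd


def to_undirected_edge_list(
    relationships: pd.DataFrame,
    source_column: str = "source",  # assumed column name
    target_column: str = "target",  # assumed column name
    weight_column: str = "weight",  # assumed column name
) -> list[tuple[str, str, float]]:
    """Build a deduplicated undirected edge list (hypothetical helper)."""
    src = relationships[source_column]
    tgt = relationships[target_column]
    # Order each pair so the smaller endpoint comes first.
    edges = pd.DataFrame({
        "lo": src.where(src <= tgt, tgt),
        "hi": tgt.where(src <= tgt, src),
        "weight": relationships[weight_column],
    })
    # When (A, B) and (B, A) both occur, the later row wins.
    edges = edges.drop_duplicates(subset=["lo", "hi"], keep="last")
    return [
        (str(lo), str(hi), float(weight))
        for lo, hi, weight in edges.itertuples(index=False, name=None)
    ]
```

The result has the `list[tuple[str, str, float]]` shape that the commit says hierarchical_leiden now accepts.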
- prune_graph operation now accepts (entities, relationships) DataFrames instead of nx.Graph, returns pruned DataFrames directly
- Uses compute_degree for degree calculation, largest_connected_component for LCC filtering (no NetworkX)
- Workflow no longer round-trips through create_graph/graph_to_dataframes
- Reset index on returned DataFrames to avoid downstream alignment errors
- Move stable_lcc (NX version) to tests/unit/graphs/nx_stable_lcc.py for side-by-side comparison tests only
- Delete graph_to_dataframes.py (dead code, zero imports)
- Update test imports to use the new test helper location
- snapshot_graphml now accepts edges DataFrame directly and calls nx.from_pandas_edgelist internally
- finalize_graph workflow passes relationships DataFrame to snapshot
- Removed create_graph.py (no remaining callers)
…odules
- hierarchical_leiden, first/final_level_hierarchical_clustering → graphs/hierarchical_leiden.py
- calculate_pmi/rrf_edge_weights → graphs/edge_weights.py
- calculate_* modularity functions, _df_to_edge_list → graphs/modularity.py
- NX-based modularity/LCC/edge-list helpers removed (replaced by DF-based equivalents)
- Delete index/utils/graphs.py (no remaining callers)
- Update cluster_graph.py and build_noun_graph.py to import from new locations
- Inline NX largest_connected_component into test helper nx_stable_lcc.py
- Add side-by-side modularity tests (9 tests comparing DF vs NX)
dayesouza
approved these changes
Feb 13, 2026
Removes NetworkX and replaces the core graph utilities with equivalent DataFrame-based ones. This avoids much of the memory blow-up caused by the NetworkX graph structure.