
Remove networkx #2228

Merged
natoverse merged 13 commits into main from remove-networkx
Feb 13, 2026

@natoverse
Collaborator

Removes networkx and replaces the core graph utilities with equivalent DataFrame-based ones. This avoids much of the memory overhead that the NetworkX graph structure incurs.

- Add graphrag.graphs package with compute_degree operating directly on
  relationships DataFrames instead of building NetworkX graphs
- Update finalize_entities and finalize_relationships to use the new
  utility, eliminating NX graph construction in those paths
- Remove the old compute_degree operation from index/operations
- Add side-by-side tests validating parity with NetworkX degree output
- Add connected_components and largest_connected_component to
  graphrag.graphs using union-find on edge list DataFrames
- Fix compute_degree to normalise edge direction so (A,B) and (B,A)
  are treated as the same undirected edge
- Replace NX largest_connected_component in prune_graph operation with
  the new DataFrame utility via graph_to_dataframes
- Add realistic A Christmas Carol graph fixture (529 nodes, 978 edges)
  converted from verb test parquet data
- Add side-by-side tests for connected components and fixture-based
  test for compute_degree, all validated against NetworkX
- cluster_graph now accepts a DataFrame instead of nx.Graph
- hierarchical_leiden now accepts list[tuple[str, str, float]] edge list
- create_communities passes relationships DataFrame directly, removing
  create_graph dependency
- Edge direction normalization and deduplication (keep='last') replace
  the implicit NX dedup behavior
- Modularity helper callers convert to edge list via _nx_to_edge_list
- prune_graph operation now accepts (entities, relationships) DataFrames
  instead of nx.Graph, returns pruned DataFrames directly
- Uses compute_degree for degree calculation, largest_connected_component
  for LCC filtering — no NetworkX
- Workflow no longer round-trips through create_graph/graph_to_dataframes
- Reset index on returned DataFrames to avoid downstream alignment errors
- Move stable_lcc (NX version) to tests/unit/graphs/nx_stable_lcc.py
  for side-by-side comparison tests only
- Delete graph_to_dataframes.py (dead code, zero imports)
- Update test imports to use the new test helper location
- snapshot_graphml now accepts edges DataFrame directly and calls
  nx.from_pandas_edgelist internally
- finalize_graph workflow passes relationships DataFrame to snapshot
- Removed create_graph.py (no remaining callers)
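For readers who want a feel for the DataFrame-based approach described in the list above, here is a minimal sketch of edge normalisation, degree computation, and union-find connected components. It is an illustration only, not the code merged in this PR: the function names `normalise_edges`, `degree_from_edges`, `connected_components_sketch`, and `largest_component`, and the `source`/`target`/`weight` column names, are assumptions made for the example.

```python
import pandas as pd


def normalise_edges(relationships: pd.DataFrame) -> pd.DataFrame:
    """Order each edge's endpoints, then drop duplicates, keeping the last row."""
    source, target = relationships["source"], relationships["target"]
    forward = source <= target
    edges = relationships.assign(
        source=source.where(forward, target),
        target=target.where(forward, source),
    )
    # (A, B) and (B, A) now share the same key, so they collapse to one edge.
    return edges.drop_duplicates(subset=["source", "target"], keep="last")


def degree_from_edges(relationships: pd.DataFrame) -> pd.Series:
    """Undirected degree per node, computed without building a graph object."""
    edges = normalise_edges(relationships)
    # Every edge contributes one to the degree of each endpoint.
    endpoints = pd.concat([edges["source"], edges["target"]], ignore_index=True)
    return endpoints.value_counts().sort_index()


def connected_components_sketch(relationships: pd.DataFrame) -> list[set[str]]:
    """Group nodes into components with union-find over the edge list."""
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited node straight at the root.
        while parent[node] != root:
            next_node = parent[node]
            parent[node] = root
            node = next_node
        return root

    for source, target in zip(relationships["source"], relationships["target"]):
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_t] = root_s

    groups: dict[str, set[str]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)


def largest_component(relationships: pd.DataFrame) -> set[str]:
    """Return the node set of the largest connected component."""
    return max(connected_components_sketch(relationships), key=len)


relationships = pd.DataFrame(
    {
        "source": ["A", "B", "B", "D"],
        "target": ["B", "A", "C", "E"],
        "weight": [1.0, 2.0, 1.0, 1.0],
    }
)
edges = normalise_edges(relationships)
degree = degree_from_edges(relationships)
largest = largest_component(relationships)
```

In this toy input the (A, B) and (B, A) rows count as a single edge, and `keep="last"` means the weight from the later row is the one that survives.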
…odules

- hierarchical_leiden, first/final_level_hierarchical_clustering → graphs/hierarchical_leiden.py
- calculate_pmi/rrf_edge_weights → graphs/edge_weights.py
- calculate_* modularity functions, _df_to_edge_list → graphs/modularity.py
- NX-based modularity/LCC/edge-list helpers removed (replaced by DF-based equivalents)
- Delete index/utils/graphs.py (no remaining callers)
- Update cluster_graph.py and build_noun_graph.py to import from new locations
- Inline NX largest_connected_component into test helper nx_stable_lcc.py
- Add side-by-side modularity tests (9 tests comparing DF vs NX)
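As a rough illustration of computing modularity from a plain edge list rather than a graph object, the sketch below applies the standard Newman modularity formula to `list[tuple[str, str, float]]` edges. It is not the `calculate_*` code in graphs/modularity.py: the function name `modularity_sketch`, its parameters, and the assumption that the edge list is already undirected and deduplicated are all hypothetical.

```python
from collections import defaultdict


def modularity_sketch(
    edges: list[tuple[str, str, float]],
    communities: dict[str, int],
    resolution: float = 1.0,
) -> float:
    """Newman modularity of a partition over a weighted, undirected edge list."""
    total_weight = sum(weight for _, _, weight in edges)
    if total_weight == 0:
        return 0.0

    degree: dict[str, float] = defaultdict(float)
    internal: dict[int, float] = defaultdict(float)
    for source, target, weight in edges:
        # Each edge adds its weight to the weighted degree of both endpoints.
        degree[source] += weight
        degree[target] += weight
        if communities[source] == communities[target]:
            internal[communities[source]] += weight

    community_degree: dict[int, float] = defaultdict(float)
    for node, node_degree in degree.items():
        community_degree[communities[node]] += node_degree

    # Q = sum over communities of (internal weight / m) - resolution * (degree sum / 2m)^2
    return sum(
        internal[community] / total_weight
        - resolution * (community_degree[community] / (2 * total_weight)) ** 2
        for community in community_degree
    )


edge_list = [("A", "B", 1.0), ("C", "D", 1.0)]
split = modularity_sketch(edge_list, {"A": 0, "B": 0, "C": 1, "D": 1})
merged = modularity_sketch(edge_list, {"A": 0, "B": 0, "C": 0, "D": 0})
```

Placing each disconnected pair in its own community scores higher than lumping all four nodes together, which is the kind of property a side-by-side test against NetworkX can check.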
@natoverse natoverse requested a review from a team as a code owner February 12, 2026 21:59
@natoverse natoverse merged commit e1c92cc into main Feb 13, 2026
18 checks passed
@natoverse natoverse deleted the remove-networkx branch February 13, 2026 17:28