Multigraphs via an edge index and edge properties/metadata#81

Open
zblanco wants to merge 22 commits intobitwalker:mainfrom
zblanco:zw/multigraph-indexes

@zblanco zblanco commented Aug 20, 2024

An effort to implement #18 as well as #54 for some of my own use cases. Looks like some planning work went into these in the past so not sure if there's a more preferred approach for multigraphs and/or adjacency indexing.

I ran into a performance-sensitive use case for multigraphs: to minimize enumeration during traversals, I wanted an index that trades space for time.

Current approach

An opt-in flag Graph.new(multigraph: true) with options for using a partitioned adjacency index in reflection APIs (out_edges, edges, in_edges, etc).

By default, this option maintains an adjacency index partitioned by edge label. This can be overridden with the :partition_by option, which accepts an edge and returns a partition. E.g. Graph.new(multigraph: true, partition_by: fn edge -> edge.weight end)

Reflection API options:

:by: a term or list of terms containing the partition keys.
:where: a filter function which accepts an edge and returns a boolean to include or exclude it from the result.

The edge_index is implemented as a nested map, %{partition => %{vertex_id => MapSet(edge_keys)}}, so the :by option can use map lookups to get the set of adjacent edges for one or more partitions without enumerating the rest.
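To make the index shape concrete, here is a minimal sketch using only Map and MapSet from the standard library. It is not the PR's code: the EdgeIndexSketch module, its function names, the plain-map edges, and the {v1, v2, label} edge key are all hypothetical stand-ins chosen for illustration.

```elixir
defmodule EdgeIndexSketch do
  # Hypothetical stand-in for the index described above, not libgraph code.
  # Shape: %{partition => %{vertex_id => MapSet of edge keys}}

  # Default partition function: partition by the edge label.
  def default_partition(%{label: label}), do: label

  # Index an edge (a plain map here) under its partition and source vertex.
  def put(index, %{v1: v1, v2: v2} = edge, partition_by \\ &default_partition/1) do
    partition = partition_by.(edge)
    # Assumed edge key format, for illustration only.
    edge_key = {v1, v2, edge.label}

    Map.update(index, partition, %{v1 => MapSet.new([edge_key])}, fn by_vertex ->
      Map.update(by_vertex, v1, MapSet.new([edge_key]), &MapSet.put(&1, edge_key))
    end)
  end

  # Collect the out-edge keys of `vertex` from one partition or a list of them.
  def out_edge_keys(index, vertex, by) do
    by
    |> List.wrap()
    |> Enum.reduce(MapSet.new(), fn partition, acc ->
      keys = get_in(index, [partition, vertex]) || MapSet.new()
      MapSet.union(acc, keys)
    end)
  end
end

edges = [
  %{v1: :a, v2: :b, label: :flows_to, weight: 1},
  %{v1: :a, v2: :c, label: :produced, weight: 1},
  %{v1: :a, v2: :d, label: :flows_to, weight: 1}
]

index = Enum.reduce(edges, %{}, fn edge, acc -> EdgeIndexSketch.put(acc, edge) end)

# Only the :flows_to partition is read; :produced edges are never visited.
flows = EdgeIndexSketch.out_edge_keys(index, :a, :flows_to)
IO.inspect(MapSet.size(flows), label: "flows_to out-edges of :a")
```

The lookup cost depends on the number of requested partitions and the size of the returned sets, not on the total number of edges leaving the vertex, which is the space-for-time trade described above.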

Edge Properties

For metadata / edge properties this PR changes the edge value from %{label => edge_weight} to

@type edge_properties :: %{
          label: label,
          weight: edge_weight,
          properties: map
}

as well as adding the properties map to the %Edge{} struct.

Todos:

  • Option to enable multigraphs via edge indexing
  • Override-able indexing function defaulting to fn %{label: label} -> label end to return the key
  • Change index function to support partitioning an edge to more than one set
  • Traversal APIs with filter predicates to benefit from the indexing
  • Support edge properties/metadata
  • More docs
  • Multigraph traversal in BFS/DFS/pathfinding
  • Multigraph benchmarks
  • Multigraph property tests
  • CI/CD chores

@zblanco zblanco marked this pull request as ready for review June 24, 2025 17:05

zblanco commented Jun 24, 2025

I've been using this PR inside https://github.com/zblanco/runic for some time now as a way to keep causal runtime edges produced during DAG executions from increasing the dataflow traversal costs. It hasn't needed changes and tests pass on Elixir 1.18, so it might be worth reviewing.

I developed this locally on Elixir 1.18+ which entailed some changes to doctests related to ordering of some results but this leaves older versions broken in CI. Not 100% on a preferred course of action - I would rather find a way to keep older versions compatible so that's what I'm looking into now.

This test discrepancy may be related to OTP updates as it's not related to the actual changes.

1) test sizeof/1 (Graph.UtilsTest)
Error:      test/utils_test.exs:8
     match (=) failed
     code:  assert 456 = sizeof(String.duplicate("bar", 128))
     left:  456
     right: 440
     stacktrace:
       test/utils_test.exs:10: (test)

zblanco commented Feb 21, 2026

The library I've been building that depends on this PR's multigraph functionality, https://github.com/zblanco/runic, is about ready to be released on Hex, and some other libraries built on top of it are also close to release.

Latest commits should add multigraph interaction to the rest of graph mutations and traversal / pathfinding.

There are some included benchmarks that show space / time trade offs at different graph sizes with the adjacency index. I also benchmarked this PR against main:

Benchmark      main        map edge properties   separate map edge properties
create 10k     10.69ms     11.61ms (+8%)         10.74ms
create 100k    170.36ms    180.94ms (+6%)        173.75ms
create 1M      3.72s       4.09s (+10%)          3.96s
topsort        7.39ms      8.03ms (+9%)          7.23ms
k_core         1.77s       2.14s (+21%)          1.87s

My original changes stored edge metadata as %{label => %{weight: w, properties: %{}}}, but this added overhead and caused some minor performance regressions. I moved the edge value back to %{label => weight} and now store properties in a separate edge_properties map on the graph struct, which is only populated when properties are used.
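As a rough illustration of the "separate map" layout, the sketch below keeps the %{label => weight} edge value untouched and writes to a side map only when an edge actually carries properties. The EdgePropsSketch module, its function names, and the {{v1, v2}, label} key used for the side map are assumptions made for this example, not the PR's actual struct or key format.

```elixir
defmodule EdgePropsSketch do
  # Hypothetical miniature of a graph struct, not libgraph's %Graph{}.
  defstruct edges: %{}, edge_properties: %{}

  def new, do: %__MODULE__{}

  def add_edge(g, v1, v2, opts \\ []) do
    label = Keyword.get(opts, :label)
    weight = Keyword.get(opts, :weight, 1)
    props = Keyword.get(opts, :properties, %{})
    key = {v1, v2}

    # The hot-path edge value keeps the compact %{label => weight} shape.
    edges = Map.update(g.edges, key, %{label => weight}, &Map.put(&1, label, weight))

    # The side map is written only when properties were actually given.
    edge_properties =
      if map_size(props) == 0 do
        g.edge_properties
      else
        Map.put(g.edge_properties, {key, label}, props)
      end

    %{g | edges: edges, edge_properties: edge_properties}
  end

  # Edges without stored properties fall back to an empty map.
  def properties(g, v1, v2, label) do
    Map.get(g.edge_properties, {{v1, v2}, label}, %{})
  end
end

g =
  EdgePropsSketch.new()
  |> EdgePropsSketch.add_edge(:a, :b, label: :flows_to)
  |> EdgePropsSketch.add_edge(:a, :b, label: :produced, weight: 2, properties: %{at: 1})

IO.inspect(map_size(g.edge_properties), label: "edges with stored properties")
```

Graphs that never use properties pay nothing beyond one empty map, which fits the benchmark table where the separate-map column stays close to main.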

mix bench.multigraph results:

➜  libgraph git:(zw/multigraph-indexes) ✗ mix bench.multigraph       
Operating System: Linux
CPU Information: AMD Ryzen 7 5800X 8-Core Processor
Number of Available Cores: 16
Available memory: 62.74 GB
Elixir 1.18.3
Erlang 27.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 5 s
reduction time: 0 ns
parallel: 1
inputs: 10k vertices, 50k edges, 50 labels, 1k vertices, 5k edges, 10 labels
Estimated total run time: 2.27 min

Benchmarking indexed lookup (multigraph by:) with input 10k vertices, 50k edges, 50 labels ...
Benchmarking indexed lookup (multigraph by:) with input 1k vertices, 5k edges, 10 labels ...
Benchmarking indexed out_edges (multigraph by:) with input 10k vertices, 50k edges, 50 labels ...
Benchmarking indexed out_edges (multigraph by:) with input 1k vertices, 5k edges, 10 labels ...
Benchmarking scan all edges + filter (no multigraph) with input 10k vertices, 50k edges, 50 labels ...
Benchmarking scan all edges + filter (no multigraph) with input 1k vertices, 5k edges, 10 labels ...
Benchmarking scan out_edges + filter (no multigraph) with input 10k vertices, 50k edges, 50 labels ...
Benchmarking scan out_edges + filter (no multigraph) with input 1k vertices, 5k edges, 10 labels ...
warning: using map.field notation (without parentheses) to invoke function Benchee.Conversion.Count.base_unit() is deprecated, you must add parentheses instead: remote.function()
  (benchee 1.1.0) lib/benchee/conversion/scale.ex:165: Benchee.Conversion.Scale.do_best_unit/3
  (benchee 1.1.0) lib/benchee/conversion.ex:63: Benchee.Conversion.units/2
  (benchee 1.1.0) lib/benchee/formatters/console/run_time.ex:90: Benchee.Formatters.Console.RunTime.render/2
  (benchee 1.1.0) lib/benchee/formatters/console.ex:134: Benchee.Formatters.Console.generate_output/3
  (elixir 1.18.3) lib/enum.ex:1714: Enum."-map/2-lists^map/1-1-"/2


##### With input 10k vertices, 50k edges, 50 labels #####
Name                                              ips        average  deviation         median         99th %
indexed out_edges (multigraph by:)          1543.31 K        0.65 μs ±10047.10%        0.55 μs        0.91 μs
scan out_edges + filter (no multigraph)      648.08 K        1.54 μs  ±3613.50%        1.19 μs        3.56 μs
indexed lookup (multigraph by:)                1.27 K      786.65 μs    ±26.92%      810.52 μs      938.08 μs
scan all edges + filter (no multigraph)      0.0360 K    27812.55 μs    ±33.33%    25003.63 μs    89080.10 μs

Comparison: 
indexed out_edges (multigraph by:)          1543.31 K
scan out_edges + filter (no multigraph)      648.08 K - 2.38x slower +0.90 μs
indexed lookup (multigraph by:)                1.27 K - 1214.04x slower +786.00 μs
scan all edges + filter (no multigraph)      0.0360 K - 42923.31x slower +27811.91 μs

Memory usage statistics:

Name                                       Memory usage
indexed out_edges (multigraph by:)           0.00108 MB
scan out_edges + filter (no multigraph)      0.00317 MB - 2.92x memory usage +0.00208 MB
indexed lookup (multigraph by:)                 1.03 MB - 946.52x memory usage +1.02 MB
scan all edges + filter (no multigraph)        32.58 MB - 30071.13x memory usage +32.58 MB

**All measurements for memory usage were the same**

##### With input 1k vertices, 5k edges, 10 labels #####
Name                                              ips        average  deviation         median         99th %
indexed out_edges (multigraph by:)           868.61 K        1.15 μs  ±3411.98%        0.91 μs        2.50 μs
scan out_edges + filter (no multigraph)      818.72 K        1.22 μs  ±3518.01%        0.97 μs        2.90 μs
indexed lookup (multigraph by:)                3.01 K      331.95 μs    ±22.96%      320.27 μs      428.02 μs
scan all edges + filter (no multigraph)        0.68 K     1467.22 μs    ±16.28%     1418.30 μs     1914.55 μs

Comparison: 
indexed out_edges (multigraph by:)           868.61 K
scan out_edges + filter (no multigraph)      818.72 K - 1.06x slower +0.0701 μs
indexed lookup (multigraph by:)                3.01 K - 288.34x slower +330.80 μs
scan all edges + filter (no multigraph)        0.68 K - 1274.44x slower +1466.07 μs

Memory usage statistics:

Name                                       Memory usage
indexed out_edges (multigraph by:)              1.70 KB
scan out_edges + filter (no multigraph)         2.65 KB - 1.56x memory usage +0.95 KB
indexed lookup (multigraph by:)               500.35 KB - 295.14x memory usage +498.66 KB
scan all edges + filter (no multigraph)      3341.46 KB - 1971.00x memory usage +3339.77 KB

**All measurements for memory usage were the same**
  • Indexed out_edges is 2.4x faster than the scan at the 10k scale but only about 1.06x faster at 1k, so the index is only really useful when graphs are large with varying edge kinds.

  • Global edge queries: indexed lookup is 35x faster than scanning all edges at the 10k scale (787 μs vs 27,812 μs) and uses about 32x less memory (1 MB vs 33 MB).

  • The indexed lookup advantage grows with graph size, from ~4x at 1k to ~35x at 10k for global edge queries, which is consistent with O(1)-ish index access vs an O(E) scan.
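The ratios quoted in these bullets can be recomputed from the Benchee averages in the output above. The snippet below is only arithmetic on those reported numbers (in μs); the variable names and map layout are made up for the example.

```elixir
# Average run times in microseconds, copied from the Benchee output above.
averages_us = %{
  "10k" => %{indexed_lookup: 786.65, scan_all: 27_812.55, indexed_out: 0.65, scan_out: 1.54},
  "1k" => %{indexed_lookup: 331.95, scan_all: 1_467.22, indexed_out: 1.15, scan_out: 1.22}
}

# Speedup = scan time / indexed time, for each input size.
speedups =
  for {size, t} <- averages_us, into: %{} do
    {size,
     %{
       lookup: Float.round(t.scan_all / t.indexed_lookup, 1),
       out_edges: Float.round(t.scan_out / t.indexed_out, 1)
     }}
  end

IO.inspect(speedups, label: "scan / indexed")
```

Benchee's own "x slower" column is derived from iterations per second rather than from these rounded averages, so the last digit can differ slightly.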

This only really matters when multigraphs are actually needed, with varying kinds of edges. My use case stores workflow runtime memory as edges, so the index keeps dataflow traversal costs from growing in long-running workflows.

This PR includes some minor test changes to get CI passing across Elixir/OTP versions, where newer releases changed list ordering in doctest comparisons and charlist representations. It might be worth just dropping support for older versions and keeping the doctests clean.

@bitwalker let me know if you want any changes and/or need time to review - we can point repos at a temporary fork if needed.
