
Add examples of AutoTP #998

Merged
tohtana merged 11 commits into deepspeedai:master from tohtana:tohtana/custom_auto_tp on Feb 7, 2026

Conversation

@tohtana (Contributor) commented Jan 22, 2026

This PR adds examples of AutoTP training, including custom partitioning patterns.

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
tohtana requested a review from tjruwase as a code owner on January 22, 2026 00:13
tohtana added a commit to deepspeedai/DeepSpeed that referenced this pull request Jan 31, 2026
This PR introduces a flexible, configuration-driven API for AutoTP
(Automatic Tensor Parallelism) that allows users to define custom layer
partitioning patterns for training.
@inkcherry @delock 

## Motivation

Previously, AutoTP relied on hardcoded layer detection logic that was
difficult to customize for new model architectures. This PR enables:

1. **Custom models**: Users can define exact regex patterns to match
their model's parameter names
2. **Fused layers**: Support for fused QKV, gate_up_proj, and other
packed weight matrices with unequal sub-parameter sizes (e.g., GQA with
different Q/K/V dimensions)
3. **Extensibility**: Easy to add new model presets or customize
existing ones

Here is an example of a config including custom partitioning patterns:

```json
{
    "tensor_parallel": {
        "autotp_size": 4,
        "partition_config": {
            "use_default_specs": false,
            "layer_specs": [
                {
                    "patterns": [".*\\.o_proj\\.weight$", ".*\\.down_proj\\.weight$"],
                    "partition_type": "row"
                },
                {
                    "patterns": [".*\\.[qkv]_proj\\.weight$"],
                    "partition_type": "column"
                },
                {
                    "patterns": [".*\\.gate_up_proj\\.weight$"],
                    "partition_type": "column",
                    "shape": [2, -1],
                    "partition_dim": 0
                }
            ]
        }
    }
}
```
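
To make the fused ``gate_up_proj`` spec concrete: ``"shape": [2, -1]`` views the fused weight as two sub-matrices (the gate and up projections) stacked along ``partition_dim``, each of which is sharded separately before the local pieces are re-fused. Below is a minimal sketch of that splitting semantics; the helper name and mechanics are illustrative, not DeepSpeed's internal code.

```python
import torch

# Illustrative helper (not DeepSpeed's internal code): shard a fused weight
# described by a spec like {"shape": [2, -1], "partition_dim": 0}.
def shard_fused_weight(weight: torch.Tensor, num_fused: int,
                       partition_dim: int, tp_rank: int,
                       tp_size: int) -> torch.Tensor:
    # View the fused matrix as num_fused stacked sub-weights
    # (e.g., gate_proj and up_proj for gate_up_proj).
    sub_weights = torch.chunk(weight, num_fused, dim=partition_dim)
    # Shard each sub-weight independently, then re-fuse the local shards so
    # every rank keeps the fused [gate_shard; up_shard] layout.
    local_shards = [torch.chunk(w, tp_size, dim=partition_dim)[tp_rank]
                    for w in sub_weights]
    return torch.cat(local_shards, dim=partition_dim)

# Example: hidden=8, intermediate=16, fused weight [2*16, 8], tp_size=4.
fused = torch.randn(32, 8)
local = shard_fused_weight(fused, num_fused=2, partition_dim=0,
                           tp_rank=0, tp_size=4)
print(local.shape)  # torch.Size([8, 8]): 4 gate rows + 4 up rows per rank
```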

Refer to the
[document](https://github.com/tohtana/DeepSpeed/blob/tohtana/autotp_custom_patterns/docs/code-docs/source/training.rst)
for more details (including preset models and how to define partitioning
for fused models).
We also opened a new
[PR](deepspeedai/DeepSpeedExamples#998) that demonstrates
the usage.


## Simplified initialization step

AutoTP previously required calling ``set_autotp_mode(training=True)``
and ``deepspeed.tp_model_init`` before ``deepspeed.initialize``. Now all
the necessary configuration can be included in the DeepSpeed config.
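
For example, assuming the config above is saved as ``ds_config.json`` (with the usual batch-size and optimizer settings alongside the ``tensor_parallel`` section), initialization reduces to a single call. A minimal sketch; the model checkpoint is just a placeholder:

```python
import json

import deepspeed
import torch
from transformers import AutoModelForCausalLM

# The config carries the tensor_parallel section shown above, so no
# set_autotp_mode / tp_model_init calls are needed on this path.
with open("ds_config.json") as f:
    ds_config = json.load(f)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # placeholder checkpoint
    torch_dtype=torch.bfloat16,
)

# deepspeed.initialize reads the tensor_parallel config and sets up the
# TP groups itself.
engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
```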

We still support the traditional initialization path for backward
compatibility. If you use both (i.e., call ``set_autotp_mode(training=True)``
and ``deepspeed.tp_model_init`` while also passing the config to
``deepspeed.initialize``), the settings are merged at initialization;
conflicting settings raise an error.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
tohtana and others added 2 commits February 1, 2026 10:18
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
tohtana added a commit to deepspeedai/DeepSpeed that referenced this pull request Feb 7, 2026
The current code has the following issues:
- `use_default_specs: false` doesn't work
- Injection by the traditional pattern runs even when custom patterns
are set
- `mpu` needs to be passed to `deepspeed.initialize` (the HF integration
doesn't pass `mpu`)

This PR fixes the AutoTP setup to respect `use_default_specs: false` and
to disable the traditional injection path when custom patterns are enabled.
Also, when `mpu` is not passed, we now create a TP group during
initialization.


With these changes, the [related
tests](https://github.com/deepspeedai/DeepSpeed/tree/master/tests/unit/model_parallelism)
pass, and [all AutoTP
examples](https://github.com/tohtana/DeepSpeedExamples/tree/tohtana/custom_auto_tp/training/tensor_parallel)
in DeepSpeedExamples now work
([PR](deepspeedai/DeepSpeedExamples#998)).

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
tohtana merged commit ece52bc into deepspeedai:master on Feb 7, 2026
2 checks passed
