[ENH] Refactor Extension
#1590
Open
jgyasu
wants to merge
27
commits into
openml:main
from
jgyasu:refactor-extension
2c0c1aa
[ENH] Refactor `Extension`
jgyasu 2aab335
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] d368eab
Merge remote-tracking branch 'upstream/main' into refactor-extension
jgyasu 6d3e0e9
Merge remote-tracking branch 'refs/remotes/origin/refactor-extension'…
jgyasu 1365bf6
correct openml exception
jgyasu 67c0efb
use __all__ for imports in __init__
jgyasu e5850ef
update registry
jgyasu 00da7a9
update registry and file structure
jgyasu 7d33463
Merge branch 'main' into refactor-extension
jgyasu 373fa53
[DO NOT MERGE] Refactor openml-sklearn back into openml-python
jgyasu e86fab7
add public function for serialisation and deserialisation
jgyasu e92156a
move the flow utils to flows/functions.py
jgyasu 1945c58
update flows
jgyasu 5a1ccd6
expose parameters of flow_to_model
jgyasu c7e52e1
remove sklearn
jgyasu 12df955
remove .DS_Store
jgyasu 9e5e752
add flow functions to __init__.py
jgyasu bf9a0aa
add tests for extension base classes and registry
jgyasu e3ca07d
remove sklearn extension from registry temporarily
jgyasu a2aa2d0
Merge branch 'main' into pr/1590
fkiraly bc541bb
Merge branch 'main' into refactor-extension
jgyasu 34448aa
Merge remote-tracking branch 'refs/remotes/origin/refactor-extension'…
jgyasu 7bd15e5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 3afc0d9
Merge branch 'main' into refactor-extension
jgyasu 406a96f
move some methods between serializer and executor
jgyasu 5030914
update tests
jgyasu ad441cf
remove api connector, resolve at serialiser and executor level
jgyasu
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| # License: BSD 3-Clause | ||
|
|
||
| """Base classes for OpenML extensions.""" | ||
|
|
||
| from openml.extensions.base._executor import ModelExecutor | ||
| from openml.extensions.base._serializer import ModelSerializer | ||
|
|
||
| __all__ = [ | ||
| "ModelExecutor", | ||
| "ModelSerializer", | ||
| ] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,137 @@ | ||
| # License: BSD 3-Clause | ||
|
|
||
| """Base class for estimator executors.""" | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from abc import ABC, abstractmethod | ||
| from collections import OrderedDict | ||
| from typing import TYPE_CHECKING, Any | ||
|
|
||
| if TYPE_CHECKING: | ||
| import numpy as np | ||
| import scipy.sparse | ||
|
|
||
| from openml.runs.trace import OpenMLRunTrace, OpenMLTraceIteration | ||
| from openml.tasks.task import OpenMLTask | ||
|
|
||
|
|
||
| class ModelExecutor(ABC): | ||
| """Define runtime execution semantics for a specific API type.""" | ||
|
|
||
| @classmethod | ||
| @abstractmethod | ||
| def can_handle_model(cls, model: Any) -> bool: | ||
| """Check whether a model can be handled by this executor. | ||
|
|
||
| This is typically done by checking the type of the model, or the package it belongs to. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| model : Any | ||
|
|
||
| Returns | ||
| ------- | ||
| bool | ||
| """ | ||
|
|
||
| @abstractmethod | ||
| def seed_model(self, model: Any, seed: int | None) -> Any: | ||
| """Set the seed of all the unseeded components of a model and return the seeded model. | ||
|
|
||
| Required so that all seed information can be uploaded to OpenML for reproducible results. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| model : Any | ||
| The model to be seeded | ||
| seed : int or None | ||
|
|
||
| Returns | ||
| ------- | ||
| model | ||
| """ | ||
|
|
||
| @abstractmethod | ||
| def _run_model_on_fold( # noqa: PLR0913 | ||
| self, | ||
| model: Any, | ||
| task: OpenMLTask, | ||
| X_train: np.ndarray | scipy.sparse.spmatrix, | ||
| rep_no: int, | ||
| fold_no: int, | ||
| y_train: np.ndarray | None = None, | ||
| X_test: np.ndarray | scipy.sparse.spmatrix | None = None, | ||
| ) -> tuple[np.ndarray, np.ndarray | None, OrderedDict[str, float], OpenMLRunTrace | None]: | ||
| """Run a model on a repeat, fold, subsample triplet of the task. | ||
|
|
||
| Returns the data that is necessary to construct the OpenML Run object. Is used by | ||
| :func:`openml.runs.run_flow_on_task`. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| model : Any | ||
| The UNTRAINED model to run. The model instance will be copied and not altered. | ||
| task : OpenMLTask | ||
| The task to run the model on. | ||
| X_train : array-like | ||
| Training data for the given repetition and fold. | ||
| rep_no : int | ||
| The repeat of the experiment (0-based; in case of 1 time CV, always 0) | ||
| fold_no : int | ||
| The fold nr of the experiment (0-based; in case of holdout, always 0) | ||
| y_train : Optional[np.ndarray] (default=None) | ||
| Target attributes for supervised tasks. In case of classification, these are integer | ||
| indices to the potential classes specified by dataset. | ||
| X_test : Optional, array-like (default=None) | ||
| Test attributes to test for generalization in supervised tasks. | ||
|
|
||
| Returns | ||
| ------- | ||
| predictions : np.ndarray | ||
| Model predictions. | ||
| probabilities : Optional, np.ndarray | ||
| Predicted probabilities (only applicable for supervised classification tasks). | ||
| user_defined_measures : OrderedDict[str, float] | ||
| User defined measures that were generated on this fold | ||
| trace : Optional, OpenMLRunTrace | ||
| Hyperparameter optimization trace (only applicable for supervised tasks with | ||
| hyperparameter optimization). | ||
| """ | ||
|
|
||
| @abstractmethod | ||
| def check_if_model_fitted(self, model: Any) -> bool: | ||
| """Returns True/False denoting if the model has already been fitted/trained. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| model : Any | ||
|
|
||
| Returns | ||
| ------- | ||
| bool | ||
| """ | ||
|
|
||
| # Abstract methods for hyperparameter optimization | ||
|
|
||
| @abstractmethod | ||
| def instantiate_model_from_hpo_class( | ||
| self, | ||
| model: Any, | ||
| trace_iteration: OpenMLTraceIteration, | ||
| ) -> Any: | ||
| """Instantiate a base model which can be searched over by the hyperparameter optimization | ||
| model. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| model : Any | ||
| A hyperparameter optimization model which defines the model to be instantiated. | ||
| trace_iteration : OpenMLTraceIteration | ||
| Describing the hyperparameter settings to instantiate. | ||
|
|
||
| Returns | ||
| ------- | ||
| Any | ||
| """ | ||
| # TODO a trace belongs to a run and therefore a flow -> simplify this part of the interface! | ||
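
As a reading aid for the executor interface above, here is a minimal sketch of what a concrete subclass might look like. It is not part of the PR. `ExecutorSketch` is a trimmed, hypothetical stand-in for `ModelExecutor` (it omits `_run_model_on_fold` and `instantiate_model_from_hpo_class` so the example runs without `openml` installed), and `ToyModel`, `ToyExecutor`, `random_state` and `fitted_` are made-up names. Copying the model before seeding is an assumption of this sketch; the docstring only says the seeded model is returned.

```python
from __future__ import annotations

import copy
import random
from abc import ABC, abstractmethod
from typing import Any


class ExecutorSketch(ABC):
    """Hypothetical, trimmed stand-in for ModelExecutor (two hooks omitted)."""

    @classmethod
    @abstractmethod
    def can_handle_model(cls, model: Any) -> bool: ...

    @abstractmethod
    def seed_model(self, model: Any, seed: int | None) -> Any: ...

    @abstractmethod
    def check_if_model_fitted(self, model: Any) -> bool: ...


class ToyModel:
    """Hypothetical estimator with an optional seed and a fitted flag."""

    def __init__(self, random_state: int | None = None) -> None:
        self.random_state = random_state
        self.fitted_ = False


class ToyExecutor(ExecutorSketch):
    @classmethod
    def can_handle_model(cls, model: Any) -> bool:
        # Type check, as the base docstring suggests ("checking the type of the model").
        return isinstance(model, ToyModel)

    def seed_model(self, model: Any, seed: int | None) -> Any:
        # Assumption: work on a copy so the caller's model is left untouched.
        seeded = copy.deepcopy(model)
        # Only components that are still unseeded receive a value.
        if seeded.random_state is None:
            seeded.random_state = random.Random(seed).randrange(2**31)
        return seeded

    def check_if_model_fitted(self, model: Any) -> bool:
        return bool(getattr(model, "fitted_", False))
```

In this sketch an already seeded model keeps its seed, and the same `seed` argument always produces the same derived value, which is what makes the uploaded seed information reproducible.
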
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| # License: BSD 3-Clause | ||
|
|
||
| """Base class for estimator serializers.""" | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from abc import ABC, abstractmethod | ||
| from typing import TYPE_CHECKING, Any | ||
|
|
||
| if TYPE_CHECKING: | ||
| from openml.flows import OpenMLFlow | ||
|
|
||
|
|
||
| class ModelSerializer(ABC): | ||
| """Handle the conversion between estimator instances and OpenML Flows.""" | ||
|
|
||
| @classmethod | ||
| @abstractmethod | ||
| def can_handle_model(cls, model: Any) -> bool: | ||
| """Check whether a model can be handled by this serializer. | ||
|
|
||
| This is typically done by checking the type of the model, or the package it belongs to. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| model : Any | ||
|
|
||
| Returns | ||
| ------- | ||
| bool | ||
| """ | ||
|
|
||
| @abstractmethod | ||
| def model_to_flow(self, model: Any) -> OpenMLFlow: | ||
| """Transform a model to a flow for uploading it to OpenML. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| model : Any | ||
|
|
||
| Returns | ||
| ------- | ||
| OpenMLFlow | ||
| """ | ||
|
|
||
| @abstractmethod | ||
| def flow_to_model( | ||
| self, | ||
| flow: OpenMLFlow, | ||
| initialize_with_defaults: bool = False, # noqa: FBT002 | ||
| strict_version: bool = True, # noqa: FBT002 | ||
| ) -> Any: | ||
| """Instantiate a model from the flow representation. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| flow : OpenMLFlow | ||
|
|
||
| initialize_with_defaults : bool, optional (default=False) | ||
| If this flag is set, the hyperparameter values of the flow are | ||
| ignored and a model instantiated with its defaults is returned. | ||
|
|
||
| strict_version : bool, default=True | ||
| Whether to fail if version requirements are not fulfilled. | ||
|
|
||
| Returns | ||
| ------- | ||
| Any | ||
| """ | ||
|
|
||
| @abstractmethod | ||
| def get_version_information(self) -> list[str]: | ||
| """Return dependency and version information.""" | ||
|
|
||
| @abstractmethod | ||
| def obtain_parameter_values( | ||
| self, | ||
| flow: OpenMLFlow, | ||
| model: Any = None, | ||
| ) -> list[dict[str, Any]]: | ||
| """Extracts all parameter settings required for the flow from the model. | ||
|
|
||
| If no explicit model is provided, the parameters will be extracted from `flow.model` | ||
| instead. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| flow : OpenMLFlow | ||
| OpenMLFlow object (containing flow ids, i.e., it has to be downloaded from the server) | ||
|
|
||
| model : Any, optional (default=None) | ||
| The model from which to obtain the parameter values. Must match the flow signature. | ||
| If None, use the model specified in ``OpenMLFlow.model``. | ||
|
|
||
| Returns | ||
| ------- | ||
| list | ||
| A list of dicts, where each dict has the following entries: | ||
| - ``oml:name`` : str: The OpenML parameter name | ||
| - ``oml:value`` : mixed: A representation of the parameter value | ||
| - ``oml:component`` : int: flow id to which the parameter belongs | ||
| """ |
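
To make the documented return shape of `obtain_parameter_values` concrete, the sketch below builds such a list from a plain dict. It is not the PR's implementation: `parameter_settings` is a hypothetical helper, and using JSON as the "representation of the parameter value" is an assumption made only for this illustration.

```python
from __future__ import annotations

import json
from typing import Any


def parameter_settings(params: dict[str, Any], flow_id: int) -> list[dict[str, Any]]:
    """Hypothetical helper: one dict per parameter, using the documented keys."""
    return [
        {
            "oml:name": name,
            # Assumption: values are JSON-encoded; the docstring only says "mixed".
            "oml:value": json.dumps(value),
            "oml:component": flow_id,
        }
        for name, value in sorted(params.items())
    ]


settings = parameter_settings({"n_neighbors": 3, "kernel": "rbf"}, flow_id=42)
```

Each entry carries the flow id under `oml:component`, which is why the docstring requires a flow that has already been downloaded from the server.
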
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,96 @@ | ||
| # License: BSD 3-Clause | ||
|
|
||
| """Extension registries for serializers and executors.""" | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from typing import TYPE_CHECKING, Any | ||
|
|
||
| from openml.exceptions import PyOpenMLError | ||
|
|
||
| if TYPE_CHECKING: | ||
| from openml.extensions.base import ModelExecutor, ModelSerializer | ||
|
|
||
|
|
||
| SERIALIZER_REGISTRY: list[type[ModelSerializer]] = [] | ||
| EXECUTOR_REGISTRY: list[type[ModelExecutor]] = [] | ||
|
|
||
|
|
||
| def register_serializer(cls: type[ModelSerializer]) -> type[ModelSerializer]: | ||
| """Register a serializer class.""" | ||
| SERIALIZER_REGISTRY.append(cls) | ||
| return cls | ||
|
|
||
|
|
||
| def register_executor(cls: type[ModelExecutor]) -> type[ModelExecutor]: | ||
| """Register an executor class.""" | ||
| EXECUTOR_REGISTRY.append(cls) | ||
| return cls | ||
|
|
||
|
|
||
| def resolve_serializer(estimator: Any) -> ModelSerializer: | ||
| """ | ||
| Identify and return the appropriate serializer for a given estimator. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| estimator : Any | ||
| The estimator instance (e.g., sklearn estimator, sktime estimator). | ||
|
|
||
| Returns | ||
| ------- | ||
| ModelSerializer | ||
| An instance of the matching serializer. | ||
|
|
||
| Raises | ||
| ------ | ||
| PyOpenMLError | ||
| If no serializer supports the estimator or if multiple serializers match. | ||
| """ | ||
| matches = [ | ||
| serializer_cls | ||
| for serializer_cls in SERIALIZER_REGISTRY | ||
| if serializer_cls.can_handle_model(estimator) | ||
| ] | ||
|
|
||
| if len(matches) == 1: | ||
| return matches[0]() | ||
|
|
||
| if len(matches) > 1: | ||
| raise PyOpenMLError("Multiple serializers support this estimator.") | ||
|
|
||
| raise PyOpenMLError("No serializer supports this estimator.") | ||
|
|
||
|
|
||
| def resolve_executor(estimator: Any) -> ModelExecutor: | ||
| """ | ||
| Identify and return the appropriate executor for a given estimator. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| estimator : Any | ||
| The estimator instance. | ||
|
|
||
| Returns | ||
| ------- | ||
| ModelExecutor | ||
| An instance of the matching executor. | ||
|
|
||
| Raises | ||
| ------ | ||
| PyOpenMLError | ||
| If no executor supports the estimator or if multiple executors match. | ||
| """ | ||
| matches = [ | ||
| executor_cls | ||
| for executor_cls in EXECUTOR_REGISTRY | ||
| if executor_cls.can_handle_model(estimator) | ||
| ] | ||
|
|
||
| if len(matches) == 1: | ||
| return matches[0]() | ||
|
|
||
| if len(matches) > 1: | ||
| raise PyOpenMLError("Multiple executors support this estimator.") | ||
|
|
||
| raise PyOpenMLError("No executor supports this estimator.") |
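
The resolution rule above (exactly one match is instantiated, zero or several matches raise) can be exercised with a small self-contained sketch. It mirrors the logic of `register_serializer` and `resolve_serializer` but does not import them: `SketchError` stands in for `PyOpenMLError`, and `REGISTRY`, `register`, `resolve`, `IntSerializer` and `NumberSerializer` are hypothetical names used only here.

```python
from __future__ import annotations

from typing import Any


class SketchError(Exception):
    """Stand-in for openml.exceptions.PyOpenMLError."""


REGISTRY: list[type] = []


def register(cls: type) -> type:
    # Returning the class lets the function double as a class decorator.
    REGISTRY.append(cls)
    return cls


def resolve(estimator: Any) -> Any:
    matches = [c for c in REGISTRY if c.can_handle_model(estimator)]
    if len(matches) == 1:
        return matches[0]()
    if len(matches) > 1:
        raise SketchError("Multiple serializers support this estimator.")
    raise SketchError("No serializer supports this estimator.")


@register
class IntSerializer:
    @classmethod
    def can_handle_model(cls, model: Any) -> bool:
        return isinstance(model, int)


@register
class NumberSerializer:
    @classmethod
    def can_handle_model(cls, model: Any) -> bool:
        return isinstance(model, (int, float))


serializer = resolve(1.5)  # only NumberSerializer matches a float
```

Here `resolve(1)` raises because both classes claim integers, and `resolve("x")` raises because neither does. Overlapping `can_handle_model` checks therefore surface as errors at resolution time rather than being settled by registration order.
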