Skip to content

Thread-safe mode with dj.Instance#1404

Open
dimitri-yatsenko wants to merge 31 commits intomasterfrom
feature/thread-safe-mode
Open

Thread-safe mode with dj.Instance#1404
dimitri-yatsenko wants to merge 31 commits intomasterfrom
feature/thread-safe-mode

Conversation

@dimitri-yatsenko
Copy link
Member

Summary

  • Add dj.Instance(host, user, password) for isolated config + connection pairs
  • Add thread-safe mode (DJ_THREAD_SAFE=true) that disables global singletons and forces explicit dj.Instance() usage
  • Thread config through codec and hash_registry chains so all runtime config reads go through the connection, not global state
  • Refactor dj.Schema from a function wrapper into a proper class subclassing _Schema, so isinstance, hasattr, etc. work naturally

Usage

# Isolated instance (new)
inst = dj.Instance(host="localhost", user="root", password="secret")
schema = inst.Schema("my_schema")

# Global pattern (unchanged)
schema = dj.Schema("my_schema")
# Force instance usage in concurrent apps
export DJ_THREAD_SAFE=true

Use cases

  • Web servers: one Instance per request/tenant
  • Testing: isolated databases per test
  • Multi-database: connect to production and staging simultaneously
  • Parallel pipelines: independent connections per worker

Design spec

See docs/design/thread-safe-mode.md for the full specification including global state audit.

Test plan

  • Unit tests for Instance, _ConfigProxy, thread-safe guards (tests/unit/test_thread_safe.py)
  • Integration tests pass (semantic matching, jobs, GC, schema)
  • Lint passes

🤖 Generated with Claude Code

dimitri-yatsenko and others added 30 commits February 13, 2026 15:58
Simple spec for blocking global state access in multi-tenant environments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove dead code:
- filepath_checksum_size_limit (never used)
- enable_python_native_blobs (never used)
- cache (only query_cache is used)
- init_function/init_command (database init command)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- All settings can be passed to Connection.from_config()
- Only thread_safe is read-only after initialization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Parameters and defaults
- Connection-scoped settings via conn.config
- Never accesses global dj.config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Connection.from_config() creates a Config instance for conn.config
- Database connection settings (host, port, user, password, use_tls, backend)
  become read-only after connection is established
- Other settings remain mutable per-connection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- thread_safe=True: global dj.config becomes read-only
- conn.config copies from global config, always mutable
- Simpler: global config still readable for defaults

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Simpler API:
- Use existing Connection() constructor
- conn.config copies from global dj.config
- conn.config is always mutable for per-connection settings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All runtime operations must use self.connection.config instead of
global config:
- table.py: safemode for delete/drop
- schemas.py: safemode, create_tables
- preview.py: display settings
- diagram.py: diagram_direction
- jobs.py: all jobs settings
- autopopulate.py: jobs settings
- declare.py: add_job_metadata
- connection.py: reconnect, query_cache
- hash_registry.py, codecs: stores, download_path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Explains how connections propagate:
- Connection → Schema → Table classes → Table instances
- Schema falls back to conn() if no connection provided
- Tables inherit connection from schema via _connection class attribute
- In thread_safe mode, Schema("name") fails, Schema("name", connection=conn) works

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- dj.conn().config IS dj.config (same object)
- dj.Connection(...).config is COPY of dj.config (independent)
- All internal code uses self.connection.config uniformly

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Connection.__init__ always creates self.config = copy(dj.config)
- dj.conn() overrides after creation: conn.config = dj.config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Mixed scenarios: dj.config affects global connection schemas only
- Explicit connection schemas have independent config
- dj.config.override() affects only schemas using dj.conn()
- conn.config.override() affects only that connection's schemas
- In thread_safe=True: dj.config.override() raises ThreadSafetyError

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New approach using dj.new() for isolated contexts:
- Each context has one config and one connection
- ctx.Schema() auto-uses context's connection
- ctx.Manual, ctx.Lookup, etc. for table base classes
- dj module acts as singleton context (legacy API)
- thread_safe=True blocks singleton, allows dj.new()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- ctx exposes only: config, connection, Schema()
- Connection created at context construction via dj.new()
- Tables still use dj.Manual, dj.Lookup as base classes
- thread_safe=True: dj.config only allows thread_safe access

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- dj.config, dj.conn(), dj.Schema() delegate to singleton instance
- Singleton lazily loaded on first access
- thread_safe checked at module import, blocks singleton access
- inst.config created fresh (not copied from dj.config)
- dj.instance() always works, creates isolated instance

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- dj.Manual, dj.Lookup etc. used with @Schema decorator (schema links connection)
- inst.Schema(), inst.FreeTable() need connection directly
- FreeTable added to Instance class

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- dj.Instance() (uppercase) for consistency with dj.Schema()
- Single _singleton_instance created lazily
- dj.config -> _singleton.config (via proxy)
- dj.conn() -> _singleton.connection
- dj.Schema() -> _singleton.Schema()
- dj.FreeTable() -> _singleton.FreeTable()
- All trigger same singleton creation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Instance class that encapsulates config + connection
- Add ThreadSafetyError exception for global state access
- Add _ConfigProxy to delegate dj.config to global config
- Add _get_singleton_connection for lazy connection creation
- Update dj.conn(), dj.Schema(), dj.FreeTable() to use singleton
- Connection now stores _config reference for instance isolation
- Add DJ_THREAD_SAFE environment variable support
- Add comprehensive tests for thread-safe mode

When DJ_THREAD_SAFE=true:
- dj.config raises ThreadSafetyError
- dj.conn() raises ThreadSafetyError
- dj.Schema() raises ThreadSafetyError (without explicit connection)
- dj.FreeTable() raises ThreadSafetyError (without explicit connection)
- dj.Instance() always works for isolated contexts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- conn(host, user, password) now updates the singleton connection
  instead of creating a separate connection
- Remove irrelevant safemode=False from spec examples
- thread_safe is set via DJ_THREAD_SAFE env var or config file

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused `from typing import Callable` in connection.py (lint failure)
- Update mock_cache fixture: `cache` → `download_path` (KeyError in test_attach)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused `_singleton_connection` import in __init__.py (F401)
- Remove unused `os` import in test_thread_safe.py (F401)
- Remove unused `Callable` import in connection.py (F401)
- Fix mock_cache fixture: `cache` → `download_path` for 2.0 settings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The global codec registry is effectively immutable after import:
registration runs under Python's import lock, and the only runtime
mutation (_load_entry_points) is idempotent under the GIL.
Per-instance isolation is unnecessary since codecs are part of the
type system, not connection-scoped state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Catalog all 8 module-level mutable globals with thread-safety
classification: guarded (config, connection), safe by design
(codec registry), or low risk (logging, blob flags, import caches).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All internal code now reads configuration from self.connection._config
instead of the global config singleton. This ensures thread-safe mode
works correctly: each Instance's connection carries its own config,
and tables/schemas/jobs access it through the connection.

Changes across 9 files:
- schemas.py: safemode, create_tables default
- table.py: safemode in delete/drop, config passed to declare()
- expression.py: loglevel in __repr__
- preview.py: display.* settings via query_expression.connection._config
- autopopulate.py: jobs.allow_new_pk_fields, jobs.auto_refresh
- jobs.py: jobs.default_priority, stale_timeout, keep_completed
- declare.py: jobs.add_job_metadata (config param threaded through)
- diagram.py: display.diagram_direction (connection stored on instance)
- staged_insert.py: get_store_spec()

Removed unused `from .settings import config` imports from 7 modules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- instance._global_config now reuses settings.config instead of creating
  a duplicate Config object. This ensures dj.config["safemode"] = False
  actually affects self.connection._config["safemode"] reads.
- schemas.py now uses _get_singleton_connection() from instance.py
  instead of the old conn() from connection.py, eliminating the
  duplicate singleton connection holder.
- dj.conn() now only creates a new connection when the singleton doesn't
  exist or reset=True (not on every call with credentials).
- test_uppercase_schema: use prompt=False for drop() calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ode spec

Adds Architecture section covering the object graph, config flow for both
singleton and Instance paths, and a table of all connection-scoped config
reads across 9 modules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eliminate the last global config reads in the runtime path by threading
connection-scoped config through the codec encode/decode and
hash_registry layers.

Encode path: table.py adds _config to the context dict, codecs extract
it from key and pass to hash_registry/storage helpers.

Decode path: expression.py passes connection to decode_attribute(),
which builds a decode key with _config for codec.decode() calls.

GC path: scan()/collect() extract config from schemas[0].connection and
pass to list_stored_hashes/delete_path/delete_schema_path.

All functions accept config=None with lazy fallback to settings.config
for backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename schemas.Schema → schemas._Schema (internal) and define
dj.Schema as a class that inherits from _Schema with a thread-safety
check in __init__. This eliminates the confusing function-vs-class
duality where dj.Schema was a function wrapper and schemas.Schema
was the class.

Now dj.Schema is a real class: isinstance, hasattr, and subclass
checks all work naturally. The test for rebuild_lineage can use
dj.Schema directly instead of importing from datajoint.schemas.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep master's atomic SQL UPDATE for job reservation (prevents race
conditions) while threading connection-scoped config through
_get_job_version().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Connection.__init__ now accepts `backend` and `config_override` kwargs
instead of reading from the module-level global config. This ensures
Instance creates connections using its own config, not the global one.

Also removes set_thread_safe() — thread-safe mode is an infrastructure
decision set via DJ_THREAD_SAFE env var, not a runtime toggle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments