Conversation
kbattocchi
left a comment
This is a great start, but please address my comments, as well as the linting and testing failures in the automated checks.
econml/score/drscorer.py
Outdated
```
p (model_propensity) = Pr[T | X, W]

Ydr(g,p) = g + (Y - g) / p * T
```
This doesn't look right to me - this should use the equation from the "Doubly Robust" section of https://github.com/py-why/EconML/blob/main/doc/spec/estimation/dr.rst
Mmm, if you mean the right equation is the one right below "The Doubly Robust approach":

Y_{i, t}^{DR} = g_t(X_i, W_i) + \frac{Y_i - g_t(X_i, W_i)}{p_t(X_i, W_i)} \cdot 1\{T_i = t\}

In pseudocode that's:

```
Y_DR[i,t] <- g_t(X[i], W[i]) + (Y[i] - g_t(X[i], W[i])) / p_t(X[i], W[i]) * (T[i] == t)
```
Should what I put here be a short form combining lines 16 and 18 (where I put the input and weights)?
Yeah, it's a bit tricky to write out. The key is that you're not multiplying by T at the end; you're multiplying by the indicator function selecting the specific case of T, and likewise dividing by the probability of that specific treatment, which is a bit awkward to express in pseudo-notation.
I think something like

```
Ydr(g,p) = g(X,W,T) + (Y - g(X,W,T)) / p_T(X,W)
```

would make it more obvious that only one term is being included (it's not really being multiplied by T in a meaningful way), and it expresses the more-than-binary treatment outcome correctly. This does require writing out the arguments to g and p, but I think that's okay, since otherwise it's hard to be precise about what's being computed.
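To make the indicator selection concrete, here is a minimal NumPy sketch of the formula being discussed. All arrays are made-up toy data; `g` and `p` stand in for the per-treatment-level outcome-model predictions and propensities, not for any actual EconML API:

```python
import numpy as np

# Toy data: n samples, 3 discrete treatment levels (hypothetical values).
rng = np.random.default_rng(0)
n, n_t = 5, 3
Y = rng.normal(size=n)                   # observed outcomes
T = rng.integers(0, n_t, size=n)         # observed treatment level per sample
g = rng.normal(size=(n, n_t))            # g[i, t] ~ g_t(X_i, W_i)
p = rng.dirichlet(np.ones(n_t), size=n)  # p[i, t] ~ p_t(X_i, W_i), rows sum to 1

# Doubly-robust outcome for every treatment level t:
#   Ydr[i, t] = g[i, t] + (Y[i] - g[i, t]) / p[i, t] * 1{T[i] == t}
# The indicator (not T itself) selects which column gets the correction.
indicator = (T[:, None] == np.arange(n_t)[None, :])
Ydr = g + (Y[:, None] - g) / p * indicator

# Columns not matching the observed treatment are just the regression
# prediction g[i, t]; only the observed column is IPW-corrected.
assert np.allclose(Ydr[~indicator], g[~indicator])
```

This is only a sketch of the math under the stated assumptions, not the DRScorer implementation itself.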
OK, updated accordingly.
econml/score/drscorer.py
Outdated
```python
g, p = self.drlearner_._cached_values.nuisances
Y = self.drlearner_._cached_values.Y
T = self.drlearner_._cached_values.T
Ydr = g + (Y - g) / p * T
```
This code should basically reflect the code in
EconML/econml/dr/_drlearner.py
Lines 131 to 149 in 8b7fe33
where we take Y_pred from the nuisances and form the doubly-robust estimate by subtracting Y_pred[:, 0] from the other Y_pred[:, t] values. Since in the model-fitting code the propensities nuisance is used only for adjusting sample_var, which we don't support here, I think you can ignore all of that code. So this should look something more like:
```diff
-g, p = self.drlearner_._cached_values.nuisances
-Y = self.drlearner_._cached_values.Y
-T = self.drlearner_._cached_values.T
-Ydr = g + (Y - g) / p * T
+Y_pred, _ = self.drlearner_._cached_values.nuisances
+Y_dr = Y_pred[..., 1:] - Y_pred[..., [0]]
```
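For intuition about the suggested indexing, here is a small standalone sketch of the array manipulation. `Y_pred` below is a made-up (n_samples, n_treatments) array of doubly-robust outcome estimates, not pulled from an actual fitted DRLearner:

```python
import numpy as np

# Hypothetical DR outcome estimates for 4 samples and 3 treatment levels,
# where column 0 is the control/baseline treatment.
Y_pred = np.array([[1.0, 2.5, 0.5],
                   [0.0, 1.0, 2.0],
                   [3.0, 3.0, 4.0],
                   [2.0, 0.0, 1.0]])

# Subtract the control column from each non-control column. The [0] fancy
# index keeps a trailing axis of size 1 so broadcasting lines up:
# Y_pred[..., [0]] has shape (4, 1), Y_pred[..., 1:] has shape (4, 2).
Y_dr = Y_pred[..., 1:] - Y_pred[..., [0]]

# Y_dr[i, k] is then the DR estimate of the effect of treatment k+1
# versus control for sample i.
```

Note that `Y_pred[..., 0]` (without the list index) would drop the axis and still broadcast here, but keeping it with `[0]` generalizes to outcomes with extra trailing dimensions.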
Good suggestion, thanks!
Signed-off-by: kgao <kevin.leo.gao@gmail.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com> Signed-off-by: kgao <kevin.leo.gao@gmail.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Create initial implementation of DRScorer for the DR-learner, based on the DR loss.