Conversation
kbattocchi
left a comment
This is a great start, but please address my comments, as well as the linting and testing failures in the automated checks.
econml/score/drscorer.py
Outdated
```
p (model_propensity) = Pr[T | X, W]

Ydr(g,p) = g + (Y - g) / p * T
```
This doesn't look right to me - this should use the equation from the "Doubly Robust" section of https://github.com/py-why/EconML/blob/main/doc/spec/estimation/dr.rst
Mmm, if you mean the right equation is the one right below "The Doubly Robust approach":

Y_{i, t}^{DR} = g_t(X_i, W_i) + \frac{Y_i - g_t(X_i, W_i)}{p_t(X_i, W_i)} \cdot 1\{T_i = t\}

In pseudocode that's:

```
Y_DR[i,t] <- g_t(X[i], W[i]) + (Y[i] - g_t(X[i], W[i])) / p_t(X[i], W[i]) * (T[i] == t)
```
Should what I put here be a short form combining lines 16 and 18 (where I put the input and weights)?
Yeah, it's a bit tricky to write out. The key is that you're not multiplying by T at the end; you're multiplying by the indicator function selecting the specific case of T, and likewise dividing by the probability of that specific treatment, which is a bit awkward to express in pseudo-notation.
I think something like

```
Ydr(g,p) = g(X,W,T) + (Y - g(X,W,T)) / p_T(X,W)
```

would make it more obvious that only one term is being included (it's not really being multiplied by T in a meaningful way), and it expresses the more-than-binary treatment outcome correctly. This does require writing out the arguments to g and p, but I think that's okay, since otherwise it's hard to be precise about what's being computed.
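To make the indicator selection concrete, here is a minimal NumPy sketch of the formula being discussed. All arrays are made-up toy data; `g` and `p` stand in for the per-treatment-level outcome-model predictions and propensities, not for any actual EconML API:

```python
import numpy as np

# Toy data: n samples, 3 discrete treatment levels (hypothetical values).
rng = np.random.default_rng(0)
n, n_t = 5, 3
Y = rng.normal(size=n)                   # observed outcomes
T = rng.integers(0, n_t, size=n)         # observed treatment level per sample
g = rng.normal(size=(n, n_t))            # g[i, t] ~ g_t(X_i, W_i)
p = rng.dirichlet(np.ones(n_t), size=n)  # p[i, t] ~ p_t(X_i, W_i), rows sum to 1

# Doubly-robust outcome for every treatment level t:
#   Ydr[i, t] = g[i, t] + (Y[i] - g[i, t]) / p[i, t] * 1{T[i] == t}
# The indicator (not T itself) selects which column gets the correction.
indicator = (T[:, None] == np.arange(n_t)[None, :])
Ydr = g + (Y[:, None] - g) / p * indicator

# Columns not matching the observed treatment are just the regression
# prediction g[i, t]; only the observed column is IPW-corrected.
assert np.allclose(Ydr[~indicator], g[~indicator])
```

This is only a sketch of the math under the stated assumptions, not the DRScorer implementation itself.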
OK, updated accordingly.
econml/score/drscorer.py
Outdated
```python
g, p = self.drlearner_._cached_values.nuisances
Y = self.drlearner_._cached_values.Y
T = self.drlearner_._cached_values.T
Ydr = g + (Y - g) / p * T
```
This code should basically reflect the code in
EconML/econml/dr/_drlearner.py
Lines 131 to 149 in 8b7fe33
where we take Y_pred from the nuisances and form the doubly-robust estimate by subtracting Y_pred[:, 0] from the other Y_pred[:, t] values. Since in the model-fitting code the propensities nuisance is used only for adjusting sample_var, which we don't support here, I think you can ignore all of that code. So this should look something more like:
```diff
-g, p = self.drlearner_._cached_values.nuisances
-Y = self.drlearner_._cached_values.Y
-T = self.drlearner_._cached_values.T
-Ydr = g + (Y - g) / p * T
+Y_pred, _ = self.drlearner_._cached_values.nuisances
+Y_dr = Y_pred[..., 1:] - Y_pred[..., [0]]
```
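For intuition about the suggested indexing, here is a small standalone sketch of the array manipulation. `Y_pred` below is a made-up (n_samples, n_treatments) array of doubly-robust outcome estimates, not pulled from an actual fitted DRLearner:

```python
import numpy as np

# Hypothetical DR outcome estimates for 4 samples and 3 treatment levels,
# where column 0 is the control/baseline treatment.
Y_pred = np.array([[1.0, 2.5, 0.5],
                   [0.0, 1.0, 2.0],
                   [3.0, 3.0, 4.0],
                   [2.0, 0.0, 1.0]])

# Subtract the control column from each non-control column. The [0] fancy
# index keeps a trailing axis of size 1 so broadcasting lines up:
# Y_pred[..., [0]] has shape (4, 1), Y_pred[..., 1:] has shape (4, 2).
Y_dr = Y_pred[..., 1:] - Y_pred[..., [0]]

# Y_dr[i, k] is then the DR estimate of the effect of treatment k+1
# versus control for sample i.
```

Note that `Y_pred[..., 0]` (without the list index) would drop the axis and still broadcast here, but keeping it with `[0]` generalizes to outcomes with extra trailing dimensions.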
Good suggestion, thanks!
Signed-off-by: kgao <kevin.leo.gao@gmail.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com> Signed-off-by: kgao <kevin.leo.gao@gmail.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Create initial implementation of DRScorer for the DR-learner, based on the DR loss.