Abstract
Interest in data-driven decision-making has stimulated methodological developments in estimating heterogeneous treatment effects. In practice, accurately estimating a conditional average treatment effect (CATE) requires a large sample, which is often obtained through data integration that leverages information from multiple data sites. This paper addresses two challenges involved in such a task: treatment effect heterogeneity and privacy protection. The first pertains to differences in the CATE coefficients across sites due to heterogeneity in treatment effects; the second pertains to barriers to sharing sensitive data across sites. We propose a distributed fusion learning approach, DF $R$-learner, to jointly estimate CATE across sites without pooling individual-participant data. It allows the CATE functions to differ across sites and uses a data-driven fusion penalty to combine similar parameters across sites, thereby improving estimation. The estimator uses confidence distributions to facilitate efficient and private information exchange, and we show theoretically and empirically that it incurs no loss of efficiency compared to its counterpart based on centralized data. We examine DF $R$-learner through a study of medication treatment for opioid use disorder using distributed Medicaid data from multiple managed care organizations within the state of Pennsylvania.