TY - JOUR
T1 - Data Valuation for Vertical Federated Learning
T2 - A Model-Free and Privacy-Preserving Method
AU - Han, Xiao
AU - Wang, Leye
AU - Wu, Junjie
AU - Fang, Xiao
N1 - Publisher Copyright: © 2026 University of Minnesota. All rights reserved.
PY - 2026/3/1
Y1 - 2026/3/1
AB - Vertical federated learning (VFL) is a promising paradigm for predictive analytics, empowering an organization (i.e., task party) to enhance its predictive models through collaborations with multiple data suppliers (i.e., data parties) in a decentralized and privacy-preserving way. Despite the fast-growing interest in VFL, the lack of effective and secure tools for assessing the value of data owned by data parties hinders the application of VFL in business contexts. In response, we propose FedValue, a privacy-preserving, task-specific but model-free data valuation method for VFL, which consists of a data valuation metric and a federated computation method. Specifically, we first introduce a novel data valuation metric, namely MShapley-CMI. The metric evaluates a data party’s contribution to a predictive analytics task without the need to execute a machine learning model, making it well-suited for real-world applications of VFL. Next, we develop an innovative federated computation method that calculates the MShapley-CMI value for each data party in a privacy-preserving manner. Extensive experiments conducted on synthetic and realistic datasets validate the efficacy of FedValue for data valuation in the context of VFL. In addition, we illustrate the practical utility of FedValue with case studies involving federated recommendations and financial default prediction.
KW - Data valuation
KW - federated recommendation
KW - predictive analytics
KW - privacy
KW - vertical federated learning
UR - https://www.scopus.com/pages/publications/105032174910
U2 - 10.25300/MISQ/2025/19161
DO - 10.25300/MISQ/2025/19161
M3 - Article
AN - SCOPUS:105032174910
SN - 0276-7783
VL - 50
SP - 177
EP - 210
JO - MIS Quarterly: Management Information Systems
JF - MIS Quarterly: Management Information Systems
IS - 1
ER -