Fed4Fed: A Privacy-Preserving Federated Statistical Approach for Evaluating Federated Learning Models

  • Zhongchi Wang
  • Hailong Sun*
  • Zhengyang Zhao
  • Li Duan
  • Wei Ni
  • Xiang Gao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

With the widespread application of federated learning in healthcare, ensuring the performance fairness of disease diagnosis models across different medical institutions (clients) has attracted increasing attention. However, accurately evaluating whether models achieve this goal is equally critical and faces significant challenges. On one hand, each client can evaluate the global model only on its own limited private data, which easily leads to biased performance estimates. On the other hand, data privacy constraints prevent clients from observing the model's performance at other institutions, making it difficult to determine whether the global model truly achieves cross-client performance fairness. To address this, this paper proposes Fed4Fed, a federated evaluation framework that combines private data from multiple clients to more accurately evaluate the global model's performance in actual deployment while preserving data privacy, and rigorously infers model performance fairness via statistical hypothesis testing. Specifically, Fed4Fed first draws on federated learning principles to collaboratively utilize multi-party private data while protecting data privacy. Second, it introduces Bootstrap resampling and statistical inference strategies to construct and analyze the statistical distribution of model performance, reducing the randomness of performance evaluation. Third, building on statistical homogeneity testing theory, it designs two fairness testing methods that provide theoretical guarantees for evaluating performance fairness.
Finally, experiments on synthetic datasets and four multi-modal real-world datasets (CIFAR-10, MNIST, Fashion-MNIST, and SST) demonstrate that Fed4Fed effectively overcomes the limitations of existing evaluation methods, achieving a fairness misjudgment rate below 5%, an average confidence interval coverage rate of 94.28% for performance, and robust behavior across different degrees of non-independent and identically distributed (non-IID) data.
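To illustrate the Bootstrap idea the abstract describes, the sketch below resamples per-client correctness indicators to build a distribution of accuracy and of the max-min accuracy gap across clients. This is a minimal, stdlib-only illustration of the general technique, not the authors' Fed4Fed implementation; all function names and the gap-based fairness statistic are illustrative assumptions.

```python
import random
import statistics

def bootstrap_accuracy(correct_flags, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap a confidence interval for one client's local accuracy.

    correct_flags: list of 0/1 per-sample correctness indicators.
    Returns (mean of bootstrap accuracies, (CI lower, CI upper)).
    """
    rng = random.Random(seed)
    n = len(correct_flags)
    # Resample with replacement and recompute accuracy each time.
    accs = sorted(sum(rng.choices(correct_flags, k=n)) / n
                  for _ in range(n_boot))
    lo = accs[int(alpha / 2 * n_boot)]
    hi = accs[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.mean(accs), (lo, hi)

def fairness_gap(client_flags, n_boot=1000, seed=0):
    """Bootstrap the max-min accuracy gap across clients.

    A gap interval that excludes a small tolerance threshold
    suggests the global model is unfair across clients
    (an illustrative statistic, not Fed4Fed's actual test).
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_boot):
        accs = [sum(rng.choices(f, k=len(f))) / len(f)
                for f in client_flags]
        gaps.append(max(accs) - min(accs))
    gaps.sort()
    return gaps[n_boot // 2], (gaps[int(0.025 * n_boot)],
                               gaps[int(0.975 * n_boot) - 1])
```

In a real federated setting, each client would compute such statistics locally and share only aggregated summaries (never raw data), which is the privacy-preserving collaboration the abstract refers to.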

Original language: English
Journal: IEEE Transactions on Dependable and Secure Computing
State: Accepted/In press - 2026

Keywords

  • evaluation framework
  • Federated Learning
  • performance fairness
  • privacy-protected
