
Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data

  • Rui She
  • Shanyun Liu
  • Pingyi Fan*
  • *Corresponding author for this work
  • Tsinghua University

Research output: Contribution to journal › Article › peer-review

Abstract

Message identification (M-I) divergence is an important measure of the information distance between probability distributions, similar to Kullback-Leibler (K-L) divergence and Rényi divergence. Its variable parameter affects how the distinction between two distributions is characterized. By choosing this parameter appropriately, it is possible to amplify the information distance between adjacent distributions while maintaining a sufficient gap between nonadjacent ones, so M-I divergence can play a vital role in distinguishing distributions more clearly. In this paper, we first define a parametric M-I divergence from an information-theoretic viewpoint and then present its major properties. In addition, we design an M-I divergence estimation algorithm based on an ensemble estimator of the proposed weight kernel estimators, which improves the convergence rate of the mean squared error from O(Γ^(-j/d)) to O(Γ^(-1)), where j ∈ (0, d]. We also discuss decision-making with M-I divergence for clustering and classification, and investigate its performance on the outlier detection problem in a statistical sequence model of big data.

Original language: English
Article number: 8090523
Pages (from-to): 24105-24119
Number of pages: 15
Journal: IEEE Access
Volume: 5
State: Published - 30 Oct 2017
Externally published: Yes

Keywords

  • Message identification (M-I) divergence
  • big data analysis
  • discrete distribution estimation
  • divergence estimation
  • outlier detection
