TY - GEN
T1 - Large Deviations for Statistical Sequence Matching
AU - Zhou, Lin
AU - Wang, Qianyun
AU - Wang, Jingjing
AU - Bai, Lin
AU - Hero, Alfred
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - We revisit the problem of statistical sequence matching between two databases of sequences initiated by Unnikrishnan (TIT 2015) and derive achievable theoretical performance guarantees for a generalized likelihood ratio test (GLRT) in the large deviations regime, when the number of matched pairs of sequences between the two databases is unknown. In this case, the task is to accurately estimate the number of matched pairs and identify the matched pairs of sequences among all possible matches between the sequences in the two databases. We generalize the GLRT by Unnikrishnan and explicitly characterize the tradeoff among the exponential decay rates for the probabilities of mismatch, false reject and false alarm. When one of the two databases contains a single sequence, the problem of statistical sequence matching specializes to the problem of multiple classification introduced by Gutman (TIT 1989). For this special case, our result strengthens the previous results of Gutman (TIT 1989) and Zhou, Tan and Motani (Information and Inference 2020) by allowing the testing sequence to be generated from a distribution that is different from the generating distributions of all training sequences.
AB - We revisit the problem of statistical sequence matching between two databases of sequences initiated by Unnikrishnan (TIT 2015) and derive achievable theoretical performance guarantees for a generalized likelihood ratio test (GLRT) in the large deviations regime, when the number of matched pairs of sequences between the two databases is unknown. In this case, the task is to accurately estimate the number of matched pairs and identify the matched pairs of sequences among all possible matches between the sequences in the two databases. We generalize the GLRT by Unnikrishnan and explicitly characterize the tradeoff among the exponential decay rates for the probabilities of mismatch, false reject and false alarm. When one of the two databases contains a single sequence, the problem of statistical sequence matching specializes to the problem of multiple classification introduced by Gutman (TIT 1989). For this special case, our result strengthens the previous results of Gutman (TIT 1989) and Zhou, Tan and Motani (Information and Inference 2020) by allowing the testing sequence to be generated from a distribution that is different from the generating distributions of all training sequences.
KW - False alarm
KW - False reject
KW - Finite length analysis
KW - Misclassification
KW - Second-order asymptotics
UR - https://www.scopus.com/pages/publications/85202865490
U2 - 10.1109/ISIT57864.2024.10619312
DO - 10.1109/ISIT57864.2024.10619312
M3 - Conference contribution
AN - SCOPUS:85202865490
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 1275
EP - 1280
BT - 2024 IEEE International Symposium on Information Theory, ISIT 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Symposium on Information Theory, ISIT 2024
Y2 - 7 July 2024 through 12 July 2024
ER -