Abstract
Detecting conspicuous image content is a challenging task in the field of computer vision. Most existing approaches estimate saliency using only cues from the input image. However, such “intrinsic” cues are often insufficient to distinguish targets from distractors that share common visual attributes. To address this problem, we present an approach that estimates image saliency by measuring the joint visual surprise from intrinsic and extrinsic contexts. In this approach, a hierarchical context model is first built on a database of 31.2 million images, where a Gaussian mixture model (GMM) is trained for each leaf node to encode the prior knowledge on “what is where” in a specific scene. For a test image that shares a similar spatial layout with a scene, the pre-trained GMM can serve as an extrinsic context model to measure the “surprise” of an image patch. Since human attention may quickly shift between different surprising locations, we adopt a Markov chain to model a surprise-driven attention-shifting process so as to infer the salient patches that can best capture human attention. Experiments show that our approach outperforms 19 state-of-the-art methods in fixation prediction.
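The general idea of scoring patches by surprise under a GMM and then ranking them with a Markov chain can be illustrated with a minimal sketch. This is not the authors' implementation: the diagonal-covariance GMM, the function names `gmm_log_likelihood`, `surprise`, and `stationary_saliency`, the use of negative log-likelihood as the surprise measure, and the Gaussian distance weighting with parameter `sigma` are all assumptions made for illustration.

```python
import numpy as np


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of each row of x under a diagonal-covariance GMM.

    x: (n, d) patch features; weights: (k,); means, variances: (k, d).
    """
    diff = x[:, None, :] - means[None, :, :]                      # (n, k, d)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (k,)
    log_comp = log_norm[None, :] - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    a = log_comp + np.log(weights)[None, :]                       # (n, k)
    # Log-sum-exp over components, computed stably.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()


def surprise(x, weights, means, variances):
    """Assumed surprise measure: negative log-likelihood under the GMM."""
    return -gmm_log_likelihood(x, weights, means, variances)


def stationary_saliency(scores, positions, sigma=1.0, n_iter=500):
    """Stationary distribution of a chain that prefers nearby, surprising patches.

    Transition probability from patch i to patch j is taken to be proportional
    to (shifted) surprise of j times a Gaussian falloff in spatial distance.
    """
    s = scores - scores.min() + 1e-6          # shift so all weights are positive
    d2 = np.sum((positions[:, None, :] - positions[None, :, :]) ** 2, axis=2)
    P = s[None, :] * np.exp(-d2 / (2.0 * sigma ** 2))
    P /= P.sum(axis=1, keepdims=True)         # make each row a distribution
    pi = np.full(len(s), 1.0 / len(s))
    for _ in range(n_iter):                   # power iteration
        pi = pi @ P
    return pi / pi.sum()


if __name__ == "__main__":
    # Toy data: three patches close to the GMM mean and one outlier.
    w = np.array([1.0])
    mu = np.zeros((1, 2))
    var = np.ones((1, 2))
    feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, -0.1], [3.0, 3.0]])
    pos = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
    s = surprise(feats, w, mu, var)
    print(stationary_saliency(s, pos, sigma=2.0))
```

In this toy setup the outlier patch receives both the largest surprise score and the largest stationary probability, which is the behaviour one would expect from a surprise-driven attention-shifting chain.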
| Original language | English |
|---|---|
| Pages (from-to) | 44-60 |
| Number of pages | 17 |
| Journal | International Journal of Computer Vision |
| Volume | 120 |
| Issue number | 1 |
| State | Published - 1 Oct 2016 |
Keywords
- Extrinsic context
- Gaussian mixture model
- Image saliency
- Intrinsic context
- Markov chain
- Visual surprise
Fingerprint
Dive into the research topics of 'Measuring Visual Surprise Jointly from Intrinsic and Extrinsic Contexts for Image Saliency Estimation'. Together they form a unique fingerprint.