Improving pseudo-relevance feedback in web information retrieval using web page segmentation

S Yu, D Cai, JR Wen, WY Ma - … of the 12th international conference on …, 2003 - dl.acm.org
Proceedings of the 12th International Conference on World Wide Web, 2003
In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search, because it often contains multiple topics and a lot of irrelevant information from the navigation, decoration, and interaction parts of the page. In this paper, we propose a VIsion-based Page Segmentation (VIPS) algorithm to detect the semantic content structure of a web page. Compared with simple DOM-based segmentation methods, our page segmentation scheme uses visual cues to obtain a better partition of a page at the semantic level. By using our VIPS algorithm to assist the selection of query expansion terms in pseudo-relevance feedback in web information retrieval, we achieve a 27% performance improvement on the Web Track dataset.
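The idea of drawing expansion terms from page segments rather than whole pages can be illustrated with a minimal sketch. It is not the VIPS algorithm or the paper's ranking method: it assumes the top-ranked pages have already been split into text segments by some segmenter, and it uses a toy scoring rule (raw query-term counts). The names `tokenize`, `score_segment`, `expansion_terms`, and `STOPWORDS`, as well as the sample pages, are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "and", "in", "into", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_segment(query_terms, segment):
    """Toy relevance score: number of query-term occurrences in the segment."""
    counts = Counter(tokenize(segment))
    return sum(counts[term] for term in query_terms)


def expansion_terms(query, pages, top_segments=2, num_terms=3):
    """Pick expansion terms from the best-scoring segments of the given pages.

    `pages` is a list of pages, each page being a list of segment strings
    produced by some page segmenter (assumed to exist outside this sketch).
    """
    query_terms = set(tokenize(query))
    segments = [segment for page in pages for segment in page]
    ranked = sorted(
        segments,
        key=lambda seg: score_segment(query_terms, seg),
        reverse=True,
    )
    # Keep only top segments that mention the query at all.
    selected = [
        seg for seg in ranked[:top_segments]
        if score_segment(query_terms, seg) > 0
    ]
    term_counts = Counter()
    for seg in selected:
        term_counts.update(
            tok for tok in tokenize(seg)
            if tok not in query_terms and tok not in STOPWORDS
        )
    return [term for term, _ in term_counts.most_common(num_terms)]


# Hypothetical pre-segmented pages: navigation blocks plus content blocks.
pages = [
    ["Home About Contact Login",
     "Page segmentation splits a web page into blocks. Segmentation uses visual cues."],
    ["Buy cheap tickets now",
     "Visual cues help page segmentation find blocks."],
]
terms = expansion_terms("page segmentation", pages)
print(terms)
```

Because only the segments that match the query contribute candidate terms, words from navigation or advertising blocks such as "login" or "tickets" never enter the expanded query in this sketch, which is the intuition the abstract describes.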