Crawling the client-side hidden web

Álvarez, Manuel; Pan, Alberto; Raposo, Juan; Viña, Ángel

Álvarez, Manuel ¹
Pan, Alberto ¹
Raposo, Juan ¹
Viña, Ángel ¹

1 Universidade da Coruña

La Coruña, Spain

ROR https://ror.org/01qckj285

Book:

Proceedings of the IADIS International Conference WWW/INTERNET 2004: Madrid, Spain, October 6-9, 2004

Isaías, Pedro (coord.)
Karmakar, Nitya (coord.)

Publisher: IADIS (International Association for Development of the Information Society)

ISBN: 972-99353-0-0

Year of publication: 2004

Volume title: Short Papers-Posters

Volume: 2

Pages: 1179-1182

Conference: International Conference on WWW/Internet (3. 2004. Madrid)

Type: Conference contribution

Abstract

There is a great amount of information on the web that cannot be accessed by conventional crawler engines. This portion of the web is usually called the hidden web. Dealing with it requires solving two tasks: crawling the client-side hidden web and crawling the server-side hidden web. In this paper we present an architecture and a set of related techniques for accessing the information placed in the client-side hidden web, dealing with aspects such as JavaScript technology, non-standard session maintenance mechanisms, client redirections, and pop-up menus. Our approach leverages current browser APIs and implements novel crawling models and algorithms.

Data source: Dialnet
