Diffbot,
2011.
WWW Version.
Vendor, Document Analysis. Structural analysis of web pages to observe when changes occur.
Ashraf, F.,
Ozyer, T.,
Alhajj, R.,
Employing Clustering Techniques for Automatic Information Extraction
From HTML Documents,
SMC-C(38), No. 5, September 2008, pp. 660-673.
IEEE DOI Link
0810
BibRef
Carullo, M.[Moreno],
Binaghi, E.[Elisabetta],
Gallo, I.[Ignazio],
An online document clustering technique for short web contents,
PRL(30), No. 10, 15 July 2009, pp. 870-876.
Elsevier DOI Link
WWW Version.
0906
Online clustering; Short documents analysis; Similarity measures
BibRef
Carullo, M.[Moreno],
Binaghi, E.[Elisabetta],
Gallo, I.[Ignazio],
Lamberti, N.[Nicola],
Clustering of short commercial documents for the web,
ICPR08(1-4).
IEEE DOI Link
0812
BibRef
Borges, K.A.V.[Karla A.V.],
Davis, C.A.[Clodoveu A.],
Laender, A.H.F.[Alberto H.F.],
Medeiros, C.B.[Claudia Bauzer],
Ontology-driven discovery of geospatial evidence in web pages,
GeoInfo(15), No. 4, October 2011, pp. 609-631.
WWW Version.
1110
BibRef
Lu, W.[Wenting],
Li, L.[Lei],
Li, T.[Tao],
Zhang, H.G.[Hong-Gang],
Guo, J.[Jun],
Web Multimedia Object Clustering via Information Fusion,
ICDAR11(319-323).
IEEE DOI Link
1111
BibRef
Karatzas, D.,
Mestre, S.R.[S. Robles],
Mas, J.,
Nourbakhsh, F.,
Roy, P.P.[P. Pratim],
ICDAR 2011 Robust Reading Competition - Challenge 1: Reading Text in
Born-Digital Images (Web and Email),
ICDAR11(1485-1490).
IEEE DOI Link
1111
BibRef
Liu, G.[Gang],
Qiu, B.[Bite],
Wenyin, L.[Liu],
Automatic Detection of Phishing Target from Phishing Webpage,
ICPR10(4153-4156).
IEEE DOI Link
1008
BibRef
Hassan, T.[Tamir],
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques,
ICDAR09(631-635).
IEEE DOI Link
0907
PDF does not have the structure given by HTML.
BibRef
Ghosh, S.[Saptarshi],
Mitra, P.[Pabitra],
Combining content and structure similarity for XML document
classification using composite SVM kernels,
ICPR08(1-4).
IEEE DOI Link
0812
BibRef
Hirano, T.[Takashi],
Okano, Y.[Yuichi],
Okada, Y.[Yasuhiro],
Yoda, F.[Fumio],
Text and Layout Information Extraction from Document Files of Various
Formats Based on the Analysis of Page Description Language,
ICDAR07(262-266).
IEEE DOI Link
0709
BibRef
Burget, R.,
Layout Based Information Extraction from HTML Documents,
ICDAR07(624-628).
IEEE DOI Link
0709
BibRef
Guo, H.,
Mahmud, J.,
Borodin, Y.,
Stent, A.,
Ramakrishnan, I.,
A General Approach for Partitioning Web Page Content Based on Geometric
and Style Information,
ICDAR07(929-933).
IEEE DOI Link
0709
BibRef
Yoshida, M.,
Nakagawa, H.,
Web Document Parsing:
A New Approach to Modeling Layout-Language Relations,
ICDAR07(203-207).
IEEE DOI Link
0709
BibRef
Ferilli, S.[Stefano],
Biba, M.[Marenglen],
Basile, T.M.A.[Teresa M.A.],
Esposito, F.[Floriana],
Incremental machine learning techniques for document layout
understanding,
ICPR08(1-4).
IEEE DOI Link
0812
BibRef
Esposito, F.,
Ferilli, S.,
di Mauro, N.,
Basile, T.M.A.,
Incremental Learning of First Order Logic Theories for the Automatic
Annotations of Web Documents,
ICDAR07(1093-1097).
IEEE DOI Link
0709
BibRef
Earlier: A1, A2, A4, A3:
Automatic Content-based Indexing of Digital Documents through
Intelligent Processing Techniques,
DIAL06(204-219).
IEEE DOI Link
0604
BibRef
Earlier: A1, A2, A4, A3:
Intelligent document processing,
ICDAR05(II: 1100-1104).
IEEE DOI Link
0508
See also Automatic Digital Document Processing and Management: Problems, Algorithms and Techniques.
BibRef
Watai, Y.[Yasuyuki],
Yamasaki, T.[Toshihiko],
Aizawa, K.[Kiyoharu],
View-Based Web Page Retrieval using Interactive Sketch Query,
ICIP07(VI: 357-360).
IEEE DOI Link
0709
BibRef
Ma, J.C.[Jun-Chang],
Gu, Z.M.[Zhi-Min],
A Shared Fragments Analysis System for Large Collections of Web Pages,
DAS06(390-401).
Springer DOI Link
0602
BibRef
Liu, W.Y.[Wen-Yin],
Huang, G.[Guanglin],
Liu, X.Y.[Xiao-Yue],
Deng, X.[Xiaotie],
Min, Z.[Zhang],
Phishing Web page detection,
ICDAR05(II: 560-564).
IEEE DOI Link
0508
BibRef
Feng, J.,
Haffner, P.,
Gilbert, M.,
A learning approach to discovering Web page semantic structures,
ICDAR05(II: 1055-1059).
IEEE DOI Link
0508
BibRef
Chao, H.[Hui],
Lin, X.F.[Xiao Fan],
Capturing the layout of electronic documents for reuse in variable data
printing,
ICDAR05(II: 940-944).
IEEE DOI Link
0508
BibRef
Chao, H.[Hui],
Fan, J.[Jian],
Layout and Content Extraction for PDF Documents,
DAS04(213-224).
WWW Version.
0505
BibRef
Behera, A.,
Lalanne, D.,
Ingold, R.,
Enhancement of layout-based identification of low-resolution documents
using geometrical color distribution,
ICDAR05(I: 468-472).
IEEE DOI Link
0508
BibRef
Mekhaldi, D.[Dalila],
Lalanne, D.[Denis],
Ingold, R.[Rolf],
From searching to browsing through multimodal documents linking,
ICDAR05(II: 924-928).
IEEE DOI Link
0508
BibRef
Earlier:
Unity Is Strength: Coupling Media for Thematic Segmentation,
DAS04(559-562).
WWW Version.
0505
BibRef
Rigamonti, M.,
Bloechle, J.L.,
Hadjar, K.,
Lalanne, D.,
Ingold, R.,
Towards a canonical and structured representation of PDF documents
through reverse engineering,
ICDAR05(II: 1050-1054).
IEEE DOI Link
0508
BibRef
Hadjar, K.,
Rigamonti, M.,
Lalanne, D.,
Ingold, R.,
Xed: a new tool for extracting hidden structures from electronic
documents,
DIAL04(212-224).
IEEE DOI Link
0404
BibRef
Hadjar, K.,
Ingold, R.,
Logical labeling of Arabic newspapers using artificial neural nets,
ICDAR05(I: 426-430).
IEEE DOI Link
0508
BibRef
Schenker, A.[Adam],
Bunke, H.[Horst],
Last, M.[Mark],
Kandel, A.[Abraham],
A Graph-Based Framework for Web Document Mining,
DAS04(401-412).
WWW Version.
0505
BibRef
Schenker, A.[Adam],
Last, M.[Mark],
Bunke, H.[Horst],
Kandel, A.[Abraham],
Classification of web documents using a graph model,
ICDAR03(240-244).
IEEE Abstract.
0311
BibRef
Vitali, F.[Fabio],
di Iorio, A.[Angelo],
Campori, E.V.[Elisa Ventura],
Rule-Based Structural Analysis of Web Pages,
DAS04(425-437).
WWW Version.
0505
BibRef
Hu, J.Y.[Jian-Ying],
Bagga, A.,
Identifying story and preview images in news web pages,
ICDAR03(640-644).
IEEE Abstract.
0311
BibRef
Ramachandran, S.,
Kashi, R.,
An architecture for ink annotations on web documents,
ICDAR03(256-260).
IEEE Abstract.
0311
BibRef
Gagneux, A.,
Emptoz, H.,
Web site: a structured document,
ICDAR03(1158-1162).
IEEE Abstract.
0311
BibRef
Mukherjee, S.,
Yang, G.[Guizhen],
Tan, W.F.[Wen-Fang],
Ramakrishnan, I.V.,
Automatic discovery of semantic structures in HTML documents,
ICDAR03(245-249).
IEEE Abstract.
0311
BibRef
Alam, H.,
Kumar, A.,
Nakamura, M.,
Rahman, F.,
Tarnikova, Y.,
Wilcox, C.[Che],
Structured and unstructured document summarization: Design of a
commercial summarizer using Lexical chains,
ICDAR03(1147-1152).
IEEE Abstract.
0311
BibRef
Rahman, F.,
Alam, H.,
A commercial Web based digital library for sharing and distributing
documents,
DIAL04(93-103).
IEEE DOI Link
0404
BibRef
Alam, H.,
Hartono, R.,
Kumar, A.,
Rahman, F.,
Tarnikova, Y.,
Wilcox, C.[Che],
Web page summarization for handheld devices: a natural language
approach,
ICDAR03(1153-1158).
IEEE Abstract.
0311
BibRef
Rahman, A.F.R.,
Alam, H.,
Hartono, R.,
Ariyoshi, K.,
Automatic summarization of Web content to smaller display devices,
ICDAR01(1064-1068).
IEEE DOI Link
0109
BibRef
Serradura, L.,
Slimane, M.,
Vincent, N.,
Web sites thematic classification using hidden Markov models,
ICDAR01(1094-1098).
IEEE DOI Link
0109
BibRef
Penn, G.,
Hu, J.Y.[Jian-Ying],
Luo, H.B.[Heng-Bin],
McDonald, R.,
Flexible Web document analysis for delivery to narrow-bandwidth devices,
ICDAR01(1074-1078).
IEEE DOI Link
0109
BibRef
Anjewierden, A.,
AIDAS: incremental logical structure discovery in PDF documents,
ICDAR01(374-378).
IEEE DOI Link
0109
BibRef
Athitsos, V.,
Swain, M.J.,
Frankel, C.,
Distinguishing photographs and graphics on the World Wide Web,
CBAIVL97(10).
IEEE DOI Link
9706
BibRef
Chapter on OCR, Document Analysis and Character Recognition Systems continues in
Document Retrieval Systems, Databases and Issues, Libraries.