23.2.2.2.10 Document Layout, Structure Analysis, Web Documents, Online Documents

Document Analysis. Application, Document Layout.

Diffbot,
2011.
WWW Version. Vendor, Document Analysis. Structural analysis of web pages, including detecting when changes occur.

Ashraf, F., Ozyer, T., Alhajj, R.,
Employing Clustering Techniques for Automatic Information Extraction From HTML Documents,
SMC-C(38), No. 5, September 2008, pp. 660-673.
IEEE DOI Link 0810
BibRef

Carullo, M.[Moreno], Binaghi, E.[Elisabetta], Gallo, I.[Ignazio],
An online document clustering technique for short web contents,
PRL(30), No. 10, 15 July 2009, pp. 870-876.
Elsevier DOI Link 0906
Online clustering; Short documents analysis; Similarity measures BibRef

Carullo, M.[Moreno], Binaghi, E.[Elisabetta], Gallo, I.[Ignazio], Lamberti, N.[Nicola],
Clustering of short commercial documents for the web,
ICPR08(1-4).
IEEE DOI Link 0812
BibRef

Borges, K.A.V.[Karla A.V.], Davis, C.A.[Clodoveu A.], Laender, A.H.F.[Alberto H.F.], Medeiros, C.B.[Claudia Bauzer],
Ontology-driven discovery of geospatial evidence in web pages,
GeoInfo(15), No. 4, October 2011, pp. 609-631.
WWW Version. 1110
BibRef

Lu, W.T.[Wen-Ting], Li, J.X.[Jing-Xuan], Li, T.[Tao], Guo, W.D.[Wei-Dong], Zhang, H.G.[Hong-Gang], Guo, J.[Jun],
Web Multimedia Object Classification Using Cross-Domain Correlation Knowledge,
MultMed(15), No. 8, December 2013, pp. 1920-1929.
IEEE DOI Link 1402
Internet BibRef

Lu, W.T.[Wen-Ting], Li, L.[Lei], Li, T.[Tao], Zhang, H.G.[Hong-Gang], Guo, J.[Jun],
Web Multimedia Object Clustering via Information Fusion,
ICDAR11(319-323).
IEEE DOI Link 1111
BibRef


Goyal, A., Jadon, M.K., Pujari, A.K.,
Spectral approach to find number of clusters of short-text documents,
NCVPRIPG13(1-4).
IEEE DOI Link 1408
Markov processes BibRef

Marinai, S.[Simone], Marino, E.[Emanuele], Soda, G.[Giovanni],
Conversion of PDF Books in ePub Format,
ICDAR11(478-482).
IEEE DOI Link 1111
BibRef

Karatzas, D., Mestre, S.R.[S. Robles], Mas, J., Nourbakhsh, F., Roy, P.P.[P. Pratim],
ICDAR 2011 Robust Reading Competition - Challenge 1: Reading Text in Born-Digital Images (Web and Email),
ICDAR11(1485-1490).
IEEE DOI Link 1111
BibRef

Liu, G.[Gang], Qiu, B.[Bite], Wenyin, L.[Liu],
Automatic Detection of Phishing Target from Phishing Webpage,
ICPR10(4153-4156).
IEEE DOI Link 1008
BibRef

Hassan, T.[Tamir],
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques,
ICDAR09(631-635).
IEEE DOI Link 0907
PDF does not have the structure given by HTML. BibRef

Ghosh, S.[Saptarshi], Mitra, P.[Pabitra],
Combining content and structure similarity for XML document classification using composite SVM kernels,
ICPR08(1-4).
IEEE DOI Link 0812
BibRef

Hirano, T.[Takashi], Okano, Y.[Yuichi], Okada, Y.[Yasuhiro], Yoda, F.[Fumio],
Text and Layout Information Extraction from Document Files of Various Formats Based on the Analysis of Page Description Language,
ICDAR07(262-266).
IEEE DOI Link 0709
BibRef

Burget, R.,
Layout Based Information Extraction from HTML Documents,
ICDAR07(624-628).
IEEE DOI Link 0709
BibRef

Guo, H., Mahmud, J., Borodin, Y., Stent, A., Ramakrishnan, I.,
A General Approach for Partitioning Web Page Content Based on Geometric and Style Information,
ICDAR07(929-933).
IEEE DOI Link 0709
BibRef

Yoshida, M., Nakagawa, H.,
Web Document Parsing: A New Approach to Modeling Layout-Language Relations,
ICDAR07(203-207).
IEEE DOI Link 0709
BibRef

Ferilli, S.[Stefano], Biba, M.[Marenglen], Basile, T.M.A.[Teresa M.A.], Esposito, F.[Floriana],
Incremental machine learning techniques for document layout understanding,
ICPR08(1-4).
IEEE DOI Link 0812
BibRef

Esposito, F., Ferilli, S., di Mauro, N., Basile, T.M.A.,
Incremental Learning of First Order Logic Theories for the Automatic Annotations of Web Documents,
ICDAR07(1093-1097).
IEEE DOI Link 0709
BibRef
Earlier: A1, A2, A4, A3:
Automatic Content-based Indexing of Digital Documents through Intelligent Processing Techniques,
DIAL06(204-219).
IEEE DOI Link 0604
BibRef
Earlier: A1, A2, A4, A3:
Intelligent document processing,
ICDAR05(II: 1100-1104).
IEEE DOI Link 0508
See also Automatic Digital Document Processing and Management: Problems, Algorithms and Techniques. BibRef

Watai, Y.[Yasuyuki], Yamasaki, T.[Toshihiko], Aizawa, K.[Kiyoharu],
View-Based Web Page Retrieval using Interactive Sketch Query,
ICIP07(VI: 357-360).
IEEE DOI Link 0709
BibRef

Ma, J.C.[Jun-Chang], Gu, Z.M.[Zhi-Min],
A Shared Fragments Analysis System for Large Collections of Web Pages,
DAS06(390-401).
Springer DOI Link 0602
BibRef

Liu, W.Y.[Wen-Yin], Huang, G.[Guanglin], Liu, X.Y.[Xiao-Yue], Deng, X.[Xiaotie], Min, Z.[Zhang],
Phishing Web page detection,
ICDAR05(II: 560-564).
IEEE DOI Link 0508
BibRef

Feng, J., Haffner, P., Gilbert, M.,
A learning approach to discovering Web page semantic structures,
ICDAR05(II: 1055-1059).
IEEE DOI Link 0508
BibRef

Chao, H.[Hui], Lin, X.F.[Xiao Fan],
Capturing the layout of electronic documents for reuse in variable data printing,
ICDAR05(II: 940-944).
IEEE DOI Link 0508
BibRef

Chao, H.[Hui], Fan, J.[Jian],
Layout and Content Extraction for PDF Documents,
DAS04(213-224).
WWW Version. 0505
BibRef

Behera, A., Lalanne, D., Ingold, R.,
Enhancement of layout-based identification of low-resolution documents using geometrical color distribution,
ICDAR05(I: 468-472).
IEEE DOI Link 0508
BibRef

Mekhaldi, D.[Dalila], Lalanne, D.[Denis], Ingold, R.[Rolf],
From searching to browsing through multimodal documents linking,
ICDAR05(II: 924-928).
IEEE DOI Link 0508
BibRef
Earlier:
Unity Is Strength: Coupling Media for Thematic Segmentation,
DAS04(559-562).
WWW Version. 0505
BibRef

Rigamonti, M., Bloechle, J.L., Hadjar, K., Lalanne, D., Ingold, R.,
Towards a canonical and structured representation of PDF documents through reverse engineering,
ICDAR05(II: 1050-1054).
IEEE DOI Link 0508
BibRef

Hadjar, K., Rigamonti, M., Lalanne, D., Ingold, R.,
Xed: a new tool for extracting hidden structures from electronic documents,
DIAL04(212-224).
IEEE DOI Link 0404
BibRef

Hadjar, K., Ingold, R.,
Logical labeling of Arabic newspapers using artificial neural nets,
ICDAR05(I: 426-430).
IEEE DOI Link 0508
BibRef

Schenker, A.[Adam], Bunke, H.[Horst], Last, M.[Mark], Kandel, A.[Abraham],
A Graph-Based Framework for Web Document Mining,
DAS04(401-412).
WWW Version. 0505
BibRef

Schenker, A.[Adam], Last, M.[Mark], Bunke, H.[Horst], Kandel, A.[Abraham],
Classification of web documents using a graph model,
ICDAR03(240-244).
IEEE Abstract. 0311
BibRef

Vitali, F.[Fabio], di Iorio, A.[Angelo], Campori, E.V.[Elisa Ventura],
Rule-Based Structural Analysis of Web Pages,
DAS04(425-437).
WWW Version. 0505
BibRef

Hu, J.Y.[Jian-Ying], Bagga, A.,
Identifying story and preview images in news web pages,
ICDAR03(640-644).
IEEE Abstract. 0311
BibRef

Ramachandran, S., Kashi, R.,
An architecture for ink annotations on web documents,
ICDAR03(256-260).
IEEE Abstract. 0311
BibRef

Gagneux, A., Emptoz, H.,
Web site: a structured document,
ICDAR03(1158-1162).
IEEE Abstract. 0311
BibRef

Mukherjee, S., Yang, G.[Guizhen], Tan, W.F.[Wen-Fang], Ramakrishnan, I.V.,
Automatic discovery of semantic structures in HTML documents,
ICDAR03(245-249).
IEEE Abstract. 0311
BibRef

Alam, H., Kumar, A., Nakamura, M., Rahman, F., Tarnikova, Y., Wilcox, C.[Che],
Structured and unstructured document summarization: Design of a commercial summarizer using Lexical chains,
ICDAR03(1147-1152).
IEEE Abstract. 0311
BibRef

Rahman, F., Alam, H.,
A commercial Web based digital library for sharing and distributing documents,
DIAL04(93-103).
IEEE DOI Link 0404
BibRef

Alam, H., Hartono, R., Kumar, A., Rahman, F., Tarnikova, Y., Wilcox, C.[Che],
Web page summarization for handheld devices: a natural language approach,
ICDAR03(1153-1158).
IEEE Abstract. 0311
BibRef

Rahman, A.F.R., Alam, H., Hartono, R., Ariyoshi, K.,
Automatic summarization of Web content to smaller display devices,
ICDAR01(1064-1068).
IEEE DOI Link 0109
BibRef

Serradura, L., Slimane, M., Vincent, N.,
Web sites thematic classification using hidden Markov models,
ICDAR01(1094-1098).
IEEE DOI Link 0109
BibRef

Penn, G., Hu, J.Y.[Jian-Ying], Luo, H.B.[Heng-Bin], McDonald, R.,
Flexible Web document analysis for delivery to narrow-bandwidth devices,
ICDAR01(1074-1078).
IEEE DOI Link 0109
BibRef

Anjewierden, A.,
AIDAS: incremental logical structure discovery in PDF documents,
ICDAR01(374-378).
IEEE DOI Link 0109
BibRef

Athitsos, V., Swain, M.J., Frankel, C.,
Distinguishing photographs and graphics on the World Wide Web,
CBAIVL97(10).
IEEE DOI Link 9706
BibRef

Chapter on OCR, Document Analysis and Character Recognition Systems continues in
Document Retrieval Systems, Databases and Issues, Libraries.


Last update: Oct 15, 2014 at 21:10:33