The Indexer: The International Journal of Indexing

Example-based text categorization (EBTC): the key to automatic indexing and classification?

The Indexer: The International Journal of Indexing (2009), 27, (3), 117–123.

Abstract

The goal of text categorization is the automatic classification of documents into predefined categories. In this article, Xue and Hou discuss the traditional, probability-theory-based methods, which use algorithms such as K-nearest neighbor (KNN), naïve Bayes, and support vector machine (SVM), and then describe the alternative example-based text categorization (EBTC) method. They conclude that, although work must continue on improving both the automatic construction of the example base and the classification algorithm, EBTC has demonstrated its effectiveness for automatic indexing and classification and has clear advantages over other systems.
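
As a rough illustration of the example-based idea summarized above, the following Python sketch assigns categories to a new document by comparing it with an example base of already-classified texts and letting the most similar examples vote, which is essentially a K-nearest-neighbor scheme. It is not the authors' EBTC algorithm: the bag-of-words cosine similarity, the whitespace tokenizer, the function names `tokenize`, `cosine` and `categorize`, the parameter `k`, and the toy `EXAMPLE_BASE` are all hypothetical assumptions chosen for brevity.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # Chinese text would need proper word segmentation instead.
    return text.lower().split()

def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def categorize(text, example_base, k=3):
    # Score every (example_text, categories) pair against the new document.
    query = Counter(tokenize(text))
    scored = sorted(
        ((cosine(query, Counter(tokenize(ex))), cats) for ex, cats in example_base),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # The k most similar examples cast similarity-weighted votes.
    votes = Counter()
    for score, cats in scored[:k]:
        for cat in cats:
            votes[cat] += score
    # Return categories with a positive vote, highest first.
    return [cat for cat, s in votes.most_common() if s > 0]

# Hypothetical toy example base: (example text, assigned categories).
EXAMPLE_BASE = [
    ("soil fertility and crop yield", ["agriculture"]),
    ("rice crop irrigation methods", ["agriculture"]),
    ("library classification and subject indexing", ["information science"]),
    ("thesaurus construction for indexing languages", ["information science"]),
]

print(categorize("automatic subject indexing with a thesaurus", EXAMPLE_BASE, k=2))
# → ['information science']
```

A practical system would likely weight terms (for example with TF-IDF) rather than use raw counts, and the similarity-weighted vote shown here is only one of several possible ways to combine the categories of the nearest examples.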

References

Aas, K. and Eikvil, L. (1999) Text categorization: a survey. Available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.2236

Han, J. W. and Kamber, M. (2001) Data mining: concepts and techniques. Morgan Kaufmann.

Hou, H. Q. (1998) Construction of the indexing languages compatibility system on the basis of the Classified Chinese Thesaurus. Journal of the National Library of China, 4, 35–9, 90.

Hou, H. Q. and Xue, C. X. (2004) Construction of knowledge base for automatic indexing and classification based on the Chinese Library Classification. AOS Proceedings.

Hou, H. Q. and Xue, P. J. (2003) Design and construction of knowledge database for automatic classification in Chinese. Journal of the China Society for Scientific and Technical Information, 22(6), 681–6.

Moens, M. F. and Dumortier, J. (2000) Text categorization: the assignment of subject descriptors to magazine articles. Information Processing and Management, 36, 841–61.

Somers, H. (1999) Review article: example-based machine translation. Machine Translation, 14, 113–57. Available at: http://portal.acm.org/citation.cfm?id=593215

Sun, J. J. et al. (2004) Technologies of information retrieval, pp. 166–200. Beijing: Science Press.

Wang, H. F. (2003) Method and issues of example-based machine translation. Terminology Standardization and Information Technology, 2, 33–6.

Yang, Y. M. and Chute, C. G. (1994) An example-based mapping method of text categorization and retrieval. ACM Transactions on Information Systems, 12(3), 252–77.

Zelikovitz, S. (2002) Using background knowledge to improve text classification. Available at: http://www.cs.csi.cuny.edu/~zelikovi/thesis.pdf

Zhang, C. Z. (2002) Web concept mining based on text layer model: automatic indexing and automatic classifying based on concept semantic network. Master's dissertation supervised by Hou Hanqing, Nanjing Agricultural University.

Author details

Xue, Chunxiang

Hou, Hanqing