Mining world knowledge for analysis of search engine content

King, John D. and Li, Yuefeng and Tao, Xiaohui ORCID: https://orcid.org/0000-0002-0020-077X and Nayak, Richi (2007) Mining world knowledge for analysis of search engine content. Web Intelligence and Agent Systems: an International Journal, 5 (3). pp. 233-253. ISSN 1570-1263


Abstract

Little is known about the content of the major search engines. We present an automatic learning method that trains an ontology with world knowledge of hundreds of different subjects, arranged in a three-level taxonomy covering all the documents offered in our university library. We then mine this ontology for important classification rules and use these rules to perform an extensive analysis of the content of the largest general-purpose internet search engines in use today. Instead of representing documents and collections as sets of terms, we represent them as sets of subjects. This representation is highly efficient, more robust, and less affected by synonymy.
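
As a rough illustration of the subject-based representation described in the abstract, the following Python sketch maps a document's terms to subjects in a three-level taxonomy and counts the subjects rather than the terms. It is not the authors' method: the `TERM_TO_SUBJECT` table, the subject names, and the functions `subject_profile` and `top_level_profile` are all hypothetical, and the hand-written lookup table merely stands in for the classification rules that the paper mines from its ontology.

```python
from collections import Counter

# Hypothetical term -> subject rules (assumption, not taken from the paper).
# Each subject is a path in a three-level taxonomy: (level 1, level 2, level 3).
TERM_TO_SUBJECT = {
    "car": ("Technology", "Transport", "Automobiles"),
    "automobile": ("Technology", "Transport", "Automobiles"),
    "engine": ("Technology", "Transport", "Automobiles"),
    "ontology": ("Computing", "Artificial Intelligence", "Knowledge Representation"),
    "taxonomy": ("Computing", "Artificial Intelligence", "Knowledge Representation"),
}


def subject_profile(text):
    """Represent a document as a count of subjects instead of a count of terms."""
    terms = text.lower().split()
    # Terms without a rule are ignored in this toy version.
    return Counter(TERM_TO_SUBJECT[t] for t in terms if t in TERM_TO_SUBJECT)


def top_level_profile(profile):
    """Roll a subject profile up to the first level of the taxonomy."""
    rolled = Counter()
    for subject, count in profile.items():
        rolled[subject[0]] += count
    return rolled


if __name__ == "__main__":
    doc = "the car engine and the automobile taxonomy"
    profile = subject_profile(doc)
    for subject, count in profile.most_common():
        print(" > ".join(subject), count)
    print(dict(top_level_profile(profile)))
```

In this sketch, "car" and "automobile" fall under the same subject, so two documents that use different synonyms receive the same profile. That is one way to picture how a subject representation can reduce the effect of synonymy.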


Item Type: Article (Commonwealth Reporting Category C)
Refereed: Yes
Item Status: Live Archive
Additional Information: Permanent restricted access to paper due to publisher copyright policy.
Faculty/School / Institute/Centre: Historic - Faculty of Sciences - Department of Maths and Computing (Up to 30 Jun 2013)
Date Deposited: 02 Jan 2012 04:48
Last Modified: 01 Jun 2017 02:34
Uncontrolled Keywords: ontology; hierarchical classification; taxonomy; collection selection; search engines; data mining
Fields of Research (2008): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining
08 Information and Computing Sciences > 0807 Library and Information Studies > 080704 Information Retrieval and Web Search
08 Information and Computing Sciences > 0805 Distributed Computing > 080501 Distributed and Grid Systems
Fields of Research (2020): 46 INFORMATION AND COMPUTING SCIENCES > 4699 Other information and computing sciences > 469999 Other information and computing sciences not elsewhere classified
46 INFORMATION AND COMPUTING SCIENCES > 4605 Data management and data science > 460508 Information retrieval and web search
46 INFORMATION AND COMPUTING SCIENCES > 4606 Distributed computing and systems software > 460605 Distributed systems and algorithms
Socio-Economic Objectives (2008): E Expanding Knowledge > 97 Expanding Knowledge > 970108 Expanding Knowledge in the Information and Computing Sciences
URI: http://eprints.usq.edu.au/id/eprint/20109
