Zhang, Ji and Ling, Tok Wang and Bruckner, Robert and Tjoa, A. Min (2003) Building XML data warehouse based on frequent patterns in user queries. In: 5th International Conference on Data Warehousing and Knowledge Discovery (DaWaK'03), 3-5 Sept 2003, Prague, Czech Republic.
[Abstract]: With the proliferation of XML-based data sources available across the Internet, it is increasingly important to provide users with a data warehouse of XML data sources to facilitate decision-making processes. Due to the extremely large amount of XML data available on the web, unguided warehousing of XML data turns out to be highly costly and usually cannot adequately accommodate users’ needs for XML data acquisition. In this paper, we propose an approach to materialize XML data warehouses based on frequent query patterns discovered from historical queries issued by users. The schemas of integrated XML documents in the warehouse are built using these frequent query patterns, represented as Frequent Query Pattern Trees (FreqQPTs). By using a hierarchical clustering technique, the integration approach in the data warehouse is flexible with respect to obtaining and maintaining XML documents. Experiments show that the overall processing of the same queries issued against the global schema becomes much more efficient when using the resulting XML data warehouse than when directly searching the multiple data sources.
|Item Type:||Conference or Workshop Item (Commonwealth Reporting Category E) (Paper)|
|Additional Information:||Author's version deposited in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com.|
|Uncontrolled Keywords:||XML data warehouses; frequent query patterns|
|Fields of Research (FOR2008):||08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining|
|Subjects:||280000 Information, Computing and Communication Sciences|
|Socio-Economic Objective (SEO2008):||UNSPECIFIED|
|Deposited On:||28 Sep 2009 16:53|
|Last Modified:||15 Apr 2011 14:05|