Stanford InfoLab Publication Server

Dynamic Data Mining: Exploring Large Rule Spaces by Sampling.

Brin, Sergey and Page, Lawrence (1999) Dynamic Data Mining: Exploring Large Rule Spaces by Sampling. Technical Report. Stanford InfoLab.



Abstract

A great challenge for data mining techniques is the huge space of potential rules that can be generated. If there are tens of thousands of items, then potential rules involving three items number in the trillions. Traditional data mining techniques rely on downward-closed measures such as support to prune the space of rules. However, in many applications, such pruning techniques either do not sufficiently reduce the space of rules, or they are overly restrictive. We propose a new solution to this problem, called Dynamic Data Mining (DDM). DDM forgoes the completeness offered by traditional techniques based on downward-closed measures in favor of the ability to drill deep into the space of rules and provide the user with a better view of the structure present in a data set. Instead of a single deterministic run, DDM runs continuously, exploring more and more of the rule space. Instead of using a downward-closed measure such as support to guide its exploration, DDM uses a user-defined measure called weight, which is not restricted to be downward closed. The exploration is guided by a heuristic called the Heavy Edge Property. The system incorporates user feedback by allowing weight to be redefined dynamically. We test the system on a particularly difficult data set: the word usage in a large subset of the World Wide Web. We find that Dynamic Data Mining is an effective tool for mining such difficult data sets.
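
The two background ideas in the abstract, the size of the three-item rule space and what it means for support to be downward closed, can be illustrated with a minimal sketch. It is not part of the report and does not implement DDM, weight, or the Heavy Edge Property. The function names and the toy baskets are assumptions made up for illustration, and support is taken in its usual sense as the fraction of baskets that contain every item of an itemset.

```python
from itertools import combinations
from math import comb


def count_itemsets(num_items: int, size: int) -> int:
    """Number of distinct unordered itemsets of the given size."""
    return comb(num_items, size)


def support(itemset, baskets) -> float:
    """Fraction of baskets that contain every item in the itemset."""
    items = frozenset(itemset)
    return sum(items <= basket for basket in baskets) / len(baskets)


def is_downward_closed_on(itemset, baskets) -> bool:
    """Check that every nonempty proper subset has at least the itemset's support."""
    full = support(itemset, baskets)
    items = sorted(itemset)
    return all(
        support(sub, baskets) >= full
        for r in range(1, len(items))
        for sub in combinations(items, r)
    )


# Hypothetical toy data: each basket is the set of words in one document.
baskets = [
    frozenset(b)
    for b in (
        {"web", "search", "engine"},
        {"web", "search"},
        {"web", "page"},
        {"search", "engine"},
    )
]

# With 30,000 items, the three-item sets alone number in the trillions.
print(count_itemsets(30_000, 3))

# Support can only stay equal or shrink as items are added to an itemset,
# which is what lets traditional techniques prune supersets of rare itemsets.
print(support({"web", "search"}, baskets))
print(is_downward_closed_on({"web", "search", "engine"}, baskets))
```

The subset check is exhaustive and only practical for small itemsets; it is meant to show the property, not to serve as a mining procedure.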

Item Type: Techreport (Technical Report)
Additional Information: Previous number = SIDL-WP-1999-0122
Subjects: Computer Science > Digital Libraries
Projects: Digital Libraries
Related URLs: Project Homepage: http://www-diglib.stanford.edu/diglib/pub/
ID Code: 424
Deposited By: Import Account
Deposited On: 30 Oct 2001 16:00
Last Modified: 27 Dec 2008 16:16
