Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web.
HS3D (Homo Sapiens Splice Sites Dataset) is a database of Homo Sapiens Exon, Intron and Splice regions extracted from GenBank primate sequences Rel.123. The aim of this data set is to give standardized material to train and to assess the prediction ac...
Datgen, formerly SCDS, is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algorithms.
Repository of online information sources: test domains for information extraction and wrapper generation tools that learn extraction rules (extraction patterns).
Datasets used for the experimental analysis of function approximation techniques and for training and demonstration by machine learning and statistics community.