PhpDig is a search engine written in PHP with a MySQL database backend. It indexes both static and dynamic pages, follows almost all links found in HTML content (in hrefs, area maps, and frames), and performs full-text indexing. The appearance of the search results is skinnable, using a very simple t ...
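Full-text indexing generally rests on an inverted index that maps each term to the pages containing it. The Python sketch below illustrates that general idea only. It is not PhpDig's code or its MySQL schema. The class name `InvertedIndex`, the tokenizer, and the boolean-AND query semantics are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lower-case, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical in-memory full-text index: term -> set of page URLs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, url, text):
        # Record the page under every term that appears in its text.
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query):
        """Return the sorted URLs that contain every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids inserting empty postings for unknown terms.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)
```

A real engine would persist the postings (PhpDig uses MySQL for storage) and rank the results. The sketch only shows the lookup structure.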
ICA (independent component analysis) is used to classify text as an extension of the latent semantic indexing framework. ICA has been shown to align context groupings well with human judgment [1], so it can be used for unsupervised classification. The demonstration shows this on medical abstracts (the MED dataset) and uses BIC to estimat ...
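For reference, the Bayesian information criterion (BIC) in its standard form, for a model with $k$ free parameters fitted to $n$ observations with maximized likelihood $\hat{L}$, is shown below. Under this sign convention, lower values are preferred, so BIC trades goodness of fit against model size when comparing candidate models. This is the textbook definition; it is an assumption that the demonstration uses exactly this form.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}
```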
Rainbow is a C program that performs document classification using one of several different methods, including naive Bayes, TFIDF/Rocchio, K-nearest neighbor, Maximum Entropy, Support Vector Machines, Fuhr's Probabilistic Indexing, and a simple-minded form of shrinkage with naive Bayes.
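Of the methods listed, naive Bayes is the simplest to illustrate. The following Python sketch shows multinomial naive Bayes with add-one (Laplace) smoothing. It is not Rainbow's implementation or command-line interface. The class name `NaiveBayes` and the tokenizer are hypothetical, and words unseen in training are simply ignored at prediction time.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lower-case, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial naive Bayes classifier with Laplace smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = tokenize(doc)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # log P(c) + sum of log P(w | c) over the document's known words
            score = math.log(self.class_counts[c] / n_docs)
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # skip words never seen in training
                score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best
```

Working in log space avoids underflow when many small word probabilities are multiplied together.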
The goal of this library is to make ODBC recordsets look just like STL containers. As a user, you can move through our containers using standard STL iterators, and if you insert(), erase(), or replace() records in our containers, the changes can be automatically committed to the database for you. The lib ...
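The idea of exposing a database table through a standard container interface can be sketched in Python. This is only an analogy to the C++ library described above. It swaps ODBC for the standard-library `sqlite3` module and STL containers for `collections.abc.MutableMapping`. The class name `TableView` and its two-column key/value schema are hypothetical. Each write is committed immediately, which stands in for the automatic commit described above.

```python
import sqlite3
from collections.abc import MutableMapping


class TableView(MutableMapping):
    """Hypothetical dict-like view over a two-column SQLite table.

    Container operations (set, delete, iterate) map directly to SQL,
    so editing the container edits the database.
    """

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table  # trusted identifier, never user input
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (key TEXT PRIMARY KEY, value TEXT)"
        )
        conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            f"SELECT value FROM {self.table} WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        # INSERT OR REPLACE covers both inserting and replacing a record.
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} (key, value) VALUES (?, ?)",
            (key, value),
        )
        self.conn.commit()

    def __delitem__(self, key):
        cur = self.conn.execute(f"DELETE FROM {self.table} WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self.conn.commit()

    def __iter__(self):
        # Snapshot the keys first so deletions during iteration are safe.
        keys = [k for (k,) in self.conn.execute(f"SELECT key FROM {self.table} ORDER BY key")]
        yield from keys

    def __len__(self):
        return self.conn.execute(f"SELECT COUNT(*) FROM {self.table}").fetchone()[0]
```

Because `MutableMapping` derives `keys()`, `items()`, `in`, `pop()`, and `update()` from these five methods, ordinary Python code can work with the table without writing any SQL. The DTL library applies the same design choice through STL iterators.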
Abstract
The Lucene Server project is an attempt to extend the Jakarta Lucene tool with server capabilities.
Lucene is a robust Java API that enables you to create indexes from text sources and perform powerful searches on these indexes. With Lucene, creating an index must be done programmatically ...