Dynamic Data Mining on the Web
Student Contacts: Carlos Chen, Gord Cepuran, Helena Zheng
Supervisor Contacts: Medhat Moussa, Systems Design Engineering
Workshop Co-ordinator: David A. Swan, Systems Design Engineering
Information exists in abundance on the World Wide Web. There are millions
of web sites containing information on any subject imaginable. Although
there is an enormous amount of knowledge stored on the Web, finding the
specific information that a user requires can be time-consuming and difficult.
Is dynamic data mining of the Web a feasible solution? This question, and
related ones, will be addressed by this Systems Design fourth-year workshop project.
The 'intelligent' search engine to be prototyped and tested in this project
is intended to provide a less time-consuming method of searching the Web.
In concept it differs substantially from commercial search engines such as
AltaVista and Yahoo: it will be targeted towards a very small range of
subjects and users. In this case the subjects will be limited to about ten
that are of interest to the PAMI Laboratory at the University of Waterloo.
This search engine will consist of two main components. The first is
the dynamic data miner, which will operate in the background, building a
database. The data miner will take the ten subject keywords and run
searches on the Web using existing search engines such as Lycos and
AltaVista. Each document found will be parsed, and its URL will then be
added to the database. A number of keywords that occur frequently in the
documents will also be identified. The data miner is intended to run every
day during periods of low CPU usage on its host machine.
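The parse-and-store step described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the stop-word list, the table layout, the function names, the example URL, and the decision to store the frequent keywords alongside each URL are all hypothetical, and fetching pages from a search engine is left out.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical stop-word list; a real system would need a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def frequent_keywords(text, top_n=5):
    """Return up to top_n of the most frequent non-stop-words in text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def store_document(conn, subject, url, text):
    """Record a parsed document's URL, subject, and frequent keywords."""
    # Assumed schema: one row per URL, keywords kept as a comma-separated string.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(url TEXT PRIMARY KEY, subject TEXT, keywords TEXT)"
    )
    keywords = frequent_keywords(text)
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?)",
        (url, subject, ",".join(keywords)),
    )
    conn.commit()
    return keywords


# Usage with an in-memory database and a made-up page.
conn = sqlite3.connect(":memory:")
page = (
    "Neural networks learn patterns. Neural networks need training data. "
    "Training takes time."
)
kw = store_document(conn, "neural networks", "http://example.org/nn", page)
row_count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

Keying the table on the URL means a page found again on a later daily run replaces its earlier row instead of being duplicated.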
The second major component of this search engine is a neural network
which will filter the list of URLs returned to a user who executes a search.
This neural network will be trained by the users who access the search
engine. It will attempt to build relationships between subjects and
keywords in order to make conceptual searches more accurate. The main
advantages this system has over other search engines are that, once the
neural network has been successfully trained, the list of hits returned to
the user will be shorter and more selective, and that new material will be
dynamically added by the data mining component.
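The idea of a filter trained by user feedback can be illustrated with a deliberately simplified stand-in: a single logistic unit over keyword-presence features, updated each time a user marks a hit as relevant or not. The source does not specify the network's architecture, so the class name, vocabulary, learning rate, threshold, and example URLs below are all assumptions made for illustration.

```python
import math


class RelevanceFilter:
    """Single logistic unit scoring a document by which keywords it contains.

    A simplified stand-in for the neural network described in the text.
    """

    def __init__(self, vocabulary, learning_rate=0.5):
        self.vocabulary = list(vocabulary)
        self.weights = [0.0] * len(self.vocabulary)
        self.bias = 0.0
        self.learning_rate = learning_rate

    def _features(self, keywords):
        # 1.0 where a vocabulary word appears among the document's keywords.
        present = set(keywords)
        return [1.0 if word in present else 0.0 for word in self.vocabulary]

    def score(self, keywords):
        """Estimated relevance in (0, 1)."""
        x = self._features(keywords)
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, keywords, relevant):
        """One gradient step from a user's relevant / not-relevant judgement."""
        x = self._features(keywords)
        error = (1.0 if relevant else 0.0) - self.score(keywords)
        self.weights = [
            w + self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias += self.learning_rate * error

    def filter(self, hits, threshold=0.5):
        """Keep only the URLs whose keywords score above the threshold."""
        return [url for url, kws in hits if self.score(kws) > threshold]


# Usage: simulate repeated user feedback, then filter a list of hits.
flt = RelevanceFilter(["neural", "training", "recipe", "cooking"])
for _ in range(50):
    flt.train(["neural", "training"], relevant=True)
    flt.train(["recipe", "cooking"], relevant=False)

hits = [
    ("http://example.org/nn", ["neural", "training"]),
    ("http://example.org/food", ["recipe", "cooking"]),
]
kept = flt.filter(hits)
```

A single unit like this cannot capture interactions between keywords; a network with hidden layers would be needed for that, at the cost of requiring more feedback to train.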
The objective of this workshop is to develop a prototype search engine
and test it thoroughly to determine its feasibility in terms of resources
consumed and accuracy of results. The project will also be used to investigate
the difficulties involved in training a neural network for conceptual searches
and the level of accuracy that can be achieved.