Web Log Mining!


				
				
				
				   
				
				
				
				Dynamic Data Mining on the Web
				
				
				
				
				
				
				
				
				
				Student Contacts: Carlos Chen, Gord Cepuran, Helena Zheng
				
				Supervisor Contacts: Medhat Moussa, Systems Design Engineering
				
				Workshop Co-ordinator: David A. Swan, Systems Design Engineering
				
				
				
				
				
				
				
				
				
				
				
				Information exists in abundance on the World Wide Web. There are millions
				of web sites containing information on almost any subject imaginable. Although
				an enormous amount of knowledge is stored on the Web, finding the
				specific information that a user requires can be time-consuming and difficult.
				Is dynamic data mining of the Web a feasible solution? This Systems Design
				fourth-year workshop project sets out to answer that question, among others.
				
				
				The 'intelligent' search engine that will be prototyped and tested in
				this project is intended to provide a less time-consuming method of searching
				the Web. In concept it differs substantially from commercial search
				engines such as AltaVista and Yahoo: it will be targeted towards a very
				small range of subjects and users. For this project, the subjects will be
				limited to about ten topics of interest to the PAMI
				Laboratory at the University of Waterloo.
				
				This search engine will consist of two main components. The first is
				the dynamic data miner, which will operate in the background, building a
				database. It will take the ten subject keywords and run searches on the
				Web using existing search engines such as Lycos and AltaVista. Each
				document found will be parsed, and its URL will then be added to the
				database. Keywords that occur frequently in the documents will also be
				identified. The data miner is intended to run every day during periods
				of low CPU usage on its host machine.
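
The parse-and-store step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the names `TextExtractor`, `top_keywords`, `create_db` and `index_document`, the table layout, the tiny stop-word list, and the decision to store the frequent keywords next to each URL are all hypothetical. Fetching pages from a search engine is left out, so the sketch works on HTML that has already been downloaded.

```python
import re
import sqlite3
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list; a real miner would need a much fuller one.
STOP_WORDS = {"the", "and", "for", "are", "with", "that", "this", "from"}


class TextExtractor(HTMLParser):
    """Collects the text between tags (script/style text is not excluded)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def top_keywords(html, n=5):
    """Return the n most frequent words (3+ letters) in an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]{3,}", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def create_db():
    """Create an in-memory database; the schema is an assumption."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pages (url TEXT PRIMARY KEY, subject TEXT, keywords TEXT)"
    )
    return db


def index_document(db, subject, url, html, n=5):
    """Parse one document and record its URL, subject and frequent keywords."""
    keywords = top_keywords(html, n)
    db.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
        (url, subject, " ".join(keywords)),
    )
    db.commit()
    return keywords
```

A daily run would call `index_document` once per URL returned for each of the subject keywords; how the miner detects periods of low CPU usage is not specified in the description and is not modelled here.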
				
				The second major component of this search engine is a neural network
				which will filter the list of URLs returned to a user who executes a search.
				This neural network will be trained by the users who access the search
				engine. It will attempt to build relationships between subjects and
				keywords in order to make conceptual searches more accurate. The main
				advantages this system has over other search engines are that, once the
				neural network has been trained successfully, the list of hits returned
				to the user will be shorter and more selective, and that new material
				will be dynamically added by the data mining component.
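
To make the idea of a user-trained filter concrete, here is a minimal sketch that stands in for the neural network with a single logistic neuron over keyword-presence features. The description does not specify the network's architecture, inputs or training rule, so the class `RelevanceFilter`, its methods, the learning rate and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math


class RelevanceFilter:
    """One logistic neuron over bag-of-keywords features (illustrative only)."""

    def __init__(self, vocabulary, learning_rate=0.5):
        self.index = {word: i for i, word in enumerate(vocabulary)}
        self.weights = [0.0] * len(self.index)
        self.bias = 0.0
        self.learning_rate = learning_rate

    def _features(self, keywords):
        # 1.0 where a known keyword is present, 0.0 elsewhere.
        x = [0.0] * len(self.weights)
        for word in keywords:
            i = self.index.get(word)
            if i is not None:
                x[i] = 1.0
        return x

    def score(self, keywords):
        """Estimated probability that a page with these keywords is relevant."""
        x = self._features(keywords)
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def feedback(self, keywords, relevant):
        """One gradient step from a user's relevant / not relevant judgement."""
        x = self._features(keywords)
        error = (1.0 if relevant else 0.0) - self.score(keywords)
        self.weights = [
            w + self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias += self.learning_rate * error

    def filter(self, hits, threshold=0.5):
        """Keep only the URLs whose keywords score at or above the threshold."""
        return [url for url, kws in hits if self.score(kws) >= threshold]
```

Each time a user marks a returned hit as relevant or not, `feedback` nudges the weights of the keywords on that page, so later calls to `filter` return a shorter list. A real network for conceptual search would likely need hidden layers and far richer features than keyword presence.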
				
				The objective of this workshop is to develop a prototype search engine
				and test it thoroughly to determine its feasibility in terms of resources
				consumed and accuracy of results. This project will also be used to investigate
				the difficulties involved in training a neural network for conceptual searches
				and the level of accuracy that can be achieved.
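
The description does not say how accuracy of results will be measured. One common choice for search results, shown here purely as an assumption, is precision and recall against a set of URLs that users have judged relevant; the function name `precision_recall` is hypothetical.

```python
def precision_recall(returned, relevant):
    """Precision and recall of a returned hit list against judged-relevant URLs."""
    returned_set, relevant_set = set(returned), set(relevant)
    true_positives = len(returned_set & relevant_set)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(returned_set) if returned_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

A shorter, more selective hit list should raise precision; recall shows whether relevant pages were lost in the filtering.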
				
				
				
				
				
				
				
				
				
							
