
A very handy toolkit

View source online: ngram-class.html

Size:	3034 K
Uploaded by:	wanghaihah
Keywords:	toolkit



ngram-class

NAME
    ngram-class - induce word classes from N-gram statistics

SYNOPSIS
    ngram-class [-help] option ...

DESCRIPTION
    ngram-class induces word classes from distributional statistics,
    so as to minimize perplexity of a class-based N-gram model
    given the provided word N-gram counts.
    Presently, only bigram statistics are used, i.e., the induced classes
    are best suited for a class-bigram language model.

    The program generates the class N-gram counts and class expansions
    needed by ngram-count(1) and ngram(1), respectively, to train and to
    apply the class N-gram model.

OPTIONS
    Each filename argument can be an ASCII file, or a compressed file
    (name ending in .Z or .gz), or ``-'' to indicate stdin/stdout.

    -help
        Print option summary.

    -version
        Print version information.

    -debug level
        Set debugging output at level.
        Level 0 means no debugging.
        Debugging messages are written to stderr.
        A useful level to trace the formation of classes is 2.

  Input Options

    -vocab file
        Read a vocabulary from file.
        Subsequently, out-of-vocabulary words in both counts or text are
        replaced with the unknown-word token.
        If this option is not specified all words found are implicitly
        added to the vocabulary.

    -tolower
        Map the vocabulary to lowercase.

    -counts file
        Read N-gram counts from a file.
        Each line contains an N-gram of words, followed by an integer
        count, all separated by whitespace.
        Repeated counts for the same N-gram are added.
        Counts collected by -text and -counts are additive as well.

        Note that the input should contain consistent lower- and
        higher-order counts (i.e., unigrams and bigrams), as would be
        generated by ngram-count(1).

    -text textfile
        Generate N-gram counts from text file.
        textfile should contain one sentence unit per line.
        Begin/end sentence tokens are added if not already present.
        Empty lines are ignored.

  Class Merging

    -numclasses C
        The target number of classes to induce.
        A zero argument suppresses automatic class merging altogether
        (e.g., for use with -interact).

    -full
        Perform full greedy merging over all classes starting with one
        class per word.
        This is the O(V^3) algorithm described in Brown et al. (1992).

    -incremental
        Perform incremental greedy merging, starting with one class each
        for the C most frequent words, and then adding one word at a time.
        This is the O(V*C^2) algorithm described in Brown et al. (1992);
        it is the default.

    -interact
        Enter a primitive interactive interface when done with automatic
        class induction, allowing manual specification of additional
        merging steps.

    -noclass-vocab file
        Read a list of vocabulary items from file that are to be excluded
        from classes.
        These words or tags do not undergo class merging, but their
        N-gram counts still affect the optimization of model perplexity.

        The default is to exclude the sentence begin/end tags
        (<s> and </s>) from class merging; this can be suppressed by
        specifying -noclass-vocab /dev/null.

  Output Options

    -class-counts file
        Write class N-gram counts to file when done.
        The format is the same as for word N-gram counts, and can be
        read by ngram-count(1) to estimate a class-N-gram model.

    -classes file
        Write class definitions (member words and their probabilities)
        to file when done.
        The output format is the same as required by the -classes option
        of ngram(1).

    -save S
        Save the class counts and/or class definitions every S iterations
        during induction.
        The filenames are obtained from the -class-counts and -classes
        options, respectively, by appending the iteration number.
        This is convenient for producing sets of classes at different
        granularities during the same run.
        S=0 (the default) suppresses the saving actions.

SEE ALSO
    ngram-count(1), ngram(1).

    P. F. Brown, V. J. Della Pietra, P. V. deSouza, J. C. Lai and
    R. L. Mercer, ``Class-Based n-gram Models of Natural Language,''
    Computational Linguistics 18(4), 467-479, 1992.

BUGS
    Classes are optimized only for bigram models at present.

AUTHOR
    Andreas Stolcke <stolcke@speech.sri.com>.

    Copyright 1999-2004 SRI International
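
EXAMPLES
    The input conventions described under -counts and -text can be
    illustrated with a minimal Python sketch. It is not part of SRILM and
    does not reproduce its implementation; the function names
    counts_from_text and counts_from_lines are hypothetical, and the
    sketch assumes whitespace tokenization, the <s> and </s> sentence
    tags, and counting up to bigrams. It shows only the documented
    behavior: empty lines are skipped, sentence tags are added when
    missing, repeated counts for the same N-gram are summed, and counts
    from both sources are additive.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence begin/end tags named in the man page


def counts_from_text(lines, max_order=2):
    """Count N-grams up to max_order from one-sentence-per-line text."""
    counts = Counter()
    for line in lines:
        words = line.split()
        if not words:              # empty lines are ignored
            continue
        if words[0] != BOS:        # add tags only if not already present
            words.insert(0, BOS)
        if words[-1] != EOS:
            words.append(EOS)
        for n in range(1, max_order + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def counts_from_lines(lines):
    """Parse lines of the form 'w1 ... wN count' (whitespace-separated)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:        # need at least one word and a count
            continue
        *ngram, count = fields
        counts[tuple(ngram)] += int(count)  # repeated N-grams are summed
    return counts


# Counts from both sources are additive (hypothetical toy input).
text_counts = counts_from_text(["a b", "", "<s> a b </s>"])
file_counts = counts_from_lines(["a b 3", "a b 4", "a 7"])
total = text_counts + file_counts
```

    Keeping both sources in the same tuple-keyed Counter makes the
    additive behavior a plain sum, and also keeps unigram and bigram
    counts together, which is what the note about consistent lower- and
    higher-order counts asks for.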

