This is a very useful toolkit


ppl-scripts

NAME
    ppl-scripts, add-ppls, compare-ppls, compute-best-mix,
    compute-best-sentence-mix, hits-from-log, ppl-from-log,
    subtract-ppls - manipulate perplexities

SYNOPSIS
    add-ppls [ppl-file...]
    subtract-ppls ppl-file1 [ppl-file2...]
    ppl-from-log [ppl-file...]
    hits-from-log [ppl-file...]
    compare-ppls [mindelta=D] ppl-file1 ppl-file2
    compute-best-mix [lambda='l1 l2 ...' precision=P] ppl-file1 [ppl-file2...]
    compute-best-sentence-mix [lambda='l1 l2 ...' precision=P] ppl-file1 [ppl-file2...]

DESCRIPTION
    These scripts process the output of the ngram(1) option -ppl to
    extract various useful information. They are particularly convenient
    for analyzing the performance (perplexity) of language models on
    specific subsets of the test data, or for comparing and combining
    multiple models.

    add-ppls takes several ppl output files and computes an aggregate
    perplexity and corpus statistics. Its output is suitable for
    subsequent manipulation by add-ppls or subtract-ppls.

    subtract-ppls similarly computes an aggregate perplexity by removing
    the statistics of zero or more ppl-file2 from those in ppl-file1.
    Its output is suitable for subsequent manipulation by add-ppls or
    subtract-ppls.

    ppl-from-log recomputes the total perplexities and statistics from
    individual lines in ngram -debug 2 -ppl output. Combined with some
    filtering of that output, this allows computing perplexities on
    interesting subsets of words.

    hits-from-log computes N-gram hit rates from ngram -debug 2 -ppl
    output.

    compare-ppls tallies the number of words for which two language
    models produce the same, higher, or lower probabilities. The input
    files should be ngram -debug 2 -ppl output for the two models on the
    same test set. The parameter D is the minimum absolute difference
    for two log probabilities to be considered different (the default
    is 0).

    compute-best-mix takes the output of several ngram -debug 2 -ppl
    runs on the same test set and computes the optimal interpolation
    weights for the corresponding models, i.e., the weights that
    minimize the perplexity of an interpolated model. Initial weights
    may be specified as l1 l2 .... The computation is iterative and
    stops when the interpolation weights change by less than P
    (default 0.001).

    compute-best-sentence-mix similarly optimizes the weights for
    sentence-level interpolation of LMs. It requires input files
    generated by ngram -debug 1 -ppl. (Sentence-level mixtures can be
    implemented using the ngram -hmm option, by constructing a suitable
    HMM structure.)

SEE ALSO
    ngram(1).

BUGS
    All scripts depend on the idiosyncrasies of ngram -ppl output.

AUTHOR
    Andreas Stolcke <stolcke@speech.sri.com>.

    Copyright 1995-2002 SRI International
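
The iterative weight estimation described for compute-best-mix can be illustrated with a short Python sketch. It assumes an expectation-maximization (EM) update, which is a common way to fit interpolation weights; the text above does not say which update the script uses. The function names best_mix and mixture_ppl and the max_iter parameter are hypothetical, and the input is a plain list of per-word probabilities rather than parsed ngram output.

```python
import math


def best_mix(probs, lambdas=None, precision=0.001, max_iter=1000):
    """Estimate interpolation weights by EM (an assumed method).

    probs[m][i] is the probability that model m assigns to word i of a
    shared test set. Returns one weight per model, summing to 1.
    """
    n_models = len(probs)
    n_words = len(probs[0])
    if lambdas is None:
        # Start from uniform weights when none are given.
        lambdas = [1.0 / n_models] * n_models

    for _ in range(max_iter):
        post_totals = [0.0] * n_models
        for i in range(n_words):
            weighted = [lambdas[m] * probs[m][i] for m in range(n_models)]
            total = sum(weighted)
            if total == 0.0:
                continue  # no model gives this word any probability
            # E-step: posterior probability that each model produced word i.
            for m in range(n_models):
                post_totals[m] += weighted[m] / total

        norm = sum(post_totals)
        if norm == 0.0:
            break  # nothing to learn from; keep the current weights
        # M-step: the new weights are the normalized posterior totals.
        new_lambdas = [t / norm for t in post_totals]

        delta = max(abs(a - b) for a, b in zip(new_lambdas, lambdas))
        lambdas = new_lambdas
        if delta < precision:
            break  # weights changed by less than the requested precision
    return lambdas


def mixture_ppl(probs, lambdas):
    """Perplexity of the interpolated model over the listed words."""
    n_words = len(probs[0])
    log_sum = 0.0
    for i in range(n_words):
        p = sum(lam * model[i] for lam, model in zip(lambdas, probs))
        log_sum += math.log10(p)
    return 10 ** (-log_sum / n_words)
```

For example, best_mix([[0.5, 0.4, 0.3], [0.1, 0.1, 0.1]]) shifts nearly all weight to the first model, because it assigns a higher probability to every word. The stopping test mirrors the precision=P parameter: iteration ends once no weight moves by more than P between rounds.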
