This is a very useful toolkit

View source online: ppl-scripts.1



.\" $Id: ppl-scripts.1,v 1.3 2002/04/19 14:11:30 stolcke Exp $
.TH ppl-scripts 1 "$Date: 2002/04/19 14:11:30 $" "SRILM Tools"
.SH NAME
ppl-scripts, add-ppls, compare-ppls, compute-best-mix, compute-best-sentence-mix, hits-from-log, ppl-from-log, subtract-ppls \- manipulate perplexities
.SH SYNOPSIS
.B add-ppls
.RI [ ppl-file ...]
.br
.B subtract-ppls
.I ppl-file1
.RI [ ppl-file2 ...]
.br
.B ppl-from-log
.RI [ ppl-file ...]
.br
.B hits-from-log
.RI [ ppl-file ...]
.br
.B compare-ppls
[\c
.BI mindelta= D\c
]
.I ppl-file1
.I ppl-file2
.br
.B compute-best-mix
[\c
.BI lambda=' "l1 l2"
.RB ... '
.BI precision= P\c
]
.I ppl-file1
.RI [ ppl-file2 ...]
.br
.B compute-best-sentence-mix
[\c
.BI lambda=' "l1 l2"
.RB ... '
.BI precision= P\c
]
.I ppl-file1
.RI [ ppl-file2 ...]
.SH DESCRIPTION
These scripts process the output of the
.BR ngram (1)
option
.B \-ppl
to extract various useful information.
They are particularly convenient for analyzing the performance (perplexity) of
language models on specific subsets of the test data,
or for comparing and combining multiple models.
.PP
.B add-ppls
takes several ppl output files and computes an aggregate perplexity and
corpus statistics.
Its output is suitable for subsequent manipulation by
.B add-ppls
or
.BR subtract-ppls .
.PP
.B subtract-ppls
similarly computes an aggregate perplexity by removing the
statistics of zero or more
.I ppl-file2
from those in
.IR ppl-file1 .
Its output is suitable for subsequent manipulation by
.B add-ppls
or
.BR subtract-ppls .
.PP
.B ppl-from-log
recomputes the total perplexities and statistics from individual
lines in
.B "ngram \-debug 2 \-ppl"
output.
Combined with some filtering of that output, this allows computing
perplexities on interesting subsets of words.
.PP
.B hits-from-log
computes N-gram hit rates from
.B "ngram \-debug 2 \-ppl"
output.
.PP
.B compare-ppls
tallies the number of words for which two language models produce the same,
higher, or lower probabilities.
The input files should be
.B "ngram \-debug 2 \-ppl"
output for the two models on the same test set.
The parameter
.I D
is the minimum absolute difference for two log probabilities to be
considered different (the default is 0).
.PP
.B compute-best-mix
takes the output of several
.B "ngram \-debug 2 \-ppl"
runs on the same test set and computes the optimal interpolation
weights for the corresponding models,
i.e., the weights that minimize the perplexity of an interpolated model.
Initial weights may be specified as
.IR "l1 l2 ..." .
The computation is iterative and stops when the interpolation weights
change by less than
.I P
(default 0.001).
.PP
.B compute-best-sentence-mix
similarly optimizes the weights for sentence-level interpolation of LMs.
It requires input files generated by
.BR "ngram \-debug 1 \-ppl" .
(Sentence-level mixtures can be implemented using the
.B "ngram \-hmm"
option, by constructing a suitable HMM structure.)
.SH "SEE ALSO"
ngram(1).
.SH BUGS
All scripts depend on the idiosyncrasies of
.B "ngram \-ppl"
output.
.SH AUTHOR
Andreas Stolcke
.br
Copyright 1995-2002 SRI International
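
The man page describes compute-best-mix only as an iterative procedure that stops once the interpolation weights change by less than P. A minimal sketch of one way such a procedure could work is given below, using a standard EM update for mixture weights. This is an assumption for illustration and is not taken from the SRILM script itself. The function name best_mix and its input layout (one tuple of per-model word probabilities per test-set word, already extracted from the ngram output) are hypothetical, and parsing of the ppl files is not shown.

```python
def best_mix(word_probs, lambdas=None, precision=0.001, max_iter=1000):
    """Estimate interpolation weights by EM (hypothetical helper, not SRILM code).

    word_probs: one tuple per test-set word, holding the probability that each
                model assigned to that word (linear scale, not log).
    lambdas:    optional initial weights; defaults to uniform weights.
    precision:  stop once no weight changes by this much or more.
    """
    n_models = len(word_probs[0])
    if lambdas is None:
        lambdas = [1.0 / n_models] * n_models
    lambdas = list(lambdas)

    for _ in range(max_iter):
        # E-step: accumulate each model's posterior share of every word.
        posterior_sums = [0.0] * n_models
        for probs in word_probs:
            mix = sum(l * p for l, p in zip(lambdas, probs))
            if mix == 0.0:
                continue  # no model gave this word any probability; skip it
            for i in range(n_models):
                posterior_sums[i] += lambdas[i] * probs[i] / mix

        total = sum(posterior_sums)
        if total == 0.0:
            break  # nothing to re-estimate from

        # M-step: the new weights are the normalized posterior sums.
        new_lambdas = [s / total for s in posterior_sums]
        delta = max(abs(a - b) for a, b in zip(new_lambdas, lambdas))
        lambdas = new_lambdas
        if delta < precision:
            break

    return lambdas


if __name__ == "__main__":
    # Toy data: the first model assigns a higher probability to every word.
    toy = [(0.5, 0.1), (0.4, 0.1), (0.3, 0.05)]
    print(best_mix(toy, lambdas=[0.5, 0.5], precision=0.001))
```

In this toy run the weight of the first model should grow toward 1, since it assigns the higher probability to every word. The lambdas and precision arguments mirror the lambda= and precision= parameters from the synopsis in name only.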
