This is a very handy toolkit.

View the source code online: lm-scripts.html

File size: 3034 K
Uploaded by: wanghaihah
Keywords: toolkit

Related code

NAME
lm-scripts, add-dummy-bows, change-lm-vocab, empty-sentence-lm, get-unigram-probs, make-hiddens-lm, make-lm-subset, make-sub-lm, remove-lowprob-ngrams, reverse-lm, sort-lm - manipulate N-gram language models

SYNOPSIS
    add-dummy-bows [lm-file] > new-lm-file
    change-lm-vocab -vocab vocab -lm lm-file -write-lm new-lm-file [-tolower] [-subset] [ngram-options ...]
    empty-sentence-lm -prob p -lm lm-file -write-lm new-lm-file [ngram-options ...]
    get-unigram-probs [linear=1]
    make-hiddens-lm [lm-file] > hiddens-lm-file
    make-lm-subset count-file|- [lm-file|-]
    make-sub-lm [maxorder=N] [lm-file] > new-lm-file
    remove-lowprob-ngrams [lm-file] > new-lm-file
    reverse-lm [lm-file] > new-lm-file
    sort-lm [lm-file] > sorted-lm-file

DESCRIPTION
These scripts perform various useful manipulations on N-gram models in their textual representation. Most operate on backoff N-grams in ARPA ngram-format(5).

Since these tools are implemented as scripts, they do not automatically read or write compressed model files, unlike the main SRILM tools. However, since most scripts read from standard input or write to standard output (by leaving out the file argument, or specifying it as "-"), it is easy to combine them with gunzip(1) or gzip(1) on the command line.

Also note that many of the scripts take their options in the gawk(1) syntax option=value instead of the more common -option value.

add-dummy-bows adds dummy backoff weights to N-grams, even where they are not required, to satisfy some broken software that expects backoff weights on all N-grams (except those of the highest order).

change-lm-vocab modifies the vocabulary of an LM to be that in vocab. Any N-grams containing out-of-vocabulary words are removed, new words receive a unigram probability, and the model is renormalized. The -tolower option causes case distinctions to be ignored. -subset only removes words from the LM vocabulary, without adding any. Any remaining ngram-options are passed to ngram(1), and can be used to set the debugging level, N-gram order, etc.

empty-sentence-lm modifies an LM so that it allows the empty sentence with probability p. This is useful for modifying existing LMs that were trained on non-empty sentences only. ngram-options are passed to ngram(1), and can be used to set the debugging level, N-gram order, etc.

make-hiddens-lm constructs an N-gram model that can be used with the ngram -hiddens option. The new model contains intra-utterance sentence boundary tags "<#s>" with the same probability as the original model had for final sentence tags </s>. Also, utterance-initial words are not conditioned on <s>, and there is no penalty associated with utterance-final </s>. Such a model might work better if the test corpus is segmented at places other than proper <s> boundaries.

make-lm-subset forms a new LM containing only the N-grams found in count-file, in ngram-count(1) format. The result still needs to be renormalized with ngram -renorm (which will also adjust the N-gram counts in the header).
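For illustration, the two patterns above can be chained on the command line; a minimal sketch, assuming hypothetical file names and a trigram model (the "-" stdin/stdout convention is described under DESCRIPTION):

    # Restrict a gzip-compressed LM to a new vocabulary without writing
    # an uncompressed copy to disk (file names are hypothetical)
    gunzip -c big.lm.gz |
        change-lm-vocab -vocab wordlist.txt -lm - -write-lm - |
        gzip -c > small.lm.gz

    # Keep only the N-grams listed in a count file, then renormalize
    # as recommended above (order 3 is an assumption)
    make-lm-subset my.counts big.lm > subset.lm
    ngram -order 3 -lm subset.lm -renorm -write-lm subset.renorm.lm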
make-sub-lm removes N-grams of order exceeding N. This function is now redundant, since all SRILM tools can do this implicitly (without using extra memory, and with only a very small time overhead) when reading N-gram models with the appropriate -order parameter.

remove-lowprob-ngrams eliminates N-grams whose probability is lower than the one they would receive through backoff. This is useful when building finite-state networks for N-gram models. However, this function is now performed much faster by ngram(1) with the -prune-lowprobs option.

reverse-lm produces a new LM that assigns to each sentence the probability that the input model assigns to its reverse.

sort-lm sorts the N-grams in an LM into lexicographic order (with the left-most words being the most significant). This is not a requirement for SRILM, but might be necessary for some other LM software. (The LMs output by SRILM are sorted somewhat differently, reflecting the internal data structures used; that is also the order that should give the best cache utilization when SRILM reads models.)

get-unigram-probs extracts the unigram probabilities in a simple table format from a backoff language model. The linear=1 option causes probabilities to be output on a linear (instead of log) scale.

SEE ALSO
ngram-format(5), ngram(1).

BUGS
These are quick-and-dirty scripts; what do you expect?
reverse-lm supports only bigram LMs, and can produce improper probability estimates as a result of inconsistent marginals in the input model.

AUTHOR
Andreas Stolcke <stolcke@speech.sri.com>.
Copyright 1995-2006 SRI International
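Usage note (not part of the original man page): a minimal sketch of the gawk(1)-style option=value syntax described above, with hypothetical file names; get-unigram-probs is fed from standard input here since its synopsis lists no file argument:

    # Dump unigram probabilities on a linear (non-log) scale
    get-unigram-probs linear=1 < big.lm > unigram-probs.txt

    # Produce a lexicographically sorted copy for third-party LM tools
    sort-lm big.lm > big.sorted.lm

    # Limit an LM to trigrams and below (redundant with ngram -order, per above)
    make-sub-lm maxorder=3 big.lm > tri.lm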
