This is a very handy toolkit

View source online: lm-scripts.1

File size: 3034 K
Uploaded by: wanghaihah
Keywords: toolkit

Related code

lm-scripts(1)                                       lm-scripts(1)

NAME
       lm-scripts, add-dummy-bows, change-lm-vocab, empty-sentence-lm,
       get-unigram-probs, make-hiddens-lm, make-lm-subset, make-sub-lm,
       remove-lowprob-ngrams, reverse-lm, sort-lm - manipulate N-gram
       language models

SYNOPSIS
       add-dummy-bows [lm-file] >new-lm-file
       change-lm-vocab -vocab vocab -lm lm-file -write-lm new-lm-file
              [-tolower] [-subset] [ngram-options...]
       empty-sentence-lm -prob p -lm lm-file -write-lm new-lm-file
              [ngram-options...]
       get-unigram-probs [linear=1]
       make-hiddens-lm [lm-file] >hiddens-lm-file
       make-lm-subset count-file|- [lm-file|-]
       make-sub-lm [maxorder=N] [lm-file] >new-lm-file
       remove-lowprob-ngrams [lm-file] >new-lm-file
       reverse-lm [lm-file] >new-lm-file
       sort-lm [lm-file] >sorted-lm-file

DESCRIPTION
       These scripts perform various useful manipulations on N-gram
       models in their textual representation.  Most operate on backoff
       N-grams in ARPA ngram-format(5).

       Since these tools are implemented as scripts they don't
       automatically input or output compressed model files correctly,
       unlike the main SRILM tools.
       However, since most scripts work with data from standard input
       or to standard output (by leaving out the file argument, or
       specifying it as "-") it is easy to combine them with gunzip(1)
       or gzip(1) on the command line.

       Also note that many of the scripts take their options with the
       gawk(1) syntax option=value instead of the more common
       -option value.

       add-dummy-bows adds dummy backoff weights to N-grams, even where
       they are not required, to satisfy some broken software that
       expects backoff weights on all N-grams (except those of highest
       order).

       change-lm-vocab modifies the vocabulary of an LM to be that in
       vocab.  Any N-grams containing out-of-vocabulary words are
       removed, new words receive a unigram probability, and the model
       is renormalized.  The -tolower option causes case distinctions
       to be ignored.  -subset only removes words from the LM
       vocabulary, without adding any.  Any remaining ngram-options are
       passed to ngram(1), and can be used to set debugging level,
       N-gram order, etc.

       empty-sentence-lm modifies an LM so that it allows the empty
       sentence with probability p.  This is useful to modify existing
       LMs that are trained on non-empty sentences only.  ngram-options
       are passed to ngram(1), and can be used to set debugging level,
       N-gram order, etc.

       make-hiddens-lm constructs an N-gram model that can be used with
       the ngram -hiddens option.
       The new model contains intra-utterance sentence boundary tags
       "<#s>" with the same probability as the original model had final
       sentence tags </s>.  Also, utterance-initial words are not
       conditioned on <s> and there is no penalty associated with
       utterance-final </s>.  Such a model might work better if the
       test corpus is segmented at places other than proper <s>
       boundaries.

       make-lm-subset forms a new LM containing only the N-grams found
       in the count-file, in ngram-count(1) format.  The result still
       needs to be renormalized with ngram -renorm (which will also
       adjust the N-gram counts in the header).

       make-sub-lm removes N-grams of order exceeding N.  This function
       is now redundant, since all SRILM tools can do this implicitly
       (without using extra memory and with very little time overhead)
       when reading N-gram models with the appropriate -order
       parameter.

       remove-lowprob-ngrams eliminates N-grams whose probability is
       lower than that which they would receive through backoff.  This
       is useful when building finite-state networks for N-gram models.
       However, this function is now performed much faster by ngram(1)
       with the -prune-lowprobs option.

       reverse-lm produces a new LM that generates sentences with
       probabilities equal to those of the reversed sentences in the
       input model.

       sort-lm sorts the N-grams in an LM in lexicographic order
       (left-most words being the most significant).  This is not a
       requirement for SRILM, but might be necessary for some other LM
       software.
       (The LMs output by SRILM are sorted somewhat differently,
       reflecting the internal data structures used; that is also the
       order that should give best cache utilization when using SRILM
       to read models.)

       get-unigram-probs extracts the unigram probabilities in a simple
       table format from a backoff language model.  The linear=1 option
       causes probabilities to be output on a linear (instead of log)
       scale.

SEE ALSO
       ngram-format(5), ngram(1).

BUGS
       These are quick-and-dirty scripts, what do you expect?
       reverse-lm supports only bigram LMs, and can produce improper
       probability estimates as a result of inconsistent marginals in
       the input model.

AUTHOR
       Andreas Stolcke
       Copyright 1995-2006 SRI International

SRILM Tools        $Date: 2006/11/18 22:32:45 $     lm-scripts(1)
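
The get-unigram-probs description above can be illustrated with a minimal Python sketch. This is not the gawk script shipped with SRILM. It only assumes the ARPA layout of ngram-format(5), in which each line of the \1-grams: section holds a log10 probability, a word, and an optional backoff weight. The names unigram_probs and EXAMPLE_LM and the toy model are hypothetical, and the output layout of the real script is not reproduced here.

```python
import re

# Matches ARPA section headers such as "\1-grams:" or "\2-grams:".
SECTION = re.compile(r"^\\(\d+)-grams:\s*$")


def unigram_probs(lines, linear=False):
    """Return {word: probability} for the unigram section of an ARPA model.

    Values are the log10 probabilities stored in the file, unless
    linear=True, in which case they are converted to the linear scale.
    """
    probs = {}
    order = 0  # 0 means we have not entered any N-gram section yet
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line == "\\end\\":
            break
        match = SECTION.match(line)
        if match:
            order = int(match.group(1))
            continue
        if order != 1:
            continue  # header lines and higher-order N-grams are skipped
        fields = line.split()
        logprob, word = float(fields[0]), fields[1]
        probs[word] = 10.0 ** logprob if linear else logprob
    return probs


# Hypothetical toy bigram model, for illustration only.
EXAMPLE_LM = """\
\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-1.0\t</s>
-99\t<s>\t-0.5
-0.5\ta\t-0.3

\\2-grams:
-0.2\t<s> a
-0.4\ta </s>

\\end\\
"""

if __name__ == "__main__":
    for word, p in unigram_probs(EXAMPLE_LM.splitlines(), linear=True).items():
        print(word, p)
```

Passing linear=True mirrors the linear=1 option by raising 10 to the stored log probability; with the default, the log10 values are returned unchanged.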
