This is a very handy toolkit.

View source code online: ngram-merge.1

File size: 3034 K
Uploaded by: wanghaihah
Keywords: toolkit

Related code

.\" $Id: ngram-merge.1,v 1.7 2004/12/03 17:59:01 stolcke Exp $
.TH ngram-merge 1 "$Date: 2004/12/03 17:59:01 $"  "SRILM Tools"
.SH NAME
ngram-merge \- merge N-gram counts
.SH SYNOPSIS
.B ngram-merge
[\c
.BR \-help ]
[\c
.B \-write
.IR outfile ]
[\c
.BR \-float-counts ]
[\c
.BR -- ]
.I infile1
.I infile2
\&...
.SH DESCRIPTION
.B ngram-merge
reads two or more lexicographically sorted N-gram count files
(as produced by
.BR "ngram-count -sort" )
and outputs the merged, sorted counts.
The output is thus suitable for subsequent merging steps.
.PP
The input format consists of one N-gram count per line,
.br
.I
	word1 word2 ... wordn count
.P
.br
The lines must be sorted lexicographically on the words, leftmost first.
The input may contain N-grams of different lengths.
.PP
Each filename argument can be a plain ASCII count file, or a
compressed file (name ending in .Z or .gz), or ``-'' to indicate
stdin/stdout.
.PP
.B ngram-merge
is recommended in cases where the full counts would far exceed
available real memory.
Although an arbitrary number of input count files is accepted,
it is best to use the program as follows.
First, partition the input text into the largest chunks so that
.B ngram-count
can run in real memory.
Then merge the resulting sorted counts using
.B ngram-merge
pairwise, and continue doing so in a binary tree pattern until a
single count file containing all N-grams remains.
This procedure is automated by the
.B make-batch-counts
and
.B merge-batch-counts
scripts.
.SH OPTIONS
.PP
Each filename argument can be an ASCII file, or a
compressed file (name ending in .Z or .gz), or ``-'' to indicate
stdin/stdout.
.TP
.B \-help
Print option and usage summary.
.TP
.B \-version
Print version information.
.TP
.BI \-write " outfile"
Write merged counts to
.IR outfile ,
instead of standard output.
.TP
.B \-float-counts
Process counts as floating point numbers.
By default counts are assumed to be unsigned integers.
.TP
.B \-\-
Indicates the end of options, in case the first input filename begins
with ``-''.
.SH "SEE ALSO"
ngram-count(1), ngram(1), training-scripts(1).
.SH AUTHOR
Andreas Stolcke
.br
Copyright 1995\-2004 SRI International
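
The merge procedure described in the man page above can be illustrated with a minimal Python sketch. This is not SRILM's implementation: the names `parse_stream`, `merge_counts`, and `merge_tree` are hypothetical, and the sketch assumes that counts are integers, that the counts of identical N-grams are added together, and that each input is sorted by its word tuple in Python's default string order (so a shorter N-gram sorts before a longer one that extends it).

```python
import heapq
from itertools import groupby
from operator import itemgetter


def parse_stream(lines):
    """Yield (words_tuple, count) for each non-blank 'word1 ... wordn count' line."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        *words, count = fields
        yield tuple(words), int(count)  # assumption: integer counts


def merge_counts(*streams):
    """Merge sorted count streams, summing the counts of equal N-grams.

    The output is itself sorted, so it can be fed into a later merge step.
    """
    parsed = [parse_stream(s) for s in streams]
    merged = heapq.merge(*parsed, key=itemgetter(0))
    for words, group in groupby(merged, key=itemgetter(0)):
        total = sum(count for _, count in group)
        yield " ".join(words) + " " + str(total)


def merge_tree(chunks):
    """Merge chunks pairwise, level by level, until a single chunk remains."""
    level = [list(c) for c in chunks]
    while len(level) > 1:
        # An odd chunk at the end of a level is carried through a one-input merge.
        level = [list(merge_counts(*level[i:i + 2]))
                 for i in range(0, len(level), 2)]
    return level[0] if level else []


if __name__ == "__main__":
    a = ["a 2", "a b 1", "b 3"]
    b = ["a 1", "b c 4"]
    for line in merge_counts(a, b):
        print(line)
```

Because `heapq.merge` is lazy, `merge_counts` holds only one pending line per input in memory, which mirrors the reason the man page recommends merging sorted files rather than counting everything at once. `merge_tree` materializes each level as a list only to keep the sketch short.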
