Search resource list
ictclas_Source_Code
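- Introduction to ICTCLAS, the Chinese lexical analysis system of the Institute of Computing Technology. A word is the smallest meaningful unit of language that can be used independently. Chinese, however, uses the character as its basic writing unit, and there are no explicit delimiters between words, so Chinese word analysis is the foundation of, and the key to, Chinese information processing. To address this, we at the Institute of Computing Technology, Chinese Academy of Sciences, building on years of research, spent one year developing the Chinese lexical analysis system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System). Its features are Chinese word segmentation, part-of-speech tagging, and unknown-word recognition. Segmentation accuracy is as high as 97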
CDevideSentence
- A word segmentation algorithm written in C++; simple and practical. See the included help file for details.
wordpos
- Given a corpus annotated with word segmentation and part-of-speech tags, computes the frequency of each word and outputs the words sorted by number of occurrences.
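The counting step described in this entry can be sketched as follows. This is not the code in the wordpos package; it is a minimal illustration that assumes each corpus line holds space-separated `word/tag` tokens. The function names `count_words` and `ranked` and the sample lines are hypothetical.

```python
from collections import Counter


def count_words(lines):
    """Count word frequencies in lines of space-separated word/tag tokens."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            # Split on the last "/" so only the trailing tag is removed.
            word, sep, _tag = token.rpartition("/")
            if sep and word:  # skip tokens that carry no tag
                counts[word] += 1
    return counts


def ranked(counts):
    """Sort by descending count, breaking ties by the word itself."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up sample lines in the assumed word/tag format.
    corpus = ["我/r 爱/v 北京/ns", "我/r 爱/v 分词/n"]
    for word, freq in ranked(count_words(corpus)):
        print(word, freq)
```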
WordSegMM
- Source code for Chinese word segmentation by maximum matching.
Qiyi
- Data structures and algorithms for maximum-probability word segmentation. Segmenting this way improves the rate at which ambiguous words are resolved. A real classic.
pymmseg. A word segmentation program written in Python
- A word segmentation program written in Python that implements the maximum matching method; simple and easy to use.
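For readers unfamiliar with the maximum matching method named in this entry, here is a minimal sketch of the forward variant. It is not taken from pymmseg. The function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    # Hypothetical toy dictionary.
    toy_dict = {"研究", "研究生", "生命", "起源"}
    print(forward_max_match("研究生命起源", toy_dict))
```

With this toy dictionary the greedy scan picks 研究生 first and leaves 命 stranded, which is the kind of ambiguity that pure maximum matching cannot resolve on its own.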
CJKAnalyzer. Word segmentation system (developed in Java)
- A very usable word segmentation system (developed in Java); its vocabulary can be extended.
lzj.rar
- Given an article, removes the stop words it contains according to a stop-word list, then saves the result to a file.
PatTermExtraction
- A term extraction system based on unsupervised machine learning, with preprocessing, word segmentation, and term extraction functions.
splitword
- A small word segmentation program I wrote myself; a beta version of a Chinese word segmenter, for reference only. Thanks!
2
- Voiced/unvoiced sound detection based on short-time average energy; a MATLAB program I wrote myself. Hope it is useful.
POSTagger_Src
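- Includes a dictionary of entries with their part-of-speech tags and frequency information. Format requirements for the training corpus: each word is followed by "/" and then its part-of-speech tag; the tag must be followed by at least one space; all the words of a sentence must be on the same line. Click "Start POS tagging" and select text files (several can be selected at once) to tag them.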
nlp
- Programs related to Chinese natural language processing, including Chinese character frequency statistics and a Jensen-Shannon divergence calculator, with sample texts from classical literature.
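As a rough illustration of how character frequency statistics and the Jensen-Shannon divergence fit together, the sketch below compares the character distributions of two strings. It is not the code in this package. The function names are hypothetical, the base-2 logarithm is a choice made here (it bounds the result between 0 and 1), and both input strings are assumed to be non-empty.

```python
import math
from collections import Counter


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence of two distributions over the same support."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def text_jsd(a, b):
    """Compare two non-empty strings by their character frequency distributions."""
    alphabet = sorted(set(a) | set(b))
    count_a, count_b = Counter(a), Counter(b)
    p = [count_a[ch] / len(a) for ch in alphabet]
    q = [count_b[ch] / len(b) for ch in alphabet]
    return jsd(p, q)


if __name__ == "__main__":
    # Made-up strings, only to show the call.
    print(text_jsd("abab", "abcc"))
```

Identical distributions give 0, and distributions with no characters in common give 1.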
StdMis
- create database StdMis Go use StdMis Go If exists(select name from sysobjects where name="User" and type="U") Drop table User go create table T_User( UserName varchar(10) not null, Password varchar(6) not null, FullName var
word_split
- A word segmentation program based on reverse maximum matching; the corpus is relatively small.
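Reverse maximum matching scans from the end of the sentence toward the start, taking the longest dictionary word that ends at the current position. The sketch below is not the code in this package. The function name `backward_max_match`, the `max_len` parameter, and the toy dictionary are assumptions made for illustration.

```python
def backward_max_match(text, dictionary, max_len=4):
    """Greedy right-to-left segmentation: take the longest dictionary word."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end`, shrinking until a match.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected from right to left
    return words


if __name__ == "__main__":
    # Hypothetical toy dictionary.
    toy_dict = {"研究", "研究生", "生命", "起源"}
    print(backward_max_match("研究生命起源", toy_dict))
```

On this toy input the right-to-left scan yields 研究 / 生命 / 起源, whereas a left-to-right scan over the same dictionary would first consume 研究生.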
TF-IDF
- The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The
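A minimal sketch of one common tf-idf variant follows: term frequency is the term's share of the document's tokens, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the term. Many other weightings exist, and this is not necessarily the one used in the package. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Made-up, already tokenized documents.
    docs = [["word", "segmentation", "word"], ["word", "tagging"]]
    for doc_weights in tf_idf(docs):
        print(doc_weights)
```

A term that appears in every document gets weight 0 under this variant, which matches the intuition that it does not help distinguish one document from another.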
NLP-test-and-amendament
- Natural language processing (NLP) assignment: midterm exam corrections. Term Explanation (give one example for each). Word-sense disambiguation: In computational linguistics, word sense disambiguation (WSD) is the process of identifying which sense of a word is used in any given sentence, when the word has a numb