Use the IK analysis plugin and update the IK dictionary


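Verify the tokenization result.

Run the following request to tokenize the input text 这是个测试 ("this is a test") with the ik_pinyin_analyzer analyzer of the ik_pinyin index.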

GET ik_pinyin/_analyze
{
  "text": "这是个测试",
  "analyzer": "ik_pinyin_analyzer"
}

The expected response is as follows.

{
  "tokens" : [
    {
      "token" : "zhe",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "这是",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "zs",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "shi",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "ge",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "个",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "g",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "ce",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "shi",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "测试",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "cs",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
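If you want to run the same check from a script instead of the Kibana console, the following is a minimal sketch using only the Python standard library. It assumes the cluster is reachable at a hypothetical address, http://localhost:9200, with no authentication; the function names build_analyze_request, analyze and tokens_by_position are illustrative, not part of any Elasticsearch client. It sends POST rather than GET because the _analyze API accepts both and many HTTP tools do not send a body with GET. The grouping step makes it easy to see that, in the expected response above, the pinyin tokens are stacked at the same position as the Chinese word they come from.

```python
import json
import urllib.request

# Assumption: hypothetical endpoint without authentication; replace with your own.
ES_URL = "http://localhost:9200"


def build_analyze_request(index, analyzer, text, base_url=ES_URL):
    """Build (but do not send) an _analyze request equivalent to the console call."""
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # _analyze accepts GET or POST; POST avoids GET-with-body issues
    )


def analyze(index, analyzer, text, base_url=ES_URL):
    """Send the request to the cluster and return the decoded JSON response."""
    req = build_analyze_request(index, analyzer, text, base_url)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def tokens_by_position(response):
    """Group token strings by position, keeping the order of the response."""
    groups = {}
    for tok in response["tokens"]:
        groups.setdefault(tok["position"], []).append(tok["token"])
    return groups


# Expected response from this section, abbreviated to the fields used here.
EXPECTED = {"tokens": [
    {"token": "zhe", "start_offset": 0, "end_offset": 2, "position": 0},
    {"token": "这是", "start_offset": 0, "end_offset": 2, "position": 0},
    {"token": "zs", "start_offset": 0, "end_offset": 2, "position": 0},
    {"token": "shi", "start_offset": 0, "end_offset": 2, "position": 1},
    {"token": "ge", "start_offset": 2, "end_offset": 3, "position": 2},
    {"token": "个", "start_offset": 2, "end_offset": 3, "position": 2},
    {"token": "g", "start_offset": 2, "end_offset": 3, "position": 2},
    {"token": "ce", "start_offset": 3, "end_offset": 5, "position": 3},
    {"token": "shi", "start_offset": 3, "end_offset": 5, "position": 4},
    {"token": "测试", "start_offset": 3, "end_offset": 5, "position": 4},
    {"token": "cs", "start_offset": 3, "end_offset": 5, "position": 4},
]}

if __name__ == "__main__":
    # Offline usage: inspect the expected response without contacting a cluster.
    # Against a live cluster you would call analyze("ik_pinyin", "ik_pinyin_analyzer", "这是个测试").
    for position, tokens in tokens_by_position(EXPECTED).items():
        print(position, tokens)
```

Comparing the grouped output of a live analyze() call with the grouping of EXPECTED is one simple way to spot differences after you change the analyzer or update the IK dictionary.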
