es-immense-term

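While our ES cluster was running, I received alerts for high CPU load on two nodes. Checking the logs on those nodes, I found the following error: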

java.lang.IllegalArgumentException: Document contains at least one immense term in field="msg" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[123, 34, 98, 114, 111, 97, 100, 99, 97, 115, 116, 73, 100, 34, 58, 49, 52, 48, 56, 49, 57, 57, 57, 56, 56, 44, 34, 116, 121, 112]...', original message: bytes can be at most 32766 in length; got 40283
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:685)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:239)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:454)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1246)
at org.elasticsearch.index.engine.internal.InternalEngine.innerCreateNoLock(InternalEngine.java:482)
at org.elasticsearch.index.engine.internal.InternalEngine.innerCreate(InternalEngine.java:435)
at org.elasticsearch.index.engine.internal.InternalEngine.create(InternalEngine.java:404)
at org.elasticsearch.index.shard.service.InternalIndexShard.create(InternalIndexShard.java:403)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:449)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:541)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:240)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:511)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:419)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 40283
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:659)
... 18 more
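
The exception prints the start of the offending term as decimal byte values, which is hard to read. A small sketch like the following (Python is my choice here, it is not part of the original troubleshooting) decodes that prefix back into text so you can tell which kind of document triggered the error. The byte list is copied from the log above.

```python
# Byte values copied from "The prefix of the first immense term is: [...]" in the log.
prefix_bytes = [
    123, 34, 98, 114, 111, 97, 100, 99, 97, 115, 116, 73, 100, 34, 58,
    49, 52, 48, 56, 49, 57, 57, 57, 56, 56, 44, 34, 116, 121, 112,
]

# The log reports the term in its UTF-8 encoding, so decode it as UTF-8.
prefix_text = bytes(prefix_bytes).decode("utf-8")

print(prefix_text)  # {"broadcastId":1408199988,"typ
```

The decoded prefix shows that the msg value is a whole JSON payload, which explains why it is so long.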

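The error means that the msg field of a document produced a term whose UTF-8 encoding is longer than the maximum of 32766 bytes (here it was 40283 bytes). ES does not index that value and throws an exception. This usually happens when the field is mapped as not_analyzed, so the entire value is indexed as a single term, and the value exceeds the limit. A term is the smallest unit of search and is normally not very large.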
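
The limit is measured in bytes of the UTF-8 encoding, not in characters. As a rough illustration (a sketch only; the helper names `utf8_length` and `is_immense_term` are made up for this example), a value can be checked on the client side before it is sent to ES:

```python
MAX_TERM_BYTES = 32766  # limit quoted in the exception message


def utf8_length(value: str) -> int:
    """Return the size of the value in bytes once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def is_immense_term(value: str) -> bool:
    """True if the value, indexed as one single term, would exceed the limit."""
    return utf8_length(value) > MAX_TERM_BYTES


ascii_value = "x" * 40283   # same size as the value reported in the log
cjk_value = "中" * 11000     # only 11000 characters, but 3 bytes each in UTF-8

print(utf8_length(ascii_value), is_immense_term(ascii_value))  # 40283 True
print(utf8_length(cjk_value), is_immense_term(cjk_value))      # 33000 True
```

The second case shows why non-ASCII text reaches the limit with far fewer characters.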

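I searched online for a solution: set ignore_above on the field, so that values longer than the given length are simply not indexed (the value is skipped for indexing, not truncated).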

curl -XPUT 'http://localhost:9200/twitter' -d '
{
  "mappings": {
    "tweet": {
      "properties": {
        "message": { "type": "string", "index": "not_analyzed", "ignore_above": 256 }
      }
    }
  }
}
'
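
One thing to keep in mind when choosing the value: ignore_above counts characters, while the 32766 limit counts UTF-8 bytes. The sketch below is a simplified model of that relationship, not ES code. The function `would_be_indexed` is hypothetical, and it uses Python's `len`, which only approximates how ES measures string length.

```python
MAX_TERM_BYTES = 32766        # byte limit from the exception message
MAX_UTF8_BYTES_PER_CHAR = 4   # a single character needs at most 4 bytes in UTF-8

# Largest ignore_above that can never exceed the byte limit, whatever the text.
safe_ignore_above = MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR


def would_be_indexed(value: str, ignore_above: int) -> bool:
    """Simplified model: values longer than ignore_above are skipped."""
    return len(value) <= ignore_above


print(safe_ignore_above)                  # 8191
print(would_be_indexed("a" * 256, 256))   # True
print(would_be_indexed("a" * 257, 256))   # False
```

With 256 as in the example above there is a wide safety margin, since 256 characters can never come close to 32766 bytes.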

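If the field is mapped with index: analyzed, this problem normally does not occur, because the analyzer splits the value into much smaller terms. Check the mapping configured in your own ES cluster to see which case applies. In my case I had set up a dynamic mapping that made every field without an explicit mapping not_analyzed, which is how I ran into this length limit. Changing the mapping fixed it.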
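
My actual dynamic mapping is not reproduced here, so the following is only a guess at what a corrected one could look like, using the same ES 1.x style string / not_analyzed syntax as the curl example. The template name `strings_not_analyzed`, the use of the `_default_` type, and the value 256 are all assumptions for illustration. The script only builds and prints the request body; it does not talk to a cluster.

```python
import json

# Hypothetical dynamic template: every dynamically detected string field
# becomes not_analyzed, but values longer than 256 characters are not indexed.
dynamic_mapping = {
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                {
                    "strings_not_analyzed": {  # assumed template name
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "string",
                            "index": "not_analyzed",
                            "ignore_above": 256,
                        },
                    }
                }
            ]
        }
    }
}

# Serialize to JSON so it can be used as the body of the index creation request.
request_body = json.dumps(dynamic_mapping, indent=2)
print(request_body)
```

The printed JSON can be passed to curl with -d in the same way as the twitter example.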