Lucene/IndexFileFormats10

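In the example directory of Solr 3.5, start the server with

% java -jar start.jar

and examine the resulting index file format. Part 10.

The index was started from scratch; this part does not continue from the previous one.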

Update and commit a single document (id, trie_ti field)

curl http://localhost:8983/solr/update\?commit\=true -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">abc</field><field name="test_i">4</field><field name="test_ti">4</field></doc></add>'

schema.xml (excerpt)

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <field name="id" type="string" indexed="true" stored="true" required="true" /> 
    <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
    <dynamicField name="*_ti" type="tint"    indexed="true"  stored="true"/>

Files

segments.gen
segments_2
_0.fdt
_0.fdx
_0.fnm
_0.frq
_0.nrm
_0.tii
_0.tis

segments*, *.fnm, and _0.nrm are not covered here.

_0.fdt

0000000 00 00 00 03 03 00 00 03 61 62 63 01 09 00 00 00
0000020 04 02 09 00 00 00 04
0000027

Format
- 00 00 00 03
  - org.apache.lucene.index.FieldsWriter.FORMAT_LUCENE_3_2_NUMERIC_FIELDS
FieldCount
- 03

FieldNum
- 00 (id)
Bits
- 00
Value
- 03 61 62 63
  - "abc"
FieldNum
- 01 (test_i)
Bits
- 09
  - The low-order bit is 1 for tokenized fields (TrieIntField is tokenized).
  - The 4th to 6th bits (mask 0x7<<3) define the type of a numeric field. 1<<3 means the value is an int, so 0x09 = 0x08 | 0x01.
Value
- 00 00 00 04
FieldNum
- 02 (test_ti)
Bits
- 09
  - The low-order bit is 1 for tokenized fields (TrieIntField is tokenized).
  - The 4th to 6th bits (mask 0x7<<3) define the type of a numeric field. 1<<3 means the value is an int, so 0x09 = 0x08 | 0x01.
Value
- 00 00 00 04
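The field-by-field reading of _0.fdt can be checked mechanically. Below is a minimal Java sketch that decodes the 23 bytes shown above. It assumes the segment holds exactly one document, so the bytes can be read straight after the header without consulting _0.fdx. It handles only the two value types that appear here: a length-prefixed UTF-8 string and a 4-byte int. The class FdtDump and its methods are hypothetical illustrations, not Lucene's FieldsReader.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FdtDump {
    static final int FIELD_IS_TOKENIZED = 1;      // bit 0
    static final int FIELD_IS_BINARY = 1 << 1;    // bit 1
    static final int NUMERIC_MASK = 0x7 << 3;     // bits 3..5: numeric type
    static final int NUMERIC_INT = 1 << 3;

    // Lucene-style VInt: 7 bits per byte, low-order group first, high bit set = more bytes follow.
    static int readVInt(ByteBuffer buf) {
        int b = buf.get() & 0xFF;
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf.get() & 0xFF;
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Decodes the header plus one document (hypothetical helper; assumes a single-document segment).
    static List<String> dump(byte[] fdt) {
        ByteBuffer buf = ByteBuffer.wrap(fdt);    // big-endian by default, like Lucene's writeInt
        List<String> out = new ArrayList<>();
        int format = buf.getInt();                // 3 = FORMAT_LUCENE_3_2_NUMERIC_FIELDS
        int fieldCount = readVInt(buf);
        out.add("format=" + format + " fieldCount=" + fieldCount);
        for (int i = 0; i < fieldCount; i++) {
            int fieldNum = readVInt(buf);
            int bits = buf.get() & 0xFF;
            String value;
            if ((bits & NUMERIC_MASK) == NUMERIC_INT) {
                value = Integer.toString(buf.getInt());          // 4-byte int
            } else if ((bits & (NUMERIC_MASK | FIELD_IS_BINARY)) == 0) {
                byte[] utf8 = new byte[readVInt(buf)];           // length in UTF-8 bytes
                buf.get(utf8);
                value = new String(utf8, StandardCharsets.UTF_8);
            } else {
                throw new IllegalStateException("not handled in this sketch: bits=0x"
                        + Integer.toHexString(bits));
            }
            boolean tokenized = (bits & FIELD_IS_TOKENIZED) != 0;
            out.add("field=" + fieldNum + " tokenized=" + tokenized + " value=" + value);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] fdt = {
            0x00, 0x00, 0x00, 0x03, 0x03,                 // Format, FieldCount
            0x00, 0x00, 0x03, 0x61, 0x62, 0x63,           // id: "abc"
            0x01, 0x09, 0x00, 0x00, 0x00, 0x04,           // test_i: 4
            0x02, 0x09, 0x00, 0x00, 0x00, 0x04 };         // test_ti: 4
        dump(fdt).forEach(System.out::println);
    }
}
```

Run as is, it prints the header (format 3, 3 fields), followed by field 0 as the untokenized string abc and fields 1 and 2 as tokenized ints with the value 4.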

_0.frq

0000000 00 00 00 00 00 00
0000006

id field
TermFreq(DocDelta)
- 00

test_i field
TermFreq(DocDelta)
- 00

test_ti field
TermFreq(DocDelta)
- 00
TermFreq(DocDelta)
- 00
TermFreq(DocDelta)
- 00
TermFreq(DocDelta)
- 00
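  - precisionStep is 8, so 4 tokens are produced (shifts 0, 8, 16, 24 of the 32-bit value)

Each entry is a single VInt DocDelta of 0, because doc 0 is the first document for every term. No freq flag is encoded and there is no _0.prx file, which is consistent with these fields omitting term frequencies and positions.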
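To see why all six bytes of _0.frq are 00, here is a sketch of how one document entry is emitted, modeled on the Lucene 3.x postings writer. The class FrqSketch is hypothetical. It assumes, as the dump suggests, that term frequencies are omitted for these fields. Without that option, a document with freq 1 would be written as (DocDelta << 1) | 1, which is the byte 01.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class FrqSketch {
    // Lucene-style VInt: low 7 bits first, high bit set while more bytes follow.
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    // One posting entry, modeled on the Lucene 3.x docs writer (hypothetical helper).
    static void writeDoc(ByteArrayOutputStream out, int docDelta, int freq, boolean omitTf) {
        if (omitTf) {
            writeVInt(out, docDelta);               // DocDelta only
        } else if (freq == 1) {
            writeVInt(out, (docDelta << 1) | 1);    // low bit 1 means "freq is 1"
        } else {
            writeVInt(out, docDelta << 1);          // low bit 0: Freq follows
            writeVInt(out, freq);
        }
    }

    // Postings for `terms` terms that each occur once in doc 0 (first doc, so DocDelta = 0).
    static byte[] encode(int terms, boolean omitTf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int t = 0; t < terms; t++) {
            writeDoc(out, 0, 1, omitTf);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(6, true)));   // [0, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(encode(6, false)));  // [1, 1, 1, 1, 1, 1]
    }
}
```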

_0.tis

0000000 ff ff ff fc 00 00 00 00 00 00 00 06 00 00 00 80
0000020 00 00 00 10 00 00 00 0a 00 03 61 62 63 00 01 00
0000040 00 00 06 60 08 00 00 00 04 01 01 01 00 06 00 02
0000060 01 01 00 00 05 68 04 00 00 00 02 01 01 00 00 04
0000100 70 02 00 00 02 01 01 00 00 03 78 01 00 02 01 01
0000120 00
0000121

TIVersion
- ff ff ff fc
  - org.apache.lucene.index.TermInfosWriter.FORMAT_VERSION_UTF8_LENGTH_IN_BYTES
TermCount
- 00 00 00 00 00 00 00 06 (6 terms in total; IndexTermCount is the corresponding field in .tii)
IndexInterval
- 00 00 00 80 (128)
SkipInterval
- 00 00 00 10 (16)
MaxSkipLevels
- 00 00 00 0a (10)
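The header is a run of fixed-width big-endian integers, so java.nio.ByteBuffer, which is big-endian by default, can read it directly. Below is a minimal sketch using a hypothetical class TisHeader that checks the version and reports the other four values.

```java
import java.nio.ByteBuffer;

public class TisHeader {
    static final int FORMAT_VERSION_UTF8_LENGTH_IN_BYTES = -4; // ff ff ff fc

    // Reads the 24-byte .tis header (hypothetical helper).
    static String read(byte[] tis) {
        ByteBuffer buf = ByteBuffer.wrap(tis);   // big-endian by default, matching the dump
        int version = buf.getInt();
        if (version != FORMAT_VERSION_UTF8_LENGTH_IN_BYTES) {
            throw new IllegalArgumentException("unexpected TIVersion " + version);
        }
        long termCount = buf.getLong();
        int indexInterval = buf.getInt();
        int skipInterval = buf.getInt();
        int maxSkipLevels = buf.getInt();
        return "termCount=" + termCount + " indexInterval=" + indexInterval
                + " skipInterval=" + skipInterval + " maxSkipLevels=" + maxSkipLevels;
    }

    public static void main(String[] args) {
        byte[] header = {
            (byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xfc,  // TIVersion
            0, 0, 0, 0, 0, 0, 0, 0x06,                           // TermCount
            0, 0, 0, (byte) 0x80,                                // IndexInterval
            0, 0, 0, 0x10,                                       // SkipInterval
            0, 0, 0, 0x0a };                                     // MaxSkipLevels
        System.out.println(read(header));
    }
}
```

It prints termCount=6 indexInterval=128 skipInterval=16 maxSkipLevels=10.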

TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
- 00 03 61 62 63 00 01 00 00
  - PrefixLength: 00, Suffix: 03 61 62 63 (length 3, "abc"), FieldNum: 0 (id), DocFreq: 1, FreqDelta: 0, ProxDelta: 0, no SkipDelta (it is written only when DocFreq >= SkipInterval)
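The TermInfo layout can be exercised on the whole dump. The sketch below uses a hypothetical class TisTerms and walks the 57 TermInfo bytes that follow the 24-byte header. It rebuilds each term from the previous term's shared prefix plus its own suffix, and it sums FreqDelta and ProxDelta into absolute pointers. It hardcodes SkipInterval = 16 from the header and prints term bytes in hex rather than interpreting them.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TisTerms {
    static final int SKIP_INTERVAL = 16;  // SkipInterval from the .tis header

    // Lucene VLong/VInt: 7 bits per byte, low-order group first, high bit = more bytes follow.
    static long readVLong(ByteBuffer buf) {
        int b = buf.get() & 0xFF;
        long value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf.get() & 0xFF;
            value |= (long) (b & 0x7F) << shift;
        }
        return value;
    }

    static int readVInt(ByteBuffer buf) {
        return (int) readVLong(buf);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the TermInfos that follow the header (hypothetical helper).
    static List<String> parse(byte[] termInfos) {
        ByteBuffer buf = ByteBuffer.wrap(termInfos);
        List<String> out = new ArrayList<>();
        byte[] last = new byte[0];
        long freqPointer = 0;
        long proxPointer = 0;
        while (buf.hasRemaining()) {
            int prefixLength = readVInt(buf);
            int suffixLength = readVInt(buf);
            byte[] term = Arrays.copyOf(last, prefixLength + suffixLength); // keep the shared prefix
            buf.get(term, prefixLength, suffixLength);                       // then fill in the suffix
            int fieldNum = readVInt(buf);
            int docFreq = readVInt(buf);
            freqPointer += readVLong(buf);   // FreqDelta
            proxPointer += readVLong(buf);   // ProxDelta
            if (docFreq >= SKIP_INTERVAL) {
                readVInt(buf);               // SkipDelta (not present in this dump)
            }
            out.add("field=" + fieldNum + " term=" + hex(term) + " docFreq=" + docFreq
                    + " freqPointer=" + freqPointer + " proxPointer=" + proxPointer);
            last = term;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] termInfos = {
            0x00, 0x03, 0x61, 0x62, 0x63, 0x00, 0x01, 0x00, 0x00,
            0x00, 0x06, 0x60, 0x08, 0x00, 0x00, 0x00, 0x04, 0x01, 0x01, 0x01, 0x00,
            0x06, 0x00, 0x02, 0x01, 0x01, 0x00,
            0x00, 0x05, 0x68, 0x04, 0x00, 0x00, 0x00, 0x02, 0x01, 0x01, 0x00,
            0x00, 0x04, 0x70, 0x02, 0x00, 0x00, 0x02, 0x01, 0x01, 0x00,
            0x00, 0x03, 0x78, 0x01, 0x00, 0x02, 0x01, 0x01, 0x00 };
        parse(termInfos).forEach(System.out::println);
    }
}
```

The resulting freqPointer values 0 through 5 line up with the six one-byte entries in _0.frq.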

TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
- 00 06 60 08 00 00 00 04 01 01 01 00
  - PrefixLength: 00, Suffix: 06 60 08 00 00 00 04 (length 6; the value 4 encoded by NumericUtils.intToPrefixCoded(int)), FieldNum: 1 (test_i), DocFreq: 1, FreqDelta: 1, ProxDelta: 0, no SkipDelta
  - 60 is SHIFT_START_INT (0x60) + shift (0 here), so the first byte records how far the value has been shifted
  - The remaining bytes hold sortableBits = val ^ 0x80000000, shifted right by shift and stored 7 bits per byte, right-justified
  - // Store 7 bits per character for good efficiency when UTF-8 encoding.
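The encoding of 4 can be reproduced outside Lucene. The following is a re-implementation sketch of the logic of NumericUtils.intToPrefixCoded in Lucene 3.x. The class PrefixCoded is hypothetical, and it returns the chars as ints. These ints equal the UTF-8 bytes in the dump because every char here is at most 0x7F. Looping over shifts 0, 8, 16, 24 corresponds to precisionStep = 8 and yields the four test_ti terms.

```java
public class PrefixCoded {
    static final int SHIFT_START_INT = 0x60;  // same value as the Lucene 3.x NumericUtils constant

    // Returns the prefix-coded chars of val at the given shift (each char fits in 7 bits).
    static int[] intToPrefixCoded(int val, int shift) {
        if (shift < 0 || shift > 31) {
            throw new IllegalArgumentException("shift must be in 0..31");
        }
        int nChars = (31 - shift) / 7 + 1;               // chars needed for the remaining 32 - shift bits
        int[] out = new int[nChars + 1];
        out[0] = SHIFT_START_INT + shift;                // first char records the shift
        int sortableBits = (val ^ 0x80000000) >>> shift; // flip the sign bit so negatives sort first
        for (int i = nChars; i >= 1; i--) {              // fill right to left: the number is right-justified
            out[i] = sortableBits & 0x7F;
            sortableBits >>>= 7;
        }
        return out;
    }

    static String hex(int[] chars) {
        StringBuilder sb = new StringBuilder();
        for (int c : chars) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int shift = 0; shift < 32; shift += 8) {   // precisionStep = 8
            System.out.println("shift " + shift + ": " + hex(intToPrefixCoded(4, shift)));
        }
        // shift 0: 60 08 00 00 00 04
        // shift 8: 68 04 00 00 00
        // shift 16: 70 02 00 00
        // shift 24: 78 01 00
    }
}
```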

TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
- 06 00 02 01 01 00
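  - PrefixLength: 06, Suffix length: 00, FieldNum: 2 (test_ti), DocFreq: 1, FreqDelta: 1, ProxDelta: 0, no SkipDelta
  - The shift-0 term of test_ti is byte-for-byte identical to the preceding test_i term (60 08 00 00 00 04). Prefix sharing compares only the term bytes, not the field, so all 6 bytes are shared and the suffix is empty.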
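PrefixLength 06 follows from comparing raw term bytes with the previous term, while the field number is stored separately. The following small sketch uses a hypothetical helper, sharedPrefix, to reproduce the prefix and suffix lengths of the first two test_ti entries.

```java
public class SharedPrefix {
    // Number of leading bytes `current` shares with `last` (hypothetical helper; fields are not compared).
    static int sharedPrefix(byte[] last, byte[] current) {
        int limit = Math.min(last.length, current.length);
        int i = 0;
        while (i < limit && last[i] == current[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] testI0 = {0x60, 0x08, 0x00, 0x00, 0x00, 0x04};  // test_i, shift 0
        byte[] testTi0 = {0x60, 0x08, 0x00, 0x00, 0x00, 0x04}; // test_ti, shift 0
        byte[] testTi8 = {0x68, 0x04, 0x00, 0x00, 0x00};       // test_ti, shift 8
        int p1 = sharedPrefix(testI0, testTi0);
        System.out.println("test_ti shift 0: prefix=" + p1 + " suffix=" + (testTi0.length - p1));
        int p2 = sharedPrefix(testTi0, testTi8);
        System.out.println("test_ti shift 8: prefix=" + p2 + " suffix=" + (testTi8.length - p2));
        // test_ti shift 0: prefix=6 suffix=0
        // test_ti shift 8: prefix=0 suffix=5
    }
}
```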
TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
- 00 05 68 04 00 00 00 02 01 01 00
  - Shifted by 8 bits (68 = 0x60 + 8); PrefixLength 0, Suffix length 5, FieldNum 2 (test_ti)
TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
- 00 04 70 02 00 00 02 01 01 00
  - Shifted by 16 bits (70 = 0x60 + 16)
TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
- 00 03 78 01 00 02 01 01 00
  - Shifted by 24 bits (78 = 0x60 + 24)
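Decoding reverses the steps: recover the shift from the first byte, reassemble the 7-bit groups, shift back, and flip the sign bit. The sketch below follows the logic of Lucene 3.x NumericUtils.prefixCodedToInt; the class PrefixDecode is hypothetical. For 4, only the shift-0 term decodes to 4. The shift-8, 16, and 24 terms decode to 0, the lower bound of the block of values sharing those high-order bits. This is what lets a range query cover a whole block with one lower-precision term.

```java
public class PrefixDecode {
    static final int SHIFT_START_INT = 0x60;

    // Inverse of the int prefix coding: returns the value with its low `shift` bits cleared.
    static int prefixCodedToInt(int[] chars) {
        int shift = chars[0] - SHIFT_START_INT;
        if (shift < 0 || shift > 31) {
            throw new NumberFormatException("not an int prefix-coded term");
        }
        int sortableBits = 0;
        for (int i = 1; i < chars.length; i++) {
            if (chars[i] > 0x7F) {
                throw new NumberFormatException("char exceeds 7 bits");
            }
            sortableBits = (sortableBits << 7) | chars[i];  // reassemble the 7-bit groups
        }
        return (sortableBits << shift) ^ 0x80000000;        // undo the shift, flip the sign bit back
    }

    static int shiftOf(int[] chars) {
        return chars[0] - SHIFT_START_INT;
    }

    public static void main(String[] args) {
        int[][] terms = {
            {0x60, 0x08, 0x00, 0x00, 0x00, 0x04},
            {0x68, 0x04, 0x00, 0x00, 0x00},
            {0x70, 0x02, 0x00, 0x00},
            {0x78, 0x01, 0x00},
        };
        for (int[] t : terms) {
            System.out.println("shift " + shiftOf(t) + " -> " + prefixCodedToInt(t));
        }
        // shift 0 -> 4, shift 8 -> 0, shift 16 -> 0, shift 24 -> 0
    }
}
```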

Lucene/IndexFileFormats10 - 春山征吾のWiki