In Solr 3.5's example directory, run

% java -jar start.jar

and look at the format of the resulting index files. Part 10.

The index starts out empty; this is not a continuation of the previous part.

Update and commit one document (id plus the TrieIntField fields test_i and test_ti)


curl http://localhost:8983/solr/update\?commit\=true -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">abc</field><field name="test_i">4</field><field name="test_ti">4</field></doc></add>'

schema.xml (excerpt)


    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <field name="id" type="string" indexed="true" stored="true" required="true" /> 
    <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
    <dynamicField name="*_ti" type="tint"    indexed="true"  stored="true"/>

Files

  • segments.gen
  • segments_2
  • _0.fdt
  • _0.fdx
  • _0.fnm
  • _0.frq
  • _0.nrm
  • _0.tii
  • _0.tis
  • segments*, *.fnm, and _0.nrm are skipped here.

_0.fdt


0000000 00 00 00 03 03 00 00 03 61 62 63 01 09 00 00 00
0000020 04 02 09 00 00 00 04
0000027
  • Format
    • 00 00 00 03
      • org.apache.lucene.index.FieldsWriter.FORMAT_LUCENE_3_2_NUMERIC_FIELDS
  • FieldCount
    • 03
  • FieldNum
    • 00
  • Bits
    • 00
  • Value
    • 03 61 62 63
      • "abc"
  • FieldNum
    • 01
  • Bits
    • 09
      • low order bit is one for tokenized fields (TrieIntField is tokenized)
      • 4th to 6th bit (mask: 0x7<<3) define the type of a numeric field: 1<<3: Value is Int
  • Value
    • 00 00 00 04
  • FieldNum
    • 02
  • Bits
    • 09
      • low order bit is one for tokenized fields (TrieIntField is tokenized)
      • 4th to 6th bit (mask: 0x7<<3) define the type of a numeric field: 1<<3: Value is Int
  • Value
    • 00 00 00 04
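The decoding above can be checked mechanically. Below is a minimal Java sketch, assuming the Lucene 3.x stored-field layout described above (a 4-byte Format, a VInt FieldCount, then FieldNum / Bits / Value per field). The class and method names are hypothetical; this is not Lucene's FieldsReader, and binary fields and non-int numeric types are deliberately rejected rather than decoded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FdtDump {
    // Bit flags as listed above (mirroring the Lucene 3.x FieldsWriter values).
    static final int TOKENIZED = 1;            // low-order bit
    static final int BINARY = 1 << 1;          // second bit
    static final int NUMERIC_MASK = 0x7 << 3;  // 4th to 6th bit
    static final int NUMERIC_INT = 1 << 3;

    // Lucene VInt: 7 bits per byte, low-order group first; high bit set = more bytes follow.
    static int readVInt(ByteBuffer in) {
        int b = in.get() & 0xff;
        int value = b & 0x7f;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.get() & 0xff;
            value |= (b & 0x7f) << shift;
        }
        return value;
    }

    // Decodes the Format header and one document's stored fields.
    static List<String> decode(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);   // big-endian by default, like Lucene's writeInt
        List<String> out = new ArrayList<>();
        out.add("format=" + in.getInt());        // 3 = FORMAT_LUCENE_3_2_NUMERIC_FIELDS
        int fieldCount = readVInt(in);
        for (int i = 0; i < fieldCount; i++) {
            int fieldNum = readVInt(in);
            int bits = in.get() & 0xff;
            String value;
            if ((bits & NUMERIC_MASK) == NUMERIC_INT) {
                value = Integer.toString(in.getInt());   // 4-byte big-endian int
            } else if ((bits & (NUMERIC_MASK | BINARY)) == 0) {
                byte[] buf = new byte[readVInt(in)];     // VInt byte length, then UTF-8 bytes
                in.get(buf);
                value = new String(buf, StandardCharsets.UTF_8);
            } else {
                throw new IllegalArgumentException("not handled in this sketch: bits=" + bits);
            }
            String tok = (bits & TOKENIZED) != 0 ? " (tokenized)" : "";
            out.add(fieldNum + ":" + value + tok);
        }
        return out;
    }

    // The 23 bytes of _0.fdt shown above.
    static byte[] sampleBytes() {
        int[] hex = {0x00, 0x00, 0x00, 0x03, 0x03,
                     0x00, 0x00, 0x03, 0x61, 0x62, 0x63,
                     0x01, 0x09, 0x00, 0x00, 0x00, 0x04,
                     0x02, 0x09, 0x00, 0x00, 0x00, 0x04};
        byte[] data = new byte[hex.length];
        for (int i = 0; i < hex.length; i++) data[i] = (byte) hex[i];
        return data;
    }

    public static void main(String[] args) {
        decode(sampleBytes()).forEach(System.out::println);
    }
}
```

Running main prints format=3, then 0:abc, 1:4 (tokenized), and 2:4 (tokenized), i.e. the id field followed by the two TrieIntField values.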

_0.frq


0000000 00 00 00 00 00 00
0000006

  • id field
  • TermFreq(DocDelta)
    • 00
  • test_i field
  • TermFreq(DocDelta)
    • 00
  • test_ti field
  • TermFreq(DocDelta)
    • 00
  • TermFreq(DocDelta)
    • 00
  • TermFreq(DocDelta)
    • 00
  • TermFreq(DocDelta)
    • 00
      • precisionStep is 8, so four tokens are produced (32 / 8 = 4: shifts 0, 8, 16, 24)
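The number of terms per trie field follows directly from precisionStep. Below is a simplified Java sketch of which shifts a 32-bit trie field is indexed at. It assumes, as Solr's TrieField normalizes it, that precisionStep=0 means a single full-precision token; the single test_i entry above is consistent with that. The names are hypothetical and it does not call Lucene's NumericTokenStream.

```java
import java.util.ArrayList;
import java.util.List;

public class TrieShifts {
    // Shifts at which a 32-bit trie field emits one token each.
    // Assumption: precisionStep <= 0 (or >= 32) yields only the shift-0 token.
    static List<Integer> shifts(int precisionStep) {
        int step = (precisionStep <= 0 || precisionStep >= 32) ? 32 : precisionStep;
        List<Integer> result = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += step) {
            result.add(shift);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("test_i  (int,  precisionStep=0): " + shifts(0));
        System.out.println("test_ti (tint, precisionStep=8): " + shifts(8));
    }
}
```

shifts(0) gives [0] and shifts(8) gives [0, 8, 16, 24], so the index holds 1 (id) + 1 (test_i) + 4 (test_ti) = 6 terms, which matches the six DocDelta entries in _0.frq and the term count of 6 in _0.tis.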

_0.tis


0000000 ff ff ff fc 00 00 00 00 00 00 00 06 00 00 00 80
0000020 00 00 00 10 00 00 00 0a 00 03 61 62 63 00 01 00
0000040 00 00 06 60 08 00 00 00 04 01 01 01 00 06 00 02
0000060 01 01 00 00 05 68 04 00 00 00 02 01 01 00 00 04
0000100 70 02 00 00 02 01 01 00 00 03 78 01 00 02 01 01
0000120 00
0000121
  • TIVersion
    • ff ff ff fc
      • org.apache.lucene.index.TermInfosWriter.FORMAT_VERSION_UTF8_LENGTH_IN_BYTES
  • TermCount
    • 00 00 00 00 00 00 00 06
  • IndexInterval
    • 00 00 00 80
  • SkipInterval
    • 00 00 00 10
  • MaxSkipLevels
    • 00 00 00 0a
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 03 61 62 63 00 01 00 00
      • PrefixLength: 00, Suffix: 03 61 62 63 ("abc"), FieldNum: 0 (id), DocFreq: 1, FreqDelta: 0, ProxDelta: 0, no SkipDelta
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 06 60 08 00 00 00 04 01 01 01 00
      • PrefixLength: 00, Suffix: 06 60 08 00 00 00 04 (the value 4 encoded with NumericUtils.intToPrefixCoded(int)), FieldNum: 1 (test_i), DocFreq: 1, FreqDelta: 1, ProxDelta: 0, no SkipDelta
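      • 60 is SHIFT_START_INT + shift (0x60 + 0), so the first byte records how far the value has been shifted
      • The remaining bytes hold sortableBits = (val ^ 0x80000000) >>> shift, stored 7 bits per byte and right-justified
      • Comment in the NumericUtils source: // Store 7 bits per character for good efficiency when UTF-8 encoding.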
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 06 00 02 01 01 00
      • PrefixLength: 06, Suffix: 00 (empty; the text is identical to the previous term), FieldNum: 2 (test_ti), DocFreq: 1, FreqDelta: 1, ProxDelta: 0, no SkipDelta
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 05 68 04 00 00 00 02 01 01 00
      • Shifted by 8 bits (0x68 = 0x60 + 8)
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 04 70 02 00 00 02 01 01 00
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 03 78 01 00 02 01 01 00
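To tie the TermInfo entries back to the values, here is a Java sketch that re-implements the prefix-coding rule described above (ported from the Lucene 3.x NumericUtils.intToPrefixCoded logic, returning bytes instead of chars) and walks the 81-byte _0.tis dump, rebuilding each term from PrefixLength plus Suffix. It is a hypothetical helper, not Lucene's SegmentTermEnum. It assumes VInt/VLong encoding for the per-term fields and reads SkipDelta only when DocFreq >= SkipInterval.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class TisDump {
    static final int SHIFT_START_INT = 0x60;  // same value as NumericUtils.SHIFT_START_INT

    // First byte = SHIFT_START_INT + shift, then (val ^ 0x80000000) >>> shift,
    // 7 bits per byte, right-justified. Every byte is < 0x80, so these are also the UTF-8 bytes.
    static byte[] intToPrefixCoded(int val, int shift) {
        int nChars = (31 - shift) / 7 + 1;
        byte[] buf = new byte[nChars + 1];
        buf[0] = (byte) (SHIFT_START_INT + shift);
        int sortableBits = (val ^ 0x80000000) >>> shift;  // flip sign bit, drop low bits
        while (nChars >= 1) {
            buf[nChars--] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return buf;
    }

    // Lucene VInt/VLong: 7 bits per byte, low-order group first.
    static long readVLong(ByteBuffer in) {
        int b = in.get() & 0xff;
        long value = b & 0x7f;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.get() & 0xff;
            value |= (long) (b & 0x7f) << shift;
        }
        return value;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Returns the header as one line, then "fieldNum:term-bytes" per TermInfo.
    static List<String> decode(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);  // big-endian by default
        int version = in.getInt();
        long termCount = in.getLong();
        int indexInterval = in.getInt();
        int skipInterval = in.getInt();
        int maxSkipLevels = in.getInt();
        List<String> out = new ArrayList<>();
        out.add("version=" + version + " terms=" + termCount + " indexInterval=" + indexInterval
                + " skipInterval=" + skipInterval + " maxSkipLevels=" + maxSkipLevels);
        byte[] prev = new byte[0];
        for (long i = 0; i < termCount; i++) {
            int prefixLength = (int) readVLong(in);        // bytes reused from the previous term
            byte[] suffix = new byte[(int) readVLong(in)];
            in.get(suffix);
            byte[] text = new byte[prefixLength + suffix.length];
            System.arraycopy(prev, 0, text, 0, prefixLength);
            System.arraycopy(suffix, 0, text, prefixLength, suffix.length);
            int fieldNum = (int) readVLong(in);
            int docFreq = (int) readVLong(in);
            readVLong(in);                                 // FreqDelta (pointer delta into .frq)
            readVLong(in);                                 // ProxDelta
            if (docFreq >= skipInterval) readVLong(in);    // SkipDelta
            out.add(fieldNum + ":" + hex(text));
            prev = text;
        }
        return out;
    }

    // The 81 bytes of _0.tis shown above.
    static byte[] sampleBytes() {
        int[] hex = {
            0xff, 0xff, 0xff, 0xfc, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0x80, 0, 0, 0, 0x10, 0, 0, 0, 0x0a,
            0, 3, 0x61, 0x62, 0x63, 0, 1, 0, 0,
            0, 6, 0x60, 8, 0, 0, 0, 4, 1, 1, 1, 0,
            6, 0, 2, 1, 1, 0,
            0, 5, 0x68, 4, 0, 0, 0, 2, 1, 1, 0,
            0, 4, 0x70, 2, 0, 0, 2, 1, 1, 0,
            0, 3, 0x78, 1, 0, 2, 1, 1, 0};
        byte[] data = new byte[hex.length];
        for (int i = 0; i < hex.length; i++) data[i] = (byte) hex[i];
        return data;
    }

    public static void main(String[] args) {
        decode(sampleBytes()).forEach(System.out::println);
        for (int shift = 0; shift < 32; shift += 8) {
            System.out.println("intToPrefixCoded(4, " + shift + ") = " + hex(intToPrefixCoded(4, shift)));
        }
    }
}
```

The four test_ti terms come out as 60 08 00 00 00 04, 68 04 00 00 00, 70 02 00 00, and 78 01 00, matching intToPrefixCoded(4, 0/8/16/24). The first test_ti term still decodes to all six bytes even though its Suffix is empty, because PrefixLength 06 copies them from the preceding test_i term.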
