This series looks at the format of the index files that Solr 3.5 writes when it is started from the example directory with

% java -jar start.jar

This is part 9.

The index was wiped clean beforehand; this is not a continuation of the previous part.

Add two documents (id and cat fields) with update and commit:


curl http://localhost:8983/solr/update\?commit\=true -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">abc</field><field name="cat">abc abc</field></doc><doc><field name="id">def</field><field name="cat">def abc</field></doc></add>'

schema.xml (excerpt)


    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      </analyzer>
    ...
    </fieldType>
   <field name="id" type="string" indexed="true" stored="true" required="true" /> 
   <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
   <copyField source="cat" dest="text"/>

Files

  • segments.gen
  • segments_2
  • _0.fdt
  • _0.fdx
  • _0.fnm
  • _0.frq
  • _0.nrm
  • _0.prx
  • _0.tii
  • _0.tis

segments*, *.fdt, and *.fdx are not covered here.
.tii is the same as in part 7.


_0.frq


0000000 00 01 00 01 00 02 03 03
0000010

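  • cat field ("abc abc"): only DocDelta is written, as in the omitTermFreqAndPositions branch of the code below
  • TermFreq(DocDelta), docID=0
    • 00
  • cat field ("def abc")
  • TermFreq(DocDelta), docID=1
    • 01
  • id field ("abc")
  • TermFreq(DocDelta), docID=0
    • 00
  • id field ("def")
  • TermFreq(DocDelta), docID=1
    • 01
  • text field ("abc")
  • TermFreq(docID=0)
    • DocDelta
      • 00 (delta 0 shifted left by one bit; low bit 0 means Freq follows)
    • Freq
      • 02
  • TermFreq(docID=1)
    • 03 ((delta 1 << 1) | 1; low bit 1 means Freq is 1)
  • text field ("def")
  • TermFreq(docID=1)
    • 03 ((delta 1 << 1) | 1; low bit 1 means Freq is 1)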


lucene-3.5.0/src/java/org/apache/lucene/index/FormatPostingsDocsWriter.java
    if (omitTermFreqAndPositions)
      out.writeVInt(delta);
    else if (1 == termDocFreq)
      out.writeVInt((delta<<1) | 1);
    else {
      out.writeVInt(delta<<1);
      out.writeVInt(termDocFreq);
    }
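
To check the reading of _0.frq against the writer code above, the encoding can be reversed. The following is a minimal sketch, not Lucene's own reader: the class and method names (FrqDecodeSketch, readVInt, decode) are hypothetical, and the start offset 4 and the DocFreq values 2 and 1 are taken from the .tis entries for the text field.

```java
import java.util.ArrayList;
import java.util.List;

public class FrqDecodeSketch {
    /** Reads one VInt (7 bits per byte, low-order group first) at pos[0] and advances pos[0]. */
    static int readVInt(byte[] buf, int[] pos) {
        int value = 0;
        int shift = 0;
        while (true) {
            int b = buf[pos[0]++] & 0xff;
            value |= (b & 0x7f) << shift;
            if ((b & 0x80) == 0) return value;
            shift += 7;
        }
    }

    /** Decodes docFreq postings for one term; each entry is {docID, freq}, freq is -1 when omitted. */
    static List<int[]> decode(byte[] buf, int[] pos, int docFreq, boolean omitTermFreqAndPositions) {
        List<int[]> postings = new ArrayList<>();
        int docID = 0;
        for (int i = 0; i < docFreq; i++) {
            int code = readVInt(buf, pos);
            if (omitTermFreqAndPositions) {
                docID += code;                      // the VInt is the plain doc delta
                postings.add(new int[] {docID, -1});
            } else {
                docID += code >>> 1;                // upper bits carry the doc delta
                int freq = (code & 1) != 0 ? 1 : readVInt(buf, pos); // low bit set means freq == 1
                postings.add(new int[] {docID, freq});
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        byte[] frq = {0x00, 0x01, 0x00, 0x01, 0x00, 0x02, 0x03, 0x03};
        int[] pos = {4}; // the text-field postings start at offset 4
        for (int[] p : decode(frq, pos, 2, false)) System.out.println("abc: docID=" + p[0] + " freq=" + p[1]);
        for (int[] p : decode(frq, pos, 1, false)) System.out.println("def: docID=" + p[0] + " freq=" + p[1]);
    }
}
```

Each call to decode starts again from docID 0, because doc deltas are relative within one term's posting list.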

_0.nrm


0000000 4e 52 4d ff 79 79
0000006
  • NormHeader
    • 4e 52 4d ff
      • 'N','R','M',Version
  • Version
    • ff
      • currently -1.
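  • Norms
    • 79 79
      • one byte per document (docID=0, docID=1); presumably for the text field, since id and cat use a type with omitNorms="true"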

_0.prx


0000000 00 01 01 00
0000004

  • TermPositions<Positions=(PositionDelta,Payload?)^Freq>
    • 00 01
      • abc in docID=0 (Freq=2; positions 0 and 1)
  • TermPositions<Positions=(PositionDelta,Payload?)^Freq>
    • 01
      • abc in docID=1 (Freq=1; position 1)
  • TermPositions<Positions=(PositionDelta,Payload?)^Freq>
    • 00
      • def in docID=1 (Freq=1; position 0)
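
The position deltas above can be turned into absolute positions once the Freq values from _0.frq are known. This is a minimal sketch under two assumptions: no payloads are stored, so each entry is a plain PositionDelta, and every delta fits in a single VInt byte, as it does in this dump. The class name PrxDecodeSketch and the method positions are hypothetical.

```java
import java.util.Arrays;

public class PrxDecodeSketch {
    /** Turns freq position deltas starting at offset into absolute positions within one document. */
    static int[] positions(byte[] prx, int offset, int freq) {
        int[] result = new int[freq];
        int position = 0; // positions restart from 0 for every (term, document) pair
        for (int i = 0; i < freq; i++) {
            position += prx[offset + i]; // assumes a one-byte VInt (value below 128)
            result[i] = position;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] prx = {0x00, 0x01, 0x01, 0x00};
        System.out.println("abc docID=0: " + Arrays.toString(positions(prx, 0, 2)));
        System.out.println("abc docID=1: " + Arrays.toString(positions(prx, 2, 1)));
        System.out.println("def docID=1: " + Arrays.toString(positions(prx, 3, 1)));
    }
}
```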

_0.tis


0000000 ff ff ff fc 00 00 00 00 00 00 00 06 00 00 00 80
0000020 00 00 00 10 00 00 00 0a 00 07 61 62 63 20 61 62
0000040 63 01 01 00 00 00 07 64 65 66 20 61 62 63 01 01
0000060 01 00 00 03 61 62 63 00 01 01 00 00 03 64 65 66
0000100 00 01 01 00 00 03 61 62 63 02 02 01 00 00 03 64
0000120 65 66 02 01 03 03
0000126
  • TIVersion
    • ff ff ff fc
      • org.apache.lucene.index.TermInfosWriter.FORMAT_VERSION_UTF8_LENGTH_IN_BYTES
  • TermCount
    • 00 00 00 00 00 00 00 06
  • IndexInterval
    • 00 00 00 80
  • SkipInterval
    • 00 00 00 10
  • MaxSkipLevels
    • 00 00 00 0a
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 07 61 62 63 20 61 62 63 01 01 00 00
      • Suffix : 07 61 62 63 20 61 62 63 ("abc abc"), FieldNum: 1 (cat), DocFreq: 1, FreqDelta: 0, ProxDelta: 0, no SkipDelta (DocFreq < SkipInterval)
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 07 64 65 66 20 61 62 63 01 01 01 00
      • Suffix : 07 64 65 66 20 61 62 63 ("def abc"), FieldNum: 1 (cat), DocFreq: 1, FreqDelta: 1, ProxDelta: 0, no SkipDelta
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 03 61 62 63 00 01 01 00
      • Suffix : 03 61 62 63 ("abc"), FieldNum: 0 (id), DocFreq: 1, FreqDelta: 1, ProxDelta: 0, no SkipDelta
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 03 64 65 66 00 01 01 00
      • Suffix : 03 64 65 66 ("def"), FieldNum: 0 (id), DocFreq: 1, FreqDelta: 1, ProxDelta: 0, no SkipDelta
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 03 61 62 63 02 02 01 00
      • Suffix : 03 61 62 63 ("abc"), FieldNum: 2 (text), DocFreq: 2, FreqDelta: 1, ProxDelta: 0, no SkipDelta
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 03 64 65 66 02 01 03 03
      • Suffix : 03 64 65 66 ("def"), FieldNum: 2 (text), DocFreq: 1, FreqDelta: 3, ProxDelta: 3, no SkipDelta
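
The whole _0.tis dump can be walked mechanically in the same way. The following is a rough sketch rather than Lucene's actual reader: it assumes the layout listed above (big-endian header fields, then TermInfo entries with VInt and VLong fields, and a SkipDelta only when DocFreq >= SkipInterval). The names TisParseSketch, DUMP, fromHex and parse are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TisParseSketch {
    /** The _0.tis bytes from the dump above, without the offset column. */
    static final String DUMP =
        "ff ff ff fc 00 00 00 00 00 00 00 06 00 00 00 80 "
      + "00 00 00 10 00 00 00 0a 00 07 61 62 63 20 61 62 "
      + "63 01 01 00 00 00 07 64 65 66 20 61 62 63 01 01 "
      + "01 00 00 03 61 62 63 00 01 01 00 00 03 64 65 66 "
      + "00 01 01 00 00 03 61 62 63 02 02 01 00 00 03 64 "
      + "65 66 02 01 03 03";

    static byte[] fromHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) out[i] = (byte) Integer.parseInt(parts[i], 16);
        return out;
    }

    /** Reads a variable-length integer: 7 bits per byte, low-order group first. */
    static long readVLong(DataInputStream in) throws IOException {
        long value = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            value |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) return value;
            shift += 7;
        }
    }

    /** Returns one line for the header and one line per TermInfo entry. */
    static List<String> parse(byte[] tis) {
        List<String> lines = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(tis))) {
            int version = in.readInt();
            long termCount = in.readLong();
            int indexInterval = in.readInt();
            int skipInterval = in.readInt();
            int maxSkipLevels = in.readInt();
            lines.add("version=" + version + " termCount=" + termCount + " indexInterval=" + indexInterval
                + " skipInterval=" + skipInterval + " maxSkipLevels=" + maxSkipLevels);
            byte[] term = new byte[0];
            long freqPointer = 0;
            long proxPointer = 0;
            for (long i = 0; i < termCount; i++) {
                int prefixLength = (int) readVLong(in);   // bytes shared with the previous term
                int suffixLength = (int) readVLong(in);
                byte[] next = new byte[prefixLength + suffixLength];
                System.arraycopy(term, 0, next, 0, prefixLength);
                in.readFully(next, prefixLength, suffixLength);
                term = next;
                int fieldNum = (int) readVLong(in);
                int docFreq = (int) readVLong(in);
                freqPointer += readVLong(in);             // FreqDelta accumulates into an offset in .frq
                proxPointer += readVLong(in);             // ProxDelta accumulates into an offset in .prx
                if (docFreq >= skipInterval) readVLong(in); // SkipDelta, absent in this dump
                lines.add("field=" + fieldNum + " term=\"" + new String(term, StandardCharsets.UTF_8)
                    + "\" docFreq=" + docFreq + " freqPointer=" + freqPointer + " proxPointer=" + proxPointer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : parse(fromHex(DUMP))) System.out.println(line);
    }
}
```

The accumulated freqPointer values (0, 1, 2, 3, 4, 7) line up with the term boundaries in _0.frq, and the proxPointer of 3 for the last term matches the point where def starts in _0.prx.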
