Part 11 of a series looking at the format of the index files that Solr 3.5 writes when it is started from the example directory with:

% java -jar start.jar

The index was wiped clean beforehand; this is not a continuation of the previous part.

Update and commit one document (id and includes fields):


curl http://localhost:8983/solr/update\?commit\=true -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">abc</field><field name="includes">a b</field></doc></add>'

schema.xml (excerpt)


    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="id" type="string" indexed="true" stored="true" required="true" /> 
    <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
    <copyField source="includes" dest="text"/>

Files

  • segments.gen
  • segments_2
  • _0.fdt
  • _0.fdx
  • _0.fnm
  • _0.frq
  • _0.nrm
  • _0.prx
  • _0.tii
  • _0.tis
  • _0.tvd
  • _0.tvf
  • _0.tvx

segments*, *.fdx, *.fnm, *.nrm, *.prx, and *.tii are omitted here.

_0.fdt


0000000 00 00 00 03 02 00 00 03 61 62 63 01 01 03 61 20
0000020 62
0000021
  • Format
    • 00 00 00 03
      • org.apache.lucene.index.FieldsWriter.FORMAT_LUCENE_3_2_NUMERIC_FIELDS
  • FieldCount
    • 02
  • FieldNum
    • 00
  • Bits
    • 00
  • Value
    • 03 61 62 63
      • "abc"
  • FieldNum
    • 01
  • Bits
    • 01
      • low order bit is one for tokenized fields
  • Value
    • 03 61 20 62
      • "a b"
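
The walkthrough above can be replayed mechanically. The following is a minimal Python sketch, not Lucene's own reader: it assumes a single document whose stored fields are all plain strings with lengths counted in UTF-8 bytes, and the names `read_vint` and `parse_fdt` are made up for this illustration.

```python
import struct

# The 17 bytes of _0.fdt from the dump above.
FDT = bytes.fromhex("00000003 02 00 00 03616263 01 01 03612062")

def read_vint(buf, pos):
    """Decode a Lucene VInt: 7 data bits per byte, low-order group first."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:          # high bit clear: last byte of this VInt
            return value, pos
        shift += 7

def parse_fdt(buf):
    """Parse the format header and one document of string fields (assumption)."""
    (fmt,) = struct.unpack_from(">i", buf, 0)   # big-endian 32-bit Format
    pos = 4
    field_count, pos = read_vint(buf, pos)
    fields = []
    for _ in range(field_count):
        field_num, pos = read_vint(buf, pos)
        bits = buf[pos]                          # 0x01 = tokenized
        pos += 1
        length, pos = read_vint(buf, pos)
        value = buf[pos:pos + length].decode("utf-8")
        pos += length
        fields.append((field_num, bits, value))
    return fmt, fields

print(parse_fdt(FDT))
```

Binary, compressed, or numeric stored fields set other bits in Bits and are not handled by this sketch.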

_0.frq


0000000 00 01 01 01 01
0000005

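  • id field (term "abc")
  • TermFreq(DocDelta)
    • 00
      • doc 0. Only one byte is used (the next term's FreqDelta in _0.tis is 1), so no frequency is stored for this field and DocDelta is the plain document delta
  • includes field (terms "a", "b")
  • TermFreq(DocDelta)
    • 01
      • DocDelta/2 = 0 (doc 0); DocDelta is odd, so Freq = 1
  • TermFreq(DocDelta)
    • 01
  • text field (terms "a", "b")
  • TermFreq(DocDelta)
    • 01
  • TermFreq(DocDelta)
    • 01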
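
As a rough illustration of the DocDelta rule, here is a small Python sketch. It assumes that the id field omits term frequencies and the other fields do not, and it only handles the one-byte entries present in this dump; `decode_entry` and the `terms` table are hypothetical helpers, with the term order taken from _0.tis.

```python
# The 5 bytes of _0.frq from the dump above.
FRQ = bytes.fromhex("00 01 01 01 01")

def decode_entry(byte, omit_tf):
    """Decode a one-byte DocDelta for the first document of a term."""
    if omit_tf:
        return byte, None              # plain doc delta, no frequency stored
    doc_delta, freq_is_one = byte >> 1, byte & 1
    if not freq_is_one:
        # An even DocDelta means an explicit Freq VInt follows (not handled).
        raise ValueError("explicit Freq VInt expected")
    return doc_delta, 1

# (field, term, omit_tf) in term-dictionary order; omit_tf is an assumption.
terms = [
    ("id", "abc", True),
    ("includes", "a", False),
    ("includes", "b", False),
    ("text", "a", False),
    ("text", "b", False),
]

postings = [(f, t) + decode_entry(b, omit) for (f, t, omit), b in zip(terms, FRQ)]
for row in postings:
    print(row)
```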

_0.tis


0000000 ff ff ff fc 00 00 00 00 00 00 00 05 00 00 00 80
0000020 00 00 00 10 00 00 00 0a 00 03 61 62 63 00 01 00
0000040 00 01 00 01 01 01 00 00 01 62 01 01 01 01 00 01
0000060 61 02 01 01 01 00 01 62 02 01 01 01
0000074
  • TIVersion
    • ff ff ff fc
      • org.apache.lucene.index.TermInfosWriter.FORMAT_VERSION_UTF8_LENGTH_IN_BYTES
  • TermCount
    • 00 00 00 00 00 00 00 05
      • 5 terms
  • IndexInterval
    • 00 00 00 80
  • SkipInterval
    • 00 00 00 10
  • MaxSkipLevels
    • 00 00 00 0a
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 03 61 62 63 00 01 00 00
      • PrefixLength: 0, Suffix: 03 61 62 63 ("abc"), FieldNum: 0 (id), DocFreq: 1, FreqDelta: 0, ProxDelta: 0, no SkipDelta
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 01 00 01 01 01 00
      • PrefixLength: 1, Suffix: 00 (empty), so the term is "a" (the first byte of the previous term "abc"), FieldNum: 1 (includes), DocFreq: 1, FreqDelta: 1, ProxDelta: 0, no SkipDelta
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 01 62 01 01 01 01
      • PrefixLength: 0, Suffix: 01 62 ("b"), FieldNum: 1 (includes), DocFreq: 1, FreqDelta: 1, ProxDelta: 1, no SkipDelta
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 01 61 02 01 01 01
      • PrefixLength: 0, Suffix: 01 61 ("a"), FieldNum: 2 (text), DocFreq: 1, FreqDelta: 1, ProxDelta: 1, no SkipDelta
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 01 62 02 01 01 01
      • PrefixLength: 0, Suffix: 01 62 ("b"), FieldNum: 2 (text), DocFreq: 1, FreqDelta: 1, ProxDelta: 1, no SkipDelta
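
The prefix sharing and the running FreqDelta/ProxDelta sums are easier to see when decoded in one pass. This is a minimal Python sketch under the assumption that prefix and suffix lengths are counted in UTF-8 bytes (as the TIVersion constant name suggests); `read_vint` and `parse_tis` are names invented for the example.

```python
import struct

# The 60 bytes of _0.tis from the dump above.
TIS = bytes.fromhex(
    "fffffffc 0000000000000005 00000080 00000010 0000000a "
    "00 03 61 62 63 00 01 00 00 "
    "01 00 01 01 01 00 "
    "00 01 62 01 01 01 01 "
    "00 01 61 02 01 01 01 "
    "00 01 62 02 01 01 01"
)

def read_vint(buf, pos):
    """Decode a Lucene VInt/VLong: 7 data bits per byte, low-order first."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7

def parse_tis(buf):
    header = struct.unpack_from(">iqiii", buf, 0)   # version .. MaxSkipLevels
    _, term_count, _, skip_interval, _ = header
    pos = struct.calcsize(">iqiii")                 # 24 bytes
    terms, prev, freq_ptr, prox_ptr = [], b"", 0, 0
    for _ in range(term_count):
        prefix_len, pos = read_vint(buf, pos)
        suffix_len, pos = read_vint(buf, pos)
        text = prev[:prefix_len] + buf[pos:pos + suffix_len]
        pos += suffix_len
        field_num, pos = read_vint(buf, pos)
        doc_freq, pos = read_vint(buf, pos)
        freq_delta, pos = read_vint(buf, pos)
        prox_delta, pos = read_vint(buf, pos)
        if doc_freq >= skip_interval:               # SkipDelta only then
            _, pos = read_vint(buf, pos)
        freq_ptr += freq_delta                      # offset into _0.frq
        prox_ptr += prox_delta                      # offset into _0.prx
        terms.append((field_num, text.decode("utf-8"), doc_freq, freq_ptr, prox_ptr))
        prev = text
    return header, terms

header, terms = parse_tis(TIS)
for t in terms:
    print(t)
```

The accumulated frequency pointers come out as 0 through 4, one per byte of _0.frq.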

_0.tvd


0000000 00 00 00 04 01 01
0000006
  • TVDVersion
    • 00 00 00 04
    • TermVectorsReader.FORMAT_UTF8_LENGTH_IN_BYTES(4)
  • NumFields
    • 01
  • FieldNums
    • 01
      • field 1 (includes), which has termVectors="true" in the schema excerpt

_0.tvf


0000000 00 00 00 04 02 03 00 01 61 01 00 00 01 00 01 62
0000020 01 01 02 01
0000024
  • TVFVersion
    • 00 00 00 04
    • TermVectorsReader.FORMAT_UTF8_LENGTH_IN_BYTES(4)
  • NumTerms
    • 02
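  • Position/Offset
    • 03
      • bit 0x1: positions are stored, bit 0x2: offsets are stored
  • TermFreqs
    • TermText
      • 00 01 61
        • PrefixLength: 0, Suffix: 01 61 ("a")
    • TermFreq
      • 01
    • Positions
      • 00
    • Offsets<startOffset, endOffset>
      • 00 01
        • start 0, end 1
    • TermText
      • 00 01 62
        • PrefixLength: 0, Suffix: 01 62 ("b")
    • TermFreq
      • 01
    • Positions
      • 01
    • Offsets<startOffset, endOffset>
      • 02 01
        • read as start delta 2 and length 1, i.e. start 2, end 3, which matches "b" in "a b"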
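
The same bytes can be decoded with a short Python sketch. It rests on two assumptions inferred from this dump rather than taken from a specification: positions are delta-encoded per term, and each offset pair is stored as (start minus previous end within the term, end minus start). `read_vint` and `parse_tvf_field` are hypothetical names.

```python
# The 20 bytes of _0.tvf from the dump above.
TVF = bytes.fromhex("00000004 02 03 00 01 61 01 00 00 01 00 01 62 01 01 02 01")

def read_vint(buf, pos):
    """Decode a Lucene VInt: 7 data bits per byte, low-order group first."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7

def parse_tvf_field(buf, pos):
    """Parse one field's term vector starting at byte offset pos."""
    num_terms, pos = read_vint(buf, pos)
    flags = buf[pos]
    pos += 1
    has_positions, has_offsets = bool(flags & 0x1), bool(flags & 0x2)
    terms, prev = [], b""
    for _ in range(num_terms):
        prefix_len, pos = read_vint(buf, pos)
        suffix_len, pos = read_vint(buf, pos)
        text = prev[:prefix_len] + buf[pos:pos + suffix_len]
        pos += suffix_len
        freq, pos = read_vint(buf, pos)
        positions, last_pos = [], 0
        if has_positions:
            for _ in range(freq):
                delta, pos = read_vint(buf, pos)
                last_pos += delta                 # assumed delta encoding
                positions.append(last_pos)
        offsets, last_end = [], 0
        if has_offsets:
            for _ in range(freq):
                start_delta, pos = read_vint(buf, pos)
                length, pos = read_vint(buf, pos)
                start = last_end + start_delta    # assumed encoding
                last_end = start + length
                offsets.append((start, last_end))
        terms.append((text.decode("utf-8"), freq, positions, offsets))
        prev = text
    return terms

# Offset 4 skips the TVFVersion header (it is also the FieldPosition in _0.tvx).
vector = parse_tvf_field(TVF, 4)
print(vector)
```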

_0.tvx



0000000 00 00 00 04 00 00 00 00 00 00 00 04 00 00 00 00
0000020 00 00 00 04
0000024
  • TVXVersion
    • 00 00 00 04
    • TermVectorsReader.FORMAT_UTF8_LENGTH_IN_BYTES(4)
  • DocumentPosition
    • 00 00 00 00 00 00 00 04
  • FieldPosition
    • 00 00 00 00 00 00 00 04
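
To show how the two pointers are used, the sketch below reads them from _0.tvx and follows the first one into _0.tvd. It assumes 16 bytes per document after the 4-byte version header, and that NumFields and FieldNums each fit in one byte as they do here; `doc_pointers` is a name made up for the example.

```python
import struct

# Bytes of _0.tvx and _0.tvd from the dumps above.
TVX = bytes.fromhex("00000004 0000000000000004 0000000000000004")
TVD = bytes.fromhex("00000004 01 01")

def doc_pointers(tvx, doc_id):
    """Return (position in .tvd, position in .tvf) for one document."""
    # Two big-endian 64-bit values per document, after the version header.
    return struct.unpack_from(">qq", tvx, 4 + 16 * doc_id)

(version,) = struct.unpack_from(">i", TVX, 0)
tvd_pos, tvf_pos = doc_pointers(TVX, 0)

# Follow DocumentPosition into _0.tvd: NumFields, then that many FieldNums.
num_fields = TVD[tvd_pos]
field_nums = list(TVD[tvd_pos + 1 : tvd_pos + 1 + num_fields])

print(version, tvd_pos, tvf_pos, num_fields, field_nums)
```

Both pointers are 4 because document 0's data starts right after the 4-byte version header in each file.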
