This page examines the format of the index files that Solr 3.5 creates when it is started from its example directory with:

% java -jar start.jar


Primitive Types


Primitive values are read (deserialized) and written (serialized) by the following classes, respectively:
  • org.apache.lucene.store.DataInput
  • org.apache.lucene.store.DataOutput
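
As a rough illustration of the encodings these classes implement, the sketch below reads Int32, Int64, VInt and String values from a byte stream. The function names are invented for this page and are not Lucene APIs. The String layout (a VInt byte length followed by UTF-8 bytes) is an assumption based on the dumps shown further down.

```python
import io
import struct


def read_int32(f):
    """Int32: 4 bytes, big-endian, signed."""
    return struct.unpack(">i", f.read(4))[0]


def read_int64(f):
    """Int64: 8 bytes, big-endian, signed."""
    return struct.unpack(">q", f.read(8))[0]


def read_vint(f):
    """VInt: 7 data bits per byte, low-order group first.

    A set high bit means that another byte follows.
    """
    value, shift = 0, 0
    while True:
        byte = f.read(1)[0]
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    # Wrap into the signed 32-bit range, as a Java int would.
    return value - (1 << 32) if value >= (1 << 31) else value


def read_string(f):
    """String: VInt byte length, then that many UTF-8 bytes (assumed layout)."""
    return f.read(read_vint(f)).decode("utf-8")


print(read_int32(io.BytesIO(bytes.fromhex("fffffff5"))))           # -11
print(read_vint(io.BytesIO(bytes.fromhex("fdffffff0f"))))          # -3
print(read_string(io.BytesIO(bytes.fromhex("0774657374646f63"))))  # testdoc
```

The three sample inputs are byte sequences that appear in the dumps on this page (the segments_N format, the .fnm version and the stored "id" value).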

When nothing has been indexed

Files

  • segments.gen
  • segments_1

segments.gen


This file simply contains an Int32 version header (SegmentInfos.FORMAT_LOCKLESS = -2), followed by the generation recorded as Int64, written twice.

0000000 ff ff ff fe 00 00 00 00 00 00 00 01 00 00 00 00
0000020 00 00 00 01
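
A minimal sketch of decoding these 20 bytes follows. The function name is hypothetical, and treating the file as valid only when the two generation copies agree is an assumption about how a reader would use the duplicated value.

```python
import struct

# Bytes copied from the segments.gen dump above (offsets removed).
SEGMENTS_GEN = bytes.fromhex("fffffffe 0000000000000001 0000000000000001")


def parse_segments_gen(data):
    """Return (format, generation); raise if the two generation copies differ."""
    # Int32 header followed by two Int64 values, all big-endian.
    fmt, gen0, gen1 = struct.unpack(">iqq", data)
    if gen0 != gen1:
        raise ValueError("generation copies differ: %d != %d" % (gen0, gen1))
    return fmt, gen0


print(parse_segments_gen(SEGMENTS_GEN))  # (-2, 1)
```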

segments_1


0000000 ff ff ff f5 00 00 01 34 8c da c6 5d 00 00 00 00
0000020 00 00 00 00 00 00 00 00 00 00 00 00 b3 01 9e 54
0000040
  • Format
    • ff ff ff f5(-11)
      • org.apache.lucene.index.SegmentInfos.FORMAT_3_1
  • Version
    • 00 00 01 34 8c da c6 5d(1325213075037)
      • Unix time in milliseconds
  • NameCounter
    • 00 00 00 00
  • SegCount
    • 00 00 00 00
  • CommitUserData
    • 00 00 00 00
      • Map<String,String> --> Count <String,String>^Count, where Count is an Int32
  • Checksum
    • 00 00 00 00 b3 01 9e 54
      • Int64 (8 bytes), the last 8 bytes of the file
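
The breakdown above can be checked with a short sketch. The fixed layout used here only holds because SegCount and the CommitUserData count are both 0 in this file; the field names are labels chosen for this page.

```python
import struct
from datetime import datetime, timezone

# Bytes copied from the segments_1 dump above (offsets removed).
SEGMENTS_1 = bytes.fromhex(
    "fffffff5 000001348cdac65d 00000000"  # Format, Version, NameCounter
    "00000000 00000000 00000000b3019e54"  # SegCount, CommitUserData count, Checksum
)

FIELDS = ("format", "version", "name_counter",
          "seg_count", "user_data_count", "checksum")

# Int32, Int64, Int32, Int32, Int32, Int64, all big-endian (32 bytes in total).
values = dict(zip(FIELDS, struct.unpack(">iqiiiq", SEGMENTS_1)))

# Version is a Unix time in milliseconds; convert it to a UTC timestamp.
created = datetime.fromtimestamp(values["version"] // 1000, tz=timezone.utc)

print(values)
print(created.isoformat())
```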



Update one document and commit


curl 'http://localhost:8983/solr/update?commit=true' -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'

Reference: schema.xml


<field name="id" type="string" indexed="true" stored="true" required="true" />

Files

  • segments.gen
  • segments_2
  • _0.fdt
  • _0.fdx
  • _0.fnm
  • _0.frq
  • _0.nrm
  • _0.tii
  • _0.tis

segments.gen


0000000 ff ff ff fe 00 00 00 00 00 00 00 02 00 00 00 00
0000020 00 00 00 02

segments_2


0000000 ff ff ff f5 00 00 01 34 8c da c6 5f 00 00 00 01
0000020 00 00 00 01 03 33 2e 35 02 5f 30 00 00 00 01 ff
0000040 ff ff ff ff ff ff ff ff ff ff ff 01 ff ff ff ff
0000060 ff 00 00 00 00 00 00 00 00 07 02 6f 73 05 4c 69
0000100 6e 75 78 0a 6f 73 2e 76 65 72 73 69 6f 6e 0f 33
0000120 2e 32 2e 30 2d 72 63 34 2d 61 6d 64 36 34 06 73
0000140 6f 75 72 63 65 05 66 6c 75 73 68 0e 6c 75 63 65
0000160 6e 65 2e 76 65 72 73 69 6f 6e 2b 33 2e 35 2e 30
0000200 20 31 32 30 34 39 38 38 20 2d 20 73 69 6d 6f 6e
0000220 20 2d 20 32 30 31 31 2d 31 31 2d 32 32 20 31 34
0000240 3a 34 36 3a 35 31 07 6f 73 2e 61 72 63 68 05 61
0000260 6d 64 36 34 0c 6a 61 76 61 2e 76 65 72 73 69 6f
0000300 6e 11 31 2e 37 2e 30 5f 31 34 37 2d 69 63 65 64
0000320 74 65 61 0b 6a 61 76 61 2e 76 65 6e 64 6f 72 12
0000340 4f 72 61 63 6c 65 20 43 6f 72 70 6f 72 61 74 69
0000360 6f 6e 00 00 00 00 00 00 00 00 00 45 07 16 63
0000377
  • Format
    • ff ff ff f5(-11)
      • org.apache.lucene.index.SegmentInfos.FORMAT_3_1
  • Version
    • 00 00 01 34 8c da c6 5f
  • NameCounter
    • 00 00 00 01
  • SegCount
    • 00 00 00 01
  • SegVersion
    • 03 33 2e 35
      • "3.5"
  • SegName
    • 02 5f 30
      • "_0"
  • SegSize
    • 00 00 00 01
      • one document
  • DelGen
    • ff ff ff ff ff ff ff ff
      • DelGen is the generation count of the separate deletes file. If this is -1, there are no separate deletes.
  • DocStoreOffset
    • ff ff ff ff
      • If DocStoreOffset is -1, this segment has its own doc store (stored fields values and term vectors) files and DocStoreSegment and DocStoreIsCompoundFile are not stored.
  • HasSingleNormFile
    • 01
      • If HasSingleNormFile is 1, then the field norms are written as a single joined file (with extension .nrm)
  • NumField
    • ff ff ff ff
      • NumField is the size of the array for NormGen, or -1 if there are no NormGens stored.
  • IsCompoundFile
    • ff
      • If this is -1, the segment is not a compound file.
  • DeletionCount
    • 00 00 00 00
  • HasProx
    • 00
      • HasProx is 1 if any fields in this segment have position data (IndexOptions.DOCS_AND_FREQS_AND_POSITIONS); else, it's 0.
  • Diagnostics
    • 00 00 00 07 02 6f 73 05 4c 69 6e 75 78 0a 6f 73 2e 76 65 72 73 69 6f 6e 0f 33 2e 32 2e 30 2d 72 63 34 2d 61 6d 64 36 34 ... 6f 6e
      • 7 key/value pairs: OS name, OS version, and so on
  • HasVectors
    • 00
      • HasVectors is 1 if this segment stores term vectors, else it's 0.
  • CommitUserData
    • 00 00 00 00
      • Map<String,String> --> Count <String,String>^Count, where Count is an Int32
  • Checksum
    • 00 00 00 00 45 07 16 63
      • Int64 (8 bytes), the last 8 bytes of the file
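
To cross-check the field boundaries listed above, here is a sketch that walks the same bytes in the same order. It is not Lucene code: the helper names are invented, and it only handles the case seen in this dump (DocStoreOffset = -1 and NumField = -1), raising an error otherwise.

```python
import io
import struct

# Bytes copied from the segments_2 dump above (offsets removed).
SEGMENTS_2 = bytes.fromhex("""
ff ff ff f5 00 00 01 34 8c da c6 5f 00 00 00 01
00 00 00 01 03 33 2e 35 02 5f 30 00 00 00 01 ff
ff ff ff ff ff ff ff ff ff ff ff 01 ff ff ff ff
ff 00 00 00 00 00 00 00 00 07 02 6f 73 05 4c 69
6e 75 78 0a 6f 73 2e 76 65 72 73 69 6f 6e 0f 33
2e 32 2e 30 2d 72 63 34 2d 61 6d 64 36 34 06 73
6f 75 72 63 65 05 66 6c 75 73 68 0e 6c 75 63 65
6e 65 2e 76 65 72 73 69 6f 6e 2b 33 2e 35 2e 30
20 31 32 30 34 39 38 38 20 2d 20 73 69 6d 6f 6e
20 2d 20 32 30 31 31 2d 31 31 2d 32 32 20 31 34
3a 34 36 3a 35 31 07 6f 73 2e 61 72 63 68 05 61
6d 64 36 34 0c 6a 61 76 61 2e 76 65 72 73 69 6f
6e 11 31 2e 37 2e 30 5f 31 34 37 2d 69 63 65 64
74 65 61 0b 6a 61 76 61 2e 76 65 6e 64 6f 72 12
4f 72 61 63 6c 65 20 43 6f 72 70 6f 72 61 74 69
6f 6e 00 00 00 00 00 00 00 00 00 45 07 16 63
""")


def read_fixed(f, code):
    """Read one big-endian value; code is a struct code (b=Int8, i=Int32, q=Int64)."""
    return struct.unpack(">" + code, f.read(struct.calcsize(">" + code)))[0]


def read_vint(f):
    """VInt: 7 bits per byte, low-order group first (non-negative values only)."""
    value, shift = 0, 0
    while True:
        byte = f.read(1)[0]
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return value
        shift += 7


def read_string(f):
    return f.read(read_vint(f)).decode("utf-8")


def read_map(f):
    """Map<String,String>: Int32 count, then that many key/value String pairs."""
    result = {}
    for _ in range(read_fixed(f, "i")):
        key = read_string(f)
        result[key] = read_string(f)
    return result


def parse_segments(data):
    f = io.BytesIO(data)
    info = {}
    info["format"] = read_fixed(f, "i")
    info["version"] = read_fixed(f, "q")
    info["name_counter"] = read_fixed(f, "i")
    info["segments"] = []
    for _ in range(read_fixed(f, "i")):  # SegCount
        seg = {}
        seg["seg_version"] = read_string(f)
        seg["name"] = read_string(f)
        seg["size"] = read_fixed(f, "i")
        seg["del_gen"] = read_fixed(f, "q")
        seg["doc_store_offset"] = read_fixed(f, "i")
        seg["has_single_norm_file"] = read_fixed(f, "b")
        seg["num_field"] = read_fixed(f, "i")
        if seg["doc_store_offset"] != -1 or seg["num_field"] != -1:
            raise NotImplementedError("shared doc stores and NormGen are not handled")
        seg["is_compound_file"] = read_fixed(f, "b")
        seg["deletion_count"] = read_fixed(f, "i")
        seg["has_prox"] = read_fixed(f, "b")
        seg["diagnostics"] = read_map(f)
        seg["has_vectors"] = read_fixed(f, "b")
        info["segments"].append(seg)
    info["commit_user_data"] = read_map(f)
    info["checksum"] = read_fixed(f, "q")
    info["trailing_bytes"] = len(f.read())  # 0 if every byte was consumed
    return info


info = parse_segments(SEGMENTS_2)
print(info["segments"][0]["name"], info["segments"][0]["diagnostics"])
```

Reaching the end of the file with no bytes left over is a useful sanity check that the field widths above are right.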

_0.fdt


This contains the stored fields of each document.

0000000 00 00 00 03 01 00 00 07 74 65 73 74 64 6f 63
0000017
  • Format
    • 00 00 00 03
      • org.apache.lucene.index.FieldsWriter.FORMAT_LUCENE_3_2_NUMERIC_FIELDS
  • FieldCount
    • 01
  • FieldNum
    • 00
  • Bits
    • 00
  • Value
    • 07 74 65 73 74 64 6f 63
      • "testdoc"
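
The layout above can be read back with a small sketch. It assumes Bits = 0 (a plain, uncompressed string value) as in this dump and rejects anything else; the function names are invented for illustration.

```python
import io
import struct

# Bytes copied from the _0.fdt dump above (offsets removed).
FDT = bytes.fromhex("00 00 00 03 01 00 00 07 74 65 73 74 64 6f 63")


def read_vint(f):
    """VInt: 7 bits per byte, low-order group first (non-negative values only)."""
    value, shift = 0, 0
    while True:
        byte = f.read(1)[0]
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return value
        shift += 7


def read_document(f):
    """Read the stored fields of one document at the current position."""
    fields = []
    for _ in range(read_vint(f)):  # FieldCount
        field_num = read_vint(f)   # FieldNum
        bits = f.read(1)[0]        # Bits
        if bits != 0:
            raise NotImplementedError("only plain string values (Bits == 0) are handled")
        length = read_vint(f)      # Value: VInt byte length + UTF-8 bytes
        fields.append((field_num, f.read(length).decode("utf-8")))
    return fields


f = io.BytesIO(FDT)
fmt = struct.unpack(">i", f.read(4))[0]  # Format (Int32)
doc = read_document(f)
print(fmt, doc)  # 3 [(0, 'testdoc')]
```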

_0.fdx


0000000 00 00 00 03 00 00 00 00 00 00 00 04
0000014
  • Format
    • 00 00 00 03
      • org.apache.lucene.index.FieldsWriter.FORMAT_LUCENE_3_2_NUMERIC_FIELDS
  • FieldValuesPosition
    • 00 00 00 00 00 00 00 04
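
A sketch of reading this index: after the 4-byte Format, each document contributes one 8-byte FieldValuesPosition, which is a byte offset into _0.fdt. Deriving the document count from the file length is an assumption made for this example.

```python
import struct

# Bytes copied from the _0.fdx dump above (offsets removed).
FDX = bytes.fromhex("00 00 00 03 00 00 00 00 00 00 00 04")


def parse_fdx(data):
    """Return (format, [position of each document's stored fields in .fdt])."""
    (fmt,) = struct.unpack_from(">i", data, 0)
    doc_count = (len(data) - 4) // 8  # one 8-byte entry per document
    positions = list(struct.unpack_from(">%dQ" % doc_count, data, 4))
    return fmt, positions


print(parse_fdx(FDX))  # (3, [4])
```

Position 4 is the first byte after the Format header of _0.fdt, which is where the FieldCount of the only document sits.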

_0.fnm


0000000 fd ff ff ff 0f 01 02 69 64 51
0000012
  • FNMVersion
    • fd ff ff ff 0f
      • -3, encoded as a VInt
  • FieldsCount
    • 01
  • FieldName
    • 02 69 64
      • "id"
  • FieldBits
    • 51 (01010001)
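
The sketch below decodes the file and spells out the FieldBits flags. The mask-to-meaning table follows the FieldBits description in the Lucene 3.x file format documentation as best I recall it, so treat the labels as an assumption to verify; the function names are invented.

```python
import io

# Bytes copied from the _0.fnm dump above (offsets removed).
FNM = bytes.fromhex("fd ff ff ff 0f 01 02 69 64 51")

# (mask, meaning) pairs for FieldBits; assumed from the 3.x format documentation.
FIELD_BITS = [
    (0x01, "indexed"),
    (0x02, "term vector stored"),
    (0x04, "term vector positions stored"),
    (0x08, "term vector offsets stored"),
    (0x10, "norms omitted"),
    (0x20, "payloads stored"),
    (0x40, "term freqs and positions omitted"),
    (0x80, "positions omitted"),
]


def read_vint(f):
    """VInt: 7 bits per byte, low-order group first, wrapped to signed 32 bits."""
    value, shift = 0, 0
    while True:
        byte = f.read(1)[0]
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return value - (1 << 32) if value >= (1 << 31) else value


def parse_fnm(data):
    f = io.BytesIO(data)
    version = read_vint(f)                 # FNMVersion
    fields = []
    for _ in range(read_vint(f)):          # FieldsCount
        name = f.read(read_vint(f)).decode("utf-8")
        bits = f.read(1)[0]
        flags = [label for mask, label in FIELD_BITS if bits & mask]
        fields.append((name, bits, flags))
    return version, fields


version, fields = parse_fnm(FNM)
print(version, fields)
```

Under that table, 0x51 = 0x40 + 0x10 + 0x01, which matches a Solr string field: indexed, with norms, term frequencies and positions omitted. That is also consistent with HasProx = 0 in segments_2 and with the absence of a .prx file.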

_0.frq


0000000 00
0000001

  • TermFreq(DocDelta)
    • 00

_0.nrm


0000000 4e 52 4d ff
0000004
  • NormHeader
    • 4e 52 4d ff
      • 'N','R','M',Version
  • Version
    • ff
      • currently -1.

_0.tii


0000000 ff ff ff fc 00 00 00 00 00 00 00 01 00 00 00 80
0000020 00 00 00 10 00 00 00 0a 00 00 ff ff ff ff 0f 00
0000040 00 00 18
0000043
  • TIVersion
    • ff ff ff fc
      • org.apache.lucene.index.TermInfosWriter.FORMAT_VERSION_UTF8_LENGTH_IN_BYTES
  • IndexTermCount
    • 00 00 00 00 00 00 00 01
  • IndexInterval
    • 00 00 00 80
  • SkipInterval
    • 00 00 00 10
  • MaxSkipLevels
    • 00 00 00 0a
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 00 ff ff ff ff 0f 00 00 00
      • PrefixLength: 00, Suffix: 00 (empty string), FieldNum: ff ff ff ff 0f (-1), DocFreq: 00, FreqDelta: 00, ProxDelta: 00, no SkipDelta
  • IndexDelta
    • 18
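      • 0x18 = 24: the first TermInfo in .tis starts at byte offset 24, immediately after the 24-byte header (TIVersion 4 + count 8 + IndexInterval 4 + SkipInterval 4 + MaxSkipLevels 4).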
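
A sketch that decodes the .tii bytes in the order listed above. The rule that SkipDelta is present only when DocFreq >= SkipInterval is an assumption taken from the bracketed [SkipDelta] in the layout; all names are invented for this page.

```python
import io
import struct

# Bytes copied from the _0.tii dump above (offsets removed).
TII = bytes.fromhex(
    "ff ff ff fc 00 00 00 00 00 00 00 01 00 00 00 80"
    "00 00 00 10 00 00 00 0a 00 00 ff ff ff ff 0f 00"
    "00 00 18"
)


def read_vnum(f, signed32=False):
    """VInt/VLong: 7 bits per byte, low-order group first."""
    value, shift = 0, 0
    while True:
        byte = f.read(1)[0]
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    if signed32 and value >= (1 << 31):
        value -= 1 << 32  # wrap like a Java int, so ff ff ff ff 0f reads as -1
    return value


def parse_tii(data):
    f = io.BytesIO(data)
    names = ("version", "index_term_count", "index_interval",
             "skip_interval", "max_skip_levels")
    header = dict(zip(names, struct.unpack(">iqiii", f.read(24))))
    entries = []
    for _ in range(header["index_term_count"]):
        entry = {}
        entry["prefix_length"] = read_vnum(f)
        entry["suffix"] = f.read(read_vnum(f)).decode("utf-8")
        entry["field_num"] = read_vnum(f, signed32=True)
        entry["doc_freq"] = read_vnum(f)
        entry["freq_delta"] = read_vnum(f)
        entry["prox_delta"] = read_vnum(f)
        if entry["doc_freq"] >= header["skip_interval"]:  # assumed condition
            entry["skip_delta"] = read_vnum(f)
        entry["index_delta"] = read_vnum(f)
        entries.append(entry)
    return header, entries


header, entries = parse_tii(TII)
print(header)
print(entries)
```

The single entry is a placeholder term (empty text, field -1) whose IndexDelta of 24 points just past the .tis header.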

_0.tis



0000000 ff ff ff fc 00 00 00 00 00 00 00 01 00 00 00 80
0000020 00 00 00 10 00 00 00 0a 00 07 74 65 73 74 64 6f
0000040 63 00 01 00 00
0000045
  • TIVersion
    • ff ff ff fc
      • org.apache.lucene.index.TermInfosWriter.FORMAT_VERSION_UTF8_LENGTH_IN_BYTES
  • TermCount
    • 00 00 00 00 00 00 00 01
  • IndexInterval
    • 00 00 00 80
  • SkipInterval
    • 00 00 00 10
  • MaxSkipLevels (the header values up to here are identical to those in .tii)
    • 00 00 00 0a
  • TermInfo(Term<PrefixLength, Suffix, FieldNum>, DocFreq, FreqDelta, ProxDelta, [SkipDelta])
    • 00 07 74 65 73 74 64 6f 63 00 01 00 00
      • PrefixLength: 0, Suffix: 07 74 65 73 74 64 6f 63 ("testdoc"), FieldNum: 0, DocFreq: 1, FreqDelta: 0, ProxDelta: 0, no SkipDelta
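
The same kind of sketch for .tis, rebuilding each term from the shared prefix of the previous term plus the stored suffix. As before, the SkipDelta condition (DocFreq >= SkipInterval) and measuring PrefixLength in bytes are assumptions, and the names are invented.

```python
import io
import struct

# Bytes copied from the _0.tis dump above (offsets removed).
TIS = bytes.fromhex(
    "ff ff ff fc 00 00 00 00 00 00 00 01 00 00 00 80"
    "00 00 00 10 00 00 00 0a 00 07 74 65 73 74 64 6f"
    "63 00 01 00 00"
)


def read_vnum(f):
    """VInt/VLong: 7 bits per byte, low-order group first (non-negative only)."""
    value, shift = 0, 0
    while True:
        byte = f.read(1)[0]
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return value
        shift += 7


def parse_tis(data):
    f = io.BytesIO(data)
    version, term_count, index_interval, skip_interval, max_skip_levels = \
        struct.unpack(">iqiii", f.read(24))
    terms = []
    previous = b""
    for _ in range(term_count):
        prefix_length = read_vnum(f)
        suffix = f.read(read_vnum(f))
        term = previous[:prefix_length] + suffix  # shared prefix + suffix bytes
        field_num = read_vnum(f)
        doc_freq = read_vnum(f)
        freq_delta = read_vnum(f)
        prox_delta = read_vnum(f)
        if doc_freq >= skip_interval:  # assumed condition for [SkipDelta]
            read_vnum(f)
        terms.append((field_num, term.decode("utf-8"), doc_freq, freq_delta, prox_delta))
        previous = term
    return version, terms, len(f.read())


version, terms, leftover = parse_tis(TIS)
print(version, terms)  # -4 [(0, 'testdoc', 1, 0, 0)]
```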
