kafka.log

package log

Type Members

  1. case class CleanerConfig(numThreads: Int = 1, dedupeBufferSize: Long = 4*1024*1024L, dedupeBufferLoadFactor: Double = 0.9d, ioBufferSize: Int = 1024*1024, maxMessageSize: Int = 32*1024*1024, maxIoBytesPerSecond: Double = Double.MaxValue, backOffMs: Long = 15 * 1000, enableCleaner: Boolean = true, hashAlgorithm: String = "MD5") extends Product with Serializable

    Configuration parameters for the log cleaner

    numThreads

    The number of cleaner threads to run

    dedupeBufferSize

    The total memory used for log deduplication

    dedupeBufferLoadFactor

    The maximum percent full for the deduplication buffer

    maxMessageSize

    The maximum size of a message that can appear in the log

    maxIoBytesPerSecond

    The maximum read and write I/O that all cleaner threads are allowed to do

    backOffMs

    The amount of time to wait before rechecking if no logs are eligible for cleaning

    enableCleaner

    Allows completely disabling the log cleaner

    hashAlgorithm

    The hash algorithm to use in key comparison.
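
    Because every parameter of CleanerConfig has a default value, a configuration can be built by naming only the fields to override. A minimal sketch; the values are illustrative, not recommendations:

    ```scala
    import kafka.log.CleanerConfig

    // Only the named fields are overridden; everything else keeps the default
    // shown in the signature above. The values are arbitrary examples.
    val cleanerConfig = CleanerConfig(
      numThreads          = 2,                    // run two cleaner threads
      dedupeBufferSize    = 128 * 1024 * 1024L,   // 128 MB deduplication buffer
      maxIoBytesPerSecond = 50d * 1024 * 1024,    // throttle cleaner I/O to ~50 MB/s
      backOffMs           = 30 * 1000             // recheck for cleanable logs every 30 s
    )
    ```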

  2. class FileMessageSet extends MessageSet with Logging

    An on-disk message set. An optional start and end position can be applied to the message set which will allow slicing a subset of the file.

    Annotations
    @nonthreadsafe()
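
    The slicing behaviour amounts to exposing only a byte range [start, end) of the underlying file. A library-agnostic sketch of that idea using java.nio (this is not FileMessageSet's own API, only an illustration of the concept):

    ```scala
    import java.io.File
    import java.nio.ByteBuffer
    import java.nio.channels.FileChannel
    import java.nio.file.StandardOpenOption

    // Read only the bytes in [start, end) of a log file, i.e. a "slice"
    // of the on-disk message set.
    def readSlice(file: File, start: Long, end: Long): ByteBuffer = {
      val channel = FileChannel.open(file.toPath, StandardOpenOption.READ)
      try {
        val buffer = ByteBuffer.allocate((end - start).toInt)
        channel.read(buffer, start)   // positional read beginning at `start`
        buffer.flip()
        buffer
      } finally channel.close()
    }
    ```
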
  3. class Log extends Logging with KafkaMetricsGroup

    An append-only log for storing messages.

    The log is a sequence of LogSegments, each with a base offset denoting the first message in the segment.

    New log segments are created according to a configurable policy that controls the size in bytes or time interval for a given segment.

    Annotations
    @threadsafe()
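
    The structure described above implies a common lookup: given a target offset, find the segment whose base offset is the greatest one less than or equal to it. A hedged sketch of that lookup over a sorted map of base offsets (the names are illustrative, not the Log class's internals):

    ```scala
    import java.util.concurrent.ConcurrentSkipListMap

    // Toy stand-in for a segment: only its base offset matters for the lookup.
    final case class SegmentRef(baseOffset: Long)

    val segments = new ConcurrentSkipListMap[java.lang.Long, SegmentRef]()
    List(0L, 350L, 712L).foreach(base => segments.put(base, SegmentRef(base)))

    // The segment that may contain `offset` is the one with the greatest
    // base offset that is <= offset.
    def segmentFor(offset: Long): Option[SegmentRef] =
      Option(segments.floorEntry(offset)).map(_.getValue)

    segmentFor(412L)   // Some(SegmentRef(350)): offset 412 falls in the segment starting at 350
    ```
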
  4. class LogCleaner extends Logging with KafkaMetricsGroup

    The cleaner is responsible for removing obsolete records from logs which have the dedupe retention strategy. A message with key K and offset O is obsolete if there exists a message with key K and offset O' such that O < O'.

    Each log can be thought of being split into two sections of segments: a "clean" section which has previously been cleaned followed by a "dirty" section that has not yet been cleaned. The active log segment is always excluded from cleaning.

    The cleaning is carried out by a pool of background threads. Each thread chooses the dirtiest log that has the "dedupe" retention policy and cleans that. The dirtiness of the log is guessed by taking the ratio of bytes in the dirty section of the log to the total bytes in the log.

    To clean a log the cleaner first builds a mapping of key=>last_offset for the dirty section of the log. See kafka.log.OffsetMap for details of the implementation of the mapping.

    Once the key=>offset map is built, the log is cleaned by recopying each log segment but omitting any key that appears in the offset map with a higher offset than what is found in the segment (i.e. messages with a key that appears in the dirty section of the log).

    To avoid segments shrinking to very small sizes with repeated cleanings, successive segments are merged during a cleaning when their combined log and index sizes are less than the maximum log and index size prior to the clean beginning.

    Cleaned segments are swapped into the log as they become available.

    One nuance that the cleaner must handle is log truncation. If a log is truncated while it is being cleaned the cleaning of that log is aborted.

    Messages with null payload are treated as deletes for the purpose of log compaction. This means that they receive special treatment by the cleaner. The cleaner will only retain delete records for a period of time to avoid accumulating space indefinitely. This period of time is configurable on a per-topic basis and is measured from the time the segment enters the clean portion of the log (at which point any prior message with that key has been removed). Delete markers in the clean section of the log that are older than this time will not be retained when log segments are being recopied as part of cleaning.
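
    The selection and cleaning steps above can be made concrete with a small sketch: the dirty ratio that drives log selection, and a two-pass compaction that first builds the key=>last_offset map over the dirty section and then recopies records, dropping any record whose key reappears at a higher offset. The record shape and names are illustrative only:

    ```scala
    // Illustrative record shape; value == None models a delete marker (null payload).
    final case class Record(offset: Long, key: String, value: Option[String])

    // Dirtiness estimate used to pick the next log to clean:
    // bytes in the not-yet-cleaned section over total bytes.
    def dirtyRatio(dirtyBytes: Long, totalBytes: Long): Double =
      if (totalBytes == 0) 0.0 else dirtyBytes.toDouble / totalBytes

    // Pass 1: map each key to the highest offset at which it appears in the dirty section.
    def buildOffsetMap(dirty: Seq[Record]): Map[String, Long] =
      dirty.foldLeft(Map.empty[String, Long])((m, r) => m.updated(r.key, r.offset))

    // Pass 2: recopy a segment, keeping a record only if no later offset exists for its key.
    def cleanSegment(segment: Seq[Record], offsetMap: Map[String, Long]): Seq[Record] =
      segment.filter(r => offsetMap.getOrElse(r.key, Long.MinValue) <= r.offset)

    val dirty     = Seq(Record(5, "k1", Some("a")), Record(6, "k2", Some("b")), Record(7, "k1", None))
    val offsetMap = buildOffsetMap(dirty)    // Map(k1 -> 7, k2 -> 6)
    cleanSegment(dirty, offsetMap)           // keeps offsets 6 and 7; the record at offset 5 is obsolete
    ```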

  5. case class LogConfig(segmentSize: Int = Defaults.SegmentSize, segmentMs: Long = Defaults.SegmentMs, segmentJitterMs: Long = Defaults.SegmentJitterMs, flushInterval: Long = Defaults.FlushInterval, flushMs: Long = Defaults.FlushMs, retentionSize: Long = Defaults.RetentionSize, retentionMs: Long = Defaults.RetentionMs, maxMessageSize: Int = Defaults.MaxMessageSize, maxIndexSize: Int = Defaults.MaxIndexSize, indexInterval: Int = Defaults.IndexInterval, fileDeleteDelayMs: Long = Defaults.FileDeleteDelayMs, deleteRetentionMs: Long = Defaults.DeleteRetentionMs, minCleanableRatio: Double = Defaults.MinCleanableDirtyRatio, compact: Boolean = Defaults.Compact, uncleanLeaderElectionEnable: Boolean = ..., minInSyncReplicas: Int = Defaults.MinInSyncReplicas) extends Product with Serializable

    Configuration settings for a log

    segmentSize

    The soft maximum for the size of a segment file in the log

    segmentMs

    The soft maximum on the amount of time before a new log segment is rolled

    segmentJitterMs

    The maximum random jitter subtracted from segmentMs to avoid thundering herds of segment rolling

    flushInterval

    The number of messages that can be written to the log before a flush is forced

    flushMs

    The amount of time the log can have dirty data before a flush is forced

    retentionSize

    The approximate total number of bytes this log can use

    retentionMs

    The approximate maximum age of the last segment that is retained

    maxIndexSize

    The maximum size of an index file

    indexInterval

    The approximate number of bytes between index entries

    fileDeleteDelayMs

    The time to wait before deleting a file from the filesystem

    deleteRetentionMs

    The time to retain delete markers in the log. Only applicable for logs that are being compacted.

    minCleanableRatio

    The ratio of bytes that are available for cleaning to the bytes already cleaned

    compact

    Should old segments in this log be deleted or deduplicated?

    uncleanLeaderElectionEnable

    Indicates whether unclean leader election is enabled; actually a controller-level property but included here for topic-specific configuration validation purposes

    minInSyncReplicas

    If the number of in-sync replicas drops below this number, we stop accepting writes with -1 (or all) required acks
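
    As with CleanerConfig, every LogConfig parameter has a default, so a per-topic configuration can be expressed by overriding only a few named fields. A sketch of a compacted-topic configuration; the values are arbitrary examples:

    ```scala
    import kafka.log.LogConfig

    // Unnamed parameters keep the defaults from kafka.log.Defaults.
    val compactedTopicConfig = LogConfig(
      segmentSize       = 256 * 1024 * 1024,      // roll segments at roughly 256 MB
      compact           = true,                   // deduplicate old segments instead of deleting them
      deleteRetentionMs = 24L * 60 * 60 * 1000,   // keep delete markers for about a day after compaction
      minCleanableRatio = 0.3                     // eligible for cleaning once ~30% of the log is dirty
    )
    ```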

  6. class LogManager extends Logging

    The entry point to the kafka log management subsystem. The log manager is responsible for log creation, retrieval, and cleaning. All read and write operations are delegated to the individual log instances.

    The log manager maintains logs in one or more directories. New logs are created in the data directory with the fewest logs. No attempt is made to move partitions after the fact or balance based on size or I/O rate.

    A background thread handles log retention by periodically truncating excess log segments.

    Annotations
    @threadsafe()
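
    The "data directory with the fewest logs" placement policy is simple to state as code. A hedged sketch, counting directory entries as a stand-in for counting logs (directory names are hypothetical):

    ```scala
    import java.io.File

    // Pick the data directory that currently holds the fewest entries.
    def pickDirectoryForNewLog(dataDirs: Seq[File]): File =
      dataDirs.minBy(dir => Option(dir.listFiles()).map(_.length).getOrElse(0))

    // pickDirectoryForNewLog(Seq(new File("/data/kafka-1"), new File("/data/kafka-2")))
    ```
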
  7. class LogSegment extends Logging

    A segment of the log. Each segment has two components: a log and an index. The log is a FileMessageSet containing the actual messages. The index is an OffsetIndex that maps from logical offsets to physical file positions. Each segment has a base offset which is an offset <= the least offset of any message in this segment and > any offset in any previous segment.

    A segment with a base offset of [base_offset] would be stored in two files, a [base_offset].index and a [base_offset].log file.

    Annotations
    @nonthreadsafe()
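
    Deriving the two files from a base offset is mechanical. A sketch; the zero-padding width is an assumption made here so that file names sort in offset order:

    ```scala
    import java.io.File

    // A segment with base offset `baseOffset` lives in [baseOffset].log and [baseOffset].index.
    def segmentFiles(dir: File, baseOffset: Long): (File, File) = {
      val name = f"$baseOffset%020d"   // padding width assumed for illustration
      (new File(dir, s"$name.log"), new File(dir, s"$name.index"))
    }

    segmentFiles(new File("/tmp/topic-0"), 350L)
    // (/tmp/topic-0/00000000000000000350.log, /tmp/topic-0/00000000000000000350.index)
    ```
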
  8. class OffsetIndex extends Logging

    An index that maps offsets to physical file locations for a particular log segment. This index may be sparse: that is, it may not hold an entry for all messages in the log.

    The index is stored in a file that is pre-allocated to hold a fixed maximum number of 8-byte entries.

    The index supports lookups against a memory-map of this file. These lookups are done using a simple binary search variant to locate the offset/location pair for the greatest offset less than or equal to the target offset.

    Index files can be opened in two ways: either as an empty, mutable index that allows appends or an immutable read-only index file that has previously been populated. The makeReadOnly method will turn a mutable file into an immutable one and truncate off any extra bytes. This is done when the index file is rolled over.

    No attempt is made to checksum the contents of this file; in the event of a crash it is rebuilt.

    The file format is a series of entries. The physical format is a 4-byte "relative" offset and a 4-byte file location for the message with that offset. The offset stored is relative to the base offset of the index file. So, for example, if the base offset was 50, then the offset 55 would be stored as 5. Using relative offsets in this way lets us use only 4 bytes for the offset.

    The frequency of entries is up to the user of this class.

    All external APIs translate from relative offsets to full offsets, so users of this class do not interact with the internal storage format.
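
    The entry layout and the "greatest offset <= target" lookup can be sketched over a plain ByteBuffer (illustrative only, not the class's internals):

    ```scala
    import java.nio.ByteBuffer

    // Each entry is 8 bytes: a 4-byte offset relative to the index's base offset,
    // followed by a 4-byte file position, as described above.
    final case class IndexEntry(offset: Long, position: Int)

    def writeEntry(buf: ByteBuffer, baseOffset: Long, entry: IndexEntry): Unit = {
      buf.putInt((entry.offset - baseOffset).toInt)   // e.g. offset 55 with base 50 is stored as 5
      buf.putInt(entry.position)
    }

    // Binary search for the entry with the greatest offset <= target.
    def lookup(buf: ByteBuffer, baseOffset: Long, entries: Int, target: Long): Option[IndexEntry] = {
      def entryAt(i: Int) = IndexEntry(baseOffset + buf.getInt(i * 8), buf.getInt(i * 8 + 4))
      if (entries == 0 || entryAt(0).offset > target) None
      else {
        var lo = 0
        var hi = entries - 1
        while (lo < hi) {
          val mid = lo + (hi - lo + 1) / 2   // upper midpoint so the loop terminates
          if (entryAt(mid).offset <= target) lo = mid else hi = mid - 1
        }
        Some(entryAt(lo))
      }
    }
    ```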

  9. trait OffsetMap extends AnyRef

  10. case class OffsetPosition(offset: Long, position: Int) extends Product with Serializable

    The mapping between a logical log offset and the physical position in some log file of the beginning of the message set entry with the given offset.
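
    Construction is direct from the signature above; the numbers here are arbitrary examples:

    ```scala
    import kafka.log.OffsetPosition

    // Logical offset 1055 begins at byte 24576 of its segment's log file.
    val mapping = OffsetPosition(offset = 1055L, position = 24576)
    ```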

  11. class SkimpyOffsetMap extends OffsetMap

    A hash table used for deduplicating the log. This hash table uses a cryptographically secure hash of the key as a proxy for the key for comparisons and to save space on object overhead. Collisions are resolved by probing. This hash table does not support deletes.

    Annotations
    @nonthreadsafe()
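
    A toy sketch of the approach, assuming a fixed-size table that never fills up; the real class avoids per-entry object overhead, which this sketch does not attempt:

    ```scala
    import java.security.MessageDigest
    import java.util.Arrays

    // Store (hash-of-key, last offset) in a fixed slot array, resolve collisions
    // by probing forward, and never delete.
    final class ToyOffsetMap(slots: Int, hashAlgorithm: String = "MD5") {
      private val hashes  = new Array[Array[Byte]](slots)
      private val offsets = new Array[Long](slots)

      private def hash(key: Array[Byte]): Array[Byte] =
        MessageDigest.getInstance(hashAlgorithm).digest(key)

      // Initial slot derived from the hash; probe forward until we find the
      // key's slot or an empty one (assumes the table never becomes full).
      private def slotFor(h: Array[Byte]): Int = {
        var i = (Arrays.hashCode(h) & 0x7fffffff) % slots
        while (hashes(i) != null && !Arrays.equals(hashes(i), h))
          i = (i + 1) % slots
        i
      }

      def put(key: Array[Byte], offset: Long): Unit = {
        val h = hash(key)
        val i = slotFor(h)
        hashes(i) = h
        offsets(i) = offset   // later puts for the same key overwrite the offset
      }

      def get(key: Array[Byte]): Option[Long] = {
        val i = slotFor(hash(key))
        if (hashes(i) == null) None else Some(offsets(i))
      }
    }
    ```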

Value Members

  1. object Defaults

  2. object Log

    Helper functions for logs

  3. object LogConfig extends Serializable

  4. object LogFlushStats extends KafkaMetricsGroup
