kafka.log

LogCleaner

class LogCleaner extends Logging with KafkaMetricsGroup

The cleaner is responsible for removing obsolete records from logs which have the dedupe retention strategy. A message with key K and offset O is obsolete if there exists a message with key K and offset O' such that O < O'.

Each log can be thought of as being split into two sections of segments: a "clean" section that has previously been cleaned, followed by a "dirty" section that has not yet been cleaned. The active log segment is always excluded from cleaning.

The cleaning is carried out by a pool of background threads. Each thread chooses the dirtiest log that has the "dedupe" retention policy and cleans it. The dirtiness of a log is estimated as the ratio of bytes in the dirty section of the log to the total bytes in the log.
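
As a rough illustration of the selection rule above, the following sketch computes the dirty ratio and picks the dirtiest log. LogSizes, dirtyRatio, and dirtiest are hypothetical names introduced here and are not part of kafka.log; treating an empty log as fully clean is also an assumption.

```scala
object DirtinessSketch {
  // Hypothetical summary of one log's size; not a Kafka class.
  final case class LogSizes(name: String, cleanBytes: Long, dirtyBytes: Long) {
    def totalBytes: Long = cleanBytes + dirtyBytes

    // Ratio of dirty bytes to total bytes.
    // Assumption: an empty log is treated as having ratio 0.0.
    def dirtyRatio: Double =
      if (totalBytes == 0L) 0.0 else dirtyBytes.toDouble / totalBytes
  }

  // Pick the log with the highest dirty ratio, or None if there are no logs.
  def dirtiest(logs: Seq[LogSizes]): Option[LogSizes] =
    if (logs.isEmpty) None else Some(logs.maxBy(_.dirtyRatio))
}
```

For example, a log with 75 clean bytes and 25 dirty bytes has a dirty ratio of 0.25, so it would lose out to a log whose dirty section makes up half of its bytes.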

To clean a log the cleaner first builds a mapping of key=>last_offset for the dirty section of the log. See kafka.log.OffsetMap for details of the implementation of the mapping.

Once the key=>offset map is built, the log is cleaned by recopying each log segment but omitting any key that appears in the offset map with a higher offset than what is found in the segment (i.e. messages with a key that appears in the dirty section of the log).
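
The two steps above (build the key=>last_offset map, then recopy while omitting superseded records) can be sketched as follows. This is a simplified model, not the cleaner's actual code: Record, buildOffsetMap, and clean are hypothetical names, and an immutable Map stands in for kafka.log.OffsetMap.

```scala
object CompactionSketch {
  // Hypothetical record type; Kafka's real message classes differ.
  final case class Record(offset: Long, key: String, value: String)

  // Build key => last offset over the dirty section of the log.
  def buildOffsetMap(dirty: Seq[Record]): Map[String, Long] =
    dirty.foldLeft(Map.empty[String, Long]) { (m, r) =>
      m.updated(r.key, math.max(r.offset, m.getOrElse(r.key, r.offset)))
    }

  // Recopy a segment, omitting any record whose key appears in the
  // offset map with a higher offset than the record's own offset.
  def clean(segment: Seq[Record], offsetMap: Map[String, Long]): Seq[Record] =
    segment.filter(r => offsetMap.get(r.key).forall(_ <= r.offset))
}
```

Records whose key is absent from the map are copied unchanged, which matches the rule that only keys seen in the dirty section can make an earlier record obsolete.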

To avoid segments shrinking to very small sizes with repeated cleanings, we merge successive segments during a cleaning if their log and index sizes are less than the maximum log and index size prior to the clean beginning.
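
One way to picture the merging rule is as a greedy grouping of successive segments. The sketch below is written under stated assumptions: SegmentSize and groupBySize are hypothetical names, the limits are applied to the combined size of each group, and the bound is treated as inclusive.

```scala
object SegmentGroupingSketch {
  // Hypothetical size summary of one segment (log file plus index file).
  final case class SegmentSize(logBytes: Long, indexBytes: Long)

  // Greedily group successive segments. A segment joins the current group
  // only if the group's combined log and index sizes stay within the limits;
  // otherwise it starts a new group. Segment order is preserved.
  def groupBySize(segments: List[SegmentSize],
                  maxLogBytes: Long,
                  maxIndexBytes: Long): List[List[SegmentSize]] = {
    val groups = segments.foldLeft(List.empty[List[SegmentSize]]) {
      case (current :: done, seg)
          if current.map(_.logBytes).sum + seg.logBytes <= maxLogBytes &&
             current.map(_.indexBytes).sum + seg.indexBytes <= maxIndexBytes =>
        (seg :: current) :: done
      case (groupsSoFar, seg) =>
        List(seg) :: groupsSoFar
    }
    // Groups and their members were accumulated in reverse order.
    groups.reverse.map(_.reverse)
  }
}
```

Each resulting group would be recopied into a single cleaned segment, so small leftovers from earlier cleanings get folded together rather than lingering as tiny files.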

Cleaned segments are swapped into the log as they become available.

One nuance that the cleaner must handle is log truncation. If a log is truncated while it is being cleaned the cleaning of that log is aborted.
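
A caller that truncates a log could coordinate with the cleaner through the abortAndPauseCleaning and resumeCleaning members documented below. The sketch is only one plausible arrangement: CleanerControl, withCleaningPaused, and demo are hypothetical, TopicAndPartition is a local stand-in for Kafka's class of the same name, and pausing around the truncation is an assumption (the text above only states that cleaning is aborted).

```scala
object TruncationSketch {
  import scala.collection.mutable.ListBuffer

  // Local stand-in for Kafka's TopicAndPartition so the sketch is self-contained.
  final case class TopicAndPartition(topic: String, partition: Int)

  // Hypothetical slice of the cleaner's interface: only the two members used here.
  trait CleanerControl {
    def abortAndPauseCleaning(topicAndPartition: TopicAndPartition): Unit
    def resumeCleaning(topicAndPartition: TopicAndPartition): Unit
  }

  // Run `body` while cleaning of `tp` is paused; resume even if `body` throws.
  def withCleaningPaused[A](cleaner: CleanerControl, tp: TopicAndPartition)(body: => A): A = {
    cleaner.abortAndPauseCleaning(tp)
    try body finally cleaner.resumeCleaning(tp)
  }

  // Demonstration with a fake cleaner that records the order of calls.
  def demo(): List[String] = {
    val calls = ListBuffer.empty[String]
    val fake = new CleanerControl {
      def abortAndPauseCleaning(topicAndPartition: TopicAndPartition): Unit = { calls += "pause"; () }
      def resumeCleaning(topicAndPartition: TopicAndPartition): Unit = { calls += "resume"; () }
    }
    withCleaningPaused(fake, TopicAndPartition("events", 0)) { calls += "truncate"; () }
    calls.toList
  }
}
```

The try/finally keeps the partition from staying paused if the truncation itself fails.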

Messages with null payload are treated as deletes for the purpose of log compaction. This means that they receive special treatment by the cleaner. The cleaner will only retain delete records for a period of time to avoid accumulating space indefinitely. This period of time is configurable on a per-topic basis and is measured from the time the segment enters the clean portion of the log (at which point any prior message with that key has been removed). Delete markers in the clean section of the log that are older than this time will not be retained when log segments are being recopied as part of cleaning.
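
The retention rule for delete markers can be sketched as a predicate applied while recopying a segment from the clean section. All names here (Record, retain, segmentCleanedAtMs, deleteRetentionMs) are hypothetical, a None value models a null payload, and times are assumed to be in milliseconds.

```scala
object TombstoneSketch {
  // Hypothetical record type; a None value models a null payload (a delete).
  final case class Record(offset: Long, key: String, value: Option[String]) {
    def isDelete: Boolean = value.isEmpty
  }

  // Decide whether a record in an already-clean segment survives recopying.
  // Ordinary records are always kept by this rule; delete markers are kept
  // only while the time since the segment entered the clean section is
  // within the retention period.
  def retain(record: Record,
             segmentCleanedAtMs: Long,
             nowMs: Long,
             deleteRetentionMs: Long): Boolean =
    !record.isDelete || nowMs - segmentCleanedAtMs <= deleteRetentionMs
}
```

Keeping the marker for a while gives slower consumers a chance to observe the delete before it disappears from the log.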

Instance constructors

  1. new LogCleaner(config: CleanerConfig, logDirs: Array[File], logs: Pool[TopicAndPartition, Log], time: Time = kafka.utils.SystemTime)

Value Members

  1. def !=(arg0: AnyRef): Boolean

    attributes: final
    definition classes: AnyRef
  2. def !=(arg0: Any): Boolean

    o != arg0 is the same as !(o == (arg0)).

    arg0

    the object to compare against this object for dis-equality.

    returns

    false if the receiver object is equivalent to the argument; true otherwise.

    attributes: final
    definition classes: Any
  3. def ##(): Int

    attributes: final
    definition classes: AnyRef → Any
  4. def $asInstanceOf[T0](): T0

    attributes: final
    definition classes: AnyRef
  5. def $isInstanceOf[T0](): Boolean

    attributes: final
    definition classes: AnyRef
  6. def ==(arg0: AnyRef): Boolean

    o == arg0 is the same as if (o eq null) arg0 eq null else o.equals(arg0).

    arg0

    the object to compare against this object for equality.

    returns

    true if the receiver object is equivalent to the argument; false otherwise.

    attributes: final
    definition classes: AnyRef
  7. def ==(arg0: Any): Boolean

    o == arg0 is the same as o.equals(arg0).

    arg0

    the object to compare against this object for equality.

    returns

    true if the receiver object is equivalent to the argument; false otherwise.

    attributes: final
    definition classes: Any
  8. def abortAndPauseCleaning(topicAndPartition: TopicAndPartition): Unit

    Abort the cleaning of a particular partition if it's in progress, and pause any future cleaning of this partition. This call blocks until the cleaning of the partition is aborted and paused.

  9. def abortCleaning(topicAndPartition: TopicAndPartition): Unit

    Abort the cleaning of a particular partition, if it's in progress. This call blocks until the cleaning of the partition is aborted.

  10. def asInstanceOf[T0]: T0

    This method is used to cast the receiver object to be of type T0.

    Note that the success of a cast at runtime is modulo Scala's erasure semantics. Therefore the expression 1.asInstanceOf[String] will throw a ClassCastException at runtime, while the expression List(1).asInstanceOf[List[String]] will not. In the latter example, because the type argument is erased as part of compilation it is not possible to check whether the contents of the list are of the requested type.

    returns

    the receiver object.

    attributes: final
    definition classes: Any
  11. def awaitCleaned(topic: String, part: Int, offset: Long, timeout: Long = 30000L): Unit

    TODO: For testing, a way to know when work has completed. This method blocks until the cleaner has processed up to the given offset on the specified topic/partition.

  12. def clone(): AnyRef

    This method creates and returns a copy of the receiver object.

    The default implementation of the clone method is platform dependent.

    returns

    a copy of the receiver object.

    attributes: protected
    definition classes: AnyRef
  13. val config: CleanerConfig

  14. def debug(msg: ⇒ String, e: ⇒ Throwable): Unit

    definition classes: Logging
  15. def debug(e: ⇒ Throwable): Any

    definition classes: Logging
  16. def debug(msg: ⇒ String): Unit

    definition classes: Logging
  17. def eq(arg0: AnyRef): Boolean

    This method is used to test whether the argument (arg0) is a reference to the receiver object (this).

    The eq method implements an equivalence relation (http://en.wikipedia.org/wiki/Equivalence_relation) on non-null instances of AnyRef:

      * It is reflexive: for any non-null instance x of type AnyRef, x.eq(x) returns true.
      * It is symmetric: for any non-null instances x and y of type AnyRef, x.eq(y) returns true if and only if y.eq(x) returns true.
      * It is transitive: for any non-null instances x, y, and z of type AnyRef, if x.eq(y) returns true and y.eq(z) returns true, then x.eq(z) returns true.

    Additionally, the eq method has three other properties:

      * It is consistent: for any non-null instances x and y of type AnyRef, multiple invocations of x.eq(y) consistently return true or consistently return false.
      * For any non-null instance x of type AnyRef, x.eq(null) and null.eq(x) return false.
      * null.eq(null) returns true.

    When overriding the equals or hashCode methods, it is important to ensure that their behavior is consistent with reference equality. Therefore, if two objects are references to each other (o1 eq o2), they should be equal to each other (o1 == o2) and they should hash to the same value (o1.hashCode == o2.hashCode).

    arg0

    the object to compare against this object for reference equality.

    returns

    true if the argument is a reference to the receiver object; false otherwise.

    attributes: final
    definition classes: AnyRef
  18. def equals(arg0: Any): Boolean

    This method is used to compare the receiver object (this) with the argument object (arg0) for equivalence.

    The default implementation of this method is an equivalence relation (http://en.wikipedia.org/wiki/Equivalence_relation):

      * It is reflexive: for any instance x of type Any, x.equals(x) should return true.
      * It is symmetric: for any instances x and y of type Any, x.equals(y) should return true if and only if y.equals(x) returns true.
      * It is transitive: for any instances x, y, and z of type AnyRef, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.

    If you override this method, you should verify that your implementation remains an equivalence relation. Additionally, when overriding this method it is often necessary to override hashCode to ensure that objects that are "equal" (o1.equals(o2) returns true) hash to the same scala.Int (o1.hashCode.equals(o2.hashCode)).

    arg0

    the object to compare against this object for equality.

    returns

    true if the receiver object is equivalent to the argument; false otherwise.

    definition classes: AnyRef → Any
  19. def error(msg: ⇒ String, e: ⇒ Throwable): Unit

    definition classes: Logging
  20. def error(e: ⇒ Throwable): Any

    definition classes: Logging
  21. def error(msg: ⇒ String): Unit

    definition classes: Logging
  22. def fatal(msg: ⇒ String, e: ⇒ Throwable): Unit

    definition classes: Logging
  23. def fatal(e: ⇒ Throwable): Any

    definition classes: Logging
  24. def fatal(msg: ⇒ String): Unit

    definition classes: Logging
  25. def finalize(): Unit

    This method is called by the garbage collector on the receiver object when garbage collection determines that there are no more references to the object.

    The details of when and if the finalize method is invoked, as well as the interaction between finalize and non-local returns and exceptions, are all platform dependent.

    attributes: protected
    definition classes: AnyRef
  26. def getClass(): java.lang.Class[_]

    Returns a representation that corresponds to the dynamic class of the receiver object.

    The nature of the representation is platform dependent.

    returns

    a representation that corresponds to the dynamic class of the receiver object.

    attributes: final
    definition classes: AnyRef
  27. def hashCode(): Int

    Returns a hash code value for the object.

    The default hashing algorithm is platform dependent.

    Note that it is allowed for two objects to have identical hash codes (o1.hashCode.equals(o2.hashCode)) yet not be equal (o1.equals(o2) returns false). A degenerate implementation could always return 0. However, it is required that if two objects are equal (o1.equals(o2) returns true) that they have identical hash codes (o1.hashCode.equals(o2.hashCode)). Therefore, when overriding this method, be sure to verify that the behavior is consistent with the equals method.

    returns

    the hash code value for the object.

    definition classes: AnyRef → Any
  28. def info(msg: ⇒ String, e: ⇒ Throwable): Unit

    definition classes: Logging
  29. def info(e: ⇒ Throwable): Any

    definition classes: Logging
  30. def info(msg: ⇒ String): Unit

    definition classes: Logging
  31. def isInstanceOf[T0]: Boolean

    This method is used to test whether the dynamic type of the receiver object is T0.

    Note that the result of the test is modulo Scala's erasure semantics. Therefore the expression 1.isInstanceOf[String] will return false, while the expression List(1).isInstanceOf[List[String]] will return true. In the latter example, because the type argument is erased as part of compilation it is not possible to check whether the contents of the list are of the requested type.

    returns

    true if the receiver object is an instance of the erasure of type T0; false otherwise.

    attributes: final
    definition classes: Any
  32. val logDirs: Array[File]

  33. var logIdent: String

    attributes: protected
    definition classes: Logging
  34. lazy val logger: Logger

    definition classes: Logging
  35. val loggerName: String

    definition classes: Logging
  36. val logs: Pool[TopicAndPartition, Log]

  37. def ne(arg0: AnyRef): Boolean

    o.ne(arg0) is the same as !(o.eq(arg0)).

    arg0

    the object to compare against this object for reference dis-equality.

    returns

    true if the argument is not a reference to the receiver object; false otherwise.

    attributes: final
    definition classes: AnyRef
  38. def newGauge[T](name: String, metric: Gauge[T]): Gauge[T]

    definition classes: KafkaMetricsGroup
  39. def newHistogram(name: String, biased: Boolean = true): Histogram

    definition classes: KafkaMetricsGroup
  40. def newMeter(name: String, eventType: String, timeUnit: TimeUnit): Meter

    definition classes: KafkaMetricsGroup
  41. def newTimer(name: String, durationUnit: TimeUnit, rateUnit: TimeUnit): Timer

    definition classes: KafkaMetricsGroup
  42. def notify(): Unit

    Wakes up a single thread that is waiting on the receiver object's monitor.

    attributes: final
    definition classes: AnyRef
  43. def notifyAll(): Unit

    Wakes up all threads that are waiting on the receiver object's monitor.

    attributes: final
    definition classes: AnyRef
  44. def resumeCleaning(topicAndPartition: TopicAndPartition): Unit

    Resume the cleaning of a paused partition. This call blocks until the cleaning of a partition is resumed.

  45. def shutdown(): Unit

    Stop the background cleaning

  46. def startup(): Unit

    Start the background cleaning

  47. def swallow(action: ⇒ Unit): Unit

    definition classes: Logging
  48. def swallowDebug(action: ⇒ Unit): Unit

    definition classes: Logging
  49. def swallowError(action: ⇒ Unit): Unit

    definition classes: Logging
  50. def swallowInfo(action: ⇒ Unit): Unit

    definition classes: Logging
  51. def swallowTrace(action: ⇒ Unit): Unit

    definition classes: Logging
  52. def swallowWarn(action: ⇒ Unit): Unit

    definition classes: Logging
  53. def synchronized[T0](arg0: T0): T0

    attributes: final
    definition classes: AnyRef
  54. def toString(): String

    Returns a string representation of the object.

    The default representation is platform dependent.

    returns

    a string representation of the object.

    definition classes: AnyRef → Any
  55. def trace(msg: ⇒ String, e: ⇒ Throwable): Unit

    definition classes: Logging
  56. def trace(e: ⇒ Throwable): Any

    definition classes: Logging
  57. def trace(msg: ⇒ String): Unit

    definition classes: Logging
  58. def wait(): Unit

    attributes: final
    definition classes: AnyRef
  59. def wait(arg0: Long, arg1: Int): Unit

    attributes: final
    definition classes: AnyRef
  60. def wait(arg0: Long): Unit

    attributes: final
    definition classes: AnyRef
  61. def warn(msg: ⇒ String, e: ⇒ Throwable): Unit

    definition classes: Logging
  62. def warn(e: ⇒ Throwable): Any

    definition classes: Logging
  63. def warn(msg: ⇒ String): Unit

    definition classes: Logging