Although tables may be viewed at a conceptual level as a sparse set of rows, physically
they are stored on a per-column-family basis. New columns (i.e., columnfamily:column)
can be added to any column family without pre-announcing them.
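The per-family layout can be sketched with a small in-memory model that holds the same cells as Tables 5.2 and 5.3. This is a minimal sketch, not HBase code: the Table and ColumnFamilyStore classes are hypothetical, and the integers 3 to 10 stand in for time stamps. Each column family keeps its own sorted list of cells, so only cells that were actually written take up space. A put may use any qualifier, provided its column family was declared when the table was defined.

```python
from bisect import insort


class ColumnFamilyStore:
    """Hypothetical toy model of one column family's physical storage (not HBase code)."""

    def __init__(self, name):
        self.name = name
        # Sorted list of (row, qualifier, -timestamp, value); negating the
        # timestamp makes the newest version of each column sort first.
        self.cells = []

    def put(self, row, qualifier, timestamp, value):
        # Any qualifier is accepted: nothing is declared ahead of time.
        insort(self.cells, (row, qualifier, -timestamp, value))

    def qualifiers(self):
        return sorted({qualifier for _, qualifier, _, _ in self.cells})

    def dump(self):
        # Yield (row, "family:qualifier", timestamp, value) in storage order.
        for row, qualifier, neg_ts, value in self.cells:
            yield row, f"{self.name}:{qualifier}", -neg_ts, value


class Table:
    """Hypothetical table: one independent store per declared column family."""

    def __init__(self, families):
        self.stores = {family: ColumnFamilyStore(family) for family in families}

    def put(self, row, column, timestamp, value):
        family, qualifier = column.split(":", 1)
        # An undeclared family raises KeyError; a new qualifier does not.
        self.stores[family].put(row, qualifier, timestamp, value)


# Rebuild the cells of Tables 5.2 and 5.3 (integers stand in for t3..t9).
webtable = Table(["anchor", "contents"])
webtable.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
webtable.put("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com")
for ts in (3, 5, 6):
    webtable.put("com.cnn.www", "contents:html", ts, "<html>...")

for cell in webtable.stores["contents"].dump():
    print(cell)  # versions come out t6, t5, t3: newest first

# A brand-new qualifier in an existing family needs no schema change
# ("example.org" is a made-up column for illustration only).
webtable.put("com.cnn.www", "anchor:example.org", 10, "Example")
```

In this model, reading the contents family never touches the anchor family's cells, and the empty cells of the conceptual view never appear because nothing was written for them.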
Table 5.2. ColumnFamily anchor
Row Key | Time Stamp | ColumnFamily "anchor:" |
---|---|---|
"com.cnn.www" | t9 | anchor:cnnsi.com = "CNN" |
"com.cnn.www" | t8 | anchor:my.look.ca = "CNN.com" |
Table 5.3. ColumnFamily contents
Row Key | Time Stamp | ColumnFamily "contents:" |
---|---|---|
"com.cnn.www" | t6 | contents:html = "<html>..." |
"com.cnn.www" | t5 | contents:html = "<html>..." |
"com.cnn.www" | t3 | contents:html = "<html>..." |
It is important to note in the tables above that the empty cells shown in the
conceptual view are not stored at all, since a column-oriented storage format has no need to store them.
Thus a request for the value of the contents:html column at time stamp t8
would return no value. Similarly, a request for an anchor:my.look.ca value at
time stamp t9 would return no value. However, if no timestamp is supplied, the most
recent value for a particular column is returned; it is also the first one found, since
timestamps are stored in descending order. Thus a request for the values of all columns
in the row com.cnn.www, if no timestamp is specified, would return: the value of
contents:html from time stamp t6, the value of anchor:cnnsi.com from time stamp t9,
and the value of anchor:my.look.ca from time stamp t8.
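The read rules described above can be checked against a small model of row com.cnn.www. This is a minimal sketch under stated assumptions, not the HBase client API: get_cell and get_row are hypothetical helpers, a time stamp passed to get_cell must match a stored version exactly, and the integers 3 to 9 stand in for t3 to t9.

```python
# Hypothetical in-memory copy of row "com.cnn.www" (not the HBase client API).
# Only written cells exist; the integers 3..9 stand in for time stamps t3..t9.
row = {
    "anchor:cnnsi.com": {9: "CNN"},
    "anchor:my.look.ca": {8: "CNN.com"},
    "contents:html": {6: "<html>...", 5: "<html>...", 3: "<html>..."},
}


def get_cell(row, column, timestamp=None):
    """Return (timestamp, value) for one column, or None if no cell matches."""
    versions = row.get(column, {})
    if timestamp is not None:
        # Exact-timestamp read: an empty cell in the conceptual view is simply absent.
        if timestamp in versions:
            return timestamp, versions[timestamp]
        return None
    # No timestamp: order versions newest first and take the first one found.
    newest_first = sorted(versions, reverse=True)
    if not newest_first:
        return None
    ts = newest_first[0]
    return ts, versions[ts]


def get_row(row):
    """Latest version of every column that has at least one stored cell."""
    return {column: get_cell(row, column) for column in row}


print(get_cell(row, "contents:html", 8))      # None
print(get_cell(row, "anchor:my.look.ca", 9))  # None
print(get_row(row))  # contents:html from t6, anchor:cnnsi.com from t9, anchor:my.look.ca from t8
```

Because get_row only iterates over columns that were actually written, the empty cells never show up in its result, matching the behavior described in the paragraph above.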
For more information about the internals of how Apache HBase stores data, see Section 9.7, “Regions”.