Storage Infrastructure Behind Facebook Messages Using HBase at Scale

Why HBase was picked

Need High Write Throughput

Because data is written continuously, the system has to be optimized for writes.

Easy to scale

Operations should be easy: disk volume and computation capacity can simply be added.

Cost Effective

It runs on commodity hardware and does not need specially designed hardware.

Strong Consistency

Strong consistency is more intuitive for developers.

Experience with HDFS

The team had HDFS experts.

Pre-flight preparations

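The paper was published around 2012, when HBase was certainly less feature-complete than it is today, so FB added the features it needed before switching to HBase.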

Correctness Fixes

HBase at the time did not guarantee row atomicity, and FB fixed this.

Block Placement Policy

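The original policy spread data across the entire cluster. FB changed the policy so that blocks are placed only on designated nodes, which reduces the number of crash patterns that can affect the data.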
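To make the idea concrete, here is a minimal sketch of restricting a file's block replicas to a small, fixed group of nodes. It is not HDFS code: the function names `node_group` and `place_block`, the hash-based choice of group, and the values of `group_size` and `replicas` are all assumptions made for illustration.

```python
import hashlib


def node_group(file_name, nodes, group_size):
    """Pick a fixed window of nodes for a file by hashing its name (assumed scheme)."""
    digest = hashlib.sha256(file_name.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(group_size)]


def place_block(file_name, block_no, nodes, group_size=5, replicas=3):
    """Choose replica nodes for one block, always inside the file's node group."""
    group = node_group(file_name, nodes, group_size)
    first = block_no % group_size
    return [group[(first + i) % group_size] for i in range(replicas)]


nodes = [f"node{i}" for i in range(20)]
used = set()
for block_no in range(100):
    used.update(place_block("region-a/hfile-1", block_no, nodes))
# All 100 blocks of the file touch only the 5 nodes of its group,
# instead of potentially all 20 nodes in the cluster.
```

With placement confined to a group, only crashes inside that group can affect the file, whereas a cluster-wide spread exposes it to many more combinations of failed nodes.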

Availability

FB added support for rolling upgrades to HBase, reducing production downtime.

Performance

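To work around the poor read performance of the LSM-tree, FB made several hacks at the HFile level:

Bloom filters on keys: avoid searching every segment only to find out that a key does not exist.
Timerange hints: each HFile is tagged with a time range, so a lookup may only need to search one file.
Seek optimizations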
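The first two optimizations can be sketched as a read path that skips files before opening them. This is a simplified illustration, not HBase's implementation: `BloomFilter`, `StoreFile`, and `get` are hypothetical names, the filter parameters are arbitrary, and each file is assumed to be non-empty and to hold at most one cell per key.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Tiny bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


@dataclass
class StoreFile:
    """Hypothetical stand-in for an HFile: cells plus per-file metadata."""
    name: str
    cells: dict  # key -> (timestamp, value)
    bloom: BloomFilter = field(default_factory=BloomFilter)
    min_ts: int = 0
    max_ts: int = 0

    def __post_init__(self):
        for key in self.cells:
            self.bloom.add(key)
        timestamps = [ts for ts, _ in self.cells.values()]
        self.min_ts, self.max_ts = min(timestamps), max(timestamps)


def get(files, key, ts_from, ts_to):
    """Look up `key` within [ts_from, ts_to]; return (value, names of files read)."""
    files_read = []
    for f in files:  # files are ordered newest first
        if f.max_ts < ts_from or f.min_ts > ts_to:
            continue  # time range hint: nothing in this file can match
        if not f.bloom.might_contain(key):
            continue  # bloom filter: the key is definitely not in this file
        files_read.append(f.name)
        hit = f.cells.get(key)
        if hit is not None and ts_from <= hit[0] <= ts_to:
            return hit[1], files_read
    return None, files_read


files = [
    StoreFile("new", {"row2": (300, "c")}),
    StoreFile("mid", {"row1": (200, "b")}),
    StoreFile("old", {"row1": (100, "a")}),
]
value, files_read = get(files, "row1", 150, 250)
```

In this example only the file whose time range overlaps the query is opened; the other two are skipped from their metadata alone.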

Shadow Testing

FB also wrote a copy of the data to HBase, so performance tests could run directly against production data.

Mid-air refueling

This part covers the problems FB found and fixed after HBase went live.

HFile V2

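The index was originally stored as a single unit inside the HFile. FB changed it into a hierarchy, and each HFile also has a bloom filter to check whether a key exists. The benefit is that the block cache can cache nearby data together.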
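A hierarchical index can be sketched as a small root index that points to leaf index blocks, which in turn point to data blocks. The class below is an illustration only: `TwoLevelIndex`, its `fanout` parameter, and the `leaves_loaded` set (a stand-in for fetching index blocks through the block cache) are assumptions, not the HFile V2 on-disk format.

```python
from bisect import bisect_right


class TwoLevelIndex:
    """Root index over leaf index blocks; only touched leaves need to be loaded."""

    def __init__(self, first_keys, fanout=2):
        # first_keys: sorted first key of every data block in the file
        self.fanout = fanout
        self.leaves = [first_keys[i:i + fanout]
                       for i in range(0, len(first_keys), fanout)]
        self.root = [leaf[0] for leaf in self.leaves]
        self.leaves_loaded = set()  # stand-in for index blocks held in the block cache

    def locate(self, key):
        """Return the number of the data block that may hold `key`, or None."""
        leaf_no = bisect_right(self.root, key) - 1
        if leaf_no < 0:
            return None  # key sorts before the first block of the file
        self.leaves_loaded.add(leaf_no)  # only this leaf is read, not the whole index
        offset = bisect_right(self.leaves[leaf_no], key) - 1
        return leaf_no * self.fanout + offset


index = TwoLevelIndex(["a", "d", "g", "k", "p", "t"])
block_no = index.locate("h")  # falls in the data block that starts at "g"
```

Because a lookup touches the root and a single leaf, the rest of the index does not need to be resident in memory.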

Compactions

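The original compaction merged three files of roughly the same size into one, but this causes propagation: the same data ends up being copied and rewritten again and again. FB therefore added a sum to the selection rule to determine whether data has already been written.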
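One way such a sum-based rule can work is sketched below. The exact rule is an assumption made for illustration, since these notes only say that a sum was added: a file is skipped when it is larger than `ratio` times the total size of all newer files. The name `select_for_compaction` and the values of `ratio` and `min_files` are hypothetical.

```python
def select_for_compaction(sizes_old_to_new, ratio=1.2, min_files=3):
    """Return indices of the files to compact, given sizes ordered oldest to newest.

    Assumed rule: drop the oldest candidate while it is larger than
    `ratio` times the combined size of all newer files.
    """
    n = len(sizes_old_to_new)
    start = 0
    while n - start >= min_files:
        newer_sum = sum(sizes_old_to_new[start + 1:])
        if sizes_old_to_new[start] <= ratio * newer_sum:
            break  # this file is small relative to the newer ones: include it
        start += 1  # large, already-compacted file: leave it alone
    else:
        return []  # too few candidates left to be worth compacting
    return list(range(start, n))


# The 1000-unit file is the result of earlier compactions, so it is skipped
# and only the four small, newer files are merged.
selected = select_for_compaction([1000, 100, 50, 40, 30])
```

Under this rule a large file is not rewritten every time a few small files appear next to it, which limits how often the same data is copied.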

HBase also used a single thread for compaction, so a major compaction would block the minor compactions behind it. FB made compaction multi-threaded to solve this.
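The effect of moving off a single compaction thread can be sketched with two worker pools. The split into one pool for major and one for minor compactions is an assumption made for illustration, and `CompactionScheduler` and `demo` are hypothetical names; these notes only state that compaction became multi-threaded.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CompactionScheduler:
    """Runs major and minor compactions on separate threads (assumed design)."""

    def __init__(self):
        self.large_pool = ThreadPoolExecutor(max_workers=1)  # major compactions
        self.small_pool = ThreadPoolExecutor(max_workers=1)  # minor compactions

    def submit(self, job, is_major):
        pool = self.large_pool if is_major else self.small_pool
        return pool.submit(job)

    def shutdown(self):
        self.large_pool.shutdown()
        self.small_pool.shutdown()


def demo():
    """Show that a minor compaction finishes while a major one is still running."""
    release = threading.Event()
    scheduler = CompactionScheduler()
    # The "major compaction" blocks until released, simulating a long job.
    major = scheduler.submit(lambda: release.wait(timeout=10), is_major=True)
    minor = scheduler.submit(lambda: "minor done", is_major=False)
    minor_result = minor.result(timeout=5)
    major_still_running = not major.done()
    release.set()
    major.result(timeout=5)
    scheduler.shutdown()
    return minor_result, major_still_running
```

With a single shared thread, the minor job in `demo` would have to wait until the major job was released.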
