Practical Uses of Synchronized Clocks in Distributed Systems

TL;DR

synchronized clock是基於機率的演算法，演算法基於synchronized clock的設計無法保證correctness，但performance的degradation通常是可接受的，因此可以考慮用synchronized clock來改善performance，但不要依賴synchronized clock來保證正確性。

Motivation

在NTP，synchronized clock在分散式系統中被廣泛地使用，因此作者搜集了一些用synchronized clock來設計的distributed algorithms。

Note: 這篇論文是1991年的paper，裡面的系統在現今不一定還存在，或是有更新的作法。

Syncrhonzied Clock

要注意的是，synchronzied clock並不是指系統中可以存在一組絕對的時間，而是有高機率系統中存在一個時間，每個process對此時間的誤差不超過 $ε/2$ (有些process快，有些process慢)，而任兩者的誤差不超過 $ε$ 。

因為還是存在某個process落後/超前其他process很多的情況，所以使用synchronized clock的演算法無法保證correctness，因為必定存在process之間誤差超過 $ε/2$ 的情況。

但是synchronzied clock很適合用在改善performance的演算法，因為即使time skew了，通常我們也可以容忍在極端狀況下performance degradation的情況。

At-most-once Messages

要實作At-most-once演算法，receiver要maintain一個table紀錄active的connection。sender在送message的時候，會在message上加上sender的clock time，而receiver會給每個connection一個global unique id，並記錄最後一次收到從此connection收到訊息的時間。

如果message的timestamp低於最後一次從此connection收到訊息的時間，視為此message duplicated，reject此訊息。如果receiver的connection table不存在此connectin訊息(receiver可以隨時清掉connection table的record或可能被重啟)，receiver同時也會keep一個G.upper的值， $upper = G.time - p - ε$ ( $p$ 為message life time)，任何message timestamp低於 G.upper都是為duplicated。

同時，每個receiver也會在storage寫入G.latest，G.latest的值比目前所有收到的message timestamp的值都還要大，如果message的timestamp比G.latest還要大，就rejected或delayed。

當process重啟的時候，會從storage讀取G.latest，並且把G.upper的值設定為G.latest，然後reject所有可能duplicated的message。一般會設定 $G.latest = G.time + \beta$ ( $\beta$ 通常是數秒)，也就是當重啟以後，會reject一段時間的message保證不duplicated。

Note: At-most-once Messages保證系統不會重複處理message，但有可能會reject訊息不處理。

Authentication Tickets in Kerberos

Kerberos systems中，client與server要傳遞message之前，會先在ticket granting service (TGS)拿到一組session key，TGS有client的private key，在回傳ticket與session key的時候，會用client 的private key加密，所以只有client 可以解開。

ticket則是由server的private key來加密，因此只有只有server能解開，client 和 server則是共同知道session key，用session key來加解密訊息。

synchronized clock用來知道ticket什麼時候過期，相較於 $\epsilon$ ，過期的長度大很多，所以即使server的clock skew了，晚一點expire的代價通常也都是可以接受的。

類似At-most-once Messages的作法，如果不想重複處理message的話，client可以在message裡面加上authenticator，通常是client自己的clock time，server處理完以後會記住這個authenticator，必且之後碰到<= authenticator的message都丟掉。

這種作法保證了只要session key不要外流，攻擊者很難藉由重複使用authenticator來嘗試執行某些操作。

Cache Consistency

Write-through

當client 要cache某份檔案，它跟server要該檔案的lease，lease裡面包含cache的expiration time $E$ ，只要 $E > time(client) - \epsilon$ ，則該cache就不能在使用，必需要在拿到新的lease。

因為是write-through，如果檔案被修改了，則server會嘗試告知其他有該檔案cache的client，當得到所有持有該cache的client結束lease，或是等到所有的lease都過期，server就執行此修改。

Write-behind

跟 write-through稍微不同，在write-behind的機制，需要兩種lease：read lease和write lease，read lease只能讀cache，write lease才有權限改檔案。如果一個持有read lease的client想修改檔案，它必須取得write lease。

取得write lease的過程跟write-through的過程一樣，client像server要求write lease，server要求其他client放棄lease或等到所有lease都過期。

Atomicity

Atomicity蓋在cache consistency上面，它用transaction來包裝cache consistency。transaction是optimistic，每個object有自己的版本好，commit上去的時候才檢查有沒有conflict。

為了減少 transaction abort的頻率，可以利用cache consistency lease的機制，當持有此cache的其他client放棄lease或是所有lease都過期，才允許transaction commit。

Commit Windows

以下演算法是Harp File System的replication 演算法。Harp File System是linux上的file system，除了支援VFS (vitrual file system) interface，而且可以用在distributed network像是NFS。

所有的請求都由某個node處理，此node稱為primary，當一個write request來，primary 會在送給backup nodes，讓它們把此操作寫進log，primary和backups形成一組view，並且是majority，當有個node掛掉，會在從系統中挑一個node來形成新的view，保持view都是majority。

因為 read、write都由primary處理，如果舊的primary跟其他backup partiation，系統又組織一個新的view，如果client還是從舊的primary讀取資料，可能就會讀到stale data。

Harp 藉由讓 primary持有leases來確保資料可讀取，每當backup send message給primary，此message會包含一個lease，只要primary持有某個object的lease就可以回覆client read request。而當舊的primary掛掉，組成新的view，新的primary也必須要等到所有lease都過期才可以接受請求。

一般來說，lease的expriation長度大概都是幾秒，所以不至於讓整個系統等太久無法處理新的request。

Synchronized Clock vs Synchronized Rate

有某些使用synchronized clock的演算法，也可以用synchronized rate來處理，但是synchronized rate需要雙方先有一個共識，開始的基準點，並且在此基準點上假設雙方的rate是一致的(小於某個誤差值)。

也因為需要有個基準點，所以synchronized rate可能需要先有一個message來同步基準點，而synchronized clock不需要此基準點，而且有某些情境只能使用synchronized clock而無法使用synchronized rate，因此當synchronized clock出現以後，synchronized rate的作法會漸漸被syncrhonized clock的作法取代。

Practical Uses of Synchronized Clocks in Distributed Systems

TL;DR​

Motivation​

Syncrhonzied Clock​

At-most-once Messages​

Authentication Tickets in Kerberos​

Cache Consistency​

Write-through​

Write-behind​

Atomicity​

Commit Windows​

Synchronized Clock vs Synchronized Rate​