Mobile File Systems
Mobility and Distributed File-systems Examples of distributed file-systems – NFS (network file system), AFS (andrew file system), etc. Key problems in a wireless/mobile environment? Disconnections Low bandwidths 2017-12-22
File-systems … CODA – supports disconnected operation PFS – targets partially connected environments Bayou – support for data-sharing among mobile users others … 2017-12-22
CODA Three key components Hoarding Emulating Reintegrating Hoarding 2017-12-22
Hoarding Cache maintained locally at clients Hoarding – fills in cache when mobile client is connected Two types of hoarding Usage based User specified Cache management User driven LRU 2017-12-22
Emulating When mobile client is disconnected, coda enters the emulation mode All requests are served from the local cache change modify log (CML) maintained based on user-writes CML used later in conflict detection and resolution CML optimized periodically to reduce the size of the CML – delete unnecessary entries 2017-12-22
Reintegrating Transparent conflict resolution for directories Except under impossible scenarios – directory permission changes, add/deletion of same directories … File conflict resolution ASR – application specific resolvers Possess information about nature of files (say calendars) – resolve based on application semantics Manual repair User provided a view-graph of conflicts and asked to resolve 2017-12-22
Optimizations Rapid cache validations Original mechanism in CODA for cache coherence based on callbacks Callbacks cannot be used when clients are disconnected Client validates cached copies explicitly upon reconnection Potentially time-consuming CODA uses cache coherence checks at multiple levels of granularity 2017-12-22
Rapid cache validations - Illustration Vol 1 Vol 2 Vol 3 Vol 4 Version x Version y Version z Check volume version stamp If version stamps different, check individual object version stamps 2017-12-22
Trickle Re-integration When network is weakly connected, propagate updates to server asynchronously Trade-offs lower bandwidth efficiency as fewer CML optimizations are possible More up to date copies at server – fewer conflicts possible CODA allows users to dynamically set the period of the asynchronous updates based on requirements 2017-12-22
User assisted cache miss handling When a cache miss occurs, what should be done? Option 1 – fetch from server Option 2 – convey error message to user Trade-offs? Latency vs. availability Patience vs. importance CODA uses a “patience threshold” to decide whether a file can be retrieved within this threshold 2017-12-22
Mimic: Raw Activity Shipping for File Synchronization in Mobile File Systems GNAN Research Group School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30332, USA Good afternoon, ladies and gentlemen. My name is Tae-Young Chang. I’m working in GNAN research group in Georgia Institute of Technology. My presentation today is about a raw activity shipping for file synchronization in mobile file systems, called Mimic.
File Synchronization in Mobile File Systems File synchronization in distributed file systems is a consistency restoration process between a file server and a client Over mobile networks such as a WWAN, file synchronization in traditional distributed file systems cannot be used effectively due to limited bandwidth Bandwidth usage efficiency of file synchronization is important 2017-12-22
Current File Synchronization Schemes (1) Data compression Compresses each block of data or meta-data On-line data compression in a log-structured file system [Burrows’92] Differential update Exploits similarities between versions of the same file Based on the diff scheme of the UNIX systems Rsync [Tridgell’00] Low-bandwidth network file systems [Muthi’01] 2017-12-22
Current File Synchronization Schemes (2) Operation shipping Logs and ships user operations that update the files Session/application level logging and replaying Need modification for GUI-based interactive applications Corrects replaying errors by forward error correction (FEC) Minor re-execution discrepancies caused by non-repeatable operations can be detected by fingerprint algorithms Operation shipping for mobile file systems [Lee’02] 2017-12-22
Motivation Update size comparison in Microsoft Word Activity Description Activity Events Full File Size diff Size Activity Size 236 B 1543 B 29341 B 98 keystrokes Insert a line 848 B 2111 B 29356 B 476 keystrokes Insert a paragraph 72 B 1119 B 33449 B 6 keystrokes + 12 mouse-clicks Copy and paste a paragraph from the same file 30 B 1660 B 40611 B 7 mouse-clicks Change the font type of a paragraph 2017-12-22
Goal and Overview Goal To design an application-unaware activity shipping scheme for file synchronization in interactive applications Mimic is a file synchronization strategy that records raw activity at the client side, ships the records to the server during synchronization, and replays the activities at the server 2017-12-22
Mimic System Overview Client Server 2017-12-22 Input Device Input Driver Event Message Queue Input Application Input Message Mimic Recorder Local File System File Record File / Record Network File System Mimic File Verifier Figerprint / FEC Mimic Replayer Record File Message Queue Input Application Input Message 2017-12-22
Mimic Design Elements Mimic Client Design: recording raw input activity Mapping filename to process/session Record optimization Mimic server design: replaying raw input activity Environment synchronization Replaying speed optimization Integration of Mimic with file systems Verification/error correction of replayed files Presented in [Lee’02] 2017-12-22
Mimic Client to Window Manager Interface Mimic Client Design: Mimic Client to Window Manager Interface OS Window Manager Mimic Client File System Server Synchronization Native File Record trapSystemMessage( ) getClipboard( ) getEnvironment( ) processToWindowHandle( ) system message Now, I would like to outline the overview of Mimic. There are a client and server in the system. When a user inputs using input devices, The input is delivered into the input device. And, the interpreted input system message is delivered to the system message queue, And, it is also captured by the Mimic recorder. Then, the queue sends the message to the corresponding application. Finally, the file and the operation record is saved in the local file system. On the other hand, the server receives the record, And, deliver it to Mimic replayer Mimic replayer regenerates a input system message based on the records and give it to the message queue. The message queue send the message the application. If the playback is over, Mimic file verifier verifies the file and correct minor errors. 2017-12-22
Background Mimic Client Design: Translation of input activity Device driver translates an input event into a input system message Handle of the corresponding window is also written on the message Delivery of input system messages Interactive application has a set of application windows System message queue demultiplexes system messages according to their window handle 2017-12-22
Window Handle Table (WHT) Mimic Client Design: Window Handle Table (WHT) WHT maps the process (or session) handle and filename of an application to the corresponding set of window handles When a file is opened with a filename, the mapping information is acquired, and added on WHT through processToWindowHandles( ) When the file is closed, the mapping is removed from WHT Mimic recorder captures system messages having the window handles listed on the WHT or the system window handles through trapSystemMessage( ) Most GUI-based interactive applications have their own process or session consisting of one or multiple application windows. Usually, they have a single output file. In order to capture the user activity, Mimic maintains Window Handle Table (WHT). The table maps the process handle and its filename of an application to the corresponding set of window handles. Then, Mimic can capture input system messages having the corresponding window handle based on WHT. 2017-12-22
Descriptors Each captured message is translated into a descriptor Mimic Client Design: Descriptors Each captured message is translated into a descriptor Activity Descriptor (AD) Describes an individual user input activity for an application Meta Descriptor (MD) Capture the changes of system environment during recording Corresponding system messages are generated Includes screen resolution, color depth, keyboard layout, clipboard content, etc. Environment Descriptor (ED) Describes the initial system environment when recording begins Same structure as that of an MD through getEnvironment() and getClibboard() After capturing the input system messages, Mimic translates the message into a descriptor minimizing the size There are three types of descriptors. First, AD describes an individual user input activity for an application Second, ED describes the initial system environment when recording It includes screen resolution, color depth, keyboard layout, clipboard content. MD captures the changes of system environment during recording Basically, operating system generates the corresponding system messages And, Mimic captures and translates them It structure is the same as that of an ED 2017-12-22
Descriptor Structures Mimic Client Design: Descriptor Structures message identifier 31 lParam wParam time hwnd 32 63 64 95 96 127 128 159 20 bytes EVENTMSG in Windows message type 3 clipboard size 4 clipboard information 15 keyboard layout information 2 bytes x-coordinate time y-coordinate 4 bytes variable color information 11 12 Meta Descriptor (Clibboard Content) Meta Descriptor (Keyboard Layout Change) Meta Descriptor (Screen Resolution Change) Meta Descriptor (Color Depth Change) message type 3 virtual-key code 4 11 time 12 15 x-coordinate y-coordinate 4 bytes MD link information Activity Descriptor (Keyboard Activity) 2 bytes Activity Descriptor (Mouse Activity) Activity Descriptor (Meta Descriptor Link) 2017-12-22
Records Each descriptor is recorded in a record Mimic Client Design: Records Each descriptor is recorded in a record File Activity Record (FAR) Maintained per file Consisting of file information, a set of environment descriptors (EDs), and a sequence of activity descriptors (ADs) Linked to a meta descriptor (MD) of a meta activity record (MAR) when an environment change happens Meta Activity Record (MAR) Shared by file activity records (FARs) Consisting of a sequence of meta descriptors (MDs) All descriptors are recorded in records. There are two types of records. FAR consists of its filename, an ED, and a sequence of ADs. It is maintained per file. When an environment change happen, it is linked to the corresponding MD of MAR. MAR consists of a sequence of MDs, and it is shared by FARs. 2017-12-22
Mimic Client to File System Interface Mimic Client Design: Mimic Client to File System Interface OS Window Manager Mimic Client File System Server Synchronization Native File processToWindowHandle( ) system message getEnvironment( ) trapSystemMessage( ) getClipboard( ) Record open ( ) close ( ) rename ( ) delete ( ) finish ( ) synch ( ) /Synch_Status 2017-12-22
Integration with File Systems Mimic Client Design: Integration with File Systems open(filename,mode,process_handle) Called when a shared file is opened close(filename), rename(filename), delete(filename) Called when a shared file is closed, renamed, or deleted synch(filename,diff_size) Called when a shared file needs to be synchronized Decides update mode by comparing FAR_size with diff_size Returns Synch_Status SYNCH_FAIL if Mimic synchronization is failed or diff is chosen Updates again on diff mode when Mimic synchronization failed finish() Called when the synchronization process is terminated 2017-12-22
Mimic Server to Window Manager/OS Interface Mimic Server Design: Mimic Server to Window Manager/OS Interface OS Window Manager Mimic Client File System Server Synchronization Native File open ( ) close rename delete finish synch /Synch_Status processToWindowHandle( ) system message getEnvironment( ) trapSystemMessage( ) getClipboard( ) Record Replay playActivity( ) setEnvironment( ) setClipboard( ) executeDefaultApplication( ) waitForProcessIdle( ) 2017-12-22
Initialization Environment synchronization Application synchronization Mimic Server Design: Initialization Environment synchronization Based on the environment descriptor (ED) of the FAR Initial system environment synchronization through setEnvironment() Clipboard content synchronization through setClipboard() Application synchronization Opens a corresponding application of the file activity record Based on the file extension of the filename Sets the same environment such as window size and location through executeDefaultApplication() Moves the system focus to the application When a server is requested to update a file receiving operation records, the server reads the ED of the FAR, and set up the initial system environment at first. Then, it opens the corresponding application, and moves the system focus to the application. When the environment is ready, Mimic reads Ads, and maps them into input system messages. If it reads an MD, it translated it into a system function. Then, Mimic delivers them to the system message queue. 2017-12-22
Replaying System message/function generation Mimic Server Design: Replaying System message/function generation Activity descriptor (AD) is mapped into an input system message Meta or environment descriptor (MD or ED) is mapped into a system function to set the environment through playActivity() System message/function re-execution Deliver the messages to system message queue Run the system functions 2017-12-22
Replaying Speed Control Mimic Server Design: Replaying Speed Control User activity skipping and misinterpretation Certain inputs are relevant to the application only for particular states Too fast replaying may not let the application wait for a particular state, and cause replaying errors Replaying speed control in Mimic Monitors the CPU utilization of the process after every message playback Playbacks the next AD, only when the process is idle through waitForProcessIdle( ) While replaying, user operation can be skipped or misinterpreted. It is because certain inputs are related only to a certain application state. Therefore, too fast playback may not let the application wait enough, and cause errors. In order to prevent this situation, Mimic monitors the CPU utilization of each process, and replays only when the process is idle. 2017-12-22
Experiment Setup Wireless wide area network (WWAN) CDMA2000-1X cellular network Effective data rate: about 17 Kbps Round-trip time between the client and server: about 300ms Operating system/application Microsoft Windows 2000 Professional Microsoft Office 2000 suite Metrics Transfer size Synchronization latency Includes shipping, replaying, and verification delays Compared with the differential update (diff) xDelta for Windows In the experiment, we used WWAN, especially CDMA2000-1x cellular network. Its effective date rate is 17kbps, and rtt is about 300ms. We used Microsoft Windows and Office 2000. We measured the transfer size and synchronization latency, and compared the results with the diff results. 2017-12-22
Transfer Size Results (1) This figure shows the transfer size for MS Word. In the figure, Mimic shows same or better overall performance except the external copies. It is because the external copies depends on the OLE structure of MS, and the structure is not optimized in terms of size. 2017-12-22
Transfer Size Results (2) This figure shows the transfer size for MS Visio. In the figure, Mimic shows better overall performance except the external copies. The reason is the same. 2017-12-22
Transfer Size Results (3) Transfer size in Mimic is generally less than that in diff Except when copying from outside the file Bandwidth-inefficient clipboard structure (OLE) Mimic overhead is generally proportional to activity size Transfer size relies on the size of the copied object Diff overhead is not proportional to activity size Single line insertion in diff may consume more bandwidth than a single paragraph insertion Delete or modify operations in Mimic incurs significantly smaller overheads than those in diff In summary, the transfer size in Mimic is generally equal or less than that in diff except when copying from outside the file. And, it is because of the inefficient OLE structure. On the other hand, Mimic overhead is generally proportional to the activity size. And, There are some exceptions such as external copies. Especially, Delete or modify operations can incur much smaller overheads 2017-12-22
Synchronization Latency Results (1) This figure shows the synchronization latency for MS word. In the figure, Mimic shows worse overall performance except internal copies and deletion. It is because those operations do not need a longer replaying time, so the transmission time is dominant. 2017-12-22
Synchronization Latency Results (2) This figure shows the synchronization latency for MS Visio In the figure, Mimic shows worse overall performance except external copies. And, it is because the excessive transmission time for inefficient OLE objects. 2017-12-22
Synchronization Latency Results (3) Mimic still shows better latency performance for certain activity For small insertions, deletions, internal copies, and meta data changes Even though the latency in Mimic includes its playback time, its total update time does not exceed that of diff Benefit of small transfer size for those operations is larger than playback overhead However, for the other types of activities, diff performs better in terms of latency For large insertions, modifications, and external copies In the previous figures, Mimic performs better in terms of latency for certain activity, small insertions, deletions, internal copying and pastings, and meta data changes It is because the benefit of small transfer size for those operations is larger than playback overhead However, for the other types of activities, Mimic performs worse than diff in latency 2017-12-22
Conclusions We propose an application-unaware and OS-independent approach called Mimic that relies on transferring raw user activity records to the server, where the file is updated through a playback of the raw user activity on the old copy of the file We show that Mimic performs much better than diff in most scenarios in terms of the transfer file sizes We conclude that Mimic can be used in tandem with diff to substantially improve file synchronization performance We propose an application-unaware approach called Mimic, which relies on transferring user activity records to the server, where the file is updated through a playback of the user activity on the old copy of the file We show that Mimic performs much better than diff in most scenarios in terms of the transfer file sizes We conclude that Mimic can be used in tandem with diff to substantially improve file synchronization performance, especially when the bandwidth available on the network connection is low and expensive 2017-12-22
Puzzle Lateral thinking A man lives on the 11th of a building E.g. A man approaches the center of a field, frantically trying to open a package. When he reaches the center of the field, he dies. Why? A man lives on the 11th of a building The elevator is a small one (can accommodate only 1 person) When he goes to work, he takes the elevator to the 1st floor When he comes back from work, he takes the elevator to the 6th floor, and walks up 5 floors When it rains, he takes the elevator up to the 11th floor Why? 2017-12-22