Interpret the execution mode of SQL query in F1 Query paper

CHUNG-HAO-WEI, CHEN-YU-PING
Department of Computer Science and Information Engineering, Chung Cheng University, Chiayi

Introduction
F1 is a distributed database developed by Google that has drawn much attention. How to implement a distributed query engine is one of the most closely watched issues in database research, and the F1 group published a paper on it at VLDB 2018. F1 originated as the distributed SQL query engine of Google AdWords; paired with the Spanner distributed database, it opened a new generation of NewSQL. F1 Query is a distributed SQL execution engine that supports queries over many different data sources, which can be relational databases or column-oriented formats such as Parquet.

Centralized Execution
Centralized Execution corresponds to OLTP queries such as single-table lookups and indexed joins. Such queries are simple enough to execute in milliseconds, so the best way to handle them is to execute them immediately on the machine that receives the request. F1 Query consists of the F1 Server and the F1 Worker. The F1 Server is responsible for receiving client requests. If it determines that a query should run in Centralized Execution rather than Distributed Execution, it executes the query immediately and returns the results. This behavior is consistent with most OLTP databases: the request is ultimately handled by a single thread.

Distributed Execution
F1 adopts Distributed Execution for more complex queries, such as joins or aggregations without an index. Most OLAP queries, especially ad hoc queries, fall into this category. In this mode the F1 Server works only as a coordinator, assigning tasks to multiple Workers until all of their tasks are completed. The most similar model is Presto, a distributed SQL engine developed by Facebook. Unlike Centralized Execution, operators in Distributed Execution can have multiple instances executing in parallel. Each instance is responsible for part of the data.
One such slice of data is called a Fragment, which Presto calls a split.

Batch Execution
F1 Query also has a unique mode called Batch Execution. This mode is aimed at larger amounts of data and longer query times. Its results are no longer returned to the client but written to a specified location such as Colossus. The Presto model has no fault tolerance, which is fatal for long-running batch tasks because a longer run increases the probability of query failure. Therefore, the primary issue a batch query must solve is fault tolerance: it must be able to recover from the failure of a Worker node. There are two ways to solve this problem.
MapReduce: divides the calculation into several stages and persists the intermediate results to a distributed file system such as HDFS.
Spark RDD: records lineage information; when a node fails, it recovers the lost data fragments by recalculation.
F1 Query chooses to spill to disk when there is not enough memory, whereas the intermediate results of MapReduce must be written to the external file system. Compared with Distributed Execution, each step in Batch Execution persists to the external file system.

Results and conclusions
Judging from the paper, F1 Query is not yet a perfect, mature system. It is positioned more as a glue system that meets business needs than as a technology like Spanner. It pursues only "good enough", and there is still room for improvement. Two points are the core competitiveness of F1 Query:
1. F1 Query hopes to solve all OLTP, OLAP, and ETL requirements with one system.
2. F1 Query accesses the various data formats in the data center with one system.
An ideal database should hide the details of query execution: when the user enters a declarative statement (such as SQL), the query is executed in an optimal way to return results. F1 Query makes a good attempt to combine multiple execution models into one system, achieving optimal execution performance at lower cost.

Figure 2.
Thread pool processing model of MySQL. There are multiple Thread Groups; Figure 1 describes one of them.

Materials and methods
F1 Query defines three different query execution modes. User queries are divided into three classes according to the data size or execution time of the query:
1. Centralized Execution
2. Distributed Execution
3. Batch Execution
The first two are interactive, meaning the client waits for the result to be returned. The last one, Batch Execution, is more like ETL: after the client submits the task, the client no longer manages it, and the query results are written to the specified location.

Figure 3. The F1 physical plan in batch mode is the same as in Distributed Execution. The difference is that in batch mode it is divided into a series of MapReduce tasks, which are handed over to the scheduler for processing.

Literature cited
F1 Query: Declarative Querying at Scale
MySQL Thread Pool Implementation

Figure 1. The F1 system architecture: F1 Master, F1 Server, and F1 Worker. Other catalog components, such as DF Server and Batch Metadata, are used to store query metadata.

Acknowledgments
We thank CHEN-YU-PING for collecting information and CHUNG-HAO-WEI for consolidation.

Figure 4. Batch mode service framework

For further information
See the Technical Writing and Communication in English class for more information.
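
As a rough illustration of the three execution modes listed under Materials and methods, the following Python sketch classifies a query by estimated data size and execution time. It is not the selection logic from the F1 Query paper: the names QueryEstimate, choose_mode, and is_interactive, and both thresholds, are assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    CENTRALIZED = "centralized"   # runs on the server that received the request
    DISTRIBUTED = "distributed"   # server coordinates multiple workers
    BATCH = "batch"               # results are written to a specified location


@dataclass
class QueryEstimate:
    """Hypothetical planner estimates for one query."""
    rows_scanned: int
    expected_seconds: float


# Hypothetical thresholds, chosen only for illustration.
CENTRALIZED_MAX_ROWS = 10_000
INTERACTIVE_MAX_SECONDS = 3600.0


def choose_mode(est: QueryEstimate) -> Mode:
    """Pick an execution mode from estimated data size and run time."""
    if est.expected_seconds > INTERACTIVE_MAX_SECONDS:
        return Mode.BATCH
    if est.rows_scanned <= CENTRALIZED_MAX_ROWS:
        return Mode.CENTRALIZED
    return Mode.DISTRIBUTED


def is_interactive(mode: Mode) -> bool:
    """The first two modes are interactive: the client waits for the result."""
    return mode is not Mode.BATCH
```

For example, choose_mode(QueryEstimate(rows_scanned=500, expected_seconds=0.01)) falls under Centralized Execution, while a large multi-hour job falls under Batch Execution and is not interactive. A real system would also weigh plan shape, such as whether a join can use an index.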
