Introduction to tds_fdw

Introduction to tds_fdw
By Geoff Montee

Heterogeneous Data Sources
It is common for applications to reply on multiple types of database systems. For example: The application is being migrated from one database system to another. The application uses different database systems for different types of data.

Heterogeneous Data Sources
Can they exchange data? PostgreSQL Microsoft SQL Server

Heterogeneus Data Sources -
Until Recently Can they communicate? PostgreSQL Microsoft SQL Server Transformation

Heterogeneous Data Sources –
Foreign Data Wrappers tds_fdw PostgreSQL Microsoft SQL Server

What is FDW? FDW stands for Foreign Data Wrapper.
A Foreign Data Wrapper is a plugin that allows PostgreSQL to pull table data from some foreign source (such as another database server, a flat file, a web service, etc.). PostgreSQL support read-only foreign data wrapper. PostgreSQL 9.3+ added write support.

What is TDS? TDS stands for Tabular Data Stream.
Protocol used by MS SQL Server and Sybase database servers. Different versions of the protocol are used by different products and support different features. FreeTDS is an open source library that implements the TDS protocol.

What is tds_fdw? tds_fdw is a foreign data wrapper that allows PostgreSQL to connect to foreign servers over the TDS protocol. Foreign tables can be defined in PostgreSQL to pull data from the foreign server. Supports: Microsoft SQL Server (all versions) Microsoft Azure SQL Database Sybase Adaptive Server Enterprise (ASE) Sybase IQ Possibly others that I’m less familiar with. (e.g. Sybase SQL Anywhere)

tds_fdw features tds_fdw 1.0 supports PostgreSQL 9.1+.
2.0 supports WHERE and column push-downs. No write support yet. Setting tds_version allows Postgres to connect to different products. (e.g. see this: Foreign table data source can be: a table on the foreign server a query that’s executed on the foreign server

tds_fdw project Development is not backed by any company.
Me. I currently work for MariaDB though, so PostgreSQL work mainly happens on the weekends. Julio González Gil maintains a Jenkins CI server for the project. Community involvement: Received a few patches. Lots of GitHub issues. Bug reports are much appreciated!

Installation Step 1: Install PostgreSQL. e.g.:
sudo yum install -y sudo yum install -y postgresql95 postgresql95-server postgresql95-libs postgresql95-devel sudo /usr/pgsql-9.5/bin/postgresql95-setup initdb sudo systemctl start postgresql-9.5.service

Installation Step 2: Install FreeTDS. Requires EPEL on RHEL/CentOS. e.g.: sudo yum install -y sudo yum install -y freetds freetds-devel

Installation Step 3: Build tds_fdw. e.g.:
wget --output-document=tds_fdw alpha.1.tar.gz tar -xvzf tds_fdw alpha.1.tar.gz cd tds_fdw alpha.1 PATH=/usr/pgsql-9.5/bin:$PATH make USE_PGXS=1 sudo PATH=/usr/pgsql-9.5/bin:$PATH make USE_PGXS=1 install

Using tds_fdw There are three different types of objects to use in tds_fdw: Foreign server: A foreign server is your MS SQL Server or Sybase database server. User mapping: A user mapping maps a local PostgreSQL role to a role on the foreign server. Foreign table: A foreign table’s data gets pulled from the foreign server either via a physical table or the execution of a query.

Creating a foreign server
Simple: CREATE SERVER mssql_svr FOREIGN DATA WRAPPER tds_fdw OPTIONS ( servername 'sqlserver01.mydomain.com', port '1433', database 'tds_fdw_test', tds_version '7.1'); TDS protocol versions 7.0 and above are used by MS SQL Server and Azure SQL Database.

Creating a user mapping
Also simple: CREATE USER MAPPING FOR postgres SERVER mssql_svr OPTIONS ( username 'sa', password 'mysapassword'); A PostgreSQL role requires a user mapping in order to login to the foreign server.

Creating a foreign table
Do you want to query a foreign table directly? Then define the table attribute. Do you want your table to reflect the results of a query executed on the foreign server? Then define the query attribute. If you are using MS SQL Server, then you can set row_estimate_method to showplan_all to use statistics from MS SQL Server’s SET SHOWPLAN ALL in query plan calculations.

Creating a foreign table – From a remote table
Let’s say that your table on a foreign server looks like this (on MS SQL Server): CREATE TABLE dbo.tab ( id int primary key, value nvarchar(50), submitted datetime2 ); The foreign table would look like this: CREATE FOREIGN TABLE mssql_tab ( id int, value varchar(50), submitted timestamptz ) SERVER mssql_svr OPTIONS ( table 'dbo.tab', row_estimate_method 'showplan_all');

Creating a foreign table – From a remote query
Let’s say that your table on a foreign server looks like this (on MS SQL Server): CREATE TABLE dbo.user_data ( id int primary key, value nvarchar(50), owner nvarchar(50) ); And you want your PostgreSQL table to be the results of this query: SELECT id, value, owner FROM dbo.user_data WHERE owner = 'geoff'

Creating a foreign table – From a remote query
The foreign table would look like this: CREATE FOREIGN TABLE my_mssql_data ( id int, value varchar(50), owner varchar(50) ) SERVER mssql_svr OPTIONS ( query 'SELECT id, value, owner FROM dbo.user_data WHERE owner = ''geoff''', row_estimate_method 'showplan_all'); The “table” and “query” parameters cannot be used at the same time.

row_estimate_method row_estimate_method is a foreign table option that controls how tds_fdw estimates the number of rows that will be in a given table’s or query’s result set. This estimation is used by the planner. The option that has two possible values: execute: Execute the query on the remote server, so that PostgreSQL knows the exact number of rows. showplan_all: Use Microsoft SQL Server’s “SET SHOWPLAN_ALL ON” to estimate the number of rows. No optimization for Sybase yet. If use_remote_estimate is disabled, then only local estimates are used. A local estimate can be defined with local_tuple_estimate.

SHOWPLAN_ALL Let’s say that I executed the following on Microsoft SQL Server… SET SHOWPLAN_ALL ON; SELECT * FROM dbo.user_data;

SHOWPLAN_ALL Then I would get results like the following:
StmtText,StmtId,NodeId,Parent,PhysicalOp,LogicalOp,Argument,DefinedValues,EstimateRows,EstimateIO,EstimateCPU,AvgRowSize,TotalSubtreeCost,OutputList,Warnings,Type,Parallel,EstimateExecutions SELECT * FROM dbo.user_data;,1,1,0,NULL,NULL,1,NULL,3,NULL,NULL,NULL, ,NULL,NULL,SELECT,0,NULL |--Clustered Index Scan(OBJECT:([tds_fdw_jenkins].[dbo].[user_data].[PK__user_dat__3213E83F668CC6F2])),1,2,1,Clustered Index Scan,Clustered Index Scan,OBJECT:([tds_fdw_jenkins].[dbo].[user_data].[PK__user_dat__3213E83F668CC6F2]),[tds_fdw_jenkins].[dbo].[user_data].[id], [tds_fdw_jenkins].[dbo].[user_data].[value], [tds_fdw_jenkins].[dbo].[user_data].[owner],3, , ,117, ,[tds_fdw_jenkins].[dbo].[user_data].[id], [tds_fdw_jenkins].[dbo].[user_data].[value], [tds_fdw_jenkins].[dbo].[user_data].[owner],NULL,PLAN_ROW,0,1 Only “EstimateRows” is used by tds_fdw at the moment.

Recent noteworthy features/fixes
Added WHERE and column push-downs in version 2.0. Fixed some large memory leaks. Improved performance by 50% by starting to use raw data for some data types, rather than relying entirely on FreeTDS’s string conversions. Changed row estimates to use double instead of int, allowing tds_fdw to handle more than INT_MAX rows. Made datetime conversions locale-independent in PostgreSQL 9.4+, thanks to make_timestamp().

WHERE and column push-downs
Let’s say that you have a table with a lot of rows, and some large columns, that looks like this: CREATE FOREIGN TABLE mssql_tab ( id int, small_col varchar(50), large_col varchar(50000), submitted timestamptz ) SERVER mssql_svr OPTIONS ( table 'dbo.tab', row_estimate_method 'showplan_all'); Let’s say that you only wanted 1 column from 1 row. E.g.: SELECT small_col FROM mssql_tab WHERE id = 1;

WHERE and column push-downs
tds_fdw 1.0 would fetch the whole table, and then filter results locally. e.g. it would execute the following: SELECT * FROM dbo.tab; tds_fdw 2.0 only fetches the rows that are needed by pushing down columns and WHERE clauses. e.g. it would execute the following: SELECT small_col FROM dbo.tab WHERE id=1;

New variables Added some variables to help show how tds_fdw’s memory grows. tds_fdw.show_before_row_memory_stats tds_fdw.show_after_row_memory_stats tds_fdw.show_finished_memory_stats Calls MemoryContextStats() at different points in the code to show memory usage at a given time. Useful for detecting memory leaks. Use SET to set them: postgres=# SET tds_fdw.show_before_row_memory_stats=1; SET postgres=# SET tds_fdw.show_after_row_memory_stats=1; postgres=# SET tds_fdw.show_finished_memory_stats=1;

New variables After executing the query, data like this gets printed to the error log: Showing memory statistics before row 1. ExecutorState: 8192 total in 1 blocks; 2528 free (2 chunks); 5664 used printtup: 0 total in 0 blocks; 0 free (0 chunks); 0 used tds_fdw data: 8192 total in 1 blocks; 7560 free (0 chunks); 632 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used Showing memory statistics after row 1. ExprContext: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used Showing memory statistics before row 2. ExecutorState: 8192 total in 1 blocks; 2256 free (2 chunks); 5936 used printtup: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used

Features to implement next
Write support JOIN push-down row_estimate_method optimized for Sybase

Questions?

Introduction to tds_fdw http://lfnw.org/node/3906

Introduction to tds_fdw

Similar presentations

Presentation on theme: "Introduction to tds_fdw"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Introduction to tds_fdw

Similar presentations

Presentation on theme: "Introduction to tds_fdw"— Presentation transcript:

Similar presentations

About project

Feedback