Planning Server Deployments Lesson 20
Skills Matrix
Replication You use replication to put copies of the same data at different locations throughout the enterprise. Common reasons to replicate are: –To move data closer to the user. –To reduce locking conflicts when multiple sites want to work with the same data. –To allow site autonomy so each location can set up its own rules and procedures for working with its copy of the data. –To remove the impact of read-intensive operations, such as report generation and ad hoc query processing, from the OLTP database.
Replication SQL Server uses two strategies for replication: replication itself and distributed transactions managed by the Distributed Transaction Coordinator (DTC). Both strategies keep the copies of the data consistent; they differ in how current each copy is at any given moment. You can also use both strategies in the same environment.
Replication Timing is the main difference between replication and distributed transactions. With distributed transactions, SQL Server keeps your data 100 percent synchronized 100 percent of the time. Replication involves some latency between a change at the source and its arrival at the copies.
Publisher/Subscriber The publisher is the source database where replication begins. –It makes data available for replication. The subscriber is the destination database where replication ends. –It either receives a snapshot of all the published data or applies transactions that have been replicated to itself.
Publisher/Subscriber The distributor is the intermediary between the publisher and subscriber. –It receives published transactions or snapshots from the publisher and then stores and forwards these publications to the subscribers. The publication is the storage container for different articles. –A subscriber can subscribe to an individual article or an entire publication. An article is the data, transactions, or stored procedures that are stored within a publication. –This is the actual information being replicated.
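The store-and-forward role of the distributor can be sketched as a toy model. This is only an illustration of the metaphor, not SQL Server's implementation: the Distributor class, its methods, and the sample publication and article names are all hypothetical, and subscribers are plain dictionaries.

```python
from collections import defaultdict


class Distributor:
    """Toy store-and-forward intermediary (hypothetical, not SQL Server code)."""

    def __init__(self):
        # Published rows waiting to be forwarded, kept in arrival order.
        self.store = []
        # publication name -> list of (subscriber dict, article name or None)
        self.subscriptions = defaultdict(list)

    def subscribe(self, subscriber, publication, article=None):
        # article=None means "subscribe to the entire publication".
        self.subscriptions[publication].append((subscriber, article))

    def receive(self, publication, article, row):
        # The publisher hands a change to the distributor, which stores it.
        self.store.append((publication, article, row))

    def forward(self):
        # Deliver every stored row to each matching subscriber, then clear.
        for publication, article, row in self.store:
            for subscriber, wanted in self.subscriptions[publication]:
                if wanted is None or wanted == article:
                    subscriber.setdefault(article, []).append(row)
        self.store.clear()


distributor = Distributor()
hq_copy, branch_copy = {}, {}
distributor.subscribe(hq_copy, "Sales")                 # whole publication
distributor.subscribe(branch_copy, "Sales", "Orders")   # a single article
distributor.receive("Sales", "Orders", {"id": 1})
distributor.receive("Sales", "Customers", {"id": 7})
distributor.forward()
```

After forward() runs, hq_copy holds rows for both articles, while branch_copy holds only the Orders article it subscribed to.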
New Publication Wizard
Two-phase Commit Two-phase commit (sometimes referred to as 2PC) is a form of real-time distribution in which modifications are made to all involved databases at the same time. Distributed transactions handle this. As with any transaction, either all statements commit successfully or all modifications roll back. Two-phase commit uses the Microsoft DTC to accomplish its tasks. The DTC implements the functionality of a portion of the Microsoft Transaction Server.
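The all-or-nothing behavior of two-phase commit can be sketched in a few lines. This is a minimal in-memory model of the general protocol, not the DTC API: the Participant class, the can_commit flag, and the two_phase_commit function are hypothetical names chosen for illustration.

```python
class Participant:
    """One database taking part in the transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # simulates a vote of yes or no
        self.data = {}                 # committed state
        self.pending = None            # changes staged during phase 1

    def prepare(self, changes):
        # Phase 1: stage the changes and vote.
        if not self.can_commit:
            return False
        self.pending = dict(changes)
        return True

    def commit(self):
        # Phase 2 (success): make the staged changes permanent.
        self.data.update(self.pending)
        self.pending = None

    def rollback(self):
        # Phase 2 (failure): discard the staged changes.
        self.pending = None


def two_phase_commit(participants, changes):
    """Return True if every participant committed, False if all rolled back."""
    prepared = []
    for participant in participants:
        if participant.prepare(changes):
            prepared.append(participant)
        else:
            # One "no" vote aborts the transaction everywhere.
            for staged in prepared:
                staged.rollback()
            return False
    for participant in participants:
        participant.commit()
    return True


a, b = Participant("A"), Participant("B")
ok = two_phase_commit([a, b], {"balance": 100})

c, d = Participant("C"), Participant("D", can_commit=False)
failed = two_phase_commit([c, d], {"balance": 100})
```

In the first call both participants end up with the change; in the second, participant D votes no, so C discards its staged change and neither database is modified.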
Replication Factors Autonomy: This refers to how much independence you want to give each subscriber with regard to the replicated data. Latency: This refers to the time lag between updates on the subscriber. Transactional consistency: This refers to whether transactions must be applied all at the same time and in a specific order. Although several types of replication exist, the most common method moves transactions from the publisher through the distributor and on to the subscriber.
Types of Replication Snapshot replication - Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for updates to the data. Transactional replication - A type of replication that typically starts with a snapshot of the publication database objects and data. Merge replication - A type of replication that allows sites to make autonomous changes to replicated data, and at a later time, merge changes and resolve conflicts when necessary.
Distribution Types Distributed transactions Transactional replication Transactional replication with immediate updating subscribers Snapshot replication Snapshot replication with immediate updating subscribers Merge replication Queued updating
Queued Updating With transactional and snapshot replication, you can also configure queued updating. Like the immediate updating subscribers option, this gives your users the ability to make changes to the subscription database. But unlike immediate updating subscribers, queued updating stores changes until the publisher can be contacted. This can be extremely useful in networks where subscribers are not always connected or the connection is unreliable.
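The idea of holding subscriber changes until the publisher is reachable can be sketched as a simple queue. This is a toy model under stated assumptions: the QueuedSubscriber class, its connected flag, and the dictionary standing in for the publisher are hypothetical, and conflict handling is left out entirely.

```python
from collections import deque


class QueuedSubscriber:
    """Toy subscriber that queues its changes while offline (hypothetical)."""

    def __init__(self, publisher_rows):
        self.publisher_rows = publisher_rows    # dict standing in for the publisher
        self.local_rows = dict(publisher_rows)  # the subscription database
        self.queue = deque()                    # changes not yet sent
        self.connected = False

    def update(self, key, value):
        # The change is applied locally right away and queued for the publisher.
        self.local_rows[key] = value
        self.queue.append((key, value))
        if self.connected:
            self.flush()

    def flush(self):
        # Send queued changes to the publisher in the order they were made.
        while self.queue:
            key, value = self.queue.popleft()
            self.publisher_rows[key] = value


publisher = {"price": 10}
subscriber = QueuedSubscriber(publisher)
subscriber.update("price", 12)        # offline: change is only queued
queued_while_offline = len(subscriber.queue)
price_before_flush = publisher["price"]

subscriber.connected = True           # the connection comes back
subscriber.flush()
```

While the subscriber is offline the publisher still sees the old value; once the connection returns, flush() delivers the queued change.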
Subscriptions When you set up your subscribers, you can create either pull or push subscriptions.
Push Subscriptions Push subscriptions help centralize your administrative duties because the subscription itself is stored on the distribution server. In other words, the data can be pushed to the subscribers based on the publisher’s schedule. Push subscriptions are most useful if a subscriber needs to be updated whenever a change occurs at the publisher. The publisher knows when the modification takes place, so it can immediately push those changes to the subscribers.
Pull Subscriptions Pull subscriptions are configured and maintained at each subscriber. The subscribers will administer the synchronization schedules and can pull changes whenever they consider it necessary. This type of subscriber also relieves the distribution server of some of the overhead of processing. Pull subscriptions are also useful in situations in which security is not a primary issue.
Replication Agents Five replication agents handle the tasks of moving data from the publisher to the distributor and on to the subscribers. –Log reader agent –Distribution agent –Snapshot agent –Merge agent –Queue reader agent
Merge Replication When you use merge replication, the merge agent can be centrally located on the distributor, or it can reside on every subscriber involved in the merge replication process. When you have implemented push replication, the merge agent will reside on the distributor. In a pull scenario, the merge agent is on every subscriber.
Conflict Resolution in Merge Replication Performing updates to the same records at multiple locations causes conflicts. To resolve these conflicts, SQL Server uses the MSmerge_contents table and some settings from the publication itself. When you first create a merge publication, you can use the conflict resolver with three levels of resolution tracking in a merge publication: –Row-level tracking –Column-level tracking –Logical record-level tracking
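The difference between row-level and column-level tracking can be sketched as a conflict check. This is a simplified illustration, not the merge agent's logic: the function names and the dictionary row format are hypothetical, and logical record-level tracking (which treats related rows as one unit) is not modeled.

```python
def changed_columns(base, edited):
    """Return the set of column names whose values differ from the base row."""
    return {column for column in base if base[column] != edited[column]}


def is_conflict(base, publisher_row, subscriber_row, tracking="row"):
    """Decide whether two edits of the same row conflict.

    tracking="row":    any change on both sides to the same row conflicts.
    tracking="column": only changes to the same column conflict.
    """
    publisher_changes = changed_columns(base, publisher_row)
    subscriber_changes = changed_columns(base, subscriber_row)
    if not publisher_changes or not subscriber_changes:
        return False  # only one side changed the row, so nothing to resolve
    if tracking == "row":
        return True
    if tracking == "column":
        return bool(publisher_changes & subscriber_changes)
    raise ValueError("tracking must be 'row' or 'column'")


base = {"name": "Ann", "phone": "555-0100"}
at_publisher = {"name": "Anne", "phone": "555-0100"}   # name changed
at_subscriber = {"name": "Ann", "phone": "555-0199"}   # phone changed

row_level = is_conflict(base, at_publisher, at_subscriber, tracking="row")
column_level = is_conflict(base, at_publisher, at_subscriber, tracking="column")
```

Here the two sites edited different columns of the same row, so the edit counts as a conflict under row-level tracking but not under column-level tracking.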
Snapshot Replication When you use snapshot replication, an entire copy of the publication moves from the publisher to the subscriber. Everything on the subscriber database is overwritten, allowing for autonomy as well as transactional consistency, because all changes are made at once. Latency for this type of replication can be as high as you choose, because it depends on how often the snapshot is refreshed. When you use snapshot replication, there is no merge agent. Snapshot replication uses the distribution agent.
Transactional Replication When you use transactional replication, only the changes (transactions) made to the data are moved. Before these transactions can be applied at a subscriber, however, the subscriber must have a copy of the data as a base. Because of its speed and relatively low overhead on the distribution server, transactional replication is currently the most commonly used form of replication. Generally, data on the subscriber is treated as read-only, unless you are implementing transactional replication with immediate updating subscribers.
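The sequence of a base copy followed by ordered changes can be sketched as follows. This is a toy model only: the function names and the (operation, key, row) log format are hypothetical and do not reflect how the log reader or distribution agents store transactions.

```python
def apply_snapshot(publisher_table):
    """Give the subscriber its base copy: a full copy of the published rows."""
    return {key: dict(row) for key, row in publisher_table.items()}


def apply_transactions(subscriber_table, log):
    """Apply replicated changes to the subscriber in the order they occurred."""
    for operation, key, row in log:
        if operation in ("insert", "update"):
            subscriber_table[key] = dict(row)
        elif operation == "delete":
            subscriber_table.pop(key, None)
        else:
            raise ValueError("unknown operation: " + operation)
    return subscriber_table


publisher_table = {1: {"item": "widget", "qty": 5}}
subscriber_table = apply_snapshot(publisher_table)   # base copy first

# Only the changes made after the snapshot are moved.
log = [
    ("update", 1, {"item": "widget", "qty": 4}),
    ("insert", 2, {"item": "gadget", "qty": 9}),
    ("delete", 1, None),
]
apply_transactions(subscriber_table, log)
```

Because the changes are applied in order, the subscriber ends with only row 2, matching what the publisher would hold after the same three operations.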
Publication Issues Before you start your replication process, you should consider a few more topics, including data definition issues, IDENTITY column issues, and some general rules involved when publishing. Keep the following data definition items in mind when you are preparing to publish data: –Timestamp data types –Identity values –User-defined data types –Not for replication
Tips for Distribution Servers Here are some tips to keep in mind when selecting a machine to be the distributor: –Ensure you have enough hard disk space for the Distribution working folder and the distribution database. –You must manage the distribution database’s transaction log carefully. –The distribution database will store all transactions from the publisher to the subscriber.
Tips for Distribution Servers –Snapshots and merge data are stored in the Distribution working folder. –Be aware of the size and number of articles being published. –Text, ntext, and image datatypes are replicated only when you use a snapshot. –A higher degree of latency can significantly increase your storage space requirements. –Know how many transactions per synchronization cycle there are.
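The link between latency and storage in the tips above can be made concrete with rough arithmetic. The formula below is an assumption made for illustration (transactions per cycle times average transaction size times the number of cycles held before delivery), and the function name and sample figures are hypothetical; real sizing also depends on retention settings and snapshot files.

```python
def estimate_distribution_bytes(tx_per_cycle, avg_tx_bytes, cycles_retained):
    """Rough estimate of bytes held in the distribution database.

    Assumes every transaction is kept until its cycle has been delivered.
    """
    return tx_per_cycle * avg_tx_bytes * cycles_retained


# Hypothetical workload: 10,000 transactions per cycle, 500 bytes each.
low_latency = estimate_distribution_bytes(10_000, 500, cycles_retained=1)
high_latency = estimate_distribution_bytes(10_000, 500, cycles_retained=24)
```

Under these assumptions, holding 24 cycles instead of 1 multiplies the space needed by 24, which is why the number of transactions per synchronization cycle is worth knowing before you size the distributor.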
Replication Models You can use one of several models for each replication process that you implement: –Central publisher/central distributor –Remote distribution –Central subscriber/multiple publishers –Multiple publishers/multiple subscribers
Central Publisher Model
Remote Distribution
Central Subscriber/Multiple Publishers
Multiple Publishers/Multiple Subscribers Use this model when you need to maintain a single table on multiple servers. Each server subscribes to the table and also publishes the table to other servers. This model can be particularly useful in the following business situations: –Reservations systems –Regional order-processing systems –Multiple warehouse implementations
Heterogeneous Replication Heterogeneous database replication allows you to replicate data to non-Microsoft database servers, including databases reached across the Internet. Heterogeneous replication occurs when you publish to other databases through an OLE DB connection.
Heterogeneous Replication When you publish to these non–SQL Server subscribers, you need to keep the following rules in mind: –Only push subscriptions are supported. –You can publish indexed views as tables; they cannot be replicated as indexed views. –Snapshot data will be sent using bulk copy’s character format. –Datatypes will be mapped as closely as possible.
Heterogeneous Replication –The account under which the distribution agent runs must have read access to the install directory of the OLE DB provider. –If an article is added to or deleted from a publication, subscriptions to non–SQL Server subscribers must be reinitialized. –NULL and NOT NULL are the only constraints supported for all non–SQL Server subscribers. –Primary key constraints are replicated as unique indexes.
Replication over the Internet Replicating data over the Internet lets remote, disconnected users retrieve data that is temporarily stored, or “parked,” on an FTP site whenever they connect to the Internet. Replicate data over the Internet using: –A Virtual Private Network (VPN). –The Web synchronization option for merge replication.
Installing and Using Replication To successfully install and enable replication, you must install a distribution server, create your publications, and then subscribe to them. Before any of this can take place, you must first configure SQL Server. To install your replication scenario, you must be a member of the sysadmin fixed server role.
Installing and Using Replication Before you can configure your SQL Server for replication, the computer itself must meet the following requirements: –All servers involved with replication must be registered in Management Studio. –If the servers are from different domains, Active Directory trust relationships must be established before replication can occur. –Any account you use must have access rights to the Distribution working folder on the distribution server.
Installing and Using Replication Use a single Windows domain user account for all your SQL Server Agents. –Do not use a LocalSystem account because this account has no network capabilities and will not, therefore, allow replication. –Also, you need to make the account a member of the Domain Administrators group because only administrators have access to the system ($) shares.
Installing a Distribution Server Before you can enable a publication database, you must be a member of the sysadmin fixed server role. Once you have enabled publishing, any member of that database’s db_owner role can create and manage publications.
Adding a Publication The New Publication Wizard allows you to specify the following options: –Number of articles –Schedule for the snapshot agent –Whether to maintain the snapshot on the distributor –Tables and stored procedures you want to publish –Publications that will share agents –Whether to allow updating subscribers –Whether to allow pull subscriptions
Creating a Subscription As part of the process of creating a subscription, you will be able to specify the publishers you want to subscribe to and a destination database to receive the published data, verify your security credentials, and set up a default schedule.
Testing Replication You can now verify that replication is running properly.
Replication Monitor The Replication Monitor gathers replication information about the different replication agents. This includes the agent history, with information about inserts, updates, deletes, and any other transactions that were processed. Through the Replication Monitor, you can also edit the various schedules and properties of the replication agents.
Replication Scripts Now that you have replication set up and working properly, you may want to save all your hard work in the form of a replication script.
Replication Scripts Scripting your replication scenario has the following advantages: –You can use the scripts to track different versions of your replication implementation. –You can use the scripts (with some minor tweaking) to create additional subscribers and publishers with the same basic options. –You can quickly customize your environment by modifying the script and then rerunning it. –You can use the scripts as part of your database recovery process.
Replication Scripts From here you can script the distributor and publications for the various replication items stored with this distribution server. You can also script the options for any subscribers and even the replication jobs. When you have made your choices, just click the Script to File button and save the script wherever you like.
Replication Resources Replication requires considerable memory and processor resources. You can perform a number of tweaks to increase the performance of your replication scheme: –Set a minimum memory allocation limit. –Use a separate hard disk for all the databases used in replication.
Replication Resources –Use multiple processors. –Publish only the amount of data required. –Place the snapshot folder on a drive that does not have database or log files. –Be sparing with horizontal partitioning. –Use a fast network. –Run agents continuously rather than on a frequent schedule.
Summary Replication is a powerful tool for distributing data to other database engines in your enterprise. Distributing the data puts it closer to your users, which makes it faster and easier for them to access.
Summary Microsoft uses a publisher/subscriber metaphor to explain replication. The publisher contains the data that needs to be copied. The subscribers get a copy of the data from the publisher, and the distributor moves the data from the publisher to the subscribers. The data are published in groups called publications; a publication can contain several articles, which are the actual data being replicated.
Summary You can choose from three main types of replication: merge, transactional, and snapshot. Each has pros and cons, but you should consider three main issues when picking a replication type: autonomy, latency, and consistency. In other words, you need to know whether the data has to be replicated right away or whether it can be late (latency). You need to know whether subscribers can update the data (autonomy); and you need to know whether the transactions need to be applied all at the same time and in a specific order (consistency).
Summary When you have picked the right type of replication, you have a number of physical models to choose from: –Central publisher/central distributor –Remote distribution –Central subscriber/multiple publishers –Multiple publishers/multiple subscribers –Heterogeneous replication
Summary Once you have implemented a replication solution, you need to back it up. –You should back up all the databases involved in replication, but especially the distributor, because if you do not, the transaction log in the distribution database will fill up and replication will stop. You should also generate replication scripts so that if your server ever suffers a catastrophic failure, you will be able to rebuild the replication solution much faster.
Summary Also keep in mind all the points for enhancing replication performance. Once you have implemented replication, your users will come to depend on it, and if it doesn’t move fast enough, it will not be dependable, and your users will not be happy. If you keep it in top shape, though, users will be able to take full advantage of the power of replication.
Summary for Certification Examination Know the publisher/subscriber metaphor: Publishers contain the original copy of the data where changes are made. Subscribers receive copies of the data from the publishers. The data are disseminated to the subscribers through the distributor.
Summary for Certification Examination Know the types of replication: Three basic types of replication exist: snapshot, transactional, and merge. In transactional replication, transactions are read right from the transaction log and copied from the publisher to the subscribers. In snapshot replication, the entire publication is copied every time the publication is replicated. In merge replication, data from the publisher is merged with data from the subscribers, which are allowed to update. With the immediate updating subscribers and queued updating options, subscribers can also make changes to data that has been replicated with transactional and snapshot replication.
Summary for Certification Examination Know the replication models: You need to be familiar with the various models—that is, who publishes, who subscribes, and who distributes. In the central publisher/central distributor model, a single server is both the publisher and distributor, and there are multiple subscribers. The remote distribution model has one publishing server, a separate distributor, and multiple subscribers.
Summary for Certification Examination In the central subscriber/multiple publishers model, multiple publishers all publish to a single subscribing server. The multiple publishers/multiple subscribers model contains multiple publishing servers and multiple subscribing servers. The number of distributors is undefined. Heterogeneous replication describes replication to a third-party database engine, such as DB2 or Oracle.
Summary for Certification Examination Understand how publications and articles work: A publication comprises articles that contain the data being replicated. An article is actually a representation of a table. The article can be partitioned either vertically or horizontally, and it can be transformed.
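Vertical and horizontal partitioning of an article can be sketched as two small filters. This is an illustration of the concept only: the function names, the sample table, and the region column are hypothetical, and SQL Server defines these filters on the article rather than in application code.

```python
def vertical_partition(rows, columns):
    """Keep only the listed columns of every row (a column filter)."""
    return [{column: row[column] for column in columns} for row in rows]


def horizontal_partition(rows, predicate):
    """Keep only the rows that satisfy the predicate (a row filter)."""
    return [row for row in rows if predicate(row)]


customers = [
    {"id": 1, "name": "Ann", "region": "East", "credit_limit": 5000},
    {"id": 2, "name": "Bob", "region": "West", "credit_limit": 2000},
]

# Vertical: publish every customer, but leave out the credit limit.
public_columns = vertical_partition(customers, ["id", "name", "region"])

# Horizontal: publish only the rows a western office needs.
west_only = horizontal_partition(customers, lambda row: row["region"] == "West")
```

The vertical partition still contains both customers with fewer columns, while the horizontal partition contains all columns for the single West row.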