Transactional Replication and Transaction Log Management

As you know by now, Transactional Replication uses the transaction log of the published database.

But wait! Do I need to run my database in the Full recovery model? What about checkpoints? The beauty of this is that you do not need to run your database in the Full recovery model. Once replication is up and configured, no transactions for the published articles will be lost. How does it work?

Let's assume we published a table, we will be replicating INSERTs/UPDATEs/DELETEs, and we created a subscription (a publication needs at least one active subscription in order to start picking up data for replication). From now on, ALL transactions for that table will be marked in the transaction log as "for replication". Each transaction will be kept in the log until the Log Reader has read it and put it into the distribution database. If for some reason the Log Reader is failing or stopped, the log will keep growing, growing and growing... until you finally run out of space. In that case no log backups, no checkpoints, no switching the recovery model around will help. OK, one thing will help... dropping the entire replication on the published database :) But this is not a solution for us. You need to keep in mind that the Log Reader agent is crucial and it needs to run at all times. Any errors on that agent can affect your published database and the entire instance. Keep monitoring that job/agent.
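A quick way to confirm that replication is what is holding your log hostage is to check `log_reuse_wait_desc` (the database name below is just a placeholder):

```sql
-- If the Log Reader has not yet read marked transactions, this returns
-- REPLICATION - and no log backup or checkpoint will free that space.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MyPublishedDb';  -- hypothetical published database name
```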

Sequential reads

The transaction log is sequential, the Log Reader reads data sequentially, and transactional replication is sequential... That means various things or actions can affect our replication. Pretty much everything that touches the transaction log heavily will impact it.

Index Maintenance/Checkdb?

It will affect your replication mostly when you are using online rebuilds. An index rebuild does most of its work in the transaction log, which will keep growing. Even though replication really CANNOT replicate these "transactions", the Log Reader still needs to read the entire transaction log from the beginning. Let's assume that after a large index rebuild your transaction log is 10GB in size... now your Log Reader needs to go through and check the entire log in order to find all transactions that are marked for replication. You either need to be very careful with rebuilds on the published database (add WAITFOR DELAY between rebuilds and let the Log Reader read the log, etc.) or you and the business accept the replication delay during index maintenance. You will often get a message like this in the Log Reader:

[Screenshot: LogReader_Indexrebuild]

As you can see, it scans the log and returns a message that no transactions are marked for replication.

One thing you can tweak is the Log Reader Agent profile. You can increase -LogScanThreshold and -ReadBatchSize. The sweet spot? Start by adding a 0 to each value and see if it helps in your situation. You should also increase -LoginTimeout and -QueryTimeout, as in some cases your connection can break due to a timeout and the Log Reader will need to start from the beginning, and we DO NOT want that.
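These profile parameters can also be changed with T-SQL on the distributor. A rough sketch (the profile id below is hypothetical; list your actual profiles first, and note that stored parameter names may or may not carry the leading dash depending on version, so check MSagent_parameters):

```sql
-- List Log Reader agent profiles (agent_type 9 = Log Reader).
EXEC sp_help_agent_profile @agent_type = 9;

-- Change a parameter on a user-defined profile.
EXEC sp_change_agent_parameter
    @profile_id      = 10,              -- hypothetical: your custom profile id
    @parameter_name  = N'ReadBatchSize',
    @parameter_value = N'5000';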

Mirroring

I know that at some point mirroring will be replaced, but it still needs to be mentioned. Let's assume the database you published is also mirrored... If for any reason mirroring falls behind (slow connection, errors, etc.), your replication will do the same. Transactional replication (the Log Reader, to be exact) will wait for mirroring to catch up and will read only the part of the transaction log that has already been sent to the secondary by mirroring.

Everything ELSE:

The problem with transactional replication is that even large updates on tables that ARE NOT published can affect it. The updates will still go to the transaction log, and the Log Reader will still need to scan it.

Conclusion:

You do not need the Full recovery model in order to set up transactional replication, and replication won't lose any data because of it. You need to be aware of anything heavy happening on your database that touches the transaction log. Keep monitoring your transaction log and the Log Reader.

 

How to setup Transactional Replication? Part 1

Distributor:

The main part (and first step) of the setup is configuring the Distributor. It can either be located on the same instance as the publisher (not recommended, as it will add some load to your Publisher) or configured as a remote distributor.
As always you can do this using GUI or T-SQL.

For the purpose of this article I created three named instances: PUBLISHER, DISTRIBUTOR and SUBSCRIBER. Remember that any time you want to work on replication you need to connect using the full named instance name (in my case MainTest\Distributor, etc.).

The first step is to enable the distributor on the instance that will run as the Distributor.
Using the GUI, go to Replication in SSMS, right-click and choose Configure Distribution; from there you will be taken to the Distribution Wizard.
After the first page you need to choose "<server name> will act as its own distributor; SQL Server will create a distribution database and log". This will create the default distribution database on that instance and also enable this instance as a Publisher. Later on, if you decide to use a different distribution database (or create multiple distribution databases), you can remove that default one.

[Screenshot: Configuredist_1]

Then choose the path where the snapshots will be located. Check that there is enough space on that drive, as full snapshots of all your published articles can be very big.

[Screenshot: configuredist_2]

Next you will need to provide a distribution database name and the paths to the distribution database data file and log file, the same as when you create any new user database. On the next page this distribution database will be automatically mapped to the publisher instance (which in this case will be your distributor name). You can also choose how the Distributor will authenticate with the Publisher. In most cases you will use Windows authentication and the SQL Server service account (or a dedicated AD account). This concludes the Distributor configuration.

[Screenshot: configure_distributiondb1]

[Screenshot: Configuredist_3]
Now, when right-clicking on the Replication tree, you will have 3 new options.
When choosing Distributor Properties you can modify:
- distributor settings (like retention period, etc.)
- dropping/creating distribution databases
- mapping publishers to a given distribution database
- managing Agent Profile Defaults (here you can change profiles for every agent on the distributor)
Now we want to configure the distributor for our MainTest\Publisher instance. In order to do this, first we need to configure a new distribution database (or use an existing one) under Distributor Properties. We already have one default distribution database on that instance, but for the purpose of this article we will create a new one.
After clicking "New" you can change the database name, database paths and retention period.
If you have enough disk space, I strongly suggest setting this as a fixed range (min and max at the same value). Later on it can save you some work. If you leave the default setting (0-72 hours), the transactions that have been delivered to the subscriber will be removed straight away during the next distribution cleanup run, and any transactions older than 72 hours will be removed as well (no matter if delivered or not).
If you set fixed values, let's say min = 48 hours and max = 48 hours, the cleanup will remove only transactions older than 48 hours. It will NEVER remove data that has been inserted within the last 48 hours. In some cases this can save you lots of work, as you can manipulate the LSN of replicated transactions in order to set the replication back in time (I will write about this a bit later). In our case it works as a kind of "DR" solution (not often, but still).

[Screenshot: configuredist_4]

After creating the new distribution database, go to "Publishers" on the left pane, add a new one, connect to it and map it to the correct distribution database. On the same page SQL Server will ask you to provide a password for the user that the Publisher will use to connect to the Distributor. Save this password, as it will be used by EACH Publisher that will use that remote distributor.
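For reference, the wizard steps above map roughly to this T-SQL on the distributor (a hedged sketch; instance names, the database name and the password are placeholders, and file paths are left at their defaults):

```sql
USE master;

-- Enable this instance as its own distributor.
EXEC sp_adddistributor @distributor = N'MainTest\DISTRIBUTOR',
                       @password    = N'StrongDistributorPwd1!';  -- placeholder

-- Create the distribution database with fixed 48/48 retention.
EXEC sp_adddistributiondb @database          = N'distribution_Publisher',
                          @min_distretention = 48,
                          @max_distretention = 48,
                          @security_mode     = 1;  -- Windows authentication

-- Map the publisher instance to the new distribution database.
EXEC sp_adddistpublisher @publisher       = N'MainTest\PUBLISHER',
                         @distribution_db = N'distribution_Publisher',
                         @security_mode   = 1;
```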

[Screenshot: configuredist_5]

Publisher:

The last part of the configuration is to go to the Publisher and point it to the configured distributor by providing the password that will be used to connect to the Distributor.
Right-click on the Replication component and choose Configure Distribution. You will get the same wizard as before on the Distributor; however, on the 2nd page choose the second option and point it to your distributor. The next page will ask you for the password that you set in the previous step.
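The publisher-side step boils down to a single call (sketch; the instance name and password are placeholders, and the password must match the one set on the distributor):

```sql
USE master;

-- Run on the PUBLISHER: point it at the remote distributor.
EXEC sp_adddistributor @distributor = N'MainTest\DISTRIBUTOR',
                       @password    = N'StrongDistributorPwd1!';  -- placeholder
```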

[Screenshot: configure_publisher_1]

[Screenshot: publisher2]

[Screenshot: publisher3]
Once this wizard is completed you are good to start creating publications and sending them to subscribers. But I will cover this in the next article.

 

The other way to get this done is by using T-SQL. It is much quicker, especially when you need to set up a few of these.

How to do it? I will show you in the next blog post.

Configuring distributor and publisher with T-SQL

When the Distribution Cleanup is "not behaving"

The Distribution Cleanup is running every minute and completing with success? Check.

The Distributor has plenty of resources and everything seems fine? Check.

But msrepl_commands is getting bigger (and bigger) and the Cleanup deadlocks with the Log Reader... why?

That happens quite often in two cases:

  1. We are using one distribution database for multiple PUBLISHERS.
  2. The tables we are publishing are very busy and the cleanup is just not handling the load.

I would like to focus only on the second point. Let's assume the first point has been fixed and there is only one Publisher going through that one distribution database.

Retention:

I like to keep retention on "fixed" values, meaning min and max are the same.

EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 24, @max_distretention = 24

The Microsoft defaults are min=0 and max=72, which means everything that has been delivered will be deleted on the next cleanup run, and if replication fails to deliver, the replicated data will be removed after 72 hours. I like to set both to the same value, as it has saved my "life" a few times (more about "disaster recovery" using LSNs and transactional replication later).

I'm mentioning retention because if you set it too high, the msrepl_commands and msrepl_transactions tables will become huge and hard to maintain. Check your retention settings on the distributor and set them to values that won't cause disk space issues but still meet your and the business's needs.
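Retention can be pinned with `sp_changedistributiondb` on the distributor (sketch; the distribution database name and the 48-hour value are just examples):

```sql
-- Pin min and max retention to the same value (hours).
EXEC sp_changedistributiondb @database = N'distribution',
                             @property = N'min_distretention',
                             @value    = N'48';
EXEC sp_changedistributiondb @database = N'distribution',
                             @property = N'max_distretention',
                             @value    = N'48';
```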

Cleanup procedures:

This is quite interesting... I noticed that in very big and busy replication environments some of the default settings do not work well. The reason? The default batch size for the delete is 2000, which is a bit low for a system that can get 30K (or more) commands every minute. In such cases you think the cleanup is working fine and running, but actually it is not coping with the load. You will see msrepl_commands just getting very big, well beyond the retention period you set on the distribution database.

Now, just a note... It is not the perfect way, but in some cases it needs to be done. We need to modify two system procedures on the distribution database. Here are the changes I suggest, and the changes that helped me on large distribution databases:

  1. Modifications to sp_MSdelete_publisherdb_trans (the proc that cleans up msrepl_commands):
  • Batch size (DELETE TOP()): increase it to a value that suits you. Remember to change it as well at the end of the loop, in IF @row_count < [same value as in DELETE TOP]. There are two loops where you need to change it.
  • Change PAGLOCK to ROWLOCK. This will help with locking.
  • Additionally, you can modify the MAXDOP setting (if it is set differently globally on the server). I have never needed to modify it so far; it works well with MAXDOP=1.

  2. Modifications to sp_MSdelete_dodelete (the proc that cleans up msrepl_transactions):

  • Batch size (DELETE TOP()): increase it to a value that suits you. Remember to change it as well at the end of the loop, in IF @row_count < [same value as in DELETE TOP]. There are two loops where you need to change it.
  • Change PAGLOCK to ROWLOCK. This will help with locking.

As you can see, the modifications are exactly the same between these two procs; the only difference will be the values in the TOP part. For sp_MSdelete_dodelete you can set these values lower (or not). You just need to see what works best on your system.
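To make the change concrete, the delete loop inside sp_MSdelete_publisherdb_trans looks roughly like the sketch below (heavily hedged: the actual proc body varies by SQL Server version, and 100000 is just an example batch size):

```sql
-- Sketch of the modified delete loop (not the full proc body).
DELETE TOP(100000)                        -- default is 2000; raise to suit your load
       MSrepl_commands WITH (ROWLOCK)     -- originally PAGLOCK
WHERE  publisher_database_id = @publisher_database_id
  AND  xact_seqno <= @max_xact_seqno;

SELECT @row_count = @@ROWCOUNT;

-- ...and at the bottom of the loop:
IF @row_count < 100000                    -- must match the TOP() value above
    BREAK;
```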

One last thing about this: if you made these changes on SQL Server 2014 and then apply SP2 CU2, all the changes will be replaced with the default settings again.

Articles/Publications:

At this point we need to ask some questions... Do we really need to replicate all tables? Do we really need so many publications? And, most importantly, do we have articles that are published in multiple publications?

The first two questions you need to answer yourself... however, if the answer to the last question is "yes, I do have articles published in more than one publication", this is the part you should read.

Let's assume you have TableA, which is published in publication Pub1, but you also have a publication named Pub2. 99% of the tables published in these two are different, but one table that gets 1 million transactions a day is published in both publications... Now, what happened? Just by doing this, you multiplied the entries in msrepl_commands for that article. Here is an example. The query below shows what the commands look like for ONE insert into TableA.

[Screenshot: multiple_commands]

As you can see, because the table is published in two different publications, you get two entries for each transaction on that table.
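A sketch of a query to spot such articles on the distribution database (assumes the join on article_id is enough to tie commands back to articles on your version; verify against your schema):

```sql
-- Commands stored per article, per publication: an article published in
-- more than one publication shows up once per publication.
SELECT a.publication_id,
       a.article,
       COUNT(*) AS command_count
FROM dbo.MSrepl_commands AS c
JOIN dbo.MSarticles      AS a
  ON a.article_id = c.article_id
GROUP BY a.publication_id, a.article
ORDER BY command_count DESC;
```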

How can you solve it? If possible, send that table with one publication only. Remember that you can decide which articles go to which subscription: you can have two subscriptions going out from one publication, each with a different set of articles. But this is a topic for a different article.

Distribution Cleanup Job:

The cleanup job by default is set to run every 10 minutes. You can decrease that to 1 minute, but this won't always help; your cleanup can still fail with a deadlock. On busy systems I often add retry attempts to the cleanup step and put 0 between retries. That saves us time, as the cleanup will try to run again straight away. Note: do not set the retry attempts too high. Otherwise you won't know if there are some real problems with the cleanup job (because it will never fail, or will fail when it is too late and you have run out of disk space).
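Adding the retries is a one-liner against msdb (sketch; the job name below is the default for a distribution database named "distribution" - verify yours, and the step id may differ):

```sql
-- Retry the cleanup step 3 times with no delay between retries.
EXEC msdb.dbo.sp_update_jobstep
     @job_name       = N'Distribution clean up: distribution',  -- verify name
     @step_id        = 1,
     @retry_attempts = 3,
     @retry_interval = 0;  -- minutes between retries
```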

SQL Server configurations:

Is our distributor following best practices? (Multiple tempdb files, MAXDOP, etc.) Is our distributor running on default settings? (Default 1MB autogrowth on the distribution database, low memory settings, etc.) This is just a reminder, as I'm assuming you have already checked these things and ran out of ideas, which is why you are here :).

One last thing, about running update stats on distributors: you may or may not know that it is not really needed... as the cleanup procs run UPDATE STATISTICS (on msrepl_commands and msrepl_transactions) at the beginning and at the end of the cleanup proc execution.

Conclusion:

Check your Distributor configuration. Check retention on your distribution databases. Modify the cleanup procedures accordingly. Review your publications and articles. Add retry attempts to the cleanup step.

These are my suggestions to you. If you have any other ideas or comments let me know.

 

Transactional Replication – How does it work?

 

Replication is one of the most "mature" technologies in SQL Server, and it is a bit underrated these days, with AGs, mirroring and the other new technologies now coming with SQL Server.

As the name says, it is transactional, which means it is log based.
This is how it works:
Let's say we are doing an insert into a published table:
1. The insert transaction is marked as "for replication" in the transaction log of the published database.
2. The Log Reader, which is responsible for reading the log, picks up that transaction (I will go into more detail in another article) and puts the information about it into the distribution database on the distributor.
3. Once the transaction has been read by the Log Reader, it can be cleared when the next checkpoint (simple recovery) or the next transaction log backup (full recovery) occurs.
4. The Distribution Agent picks up the next transaction from the distribution database and delivers it to the subscriber.

These 4 simple points describe, at a high level, how transactional replication works. We will go into more detail about each of these steps.
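You can watch steps 1-2 in action from the publisher side with `sp_replcounters`, which reports how far the Log Reader has read and the current latency per published database:

```sql
-- Run on the publisher: returns one row per published database with
-- replicated transaction counts, replication rate and latency.
EXEC sp_replcounters;
```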

As you noticed, we have a few components that are crucial for replication to work. Here is the list, with short descriptions:

Publisher – the source of the published data.
Distributor – connects to the publisher to take the published information, store it and distribute it to subscribers.
Subscriber – the destination of the published data.

You can compare transactional replication to a printed newspaper subscription. The Publisher (New York Times, Washington Post, etc.) creates and publishes information (articles), and the Distributor (a courier company or the post office) delivers the newspaper to ALL Subscribers that paid for a subscription.

Other important components of transactional replication are the AGENTS, each with different tasks.
The main agents for transactional replication are:
Log Reader – runs on the distributor and connects to the publisher in order to read marked transactions and deliver them to the distributor.
Snapshot Agent – the agent that takes a snapshot of the published articles, in two main cases: when the initial snapshot of all articles is made, or when our publication is in "full sync" mode and we are adding new articles into replication.
Distribution Agent – its main job is delivering replicated transactions to all subscribers.

The Log Reader and the Distribution Agent run continuously and should not be stopped for longer periods. Each agent is an actual SQL Server Agent job (usually located on the distributor; a pull subscription has its distribution agents located on the subscriber) with sets of parameters that the DBA can tweak (like multi-streaming).
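Since the agents are SQL Server Agent jobs, you can find them by their job categories (sketch; run on the server hosting the agents, usually the distributor):

```sql
-- List the replication agent jobs and whether they are enabled.
SELECT c.name AS category, j.name AS job_name, j.enabled
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.syscategories AS c
  ON c.category_id = j.category_id
WHERE c.name IN (N'REPL-LogReader', N'REPL-Distribution', N'REPL-Snapshot')
ORDER BY c.name, j.name;
```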

These are the basics of transactional replication. Now we can go more into the details. In the next articles I will show you how to set up, maintain, tweak and fix (or break :)) it.