The way how the changes are replicated in transactional replication (CALL vs SCALL)

As I already mentioned before Replication is using replication procedures in order to replicate data changes  Today I will focus on that how the changes can be send down to subscriber, especially for UPDATE.

By default these start with sp_MSins_%, sp_MSupd_%, sp_MSdel_%. You may already seen CALL or SCALL associated with these replication procedures. You can see these in GUI under articles properties:

call_articles_grab

or by just selecting from sysarticles on Publisher db or msarticles on distribution db:

grab_sysarticles

You can change these either by drop down list in SSMS in article properties or by using sp_changearticle stored procedures (executed on Publisher)

The most interesting in that is the UPDATE as that DML you can send down in few different ways. I will focus on two most common ones CALL and SCALL.

Lets go into more detail…

SCALL (default)

SCALL is default setting when you setting up the publication and adding articles into it. The update will pass only columns that have been changed, including PK values and lastly bitmask with information  about what columns has been changed. Here is an example how this looks like in distribution database (Browse replication commands).

Updated one column in TableD:

UpdateTAbleD_SCALL

After browsing distribution db we can find this:

TAbleD_SCALL_Browse_with comments

As you can see the first parameter is NULL, this is the column that has not been changed. Here is how the procedure looks like on Subscriber:

USE [RC] 

go 

/****** Object:  StoredProcedure [dbo].[sp_MSupd_dboTableD]    Script Date: 2/27/2018 6:00:49 AM ******/ 
SET ansi_nulls ON 

go 

SET quoted_identifier ON 

go 

ALTER PROCEDURE [dbo].[Sp_msupd_dbotabled] @c1     INT = NULL, 
                                           @c2     NVARCHAR(100) = NULL, 
                                           @c3     NVARCHAR(200) = NULL, 
                                           @pkc1   INT = NULL, 
                                           @bitmap BINARY(1) 
AS 
  BEGIN 
      DECLARE @primarykey_text NVARCHAR(100) = '' 

      IF ( Substring(@bitmap, 1, 1) & 1 = 1 ) 
        BEGIN 
            UPDATE [dbo].[tabled] 
            SET    [tabled_id] = CASE Substring(@bitmap, 1, 1) & 1 
                                   WHEN 1 THEN @c1 
                                   ELSE [tabled_id] 
                                 END, 
                   [tabled_text] = CASE Substring(@bitmap, 1, 1) & 2 
                                     WHEN 2 THEN @c2 
                                     ELSE [tabled_text] 
                                   END, 
                   [text2] = CASE Substring(@bitmap, 1, 1) & 4 
                               WHEN 4 THEN @c3 
                               ELSE [text2] 
                             END 
            WHERE  [tabled_id] = @pkc1 

            IF @@rowcount = 0 
              IF @@microsoftversion > 0x07320000 
                BEGIN 
                    IF EXISTS (SELECT * 
                               FROM   sys.all_parameters 
                               WHERE  object_id = Object_id('sp_MSreplraiserror' 
                                                  ) 
                                      AND [name] = '@param3') 
                      BEGIN 
                          SET @primarykey_text = 
                          @primarykey_text + '[TableD_ID] = ' 
                          + CONVERT(NVARCHAR(100), @pkc1, 1) 

                          EXEC Sp_msreplraiserror 
                            @errorid=20598, 
                            @param1=N'[dbo].[TableD]', 
                            @param2=@primarykey_text, 
                            @param3=13233 
                      END 
                    ELSE 
                      EXEC Sp_msreplraiserror 
                        @errorid=20598 
                END 
        END 
      ELSE 
        BEGIN 
            UPDATE [dbo].[tabled] 
            SET    [tabled_text] = CASE Substring(@bitmap, 1, 1) & 2 
                                     WHEN 2 THEN @c2 
                                     ELSE [tabled_text] 
                                   END, 
                   [text2] = CASE Substring(@bitmap, 1, 1) & 4 
                               WHEN 4 THEN @c3 
                               ELSE [text2] 
                             END 
            WHERE  [tabled_id] = @pkc1 

            IF @@rowcount = 0 
              IF @@microsoftversion > 0x07320000 
                BEGIN 
                    IF EXISTS (SELECT * 
                               FROM   sys.all_parameters 
                               WHERE  object_id = Object_id('sp_MSreplraiserror' 
                                                  ) 
                                      AND [name] = '@param3') 
                      BEGIN 
                          SET @primarykey_text = 
                          @primarykey_text + '[TableD_ID] = ' 
                          + CONVERT(NVARCHAR(100), @pkc1, 1) 

                          EXEC Sp_msreplraiserror 
                            @errorid=20598, 
                            @param1=N'[dbo].[TableD]', 
                            @param2=@primarykey_text, 
                            @param3=13233 
                      END 
                    ELSE 
                      EXEC Sp_msreplraiserror 
                        @errorid=20598 
                END 
        END 
  END --  

You can see how bitmask is used to determine if the column actual changed or no.

CALL

Firs thing lets change SCALL to CALL of TableD. You can do it in SSMS by going to Publication Properties and then to Article properties or just run T-SQL like that one:

exec sp_changearticle 
@publication ='Publication1',
@article = 'TableD',
@property ='upd_cmd',
@value='CALL [sp_MSupd_dboTableD]',
@force_invalidate_snapshot=0

After you run this you do not need to run snapshot. New update procedure will be delivered as normal replicated commands and now the code for update procedure looks like this:

create procedure [sp_MSupd_dboTableD]     @c1 int,     @c2 nvarchar(100),     @c3 nvarchar(200),     @pkc1 int
as
begin   	declare @primarykey_text nvarchar(100) = '' if not (@c1 = @pkc1)
begin  
update [dbo].[TableD] set     [TableD_ID] = @c1,     [TableD_text] = @c2,     [text2] = @c3
	where [TableD_ID] = @pkc1 if @@rowcount = 0
    if @@microsoftversion>0x07320000
		Begin
			if exists (Select * from sys.all_parameters where object_id = OBJECT_ID('sp_MSreplraiserror') and [name] = '@param3')
			Begin
				
				set @primarykey_text = @primarykey_text + '[TableD_ID] = ' + convert(nvarchar(100),@pkc1,1)
				exec sp_MSreplraiserror @errorid=20598, @param1=N'[dbo].[TableD]', @param2=@primarykey_text, @param3=13233 
			End
			Else
				exec sp_MSreplraiserror @errorid=20598
		End end   else
begin  
update [dbo].[TableD] set     [TableD_text] = @c2,     [text2] = @c3
	where [TableD_ID] = @pkc1 if @@rowcount = 0
    if @@microsoftversion>0x07320000
		Begin
			if exists (Select * from sys.all_parameters where object_id = OBJECT_ID('sp_MSreplraiserror') and [name] = '@param3')
			Begin
				
				set @primarykey_text = @primarykey_text + '[TableD_ID] = ' + convert(nvarchar(100),@pkc1,1)
				exec sp_MSreplraiserror @errorid=20598, @param1=N'[dbo].[TableD]', @param2=@primarykey_text, @param3=13233 
			End
			Else
				exec sp_MSreplraiserror @errorid=20598
		End end  end   -- 

As you can see no bitmap mask just simple update to ALL columns in tables and proc does not check if values has changed.

Now lets see how the update looks like in the distribution db. As before we just updating one column in that table:

update TableD
set text2='test_2222'
where TableD_ID=2

If you check distribution database you will see this:

CALL_browse

As you can see it does provide all columns and values even if these has not changed. It will always send everything.

CALL vs SCALL

Now you wonder why this is important? Imagine you have large table with 30 columns some big ones like varchar(max) etc. Your application updating only one column in that table…. How do you think this will affect your distribution db? Lets have a look.

I cleaned up distribution db and will populate 10 column table with 1 million rows. Then update that 1 million rows first with SCALL and then with CALL settings.

Table Structure looks like that:

CREATE TABLE [dbo].[TableA](
	[TableA_ID] [int] NOT NULL,
	[Col1] [nvarchar](100) NULL,
	[Col2] [nvarchar](100) NULL,
	[Col3] bigint NULL,
	[Col4] int NULL,
	[Col5] int NULL,
	[Col6] int NULL,
	[Col7] datetime NULL,
	[Col8] int NULL,
	[Col9] bigint NULL,
	[Col10] [nvarchar](max) NULL,
PRIMARY KEY CLUSTERED 
(
	[TableA_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

 

After creating I will populate table with random data and then publish it. Once Snapshot of TableA wth 1 million rows delivered to Subscriber I am checking space  of msrepl_commands on distribution database for that Publication (nothing else is getting into that distribution db at this point).

Snapshot delivered

Right now msrepl_commands is 35 rows and 40KB reserved. It contains only initial snapshot. Article is set to use SCALL for updates. Lets send now 1 million updates just for one column, Col4 which is just INT. I will increase that column by number 100 that is it.

1 million SCALL updates sent down

After sending 1 million updates with SCALL my msrepl_commands table went to: 1000035 rows and 183368 KB reserved for that table.

Lets change article to normal CALL and lets send 1 million updates for Col5.

1 million CALL updates sent down

Now my msrepl_commands table went to 2000088 rows and reserved 759624 KB on drive. Just to explain why row count is not 2000035… it is because I changed article using sp_changearticle which added few rows still very small ones (should not take more than 40 KB)

Summary

As you can see using normal CALL will take way more space inside your distribution db

CALL = 576 256 KB

SCALL = 183 328 KB

Same transactions but with CALL we are using almost 3 times more of storage space. Before you change default SCALL setting to CALL think twice and check storage on your distributor.

If you have any questions just use comments section.

Did you use CALL in some cases and if yes to what were these? What setting do you normally use for UPDATES?

 

Immediate_sync on vs off

This is one of the settings that can cause issues with growth of your distribution database or when running Snapshot. It is FALSE by default and it can be set on Publication level. You can check what setting you have by checking syspublications on published database or mspublications on Distributor and distribution database. Let me try to explain what impact have that setting to your replication setup. We are assuming your distribution retention is not “fixed set” like this @min_distretention=48, @max_distretention=48 but more like this @min_distretention=8, @max_distretention=48. More about how this works in the following post.

TRUE

Normal behavior when your retention is set to min=8 and max=48 will be that transactions that been delivered to subscribers and are older than 8 hour will be removed by distribution cleanup. However that changes with Immediate_sync set to ON. The transactions will be kept (even if delivered) until get older than 48 hours. Also it matters when you adding new subscribers. If Snapshot was generated within last 48 hours and you add subscriber it will use existing snapshot and deliver all the transactions that were marked for replication after snapshot generation. Here is an example.

We have Publication1 with TableA as an article and it is published to Subscriber, Subscriberdb database.

Initial_setting

Snapshot was generated within that 48 hour period and two rows added to TableA on Publisher.

Snapshot

Inserts

We are planning to add new subscription stream to same subscriber but to Subscriberdb_2 database. After the creation you do not need to run Snapshot again it will use old Snapshot and all transactions that been delivered afterwards.

New subscription_ON

You can see that it used old Snapshot as TableA has 0 rows (we inserted 2 rows before creating new subscription) and once you select from TAbleA on Subscriberdb_2 you will find two rows. These been delivered as normal replication transactions.

subsriber_select.JPG

From my perspective there is one downside of that setting, especially when you are planning to add new articles quite often. Every time you add new article and want to send a snapshot of these new articles down to subscribers the Snapshot Agent will generate a snapshot for ALL articles with that Publication. In other word last Snapshot has been invalidated.  Lets add TableB to that Publication1 and lets run snapshot.

adding_newarticles_on.JPG

The Snapshot will generate two articles:

new_snapshot_newarticle_on.JPG

However existing Subscribers will send down only new articles TableB:

Subscription_new_article_on.JPG

The generation of Snapshot for ALL tables could be problematic when you publishing hundreds of large tables already. Adding one article will generate new snapshot for all of these tables and from my experience it will generate some locks on Publisher especially if Publisher is very busy.

Lastly the snapshot will be generated no matter if you add new articles or not. If someone will kick off snapshot by mistake it will still do snapshot for entire publication.

Lets have a look how it looks if this setting will be set to FALSE.

FALSE

First thing will be that will be different is that you will need to generate full snapshot every time you adding new subscription.

Second thing that will change is when your retention is set to something like that @min=0 and @max=72. All transactions that have been delivered will be cleaned up on the next Distribution Cleanup job run.

Third difference is that when adding new articles to existing publication the next snapshot run will generate ONLY files for new articles. Lets have an example.

First we changing setting to FALSE by running sp_changepublication:

changingto_false

Now lets add TableC into Publication1:

addingTableC

and run snapshot:

snapshot_with_false

As you can see Snapshot has been generated only for one Table, TableC. That is quite convenient especially when you adding one table into publication that already have hundreds of other tables.

Which settings you should choose? As always it depends on your environment. How often you adding article into Publication? How much space you have on your distributor? How busy is your Publisher? Can you afford of running snapshot for entire Publication?

Replication Procedures Explained Part 1

I already wrote in few other posts about so called “replication procedures”. What are these you wonder? These procedures are used by transactional replication to do INSERT/UPDATES/DELETES on Subscriber DB.

If you did not rename it (or modify it on Publisher) the default “prefix” will be sp_MSins%, sp_MSupd%, sp_MSdel% etc. The exact proc names you can find in properties of any article published while using SSMS (Publication Properties->Articles):

procs_guI

Or just run select on published database and sysarticles table:

procs_tsql

Here is an example of Insert procedure:

procedure_text
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
if object_id(N'[sp_MSins_dboTableA]', 'P') <> 0 
                                        drop proc [sp_MSins_dboTableA]
go
if object_id(N'dbo.MSreplication_objects') is not null 
                                    delete from dbo.MSreplication_objects where object_name = N'sp_MSins_dboTableA'
go
create procedure [sp_MSins_dboTableA] 
  @c1 int,@c2 nvarchar(400),@c3 bigint
as 
begin 
insert into [dbo].[TableA]( 
 [id]
,[TableA_text]
,[id_temp]
 )
values ( 
 @c1
,@c2
,@c3
 ) 
end
go
if columnproperty(object_id(N'dbo.MSreplication_objects'), N'article', 'AllowsNull') is not null 
exec ('insert dbo.MSreplication_objects (object_name, publisher, publisher_db, publication, article, object_type) values (
                                + N''sp_MSins_dboTableA'' , N''MAINTEST\PUBLISHER'' , N''AdventureWorks2014'' , N''Publication1'' , N''TableA'' ,''P'')')

 

These procs are executed on Subscriber when Insert/Update/Delete is performed. You most probably noticed that there is CALL or SCALL before each procedure. This determine how these commands are created and how data is replicated down. That topic is quite big and I will cover it in separate blog post.

These procedures DON’T exists as an objects on Publisher, however you can find these on Subscriber.

proc_subscriber

Replication procedures are delivered in two ways. If you initialize subscription with Snapshot the code for replication procedures is included in SCH (schema) files of the snapshot for given table. If you choose to not initialize the procedures will be delivered as normal command (you can actually browse it in distribution database). Important thing is that any changes to schema on published table will force replication to UPDATE all the procedures code  for changed tables and new version of procedures will be send down to Subscriber. Just a note on that, if you going over the WAN these procedures are delivered slowly, you just need to be aware of that especially if u changing hundreds on articles.

I already mentioned that these procedures do not exists as an object on Publisher. The question is how I can get code for these procedures on the Publishers? That is quite important question as there are some cases you will need the most “recent” procedure definition from Publisher. Remember that the procedure definition on Publisher is the actual one that is used by your replication at that point. You may have cases that code of replication procs been changed by someone or something on Subscriber and it caused replication to fail. In these cases you need the actual code from Publisher…. There are two ways of doing it.

  1. Script out all replication procedures for given Publication. Run following code:  exec sp_scriptpublicationcustomprocs @publication = ‘#publication name#’
  2. You can script out specific article and specific procedures (delete, insert, update etc.) First you need an article ID from sysarticles. Then once you have that you need to run following procedures in order to get procedure code:                                           exec  sp_scriptinsproc  #article_id# –for insert procedure                                           exec  sp_scriptupdproc  #article_id# –for update procedure                                          exec  sp_scriptdelproc  #article_id# –for delete procedure

Now there is problem with 2nd option. I already mention there are other ways of sending Updates or Deletes like SCALL, MCALL etc. If you using SCALL for example for updates you need to execute following: exec sp_scriptsupdproc. For MCALL  exec sp_scriptmappedupdproc. For XCALL: sp_scriptxupdproc etc. If you not sure what “call” you using just check sysarticles table on your Publisher.

Last thing you need to be aware is that when you generating code for these procedures into Text Results… you need to extend maximum characters for Text Results from default 256 to max which is 8192 (new setting will be only “active” when you open new query window). Otherwise some large procedures will be truncated and incomplete. I know small thing but still you need to be aware of this.

These are basics of replication procedures in next post I will focus on what problems and issues you may encounter with these.

How to setup Transactional Replication? Part 2 (Publications and articles)

In Part 1 we covered configuring Publisher and Distributor server. Now you are ready to create first publications and add articles. We will focus on doing this by using SSMS.

Publication

First step is to create Publication on a database that you want to publish. Since you already setup Publisher and Distributor you should be able to extend Replication tree and Right Click on Publication, choose New Publication:

NewPublication

This will open new Publication Wizard:

NewPubwizard

On the Next window you will need to choose which database you want to publish. This will enable given database for Publishing:

Choosingdatabase

In next step you need to choose type of replication. We focusing only on Transactional Replication and we choosing that one:

ReplicationType

Now you need to choose articles. You can choose Tables, Procedures, Functions etc. In my DB I have only Tables:

ChoosingArticles

You can extend each Table to get column list and you can “filter” it vertically by removing some columns from replication. You don’t need to replicate entire table schema.

On the right side you have Article Properties button. You either set Properties for highlighted article or for ALL articles in this Publications. The list of the settings is quite impressive.

Articles Properties

You can decide if you want to copy triggers, all indexes, if table will be dropped on subscriber, choose if you want to send deletes or not etc. There are plenty of settings that you can modify. Choose wisely as some of these settings can invalidate snapshot and you will need to re-snapshot publication. As there are so many of these properties I will go describe it more in separate post.

In next step you can “filter” articles horizontally by adding WHERE condition. For Example you want to replicate only new data then you can set condition on date column etc.

FilteringRows

As you can see amount of properties and configurations that you can use are making replication very flexible:)

In next step you need to decide when Initial Snapshot will be run. If you don’t want to choose anything, you can always run it manually from replication Monitor. You can either schedule Snapshot or choose to generate snapshot straight after publication will be created:

Snapshot

Next step is to configure Security settings for LogReader and Snapshot agent:

SecuritySettings

I already explained (Understanding Replication Agents Part 1 , Understanding Replication Agents Part 2 ) How Replication Agents work. In that place you configuring Processing Account (under what account jobs and exec will be running) and what account will be used in order to connect to Publisher. You can setup different account for LogReader and Snapshot. You also need to keep in mind that account that will be used to connect to Publisher need to have relevant permissions (sysadmin) on Publisher. That account will be generating Snapshot and reading transaction log on Published database.

SecuritySettings

For this example I choose for both SQL Service account. NOTE: I needed to grant permission for Distributor (if remote one) SQL service account on Publisher. If you choosing different processing account for Snapshot than SQL service account, you need to make sure that this account have permissions do file system on Distributor. Snapshot agent will be creating folders and files on disk. SQL Service will have that permissions but if you use different account you may have issues when generating snapshot.

In next step you can choose that either Publication will be created or just generate the script for the Publication you just configuring (or both):

Createorscript

It could be useful to save T-Sql script as it may me useful in future. It is also nice to see what exactly happen when you configuring Publication (as there are many procs executed and configuration changed during that process that you do not see when using SSMS).

Finally we can put name of our Publication and Finish configuration:

Naming Publication

If all went well you will see new Publication in your SSMS:

Completed

DonePublication

This is pretty much entire process of creating basic publication for Transactional Replication. Easy?? Yeah it is easy we will go bit “deeper” in next post.

In next posts will describe some of the articles settings that you can modify, creating subscriptions and configuring Publications/Subscriptions by  using T-sql.

Configuring distributor and publisher with T-SQL

I already described how to configure distributor and publisher by using GUI.

How to setup Transactional Replication? Part 1

However if you have to setup more than one publisher  to setup and also multiple distributors (or distribution dbs) it is getting quite hard to do it quickly. I often you one remote distributor server and multiple distribution dbs on that one server. That below script helped me a lot.  By default I set retention to 48 hours and 24 hours for history. You can change it in the script or later by using GUI. Script will also generate code that you need to run on Publisher server in order to enable Replication and configure distributor.

Note: It will try to create a folder, if parameter @create_folder set to 1, otherwise it will create db in default or provided path. Still if you want to create folder for new distribution database xp_cmdshell need to be enabled on the server.

If you have any improvements or comments to below script let me know.

/* NOTE: 
1.You need to run the script for EACH distribution db u want to create
2.Model db cant be in use during the installation (It need exclusive lock on model in order to create distribution db)
*/
/****** Installing the server as a Distributor.  ******/
use master

declare @distribution_Db nvarchar(300),
@path nvarchar(300),
@SQL nvarchar(max),
@publisher nvarchar(300),
@password nvarchar(300),
@distributor nvarchar(200),
@create_folder bit

-------------- setting variables ------
set @distributor=@@servername
set @distribution_db=''
set @password=''
set @publisher=''
set @path =NULL -- leave null if u want to create distribution db in this forlder: 
--select substring(physical_name,1,patindex('%tempdb%',physical_name)-1) from sys.master_files where database_id in (select database_id from sys.databases where name='tempdb') and file_id =1
set @create_folder =1 --xp_cmdshell need to be enabled otherwise distribution db will be created in default folder. Set to 0 if you created folder manually.
-------------- end of variables --------

--Configuring Distributor
if not exists (select 1 from sys.servers where is_distributor=1 and name ='repl_distributor' and data_source=@distributor)
BEGIN
exec sp_adddistributor @distributor = @distributor, @password = @password
END

if @create_folder =1 and @path is NULL

BEGIN
		select @path=substring(physical_name,1,patindex('%tempdb%',physical_name)-1) from sys.master_files where database_id in (select database_id from sys.databases where name='tempdb') and file_id =1
END

if @create_folder =1 and @path is not NULL
	BEGIN
		set @path = @path + @distribution_Db
set @sql = 'exec xp_cmdshell ''mkdir '  +@path +''''
print @sql
if @create_folder =1
	BEGIN
	 if exists (select 1 from sys.configurations where name ='xp_cmdshell' and value_in_use =1)
	 BEGIN
	 
	print 'Creating Folder  ' + @path
	exec sp_executesql @sql
	END
	ELSE
	BEGIN
	print 'xp_cmdshell not enabled. Db will be created in default folder'
	set @path =NULL
	END
	END
END




-- Adding the distribution databases
if @path is not NULL
set @path = @path + '\'

if @path is NULL
print 'Distribution db will be created in default folder'


if not exists (select 1 from sys.databases where name =@distribution_db)
BEGIN
exec sp_adddistributiondb @database = @distribution_db, 
@data_folder = @path, 
@log_folder = @path,  
@min_distretention = 48, 
@max_distretention = 48, 
@history_retention = 24, 
@security_mode = 1
END


-- Adding the distribution publishers
exec sp_adddistpublisher @publisher =@publisher,
 @distribution_db = @distribution_Db, 
 @security_mode = 1, 
  @publisher_type = N'MSSQLSERVER'

/****** End: Script to be run at Distributor ******/



select 'Execute on publisher ' + @publisher
select 'use master 
exec sp_adddistributor @distributor = ''' + @distributor + ''' , @password = '''+ @password +''''

Transactional Replication and Transaction Log Management

As you know by now, Transactional Replication is using Transaction Log of the published database.

But Wait! Do I need to run my DB in Full Recovery Model? How about checkpoint? Entire beauty of this is that you do not need run your db in full recovery model. Once Replication is up and configured no transactions for given articles will be lost. How it works?

Lets assume we published a table, we will be replicating INSERT/UPDATES/DELETES and we created subscription (publication need to have at least one active subscription in order to start picking up data for replication). From now on ALL transactions for that table will be marked in transaction log as “For Replication”. That transaction will be kept in the log until LogReader  read it and put into distribution database. If from some reason, Log Reader is failing or stopped the log will keep growing, growing and growing…. when finally you ran out of space…In this case  No backup logs, no checkpoints, no changing recovery model around….will help. Ok one thing will help… dropping entire replication on given published database:) But this is not a solution for us. You need to keep in mind that Log reader agent is crucial and it need to run always. any errors on that agent can affect your published database and entire instance. Keep monitoring that job/agent.

Sequential reads

Transaction log is sequential, Log Reader is reading data  sequentially and Transactional replication is sequential… That means various things or actions can affect our replication. Pretty much everything that touching Transaction Log heavily will impact it.

Index Maintenance/Checkdb?

It will affect your Replication mostly when you are using Online rebuild. Index Rebuild is doing most of the job in transaction log and it will keep growing. Even if Replication really CANNOT replicate this “transactions”, the LogReader still need to read entire Transaction Log from beginning. Let’s assume after large index rebuild your transaction log is 10GB in size… now your Log Reader need to go trough and check entire log in order to find all transaction that are marked for replication. You either need to be very careful with rebuilds on published database (add waitfordelay between rebuilds and let log reader to read the log etc.) or you and the business accepting replication delay during the index maintenance. You will often get message like this in the Log Reader.

LogREader_Indexrebuild

As you see it scans the log and return message that no transactions marked for replication.

One thing you can tweak is the Log Reader Agent Profile. You can increase – LogScanThreshold and -Readbatchsize. Sweet spot? start by adding 0 to each value and see if it helps in your situation. Also what you should increase are -Logintimeout and -QueryTimeout as in some cases you connection can break due to timeout and Log Reader will need to start from the begging and we DO NOT want that.

Mirroring

I know that at some point mirroring will be replaced but still this need to be mention. Lets assume that database you published is also mirrored…If for any reason mirroring is falling behind (slow connection, errors etc.), your replication will do the same. Transactional Replication (Log Reader to be exact) will wait for mirroring to catch up and will read only part of the transaction log that been already send to Secondary by mirroring.

Everything ELSE:

The problem with Transactional Replication is that even large updates on tables that ARE NOT published can affect it..  Update will still go to transaction log and Log Reader still will need to scan it.

Conclusion:

You do not need Full Recovery model in order to setup Transactional Replication and replication wont loose any data because of it. You need to be aware of anything heavy happening on your database and that touching transaction log. Keep monitoring your transaction log and Log Reader.

 

How to setup Transactional Replication? Part 1

Distributor:

The main part (and 1st step) of the setup is configuring Distributor. Either it can be located on the same instance as a publisher (not recommended as it will add some load to your Publisher) or configure as remote distributor.
As always you can do this using GUI or T-SQL.

For purpose of this article I created three named instances, PUBLISHER, DISTRIBUTOR and SUBSCRIBER. Remember that anytime you want to work on Replication you need to connect to full named instance name (in my Case MainTest\Distributor etc.)

First step is enable distributor on the instance that will be running as a Distributor.
Using GUI,  go to Replication  in SSMS, right click and choose Configure Distribution from there you will be moved to Distribution Wizard.
After first page u need to choose “<server name> will act as its own distributor, SQL Server will create a distribution database and log” It will create default distribution database on that instance and enable also this instance as a Publisher. Later on if u decide to use different distribution db (or create multiple distribution dbs) you can remove that default distribution db.

Configuredist_1

Then choose path where the snapshot will be located. Check if there is enough space on that drive as some full snapshots of all your published articles can be very big.

configuredist_2

Next you will need to provide a distribution db name and path to the distribution db data file and log file. Same as you creating any new user db. On the next page this distribution db will be automatically Mapped to the publisher instance (which in this case will be your distribution name). You can also choose how Distributor will authenticate with Publisher. In most of the cases you will use Windows auth and the SQL Service account (or dedicated AD account). This will end of the Distributor configuration.

configure_distributiondb1

Configuredist_3
Now when right clicking on Replication tree you will have 3 new options.
When choosing Distributor Properties you can modify :
-distributor setting (like retention period etc.)
-dropping/creating distribution
-mapping publishers to given distribution db
-managing Agent Profile Defaults (here you can change profiles for every agent on distributor)
Now we want to configure distributor for our MainTest\Publisher instance. In order to do this first we need to configure new (or use existing one) distribution db under Distributor Properties. We already have one default distribution db on that instance however for purpose of this article we will create new one.
After clicking “New” you can change database name, database paths and retention period.
I strongly suggest if you have enough disk space set this as a fixed range (min and max on same value). Later on it can save you some work. If you leave default setting (0-72 hours) the transactions that been delivered to subscriber will be removed straight away during next distribution cleanup run also it will remove any transactions older than 72 hour (does not matter if delivered or not).
If you set fixed values lets say Min = 48 hour and Max=48 hours. In this case Cleanup will remove only transactions older than 48 hours. It will NEVER remove data that been inserted within last 48 hours. In some cases it can save u lots of work as you can manipulate with LSN of replication transactions in order to set back the replication in time (will write about this bit later). In our case it works as kind of “DR” solution (not often but still).configuredist_4

After creating new distribution db on the left pane go to “Publishers” and add new one, connect to it and map it to correct distribution database. On the same page SQL will ask u to provide a password for user that Publisher will use to connect to Distributor. Save this password as it will be used for EACH Publisher that will use that remote distributor.

configuredist_5

Publisher:

Last part of configuration is to go to Publisher and point it to configured distributor by providing a password that will be used to connect to Distributor.
Right click on Replication component and choose configure distribution. You will get same Wizard as before on Distributor however on 2nd page choose second option and point it to your distributor. Next page will ask you for password that you set in previous step.

configure_publisher_1

publisher2

publisher3
Once this wizard is completed you good to start creating publications and sending it to subscribers. But this I will cover in next article.

 

Other way to get this done is by using T-SQL. It is much quicker especially when you need to setup few of these.

How to do it? I will show it in next blog post.

Configuring distributor and publisher with T-SQL