Cannot insert explicit value for identity column in table ” when IDENTITY_INSERT is set to OFF

Another error in transactional replication is the one related to Identity columns. That error is easy to fix and it is being raised if the setting “not for replication” is set to incorrect value on SUBSCRIBER database.

Lets start from beginning with creating table that includes identity column with following script (we do not specify “Not for Replication”, by default it will be set to 0):

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Table_IdentityA](
	[RowID] [bigint] IDENTITY(1,1) NOT NULL,
	[Field1_int] [int] NOT NULL,
	[Field2_int] [int] NOT NULL
 CONSTRAINT [PK_IdentityA] PRIMARY KEY CLUSTERED 
(
	[RowID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
) ON [PRIMARY]
GO

TAble_setup_default

After adding that table into Publication with following setting:

@identityrangemanagementoption = N’none’

Here is the script:

use [PublishedDB]
exec sp_addarticle @publication = N'IDENTITY_Test',
@article = N'Table_IdentityA', @source_owner = N'dbo',
@source_object = N'Table_IdentityA', @type = N'logbased', 
@description = N'', @creation_script = N'', @pre_creation_cmd = N'drop',
@schema_option = 0x000000000803509F, @identityrangemanagementoption = N'none', 
@destination_table = N'Table_IdentityA', @destination_owner = N'dbo',
@status = 24, @vertical_partition = N'false', 
@ins_cmd = N'CALL [sp_MSins_dboTable_IdentityA]', 
@del_cmd = N'CALL [sp_MSdel_dboTable_IdentityA]', 
@upd_cmd = N'SCALL [sp_MSupd_dboTable_IdentityA]'
GO

As there is identity column in that table SQL won’t allow you specifying ‘none’. You will get following error.

“Msg 21830, Level 16, State 1, Procedure sp_MSrepl_addarticle, Line 2124
You cannot specify schema_option 0x4 (script identity as identity rather than the base data type) for article ‘Table_IdentityA’. The value specified for the parameter @identityrangemanagementoption is NONE. To replicate identity as identity, the value must be MANUAL or AUTO for publications that do not support queued updating subscriptions.”

AUTO setting is only allowed when using updating subscriptions:

Automatic identity range support is useful only for publications that allow updating subscribers.

Microsoft is suggesting to avoid that setting:

Updatable Subscription

The only setting that we left is MANUAL. Lets add article with that setting set to Manual.

use [PublishedDB]
exec sp_addarticle @publication = N'IDENTITY_Test',
@article = N'Table_IdentityA', @source_owner = N'dbo',
@source_object = N'Table_IdentityA', @type = N'logbased', 
@description = N'', @creation_script = N'', @pre_creation_cmd = N'drop',
@schema_option = 0x000000000803509F, @identityrangemanagementoption = N'manual', 
@destination_table = N'Table_IdentityA', @destination_owner = N'dbo',
@status = 24, @vertical_partition = N'false', 
@ins_cmd = N'CALL [sp_MSins_dboTable_IdentityA]', 
@del_cmd = N'CALL [sp_MSdel_dboTable_IdentityA]', 
@upd_cmd = N'SCALL [sp_MSupd_dboTable_IdentityA]'
GO

This time no error. After Initializing and sending snapshot down to Subscriber,lets check what we have in Not For Replication.

Table_setup_subscriber

A bit surprise it is different than on Publisher. now lets try to insert rows into that table on Publisher. No issues there all 10 rows inserted properly:

no_Error_after_inserting

Lets see what has changed on Publisher:

TAble_setup_after_Adding_article

As you can see setting “Not for Replication” has changed to 1. I did test it and it seems while adding a table with identity column into Publication, either with GUI or T-SQL that setting is being changed to 1. Replication taking care of it from it side and I would not touch that setting on Subscriber. Pretty much which ever setting it is set on Publisher it will be always changed when adding table to Publication (changed to 1).

You can also see that replication procedure for that table is passing the value of Identity column (RowId):

CREATE procedure [dbo].[sp_MSins_dboTable_IdentityA]
    @c1 bigint,
    @c2 int,
    @c3 int
as
begin  
	insert into [dbo].[Table_IdentityA](
		[RowID],
		[Field1_int],
		[Field2_int]
	) values (
    @c1,
    @c2,
    @c3	) 
end

The error I mentioned at the begging can be caused only in two scenarios. First someone manually changed that setting on Subscriber database. Second is that tables on subscriber were created manually instead of snapshot. In the second scenario for example we used a backup of Published database (before Publication has been created) and restore it on Subscriber to initialize from backup.

Solution

In order to fix that error you can run following script on Subscriber database and provide table name that have an issue:

DECLARE @ApplicableObjectID INT
SET @ApplicableObjectID = OBJECT_ID('

‘

)
EXEC sp_identitycolumnforreplication @ApplicableObjectID,1

That script will change ‘Not For Replication’ to 1 and fix the error that you have with your distribution agent.

Alternatively to prevent that happen for other table you can change that setting to all tables on Subscriber (that are populated by replication) using script

select 'EXEC sp_identitycolumnforreplication ' + CAST(c.id AS VARCHAR(10)) + ', 1' ,t.name
FROM syscolumns c
join sys.tables t
on c.id=t.object_id
WHERE columnproperty(c.id, c.name, 'IsIdentity') = 1 
AND columnproperty(c.id, c.name, 'IsIdNotForRepl') = 0
AND  is_ms_shipped <>1

That script will do all user tables that has Identity column and set “Not For Replication” = 0 and set it to 1.

Lastly I would not recommend modifying replication procedures on Subscriber by adding SET IDENTITY_INSERT setting. As any changes to table schema will update that proc to old version and it will break distribution agent again.

Even if that error is not so common I hope it helped you better understand how identity is managed by transactional replication.

June 28, 2018

The row was not found at the Subscriber when applying

In next few posts will try to cover most of the common Transactional Replication errors.

First one that I want to focus on is “Row not found at the Subscriber”. That error will come up anytime that data is missing at the subscriber and that given row been modified (update or delete) on Publisher. Only UPDATE and DELETE can cause this error. Since SQL 2014 SP2 Microsoft added better error handling and you wont need to browse distribution db in order to get more information. More about this here in a update section: sp_Msreplraiseerror (2014 SP2)

Here is my error (as you can see I am using SQL 2014 with SP2).

Troubleshooting

Lets assume we still on older version and the only thing we are getting is the seq no and command_id of the failing command/transaction. First what you need to check is distribution db and browse it just to get full transaction. Here are my results:

As you can see it is update to TableC where PK value is =1. After checking subscriber we can confirm that the row is missing there.

Now there are two ways of fixing it:

1.Skip

By default all agent running on “Default Agent profile” that do not skip ANY Errors. However if you do not care about data on your subscriber to be consistent then you can skip that error by changing Agent Profile. You can see that this error have a code 20598, you can change agent profile to “Continue on data consistency errors.” or create your own Agent Profile with that -SkipErrors value 20598 and restart the agent (since 2017 CU3 you won’t need to restart agents). After that change you will see that Distribution Agent skipped that command and continue with next transactions. Note: That means ALL errors with that error code will be skipped for ALL published tables for that subscription!

If you want to be more selective and skip row not found only for specific table what you can do is modify the update replication proc for given table on the subscriber. You just commenting out last part of proc and put RETURN(1) like that:

After error been skipped you can revert changes.

2. Insert missing data

I would assume we do not want to miss data on the subscriber. The easiest way for fixing smaller amount of data we can just use tool sp_generate_inserts I am using it a lot and it is very useful with plenty of settings (include columns, Identity etc.). However it we have bigger set of data you could export table (or just set of data) to subscriber under different name and use left/right join to find missing data. It is quite a bit of work but in this case you will avoid running snapshot and overwriting entire table on Subscriber. Alternative for this is creating separate Snapshot replication on the side for that one table (with different destination name) and deliver the snapshot to Subscriber and then do the join to find missing data. Just note on that if you decide to overwrite table on Subscriber with new Snapshot you will get other errors on that Distribution agent. Here is an example.

We are assuming TableC is still being populated with inserts/updates/deletes… but our distribution agent is failing with an error (row not found) therefore our pending commands start to climbing:

Lets create a Snapshot replication for that one table:

Once Publication created you can go and create Subscription (remember to leave Initialize ticked).

Now we can generate the Snapshot for or TableC_fix replication:

Now after Snapshot been delivered we stopped getting Row not found error, we start getting PK violation… just because the command in pending queue in our transactional replication have INSERTS for data that been already delivered by our Snapshot replication.

That pretty much means our data is already there in Subscriber. To fix it just Create an Agent Profile with SkipError value 2627 and restart the agent. Once Transactional Replication get to the point when we generate Snapshot the error will disappear and normal replication will start delivering data.

3. Snapshot

In my world I am not able to do it and it can be a bit of pain as well. However if table is big and you want to be sure 100% that you are not missing ANY data, you may not have other option. In order to do that you need to drop article from subscription first by running sp_dropsubscription with @article parameter specified and then sp_Droparticle command for that given article. Then re-add that table into publication and run EXEC sp_refreshsubscriptions @publication ='<publication name’ . (Just remember you need to have Immediate_sync option set to false otherwise you will generate entire snapshot).

DELETE

How you should proceed if it turns out that it is DELETE statement? In my experience you can just skip it, by using agent profile or more selective way… changing DELETE replication procedure on subscriber (point 1). The data is already gone from Publisher and there is no way that we can fix it…

Lastly the most important thing is to find out WHY we are getting these errors (either UPDATE or DELETE) and get the root cause of the issue. Ideally that error should NEVER happen, especially in 1 to 1 structure (one publisher to one subscriber).

Most of you experience that error, for sure. Do you have any other ideas to fix it? Other approach when handling that error? If yes, lets discuss in the comment section.

March 14, 2018May 23, 2019

The way how the changes are replicated in transactional replication (CALL vs SCALL)

As I already mentioned before Replication is using replication procedures in order to replicate data changes Today I will focus on that how the changes can be send down to subscriber, especially for UPDATE.

By default these start with sp_MSins_%, sp_MSupd_%, sp_MSdel_%. You may already seen CALL or SCALL associated with these replication procedures. You can see these in GUI under articles properties:

call_articles_grab

or by just selecting from sysarticles on Publisher db or msarticles on distribution db:

grab_sysarticles

You can change these either by drop down list in SSMS in article properties or by using sp_changearticle stored procedures (executed on Publisher)

The most interesting in that is the UPDATE as that DML you can send down in few different ways. I will focus on two most common ones CALL and SCALL.

Lets go into more detail…

SCALL (default)

SCALL is default setting when you setting up the publication and adding articles into it. The update will pass only columns that have been changed, including PK values and lastly bitmask with information about what columns has been changed. Here is an example how this looks like in distribution database (Browse replication commands).

Updated one column in TableD:

UpdateTAbleD_SCALL

After browsing distribution db we can find this:

TAbleD_SCALL_Browse_with comments

As you can see the first parameter is NULL, this is the column that has not been changed. Here is how the procedure looks like on Subscriber:

USE [RC] 

go 

/****** Object:  StoredProcedure [dbo].[sp_MSupd_dboTableD]    Script Date: 2/27/2018 6:00:49 AM ******/ 
SET ansi_nulls ON 

go 

SET quoted_identifier ON 

go 

ALTER PROCEDURE [dbo].[Sp_msupd_dbotabled] @c1     INT = NULL, 
                                           @c2     NVARCHAR(100) = NULL, 
                                           @c3     NVARCHAR(200) = NULL, 
                                           @pkc1   INT = NULL, 
                                           @bitmap BINARY(1) 
AS 
  BEGIN 
      DECLARE @primarykey_text NVARCHAR(100) = '' 

      IF ( Substring(@bitmap, 1, 1) & 1 = 1 ) 
        BEGIN 
            UPDATE [dbo].[tabled] 
            SET    [tabled_id] = CASE Substring(@bitmap, 1, 1) & 1 
                                   WHEN 1 THEN @c1 
                                   ELSE [tabled_id] 
                                 END, 
                   [tabled_text] = CASE Substring(@bitmap, 1, 1) & 2 
                                     WHEN 2 THEN @c2 
                                     ELSE [tabled_text] 
                                   END, 
                   [text2] = CASE Substring(@bitmap, 1, 1) & 4 
                               WHEN 4 THEN @c3 
                               ELSE [text2] 
                             END 
            WHERE  [tabled_id] = @pkc1 

            IF @@rowcount = 0 
              IF @@microsoftversion > 0x07320000 
                BEGIN 
                    IF EXISTS (SELECT * 
                               FROM   sys.all_parameters 
                               WHERE  object_id = Object_id('sp_MSreplraiserror' 
                                                  ) 
                                      AND [name] = '@param3') 
                      BEGIN 
                          SET @primarykey_text = 
                          @primarykey_text + '[TableD_ID] = ' 
                          + CONVERT(NVARCHAR(100), @pkc1, 1) 

                          EXEC Sp_msreplraiserror 
                            @errorid=20598, 
                            @param1=N'[dbo].[TableD]', 
                            @param2=@primarykey_text, 
                            @param3=13233 
                      END 
                    ELSE 
                      EXEC Sp_msreplraiserror 
                        @errorid=20598 
                END 
        END 
      ELSE 
        BEGIN 
            UPDATE [dbo].[tabled] 
            SET    [tabled_text] = CASE Substring(@bitmap, 1, 1) & 2 
                                     WHEN 2 THEN @c2 
                                     ELSE [tabled_text] 
                                   END, 
                   [text2] = CASE Substring(@bitmap, 1, 1) & 4 
                               WHEN 4 THEN @c3 
                               ELSE [text2] 
                             END 
            WHERE  [tabled_id] = @pkc1 

            IF @@rowcount = 0 
              IF @@microsoftversion > 0x07320000 
                BEGIN 
                    IF EXISTS (SELECT * 
                               FROM   sys.all_parameters 
                               WHERE  object_id = Object_id('sp_MSreplraiserror' 
                                                  ) 
                                      AND [name] = '@param3') 
                      BEGIN 
                          SET @primarykey_text = 
                          @primarykey_text + '[TableD_ID] = ' 
                          + CONVERT(NVARCHAR(100), @pkc1, 1) 

                          EXEC Sp_msreplraiserror 
                            @errorid=20598, 
                            @param1=N'[dbo].[TableD]', 
                            @param2=@primarykey_text, 
                            @param3=13233 
                      END 
                    ELSE 
                      EXEC Sp_msreplraiserror 
                        @errorid=20598 
                END 
        END 
  END --

You can see how bitmask is used to determine if the column actual changed or no.

CALL

First thing lets change SCALL to CALL of TableD. You can do it in SSMS by going to Publication Properties and then to Article properties or just run T-SQL like that one:

exec sp_changearticle 
@publication ='Publication1',
@article = 'TableD',
@property ='upd_cmd',
@value='CALL [sp_MSupd_dboTableD]',
@force_invalidate_snapshot=0

After you run this you do not need to run snapshot. New update procedure will be delivered as normal replicated commands and now the code for update procedure looks like this:

create procedure [sp_MSupd_dboTableD]     @c1 int,     @c2 nvarchar(100),     @c3 nvarchar(200),     @pkc1 int
as
begin   	declare @primarykey_text nvarchar(100) = '' if not (@c1 = @pkc1)
begin  
update [dbo].[TableD] set     [TableD_ID] = @c1,     [TableD_text] = @c2,     [text2] = @c3
	where [TableD_ID] = @pkc1 if @@rowcount = 0
    if @@microsoftversion>0x07320000
		Begin
			if exists (Select * from sys.all_parameters where object_id = OBJECT_ID('sp_MSreplraiserror') and [name] = '@param3')
			Begin
				
				set @primarykey_text = @primarykey_text + '[TableD_ID] = ' + convert(nvarchar(100),@pkc1,1)
				exec sp_MSreplraiserror @errorid=20598, @param1=N'[dbo].[TableD]', @param2=@primarykey_text, @param3=13233 
			End
			Else
				exec sp_MSreplraiserror @errorid=20598
		End end   else
begin  
update [dbo].[TableD] set     [TableD_text] = @c2,     [text2] = @c3
	where [TableD_ID] = @pkc1 if @@rowcount = 0
    if @@microsoftversion>0x07320000
		Begin
			if exists (Select * from sys.all_parameters where object_id = OBJECT_ID('sp_MSreplraiserror') and [name] = '@param3')
			Begin
				
				set @primarykey_text = @primarykey_text + '[TableD_ID] = ' + convert(nvarchar(100),@pkc1,1)
				exec sp_MSreplraiserror @errorid=20598, @param1=N'[dbo].[TableD]', @param2=@primarykey_text, @param3=13233 
			End
			Else
				exec sp_MSreplraiserror @errorid=20598
		End end  end   --

As you can see no bitmap mask just simple update to ALL columns in tables and proc does not check if values has changed.

Now lets see how the update looks like in the distribution db. As before we just updating one column in that table:

update TableD
set text2='test_2222'
where TableD_ID=2

If you check distribution database you will see this:

CALL_browse

As you can see it does provide all columns and values even if these has not changed. It will always send everything.

CALL vs SCALL

Now you wonder why this is important? Imagine you have large table with 30 columns some big ones like varchar(max) etc. Your application updating only one column in that table…. How do you think this will affect your distribution db? Lets have a look.

I cleaned up distribution db and will populate 10 column table with 1 million rows. Then update that 1 million rows first with SCALL and then with CALL settings.

Table Structure looks like that:

CREATE TABLE [dbo].[TableA](
	[TableA_ID] [int] NOT NULL,
	[Col1] [nvarchar](100) NULL,
	[Col2] [nvarchar](100) NULL,
	[Col3] bigint NULL,
	[Col4] int NULL,
	[Col5] int NULL,
	[Col6] int NULL,
	[Col7] datetime NULL,
	[Col8] int NULL,
	[Col9] bigint NULL,
	[Col10] [nvarchar](max) NULL,
PRIMARY KEY CLUSTERED 
(
	[TableA_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

After creating I will populate table with random data and then publish it. Once Snapshot of TableA wth 1 million rows delivered to Subscriber I am checking space of msrepl_commands on distribution database for that Publication (nothing else is getting into that distribution db at this point).

Snapshot delivered

Right now msrepl_commands is 35 rows and 40KB reserved. It contains only initial snapshot. Article is set to use SCALL for updates. Lets send now 1 million updates just for one column, Col4 which is just INT. I will increase that column by number 100 that is it.

1 million SCALL updates sent down

After sending 1 million updates with SCALL my msrepl_commands table went to: 1000035 rows and 183368 KB reserved for that table.

Lets change article to normal CALL and lets send 1 million updates for Col5.

1 million CALL updates sent down

Now my msrepl_commands table went to 2000088 rows and reserved 759624 KB on drive. Just to explain why row count is not 2000035… it is because I changed article using sp_changearticle which added few rows still very small ones (should not take more than 40 KB)

Summary

As you can see using normal CALL will take way more space inside your distribution db

CALL = 576 256 KB

SCALL = 183 328 KB

Same transactions but with CALL we are using almost 3 times more of storage space. Before you change default SCALL setting to CALL think twice and check storage on your distributor.

If you have any questions just use comments section.

Did you use CALL in some cases and if yes to what were these? What setting do you normally use for UPDATES?

December 7, 2017

sp_msUpd% has too many arguments, sp_msUpd% expects parameter. Replication Procedures part 2

In previous post I explained basics of replication procedures. Now lets have a look into some common issues with these. Especially when you Publishing from two (or more) sources, same tables into the same destination (Subscriber).

Here is an example:

Twopublishers_articles

As you can see both articles from two different Publishers are using exactly same procedure. If you check Subscriber you will see only 3 (not 6) procedures for that table:

Subscriber_procs

In that specific scenario you need to be careful with any changes to schema on Publisher. Why? …..

Lets assume you have Subscriber that is getting data from two (or more) different sources (Publishers). TableA on Subscriber getting data from Publisher1 and Publisher2. By default replication procedure names will have the same name “sp_Msupd_dboTableA” , that means both publishers will be using exact same procedure for insert, updates and deletes. You are receiving request from DEV to modify that table. To be exact drop a column. If you deploy change to Publisher1 only the replication procedure will be changed and new version (without dropped column) of the procedure will be applied on subscriber. The new procedure wont have that old column… but what about Publisher2? Publisher2 will be trying to execute procedure still with old column and you will receive “Procedure or function sp_MSins_dboTableA has too many arguments specified.”.

Here is an example. We dropped column id_temp on Publisher1 and then inserted one row on Publisher2:

Column dropped

Once we did that Distribution agent for Publisher2 will start failing:

Dropped column error

How to solve it and/or how to prevent it to happen?

Solution

Once you get to that point when receiving such error you should follow these steps:

Apply change (in our case drop column) on the second Publisher. This wont solve the issue straight away but it will modify replication procedure code to the same as we have on Publisher1. The DDL change will wait in the queue for delivery.
As procedure on Subscriber has 2 instead of 3 parameters we need to modify that procedure manually and include 3rd parameter.
Completing Step 2 will cause any inserts coming from Publisher1 failing as modified procedure expect @c3, meanwhile Publisher2 will deliver command that was failing before. You can also set default NULL value in the proc @c3 bigint=NULL and this wont cause error on Publisher1
Wait until change from Step 1 will be replicated down to Subscriber. It will change back procedure to version without old procedure. This will automatically fix distribution agent that will be failing (Step 2). Both Publisher will have same schema and same replication procedure code for that table.

Similar step you need to follow if u add new column to replicated table. Just instead of adding parameter you need apply old procedure (without new column) from second Publisher (Scripting out Replication Procedures).

Preventing it

You can prevent it to happen in two ways.

While you are configuring Publication. When you are adding articles to Publication you can specify INS/UPD/DEL procedure names. Just use different names for Publisher1 and Publisher2. Below you can see that both are using different procedures.
You can always modify procedures names by using sp_changearticle procedure on a Publisher. However if are receiving error on distribution agent already, this change wont help as we are using Transactional Replication the “change article” will be in the queue behind the failing command. You need to be proactive and do that change before error occur. Here is an example of modifying Insert procedure: exec sp_changearticle
@publication=’Publication2′,
@article=’tableA’,
@property=’ins_cmd’,
@value=’CALL sp_MSins_TAbleA_Publisher2′

Now important thing is that you can’t prevent this issue to occur for dropped columns as the old procedure (from second Publisher) will try to insert data into old column anyways and column will be missing. If so happens follow the steps I provided in that post.

The Best way to avoid issues with replication procedures is working closely with DEV when doing schema changes and configure you publications as mentioned above.

October 16, 2017October 15, 2017

The selected Subscriber does not satisfy the minimum version compatibility level of the selected publication.

As you may know SQL Replication can replicate data between different versions of SQL. It can replicated data from SQL 2012 to SQL 2016 etc. It can also replicate data from higher version to a lower version of SQL. However if you are trying to setup replication to a subscriber that is on higher version than your publisher you may get following error:

“The selected Subscriber does not satisfy the minimum version compatibility level”

Now the problem occurs only when you are adding a new subscriber using GUI and it is related only to version of SSMS you are using during setup.

Lets assume you are configuring subscriber that is on 2016 but your publisher and distributor is on 2012. You most likely connected to a Publisher and trying to add new subscription stream.

You have two solution for this:

Use T-sql. When setting up by using T-SQL, the version of subscriber is not being checked as you are not even trying to connect to a subscriber. Only Distribution agent will connect to it later after the setup.
As T-SQL setup could be bit difficult you have a second solution, which is to use tools that are on the same version as subscriber (in our example it should be SSMS 2016). You can do it either from Subscriber or other server. NOTE: The tools/SSMS should be located on workstation/server that have access to Publisher/Distributor and Subscriber. you need to have access to all 3 components of replication.

If you have other solution for this issue let me know in comments:)

October 6, 2017October 6, 2017

[TSQL Script] Browse replication commands

While you can use just simple stored proc to browse commands (How to “Browse” Distributor (sp_browsereplcmds explained)) you can notices few issues with that proc… for example commands are not ordered by command_id. Here is some script I created. I found it useful especially if I want see the order of commands applied to subscriber:)

You just need to provide, published database name, Seqno of transaction that you want to check (you can get it from replication monitor, query msrepl_transactions etc.) and command id that you want to check. Leave NULL if you want to see ALL commands in given transaction.

--BROWSE REPL COMMANDS

declare @publisher_db nvarchar(150)
declare @publisher_db_id int
declare @seqno nvarchar(500)
declare @seqno_bin varbinary(16)
declare @error_id int
declare @error_txt nvarchar(1000)
declare @command_id int


/*-----------------PARAMETERS---------------------*/
set @publisher_db='<Published db>'
set @seqno_bin=<LSN>
set @seqno='<LSN>'
set @command_id = <command id> -- Leave NULL if all you want to get ALL commands from that transaction.
/*-----------------------------------------------*/

create table #temp_commands (
xact_seqno    varbinary(16) NULL,                    
originator_srvname   varchar(100) NULL,                                                                                                            
originator_db        varchar(100) NULL,                                                                                                            
article_id   		 int NULL,
type        		  int NULL,
partial_command 		bit NULL,
hashkey 				int NULL,
originator_publication_id  int NULL,
originator_db_version      int NULL,
originator_lsn         varbinary	(16) NULL,
command                 nvarchar   (1024) NULL,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
command_id				int NULL)

select @publisher_db_id=id from mspublisher_databases where publisher_db=@publisher_db
select @error_id=error_id from msdistribution_history where xact_seqno=@seqno_bin
select top 1 @error_txt=error_text from  dbo.MSrepl_errors where id = @error_id

insert into #temp_commands
exec sp_browsereplcmds 
@xact_seqno_start=@seqno,
@xact_seqno_end=@seqno,
@publisher_database_id=@publisher_db_id

if @command_id is NULL
BEGIN
select command,command_id, @error_txt as 'Error' from #temp_commands order by command_id
END
ELSE
BEGIN
select command,command_id, @error_txt as 'Error' from #temp_commands where command_id=@command_id order by command_id
END
drop table #temp_commands

September 7, 2017September 5, 2017

Execs and desktop heap

Agent Processing Account

In previous post I explained how Replication agents works. It is pretty mych exe running on Distributor (or in pull replication distrib.exe on Subscriber). As always each exe is running under some kind of Windows account. Like here: execs

As you can see distrib.exe and logread.exe are running on SQL Service account and also you notice each if these is using RAM. That account you choosing when you configure publication (configuring snapshot and logreader processing account) and subscriber (configuring distribution agent). Every time you configuring one of these things you need to provide Security settings. There are two things that need to be provided. First processing account and second authentication account. In some cases you can use same account for both, like here:

SecuritySettings

As you can see we will be using SQL Service account for processing logread.exe and also we will use same account in order to connect to Publisher (in order to read transactions). Side note: If you choosing to do so, you need to make sure that this account has all necessary permissions on Publisher server.

Side note 2: You can modify these settings always by going to Publication settings/properties (to modify snapshot and logreader security settings) and Subscription settings/properties (to modify Distribution Agent settings).

We focusing now only on first part. Processing account this is the one that exe will be running under.

You probably wonder why I’m writing about this? Well it is to do with “noninteractive desktop heap “.

Noninteractive desktop heap

Now I would like to put small disclaimer. You may not get to this point where you will get that issue. What is the exact issue?

Let’s assume you have one remote distributor that is used to run 50,60, 100 or more distribution agents. Each of this agent is configured to run under SQL Service account. Each agent is running exe under that account and each process taking some amount of memory on that instance. It is also taking so called “noninteractive desktop heap” that is allowing running processes concurrently. That value is limited for each processing account and it means once u reach it for SQL Service account you wont be able to run any more replication agents.

Once you get to this point the replication agents just wont run… the problem is it wont give you ANY error or just fail without giving any description… it will just stuck on “Starting agent” and finally fail after sometime.

You can test if you have that issue already and just run exe command manually under your account (if you have all necessary permissions on the server) and if agent is running fine then most probably you reached that limit. How to run replication agents manually I described here:

Understanding Replication Agents Part 2

Solution:

Change processing account to use different Domain account. Remember it need to have proper permissions on Distributor. You will need to restart agent after you change processing account.
Increase Desktop Heap Size. However this need server reboot:Increasing the noninteractive desktop heap size

I normally go with solution no. 1 and create few more domain account and run agent under these accounts.

August 8, 2017

How to “Browse” Distributor (sp_browsereplcmds explained)

In last post I explained how LSN and seqno works. In this one we will focus on the tool that helps you investigate what actually happen with your replication.

Procedure sp_browsereplcmds.

This is one of the most important tools that use while dealing with transactional replication. The parameters that I only use with that proc are:

@xact_seqno_start
@xact_seqno_end
@publisher_database_id
@article_id
@command_id
@agent_id

Where only first three are “obligatory” in order to get some usefull information. For example not providing @xact_seqno_end will get you ALL commands from seqno that u provided as @xact_seqno_start…. and that could ba a LOT!.

Ok lets explain this base on some example. You have transactional replication and one of the agents is failing with “row not found on subscriber” error. Which pretty much means that the data is missing on subscriber… how do you find what exactly is broken? Well as I explained in previous post in most cases you get seqno and command id from replication monitor (from error of dist agent). To get this exact command you need to provide seqno as a start and end range. Also provide publisher database id and to get this id you can get from distribution db:

select * from MSpublisher_databases

In order to narrow down to exact command just add command id. If you want get entire transaction and all commands in given transaction just skip that variable. Here is the example. First I deleted a row on Subscriber and then I tried to update same row on Publisher.

Error:

error_row_not_found

Updating parameters taken from error above:

exec sp_browsereplcmds 
@xact_seqno_start='0x0000002400000053000300000000',
@xact_seqno_end='0x0000002400000053000300000000',
@publisher_database_id=1, --select * from MSpublisher_databases
@command_id=1

After running it you will get command that is actually failing.

command

As you can see this is an Update procedure on TableA. Now you are able to find what exact row is missing. To get exact information about what value is matching what parameter in that update proc just go to Subscriber (and subscriber db) and check the code of that procedure. In our case it will be this:

Procedure

The good news is that from SQL 2014 SP2 CU2 you won’t need this and all the information about errors you will be able to get from actual error from Replication Monitor. More about that change here (as it may give you some issues if u upgrade your servers in wrong order):

Procedure or function sp_MSreplraiserror has too many arguments specified.

Still it is good to know how to get exact command from distribution database.

June 23, 2017February 27, 2018

Procedure or function sp_MSreplraiserror has too many arguments specified.

UPDATE (27/02/2018)

Some great news! It seems that that error been fixed in SQL 2014 SP2 CU3. More information here . They added following logic at the end of the replication procedures:

            IF @@rowcount = 0 
              IF @@microsoftversion > 0x07320000 
                BEGIN 
                    IF EXISTS (SELECT * 
                               FROM   sys.all_parameters 
                               WHERE  object_id = Object_id('sp_MSreplraiserror' 
                                                  ) 
                                      AND [name] = '@param3') 
                      BEGIN 
                          SET @primarykey_text = 
                          @primarykey_text + '[TableD_ID] = ' 
                          + CONVERT(NVARCHAR(100), @pkc1, 1) 

                          EXEC Sp_msreplraiserror 
                            @errorid=20598, 
                            @param1=N'[dbo].[TableD]', 
                            @param2=@primarykey_text, 
                            @param3=13233 
                      END 
                    ELSE 
                      EXEC Sp_msreplraiserror 
                        @errorid=20598 
                END

It seems it is checking a version of SQL and if version is low it is executing proc without additional parameters.

Still if patching server is not an option for you, follow this blog post:)

END OF UPDATE

This is quite common error when you upgrade your 2014 publishers with SP2. First of all the best way of upgrading your replication infrastructure is to start with subscribers. In real world this is not always possible.

This error occurs when you apply SQL 2014 SP2 to your publishers but not to subscriber. It seems Microsoft modifying error handling (thank GOD!) for replication. Currently as you may know when u getting PK, row not found errors you only getting seqno and command id. Since 2014 SP2 the replication will give you table name and values of Primary Key (to identify the duplicate/missing row). That is great improvement as you don’t need to go to distributor to browse commands (more about browsing commands over here …. )

They added 3rd parameter to sp_MSreplraiserror procedures. Since this is systems proc you not able to update it on subscriber. The only permanent solution for this is applying SP2 to your subscriber.

However there is a workaround for this but only temporary one. First step is to script out ALL replication procs for given publication on publisher. Using following script:

sp_scriptpublicationcustomprocs @publication = ”

Once you copy output, paste it on subscriber db and replace ‘,@param3=’ with ‘- -@param3=’

As I said this is just temporary solution as these procedures may be overwritten if you make changes to some or all articles, you create new subscription or drop/recreate it etc. When you add new articles you will need to repeat process for all articles or if you like for given articles you just added.

I know it is not the best solution for this but I don’t see any other way of doing it. If you do have other ideas, let me know in comments.

May 12, 2017August 29, 2018

When Distribution Cleanup “not behaving”

Distribution Cleanup running every minute and completing with success? –check

Distributor has plenty of resources and everything seems fine?- check

but msrepl_commands is getting bigger (and bigger) and Cleanup locks with Logreader……. why?

That happens quite often in two cases:

we are using one distribution db for multiple PUBLISHERS.
tables that we are publishing are very busy and cleanup just not handling the load.

I would like to focus only on the second point. Lets assume 1st point has been fixed and there is only one Publisher going trough that one distribution database.

Retention:

I like to keep retention on “fixed” values, means min and max are the same.

EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 24, @max_distretention = 24

Defaults by Microsoft are min=0 and max=72 which means everything thats been delivered will be deleted on next cleanup run and if replication fails to deliver the replication data will be removed after 72 hours. I like to set it to same values as it did save my “life” few times (more about “disaster recovery” using LSN and transnational replication later).

Im mentioning retention because if you set it to high then the table msrepl_commands and msrepl_transactions will become huge and hard to maintain. Check your retention settings on distributor and set it to values that wont cause disk space issues but still meets your and business needs.

Cleanup procedures:

This is quite interesting… I did notice in very big and busy replication environments some of the default settings does not work well. Reason? The default batch size for delete is 2000. Which is bit low for system that can get 30K (or higher) commands every minute. In such cases you think cleanup is working fine and running but actually it is not coping with amount of load. In this case you will see msrepl_commands just going very big and above the retention period you set on distribution db.

Now just a note… It is not the perfect way but in some cases it need to be done. We need to modify two system procedures on distribution db. Here are the changes I suggest and changes that helped me on large distribution dbs:

Modifications to sp_MSdelete_publisherdb_trans (proc that cleans up msrepl_commands)

Batch size (DELETE TOP()) increase that to value that suits you. Remember to change as well at the end of the loop IF @row_count < [same value as in DELETE TOP]. There are two loops that you need to change it.
Modifying PAGELOCK to ROWLOCK. This will help with locking
Additionally you can modify MAXDOP setting (if different globaly on server). I never need to modify it so far. Works good with maxdop=1

2. Modifications to sp_MSdelete_dodelete (proc that cleans up msrepl_transactions:

Batch size (DELETE TOP()) increase that to value that suits you. Remember to change as well at the end of the loop IF @row_count < [same value as in DELETE TOP]. There are two loops that you need to change it.
Modifying PAGELOCK to ROWLOCK. This will help with locking

As you see two modifications are exactly same between these two the only difference will be the values in TOP part. For sp_MSdelete_dodelete you can set these values lower (or not). You just need to see what works the best on your system.

One last thing about this. If you made this changes on SQL 2014 and will apply SP2 CU2. All changes will be replaced with default settings again.

Articles/Publications:

At this point we need to ask question.. Do we really need to replicate all tables? Do we really need so many publications? and the most important do we have articles that are published in multiple publications???

First two questions you need to answer by yourself… however if answer on the last question is “yes I do have articles published in more than one publication” this is part you should read.

Lets assume you have TableA that is published in publication Pub1 but you also have publication named Pub2. 99% of tables published in these two are different but one table that getting 1 million transactions a day is published in both publications… now what happened just by doing it you multiple entries in msrepl_commands for that article. Here is an example. Below query showing how commands are looking for ONE insert into table A.

multiple_commands

As you see because table is published in two different publication you will get two entries for each transactions on that table.

How you can solve it? If possible, send that table with one publication. Remember you can decide what articles goes to which subscription. You can have two subscriptions going out from one Publication but with different sets of articles. But this is topic for different article.

Distribution Cleanup Job:

Cleanup job by default is set to run every 10 minutes. You can decrease that time to 1 minute but this wont always help, still you cleanup can fail with a deadlock. On busy systems I do often adding retry attempts to cleanup step and putting 0 between retries. That save us time and cleanup will try to run again straight away. Note: Do not set retry attempts too high. Otherwise you wont know if there are some real problems with Cleanup Job (because it will never fail or fail when it will be to late and you ran out of disk space).

SQL Server configurations:

Is our distributor following best practices? (multiple tempdb files, maxdop etc.) Is our distributor running on default settings? (default 1MB autogrowth on distribution db, low memory settings etc.). This is just reminder as Im assuming these things you already checked and you ran out of ideas, that is why you are here:).

Last thing is about running update stats on distributors. You may or may not know that it is not really needed…. as cleanup procs running UPDATE STATISTICS (on msrepl_commands and msrepl_transactions) at the begging and at the end of the cleanup proc execution.

Conclusion:

Check your Distributor configurations. Check retention on distribution dbs. Modify cleanup procedures accordingly. Review your publications and articles. Add retry attempts to cleanup step.

These are my suggestions to you. If you have any other ideas or comments let me know.