The row was not found at the Subscriber when applying

In next few posts will try to cover most of the common Transactional Replication errors.

First one that I want to focus on is “Row not found at the Subscriber”. That error will come up anytime that data is missing at the subscriber and that given row been modified (update or delete) on Publisher. Only UPDATE and DELETE can cause this error. Since SQL 2014 SP2 Microsoft added better error handling and you wont need to browse distribution db in order to get more information. More about this here in a update section: sp_Msreplraiseerror (2014 SP2)

Here is my error (as you can see I am using SQL 2014 with SP2).

RowNotfoundError

Troubleshooting

Lets assume we still on older version and the only thing we are getting is the seq no and command_id of the failing command/transaction. First what you need to check is distribution db and browse it just to get full transaction. Here are my results:

browse_error

As you can see it is update to TableC where PK value is =1. After checking subscriber we can confirm that the row is missing there.

Now there are two ways of fixing it:

1.Skip

By default all agent running on “Default Agent profile” that do not skip ANY Errors. However if you do not care about data on your subscriber to be consistent then you can skip that error by changing Agent Profile. You can see that this error have a code 20598, you can change agent profile to “Continue on data consistency errors.” or create your own Agent Profile with that -SkipErrors value 20598 and restart the agent (since 2017 CU3 you won’t need to restart agents). After that change you will see that Distribution Agent skipped that command and continue with next transactions. Note: That means ALL errors with that error code will be skipped for ALL published tables for that subscription!

If you want to be more selective and skip row not found only for specific table what you can do is modify the update replication proc for given table on the subscriber. You just commenting out last part of proc and put RETURN(1) like that:

skipping_replication_proc

After error been skipped you can revert changes.

2. Insert missing data

I would assume we do not want to miss data on the subscriber. The easiest way for fixing smaller amount of data we can just use tool sp_generate_inserts  I am using it a lot and it is very useful with plenty of settings (include columns, Identity etc.). However it we have bigger set of data you could export table (or just set of data) to subscriber under different name and use left/right join to find missing data. It is quite a bit of work but in this case you will avoid running snapshot and overwriting entire table on Subscriber. Alternative for this is creating separate Snapshot replication on the side for that one table (with different destination name) and deliver the snapshot to Subscriber and then do the join to find missing data. Just note on that if you decide to overwrite table on Subscriber with new Snapshot you will get other errors on that Distribution agent. Here is an example.

We are assuming TableC is still being populated with inserts/updates/deletes… but our distribution agent is failing with an error (row not found) therefore our pending commands start to climbing:

pendingcmds

Lets create a Snapshot replication for that one table:

SnapshotReplication1Snapshot2Snapshot3

Once Publication created you can go and create Subscription (remember to leave Initialize ticked).

NewSub_snapshot1NewSub_snapshot2NewSub_snapshot3NewSub_snapshot4

Now we can generate the Snapshot for or TableC_fix replication:

Snapshot_Generated

Now after Snapshot been delivered we stopped getting Row not found error, we start getting PK violation… just because the command in pending queue in our transactional replication have INSERTS for data that been already delivered by our Snapshot replication.

Erroraftersnapshot

That pretty much means our data is already there in Subscriber. To fix it just Create an Agent Profile with SkipError value 2627 and restart the agent. Once Transactional Replication get to the point when we generate Snapshot the error will disappear  and normal replication will start delivering data.

Skipping

3. Snapshot

In my world I am not able to do it and it can be a bit of pain as well. However if table is big and you want to be sure 100% that you are not missing ANY data, you may not have other option. In order to do that you need to drop article from subscription first by running sp_dropsubscription with @article parameter specified and then sp_Droparticle command for that given article. Then re-add that table into publication and run EXEC sp_refreshsubscriptions @publication ='<publication name’ . (Just remember you need to have Immediate_sync option set to false otherwise you will generate entire snapshot).

DELETE

How you should proceed if it turns out that it is DELETE statement? In my experience you can just skip it, by using agent profile or more selective way… changing DELETE replication procedure on subscriber (point 1). The data is already gone from Publisher and there is no way that we can fix it…

Lastly the most important thing is to find out WHY we are getting these errors (either UPDATE or DELETE) and get the root cause of the issue. Ideally that error should NEVER happen, especially in 1 to 1 structure (one publisher to one subscriber).

Most of you experience that error, for sure. Do you have any other ideas to fix it? Other approach when handling that error? If yes, lets discuss in the comment section.

How to “Browse” Distributor (sp_browsereplcmds explained)

In last post I explained how LSN and seqno works. In this one we will focus on the tool that helps you investigate what actually happen with your replication.

Procedure sp_browsereplcmds.

This is one of the most important tools that use while dealing with transactional replication. The parameters that I only use with that proc are:

  • @xact_seqno_start
  • @xact_seqno_end
  • @publisher_database_id
  • @article_id
  • @command_id
  • @agent_id

Where only first three are “obligatory” in order to get some usefull information. For example not providing @xact_seqno_end will get you ALL commands from seqno that u provided as @xact_seqno_start…. and that could ba a LOT!.

Ok lets explain this base on some example. You have transactional replication and one of the agents is failing with “row not found on subscriber” error. Which pretty much means that the data is missing on subscriber… how do you find what exactly is broken? Well as I explained in previous post in most cases you get seqno and command id from replication monitor (from error of dist agent). To get this exact command you need to provide seqno as a start and end range. Also provide publisher database id and to get this id you can get from distribution db:

select * from MSpublisher_databases

In order to narrow down to exact command just add command id. If you want get entire transaction and all commands in given transaction just skip that variable. Here is the example. First I deleted a row on Subscriber and then I tried to update same row on Publisher.

Error:

error_row_not_found

Updating parameters taken from error above:

exec sp_browsereplcmds 
@xact_seqno_start='0x0000002400000053000300000000',
@xact_seqno_end='0x0000002400000053000300000000',
@publisher_database_id=1, --select * from MSpublisher_databases
@command_id=1

 

After running it you will get command that is actually failing.

command

As you can see this is an Update procedure on TableA. Now you are able to find what exact row is missing. To get exact information about what value is matching what parameter in that update proc just go to Subscriber (and subscriber db) and check the code of that procedure. In our case it will be this:

Procedure

The good news is that from SQL 2014 SP2 CU2 you won’t need this and all the information about errors you will be able to get from actual error from Replication Monitor. More about that change here (as it may give you some issues if u upgrade your servers in wrong order):

Procedure or function sp_MSreplraiserror has too many arguments specified.

Still it is good to know how to get exact command from distribution database.

 

Procedure or function sp_MSreplraiserror has too many arguments specified.

UPDATE (27/02/2018)

Some great news! It seems that that error been fixed in SQL 2014 SP2 CU3. More information here . They added following logic at the end of the replication procedures:

            IF @@rowcount = 0 
              IF @@microsoftversion > 0x07320000 
                BEGIN 
                    IF EXISTS (SELECT * 
                               FROM   sys.all_parameters 
                               WHERE  object_id = Object_id('sp_MSreplraiserror' 
                                                  ) 
                                      AND [name] = '@param3') 
                      BEGIN 
                          SET @primarykey_text = 
                          @primarykey_text + '[TableD_ID] = ' 
                          + CONVERT(NVARCHAR(100), @pkc1, 1) 

                          EXEC Sp_msreplraiserror 
                            @errorid=20598, 
                            @param1=N'[dbo].[TableD]', 
                            @param2=@primarykey_text, 
                            @param3=13233 
                      END 
                    ELSE 
                      EXEC Sp_msreplraiserror 
                        @errorid=20598 
                END 

 

It seems it is checking a version of SQL and if version is low it is executing proc without additional parameters.

Still if patching server is not an option for you, follow this blog post:)

END OF UPDATE

This is quite common error when you upgrade your 2014 publishers with SP2. First of all the best way of upgrading your replication infrastructure is to start with subscribers. In real world this is not always possible.

This error occurs when you apply SQL 2014 SP2 to your publishers but not to subscriber. It seems Microsoft modifying error handling (thank GOD!) for replication. Currently as you may know when u getting PK, row not found errors you only getting seqno and command id. Since 2014 SP2 the replication will give you table name and values of Primary Key (to identify the duplicate/missing row). That is great improvement as you don’t need to go to distributor to browse commands (more about browsing commands over here …. )

They added 3rd parameter to sp_MSreplraiserror procedures. Since this is systems proc you not able to update it on subscriber. The only permanent solution for this is applying SP2 to your subscriber.

However there is a workaround for this but only temporary one. First step is to script out ALL replication procs for given publication on publisher. Using following script:

sp_scriptpublicationcustomprocs @publication = ”

Once you copy output, paste it on subscriber db and replace  ‘,@param3=’ with ‘- -@param3=’

As I said this is just temporary solution as these procedures may be overwritten if you make changes to some or all articles, you create new subscription or drop/recreate it etc. When you add new articles you will need to repeat process for all articles or if you like for given articles you just added.

I know it is not the best solution for this but I don’t see any other way of doing it. If you do have other ideas, let me know in comments.