In next few posts will try to cover most of the common Transactional Replication errors.
First one that I want to focus on is “Row not found at the Subscriber”. That error will come up anytime that data is missing at the subscriber and that given row been modified (update or delete) on Publisher. Only UPDATE and DELETE can cause this error. Since SQL 2014 SP2 Microsoft added better error handling and you wont need to browse distribution db in order to get more information. More about this here in a update section: sp_Msreplraiseerror (2014 SP2)
Here is my error (as you can see I am using SQL 2014 with SP2).
Lets assume we still on older version and the only thing we are getting is the seq no and command_id of the failing command/transaction. First what you need to check is distribution db and browse it just to get full transaction. Here are my results:
As you can see it is update to TableC where PK value is =1. After checking subscriber we can confirm that the row is missing there.
Now there are two ways of fixing it:
By default all agent running on “Default Agent profile” that do not skip ANY Errors. However if you do not care about data on your subscriber to be consistent then you can skip that error by changing Agent Profile. You can see that this error have a code 20598, you can change agent profile to “Continue on data consistency errors.” or create your own Agent Profile with that -SkipErrors value 20598 and restart the agent (since 2017 CU3 you won’t need to restart agents). After that change you will see that Distribution Agent skipped that command and continue with next transactions. Note: That means ALL errors with that error code will be skipped for ALL published tables for that subscription!
If you want to be more selective and skip row not found only for specific table what you can do is modify the update replication proc for given table on the subscriber. You just commenting out last part of proc and put RETURN(1) like that:
After error been skipped you can revert changes.
2. Insert missing data
I would assume we do not want to miss data on the subscriber. The easiest way for fixing smaller amount of data we can just use tool sp_generate_inserts I am using it a lot and it is very useful with plenty of settings (include columns, Identity etc.). However it we have bigger set of data you could export table (or just set of data) to subscriber under different name and use left/right join to find missing data. It is quite a bit of work but in this case you will avoid running snapshot and overwriting entire table on Subscriber. Alternative for this is creating separate Snapshot replication on the side for that one table (with different destination name) and deliver the snapshot to Subscriber and then do the join to find missing data. Just note on that if you decide to overwrite table on Subscriber with new Snapshot you will get other errors on that Distribution agent. Here is an example.
We are assuming TableC is still being populated with inserts/updates/deletes… but our distribution agent is failing with an error (row not found) therefore our pending commands start to climbing:
Lets create a Snapshot replication for that one table:
Once Publication created you can go and create Subscription (remember to leave Initialize ticked).
Now we can generate the Snapshot for or TableC_fix replication:
Now after Snapshot been delivered we stopped getting Row not found error, we start getting PK violation… just because the command in pending queue in our transactional replication have INSERTS for data that been already delivered by our Snapshot replication.
That pretty much means our data is already there in Subscriber. To fix it just Create an Agent Profile with SkipError value 2627 and restart the agent. Once Transactional Replication get to the point when we generate Snapshot the error will disappear and normal replication will start delivering data.
In my world I am not able to do it and it can be a bit of pain as well. However if table is big and you want to be sure 100% that you are not missing ANY data, you may not have other option. In order to do that you need to drop article from subscription first by running sp_dropsubscription with @article parameter specified and then sp_Droparticle command for that given article. Then re-add that table into publication and run EXEC sp_refreshsubscriptions @publication ='<publication name’ . (Just remember you need to have Immediate_sync option set to false otherwise you will generate entire snapshot).
How you should proceed if it turns out that it is DELETE statement? In my experience you can just skip it, by using agent profile or more selective way… changing DELETE replication procedure on subscriber (point 1). The data is already gone from Publisher and there is no way that we can fix it…
Lastly the most important thing is to find out WHY we are getting these errors (either UPDATE or DELETE) and get the root cause of the issue. Ideally that error should NEVER happen, especially in 1 to 1 structure (one publisher to one subscriber).
Most of you experience that error, for sure. Do you have any other ideas to fix it? Other approach when handling that error? If yes, lets discuss in the comment section.