Replication Agent Profiles explained Part 1

As we know there are 3 Agents that are needed in order to Transactional Replication to work: LogReader, Snapshot and Distribution Agent. Each with different role. Each of these is running under given Agent Profile. What is Agent Profile? In short: these are the settings that our Replication Agents are running with, such as Batch sizes, History level settings etc. Let’s go into the details

Agent Profiles GUI (Replication Monitor)

You can change and see profile setting by going to Replication Monitor, open given agent (Snapshot, Log reader,  Distribution):

AgentProfileGUI

Once you open it you can see stock standard SQL Server Agent Profile with Default profile as a current one if nothing been changed since Replication configuration:

agentsettings

You can create your new custom profile by clicking New and using one of the System profiles as a base of that user profile:

NewAgentProfile.JPG

In order to see ALL settings that you can set you need to untick the checkbox. Otherwise you will see only settings used by the profile you choose at the beginning :

allsettings

In order to change agent profile you just need to choose agent profile and restart given Replication Agent (from SQL 2014 SP3 the restart of  the Agent is not necessary)

Agent Profile T-SQL

As always you can do exactly same thing using T-SQL.

Here  are some useful procedures that can help you manage these. All these procedures need to be executed on distributor.

These need to be run on Distribution DB:

exec sp_help_agent_profile – list all profiles that are present on a given distributor. It is also showing which profile is a default for newly created agents

exec sp_help_agent_parameter – shows all parameters that are set for all profiles. @profile_id can be  provided to show only settings for given profile.

Here are some system tables can be run on msdb database on distributor, these are pretty much base tables used by procedures I mentioned before:

select * from [dbo].[MSagent_profiles]
select * from [dbo].[MSagent_parameters]
select * from [dbo].[MSagentparameterlist]

If you want to change profiles settings or change assigned profile to given agent you need to use following procedures.

exec sp_add_agent_parameter – adding parameter to agent profile
exec sp_add_agent_profile – adding new profile
exec sp_change_agent_parameter – changing parameter of a given profile
exec sp_change_agent_profile– changing description of a given profile
exec sp_drop_agent_parameter – dropping parameter from the given profile
exec sp_drop_agent_profile – dropping agent profile

exec sp_update_agent_profile – assigning new profile for given Replication Agent

Here is small script that will give you a distribution agents mapped to agent profiles and what parameters these has set.

select a.name,p.profile_name,p.def_profile, par.parameter_name,par.value from MSdistribution_agents a
join msdb.[dbo].[MSagent_profiles] p
on a.profile_id = p.profile_id
join msdb.dbo.[MSagent_parameters] par
on p.profile_id=par.profile_id

As you can see you can add/modify agent profiles either via GUI or T-SQL. Same relates to assigning profiles to give agents (distribution/logreader etc.) however in this case you can “hard-code” the agent profile into given replication agent job. You can read about replication jobs more over here: Understanding Replication Agents Part 2. You can add -ProfileName in the replication agent parameters and then this setting will overwrite the one you can see in GUI or using T-SQL. You can also specify batch sizes and much more. You can check all agent parameters over here: https://docs.microsoft.com/en-us/sql/relational-databases/replication/agents/replication-distribution-agent?view=sql-server-2017

I would not recommend that as then you loosing visibility of what is really going on and under which profile the agents are running.  Here is an example… I put in the job parameter -ProfileName [Default agent profie] and did restart the job:

 

I have change profile using GUI to Continue on data consistency errors:

Then I deleted one row on subscriber and tried to delete same row on Publisher. Running on Continue on data consistency errors profile agent should skip the error. But in our case we hard-coded the profile in the job…. and that is why we are getting error anyways:

I never put profile name (or any other related to profiles parameters) into the job directly and I strongly recommend the same to everyone.

In next blog post I will go more into details on some of the parameters used in Agent Profiles. In some cases you want to play around with these settings as it can improve performance of your replication. 

The row was not found at the Subscriber when applying

In next few posts will try to cover most of the common Transactional Replication errors.

First one that I want to focus on is “Row not found at the Subscriber”. That error will come up anytime that data is missing at the subscriber and that given row been modified (update or delete) on Publisher. Only UPDATE and DELETE can cause this error. Since SQL 2014 SP2 Microsoft added better error handling and you wont need to browse distribution db in order to get more information. More about this here in a update section: sp_Msreplraiseerror (2014 SP2)

Here is my error (as you can see I am using SQL 2014 with SP2).

RowNotfoundError

Troubleshooting

Lets assume we still on older version and the only thing we are getting is the seq no and command_id of the failing command/transaction. First what you need to check is distribution db and browse it just to get full transaction. Here are my results:

browse_error

As you can see it is update to TableC where PK value is =1. After checking subscriber we can confirm that the row is missing there.

Now there are two ways of fixing it:

1.Skip

By default all agent running on “Default Agent profile” that do not skip ANY Errors. However if you do not care about data on your subscriber to be consistent then you can skip that error by changing Agent Profile. You can see that this error have a code 20598, you can change agent profile to “Continue on data consistency errors.” or create your own Agent Profile with that -SkipErrors value 20598 and restart the agent (since 2017 CU3 you won’t need to restart agents). After that change you will see that Distribution Agent skipped that command and continue with next transactions. Note: That means ALL errors with that error code will be skipped for ALL published tables for that subscription!

If you want to be more selective and skip row not found only for specific table what you can do is modify the update replication proc for given table on the subscriber. You just commenting out last part of proc and put RETURN(1) like that:

skipping_replication_proc

After error been skipped you can revert changes.

2. Insert missing data

I would assume we do not want to miss data on the subscriber. The easiest way for fixing smaller amount of data we can just use tool sp_generate_inserts  I am using it a lot and it is very useful with plenty of settings (include columns, Identity etc.). However it we have bigger set of data you could export table (or just set of data) to subscriber under different name and use left/right join to find missing data. It is quite a bit of work but in this case you will avoid running snapshot and overwriting entire table on Subscriber. Alternative for this is creating separate Snapshot replication on the side for that one table (with different destination name) and deliver the snapshot to Subscriber and then do the join to find missing data. Just note on that if you decide to overwrite table on Subscriber with new Snapshot you will get other errors on that Distribution agent. Here is an example.

We are assuming TableC is still being populated with inserts/updates/deletes… but our distribution agent is failing with an error (row not found) therefore our pending commands start to climbing:

pendingcmds

Lets create a Snapshot replication for that one table:

SnapshotReplication1Snapshot2Snapshot3

Once Publication created you can go and create Subscription (remember to leave Initialize ticked).

NewSub_snapshot1NewSub_snapshot2NewSub_snapshot3NewSub_snapshot4

Now we can generate the Snapshot for or TableC_fix replication:

Snapshot_Generated

Now after Snapshot been delivered we stopped getting Row not found error, we start getting PK violation… just because the command in pending queue in our transactional replication have INSERTS for data that been already delivered by our Snapshot replication.

Erroraftersnapshot

That pretty much means our data is already there in Subscriber. To fix it just Create an Agent Profile with SkipError value 2627 and restart the agent. Once Transactional Replication get to the point when we generate Snapshot the error will disappear  and normal replication will start delivering data.

Skipping

3. Snapshot

In my world I am not able to do it and it can be a bit of pain as well. However if table is big and you want to be sure 100% that you are not missing ANY data, you may not have other option. In order to do that you need to drop article from subscription first by running sp_dropsubscription with @article parameter specified and then sp_Droparticle command for that given article. Then re-add that table into publication and run EXEC sp_refreshsubscriptions @publication ='<publication name’ . (Just remember you need to have Immediate_sync option set to false otherwise you will generate entire snapshot).

DELETE

How you should proceed if it turns out that it is DELETE statement? In my experience you can just skip it, by using agent profile or more selective way… changing DELETE replication procedure on subscriber (point 1). The data is already gone from Publisher and there is no way that we can fix it…

Lastly the most important thing is to find out WHY we are getting these errors (either UPDATE or DELETE) and get the root cause of the issue. Ideally that error should NEVER happen, especially in 1 to 1 structure (one publisher to one subscriber).

Most of you experience that error, for sure. Do you have any other ideas to fix it? Other approach when handling that error? If yes, lets discuss in the comment section.