Day 12 – High CPU and Bloat in Distribution

This is the final installment in the 12 day series for SQL tidbits during this holiday season.

Previous articles in this mini-series on quick tidbits:

  1. SQL Sat LV announcement
  2. Burning Time
  3. Reviewing Peers
  4. Broken Broker
  5. Peer Identity
  6. Lost in Space
  7. Command ‘n Conquer
  8. Ring in The New
  9. Queries Going Boom
  10. Retention of XE Session Data in a Table
  11. Purging syspolicy

distribution

Working with replication quite a bit with some clients you might run across some particularly puzzling problems.  This story should shed some light on one particularly puzzling issue I have seen on more than one occasion.

In working with a multi-site replication and multi-package replication topology, the cpu was constantly running above 90% utilization and there seemed to be a general slowness even in Windows operations.

Digging into the server took some time to find what might have been causing the slowness and high CPU.  Doing an overall server health check helped point in a general direction.

Some clues from the general health check were as follows.

  1. distribution database over 20GB.  This may not have been a necessarily bad thing but the databases between all the publications weren’t that big.
  2. distribution cleanup job taking more than 5 minutes to complete.  Had the job been cleaning up records, this might not have been an indicator.  In this case, 0 records were cleaned up on each run.

The root cause seemed to be pointing to a replication mis-configuration.  The mis-configuration could have been anywhere from the distribution agent to an individual publication.  Generally, it seems that the real problem is more on a configuration of an individual publication more than any other setting.

When these conditions are met, it would be a good idea to check the publication properties for each publication.  Dive into the distribution database and try to find if any single publication is the root cause and potentially is retaining more replication commands than any other publication.  You can use sp_helppublication to check the publication settings for each publication.  You can check MSrepl_commands in the distribution database to find a correlation of commands retained to publication.

Once having checked all of this information, it’s time to put a fix in place.  It is also time to do a little research before actually applying this fix.  Why?  Well, because you will want to make sure this is an appropriate change for your environment.  For instance, you may not want to try this for a peer-to-peer topology.  In part because one of the settings can’t be changed in a peer-to-peer topology.  I leave that challenge to you to discover in a testing environment.

The settings that can help are as follows.

[codesyntax lang=”tsql”]

[/codesyntax]

These settings can have a profound effect on the distribution retention, the cleanup process and your overall CPU consumption.  Please test and research before implementing these changes.

Besides the potential benefits just described, there are other benefits to changing these commands.  For instance, changing replication articles can become less burdensome by disabling these settings.  The disabling of these settings can help reduce the snapshot load and allow a single article to be snapped to the subscribers instead of the entire publication.

No Comments - Leave a comment

Leave a comment

Your email address will not be published. Required fields are marked *










Calendar
January 2014
M T W T F S S
« Dec   Feb »
 12345
6789101112
13141516171819
20212223242526
2728293031  
Content
SQLHelp

SQLHelp

  • @SQLAmerica: The IIF function (introduced in 2012) doesn't seem to work in SQL Server versions 2008, 2012 and 2014.Do I need something enabled? #SQLHelp
  • @SQLSoldier: @dev_b That was an optimization added in (I think) SQL 2012. But don't think that means its free. It still takes a lock, etc. #sqlhelp
  • @dev_b: #SQLHelp Looking at fn_dblog, LOP_DELETE_ROWS is invoked only when the column value changes? 2/2
  • @dev_b: #SQLHelp ORM updates all columns with the existing values, is SQLserver is smart to ignore those or will it updates all columns in a row?1/2
  • @Kevin3NF: @buddingeek @bflippin If they try to get you to use one service for all SQL on all servers,tell them one hack takes down the farm #SQLHelp
  • @skreebydba: @SQLSoldier Okay, thanks. #sqlhelp is the best thing going because of folks like you.
  • @SQLSoldier: @skreebydba Look at, yes. Be able to read them, no. They're not in a readable format and the format is not documented. #sqlhelp
  • @buddingeek: @bflippin have already asked my windows team to create a dedicated domain service account for sql agent jobs . #sqlhelp
  • @skreebydba: @SQLSoldier Robert, thanks for the rapid response. Is there a way to look at the contents? fn_dblog() etc... #sqlhelp
  • @SQLSoldier: @skreebydba The commands needed to revert the rollback the database to the point before it recovered the db to bring it online. #sqlhelp

Welcome , today is Thursday, September 29, 2016