We’re moving to queues

Some time ago I asked on GitHub whether we should move to using SQL Server Service Broker rather than SQL Agent jobs. Sadly, there have been zero responses from the community, so I made my own decision. You can guess what the decision was.

What is Service Broker

SQL Server Service Broker is SQL Server built-in messaging and queuing mechanism that supports internal and external content. Service broker can execute stored procedure when a new message is added to the queue and it happens in real-time. Service broker is included in every edition including Express and is exceptionally reliable, although not the easiest to use.

Why Service Broker

This is something that has been in my head for few months or even years. Since version 3.0, the data collection is invoked via SQL Agent, every 10 seconds. In previous versions of SQLWATCH this was every 1 minute.

To answer why Service Broker here’s few things I dislike about SQL Agent:

SQL Agent is not available in the Express edition so the only way to make SQLWATCH work in the Express edition is to schedule everything via Windows Task Scheduler. Whilst possible, is less than ideal as we’re then losing visibility into the scheduling mechanisms and potential failures.

SQL Agent can fail and may stop, which will result in no data collection at all.

SQL Agent is expensive to run. Every time a job starts; a lot happens in msdb . Since version 3.0, SQLWATCH is logging performance data every 10 seconds, and this does put some pressure on the Agent subsystem and msdb database.

SQL Agent can only run every 10 seconds or more. To run job every 5 seconds, two jobs would have to be set, each running every 10 seconds with an offset of 5 seconds. This is less than ideal, and it adds to the complexity and creates even more overhead. With the recent release of SQLWATCH 3.0, the ideal collection time would be 5 seconds but due to SQL Agent limitation this is currently set to 10 seconds.

SQL Agent is completely detached from other databases which complicates backups, restores and failovers. Whilst this does not directly affect SQLWATCH as Jobs can be recreated on demand, it is a key design decision in any application design.

SQL Agent requires access to the databases. Most setups comprise of a SQL Agent running under a service account which has full sa access to the SQL Instance. This means that any job can access any database and make any unauthorised modifications. Ideal scenario would be to set a proxy account with minimal required permissions for each job, but we all know this almost never happens.

So, why Service Broker

Service broker is super reliable. Whilst SQL Agent is a separate Service running under its own account, Service Broker is part of the SQL Engine and part of every database. It is as reliable as SQLOS itself. Once enabled it will never shutdown or fail. Sure, it can disable a queue if there are errors (poison messages) but if they are handled correctly, the service will continue to work even if the database is restored from backup.

Service broker allows asynchronous and timed (repeated) execution of stored procedures at any time interval. It is not limited to 10 seconds like SQL Agent.

Service broker has almost no overhead. With SQL Agent, we can observe CPU utilisation attributed to the SQL Agent overhead going up significantly, although for a brief period, when jobs start, this is not the case with Service Broker. I am observing less than 1% CPU utilisation when SQLWATCH is collecting data. Previously this could peak to 5% and even 10%. In my test setup, the performance logger runs every 5 seconds and takes less than 100ms to collect performance data. This is all on a single core, 2GB RAM SQL Server:

SQLWATCH image 1

The disadvantage of Service Broker is that it works within the SQL Server context so does not have access to the operating system like the SQL Agent does. The only way to access OS command line would be via xp_cmdshell. SQL Agent natively supports command line and SQLPS (limited version of PowerShell).

What are the alternatives

An alternative would be an external application that would do all the work. This would include data collection and maintenance – running data retention, building hourly trends etc. Whilst I still anticipate an external application to access and collect non-SQL performance counters, I am not looking to replace what SQL Server gives me out of the box.

If I were to move the entire logic to a console application or a Windows service, I would lose all the benefits of SQL Server Data Tools. This would slow down the development time and would certainly introduce lots of bugs. I blogged about the Development of SQL Databases in Visual Studio (Data Tools) some time ago.

For example, a data retention stored procedure runs every hour and deletes rows of data that are older than the set number of days. It is pure T-SQL and because it’s a VS project, Data Tools take care of making sure I get the right table and columns, that the syntax is correct and compliant with the given version of SQL Server. If I used a syntax not compatible with the pre-configured version of SQL Server it would fail to build. Visual Studio makes database development much faster. To execute our data retention procedure, we can use SQL Agent, Service Broker, or a third-party application that would issue the exec dbo.usp... statement. That application would have to handle schedules or timers and would be complex to write. Alternatively, the application could issue a full T-SQL statement instead of the stored procedure but then the T-SQL code would have to be stored within the, say C# code instead of the database project and would not benefit from the Data Tools.

Even if I write a Service to collect OS and SQL performance data in the future, it will only do that, any in-database processing will still happen in-database and as SQLWATCH is growing it is doing more tasks we need reliable execution mechanism.

I have heard terrible things about the Service Broker

Trust me, most of the “terrible things” come from bad coding. Service Broker is exceptionally reliable and well-written code will never fail and if it does, it will handle itself and will not stop the queue.

How it will work

SQLWATCH has always had four main core schedules: every 1 minute, every 10 minutes, every 1 hour and every 1 day. Since version 3.0, this is every 10 seconds, every 1 minute, every 10 minutes every 1 hour and every 1 day.

Whilst SQL Agent did allow you to set different schedules for data collection, they were not really supported by the dashboards. I blogged about data aggregation and data granularity some time ago. This means that I do not really want you to change the frequency of data collection as it may impact the dashboard and accuracy of data. If you do change the collection from the default 1 minute (or 5 seconds in version 3.0) to 5 minutes or 30 seconds, it could break the dashboard or even show inaccurate results.

There is going to be a single queue that will handle the workload. Messages will be queued using the begin conversation timer or ad-hoc via send on command. There are going to be four timers reflecting the core schedules outline above: 5 seconds, 1 minute, 10 minutes and 60 minutes.

Having said that, as this is all T-SQL you can always make modifications and change the conversation timer

That makes no sense

So, we’re going to have the Service Broker doing most of the execution and still some Agent Jobs?

The few Agent Jobs will be for your convenience to schedule index collection or other, infrequent collectors. Apart from that, there is a disk collector which uses PowerShell and WMI to collect disk utilisation. There is no way to do this via SQL other than xp_cmdshell but I see this as a hack. If, or when I write the application to collect OS performance data, it will do it all so the WMI jobs will go.

Feel free to chip in and contribute to the discussion on GitHub

When will this be released?

I have most of the code written already and was hoping to include as an option from the next release 3.1 and to replace Agent Jobs in version 4.0. Having said all that, I will include an option to move back to Agent Jobs and you will still be able to use Agent Job instead of queues.

I would like you to get involved in the discussion on GitHub or in the comments below and share your views, concerns, and ideas.

7 Comments

  1. SQLWATCH 6bb93425bf4fba6793e4e7b18ef3532b?s=54&d=mm&r=g
    Richard Langdon 12th April 2021 at 14:50 - Reply

    Thanks for updating information about your amazing product! I’ve not experienced any issues using Agent Jobs, but follow your thinking and look forward to the upgrade. An asynchronous message based process makes a great deal of sense. Look forward to a release that can be tested and compared.

    • SQLWATCH a1929ece64de0517e67d5f5a423b7ec3?s=54&d=mm&r=g
      Marcin Gminski 12th April 2021 at 15:15 - Reply

      Hi Richard, thank you for your comment and kind words!
      The code is already available in the branch 3.x https://github.com/marcingminski/sqlwatch/tree/3.x but you would have to build the project yourself in Visual Studio before deploying.
      Alternatively — drop me an email and I will send you a dacpac if you want to test it.

  2. SQLWATCH caf456619ad8cc1a1313cb9de7eb689e?s=54&d=mm&r=g
    Steffen Engel 20th April 2021 at 21:51 - Reply

    Hi Marcin,
    very cool idea. With changing from Agent to service broker the way to an easier implemention of SQLWatch on Azure Managed Instances will be opened.

    Great Job!

    Greetings from Germany!
    Steffen

    • SQLWATCH a1929ece64de0517e67d5f5a423b7ec3?s=54&d=mm&r=g
      Marcin Gminski 20th April 2021 at 21:53 - Reply

      Thanks Steffen!
      Did you get my email this morning? I finally managed to reply! Apologies for the delay!

      • SQLWATCH 981559bb2615358412697889dc107ada?s=54&d=mm&r=g
        Sri 4th May 2021 at 02:29 - Reply

        I think both SQL Agent and Service broker are compatible with the Azure Managed Instance right ?

  3. SQLWATCH 981559bb2615358412697889dc107ada?s=54&d=mm&r=g
    Sri 4th May 2021 at 02:28 - Reply

    Hi Marcin,
    I am planning to use the SQLWATCH to get the performance baselines such that i can compare the On-Premises vs Azure Managed instance so trying to weigh in pros and cons . Please let me know if anyone went this way or tested it any links will be helpful

    Thanks
    Sri

    • SQLWATCH a1929ece64de0517e67d5f5a423b7ec3?s=54&d=mm&r=g
      Marcin Gminski 4th May 2021 at 11:45 - Reply

      If you are trying to compare two different platforms, you would have to replay the same workload on both to get accurate and meaningful results. To record and replay the workload you can use XESmartTarget by spaghettidba https://github.com/spaghettidba/XESmartTarget and then use SQLWATCH to compare performance metrics. Although XESmartTarget will also give you performance metrics

Leave A Comment

Recent Posts