We’re moving to queues

We’re moving to queues

by | Apr 9, 2021

Some time ago I asked on GitHub whether we should move to using SQL Server Service Broker rather than SQL Agent jobs. Sadly, there have been zero responses from the community, so I made my own decision. You can guess what the decision was.

What is Service Broker

SQL Server Service Broker is SQL Server built-in messaging and queuing mechanism that supports internal and external content. Service broker can execute stored procedure when a new message is added to the queue and it happens in real-time. Service broker is included in every edition including Express and is exceptionally reliable, although not the easiest to use.

Why Service Broker

This is something that has been in my head for few months or even years. Since version 3.0, the data collection is invoked via SQL Agent, every 10 seconds. In previous versions of SQLWATCH this was every 1 minute.

To answer why Service Broker here’s few things I dislike about SQL Agent:

SQL Agent is not available in the Express edition so the only way to make SQLWATCH work in the Express edition is to schedule everything via Windows Task Scheduler. Whilst possible, is less than ideal as we’re then losing visibility into the scheduling mechanisms and potential failures.

SQL Agent can fail and may stop, which will result in no data collection at all.

SQL Agent is expensive to run. Every time a job starts; a lot happens in msdb . Since version 3.0, SQLWATCH is logging performance data every 10 seconds, and this does put some pressure on the Agent subsystem and msdb database.

SQL Agent can only run every 10 seconds or more. To run job every 5 seconds, two jobs would have to be set, each running every 10 seconds with an offset of 5 seconds. This is less than ideal, and it adds to the complexity and creates even more overhead. With the recent release of SQLWATCH 3.0, the ideal collection time would be 5 seconds but due to SQL Agent limitation this is currently set to 10 seconds.

SQL Agent is completely detached from other databases which complicates backups, restores and failovers. Whilst this does not directly affect SQLWATCH as Jobs can be recreated on demand, it is a key design decision in any application design.

SQL Agent requires access to the databases. Most setups comprise of a SQL Agent running under a service account which has full sa access to the SQL Instance. This means that any job can access any database and make any unauthorised modifications. Ideal scenario would be to set a proxy account with minimal required permissions for each job, but we all know this almost never happens.

So, why Service Broker

Service broker is super reliable. Whilst SQL Agent is a separate Service running under its own account, Service Broker is part of the SQL Engine and part of every database. It is as reliable as SQLOS itself. Once enabled it will never shutdown or fail. Sure, it can disable a queue if there are errors (poison messages) but if they are handled correctly, the service will continue to work even if the database is restored from backup.

Service broker allows asynchronous and timed (repeated) execution of stored procedures at any time interval. It is not limited to 10 seconds like SQL Agent.

Service broker has almost no overhead. With SQL Agent, we can observe CPU utilisation attributed to the SQL Agent overhead going up significantly, although for a brief period, when jobs start, this is not the case with Service Broker. I am observing less than 1% CPU utilisation when SQLWATCH is collecting data. Previously this could peak to 5% and even 10%. In my test setup, the performance logger runs every 5 seconds and takes less than 100ms to collect performance data. This is all on a single core, 2GB RAM SQL Server:

SQLWATCH image 1

The disadvantage of Service Broker is that it works within the SQL Server context so does not have access to the operating system like the SQL Agent does. The only way to access OS command line would be via xp_cmdshell. SQL Agent natively supports command line and SQLPS (limited version of PowerShell).

What are the alternatives

An alternative would be an external application that would do all the work. This would include data collection and maintenance – running data retention, building hourly trends etc. Whilst I still anticipate an external application to access and collect non-SQL performance counters, I am not looking to replace what SQL Server gives me out of the box.

If I were to move the entire logic to a console application or a Windows service, I would lose all the benefits of SQL Server Data Tools. This would slow down the development time and would certainly introduce lots of bugs. I blogged about the Development of SQL Databases in Visual Studio (Data Tools) some time ago.

For example, a data retention stored procedure runs every hour and deletes rows of data that are older than the set number of days. It is pure T-SQL and because it’s a VS project, Data Tools take care of making sure I get the right table and columns, that the syntax is correct and compliant with the given version of SQL Server. If I used a syntax not compatible with the pre-configured version of SQL Server it would fail to build. Visual Studio makes database development much faster. To execute our data retention procedure, we can use SQL Agent, Service Broker, or a third-party application that would issue the exec dbo.usp... statement. That application would have to handle schedules or timers and would be complex to write. Alternatively, the application could issue a full T-SQL statement instead of the stored procedure but then the T-SQL code would have to be stored within the, say C# code instead of the database project and would not benefit from the Data Tools.

Even if I write a Service to collect OS and SQL performance data in the future, it will only do that, any in-database processing will still happen in-database and as SQLWATCH is growing it is doing more tasks we need reliable execution mechanism.

I have heard terrible things about the Service Broker

Trust me, most of the “terrible things” come from bad coding. Service Broker is exceptionally reliable and well-written code will never fail and if it does, it will handle itself and will not stop the queue.

How it will work

SQLWATCH has always had four main core schedules: every 1 minute, every 10 minutes, every 1 hour and every 1 day. Since version 3.0, this is every 10 seconds, every 1 minute, every 10 minutes every 1 hour and every 1 day.

Whilst SQL Agent did allow you to set different schedules for data collection, they were not really supported by the dashboards. I blogged about data aggregation and data granularity some time ago. This means that I do not really want you to change the frequency of data collection as it may impact the dashboard and accuracy of data. If you do change the collection from the default 1 minute (or 5 seconds in version 3.0) to 5 minutes or 30 seconds, it could break the dashboard or even show inaccurate results.

There is going to be a single queue that will handle the workload. Messages will be queued using the begin conversation timer or ad-hoc via send on command. There are going to be four timers reflecting the core schedules outline above: 5 seconds, 1 minute, 10 minutes and 60 minutes.

Having said that, as this is all T-SQL you can always make modifications and change the conversation timer

That makes no sense

So, we’re going to have the Service Broker doing most of the execution and still some Agent Jobs?

The few Agent Jobs will be for your convenience to schedule index collection or other, infrequent collectors. Apart from that, there is a disk collector which uses PowerShell and WMI to collect disk utilisation. There is no way to do this via SQL other than xp_cmdshell but I see this as a hack. If, or when I write the application to collect OS performance data, it will do it all so the WMI jobs will go.

Feel free to chip in and contribute to the discussion on GitHub

When will this be released?

I have most of the code written already and was hoping to include as an option from the next release 3.1 and to replace Agent Jobs in version 4.0. Having said all that, I will include an option to move back to Agent Jobs and you will still be able to use Agent Job instead of queues.

I would like you to get involved in the discussion on GitHub or in the comments below and share your views, concerns, and ideas.

Notify of
Newest Most Voted
Inline Feedbacks
View all comments