You almost never need real-time alerts or real-time Data Warehouse
I was going to write about real-time alerts only, but I hear more and more businesses talk about real-time Data Warehouses so I threw them both into a single post. If you are a business owner, I want you to read this and think before asking your developers to build a real-time Data Warehouse, alerts or any other real-time system.
According to Wikipedia, real-time data is:
Information that is delivered immediately after collection. There is no delay in the timeliness of the information provided. Real-time data is often used for navigation or tracking. Such data is usually processed using real-time computing although it can also be stored for later or off-line data analysis.
In reality, real-time data isn’t just about how quickly we can deliver it. Real-time information means we can make a corrective action based on the data and influence future data points. GPS and navigation make perfect sense. Our position shown by the GPS may be delayed by a few milliseconds, but we are still able to read the data, interpret the coordinates and then make a corrective action, i.e. take the turn, based on our position as of a few milliseconds ago, and we influence the data itself (because the next read-out is showing our new coordinates, which we changed by taking the turn)
Real-time makes sense in video calls or conferences – of course, we want to see and listen to the person and respond, like in real-life. Real-time is all about making the dialogue work both ways. We get a piece of information, interpret the results and give a response back within milliseconds.
Real-time data is handy in telephony systems. Knowing how many calls are waiting in the queue, we can assign more call handlers to answer those calls, and if the numbers drop, we can ask them to do something else. That’s a good corrective action based on real-time information.
Real-time data in sales is not a good example. Ask yourself what you are going to change right now, based on the information? You are going to need more time and data over a longer period of time, and perhaps even other sources to analyse sales, the market, strategy, gaps, etc.
Real-time Data Warehouse is difficult and expensive
Doing real-time data is expensive. If you are building a real-time data warehouse just for the sake of having it (because it is soo cool), or just to show pretty charts that change often, but you are not making corrective action and steering your business operations based on that data (and most businesses don’t) – don’t bother with real-time, settle or intraday or near-real-time (15 minutes to 1 hour), or even once a day batch.
If you have a Data Warehouse, you will likely have a time dimension with some granularity. This is often one day or 1 hour. Rarely 1 second or even one minute. This makes sense in the analytics world as you are likely to be analysing data (say, sales) over a longer period of time, weeks and months, and not seconds. Every time you get a portion of new data, you will also calculate metrics, such as sales, per product, per area, per day etc. If you are getting this every few seconds, it is challenging to pick what periods to aggregate and calculate.
So, before you jump and request real-time data, ask yourself what you will do with it and how you will use this information to change your business – also in real-time. If you cannot come up with a response, settle on real-real-time or intra-day.
Unless we have a way to provide feedback to change what we do, based on the data we receive, to futher influence that data, we are not doing any real-time, we are just doing data pulls with minimal latency.
Real-time data colletion
Real-time collection means capturing the event as soon as it happens. SQL Server Extended Events are real-time. Technically, with real-time alerting, we can act as soon as bad things start to happen – the corrective action. We are receiving alerts with CPU data showing 98% of utilisation. We log in to the system and kill the process causing the CPU spike. We have made a corrective action based on real-time data. Our activity has then influenced the data because the CPU is no longer 98%. It’s like a feedback loop. It makes sense.
Real-time alerting and false alarms
But just take a second to think about it a bit more. What is real-time alerting? We still need to wait a few minutes to collect CPU utilisation over a period of time. Would we act if the CPU spiked to 98% for 1 second? I wouldn’t, but I would still like to know this in my weekly service review. We need the CPU to be constantly at 98% for a few seconds or even minutes to be able to act so we don’t quite do it in real-time. We just run a process every few minutes to check for the last few minutes of CPU utilisation and trigger alert.
If we send the alert too quickly, we are risking a premature and false alert. The more false alerts, the less trust in the monitoring platform will your team have.
Imagine if your fire alarm was going off randomly. You would eventually start to ignore it. Even if there is a real fire, you would just think, “oh, not again”.
Instead, focus on building resilient systems that can deal with potential problems on their own whilst still letting you know something went wrong but it’s being handled so you don’t have to get up at 3 am and dial-in in a rush to fix it.
What we want is close-to-real-time data collection to be able to run decision-making logic that then triggers alerts as soon as possible, after having done additional checks to make sure the problem really needs attention, and to increase granularity for better analysis. The alerting bit is rarely real-time in the database world.
Analogically, a real-time (or near-real-time, every few minutes) trickle feed into the Data Warehouse to avoid big overnight data loads? Sure. This makes perfect sense, but real-time DW builds and recalculations? Probably not. Is this even possible?
Think what you want out of it
To respond to an alert and fix the issue, we have to know the problem and how to fix it. We need procedures in place. For example, receiving real-time alerts about failed Agent Jobs may not be a good idea since the job may re-try and succeed on the second attempt (because it failed due to a lock that has since gone) or fail due to data problems that are not going away on its own. The resolution would be to fix the data or fix the application to handle the new data, which we may not do straight away in the middle of the night.
We have this super important job that has to run each night and we keep fixing it becuase the source data changes…
In that case, you need to design your process to be robust and resilient, and I am not just talking about the Agent Job, but about the entire, end to end process. If someone changes the source system and starts pushing data you are not expecting, you need a strategy to be included in their approval list and know what to expect in the coming days and weeks.
You need to know what you can do with the real-time data, events and alerts. Suppose we were receiving the CPU data in real-time without doing anything with it. What would be the point of getting it in the first place?
The key to successful alerting and monitoring is to know your user-base, database and application baselines and thresholds so you can set alerts to warn you when things start deviating from normal and before they go completely wrong. You also must ensure alerts are genuine and point to an actual problem that can be solved right away. If you’re getting lots of alerts about disk space being at 20%, you are likely going to keep ignoring it until it drops further. Being pro-active is way more difficult than reactive, but it will pay off in the long term.
Conclusion, the “so what”
Think and design before you build anything because thinking is cheaper than building the wrong system.It does not make sense to do real-time data without doing anything with this data in real-time.
Real-time data is a bit like food on the go. You need a bite right now otherwise, you will faint from hunger. Data Warehousing is a bit like a 10-course dinner where you take time to enjoy every single bit of it.
Do not design your system on the “cool” principles. Focus on how you will be using it and what you can do with the information. By getting it right, you will probably save a lot of money, time and stress.
I am not saying real-time data is useless – absolutely not! There are just better applications for it than a Data Warehouse.
If you struggle with justification, ask yourself the “so what” question and try to answer it. If you can come up with a valid reason, then go for it. If not, go back to the drawing board.