How successful data applications process both historical and real-time data in hybrid analytics and business automation solutions
We have the data. What comes next?
Over the past decade we have witnessed an explosion of data so large that humans can no longer make sense of it directly. Enterprises have prepared for these volumes by investing in robust infrastructure that can handle the load, with the somewhat implicit expectation that owning the data automatically means owning the value.
But as we are finding out, extracting value from the data is a very different story, and it will likely be the challenge of the next decade. As things stand, the conversation starts with the enterprise’s opening line: “We have the data. What comes next?”
Value from big data comes in two forms: information and action
Well, next comes value. There are only two types of value that companies can derive from processing big data: one takes the form of information, the other the form of action. They represent automated versions of the two components of intelligence: learning something new and being able to act on what we learn. They are automated because both the information (an insight or prediction) and the action (together with its underlying decision) are delivered by the software, without human interpretation in the case of the information or human intervention in the case of the action.
- Value in the form of information is the realm of modern BI, advanced analytics and, most recently, enterprise AI. The tools that process data to extract information work with historical data, looking at large sets of ‘old’ data to uncover patterns and provide insights and predictions.
- Value in the form of action is the realm of business automation: decision and rules engines. The tools that process data to make a decision and take an action work with (near) real-time data, particularly in Internet of Things applications. As the world is event-driven, more and more companies need to respond immediately as events unfold.
But there is a catch: the success stories of maximising business value from big data use hybrid solutions in which BI/AI and rules engines complement each other, just as true intelligence needs both learning and acting.
Hybrid solutions are the game winners
As developers of a rules engine for IoT, we are primarily focused on the automation side of things. And we are looking at one important question together with our customers: how do you come up with the rules? There are two options at the moment.
One option is starting from the data. Larger companies often have the data before they have the business case. As they are already sitting on big data, they may be tempted to say: just start from analytics, run the data we have through our BI/AI systems and let them come up with ideas for what and how to automate.
The second option is starting from the business. Companies that don’t have large volumes of data readily available tend to start from the business case, which means taking the business decisions that have been identified as truly data-driven and automating them with the rules engine.
As things stand, the second option makes the most sense. Take this example of two companies in the same industry, given by Prof. K. Hammond of Northwestern University in the US in a recent O’Reilly podcast. Both sit on massive amounts of data coming in from their devices. One has a policy of never taking down equipment during business hours, driven by a clear business goal: uninterrupted activity Mon-Fri. The other has the opposite policy of always taking down flagged equipment that needs repairs, driven by a different business goal: optimising the overall lifespan of the machine.
There is of course no sense in creating a learning system to watch and analyse data in order to infer these goals. You can simply put these goals yourself in the rules engine to start with. This is starting from the business. What happens next, and this is where offline analytics is brought in, is refining the model. The quest for hybrid solutions sees the convergence of analytics and automation into such concepts as real-time analytics or historical rules engines.
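To make the two policies concrete, here is a minimal Python sketch of how each could be written down directly as a rule rather than inferred from data. Everything in it is illustrative: the `Equipment` type, the function names, the action labels and the assumption that business hours run 09:00-17:00 on weekdays are ours, not part of the podcast example or of any particular rules engine.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Equipment:
    name: str
    flagged_for_repair: bool


def is_business_hours(now: datetime) -> bool:
    # Assumption: business hours are Mon-Fri, 09:00-17:00.
    return now.weekday() < 5 and 9 <= now.hour < 17


def uptime_first_policy(equipment: Equipment, now: datetime) -> str:
    """First company: never take equipment down during business hours."""
    if equipment.flagged_for_repair and not is_business_hours(now):
        return "take_down"
    return "keep_running"


def lifespan_first_policy(equipment: Equipment, now: datetime) -> str:
    """Second company: always take flagged equipment down, whatever the time."""
    return "take_down" if equipment.flagged_for_repair else "keep_running"


pump = Equipment(name="pump-1", flagged_for_repair=True)
wednesday_morning = datetime(2024, 1, 3, 10, 0)
print(uptime_first_policy(pump, wednesday_morning))
print(lifespan_first_policy(pump, wednesday_morning))
```

The same flagged device and the same timestamp lead to opposite actions, which is the point: the difference lies in the business goal, not in the data.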
Running new data on old models - real-time analytics
When you start from the business, you design your logic template with some rules you know make sense to begin with, and you are ready to deploy this predefined model in real time. Real-time analytics is basically running new (real-time) data on old (predefined) models. What we do with the Waylay rules engine, for example, is in fact real-time analytics: running incoming real-time device data on a predefined logic template. Discussing hybrid approaches to data processing in their report “How to move analytics to real time”, Gartner analysts argue that “to operate in real time, companies must leverage predefined analytical models, rather than ad-hoc models, and use current input data rather than just historical data.”
The logic built in the Waylay rules editor acts as the ‘predefined analytical model’ through which we pass the ‘new’ (real-time) data. The models may also come from a third-party BI/AI system that the enterprise is using, which can be hooked up to the Waylay rules engine and used to refine the business logic of your overall automation scenarios.
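As a rough illustration of "new data on old models", the Python sketch below fixes a small set of rules up front and then applies them to readings as they arrive. It is not the Waylay API: the `TEMPLATE` structure, the sensor names, the thresholds and the action labels are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Reading = Dict[str, float]
Rule = Tuple[str, Callable[[Reading], bool], str]  # (name, condition, action)

# The "old" model: rules fixed before any of the new data arrives.
# Sensor names and thresholds are illustrative assumptions.
TEMPLATE: List[Rule] = [
    ("overheating", lambda r: r["temperature"] > 80.0, "send_alert"),
    ("low_battery", lambda r: r["battery"] < 0.2, "schedule_maintenance"),
]


def evaluate(reading: Reading, template: List[Rule] = TEMPLATE) -> List[str]:
    """Return the actions triggered by a single incoming reading."""
    return [action for _, condition, action in template if condition(reading)]


def process_stream(readings: Iterable[Reading]) -> Iterator[List[str]]:
    """Apply the predefined template to each reading as it arrives."""
    for reading in readings:
        yield evaluate(reading)


# Stand-in for a live feed; a real deployment would read from a device stream.
live = [
    {"temperature": 72.5, "battery": 0.9},
    {"temperature": 85.1, "battery": 0.15},
]
for actions in process_stream(live):
    print(actions)
```

Because the model is fixed in advance, each reading is handled with a few cheap condition checks, which is what makes an immediate response possible.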
Running old data on new models - historical rules engines
Most Internet of Things applications run automation tasks in production on real-time data that arrives from sensors in smart products at various rates, depending on a number of factors. But we often see that the real-time data used in live environments is only a portion of the data available to the company.
For example, you may have collected a lot of data in the past, before you started working with near real-time data. If you already have backlogs of data sitting in your database, you may want to see how that data behaves when you run it through your logic model. This is different from doing offline analytics, as you actually mimic a real-time situation in order to get new insights or to validate the rules that you have built via offline analytics.
The Waylay rules engine now has a built-in feature that enables companies to easily run their historical backlogs through the logic templates that they are building for real-time use. Our core product at Waylay is a (near) real-time rules engine that evaluates rules based on incoming data (either streaming data or data pulled from external systems). With this new feature, we have added another mode of operation, which enables it to act as a historical rules engine as well. The historical rules engine simulates the behaviour of rules on historical data in a highly efficient way. Because it is based on a batch processing approach, large amounts of historical data can be simulated, and extensive feedback and logging are available to analyse the eventual outcome of the rules.
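The idea of replaying a backlog through a rule written for live data can be sketched in a few lines of Python. This is only a conceptual illustration, not how the Waylay feature is implemented: the `BACKLOG` records, the `overheating_rule` function, the batch size and the log format are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical backlog: timestamped readings already stored in a database.
BACKLOG: List[Dict[str, float]] = [
    {"ts": 1, "temperature": 70.0},
    {"ts": 2, "temperature": 83.0},
    {"ts": 3, "temperature": 79.0},
    {"ts": 4, "temperature": 91.0},
]


def overheating_rule(reading: Dict[str, float], threshold: float = 80.0) -> bool:
    """The same rule that would run on live data (threshold is an assumption)."""
    return reading["temperature"] > threshold


def replay(
    backlog: List[Dict[str, float]],
    rule: Callable[[Dict[str, float]], bool],
    batch_size: int = 2,
) -> List[Dict[str, object]]:
    """Run a rule over historical data in batches and log every evaluation."""
    log: List[Dict[str, object]] = []
    for start in range(0, len(backlog), batch_size):
        for reading in backlog[start:start + batch_size]:
            log.append({"ts": reading["ts"], "fired": rule(reading)})
    return log


log = replay(BACKLOG, overheating_rule)
fired_at = [entry["ts"] for entry in log if entry["fired"]]
print(fired_at)
```

The rule itself is unchanged between live and historical use. Only the source of the readings differs, and every evaluation is logged so the outcome can be inspected afterwards.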
There are a number of reasons you may want to run historical data through a logic model that you build for real-time using a rules engine:
- Business insight: running your business logic models on historical data helps with better use case identification and prioritisation. You may discover things that were not apparent before and come up with ideas for new use cases;
- Testing without risk: you can try out ideas without interfering with the task that is actually running. If you are interested in testing behaviour or you are looking for exceptions, batch processing backlogs of data may get you to that exception much faster than waiting for it to happen ‘naturally’ in real time;
- Refining the model: the accuracy of logic models inevitably degrades over time as conditions change. Regularly running your historical data through your templates may help with cleaning up your logic and getting it ready for real-time deployments;
- Debugging: the output file of the batch processing provides a very extensive level of detail, which helps you find errors you might otherwise have missed.
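As one way to picture the testing and refining cases above, the Python sketch below runs the current version of a rule and a candidate refinement over the same backlog and lists the readings on which they disagree. The data, the thresholds and the helper names are invented for illustration and do not describe any specific product feature.

```python
from typing import Callable, List

# Hypothetical temperature backlog; in practice this would come from storage.
HISTORY: List[float] = [70.0, 83.0, 79.0, 91.0, 81.5, 76.0]


def make_rule(threshold: float) -> Callable[[float], bool]:
    """Build an overheating rule for a given threshold (illustrative)."""
    return lambda temperature: temperature > threshold


def compare(
    history: List[float],
    current_rule: Callable[[float], bool],
    candidate_rule: Callable[[float], bool],
) -> List[float]:
    """Return the readings on which the two rule versions disagree."""
    return [t for t in history if current_rule(t) != candidate_rule(t)]


current = make_rule(80.0)    # the rule running in production
candidate = make_rule(85.0)  # a refinement we want to try offline

alerts_current = sum(current(t) for t in HISTORY)
alerts_candidate = sum(candidate(t) for t in HISTORY)
disagreements = compare(HISTORY, current, candidate)
print(alerts_current, alerts_candidate, disagreements)
```

The live task is never touched: the candidate rule only ever sees historical data, and the list of disagreements shows exactly which past events would have been handled differently.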
When looking at processing big data to extract value from it and operationalise it, look for solutions that get the best out of both worlds: historical and real-time, analytics and automation.