What are decision trees?
A popular way of capturing the complexity of conditional rules is by using decision trees, which are graphs that use a branching method to illustrate every possible outcome of a decision.
What are some examples of using decision trees in the IoT domain?
Drools, mostly known for its rules engine based on forward chaining, has an extension that integrates with decision tables. It uses an Excel spreadsheet in combination with snippets of embedded code to accommodate any additional logic or required thresholds.
Should you use decision trees to model complex logic?
Decision trees are useful when each variable has a limited number of states (such as binary YES/NO states) but can become overwhelming when the number of states increases. The depth of the tree grows linearly with the number of variables, but the number of branches grows exponentially: a complete tree over n variables with k states each can have up to k^n leaves. The space of possible trees is larger still. For instance, 6 Boolean variables have 2^6 = 64 input combinations, and each combination can map to either outcome, so there are 2^(2^6) = 2^64 = 18,446,744,073,709,551,616 functionally distinct decision trees (in the literature, often referred to as the “hypothesis space for decision trees” problem).
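The figures above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are illustrative, and max_leaves assumes a complete tree in which every path tests each variable exactly once.

```python
def count_boolean_functions(num_variables: int) -> int:
    """Number of distinct truth tables over `num_variables` Boolean inputs."""
    rows = 2 ** num_variables   # one truth-table row per input combination
    return 2 ** rows            # each row can independently map to YES or NO


def max_leaves(num_variables: int, states_per_variable: int) -> int:
    """Leaves of a complete tree that tests every variable once on each path."""
    return states_per_variable ** num_variables


print(count_boolean_functions(6))   # 18446744073709551616
print(max_leaves(6, 2))             # 64
print(max_leaves(6, 5))             # 15625
```

Moving from two to five states per variable takes the same six variables from 64 leaves to 15,625, which is where the tree stops being readable.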
Majority voting is not possible unless we branch even further, so that multiple distinct outcomes also become part of the tree structure. Conditional execution, on the other hand, comes out of the box: as the name suggests, decision trees are all about conditional execution.
In an IoT context, decision trees are never implemented as actual trees. In expert systems, where decisions are the outcome of Q&A scenarios, logic follows conditional execution: each new piece of data (the answer to a question) is served to the decision tree engine one step at a time. In an IoT context, on the other hand, we feed rules engines with data and expect decisions to come back as a result. In that case we are really talking about decision tables: we feed data into the table and the results (decisions) come back at once. More about this not-so-subtle difference between tables and trees can be found here:
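The difference between the two interaction styles can be sketched as follows. This is only an illustration: the sensor names, thresholds, and decision labels are hypothetical, and it does not reflect how any particular rules engine represents its tables or trees.

```python
from typing import Callable, Dict, List, Tuple

Reading = Dict[str, float]

# Decision table: each row pairs a condition on the full reading with a decision.
# Sensor names and thresholds are hypothetical.
DECISION_TABLE: List[Tuple[Callable[[Reading], bool], str]] = [
    (lambda r: r["temperature"] > 80 and r["vibration"] > 0.5, "shut_down"),
    (lambda r: r["temperature"] > 80, "raise_alarm"),
    (lambda r: r["vibration"] > 0.5, "schedule_inspection"),
]


def evaluate_table(reading: Reading) -> List[str]:
    """Feed one complete reading in, get every matching decision back at once."""
    return [decision for condition, decision in DECISION_TABLE if condition(reading)]


# Decision tree: internal nodes ask one question, leaves hold the decision.
TREE = {
    "question": "temperature",
    "threshold": 80,
    "high": {
        "question": "vibration",
        "threshold": 0.5,
        "high": "shut_down",
        "low": "raise_alarm",
    },
    "low": "no_action",
}


def walk_tree(node, ask: Callable[[str], float]) -> str:
    """Ask for one value at a time, following a single branch down to a leaf."""
    while isinstance(node, dict):
        value = ask(node["question"])
        node = node["high"] if value > node["threshold"] else node["low"]
    return node
```

With a reading of temperature 85 and vibration 0.7, evaluate_table returns all three matching decisions in one call, while walk_tree follows a single path and returns "shut_down". With a temperature of 70 the tree never asks for the vibration value at all.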
Decision trees are easily interpretable and that makes them attractive for use cases where this capability is essential (such as healthcare, among others).
Should you use decision trees to model uncertainty?
Decision trees use a white box model. Important insights can be generated based on domain experts describing a situation and their preferences for outcomes. But decision trees are unstable, meaning that a small change in the data can lead to a big change in the structure of the optimal decision tree.
They are also often relatively inaccurate. Calculations can get very complex, particularly if many values are uncertain and/or if many outcomes are linked.
Decision trees cannot model uncertainty and utility functions, unless, just like with time information, we add these within the tree as decision nodes, which complicates decision tables even further.
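To illustrate what adding uncertainty and utility to the tree involves, here is a minimal sketch in the style of classic decision analysis, where chance nodes carry probabilities and leaves carry utilities. The node types, probabilities, and utility values are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Outcome:
    utility: float                          # payoff at a leaf


@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]    # (probability, child)


@dataclass
class Decision:
    options: List[Tuple[str, "Node"]]       # (action name, child)


Node = Union[Outcome, Chance, Decision]


def expected_utility(node: Node) -> float:
    """Roll the tree back: average over chance nodes, maximise over decisions."""
    if isinstance(node, Outcome):
        return node.utility
    if isinstance(node, Chance):
        return sum(p * expected_utility(child) for p, child in node.branches)
    return max(expected_utility(child) for _, child in node.options)


def best_action(node: Decision) -> str:
    """Name of the option with the highest expected utility."""
    return max(node.options, key=lambda option: expected_utility(option[1]))[0]


# Hypothetical maintenance choice: repair now at a fixed cost, or wait and
# accept a 25% chance of an expensive failure.
maintenance = Decision(options=[
    ("repair_now", Outcome(-100.0)),
    ("wait", Chance(branches=[(0.25, Outcome(-1000.0)), (0.75, Outcome(0.0))])),
])
```

Here waiting has an expected utility of -250 against -100 for repairing now, so best_action(maintenance) picks "repair_now". Every uncertain quantity needs its own chance node and every branch its own probability, which is how the structure grows so quickly.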
Are decision trees explainable?
Decision trees are easy to understand and interpret. People are able to understand decision tree models after just a brief explanation. Still, decisions cannot be seen or inspected once the rule is instantiated and are only represented as labeled “arrows” in the graph during the design phase.
When implemented as decision tables, the explainability drops further, as each row in the table is a rule and each column in that row is either a condition or an action for that rule. As a result, the overall sequence is not clear: decision tables give no overall picture.
Are decision trees adaptable?
Decision trees are mostly used for graphical knowledge representation. It is extremely hard to build a rules engine with decision trees and even harder to build applications on top of it. They are hard to extend with any third-party systems. Also, any small change in the training data can lead to a big change in the structure of the optimal decision tree.
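The instability can be seen even when choosing only the root of a tree. The sketch below uses a deliberately simple criterion (split on the feature that yields the fewest misclassifications), which is a simplification rather than a full tree-learning algorithm. The feature names and the seven data rows are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[Dict[str, int], int]   # (feature values, label)


def split_errors(rows: List[Row], feature: str) -> int:
    """Misclassifications when splitting on `feature` and predicting the
    majority label in each branch."""
    errors = 0
    for value in {features[feature] for features, _ in rows}:
        labels = Counter(label for features, label in rows if features[feature] == value)
        errors += sum(labels.values()) - max(labels.values())
    return errors


def best_root(rows: List[Row], feature_names: List[str]) -> str:
    """Feature with the fewest split errors (ties go to the earlier name)."""
    return min(feature_names, key=lambda name: split_errors(rows, name))


def row(overheated: int, vibrating: int, failed: int) -> Row:
    return ({"overheated": overheated, "vibrating": vibrating}, failed)


FEATURES = ["overheated", "vibrating"]
original = [row(0, 0, 0), row(0, 0, 0), row(0, 1, 0), row(1, 0, 1),
            row(1, 1, 1), row(1, 1, 1), row(1, 0, 0)]

# Change the label of a single observation (the fourth row).
changed = list(original)
changed[3] = row(1, 0, 0)

print(best_root(original, FEATURES))   # overheated
print(best_root(changed, FEATURES))    # vibrating
```

One relabelled observation out of seven is enough to swap the root variable, and everything below the root would have to be rebuilt around the new split.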
How easy is it to operate with decision trees?
Applying the same decision tree rule across multiple devices in the IoT domain is close to impossible. Most decision tree implementations mix logic residing in decision tables with actions defined separately in code, which makes the complete process extremely difficult to manage.
Are decision trees scalable?
Decision tree rules are stateless, which means that, in theory, it should be easy to run multiple rules in parallel. However, within one instance of a rule, you cannot distribute the load across different processes while that particular rule is executing. In addition, the depth of the tree grows linearly with the number of variables while the number of branches grows exponentially, which makes decision trees hard, if not impossible, to scale.
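The first point, that stateless rules parallelise across evaluations but not within one, can be sketched with Python's standard thread pool. The rule, its thresholds, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def overheating_rule(reading: Dict[str, float]) -> str:
    """Stateless rule: the result depends only on the reading passed in."""
    if reading["temperature"] > 80:
        return "shut_down" if reading["vibration"] > 0.5 else "raise_alarm"
    return "no_action"


def evaluate_fleet(readings: List[Dict[str, float]], workers: int = 4) -> List[str]:
    """Evaluate the same rule for many devices concurrently.

    Each call is independent, so evaluations can be spread over workers.
    A single call to overheating_rule still runs on one worker from root to leaf.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(overheating_rule, readings))


fleet = [
    {"temperature": 85.0, "vibration": 0.7},
    {"temperature": 85.0, "vibration": 0.1},
    {"temperature": 60.0, "vibration": 0.9},
]
print(evaluate_fleet(fleet))   # ['shut_down', 'raise_alarm', 'no_action']
```

The pool distributes whole evaluations, one per device reading. It does nothing to speed up the traversal of a single large tree.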
This is an excerpt from our latest Benchmark Report on Rules Engines used in the IoT domain. You can have a look at a summary of the results or download the full report over here.