Saturday, January 14, 2012

Optimization for Decision Tree

Hi,
I noticed that when training a decision tree, if the data contains two examples with identical attribute values but different classifications, there is room to improve the tree-construction algorithm.
With the standard algorithm (based on information-gain calculations), such examples cause the gain to be zero for every remaining attribute. The standard algorithm doesn't check for this case, so the tree keeps splitting until no attributes are left, and we still end up with our two conflicting examples at a leaf.
My suggestion is to stop expanding the tree as soon as the best achievable gain is 0, and return a leaf node with the default value (the most common classification among the examples reaching the current node).
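Here is a minimal sketch of the idea in Python (names like `build_tree` and `gain` are my own, not from any particular library): a bare-bones ID3-style builder where the only change from the textbook version is the early-stop check when the best attribute's gain is (numerically) zero.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain(examples, labels, attr):
    """Information gain from splitting the examples on attr."""
    total = len(examples)
    remainder = 0.0
    for value in set(ex[attr] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def build_tree(examples, labels, attrs):
    """Return a leaf (class label) or a dict mapping (attr, value) -> subtree."""
    if len(set(labels)) == 1:        # all examples agree: pure leaf
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:                    # no attributes left: default leaf
        return majority
    best = max(attrs, key=lambda a: gain(examples, labels, a))
    # Proposed optimization: if even the best attribute gives zero gain
    # (e.g. duplicate examples with conflicting labels), stop expanding
    # and return the majority class instead of splitting pointlessly.
    if gain(examples, labels, best) <= 1e-12:
        return majority
    tree = {}
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        tree[(best, value)] = build_tree([examples[i] for i in idx],
                                         [labels[i] for i in idx],
                                         [a for a in attrs if a != best])
    return tree
```

With two identical examples labeled differently, `build_tree` now returns the majority-class leaf immediately instead of exhausting every attribute first; on separable data it splits as usual.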

Ofir.