Prepare for the IBM Data Science Exam. Utilize flashcards and multiple-choice questions with hints and explanations to hone your skills. Get exam-ready now!


What does 'pure subset' mean in the context of decision trees?

  1. All attributes of a leaf had yes for answer

  2. All attributes of a leaf had no for answer

  3. Half of the answers were yes and the other half, no

  4. The leaf cannot be divided any further

The correct answer is: All attributes of a leaf had yes for answer

In the context of decision trees, a 'pure subset' is a leaf node in which all instances belong to a single class. Option 1 describes this case: every instance that reaches the leaf has 'yes' as its answer (the option says 'attributes', but it is the instances, or records, in the leaf that share the label). The tree has therefore made a definitive classification at that node, and no further splitting is needed because nothing remains to separate. A pure subset gives an unambiguous prediction for that node, at least on the training data.

The other options describe different scenarios. Option 2, where every answer is 'no', is also a pure subset, just for the opposite class. Option 3, with half 'yes' and half 'no', is not pure; it is the most mixed a two-class subset can be and would normally be split further. Option 4, a leaf that cannot be divided any further, is true of every terminal leaf, but a leaf can stop splitting for reasons other than purity (for example a depth limit or too few instances), so it does not by itself imply purity.
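
To make the idea of purity concrete, here is a minimal sketch in Python that checks three small label sets matching the all-'yes', all-'no', and half-and-half cases. The function names `is_pure` and `gini` and the sample lists are illustrative assumptions, not part of the exam material. Gini impurity is used here as one common way to measure purity, and the sketch assumes each label list is non-empty.

```python
from collections import Counter


def is_pure(labels):
    """Return True when every label in the (non-empty) subset is the same."""
    return len(set(labels)) == 1


def gini(labels):
    """Gini impurity of a non-empty subset: 0.0 means the subset is pure."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical leaf contents, mirroring the first three answer options.
all_yes = ["yes", "yes", "yes", "yes"]
all_no = ["no", "no", "no", "no"]
mixed = ["yes", "yes", "no", "no"]

for name, labels in [("all_yes", all_yes), ("all_no", all_no), ("mixed", mixed)]:
    print(name, "pure:", is_pure(labels), "gini:", gini(labels))
```

Both single-class leaves come out pure with an impurity of 0.0, while the half-and-half leaf has an impurity of 0.5, the maximum for two classes.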