Challenge II - Unknown vs. N/A
A "pregnant" attribute on a customer can take four values: yes or no (the customer responded), unknown (we didn’t ask), and not applicable (e.g., for males or young children).
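As a minimal sketch, the four states can be kept distinct instead of collapsing the last two into a single null. The names `Pregnant` and `encode_pregnant`, and the `applicable` flag, are hypothetical and chosen only for illustration:

```python
from enum import Enum


class Pregnant(Enum):
    YES = "yes"                # customer answered yes
    NO = "no"                  # customer answered no
    UNKNOWN = "unknown"        # we didn't ask
    NOT_APPLICABLE = "n/a"     # question does not apply to this customer


def encode_pregnant(answer, applicable):
    """Map a raw response to one of the four states.

    answer: True/False if the customer responded, None if we never asked.
    applicable: False when the question does not apply (assumed to be
    decided elsewhere, e.g. from sex or age).
    """
    if not applicable:
        return Pregnant.NOT_APPLICABLE
    if answer is None:
        return Pregnant.UNKNOWN
    return Pregnant.YES if answer else Pregnant.NO
```

For example, `encode_pregnant(None, applicable=True)` gives `Pregnant.UNKNOWN`, while `encode_pregnant(None, applicable=False)` gives `Pregnant.NOT_APPLICABLE`.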
E-commerce products have attributes that depend on their family:
- Pants have length, inseam, fabric material
- Microwaves have voltage, cubic inches, weight
For data mining, we flatten all attributes into a single record per product, so most values are N/A (e.g., voltage for pants).
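The flattening can be sketched as follows. The schema `FAMILY_ATTRIBUTES`, the sentinels `NA` and `UNKNOWN`, and the function `flatten` are assumptions made for this example, not part of any particular system:

```python
# Hypothetical schema: which attributes apply to each product family.
FAMILY_ATTRIBUTES = {
    "pants": ["length", "inseam", "fabric_material"],
    "microwave": ["voltage", "cubic_inches", "weight"],
}

# The flattened record has one column per attribute across all families.
ALL_ATTRIBUTES = sorted(
    {attr for attrs in FAMILY_ATTRIBUTES.values() for attr in attrs}
)

NA = "N/A"           # attribute does not apply to this family
UNKNOWN = "unknown"  # attribute applies, but no value was recorded


def flatten(product):
    """Return one wide row for a product, keeping N/A and unknown distinct."""
    applicable = set(FAMILY_ATTRIBUTES[product["family"]])
    row = {"family": product["family"]}
    for attr in ALL_ATTRIBUTES:
        if attr not in applicable:
            row[attr] = NA
        else:
            row[attr] = product.get(attr, UNKNOWN)
    return row


# Pants with a recorded length and inseam but no fabric material.
pants_row = flatten({"family": "pants", "length": 32, "inseam": 30})
```

Here `pants_row["voltage"]` is `"N/A"` while `pants_row["fabric_material"]` is `"unknown"`. Three of the six attribute columns are N/A for this one row, and the share grows as more families are added.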
How should we handle N/A versus unknown?