
Duniya mein har cheez "Kitni" nahi, balki "Kaunsi" hoti hai.
- "Ye email Spam hai ya nahi?"
- "Is patient ko Diabetes hai ya nahi?" In sawaalon ka jawab dene ke liye hum Classification use karte hain. Is post mein hum dekhenge ki computer kaise "Faisle" (Decisions) leta hai.
1. Logistic Regression: The Decision Maker
Naam mein Regression hai, par kaam Classification hai.
- Ye Sigmoid Function use karta hai jo kisi bhi number ko 0 aur 1 ke beech nichod deta hai.
- Decision Boundary: Ek line jo batati hai ki iske upar "A" hai aur niche "B".
2. SVM: The Margin King
Support Vector Machine (SVM) ka kaam hai do categories ke beech sabse badi "Khaali Jagah" (Margin) banana.
- Kernel Trick: Agar data aapas mein ghusa hua hai, toh SVM use 3D mein le ja kar ek "Plane" se alag kar deta hai (RBF ya Poly kernels).
- Ye complex data par bahut accurate hota hai.
3. K-Nearest Neighbors (KNN): Neighborhood Logic
Ye is logic par chalta hai: "Aap jaise doston ke beech rehte ho, aap waise hi ho."
- Naya data point apne
Ksabse nazdeek padosiyon ko dekhta hai. - Jis category ke padosi zyada honge, naya point bhi wahi ban jayega.
4. Naive Bayes: Text ka Raja
Ye probability (Bayes Theorem) par chalta hai.
- Ise "Naive" isliye kehte hain kyonki ye har feature ko "Independent" maanta hai (jo asliyat mein nahi hota).
- Lekin Spam Detection aur Sentiment Analysis mein ye super-fast aur accurate hai.
5. Summary Table: Classification Matrix
| Algorithm | Strength | Weakness | Best For |
|---|---|---|---|
| Logistic | Fast & Simple | Linear trends only | Binary choices |
| SVM | High Accuracy | Slow on big data | Complex boundaries |
| KNN | No training time | Slow at prediction | Simple patterns |
| Naive Bayes | High Speed | Naive assumption | NLP / Text data |
FAQs
1. "One-vs-Rest" (OvR) kya hai? Jab humein 3 classes (Red, Green, Blue) predict karni hon, toh model 3 binary models banata hai: "Red vs Others", "Green vs Others", etc. Ise multi-class strategy kehte hain.
2. SVM mein 'C' parameter kya hai? 'C' batata hai ki aap model ko kitni galti karne ki "Ijazat" de rahe hain. Chhota C matlab bada margin (par kuch galtiyan), bada C matlab strict boundary.
3. KNN mein K kitna hona chahiye? Hamesha Odd number (e.g., 3, 5, 7) choose karein taaki "Tie" na ho jaye.
4. Accuracy vs F1-Score? Agar aapka data "Imbalanced" hai (e.g., 99% log healthy hain, 1% bimar), toh accuracy hamesha 99% dikhayega jo galat hai. Aise mein F1-Score hi saccha result batata hai.
Classification Machine Learning ka sabse practical hissa hai. Ise samajh kar aap real-world decisions ko automate kar sakte hain! ๐
Tarun ke baare mein: Tarun decision boundaries aur probabilistic classification ke specialist hain. AI-Gyani par har logic sharp hai.