## Quick Selection by Data Type
| Data Type | Problem | Start With | Level Up To |
|---|---|---|---|
| Tabular | Classification | Logistic Regression | XGBoost / LightGBM |
| Tabular | Regression | Linear Regression | XGBoost / LightGBM |
| Text | Classification | TF-IDF + LogReg | BERT / Fine-tuned LLM |
| Text | Generation | Pre-trained LLM | Fine-tuned LLM (LoRA) |
| Text | Q&A / Search | BM25 | RAG with embeddings |
| Images | Classification | Pre-trained CNN (ResNet) | Vision Transformer (ViT) |
| Images | Object Detection | YOLO | DETR (Transformer-based) |
| Images | Generation | GAN | Diffusion Models |
| Time Series | Forecasting | ARIMA / Prophet | LSTM / Temporal Fusion Transformer |
| Graph | Node classification | GCN | GAT / GraphSAGE |
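As a concrete instance of a "start with" entry from the table, here is a minimal TF-IDF + logistic regression text-classification baseline using scikit-learn. The corpus and labels are made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: 1 = positive review, 0 = negative review
texts = [
    "great product, works well",
    "terrible, broke after a day",
    "really happy with this",
    "awful quality, do not buy",
]
labels = [1, 0, 1, 0]

# TF-IDF features feeding a linear classifier: the standard
# "start with" baseline before reaching for BERT or an LLM
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["works great, very happy"]))
```

A baseline like this trains in milliseconds and gives you a score to beat before committing to a fine-tuned Transformer.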
## By Business Constraint
| Constraint | Recommended Approach |
|---|---|
| Must explain predictions (healthcare, finance) | Linear/Logistic Regression, Decision Trees, SHAP |
| Real-time predictions (< 10ms) | Linear models, Naive Bayes, quantized models |
| Limited labeled data (< 100 samples) | Transfer learning, few-shot, pre-trained models |
| No labeled data at all | Clustering, dimensionality reduction, self-supervised |
| Frequently changing knowledge | RAG (easy to update document store) |
| Consistent style/format | Fine-tuning (SFT with LoRA) |
| Minimum engineering effort | CatBoost (tabular), pre-trained models, AutoML |
| Edge deployment (mobile, IoT) | Quantized models, Knowledge Distillation |
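For the explainability constraint, a shallow decision tree is a sketch of the idea: the entire model can be printed as human-readable rules. This example uses scikit-learn's built-in breast-cancer dataset purely for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

# A depth-limited tree stays small enough to audit by hand,
# which is the point in regulated domains like healthcare or finance
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text renders the full decision logic as if/else rules
print(export_text(tree, feature_names=list(data.feature_names)))
```

When a single tree is too weak, SHAP values can provide per-prediction explanations for stronger models, at the cost of more tooling.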
## Common Mistakes
| Mistake | Why It's Wrong | Better Approach |
|---|---|---|
| Deep learning for small tabular data | Will overfit; gradient boosting is better | XGBoost / LightGBM / CatBoost |
| KNN on large datasets | O(n) per-query prediction is too slow | Tree-based models or an ANN index |
| t-SNE for dimensionality reduction | Designed for visualization only | PCA or UMAP for preprocessing |
| SVM on millions of rows | O(n²) to O(n³) training doesn't scale | LightGBM or neural networks |
| RNNs for NLP in 2025 | Transformers are superior | BERT, GPT, or other Transformers |
| Fine-tuning when RAG would work | Expensive, knowledge becomes stale | RAG for dynamic factual knowledge |
| GANs for image generation | Diffusion models produce better results | Stable Diffusion |
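The t-SNE row deserves a concrete illustration: scikit-learn's `TSNE` has no `transform` method for new data, so it cannot sit in a preprocessing pipeline, while PCA can. A minimal sketch on the built-in digits dataset:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# PCA learns a projection on the training set and applies the same
# projection to unseen data; t-SNE can only embed the data it was fit on,
# so it is a visualization tool, not a preprocessing step
pipe = make_pipeline(PCA(n_components=30), LogisticRegression(max_iter=2000))
pipe.fit(X_tr, y_tr)
print(round(pipe.score(X_te, y_te), 3))
```

UMAP (via the separate `umap-learn` package) does support transforming new points, which is why the table lists it alongside PCA as a preprocessing option.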
## The Progressive Complexity Ladder
### Rule of Thumb
Don't move to the next step unless the current one is clearly insufficient. Each step adds complexity, cost, and maintenance. Many production systems run perfectly well on Step 2 or 3.
1. Baseline: Simple model (Linear/Logistic Regression, Naive Bayes)
2. Standard ML: Tree-based ensemble (Random Forest, XGBoost)
3. Optimized ML: Hyperparameter-tuned gradient boosting + feature engineering
4. Deep Learning: Neural networks (only if Step 3 is insufficient)
5. State-of-the-art: Pre-trained models, fine-tuning, ensembles