Python dominates machine learning thanks to libraries like NumPy, pandas, scikit-learn, PyTorch, and TensorFlow. Backend developers often integrate ML models into APIs, batch jobs, and data pipelines — this page covers practical ML workflows without requiring a PhD.

ML Workflow Overview

  Data → Explore → Feature Engineering → Train → Evaluate → Deploy → Monitor
  

Most production effort goes into data quality and feature engineering, not model selection.

Environment Setup

python -m venv .venv
source .venv/bin/activate
pip install numpy pandas scikit-learn matplotlib jupyter
  

For deep learning:

pip install torch torchvision  # or tensorflow
  

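Pin exact versions in requirements.txt: ML stacks are sensitive to library versions, and a model serialized under one scikit-learn release may fail to load or behave differently under another.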
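
One way to produce pinned entries is to read the versions installed in the active environment. The sketch below uses the standard library's importlib.metadata; the helper name pinned_requirements is hypothetical, and pip freeze remains the usual tool for capturing a full environment.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return (pinned_lines, missing) for the given distribution names."""
    pinned, missing = [], []
    for name in packages:
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment; report it rather than guess.
            missing.append(name)
    return pinned, missing


lines, missing = pinned_requirements(["numpy", "pandas", "scikit-learn"])
print("\n".join(lines))
if missing:
    print("not installed:", ", ".join(missing))
```

Paste the printed lines into requirements.txt, or redirect the output to the file from a small script.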

Loading and Exploring Data

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("customers.csv")
print(df.shape)           # (rows, columns)
print(df.dtypes)          # type of each column
print(df.describe())      # summary statistics for numeric columns
print(df.isnull().sum())  # missing values per column

# Visualize distributions
df["age"].hist(bins=30)
plt.savefig("age_distribution.png")
  

Key questions before modeling:

  • Missing values? Outliers? Class imbalance?
  • Leakage — does any feature contain future information?
  • Target variable distribution?
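
Several of these questions can be answered with a few lines of pandas. The sketch below runs on a small made-up frame rather than customers.csv, and uses the common 1.5 x IQR rule for outliers, which is only one of several reasonable choices. Leakage is not covered: it usually needs a review of how and when each column is populated.

```python
import pandas as pd

# Tiny synthetic frame standing in for customers.csv (values are made up).
df = pd.DataFrame({
    "age": [25, 31, None, 42, 38, 29, 35, 120],
    "churned": [0, 0, 0, 1, 0, 0, 0, 0],
})

# Missing values: fraction of nulls per column.
missing_share = df.isnull().mean()

# Outliers: flag values more than 1.5 * IQR outside the quartiles.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

# Class imbalance and target distribution: share of each label.
class_share = df["churned"].value_counts(normalize=True)

print(missing_share, outliers, class_share, sep="\n\n")
```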

Feature Engineering

from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

numeric_features = ["age", "income", "tenure_months"]
categorical_features = ["plan_type", "region"]

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])
  

Good features beat complex models. Domain knowledge drives feature design:

  • Ratios (spend_per_month = total_spend / months_active)
  • Time windows (purchases_last_30_days)
  • Aggregations (avg_order_value)
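
As a rough illustration, the three feature types above could be derived from an order log like this. The orders and customers tables, their column names, and the snapshot date are all hypothetical.

```python
import pandas as pd

# Hypothetical order log: one row per purchase (all values are made up).
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-03-10", "2024-03-25", "2024-02-01", "2024-03-28"]
    ),
    "amount": [40.0, 60.0, 20.0, 100.0, 50.0],
})
# Hypothetical customer table with the months each customer has been active.
customers = pd.DataFrame({"customer_id": [1, 2], "months_active": [3, 2]})

as_of = pd.Timestamp("2024-03-31")  # snapshot date; use only data up to here

# Aggregations: total spend and average order value per customer.
features = orders.groupby("customer_id")["amount"].agg(
    total_spend="sum", avg_order_value="mean"
)

# Time window: purchases in the 30 days before the snapshot date.
recent = orders[orders["order_date"] > as_of - pd.Timedelta(days=30)]
features["purchases_last_30_days"] = (
    recent.groupby("customer_id").size().reindex(features.index, fill_value=0)
)

# Ratio: spend per active month.
features = features.join(customers.set_index("customer_id"))
features["spend_per_month"] = features["total_spend"] / features["months_active"]

print(features)
```

Fixing a snapshot date and filtering on it is what keeps time-window features from picking up future information.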

Training with scikit-learn

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score

X = df.drop("churned", axis=1)
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipeline = Pipeline([
    ("prep", preprocessor),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
y_prob = pipeline.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred))
print(f"AUC: {roc_auc_score(y_test, y_prob):.3f}")
  

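Always split before fitting any preprocessing, so that statistics such as scaler means are learned from the training set only; otherwise information from the test set leaks into training. Wrapping the preprocessor and the model in a Pipeline enforces this: fit() learns the transforms on the training data, and predict() reapplies them unchanged.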
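
To make the leakage concrete, here is a minimal sketch on a made-up single feature. It compares a scaler fitted on all rows with one fitted on the training rows only; shuffle=False is used purely to keep the example deterministic.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic single feature with values 0..9 (made up for illustration).
X = np.arange(10, dtype=float).reshape(-1, 1)

# shuffle=False: the first 8 rows become train, the last 2 become test.
X_train, X_test = train_test_split(X, test_size=0.2, shuffle=False)

# Leaky: statistics computed on all rows, including the test rows.
leaky_scaler = StandardScaler().fit(X)

# Correct: statistics learned from the training rows only,
# then reused unchanged on the test rows.
scaler = StandardScaler().fit(X_train)
X_test_scaled = scaler.transform(X_test)

print("mean seen by leaky scaler:     ", leaky_scaler.mean_[0])
print("mean seen by train-only scaler:", scaler.mean_[0])
```

The two means differ because the leaky scaler has already seen the test rows.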

Cross-Validation and Hyperparameter Tuning

from sklearn.model_selection import GridSearchCV

param_grid = {
    "model__n_estimators": [50, 100, 200],
    "model__max_depth": [5, 10, None],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc", n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_)
print(f"Best CV AUC: {search.best_score_:.3f}")
  

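Cross-validation gives a more reliable performance estimate than a single train/test split because it averages over several folds. Keep the held-out test set out of the search, and score the tuned model on it once at the end: best_score_ was used to pick the parameters, so it tends to be optimistic.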
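
The per-fold scores show how much a single split could mislead. A minimal sketch using cross_val_score on synthetic data from make_classification (a stand-in for the churn table, not real results):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data standing in for the churn table.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

model = RandomForestClassifier(n_estimators=50, random_state=42)

# One AUC per fold; the spread across folds is the uncertainty of the estimate.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")

print("fold AUCs:", scores.round(3))
print(f"mean {scores.mean():.3f} +/- {scores.std():.3f}")
```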

Model Evaluation Metrics

Task                    Metrics
Binary classification   Precision, recall, F1, AUC-ROC
Multi-class             Macro/micro F1, confusion matrix
Regression              MAE, RMSE, R²
Ranking                 NDCG, MAP

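For imbalanced classes (fraud, churn), accuracy is misleading: a model that always predicts the majority class scores high accuracy while catching no positives. Evaluate with precision, recall, or the area under the precision-recall curve, and consider class_weight='balanced' to reweight the classes during training.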
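
A short worked example shows the effect. The counts are made up: 1,000 customers, 10 of whom churned, scored by a "model" that always predicts no churn.

```python
# Made-up test set: 1,000 customers, 10 of whom actually churned.
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts the majority class (no churn).
y_pred = [0] * 1000

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)
fp = sum(t == 0 and p == 1 for t, p in pairs)
fn = sum(t == 1 and p == 0 for t, p in pairs)
tn = sum(t == 0 and p == 0 for t, p in pairs)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
# Precision is undefined with no positive predictions; reported as 0 here.
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
# prints: accuracy=0.99 recall=0.00 precision=0.00
```

The model is 99% accurate and still misses every churner, which is why recall and precision are the numbers to watch.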

Feature Importance

model = pipeline.named_steps["model"]
importances = model.feature_importances_
# Names after preprocessing, e.g. "num__age" or "cat__plan_type_<value>"
feature_names = pipeline.named_steps["prep"].get_feature_names_out()

# Ten most important features, highest first
for name, imp in sorted(zip(feature_names, importances), key=lambda x: -x[1])[:10]:
    print(f"{name}: {imp:.4f}")
  

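Impurity-based feature_importances_ are computed from the training data and tend to favor high-cardinality features, so treat them as a rough ranking. Use SHAP for model-agnostic, per-prediction explanations when debugging production behavior.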
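
SHAP is a separate package and is not shown here. As a different, simpler model-agnostic check that ships with scikit-learn, permutation importance shuffles one feature at a time on held-out data and measures how much the score drops. A minimal sketch on synthetic data, so the feature indices carry no real meaning:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: 6 features, only 2 of which are informative.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=2, n_redundant=0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Shuffle one column at a time on held-out data and measure the drop in AUC.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=5, random_state=42
)

# Features ordered from largest to smallest mean drop.
for idx in result.importances_mean.argsort()[::-1]:
    print(
        f"feature_{idx}: {result.importances_mean[idx]:.4f} "
        f"+/- {result.importances_std[idx]:.4f}"
    )
```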

Saving and Loading Models

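import joblib

joblib.dump(pipeline, "churn_model_v1.joblib")

# joblib files are pickle-based: only load files from a trusted source.
loaded = joblib.load("churn_model_v1.joblib")

# new_customer_data: a DataFrame with the same feature columns used in training.
# Output has one row per customer; columns follow loaded.classes_.
prediction = loaded.predict_proba(new_customer_data)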
  

Version models (v1, v2) and store metadata (training date, metrics, feature list).
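
One lightweight way to keep that metadata is a JSON file stored next to the model. The sketch below is an assumption about layout, not a standard: the helper name save_metadata and the field names are hypothetical, and the metric value is a placeholder.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def save_metadata(model_path, metrics, features, trained_on=None):
    """Write a JSON sidecar next to the model file and return its path."""
    trained_on = trained_on or date.today()
    meta = {
        "model_file": Path(model_path).name,
        "training_date": trained_on.isoformat(),
        "metrics": metrics,
        "features": list(features),
    }
    # churn_model_v1.joblib -> churn_model_v1.json
    meta_path = Path(model_path).with_suffix(".json")
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta_path


# Usage, written to a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = save_metadata(
        Path(tmp) / "churn_model_v1.joblib",
        metrics={"roc_auc": 0.0},  # placeholder, not a real result
        features=["age", "income", "tenure_months", "plan_type", "region"],
    )
    print(path.read_text())
```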

Serving Models in APIs

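from fastapi import FastAPI
import joblib
import pandas as pd

app = FastAPI()
# Load once at startup, not per request.
model = joblib.load("churn_model_v1.joblib")

# Plain `def`, not `async def`: predict_proba is blocking, CPU-bound work,
# so FastAPI runs this handler in its threadpool instead of stalling the event loop.
@app.post("/predict")
def predict(customer: dict):
    # One-row frame; keys must match the training feature columns.
    df = pd.DataFrame([customer])
    prob = model.predict_proba(df)[0][1]
    return {"churn_probability": float(prob)}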
  

For high throughput, use ONNX Runtime, TorchServe, or dedicated ML platforms (SageMaker, Vertex AI).

Deep Learning Basics (PyTorch)

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, input_size: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, 64),
            nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, x):
        return self.layers(x)

model = SimpleNet(input_size=10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
  

Use PyTorch/TensorFlow for images, text, and sequences. Start with scikit-learn for tabular data.
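
A model, loss, and optimizer still need a training loop. The following is a minimal full-batch sketch on synthetic data; the data, the epoch count, and the use of nn.Sequential in place of the SimpleNet class are assumptions made to keep the example self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data (made up): 200 samples, 10 features;
# the label is 1 when the first feature is positive.
X = torch.randn(200, 10)
y = (X[:, 0] > 0).long()

# Same layers as SimpleNet, written inline.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

losses = []
for epoch in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    logits = model(X)            # forward pass: raw class scores
    loss = criterion(logits, y)  # CrossEntropyLoss expects logits and class indices
    loss.backward()              # backpropagate
    optimizer.step()             # update weights
    losses.append(loss.item())

# Evaluate without tracking gradients.
model.eval()
with torch.no_grad():
    accuracy = (model(X).argmax(dim=1) == y).float().mean().item()

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, "
      f"train accuracy {accuracy:.2f}")
```

Real training would iterate over mini-batches with a DataLoader and report accuracy on a held-out set rather than on the training data.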

MLOps Essentials

  • Experiment tracking — MLflow, Weights & Biases
  • Data versioning — DVC
  • Model registry — MLflow Model Registry
  • Monitoring — detect data drift and prediction distribution shifts
  • Retraining pipeline — scheduled jobs when performance degrades
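
For the monitoring item, one simple drift signal is the Population Stability Index (PSI) between a feature's training values and its recent production values. The sketch below is a plain-Python version under stated assumptions: equal-width bins taken from the training range, a small floor for empty bins, and made-up data. A commonly quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but thresholds should be tuned per feature.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the training range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training_ages = list(range(20, 60))            # made-up training sample
recent_ages = [a + 15 for a in training_ages]  # made-up shifted sample

print(f"no drift: {psi(training_ages, training_ages):.3f}")
print(f"shifted:  {psi(training_ages, recent_ages):.3f}")
```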

Common Pitfalls

  • Training on test data (leakage)
  • Ignoring class imbalance
  • Overfitting — use regularization and simpler models first
  • No baseline — always compare against a simple rule (e.g., predict majority class)
  • Deploying without input validation
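
As an example of the last point, a prediction endpoint can reject malformed payloads before they reach the model. This is a minimal standard-library sketch: the field names come from the churn features used earlier, while the expected types, the age rule, and the helper name validate_customer are assumptions. A schema library such as pydantic is a common alternative.

```python
# Expected fields and types; the types are assumptions about the churn data.
SCHEMA = {
    "age": (int, float),
    "income": (int, float),
    "tenure_months": (int, float),
    "plan_type": str,
    "region": str,
}


def validate_customer(payload):
    """Return a list of problems; an empty list means the payload is usable."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {field}: {type(value).__name__}")
    if not errors and payload["age"] < 0:
        errors.append("age must be non-negative")
    unknown = set(payload) - set(SCHEMA)
    if unknown:
        errors.append(f"unexpected fields: {sorted(unknown)}")
    return errors


# Made-up payloads for illustration.
good = {"age": 34, "income": 52000, "tenure_months": 18,
        "plan_type": "basic", "region": "west"}
bad = {"age": "thirty", "tenure_months": 18, "plan_type": "basic", "region": "west"}

print(validate_customer(good))  # []
print(validate_customer(bad))
```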

Machine learning in Python is iterative: start simple, measure rigorously, and only add complexity when metrics justify it.