PKoffee Analysis example

Welcome to the PKoffee project analysis notebook! ☕️

This project aims to analyze the relationship between coffee consumption (in cups) and productivity. We use several mathematical models to find the best fit for our data.

Project Structure

  • pkoffee/data.py: Data loading and cleaning.

  • pkoffee/parametric_function.py: Definition of various models (Quadratic, Logistic, etc.).

  • pkoffee/productivity_analysis.py: Logic to fit models and rank them.

  • pkoffee/visualization.py: Utilities to plot results.

[1]:
import sys
from pathlib import Path
import os
from pkoffee.data import load_csv
from pkoffee.productivity_analysis import fit_all_models, format_model_rankings
from pkoffee.visualization import plot_models, Show

1. Load the data

The experimental data are in file ../analysis/coffee_productivity.csv

[2]:
# Load the data
data_path = Path("../analysis/coffee_productivity.csv")
data = load_csv(data_path)

print(f"Loaded {len(data)} data points from {data_path}")
data.head()
Loaded 200000 data points from ../analysis/coffee_productivity.csv
[2]:
cups productivity
0 9 0.424316
1 8 0.589516
2 9 0.397615
3 10 0.256125
4 10 0.255150

2. Model Fitting

We will now fit several parametric models to the data:

  • Quadratic: \(f(x) = a_0 + a_1 x + a_2 x^2\)

  • Michaelis-Menten: \(f(x) = y_0 + V_{max} \frac{x}{K + x}\)

  • Logistic: \(f(x) = y_0 + \frac{L}{1 + e^{-k(x - x_0)}}\)

  • Peak Model: \(f(x) = a \cdot x \cdot e^{-x/b}\)

The fit_all_models function will run the optimization for all these models and rank them using the \(R^2\) score.

[3]:
# Fit all models
fitted_models = fit_all_models(data)

# Print rankings
print("Model Rankings (by R²):")
print(format_model_rankings(fitted_models))
Model Rankings (by R²):
Model Rankings:
══════════════════════════════════════════════════
Rank   Model                R² Score
══════════════════════════════════════════════════
1      Logistic             0.3051
2      Quadratic            0.2815
3      Michaelis-Menten     0.2662
4      Peak                 0.1724
5      Peak²                0.1511
══════════════════════════════════════════════════

3. Visualization

Finally, we visualize the data distribution using a violin plot and overlay the fitted model curves to see which one accurately captures the “coffee sweet spot”.

[4]:
plot_models(data, fitted_models, show=Show.YES)
_images/analysis_7_0.png