PKoffee Analysis example
Welcome to the PKoffee project analysis notebook! ☕️
This project analyzes the relationship between coffee consumption (in cups) and productivity. We fit several parametric models to the data and rank them by goodness of fit.
Project Structure
pkoffee/data.py: Data loading and cleaning.
pkoffee/parametric_function.py: Definition of various models (Quadratic, Logistic, etc.).
pkoffee/productivity_analysis.py: Logic to fit models and rank them.
pkoffee/visualization.py: Utilities to plot results.
[1]:
from pathlib import Path
from pkoffee.data import load_csv
from pkoffee.productivity_analysis import fit_all_models, format_model_rankings
from pkoffee.visualization import plot_models, Show
1. Load the data
The experimental data are stored in the file ../analysis/coffee_productivity.csv.
[2]:
# Load the data
data_path = Path("../analysis/coffee_productivity.csv")
data = load_csv(data_path)
print(f"Loaded {len(data)} data points from {data_path}")
data.head()
Loaded 200000 data points from ../analysis/coffee_productivity.csv
[2]:
| | cups | productivity |
|---|---|---|
| 0 | 9 | 0.424316 |
| 1 | 8 | 0.589516 |
| 2 | 9 | 0.397615 |
| 3 | 10 | 0.256125 |
| 4 | 10 | 0.255150 |
2. Model Fitting
We will now fit several parametric models to the data:
Quadratic: \(f(x) = a_0 + a_1 x + a_2 x^2\)
Michaelis-Menten: \(f(x) = y_0 + V_{max} \frac{x}{K + x}\)
Logistic: \(f(x) = y_0 + \frac{L}{1 + e^{-k(x - x_0)}}\)
Peak Model: \(f(x) = a \cdot x \cdot e^{-x/b}\)
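The four formulas above can be written as plain Python functions. This is a minimal sketch for illustration only: the function and parameter names are assumptions chosen to mirror the symbols in the formulas, and the actual definitions in pkoffee/parametric_function.py may differ.

```python
import math


def quadratic(x, a0, a1, a2):
    """f(x) = a0 + a1*x + a2*x^2"""
    return a0 + a1 * x + a2 * x ** 2


def michaelis_menten(x, y0, v_max, k):
    """f(x) = y0 + V_max * x / (K + x); saturates towards y0 + V_max."""
    return y0 + v_max * x / (k + x)


def logistic(x, y0, L, k, x0):
    """f(x) = y0 + L / (1 + exp(-k * (x - x0))); midpoint at x = x0."""
    return y0 + L / (1.0 + math.exp(-k * (x - x0)))


def peak(x, a, b):
    """f(x) = a * x * exp(-x / b); rises, then decays."""
    return a * x * math.exp(-x / b)
```

For positive a and b, the peak model reaches its maximum at x = b (where the derivative a·e^{-x/b}(1 − x/b) vanishes), so b can be read as the number of cups at which the modeled productivity is highest.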
The fit_all_models function will run the optimization for all these models and rank them using the \(R^2\) score.
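To make the ranking criterion concrete, the sketch below computes the coefficient of determination, \(R^2 = 1 - SS_{res} / SS_{tot}\), and sorts models by it. The helper names r_squared and rank_by_r2 are hypothetical and are not part of the pkoffee API; how fit_all_models computes and stores its scores internally is not shown in this notebook.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: what the model fails to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the data around its mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rank_by_r2(observed, predictions_by_model):
    """Return (model name, R^2) pairs sorted from best to worst."""
    scores = {
        name: r_squared(observed, predicted)
        for name, predicted in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, not taken from the coffee dataset.
observed = [1.0, 2.0, 3.0, 4.0]
ranking = rank_by_r2(
    observed,
    {
        "always_mean": [2.5, 2.5, 2.5, 2.5],
        "perfect": [1.0, 2.0, 3.0, 4.0],
    },
)
```

A model that reproduces the data exactly scores 1, and a model that always predicts the mean scores 0, so a higher \(R^2\) means that more of the variance in productivity is explained.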
[3]:
# Fit all models
fitted_models = fit_all_models(data)
# Print rankings
print("Model Rankings (by R²):")
print(format_model_rankings(fitted_models))
Model Rankings (by R²):
Model Rankings:
══════════════════════════════════════════════════
Rank Model R² Score
══════════════════════════════════════════════════
1 Logistic 0.3051
2 Quadratic 0.2815
3 Michaelis-Menten 0.2662
4 Peak 0.1724
5 Peak² 0.1511
══════════════════════════════════════════════════
3. Visualization
Finally, we visualize the distribution of the data with a violin plot and overlay the fitted model curves to see which one best captures the “coffee sweet spot”.
[4]:
plot_models(data, fitted_models, show=Show.YES)