
Eco-Amazon: Enriching E-commerce Datasets with Product Carbon Footprint for Sustainable Recommendations

Giuseppe Spillo, Allegra De Filippo, Cataldo Musto, Michela Milano, Giovanni Semeraro
University of Bari Aldo Moro, Italy, University of Bologna, Italy
arXiv (2026)
Recommendation Benchmark

📝 Paper Summary

Sustainable Recommender Systems, Green AI, Dataset Enrichment
Eco-Amazon introduces a zero-shot framework using Large Language Models to enrich e-commerce datasets with item-level Product Carbon Footprint metadata, overcoming the scarcity of official environmental data for recommender systems.
Core Problem
Recommender systems research lacks standard benchmarks with item-level environmental impact data (PCF), because official Life Cycle Assessment (LCA) databases are too sparse, and too costly to extend, to cover large e-commerce catalogs.
Why it matters:
  • E-commerce contributes substantially to global emissions, but systems cannot promote sustainable choices without item-level carbon data
  • Current approaches rely on proprietary databases or manual mapping, which are non-scalable and hinder reproducible research in sustainable AI
  • The lack of open PCF-enriched resources prevents the community from developing and benchmarking sustainability-aware ranking and recommendation algorithms
Concrete Example: A user searching for 'jeans' sees thousands of results ranked by popularity. Without PCF data, the system cannot distinguish between a high-carbon synthetic pair and low-carbon organic denim. Official databases like Environdec might cover only one specific brand, leaving the vast majority of the catalog unlabelled and effectively invisible to sustainability metrics.
Key Novelty
Zero-shot PCF Estimation via LLMs
  • Leverages the broad domain knowledge of Large Language Models to infer Product Carbon Footprint from unstructured text descriptions without domain-specific training
  • Constrains the LLM generation process using prompts based on international standards (GHG Protocol, ISO 14040) to ensure estimates align with Life Cycle Assessment principles rather than mere statistical guessing
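To make the idea of a standards-constrained zero-shot prompt concrete, here is a minimal sketch of how such a prompt could be assembled from an item's text. The wording, the list of life-cycle stages, and the `PCF_KG_CO2E=<number>` answer format are illustrative assumptions; the summary does not reproduce the prompt actually used in Eco-Amazon.

```python
# Hypothetical life-cycle stages, loosely modelled on the cradle-to-grave
# scope of LCA (ISO 14040) and the GHG Protocol; not taken from the paper.
LIFECYCLE_STAGES = (
    "raw material extraction",
    "manufacturing",
    "distribution",
    "use phase",
    "end of life",
)


def build_pcf_prompt(title: str, description: str, category: str) -> str:
    """Assemble an illustrative zero-shot prompt asking for a PCF estimate."""
    lines = [
        "You are an expert in Life Cycle Assessment (LCA).",
        "Following GHG Protocol and ISO 14040 principles, estimate the",
        "cradle-to-grave Product Carbon Footprint (PCF) of the product below,",
        "accounting for these life-cycle stages:",
        *[f"- {stage}" for stage in LIFECYCLE_STAGES],
        f"Category: {category}",
        f"Title: {title}",
        f"Description: {description}",
        # A fixed answer format keeps the reply machine-readable.
        "Answer with a single line of the form: PCF_KG_CO2E=<number>",
    ]
    return "\n".join(lines)
```

For example, `build_pcf_prompt("Slim-fit jeans", "98% cotton, 2% elastane", "Clothing")` yields a prompt that names every stage before the product details. Listing the stages explicitly is one way to steer the model toward a life-cycle estimate instead of a guess based on the product name alone.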
Architecture
Figure (Box 1, Prompt Logic): conceptual flow of the zero-shot prompting strategy used to estimate PCF.
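The prompting flow can be sketched end to end as follows, under stated assumptions: each item's text is turned into a prompt, sent to a model, and the numeric estimate is parsed back into the record. `ask_llm` is a hypothetical placeholder for a real GPT or Gemini client, and the answer format and the field names (`title`, `category`, `pcf_kg_co2e`) are assumptions rather than the paper's actual pipeline.

```python
import re
from typing import Callable, Iterable, List, Optional

# Matches the (hypothetical) answer format, e.g. "PCF_KG_CO2E=12.5".
# Only non-negative decimals are accepted; other replies count as unparsed.
_PCF_PATTERN = re.compile(r"PCF_KG_CO2E\s*=\s*([0-9]+(?:\.[0-9]+)?)")


def parse_pcf(response: str) -> Optional[float]:
    """Extract the PCF estimate (kg CO2e) from a model reply, or None."""
    match = _PCF_PATTERN.search(response)
    return float(match.group(1)) if match else None


def enrich_items(items: Iterable[dict], ask_llm: Callable[[str], str]) -> List[dict]:
    """Return copies of the item dicts with an added 'pcf_kg_co2e' field.

    `ask_llm` maps a prompt string to the model's reply text; it stands in
    for a real LLM client, which is not shown here.
    """
    enriched = []
    for item in items:
        prompt = (
            "Estimate the cradle-to-grave Product Carbon Footprint of this "
            "product following GHG Protocol and ISO 14040 principles.\n"
            f"Category: {item['category']}\n"
            f"Title: {item['title']}\n"
            "Answer as PCF_KG_CO2E=<number>"
        )
        pcf = parse_pcf(ask_llm(prompt))
        # Keep unparsable replies as None instead of inventing a value.
        enriched.append({**item, "pcf_kg_co2e": pcf})
    return enriched
```

With a stub such as `lambda p: "PCF_KG_CO2E=18.4"`, every item receives `18.4`; a reply without the expected line leaves `pcf_kg_co2e` as `None`, so missing estimates stay visible downstream rather than being silently filled.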
Evaluation Highlights
  • Spearman rank correlation above 0.90 for both GPT-o3-mini and Gemini-2.5-flash across the Electronics, Clothing, and Home & Kitchen domains, indicating that the estimates reliably preserve the relative ordering of products by carbon footprint
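Spearman's rho compares rankings rather than raw values, which is why it suits the ordering-focused result reported above: a sustainability-aware recommender mostly needs to know which product emits less. The sketch below computes rho from scratch, giving tied values the average of the ranks they occupy. The inputs (LLM estimates and reference PCF values) are assumptions for illustration, and this does not reproduce the paper's evaluation; in practice `scipy.stats.spearmanr` returns the same statistic.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(estimated), average_ranks(reference)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5
```

Because only ranks matter, `spearman_rho([1.2, 5.0, 30.0], [2.0, 4.0, 100.0])` is 1.0 even though the absolute values differ considerably; a high rho says the ordering is right, not that each number is close.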
  • Low-impact products (the main targets of sustainable recommendations) are estimated accurately, with a Mean Absolute Error (MAE) below 6 kg CO2e in every domain
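MAE is expressed in the units of the target, here kg CO2e, so a few heavy products can dominate it. A minimal sketch of restricting MAE to low-impact items follows; the cut-off `threshold_kg` that defines a low-impact item, and the choice to apply it to the reference value, are hypothetical, since the summary does not state how the paper draws that line.

```python
from typing import Sequence


def mae_low_impact(
    estimated: Sequence[float],
    reference: Sequence[float],
    threshold_kg: float,
) -> float:
    """MAE (kg CO2e) over items whose reference PCF is at most `threshold_kg`.

    `threshold_kg` is a hypothetical parameter used for illustration only.
    """
    if len(estimated) != len(reference):
        raise ValueError("estimated and reference must have the same length")
    errors = [
        abs(est - ref)
        for est, ref in zip(estimated, reference)
        if ref <= threshold_kg  # keep only low-impact items
    ]
    if not errors:
        raise ValueError("no item falls under the low-impact threshold")
    return sum(errors) / len(errors)
```

On toy data, `mae_low_impact([4.0, 11.0, 250.0], [2.0, 8.0, 400.0], threshold_kg=50.0)` returns 2.5, while plain MAE over all three items would be about 51.7 because of the single 150 kg error on the heavy item. This illustrates how accuracy on low-impact items can be high even when absolute errors on high-impact items remain large.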
  • Enriched a total of 49,902 items across three Amazon datasets, creating the largest publicly available resource for PCF-aware recommendation research
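The enriched datasets attach a PCF estimate to each catalogue item. The summary does not describe the released file format, so the record below is a hypothetical schema with illustrative field names, showing one way such an item could be represented and exported as JSON Lines.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EnrichedItem:
    """One catalogue item plus its estimated PCF (hypothetical schema)."""

    item_id: str                  # product identifier from the source dataset
    domain: str                   # e.g. "Electronics", "Clothing", "Home & Kitchen"
    title: str
    pcf_kg_co2e: Optional[float]  # None when no estimate could be parsed
    estimator: str                # which LLM produced the estimate


def to_jsonl(items: Iterable[EnrichedItem]) -> str:
    """Serialise items as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(item), sort_keys=True) for item in items)
```

JSON Lines keeps each item independent, so a large enriched catalogue can be streamed line by line rather than loaded as one document.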
Breakthrough Assessment
8/10
Addresses a critical data gap in sustainable AI by providing the first large-scale, open PCF-enriched e-commerce dataset. The zero-shot methodology is highly scalable, though absolute accuracy on high-impact items remains a challenge.