feat: Upgrade recipe scraper to Python recipe-scrapers library (v2025.10.1) #1

Merged
pkartch merged 1 commit from feature/recipe-scraper-upgrade-v2025.10.1 into main 2025-10-28 18:21:07 +00:00
Owner

Summary

  • Replaced custom Cheerio-based scraper with Python recipe-scrapers library (supports 541+ websites)
  • Fixed servings field parsing and error handling
  • Added Traefik configuration with HTTPS/Let's Encrypt
  • Implemented CalVer versioning system (2025.10.1)
  • Tagged and pushed Docker images to Harbor registry

Changes

Recipe Scraper Enhancement

  • Now supports 541+ recipe websites (same as Mealie)
  • Added Python 3 and recipe-scrapers to Docker container
  • Created Python wrapper script with safe extraction
  • Updated scraper service to call Python via subprocess
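
The "safe extraction" idea above can be illustrated with a minimal sketch. This is not the contents of the actual wrapper script; `safe_extract`, `extract_recipe`, and `to_json` are hypothetical names, and the scraper object is only assumed to expose `title()`, `ingredients()`, `instructions()`, and `yields()` methods in the style of recipe-scrapers. The point is that one failing field should not abort the whole import, and that the result is emitted as JSON for the calling service to parse from stdout.

```python
import json
from typing import Any, Callable, Dict


def safe_extract(getter: Callable[[], Any], default: Any = None) -> Any:
    """Call one scraper accessor; fall back to `default` if it raises."""
    try:
        return getter()
    except Exception:
        # Many sites omit fields, so a missing value is treated as normal.
        return default


def extract_recipe(scraper: Any) -> Dict[str, Any]:
    """Collect fields one at a time so a single failure stays isolated.

    `scraper` is assumed (hypothetically) to look like a recipe-scrapers
    scraper object; the lambdas defer attribute access into the try block.
    """
    return {
        "title": safe_extract(lambda: scraper.title()),
        "ingredients": safe_extract(lambda: scraper.ingredients(), default=[]),
        "instructions": safe_extract(lambda: scraper.instructions()),
        "servings": safe_extract(lambda: scraper.yields()),
    }


def to_json(scraper: Any) -> str:
    """Serialize the extracted fields for a parent process reading stdout."""
    return json.dumps(extract_recipe(scraper))
```

A wrapper built this way would print `to_json(...)` as its only output, so the subprocess caller can parse a single JSON document regardless of which fields were available.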

Bug Fixes

  • Fixed servings field parsing (string to integer)
  • Added graceful error handling for missing fields
  • Fixed Prisma binary targets for Alpine Linux
  • Removed obsolete test files
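
The servings fix (string to integer) might look roughly like the sketch below. It assumes the scraped value arrives as free text such as "4 servings" and that taking the first integer in the string is acceptable; `parse_servings` is a hypothetical helper name, not a function from this PR.

```python
import re
from typing import Optional, Union


def parse_servings(raw: Union[str, int, None]) -> Optional[int]:
    """Convert a scraped servings value to an integer, or None if absent.

    Assumption: the first run of digits is the serving count, so a range
    like "4-6 servings" resolves to 4.
    """
    if raw is None:
        return None
    if isinstance(raw, int):
        return raw
    match = re.search(r"\d+", str(raw))
    return int(match.group()) if match else None
```

Returning `None` rather than raising keeps the missing-field case consistent with the graceful error handling listed above.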

Infrastructure

  • Configured for basil.pkartchner.com with Traefik
  • Updated CORS settings for production
  • Added VERSION file and version.sh management script
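
For illustration, the CalVer scheme (Year.Month.Increment, per the commit message) could be bumped as in the sketch below. This is not the logic of version.sh, which is not shown in this PR; `next_version` is a hypothetical name, and resetting the increment to 1 when the month changes is an assumption.

```python
from datetime import date
from typing import Optional


def next_version(current: str, today: Optional[date] = None) -> str:
    """Return the next Year.Month.Increment version string.

    Assumptions: the month is not zero-padded, and the increment restarts
    at 1 whenever the calendar month differs from the current version's.
    """
    today = today or date.today()
    year, month, increment = (int(part) for part in current.split("."))
    if (year, month) == (today.year, today.month):
        return f"{year}.{month}.{increment + 1}"
    return f"{today.year}.{today.month}.1"
```

Under these assumptions, a second release in October 2025 would follow 2025.10.1 with 2025.10.2, while the first release in November would start a new sequence.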

Testing

  • Successfully tested with Food Network, Bon Appetit, Food.com, Serious Eats
  • Verified full import and save workflow
  • Confirmed ingredients and instructions display correctly

🤖 Generated with Claude Code

pkartch added 1 commit 2025-10-28 18:17:30 +00:00
feat: upgrade recipe scraper to Python recipe-scrapers library (v2025.10.1)
Some checks failed
CI Pipeline / Lint Code (pull_request) Has been cancelled
CI Pipeline / Test API Package (pull_request) Has been cancelled
CI Pipeline / Test Web Package (pull_request) Has been cancelled
CI Pipeline / Test Shared Package (pull_request) Has been cancelled
CI Pipeline / Build All Packages (pull_request) Has been cancelled
CI Pipeline / Generate Coverage Report (pull_request) Has been cancelled
Docker Build & Deploy / Build Docker Images (pull_request) Has been cancelled
Docker Build & Deploy / Push Docker Images (pull_request) Has been cancelled
Docker Build & Deploy / Deploy to Staging (pull_request) Has been cancelled
Docker Build & Deploy / Deploy to Production (pull_request) Has been cancelled
E2E Tests / End-to-End Tests (pull_request) Has been cancelled
E2E Tests / E2E Tests (Mobile) (pull_request) Has been cancelled
Security Scanning / NPM Audit (pull_request) Has been cancelled
Security Scanning / Dependency License Check (pull_request) Has been cancelled
Security Scanning / Code Quality Scan (pull_request) Has been cancelled
Security Scanning / Docker Image Security (pull_request) Has been cancelled
Security Scanning / Security Summary (pull_request) Has been cancelled
0945d8f3e1
## Changes

### Recipe Scraper Enhancement
- Replaced custom Cheerio-based scraper with Python recipe-scrapers library
- Now supports 541+ recipe websites (same as Mealie)
- Added Python 3 and recipe-scrapers to Docker container
- Created Python wrapper script (packages/api/scripts/scrape_recipe.py)
- Updated scraper service to call Python script via subprocess

### Bug Fixes
- Fixed servings field parsing (string to integer conversion)
- Added safe extraction with graceful error handling
- Removed obsolete test file that was breaking builds
- Fixed Prisma binary targets for Alpine Linux

### Infrastructure
- Added Traefik configuration for HTTPS with Let's Encrypt
- Updated CORS settings for production domain
- Configured for basil.pkartchner.com

### Version Management
- Implemented CalVer versioning (Year.Month.Increment)
- Added VERSION file (2025.10.1)
- Created version.sh script for managing releases
- Tagged and pushed Docker images to Harbor registry

### Database
- Updated Prisma schema with correct binary targets
- Applied initial migration for all tables

### Build Improvements
- Excluded test files from TypeScript compilation
- Removed non-existent dependencies
- Optimized Docker build process

## Testing
- Successfully tested with Food Network, Bon Appetit, Food.com
- Verified full import and save workflow
- Confirmed ingredients and instructions display correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
pkartch merged commit 0db8180d8a into main 2025-10-28 18:21:07 +00:00
pkartch deleted branch feature/recipe-scraper-upgrade-v2025.10.1 2025-10-28 18:21:08 +00:00

Reference: pkartch/basil#1