pacnpal bc59a325a5 Refactored code for Playwright
Refactored the code for Playwright, replacing wkhtmltopdf
2024-12-09 20:26:35 -05:00
2024-12-09 20:26:35 -05:00
2024-12-09 20:26:35 -05:00
2024-12-09 20:26:35 -05:00
2024-05-17 01:44:30 +02:00
2024-12-09 20:26:35 -05:00
2024-05-17 01:44:30 +02:00

README.md

Documentation to PDF Converter

A Python script that clones documentation from a Git repository (default: Next.js), processes it, and generates a well-formatted PDF with table of contents, proper formatting, and consistent styling.

Features

  • Clones specific documentation directories from Git repositories
  • Processes Markdown and MDX files
  • Generates table of contents with proper numbering
  • Handles code blocks with filename annotations
  • Processes frontmatter for metadata
  • Supports image path transformations
  • Creates PDF with customizable headers and footers
  • Includes cover page and proper page breaks

Requirements

System Requirements

  • Python 3.7+
  • Git installed and accessible from command line
  • Internet connection for cloning repositories

Python Dependencies

Install all required packages using:

pip install -r requirements.txt

Then install Playwright's browser:

playwright install chromium

Setup

  1. Clone this repository:
git clone 
cd 
  1. Install dependencies:
pip install -r requirements.txt
playwright install chromium
  1. Ensure you have a styles.css file in the same directory as the script. This file should contain your desired CSS styling for the PDF output.

Usage

  1. Basic usage with default settings (Next.js documentation):
python docs_to_pdf.py
  1. The script will:
    • Clone/update the specified Git repository
    • Process all documentation files
    • Generate a PDF with proper formatting
    • Include a cover page and table of contents

Configuration

You can modify these variables in the script for different configurations:

repo_dir = "nextjs-docs"  # Local directory for cloned repo
repo_url = "https://github.com/vercel/next.js.git"  # Repository URL
branch = "canary"  # Branch to clone
docs_dir = "docs"  # Directory containing documentation

# Image URL transformation settings
Change_img_url = True
base_path = "https://nextjs.org/_next/image?url="
path_args = "&w=1920&q=75"

PDF Output Settings

The PDF generation includes:

  • A4 format
  • Custom margins
  • Page numbers in header
  • Generation date in footer
  • Background colors/images
  • Proper page breaks between sections

File Organization

  • docs_to_pdf.py: Main script file
  • requirements.txt: Python dependencies
  • styles.css: CSS styling for PDF output
  • README.md: This documentation

Troubleshooting

  1. If the PDF file is locked:

    • Ensure the output PDF is not open in any application
    • Check file permissions
  2. If images are not loading:

    • Verify internet connection
    • Check if image URLs are accessible
    • Adjust the wait_for_load_state timing if needed
  3. If the repository won't clone:

    • Verify Git is installed and accessible
    • Check internet connection
    • Ensure you have access to the repository

Notes

  • The script creates temporary files during processing
  • Large documentation sets may take several minutes to process
  • Memory usage depends on the size of the documentation
  • The script requires active internet connection for repository cloning and image processing

CSS Recommendations

Your styles.css should include at least these basic styles for proper PDF formatting:

body {
    font-family: Arial, sans-serif;
    line-height: 1.6;
    margin: 0;
    padding: 20px;
}

.master-container {
    display: flex;
    justify-content: center;
    align-items: center;
    min-height: 100vh;
}

.container {
    text-align: center;
}

.title {
    font-size: 24px;
    font-weight: bold;
    margin-bottom: 20px;
}

.date {
    font-size: 16px;
}

.page-break {
    page-break-after: always;
}

code {
    background-color: #f4f4f4;
    padding: 2px 4px;
    border-radius: 4px;
}

pre {
    background-color: #f8f8f8;
    padding: 15px;
    border-radius: 5px;
    overflow-x: auto;
}

.code-header {
    background-color: #e0e0e0;
    padding: 5px 10px;
    border-radius: 5px 5px 0 0;
}

table {
    border-collapse: collapse;
    width: 100%;
    margin: 15px 0;
}

th, td {
    border: 1px solid #ddd;
    padding: 8px;
    text-align: left;
}

th {
    background-color: #f5f5f5;
}
Description
Converts Next.js documentation to HTML/PDF
Readme AGPL-3.0 129 KiB
Languages
Python 87.3%
CSS 12.7%