mirror of
https://github.com/PacFactory/Docs-Exporter-Nextjs.git
synced 2025-12-19 19:21:05 -05:00
Documentation to PDF Converter
A Python script that clones documentation from a Git repository (default: Next.js), processes it, and generates a PDF with a cover page, a table of contents, and consistent styling.
Features
- Clones specific documentation directories from Git repositories
- Processes Markdown and MDX files
- Generates table of contents with proper numbering
- Handles code blocks with filename annotations
- Processes frontmatter for metadata
- Supports image path transformations
- Creates PDF with customizable headers and footers
- Includes cover page and proper page breaks
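To illustrate the frontmatter step from the feature list, here is a minimal sketch of separating metadata from the Markdown body. The function name `split_frontmatter` is hypothetical, and the sketch only understands flat `key: value` pairs; the actual script may use a full YAML parser instead.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a Markdown/MDX document into (metadata, body).

    Hypothetical helper: handles flat `key: value` pairs only, not full YAML.
    """
    lines = text.splitlines()
    # Frontmatter must open on the very first line with a '---' delimiter.
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        # No closing delimiter: treat the whole file as body.
        return {}, text
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            # Drop surrounding whitespace and optional quotes from the value.
            meta[key.strip()] = value.strip().strip("'\"")
    body = "\n".join(lines[end + 1:])
    return meta, body
```

The returned metadata (for example a `title` key) could then feed the table of contents, while the body goes on to Markdown rendering.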
Requirements
System Requirements
- Python 3.7+
- Git installed and accessible from command line
- Internet connection for cloning repositories
Python Dependencies
Install all required packages using:
pip install -r requirements.txt
Then install Playwright's browser:
playwright install chromium
Setup
- Clone this repository:
git clone
cd
- Install dependencies:
pip install -r requirements.txt
playwright install chromium
- Ensure you have a styles.css file in the same directory as the script. This file should contain your desired CSS styling for the PDF output.
Usage
- Basic usage with default settings (Next.js documentation):
python docs_to_pdf.py
- The script will:
  - Clone/update the specified Git repository
  - Process all documentation files
  - Generate a PDF with proper formatting
  - Include a cover page and table of contents
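The clone/update step could be sketched as follows. This is an assumption about how such a step might work, not the script's actual code: the helper names `git_commands` and `sync_repo` are hypothetical, and the sparse-checkout approach needs Git 2.25 or newer.

```python
import os
import subprocess
from typing import List


def git_commands(repo_url: str, branch: str, docs_dir: str, repo_dir: str) -> List[List[str]]:
    """Return the git commands for a first clone or for updating an existing one."""
    if os.path.isdir(os.path.join(repo_dir, ".git")):
        # Existing checkout: fast-forward the branch instead of cloning again.
        return [["git", "-C", repo_dir, "pull", "--ff-only", "origin", branch]]
    return [
        # Shallow, blob-filtered, sparse clone so only the needed files are fetched.
        ["git", "clone", "--depth", "1", "--branch", branch,
         "--filter=blob:none", "--sparse", repo_url, repo_dir],
        # Restrict the working tree to the documentation directory.
        ["git", "-C", repo_dir, "sparse-checkout", "set", docs_dir],
    ]


def sync_repo(repo_url: str, branch: str, docs_dir: str, repo_dir: str) -> None:
    """Run the commands; raises CalledProcessError if git fails."""
    for cmd in git_commands(repo_url, branch, docs_dir, repo_dir):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it possible to inspect what would run before touching the network.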
Configuration
You can modify these variables in the script for different configurations:
repo_dir = "nextjs-docs" # Local directory for cloned repo
repo_url = "https://github.com/vercel/next.js.git" # Repository URL
branch = "canary" # Branch to clone
docs_dir = "docs" # Directory containing documentation
# Image URL transformation settings
Change_img_url = True
base_path = "https://nextjs.org/_next/image?url="
path_args = "&w=1920&q=75"
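As a rough illustration of how the image settings might combine, the sketch below wraps a relative image path between `base_path` and `path_args`. The function `transform_image_url` is hypothetical, and percent-encoding the path is an assumption; the script may concatenate the strings differently.

```python
from urllib.parse import quote

base_path = "https://nextjs.org/_next/image?url="
path_args = "&w=1920&q=75"


def transform_image_url(src: str, enabled: bool = True) -> str:
    """Rewrite a site-relative image path into a full image URL (assumed behaviour)."""
    # Leave the path alone when the feature is off or the URL is already absolute.
    if not enabled or src.startswith(("http://", "https://")):
        return src
    # Percent-encode the whole path, including '/', so it is a single query value.
    return base_path + quote(src, safe="") + path_args
```

For example, `/docs/light/routing.png` would become a URL whose `url` query parameter carries the encoded path, followed by the width and quality arguments.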
PDF Output Settings
The PDF generation includes:
- A4 format
- Custom margins
- Page numbers in header
- Generation date in footer
- Background colors/images
- Proper page breaks between sections
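The settings above map fairly directly onto Playwright's `page.pdf()` options. The following is a sketch under assumptions, not the script's actual code: the helper names, the margin values, and the template markup are illustrative. The `pageNumber`, `totalPages`, and `date` classes are the ones Chromium fills in for header and footer templates.

```python
from typing import Any, Dict


def build_pdf_options(output_path: str) -> Dict[str, Any]:
    """Assemble keyword arguments for Playwright's page.pdf() (illustrative values)."""
    header = (
        '<div style="font-size:9px; width:100%; text-align:right; padding-right:1cm;">'
        'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>'
    )
    footer = (
        '<div style="font-size:9px; width:100%; text-align:center;">'
        'Generated on <span class="date"></span></div>'
    )
    return {
        "path": output_path,
        "format": "A4",
        # Margins must leave room for the header and footer templates.
        "margin": {"top": "2cm", "bottom": "2cm", "left": "1.5cm", "right": "1.5cm"},
        "display_header_footer": True,
        "header_template": header,
        "footer_template": footer,
        # Needed for CSS background colors and images to appear in the PDF.
        "print_background": True,
    }


def render_pdf(html_path: str, output_path: str) -> None:
    """Load a local HTML file in headless Chromium and print it to PDF."""
    # Imported lazily so the options can be inspected without Playwright installed.
    from pathlib import Path
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(Path(html_path).resolve().as_uri())
        # Wait for images and other network requests to settle before printing.
        page.wait_for_load_state("networkidle")
        page.pdf(**build_pdf_options(output_path))
        browser.close()
```

Page breaks between sections are handled in CSS rather than here, for example through a `.page-break` rule in styles.css.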
File Organization
- docs_to_pdf.py: Main script file
- requirements.txt: Python dependencies
- styles.css: CSS styling for PDF output
- README.md: This documentation
Troubleshooting
- If the PDF file is locked:
  - Ensure the output PDF is not open in any application
  - Check file permissions
- If images are not loading:
  - Verify internet connection
  - Check if image URLs are accessible
  - Adjust the wait_for_load_state timing if needed
- If the repository won't clone:
  - Verify Git is installed and accessible
  - Check internet connection
  - Ensure you have access to the repository
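Some of these checks can be automated before a run. The sketch below is a hypothetical pre-flight helper, not part of the script. The write test mainly catches locked files on Windows, where an open PDF viewer typically blocks writes; on other systems it only detects permission problems.

```python
import os
import shutil
from typing import List


def preflight_problems(output_pdf: str) -> List[str]:
    """Return human-readable problems found before a run (empty list if none)."""
    problems: List[str] = []
    # Git must be on PATH for the clone/update step.
    if shutil.which("git") is None:
        problems.append("git was not found on PATH")
    if os.path.exists(output_pdf):
        try:
            # Opening in append mode checks writability without truncating the file.
            with open(output_pdf, "ab"):
                pass
        except OSError as exc:
            problems.append(f"cannot write to {output_pdf}: {exc}")
    return problems
```

Calling `preflight_problems("output.pdf")` and printing each entry gives a quick diagnosis before the longer clone and render steps start; the file name here is only an example.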
Notes
- The script creates temporary files during processing
- Large documentation sets may take several minutes to process
- Memory usage depends on the size of the documentation
- The script requires an active internet connection for repository cloning and image processing
CSS Recommendations
Your styles.css should include at least these basic styles for proper PDF formatting:
body {
    font-family: Arial, sans-serif;
    line-height: 1.6;
    margin: 0;
    padding: 20px;
}

.master-container {
    display: flex;
    justify-content: center;
    align-items: center;
    min-height: 100vh;
}

.container {
    text-align: center;
}

.title {
    font-size: 24px;
    font-weight: bold;
    margin-bottom: 20px;
}

.date {
    font-size: 16px;
}

.page-break {
    page-break-after: always;
}

code {
    background-color: #f4f4f4;
    padding: 2px 4px;
    border-radius: 4px;
}

pre {
    background-color: #f8f8f8;
    padding: 15px;
    border-radius: 5px;
    overflow-x: auto;
}

.code-header {
    background-color: #e0e0e0;
    padding: 5px 10px;
    border-radius: 5px 5px 0 0;
}

table {
    border-collapse: collapse;
    width: 100%;
    margin: 15px 0;
}

th, td {
    border: 1px solid #ddd;
    padding: 8px;
    text-align: left;
}

th {
    background-color: #f5f5f5;
}