mirror of https://github.com/PacFactory/Docs-Exporter-Astro.git, synced 2025-12-23 11:41:05 -05:00
Refactored code for Playwright
Refactored the code for Playwright, replacing wkhtmltopdf
199
README.md
@@ -1,38 +1,191 @@
# Docs-Exporter
# README.md
# Documentation to PDF Converter

This script automates the process of exporting Next.js documentation from the GitHub repository, converting it to HTML, and then compiling it into a PDF document. It also ensures that all visual content, including images used in the online documentation, and crucial formatting, such as code blocks and tables, are accurately fetched and included.
A Python script that clones documentation from a Git repository (default: Next.js), processes it, and generates a well-formatted PDF with table of contents, proper formatting, and consistent styling.

## Features
- **Accurate Content Replication**: Clones the Next.js documentation from the Canary channel of the GitHub repository and preserves its layout.
- **Image Handling**: Fetches and embeds the exact images used in the online documentation, ensuring that all visual explanations and illustrations are retained.
- **Advanced Formatting**: Maintains the integrity of advanced formatting elements such as code blocks, tables, and special markdown features, ensuring that the educational value of the documentation is preserved.
- **Custom PDF Styling**: Generates a styled PDF document with a cover page and a detailed table of contents, formatted through an external CSS file.

- Clones specific documentation directories from Git repositories
- Processes Markdown and MDX files
- Generates table of contents with proper numbering
- Handles code blocks with filename annotations
- Processes frontmatter for metadata
- Supports image path transformations
- Creates PDF with customizable headers and footers
- Includes cover page and proper page breaks
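
The frontmatter handling listed above can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the function name `split_frontmatter` is hypothetical, and it assumes frontmatter is a leading block fenced by `---` lines that holds only flat `key: value` pairs.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a Markdown/MDX document into (metadata, body).

    Assumption: frontmatter is a leading block fenced by '---' lines and
    contains only flat `key: value` pairs; nested YAML is not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter present

    metadata: Dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing fence is the document body.
            body = "\n".join(lines[index + 1:])
            return metadata, body.lstrip("\n")
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip().strip("'\"")

    return {}, text  # opening fence without a closing one: treat as plain text


if __name__ == "__main__":
    sample = "---\ntitle: Routing\ndescription: How routes work\n---\n\n# Routing\n"
    meta, body = split_frontmatter(sample)
    print(meta)
    print(body)
```

A real pipeline would more likely hand the block to a YAML parser; the hand-rolled version here only keeps the sketch free of third-party dependencies.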

## Prerequisites
## Requirements

- Python
- Git
- wkhtmltopdf
### System Requirements
- Python 3.7+
- Git installed and accessible from command line
- Internet connection for cloning repositories
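
The system requirements above can be checked before a run with a few lines of standard-library Python. This is a hedged sketch: `check_requirements` is a hypothetical helper, not part of the script, and it only covers the Python version and the presence of `git` on `PATH`.

```python
import shutil
import sys
from typing import List, Tuple


def check_requirements(min_python: Tuple[int, int] = (3, 7)) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required")
    if shutil.which("git") is None:
        problems.append("git was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(f"Requirement not met: {problem}")
```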

## Installation
- Install `wkhtmltopdf` which is required for PDF generation. You can download it from [wkhtmltopdf downloads](https://wkhtmltopdf.org/downloads.html) and follow the installation instructions for your operating system.
- Clone the Repository
```bash
git clone https://github.com/Riyooo/Docs-Exporter.git
```
- Go into the Directory
```bash
cd Docs-Exporter
```
- Install Python Dependencies
### Python Dependencies
Install all required packages using:
```bash
pip install -r requirements.txt
```

Then install Playwright's browser:
```bash
playwright install chromium
```

## Setup

1. Clone this repository:
   ```bash
   git clone
   cd
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   playwright install chromium
   ```

3. Ensure you have a `styles.css` file in the same directory as the script. This file should contain your desired CSS styling for the PDF output.

## Usage

To run the script, execute the following command from the root of the repository:
1. Basic usage with default settings (Next.js documentation):
   ```bash
   python export-docs.py
   python docs_to_pdf.py
   ```

2. The script will:
   - Clone/update the specified Git repository
   - Process all documentation files
   - Generate a PDF with proper formatting
   - Include a cover page and table of contents
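
The clone/update step in the list above could look roughly like this. It is a sketch under stated assumptions, not the script's confirmed approach: it uses Git's sparse checkout (Git 2.25 or newer) to fetch only the documentation directory, and the helper names `clone_commands`, `update_commands`, and `clone_or_update` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def clone_commands(repo_url: str, branch: str, docs_dir: str, repo_dir: str) -> List[List[str]]:
    """Git commands for a shallow, sparse clone limited to docs_dir."""
    return [
        ["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse",
         "--branch", branch, repo_url, repo_dir],
        ["git", "-C", repo_dir, "sparse-checkout", "set", docs_dir],
    ]


def update_commands(branch: str, repo_dir: str) -> List[List[str]]:
    """Git commands to refresh an existing checkout."""
    return [["git", "-C", repo_dir, "pull", "--depth", "1", "origin", branch]]


def clone_or_update(repo_url: str, branch: str, docs_dir: str, repo_dir: str) -> None:
    """Clone on the first run, pull on later runs."""
    if (Path(repo_dir) / ".git").is_dir():
        commands = update_commands(branch, repo_dir)
    else:
        commands = clone_commands(repo_url, branch, docs_dir, repo_dir)
    for command in commands:
        subprocess.run(command, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    clone_or_update("https://github.com/vercel/next.js.git", "canary", "docs", "nextjs-docs")
```

Keeping command construction separate from execution makes the Git invocations easy to inspect or log before anything touches the network.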

## Configuration

You can modify these variables in the script for different configurations:

```python
repo_dir = "nextjs-docs" # Local directory for cloned repo
repo_url = "https://github.com/vercel/next.js.git" # Repository URL
branch = "canary" # Branch to clone
docs_dir = "docs" # Directory containing documentation

# Image URL transformation settings
Change_img_url = True
base_path = "https://nextjs.org/_next/image?url="
path_args = "&w=1920&q=75"
```
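
To illustrate how the image URL settings above might combine, here is a minimal sketch. The function `transform_image_url` and the sample image path are hypothetical, and percent-encoding the original path is an assumption about how the script builds the URL.

```python
from urllib.parse import quote


def transform_image_url(
    src: str,
    base_path: str = "https://nextjs.org/_next/image?url=",
    path_args: str = "&w=1920&q=75",
    enabled: bool = True,  # plays the role of Change_img_url
) -> str:
    """Wrap a site-relative image path in the image-optimizer URL."""
    if not enabled or src.startswith(("http://", "https://")):
        return src  # leave absolute URLs (or everything, if disabled) untouched
    # Assumption: the original path is percent-encoded into the `url` parameter.
    return f"{base_path}{quote(src, safe='')}{path_args}"


if __name__ == "__main__":
    print(transform_image_url("/docs/light/example.png"))  # hypothetical image path
```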

## PDF Output Settings

The PDF generation includes:
- A4 format
- Custom margins
- Page numbers in header
- Generation date in footer
- Background colors/images
- Proper page breaks between sections
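
The settings listed above map naturally onto the keyword arguments of Playwright's `page.pdf()`. The sketch below is an assumption about how they could be wired together, not the script's actual code: the margin values, the template markup, and the helper names `build_pdf_options` and `render_pdf` are illustrative.

```python
from pathlib import Path
from typing import Any, Dict


def build_pdf_options(pdf_path: str) -> Dict[str, Any]:
    """Keyword arguments for Playwright's page.pdf(); values are illustrative."""
    style = "font-size:8px; width:100%; text-align:center;"
    return {
        "path": pdf_path,
        "format": "A4",
        "margin": {"top": "20mm", "bottom": "20mm", "left": "15mm", "right": "15mm"},
        "display_header_footer": True,
        # Chromium fills in elements with the classes pageNumber, totalPages and date.
        "header_template": (
            f'<div style="{style}"><span class="pageNumber"></span>'
            ' / <span class="totalPages"></span></div>'
        ),
        "footer_template": f'<div style="{style}">Generated on <span class="date"></span></div>',
        "print_background": True,  # keep background colors and images
    }


def render_pdf(html_path: str, pdf_path: str) -> None:
    """Render a local HTML file to PDF with headless Chromium."""
    from playwright.sync_api import sync_playwright  # imported lazily; needs `playwright install chromium`

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(Path(html_path).resolve().as_uri())
        page.wait_for_load_state("networkidle")  # give remote images time to load
        page.pdf(**build_pdf_options(pdf_path))
        browser.close()
```

Page breaks between sections are left to CSS, which Chromium honors when printing.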

## File Organization

- `docs_to_pdf.py`: Main script file
- `requirements.txt`: Python dependencies
- `styles.css`: CSS styling for PDF output
- `README.md`: This documentation

## Troubleshooting

1. If the PDF file is locked:
   - Ensure the output PDF is not open in any application
   - Check file permissions

2. If images are not loading:
   - Verify internet connection
   - Check if image URLs are accessible
   - Adjust the `wait_for_load_state` timing if needed

3. If the repository won't clone:
   - Verify Git is installed and accessible
   - Check internet connection
   - Ensure you have access to the repository
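
The locked-PDF case above can be detected before generation starts. The following is a small sketch with a hypothetical helper, `output_is_writable`. It assumes that a PDF held open by a viewer refuses to open for appending, which is typical on Windows but generally not on Linux or macOS.

```python
import os


def output_is_writable(pdf_path: str) -> bool:
    """Best-effort check that pdf_path can be (over)written."""
    if not os.path.exists(pdf_path):
        # Nothing to lock yet; only the target directory must be writable.
        parent = os.path.dirname(os.path.abspath(pdf_path))
        return os.access(parent, os.W_OK)
    try:
        # Append mode leaves existing content untouched.
        with open(pdf_path, "ab"):
            return True
    except OSError:  # includes PermissionError
        return False


if __name__ == "__main__":
    target = "output.pdf"  # hypothetical output name
    if not output_is_writable(target):
        print(f"Cannot write {target}: close it in other applications or check permissions.")
```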

## Notes

- The script creates temporary files during processing
- Large documentation sets may take several minutes to process
- Memory usage depends on the size of the documentation
- The script requires an active internet connection for repository cloning and image processing

## CSS Recommendations

Your `styles.css` should include at least these basic styles for proper PDF formatting:

```css
body {
  font-family: Arial, sans-serif;
  line-height: 1.6;
  margin: 0;
  padding: 20px;
}

.master-container {
  display: flex;
  justify-content: center;
  align-items: center;
  min-height: 100vh;
}

.container {
  text-align: center;
}

.title {
  font-size: 24px;
  font-weight: bold;
  margin-bottom: 20px;
}

.date {
  font-size: 16px;
}

.page-break {
  page-break-after: always;
}

code {
  background-color: #f4f4f4;
  padding: 2px 4px;
  border-radius: 4px;
}

pre {
  background-color: #f8f8f8;
  padding: 15px;
  border-radius: 5px;
  overflow-x: auto;
}

.code-header {
  background-color: #e0e0e0;
  padding: 5px 10px;
  border-radius: 5px 5px 0 0;
}

table {
  border-collapse: collapse;
  width: 100%;
  margin: 15px 0;
}

th, td {
  border: 1px solid #ddd;
  padding: 8px;
  text-align: left;
}

th {
  background-color: #f5f5f5;
}
```
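
The `.code-header` rule above suggests that filename annotations are rendered as a separate element above each code block. A minimal sketch of that idea follows. It assumes the Next.js docs convention of a `filename="..."` attribute on the opening fence line, and the helper `split_fence_info` is hypothetical rather than taken from the script.

```python
import html
import re
from typing import Optional, Tuple

# Matches a filename="..." attribute, with its leading whitespace, on a fence line.
FILENAME_RE = re.compile(r'\s+filename="([^"]+)"')


def split_fence_info(fence_line: str) -> Tuple[Optional[str], str]:
    """Return (header_html or None, fence line without the filename attribute)."""
    match = FILENAME_RE.search(fence_line)
    if match is None:
        return None, fence_line
    header = f'<div class="code-header">{html.escape(match.group(1))}</div>'
    cleaned = FILENAME_RE.sub("", fence_line, count=1)
    return header, cleaned


if __name__ == "__main__":
    header, fence = split_fence_info('```tsx filename="app/page.tsx" switcher')
    print(header)
    print(fence)
```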