Update README.md

pacnpal
2024-12-09 20:28:31 -05:00
parent bc59a325a5
commit 52f6730f2c

221
README.md

@@ -1,191 +1,78 @@
# README.md
# Documentation to PDF Converter
A Python script that clones documentation from a Git repository (default: Next.js), processes it, and generates a PDF with a table of contents and consistent styling.
## Export Next.js Docs Script
The script automates the process of cloning documentation repositories, converting Markdown files to HTML, and generating PDF files. This README covers installation and usage.
---
## Features
- Clones specific documentation directories from Git repositories
- Processes Markdown and MDX files
- Generates a table of contents with proper numbering
- Handles code blocks with filename annotations
- Processes frontmatter for metadata
- Supports image path transformations
- Creates a PDF with customizable headers and footers
- Includes a cover page and proper page breaks

- Clone remote repositories with sparse checkout.
- Convert Markdown files to HTML with code block and image path preprocessing.
- Generate PDFs with custom headers, footers, and styles.
- Automatically create a hierarchical Table of Contents (ToC).
- Detect the latest version of the documentation.
- Handle YAML frontmatter for metadata-rich documentation.
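Both feature lists mention frontmatter handling without showing it. The following is a minimal sketch of one way to separate a frontmatter block from the document body, not the script's actual code. The helper name `split_frontmatter` and the sample document are hypothetical, and only flat `key: value` pairs are read; a real implementation would more likely pass the block to a YAML parser.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a Markdown/MDX document into (metadata, body).

    Hypothetical helper: reads only flat `key: value` pairs between the
    leading `---` delimiters and leaves everything else untouched.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at the top of the file

    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, text  # opening delimiter without a closing one

    metadata: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:])
    return metadata, body


# Sample document (made up for illustration).
doc = "---\ntitle: Routing\ndescription: How routes work\n---\n# Routing\nBody text."
meta, body = split_frontmatter(doc)
print(meta["title"])
print(body)
```

The metadata dictionary could then feed the table of contents or section titles, while the body goes on to the Markdown-to-HTML step.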
## Requirements
### System Requirements
- Python 3.7+
- Git installed and accessible from the command line
- Internet connection for cloning repositories
### Python Dependencies
Install all required packages with `pip install -r requirements.txt`.

---
## Installation
### Prerequisites
- Python 3.8+
- Required Python packages:
  - `markdown`
  - `pyyaml` (imported as `yaml`)
  - `tqdm`
  - `playwright`
  - `gitpython`
- Ensure you have Playwright installed and configured:
```bash
pip install playwright
playwright install
```
### Clone the Repository and Install
Clone the project repository, create and activate a virtual environment, and install the requirements.

---
```bash
git clone https://github.com/pacnpal/Docs-Exporter.git
cd Docs-Exporter
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
playwright install
```
Then install Playwright's browser:
```bash
playwright install chromium
```
## Setup
1. Clone this repository:
```bash
git clone
cd
```
2. Install dependencies:
```bash
pip install -r requirements.txt
playwright install chromium
```
3. Ensure you have a `styles.css` file in the same directory as the script. This file should contain your desired CSS styling for the PDF output.
## Usage
1. Basic usage with default settings (Next.js documentation):
### 1. Clone and Update the Next.js Repository
Run the script to clone or update the remote documentation repository:
```bash
python docs_to_pdf.py
python export-docs.py
```
2. The script will:
   - Clone/update the specified Git repository
   - Process all documentation files
   - Generate a PDF with proper formatting
   - Include a cover page and table of contents
### 2. Convert Markdown to HTML
The script automatically processes `.md` and `.mdx` files, converting them to styled HTML.
### 3. Generate PDF
A PDF is created with a generated title and ToC. Ensure no other process is using the output file.
## Configuration
You can modify these variables in the script for different configurations:
### Example Configuration
```python
repo_dir = "nextjs-docs"  # Local directory for cloned repo
repo_url = "https://github.com/vercel/next.js.git"  # Repository URL
branch = "canary"  # Branch to clone
docs_dir = "docs"  # Directory containing documentation
# Image URL transformation settings
Change_img_url = True
base_path = "https://nextjs.org/_next/image?url="
path_args = "&w=1920&q=75"
```
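The image settings above hint at how root-relative image paths might be rewritten into full image URLs. The sketch below is one plausible reading, not the script's confirmed behavior: the function name `rewrite_image_urls`, the regular expression, the percent-encoding of the path, and the sample image path are all assumptions made for illustration.

```python
import re
from urllib.parse import quote

# Values copied from the configuration example.
Change_img_url = True
base_path = "https://nextjs.org/_next/image?url="
path_args = "&w=1920&q=75"

# Matches Markdown images whose target starts with "/", e.g. ![alt](/docs/a.png).
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\((/[^)\s]+)\)")


def rewrite_image_urls(markdown_text: str) -> str:
    """Rewrite root-relative Markdown image targets to absolute image URLs."""
    if not Change_img_url:
        return markdown_text

    def replace(match):
        alt, path = match.group(1), match.group(2)
        # Assumption: the path is percent-encoded before being appended.
        return f"![{alt}]({base_path}{quote(path, safe='')}{path_args})"

    return IMAGE_PATTERN.sub(replace, markdown_text)


# Hypothetical image path, used only to show the transformation.
print(rewrite_image_urls("![Diagram](/docs/light/routing.png)"))
```

Images that already point at an absolute URL do not match the pattern and pass through unchanged.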
### Output
- PDF file: `Next.js_Docs_vXX.XX.X_YYYY-MM-DD.pdf`
- Logs: Process information printed to the terminal.

---
## PDF Output Settings
The PDF generation includes:
- A4 format
- Custom margins
- Page numbers in header
- Generation date in footer
- Background colors/images
- Proper page breaks between sections
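The settings listed above map fairly directly onto the keyword arguments of Playwright's `page.pdf()`. The sketch below builds such an argument dictionary. The helper name `build_pdf_options`, the margin sizes, the template markup, and the file names are assumptions, so the script's real values may differ.

```python
from typing import Any, Dict


def build_pdf_options(output_path: str) -> Dict[str, Any]:
    """Return keyword arguments intended for Playwright's `page.pdf(...)`."""
    header = (
        '<div style="font-size:9px; width:100%; text-align:right; padding-right:1cm;">'
        '<span class="pageNumber"></span> / <span class="totalPages"></span></div>'
    )
    footer = (
        '<div style="font-size:9px; width:100%; text-align:center;">'
        'Generated: <span class="date"></span></div>'
    )
    return {
        "path": output_path,
        "format": "A4",
        "print_background": True,       # keep background colors and images
        "display_header_footer": True,  # required for the templates below
        "header_template": header,      # page numbers in the header
        "footer_template": footer,      # generation date in the footer
        # Margin sizes are illustrative, not taken from the script.
        "margin": {"top": "2cm", "bottom": "2cm", "left": "1.5cm", "right": "1.5cm"},
    }


options = build_pdf_options("docs.pdf")
print(options["format"])

# With Playwright installed, the options could be applied roughly like this
# (the HTML path is a placeholder):
# from playwright.sync_api import sync_playwright
# with sync_playwright() as p:
#     browser = p.chromium.launch()
#     page = browser.new_page()
#     page.goto("file:///path/to/combined.html")
#     page.pdf(**options)
#     browser.close()
```

Page breaks between sections are handled on the CSS side, for example with a `.page-break` class in `styles.css`, rather than through these options.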
## File Organization
- `docs_to_pdf.py`: Main script file
- `requirements.txt`: Python dependencies
- `styles.css`: CSS styling for PDF output
- `README.md`: This documentation
## LICENSE
This project is governed by the [LICENSE](LICENSE) file. Please ensure compliance when redistributing or modifying the script.
## Troubleshooting
1. If the PDF file is locked:
   - Ensure the output PDF is not open in any application
   - Check file permissions
2. If images are not loading:
   - Verify internet connection
   - Check if image URLs are accessible
   - Adjust the `wait_for_load_state` timing if needed
3. If the repository won't clone:
   - Verify Git is installed and accessible
   - Check internet connection
   - Ensure you have access to the repository
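Two of these checks can be automated before a long run starts. The sketch below uses only the standard library. The function names and the `docs.pdf` file name are hypothetical, and the lock check is best-effort: an open PDF typically blocks writes on Windows but usually does not on Linux or macOS.

```python
import os
import shutil


def git_available() -> bool:
    """Return True if a `git` executable is found on PATH."""
    return shutil.which("git") is not None


def output_is_writable(pdf_path: str) -> bool:
    """Return True if `pdf_path` can be created or opened for writing."""
    if not os.path.exists(pdf_path):
        # Nothing to lock yet; check that the target directory is writable.
        parent = os.path.dirname(os.path.abspath(pdf_path))
        return os.access(parent, os.W_OK)
    try:
        # Append mode leaves the existing content untouched.
        with open(pdf_path, "ab"):
            return True
    except OSError:
        return False


if not git_available():
    print("Git was not found on PATH; install it before cloning.")
if not output_is_writable("docs.pdf"):
    print("docs.pdf cannot be written; close any viewer that has it open.")
```

Running checks like these first avoids discovering a locked output file only after the documentation has been cloned and converted.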
## Notes
- The script creates temporary files during processing
- Large documentation sets may take several minutes to process
- Memory usage depends on the size of the documentation
- The script requires an active internet connection for repository cloning and image processing
## CSS Recommendations
Your `styles.css` should include at least these basic styles for proper PDF formatting:
```css
body {
font-family: Arial, sans-serif;
line-height: 1.6;
margin: 0;
padding: 20px;
}
.master-container {
display: flex;
justify-content: center;
align-items: center;
min-height: 100vh;
}
.container {
text-align: center;
}
.title {
font-size: 24px;
font-weight: bold;
margin-bottom: 20px;
}
.date {
font-size: 16px;
}
.page-break {
page-break-after: always;
}
code {
background-color: #f4f4f4;
padding: 2px 4px;
border-radius: 4px;
}
pre {
background-color: #f8f8f8;
padding: 15px;
border-radius: 5px;
overflow-x: auto;
}
.code-header {
background-color: #e0e0e0;
padding: 5px 10px;
border-radius: 5px 5px 0 0;
}
table {
border-collapse: collapse;
width: 100%;
margin: 15px 0;
}
th, td {
border: 1px solid #ddd;
padding: 8px;
text-align: left;
}
th {
background-color: #f5f5f5;
}
```