Update README.md

This commit is contained in:
pacnpal
2024-12-09 20:28:31 -05:00
parent bc59a325a5
commit 52f6730f2c

221
README.md
View File

@@ -1,191 +1,78 @@
# README.md
# Documentation to PDF Converter
A Python script that clones documentation from a Git repository (default: Next.js), processes it, and generates a well-formatted PDF with table of contents, proper formatting, and consistent styling.
## Export Nextjs Docs Script
The script automates the process of cloning documentation repositories, converting Markdown files to HTML, and generating PDF files. This README covers installation and usage.
---
## Features
- Clones specific documentation directories from Git repositories
- Processes Markdown and MDX files
- Generates table of contents with proper numbering
- Handles code blocks with filename annotations
- Processes frontmatter for metadata
- Supports image path transformations
- Creates PDF with customizable headers and footers
- Includes cover page and proper page breaks
- Clone remote repositories with sparse checkout.
- Convert Markdown files to HTML with code block and image path preprocessing.
- Generate PDFs with custom headers, footers, and styles.
- Automatically create a hierarchical Table of Contents (ToC).
- Detect the latest version of the documentation.
- Handles YAML frontmatter for metadata-rich documentation.
## Requirements
---
### System Requirements
- Python 3.7+
- Git installed and accessible from command line
- Internet connection for cloning repositories
## Installation
### Python Dependencies
Install all required packages using:
```bash
pip install -r requirements.txt
```
### Prerequisites
Then install Playwright's browser:
```bash
playwright install chromium
```
- Python 3.8+
- Required Python packages:
- `markdown`
- `yaml`
- `tqdm`
- `playwright`
- `gitpython`
- Ensure you have Playwright installed and configured:
```bash
pip install playwright
playwright install
```
## Setup
1. Clone this repository:
```bash
git clone
cd
```
2. Install dependencies:
```bash
pip install -r requirements.txt
playwright install chromium
```
3. Ensure you have a `styles.css` file in the same directory as the script. This file should contain your desired CSS styling for the PDF output.
### Clone the Repository and Install
Clone the project repository, create a virtual environment, activate it, and install requirements.
---
```bash
git clone https://github.com/pacnpal/Docs-Exporter.git
cd Docs-Exporter
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
playwright install
```
## Usage
1. Basic usage with default settings (Next.js documentation):
### 1. Clone and Update Nextjs Repository
Run the script to clone or update the remote documentation repository:
```bash
python docs_to_pdf.py
python export-docs.py
```
2. The script will:
- Clone/update the specified Git repository
- Process all documentation files
- Generate a PDF with proper formatting
- Include a cover page and table of contents
### 2. Convert Markdown to HTML
The script automatically processes `.md` and `.mdx` files, converting them to styled HTML.
## Configuration
You can modify these variables in the script for different configurations:
### 3. Generate PDF
A PDF is created with a generated title and ToC. Ensure no other process is using the output file.
### Example Configuration
```python
repo_dir = "nextjs-docs" # Local directory for cloned repo
repo_url = "https://github.com/vercel/next.js.git" # Repository URL
branch = "canary" # Branch to clone
docs_dir = "docs" # Directory containing documentation
# Image URL transformation settings
Change_img_url = True
base_path = "https://nextjs.org/_next/image?url="
path_args = "&w=1920&q=75"
repo_url = "https://github.com/vercel/next.js.git"
branch = "canary"
docs_dir = "docs"
```
## PDF Output Settings
### Output
- PDF file: `Next.js_Docs_vXX.XX.X_YYYY-MM-DD.pdf`
- Logs: Process information printed to the terminal.
The PDF generation includes:
- A4 format
- Custom margins
- Page numbers in header
- Generation date in footer
- Background colors/images
- Proper page breaks between sections
---
## File Organization
## LICENSE
- `docs_to_pdf.py`: Main script file
- `requirements.txt`: Python dependencies
- `styles.css`: CSS styling for PDF output
- `README.md`: This documentation
## Troubleshooting
1. If the PDF file is locked:
- Ensure the output PDF is not open in any application
- Check file permissions
2. If images are not loading:
- Verify internet connection
- Check if image URLs are accessible
- Adjust the `wait_for_load_state` timing if needed
3. If the repository won't clone:
- Verify Git is installed and accessible
- Check internet connection
- Ensure you have access to the repository
## Notes
- The script creates temporary files during processing
- Large documentation sets may take several minutes to process
- Memory usage depends on the size of the documentation
- The script requires active internet connection for repository cloning and image processing
## CSS Recommendations
Your `styles.css` should include at least these basic styles for proper PDF formatting:
```css
body {
font-family: Arial, sans-serif;
line-height: 1.6;
margin: 0;
padding: 20px;
}
.master-container {
display: flex;
justify-content: center;
align-items: center;
min-height: 100vh;
}
.container {
text-align: center;
}
.title {
font-size: 24px;
font-weight: bold;
margin-bottom: 20px;
}
.date {
font-size: 16px;
}
.page-break {
page-break-after: always;
}
code {
background-color: #f4f4f4;
padding: 2px 4px;
border-radius: 4px;
}
pre {
background-color: #f8f8f8;
padding: 15px;
border-radius: 5px;
overflow-x: auto;
}
.code-header {
background-color: #e0e0e0;
padding: 5px 10px;
border-radius: 5px 5px 0 0;
}
table {
border-collapse: collapse;
width: 100%;
margin: 15px 0;
}
th, td {
border: 1px solid #ddd;
padding: 8px;
text-align: left;
}
th {
background-color: #f5f5f5;
}
```
This project is governed by the [LICENSE](LICENSE) file. Please ensure compliance when redistributing or modifying the script.