🖥 PDF Craft: a Python library for converting PDFs (primarily scanned books) to Markdown and EPUB, using local AI models and an LLM to structure the content
GitHub

Key features

- Text and layout extraction
Combines DocLayout-YOLO with its own algorithms to detect and filter headings, columns, footnotes, and page numbers

- Local OCR
Recognizes the text on each page with OnnxOCR and supports GPU acceleration (CUDA)

- Reading order detection
Uses LayoutReader to arrange the text in the order a person would read it

- Conversion to Markdown
Generates .md files with relative links to images (illustrations, tables, formulas) in the assets folder
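
To illustrate what "relative links to images" means in the generated Markdown, here is a minimal sketch. It does not use PDF Craft itself: the `image_link` helper and the file names are hypothetical and only show how a link relative to the .md file can be built.

```python
import posixpath


def image_link(md_file: str, image_file: str, alt: str = "") -> str:
    """Build a Markdown image tag whose path is relative to the .md file.

    Hypothetical helper for illustration; not part of PDF Craft.
    """
    # Directory that contains the Markdown file ("." if it has no parent).
    md_dir = posixpath.dirname(md_file) or "."
    # Path of the image as seen from that directory.
    rel = posixpath.relpath(image_file, start=md_dir)
    return f"![{alt}]({rel})"


print(image_link("out/book.md", "out/assets/figure_1.png", "Figure 1"))
# prints: ![Figure 1](assets/figure_1.png)
```

Because the link is relative, the output folder can be moved or zipped without breaking the images, as long as the .md file and the assets folder stay together.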

Installation and requirements
Python ≥ 3.10 (3.10.16 recommended).

pip install pdf-craft and pip install onnxruntime==1.21.0 (or onnxruntime-gpu==1.21.0 for CUDA).

For the EPUB pipeline, you need access to an LLM service (for example, DeepSeek).
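
The requirements above can be checked before running anything heavy. The sketch below is an assumption-laden example, not part of PDF Craft: `python_ok` and `pick_device` are hypothetical helper names, and the "cpu"/"cuda" strings are just labels chosen here. The only library call used is `onnxruntime.get_available_providers()`.

```python
import sys


def python_ok(version=sys.version_info) -> bool:
    """True if the interpreter meets the Python >= 3.10 requirement."""
    return tuple(version[:2]) >= (3, 10)


def pick_device() -> str:
    """Return "cuda" if ONNX Runtime reports a CUDA provider, else "cpu".

    Hypothetical helper; the returned labels are an assumption of this sketch.
    """
    try:
        import onnxruntime as ort
    except ImportError:
        # onnxruntime is not installed, so no GPU acceleration is available.
        return "cpu"
    providers = ort.get_available_providers()
    return "cuda" if "CUDAExecutionProvider" in providers else "cpu"


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Suggested device:", pick_device())
```

If the script reports "cpu" on a machine with an NVIDIA GPU, the likely cause is that onnxruntime was installed instead of onnxruntime-gpu.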

🟡 GitHub


#پایتون #Python #library

🆔 @Python4all_pro



tg-me.com/Python4all_pro/1585

BY پایتون ( Machine Learning | Data Science )

