Telegram Group & Telegram Channel
🖥 PDF CRAFT-a python library for converting PDF (primarily scanned books) in Markdown and EPUB using local AI models and LLM to structure the contents
Github

Basic possibilities

- extracting text and layout
Uses the combination of Doclayout-Yolo and its own algorithms for detecting and filtering headlines, columns, footnotes and page numbers

- Local OCR
Recognizes the text on the page via Onnxocr, supports acceleration on GPU (CUDA)

- Determining the order of reading
With the help of LayoutReader, it builds a flow of text in the order in which it is perceived by a person

- Converting in Markdown
Generates .MD with relative links to images (illustrations, tables, formulas) in the Assets folder

Installation and requirements
Python ≥ 3.10 (recommended 3.10.16).

Pip Install PDF-Craft and PIP Install Onnxruntime == 1.21.0 (or Onnxruntime-GPU == 1.21.0 for CUDA).

For an EPUB conveier, you need access to the LLM service (for example, Deepseek).

🟡 Github


#پایتون #Python #library

🆔 @Python4all_pro



tg-me.com/Python4all_pro/1585
Create:
Last Update:

🖥 PDF CRAFT-a python library for converting PDF (primarily scanned books) in Markdown and EPUB using local AI models and LLM to structure the contents
Github

Basic possibilities

- extracting text and layout
Uses the combination of Doclayout-Yolo and its own algorithms for detecting and filtering headlines, columns, footnotes and page numbers

- Local OCR
Recognizes the text on the page via Onnxocr, supports acceleration on GPU (CUDA)

- Determining the order of reading
With the help of LayoutReader, it builds a flow of text in the order in which it is perceived by a person

- Converting in Markdown
Generates .MD with relative links to images (illustrations, tables, formulas) in the Assets folder

Installation and requirements
Python ≥ 3.10 (recommended 3.10.16).

Pip Install PDF-Craft and PIP Install Onnxruntime == 1.21.0 (or Onnxruntime-GPU == 1.21.0 for CUDA).

For an EPUB conveier, you need access to the LLM service (for example, Deepseek).

🟡 Github


#پایتون #Python #library

🆔 @Python4all_pro

BY پایتون ( Machine Learning | Data Science )


Warning: Undefined variable $i in /var/www/tg-me/post.php on line 283

Share with your friend now:
tg-me.com/Python4all_pro/1585

View MORE
Open in Telegram


telegram Telegram | DID YOU KNOW?

Date: |

The SSE was the first modern stock exchange to open in China, with trading commencing in 1990. It has now grown to become the largest stock exchange in Asia and the third-largest in the world by market capitalization, which stood at RMB 50.6 trillion (US$7.8 trillion) as of September 2021. Stocks (both A-shares and B-shares), bonds, funds, and derivatives are traded on the exchange. The SEE has two trading boards, the Main Board and the Science and Technology Innovation Board, the latter more commonly known as the STAR Market. The Main Board mainly hosts large, well-established Chinese companies and lists both A-shares and B-shares.

A project of our size needs at least a few hundred million dollars per year to keep going,” Mr. Durov wrote in his public channel on Telegram late last year. “While doing that, we will remain independent and stay true to our values, redefining how a tech company should operate.

telegram from fr


Telegram پایتون ( Machine Learning | Data Science )
FROM USA