Skip to content

Transform your PDFs into captivating audio podcasts with this PDF-to-Podcast pipeline! Combining advanced language models and high-quality Text To Speech (TTS), it converts text into polished audio, ideal for creators, researchers, or anyone needing audio accessibility.

License

Notifications You must be signed in to change notification settings

MoAshour93/Construction_Convert_Books_to_Podcasts

Repository files navigation

🎙️ PDF to Podcast Conversion with LLMs 📄➡️🎧

This repository hosts a pipeline for transforming PDF files into audio podcasts! Leveraging advanced Language Models (LLMs) and Text-to-Speech (TTS) technology, this tool converts textual PDF information into engaging, high-quality audio content. It’s ideal for creators, researchers, or anyone looking to make static documents more accessible through audio.


📑 Table of Contents

  1. 📋 Project Overview
  2. ✨ Features
  3. 🔧 Installation
  4. 🔄 Workflow
  5. 📝 Detailed Steps
  6. 📋 Requirements
  7. 🚀 Usage
  8. 📂 File Structure
  9. 🙏 Acknowledgments

📋 Project Overview

This project uses modern LLMs and TTS technology to automate the conversion of PDFs into audio podcasts. Built on Llama 3.1 8b and high-quality TTS models (Parler and Bark), this tool delivers a seamless process from raw PDF to professional-sounding podcast with minimal manual input. The modular Jupyter Notebook structure makes each step transparent and easy to customize.


✨ Features

  • 📄 Automated PDF Text Extraction: Quickly extracts text from PDFs, accommodating various formats.
  • 🧠 LLM-Driven Content Generation: Creates scripts that are rephrased and summarized for podcast-style delivery.
  • 📝 Script Refinement: Enhances readability and flow, tailoring content for audio listeners.
  • 🎙️ Natural Sounding TTS Output: Uses Parler and Bark TTS models to produce clear, natural-sounding audio.

🔧 Installation

To set up the environment, ensure Python is installed, then execute:

pip install -r requirements.txt

This installs dependencies, including foundational packages like torch and transformers needed for TTS and NLP tasks.


🔄 Workflow

The project workflow includes four primary stages, each with a dedicated notebook:

  1. Extract Text 📄 from the PDF.
  2. Generate Podcast Script 📝 using Llama 3.1 8b.
  3. Refine Script ✍️ for clarity and flow.
  4. Synthesize Audio 🎧 for podcast-ready MP3 output.

📝 Detailed Steps

📕 Step 1: PDF Text Extraction

  • Notebook: Step1_PDF_preprocessing.ipynb
  • Purpose: Extracts text from PDF documents for easy processing.
  • Libraries: Utilizes PyPDF2 for PDF parsing.
  • Process:
    • Converts each PDF page into cleaned text.
    • Saves extracted text as a .txt file, ready for LLM processing.

✍️ Step 2: Podcast Script Generation

  • Notebook: Step2_Transcript_Writer.ipynb
  • Purpose: Uses Llama 3.1 8b to summarize and rephrase content for audio.
  • Highlights:
    • The LLM converts extracted text into a listener-friendly script.
    • Produces a draft script saved as a text file for refinement.

🛠️ Step 3: Transcript Refinement

  • Notebook: Step3_Rewriter.ipynb
  • Purpose: Polishes the initial script, optimizing it for audio narration.
  • Process:
    • Improves readability and conversational flow.
    • Reduces redundancy and clarifies complex ideas.

🎙️ Step 4: Audio Generation

  • Notebook: Step4_TTS_Workflow.ipynb
  • Purpose: Converts the refined script into high-quality audio.
  • Models Used: Combines Parler and Bark TTS models.
  • Output: Creates an .mp3 file that’s ready for playback or distribution.

📋 Requirements

The requirements.txt file provides a comprehensive list of dependencies, including:

  • NLP and LLM: transformers, torch, datasets
  • PDF Parsing: PyPDF2
  • Audio Processing: pydub, audioread, librosa
  • TTS Models: parler_tts, bark

Key Dependencies

  • LLM Processing: transformers, torch
  • Text Extraction: PyPDF2
  • Audio Tools: pydub

🚀 Usage

  1. Prepare PDF Files 📄: Place PDFs in a folder accessible to the notebooks.
  2. Run Notebooks Sequentially:
    • Step 1: Run Step1_PDF_preprocessing.ipynb to extract text.
    • Step 2: Run Step2_Transcript_Writer.ipynb to create a script.
    • Step 3: Run Step3_Rewriter.ipynb to refine the script.
    • Step 4: Run Step4_TTS_Workflow.ipynb to generate audio.
  3. Review Output 📄: Check the output at each step to ensure quality.

📂 File Structure

The project includes the following files:

  • Step1_PDF_preprocessing.ipynb: PDF text extraction.
  • Step2_Transcript_Writer.ipynb: Script generation.
  • Step3_Rewriter.ipynb: Script refinement.
  • Step4_TTS_Workflow.ipynb: Audio synthesis.
  • requirements.txt: Lists all dependencies.

🙏 Acknowledgments

This project is powered by the amazing open-source tools provided by:

  • Meta AI for the Llama 3.1 8b language model.
  • Parler and Bark TTS for high-quality audio generation.
  • PyPDF2 and Pydub for essential PDF and audio handling.

🔗 General Links & Resources

About

Transform your PDFs into captivating audio podcasts with this PDF-to-Podcast pipeline! Combining advanced language models and high-quality Text To Speech (TTS), it converts text into polished audio, ideal for creators, researchers, or anyone needing audio accessibility.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published