Health Test Data Extraction and Analysis

This project extracts clinical test results from PDF documents, uses an OpenAI GPT model to identify specific test values in the extracted text, stores the results in a MySQL database, and lets users query the data with natural-language questions. It is intended to make medical test data easier to access and analyze for healthcare professionals and researchers.

Features

PDF Text Extraction: Extracts text from PDF documents containing clinical test results.
Data Processing with OpenAI: Uses an OpenAI GPT model to identify and extract specific test results from the unstructured text.
Data Storage: Stores the extracted data in a structured MySQL database for easy access and analysis.
Dynamic Configuration: The tests of interest and the database schema mappings live in external configuration files, so they can be updated without code changes.
Interactive Data Queries: Ask questions about the data in natural language; the system generates SQL queries, executes them, and returns the results.
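
The OpenAI processing step can be pictured as two small pieces: building a prompt that lists the tests of interest, and parsing the model's reply into values. The following is a minimal sketch, not the project's actual code. The function names `build_extraction_prompt` and `parse_extraction_reply`, the prompt wording, and the assumption that the model replies with a JSON object are all hypothetical, and the API call itself is left out.

```python
import json


def build_extraction_prompt(report_text, tests_of_interest):
    """Build a prompt asking the model for the listed tests as JSON.

    Hypothetical helper: the wording is an assumption, not the project's prompt.
    """
    test_lines = "\n".join(f"- {name}" for name in tests_of_interest)
    return (
        "Extract the values of the following clinical tests from the report.\n"
        "Reply with a single JSON object mapping each test name to its value, "
        "using null when a test is not present.\n\n"
        f"Tests:\n{test_lines}\n\n"
        f"Report:\n{report_text}"
    )


def parse_extraction_reply(reply_text, tests_of_interest):
    """Parse the model's reply, keeping only the requested tests.

    Assumes the reply is a JSON object; raises ValueError otherwise.
    """
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model reply was not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Model reply must be a JSON object")
    # Ignore keys the model added on its own; missing tests become None.
    return {name: data.get(name) for name in tests_of_interest}
```

Filtering the reply against the requested test names keeps unexpected keys out of the database step, and mapping missing tests to `None` lets them be stored as NULL.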

Getting Started

Prerequisites

Python 3.8+
MySQL Server
OpenAI API Key
Required Python packages: pdfplumber, openai, mysql-connector-python, python-dotenv

Installation

Clone the repository to your local machine:

git clone https://github.com/gauri-nagavkar/Talk_with_your_data.git

Navigate to the project directory:
```
cd Talk_with_your_data
```
Install the required Python packages:
```
pip install -r requirements.txt
```

Set up your .env file with the necessary environment variables:

OPENAI_API_KEY=your_openai_api_key_here
DB_HOST=localhost
DB_USER=your_database_user
DB_PASSWORD=your_database_password
DB_NAME=your_database_name
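
As a rough illustration of how these variables might be read at startup, the sketch below loads the `.env` file with python-dotenv and fails early if anything is missing. The `load_settings` function and `REQUIRED_KEYS` tuple are hypothetical names introduced for this example; the project's scripts may read the variables differently.

```python
import os

try:
    # python-dotenv is listed in the prerequisites; the import is optional
    # here so the sketch still runs where it is not installed.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

# The variable names come from the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME")


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # loads variables from a .env file if one is found
        env = os.environ
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Checking all keys up front gives one clear error message instead of a failure partway through a PDF import.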

Adjust the tests_of_interest.txt and data_mapping.json files to match your specific requirements.

Usage

To insert the results from a PDF file into the database, run the following command with the path to the PDF file as its argument:
```
python read_insert_report.py <path_to_pdf>
```
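
For orientation, the text-extraction part of this step could look roughly like the sketch below, which uses pdfplumber's `open`, `pages`, and `extract_text`. The function names `join_page_texts` and `extract_pdf_text` are hypothetical, and the real script may structure this differently.

```python
def join_page_texts(page_texts):
    """Join per-page text, skipping pages that yielded no text."""
    return "\n".join(text.strip() for text in page_texts if text and text.strip())


def extract_pdf_text(pdf_path):
    """Return the text of every page in the PDF as one string."""
    # Imported here so the helper above can be used without pdfplumber.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # The generator is fully consumed while the file is still open.
        return join_page_texts(page.extract_text() for page in pdf.pages)
```

Scanned pages without a text layer produce no text and are skipped by this sketch; they would need OCR, which is outside what the README describes.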
To talk with your data, run the answer_questions.py file followed by your question.
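
Because the SQL in this step is generated by a model, it is worth checking before it is executed. The sketch below shows one conservative way to do that; it is an assumption about how such a check could work, not a description of what answer_questions.py does. The names `build_sql_prompt` and `is_read_only_select`, the prompt wording, and the example table `test_results` are all hypothetical.

```python
import re

# Keywords that should never appear in a read-only query.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def build_sql_prompt(question, table_name, columns):
    """Build a prompt asking the model for one SELECT statement."""
    column_list = ", ".join(columns)
    return (
        f"The MySQL table `{table_name}` has these columns: {column_list}.\n"
        "Write one SELECT statement that answers the question below. "
        "Reply with the SQL only.\n\n"
        f"Question: {question}"
    )


def is_read_only_select(sql):
    """Return True only for a single SELECT statement with no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # more than one statement
    if not re.match(r"select\b", statement, re.IGNORECASE):
        return False
    return FORBIDDEN.search(statement) is None
```

The check is deliberately strict: a query whose string literal contains a word such as "update" is rejected as well. Running the query under a database user that only has SELECT privileges is a stronger safeguard than any text check.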

Configuration

Tests of Interest: Update tests_of_interest.txt to modify or add new tests to be extracted.
Database Mapping: Adjust data_mapping.json to map extracted test names to your database column names.
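
To make the role of the two files concrete, here is a minimal sketch of loading them and turning extracted values into a parameterized INSERT. The file formats are assumptions: one test name per line in tests_of_interest.txt, and a flat JSON object of test name to column name in data_mapping.json. The function names and the table name `test_results` are hypothetical.

```python
import json
from pathlib import Path


def load_tests_of_interest(path):
    """Read test names, one per line, ignoring blank lines (assumed format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_column_mapping(path):
    """Read a JSON object mapping test names to column names (assumed format)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def build_insert(table_name, extracted, mapping):
    """Return (sql, params) for a parameterized INSERT.

    Tests without an entry in the mapping are skipped.
    """
    pairs = [(mapping[name], value) for name, value in extracted.items() if name in mapping]
    if not pairs:
        raise ValueError("No extracted tests match the column mapping")
    columns = ", ".join(f"`{column}`" for column, _ in pairs)
    # %s is the placeholder style used by mysql-connector-python.
    placeholders = ", ".join(["%s"] * len(pairs))
    sql = f"INSERT INTO `{table_name}` ({columns}) VALUES ({placeholders})"
    return sql, tuple(value for _, value in pairs)
```

The returned pair can be passed to a mysql-connector-python cursor as `cursor.execute(sql, params)`. Values go through placeholders, while column names come from the mapping file, which is treated as trusted configuration.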

Contributing

Contributions are welcome! Please feel free to submit pull requests, report bugs, and suggest features.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

OpenAI for providing the GPT model used for data extraction and processing.
The developers of pdfplumber, mysql-connector-python, and python-dotenv for their excellent Python packages.

