This project focuses on semantic similarity scoring between questions and paragraphs using different methodologies. The dataset consists of train and test questions organized in two separate sheets.
- Programming Languages: Python
- Libraries & Frameworks: Sentence Transformers, Parrot, Levenshtein, Pandas, NumPy
- Similarity Metrics: Cosine Similarity, Levenshtein Distance
- Tools: Jupyter Notebook, Excel
- Methodology: Sentence Transformer for embedding generation, Cosine Similarity for similarity calculation.
- Accuracy: 78%
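
The embed-then-compare step above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: `rank_paragraphs` and the `encode` parameter are hypothetical names, and `encode` stands for any function that maps a list of texts to a list of vectors (for example, the `encode` method of a Sentence Transformers model).

```python
import math
from typing import Callable, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_paragraphs(question: str, paragraphs: Sequence[str], encode: Callable):
    """Return (paragraph_index, score) pairs, most similar paragraph first."""
    question_vec = encode([question])[0]
    paragraph_vecs = encode(list(paragraphs))
    scores = [
        (index, cosine_similarity(question_vec, vec))
        for index, vec in enumerate(paragraph_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With Sentence Transformers installed, one could pass something like `SentenceTransformer("all-MiniLM-L6-v2").encode` as `encode`. The model name is an assumption, since the project description does not say which model was used.
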
- Methodology: Custom embeddings using the Levenshtein library for similarity computation.
- Accuracy: 43%
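
For reference, the edit-distance idea behind this approach can be sketched in pure Python. This is a stand-in written for illustration, not the project's code: the Levenshtein package provides a compiled `distance` function for the same quantity, and dividing by the longer string's length is only one possible way to turn a distance into a similarity score.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the distance into [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest
```

Edit distance compares characters rather than meaning, so two paraphrases that share few characters receive a low score even when they ask the same thing.
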
- Methodology: Data augmentation using Parrot, Sentence Transformer for embeddings, and Cosine Similarity for similarity calculation.
- Accuracy: 83%
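
The augmentation step can be sketched as below, under stated assumptions. `augment_questions` and its `paraphrase` parameter are hypothetical names; `paraphrase` is assumed to return either `None` or a list whose items are paraphrase strings or `(text, score)` pairs, which is how Parrot's `augment` method is understood to behave here.

```python
def augment_questions(questions, paraphrase, max_per_question=3):
    """Return the original questions plus up to `max_per_question` paraphrases each."""
    augmented = []
    seen = set()
    for question in questions:
        candidates = [question]
        results = paraphrase(question) or []  # treat None as "no paraphrases found"
        for item in results[:max_per_question]:
            # Accept both plain strings and (text, score) pairs.
            candidates.append(item if isinstance(item, str) else item[0])
        for text in candidates:
            key = text.strip().lower()
            if key and key not in seen:  # skip empty strings and case-insensitive duplicates
                seen.add(key)
                augmented.append(text)
    return augmented
```

With Parrot installed, `paraphrase` could be something like `lambda q: parrot.augment(input_phrase=q)`, where `parrot` is a `Parrot` instance; the Parrot README is the place to confirm the constructor arguments and the exact return shape. The augmented questions would then be embedded and scored in the same way as the originals.
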
- Methodology: Sentence Transformer for embedding generation and Cosine Similarity for pairwise paragraph similarity.
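
A pairwise comparison of this kind amounts to building a symmetric similarity matrix over the paragraph embeddings. The sketch below assumes the embeddings are already available as lists of floats; `pairwise_cosine` is a hypothetical helper name, not a function from the notebook.

```python
import math


def pairwise_cosine(vectors):
    """Return an n x n matrix whose [i][j] entry is the cosine similarity of vectors i and j."""
    unit = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        # Zero vectors stay zero, so their similarities come out as 0.0.
        unit.append([x / norm for x in vec] if norm else [0.0] * len(vec))
    n = len(unit)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = sum(a * b for a, b in zip(unit[i], unit[j]))
            matrix[i][j] = matrix[j][i] = score  # cosine similarity is symmetric
    return matrix
```

Normalizing each vector once reduces every comparison to a dot product. With NumPy, the same matrix is the product of the row-normalized embedding matrix with its own transpose.
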
- The dataset is organized into two sheets:
  - Train Questions
  - Test Questions
To run the v1_Semantic Paragraph Similarity Scoring.ipynb notebook, install the required Parrot library:
pip install git+https://github.com/PrithivirajDamodaran/Parrot_Paraphraser.git