This repository showcases practical implementations of various Natural Language Processing (NLP) and Generative AI techniques, spanning foundational methods such as token counting and bag-of-words through advanced architectures such as Seq2Seq and BERT.
- Counting Tokens: Tokenizing text and counting tokens for preprocessing tasks.
- Bag-of-Words & TF-IDF: Creating document-term matrices and extracting meaningful features from text.
- Word2Vec: Generating dense vector representations that capture word semantics and relationships.
- LSTM (Long Short-Term Memory): Sequence modeling for tasks like text generation and sentiment analysis.
- Autoencoders: Compressing and reconstructing text data for unsupervised learning applications.
- Seq2Seq (Sequence-to-Sequence): Building encoder-decoder models for tasks such as machine translation and text summarization.
- BERT (Bidirectional Encoder Representations from Transformers): Fine-tuning state-of-the-art models for advanced NLP applications.
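The token-counting step above can be sketched with the standard library alone. This is a minimal illustration, assuming a simple regex tokenizer and a made-up sample sentence, not the repository's actual preprocessing code:

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits/apostrophes: a deliberately
    # simple word-level tokenizer, not a subword tokenizer like BPE.
    return re.findall(r"[a-z0-9']+", text.lower())

text = "The quick brown fox jumps over the lazy dog. The dog sleeps."
tokens = tokenize(text)
counts = Counter(tokens)

print(len(tokens))            # total number of tokens
print(counts.most_common(2))  # most frequent token types
```

Production pipelines typically swap the regex for a library tokenizer (e.g. NLTK or a Hugging Face tokenizer), but the count-after-split pattern stays the same.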
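The Bag-of-Words and TF-IDF bullet can be made concrete with a small pure-Python example. The three toy documents and the plain `log(N / df)` IDF formula are illustrative assumptions (libraries such as scikit-learn use a smoothed variant):

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})

# Bag-of-words: one row of raw term counts per document.
bow = [[Counter(doc)[w] for w in vocab] for doc in tokenized]

# TF-IDF with tf = raw count and idf = log(N / df),
# so terms appearing in many documents are down-weighted.
N = len(docs)
df = {w: sum(1 for doc in tokenized if w in doc) for w in vocab}
idf = {w: math.log(N / df[w]) for w in vocab}
tfidf = [[Counter(doc)[w] * idf[w] for w in vocab] for doc in tokenized]
```

Note how "the" (which appears in two of the three documents) gets a lower IDF than "cat", which is exclusive to the first document.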
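The Word2Vec idea, predicting context words from a target word so that co-occurring words get similar vectors, can be sketched as a tiny skip-gram trainer in NumPy. The toy corpus, window size, and full-softmax objective are simplifying assumptions; real Word2Vec uses negative sampling or hierarchical softmax on large corpora:

```python
import numpy as np

corpus = ("the cat sat on the mat the dog sat on the rug "
          "a cat and a dog sat on a mat").split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8

rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, (V, D))   # target-word embeddings
W_out = rng.normal(0.0, 0.1, (D, V))  # output (context-prediction) weights

# (target, context) training pairs within a +/-2 word window.
window = 2
pairs = [(w2i[corpus[i]], w2i[corpus[j]])
         for i in range(len(corpus))
         for j in range(max(0, i - window), min(len(corpus), i + window + 1))
         if j != i]

lr, losses = 0.05, []
for _ in range(30):
    total = 0.0
    for t, c in pairs:
        h = W_in[t].copy()
        scores = h @ W_out
        p = np.exp(scores - scores.max())
        p /= p.sum()
        total += -np.log(p[c] + 1e-12)    # softmax cross-entropy loss
        grad = p
        grad[c] -= 1.0                    # dLoss/dScores
        W_in[t] -= lr * (W_out @ grad)    # update target embedding
        W_out -= lr * np.outer(h, grad)   # update output weights
    losses.append(total / len(pairs))
```

After training, rows of `W_in` are the dense word vectors; in practice you would use a library such as gensim rather than this hand-rolled loop.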
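The LSTM bullet comes down to one recurrence. Here is a single LSTM cell written out in NumPy so the gate structure is visible; the dimensions and random parameters are illustrative, and a real model would come from a framework such as PyTorch or Keras:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters of the four gates
    (input, forget, output, candidate) along the first axis."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b       # shape (4 * H,)
    i = sigmoid(z[0:H])              # input gate
    f = sigmoid(z[H:2 * H])          # forget gate
    o = sigmoid(z[2 * H:3 * H])      # output gate
    g = np.tanh(z[3 * H:4 * H])      # candidate cell state
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c

rng = np.random.default_rng(1)
X_dim, H_dim = 3, 4
W = rng.normal(0.0, 0.5, (4 * H_dim, X_dim))
U = rng.normal(0.0, 0.5, (4 * H_dim, H_dim))
b = np.zeros(4 * H_dim)

h, c = np.zeros(H_dim), np.zeros(H_dim)
sequence = rng.normal(0.0, 1.0, (5, X_dim))   # a toy 5-step input sequence
for x in sequence:
    h, c = lstm_cell(x, h, c, W, U, b)
```

The forget gate `f` is what lets the cell state carry information across long sequences, which is the property text generation and sentiment models rely on.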
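The autoencoder bullet can be illustrated by compressing bag-of-words vectors through a low-dimensional bottleneck and reconstructing them. This linear, gradient-descent sketch uses made-up toy documents and a 2-dimensional code; real text autoencoders add nonlinearities and deeper stacks:

```python
import numpy as np

docs = ["the cat sat", "the dog sat", "a cat and a dog", "the mat"]
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], float)

V, k = X.shape[1], 2
rng = np.random.default_rng(0)
E = rng.normal(0.0, 0.1, (V, k))   # encoder weights: counts -> code
Dm = rng.normal(0.0, 0.1, (k, V))  # decoder weights: code -> counts

lr, losses = 0.2, []
for _ in range(500):
    Z = X @ E                      # encode into the bottleneck
    X_hat = Z @ Dm                 # decode / reconstruct
    err = X_hat - X
    losses.append((err ** 2).mean())
    G = 2.0 * err / X.size         # dLoss/dX_hat (mean squared error)
    Dm_grad = Z.T @ G
    E_grad = X.T @ (G @ Dm.T)
    E -= lr * E_grad
    Dm -= lr * Dm_grad
```

The 2-dimensional codes in `Z` are an unsupervised, compressed representation of the documents; the falling reconstruction loss shows the network learning it.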
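The Seq2Seq bullet is, structurally, "encode the source into a context vector, then decode the target one token at a time." The sketch below shows only that data flow: the weights are random and untrained, the simple tanh RNN and the vocabulary sizes are assumptions, and a real system would train the whole pipeline end to end:

```python
import numpy as np

rng = np.random.default_rng(0)
V_src, V_tgt, H = 6, 7, 8        # toy vocab sizes and hidden size
SOS, EOS = 0, 1                  # special start/end target tokens

E_src = rng.normal(0.0, 0.5, (V_src, H))
E_tgt = rng.normal(0.0, 0.5, (V_tgt, H))
W_enc = rng.normal(0.0, 0.5, (H, H))
W_dec = rng.normal(0.0, 0.5, (H, H))
W_out = rng.normal(0.0, 0.5, (H, V_tgt))

def encode(src_ids):
    h = np.zeros(H)
    for t in src_ids:                        # simple tanh RNN encoder
        h = np.tanh(E_src[t] + W_enc @ h)
    return h                                 # final state = context vector

def decode(context, max_len=10):
    h, tok, out = context, SOS, []
    for _ in range(max_len):
        h = np.tanh(E_tgt[tok] + W_dec @ h)  # condition on previous token
        tok = int(np.argmax(h @ W_out))      # greedy next-token choice
        if tok == EOS:
            break
        out.append(tok)
    return out

translation = decode(encode([2, 3, 4, 5]))
```

Modern implementations replace the RNNs with attention-based Transformers, but the encode-then-autoregressively-decode loop is the same.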
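Fine-tuning BERT typically goes through the Hugging Face Transformers library. The sketch below shows the usual shape of a classification fine-tune; the two-sentence dataset, label scheme, learning rate, and step count are placeholders, it downloads pretrained weights on first run, and it is not the repository's actual training script:

```python
# Requires: pip install torch transformers  (downloads bert-base-uncased weights)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)                 # adds a fresh classification head

texts = ["a great movie", "a terrible movie"]  # toy labelled data (assumption)
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                             # a few fine-tuning steps
    optimizer.zero_grad()
    out = model(**batch, labels=labels)        # returns loss and logits
    out.loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
```

Because the encoder is pretrained, a few epochs on a small labelled set are often enough; only the classification head starts from scratch.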