
[RFC] 🦛 Roadmap for Q1 2025 #123

Open
2 of 29 tasks
shreyashnigam opened this issue Jan 2, 2025 · 0 comments
Labels
roadmaps Chonkie Roadmaps!

shreyashnigam (Collaborator) commented Jan 2, 2025

"Every mighty hippo's journey begins with a single CHONK!" - the wisest person to ever live, probably

Hey Chonkers! 🦛

Your favorite hippo has some big plans for Q1 2025! After swimming through your feedback and munching on some feature requests, we've got a CHONK-tastic roadmap for you!

For Q1 2025, we want to focus on the following eight themes. Each theme is discussed in detail in its own section.

  1. 🚀 Add new core features!
  2. 🪓 Add more Chunkers!
  3. ✨ Add more Refineries
  4. 📄 Support for more document formats
  5. 🤝 Add more integrations
  6. ⚡ Improve Chonkie efficiency
  7. 🌐 Chonkie as a service
  8. 👥 OSS Community

We'd love your feedback on this roadmap! If you've got ideas or want to contribute, drop them in the comments below! Remember, even tiny hippos can make big splashes!

🚀 New Core Features

Chonkie plans to add a few core features to its API that allow for seamless chunking~

  • Add initial support for Chomp (Chonkie’s Multi-step Pipeline)
  • Add initial support for Pre-chunkers (Document ingestion support)
  • Add initial support for Genie (Generative model Interaction Engine)
  • Add initial support for Porters (Ease of exporting chunks)

🪓 Add New Chunkers

Chunking is at the core of Chonkie. In addition to the 7 chunking techniques it already supports, we hope to add a few more based on the latest research:

  • 🪆 Recursive Chunking (Done early in v0.4.0!)
  • 🔀 CrossEncoderChunker
  • 🤵🏻‍♂️ Propositional/Agentic Chunker
  • 🪓 LumberChunker

✨ Add New Refineries

Even more means to refine your chunks further!

  • Add ContextualRefinery (using Genie)
  • Introduce an EmbeddingRefinery to generate embeddings for the chunks

📄 Support For More Document Formats

Currently, Chonkie fully supports only .txt and other text-like formats. This quarter, we hope to add support for more popular formats. In order of priority, we will be working to:

  • Add initial support for Pre-chunkers (Document ingestion)
  • Support Markdown (.MD and .MDX)
  • Support PDF
  • Support HTML
  • Support JSON

🤝 Chonkie + Your Favorite Service

Integrations with different services help Chonkie easily use embeddings (and, in the future, generative models) for chunking:

⚡ Enhance Chonkie Performance

In addition to speed and space usage, we also want to optimize Chonkie’s memory usage:

  • Reduce Chonkie’s memory usage during batch chunking
  • Run memory usage tests on Chonkie across all chunking techniques
  • Add support for stream=True to reduce peak memory usage
  • Add support for async embed and chunk operations for APIs

🌐 Chonkie As A Service

To enable the use of Chonkie in live ingestion pipelines, we want to provide Chonkie as a service that watches a document source for changes and posts chunks of modified or new documents automatically.

  • Create watchers for data sources such as AWS S3 and GCP Bigtable
  • Add support for posting chunk results to popular vector databases
  • Enable live chunking through Chonkie running on a hosted environment

👥 OSS Community

  • Add more code examples for the chunkers and core features!
  • Develop better documentation for onboarding new contributors
  • Create more “good first issues”
  • Support new contributors through peer mentorship

Chonkie is a friendly hippo! If you've got ideas not covered in this roadmap and want to contribute, drop them in the comments below! Remember what Mama Hippo always says: "It takes a village to raise a CHONK!"
