
Feature: Add a camera and microphone so TARS can see you and hear you. #2

Open
rkeshwani opened this issue Aug 11, 2024 · 7 comments

@rkeshwani

GPT models are now multi-modal, so it would be nice if the CAD file had a spot where a camera could be connected. Same goes for the microphone.
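
For the software side, here is a rough sketch of what the camera hookup could look like once a spot exists, assuming OpenCV for capture and a vision-capable GPT model (the camera index and model name are assumptions, not anything from this repo):

```python
# Minimal sketch: grab one frame with OpenCV and send it to a
# vision-capable model via the OpenAI chat API.
import base64
import cv2  # pip install opencv-python
from openai import OpenAI

cap = cv2.VideoCapture(0)   # first attached camera (assumption)
ok, frame = cap.read()
cap.release()
assert ok, "camera capture failed"

# Encode the frame as JPEG and base64 for the API.
_, jpeg = cv2.imencode(".jpg", frame)
b64 = base64.b64encode(jpeg.tobytes()).decode()

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",    # any vision-capable model would do
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What do you see in front of you?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```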

@poboisvert
Owner

You can find tutorials on YouTube on how to extrude the "spots" required to add a microphone and camera.

@JFerguson576

JFerguson576 commented Aug 12, 2024 via email

@rkeshwani
Author

You could use function calling from large language models to call Python functions that connect to movement and other functionality. For voice, I suggest taking a look at https://github.com/coqui-ai/TTS.
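
A minimal sketch of what that could look like with OpenAI-style tool calling; the `move_servo` helper and its parameters are hypothetical placeholders for whatever servo code TARS actually uses:

```python
# Hedged sketch of LLM function calling for movement, using the OpenAI
# Python SDK's tool-calling interface.
import json
from openai import OpenAI

def move_servo(servo_id: int, angle: int) -> str:
    # Placeholder: real code would drive a PWM/servo controller here.
    return f"servo {servo_id} moved to {angle} degrees"

tools = [{
    "type": "function",
    "function": {
        "name": "move_servo",
        "description": "Rotate one of TARS's servos to a target angle.",
        "parameters": {
            "type": "object",
            "properties": {
                "servo_id": {"type": "integer"},
                "angle": {"type": "integer", "minimum": 0, "maximum": 180},
            },
            "required": ["servo_id", "angle"],
        },
    },
}]

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # any tool-calling-capable model
    messages=[{"role": "user", "content": "Raise the left arm halfway."}],
    tools=tools,
)

# Dispatch whatever tool calls the model requested.
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(move_servo(**args))
```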

@rkeshwani
Author

@poboisvert I'll take a look; I don't have much recent experience with CAD. Do you have any free software suggestions? I have FreeCAD installed, but I find the navigation a little clunky. Also, I was unable to download the file linked here, though I was able to access the original. At first glance it seems the hands are only partially completed, but that might be due to my own ignorance of how the CAD software works. I see the model is built piecemeal; for example, I have no idea where the smaller servos go, and the arms appear partially completed but not linked to any servo. I want to use aluminum for the outside and plastic for the inside, but I'm trying to figure out the best way to lay out the internal components.

@SAMSAMPOP

I too am struggling with the code for the TARS ChatGPT integration. I'm currently working through the Python scripts. I have the internals assembled and am just calling the tars_runner.py file. I did get the servos working, but they've stopped for some reason. Anyway, if anyone has success with the TARS voice and is happy to share, I would be super grateful. Thank you.

@rkeshwani
Author

I've got a working prototype of the voice-to-text-to-AI pipeline, except for the TARS voice. I found this library that could be used, but I'm unsure of the copyright rules around voice clips: https://docs.cartesia.ai/getting-started/using-the-api

For the microphone, I'm using what I have for now, but here is a potential device:
https://www.amazon.com/DEWIN-Microphone-Portable-Household-Recording/dp/B086DRRP79/ref=sr_1_4?crid=2MWJ0DR7IZCN3&dib=eyJ2IjoiMSJ9.mMEXdxDyLwei6orkRikf2i9utuskE-QfhPpD5qbiqOg8TilnPwnQWio-JE7UqNmZ4KMpNg4CTbgnR_sOPbYEW0rpVCSI4gf2ROEi_2Lnisc32GCPYuCJCNRI8uYeHA2rDAiqEJzS2wvM81L5FafZ0ok0pGnLtmjW-Rkdi4_BQUleUct-kFcJjY81I7aIJk2dVvDKsyJHUbwChVeKltMqGHL2gSJ-UXe00ycY4L2d_kg.MLqbxDm9ERU84-7O-lVsnN73dl8xhkSy1qp_1YpaiY4&dib_tag=se&keywords=usb+microphone+for+raspberry+pi&qid=1725731111&sprefix=usb+microphone+for+%2Caps%2C135&sr=8-4

Use PyAudio to send voice to https://console.groq.com/docs/speech-text.
If you have a powerful enough board, you could run the tiny Whisper model locally on-device.
Then send the transcript to your favorite LLM, and send the LLM's reply text to Cartesia.
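
A rough sketch of that pipeline, assuming the Groq Python SDK (`pip install groq`) and PyAudio; the model names, the fixed five-second recording, and the printed stand-in for the Cartesia step are all assumptions:

```python
# Record mic audio -> Groq Whisper transcription -> LLM reply.
import wave
import pyaudio
from groq import Groq

RATE, SECONDS, CHUNK = 16000, 5, 1024

# 1. Capture a few seconds of microphone audio to a WAV file.
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream(); stream.close(); pa.terminate()

with wave.open("speech.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

client = Groq()  # reads GROQ_API_KEY from the environment

# 2. Speech-to-text via Groq's hosted Whisper endpoint.
with open("speech.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(
        file=("speech.wav", f.read()),
        model="whisper-large-v3",   # assumed model name; see Groq docs
    ).text

# 3. Send the transcript to an LLM (Groq hosts chat models too).
reply = client.chat.completions.create(
    model="llama-3.1-8b-instant",   # assumed model name
    messages=[{"role": "user", "content": transcript}],
).choices[0].message.content

# 4. `reply` would then go to Cartesia (or another TTS) for the TARS voice.
print(reply)
```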

The https://github.com/coqui-ai/TTS library I mentioned above is too heavy and won't run on my SBC board, but it could run locally if you have an NVIDIA Jetson Nano or a Coral TPU.
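
For comparison, running Coqui TTS locally is only a few lines if the hardware can handle it; the model name below is just one of the project's published English voices, not a TARS-specific choice:

```python
# Minimal local-TTS sketch using Coqui TTS (pip install TTS).
# Any model from TTS().list_models() can be substituted.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(text="Humor setting at seventy-five percent.",
                file_path="tars_reply.wav")
```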

Once I have something more refined and a camera working, I will create a pull request.

@pyrater

pyrater commented Nov 24, 2024
