How I embedded ChatGPT into Telegram.

ChatGPT is something everyone is playing with right now, with varying degrees of success. It's insanely interesting, creepy, and weird. But there's no way around it, it's already here =) The main inconvenience of using ChatGPT is the website. We'd rather bring it all into our cozy Telegram, right? And with the ability to send requests not only as text but also by voice. Let's get to it.

Base

Python is ideal for bots. We use Poetry and add the project's dependencies:

[tool.poetry.dependencies]
python-telegram-bot = {version = "^20.0a4", allow-prereleases = true}
pydantic = "^1.10.2"
loguru = "^0.6.0"
pydub = "^0.25.1"
openai = "^0.27.2"
emoji = "^2.2.0"

The framework of the bot looks like this:

from telegram import Update
from telegram.constants import ParseMode, ChatAction
from telegram.ext import Application, CommandHandler, ConversationHandler, CallbackContext, ContextTypes, MessageHandler, filters

# TOUR_BOT_TOKEN is the bot token issued by @BotFather; load it from your own config
application = Application.builder().token(TOUR_BOT_TOKEN).build()

async def noncommand_handler(update: Update, context: ContextTypes.DEFAULT_TYPE) -> int:
    """ Sending prompt to ChatGPT """
    response = await ChatGPTRequest(prompt=update.message.text) # here we make request to chatgpt
    await update.message.reply_text(response, parse_mode=ParseMode.MARKDOWN)
    return ConversationHandler.END


def new_app() -> None:
    # Command handlers; `commands` is the project's own module (not shown here)
    commands.setup_handlers(application)
    # Every text message that is not a command goes to ChatGPT
    application.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, noncommand_handler))

    # Blocks and polls Telegram for updates until the process is stopped
    application.run_polling()

if __name__ == '__main__':
    new_app()

Prompt to ChatGPT

So, the bot works and receives text, which we can now send as a prompt to ChatGPT.

To do this, we'll use the OpenAI API. First we need to get an API key from the OpenAI website; we save it and send it with every request to the neural network.

With the key in hand, we need to decide how to keep the context between requests. The wonderful thing about ChatGPT is that it holds a conversation: we can ask "When did Napoleon die?", follow up with "And Cleopatra?", and the neural network will understand that the second question is also about a date of death. The API itself does not remember anything between calls, though, so we have to resend the previous messages with every request. Let's make a small class to manage the dialog:

from datetime import datetime


class GPTConversation:
    def __init__(self):
        self.messages = [{"role": "system", "content": "You are a helpful assistant."}]
        self.last_update = datetime.utcnow()

    def add_message(self, role: str, message: str):
        diff = datetime.utcnow() - self.last_update
        if diff.total_seconds() > 60 * 2:
            print("Reset conversation because context timeout reached")
            self.reset_conversation()

        self.messages.append({
            "role": role,
            "content": message
        })
        self.last_update = datetime.utcnow()

    def reset_conversation(self):
        self.messages = [{"role": "system", "content": "You are a helpful assistant."}]

There are two interesting points here. First, we set a role for the neural network with the system message: we tell it who it is, be it a friend, a rapper, or an astrophysicist. Here it is the default "helpful assistant" :) Each request from the person is then stored with role: user, and each ChatGPT reply with role: assistant, all in one messages array. Sending that array with every request is what preserves the context. Second, the more context we send, the more tokens each request uses and the more it costs. That's why we reset the context if a new request arrives more than a couple of minutes after the previous one, which seems logical to me. The resulting array looks like this:

messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"}
]
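To watch the reset rule work without waiting two minutes, here is a minimal, self-contained sketch of the same idea with an injectable clock. The names TimedConversation, now and timeout are illustrative assumptions and are not part of the bot's code.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

SYSTEM_MESSAGE = {"role": "system", "content": "You are a helpful assistant."}


def utc_now() -> datetime:
    """Default clock: the current time in UTC."""
    return datetime.now(timezone.utc)


class TimedConversation:
    """Hypothetical variant of GPTConversation whose clock can be replaced."""

    def __init__(self, timeout: timedelta = timedelta(minutes=2),
                 now: Callable[[], datetime] = utc_now):
        self.timeout = timeout
        self.now = now
        self.messages = [dict(SYSTEM_MESSAGE)]
        self.last_update = now()

    def add_message(self, role: str, content: str) -> None:
        # Drop the old context if the previous message is older than the timeout.
        if self.now() - self.last_update > self.timeout:
            self.messages = [dict(SYSTEM_MESSAGE)]
        self.messages.append({"role": role, "content": content})
        self.last_update = self.now()


# Simulate a dialog with a fake clock that we advance by hand.
clock = {"t": datetime(2023, 3, 1, 12, 0, 0)}
conv = TimedConversation(now=lambda: clock["t"])

conv.add_message("user", "When did Napoleon die?")
conv.add_message("assistant", "In 1821.")
clock["t"] += timedelta(seconds=30)
conv.add_message("user", "And Cleopatra?")     # within 2 minutes: context kept
kept = len(conv.messages)                      # system + 3 messages

clock["t"] += timedelta(minutes=5)
conv.add_message("user", "What is Telegram?")  # after 5 minutes: context reset
after_reset = len(conv.messages)               # system + 1 message
```

Passing the clock in as a parameter is only a convenience for experiments; the bot itself can keep using the real time.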

Ok, we've got the format of chat requests sorted out, let's deal with the ChatGPT request itself:

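import httpx

# OPENAI_API_KEY is the key obtained from the OpenAI website; load it from your own config

async def openai_conversation(messages: list, timeout=30) -> dict:
    url = "https://api.openai.com/v1/chat/completions"
    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {OPENAI_API_KEY}"}
    data = {
        "model": "gpt-3.5-turbo",
        "max_tokens": 2048,   # upper limit on the length of the reply
        "top_p": 0.75,        # nucleus sampling threshold
        "messages": messages  # the whole dialog so far, system message first
    }
    async with httpx.AsyncClient(timeout=timeout) as client:
        response = await client.post(url, headers=headers, json=data)
        # the decoded JSON body, i.e. a plain dict
        return response.json()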

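Here we can cap the length of the reply with max_tokens and play with top_p. The top_p parameter controls nucleus sampling: the model picks each next token only from the most likely candidates whose combined probability reaches top_p, so lower values give more focused, predictable answers and higher values give more varied ones.

We store the neural network's response in our GPTConversation class to preserve the context for possible follow-up requests and send the response as a Telegram message. Cool!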
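To make top_p less abstract, here is a toy illustration of nucleus sampling over a made-up next-token distribution. The function name nucleus and the probabilities are assumptions for illustration only; the real model works over its whole vocabulary.

```python
def nucleus(probs: dict[str, float], top_p: float) -> list[str]:
    """Return the smallest set of most likely tokens whose total probability reaches top_p."""
    kept, total = [], 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(token)
        total += p
        if total >= top_p:
            break
    return kept


# Made-up probabilities for the next token after "The capital of France is"
next_token = {"Paris": 0.6, "London": 0.2, "Rome": 0.15, "banana": 0.05}

narrow = nucleus(next_token, 0.75)  # only the top candidates remain
wide = nucleus(next_token, 1.0)     # every token stays in play
```

With 0.75 the unlikely tail is cut off before sampling, which is why lower values tend to give more conservative answers.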

gpt = GPTConversation()

async def ChatGPTRequest(prompt: str) -> str:
    gpt.add_message(role="user", message=prompt)
    # show the "typing" status while we wait; CHAT_ID is the id of the chat the bot talks to
    await application.bot.send_chat_action(chat_id=CHAT_ID, action=ChatAction.TYPING)
    response = await openai_conversation(gpt.messages)
    # openai_conversation returns a dict; an error reply has no "choices" key
    if response and response.get("choices"):
        reply_content = response["choices"][0]["message"]["content"].strip()
        gpt.add_message(role="assistant", message=reply_content)
        total_tokens = response["usage"]["total_tokens"]
        reply_comp = f"{reply_content}\n\n_tokens: {total_tokens}, conversation: {len(gpt.messages) - 1}_"
        return reply_comp
    else:
        return "No response from openai"

async def noncommand_handler(update: Update, context: ContextTypes.DEFAULT_TYPE):
    response = await ChatGPTRequest(prompt=update.message.text)
    await update.message.reply_text(response, parse_mode=ParseMode.MARKDOWN)
    return ConversationHandler.END
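Since openai_conversation returns the decoded JSON body, the reply has to be read from a plain dictionary, and the API reports problems such as a bad key or a rate limit as an error object instead of choices. Below is a minimal sketch of defensive parsing. extract_reply is a hypothetical helper, and the sample payloads are hand-written in the shape of a Chat Completions response rather than captured from a real call.

```python
from typing import Optional, Tuple


def extract_reply(payload: dict) -> Tuple[Optional[str], int]:
    """Return (reply text or None, total tokens used) from a chat completion payload."""
    if "error" in payload:
        return None, 0
    choices = payload.get("choices") or []
    if not choices:
        return None, 0
    text = choices[0].get("message", {}).get("content", "").strip()
    tokens = payload.get("usage", {}).get("total_tokens", 0)
    return (text or None), tokens


ok_payload = {
    "choices": [{"message": {"role": "assistant", "content": "  In 1821.\n"}}],
    "usage": {"prompt_tokens": 20, "completion_tokens": 4, "total_tokens": 24},
}
error_payload = {
    "error": {"message": "Incorrect API key provided", "type": "invalid_request_error"}
}

print(extract_reply(ok_payload))     # ('In 1821.', 24)
print(extract_reply(error_payload))  # (None, 0)
```

A None reply can then be turned into a friendly "No response from openai" message instead of an exception in the handler.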

Voice recognition

OpenAI has a separate model for speech recognition, Whisper. Here is what we do to send requests by voice:

1. Save the voice message to disk.
2. Convert it from Telegram's OGG format to a format the API understands, for example MP3.
3. Send the file to the OpenAI Whisper API for speech recognition.
4. Send the recognized text to ChatGPT as a prompt.

Despite a couple of extra steps (format conversion and speech recognition), it all works quickly and does not cause any discomfort. Speech is recognized with impressive accuracy, even when two languages are mixed. Voice queries are fun to dictate and save time on typing.
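The voice handler further down calls openai_transcription, which the article does not show. The Whisper step from the list above might look roughly like the sketch below. It assumes the public transcription endpoint and the whisper-1 model; the helper transcription_form, the placeholder key and the timeout are my own assumptions, not the author's code.

```python
from pathlib import Path
from typing import Optional

OPENAI_API_KEY = "sk-..."  # placeholder; load the real key from your config
TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions"


def transcription_form(language: Optional[str] = None) -> dict:
    """Form fields sent next to the audio file; language is an optional ISO 639-1 hint."""
    form = {"model": "whisper-1"}
    if language:
        form["language"] = language
    return form


async def openai_transcription(audio_file_path: Path, timeout: float = 60) -> dict:
    """Upload an audio file to Whisper and return the decoded JSON, e.g. {"text": "..."}."""
    import httpx  # third-party; imported here so the helper above works without it

    headers = {"Authorization": f"Bearer {OPENAI_API_KEY}"}
    # Voice messages are small, so reading the whole file into memory is fine.
    audio = Path(audio_file_path).read_bytes()
    files = {"file": (Path(audio_file_path).name, audio, "audio/mpeg")}
    async with httpx.AsyncClient(timeout=timeout) as client:
        response = await client.post(
            TRANSCRIPTION_URL, headers=headers, data=transcription_form(), files=files
        )
    response.raise_for_status()
    return response.json()
```

Leaving language unset lets the model detect it on its own, which suits mixed-language speech.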

Ok, let's add a handler for voice messages to the bot:

application.add_handler(MessageHandler(filters.VOICE & ~filters.COMMAND, handle_voice))

Then we save and convert the voice message and send the recognized text to ChatGPT:

from pathlib import Path

from emoji import emojize
from pydub import AudioSegment

async def handle_voice(update: Update, context: CallbackContext):
    def ogg_to_mp3(input_file, output_file):
        # pydub relies on ffmpeg being installed to decode the OGG voice note
        ogg_audio = AudioSegment.from_file(input_file, format="ogg")
        ogg_audio.export(output_file, format="mp3")

    voice = update.message.voice
    file = await context.bot.get_file(voice.file_id)
    Path("temp").mkdir(exist_ok=True)  # make sure the download folder exists
    ogg_path = Path(f"temp/{voice.file_id}.ogg")
    mp3_path = Path(f"temp/{voice.file_id}.mp3")
    # in python-telegram-bot 20.x, download_to_drive() replaces the older File.download()
    await file.download_to_drive(ogg_path)
    ogg_to_mp3(ogg_path, mp3_path)

    # openai_transcription is the helper that sends the MP3 to the Whisper API
    result = await openai_transcription(audio_file_path=mp3_path)
    if result.get('text'):
        # echo the recognized text so the user can see what was heard
        reply_comp = f"{emojize(':microphone:')} {result['text']}"
        await update.message.reply_text(reply_comp)

        response = await ChatGPTRequest(prompt=result['text'])
        await update.message.reply_text(response, parse_mode=ParseMode.MARKDOWN)
        return ConversationHandler.END
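One possible refinement, which is not in the original code: the handler above leaves the .ogg and .mp3 files on disk and runs the blocking conversion on the event loop. The sketch below shows the idea with a temporary directory and a worker thread. transcribe_voice, fake_convert and fake_transcribe are hypothetical stand-ins so the example runs without Telegram, ffmpeg or the OpenAI API.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Awaitable, Callable


def fake_convert(src: Path, dst: Path) -> None:
    """Stand-in for the blocking pydub conversion; it only copies the bytes."""
    dst.write_bytes(src.read_bytes())


async def fake_transcribe(path: Path) -> dict:
    """Stand-in for the Whisper call; reports which file it was given."""
    return {"text": f"{path.suffix} file, {path.stat().st_size} bytes"}


async def transcribe_voice(
    ogg_bytes: bytes,
    convert: Callable[[Path, Path], None],
    transcribe: Callable[[Path], Awaitable[dict]],
) -> str:
    """Convert and transcribe inside a temporary directory that is removed afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        ogg_path = Path(tmp) / "voice.ogg"
        mp3_path = Path(tmp) / "voice.mp3"
        ogg_path.write_bytes(ogg_bytes)
        # Run the blocking conversion in a worker thread so the bot keeps responding.
        await asyncio.to_thread(convert, ogg_path, mp3_path)
        result = await transcribe(mp3_path)
    return result.get("text", "")


text = asyncio.run(transcribe_voice(b"OggS-demo", fake_convert, fake_transcribe))
print(text)  # .mp3 file, 9 bytes
```

In the real bot, convert would be the pydub-based ogg_to_mp3 and transcribe would be the Whisper request.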

The ability to send requests by voice is a great energy saver: the easier it is to make a request, the more often we will use the bot. It saves time and leads to more specific queries, simply because it is easier to add details by voice than to type them. And of course text and voice queries can alternate within one dialog without losing the context between them.

Let's see where all these neural networks will lead us, but for now we are having fun. Good luck to all of us!