OpenAI stream response missing the first token

OpenAI's streaming response has changed in a way that affects how it must be handled and parsed, so a more robust parsing approach is needed.

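// Request a streamed chat completion; responseType: 'stream' makes completion.data a readable stream.
const completion = await openai.createChatCompletion({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: template }],
    stream: true,
    temperature: 0.7
}, { responseType: 'stream' });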

const stream = completion.data;

stream.on('data', (chunk) => {
    // Events in the stream are separated by a blank line.
    const payloads = chunk.toString().split("\n\n");
    for (const payload of payloads) {
        if (payload.includes('[DONE]')) return;
        console.log(payload);
    }
});

Here, payloads is the array obtained by splitting a single chunk of the OpenAI stream. Previously, every chunk contained only complete events:

chunk-1

[
  `data: {"id":"chatcmpl-8Jh3daCazSSQaONS2m3x8ThuZ7NXw","object":"chat.completion.chunk","created":1699704437,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"Hell"},"finish_reason":null}]}`,
  ''
]

chunk-2


[
  'data: {"id":"chatcmpl-8Jh3daCazSSQaONS2m3x8ThuZ7NXw","object":"chat.completion.chunk","created":1699704437,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"o world"},"finish_reason":null}]}',
  ''
]

But after the change on OpenAI's side, the strings in payloads can be split across two chunks:

chunk-1

[
  `data: {"id":"chatcmpl-8Jh3daCazSSQaONS2m3x8ThuZ7NXw","object":"chat.completion.chunk","created":1699704437,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"Hell"},"finish_reason":null}]}`,
  'data: {"id":"chatcmpl-8Jh3daCazSSQaONS2m3x8ThuZ7NXw","o'
]

chunk-2

[
  'bject":"chat.completion.chunk","created":1699704437,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"o world"},"finish_reason":null}]}',
  ''
]

So parsing code that expects every payload to be a complete JSON string, for example by passing it to JSON.parse, may fail on a split payload.
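To make the failure concrete, here is a minimal sketch of what happens when each half of a split payload is parsed on its own. The fragments are shortened versions of the ones shown above, and tryParse is a hypothetical helper written for this illustration, not part of the openai library.

```javascript
// Hypothetical fragments modeled on the split payload shown above.
const firstHalf = 'data: {"id":"chatcmpl-8Jh3daCazSSQaONS2m3x8ThuZ7NXw","o';
const secondHalf = 'bject":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"o world"},"finish_reason":null}]}';

// Hypothetical helper: strip the "data: " prefix and parse the rest.
// Returns null instead of throwing when the text is not valid JSON.
function tryParse(payload) {
  const json = payload.replace(/^data: /, '');
  try {
    return JSON.parse(json);
  } catch (error) {
    return null;
  }
}

console.log(tryParse(firstHalf));  // null: the JSON is cut off
console.log(tryParse(secondHalf)); // null: the JSON starts mid-key

// Only the two halves joined together form a valid event.
const whole = tryParse(firstHalf + secondHalf);
console.log(whole.choices[0].delta.content); // o world
```

Neither fragment is valid JSON by itself, so a handler that parses each payload independently would drop the token carried by the split event.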

The point is that we only need the content; the other fields can be ignored.

// Capture only the "content" value of the delta in a payload.
const regex = /"choices":.*?"delta":\{.*?"content":"(?<newToken>.*?)"/s;

stream.on('data', (chunk) => {
    const payloads = chunk.toString().split("\n\n");
    for (const payload of payloads) {
        if (payload.includes('[DONE]')) return;
        const match = regex.exec(payload);
        // Skip payloads with no match or an empty token.
        if (match && match.groups.newToken) {
            try {
                const token = match.groups.newToken;
                console.log(token);
                res.write(token);
            } catch (error) {
                console.log(`Error writing token from ${payload}.\n${error}`);
            }
        }
    }
});

const regex = /"choices":.*?"delta":\{.*?"content":"(?<newToken>.*?)"/s;

This regular expression extracts a single value from a payload string. It looks for "choices":, lazily skips ahead to "delta":{, and then lazily skips ahead to "content":". Everything between "content":" and the next " is captured in a named group called newToken. The s flag lets . match newline characters as well.

As noted above, we only need the content, and the regex extracts exactly that while ignoring all other fields.
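As a quick check of the behaviour described above, the sketch below runs the same regex against a complete payload and against the two halves of a split one. The sample strings are shortened, illustrative payloads, and extractToken is a hypothetical helper introduced only for this example.

```javascript
const regex = /"choices":.*?"delta":\{.*?"content":"(?<newToken>.*?)"/s;

// Hypothetical helper: return the captured token, or null if there is no match.
function extractToken(payload) {
  const match = regex.exec(payload);
  return match ? match.groups.newToken : null;
}

// Shortened sample payloads modeled on the chunks shown earlier.
const complete = 'data: {"id":"x","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hell"},"finish_reason":null}]}';
const headOnly = 'data: {"id":"x","o';
const tailOnly = 'bject":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"o world"},"finish_reason":null}]}';

console.log(extractToken(complete)); // Hell
console.log(extractToken(headOnly)); // null
console.log(extractToken(tailOnly)); // o world
```

The tail of the split payload still yields its token because the split happened before "choices". The captured text is the raw JSON string body, so escape sequences such as \n arrive unescaped, and a token containing an escaped quote would be cut short at that quote.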
