One of the Wolfram/Chatbook developers here.

I'm happy to answer questions or accept feedback about the new functionality.

We're very excited about the potential of Wolfram technology + LLMs, and we've got a number of interesting projects underway in this area. Stephen's other recent blog posts linked at the top of the Chat Notebooks post provide a nice tour.

The Wolfram/Chatbook[1] package mentioned in the post is freely available for any Wolfram 13.2 users. It's also open source and available on GitHub[2].

[1]: https://paclets.com/Wolfram/Chatbook

[2]: https://github.com/WolframResearch/Chatbook




What is the chat context length? Or, to be precise:

1. How long can a single chat input be?

2. If I am using a chat notebook with several previous inputs, when entering some new input below all of those, is it just appending all previous inputs together and sending the entire thing to the LLM? If so, how long can this be? Will it summarize if it exceeds that length?

3. What model is this using? GPT-3.5 or GPT-4 or something else?

4. What plans do you have next to integrate AI into Wolfram products?


https://github.com/WolframResearch/Chatbook/blob/ba3e783602f... should help answer some of those. I'm on a mobile device, so it's hard to write much, but it's OpenAI's GPT-3.5 at least.


1, 2. In the Chat Notebooks implementation, we aren't imposing any limits of our own. We currently support OpenAI's API, and the only limits are those inherent to the model being used (i.e. GPT-3.5 or GPT-4).

When you have a sequence of alternating 'ChatInput' and 'ChatOutput' cells, those are sent in order to the chat API as individual messages, with "role" -> "user" and "role" -> "assistant" respectively. They aren't, for example, concatenated into a single message.

The one exception is Chatbook's implementation of Personas. A Persona is defined (roughly speaking) as a "system" message that gets prepended as the first message in the conversation on each 'ChatInput' evaluation, and in some cases it is concatenated with other chat cells that have been set to "role" -> "system".
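To make that concrete: for a notebook with a Persona and a couple of prior chat cells, the list of messages sent to the API looks roughly like this (a schematic sketch; the placeholder strings are just illustrative):

    {
      <|"role" -> "system",    "content" -> "(persona prompt)"|>,
      <|"role" -> "user",      "content" -> "(first ChatInput cell)"|>,
      <|"role" -> "assistant", "content" -> "(first ChatOutput cell)"|>,
      <|"role" -> "user",      "content" -> "(new ChatInput cell being evaluated)"|>
    }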

Chatbooks don't currently do automatic behind-the-scenes summarization, but we've experimented with some possible approaches. There are UI and UX questions around how we would present automatic summarization that I think need more consideration.

3. Currently we support the OpenAI Chat API, and all models supported by that API. However, Chatbook currently only implements quick UI access to the GPT-3.5 and GPT-4 models. If you want to use a different OpenAI model in a particular notebook, you can set the relevant setting by evaluating:

    CurrentValue[
        EvaluationNotebook[],
        {TaggingRules, "ChatNotebookSettings", "Model"}
    ] = "gpt-3.5-turbo-0301"

Now all subsequent 'ChatInput' evaluations in that notebook will inherit that "Model" setting (which can itself be overridden on a per-cell basis).
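For a per-cell override, you can write the same setting into an individual cell's tagging rules. Something along these lines should work (a sketch; I'm assuming here that the cell-scoped setting lives under the same "ChatNotebookSettings" path):

    (* override the "Model" setting for the first ChatInput cell in the notebook *)
    cell = First[Cells[EvaluationNotebook[], CellStyle -> "ChatInput"]];
    CurrentValue[cell, {TaggingRules, "ChatNotebookSettings", "Model"}] = "gpt-4"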

Our focus has been on flexible and easily user-extensible support for Personas (for example, developing the Prompt Repository[1]), but we're also planning to add built-in support for additional foundation models, and greater customizability so that new models are easy to add and use from the Chat Notebooks interface.

[1]: https://resources.wolframcloud.com/PromptRepository/

4. We've got a wide range of exciting projects coming down our R&D pipeline, and things are changing rapidly.

In some sense more fundamental than the Chat functionality in notebooks is our work integrating LLM-related functionality more broadly into the Wolfram Language: e.g. LLMFunction[..], LLMSynthesize[..], LLMPrompt[..], LLMTool[..] (all part of the Wolfram/LLMFunctions paclet[2]), where we're putting the power of LLMs directly in the hands of Wolfram users.
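To give a flavor of those (a rough sketch based on the paclet's documented usage; the install/loading steps may differ by version, and it assumes your OpenAI API key is already configured):

    (* get and load the LLMFunctions paclet *)
    PacletInstall["Wolfram/LLMFunctions"];
    Needs["Wolfram`LLMFunctions`"];

    (* a templated LLM call: the `` slot is filled with the argument *)
    pluralize = LLMFunction["Give the plural form of the word: ``"];
    pluralize["mouse"]    (* e.g. "mice" *)

    (* free-form text synthesis *)
    LLMSynthesize["Summarize the Wolfram Language in one sentence."]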

Treating LLMs as "just" another feature area alongside the many existing capabilities of Wolfram makes it easy for ourselves and our users to get the most out of what LLMs provide. So stay tuned, more is definitely coming :)

[2]: https://paclets.com/Wolfram/LLMFunctions


What is your sense on how well current AI models can handle mathematics, vs human language? Is hallucinating or inventing info a problem with math too? Do you guys do anything to minimize, mitigate, or solve it?

And do you think current AI models will be able to solve as-yet-unsolved problems in math and physics, perhaps with increasingly large datasets or augmentations like Tree of Thoughts or process supervision? Or do you think it will require a fundamentally different model?


As someone who has been using GPT-4 and Claude with Jupyter notebooks for the last month, I'll say that sometimes I am surprised by how well they can understand and segment a problem. Other times it is obvious I'm asking too much of them, because no amount of clever reframing gets them to a right answer.

A surprising win was how much luck I had talking through an issue and getting correct, functional one-shot time-grid and 3D plots, including extra complications in the calculation of the axes.

A recent loss was trying to get it to generate a script that could optimize the gravity coefficient in the Reddit algorithm based on target timeframes and historic data. It offered salvageable output if you pieced together several attempts and reframings, but it never pulled things over the finish line.


Not even humans with PhDs can do physics research without a lab. How could a model with only a small context window and no body do that?


What should a lab have, today, in the context of research on topics like string theory, that a simple office with a desk would not?

Genuinely curious.


Most of a researcher's work is working with formulas and reading papers. Even if you had a lab, you'd still need to think long and hard about what exactly you are trying to measure, and then design the experiment on paper before it could be carried out.

So it's possible, in theory, that an AI-in-a-box could do the hardest parts of science aside from experiments. Whether modern LLMs are any good at it is a different question.


Obviously I’m talking about mathematical/computational physics, not experimental physics. I don’t expect the AI to build itself a tokamak to run experiments on.


> There’s another important difference too. In an ordinary Notebook there’s a “temporal thread of evaluation” in which inputs and outputs appear in the sequence they’re generated in time

I've always found this kinda confusing, and the "evaluate everything above" approach used by the LLM cells much more intuitive, but having the two types of cells behave so differently brings it back to confusing and asking for trouble. Do you think it would make sense to have an option to force all cells to work that way?


Hoo boy that's a long and complicated question. I think there might actually be an option you can set on cell groups, or perhaps at the notebook level, to force re-evaluation of everything above every time you evaluate a cell (which would be the closest one could do to implementing the same behavior for kernel evaluations as for chat evaluations).

Why do I, who would have been the one to implement such a feature, not even remember whether it exists or not? Because if it does, it's horribly useless. It's just rarely what you actually want, and if you do want to do that, you can simply select the cell group, or command-A select the whole notebook, and use shift-return to evaluate everything. That's something I do regularly, but only when I need to, and it would be horrible if it tried to do that every time.

I think it was Maple that tried such an interface. (Except theirs was 2-dimensional: you could move equations/definitions around on the screen, and anything below or to the right would use definitions from above or to the left of it. That interface was one of the things that convinced me this was a bad idea.)
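If you want the programmatic version of that, I believe the front end token behind the Evaluate Notebook menu command does it (it re-evaluates the whole notebook, not just the cells above):

    (* same as Evaluation > Evaluate Notebook: re-evaluates every cell in the notebook *)
    FrontEndTokenExecute[EvaluationNotebook[], "EvaluateNotebook"]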

The situation with GPTs is different: they do not have the equivalent of kernel state. It took me a while to believe it's actually true, but they have infinite amnesia: every time you send an API call, or submit a new prompt in their web interface, the model is starting completely from scratch with no state or memory whatsoever of anything that's happened before.

The fact that it looks like you're having an interactive conversation in which it knows what you said before is an illusion, created by re-submitting the entire conversation, all the way back to the start, with every new prompt.

Given that reality, and the way people seem to want to interact with these things, the use of a physical context rather than a temporal one seemed most natural. We did talk about implementing our own temporal context mechanism to artificially make the LLM appear to have time-based internal state, like a kernel, but I personally hated the idea, and we never implemented it.

Now, it's possible that at some point these systems will in fact start having internal state (i.e. you have your own personal LLM running on a wad of gum stuck behind your ear, and as you interact with it it constantly fine-tunes its weights, or something). At that point we might regret the decision, but I feel like this whole world is so unstable and rapidly evolving that it made sense for us to do what's right for now, and worry about the future, if there is one, during that future.

Theodore


Will the notebook feature allow branching out from the linear chain of chats, or even reverting the chain to an earlier point? It wasn't clear from the post.


There isn't a screenshot of this in Stephen's blog, but he does mention[1] the ability to evaluate a single 'ChatInput' multiple times, which preserves all previous evaluations and allows you to tab through them to pick the best one. (You can see a screenshot of what the arrow tab switcher looks like on the left side of this[2] example in the Chatbook README.md).

One of the nice advantages of a notebook interface to LLMs is that you can easily go back and edit and rearrange the chat history in the normal ways that any cell in a notebook can be edited, moved, deleted, etc.

So e.g. if you want to go back to an earlier point, you can simply move your cursor between cells at an earlier position in the "physical" chat history, insert a new chat cell, and you're done.

[1]: https://writings.stephenwolfram.com/2023/06/introducing-chat...

[2]: https://github.com/WolframResearch/Chatbook#generate-immedia...


That's fantastic; it seems like such an essential feature that it should have been part of the regular OpenAI UI at launch.


What LLM are you using in the backend? Is this an OpenAI model or a model trained by Wolfram?


With the article mentioning that you need an OpenAI API key, and the section about personas showing a screenshot of a configuration window offering a choice between GPT-3.5 and GPT-4, I'm going to guess that this version uses, by default, the usual GPT models via the OpenAI API.


Can this be released as a plain ipynb add-in/extension?


That’s already a thing, and works rather well: https://github.com/jupyterlab/jupyter-ai

The repo README doesn’t make it obvious how it works, but if you go to the Chat Interface portion of the docs it’s illustrated: https://jupyter-ai.readthedocs.io/en/latest/users/index.html...



