I'm happy to answer questions or accept feedback about the new functionality.
We're very excited about the potential of Wolfram technology + LLMs, and we've got a number of interesting projects underway in this area. Stephen's other recent blog posts linked at the top of the Chat Notebooks post provide a nice tour.
The Wolfram/Chatbook[1] package mentioned in the post is freely available for any Wolfram 13.2 users. It's also open source and available on GitHub[2].
What is the chat context length? Or, to be precise:
1. How long can a single chat input be?
2. If I am using a chat notebook with several previous inputs, when entering some new input below all of those, is it just appending all previous inputs together and sending the entire thing to the LLM? If so, how long can this be? Will it summarize if it exceeds that length?
3. What model is this using? GPT-3.5 or GPT-4 or something else?
4. What plans do you have next to integrate AI into Wolfram products?
1, 2. In the Chat Notebooks implementation, we aren't currently imposing any limits of our own. Currently we support OpenAI's API, and the only limits imposed are those inherent to the model being used (i.e. GPT-3.5 or GPT-4).
When you have a sequence of alternating 'ChatInput' and 'ChatOutput' cells, those are sent in-order as individual, respectively, "role" -> "user" and "role" -> "assistant" messages to the chat API. It doesn't, for example, concatenate them into a single message.
The one exception is Chatbook's implementation of Personas, which are defined (roughly speaking) as a "system" message that gets prepended as the first message in the chat conversation on each ChatInput evaluation, and which in some cases is concatenated with other ChatInput cells that have been set to "role" -> "system".
Chatbook's don't currently do automatic behind-the-scenes summarization, but we've experimented with some possible approaches. There are UI and and UX considerations around how we present that possibility of automatic summarization that I think require greater consideration.
3. Currently we support the OpenAI Chat API, and all models supported by that API. However, Chatbook currently only implements quick UI access to the GPT-3.5 and GPT-4 models. If you want to use a different OpenAI model in a particular notebook, you can set the relevant setting by evaluating:
Now all subsequent 'ChatInput' evaluations in that notebook will inherit that "Model" setting (which can itself be overridden on a per-cell basis).
Our focus has been on flexible and easily user-extensible support for Personas (for example, developing the Prompt Repository[1]), but we're also planning to add built-in support for additional foundation models, and greater customizability so that new models are easy to add and use from the Chat Notebooks interface.
4. We've got a wide range of exciting projects coming down our R&D pipeline, and things are changing rapidly.
In some sense more fundamental that Chat functionality in notebooks is our work integrating LLM-related functionality more broadly into the Wolfram language: e.g. LLMFunction[..], LLMSynthesize[..], LLMPrompt[..], LLMTool[..] (all part of the Wolfram/LLMFunctions paclet[2]), where we're putting the power of LLMs directly in the hands of Wolfram users.
Making LLMs "just" a feature area added to the many existing powerful capabilities of Wolfram is making it easy for ourselves and our users to maximally leverage the new capabilities LLMs provide. So stay tuned, more is definitely coming :)
What is your sense on how well current AI models can handle mathematics, vs human language? Is hallucinating or inventing info a problem with math too? Do you guys do anything to minimize, mitigate, or solve it?
And do you think current AI models will be able to solve as yet unsolved problems in math and physics, perhaps with increasingly large datasets or augments like Trees of Thought or process supervision? Or do you think it will require a different fundamental model?
As someone who has been using GPT-4 and Claude working with Jupyter notebooks the last month, I’ll say that sometimes I am surprised by how well they can understand and segment out a problem. Other times it is obvious I’m asking too much of them, because no amount of clever reframing gets them to a right answer.
A surprise win was as much luck I had talking through an issue and getting correct, functional one-shot time grid and 3D plots including extra complications for the calculation of the axes.
A recent lose was trying to get it to generate a script that could optimize the gravity coefficient in the Reddit algorithm based on target timeframes and historic data. It offered salvageable output, if you pieced together several attempts and reframes, but never pulled it over the finish line.
What would should have a lab have, today, in the context of research on topics like string theory? That a simple office with a desk would not have, I mean.
Most of researchers' work is with formulas and reading articles. Even if you had a lab, you'd still need to think long and hard about what exactly you are trying to measure, and then design an experiment on paper before it can be carried out.
So it's possible, in theory, that an AI-in-a-box could do the hardest parts of science aside from experiments. Whether modern LLMs are any good at it is a different question.
Obviously I’m talking about mathematical/computational physics, not experimental physics. I don’t expect the AI to build itself a tokamak to run experiments on.
> There’s another important difference too. In an ordinary Notebook there’s a “temporal thread of evaluation” in which inputs and outputs appear in the sequence they’re generated in time
I've always found this kinda confusing, and the "evaluate everything above" approach used by the LLM cells much more intuitive, but having the two types of cells behave so differently brings it back to confusing and asking for trouble. Do you think it would make sense to have an option to force all cells to work that way?
Hoo boy that's a long and complicated question. I think there might actually be an option you can set on cell groups, or perhaps at the notebook level, to force re-evaluation of everything above every time you evaluate a cell (which would be the closest one could do to implementing the same behavior for kernel evaluations as for chat evaluations).
Why do I, who would have been the one to implement such a feature, not even remember whether it exists or not? Because if it does, it's horribly useless. It's just rarely what you actually want, and if you do want to do that, you can simply select the cell group, or command-A select the whole notebook, and use shift-return to evaluate everything. That's something I do regularly, but only when I need to, and it would be horrible if it tried to do that every time. I think it was Maple that tried such an interface. (Except their's was 2-dimensional: you could move equations/definitions around on the screen, and anything below or to the right would use definitions from above or to the left of it. This interface was one of the things that convinced me that this was a bad idea.)
The situation with GPTs is different: they do not have the equivalent of kernel state. It took me a while to believe it's actually true, but they have infinite amnesia: every time you send an API call, or submit a new prompt in their web interface, the model is starting completely from scratch with no state or memory whatsoever of anything that's happened before.
That fact that it looks like you're having an interactive conversation in which it knowns what you said before is an illusion created by the fact that the entire conversation all the way back to the start is re-submitted with every new prompt.
Given that reality, and the way people seem to want to interact with these things, the use of a physical context rather than a temporal one seemed most natural. We did talk about implementing our own temporal context mechanism to artificially make the LLM appear to have time-based internal state, like a kernel, but I personally hated the idea, and we never implemented it.
Now, it's possible that at some point these systems will in fact start having internal state (i.e. you have your own personal LLM running on a wad of gum stuck behind your ear, and as you interact with it it constantly fine tunes its weights, or something). At that point we might regret the decision, but I feel like this whole world is so unstable and rapidly evolving that it made sense for us to do what's right now, and worry about the future, if there is one, during that future.
Will the notebook feature allow branching out from the linear chain of chats or even reverting the chain to an earlier point? It wasn't clear from the post
There isn't a screenshot of this in Stephen's blog, but he does mention[1] the ability to evaluate a single 'ChatInput' multiple times, which preserves all previous evaluations and allows you to tab through them to pick the best one. (You can see a screenshot of what the arrow tab switcher looks like on the left side of this[2] example in the Chatbook README.md).
One of the nice advantages of a notebook interface to LLMs is that you can easily go back and edit and rearrange the chat history in the normal ways that any cell in a notebook can be edited, moved, deleted, etc.
So e.g. if you want to go back to an earlier point, you can simply move your cursor between cells at an earlier position in the "physical" chat history, insert a new chat cell, and you're done.
With the article mentioning you need OpenAI API key, and section about personas having a screenshot of a configuration window offering you a choice between GPT-3.5 and GPT-4, I'm going to guess that this version uses, by default, the usual GPT models via OpenAI API.
While I think his notebooks and features look pretty useful, he frames the article around having invented "notebooks" 36 years ago before anyone else, and talks about other notebooks not having features that they've had since 1987. This is a pretty odd context to basically describe a new chat feature.
Is this a record that should be set straight?
I've used Jupyter since it was IPython notebook, but I don't think that community claims to be the first coming of notebooks. The accessibility of python along with the breadth and depth of the scipy community makes it a quite a tour de force. So perhaps these articles are aimed at people who only use open source tools.
It's Wolfram. If they didn't claim to invent something fundamental in the first paragraph, I'd immediately assume the site was hacked or it's a poorly timed April fools joke written by an intern.
Yes, he is a narcissist. Smart guy, cool products, but it's just like how Trump cannot talk about anything without shoehorning in a statement like "I got more votes than any sitting president ever" when being asked about something completely unrelated.
My favorite example is when Caltech put out a press release about a young woman got her PhD and nearly beat Wolfram's record for youngest PhD from Caltech. He felt immediately compelled to write a long winded several-thousand word essay about how glad he is to still have the record for being the youngest PhD, by a full two weeks! And also, he wanted to remind everyone that he knew Richard Feynman. Cringe...
What's the predecessor with a notebook interface from before 1987? I'm not aware of one. Wolfram loves to claim he invented stuff but this one might actually be true.
The Wikipedia article suggests a contemporary in 1987 (MathCAD) but I'm not seeing a Mathematica/Jupyter-style notebook interface before that year. That would seem to have been the year of the notebook interface, and Mathematica 1.0 did come out that year with one. As far as Wolfram claims go, this one doesn't seem too offensive.
It's a fair claim if you make it for Wolfram Research rather than Stephen Wolfram personally (which is all he ever does).
Jupyter notebooks were inspired by Mathematica notebooks: that's widely known and has been explicitly said in blog posts and elsewhere by the creators of Jupyter. It's no secret… And as the designer and developer of Mathematica notebooks I am flattered and honored that my ideas are now so widely used, even if it is in a product other than the one I wrote myself.
Time for a trip down memory lane. There were several notebook-adjacent products around at that time (1987) but none of them implemented notebooks as we know them today.
At the time I was using Xcode, which had a terminal-like interface that was a plain text document in which you could put all the shell commands you needed to build your app. This could include calls to the compiler, linker, or just any generic shell commands. To evaluate a given command (which could be one or more lines long) you had to manually select the exact range of text you wanted to run, which was kind of irritating. Any output would be placed directly below the input line. Because the document was pure plain text, it had no way of knowing what was input and what was output, or distinguish new output from old. So all the outputs accumulated, and you had to manually select and delete them periodically. It was an improvement over a glass teletype terminal interface, but I thought I could do better for Mathematica (code name Omega at the time).
The solution I came up with was to create a text document that was annotated with “zones” delimited by vertical bars along the right side of the screen (where cell brackets are now). Initially I used different black and white texture patterns on 4 or 5 pixel wide vertical bars. One texture for input and another texture for output.
This immediately gave two huge advantages: first, you didn’t have to manually select the range of a multi-line input. Just putting the cursor anywhere within an input zone and evaluating would evaluate the whole input. Second, because the system was keeping track of what was output, old output could be deleted automatically and replaced with new output.
Initially I called these things zones, but Steve Jobs suggested I use large close-square-bracket symbols instead of patterned bars, and think of them more like cells in a spreadsheet than like zones in a text document. Once you have brackets instead of bars, it’s immediately obvious that you could have larger brackets that encompass two or more cells. So I created cell groups, initially just to group input/output pairs, but of course it’s not a big leap to also have sections and subsections, then different cell types, styles, graphics cells, etc.
These key defining features of notebooks did not exist in any other system at the time, and to the best of my knowledge were not independently developed by other people. All the current notebook systems that share these broad features, definitely including Jupyter notebooks, derive from the original Mathematica notebooks I developed in 1987-8. (There are other styles of user interface, including some used by Maple and Matlab around the same time, that are called notebooks, but they did not share the key features I’ve described. They were and are fine products that I’m sure have both advantages and disadvantages compared to our Notebooks, but unlike our Notebooks, they are not the precursors of Jupyter.)
So can Stephen Wolfram personally claim to have invented notebooks? No, and he doesn’t: he claims that Wolfram Research did, which is true, since I was one of the co-founders of that company, and I invented that form of notebook.
Likewise. Especially when it's liberally sprinkled with his catch phrase:
"A new kind of..."
It's unfortunate as clearly he has a huge amount to contribute. Maybe with an LLM someone can build a de-Stepheniser that takes in his pompous, smug text and outputs something more mellow and reasonable whilst keeping all the facts, which is what we are there for after all.
Of course it's unnecessary, but I suspect there's at least some substance to this claim. If it were anyone else, a harmless plug like this would be accepted by most, might even be an interesting nugget a reader would appreciate. The issue is the style of claim (being the originator) is dropped far more than it should be.
I'm personally a fan of many ideas Wolfram pushes, just not a fan of the style they're pushed. I'd say I'm not in the minority, to the point it's a tired cliche.
I was always under the impression that Donald Knuth “invented” notebooks in 1984¹. Maybe Mathematica was the first to implement it, but that’s not the impression you get from the article.
AFAICT, literate programming does not imply interactive workflow. Also, Mathematica notebooks are extremely flexible thanks to its lisp-like language, and it's not really about writing docs and code. You can draw diagrams, format texts and paragraphs, and embed dynamic/interactive elements. It's indeed the first and the foremost in its direction.
But also, even if it's true, it's kind of an offputting way to start the article. I know that's his thing and I know it's discussed every time one of these is posted here so apologies for the repetitive observation.
> According to Stephen Wolfram: "The idea of a notebook is to have an interactive document that freely mixes code, results, graphics, text and everything else.
By that description Smalltalk with its workspace goes back to the 70s. (I don't know if there's anything closer to the Mathematica notebooks from that era.)
The notebook interface of Mathematica is simply unique. It is based on a Lisp-like language (Wolfram language), and fully exploits everything that Lisp can offer (and Lisp can get weird infinitely). It provides an unmatched level of freedom and versatility to the end-user. Wolfram is clearly the first one who built a end-user interface this much powerful, and nothing literally comes close, even Jupyter.
So, yeah, seeing that Jupyter is being used more broadly than Mathematica, Wolfram should feel uncomfortable. (Also note that Jupyter is officially inspired by Mathematica.)
> I've used Jupyter since it was IPython notebook, but I don't think that community claims to be the first coming of notebooks.
My recollection (which could be wrong, it's been a while) is that SageMath’s notebook preceeded iPython notebooks. IIRC iPython started as a CLI repl, which may have come before the notebook.
(Mathematica came before all of them, and likely inspired SageMath’s interface)
Tom Boothby, Alex Clemesha and I wrote the first web based notebook interface in 2006, which was indeed the Sage Notebook. It was very painful at the time - jquery wasn’t even a thing yet! It was inspired by Mathematica notebooks, mainly because Alex was a heavy Mathematica user when he was a physics undergrad at UCSD. Similarly, Fernando Perez’s Ipython CLI was also inspired by Mathematica notebooks. I personally didn’t use Mathematica notebooks a lot, and was even more inspired by various clever Emacs modes and also Mathcad, and by the first version of Google docs. In any case, we were all inspired by Mathematica notebooks!
Speaking of which, has anyone made an open source pretty printed input interface for sage or do I have to keep flogging my old Mathematica 7 license for a few more years?
This doesn't crack my list of the top 1000 programming debates, but it's not controversial to say that given that literate programming, the Smalltalk IDE, and the REPL were all around before Mathematica notebooks, they didn't bring any new ideas to the world. It may still have been an innovation to create a remix of ideas borrowed from others.
Community may not claim to be the first with notebooks; but they certainly seem to think they have something nobody else has with it. Per their website, "The Jupyter Notebook is the original web application for creating and sharing computational documents. It offers a simple, streamlined, document-centric experience."
Which, I suppose I can cede that they are among the first "web applications" that do this. But I have a hard time really getting behind a ton of the discussion on it. Even talking about the "breadth and depth" of part of the python community is to ignore how much had been done for years in the mathematica community.
They do say what you quote with those words, but I think their intention is just trying to clarify the difference between Jupyter classic and JupyterLab. The Sage notebook predates the Jupyter project by at least five years, but was visually and functionally very similar.
To be fair, I'm not so worried on this. Even if Jupyter was what introduced some (many?) folks to notebooks, that is fine.
I was soured as early advocates were pitching it as way more impressive than it is. With some being proud of themselves for realizing you can put a "pip install" at the top of the notebook, thinking that somehow made it repeatable. I recall senior engineers wanting notebooks to have prod access so that you could deploy from a notebook. And thinking they were making a good practice statement. Ugh.
Neat idea, but some of it is made out to be unnecessarily complex and esoteric.
E.g. giving it your name, and now it becomes 'aware' of your name. I guess in normal programming that would be like saying once you declare your variable the program is 'aware' of it. It's not wrong to say that, but unnecessarily makes it more complicated. The tacking of multiple such instances makes the overall post more complicated than needed. The core idea is neat though.
They explain why this is different (and more complicated) - emphasis added:
> Thus, for example, if you evaluate x = 5 in an input cell, then subsequently ask for the value of x, the result will be 5 wherever in the notebook you ask—even if it’s above the “x = 5” cell
And
> successive chat cells are “aware” of what’s in cells above them. But even if we add it later, a cell placed at the top won’t “know about” anything in cells below it.
Surely this is because the OpenAI api is chat based and they need to evaluate that chat in-order. it does seem like an inconvenient interface for programmatic LLM generators if you don’t want a chat UX.
About halfway through this blog post the topic of LLM "personas" is mentioned, and while the given examples are mostly just jokes, the actual use case for that would solve the problem you perceive of this UX, in that you could simply use a persona that is more concise.
But to answer the "why?" - likely because this type of AI is a tool that is going to become universal very rapidly, and the overwhelming majority of the population has little to no understanding of the finer details of the technology. So the default persona has been fine-tuned to include the annoying "as an AI language model [...]" disclaimer very frequently, amongst other disambiguations, couched remarks, and clarifications that a power user would not necessarily require.
I love the concept of notebooks (as referred to in this article). It reminds me of the way HyperCard made it very easy for non-techies to create something that looks great to share out with the world. I'm honestly surprised nobody else has exploited this functionality before!
I just looked up MacIvory (Symbolics). I wish I had worked at place that needed that. I have Luther Johnson's MakerLisp box that I want to boot up and add to my set-up to define symbols for the daily status of a building's systems.
I believe PowerPoint simply buried many better ways of presenting bytesized information chunks to the world. It prioritized the author's laziness over audience comprehension.
> prioritized the author's laziness over audience comprehension.
Is that even possible?
The audience - people in general - are very resistant to having knowledge/ideas shoved into their brains, and I don't blame them, 90% of it is crap.
Authors have to work very hard to overcome this. "Tell them what you're going to tell them, tell them, tell them what you told them", best advice I got on presentations.
If there was anything better the Powerpoint, short of "killing the unbelievers", it would probably have made a mint by now.
What’s nice with it is both the input and output are modifiable text areas, so you can tune the angent’s example responses to be exactly what you intend. And the on disk format is just the message and parameter JSON, so you can tune your prompt in a live notebook then consume it directly from anything that speaks JSON (most bundlers, for example).
Elon dropped out of openai because of differences regarding direction and conflicts with what Tesla is doing. He then secretly started developing sth similar and asked for a.moratorium on research+dev so his secret thing can catch up.
I'm happy to answer questions or accept feedback about the new functionality.
We're very excited about the potential of Wolfram technology + LLMs, and we've got a number of interesting projects underway in this area. Stephen's other recent blog posts linked at the top of the Chat Notebooks post provide a nice tour.
The Wolfram/Chatbook[1] package mentioned in the post is freely available for any Wolfram 13.2 users. It's also open source and available on GitHub[2].
[1]: https://paclets.com/Wolfram/Chatbook
[2]: https://github.com/WolframResearch/Chatbook