What do you mean by "speak"? That word is doing a lot of work. ChatGPT can respond to a prompt, but "speak" seems a bridge too far. It performs statistical analysis.


I'm not sure what I would call its actions, if not speech, though. It does not think independently, but it does pass information. I guess it computes an answer.

February 19, 2024 at 11:35 PM


"Cells are not alive, they just execute chemical reactions"

"Computers can't actually do calculations, they're just a bunch of circuits"

"Cars don't actually move, they're just a heat engine"

"ChatGPT doesn't speak and understand natural language, it just performs statistical analysis"


Eppur si muove.


We define cells as being alive.

We use computers to do maths.

Cars are used to travel from A to B.

And an LLM can accept instructions in a natural language, carry them out, and report back in natural language.

February 20, 2024 at 1:31 AM


Maybe the word is technically accurate.


Alive is also a word with a lot of ambiguity associated. I'm not sure we have a clear definition. If "I think therefore I am" is the requirement, gpts do not qualify, as they do not act independently.


Definition 1) To produce words by means of sounds; talk. - gpt passes

Definition 2) To express thoughts or feelings to convey information in speech or writing. - gpt fails


You used quotes, but I never said most of those things.

February 20, 2024 at 1:34 AM


If you lazily googled it the same way I did, you would have also found:

Definition 3) To convey information or ideas in text. (gpt passes this one)

By the way, dictionary definitions are often very fuzzy, so this is not really any sort of great evidence for anything. More me returning your lackadaisical volley with the same vim and vigor as I received it. ;-)

I bet you can do better!

February 20, 2024 at 1:53 AM


Why is it lazy? Should I have used a paper dictionary? Would that be hard work as opposed to lazy?

I didn't bother to include the third definition as I'm not sure how it is different than either of the first two.

So what if it passes the third?

February 20, 2024 at 1:59 AM


Because the third definition is the strongest plausible interpretation of the original statement (that can be found in that particular dictionary at least). [1]

And: selectively leaving out part of a quote that appears to (partially) contradict your thesis might be seen by some as an attempt to mislead. Best to err on the side of caution!

Though -like I said- dictionaries are possibly the weakest source of definitions you can find. Dictionaries need to keep things extremely short, so they're unlikely to have much nuance.

Even Encyclopedias (including Wikipedia) are more accurate and reliable, simply because they have more space to expand on a concept in detail.

Books or papers are best.

[1] "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith. "

February 20, 2024 at 2:09 AM


Ah, yes, I found it ridiculous the first definition formally excluded everything but sounds, and in my mind interpreted it to include text. That is an error.


i.e. "To produce words by means of sounds or text." Whoops.

February 20, 2024 at 2:24 AM


The best kind of accurate

February 20, 2024 at 1:36 AM


Except not really.

February 20, 2024 at 1:39 AM


It communicates in English. It can literally "speak" if connected to a voice interface, of course. Are you trying to miss the point, or do you not believe that ChatGPT can fluently communicate in English without requiring additional input modes for communication?

You clearly have a point you'd like to make, but I don't think claiming that ChatGPT can't "speak" English is an effective way to make it. It can.

February 20, 2024 at 12:48 AM


To me, speaking requires independent thought and motive, which GPTs do not have. If I as a human want to express myself, I am capable of that. GPTs only respond with a series of plausible tokens.

Someone mentioned in fine arts they make a distinction between craft and art, which is an excellent point.



If all it takes to "speak" is vocalizing sounds, does text to speech count?


I guess it's in the name, text to "speech." Again, gpts do not have independent agency.

February 20, 2024 at 12:51 AM


What word would you use to describe what ChatGPT is doing, then? Would "mimicking speech" suffice for you?

GPT responds in a "series of plausible tokens" that would pass as fluent English if generated by a human.

Let's put aside the vocalization point, that's a distraction.

February 20, 2024 at 1:29 AM


I think "compute" is a more appropriate term, but you're right, it's certainly an awkward phrase. It pains me when people compare GPTs to humans: we're incredibly more complex and advanced.

February 20, 2024 at 1:33 AM


It generates text.

February 20, 2024 at 3:51 AM


So does the program

    while True:
        print("a")  # assumed body: emit the same character forever

GPT-n (not chat) generates continuations to text in which the probability distribution for the next output it produces matches the distribution of outputs after similar text in the training corpus, by most metrics you might examine. Trivial examples of those metrics include word and n-gram frequencies, but it also turns out that, at sufficiently low loss, this looks like "given a couple of input texts and a context which affords insightful commentary, produce insightful commentary at the base rate".

There are, of course, caveats. Notably:

- ChatGPT has its own stuff going on around chat tuning, tool use, tuning to be less likely to produce outputs OpenAI doesn't want, etc

- The statistical regularity thing is not magic. If it requires more than n_layers steps of computation to determine the sensible next output, the model will not be able to do that. I think the canonical example here is usually having the model complete something like "'d072c916029965a7676da4244160c413e31bc8a0' == sha1('I saw it on hn'); '265149165fcb742a900a44b8f123885dc6ac5d12' == sha1('" and then having the model brute-force sha1 -- obviously it's not going to be able to do that.

- The model generates text sequentially, one token at a time. This is importantly not the process by which most text in the training corpus was written. So in the cases where earlier text importantly depends on text which was written earlier temporally, but which occurs later in the string, the model will be likely to make mistakes (where a "mistake" is "writing text which is statistically surprising, relative to the training corpus").
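To make the SHA-1 caveat above concrete, here is a minimal Python sketch. Computing the hash forward is a single library call, but going from a hash back to its input means searching over candidate strings, which is the part a fixed number of layers cannot do in one step. The helper names `sha1_hex` and `invert_by_search` and the candidate list are hypothetical, chosen only for illustration.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Forward direction: hash a string and return the hex digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def invert_by_search(target_hex: str, candidates):
    """Reverse direction: try each candidate until one hashes to the target.

    Returns the matching candidate, or None if none of them match.
    """
    for candidate in candidates:
        if sha1_hex(candidate) == target_hex:
            return candidate
    return None


# Hypothetical usage: the search only succeeds if the answer is among the guesses.
target = sha1_hex("I saw it on hn")
print(invert_by_search(target, ["hello", "I saw it on hn", "world"]))
```

The search cost grows with the number of candidates, so completing the second `sha1('` in the prompt would require far more computation than a single forward pass provides.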

February 20, 2024 at 5:28 AM


Is there not a way for me to express the crucial difference between non-language character repetition (or repetition of any string) and the ability to interpret and respond to human language in the same language?

I just feel like we're not in a position to even begin understanding our disagreement until you at least recognize the question or point I'm trying to get here. If you disagree with the question, or don't understand it: why?

February 21, 2024 at 3:50 AM


I don't think that's where the difference lies, no. Sampling from a Markov model built from English n-gram frequencies will produce non-repeated, mostly grammatical English text, which frequently is even sensible-sounding once n >= 4 or so.

But I don't think there's anything going on in language generation beyond "based on the current context, produce an appropriate output token for that context based on the observed and inferred distribution of training inputs". I think that's also how human language generation works, though "the context" for humans includes a lot more than just a few thousand words of text. But I think that the surprising thing about e.g. GPT-4 is how well it does the thing, rather than the fact that it does the thing at all.
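For reference, the n-gram sampling mentioned above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any GPT model is implemented: the corpus is a made-up sentence, the function names are hypothetical, n is assumed to be at least 2, and the seed must contain exactly n - 1 words.

```python
import random
from collections import defaultdict


def build_ngram_model(words, n):
    """Map each (n-1)-word context to the list of words observed after it."""
    model = defaultdict(list)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        model[context].append(words[i + n - 1])
    return model


def sample_text(model, seed, length, rng):
    """Extend the seed one word at a time, sampling from observed continuations.

    Duplicates in each continuation list make frequent followers more likely.
    Stops early if the current context was never seen in the corpus.
    """
    out = list(seed)
    k = len(seed)  # context size, assumed to equal n - 1
    for _ in range(length):
        followers = model.get(tuple(out[-k:]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return out


# Tiny made-up corpus; a real model would be built from a large body of English text.
corpus = "the cat sat on the mat and the cat ate the fish".split()
model = build_ngram_model(corpus, n=2)
print(" ".join(sample_text(model, seed=("the",), length=8, rng=random.Random(0))))
```

Every adjacent word pair in the output occurred somewhere in the corpus, which is why the text looks locally grammatical; raising n lengthens the context each choice is conditioned on.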

February 21, 2024 at 6:14 AM


I mean you can have a conversation with it in English. You can use slang, idioms, misspellings, invalid grammar, and it will track you just fine. This was a sci-fi dream a decade ago and now it’s reality.

February 19, 2024 at 11:44 PM


The dream is for intelligence, not just a parrot.

February 20, 2024 at 1:03 AM


If intelligence is your personal dream, that's fine. The statement -however- was "it speaks fluent English."

I think several LLMs have sufficiently demonstrated the ability to communicate concepts using natural language to be able to say that the statement is true.

If you tie an LLM to underlying software (and thus also ultimately hardware, if desired), it can take your instructions in natural language, translate them into a form suitable for the software to process, and then take the output and render that back into intelligible natural language.

On that basis, I would argue that the statement "it speaks fluent English" is essentially correct.

If you assert "but that has nothing to do with intelligence", you may or may not be correct, but -combining personal empirical observation with your world model- intelligence would then appear to be orthogonal to the ability to speak English.

Either that, or there is a flaw in your world model.

Whatever the case may be, LLMs are quite evidently capable of communicating in English.

February 20, 2024 at 1:43 AM


Yes, and I don't disagree it communicates. I even went so far as to look up the definition of the word "speak", and there are two that are relevant. One is a mechanical interpretation, producing English words, and the other is about feelings and thoughts. GPTs qualify under the mechanical definition, but not the thoughts and feelings one.


And the dream isn't one of mine, the parent references "sci fi dream."

February 20, 2024 at 1:50 AM


Ok, if we agree it communicates in natural language, we're pretty much in agreement.

Are we still arguing whether "able to speak English" is or is not a subset of "able to communicate in natural language", or are we in agreement in our entirety now?

And there are many "sci fi dream"s of course, one of which is/was for computers to be able to communicate in natural language. (See e.g. Star Trek, Iron Man.)

February 20, 2024 at 2:00 AM


There's a third option! We agree gpts do speak in the mechanical sense, but not in the sense of sharing thoughts or feelings. I often think of the "Fool's Choice" from "Crucial Conversations:" saying nothing, or being emotional, when the best path is the middle ground, a thoughtful response. Something I am still working on!

February 20, 2024 at 2:19 AM


I will reserve judgement on Sora until I actually get to use it.

Anything less than Midjourney for video will be hugely disappointing.

I still love ChatGPT, but my use is so limited compared to what it was 9 months ago. When I think about how a language model could help with basically anything I have ever been paid to do, the reality is I don't think it would be much help. And if we go back to college/school, I wouldn't have used it to learn more. I would have used it to do less work and probably learned less.

February 20, 2024 at 2:34 AM