Language has long been seen as the last fortress of the human being: the tool through which we not only communicate, but also build thought, transmit knowledge, produce culture, and understand ourselves and the world around us. Ever since Aristotle described man as a speaking being, this view has remained firmly established: language is a purely human property, with no equal among the communication systems of nature or machines.
But this certainty began to suffer successive tremors with the rapid development of linguistic AI models. With the qualitative leap these models made in 2025, the question no longer revolves around the machine's ability to simulate; it has moved to more profound terrain: have algorithms really begun to understand the logic behind language and to decode it as humans do?
First: AI models, from simulating speech to reasoning about language
In recent years, large language models such as ChatGPT have demonstrated an outstanding ability to produce natural texts, conduct coherent dialogues, and write in various styles. Yet many linguists saw in them only clever statistical mimicry, falling short of real understanding.
For years, linguists have argued that large language models are mere "statistical parrots," good at predicting the next word but unable to fully assimilate deep generative grammar. The famous linguist Noam Chomsky summed up this position in 2023 when, together with co-authors, he wrote in The New York Times that "correct interpretations of language are complex, and they cannot be learned by immersion in huge amounts of data alone." According to this view, AI models may be good at using language, but they are unable to analyze it deeply and abstractly.
Simply put, Chomsky believes that accumulating millions of data points in a machine's memory does not mean it has come to understand the "geometry of language" as humans do.
But the machine seems to be rebelling against these expectations. In a recent study led by Jasper Bigos, a linguist at the University of California, Berkeley, in collaboration with Maximilian Dabkowski, who received a Ph.D. in linguistics from the same university in 2025, and Ryan Rhodes, an assistant professor at the Rutgers Center for Cognitive Science, one of the models successfully deciphered invented languages it had never seen before.
During the study, the researchers decided to put artificial intelligence to a real linguistic test: they subjected several large language models to rigorous language tests designed specifically to rule out any prior knowledge of the language. Instead of common questions, the models were asked to analyze new sentences using classic tools of linguistics, such as syntactic trees, which decompose a sentence into its deep grammatical structure.
The results were amazing and confusing at once. While most of the models faltered, one model, OpenAI's o1, analyzed the languages with astonishing skill: it not only produced sound sentences but went on to analyze the deep structure of the language, mapping sentences and untangling their linguistic complexities just as a graduate student in linguistics would.
The model was also able to handle complex linguistic concepts such as recursion, the ability of a language to contain sentences within sentences without limit, which had been considered a red line that only the human mind crosses.
According to Jasper Bigos, we are facing not just a technical upgrade but a radical transformation of the rules of the game; today we are "facing an intelligence that is not content with repeating our words, but has begun to understand the philosophy and logic on which language is built."
Tom McCoy, a computational linguist at Yale University, also called this study a turning point that came at just the right moment. McCoy explained that the accelerated penetration of artificial intelligence into the details of our daily lives forces us to examine its capabilities in depth, pointing out that linguistic analysis remains the most precise test for measuring how far the logical reasoning of these models has come compared to the human mind.
Second: what is metalinguistic awareness, and how did artificial intelligence acquire it?
One of the main challenges in subjecting AI models to rigorous language testing is ensuring that they have no prior knowledge of the answers; these models have already devoured most of what has been written online, including the seminal books and academic references of linguistics. So the challenge for Bigos and his team was: how to make sure the machine is "thinking" rather than just retrieving answers stored in its vast digital memory?
To solve this dilemma, Bigos and his colleagues devised a four-part language test. Three parts involved asking the model to analyze sentences designed specifically for the purpose using tree diagrams, schemes first presented in Chomsky's 1957 reference book Syntactic Structures. Such diagrams break sentences down into their constituent phrases, such as noun phrases and verb phrases, and then into smaller units: nouns, verbs, adjectives, adverbs, prepositions, conjunctions, and the other components of grammatical structure.
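To make the idea concrete, here is a minimal sketch in Python (an illustration, not code from the study) of how a tree diagram represents a sentence's constituent structure, rendered in the bracketed notation linguists commonly use:

```python
# A minimal constituency tree: each node carries a label (S, NP, VP, ...)
# and children, where a leaf node's single child is the word itself.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Node:
    label: str                       # e.g. "S", "NP", "VP", "Det", "N", "V"
    children: List[Union["Node", str]]

def bracketed(node: Node) -> str:
    """Render the tree in labeled-bracket notation."""
    parts = " ".join(
        c if isinstance(c, str) else bracketed(c) for c in node.children
    )
    return f"[{node.label} {parts}]"

# "The cat died": the sentence (S) splits into a noun phrase (NP)
# and a verb phrase (VP), which in turn split into words.
tree = Node("S", [
    Node("NP", [Node("Det", ["the"]), Node("N", ["cat"])]),
    Node("VP", [Node("V", ["died"])]),
])

print(bracketed(tree))
# [S [NP [Det the] [N cat]] [VP [V died]]]
```

The nested brackets are exactly what a drawn tree diagram encodes: which words group into phrases, and which phrases combine to form the sentence.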
The fourth part of the test focused on recursion, the remarkable ability that allows us to build sentences within sentences, like nesting Russian matryoshka dolls. The sentence "the sky is blue" is simple, but the sentence "Mona said that the sky is blue" embeds the original sentence in another, slightly more complex one.
This embedding can continue indefinitely; a sentence like "Mary wondered whether Yunus knew that Omar heard that Mona said that the sky was blue" is grammatically correct, even though it strains the mind of the reader or listener.
Chomsky, along with a number of linguists, has described recursion as one of the defining intrinsic properties of human language, perhaps even the hallmark of the human mind itself. In their view, it is this infinite potential that allows human languages to generate an unlimited number of sentences from a limited vocabulary and a fixed set of grammar rules. To date, there is no convincing scientific evidence that any being other than man possesses this ability in an advanced, systematic way.
Recursion can appear at the beginning or end of a sentence, but its most complex and hardest-to-master form is what is known as center embedding, a linguistic term for placing one phrase inside another at the heart of the same sentence, in a way that multiplies the burden of mental processing in humans and poses an even greater challenge to computer systems. This becomes clear when moving from a simple construction like "the cat died" to a more complex one like "the cat that the dog bit died."
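The difference between embedding at the edge of a sentence and center embedding can be sketched with a toy generator (a hypothetical illustration, not from the study; the function names are my own):

```python
def right_branching(core: str, speakers: list[str]) -> str:
    """Edge recursion: each speaker wraps the sentence from the outside,
    as in "Mona said that the sky is blue"."""
    sentence = core
    for who in reversed(speakers):
        sentence = f"{who} said that {sentence}"
    return sentence

def center_embedded(nouns: list[str], verbs: list[str]) -> str:
    """Center embedding: each relative clause sits *inside* the previous
    one, so the verbs pile up at the end in reverse order of their nouns,
    as in "the cat the dog bit died"."""
    assert len(nouns) == len(verbs)
    subjects = " ".join(f"the {n}" for n in nouns)
    predicates = " ".join(reversed(verbs))   # innermost verb surfaces first
    return f"{subjects} {predicates}"

print(right_branching("the sky is blue", ["Omar", "Mona"]))
# Omar said that Mona said that the sky is blue
print(center_embedded(["cat", "dog"], ["died", "bit"]))
# the cat the dog bit died
```

Both outputs are grammatical, but each extra level of center embedding forces the reader to hold an unfinished clause in memory, which is why it taxes human processing far more than edge recursion of the same depth.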
To verify that the AI models were not relying on verbatim recall of stored knowledge, such as retrieving familiar examples from the linguistics textbooks they were trained on, the researchers gave the models thirty original sentences designed specifically to include complex patterns of recursion. Analyzing these sentences with syntactic trees, one model, OpenAI's o1, identified the syntactic structure of the sentences with remarkable accuracy. It did not stop there, but showed sharp intelligence in distinguishing between sentences carrying contradictory meanings, cleverly telling description from action based on hidden context.
The model not only deciphered the grammar but also succeeded in deriving the phonological rules of the tested languages, predicting how their words would be pronounced without any prior knowledge of them. This clearly indicates that the model does not retrieve answers from memory but deduces the mathematical and linguistic logic governing whatever communication system is presented to it.
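To illustrate what "deriving a phonological rule" means, here is a toy sketch (the rule is hypothetical and not taken from the study): a common pattern in natural languages is intervocalic voicing, where a voiceless stop becomes voiced between two vowels, and once inferred, such a rule predicts the pronunciation of unseen words:

```python
def apply_rule(word: str) -> str:
    """Toy phonological rule: a voiceless stop (p, t, k) becomes
    voiced (b, d, g) when it stands between two vowels."""
    voiced = {"p": "b", "t": "d", "k": "g"}
    vowels = set("aeiou")
    out = list(word)
    for i in range(1, len(word) - 1):
        if word[i] in voiced and word[i - 1] in vowels and word[i + 1] in vowels:
            out[i] = voiced[word[i]]
    return "".join(out)

print(apply_rule("ata"))   # ada  (the /t/ sits between vowels)
print(apply_rule("tata"))  # tada (the initial /t/ is not between vowels)
```

The point of the test was the reverse direction: given example words of an invented language, the model had to infer a rule of this kind on its own and then apply it to new words.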
Neither Bigos nor his colleagues expected the study to reveal an artificial intelligence model with a high-level metalinguistic ability; according to him, this is the faculty that allows its possessor not only to use language but to contemplate, analyze, and reason about its very nature.
This discovery also excited David Mortensen, a computational linguist at Carnegie Mellon University, who saw in the study's results a resolution to a long debate about what these models really do: do they simply predict the next word (or token) in a sentence based on probabilities, or do they possess a deep understanding similar to ours? Mortensen argues that this study has settled the debate in favor of the machine; while many linguists had insisted that these models do not actually practice language, the results refute those doubts and open a new chapter in the story of the relationship between man and machine.
Third: how far can these language models go?
The results of this study raise an important existential question: how far can these models go? Is the secret of their superiority simply a matter of numbers, that is, of collecting more training data and doubling computing power, or is language the fruit of a unique evolutionary path, forged over biological history to be the exclusive property of our species alone?
The results have shown that, in principle, such models are now capable of advanced linguistic analysis; however, no model has yet created anything original, nor revealed a new truth about language beyond what humans have already discovered.
Here the researchers are divided. Bigos believes that if the improvement of artificial intelligence depends on the volume of training data and processing speed, then its overtaking of the human mind is only a matter of time.
David Mortensen takes a more conservative view, noting that current models remain constrained by the nature of their primary task: they were trained to perform a specific function, predicting the next token or word from the previous context, which makes it difficult for them to generalize and create outside the boundaries of what they were trained on.
Mortensen, however, does not expect the gap to last forever; he sees no barrier preventing a machine from achieving an understanding of our language that exceeds our own. It is only a matter of time before we build more creative models, capable of deriving insight from less data, and doing so more freely.
Fourth: the outlook for 2026
As we bid farewell to 2025 with unprecedented technical victories in natural language processing, next year looks set to be not just a repetition of the above, but the year of creative generalization. If 2025 proved that the machine can disassemble the logic of language, forecasts indicate that 2026 will witness a shift in the course of development, built on three main directions:
Learning from scarce data: the next challenge is to free the machine from its dependence on big data. The goal is to build models with logical intuition that can learn a new language or a rare dialect from limited examples, mimicking the way a human child acquires a first language.
Bridging the common-sense gap: large companies will seek to integrate general models with language models, enabling artificial intelligence to understand the physical and social dimensions that words express, and reducing cases of linguistic hallucination.

