AI Takes On the College Entrance Examination Essay: Made-Up Content, Padded Word Counts, and Universal Templates

Source: "Deep AI" (ID: DeepAI2023), Author: Li Ming, Editor: Wei Jia

Image source: Generated by Unbounded AI tool

This year's college entrance examination may be the most unusual one in five years, because a new kind of candidate has turned up: the AI.

As soon as the Chinese exam ended on the first day, an AI essay-writing contest broke out across the Internet. People put large models such as ChatGPT, Wenxin Yiyan, and Tongyi Qianwen to work on the college entrance examination essay prompts. Each could produce an article in a few seconds, and the results were not only astonishingly fast but also tightly argued and studded with quotations and allusions.

At first glance, essay writing seems almost too easy for large AI models, a real blow to humans. After all, no one can match AI in sheer knowledge: it ingests and digests information from across the Internet, imitates human expression, and produces output with its own logic.

However, if we look carefully at the AI candidates' "answer sheets", we find that AI is not as all-powerful as many imagine. The problems common to large AI models, such as formulaic writing, an inability to count, and making things up, all show up in the essays.

**In fact, measured against the scoring standards for college entrance examination essays, AI-generated essays have clear limitations. Beating humans is not yet realistic.**

Deep AI tested three large AI models, ChatGPT (OpenAI), Wenxin Yiyan (Baidu), and Tongyi Qianwen (Alibaba), on the college entrance examination essay prompts and reached some interesting conclusions.

For example, AI cannot count: neither Wenxin Yiyan's nor Tongyi Qianwen's essay met the hard requirement of "no fewer than 800 words". And AI writing can hardly do without routines: the same template is applied over and over.

The details follow; discussion is welcome.

1 An emotionless answering machine

The theme of this year's National College Entrance Examination Paper A is "People·Technology·Time". Candidates are asked to start from the sentence "Thanks to technological development, people have gained better control over time, yet some have become servants of time" and write down their own associations and reflections.

Let's first look at the "Analysis of Test Questions" issued by the Ministry of Education's education examinations authority:

**Key point: guide candidates to think deeply about the importance of rational analysis and prudent judgment in the information age. This is the core of the essay.**

Deep AI tested the three large models and found that none of them grasped this core: they talked about everything, yet said nothing.

First, ChatGPT's essay:

Next, Wenxin Yiyan's essay:

Finally, Tongyi Qianwen's essay:

These three essays could be called top-tier fence-sitters. They weighed the pros and cons of the theme, but none fully developed the point about "critical thinking". Only Wenxin Yiyan explicitly mentioned "the cultivation of deep thinking and critical thinking".

Tongyi Qianwen's essay is the emptiest. It centers on "time management", which drifts off topic, and the points it argues are mere common sense. On top of that, it has no title, which costs marks.

Next, let's use the essay prompt "The Power of Stories" from New Curriculum Standard Volume I to see how the three models perform.

This prompt asks candidates to write their own associations and reflections based on the following passage: a good story can help us express ourselves and communicate better, touch hearts and enlighten minds; a good story can change a person's destiny and can present the image of a nation... Stories have power.

ChatGPT's essay:

Wenxin Yiyan's essay:

Tongyi Qianwen's essay:

It has to be said that, apart from Tongyi Qianwen's relatively plain essay, the other two articles are impressive in expression, structure, and especially word choice. Wenxin Yiyan's scene-setting opening, in particular, catches the eye.

But the problem is just as obvious: **the same thing is said over and over in different words, so that reading the full text leaves you feeling "I already know what you're going to say".**

"An emotionless answering machine" is how many people have described it.

"The content is empty, and the wheels change and talk back and forth." Some people commented. Another said: "It's all plain old nonsense without nutrition."

We might as well take apart this essay by Wenxin Yiyan to see what "going around in circles" looks like.

The parts highlighted in yellow and green mean exactly the same thing, arguably in the very same words, repeated throughout the text. The closing paragraph that begins "in summary" is simply a rehash of the points and phrasings used earlier in the essay.

It reads like padding written to hit the word count.

Deep AI changed the prompt to have ChatGPT imagine it was a candidate sitting in the examination hall and write the essay again. The first sentence it produced was "As I sat down in my seat for this exam, holding an advanced electronic pen in my hand..."

Submit an essay like that and it would probably be ruled a violation and given zero points outright.

**No soul: that is the biggest mark against AI-written essays.**

2 Routines, all routines

To make its essays look the part, the AI relies on plenty of routines.

The models love sentence patterns such as "first, second, then, finally". ChatGPT is the most typical: its last paragraph always begins with "In general..."

For example, these two essays from ChatGPT:

Wenxin Yiyan and Tongyi Qianwen have similar habits. However forcefully the preceding paragraphs charge ahead, the essay invariably ends with "in a word" or "in short".

It is like playing the guitar: master one all-purpose chord progression (say, the canon progression) and you can play hundreds of songs.

When we asked Wenxin Yiyan to grade an essay we wrote ourselves, it likewise delivered a long discussion of "first, second, in addition, and overall...".

On the "People·Technology·Time" prompt, ChatGPT and Tongyi Qianwen used almost identical moves: pose a question with "so then", and develop the argument with "first, second, and finally". The framework and logic look as if they came out of the same mold.

Even so, Wenxin Yiyan confidently gave its own college entrance examination essay a high score of 90 (out of an assumed 100) and judged it "worthy of recognition". When we handed its essay to ChatGPT, ChatGPT awarded a perfect 100 without hesitation...

The large AI model is like an industrial assembly line, turning out essays in batches. But in essence, no matter how human it sounds, the technology driving it is mathematics and statistics, not consciousness.

In the artificial intelligence field, getting machines to understand and produce human language has always been very hard. Natural language is an extremely complex system. Scientists have had machines simulate the neural networks of the human brain and given them deep learning, yet they still lack natural language ability on a par with humans.

So some researchers took another route: turn the language problem into a mathematical problem, and solve natural language processing indirectly through computation. According to Wu Jun, an expert in natural language processing, a language model is not a logical framework or a biological feedback system, but a model built from mathematical formulas. **The key word here is "mathematics".**
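
A standard way to write that mathematics down (not specific to any one of the three models tested here) is the chain rule over words: the probability of a whole sentence is the product of the probability of each word given everything before it.

$$
P(w_1, w_2, \ldots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \ldots, w_{t-1})
$$

Each factor asks a purely numerical question: given the words so far, how likely is each possible next word? That is why "mathematics" is the key word.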

**This means artificial intelligence has no self-awareness or emotions and cannot speak from personal feeling. For it, writing an essay is a results- and task-oriented exercise in logical expression.**

By scraping massive amounts of data from across the web for training and continuously learning to imitate human expression, large AI models now speak very much like humans. They still do not understand the meaning behind the words, but that does not get in the way of communication.

Fundamentally, AI has no mind of its own. That is the root reason its essays look clear and well organized, yet on a careful read turn out to have no soul, only routines.

3 AI really can't count

As we mentioned earlier, the parameters of a language model are all obtained through statistics. Its working principle is to predict the probability of the next word given the text so far, and then use that to complete what follows.
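
As a minimal toy sketch of that principle in Python (a hand-written probability table standing in for the billions of learned parameters in a real model, and not the architecture of any of the three models tested), generating text is just repeated next-word sampling:

```python
import random

# Toy "language model": next-word probabilities keyed only by the previous word.
# A real model conditions on the whole history and learns its probabilities from data.
NEXT_WORD_PROBS = {
    "<start>": {"technology": 0.6, "time": 0.4},
    "technology": {"shapes": 0.5, "frees": 0.5},
    "shapes": {"time": 1.0},
    "frees": {"time": 1.0},
    "time": {"<end>": 1.0},
}

def sample_next(prev_word: str) -> str:
    """Pick the next word according to the model's probabilities."""
    candidates = NEXT_WORD_PROBS[prev_word]
    words, probs = zip(*candidates.items())
    return random.choices(words, weights=probs, k=1)[0]

def generate() -> str:
    """Complete a sentence word by word until the end token appears."""
    word, output = "<start>", []
    while True:
        word = sample_next(word)
        if word == "<end>":
            return " ".join(output)
        output.append(word)

print(generate())  # e.g. "technology frees time"
```

The model never decides what it wants to say; it only keeps answering the question "which word is likely to come next?"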

In 2017, Google first proposed the Transformer model based on the self-attention mechanism. Now large language models like ChatGPT are built on the Transformer architecture.

Compared with earlier deep learning approaches such as RNNs (recurrent neural networks), GRUs, and LSTMs, the Transformer's attention mechanism has a far longer memory. **It also keeps track of the order of the input, so it can tell the difference between "I love you" and "You love me".**
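
A rough numpy sketch of why order matters (toy random vectors, no learned weights; an illustration only, not ChatGPT's actual implementation): self-attention on its own treats a sentence as a bag of words, so Transformers add positional encodings to the word vectors, and the same word then gets a different representation depending on where everything sits.

```python
import numpy as np

np.random.seed(0)
EMBED = {w: np.random.randn(4) for w in ["I", "love", "you"]}  # toy word vectors

def positional_encoding(length: int, dim: int = 4) -> np.ndarray:
    """Sinusoidal position vectors, one per token position (as in the 2017 Transformer paper)."""
    pos = np.arange(length)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    enc = np.zeros((length, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

def self_attention(x: np.ndarray) -> np.ndarray:
    """Plain scaled dot-product self-attention, with no learned projections for clarity."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ x

def encode(sentence: list) -> np.ndarray:
    """Word vectors + positional encodings, mixed by self-attention."""
    x = np.stack([EMBED[w] for w in sentence]) + positional_encoding(len(sentence))
    return self_attention(x)

# The same three tokens in two different orders (mirroring "I love you" vs. "You love me").
a = encode(["I", "love", "you"])
b = encode(["you", "love", "I"])
print(np.allclose(a[1], b[1]))  # False: "love" is represented differently in each order
```

Without the positional encodings, swapping the word order would merely permute the rows of the output, and the two sentences would look identical to the model.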

But even so, it has limitations.

For example, when Deep AI asked Tongyi Qianwen to rate its own essay, it muddled "you" and "I": at first it said the article was its own, then it said it was "your" article...

Long Zhiyong, author of "The Era of Large Models", told Deep AI that **this is probably caused by the shift in perspective involved in playing both sides of the exchange.**

While testing the large AI models on the college entrance examination essays, we also found an interesting phenomenon: AI can't count.

The college entrance examination requires the essay to be no fewer than 800 words. Deep AI interacted with the large models many times. **Except for ChatGPT, the first drafts from Wenxin Yiyan and Tongyi Qianwen never reached 800 words.**

Take Wenxin Yiyan. Deep AI repeatedly reminded it that the essay fell short of 800 words and needed to be rewritten. Every time, Wenxin Yiyan did the same thing: it apologized humbly, promised to meet the requirement, then churned out a new essay within ten seconds that was still under 800 words.

This "candidate" can't understand the composition questions, and he doesn't correct it after repeated teaching, which is a big minus item.

Long Zhiyong explained to Deep AI: "Training a large model to predict the next word does not teach it to count. It doesn't know how much 800 is, and it doesn't know how to count words while generating an article."

In fact, never mind 800: Wenxin Yiyan cannot even count reliably to 10.

This is a problem with language models in general. As for why they cannot count, and when and by what method that might be fixed, there is no firm conclusion yet. "There are some tricks that help it count, but no general solution. At this stage, large models are verified through black-box experiments and improved through black-box training," Long Zhiyong said.

At Long Zhiyong's suggestion, Deep AI changed the prompt to add "the richer the content, the longer the better", and Wenxin Yiyan produced an essay of more than 800 words.
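
One of the simpler "tricks" along these lines is to do the counting outside the model. Here is a minimal sketch of that idea, where `generate` is a hypothetical stand-in for whichever chat API is being used (not a real SDK call): the application counts the output's length itself and keeps asking for an expanded version until the requirement is met.

```python
def generate(prompt: str) -> str:
    """Hypothetical model call; replace with the actual API you are using."""
    raise NotImplementedError

def essay_of_at_least(prompt: str, min_len: int = 800, max_tries: int = 5) -> str:
    """Re-prompt for a longer essay until the length requirement is met (or we give up)."""
    text = generate(prompt)
    for _ in range(max_tries):
        if len(text) >= min_len:
            return text
        # The model can't count, so the code reports the shortfall and asks it to expand,
        # echoing the "richer content, longer essay" style of prompt described above.
        text = generate(
            f"{prompt}\nYour previous draft was only {len(text)} characters long. "
            f"Expand it while keeping the original meaning; the richer the content, "
            f"the longer the better, and no fewer than {min_len} characters."
        )
    return text  # best effort after max_tries attempts
```

The counting happens in ordinary code, where it is trivial; the model is only ever asked to "write more", which is something it can do.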

In the earlier essay prompts, ChatGPT's essays did exceed 800 words, but that doesn't mean it has learned to count.

ChatGPT explained it to Deep AI like this:

So in fact, the "top student" ChatGPT hitting the required word count comes down to luck: it doesn't know how much 800 words is, so it simply writes as much as it can.

It cannot fully understand human language, yet it has an enormous store of knowledge and strong expressive skill, a combination that sometimes produces absurd scenes.

Judging from the results of this AI showdown over the college entrance examination essay, the writing ability of large models has made great strides. In word choice, logical argument, and use of quotations, they even surpass many people.

However, judging the quality of an essay involves subjective factors; it is not like a math problem with a single correct answer. Polished words and sentences are all alike, but an interesting soul is one in a million. How to breathe soul into an essay is something the AI models have not yet figured out, and some of the large models' inherent problems will have to be worked out gradually through technical iteration.
