NLP engines arms race
Natural Language Processing (NLP) is all the rage in AI for business and consumers.
The technology gives users access to unstructured data held in free-text documents, and it powers chatbots for client communication as well as automated document analysis.
As consumers, we use it in our voice assistants, grammar and spelling checks, automatic translations, search engines, automatic summaries of documents, etc.
Without NLP engines, up to 80% of business data would be out of reach of modern digital solutions.
Was NLP hitting the wall?
It’s no surprise that we are facing an arms race towards the perfect NLP engine. Early attempts failed even on very easy cases: the NLP engines generated sensible outputs less than 20% of the time, which made them little more than a scientific curiosity and not something useful for business.
All the same, progress was visible and every year another model took the crown and moved the entire world further towards the goal of a near-human NLP engine. However, the results were always somewhat disappointing and the AI community started to lose hope.
Data scientists were worried that we were nearing the end of what was possible with current technologies. The algorithms and models were modern and complex, but still heavily based on old concepts, so “hitting the wall” became a very popular prediction. There was also a growing concern that decades would be needed to achieve another significant breakthrough.
They will soon be back and in greater numbers
At the same time, the scalability, overall computing power, and memory of machine learning clusters kept growing, helped by better hardware, storage and smarter learning algorithms. Companies wanted to try and see what would happen when they increased the number of so-called variables (parameters) in the model.
In other words, the idea is that while we wait for major scientific breakthroughs, we can throw more data, more computing power, and optimized training algorithms into the mix and see what happens. More? How much more? Like one thousand times more! Is it still hitting the wall? Is the output still easily recognized by people as a product of AI, not of other humans?
This is a brute-force approach to the problem and, surprisingly, these huge AI models show a very high degree of generalization. They can answer questions they have never been asked before, and the answers are very likely to make sense to humans. This is called ‘zero-shot performance’.
Successful attempts to build NLP models with billions of parameters supported the so-called ‘power law’ that ties model performance to dataset size, training compute and memory used.
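To make the ‘power law’ idea a bit more tangible, here is a minimal sketch in Python. It assumes the simplest possible form, loss = (n_c / n) ** alpha, with n being the number of parameters. The constants N_C and ALPHA are illustrative placeholders that I made up to show the shape of the curve; they are not fitted to GPT or to any other model.

```python
# Illustrative power-law curve: predicted loss falls as a power of model size.
# N_C and ALPHA are made-up placeholder constants, chosen only to show the shape.
N_C = 1e14    # hypothetical scale constant
ALPHA = 0.08  # hypothetical exponent


def power_law_loss(n_params: float, n_c: float = N_C, alpha: float = ALPHA) -> float:
    """Return the loss predicted by loss = (n_c / n_params) ** alpha."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (n_c / n_params) ** alpha


if __name__ == "__main__":
    # Parameter counts mentioned in this article: GPT, GPT-2, GPT-J, GPT-3.
    for n in (117e6, 1.5e9, 6e9, 175e9):
        print(f"{n:.3g} parameters -> predicted loss {power_law_loss(n):.3f}")
```

With a curve like this, every tenfold increase in parameters multiplies the predicted loss by the same constant factor (10 ** -alpha), which is why simply scaling up keeps paying off, although at a steep price.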
From time to time, a new model steals the crown of the best model as it is the closest one to passing the Turing test (or even passing it according to the majority of people interacting with the engine).
Now, the results are truly impressive: they are so human-like that it is very hard to figure out whether they were generated by a machine.
GPT-3 vs. GPT-J
Each new GPT model created by OpenAI increases the number of parameters and thus becomes smarter with each version: GPT has 117 million parameters, GPT-2 has 1.5 billion parameters, and GPT-3 … 175 billion parameters. Only GPT-3 has reached the level of accuracy where half of the people cannot tell whether the answer to their question was generated by a machine. With this, OpenAI achieved an unprecedented breakthrough.
There are other engines from Google and other companies with enormous potential and enormous parameter counts, but let’s stick to GPT.
This article is not about GPT-3, but about GPT-J. They are different NLP models, although they have many similarities.
The fundamental difference is that GPT-3 is proprietary: it was built by OpenAI and exclusively licensed to Microsoft, which offers it as part of its ML cloud offering. GPT-J aims to provide a similar level of accuracy and quality in a totally open source spirit. GPT-J was released by a group of enthusiasts from EleutherAI.
The GPT-3 model was never released to the public. It can only be accessed through an API, and the model itself cannot be copied or modified by third-party data scientists.
Open source organizations do not have the vast budgets and hardware resources of enterprise IT giants such as Microsoft, so the model is smaller (6 billion parameters instead of 175 billion) and is trained on a smaller dataset.
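The difference in size is easier to appreciate as plain arithmetic. The sketch below estimates how much memory is needed just to hold the weights of each model. The parameter counts come from this article; the storage precisions (4 bytes for float32, 2 bytes for float16) are my assumption about how the weights might be stored, and the estimate ignores activations and optimizer state.

```python
# Rough memory needed to hold model weights only (no activations, no optimizer state).
BYTES_PER_PARAM = {"float32": 4, "float16": 2}  # assumed storage precisions


def weight_memory_gb(n_params: float, dtype: str = "float16") -> float:
    """Return weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, n_params in (("GPT-J", 6e9), ("GPT-3", 175e9)):
        print(f"{name}: about {weight_memory_gb(n_params):.0f} GB in float16")
```

Under these assumptions GPT-J needs roughly 12 GB for its weights, while GPT-3 would need roughly 350 GB, far more than a single accelerator card can hold.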
The Pile dataset
There’s no data-related project without … data.
In the case of GPT-J, EleutherAI selected a dataset called “The Pile”. It’s more than 800 GB of text from multiple sources, and this time (in contrast with GPT-3) it is derived mostly from scientific and professional sources.
It was a brilliant move to skip the controversial sources, especially Internet discussion forums, which made the model much less biased and less aggressive.
Let’s remind ourselves how GPT-3 “behaved” in the summer of 2020 by reading this interview with the GPT-3 engine.
As always in AI projects, basically “you are what you eat” as so much depends upon input data.
GPT-J in action
Of course, I could not resist trying the model myself. Let me share the results.
Q: Avenga is an IT services company located in Europe, The USA and Asia. What services should they offer business clients?
A: The answer: an always-on, secure, intelligent, adaptable, automated and managed global cloud services solution.
Avenga is a leading IT company, providing information technology services to businesses of all sizes, public sector and education institutions. The company’s core business is IT consulting, technology integration and managed IT services to enable its customers to succeed. Avenga’s solutions include enterprise-wide technology services, such as enterprise networking, data center and storage, cloud solutions, unified communications, mobility solutions, collaboration, and security. Avenga provides its customers with a range of IT services including managed IT services, IT….
We like this answer. Maybe the part about the hardware is less relevant, but the rest is an almost perfect brief about our offer. The grammar is not perfect but fair.
Q: Should we be worried about enormous progress in AI, and NLP in particular? How do we avoid fake text messages, emails and even books?
A: As usual, I had some interesting conversations with my friends and colleagues this week. We were talking about NLP and the future of writing and text mining in the context of the current coronavirus pandemic, and about what could come after that.
Let’s take a look at some of the thoughts below.
We have been talking about a possible end to the pandemic, but it might be far from what many of us think.
The first question is about the “possible end” of the pandemic.
This time GPT-J got it completely wrong and did not truly “understand” the question. The answer is grammatically fair and the beginning is OK, but after the first two sentences it seemed to completely lose its “mind”.
The longer the answer gets, the more it drifts away from a human-like answer. This is in stark contrast to GPT-3, which was able to generate longer texts while still preserving “common sense”.
Still, both engines failed at preserving the context of the conversation. But the advantage of GPT-3 is clearly visible, even in single and relatively short answers.
The dark side of modern NLP engines
Anthropomorphizing AI just got worse
When it comes to machine learning (ML), I am to blame, along with millions of other technologists, because we tend to use words like “learn”, “understand”, “makes sense”, etc. Machine learning is not really learning; it is a repetitive process of adjusting huge numbers of values stored in memory by using smart mathematical techniques. We don’t even know how much it resembles the actual process of learning in our brains; the neurons are not the same neurons, etc.
Improving NLP models makes people more hyped about the capabilities of AI, especially as a step towards general AI. From the business perspective, however, that hype may do more harm than good. As you could see even from our simple examples, we are still very far away from human-like conversations.
Hardware might become an issue
Model sizes grow tenfold each year on average. It’s an enormous growth rate which cannot be matched by hardware improvements (TPUs, GPUs, memory, storage). Even taking into account the improvements in the efficiency of algorithms, the infrastructure cannot keep up with the ambitions of data scientists.
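A quick compounding calculation shows how fast such a gap opens up. The tenfold yearly growth of models is the figure quoted above; the hardware growth rate is purely my assumption for illustration (capacity doubling every two years), so treat the output as a sketch of the trend rather than a forecast.

```python
# Compare compounding model growth with an assumed hardware growth rate.
MODEL_GROWTH_PER_YEAR = 10.0           # from the article: tenfold each year
HARDWARE_GROWTH_PER_YEAR = 2.0 ** 0.5  # assumption: capacity doubles every two years


def gap_after(years: int) -> float:
    """Return how many times model size has outgrown hardware after `years`."""
    return (MODEL_GROWTH_PER_YEAR / HARDWARE_GROWTH_PER_YEAR) ** years


if __name__ == "__main__":
    for years in (1, 2, 4):
        print(f"after {years} year(s): gap of about {gap_after(years):,.0f}x")
```

Under these assumptions the gap is already about 50x after two years and about 2,500x after four, which has to be covered by more machines, more money, or smarter algorithms.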
The next major step, quantum computing, is still relatively far away. It is supposed to save the day when we run out of options with binary computers.
Cost of GPT-3 model training
Another big issue is the cost: training the models and preparing the data require a lot of resources. It’s estimated that a single GPT-3 training run would cost several million dollars (or euros).
The real cost remains unknown, but certainly it will be very high.
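For readers who like to see where a number of that magnitude could come from, here is a back-of-envelope sketch. Only the parameter count comes from this article. The number of training tokens, the rule of thumb of about 6 floating-point operations per parameter per token, the sustained GPU throughput and the hourly price are all assumptions that I picked for illustration, so the result is an order-of-magnitude guess and nothing more.

```python
# Back-of-envelope training cost. Apart from PARAMS, every constant is an assumption.
PARAMS = 175e9             # GPT-3 parameter count (from the article)
TOKENS = 300e9             # assumed number of training tokens
FLOPS_PER_PARAM_TOKEN = 6  # rule of thumb: about 6 FLOPs per parameter per token
GPU_FLOPS = 30e12          # hypothetical sustained throughput per GPU, in FLOP/s
USD_PER_GPU_HOUR = 1.50    # hypothetical cloud price per GPU hour


def training_cost_usd(params: float, tokens: float,
                      gpu_flops: float = GPU_FLOPS,
                      usd_per_gpu_hour: float = USD_PER_GPU_HOUR) -> float:
    """Estimate the cost of one training run under the assumptions above."""
    total_flops = FLOPS_PER_PARAM_TOKEN * params * tokens
    gpu_hours = total_flops / gpu_flops / 3600
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    print(f"Estimated cost: about {training_cost_usd(PARAMS, TOKENS):,.0f} USD")
```

With these made-up inputs the estimate lands at roughly 4.4 million USD for one run, and it scales linearly with every input: double the tokens or halve the throughput and the bill doubles.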
CO2 emissions and sustainability
In the era of increased efforts toward sustainability, which includes green computing, training such huge models has become ethically questionable.
According to many, there’s no direct benefit to using petaflops of computing power to play with languages, words and sentences. I tend to disagree: many other computing activities consume enormous amounts of power without moving our civilization any further along (cryptomining, for starters).
Out-of-control syndrome
I reviewed the code examples available online that show how to use GPT-J in real-world situations. The amount of control is limited: for instance, we can set the parameter called “temperature”, which governs how random the output is, and decide how long the answer is going to be. In many cases, the answers are narrowed down to a single sentence or line in order to keep the NLP model from drifting away from the topic (exactly what we saw in the example above). On the other hand, one-liners may be perfectly acceptable when we want to create simpler question-answer solutions without worrying too much about the context. That makes GPT-J a much smarter version of web search, which is, by the way, driven by NLP engines as well, so this is no surprise. Again, it may or may not be enough in your particular situation.
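To show what “temperature” does, here is a small self-contained sketch of temperature-scaled sampling in plain Python. It is not GPT-J’s actual code: the function names and the toy scores are hypothetical, and a real model would produce one score (logit) for every token in its vocabulary.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Pick the index of the next token at random, weighted by the probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
    for t in (0.5, 1.0, 2.0):
        rounded = [round(p, 3) for p in softmax_with_temperature(toy_logits, t)]
        print(f"temperature {t}: {rounded}")
```

A low temperature concentrates the probability on the top candidate, so answers become more predictable; a high temperature flattens the distribution, so answers become more varied and more likely to drift.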
The main way of controlling GPT-J is to enter the right text as the input. It takes some trial and error to “sense” how the engine will behave, so that you can write the text in a way that generates the desired output.
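The sketch below illustrates this kind of input shaping. It builds a prompt in the same Q:/A: layout used in the experiments above, optionally preceded by a few worked examples, and then keeps only the first line of whatever the model returns. Both helper names are hypothetical and no model is called here; the completion string is a stand-in for real output.

```python
def build_prompt(examples, question):
    """Format (question, answer) example pairs followed by the new question."""
    lines = []
    for example_question, example_answer in examples:
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
    lines.append(f"Q: {question}")
    lines.append("A:")  # leave the answer open for the model to complete
    return "\n".join(lines)


def first_line(completion):
    """Keep only the first line of a completion to limit topic drift."""
    stripped = completion.strip()
    return stripped.splitlines()[0] if stripped else ""


if __name__ == "__main__":
    examples = [("What does NLP stand for?", "Natural Language Processing.")]
    print(build_prompt(examples, "Who released GPT-J?"))
    # Stand-in for a model completion that starts well and then wanders off.
    print(first_line(" EleutherAI.\nAs usual, I had some interesting conversations"))
```

Adding or rewording the worked examples is usually the quickest way to experiment, since it changes the output without touching the model or its settings.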
If you expect long human-like conversations to take place, you will still have to resort to science fiction books.
So what is next for massive NLP engines?
The future of NLP engines is easy to predict: the arms race will go on and on. If binary computers prove not to be an economically or environmentally viable option, quantum computers may save the day.
The achievements in the NLP space are ported, in smaller and more digestible but still very useful forms, to open source models, frameworks and datasets. So, the entire IT community benefits from this arms race.
GPT-J is a perfect example: it adapts the approach of a very advanced model (GPT-3) to a smaller model trained on a less ‘evil’ dataset, and shares the result with the open source world.
We hope that all these ambitious pioneers can move the frontiers even further, while taking into account our environmental issues.
There’s so much that can be done today to improve your business with the current state-of-the-art technologies, like Artificial Intelligence, and we would be thrilled to begin a conversation with you.