Enterprises want to benefit from Machine Learning (ML) models to improve their business efficiency, and enable new types of more personalized offerings and products.
The main obstacle is access to large volumes of high-quality data, which is often impossible to obtain. Even if enterprises somehow obtained those petabytes of data, they would lack the infrastructure to use them, and cloud Machine Learning services would consume so much of the IT budget that entire ML initiatives would lose their business justification.
But what about using existing pre-trained models, trained by someone else and ready to use from day one, on a pay-as-you-go basis or, even better, free of charge?
Seems like a dream come true for enterprises. Although ones and zeros are the building blocks of our modern digital world, the situation is, as always, not so simple and obvious. Let’s take a look at foundation models, their opportunities, and their challenges.
We’ve already figured out what they are for, but what are they exactly?
Let’s take a look at the official definition from Stanford University (Center for Research on Foundation Models).
Training on huge amounts of data and adaptability to many applications are the key properties of foundation models.
These models are trained with self-supervision rather than on labeled datasets, because labeling all those petabytes of data would not be feasible in terms of time, budget, or resources.
There’s a great paper on the subject available here. Across its 212 pages, the authors analyze different aspects of foundation models.
One of the most recognizable examples of foundation models is GPT-3, an enormous language model that was among the most powerful at its release and changed the way we perceive language models. Despite its flaws, it’s an example of a ready-to-use API with pay-as-you-go pricing.
Training such a huge model is practically infeasible for 99.9% of enterprises due to the lack of data, hardware requirements, and cost constraints. Yet we can all use it and integrate its capabilities with our business applications (if we accept the cons of GPT-3, but that’s a topic for another article). Another example is BERT, also a language model.
Foundation models enable enterprises to launch and benefit from digital data science initiatives that would not be at all possible without them.
Training large models is impossible without vast amounts of data, expensive and sophisticated infrastructure, and the experience to run it. Think of thousands of GPU racks and enormous amounts of storage and networking.
So these models may be a true game-changer, especially for small and medium-sized companies that lack both a high level of data agility and petabytes of high-quality data.
In this case, it’s a simple yes or no, a one or a zero. Either we can move on with the digital program or we have to stop it in its tracks because of the lack of data and/or appropriate infrastructure.
Transfer learning is also an interesting option: models are not trained from scratch but built as an extension of existing models. This approach may enable few-shot learning, where a model is tuned for a specific business task with only a handful of examples.
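To make the idea of transfer learning more concrete, here is a minimal sketch in plain Python. It assumes a frozen "pre-trained" feature extractor and trains only a small classification head on four labeled examples. The `pretrained_features` function is a hypothetical keyword counter standing in for a real foundation model, and the example texts and labels are invented for illustration.

```python
import math

# Hypothetical stand-in for a frozen, pre-trained model: it maps raw text to
# a fixed feature vector and is never updated during fine-tuning.
def pretrained_features(text):
    words = text.lower().split()
    positive = {"great", "good", "helpful", "fast"}
    negative = {"bad", "slow", "rude", "broken"}
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return [float(pos), float(neg)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, epochs=200, lr=0.5):
    """Train only a small logistic-regression head on top of frozen features."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = pretrained_features(text)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - label  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(head, text):
    """Return the probability that the text is positive feedback."""
    weights, bias = head
    x = pretrained_features(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# A handful of labeled examples (1 = positive feedback, 0 = negative feedback).
few_shot_examples = [
    ("great and fast service", 1),
    ("helpful staff", 1),
    ("slow and rude support", 0),
    ("broken app", 0),
]
head = train_head(few_shot_examples)
```

Calling `predict(head, "good helpful advisor")` returns a probability above 0.5, even though that sentence never appeared in the training examples. The expensive part (the feature extractor) is reused as-is, and only the small head is trained on business-specific data.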
Foundation models can be used almost immediately, once payment is set up and API keys are obtained. The APIs are typically well documented and reliable, and the entry barriers are much lower than training a huge model on your own, which may take months (if it is possible at all).
Using foundation models, which are shared with thousands or even millions of other data engineers, data scientists, and developers, also means being part of a community. It means sharing best practices, avoiding common problems, and finding ways to get the most out of the models in the most effective way.
Foundation models are also a perfect fit for a cloud transformation strategy. They are usually available as a cloud service, so there is no local infrastructure to maintain and manage. This further reduces time to market and up-front costs. (Operational costs may be another story, but let’s focus on the positives here.)
Foundation models also allow parameter tuning, which affects how they perform and makes it possible to adapt them to particular business applications.
They are less flexible than models built from scratch locally, but then again, most enterprises could not design, build, and test such models themselves in the first place.
Foundation models are large and costly to train, so they are not updated very often. Unfortunately, when the model contains an error, everyone using it is likely to be affected, and there won’t be a quick fix in sight.
It’s not a (relatively) simple layered web application where a fix can arrive in hours, or days at most. It may take many months to release a new model version with a set of bug fixes and… a new set of bugs introduced.
The consequences may vary from negligible to critical, depending on the particular bug and application.
Huge models tend to be hard to explain. Optimizing for training efficiency and a (relatively) compact size comes at the expense of the ability to explain the model’s behavior. It’s a tradeoff in which efficiency wins, and it has to win given the vast amount of data that needs to be processed.
Because a foundation model is a black box, bias is deeply embedded in it and is much harder to deal with than in custom models created from readily available data.
To make it even worse, there’s no single definition of what counts as bias, what level reflects natural correlations in the data, and what is an unacceptable lack of fairness. This depends on the particular use case and on the regulatory and legal environment, and it’s much harder to tune a foundation model than a local custom model.
Those large foundation models are designed, trained, and tested by relatively small teams. Individual skills, ideas, and even world views may influence the model and then be repeated over and over again by everyone who uses it.
Large models can only be trained by companies and institutions with tons of hardware, petabytes of high-quality refined data, and very professional and skilled data science and MLOps teams. Models are built and assessed against the criteria specified by those companies, governments, and also the particular people working on them.
This gives a single business entity and a handful of its employees an enormous advantage and implicit or explicit control over the model.
Minority groups can be largely underrepresented in those foundation models, and this applies not only to the groups usually affected, such as racial and gender minorities, but also to entire nations and cultures.
Will the citizens of a small African country really be represented by a model trained on US data? Not likely. What is highly probable is that this bias will be driven by profits, so the richer and more technologically advanced nations and social groups will benefit even more compared to the others.
Another example is targeting a nation whose language is relatively rare. Foundation models are likely to underperform in this situation, which may make minorities vanish from the future digital space. This is plainly dangerous and is the exact opposite of AI democratization.
Federated analytics and federated learning enable companies to collaborate on data projects without accessing each other’s raw data. This is a great opportunity to open the floodgates of new data-related value streams across organizations, countries, and entire regulatory zones (e.g., EU, US, Canada, APAC), without jeopardizing data privacy and protection requirements.
This kind of collaboration in the ML space may serve as an alternative to foundation models. It may be more controllable and manageable.
Another benefit is that federated models can be more up to date, are not generic, and can be optimized for business purposes more easily than foundation models.
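The idea of collaborating without sharing raw data can be sketched with federated averaging, one common approach to federated learning (the text above does not prescribe a specific algorithm, so treat this choice as an assumption). In this minimal sketch, each party trains a one-parameter linear model on its own data and shares only the resulting weight. The three banks and their datasets are hypothetical.

```python
def local_update(weight, data, lr=0.01, steps=20):
    """One party refines the shared weight on its own private data."""
    for _ in range(steps):
        # Gradient of the mean squared error for the model y = weight * x.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

def federated_round(global_weight, parties):
    """Average locally trained weights, weighted by each party's sample count."""
    total = sum(len(data) for data in parties)
    updates = [(local_update(global_weight, data), len(data)) for data in parties]
    return sum(w * n for w, n in updates) / total

# Hypothetical private datasets held by three banks; all roughly follow y = 3x.
bank_a = [(1.0, 3.1), (2.0, 5.9)]
bank_b = [(3.0, 9.2), (4.0, 11.8), (5.0, 15.1)]
bank_c = [(1.5, 4.4)]

weight = 0.0
for _ in range(30):
    # Only weights travel between the parties and the coordinator;
    # the (x, y) records never leave their owner.
    weight = federated_round(weight, [bank_a, bank_b, bank_c])
```

After 30 rounds the shared weight settles close to 3, the relationship present in all three datasets, although no party ever saw another party's records. Real deployments add secure aggregation, differential privacy, and far larger models, none of which is shown here.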
The pressure for explainability, which is fully understandable, creates a problem for the adoption of foundation models. Being unable, or nearly unable, to give a reason for business decisions that affect customers or, even more importantly, patients is unacceptable.
The most probable future is continued reliance on foundation models, as everyone wants to benefit from Machine Learning and advanced models that would otherwise be out of reach. Foundation models, as enablers, are so powerful that the alternatives simply cannot match them.
Sometimes they are even called a “necessary evil”, as they enable building Machine Learning solutions “on the shoulders of giants”.
Behavioral testing is recommended to determine the practical consequences of using foundation models. It adds cost, but helps to reduce the possible negative impacts while preserving the opportunities. For example, testing by minority groups may help to discover biases.
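One simple form of behavioral testing is an invariance test: change only a detail that should not matter and check that the model's output stays the same. The sketch below is a minimal illustration under stated assumptions. `score_application` is a hypothetical stand-in for a call to a foundation model, with a bias deliberately built in so the test has something to find, and the template and group names are invented.

```python
# Hypothetical stand-in for a call to a foundation model: it scores a loan
# application text between 0 and 1 and contains a deliberate bias for the demo.
def score_application(text):
    score = 0.5
    if "stable income" in text:
        score += 0.3
    if "rural" in text:  # deliberate, unjustified penalty
        score -= 0.2
    return score

def invariance_failures(model, template, variants, tolerance=0.05):
    """Return variants whose score differs from the first variant's by more than tolerance."""
    baseline = model(template.format(group=variants[0]))
    failures = []
    for variant in variants[1:]:
        score = model(template.format(group=variant))
        if abs(score - baseline) > tolerance:
            failures.append((variant, round(score - baseline, 2)))
    return failures

template = "Applicant from a {group} area with stable income"
failures = invariance_failures(
    score_application, template, ["urban", "rural", "suburban"]
)
```

Here `failures` contains a single entry for the `rural` variant, whose score drops by about 0.2 compared with the `urban` baseline. With a real model, the templates and variants would ideally be written together with people from the affected groups, in line with the recommendation above.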
Everybody can benefit from this trend and this new option to accelerate digital AI programs. However, we need to be aware of the pitfalls and address them appropriately. Everything starts with humans and ends with humans, so let us not be at the mercy of foundation models, but take what’s best for our customers, business partners, and employees.