Why citizen data scientists?
The idea behind this trend comes from the megatrend of digitalization, which most organizations are pursuing in one form or another. As they become more software- and data-driven, they need more people who can work with modern technologies.
We addressed low-code platforms before. They target both professional and citizen developers, and they enable non-professional developers to create business software.
→ Have a closer look at Low code and citizen developers: Pros are invited
Is there any equivalent of this trend in the data area?
Waiting for IT departments to deliver reports and perform data analytics is a luxury that organizations can no longer afford.
Idea behind citizen data scientist
Help yourself
The requirements for data reporting and analysis change frequently. New disruptions appear on the market, such as changes in the business environment, and business users want to find out what is really happening within their business processes and what they can predict about the future.
Working with professional data scientists means transferring a lot of knowledge about business processes and explaining what the business users want to find out. This can be very time-consuming and frustrating. The data scientists are fluent with data tools, but they have to learn the business again and again.
This is especially visible when multiple hypotheses need to be checked ad hoc and the business user wants to experiment and get results immediately.
The modern idea is to give broader access to source data and to create data platforms that allow business users to perform the analysis themselves.
Resource shortage
Data scientists and engineers can be hard to find. It takes time to hire them and to teach them the business domain and the specifics of the enterprise’s local data sources. It can take months before a company sees a return on its investment.
Unfortunately, we cannot wait that long.
Building a large team in response to a sudden spike in demand is also a bad idea, because it creates long-lasting costs and reduces the efficiency of the data team. Citizen data scientists could be a solution to this problem.
Sadly, people often forget that they can seek external help, and they remain stuck focusing solely on the local team.
→ Explore Avenga capabilities within data science
Domain knowledge is the key
Citizen data scientists have a huge advantage over technologically fluent professionals.
They know their business domain much better because of their practical experience in running and optimizing their own business processes.
However, this is an oversimplification. First of all, sometimes the darkest place is under the candle. Business users and managers focused on the details of the here and now can have trouble working with higher-level abstractions and seeing the whole picture, which is necessary for effective data analytics.
On the other hand, experienced professionals can also possess a lot of business knowledge about the domain, in addition to their fluency in the tools and techniques of data analysis. This combination of technology skills and business knowledge is very powerful.
How to become a citizen data scientist
Get access to more data
The first step towards becoming a citizen data scientist is to get access to the data, ideally to non-aggregated source data, which allows more flexibility in analytics.
Very often, this very step causes problems from an IT and compliance perspective. Normally, source data is not meant to be viewed or processed outside of the data team, usually because of data protection risks and the sensitivity of the data.
Another reason is related to source data quality. When users access data before it has been filtered and cleansed, they deal directly with source data that is not suitable for analytics.
→ Explore Data quality at the source pattern – what???
With the explosion of data lakes, it’s even more apparent that raw unprocessed source data is not really “digestible” for non-professionals and their limited tools.
Learn BI software
Learning to use data analytics software is a must. Normally, business users learn both how to use the tool and about data processing in general. This includes data filtering and aggregation as well as finding dependencies between different data elements. It’s also a great opportunity for someone to figure out whether they really want to become a citizen data scientist or would rather leave it to others.
Not everyone has to become a citizen data scientist and obviously a lack of motivation will jeopardize any training efforts.
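For readers wondering what “data filtering and aggregation” means in practice, here is a minimal sketch in plain Python. It is not how any particular BI tool works internally. The sales records, the field names, and the total_by_region helper are all hypothetical and invented for illustration. A BI tool typically performs the same two steps behind its friendly UI.

```python
from collections import defaultdict

# Hypothetical sales records, invented purely for illustration.
sales = [
    {"region": "North", "product": "A", "amount": 120.0},
    {"region": "North", "product": "B", "amount": 80.0},
    {"region": "South", "product": "A", "amount": 200.0},
    {"region": "South", "product": "A", "amount": 50.0},
    {"region": "West", "product": "B", "amount": 70.0},
]


def total_by_region(records, product):
    """Keep only the records for one product, then sum amounts per region."""
    totals = defaultdict(float)
    for record in records:
        if record["product"] == product:  # filtering step
            totals[record["region"]] += record["amount"]  # aggregation step
    return dict(totals)


totals_a = total_by_region(sales, "A")
print(totals_a)
```

The filter decides which rows count, and the aggregation decides how they are summarized. Most report definitions in a BI tool come down to some combination of these two choices.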
→ There is more about Business Intelligence and Data Visualization
Even though the tools create friendly abstractions over hard-core data processing knowledge, the need for lower-level IT knowledge will surface eventually. Not everyone is prepared for this, and it can be frustrating to see technical error messages instead of a nice UI.
Stay focused
It’s counterintuitive at first, but it happens all the time. The more fluent citizen data scientists become with the tools, the more they tend to forget about the business goals of their analytical efforts. In other words, they often lose sight of the business purpose of what they are doing.
→ Download our real world case studies and Build your business case with smart data
This is understandable, because learning new tools is often fascinating and opens up new options and possibilities. Still, the advice for all citizen data scientists is to stay focused on their business domains.
Challenges related to citizen data scientists
As with low code, the challenge is the loss of control over data processing. Users store their “projects” locally, and not in a very structured or organized way.
If what they create is useful for the business, the business becomes dependent on their “amateur” work, which is a problem when its lifecycle is not managed properly. Missing backups, documentation, and versioning are typical problems.
Even worse, when a good citizen data scientist leaves the company, they leave behind a ton of their own creations that are hard for successors to take over and maintain.
Another problem is that non-professionals know much less about query optimization and the performance of data systems. It’s very difficult to teach non-technical people about this very important aspect. I have personally witnessed a professional data science specialist optimize a query to run a hundred times faster than the version business users had clicked together in the tool.
Poor performance is painful and quickly visible, but it’s just the tip of the iceberg. The entire data system may suffer an outage when its processing engines are overwhelmed by inefficient queries from multiple self-made data solutions. So, for IT, it also means preparing enough capacity and processing capability to handle the much less efficient queries issued by citizen analysts.
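To make the point about query optimization more concrete, here is a small sketch using Python’s built-in sqlite3 module. It does not reproduce the case mentioned above. The orders table, its columns, and the index name are hypothetical, and the example only illustrates one common technique, adding an index, that a professional might apply. It shows how the same query goes from scanning the whole table to using an index while returning the same result.

```python
import sqlite3

# Hypothetical in-memory table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i % 50)) for i in range(20000)],
)

QUERY = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"


def query_plan(connection):
    """Return SQLite's description of how it would execute QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


# Without an index, SQLite has to read every row of the table.
plan_before = query_plan(conn)
total_before = conn.execute(QUERY, (42,)).fetchone()[0]

# An index on the filtered column lets SQLite jump to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn)
total_after = conn.execute(QUERY, (42,)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan depends on the SQLite version, and the actual speedup depends on the data volume and the engine. The general lesson is that two queries returning identical results can put very different loads on the data system.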
Future of citizen data scientists
Magic solutions
Citizen data scientists are tempted all the time by the data analytics industry with false promises of “just” using prepackaged models for machine learning and analytics.
It’s true that sometimes they may work, but very often they create a lot of frustration.
The main reason is that models perform poorly and have low accuracy whenever the data sets differ from those used during training...
So don’t believe in miracles; it really is a science.
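The drop in accuracy on data that differs from the training data can be shown with a deliberately tiny sketch. This is a toy illustration, not a real prepackaged model. The one-feature threshold “model”, the helper functions, and all the numbers are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean


def fit_threshold(samples):
    """Fit a toy one-feature classifier: the midpoint between the class means."""
    mean0 = mean(x for x, label in samples if label == 0)
    mean1 = mean(x for x, label in samples if label == 1)
    return (mean0 + mean1) / 2


def accuracy(threshold, samples):
    """Share of samples where 'x > threshold' matches the true label."""
    correct = sum((x > threshold) == bool(label) for x, label in samples)
    return correct / len(samples)


# Hypothetical training data: (feature value, label).
train = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
threshold = fit_threshold(train)

# New data that looks like the training data.
same_distribution = [(1.5, 0), (3.5, 0), (6.5, 1), (8.5, 1)]
# New data where every value has drifted upwards by 3.
shifted = [(x + 3, label) for x, label in train]

acc_same = accuracy(threshold, same_distribution)
acc_shifted = accuracy(threshold, shifted)
print(acc_same, acc_shifted)
```

The model itself has not changed between the two evaluations. Only the data has, and that is enough to lower the accuracy. A prepackaged model trained on someone else’s data faces the same risk when applied to local data.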
Together
Another good idea is to have data science specialists work with citizen data scientists on a regular basis, especially to remove silos, as it’s beneficial to both roles. Professional data scientists can learn about the business domain faster, and citizen data scientists will learn about tools and methodologies much faster than in regular training.
What about the future of professional data scientists?
Should data science professionals be afraid of their future? No, not at all.
Their role is evolving towards building the data platforms, data fabrics, and data meshes of the future, which enable both professionals and non-professionals to use data efficiently.
Solving data quality problems and building analytical models, especially in the context of machine learning, will still require a high level of skill.
Building the right foundations is the key to any data project, and that requires real IT skills.
And again, the combination of business domain expertise and technological expertise will create positive synergy effects.