Yseop is an AI company whose software produces texts that require significant domain knowledge and accuracy in detail. As such, it works daily with complex Natural Language Generation (NLG) systems and associated technologies, including rules-based automation systems, machine learning systems and, depending on the use case, Natural Language Understanding and its little brother, Natural Language Inference, all subtasks of Natural Language Processing (NLP).
This series of posts explores how data science fits into the AI equation. More specifically, this first post looks at NLP as a data science problem: how it can succeed in delivering the expected value, and on what terms. We begin with some of the big issues surrounding data science, AI and NLP, and how businesses perceive them.
Data science and NLP
Data science is meant to solve business problems with data, inferential or Bayesian statistics, and machine learning engineering techniques. So basically, if you are a business with no problems, you don’t need data science. Data science is a field of well-established expertise, whereas NLP is driven by cutting-edge research, which is both its strength and its weakness: achieving high performance in research is completely different from being production-ready, and the ‘research to industry’ journey is neither trivial nor easy.
The very same reasons that make AI a fascinating and attractive area of research make it tricky for industry: there is no consensus today on which technologies an AI system should rely on, so there is no clear process for how it should be designed. Let’s consider this as the business problem of NLP and look at how data science could help to solve it.
What is an AI system?
Too many people still tend to see AI systems as all-in-one problem-solving machines. They are not. An AI system is built just like any other complex engineered system, block by block, component by component, and is highly specialized. Let’s try a comparison with traditional industry: no one would ever think of asking a piping or electrical engineer to architect a system without taking into account the costs and liability of the design, yet NLP machine learning engineers are asked to do exactly that every step of the way. And so, they fail.
A well-known and efficient way data science can help to avoid design failures is by taking a problem-solving approach to building complex systems. These systems should be driven not by technical opportunities but by a focus, from the start, on creating a product that solves a problem.
Using AI to Solve Business Problems
Well, as in data science, it all starts with data. We often hear of bias in machine learning. This is a well-advertised topic, and for good reason, especially when it comes to natural language processing. But even before we get to observing how the data is biased, we need to solve a more basic issue: we need to find data that meets our needs, in our case data that fits the product (remember: data science is meant to solve business problems). It seems an obvious and rather simplistic thing to say, yet many projects end up dying because this very step was never considered in the first place.
We have loads of data and tons of open resources, but they are not dedicated to the case we are trying to solve; that is, they don’t answer the question the business is asking us to solve. Trying to solve a problem with ‘any’ data would be like building a chatbot that answers native French speakers in English: it’s technically feasible, but completely irrelevant. In our next article we’ll focus on some use cases and how data science is being used to solve problems today.
To learn more about Yseop’s practical solutions for business problems, download our How to Choose Your First NLG Use Case guide!