Arabic NLP Guide [2022 Update]


Introduction

Arabic is the fourth most spoken language on the internet and arguably one of the most difficult languages for which to create automated conversational experiences such as chatbots. An Arabic chatbot is a program that can understand and respond in Arabic.

Natural language technologies for simulating and processing human conversations in Arabic have improved a lot in recent years. Models can now be trained to understand emotion and meaning, and to detect misspellings and sentiment in the language.

In this post, we take a look at the challenges and the available tools, and create a brief proof-of-concept chatbot.

The intention is to build an Arabic chatbot using the Botpress platform, which supports the Arabic language.

Arabic NLP Challenges

Arabic is widely regarded as one of the most difficult languages in the world, and it differs from most other languages in ways that make it considerably harder for machines to learn.

The hardest challenges are:

  • The appearance of characters and the spelling of words can differ depending on their context
  • It is written from right to left
  • The same verb can take thousands of different forms
  • There are numerous dialects of Arabic, each with significant differences
  • Because Arabic is written phonetically, the same word can be spelled in different ways in dialectal Arabic, for which there is no agreed-upon standard
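
The point about character appearance can be seen directly in Unicode. The following is a minimal sketch using only Python's standard unicodedata module. It takes the four positional shapes of the letter beh from the legacy Arabic Presentation Forms-B block and folds them back to the single base letter with NFKC normalization. The names BEH_SHAPES and base_letter are our own, chosen for illustration. Modern text normally stores only the base letter and leaves the shaping to the renderer, so this mainly matters when cleaning text that arrives from older systems or PDF extraction.

```python
import unicodedata

# The four positional shapes of the letter beh, taken from the legacy
# "Arabic Presentation Forms-B" Unicode block:
# isolated, initial, medial, final.
BEH_SHAPES = ["\uFE8F", "\uFE91", "\uFE92", "\uFE90"]


def base_letter(ch: str) -> str:
    """Fold a positional presentation form back to its base letter."""
    return unicodedata.normalize("NFKC", ch)


for shape in BEH_SHAPES:
    # Print the Unicode name of each shape next to the name of its base letter.
    print(unicodedata.name(shape), "->", unicodedata.name(base_letter(shape)))

# All four shapes collapse to the same code point (U+0628).
folded = {base_letter(shape) for shape in BEH_SHAPES}
```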

Technical Solutions

CAMeL Tools

CAMeL Tools is a suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

The camel-tools package comes with a morphological analyzer which, in a nutshell, compares any word you give it to a morphological database (one is built in) and outputs a complete analysis of the possible forms and meanings of the word.

The tools can also reduce orthographic ambiguity to account for several common spelling inconsistencies across dialects. camel-tools accomplishes this by removing specific marks from specific letters.
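
To make the idea of orthographic normalization concrete, here is a minimal sketch using only the Python standard library. It is not the CAMeL Tools implementation, and the names DIACRITICS, LETTER_MAP and normalize are our own. It assumes a commonly used set of rules: strip the short-vowel marks and the tatweel, and fold the alef variants, alef maksura and teh marbuta to a single letter each.

```python
import re

# Tanween, short vowels, shadda and sukun (U+064B to U+0652),
# plus the dagger alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character, carries no meaning

# Letter variants that writers often use interchangeably.
LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize(text: str) -> str:
    """Reduce spelling variation so that variants of a word compare equal."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return text.translate(LETTER_MAP)


# The same word written with hamza and full diacritics, and written plainly.
vocalized = "\u0625\u0650\u0633\u0652\u0644\u064E\u0627\u0645"
plain = "\u0627\u0633\u0644\u0627\u0645"
print(normalize(vocalized) == plain)
```

Normalization of this kind throws information away (teh marbuta and heh, for example, can distinguish different words), so it is better suited to matching and search than to text that will be shown back to the user.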

Repustate

The Repustate platform provides a number of natural language processing tools for analyzing Arabic dialects. It understands three major Arabic dialects: Gulf Peninsular, Egyptian, and Levantine Arabic. It also provides granular Arabic emotion analysis by aspect and visualizes all the insights in a customer insights dashboard.

Arabic natural language processing (Arabic NLP) powers the sentiment model, so that it differentiates between Arabic dialects while picking up on colloquialisms, language nuances, social media short forms, and even emojis.

Repustate enables you to quickly and accurately capture customer and employee sentiment to increase efficiency and improve customer experience. It provides native-language analysis for 23 languages and supports social media listening by integrating with popular social networks, review sites, and news sources.

Watson NLU

IBM Watson is one of the most well-known conversational AI platforms.

IBM Watson Natural Language Understanding gives you access to detailed developer resources that help you get started fast, including documentation and SDKs on GitHub.

Its Arabic support enables users to extract meaning and metadata from unstructured text data. Text analytics can be used to extract categories, classifications, entities, keywords, sentiment, emotion, relationships, and syntax from your data.

Some high-level features of the platform:

  • Train Watson to understand the language of your business and extract customized insights with Watson Knowledge Studio.
  • Surface real-time actionable insights to provide your employees with the tools they need to pull meta-data and patterns from massive troves of data.
  • Deploy Watson Natural Language Understanding behind your firewall or on any cloud.

There are some Arabic-language limitations: features such as classifications, concepts, emotion, and semantic roles are not supported in Arabic.
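
In practice this means leaving those features out of any request made for Arabic text. The sketch below only builds the JSON body and sends nothing over the network. It assumes the body takes text, language and features fields, and that the feature keys are spelled as in ALL_FEATURES; both should be checked against the current Watson NLU API reference. The unsupported set is simply the list given in this article, which may have changed since, and build_request is a hypothetical helper of our own.

```python
import json

# Feature keys assumed to match the Watson NLU request body.
ALL_FEATURES = [
    "categories", "classifications", "concepts", "emotion", "entities",
    "keywords", "relations", "semantic_roles", "sentiment", "syntax",
]

# Features this article lists as unavailable for Arabic (assumption:
# the article's wording maps onto these keys).
UNSUPPORTED_FOR_ARABIC = {"classifications", "concepts", "emotion", "semantic_roles"}


def build_request(text, wanted):
    """Return (request body, skipped features) for an Arabic document."""
    features = {name: {} for name in wanted if name not in UNSUPPORTED_FOR_ARABIC}
    skipped = [name for name in wanted if name in UNSUPPORTED_FOR_ARABIC]
    body = {"text": text, "language": "ar", "features": features}
    return body, skipped


# "The service is excellent"
body, skipped = build_request("الخدمة ممتازة", ["sentiment", "emotion", "keywords"])
print(json.dumps(body, ensure_ascii=False, indent=2))
print("skipped:", skipped)
```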

Azure Cognitive Service

Azure Cognitive Service for Language is a new cloud-based service that provides NLP features for understanding and analyzing text.

This language service unifies Text Analytics, QnA Maker, and LUIS and provides several new features.

Most importantly, it supports 96 languages, including Arabic.

You can create an FAQ bot trained on unstructured data or use this to create advanced conversational experiences with the Microsoft Bot Framework.

Other Options

This is not an exhaustive list. There are many other Arabic NLP options out there, e.g. Farasa, MADAMIRA, and Stanford CoreNLP.

Botpress

Botpress is a favourite of ours as it's an all-in-one conversational AI platform.

Most importantly for this post, the Botpress natural language understanding engine provides Arabic natural language understanding out of the box.

Botpress Language Choice

Botpress is a platform that makes it easier for developers to create chatbots.

The platform assembles all of the boilerplate code and infrastructure you'll need to get a chatbot up and running, as well as providing a complete dev-friendly platform with all of the tools you'll need.

The platform contains the following features:

  • To build multi-turn conversations and workflows, there's a visual Conversation Studio.
  • To simulate chats and debug your chatbot, there's an emulator and a debugger.
  • Natural Language Processing tasks are built in, including intent classification, spell checking, entity extraction, and more.
  • To expand the functionality, there is an SDK and a Code Editor.

Botpress is multi-channel, so your Arabic chatbot can be deployed to the major messaging services, including Slack, Telegram, Microsoft Teams, and Facebook Messenger, as well as an embeddable web chat.

The platform also provides analytics, human handoff, and other post-deployment tools.

Botpress facilitates the creation of FAQ-style chatbots. Typically, this kind of chatbot relies primarily on pre-populated responses.

The platform also enables you to create more complex multi-turn conversational experiences capable of comprehending Arabic and communicating in a human-like manner. Such chatbots can extract information like dates, amounts, and locations from conversations.

Like other flexible chatbot-building platforms, Botpress can be used for a wide range of bots, from virtual enterprise assistants to consumer-facing bots that live on popular messaging networks.

Botpress Interface Features

Although it's beyond the scope of this document to review the Botpress platform in too much detail, it's useful to briefly cover the basics.

The platform's interface is smooth and quick to learn, and building a chatbot with Botpress is quite simple. Let's review the main Botpress interfaces.

Studio Interface

When you choose a bot, you'll be taken to the Conversation Studio. For a new chatbot, Conversation Studio creates a new flow. From here you can update the conversational flow, train an NLU model, and then test and debug the chatbot.

Flows

The Flows page helps you create a conversational flow using a user-friendly visual design.

Natural Language Understanding

Botpress is an intent-based platform. You can create intents, train the model with utterances, and specify how the bot should respond. The platform also offers many of the standard NLP features:

  1. Entity extraction. Phrases often contain entities that help your bot understand a user’s intent and respond appropriately.
  2. System and custom entities. System entities are known entities that you can incorporate into your bot to accelerate development. You can also provide custom entities in the form of patterns or lists.
  3. Slots. These are the parameters that must be filled to complete an action associated with an intent. You define your slots, and the NLU engine tags words in the user's input that can be identified as intent slots.
  4. Slot filling. The engine gathers the information required to satisfy a particular intent.
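
The relationship between intents, pattern entities, slots and slot filling can be sketched in a few lines of plain Python. This is an illustration of the concept only, not how the Botpress NLU engine works internally, which uses trained models. The Intent class, the book_room intent, its regular expressions and its prompts are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Intent:
    name: str
    slots: dict    # slot name -> regex whose first group captures the value
    prompts: dict  # slot name -> question asked while the slot is empty


# Hypothetical hotel-booking intent with two pattern-style slots.
BOOK_ROOM = Intent(
    name="book_room",
    slots={
        "nights": r"(\d+)\s*(?:ليال|ليلة)",
        "guests": r"(\d+)\s*(?:أشخاص|شخص)",
    },
    prompts={
        "nights": "كم عدد الليالي؟",  # "How many nights?"
        "guests": "كم عدد الأشخاص؟",  # "How many people?"
    },
)


def fill_slots(intent, utterance, filled=None):
    """Copy the slots filled so far and add any found in this utterance."""
    filled = dict(filled or {})
    for slot, pattern in intent.slots.items():
        if slot not in filled:
            match = re.search(pattern, utterance)
            if match:
                filled[slot] = match.group(1)
    return filled


def next_prompt(intent, filled):
    """Return the question for the first empty slot, or None when complete."""
    for slot in intent.slots:
        if slot not in filled:
            return intent.prompts[slot]
    return None


FIRST_TURN = "أريد حجز غرفة لمدة 3 ليال"  # "I want to book a room for 3 nights"
SECOND_TURN = "2 أشخاص"  # "2 people"

after_first = fill_slots(BOOK_ROOM, FIRST_TURN)
question = next_prompt(BOOK_ROOM, after_first)  # bot asks about guests
after_second = fill_slots(BOOK_ROOM, SECOND_TURN, after_first)
print(after_first, question, after_second)
```

The bot keeps asking until next_prompt returns None, at which point every slot the intent needs has a value and the associated action can run.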

Q&A

You can add frequently asked questions and their answers on the Q&A page.

Libraries

You can use hooks and actions on the Libraries page to import your custom code.

Analytics

The Analytics page shows dashboards that contain analytics data obtained during user chats.

Bot Improvement

The Bot Improvement tab helps you monitor and improve your chatbot by reviewing negative feedback from users.

Other Features

  • Broadcast: You can use the Broadcast page to deliver information to a large group of users.
  • Code Editor: Without leaving the Botpress Conversation Studio, you can create and update actions, hooks, libraries, configurations, and module configurations on the Code Editor page.
  • HITL Next: The HITL page allows you to bring humans into the loop of the conversation when human intervention is needed.
  • Misunderstood: The Misunderstood page lists user inputs that triggered the error-handling cycle, as well as inputs where the user gave negative feedback on a Q&A answer.
  • Testing: You can build conversation scenarios on the Testing tab to confirm that the bot keeps behaving correctly regardless of the scenario. These are effectively unit tests for your conversations.

Arabic Chatbot POC

Botpress was chosen for this project because the easy-to-use interface and out-of-the-box functionality allowed us to create a working chatbot fairly quickly.

For this project, the chatbot is an information-only hotel concierge: a simple FAQ bot in which the customer asks a question and the bot responds. We used the Q&A feature in Botpress to train the bot to understand and respond to questions in Arabic.

The main challenge in the early stages was the lack of available information on building chatbots for the Arabic language. There is scope for more resources in this area.
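
To illustrate what an FAQ bot of this kind does, here is a minimal standalone sketch in Python. It is not the Botpress Q&A engine and not the code of our proof of concept. The questions, answers, the 0.5 threshold and the token-overlap scoring are all assumptions chosen for illustration. It folds common spelling variants, compares the set of words in the user's question with each stored question, and returns the answer for the best match.

```python
import re

DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
# Fold alef variants to bare alef and alef maksura to yeh.
FOLD = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627",
                      "\u0622": "\u0627", "\u0649": "\u064A"})

# Hypothetical hotel FAQ: stored question -> answer.
FAQ = {
    # "What time is check-in?" -> "Check-in starts at 3 pm."
    "ما هو موعد تسجيل الوصول": "تسجيل الوصول يبدأ الساعة 3 مساءً.",
    # "Is there a car park?" -> "Yes, free parking is available for guests."
    "هل يوجد موقف سيارات": "نعم، يتوفر موقف سيارات مجاني للنزلاء.",
    # "When is breakfast served?" -> "Breakfast is served from 7 to 10 am."
    "متى يقدم الافطار": "يقدم الإفطار من الساعة 7 حتى 10 صباحاً.",
}
FALLBACK = "عذراً، لم أفهم سؤالك."  # "Sorry, I did not understand your question."


def tokens(text):
    """Normalize the text and return its set of Arabic words."""
    text = DIACRITICS.sub("", text).translate(FOLD)
    return set(re.findall("[\u0621-\u064A]+", text))


def answer(question, threshold=0.5):
    """Return the answer whose stored question overlaps most with the input."""
    asked = tokens(question)
    best_score, best_reply = 0.0, FALLBACK
    for known, reply in FAQ.items():
        known_tokens = tokens(known)
        union = asked | known_tokens
        # Jaccard similarity: shared words divided by all distinct words.
        score = len(asked & known_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_score, best_reply = score, reply
    return best_reply if best_score >= threshold else FALLBACK


print(answer("متى يقدم الإفطار؟"))       # breakfast question, hamza spelling
print(answer("هل لديكم موقف سيارات؟"))   # parking question, different wording
print(answer("أين أقرب مطار؟"))          # not covered by the FAQ
```

Word overlap is a deliberately crude measure. A trained NLU model, as used by platforms like Botpress, copes far better with paraphrases and dialect variation.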


Conclusion

There are a number of excellent natural language tools and conversational AI platforms available to create chatbots that can converse in Arabic, with the accuracy and technology of Arabic natural language understanding improving day by day.

However, there are still challenges in creating and maintaining Arabic chatbots. This is compounded by a skills shortage of Arabic speakers in the AI world who have experience in creating chatbots in multiple languages and dialects and designing conversations in these languages whilst taking each nuance of a specific language into account.

Natural Language Processing (NLP) is a challenging field, and it feels like some of the major players in this space need to step up their game. Google Dialogflow and Amazon Lex are conspicuous in their lack of Arabic support.

Of course, even though Arabic NLU has improved significantly, there is always room for improvement. The NLU engines are improving all the time, and further breakthroughs are undoubtedly on the way. There will always be work to do until NLU reaches anywhere near human levels.

About The Bot Forge

Consistently named as one of the top-ranked AI companies in the UK, The Bot Forge is a UK-based agency that specialises in chatbot & voice assistant design, development and optimisation.

If you'd like a no-obligation chat to discuss your project with one of our team, please book a free consultation.