In a post last January, I termed 2017 the year of the banking bots, and chatbots, virtual assistants, and robo-advisors did get their fair share of the spotlight. Throughout the year, chatbots and their use cases occupied a good deal of attention, and senior business leaders took interest and action to make NLP-based intelligent systems part of their ecosystems. Again and again, I was asked about the basics of the technology and how it works. Here is a small primer to help anyone understand the nuances.
Natural Language Processing
This science exists because there is a need to translate interactions between computers and human (natural) languages. It is in other words the science of creating structured approaches to a multi-step process that translates and extracts text based information into formats that are understandable and computable by machines.
Within Natural Language Processing the two essential parts or components are
- Text (Written Language)
- Speech (Spoken Language)
Natural Language Processing is distinctly divided into two parts:
Natural Language Understanding (NLU) – This is how an intelligent system is made to understand natural language input, via text or speech. It involves
- Mapping [the given natural language input into useful representations]
- Analysis [of the different aspects of the language]
Lexical Analysis involves identifying and analyzing the structure of words and dividing the whole chunk of text into paragraphs, sentences, and words.
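A minimal sketch of the lexical-analysis step, using simple regular expressions to split raw text into paragraphs, sentences, and words. Real tokenizers handle abbreviations, quotes, and punctuation far more carefully; the splitting rules here are simplifying assumptions.

```python
import re

def lexical_analysis(text):
    """Split raw text into paragraphs, then sentences, then words.

    Naive assumptions: blank lines separate paragraphs, and '.', '!',
    '?' always end a sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    result = []
    for para in paragraphs:
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", para)
                     if s.strip()]
        # Keep only alphabetic word tokens for each sentence.
        result.append([re.findall(r"[A-Za-z']+", s) for s in sentences])
    return result

tokens = lexical_analysis("The boy goes to school. He likes it.\n\nSchool is fun.")
# tokens[0] holds the two sentences of the first paragraph as word lists.
```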
Syntactic Analysis (Parsing) involves analyzing the words in a sentence for grammar and arranging them in a manner that shows the relationships among them. A sentence such as “The school goes to boy” is rejected by an English syntactic analyzer.
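The rejection above can be illustrated with a toy recursive-descent check over a four-rule grammar. The lexicon, and the set of nouns allowed to appear “bare” without a determiner (as in “go to school”), are illustrative assumptions, not a real English grammar.

```python
# Toy lexicon: word -> part-of-speech tag (an assumption for this sketch).
LEXICON = {"the": "DET", "boy": "N", "school": "N", "goes": "V", "to": "P"}
BARE_NOUNS = {"school"}  # institution nouns usable without a determiner

def parse_np(words):
    """Consume a noun phrase; return remaining words, or None on failure."""
    if (len(words) > 1 and LEXICON.get(words[0]) == "DET"
            and LEXICON.get(words[1]) == "N"):
        return words[2:]           # NP -> DET N
    if words and words[0] in BARE_NOUNS:
        return words[1:]           # NP -> bare institution noun
    return None

def is_grammatical(sentence):
    """Check the pattern S -> NP V P NP, left to right."""
    words = sentence.lower().rstrip(".").split()
    rest = parse_np(words)
    if rest is None or len(rest) < 2:
        return False
    if LEXICON.get(rest[0]) != "V" or LEXICON.get(rest[1]) != "P":
        return False
    return parse_np(rest[2:]) == []

print(is_grammatical("The boy goes to school."))  # True
print(is_grammatical("The school goes to boy."))  # False: "boy" needs a determiner
```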
Semantic Analysis draws the exact, dictionary meaning from the text and checks it for meaningfulness. This is done by mapping syntactic structures to objects in the task domain. The semantic analyzer disregards sentences such as “hot ice-cream”.
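A toy version of that meaningfulness check: a hand-written table of attributes that are incompatible with certain nouns. The table itself is an illustrative assumption; real systems would draw on ontologies or learned semantic representations.

```python
# Map each noun to adjectives that contradict its meaning (an assumption
# for this sketch, standing in for real task-domain knowledge).
INCOMPATIBLE = {
    "ice-cream": {"hot"},
    "fire": {"cold"},
}

def is_meaningful(adjective, noun):
    """Reject adjective-noun pairs the domain knowledge rules out."""
    return adjective not in INCOMPATIBLE.get(noun, set())

print(is_meaningful("hot", "ice-cream"))   # False: disregarded as meaningless
print(is_meaningful("cold", "ice-cream"))  # True
```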
Discourse Integration − The meaning of any sentence depends upon the meaning of the sentence just before it; it can also shape the meaning of the sentence that immediately follows.
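One small piece of discourse integration is pronoun resolution: deciding what “it” in one sentence refers to in the sentence before. The sketch below uses a naive recency heuristic and a hard-coded noun list (both assumptions standing in for a real tagger and coreference model).

```python
# Hard-coded noun list, an assumption standing in for a POS tagger.
NOUNS = {"book", "boy", "school", "table"}

def antecedent_of_it(previous_sentence):
    """Return the most recent noun of the previous sentence as the
    referent for 'it' (a naive recency heuristic)."""
    nouns = [w.strip(".,").lower() for w in previous_sentence.split()
             if w.strip(".,").lower() in NOUNS]
    return nouns[-1] if nouns else None

print(antecedent_of_it("The boy picked up the book."))  # "book"
```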
Pragmatic Analysis − Here, what was said is re-interpreted in terms of what was actually meant. It involves deriving those aspects of language which require real-world knowledge.
Natural Language Generation (NLG) – This is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation. This involves
- Text planning [retrieving the relevant content from knowledge base]
- Sentence planning [choosing required words, forming meaningful phrases, setting tone of the sentence]
- Text Realization [mapping the sentence plan into sentence structure]
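The three NLG steps above can be sketched as a tiny pipeline over an in-memory “knowledge base”. The facts, templates, and function names are illustrative assumptions, not a real NLG system.

```python
# Illustrative knowledge base (an assumption for this sketch).
KNOWLEDGE_BASE = {
    "balance": {"account": "savings", "amount": 120.50, "currency": "USD"},
}

def text_planning(goal):
    # Retrieve the relevant content from the knowledge base.
    return KNOWLEDGE_BASE[goal]

def sentence_planning(content):
    # Choose the required words and form the phrase order.
    return ["your", content["account"], "account", "balance", "is",
            f"{content['amount']:.2f}", content["currency"]]

def text_realization(plan):
    # Map the sentence plan into a surface sentence.
    sentence = " ".join(plan)
    return sentence[0].upper() + sentence[1:] + "."

print(text_realization(sentence_planning(text_planning("balance"))))
# Your savings account balance is 120.50 USD.
```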
Discourse Generation is the process whose input is the communication goal and whose output is the discourse, often in the form of a content tree.
Sentence Planning involves surface realization or linearization according to grammar.
Lexical Choice involves choosing the content words (nouns, verbs, adjectives, and adverbs) in a generated text.
Sentence Structuring is the process of creating the sentence text, which should be correct according to the rules of syntax.
Morphological Generation is the final structuring, and may involve corrections such as fixing tense or gender discrepancies in context of the entity, situation etc. For example, using will be for the future tense of to be.
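The will be example can be sketched as a toy morphological-generation step that inflects “be” for tense and number. The inflection table is an illustrative fragment, not a full morphology.

```python
# Partial inflection table for "to be" (an illustrative assumption).
BE_FORMS = {
    ("present", "singular"): "is",
    ("present", "plural"): "are",
    ("past", "singular"): "was",
    ("past", "plural"): "were",
}

def inflect_be(tense, number):
    """Pick the surface form of 'to be' for a tense/number pair."""
    if tense == "future":
        return "will be"  # English forms the future with the auxiliary "will"
    return BE_FORMS[(tense, number)]

print(inflect_be("future", "singular"))  # will be
print(inflect_be("past", "plural"))      # were
```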
While chatbots rely heavily on training and a data corpus, advanced institutions are feeding in documents and textual data to build a corpus on which to train their chatbots. In data corpus creation, NLP plays a huge role in enabling text analytics. You can read more about it in my next post – Reference Architecture for NLP based Text Analytics.
This post is compiled from publicly available sources on the internet and is not biased towards any organization.