A series of articles for and by deep learning and NLP enthusiasts
The Language Modeling Path
We are living in exciting times for Natural Language Processing (NLP) and Deep Learning. Neural network language models are spreading like never before, and top players like Google and OpenAI are investing huge efforts in the development of ever larger and more powerful models. The ultimate goal, you ask? To capture the essence of human language, whatever that may mean. Semantics, syntax, grammar, entities: all these kinds of information and patterns can be extracted from text using NLP and exploited to perform most of the AI magic that keeps generating huge hype on social networks. But what's behind it? How is the GPT-3 model able to generate text almost indistinguishable from that of a human writer? How can modern chatbots fool users into thinking a real person is answering their questions, and even make puns?
Since I started working at Machine Learning Reply, many projects have required the application of such models:
- In semantic search engines (for example in e-commerce), where these models enable more flexible and accurate search, retrieving more relevant results.
- In text embedding, to improve the performance of any text-based model by starting from pre-trained semantic representations.
- In document retrieval, for example to automatically answer questions about how to use a platform by returning the appropriate section of the documentation.
- In natural language classification, for example in automatic ticket routing.
- In chatbot development, where these models achieve a higher level of comprehension and context awareness.
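To give a flavor of the first two use cases, here is a minimal sketch of embedding-based semantic search. The document texts, the query, and the 4-dimensional vectors below are hypothetical stand-ins; in a real system the embeddings would come from a pre-trained language model, and the documents would number in the millions:

```python
import numpy as np

# Hypothetical product catalogue with toy embedding vectors.
# Documents with similar meaning get similar vectors, even when
# they share no keywords with the query.
docs = ["red running shoes", "wireless headphones", "trail running sneakers"]
doc_vecs = np.array([
    [0.9, 0.1, 0.0, 0.3],   # red running shoes
    [0.0, 0.8, 0.6, 0.1],   # wireless headphones
    [0.8, 0.0, 0.1, 0.4],   # trail running sneakers
])

# Toy embedding of the query "jogging footwear" -- note it shares
# no words with any document, yet lies close to the footwear items.
query_vec = np.array([0.8, 0.0, 0.1, 0.4])

def cosine(a, b):
    # Cosine similarity: dot product of the two vectors divided by
    # the product of their norms.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by similarity to the query, most similar first.
scores = [cosine(query_vec, v) for v in doc_vecs]
ranking = sorted(zip(docs, scores), key=lambda pair: -pair[1])
print(ranking[0][0])  # prints "trail running sneakers"
```

This is exactly what keyword search cannot do: the query and the best match have no words in common, but their semantic representations are close, so the relevant item is still retrieved.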
I have always been amazed by the ability of these kinds of models to process natural language, and I wanted to know exactly what was behind them. For several years I dived into papers and lectures, and now I think I have at least a good grasp of the key concepts that marked the evolution of these models up to the current state of the art.
In this collection of articles, I will try to give you all the basic information on the extraordinary insights that top researchers in data labs and universities had in order to bring such beautiful AI magic into existence. I am not a researcher myself, so I am not claiming to be the ultimate expert on these particular models. But as a mathematician with a background in engineering and NLP, and with five years of machine learning experience, I love keeping up to date with every development in this outstanding field of research, and I'd love to help spread it by sharing what I have learned in a way that is more accessible to the average deep learning amateur than reading tons of papers.
On the other hand, the Language Modeling Path could include the whole history of NLP and Deep Learning, which could itself be traced back to the fundamentals of statistics, and so on until we ended up covering all of human history. So I needed to pick an arbitrary starting point for this story. I made my choice by looking for the simplest topic on my checklist that does not already have a huge amount of online educational material. I hope my perception was correct and that you won't have any trouble following the explanations or finding the appropriate material to fill the gaps. In any case, I will always provide links for the required background knowledge: if you are not familiar with some concept, go check the material and then come back to the article. And remember, you are not alone. I will always leave my contact details in case you need additional explanations or have any doubts.
If you have managed to read this far, I am already grateful for your patience, so without further delay let's start this journey. Fasten your seatbelts, please!
This series of articles grew out of the intent to share the experience, knowledge, and skills gained by the Machine Learning Reply data scientists and data engineers. If you want to know more about what we do and how we build our solutions, you can find it here.