A Glimpse into ChatGPT Through a Conversation with ChatGPT - Machine Learning Strategy
The “Machine Learning Strategy” series is a collection of articles on the subjects of Machine Learning, Neural Networks and Artificial Intelligence. In this series, I will share my personal experiences and insights as I learn and understand these topics. The articles will cover aspects such as the tools and techniques used in ML, the programming concepts involved, and even the psychological impact of human interaction with AI.
The aim is to provide both a practical and theoretical understanding of Machine Learning: exploring its applications, as well as its underlying implementation, limitations, and potential areas for improvement. This is the two-fold meaning of the “strategy” in the series title: one path is about how to use Machine Learning, and the other is about advancing the technology itself.
ChatGPT went viral almost as soon as it was released. One month ago, I encountered it and tested its abilities.
Compared to accessing GPT-3 through its API, ChatGPT offers an easier and smarter way to interact: you simply have a dialogue with the AI as you would in ordinary IM software, with no setup required after account registration.
I had heard about ChatGPT and GPT-3, but little about the technology behind them, so I decided to conduct an interview and let ChatGPT explain itself.
Is It Based on GPT-3? - Overview of the ChatGPT Model
To start the conversation, I asked ChatGPT if it was based on GPT-3. It replied that its architecture and training data are entirely distinct from GPT-3’s.
ChatGPT is a deep neural network model that doesn’t have a specific name but belongs to a larger system called “Assistant”. According to ChatGPT, the “Assistant” system can handle many different types of tasks, including the language model that ChatGPT itself represents, as well as speech recognition, image processing, and more.
Details of the Language Model
To understand ChatGPT better, I asked about the technical details of its language model.
It explained that it is a variant of the Transformer architecture, a neural network designed for natural language processing (NLP) tasks. The training objective is literally to predict the next word in a sequence given the previous words, an approach known as causal (autoregressive) language modeling. (A related objective, masked language modeling, instead predicts masked-out words within a sequence, as in BERT.)
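To make the idea concrete, here is a minimal sketch of next-word prediction with greedy decoding. Note that `model` here is just a hypothetical placeholder returning random scores; in ChatGPT it would be the trained Transformer.

```python
import numpy as np

VOCAB = ["I", "like", "machine", "learning", "<eos>"]

def model(tokens):
    # Hypothetical stand-in for a trained Transformer: returns a score
    # (logit) for every vocabulary word, given the tokens seen so far.
    rng = np.random.default_rng(len(tokens))  # deterministic toy scores
    return rng.normal(size=len(VOCAB))

def generate(prompt, max_new_tokens=5):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = model(tokens)
        next_word = VOCAB[int(np.argmax(logits))]  # greedy: pick the top word
        if next_word == "<eos>":                   # stop at end-of-sequence
            break
        tokens.append(next_word)
    return tokens

print(generate(["I"]))
```

Each generated word is appended to the input and fed back into the model, which is what makes the process autoregressive.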
ChatGPT is also based on self-attention, a mechanism that allows the model to weigh the importance of each word in an input sequence relative to the others.
Comparison to the Long Short-Term Memory Model
After reading ChatGPT’s explanation, I realized that the approach sounded quite similar to the LSTM (Long Short-Term Memory) model, and we discussed this topic over several questions.
The main difference lies in how the input information is preserved: an LSTM uses a memory cell that is updated step by step, while ChatGPT uses self-attention, as sketched below.
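For comparison, here is a minimal NumPy sketch of a single LSTM cell step; the weights are random placeholders rather than trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # One LSTM time step: the memory cell c carries information forward.
    z = W @ x + U @ h_prev + b     # joint pre-activations
    f, i, o, g = np.split(z, 4)    # forget, input, output gates + candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # update memory cell
    h = sigmoid(o) * np.tanh(c)    # expose filtered memory as the output
    return h, c

d = 4                              # toy hidden size
rng = np.random.default_rng(0)
W, U = rng.normal(size=(4 * d, d)), rng.normal(size=(4 * d, d))
b = np.zeros(4 * d)
h, c = np.zeros(d), np.zeros(d)
for x in rng.normal(size=(3, d)):  # process a 3-step input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```

The key point is that everything the LSTM remembers about earlier words must squeeze through the fixed-size cell state `c`, one step at a time.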
Self-attention computes a weight for the importance of each word in the input sequence. I won’t walk through the full derivation here, since I find it unintuitive for an introductory article, but you can search for the term “Q, K, V”, the query, key, and value vectors used in the self-attention calculation.
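Still, for the curious, a minimal NumPy sketch of scaled dot-product self-attention may help; the projection matrices are random placeholders for what a trained model would learn.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: one row per word (sequence length x embedding dim).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # project into Q, K, V
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # scaled dot-product scores
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d = 3, 4                                   # 3 words, 4-dim embeddings
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)          # (3, 4): one vector per word
```

Unlike the LSTM above, every word can attend directly to every other word in one step, with no fixed-size bottleneck in between.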
Languages - The Training Data
Around 2020, I built an NLP project focused on Chinese using the BERT architecture. The pretrained model I used was also multilingual (the bert-base-multilingual-uncased model).
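For reference, here is a sketch of loading that multilingual checkpoint with the Hugging Face transformers library (assuming transformers and PyTorch are installed):

```python
from transformers import AutoModel, AutoTokenizer

# Load the multilingual BERT checkpoint, trained on 100+ languages.
name = "bert-base-multilingual-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

# The same tokenizer handles English and Chinese text alike.
inputs = tokenizer("机器学习 is machine learning", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, tokens, hidden_size)
```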
I was curious about the languages used in ChatGPT’s training dataset.
ChatGPT confirmed that the dataset contains a variety of human languages, mostly English, but said it did not have access to specific information about the training data or the languages included.
Then I asked if ChatGPT was trained on multiple languages simultaneously and whether this would improve the results.
ChatGPT explained two ways to train with multiple languages. One way is to train the model on data in multiple languages at the same time, which can be quite useful for tasks like translating between languages. The other is to train a separate model for each language, which can optimize performance on language-specific tasks. Both strategies are sketched below.
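A toy sketch of the two strategies, where `train_model` is a hypothetical placeholder standing in for a full training pipeline:

```python
# Toy corpora; real training data would be vastly larger.
corpora = {
    "en": ["machine learning is fun"],
    "zh": ["机器学习很有趣"],
    "fr": ["l'apprentissage automatique est amusant"],
}

def train_model(sentences):
    # Hypothetical placeholder for a real training loop.
    return {"trained_on": len(sentences)}

# Strategy 1: one joint model over all languages mixed together.
joint_data = [s for sentences in corpora.values() for s in sentences]
joint_model = train_model(joint_data)

# Strategy 2: one specialized model per language.
per_language = {lang: train_model(s) for lang, s in corpora.items()}
```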
The performance will depend on a variety of factors, including the model architecture and training process, the quality and quantity of the training data, and the target task the model is being used for.
It is also important to consider the trade-offs of a multilingual model. For example, it requires a larger dataset and more computational resources, and it may be more complex and harder to interpret than a single-language model.
While writing this paragraph, I realized that ChatGPT had made a deduction I missed during our conversation: since it doesn’t have access to specific information about its training dataset, it inferred the missing details from its general understanding of language models. 😲
Conclusion
ChatGPT provided valuable insights into its own model, and its ability to hold a conversation with users makes it an exciting tool to explore.
In the next article of this series, I will share my experience working together with ChatGPT to solve a programming problem.