Hi, my name is Quan Duong

I'm a random developer who is enthusiastic about machine learning and AI.

About me

I have been working on machine learning and deep learning since 2018. My main interest is natural language processing.
I also have experience solving ML problems on structured data and image data.
Below are some applications I have worked on.

  • Recommender System
  • Document Similarity
  • Image Processing
  • Time Series Analysis
  • Sentiment Analysis
  • Text Generation
  • Image Generation
  • General Classification
  • Synthetic Data Generation
  • Machine Translation

I can also build an end-to-end machine learning pipeline on cloud services such as AWS.
My current stack: Python, PyTorch, scikit-learn, pandas, Django, Flask, React, Fedora.
Finally, I love to do research, write code, write up results, submit papers and never stop learning.

View Resume

Projects

VIP/Churn Prediction

This project was part of my work at my previous employer. The idea is to detect whether a user is a VIP or a churn user.
The given data is the user's monitored actions on the site, such as clicks, login times and money spent.
VIP users are defined by a spending threshold. Churn users are defined by how long they have not logged in to the site.
The point is to predict a user's type in the future based on their current behavior.
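
As a rough illustration of the two label definitions above (not the production code), the sketch below assigns labels from a spending total and a last-login date. The threshold values, the function name and the sample users are all hypothetical; the real thresholds are not stated here.

```python
from datetime import date, timedelta

# Hypothetical thresholds, chosen only for illustration.
VIP_SPEND_THRESHOLD = 1000.0
CHURN_INACTIVE_DAYS = 30


def label_user(total_spent, last_login, today):
    """Return the set of labels ("vip", "churn") that apply to one user."""
    labels = set()
    # VIP: total spending reaches the threshold.
    if total_spent >= VIP_SPEND_THRESHOLD:
        labels.add("vip")
    # Churn: no login for at least the given number of days.
    if (today - last_login).days >= CHURN_INACTIVE_DAYS:
        labels.add("churn")
    return labels


today = date(2021, 6, 1)
users = {
    "alice": (2500.0, today - timedelta(days=2)),
    "bob": (40.0, today - timedelta(days=90)),
    "carol": (10.0, today - timedelta(days=1)),
}
labels = {
    name: label_user(spent, seen, today)
    for name, (spent, seen) in users.items()
}
print(labels)
```

Labels like these would serve as training targets, with the monitored actions (clicks, logins, spending) from an earlier period as the input features.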

See Live Source Code

Similar Articles Recommendation

This project is also from my previous company. Interestingly, the customer is my university.
The idea is to recommend articles that have the same semantic content as a given article.
Every day, the pipeline pulls in new data, retrains automatically and predicts new results.
I developed both the model and the pipeline. We support three languages: English, Finnish and Swedish.
The site is still running strong and you can see the real demo by clicking the button below.
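
To give a feel for the recommendation step, here is a minimal sketch that ranks articles by cosine similarity. It uses plain bag-of-words vectors as a stand-in for a real semantic model, so it is an assumption-laden toy, not the deployed model; the function names and sample articles are made up.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words counts; a simple stand-in for a semantic embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query_id, articles, top_k=2):
    """Return the ids of the top_k articles most similar to the query."""
    query = vectorize(articles[query_id])
    scores = {
        other_id: cosine(query, vectorize(text))
        for other_id, text in articles.items()
        if other_id != query_id
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Made-up sample articles.
articles = {
    "a1": "new machine learning course at the university",
    "a2": "university opens machine learning research lab",
    "a3": "campus cafeteria menu for this week",
}
print(recommend("a1", articles, top_k=1))
```

A daily pipeline would recompute the vectors for new articles and refresh the ranked lists.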

See Live Source Code

Similar Products Recommendation

The idea is very similar to the article recommendation above; however, this model is not only for text.
The product data includes text, images and categories. This means a hybrid model is needed to solve the problem.
My proposed model accepts all of these data types as input and allows the weight of each data type to be adjusted.
For example, if you believe your product images are a more reliable source than the text, you can set the image weight higher.
Furthermore, the pipeline also runs automatically without manual tuning.
This project is suitable for retail customers.
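
The weighting idea can be sketched as a weighted average of per-data-type similarity scores. This is only one simple way to combine them and is an assumption, not necessarily how the actual model does it; the function name, scores and weights are hypothetical.

```python
def combine_similarity(per_modality, weights):
    """Weighted average of per-modality similarity scores (assumed in [0, 1])."""
    total = sum(weights[m] for m in per_modality)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[m] * s for m, s in per_modality.items()) / total


# Hypothetical similarity between two products, one score per data type.
scores = {"text": 0.2, "image": 0.9, "category": 1.0}

balanced = combine_similarity(
    scores, {"text": 1.0, "image": 1.0, "category": 1.0}
)
# Trust the images more than the text by raising the image weight.
image_heavy = combine_similarity(
    scores, {"text": 1.0, "image": 3.0, "category": 1.0}
)
print(balanced, image_heavy)
```

Raising the image weight pulls the combined score toward the image similarity, which is the behavior described in the example above.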

See Live Source Code

OCR Post-correction

This project is for my publication (paper and source code below).
The historical Finnish text corpus has many OCR errors due to the image quality as well as the old forms of words.
Furthermore, Finnish is a morphologically very rich language, in which one word can have many variants or be a compound of different words.
We proposed a fully unsupervised learning method using a Transformer and contextual data to solve the problem.
It is similar to fixing spelling errors, but it also normalizes archaic words to their modern forms.

See Live Source Code

News Trending Detection

This is another publication of mine (still under review).
News along a timeline usually contains valuable information such as special events or topic changes.
Examples are a change in the articles about war or religion (increase or decrease) or a sports event (spike).
The idea is to detect whether a given corpus along the timeline has some special pattern, and when the pattern starts and ends.
To be able to do that, we need annotated datasets. However, such datasets are not available and are very time-consuming to create manually.
We propose a synthetic data generator and an evaluation framework to tackle this task. More details about the paper will be added soon.
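
To illustrate why synthetic data helps here, the toy sketch below generates a timeline with an injected spike whose start and end are known by construction, so it comes with labels for free. It works on daily counts rather than on a text corpus and is not the tool from the paper; every name and parameter is a hypothetical simplification.

```python
import random


def synthetic_timeline(length, start, end, base=20, boost=30, seed=0):
    """Return (counts, labels): noisy daily counts with a burst in [start, end)."""
    rng = random.Random(seed)
    counts, labels = [], []
    for day in range(length):
        in_pattern = start <= day < end
        # Baseline level plus an extra boost while the pattern is active.
        level = base + (boost if in_pattern else 0)
        counts.append(max(0, level + rng.randint(-3, 3)))
        # Ground-truth annotation: 1 inside the pattern, 0 outside.
        labels.append(1 if in_pattern else 0)
    return counts, labels


counts, labels = synthetic_timeline(length=30, start=10, end=15)
print(counts)
print(labels)
```

A detector can then be scored by comparing its predicted start and end against the known labels.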

See Live Source Code

Auto Music Composition

This is part of my master's thesis.
The idea is not to have AI generate a whole song, but to use AI as an assistant tool for humans composing music.
The data is symbolic pop music collected in MIDI format.
Symbolic music has characteristics very similar to those of language.
So, I propose using different language models such as BERT and Transformer-XL to solve the problem.
The final tool will be able to do three tasks: generate a new piece from previous music pieces, generate harmony for a given melody, and fill in blank notes.

See Live Source Code

Death Psalm

This is a project for the hackathon held by the Digital Humanities department.
We use data science techniques to discover what happened behind the death of Bishop Henry, the national saint of Finland.
The data we have is a text corpus containing stories and poems about the murder case.
The topic is quite unfamiliar to me, as I'm neither Finnish nor a historical researcher.
What I can help with here is exploring the data to find relationships between the texts.

See Live Source Code

Creative Chatbot

This is my hobby project. The idea is to create a chatbot that can answer categorized questions in creative ways.
The chatbot is a hybrid of a retrieval model (domain-specific) and a creative model (general conversation).
The work is still at a very early stage; I will update the progress here soon.

See Live Source Code

Contact

quan.duong.vn(at)gmail.com
