Deep Learning Projects

Deep Transformer Soft Actor-Critic Network for Reinforcement Learning
- Utilize a Transformer as the memory module for both Actor and Policy networks
- Hyperparameter tuning for SAC performance

Sentiment Analysis on MyAnimeList User Ratings
- MyAnimeList is a popular anime rating website.
- Predict a user's rating from their review using a Recurrent Neural Network (RNN)
- Set up a data-mining pipeline using a self-hosted REST API with a Redis server for caching inside a Dockerized container
- Trained several models (RNN with LSTM, CNN, CNN with Word2Vec embedding layers) and stacked them into an ensemble. Achieved 94% validation accuracy with the ensemble model

Classification of Extended MNIST using Persistent Homology
- The EMNIST dataset extends the MNIST (handwritten digit) dataset with handwritten characters
- Applied Persistent Homology and Principal Component Analysis to reduce the dimensionality of the dataset. Reduced feature size from 784 (28x28) to 35 while retaining 99% of the variance
- Used the giotto-ai library alongside the standard libraries sklearn, NumPy, and TensorFlow/Keras
- Achieved 97% training and 91% testing accuracy with a shallow neural network with only 3 hidden layers

Utilization of CNN in speech recognition
- Classify the Google Speech Commands dataset using a Convolutional Neural Network (CNN) on audio data
- Created pipelines to convert audio data into image features
- Audio data augmentation with respect to image features
- Trained multiple CNN architectures (LeNet, MiniGoogleNet, AlexNet) and stacked them into an ensemble. Achieved 91% validation accuracy with the ensemble model.
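The PCA step in the EMNIST project keeps just enough components to retain 99% of the variance. As a rough illustration of how that count can be chosen, here is a minimal sketch in plain Python. The function name `components_for_variance` and the toy eigenvalues are assumptions made for illustration, not the project's code; in practice scikit-learn's `PCA(n_components=0.99)` performs this selection for you.

```python
def components_for_variance(eigenvalues, threshold=0.99):
    """Return the smallest number of principal components whose
    cumulative explained-variance ratio reaches `threshold`.

    `eigenvalues` are the covariance-matrix eigenvalues (the variance
    captured by each principal component), in any order.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")

    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    # Floating-point rounding can leave the ratio just under 1.0.
    return len(ordered)


# Toy spectrum (hypothetical numbers, not the EMNIST eigenvalues).
toy_eigenvalues = [50.0, 30.0, 15.0, 4.5, 0.4, 0.1]
n_kept = components_for_variance(toy_eigenvalues, threshold=0.99)
```

With the toy spectrum, the first four components already carry 99.5% of the variance, so the last two can be dropped.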

2 min · 214 words · Minh Nguyen

Data Analysis Projects

Analysis of ProtonDB Linux Distribution
- Analyze trends in distribution market share in the gaming segment, based on ProtonDB user reports.
- Visuals demonstrating the impact of the Steam Deck release on Linux distribution market share.

Spotify API Audio Feature Analysis
- Predict a track’s attributes from audio data, and reverse engineer/analyze the audio features of the Spotify API.
- A (close to) comprehensive analysis of Spotify API Audio Features.
- Convert data-mined audio samples to an image representation of the audio data.
- Use the image representation to predict Spotify audio features.
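The Spotify project converts audio samples into an image representation. The post summary does not say which representation is used; a common choice is a magnitude spectrogram, so the sketch below assumes that. The function name `spectrogram`, the frame and hop sizes, and the test tone are all hypothetical. It uses a naive DFT from the standard library to stay dependency-free; a real pipeline would more likely use an FFT from NumPy or an audio library.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a 2D list: one row per time frame, one column per
    frequency bin (0 .. frame_size // 2), holding DFT magnitudes."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    image = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        row = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT of the windowed frame at frequency bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            row.append(abs(acc))
        image.append(row)
    return image


# Toy input: a pure tone that completes 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
image = spectrogram(tone)
```

Each row of `image` can be treated as one column of pixels; for the toy tone, the brightest bin in every frame is bin 8.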

1 min · 80 words · Minh Nguyen

MLOps/Data Science DevOps Projects

Jupyter Notebook Docker with Spark and DeltaLake support
- Attempts to replicate Databricks Runtime, plus features from the feature-rich jupyter/docker-stacks.
- Based on NVIDIA’s rapidsai/rapidsai image.
- Support for Spark/PySpark 3.2.x and Delta Lake 1.1.0.
- Monthly cron job updates the image with the latest features from upstream jupyter/docker-stacks
- CI/CD automates building the image and pushing it to DockerHub and ghcr.io

Docker container for Data Science
- Based on the Jupyter docker-stacks image jupyter/datascience-notebook

1 min · 65 words · Minh Nguyen

Games Reverse Engineering and Data Mining Projects

Date A Live: Spirit Pledge Game Analysis
- Assets Decryption Tool: Reverse engineer the mobile game Date A Live: Spirit Pledge using the NSA's static analysis tool Ghidra and the dynamic analysis tool Frida. Re-implement the decryption functions in Python, and implement methods to convert PowerVR and Ericsson Texture Compression formats to standard image formats (JPEG/PNG)
- Assets Mining CI/CD:
  - Data-mined the source logic to find insecure APIs/servers that allow easy download and extraction of new game content.
  - Data-mining repository that uses the decryption tool developed above. Uses a cron job and GitHub Actions to automate fetching, decrypting, and data-mining new content.
- Usable mined data examples:
  - Extract Live2D assets compatible with Live2DViewerEX: Link
  - Dating Route and Favorites: Link

Other games reverse engineering/analysis
- Azur Lane (Autopatcher): Reverse engineer the Azur Lane game client and edit (patch) the game logic automatically.
- Arknights Assets Decryption: Decrypt Arknights game assets by extracting the AES encryption key via dynamic analysis (Frida).

1 min · 141 words · Minh Nguyen

Self-hosting Projects

Vaultwarden on Cloudflare
A turn-key deployment for self-hosting Bitwarden using Cloudflare Tunnel. This is very useful for people who want to self-host Bitwarden but don’t have a static IP address. With the recent attacks on LastPass and other password manager providers, it’s time to take control of your own data.

WandB self-hosting license generator
For educational purposes only. Supports generating a license for a self-hosted WandB server.

Docker Compose for Docker-OSX
Quick docker-compose deployment to run macOS in a Docker environment for security research. ...

1 min · 81 words · Minh Nguyen

Miscellaneous Small Projects

These repos contain all of my personal code and guides for personal setups. Most scripts work with all common consumer distros (Debian/Ubuntu, Arch, maybe RHEL-based, Fedora for some).
- Library Genesis Torrent Scrapper: Scrapes torrents that need seeding for the Library Genesis Project, for preservation. Not intended for piracy.
- Jpopsuki Torrent Scrapper: Scrapes small torrents for hoarding seed points on the private music tracker Jpopsuki. Not intended for piracy.
- pwned password checker: Checks exported Bitwarden passwords against the haveibeenpwned.com API.
- ReVanced Build Action: For educational purposes only. Supports building ReVanced, a modded YouTube app for Android, with a single click using GitHub Actions.
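For the pwned password checker, a lookup against the Have I Been Pwned "Pwned Passwords" range endpoint can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the function names and the `User-Agent` string are hypothetical. The endpoint uses a k-anonymity scheme, so only the first five hex characters of the SHA-1 hash leave the machine, and the full password is never sent.

```python
import hashlib
from urllib.request import Request, urlopen


def split_sha1(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest:
    the 5-character prefix is sent to the API, the rest stays local."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix, range_body):
    """Find `suffix` in a range response made of 'SUFFIX:COUNT' lines.
    Returns the breach count, or 0 if the suffix is absent."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password):
    """Query the range endpoint (network access required)."""
    prefix, suffix = split_sha1(password)
    request = Request(
        f"https://api.pwnedpasswords.com/range/{prefix}",
        headers={"User-Agent": "pwned-check-sketch"},  # hypothetical name
    )
    with urlopen(request) as response:
        return count_in_range(suffix, response.read().decode("utf-8"))
```

To check an export, call `pwned_count` on each exported password and flag any entry with a non-zero count.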

1 min · 100 words · Minh Nguyen

Shopify Fall 2022 Data Science Intern Challenge

Download Notebook

Note: All graphs and plots are interactive. Feel free to zoom, pan, and edit the graphs for more granular details.

Question 1

Part A

    import pandas as pd
    import plotly.express as px

    px.defaults.width = 600
    px.defaults.height = 400

A quick view (first 5 rows) of the data

    data = pd.read_csv("https://docs.google.com/spreadsheets/d/16i38oonuX1y1g7C_UAmiK9GkY7cS-64DfiDMNiR41LM/edit#gid=0".replace('/edit#gid=', '/export?format=csv&gid='))
    data.head()

...
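The `read_csv` call in the excerpt rewrites a Google Sheets "edit" link into a CSV export link before loading it. The string transformation on its own looks roughly like this; the helper name `sheet_csv_url` is an assumption for illustration and does not appear in the notebook.

```python
def sheet_csv_url(edit_url):
    """Turn a Google Sheets '/edit#gid=N' link into its CSV export link."""
    marker = "/edit#gid="
    if marker not in edit_url:
        raise ValueError("expected a Google Sheets URL containing '/edit#gid='")
    return edit_url.replace(marker, "/export?format=csv&gid=")


edit_url = (
    "https://docs.google.com/spreadsheets/d/"
    "16i38oonuX1y1g7C_UAmiK9GkY7cS-64DfiDMNiR41LM/edit#gid=0"
)
csv_url = sheet_csv_url(edit_url)
```

The resulting `csv_url` can be passed straight to `pd.read_csv`, which accepts URLs as well as file paths.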

5 min · 957 words · Minh Nguyen