Projects

AI Training Data Pipeline & Automated Web Scraping

End-to-end ownership of data collection and model deployment for the company's core AI models — from compliant scraping and data cleaning all the way to running PyTorch Lite on iOS devices.

Respects robots.txt and prioritizes official APIs; analyzes dynamic rendering and anti-bot mechanisms, switching between Requests and Selenium as needed
BeautifulSoup HTML parsing + precision RegEx extraction to guarantee clean, format-specific data (e.g. .wav only)
Low-level Objective-C integration deploys the PyTorch Lite model on iOS, with strict audio format checks to prevent inference errors

Python, Selenium, BeautifulSoup, RegEx, PyTorch Lite, iOS, Objective-C
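
The compliance check and format-specific extraction described in this project can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the robots.txt content, the `demo-bot` user agent, the example.com URLs, and the helper names `is_allowed` and `extract_wav_links` are all assumptions, and a plain regular expression stands in for the BeautifulSoup parsing step.

```python
import re
from urllib import robotparser
from urllib.parse import urljoin

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Matches href attributes whose target ends in .wav (case-insensitive).
WAV_HREF = re.compile(r'href=["\']([^"\']+\.wav)["\']', re.IGNORECASE)


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def extract_wav_links(html, base_url):
    """Return absolute URLs for every .wav link found in the HTML text."""
    return [urljoin(base_url, href) for href in WAV_HREF.findall(html)]


if __name__ == "__main__":
    page = '<a href="/audio/a.wav">a</a> <a href="/audio/b.mp3">b</a>'
    if is_allowed(ROBOTS_TXT, "demo-bot", "https://example.com/audio/"):
        print(extract_wav_links(page, "https://example.com/audio/"))
```

Checking robots.txt before any request keeps the compliance decision in one place, and filtering by extension at extraction time means non-.wav files never enter the dataset.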

Dataset Automation Pipeline

Built an enterprise-grade training data management platform from scratch to replace scattered audio files and manual workflows. Non-technical team members can now generate model-ready datasets in one click.

Enterprise NAS + PostgreSQL centralize audio files and metadata, resolving scattered storage and permission-control problems
PyQt GUI automatically pulls files from DB and NAS on login — zero technical knowledge required to operate
Fully automated batch pipeline: audio splitting, alignment, augmentation (mix & synthesis), and metadata generation

Python, PyQt, PostgreSQL, Enterprise NAS, PyTorch, RegEx
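
As a rough illustration of the splitting and metadata-generation steps in the batch pipeline, the sketch below cuts a sequence of samples into fixed-length segments and emits one metadata record per segment. The function name `split_with_metadata`, the record fields, and the fixed-length strategy are assumptions made for the example; the real pipeline's splitting rules, alignment, and augmentation are not shown.

```python
def split_with_metadata(samples, sample_rate, segment_seconds, source_name):
    """Cut a sample sequence into fixed-length segments and describe each one."""
    segment_len = int(sample_rate * segment_seconds)
    if segment_len <= 0:
        raise ValueError("segment length must be at least one sample")

    records = []
    for index, start in enumerate(range(0, len(samples), segment_len)):
        chunk = samples[start:start + segment_len]
        records.append({
            # Hypothetical naming scheme: <source>_<zero-padded index>.
            "segment_id": f"{source_name}_{index:04d}",
            "start_sec": start / sample_rate,
            "duration_sec": len(chunk) / sample_rate,
            "samples": chunk,
        })
    return records


if __name__ == "__main__":
    # Ten samples at 4 Hz, split into one-second segments.
    for record in split_with_metadata(list(range(10)), 4, 1.0, "clip"):
        print(record["segment_id"], record["start_sec"], record["duration_sec"])
```

The final segment is kept even when it is shorter than the others; a real pipeline might pad or drop it instead.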

Hybrid Cloud AI Automated Inference System

A fully automated AI inference workflow: scheduled cloud data ingestion, local PyTorch inference with memory optimization, and automatic real-time dashboard rendering — zero manual intervention.

Python scripts pull data from cloud infrastructure on a schedule, clean and transform it, then feed it into inference
Local PyTorch model inference with tight memory monitoring and optimization for stable high-volume processing
Inference results are written to the database automatically, and the backend pushes them to a real-time frontend dashboard

Python, PyTorch, PostgreSQL, Vue.js, FastAPI, Cloud Scheduler
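
One common way to keep memory bounded during high-volume inference is to process the input in fixed-size batches and record peak usage. The sketch below shows that idea with the standard library only. It is an assumption-laden stand-in, not the project's code: `fake_model` replaces the PyTorch model, and `tracemalloc` only tracks Python-level allocations, so it would not capture tensor memory in a real PyTorch run.

```python
import tracemalloc


def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_inference(items, model_fn, batch_size):
    """Run model_fn batch by batch and report peak traced memory in bytes."""
    tracemalloc.start()
    results = []
    for batch in iter_batches(items, batch_size):
        results.extend(model_fn(batch))
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return results, peak_bytes


def fake_model(batch):
    """Hypothetical stand-in for a model: doubles every input value."""
    return [value * 2 for value in batch]


if __name__ == "__main__":
    outputs, peak = run_inference(list(range(10)), fake_model, batch_size=4)
    print(outputs, peak)
```

Only one batch of inputs is handed to the model at a time, so the batch size becomes the main knob for trading throughput against memory.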

End-to-End Cloud-Native Full-Stack Platform

Led the design and build of a cloud-native system ingesting high-frequency data from iOS/Android apps and Edge Devices. Pure Serverless handles scale; full CI/CD handles delivery.

AWS Serverless: API Gateway + Lambda absorb device bursts automatically; data is persisted to DynamoDB + S3
CI/CD: GitHub Actions automates build and test; Lambda functions deploy automatically, and the Vue.js frontend is published to S3/CloudFront
Cross-platform data alignment: iOS development experience guided a standardized data format and stable transmission across platforms

AWS, Lambda, API Gateway, DynamoDB, S3, CloudFront, GitHub Actions, Vue.js, iOS, Flutter
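
The ingestion path (API Gateway invoking Lambda, which validates and persists a device record) might look something like the sketch below. It assumes an API Gateway proxy integration, where the request body arrives as a JSON string in `event["body"]`. The field names in `REQUIRED_FIELDS`, the `make_handler` factory, and the injected `store` callable are hypothetical; in a real deployment `store` would write to DynamoDB or S3.

```python
import json

# Hypothetical schema for an incoming device record.
REQUIRED_FIELDS = ("device_id", "timestamp", "payload")


def _response(status_code, body):
    """Build a response in the shape expected by an API Gateway proxy integration."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def make_handler(store):
    """Create a Lambda-style handler that validates a record and passes it to store."""
    def handler(event, context):
        try:
            record = json.loads(event.get("body") or "")
        except json.JSONDecodeError:
            return _response(400, {"error": "body is not valid JSON"})
        if not isinstance(record, dict):
            return _response(400, {"error": "body must be a JSON object"})
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            return _response(400, {"error": "missing fields", "fields": missing})
        store(record)  # persistence is injected; a list append works for local runs
        return _response(202, {"status": "accepted"})
    return handler


if __name__ == "__main__":
    saved = []
    handler = make_handler(saved.append)
    event = {"body": json.dumps({"device_id": "d1", "timestamp": 0, "payload": {}})}
    print(handler(event, None), saved)
```

Injecting the storage function keeps the handler testable without AWS credentials and leaves the choice of DynamoDB or S3 to the caller.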

Internal Management & Automated Task Dispatch System

A centralized management platform that replaced Google Sheets. Covers role-based permissions, forced version updates, NAS file management, SQL Injection prevention, and Discord real-time push notifications.

PyQt GUI with role-based permissions; forced version check ensures everyone always runs the latest build
One-click NAS upload; parameterized queries across the board to eliminate SQL Injection risk
Database state listener triggers Discord API notifications automatically on any task status change

Python, PyQt, PostgreSQL, Enterprise NAS, Discord API, SQL Security
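
To illustrate why parameterized queries remove SQL injection risk, the sketch below passes user input as a bound parameter rather than concatenating it into the SQL string. It uses the standard library's `sqlite3` with `?` placeholders purely so the example is self-contained; the project itself uses PostgreSQL, where drivers such as psycopg use `%s` placeholders instead. The `tasks` table, its columns, and the helper names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small hypothetical tasks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, owner TEXT)")
    conn.executemany(
        "INSERT INTO tasks (title, owner) VALUES (?, ?)",
        [("Label batch 1", "alice"), ("Review batch 1", "bob")],
    )
    return conn


def find_tasks_by_owner(conn, owner):
    """Look up tasks with the owner passed as a bound parameter, never as SQL text."""
    cursor = conn.execute(
        "SELECT id, title FROM tasks WHERE owner = ? ORDER BY id",
        (owner,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_tasks_by_owner(conn, "alice"))
    # The injection attempt is compared as a literal string and matches nothing.
    print(find_tasks_by_owner(conn, "alice' OR '1'='1"))
```

Because the driver sends the value separately from the statement, quote characters in the input cannot change the query's structure.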

Multi-Cloud Microservices & DevOps Automation

A self-initiated side project built before an interview to prove the ability to bring up an unfamiliar cloud (GCP) from scratch in a short time — and work through real cross-cloud integration pain points.

CI/CD decoupled: GitHub Actions for CI, GCP Cloud Build for CD — achieving Keyless Authentication
Workload Identity Federation enables GKE pods to pull Docker images from AWS ECR without stored credentials
Django Auth + strict CORS rules protect Vertex AI (Gemini) API endpoints from unauthorized consumption

Terraform, AWS, GCP, GKE, Kubernetes, GitHub Actions, FastAPI, Django, Vertex AI