
Projects

AI Training Data Pipeline & Automated Web Scraping

End-to-end ownership of data collection and model deployment for the company's core AI models, from compliant scraping and data cleaning through to running a PyTorch Lite model on iOS devices.

  • Respects robots.txt and prefers official APIs where available; analyzes each site's dynamic rendering and anti-bot mechanisms to choose between Requests and Selenium
  • BeautifulSoup HTML parsing plus precise RegEx extraction keeps the data clean and format-specific (e.g. .wav files only)
  • Low-level Objective-C integration deploys the PyTorch Lite model on iOS, with strict audio format calibration to prevent inference errors
Python · Selenium · BeautifulSoup · RegEx · PyTorch Lite · iOS · Objective-C
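The compliance and extraction steps above can be illustrated with a minimal standard-library sketch. It assumes the robots.txt text and the page HTML have already been fetched (for example with Requests, or taken from Selenium's rendered page source). The `dataset-bot` user agent and the helper names are hypothetical, and a plain regex over `href` attributes stands in for the BeautifulSoup parsing the project actually used.

```python
import re
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity; a real crawler should send its own descriptive user agent.
USER_AGENT = "dataset-bot"

# Matches href values ending in .wav (case-insensitive), optionally followed by a
# query string, e.g. href="/clips/a.wav" or href='b.WAV?dl=1'.
WAV_HREF = re.compile(r"""href\s*=\s*["']([^"']+?\.wav(?:\?[^"']*)?)["']""", re.IGNORECASE)


def build_robot_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that was fetched separately."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def extract_allowed_wav_links(html: str, page_url: str, rp: RobotFileParser) -> list[str]:
    """Return unique absolute .wav URLs on the page that robots.txt allows us to fetch."""
    links: list[str] = []
    for href in WAV_HREF.findall(html):
        absolute = urljoin(page_url, href)  # resolve relative links against the page URL
        if rp.can_fetch(USER_AGENT, absolute) and absolute not in links:
            links.append(absolute)
    return links


if __name__ == "__main__":
    rp = build_robot_parser("User-agent: *\nDisallow: /private/\n")
    html = '<a href="/clips/a.wav">A</a> <a href="/private/b.wav">B</a> <a href="/c.mp3">C</a>'
    print(extract_allowed_wav_links(html, "https://example.com/audio/", rp))
    # prints ['https://example.com/clips/a.wav']
```

Links under a disallowed path and non-.wav files are both dropped before any download is attempted, so the format filter and the robots.txt check happen in one pass.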

Dataset Automation Pipeline

Built an enterprise-grade training data management platform from scratch to replace scattered audio files and manual workflows. Non-technical team members can now generate model-ready datasets in one click.

  • An enterprise NAS (audio files) and PostgreSQL (metadata) centralize storage, resolving the earlier storage chaos and permission-control issues
  • A PyQt GUI automatically pulls files from the database and NAS on login, so no technical knowledge is required to operate it
  • Fully automated batch pipeline: audio splitting, alignment, augmentation (mix & synthesis), and metadata generation
Python · PyQt · PostgreSQL · Enterprise NAS · PyTorch · RegEx
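The audio-splitting stage of a pipeline like this could look roughly like the sketch below, which cuts a PCM WAV file into fixed-length segments with Python's built-in `wave` module. Fixed-length windows are an assumption made for illustration (the actual pipeline may split on alignment boundaries instead), and `split_wav` is a hypothetical name.

```python
import io
import wave


def split_wav(data: bytes, segment_seconds: float) -> list[bytes]:
    """Split a PCM WAV file (given as bytes) into fixed-length WAV segments.

    Each segment keeps the source channel count, sample width and sample rate.
    The final segment may be shorter than segment_seconds.
    """
    segments: list[bytes] = []
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_segment = int(params.framerate * segment_seconds)
        if frames_per_segment <= 0:
            raise ValueError("segment_seconds is too short for this sample rate")
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:  # readframes returns b"" once the data chunk is exhausted
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)  # header sizes are patched on close
            segments.append(buf.getvalue())
    return segments
```

Working on bytes rather than file paths lets the same function process files streamed from the NAS without writing temporary files.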

Hybrid Cloud AI Automated Inference System

A fully automated AI inference workflow: scheduled cloud data ingestion, local PyTorch inference with memory optimization, and automatic real-time dashboard rendering, with no manual intervention.

  • Python scripts pull data from cloud infrastructure on a schedule, clean and transform it, then feed it into inference
  • Local PyTorch model inference with tight memory monitoring and optimization for stable high-volume processing
  • Inference results are written to the database automatically, and the backend pushes them to a real-time frontend dashboard
Python · PyTorch · PostgreSQL · Vue.js · FastAPI · Cloud Scheduler
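The memory-bounded inference loop described above can be sketched as batched processing with a per-batch memory check. This is a stdlib-only illustration under stated assumptions: `model` is any callable standing in for the PyTorch model, and `tracemalloc` only sees Python-heap allocations. A real PyTorch deployment would need a different probe (for example `torch.cuda.max_memory_allocated()` on GPU, or process RSS on CPU) and would typically run the model under `torch.inference_mode()`.

```python
import itertools
import tracemalloc
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple


def iter_batches(records: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of at most batch_size records, consuming the input lazily."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    while batch := list(itertools.islice(it, batch_size)):
        yield batch


def run_batched_inference(
    records: Iterable,
    model: Callable[[list], Sequence],  # stand-in for the real model
    batch_size: int = 32,
    peak_limit_bytes: Optional[int] = None,
) -> Tuple[list, List[int]]:
    """Run model over records batch by batch.

    Returns (results, peaks), where peaks[i] is the peak traced Python-heap size
    (bytes) observed while batch i was processed. Note: this starts and stops
    tracemalloc itself, so it should not be nested inside other tracing.
    """
    results: list = []
    peaks: List[int] = []
    tracemalloc.start()
    try:
        for batch in iter_batches(records, batch_size):
            tracemalloc.reset_peak()  # measure each batch separately
            results.extend(model(batch))
            _, peak = tracemalloc.get_traced_memory()
            peaks.append(peak)
            if peak_limit_bytes is not None and peak > peak_limit_bytes:
                raise MemoryError(f"batch peak {peak} B exceeded limit {peak_limit_bytes} B")
    finally:
        tracemalloc.stop()
    return results, peaks
```

Bounding the batch size caps how many inputs and intermediate outputs are alive at once. Raising on an exceeded limit is one policy among several; shrinking the batch and retrying is another.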

End-to-End Cloud-Native Full-Stack Platform

Led the design and build of a cloud-native system that ingests high-frequency data from iOS/Android apps and edge devices. A fully serverless architecture handles scaling; a complete CI/CD pipeline handles delivery.

  • AWS Serverless: API Gateway + Lambda absorb device traffic bursts automatically; data is persisted to DynamoDB + S3
  • CI/CD: GitHub Actions automates build and test, then auto-deploys the Lambda functions and publishes the Vue.js frontend to S3/CloudFront
  • Cross-platform data alignment: an iOS development background informed format standardization and transmission stability across clients
AWS · Lambda · API Gateway · DynamoDB · S3 · CloudFront · GitHub Actions · Vue.js · iOS · Flutter
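A Lambda function behind API Gateway (proxy integration) that validates and normalizes device payloads might look like the sketch below. The field names (`device_id`, `platform`, `timestamp_ms`, `payload`) and the `DeviceEvents` table are hypothetical; the storage call is injected so the handler can be tested without AWS.

```python
import base64
import json
import time
from typing import Any, Callable, Dict

# Hypothetical unified schema that every client (iOS, Android, edge device) is mapped onto.
REQUIRED_FIELDS = ("device_id", "platform", "payload")
KNOWN_PLATFORMS = {"ios", "android", "edge"}


def normalize(record: Any) -> Dict[str, Any]:
    """Validate one client record and align it to the shared schema."""
    if not isinstance(record, dict):
        raise ValueError("body must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    platform = str(record["platform"]).lower()
    if platform not in KNOWN_PLATFORMS:
        raise ValueError(f"unknown platform: {record['platform']}")
    ts = record.get("timestamp_ms")
    return {
        "device_id": str(record["device_id"]),
        "platform": platform,
        # Fall back to server time when a client omits its own timestamp.
        "timestamp_ms": int(ts) if ts is not None else int(time.time() * 1000),
        "payload": record["payload"],
    }


def _response(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    # Response shape expected by API Gateway's Lambda proxy integration.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def make_handler(write_item: Callable[[Dict[str, Any]], Any]):
    """Build a Lambda handler; write_item persists one normalized record."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        raw = event.get("body") or "{}"
        try:
            if event.get("isBase64Encoded"):
                raw = base64.b64decode(raw).decode("utf-8")
            item = normalize(json.loads(raw))
        except (ValueError, TypeError) as exc:  # JSONDecodeError is a ValueError
            return _response(400, {"error": str(exc)})
        write_item(item)
        return _response(202, {"accepted": item["device_id"]})

    return handler


# Deployment wiring (assumes boto3; the table name is hypothetical). Note that boto3's
# DynamoDB resource rejects Python floats, so numeric payloads may need Decimal values.
#   table = boto3.resource("dynamodb").Table("DeviceEvents")
#   lambda_handler = make_handler(lambda item: table.put_item(Item=item))
```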

Internal Management & Automated Task Dispatch System

A centralized management platform that replaced Google Sheets. It covers role-based permissions, forced version updates, NAS file management, SQL injection prevention, and real-time Discord push notifications.

  • PyQt GUI with role-based permissions; forced version check ensures everyone always runs the latest build
  • One-click NAS upload; parameterized queries are used throughout to eliminate SQL injection risk
  • A database state listener automatically sends a Discord notification whenever a task's status changes
Python · PyQt · PostgreSQL · Enterprise NAS · Discord API · SQL Security
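The parameterized-query rule and the forced version check can both be sketched in a few lines. Here `sqlite3` stands in for PostgreSQL so the example runs without a server; the `tasks` table, its columns, and the idea of reading the minimum version from the database are assumptions.

```python
import sqlite3
from typing import List, Tuple


def find_tasks_by_assignee(conn: sqlite3.Connection, assignee: str) -> List[Tuple]:
    # The value travels separately from the SQL text, so the driver never parses it as SQL.
    # sqlite3 uses "?" placeholders; with psycopg2 on PostgreSQL the placeholder is "%s".
    cur = conn.execute(
        "SELECT id, title, status FROM tasks WHERE assignee = ? ORDER BY id",
        (assignee,),
    )
    return cur.fetchall()


def update_task_status(conn: sqlite3.Connection, task_id: int, status: str) -> int:
    with conn:  # commits on success, rolls back if the statement raises
        cur = conn.execute("UPDATE tasks SET status = ? WHERE id = ?", (status, task_id))
    return cur.rowcount


def parse_version(version: str) -> Tuple[int, ...]:
    # "1.10.0" -> (1, 10, 0); tuple comparison then orders 1.10.0 after 1.9.0.
    return tuple(int(part) for part in version.split("."))


def update_required(installed: str, minimum: str) -> bool:
    """True if the running client is older than the minimum version (hypothetically stored in the DB)."""
    return parse_version(installed) < parse_version(minimum)
```

Passing `x' OR '1'='1` as the assignee simply matches no rows, because the driver binds it as a literal string instead of splicing it into the query. Comparing integer tuples rather than raw strings avoids the classic bug where "1.10.0" sorts before "1.9.0".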

Multi-Cloud Microservices & DevOps Automation

A self-initiated side project, built ahead of an interview, to demonstrate standing up an unfamiliar cloud (GCP) from scratch in a short time and working through real cross-cloud integration pain points.

  • Decoupled CI/CD: GitHub Actions for CI and GCP Cloud Build for CD, with keyless authentication
  • Workload Identity Federation enables GKE pods to pull Docker images from AWS ECR without stored credentials
  • Django Auth + strict CORS rules protect Vertex AI (Gemini) API endpoints from unauthorized consumption
Terraform · AWS · GCP · GKE · Kubernetes · GitHub Actions · FastAPI · Django · Vertex AI
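The CORS-plus-authentication rule can be illustrated with a framework-agnostic sketch. In Django this would more typically be configured with the django-cors-headers package (`CORS_ALLOWED_ORIGINS`) together with an authentication check such as `login_required`; the allowlisted origin and the function below are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical allowlist: only the project's own frontend may call the API from a browser.
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def check_api_request(
    method: str, origin: Optional[str], is_authenticated: bool
) -> Tuple[int, Dict[str, str]]:
    """Decide the status code and CORS headers for a call to a protected model endpoint."""
    headers = {"Vary": "Origin"}  # responses differ per Origin, so caches must key on it
    if origin is not None:
        if origin not in ALLOWED_ORIGINS:
            return 403, headers  # strict policy: unknown browser origins are refused outright
        # Echo the exact origin; "*" cannot be combined with credentials.
        headers["Access-Control-Allow-Origin"] = origin
        headers["Access-Control-Allow-Credentials"] = "true"
    if method == "OPTIONS":
        # Preflight requests never carry cookies, so they are answered without an auth check.
        headers["Access-Control-Allow-Methods"] = "POST"
        headers["Access-Control-Allow-Headers"] = "Content-Type, X-CSRFToken"
        return 204, headers
    # CORS only constrains browsers; scripts can omit Origin entirely, so auth is the real gate.
    if not is_authenticated:
        return 401, headers
    return 200, headers
```

The key design point is that CORS alone does not stop unauthorized consumption: a non-browser client ignores it, which is why the authentication check runs on every non-preflight request.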