
Hybrid Cloud AI Automated Inference System

Category: Hybrid Cloud · MLOps
Stack: Python · PyTorch · PostgreSQL · FastAPI · Vue.js · Cloud Scheduler


Background

Once a model is trained, the real engineering challenge begins: how do you get it to produce valuable inference results continuously, every day, without manual intervention — and without letting cloud costs spiral?

This system addresses two core goals:

  1. Significantly reduce the cost of pure cloud-based inference
  2. Eliminate all manual steps from the pipeline

The solution is a hybrid cloud architecture: cloud handles storage and scheduling, a local server handles the compute-heavy model inference.


System Architecture

1. Cloud Data Layer

A scheduler (Cloud Scheduler) triggers daily to pull the latest bulk data from cloud storage (S3 or a database). This data may come from the previous day's device uploads or from upstream processing systems.
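A minimal sketch of how the scheduled job locates the previous day's data. The date-partitioned key layout (`uploads/YYYY/MM/DD/`) is a hypothetical convention for illustration; adapt it to your actual bucket structure.

```python
from datetime import date, timedelta

def daily_prefix(run_date: date, base: str = "uploads") -> str:
    """Build the storage key prefix for the previous day's uploads.

    The uploads/YYYY/MM/DD/ layout is an assumed convention, not a
    requirement of the system described here.
    """
    day = run_date - timedelta(days=1)
    return f"{base}/{day:%Y/%m/%d}/"

# The scheduler-triggered job then lists and downloads everything under
# that prefix, e.g. with boto3's list_objects_v2 paginator:
#   s3 = boto3.client("s3")
#   pages = s3.get_paginator("list_objects_v2").paginate(
#       Bucket="my-bucket", Prefix=daily_prefix(date.today()))
```

Computing the prefix from the run date keeps the job stateless: rerunning a missed day only requires passing a different `run_date`.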

2. Automated Data Processing

Once data arrives, Python scripts handle everything automatically:

  • Cleaning: filter invalid records, fill missing fields
  • Format conversion: normalize audio format (sample rate, bit depth, channels)
  • Preprocessing: align to model input specs (feature extraction, normalization)

No human involvement — data moves directly to inference.
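The cleaning step above can be sketched as a small pure function. The field names (`device_id`, `audio_path`) and defaults are hypothetical placeholders; the real pipeline would also run format conversion (e.g. resampling with torchaudio) before feature extraction.

```python
def clean_records(records, required=("device_id", "audio_path"), defaults=None):
    """Drop records missing required fields; fill optional gaps with defaults.

    `required` and `defaults` here are illustrative stand-ins for the
    actual schema of the device uploads.
    """
    defaults = defaults or {}
    cleaned = []
    for rec in records:
        # Invalid record: a required field is absent or empty.
        if any(rec.get(k) in (None, "") for k in required):
            continue
        # Fill missing optional fields; the record's own values win.
        cleaned.append({**defaults, **rec})
    return cleaned
```

Keeping cleaning side-effect-free makes the automated step easy to unit-test, which matters when no human reviews the data before inference.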

3. Local Model Inference

Processed data is sent to a local server where PyTorch loads the trained model for batch inference.
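The core of the batch-inference loop might look like the sketch below. The model here is a stand-in; in practice you would restore the trained weights with `load_state_dict` from a checkpoint.

```python
import torch
import torch.nn as nn

def run_batch_inference(model: nn.Module, batches):
    """Run a trained model over an iterable of pre-batched input tensors.

    `batches` is any iterable of tensors already shaped for the model;
    the model itself is assumed to be loaded from a checkpoint elsewhere.
    """
    model.eval()              # disable dropout / batch-norm updates
    outputs = []
    with torch.no_grad():     # no autograd graph: much lower memory use
        for batch in batches:
            outputs.append(model(batch).cpu())  # move results off the GPU
    return torch.cat(outputs)
```

`model.eval()` plus `torch.no_grad()` is the standard pair for inference: the first fixes layer behavior, the second stops PyTorch from retaining activations for backpropagation.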

Local inference demands stability and performance:

Memory Management

Large batch inference can easily cause memory leaks or OOM (Out of Memory) crashes. I optimized the server configuration carefully:

  • Control batch size to avoid excessive memory usage per run
  • Explicitly release tensors after inference to prevent accumulation
  • Monitor memory usage with automatic alerts on threshold breach
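The first two points above can be sketched together: a fixed-size batcher bounds peak memory per forward pass, and explicit release happens between batches. The chunker is generic; the release pattern is shown in comments because it depends on the GPU setup.

```python
def chunked(items, batch_size):
    """Yield fixed-size batches so any single forward pass stays within
    a known memory budget."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Inside the inference loop, releasing tensors between batches looks like:
#   for batch in chunked(samples, 64):
#       out = model(batch)
#       save(out.cpu())
#       del out                      # drop the reference so memory frees
#       torch.cuda.empty_cache()     # return cached GPU blocks (if on CUDA)
```

The batch size 64 in the comment is an arbitrary example; the right value depends on model size and available RAM/VRAM.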

Stability Monitoring

Detailed logs capture processing time per batch, success rate, and any anomalies, so problems are visible and traceable.
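One way to sketch per-batch timing and success tracking with the standard `logging` module. The `stats` dict and helper names are illustrative; the alerting hook is left out and would be wired to whatever monitoring the deployment uses.

```python
import logging
import time

logger = logging.getLogger("inference")

def timed_batch(fn, batch, stats):
    """Run one batch through `fn`, log its duration, and track outcomes.

    `stats` is a plain dict like {"ok": 0, "failed": 0}; both the
    function and dict shape are assumptions for this sketch.
    """
    start = time.perf_counter()
    try:
        result = fn(batch)
        stats["ok"] += 1
        return result
    except Exception:
        stats["failed"] += 1
        logger.exception("batch failed")  # full traceback in the log
        return None
    finally:
        logger.info("batch took %.3fs", time.perf_counter() - start)

def success_rate(stats):
    total = stats["ok"] + stats["failed"]
    return stats["ok"] / total if total else 1.0
```

Catching per-batch failures keeps one bad input from killing the whole nightly run, while the logged traceback keeps the problem traceable.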

4. Result Storage and Dashboard

Inference results are written to PostgreSQL. A FastAPI backend reads them and serves them to the frontend on a schedule.

The Vue.js Dashboard renders the data into real-time charts — users see the latest inference output in their browser without touching a database.


Full Pipeline

Cloud Data → Python Clean/Transform → Local PyTorch Inference
                                                ↓
      Vue.js Dashboard ← FastAPI ← PostgreSQL

From data pull, cleaning, inference, through to chart rendering — fully automated, running daily on schedule. Users open the dashboard and see current results.


Results

  • Moved heavy inference workload from cloud to local, significantly reducing cloud compute costs
  • Full automation eliminates daily manual operations
  • Real-time dashboard makes inference output accessible to non-technical stakeholders
  • Local memory optimization ensures stable high-volume inference without human monitoring

Takeaway

Hybrid cloud isn't about using many services — it's about assigning each layer of work to the place most suited to it. Cloud's elasticity is ideal for scheduling and storage; local servers' fixed compute is ideal for high-frequency batch inference. Finding that balance is where real cost control and performance come from.