First-principles systems builder

TIAGO OLIVEIRA

30 years of systems thinking // from diesel engines to distributed systems

Bridging domains others avoid: mechanical to software, operations to computer vision, telephony to AI. Same systems thinking, different tools!

The Journey

From rebuilding diesel engines in rural Brazil to architecting AI platforms serving tens of millions—this is a story of systems thinking.


The Shop Floor 1994–2013

I started working at age 8 in my father's truck mechanic shop in Xanxerê, a small city in southern Brazil. For 19 years, I worked on diesel engines, hydraulic systems, and pneumatic equipment.

This wasn't hobby work. It was full-time mechanical engineering in an environment where diagnostic manuals didn't exist and parts had to be fabricated rather than ordered.

Diagnosing why a diesel engine fails under load requires systematic elimination of variables—fuel delivery, compression, timing, electrical systems. This same mental model later applied to debugging distributed systems at scale.

As automation became more common in heavy machinery, I began working with PLCs, PIC microcontrollers, and early Arduino boards. I built digital controllers interfacing with mechanical, hydraulic, and pneumatic systems for wood processing equipment, tube bending machines, and food manufacturing.

The critical insight: Edge computing and IoT came naturally later because I was already solving hardware-software integration at industrial scale. The jump to cloud infrastructure wasn't abandoning mechanical work—it was taking the same systems thinking from factory floors to distributed platforms.


The Forced Upgrade 2004–2007

My mother recognized that physical labor, while honorable, limited long-term opportunity. She insisted I enroll in a three-year software development bootcamp while continuing to work in the shop.

This wasn't a gentle suggestion. It was a forced update—Mom.exe pushing a mandatory patch.

The bootcamp taught Java, SQL, and web development basics. The value wasn't the specific technologies—it was learning that software systems could be decomposed and debugged using the same systematic thinking I'd developed with diesel engines.

I earned a Bachelor of Technology in Information Technology and a Post-Graduate Specialization in Software Development with Java from Universidade do Oeste de Santa Catarina.


The Bridge Years 2013–2017

My first professional software role at NewFocus involved customizing ERP systems for industrial clients. This was the perfect bridge between my mechanical background and software development—solving the "last mile" problem of connecting physical operations to digital systems.

A wood processing company needed to track trucks leaving their premises and capture load weights automatically. I built custom ERP modules integrating PDA devices, industrial weighing scales, and real-time data pipelines. I wasn't just writing database queries—I was solving how to make factory floor operations visible to business systems.

At Nokia Siemens Networks, I built a personnel safety system for cell tower technicians using cellular triangulation before GPS was ubiquitous in mobile devices. If a technician didn't physically move within a specified time window, the system escalated automatically. This was real distributed systems work—sensor data, time-series analysis, event-driven alerting.
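The escalation rule can be sketched in a few lines. This is an illustration of the idea, not the original system: the `PositionFix` shape, the `InactivityWatchdog` class, and the thresholds are all hypothetical, and positions are assumed to arrive already converted to metres on a flat local grid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PositionFix:
    """One location estimate for a technician (hypothetical shape)."""
    timestamp: float  # seconds since some reference
    x_m: float        # metres east of an arbitrary origin
    y_m: float        # metres north of the same origin


class InactivityWatchdog:
    """Signals escalation when no movement of at least `min_move_m`
    has been seen for `window_s` seconds. Thresholds are illustrative."""

    def __init__(self, window_s: float, min_move_m: float) -> None:
        self.window_s = window_s
        self.min_move_m = min_move_m
        self._anchor: Optional[PositionFix] = None  # last fix that counted as movement

    def observe(self, fix: PositionFix) -> bool:
        """Feed one fix; return True when escalation should fire."""
        if self._anchor is None:
            self._anchor = fix
            return False
        dx = fix.x_m - self._anchor.x_m
        dy = fix.y_m - self._anchor.y_m
        moved_m = (dx * dx + dy * dy) ** 0.5
        if moved_m >= self.min_move_m:
            self._anchor = fix  # real movement resets the timer
            return False
        # Small position noise does not reset the timer; only elapsed time matters.
        return fix.timestamp - self._anchor.timestamp >= self.window_s
```

Comparing against the last fix that counted as movement, rather than the previous fix, keeps slow positional noise from masking a technician who has stopped moving.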

At Dell Technologies, I pioneered server-side JavaScript rendering in 2013—years before React SSR became standard practice.

At Zenvia Mobile, I dockerized applications when Docker was pre-1.0. The goal was handling unpredictable SMS traffic spikes—political campaigns, breaking news, marketing blasts. I built L1/L2/L3 escalation procedures and a "FireFighter" on-duty rotation system that's still operational at Zenvia today, a decade later.

At AGCO, I built military-grade IoT security for agricultural machinery: sub-millisecond authorization using custom nonce calculation on mutual TLS with Erlang and RabbitMQ. I also orchestrated autonomous machine-to-machine coordination for harvesters and grain carts using MQTT-based handshakes with sub-meter GPS accuracy.


The International Protocol 2017–2020

In 2017, I moved to Berlin.

At PayU, I consolidated payment reconciliation across 14 markets, normalizing disparate formats from banks, merchants, and acquirers into a single serverless platform. The result was a 60% cost reduction.

At OSRAM, I architected zero-trust IoT security with OAuth 2.0 extensions and HSM-backed cryptography.

Then to Stuttgart at Mercedes-Benz.io. Multiple departments needed different views of vehicle data across the lifecycle. The legacy system required manual view creation and individual integrations per department.

My architectural insight: event sourcing. Unified vehicle state across all lifecycle stages, allowing any department to materialize their own view from the same event stream.

I built the platform using AWS Lambda, containers, S3, DynamoDB, EventBridge, and ElastiCache. It connected 60+ global systems, cut costs by 47%, and shortened deployment cycles from months to days.

The principle: One source of truth, infinite flexible views, no tight coupling.
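A minimal sketch of that principle, assuming an in-memory event log. The `VehicleEvent` shape, the event kinds, and both view functions are hypothetical illustrations, not the Mercedes-Benz.io implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, TypeVar

S = TypeVar("S")


@dataclass(frozen=True)
class VehicleEvent:
    """One immutable fact about a vehicle (hypothetical shape)."""
    vehicle_id: str
    kind: str       # e.g. "ordered", "built", "delivered"
    payload: dict


def materialize(events: Iterable[VehicleEvent], initial: S,
                apply: Callable[[S, VehicleEvent], S]) -> S:
    """Fold the event stream into whatever view `apply` describes."""
    state = initial
    for event in events:
        state = apply(state, event)
    return state


def latest_stage(view: Dict[str, str], event: VehicleEvent) -> Dict[str, str]:
    """View for one department: current lifecycle stage per vehicle."""
    view[event.vehicle_id] = event.kind
    return view


def events_per_kind(view: Dict[str, int], event: VehicleEvent) -> Dict[str, int]:
    """View for another department: how many events of each kind occurred."""
    view[event.kind] = view.get(event.kind, 0) + 1
    return view


# The single source of truth: an append-only log.
LOG = [
    VehicleEvent("V1", "ordered", {"market": "DE"}),
    VehicleEvent("V2", "ordered", {"market": "BR"}),
    VehicleEvent("V1", "built", {"plant": "P1"}),
    VehicleEvent("V1", "delivered", {}),
]

stage_view = materialize(LOG, {}, latest_stage)
count_view = materialize(LOG, {}, events_per_kind)
```

Both views come from the same log, and neither knows the other exists. Adding a third view means writing one more `apply` function, with no change to the producers.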

Each move required rebuilding credibility: learning new languages, adapting to different engineering cultures, navigating compliance requirements. The gap between internal certainty and external validation creates energy but also stress.


The Cloud Native Era 2020–Present

I joined AWS in May 2020 as Senior Solutions Architect in Stuttgart, working with Germany's premier manufacturing companies (BMW, Bosch, Siemens, Festo) on implementing Industry 4.0 initiatives.

Manufacturing systems have strict reliability requirements. A production line stopping costs hundreds of thousands per hour. My background in mechanical systems gave me intuitive understanding of physical constraints that pure software engineers often miss—vibration, temperature, electromagnetic interference, network reliability in industrial environments.

In October 2021, I moved to Austin as Senior Product Architect at AWS Industry Products, working on computer vision platforms. I invented the CVOps framework: extending MLOps principles to cover the entire computer vision lifecycle. I demonstrated how it works, and led platform development. I focused on the embedded stack, creating a flexible Rust-based system enabling cloud flexibility on resource-constrained devices.

In October 2024, I became Principal Architect focused on telecommunications and generative AI, moving to Seattle.

Real-Time AI Telephony

My current work focuses on real-time AI-powered voice platforms: enabling generative AI interactions over traditional phone systems for major telecommunications carriers.

The technical challenge: Building systems that bridge 1990s-era telephony protocols (SIP/RTP) with modern AI inference platforms, maintaining carrier-grade reliability and real-time performance.

This is genuinely uncharted territory. When new AI capabilities emerge, I build working prototypes within hours to validate architectural approaches. These prototypes become the foundation for production systems spanning multiple regions, handling sub-100ms latency requirements at massive scale.

The work includes invention disclosures for privacy-preserving AI quality evaluation and agent-to-agent handover.


The Through-Line

From 1994 to today, from a truck mechanic's shop in rural Brazil to architecting AI systems at global scale, the through-line is consistent:

Systems thinking applied to ambiguous problems with real-world constraints.

Whether diagnosing why a diesel engine fails under load or why a distributed system fails under load, the mental model is the same:

  1. Decompose to fundamental components
  2. Identify where constraints bite
  3. Build a prototype to test assumptions
  4. Iterate based on reality, not theory
  5. Make it operationally excellent before scaling

The tools changed (wrenches to keyboards, diesel engines to distributed systems, mechanical shops to cloud infrastructure), but the approach remained constant!

How I Think

"The gap between demo and production is where I live."

Most architects can design for the happy path. The value is in knowing what breaks at 3am under 10x load with a team that's never seen the code.


First Principles from the Shop Floor

Every distributed system is just another machine with predictable failure modes you can debug and prevent. The more complex a system gets, the harder its failures are to predict, but predicting them is never impossible!

I spent 19 years diagnosing mechanical failures without manuals. When a diesel engine fails under load, you don't guess. You systematically eliminate variables: fuel delivery, compression, timing, electrical systems. You decompose the problem until you find the constraint.

Software systems work the same way. The abstractions are different, but the physics are the same: latency is limited by the speed of light, compute requires energy, energy generates heat, heat requires cooling. Every system has constraints. Find them.
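The speed-of-light limit is easy to put a number on. The sketch below assumes signals in optical fiber travel at roughly two thirds of the vacuum speed of light; the 0.67 factor is an approximation, and the function name is hypothetical. Real routes add switching, queuing, and indirect paths on top of this floor.

```python
C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBER_VELOCITY_FACTOR = 0.67      # assumed: light in fiber travels at about 2/3 c


def min_round_trip_ms(distance_km: float,
                      velocity_factor: float = FIBER_VELOCITY_FACTOR) -> float:
    """Physical lower bound on round-trip time over a fiber path of this length."""
    one_way_s = distance_km / (C_VACUUM_KM_PER_S * velocity_factor)
    return 2 * one_way_s * 1000.0
```

For a 1,000 km path this gives a floor of roughly 10 ms round trip, before any software runs. No amount of code optimization recovers that time; only moving the endpoints closer does.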


Core Beliefs

Constraint Thinking

Every problem has one constraint that matters most. Find it. Everything else is noise until that constraint is addressed.

In manufacturing, it's usually throughput at a specific station. In distributed systems, it's usually the slowest component in the critical path. In organizations, it's usually the decision that's blocked or the person who's overloaded.
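A toy model makes the point concrete. It assumes a strictly serial line where throughput equals the rate of the slowest station; the station names, rates, and `find_constraint` helper are invented for illustration.

```python
from typing import Dict, Tuple


def find_constraint(rates: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and its rate, which is also the
    throughput of the whole serial line."""
    name = min(rates, key=lambda stage: rates[stage])
    return name, rates[name]


# Units per hour at each station (illustrative numbers).
line = {"cutting": 120.0, "welding": 45.0, "painting": 80.0}

baseline = find_constraint(line)

# Speeding up a non-constraint changes nothing.
faster_painting = find_constraint({**line, "painting": 200.0})

# Addressing the constraint raises throughput and moves the constraint.
faster_welding = find_constraint({**line, "welding": 90.0})
```

Here `baseline` and `faster_painting` are both `("welding", 45.0)`, while `faster_welding` is `("painting", 80.0)`: once the constraint is addressed, a different station becomes the one that matters.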

Tradeoff Clarity

Frame decisions so stakeholders can choose. Don't hide complexity. Expose it clearly enough that the right people can make informed tradeoffs.

"We can have consistency or availability, not both during a partition" is useful. "It's complicated" is not.

This applies to technical and organizational decisions equally. When a team is stuck, it's often because the tradeoffs aren't visible to the people who need to make them.

Teaching as Liberation

I love teaching. But not the kind that produces copies of the teacher.

Following Paulo Freire's thinking, I see education as a tool for liberation, not indoctrination. The goal isn't to make people think like me. It's to help them think for themselves. Questioning, dialogue, co-discovery. The best outcome is when someone reaches a conclusion I wouldn't have, and it's better than mine.

Individual contribution doesn't scale. What scales is enabling others to solve problems you'll never see. The best architectural decisions are the ones teams can extend without you. The best debugging sessions are the ones where someone else finds the root cause because they learned how to look.

Design for Evolution

Today's architecture is tomorrow's legacy. Build systems that can be replaced piece by piece, not rewritten wholesale.

Event sourcing at Mercedes-Benz wasn't just about current requirements. It was about enabling views we couldn't predict yet. One source of truth, infinite flexible views.

Two-Way Door Decisions

I'm a fervent advocate for reversibility.

One-way doors are decisions that are costly or impossible to undo. They deserve caution, analysis, and buy-in. Two-way doors are decisions you can reverse if wrong. They deserve speed and experimentation.

Most decisions are two-way doors mistaken for one-way doors. Teams slow down unnecessarily, seeking consensus for choices that could simply be tried and reverted. Recognizing which type you're facing changes everything about how you should move.


The Mechanical Foundation

I maintain a full workshop in my garage for building metal pieces. This isn't nostalgia. It's philosophy.

Pure abstraction without physical reality feels incomplete. The best software systems account for real-world constraints that pure software engineers often miss:

  • Vibration affects sensors and connections
  • Temperature changes component behavior
  • Electromagnetic interference corrupts signals
  • Network reliability varies by environment
  • Power availability isn't guaranteed

When you've rebuilt an engine in a shop where the nearest replacement part is 500km away, you develop a different relationship with operational excellence. You build systems that can be diagnosed and repaired, not just deployed and replaced.

One lesson that stuck: never assemble without proof you're going in the right direction. I've mounted an engine back into the chassis only to discover I needed to pull it again for one oil retainer I missed. That teaches you something about validation. You learn to verify before you commit. Check the next layer before closing up the current one. In software, this translates directly: don't merge without confidence, don't deploy without verification, don't architect yourself into a corner you can't back out of.


How I Work

Hands-On Leadership

I don't design systems in ivory towers. For my current telephony work, I didn't just draw architecture diagrams. I built WebSocket servers, tuned GStreamer pipelines, debugged SIP flows, and solved jitter buffer timing issues.

This keeps architectural decisions grounded in implementation reality. It's also how you earn credibility with engineering teams. People trust your judgment differently when they've seen you debug alongside them.

Prototype-First Validation

Plans are hypotheses. Prototypes are evidence.

When new AI capabilities emerge, I don't write documents about what we could do. I build working prototypes, sometimes within 24 hours. Those prototypes accelerate enterprise decisions and de-risk architectural choices.

The pattern:

  1. Identify the riskiest assumption
  2. Build the smallest thing that tests it
  3. Learn from reality, not theory
  4. Scale what works

Navigating Organizations

Technical problems are often organizational problems in disguise. A system that requires three teams to coordinate for every deployment isn't a technical architecture problem. It's a team boundary problem.

I've learned to read organizational dynamics the same way I read system architecture: where are the bottlenecks, who holds context that others need, what decisions are blocked and why. Sometimes the right technical choice is the one that works with organizational reality rather than against it.

Operational Excellence as Default

I come from environments where system failure had immediate economic consequences. Factory lines stopping. Trucks broken down. I build observability and failure recovery from day one.

Not as an afterthought. Not as a "phase 2." From day one.


What I'm Skeptical Of

Process theater. Meetings about meetings. Documentation that no one reads. Ceremonies that don't produce decisions.

Premature abstraction. Three similar lines of code are better than a premature abstraction. Build for today's requirements, not tomorrow's hypotheticals.

Architecture astronauts. People who design systems they'll never implement or operate. The gap between diagram and deployment is where most architectures fail.

"Best practices" without context. What works for Google doesn't work for a 5-person startup. What works in a microservices architecture doesn't work for a monolith. Context determines correctness.


What I Optimize For

Clarity over cleverness. Readable code over clever code. Explicit over implicit. Boring technology over exciting technology.

Operational simplicity. Can someone debug this at 3am? Can a new team member understand it in a week? Can it fail gracefully?

Speed on two-way doors. If a decision can be reversed, make it fast. Save the deliberation for the ones that can't.

Learning velocity. How fast can we discover what we don't know? Prototypes beat documents. Production beats staging. Customer feedback beats internal review.

Team capability. Am I leaving this team better equipped than I found them? Can they operate and evolve this system without me?


The Through-Line

Whether debugging diesel engines or distributed systems, the approach is the same:

  1. Decompose to fundamental components
  2. Identify where constraints bite
  3. Build a prototype to test assumptions
  4. Iterate based on reality, not theory
  5. Operationalize before scaling

The tools changed. The thinking didn't.

Tiago Oliveira

Principal Engineer | Real-Time Distributed Systems & AI Infrastructure

tiago@tiago.sh · LinkedIn · GitHub · tiago.sh · Greater Seattle Area


Summary

Build production AI infrastructure where latency, multi-tenancy, and telco integration meet. Designed, built, and operate a multi-tenant real-time agent platform spanning Nova Sonic, OpenAI Realtime, and Gemini Live: sub-100ms p99 end-to-end across four regions, full-silo tenant isolation, custom SIP UA, MCP gateway. Authored the open-source GStreamer plugin (C, libsoup-3.0) deployed as the audio transport layer for a top-three US telecom operator's production rollout, now serving millions of subscribers. Earlier: event-sourcing platform at Mercedes-Benz, serverless payments across 14 markets at PayU, zero-trust IoT at OSRAM.


Skills

Languages: Rust, Go, C, Python, TypeScript, Erlang, Java

Systems: distributed systems, low-latency real-time inference, multi-tenant isolation, event sourcing, cell-based architecture, multi-region HA, OpenTelemetry, SLO design, p50/p95/p99 analysis, TLA+

AI Infrastructure: real-time multi-model inference (OpenAI Realtime, Nova Sonic, Gemini Live), agentic systems, MCP, RAG, LLM serving, privacy-preserving evaluation

Networking & Streaming: SIP, RTP, B2BUA, GStreamer, WebRTC, WebSocket streaming, IMS AKA, STIR/SHAKEN, mutual TLS, OAuth 2.0, OIDC, HSM, zero-trust

Cloud: AWS (EKS, Lambda, Fargate, Firecracker, Outposts), Kubernetes, serverless


Experience

Principal Engineer / Architect, Generative AI Platforms

Amazon Web Services · October 2024 – Present · Seattle, WA

  • Designed, built, and operate a multi-tenant real-time AI agent platform: voice-over-WebSocket, multi-model inference across Nova Sonic, OpenAI Realtime, and Gemini Live, custom SIP UA, cell-based HA, and an MCP gateway. Designed full-silo tenant isolation on serverless infrastructure and authored internal research on cross-tenant retrieval attacks, memory contamination, and credential leakage that application-level controls cannot address.

  • Authored the open-source GStreamer plugin (C, libsoup-3.0) deployed as the audio transport layer for a top-three US telecom operator's production rollout. Implemented voice ducking and barge-in handling that converted robotic turn-taking into natural real-time conversation.

  • Built a network-integrated real-time inference platform: bidirectional audio over SIP/RTP, dual-channel inference, custom B2BUA across four AWS regions, sub-100ms end-to-end. Now in production serving millions of subscribers.

  • Built utterance-level latency observability with OpenTelemetry, and a privacy-preserving translation-quality evaluation framework in Go.

Senior Engineer / Architect, Computer Vision & AI

Amazon Web Services · October 2021 – October 2024 · Austin, TX

  • Implemented the initial Python/Rust core of CVOps, a computer-vision operations framework, then scaled it to a multi-engineer team. Established the CI/CD and shared libraries that let the team work independently. Production system served sub-millisecond inference across a fleet of millions of cameras.

  • Designed and built an edge-to-cloud inference continuum: MEC/Outposts for near-edge, TFLite for embedded edge, Lambda control plane with auto-scaling Fargate data plane. 5,000+ predictions/second at sub-100ms inference latency.

  • Multi-stream correlation and GenAI-powered scene summarization drove ~40% fewer false-positive dispatches and ~35% faster investigations for security operators.

Senior Architect, Industry 4.0 & Edge AI

Amazon Web Services · May 2020 – October 2021 · Stuttgart, Germany

  • Built an edge computing framework for manufacturing control systems: critical decisions stay local at millisecond response times while cloud handles analytics. 99.99% reliability, ~35% fewer unplanned outages.

  • Built IoT fleet management for 10,000+ heterogeneous industrial endpoints. Created an abstraction layer bridging decades-old legacy machinery with modern AI/ML pipelines.

  • Built a ROS-based path planner that learned from human operator patterns and generated collision-free routes in simulation. ~30% fewer robot space-invasion events, ~20% fewer safety stops.

Principal Software Engineer, Platform Architecture

Mercedes-Benz.io · September 2018 – May 2020 · Stuttgart, Germany

  • Designed and built an event-sourcing platform (Lambda, containers, S3, DynamoDB, EventBridge, ElastiCache) creating a unified vehicle state across all lifecycle stages. Any team could materialize their own view from the same event stream. 60+ systems integrated, ~50% cost reduction, deployment cycles from months to days.

  • Built a high-performance pricing engine processing 1,000+ evaluations/second on decades of sales data, validating a hypothesis-driven ML pipeline pattern under production load.

Senior Staff Software Engineer, IoT Security

OSRAM · April 2018 – September 2018 · Berlin, Germany

  • Built zero-trust IoT security for smart lighting infrastructure from scratch — custom OAuth 2.0 and OpenID Connect extensions for fine-grained device permissions, with hands-on cryptographic implementation and HSM integration for key management.

Senior Staff Software Engineer, FinTech

PayU · May 2017 – March 2018 · Berlin, Germany

  • Consolidated payment reconciliation across 14 markets into a single serverless platform, normalizing disparate bank/merchant/acquirer formats into a common model. ~60% cost reduction; auto-scaled per market and to zero when idle.

  • Built ML-based fraud detection with autonomous weekly retraining, handling millions of daily transactions.

Senior Software Engineer, AgTech IoT

e-Core · February 2016 – May 2017 · Porto Alegre, Brazil

  • Implemented custom nonce calculation on mutual TLS using Erlang and RabbitMQ; contributed the plugin back to the RabbitMQ project. Built an air-gapped firmware signing process using the Yubikey hardware API. ~30% infrastructure cost reduction through protocol optimization.

  • Built MQTT-based handshake mechanism with sub-meter GPS accuracy for autonomous machine-to-machine coordination between harvesters and grain carts (approach, align, transfer, signal, separate).
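The five-phase handshake above can be sketched as a small state machine. This is an illustration only: it uses no MQTT library, and the `IDLE` state, the strict ordering rule, and all names are assumptions rather than the production protocol.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"            # assumed resting state, not named in the text
    APPROACH = "approach"
    ALIGN = "align"
    TRANSFER = "transfer"
    SIGNAL = "signal"
    SEPARATE = "separate"


# Assumed rule: phases must occur in exactly this order.
_NEXT = {
    Phase.IDLE: Phase.APPROACH,
    Phase.APPROACH: Phase.ALIGN,
    Phase.ALIGN: Phase.TRANSFER,
    Phase.TRANSFER: Phase.SIGNAL,
    Phase.SIGNAL: Phase.SEPARATE,
    Phase.SEPARATE: Phase.IDLE,
}


class Handshake:
    """Tracks one harvester and grain-cart pairing through the phases."""

    def __init__(self) -> None:
        self.phase = Phase.IDLE

    def on_message(self, requested: Phase) -> bool:
        """Accept the transition only if it is the next phase in order."""
        if _NEXT[self.phase] is requested:
            self.phase = requested
            return True
        return False  # out-of-order or duplicate message: state unchanged
```

Rejecting out-of-order messages instead of guessing intent keeps both machines in a known state, which matters more than throughput when heavy equipment is moving side by side.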

Earlier Career

  • Zenvia Mobile, Principal Software Engineer (2014 – 2016). Deployed Docker 0.x in production (2014) for unpredictable SMS traffic bursts; built an L1/L2/L3 incident framework still in operational use.
  • Dell Technologies, Senior Software Engineer (2013 – 2014). Server-side SPA architecture using Java Nashorn, years before React SSR. First recipient of Dell's Application Development Quality Gold Award.
  • Nokia Siemens Networks, Software Engineer (2013). SLA maintenance matrix for global cellphone sites using graph theory and finite state machines; 3GPP and ETSI compliant.
  • Earlier: bespoke ERP modules at NewFocus, including a PIC-controller hack streaming semi-analog truck-scale readings into the ERP in real time. Background in industrial mechanical, hydraulic, and pneumatic systems before software.

Education & Recognition

  • B.Tech, Information Technology, Universidade do Oeste de Santa Catarina (2004 – 2007)
  • Mechanical Engineering coursework (incomplete), Unochapecó (2011 – 2013)
  • 12 patents filed in computer vision model monitoring, cryptographic video signing, multi-modal scene understanding, device optimization, and privacy-preserving evaluation
  • AWS AllStar Award, Customer Obsession
  • Languages: English (fluent), Portuguese (native), Spanish (professional), German (conversational), Italian (conversational)
  • Mentorship: 35+ engineers mentored to promotion across multiple seniority levels through 1:1 coaching, design reviews, and promo doc support.