Site Reliability Engineer (Palo Alto) Job at Archetype AI, Palo Alto, CA

  • Archetype AI
  • Palo Alto, CA

Job Description

About Archetype AI

Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high-caliber team from Google, Archetype AI is building a foundation model for the physical world, a real-time multimodal LLM for real life, transforming real-world data into valuable insights and knowledge that people will be able to interact with naturally. It will help people in their real lives, not just online, because it understands the real-time physical environment and everything that happens in it.

Supported by deep tech venture funds in Silicon Valley, Archetype AI is currently pre-Series A, progressing rapidly to develop technology for their next stage. This presents a unique and once-in-a-lifetime opportunity to be part of an exciting AI team at the beginning of their journey, located in the heart of Silicon Valley.

We are actively growing, so if you are an exceptional candidate excited to work on the cutting edge of physical AI and don't see a role below that exactly fits you, you can contact us directly with your resume at jobs@archetypeai.io.

About The Role

As a Site Reliability Engineer (SRE) at Archetype AI, you will be responsible for designing, scaling, and maintaining the infrastructure that powers our AI-driven products. You will collaborate with backend engineers and ML researchers to ensure that our distributed platforms are fault-tolerant, performant, and highly available.

Core Responsibilities

  • Design, build, and operate highly available distributed systems.
  • Collaborate with engineering and ML teams to ensure reliable deployment of backend services (in Rust, C++ or similar).
  • Implement monitoring, alerting, and observability solutions across infrastructure.
  • Automate deployments, scaling, and infrastructure provisioning using infrastructure-as-code.
  • Diagnose and resolve performance bottlenecks, system outages, and production incidents.
  • Support AI/ML infrastructure for training and serving models at scale, including GPU clusters, pipelines, and inference services.
  • Contribute to infrastructure architecture, standards, and operational best practices.

Minimum Qualifications

  • 5+ years of experience as an SRE, DevOps Engineer, or Systems Engineer.
  • Strong expertise in distributed systems, fault-tolerant architectures, and large-scale production environments.
  • Proficiency in Rust, C++, or another backend language, with a willingness to learn.
  • Solid experience with Kubernetes, containers, and cloud platforms (AWS, GCP, Azure).
  • Hands-on experience with monitoring and observability tools (Prometheus, Grafana, ELK, OpenTelemetry).
  • Experience with data pipelines, messaging systems, and streaming technologies (Kafka, Pulsar, etc.).
  • Familiarity with AI/ML infrastructure (training pipelines, GPU clusters, inference systems).
  • Strong debugging, problem-solving, and automation mindset (Terraform, Ansible, Pulumi, scripting).
  • Excellent communication and collaboration skills.

Preferred Qualifications

  • Experience with real-time or low-latency systems.
  • Open-source contributions to distributed systems or infrastructure projects.
  • Knowledge of security best practices for distributed environments.
  • Experience with edge or embedded systems and sensor-based infrastructure.
  • Background in multimodal data fusion or physical-world perception systems.

What We Value

  • Ownership: You take initiative, follow through, and care deeply about quality and outcomes.
  • Motivation: You're driven to solve complex problems and continuously raise the bar for yourself and your team.
  • Excellence: You bring discipline, clarity, and rigor to your craft, and help others do the same.
  • Collaboration: You work well with others, mentor generously, and contribute to a high-trust, high-performance culture.


Job Tags

Full time
