High-performance computing & infrastructure engineer

From the GPU kernel
to the cluster in production.

I build high-performance computing tools, and I run the infrastructure that ships them: GPU energy measurement for Kokkos at Oak Ridge, HPC for nuclear simulation at EDF, and a five-node cluster running about 20 services in production, operated end to end from Docker to CI/CD.

/01 Selected work

Things I built, and what they cost

Case studies, not a vignette grid: the problem, the work, and where it stands. The flagship first.

Flagship

Measuring where the energy goes on the GPU

Energy-measurement tooling for Kokkos, the US Department of Energy's performance-portability framework. Connectors merged into Kokkos Tools, plus an analysis dashboard.

The problem

Kokkos lets one C++ source run across NVIDIA, AMD, and Intel GPUs, which is exactly why energy is hard to reason about: the same kernel draws different power on every backend, and application teams had no portable way to see it. On DOE machines, where power is now a first-class constraint, that blind spot matters.

What I built

A set of Kokkos Tools connectors that sample power while kernels run and attribute the integrated energy to the Kokkos regions that caused it: an NVML backend for NVIDIA GPUs, a Variorum backend for node-level power, a background daemon sampling on a fixed interval, and CSV export. On top, a Python dashboard turns that output into per-kernel energy analysis. It hooks the Kokkos profiling interface, so application code is untouched.
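The attribution step can be illustrated with a minimal Python sketch. It is not the merged connector code: it assumes power samples arrive as sorted `(time_s, watts)` pairs with strictly increasing timestamps, that each region carries a start and end timestamp, and that power varies linearly between samples. The names `power_at`, `region_energy`, and `attribute_energy` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def power_at(samples, t):
    """Linearly interpolate power (W) at time t; clamp outside the sampled range."""
    times = [s[0] for s in samples]
    i = bisect_left(times, t)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (t0, p0), (t1, p1) = samples[i - 1], samples[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def region_energy(samples, start, end):
    """Integrate power over [start, end] with the trapezoidal rule, in joules."""
    # Region boundaries rarely coincide with a sample, so interpolate at both ends.
    points = [(start, power_at(samples, start))]
    points += [(t, p) for t, p in samples if start < t < end]
    points.append((end, power_at(samples, end)))
    return sum(
        (t1 - t0) * (p0 + p1) / 2.0
        for (t0, p0), (t1, p1) in zip(points, points[1:])
    )


def attribute_energy(samples, regions):
    """Sum energy per region name; regions are (name, start_s, end_s) tuples."""
    totals = defaultdict(float)
    for name, start, end in regions:
        totals[name] += region_energy(samples, start, end)
    return dict(totals)


# Illustrative data only, not measurements.
samples = [(0.0, 100.0), (1.0, 100.0), (2.0, 200.0), (3.0, 200.0)]
regions = [("axpy", 0.5, 1.5), ("dot", 2.0, 3.0)]
energy = attribute_energy(samples, regions)
```

With a fixed sampling interval, any kernel shorter than the interval is only seen through interpolation, so the per-region figures are estimates whose accuracy depends on how the interval compares with kernel duration.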

Where it stands

The periodic-sampling daemon is merged into kokkos-tools, written up in an ORNL report, and presented as a poster, 'Understanding GPU Energy Dynamics in HPC Applications', at the 2025 Smoky Mountains Conference. The NVML and Variorum connectors are in review, with ROCm SMI sketched for AMD.

HPC for nuclear simulation

A three-year apprenticeship building and running the high-performance computing behind the simulation codes used in nuclear engineering.

The context

EDF's ASICS group develops the scientific computing that nuclear simulation depends on. As an apprentice there alongside my engineering degree, I work on the performance side: the numerical methods and the cluster infrastructure that let large simulations run and finish.

The work

It spans the stack a real HPC problem touches: production C++ on Linux clusters, from the numerics of the schemes themselves to the systems they run on. The specifics sit under industrial confidentiality; what carries over is the discipline of making demanding physics run correctly and fast on shared hardware.

An industrial apprenticeship. I describe only what is cleared for public mention.

Running my own production

A five-node Proxmox cluster, sentinel, hosting around 20 publicly reachable services on hardware I run and automate myself.

The setup

Five Proxmox nodes (cerberus, echelon, mikoshi, cynosure, ultron) with Ceph storage and a VyOS edge over a WireGuard uplink. A single Traefik instance terminates Let's Encrypt TLS for around 20 services under kerboul.me: a Gitea forge, a Coolify PaaS, Nextcloud, a media stack, and the apps I deploy, including this site. The cluster's runbooks and automation are themselves a repo.

Why it's here

It is the DevOps and SRE half of the profile, and it is real: uptime, backups, certificate renewal, monitoring, and the unglamorous failure modes you only learn by being on call for your own infrastructure. The site you're reading ships to it through a CI/CD pipeline that builds a versioned image, scans it for vulnerabilities, and rolls back automatically on a failed health check.
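The rollback step of such a pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the site's actual pipeline: `start` and `is_healthy` are hypothetical callables standing in for whatever starts a tagged image and probes its health endpoint, and the image build and vulnerability scan are left out.

```python
import time
from typing import Callable


def deploy_with_rollback(
    new_tag: str,
    previous_tag: str,
    start: Callable[[str], None],
    is_healthy: Callable[[], bool],
    attempts: int = 5,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start new_tag, poll its health, and fall back to previous_tag on failure.

    Returns the tag that is left running.
    """
    start(new_tag)
    for attempt in range(attempts):
        if is_healthy():
            return new_tag
        # Give the service time to come up before probing again.
        if attempt < attempts - 1:
            sleep(delay_s)
    # Every probe failed: restore the last known-good image.
    start(previous_tag)
    return previous_tag


# Usage with stand-in callables: the new image never becomes healthy.
started = []
running = deploy_with_rollback(
    "site:1.4.0",            # hypothetical new image tag
    "site:1.3.2",            # hypothetical last known-good tag
    start=started.append,
    is_healthy=lambda: False,
    attempts=3,
    sleep=lambda _seconds: None,  # skip real waiting in the example
)
```

Tagging each build with a version is what makes the fallback possible: the previous tag still names an image that can be started again unchanged.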

sentinel, live


Polled live from the cluster's own Proxmox API.

Running events for 120+ players

Opération Endgame, a DCS World operation I founded in 2020 and run yearly: 120+ simultaneous players, 150+ registered this edition.

The other kind of systems

Opération Endgame is a large multiplayer DCS World operation I've organised yearly since 2020: briefing, coordination, and the logistics of moving 120+ simultaneous players (150+ registered this edition) through one coherent four-hour event, across pilots, JTAC, AWACS/GCI, and logistics. It is the soft-skills counterpart to the technical work: leadership, operations, and keeping a crowd aligned in real time.

Mapping the French DCS scene

A live directory of French-speaking DCS World communities I built and host, with stats and infographics on the scene.

What it is

Commus indexes the French-speaking DCS World communities, around 95 of them, with filtering, comparison, and a set of infographics: a periodic table of modules, a timeline, an activity pulse. A Vue front end I host, kept current by a small updater service. It is the data-and-interface counterpart to the leadership side of Opération Endgame.

/02 Expertise

By domain, each tied to proof

No skill bars. Five areas, and the project that demonstrates each.

Writing for the GPU and reasoning about what it costs, in time and now in energy.

  • CUDA
  • OpenMP & MPI
  • Kokkos & performance portability
  • GPU power & energy telemetry

Scientific computing

Proven by EDF · ASICS

The numerics underneath simulation: schemes, stability, and integrators that behave.

  • Finite-difference schemes
  • von Neumann stability
  • Symplectic & IMEX integrators
  • Quantum computing (coursework)

Infrastructure & DevOps

Proven by sentinel cluster

The full path from a commit to a request served, and the reliability work behind it, on hardware I'm accountable for.

  • Proxmox & Ceph
  • Kubernetes / K3s
  • Traefik, TLS & reverse proxy
  • Docker & Gitea CI/CD

Full-stack & real-time

Proven by commus

Interfaces and live systems, including the one rendering this page.

  • Vue 3 / Nuxt 3
  • TypeScript
  • Self-hosting & deployment
  • Astro

The defensive groundwork an infrastructure profile is expected to hold.

  • PKI & cryptography
  • Post-quantum
  • Zero Trust
  • NIS2 / DORA · ANSSI frameworks

/03 Trajectory

Polytech → EDF → Oak Ridge

Ethan Puyaubreau, High-performance computing & infrastructure engineer
Ethan Puyaubreau a.k.a. Kerboul · DaKerboul · Paris, France

I work two tracks at once. One is high-performance computing: the GPU and numerical work that makes scientific code fast. The other is the infrastructure that puts software into production and keeps it there: containers, pipelines, reverse proxies, and the cluster underneath. The rarer and more useful thing is being credible at both.

At Oak Ridge National Laboratory I built GPU energy-measurement tooling for Kokkos, the US Department of Energy's performance-portability framework. The periodic-sampling daemon is merged upstream into Kokkos Tools, and the work became a poster at the 2025 Smoky Mountains Conference. It is what I would point a reviewer to first.

Alongside that I spent three years as an apprentice on HPC for nuclear simulation at EDF, and I run a five-node production cluster of my own: around twenty services behind Traefik and TLS, deployed with Docker and CI/CD, with image scanning and automatic rollback. I am on call for its uptime, backups, and certificates. I have shipped scientific computing and operated real infrastructure, not just studied them.

I finish my engineering degree at Polytech Paris-Saclay in September 2026 and am open to roles from January 2027. HPC labs are a natural fit, among them Berkeley Lab and LLNL in the Bay Area and the CEA in Paris, but I am just as interested in infrastructure, DevOps, SRE, and platform engineering, on-prem or in the cloud. I would rather take a role that uses both halves of this page than only one. Off the clock I have played Kerbal Space Program since 2011, and I care about self-hosting and owning my data, which is why this site and the cluster behind it run on hardware I keep myself. If your team needs someone who can make a GPU kernel fast and keep the production cluster that runs it healthy, I would like to hear from you.

/04 Contact

For recruiters, in one screen

Open to HPC, infrastructure & DevOps roles from January 2027

For HPC labs or infrastructure and platform teams, in the Bay Area or Paris.

The fastest way to reach me

ethan.puyaubreau@gmail.com