Nicholas Kilby - CV

Nicholas Kilby MCP

Platform & Infrastructure Engineering Leader

Shropshire

About Me

Platform & Infrastructure Engineer with over a decade of experience building and leading high-scale, high-availability systems across multi-site datacentres and hybrid cloud environments. I specialise in automation-first engineering, distributed platforms, and operational leadership, combining a hands-on mindset with strategic oversight.

I’ve led infrastructure for a large self-hosted SaaS platform powering universities, police forces, and public bodies across Europe. My work combines deep technical capability (VMware, Kubernetes, Ansible, ZFS, distributed storage) with team leadership, vendor strategy, datacentre optimisation, and reliability practices.

I’m now seeking Platform Engineering Manager, Infrastructure Manager, SRE Manager, or General IT Manager roles where I can lead teams, own platform direction, and drive engineering excellence in a fast-moving environment.

What I’m Good At

Platform leadership · Team mentoring · Vendor strategy · Distributed systems · Datacentre operations · Infrastructure as Code · Automation · Observability · Hybrid cloud · Performance optimisation · ISO 27001

Career Highlights

Built and led large-scale platform infrastructure

Expanded and managed a 3k+ VM / 50-node vSphere estate across multiple datacentres and AWS, supporting public-sector and enterprise workloads.

Engineered distributed, self-healing systems

Delivered production Kubernetes, CockroachDB, MinIO, ClickHouse and other clustered components powering mission-critical services.

Agentic development projects

  • Bespoke ZFS backup automation using snapshot intelligence and replication logic.
  • Adaptive storage orchestration tools built using Python and custom Ansible modules.
  • Automated datacentre decisioning workflows (load-aware and temperature-aware operations).

Cut datacentre cooling energy usage by up to 75%

Designed and built a free-air cooling system that significantly reduces energy consumption during cooler periods, saving thousands annually and reducing environmental impact.

Introduced full observability

Defined and deployed a Prometheus, Grafana and Loki stack, establishing alerting, metrics, and real-time diagnostics across the platform.

Experience

Zengenti — Platform Engineer / Hosting Operations Engineer

Oct 2015 – Present · Ludlow

Lead platform engineer within a high-scale self-hosted SaaS environment, delivering infrastructure, reliability, and automation across multi-site datacentres.

Leadership & management

  • Technical lead for day-to-day infrastructure operations, mentoring engineers and aligning priorities with product and leadership teams.
  • ISO 27001 audit lead for the hosting team, implementing processes and controls.
  • Vendor-side lead for hardware strategy, negotiation, and lifecycle planning.

Engineering & delivery

  • Managed a multi-site 3k+ VM estate running VMware vSphere, Windows and Linux.
  • Architected and deployed clusters: Kubernetes, CockroachDB, MinIO, ClickHouse.
  • Designed ZFS storage with replication, snapshot workflows, and agentic backup tooling.
  • Delivered Infrastructure as Code using Ansible, Packer and Python (including custom modules).
  • Owned end-to-end datacentre operations including power, cooling, capacity planning, and hardware optimisation.
  • Introduced full observability using Prometheus, Grafana and Loki, improving MTTR and transparency.
  • Reduced cooling energy consumption by up to 75% using free-air cooling engineering.

Kingspan Insulation — Technical Services Engineer

May 2015 – Oct 2015 · Pembridge

Provided global infrastructure support, including oversight of Northern UK sites and remote support for US facilities. Gained international project experience and strengthened my ability to work across cultures and time zones.

SimplyWISP Ltd — Director & Co-Founder

2013 – 2018

Founded and operated a rural broadband ISP providing wireless connectivity to underserved areas. Built the network, managed operations, handled budgets, and led customer delivery, gaining commercially focused leadership experience.

S&A Produce (UK) Ltd — IT Technician

Aug 2010 – May 2015 · Hereford

Delivered infrastructure, network, AD/Exchange, VPN, Wi-Fi, and ERP/MRP support to a £50m business across multiple UK sites.

Earlier Roles (2001–2010)

Technical Specialist, Planner, Administrator and Events Management roles providing broad operational and organisational experience.

Community & Leadership

Trustee — Ashford Carbonell Village Hall & Recreation Ground

2021 – 2025

Supported governance, modernisation efforts and community digital strategy.

Technical Skills

Infrastructure: VMware vSphere, Windows Server, Ubuntu, Storage, Hardware

Distributed systems: CockroachDB, MinIO, Kubernetes, ClickHouse

Automation: Ansible, Packer, Python, CI/CD

Networking: Juniper, Dell, Firewalls, PtP wireless

Observability: Prometheus, Grafana, Loki, Zabbix

Cloud: AWS, hybrid environments

Other: SQL, Git, datacentre design, capacity modelling, budgeting, team management