Manuel Villanueva

Experience

CloudOps Analyst (AI & Security)

Pragma

September 2024 - Present

Hybrid, Panama City

Developed ETL and RAG pipelines for 1,000+ internal documents, improving retrieval ranking quality (nDCG +18%).
Lead AI Red Teaming for a GRC Platform with clients in over 110 countries, achieving a 50% increase in successful guardrail activations during adversarial attacks.
Reduced Time To First Token by 64.7% for Legal & Compliance Applications.
Designed an internal platform for generating Labeled Datasets used for Supervised Learning.

Site Reliability Engineer, Intern

BAC

January 2024 - August 2024

Panama City

Implemented SLOs/SLIs, managing over 1,600 L1-L6 monitoring metrics for infrastructure and services, resulting in improved system reliability.
Achieved a 50% reduction in alerts and a 75.2% reduction in down hosts over 6 months by optimizing monitoring strategies.
Resolved misconfiguration-related SNMPv3 issues in multiple VMware ESXi clusters, enhancing network stability and performance.
Created 5 Observability Dashboards for the NOC to visualize Data Center and VPN traffic.

DevOps Engineer, Intern

Triangulum S.A.

January 2023 - February 2023

San José, Costa Rica

Implemented AI best practices for DevOps/SRE, which decreased Time To Production (TTP) by 20%.
Took part in Incident Response for DoS attacks.
Refactored K8s Clusters from Docker to Podman.

Publications

Guide for Preparing and Responding to Deepfake Events ↗

(Co-authored with OWASP Members)

Defined deepfake incident-response playbooks with detection, containment, eradication, and recovery workflows for fraud, impersonation, and misinformation scenarios.

GenAI Red Teaming Guide ↗

(Co-authored with OWASP Members)

Designed an adversarial testing blueprint (Model, Implementation, System, Runtime) integrated into CI/CD to fortify AI services against attacks. Second most downloaded OWASP Top 10 for LLMs white paper.

LLM & GenAI Data Security Best Practices ↗

(Co-authored with OWASP Members)

Defined defense measures including layered encryption, SIEM/XDR monitoring, secure data flows for agentic LLMs, and governance frameworks to ensure integrity and reliability of AI data pipelines.

Skills

Systems, Security & Observability

Linux
macOS
eBPF
DNS
BGP
PKI
RoCE, InfiniBand
ELK Stack
Prometheus, Grafana
Datadog

Programming & Automation

C++
CMake
Python
FastAPI
Go
Swift
SQL
Terraform
Ansible
Jenkins

Cloud & Container Orchestration

AWS (EC2, S3, IAM, Bedrock, SageMaker)
GCP (BigQuery, Vertex AI)
Azure (AI Foundry, AzureML)
Docker
Podman
Kubernetes
Cilium
FluxCD

AI/ML & Data Platforms

PyTorch
MLX
CUDA
RAG Pipelines
Model Quantization
LoRA / QLoRA
RLHF
Apache Spark
Databricks

Early-career SRE turned ML Engineer working on Tools and AI/ML Infrastructure. Currently contributing to Apple’s MLX.

GitHub ↗

manuel@villanuev.com ↗

X / Twitter ↗

Substack ↗

Projects

Apple MLX ↗

• Contributor adding CUDA support to MLX, working on the FFT CUDA Backend.

• Modified mx.sort() behavior in the CPU and Metal Backends to match NaN sorting behavior based on IEEE-754.

• Added QoL improvements such as #2689 (Einsum Error Msg Improvement) and #2690 (Fixed Type Annotations in Stubs).

Applied AI/ML for Space Biology ↗

• Processed and integrated multi-omics datasets from NASA OSDR (RNA-seq, microscopy, tonometry, microCT) via programmatic API access, handling 23K+ features across multiple experimental conditions.

• Applied SHAP to Random Forest and Logistic Regression models, identifying top predictive genes (e.g., Cryab, Col4a1) for ocular phenotype prediction.

• Implemented feature selection pipeline using DESeq2, reducing dimensionality from 23,419 to 353-600 highly informative genes.

Apple Silicon RDMA Simulation Testbed

• Deployed a K3s Cluster on LimaVM on an M4 Mini, using it to run AI/ML Workloads.

• Enabled Soft-RoCE to simulate RDMA behavior and set up Cilium Hubble for eBPF-enabled network observability.

Gear9 ↗

• Designed a Digital Pit Wall that works with the F1 23 video game, streaming the game data via UDP.

• Created a Data Streaming Pipeline hosted on AWS for real-time data during racing.

• Data collected and stored in Prometheus, with all Data Visualization in Grafana.

Education

Technological University of Panama

BSc. in Computer Science & Systems Engineering

Certifications

NVIDIA Certified Professional - AI Infrastructure (NCP-AII) ↗

AWS Certified AI Practitioner (Early Adopter) ↗

Advanced Deep Learning Specialist (IBM) ↗

AI Red Teaming Professional (AIRTP+) (LearnPrompting) ↗

NASA TOPS-T ScienceCore AI/ML in Space Biology (NASA Ames Research Center)
