I am a senior AI infrastructure engineer specializing in large-scale GPU clusters and production AI systems. My work focuses on diagnosing and resolving performance bottlenecks in distributed training environments, including those caused by GPU locality, interconnect behavior, and network fabric dynamics at scale. I have led and advised teams on deploying and scaling AI infrastructure for reliability, cost efficiency, and predictable performance under heavy synchronization pressure. My experience spans on-prem and cloud environments supporting modern deep learning workloads, and my work has informed peer-reviewed research and industry best practices in AI systems engineering. I am also a venture advisor and have invested in and mentored more than 15 US-based startups in the AI and healthcare sectors.
...Principal Member of Technical Staff at AMD