Machine Learning Engineer Intern

GMI cloud

About Us

At GMI, we are at the forefront of scalable AI infrastructure solutions. Our platforms power state-of-the-art machine learning, enabling cutting-edge applications in the generative AI domain. As a fast-moving and innovative team, we thrive on leveraging open-source solutions and industry best practices to deliver robust, high-performance AI systems for our clients.

About the Role

We are seeking a Machine Learning Engineer Intern to focus on adapting and optimizing open-source foundation models for our GPU inference platform. You will work closely with experienced engineers and AI researchers, gaining hands-on exposure to large-scale model deployment techniques. This is an opportunity to build valuable skills in model optimization, GPU acceleration, and systems-level engineering while contributing to the next generation of AI-powered products.

Key Responsibilities

  • Model Adaptation & Integration: Adapt open-source foundation models (e.g., LLMs, vision transformers, multimodal models) to run efficiently on our custom GPU inference infrastructure
  • Performance Optimization: Identify bottlenecks in model inference pipelines, implement GPU kernels, and optimize code to reduce latency and improve throughput
  • Platform Tooling & Automation: Develop scripts and tooling for automating model conversion, quantization, and configuration processes to streamline deployment workflows
  • Testing & Validation: Implement benchmarking tests and validation suites to ensure model accuracy, reliability, and performance meet internal standards
  • Collaboration with Cross-Functional Teams: Work closely with machine learning researchers, MLOps engineers, and infrastructure teams to refine performance strategies and ensure smooth integration of foundation models into production environments
  • Documentation & Knowledge Sharing: Document adaptation procedures, best practices, and lessons learned. Contribute to internal knowledge bases and present findings in team meetings

Qualifications

  • Educational Background: Currently pursuing a graduate degree in Computer Science, Electrical Engineering, or a related technical field
  • Programming Skills: Proficiency in Python; familiarity with Go and CUDA is a plus
  • Foundational Knowledge in Machine Learning: Understanding of attention-based models, PyTorch, and GPU-accelerated computing
  • Problem-Solving Mindset: Strong analytical skills, with the ability to troubleshoot performance issues and propose innovative optimization strategies
  • Team Player: Excellent communication skills, eagerness to learn, and the ability to collaborate effectively with diverse teams

What You’ll Gain

  • Real-world exposure to large-scale, production-grade AI deployments
  • Hands-on experience with state-of-the-art models and GPU acceleration techniques
  • Mentorship from experienced engineers and researchers
  • Opportunities to impact performance-critical aspects of cutting-edge AI products

If you’re passionate about AI systems engineering and excited to work at the intersection of machine learning and high-performance computing, we encourage you to apply!

Location

  • Mountain View, CA

Job type

  • Internship

Role

Engineering

Keywords

  • LLMs
  • On-site
  • Internship