Machine Learning Engineer | Python | PyTorch | Distributed Training | Optimisation | GPU | Hybrid, San Jose, CA Job at Enigma, San Jose, CA

  • Enigma
  • San Jose, CA

Job Description

Title: Machine Learning Engineer

Location: San Jose, CA

Responsibilities:

  • Productize and optimize models from Research into reliable, performant, and cost-efficient services with clear SLOs (latency, availability, cost).
  • Scale training across nodes/GPUs (DDP/FSDP/ZeRO, pipeline/tensor parallelism) and own throughput/time-to-train using profiling and optimization.
  • Implement model-efficiency techniques (quantization, distillation, pruning, KV-cache, Flash Attention) for training and inference without materially degrading quality.
  • Build and maintain model-serving systems (vLLM/Triton/TGI/ONNX/TensorRT/AITemplate) with batching, streaming, caching, and memory management.
  • Integrate with vector/feature stores and data pipelines (FAISS/Milvus/Pinecone/pgvector; Parquet/Delta) as needed for production.
  • Define and track performance and cost KPIs; run continuous improvement loops and capacity planning.
  • Partner with ML Ops on CI/CD, telemetry/observability, model registries; partner with Scientists on reproducible handoffs and evaluations.

Educational Qualifications:

  • Bachelor's degree in Computer Science, Electrical/Computer Engineering, or a related field required; Master's preferred (or equivalent industry experience).
  • Strong systems/ML engineering background with exposure to distributed training and inference optimization.

Industry Experience:

  • 3–5 years in ML/AI engineering roles owning training and/or serving in production at scale.
  • Demonstrated success delivering high-throughput, low-latency ML services with reliability and cost improvements.
  • Experience collaborating across Research, Platform/Infra, Data, and Product functions.

Technical Skills:

  • Familiarity with deep learning frameworks: PyTorch (primary), TensorFlow.
  • Exposure to large-model training techniques (DDP, FSDP, ZeRO, pipeline/tensor parallelism); distributed training experience a plus.
  • Optimization: experience profiling and optimizing code execution and model inference, including quantization (PTQ/QAT/AWQ/GPTQ), pruning, distillation, KV-cache optimization, and Flash Attention.
  • Scalable serving: autoscaling, load balancing, streaming, batching, caching; collaboration with platform engineers.
  • Data & storage: SQL/NoSQL, vector stores (FAISS/Milvus/Pinecone/pgvector), Parquet/Delta, object stores.
  • Ability to write performant, maintainable code.
  • Understanding of the full ML lifecycle: data collection, model training, deployment, inference, optimization, and evaluation.
