Machine Learning Platform Engineer

Full Time

Remote

Posted 11 months ago

Job Summary:
We are looking for an experienced Machine Learning Platform Engineer to join our team. You will design, implement, and maintain scalable infrastructure and tools to support our machine learning initiatives. You will also collaborate with data scientists, software engineers, and other stakeholders to deploy and manage machine learning models in production environments, ensuring reliability, scalability, and performance.

Responsibilities:

Design, develop, and maintain scalable machine learning infrastructure and platforms to support model training, deployment, and monitoring.
Collaborate with data scientists and software engineers to implement machine learning workflows and pipelines, including data preprocessing, feature engineering, model training, and evaluation.
Build and deploy containerized machine learning applications using tools such as Docker and Kubernetes, and manage orchestration and scaling in cloud environments.
Develop and maintain CI/CD pipelines for automating the deployment and testing of machine learning models and applications.
Implement monitoring and logging solutions to track model performance, data quality, and system health, and proactively identify and address issues.
Ensure security and compliance requirements are met in machine learning systems, including data privacy, access control, and regulatory standards.
Collaborate with DevOps and IT teams to integrate machine learning infrastructure with existing systems and services, and optimize resource utilization and cost efficiency.
Stay abreast of emerging technologies and best practices in machine learning, distributed systems, and cloud computing, and evaluate their potential impact on our platform.
Provide technical guidance and mentorship to junior engineers, and contribute to knowledge sharing and collaboration within the team.
Participate in code reviews, design discussions, and cross-functional projects to drive innovation and continuous improvement.

Requirements:

Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
2+ years of experience in software engineering, with a focus on building scalable, distributed systems and infrastructure.
Strong programming skills in languages such as Python, Java, or Go, and experience with machine learning and data processing frameworks (e.g., TensorFlow, PyTorch, scikit-learn, Apache Spark).
Proficiency in cloud computing platforms such as AWS, Azure, or Google Cloud, and experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
Solid understanding of machine learning concepts and techniques, and experience deploying and managing machine learning models in production environments.
Familiarity with DevOps practices and tools (e.g., Git, Jenkins, Terraform) for automation, configuration management, and infrastructure as code.
Excellent problem-solving skills and attention to detail, with the ability to troubleshoot complex issues and optimize system performance.
Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.
Experience with big data technologies (e.g., Spark) and streaming data processing frameworks (e.g., Kafka) is a plus.
Certifications in cloud computing (e.g., AWS Certified Solutions Architect, Google Cloud Professional Data Engineer) and machine learning (e.g., TensorFlow Developer Certificate) are desirable.
