IT-DevOps

We are looking for motivated candidates with proven experience in IT and cloud systems management to join a team that designs and implements high-performance computing and storage systems in support of research projects.

The candidate is expected to have the knowledge and experience to manage, following good practices, the configuration, maintenance, updating and monitoring of the computing system at two levels:

  1. The HPC, Artificial Intelligence and Big Data hardware infrastructure, which includes local clusters of machines that use the latest GPU technology and other hardware environments for training, testing and inference of Deep Learning models, as well as distributed storage and CI/CD.
  2. The software platform that provides applications and services that allow research work to be carried out efficiently, in addition to simplifying integration, maintenance and monitoring operations.

Other objectives involve active participation in research projects, supporting researchers in the implementation of emerging and innovative technologies.

Candidates must have:

  • Qualifications: Engineering degree or Bachelor's degree in Computer Science, Telecommunications or Software Development.
  • Languages: Spanish and English.

We value candidates with:

  • Experience in Linux environments (user management, scripting, service management, process monitoring and tuning)
  • Experience in network configuration (traffic monitoring, communication networks and network security)
  • CI/CD tools: GitLab (DevOps tools, GitLab Runners, CI/CD pipelines)
  • Containerization Technologies: Docker
  • Agile software development
  • Microservices and Orchestration Technologies: Kubernetes
  • Distributed storage systems (GPFS, Ceph, NAS configuration)
  • HPC Job Scheduling System: Slurm
  • IT monitoring: Prometheus and Grafana
  • IT logging and auditing: ELK stack
  • Experience in code scanning and CI/CD pipeline security, and in optimizing vulnerability management
  • Experience in HPC architectures, GPU servers, data-driven architectures, distributed storage
  • Experience with bare-metal virtualization solutions: OpenNebula, Proxmox, MAAS, OpenStack
  • Experience in the implementation of Big Data and DB systems: Kafka, PostgreSQL, Spark, MongoDB, Cassandra
  • Knowledge of different cloud service providers and their service offerings (e.g. IaaS, PaaS): Amazon Web Services, Google Cloud Platform, Microsoft Azure
  • MLOps and AI workflow management tools: Airflow, Kubeflow, MLflow, DVC, etc.

Tasks and responsibilities:

  • Assess existing HW infrastructure (focused on GPU servers, file servers, and networks), identify needs, and participate in the system modernization design process
  • Plan future HW needs
  • Maintain, update and support the internal HPC infrastructure
  • Implement good practices in CI/CD and MLOps
  • Develop middleware for MLOps
  • Provide support/consulting for the implementation of MLOps for third parties, in private or public clouds/clusters

Vicomtech

Parque Científico y Tecnológico de Gipuzkoa,
Paseo Mikeletegi 57,
20009 Donostia / San Sebastián (Spain)

+(34) 943 309 230

Edificio Ensanche,
Zabalgune Plaza 11,
48009 Bilbao (Spain)
