IT-DevOps

All offices

We are looking for motivated candidates with proven experience in IT and cloud systems management to join a team dedicated to designing and implementing high-performance computing and storage systems that support research projects.

The candidate should have the knowledge and experience to manage, following good practices, the configuration, maintenance, updating and monitoring of the computing system at two levels:

The HPC, Artificial Intelligence and Big Data hardware infrastructure, which includes local clusters of machines with the latest GPU technology and other hardware environments for training, testing and inference of Deep Learning models, as well as distributed storage and CI/CD.
The software platform that provides applications and services enabling research work to be carried out efficiently, while simplifying integration, maintenance and monitoring operations.

Other objectives include active participation in research projects, supporting researchers in the implementation of emerging and innovative technologies.

Candidates must have:

Qualifications: Engineering degree or Bachelor's degree in Computer Science, Telecommunications or Software Development.
Languages: Spanish and English.

We value candidates with:

Experience in Linux environments (user management, scripting, service management, process monitoring and tuning)
Experience in network configuration (traffic monitoring in communication and security networks)
CI/CD tools: GitLab (DevOps tools, GitLab Runners, CI/CD pipelines)
Containerization Technologies: Docker
Agile software development
Microservices and Orchestration Technologies: Kubernetes
Distributed storage systems (GPFS, Ceph, NAS configuration)
HPC Job Scheduling System: Slurm
IT monitoring: Prometheus and Grafana
IT logging and auditing: ELK stack
Experience in security scanning of code and CI/CD pipelines, and in optimizing vulnerability management
Experience in HPC architectures, GPU servers, data-driven architectures, distributed storage
Experience with bare-metal and virtualization solutions: OpenNebula, Proxmox, MAAS, OpenStack
Experience in implementing Big Data and database systems: Kafka, PostgreSQL, Spark, MongoDB, Cassandra
Knowledge of different cloud service providers and their service offerings (e.g., IaaS, PaaS): Amazon Web Services, Google Cloud Platform, Microsoft Azure
MLOps and AI workflow management tools: Airflow, Kubeflow, MLflow, DVC, etc.

Tasks and responsibilities:

Assess existing HW infrastructure (focused on GPU servers, file servers, and networks), identify needs, and participate in the system modernization design process
Plan future HW needs
Maintain, update and support the internal HPC infrastructure
Implement good practices in CI/CD and MLOps
Develop middleware for MLOps
Provide support/consulting for the implementation of MLOps for third parties, in private or public clouds/clusters

Closed offer