Artificial Intelligence - Large Language Model Optimizations for In-Memory Computing Hardware M/F - 38

Job description

CEA
Grenoble - 38
Internship
Published on 23 October 2025

The CEA is a major player in research, serving citizens, the economy and the State.

It provides concrete solutions to their needs in four main areas: energy transition, digital transition, technologies for the medicine of the future, and defence and security, all built on a foundation of fundamental research. For more than 75 years, the CEA has been committed to the scientific, technological and industrial sovereignty of France and Europe, for a better-controlled and safer present and future.

Located at the heart of regions equipped with very large research infrastructures, the CEA has a broad range of academic and industrial partners in France, in Europe and internationally.

The CEA's 20,000 employees share three core values:

- A sense of responsibility
- Cooperation
- Curiosity

Large Language Models (LLMs), such as those behind ChatGPT, have led to a new AI revolution with applications in every domain. However, LLMs are very resource-intensive (energy, compute...), so an important line of research focuses on optimizing these models. Existing open-source toolchains, such as LLM Compressor [1] and OpenVINO [2], enable almost-automatic optimizations that compress LLMs into smaller ones through, e.g., quantization and pruning. These toolchains only target conventional hardware, such as GPUs. New hardware paradigms, such as In-Memory Computing (IMC), are promising for accelerating LLMs and reducing their energy consumption [3]. Running LLMs on such hardware, however, requires specific optimizations because of its characteristics. For instance, it requires extreme quantization of the model (reducing the number of bits on which data, weights and activations are encoded), because the analogue IMC fabric supports only a limited number of bits, and it requires optimizing the robustness of the model, because IMC computations are prone to errors. Software tools and methods for mapping state-of-the-art LLMs onto these hardware platforms nevertheless lag behind.
This internship aims to put together a software infrastructure for mapping, simulating and exploring the performance of LLMs on IMC hardware, starting from existing open-source toolchains and integrating functionalities dedicated to IMC hardware, such as quantization and error models. The student will join a multidisciplinary team of research engineers, PhD students, postdocs and interns, at the heart of an ecosystem of industrial and academic partners in the world of embedded AI. He/she will have access to supercomputing infrastructure and will gain expertise in LLMs, compression methods, and efficient hardware for AI. Building on the tools and knowledge developed during the internship, the student could be offered the opportunity to pursue a PhD on compression methods for LLMs.
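To give a feel for the two IMC-specific ingredients named above (low-bit quantization and an error model), here is a minimal, purely illustrative Python sketch. It is not taken from LLM Compressor, OpenVINO or any CEA tool. The function names, the symmetric uniform quantizer and the additive Gaussian noise model are all assumptions chosen for simplicity, and real IMC error models are likely to be more elaborate.

```python
import random

def quantize(values, bits):
    """Symmetric uniform quantization of floats to signed integer codes.

    Hypothetical helper: one scale per vector, codes clipped to [-qmax, qmax].
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def noisy_dot(weights, activations, bits=4, noise_std=0.0, rng=None):
    """Dot product on quantized operands with an assumed additive error.

    Assumption: the IMC error is modelled as Gaussian noise added to the
    integer accumulation, with standard deviation `noise_std` in code units.
    """
    rng = rng or random.Random(0)
    w_codes, w_scale = quantize(weights, bits)
    a_codes, a_scale = quantize(activations, bits)
    acc = sum(w * a for w, a in zip(w_codes, a_codes))
    acc += rng.gauss(0.0, noise_std)                # assumed error model
    return acc * w_scale * a_scale                  # back to real-valued units

if __name__ == "__main__":
    w = [0.5, -1.0, 0.25]
    x = [1.0, 0.5, -0.75]
    exact = sum(wi * xi for wi, xi in zip(w, x))
    for bits in (8, 4, 2):
        approx = noisy_dot(w, x, bits=bits, noise_std=1.0)
        print(bits, "bits: exact", exact, "approx", approx)
```

Sweeping `bits` and `noise_std` in a sketch like this is the kind of exploration the internship would carry out at the scale of full LLMs, using real toolchains and hardware-calibrated error models.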
[1] https://github.com/vllm-project/llm-compressor
[2] https://github.com/openvinotoolkit/openvino
[3] Büchel et al., Analog Foundation Models, NeurIPS 2025.