December 2022 - Slides

An Introduction to Type Systems and Typing Tools in Python

The Python programming language has become the de facto standard for writing a vast majority of scientific code. While highly general-purpose (unlike alternatives such as MATLAB or R), Python is *interpreted* and *dynamically typed*.

On the one hand, these properties have spurred the emergence of interactive, Python-powered development toolkits such as IPython or Jupyter, making it easy for scientists and software engineers to explore data and write Python code iteratively. On the other hand, they imply that the type safety of a program is not checked by a compiler but by the programmer, and any type error left uncaught is only raised at runtime. Detrimental consequences follow: the mental load of writing code increases, refactoring becomes daunting, and interactive development loses fluidity.

Type hints, introduced in Python by PEP 484, seek to reintroduce type safety to Python programs through the concept of "type annotations". Type annotations declare the expected type of variables in a Python program. They do not affect runtime behavior, but can be leveraged by static type checkers such as mypy or pyright to produce ahead-of-time warning messages describing mismatches between a variable's annotated type and the type the checker infers for it.
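As a minimal sketch of what this looks like in practice (the function and variable names below are purely illustrative, and the built-in `list[float]` syntax assumes Python 3.9 or later):

```python
def mean(values: list[float]) -> float:
    # Return the arithmetic mean of a list of floats.
    return sum(values) / len(values)

result: float = mean([1.0, 2.0, 3.0])  # fine: the argument matches the annotation
broken: float = mean("oops")           # flagged ahead of time by mypy or pyright,
                                       # but Python itself only fails at runtime
```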

On the one hand, type annotations can be adopted gradually, which keeps the barrier to entry low. On the other hand, a holistic usage of type hints requires some foundational knowledge of type theory, a subfield of theoretical computer science that domain scientists do not frequently master.

In this tutorial, we will learn how to use Python type annotations: we will go through the basic type-theory prerequisites necessary for an informed usage of such annotations, then survey the type-related features currently available in Python through a series of basic illustrative examples. Finally, we will discuss the additional features stemming from the integration of type annotations with popular code development toolchains (in particular Language Servers), which lead to more informative completion and error checking.



November 2022 - Slides

Tutorial on Jax

Modern scientific computing frameworks such as numpy or pytorch offer an attractive trade-off between ease of use and performance by exposing a set of highly optimized common numerical subroutines (implemented in low-level languages) via a high-level Python API. However, these frameworks do little to optimize performance at the level of a whole program.

jax fills this gap by defining and implementing a system of function transformations, which operate on *pure and statically composed* (PSC) Python programs that are allowed to call a large set of numpy-like subroutines. These function transformations encompass, in a unified manner, operations as varied as Just-In-Time (JIT) compilation, Automatic Differentiation, Vectorization and more, making programs written in jax both highly optimized and often less complex than their pytorch counterparts.
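As a rough illustration of these transformations (a minimal sketch assuming a working jax installation; the toy least-squares loss is purely illustrative):

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # A tiny least-squares loss written with numpy-like primitives.
    return jnp.mean((jnp.dot(x, w) - y) ** 2)

grad_loss = jax.grad(loss)                            # Automatic Differentiation (w.r.t. the first argument)
fast_grad = jax.jit(grad_loss)                        # Just-In-Time compilation of the gradient function
batched_loss = jax.vmap(loss, in_axes=(None, 0, 0))   # Vectorization over a leading batch axis

w, x, y = jnp.zeros(3), jnp.ones((10, 3)), jnp.ones(10)
print(fast_grad(w, x, y))      # compiled gradient, shape (3,)
print(batched_loss(w, x, y))   # per-sample losses, shape (10,)
```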

In this tutorial, we will give a high-level introduction to jax, discussing its most common features and how to write idiomatic jax code through a series of examples. Finally, we will also consider the potential limitations of jax, in particular the "compiled language creep", i.e. the additional structural constraints that jax-compatible programs must satisfy.



November 2021 - Slides

Large Scale Interactive Data Exploration and Experiments

Data analysis, and as a consequence many research tasks, is an iterative and exploratory process. The Jupyter Notebook is a celebrated programming environment that, in contrast to traditional code editors and file-based programming workflows, fits the exploratory nature of data analysis.

While Jupyter Notebooks are now widely used for quick data wrangling on personal laptops, they are less commonly used to launch large-scale experiments (or to run large-scale data analysis) on a cluster of computing resources.

In this talk, we will see how to run experiments on a SLURM cluster from Jupyter Notebooks together with Dask, a distributed computing library written in Python that allows for large-scale, iterative data analysis. This talk is mostly relevant to Python programmers.
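A minimal sketch of the kind of workflow discussed in the talk, assuming the dask_jobqueue package is installed; the queue name, resource sizes and toy workload below are illustrative and must be adapted to the actual cluster:

```python
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

# Request Dask workers as SLURM jobs (queue/cores/memory values are illustrative).
cluster = SLURMCluster(queue="normal", cores=8, memory="16GB", walltime="01:00:00")
cluster.scale(jobs=10)
client = Client(cluster)

# From a notebook cell, experiments can then be submitted interactively.
def run_experiment(seed):
    return seed ** 2  # placeholder for an actual experiment

futures = client.map(run_experiment, range(100))
results = client.gather(futures)
```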



August 2019 - Slides

Parallel Computing in Python: Current State and Recent Advances

Modern hardware is multi-core. It is crucial for Python to provide high-performance parallelism. This talk will expose to both data scientists and library developers the current state of affairs and the recent advances in parallel computing with Python. The goal is to help practitioners and developers make better decisions on this matter.

I will first cover how Python can interface with parallelism, from leveraging the external parallelism of C-extensions (especially the BLAS family) to Python's multiprocessing and multithreading APIs. I will touch upon use cases, e.g. single vs. multi-machine, as well as the pros and cons of the various solutions for each use case. Most of these considerations will be backed by benchmarks from the scikit-learn machine learning library.

From these low-level interfaces emerged higher-level parallel processing libraries, such as concurrent.futures, joblib and loky (used by dask and scikit-learn). These libraries make it easy for Python programmers to use safe and reliable parallelism in their code. They can even work in more exotic situations, such as interactive sessions, in which Python's native multiprocessing support tends to fail. I will describe their purpose as well as the canonical use cases they address.
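As a minimal sketch of how these higher-level APIs compare (the work function below is a stand-in for a real CPU-bound task):

```python
from concurrent.futures import ProcessPoolExecutor
from joblib import Parallel, delayed

def work(i):
    # Stand-in for a CPU-bound task.
    return i * i

# concurrent.futures: standard-library, process-based parallelism.
with ProcessPoolExecutor(max_workers=4) as pool:
    results_cf = list(pool.map(work, range(100)))

# joblib (backed by loky): robust process-based parallelism that also
# behaves well in interactive sessions.
results_joblib = Parallel(n_jobs=4)(delayed(work)(i) for i in range(100))
```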