What’s new for Python in 2025? | R bloggers

Python 3.14 was released on October 7, 2025. Here we summarize some of the more interesting changes and some trends in Python development and data science over the past year. We will emphasize the following:

  • the colorful Python command-line interface;
  • project management tool uv;
  • free threading;
  • and a brief summary of other developments.

The Release notes for Python 3.14
also describe the changes to base Python.

Colorful REPL

At Jumping Rivers we have taught many people to program in Python. During a programming career, you become accustomed to making and learning from mistakes. The most common mistakes in introductory programming classes may still trip you up a decade from now: unmatched parentheses, typos, missing quotes, unimported dependencies.

Our Python training courses are delivered using
Jupyter. Jupyter notebooks feature syntax highlighting, making it easy to identify an unterminated string or misspelled keyword.

But most Python students don’t use Jupyter (or other high-level programming tools) on day one – they experiment with Python on the command line. You can type “python” into your shell/terminal window and start programming in the “REPL” (read-evaluate-print loop).

Any effort to make the REPL easier to work with benefits novice programmers, so the introduction of syntax highlighting in the Python 3.14 REPL is really helpful.

uv and package development

One of the big trends in Python development in 2025 is the rise of the project management tool
uv. This is a Rust-based command-line tool and can be used to initialize a package/project structure, specify a project’s development and runtime environment, and publish a package to PyPI.

At Jumping Rivers we used poetry for many of the jobs that uv
excels at. Python is used for the data preparation tasks for diffify.com, and we use
poetry to ensure that our developers use exactly the same package versions when working on that project (see our current
blog series about poetry). But
poetry does not prevent developers from using different versions of Python. For that we need a second tool, such as
pyenv (which allows you to switch between different Python versions), or we have to check that each developer has the same Python version installed on their machine.

uv goes one step further than poetry and allows us to pin the Python version for a project. Let’s use uv to install Python 3.14 so we can test features in the new release.

First follow the
instructions for installing uv.

Next we will use uv on the command line to create a new project in which we will use Python 3.14.

# [bash]
cd ~/temp
mkdir blog-py3.14
cd blog-py3.14

# Which versions of Python 3.14 are available via uv?
uv python list | grep 3.14
# cpython-3.14.0rc2-linux-x86_64-gnu 
# cpython-3.14.0rc2+freethreaded-linux-x86_64-gnu 

You will see something similar regardless of the operating system you are using. There are two builds of Python 3.14 – one of them includes an optional feature called “Free Threading” (more on this later). We install both versions of Python:

uv python install cpython-3.14.0rc2-linux-x86_64-gnu
uv python install cpython-3.14.0rc2+freethreaded-linux-x86_64-gnu

Users of pyenv will be able to install Python 3.14 in a similar manner.

We can choose between the two different Python versions on the command line. First use the version that doesn’t have free threading:

uv run --python=3.14 python
# Python 3.14.0rc2 (main, Aug 18 2025, 19:19:22) [Clang 20.1.4 ] on linux
# ...
>>> import sys
>>> sys._is_gil_enabled()
# True

Then use the version with free threading (note the t suffix):

uv run --python=3.14t python
# ...
# Python 3.14.0rc2 free-threading build (main, Aug 18 2025, 19:19:12) [Clang 20.1.4 ] on linux
# ...
>>> import sys
>>> sys._is_gil_enabled()
# False

Project creation and management with uv

uv is capable of much more than letting us switch between different versions of Python. The following commands initialize a Python project with uv:

# From ~/temp/blog-py3.14

# Indicate the default python version for the project
uv python pin 3.14

# Initialise a project in the current directory
uv init .

# Check the Python version
uv run python --version
# Python 3.14.0rc2

This adds some files for project metadata (pyproject.toml, README.md) and version control:

tree -a -L 1
# .
# ├── .git
# ├── .gitignore
# ├── main.py
# ├── pyproject.toml
# ├── .python-version
# ├── README.md
# ├── uv.lock
# └── .venv
#
# 2 directories, 6 files

Now we can add package dependencies using uv add and perform other standard project management tasks. But one thing worth emphasizing is that uv allows us to start a Jupyter notebook using the project’s Python interpreter, without adding jupyter as a project dependency or explicitly defining a kernel for it:

uv run --with jupyter jupyter lab

Creating a new notebook with the standard Python 3 kernel in the
JupyterLab session that starts should ensure that you are using the project’s Python 3.14 environment.

Threading

Python 3.13 introduced an experimental feature, ‘Free-threading’, which is now officially supported as of 3.14.

But first: what is a ‘thread’? When a program is running on your computer, there are many different tasks going on, and some of them can be performed independently of each other. As a programmer, you may have to explain to the computer which tasks those are. A thread is a way to delineate one of those tasks: it’s a way of telling the computer that this task here can run separately from those tasks there (in principle, at least).

Python has allowed developers to define threads for some time now. If you have a number of tasks that are largely independent of each other, each of these tasks can run in a separate thread. Threads access the same memory space, which means they can access and modify shared variables in a Python session. In general, this also means that a computation in one thread may update a value used by another thread, or two different threads may perform conflicting updates on the same variable. This freedom can lead to bugs. The CPython interpreter was originally written with a locking mechanism (the Global Interpreter Lock, GIL) that prevented several threads from running simultaneously (even when multiple processors were available) and limited the scope of these bugs.
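To illustrate the kind of shared-state hazard described above, here is a minimal sketch (not from the original post) of guarding a shared counter with threading.Lock, so that concurrent updates are not lost:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:  # serialise access to the shared variable
            counter += 1

# four threads all updating the same variable in shared memory
threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000: no updates were lost
```

Without the lock, two threads could read the same value of `counter` and each write back the same incremented result, silently losing an update.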

Traditionally you would have used threads for “non-CPU bound tasks” in Python. These are the types of tasks that are not affected if more or faster processors are available to the Python instance: network traffic, file access, waiting for user input. For CPU-bound tasks such as computation and data processing, you could use Python’s ‘multiprocessing’ library (although some libraries such as ‘numpy’ have their own low-level mechanisms for splitting work across cores). This launches multiple Python instances, each of which performs some of the processing, and allows a workload to be distributed across multiple processors.

The main other differences between threading and multiprocessing in Python lie in memory and data management. With threading you have a single Python instance, with each thread accessing the same memory space. With multiprocessing you have multiple Python instances that work independently: the instances do not share memory, so to distribute a workload using multiprocessing Python must send copies of (subsets of) your data to the new instances. This may mean storing two or more copies of a large data set in memory if you are using multiprocessing on it.

Concurrent processing between threads sharing memory space is now possible using the free-threaded build of Python. Many third-party packages have been rewritten to accommodate this new build, and you can learn more about free-threading and the progress of the changes in the
“Python Free-Threading Guide”.

As a simple example, let’s look at natural language processing. There is a wonderful blog post about parallel processing with the
nltk package on the
“WZB Data Science Blog”. We’ll extend that example to use free-threading.

nltk provides access to some of the
Project Gutenberg books, and we can access this data as follows:

# main.py
import nltk

def setup():
    nltk.download("gutenberg")
    nltk.download("punkt_tab")
    nltk.download("averaged_perceptron_tagger_eng")
    corpus = {
        f_id: nltk.corpus.gutenberg.raw(f_id)
        for f_id in nltk.corpus.gutenberg.fileids()
    }
    return corpus

corpus = setup()

The key-value pairs in corpus are the abbreviated book title and contents for each of 18 books. For example:

corpus["austen-emma.txt"]
# [Emma by Jane Austen 1816]
#
# VOLUME I
#
# CHAPTER I
#
#
# Emma Woodhouse, handsome, clever, and rich, with a comfortable home ...

A standard part of a text-processing workflow is tokenizing a document and tagging its “parts of speech” (POS). We can do this with two nltk
functions:

# main.py ... continued
def tokenise_and_pos_tag(doc):
    return nltk.pos_tag(nltk.word_tokenize(doc))

A function can be written to sequentially tokenize the contents of a corpus of books and tag them with a POS:

# main.py ... continued
def tokenise_seq(corpus):
    tokens = {
        f_id: tokenise_and_pos_tag(doc)
        for f_id, doc in corpus.items()
    }
    return tokens

You must install or build Python in a particular way to use free-threaded Python. Above, we installed Python “3.14t” using
uv, so we can compare the speed of free-threaded and sequential, single-core processing.

We will use the
timeit module from the standard library to analyze processing speed, from the command line.

# Activate the threaded version of Python 3.14
uv python pin 3.14t

# Install the dependencies for our main.py script
# (timeit is part of the standard library, so only nltk is needed)
uv add nltk

# Time the `tokenise_seq()` function
# -- but do not time any setup code...
PYTHON_GIL=0 \
 uv run python -m timeit \
 --setup "import main; corpus = main.setup()" \
 "main.tokenise_seq(corpus)"

# [lots of output messages]
# 1 loop, best of 5: 53.1 sec per loop

After some initial steps in which the nltk datasets were downloaded and the
corpus object was created (neither was timed, as these steps were part of the timeit --setup block), tokenise_seq(corpus) was run several times and the best time was around 53 seconds.

A small note: we used the environment variable PYTHON_GIL=0 here, which makes it explicit that we are disabling the GIL. This would normally not be necessary to take advantage of free-threading (in Python “3.14t”), but it was needed here because one of nltk’s dependencies has not yet been validated for the free-threaded build.
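The same split between setup code and timed statement is available from the standard-library timeit module inside a script; for example (a toy statement, not the nltk workload):

```python
import timeit

# repeat() mirrors the CLI: the setup code runs once per repeat,
# the statement runs `number` times per repeat, and only the
# statement is timed
times = timeit.repeat(
    stmt="sorted(data)",
    setup="data = list(range(10_000))[::-1]",
    repeat=5,
    number=100,
)
print(f"best of 5: {min(times):.4f} seconds for 100 loops")
```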

To write a threaded version, we introduce two functions. The first is a helper that takes (filename, document-content) pairs and returns (filename, processed-document) pairs:

def tupled_tokeniser(pair):
    file_id, doc = pair
    return file_id, tokenise_and_pos_tag(doc)

The second function creates a thread pool, using as many CPUs as are available on my machine (16, as counted by multiprocessing.cpu_count()). Each document is processed in a separate thread, and we wait for all documents to be processed before returning the results to the caller:

import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor, wait
# ...
def tokenise_threaded(corpus):
    with ThreadPoolExecutor(max_workers=mp.cpu_count()) as tpe:
        futures = [
            tpe.submit(tupled_tokeniser, pair)
            for pair in corpus.items()
        ]
        wait(futures)
    # output is a list of (file-id, data) pairs
    tokens = [f.result() for f in futures]
    return tokens
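The submit-and-wait pattern used in tokenise_threaded() can be sketched with a self-contained toy workload (illustrative names, not part of the original script):

```python
from concurrent.futures import ThreadPoolExecutor, wait

def tupled_square(pair):
    # like tupled_tokeniser(): take a (key, value) pair,
    # return a (key, processed-value) pair
    key, value = pair
    return key, value * value

corpus = {"a": 2, "b": 3, "c": 4}

with ThreadPoolExecutor(max_workers=4) as tpe:
    # submit one task per item, then block until all have finished
    futures = [tpe.submit(tupled_square, pair) for pair in corpus.items()]
    wait(futures)

results = dict(f.result() for f in futures)
print(results)  # {'a': 4, 'b': 9, 'c': 16}
```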

# Time the `tokenise_threaded()` function
# -- but do not time any setup code...
PYTHON_GIL=0 \
 uv run python -m timeit \
 --setup "import main; corpus = main.setup()" \
 "main.tokenise_threaded(corpus)"
# [lots of output messages]
# 1 loop, best of 5: 32.5 sec per loop

Using the
htop tool on Ubuntu, I could see that every core was used while processing the documents. At certain points during the run, each of the 16 CPUs was nearly 100% utilized (while only one or two CPUs were busy at any time during the sequential run):

Visual demonstration that 16 processors were busy

But despite using 16 times as many CPUs, the multithreaded version of the processing script was only about 40% faster. There were only 18 books in the data set, and there was some disparity between book lengths (the Bible, which contains millions of words, took much longer to process than the others). Perhaps the speed-up would be greater with a larger or more balanced data set.

The post on the WZB Data Science blog contains a multiprocessing implementation of the above. Running their multiprocessing code with 16 CPUs gave comparable speed to multithreading (minimum time 31.2 seconds). If I were writing this code for a real project, multiprocessing would remain my choice, because the analysis for one book can be done independently of that for any other book and the data volumes are not that large.

Other news

Python 3.14 also introduced some improvements in exception handling, a new approach to string templates, and improvements in the use of concurrent interpreters. See the
Release notes for Python 3.14 for more information.

In the broader Python Data Science ecosystem, several other developments have occurred or are expected before the end of 2025:

  • The first stable release of the
    Positron IDE arrived in August;
  • Pandas 3.0 is coming before the end of the year, introducing a dedicated string data type and copy-on-write behavior, and changing how columns are modified in DataFrame code;
  • Tools that consume DataFrames are becoming agnostic to the DataFrame library through the Narwhals project. See the recent write-up
    on this subject.

Python data science is advancing so quickly that we can only scratch the surface here. Have we missed something in the broader Python ecosystem (2025 edition) that will make a huge difference to your data work? Let us know on
LinkedIn or
Bluesky.

For updates and revisions to this article, see the original post

