Jupyter Notebook Best Practices

March 27, 2019 • ☕️ 2 min read

Data Science, Jupyter, Jupyter Notebook, Python, Productivity

Concise advice to use Jupyter notebooks more effectively.

Photo by [SpaceX](https://unsplash.com/@spacex?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&utm_medium=referral)

Caveat: The advice in this article refers to the original Jupyter notebook. While much of the advice can be adapted to JupyterLab, the popular notebook extensions cannot be carried over.


Table of Contents

1. Structure your Notebook

2. Refactor & outsource code into modules

3. Be curious about productivity hacks

4. Embrace reproducibility

5. Further reading

Conclusion


1. Structure your Notebook

  • Give your notebook a title (H1 header) and a meaningful preamble to describe its purpose and contents.
  • Use headings and documentation in Markdown cells to structure your notebook and explain your workflow steps. Remember: You’re not only doing this for your colleagues or your successor, but also for your future self.
  • The toc2 extension can automatically create heading numbers and a Table of Contents, both in a sidebar (optionally a floating window) and in a Markdown cell. It highlights your current position in the document, which helps you stay oriented in long notebooks.
  • The Collapsible Headings extension allows you to hide entire sections of code, so you can focus on your current workflow stage.
  • A default template extension creates new notebooks with a default structure and common imports instead of leaving them empty. It will also repeatedly ask you to change the name from Untitled.ipynb to something meaningful.
  • The Jupyter snippets extension allows you to conveniently insert often needed code blocks, e.g. your typical import statements.

Using a Jupyter notebook template (which sets up default imports and structure) and the Table of Contents (toc2) extension, which automatically numbers headings. The Collapsible Headings extension enables hiding of section contents by clicking the grey triangles next to the headings.
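As a rough illustration of such a default structure, the following sketch builds a minimal notebook skeleton with only the standard library. The function name `make_template_notebook`, the section headings and the example title are made up for this illustration; the template extension mentioned above has its own mechanism, which is not shown here. The sketch assumes the nbformat 4 JSON layout.

```python
import json


def make_template_notebook(title, purpose):
    """Return a minimal notebook (nbformat 4) as a dict: title, preamble, imports."""

    def markdown_cell(text):
        return {"cell_type": "markdown", "metadata": {}, "source": text}

    def code_cell(text):
        return {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": text,
        }

    return {
        "cells": [
            # H1 title plus a short preamble describing purpose and contents
            markdown_cell(f"# {title}\n\n{purpose}"),
            markdown_cell("## Imports"),
            code_cell("import pandas as pd"),
            markdown_cell("## Load data"),
            code_cell(""),
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Hypothetical title and purpose, for illustration only.
notebook = make_template_notebook(
    title="Customer churn analysis",
    purpose="Explores which factors are related to customer churn.",
)

# Serialize the skeleton; saved under a meaningful name with the
# .ipynb extension, this JSON is a notebook file.
notebook_json = json.dumps(notebook, indent=1)
```

Generating the skeleton from a function keeps the title and preamble mandatory, which nudges every new notebook towards the structure described above.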


2. Refactor & outsource code into modules

  • After you’ve written plain code in cells to make quick progress, get into the habit of turning stable code into functions and moving them to a dedicated module. This makes your notebook more readable and is incredibly helpful when productionizing your workflow. This:
df = pd.read_csv(filename)
df.drop( ...
df.query( ...
df.groupby( ...

becomes this:

def load_and_preprocess_data(filename):
    """DOCSTRING"""
    # do stuff
    # ...
    return df

and finally this:

import dataprep
df = dataprep.load_and_preprocess_data(filename)
  • If you edit a module file, IPython’s autoreload extension can reload imported modules automatically before each cell is executed:
%load_ext autoreload
%autoreload 2
  • Use ipytest for testing inside notebooks.
  • Use a proper IDE, e.g. PyCharm. Learn about its features for efficient debugging, refactoring and testing.
  • Stick to the standards of good coding: think Clean Code principles and PEP 8. Use meaningful variable and function names, comment sensibly, modularize your code and don’t be too lazy to refactor.
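
To make the refactoring and testing advice more concrete, here is a minimal sketch of what a function in the `dataprep` module and a matching test might look like. The column names (`city`, `sales`), the preprocessing steps and the sample data are invented for illustration, since the original steps are elided above. The test is written in plain pytest style, which is the kind of test ipytest is meant to run inside a notebook.

```python
import io

import pandas as pd


def load_and_preprocess_data(source):
    """Load a CSV and return total sales per city.

    `source` can be a file name or a file-like object.
    """
    df = pd.read_csv(source)
    df = df.dropna(subset=["sales"])  # drop rows without a sales value
    df = df.query("sales > 0")        # keep only positive sales
    return df.groupby("city", as_index=False)["sales"].sum()


def test_load_and_preprocess_data():
    # Small in-memory CSV: one missing value and one negative value
    csv_text = "city,sales\nBerlin,10\nBerlin,5\nParis,7\nParis,\nRome,-3\n"
    result = load_and_preprocess_data(io.StringIO(csv_text))
    assert result["city"].tolist() == ["Berlin", "Paris"]
    assert result["sales"].tolist() == [15.0, 7.0]


# Outside a test runner, the test can simply be called directly.
test_load_and_preprocess_data()
```

Because the function accepts a file-like object as well as a file name, the test needs no fixture files on disk, which keeps it easy to run from a notebook cell.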

3. Be curious about productivity hacks


4. Embrace reproducibility

  • Version Control: Learn to use git — there are many great tutorials out there.
  • Depending on your project and purpose, it might be reasonable to use a git pre-commit hook that removes notebook output. It will make commits and diffs more readable, but might discard output (plots etc.) you actually want to store.
  • Run your notebook in a dedicated (conda) environment. Store the requirements.txt file (or, for conda, an environment.yml file) in the git repository alongside your notebooks and modules. This will help you reproduce your workflow and ease the transition into a production environment.
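
The output-stripping step mentioned above can be sketched with the standard library alone. This is an illustrative sketch, not a complete hook: the function name `strip_outputs` is hypothetical, it assumes the nbformat 4 layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`), and wiring it into a git pre-commit hook is left out. Existing tools such as nbstripout cover the same use case more thoroughly.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with all code cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Tiny in-memory notebook used only to demonstrate the function.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "42\n"}],
            "source": "print(42)",
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

cleaned = strip_outputs(example)
print(json.dumps(cleaned["cells"][1], indent=1))
```

In an actual hook, each staged .ipynb file would be loaded with `json.load`, passed through such a function and written back before the commit. Keep in mind the trade-off noted above: stripped output is gone from the repository.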

5. Further reading


Conclusion

Good software engineering practices, structuring and documenting your workflow as well as customizing Jupyter to your personal taste will increase your notebook productivity and sustainability.

I’m happy to hear your own tips and your feedback in the comments.