Jupyter Notebook Best Practices
March 27, 2019
Concise advice on using Jupyter notebooks more effectively.
Caveat: The advice in this article refers to the original Jupyter notebook. While much of the advice can be adapted to JupyterLab, the popular notebook extensions can’t.
Table of Contents
1. Structure your Notebook
2. Refactor & outsource code into modules
3. Be curious about productivity hacks
4. Embrace reproducibility
5. Further reading
Conclusion
1. Structure your Notebook
- Give your notebook a title (H1 header) and a meaningful preamble to describe its purpose and contents.
- Use headings and documentation in Markdown cells to structure your notebook and explain your workflow steps. Remember: You’re not only doing this for your colleagues or your successor, but also for your future self.
- The toc2 extension can automatically number your headings and generate a table of contents, both in a sidebar (optionally as a floating window) and in a Markdown cell. It highlights your current position in the document, which helps you stay oriented in long notebooks.
- The Collapsible Headings extension lets you collapse entire sections under a heading, so you can focus on your current workflow stage.
- A default template extension creates new notebooks with a default structure and common imports instead of leaving them empty. It also keeps asking you to rename them from `Untitled.ipynb` to something meaningful.
- The Jupyter snippets extension lets you conveniently insert frequently needed code blocks, e.g. your typical import statements.
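The template idea can be sketched in plain Python. This is a minimal sketch, not the extension's implementation: it assumes the nbformat v4 JSON layout (a `cells` list plus `metadata`, `nbformat` and `nbformat_minor` keys) and writes the file with the standard-library `json` module. The helper names `markdown_cell`, `code_cell`, `make_notebook_skeleton` and `write_notebook` are hypothetical.

```python
import json


def markdown_cell(text):
    """A Markdown cell in nbformat v4 JSON."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """An unexecuted code cell in nbformat v4 JSON."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": code,
    }


def make_notebook_skeleton(title, preamble, sections, imports):
    """Build a notebook dict: H1 title plus preamble, a cell with common
    imports, then one H2 heading per workflow section."""
    cells = [markdown_cell(f"# {title}\n\n{preamble}"), code_cell(imports)]
    cells += [markdown_cell(f"## {name}") for name in sections]
    return {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,  # minor version 4 does not require cell ids
    }


def write_notebook(nb, path):
    """Write the notebook dict to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)
```

For example, `write_notebook(make_notebook_skeleton("Sales analysis", "Purpose and data sources: …", ["Load data", "Clean data", "Explore"], "import pandas as pd"), "sales_analysis.ipynb")` would create a titled, pre-structured notebook with a meaningful name instead of `Untitled.ipynb`. If you prefer not to build the JSON by hand, the official `nbformat` package provides helpers for the same job.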
2. Refactor & outsource code into modules
- Once you've written plain code in cells to make quick progress, get into the habit of turning stable code into functions and moving them into a dedicated module. This makes your notebook more readable and is incredibly helpful when productionizing your workflow. This:
```python
df = pd.read_csv(filename)
df.drop( ...
df.query( ...
df.groupby( ...
```
becomes this:
```python
def load_and_preprocess_data(filename):
    """DOCSTRING"""
    # do stuff
    # ...
    return df
```
and finally this:
```python
import dataprep

df = dataprep.load_and_preprocess_data(filename)
```
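- To pick up edits to a module file without restarting the kernel, load IPython's autoreload extension and set it to reload all imported modules before each cell is executed:
```
%load_ext autoreload
%autoreload 2
```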
- Use ipytest for testing inside notebooks.
- Use a proper IDE, e.g. PyCharm. Learn about its features for efficient debugging, refactoring and testing.
- Stick to good coding standards: think Clean Code principles and PEP 8. Use meaningful variable and function names, comment sensibly, modularize your code, and don't be too lazy to refactor.
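The refactoring and testing advice above can be sketched as follows. The module name `dataprep` comes from the example, but the function body is hypothetical and uses the standard-library `csv` module instead of pandas so the sketch runs without extra dependencies. The `test_*` function uses plain `assert` statements, the style pytest (and therefore ipytest) collects.

```python
# Hypothetical contents of dataprep.py: stable notebook code moved into a
# function. The test is kept in the same file only to make the sketch
# self-contained.
import csv
import os
import tempfile


def load_and_preprocess_data(filename):
    """Read a CSV with 'group' and 'value' columns, drop rows whose
    'value' is empty, and return the summed value per group."""
    totals = {}
    with open(filename, newline="") as f:
        for row in csv.DictReader(f):
            if not row["value"]:  # drop incomplete rows
                continue
            group = row["group"]
            totals[group] = totals.get(group, 0.0) + float(row["value"])
    return totals


# A pytest-style test: a function named test_* containing plain asserts.
def test_load_and_preprocess_data():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        with open(path, "w", newline="") as f:
            f.write("group,value\na,1.5\nb,2\na,\na,0.5\n")
        assert load_and_preprocess_data(path) == {"a": 2.0, "b": 2.0}
```

In the notebook you would then `import dataprep` and call the function. To run the test inside the notebook, ipytest's documentation describes calling `ipytest.autoconfig()` once and starting the test cell with the `%%ipytest` magic. In a larger project the test would usually live in its own file, e.g. a hypothetical `test_dataprep.py`.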
3. Be curious about productivity hacks
- Learn the Jupyter Keyboard Shortcuts. Print the list and hang it on the wall next to your screen.
- Get to know Jupyter extensions: Codefolding, Hide input all, Variable Inspector, Split Cells Notebook, zenmode and many more.
- Jupyter Widgets (sliders, buttons, dropdown menus, …) let you build interactive GUIs inside your notebook.
- The tqdm library provides a convenient progress bar.
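Adding a progress bar usually means wrapping an iterable in `tqdm`. This minimal sketch imports it from `tqdm.auto`, which uses the notebook widget inside Jupyter and a text bar elsewhere. The no-op fallback is only there so the sketch still runs where tqdm isn't installed, and `sum_of_squares` is a hypothetical stand-in for a slow loop.

```python
try:
    # tqdm.auto chooses the notebook widget in Jupyter, a text bar otherwise
    from tqdm.auto import tqdm
except ImportError:
    # Fallback for this sketch only: return the iterable unchanged (no bar)
    def tqdm(iterable, **kwargs):
        return iterable


def sum_of_squares(n):
    """Hypothetical slow loop; tqdm shows progress over range(n)."""
    total = 0
    for i in tqdm(range(n), desc="squaring"):
        total += i * i
    return total
```

Because `tqdm` only wraps the iterable, the loop body stays unchanged, and you can remove the progress bar later by deleting the wrapper.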
4. Embrace reproducibility
- Version control: learn to use git. There are many great tutorials out there.
- Depending on your project and purpose, it might be reasonable to use a git pre-commit hook that removes notebook output. It will make commits and diffs more readable, but might discard output (plots etc.) you actually want to store.
- Run your notebook in a dedicated (conda) environment. Store the `requirements.txt` file (or, for conda, an `environment.yml`) in the git repository alongside your notebooks and modules. This helps you reproduce your workflow and eases the transition into a production environment.
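The output-stripping hook mentioned above can be sketched in a few lines. This is a minimal sketch of what tools like nbstripout do, not their implementation: it assumes the nbformat v4 JSON layout, clears only `outputs` and `execution_count` of code cells, and rewrites the files in place. The script name `strip_outputs.py` and its functions are hypothetical.

```python
"""Hypothetical strip_outputs.py: clear code-cell outputs from notebooks."""
import json
import sys


def strip_outputs(nb):
    """Clear outputs and execution counts of all code cells in a notebook dict."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_file(path):
    """Rewrite a .ipynb file in place without outputs."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    strip_outputs(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")


if __name__ == "__main__":
    # A hook may pass arbitrary file names; only touch notebooks.
    for path in sys.argv[1:]:
        if path.endswith(".ipynb"):
            strip_file(path)
```

A `.git/hooks/pre-commit` script could run `python strip_outputs.py` on the staged notebooks. Note that a hook that rewrites files changes the working copy after staging, so the cleaned notebooks have to be staged again. nbstripout avoids this by installing itself as a git filter instead.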
5. Further reading
- Working efficiently with JupyterLab Notebooks
- Bringing the best out of Jupyter Notebooks for Data Science
- Boosting Your Jupyter Notebook Productivity
- … and, of course, Joel Grus' famous talk "I Don't Like Notebooks"
Conclusion
Good software engineering practices, a well-structured and documented workflow, and a Jupyter setup customized to your personal taste will make your notebooks more productive and more sustainable.
I’m happy to hear your own tips and your feedback in the comments.