Installing Python Libraries
The world of Python includes more than just the libraries that come pre-installed on your computer. Thousands of Python programmers all over the world have released their own Python libraries for others to install and use. Here are a few that I use all the time:
- Twython, which makes it easier to make requests to the Twitter API
- Pattern, a library for natural language processing with an easy-to-use interface
- Requests, a cleaner and simpler HTTP client for Python
Awesome Python is a great list of useful Python libraries. Take a look and see if anything piques your interest!
Wrong ways
So: libraries are good. Let’s learn how to install Python libraries on our computers.
There are a few ways to install Python libraries, but really only one right way. The right way is also sort of tricky and counterintuitive, but trust me: it’s worth it.
Before we get to the right way, I want to talk about a few of the wrong ways.
You might occasionally see instructions that tell you to use a program called easy_install. Don’t do this if you can avoid it! The program called easy_install is an outdated Python package manager that hardly anyone uses any longer (because it has many drawbacks, and because there are now better alternatives). You should probably only use easy_install once: to install pip (a better package manager, which we’ll discuss below).
The documentation for some libraries may sometimes instruct you to do something like this:
sudo python setup.py install
… or, even worse, you might be instructed to just copy files to particular directories. Again, don’t do this if you can avoid it! This will copy files to arcane locations on your computer, and it will be difficult or impossible for you to remove them later, or to even get a good list of what libraries you’ve installed in this manner.
A harrowing scenario
There are several problems with the library installation strategies mentioned in the previous section. With both of these methods, there’s no easy way to know (at a glance) which libraries you’ve installed, or where exactly their files are located on your hard drive. As a consequence, there’s no easy way to uninstall them either. It can become a huge mess. (Trust me, I know.)
A deeper problem with these methods is that you can only have one version of a particular library installed at a time. This doesn’t seem like a big deal at first, but imagine this scenario:
You’re working on your thesis project. It’s a Python project, and you’ve happily been using version 1.7 of Library X to get work done. Your thesis project is going great and you’re about to give your presentation in two hours! Then you’re futzing around on a project for one of your 2nd-year blow-off classes and you realize that you really need version 2.0 of Library X to do some particular thing. So you install version 2.0 of Library X, and everything seems good initially until you get up to present your thesis project, and… everything’s broken! Your thesis project program doesn’t run… or it runs weird, because of some change that they made to Library X between version 1.7 and version 2.0. You have to scramble to uninstall version 2.0 and re-install… wait, which version of Library X were you using again? You don’t remember! All 20 minutes of your thesis presentation are taken up by you futzing with Python packages in front of your teachers and your peers and all of posterity. As a consequence, you don’t actually get to talk about your thesis project during your presentation, and you don’t get the grant/seed funding that you were depending on in order to pay your rent post-graduation. Your roommates kick you out and you have to go back to Utah to live with your parents.
This is a horrible and all too realistic scenario. In order to avoid it, we want our method of installing Python libraries to have the following properties:
- It should be easy to tell which libraries are installed.
- It should be easy to uninstall a library.
- It should be possible to manage multiple isolated installations of libraries, so that it’s easy for one project to use version X of a library, and another project to use version Y.
Above all, we want the process of installing libraries to be repeatable. Once you have all of the libraries installed that you need, it should be trivial to install all of them again on another computer, or communicate to another person the libraries they need to run your code.
The right way
Fortunately, there is a system for installing Python libraries that meets all of the criteria above. The system involves two programs: pip and virtualenv.
Pip is a “package manager” for Python. It makes it easy to find, install, upgrade, and uninstall Python libraries. It manages library dependencies as well (i.e., if the library you’re trying to install requires you to first install some other library, it will take care of that step for you). There are alternatives, but pretty much everybody uses pip.
Note: If you’re using Python 2.7.9 or newer, or Python 3.4 or newer, pip is included with your Python distribution! That means you don’t have to perform the step below to install pip. (Python 3.3 and newer also ship a built-in venv module that plays the same role as virtualenv.) The way you’ll run pip and virtualenv is also slightly different; see the official documentation for more information.
Here’s how to install pip (on OSX or Linux):
$ sudo easy_install pip
(This is the one time it’s okay to use easy_install.)
You can find a list of libraries that pip can install on the Python Package Index, sometimes known as “the cheeseshop.”
The first library we’re going to install with pip is virtualenv, which is a tool for making multiple independent Python environments. You can install it like so:
$ sudo pip install virtualenv
The goal of virtualenv is to avoid installing libraries “globally,” and instead make it possible to make a separate “environment” for each Python project you create. That way, the libraries that you install for one project won’t interfere with the libraries you install for another project, and you can remove libraries from one project without affecting other projects that might still be using them.
Here’s how to use virtualenv. First, open a terminal window in the directory where your code is located. Then:
$ virtualenv venv # you can call this directory whatever you want, but most people use env or venv
$ source venv/bin/activate
The first line creates a “virtual environment”—essentially, a modified version of the Python interpreter, pre-programmed to look in a separate directory for libraries, instead of in the system default location. The second line “activates” the virtual environment; as long as it’s active, typing “python” will refer to the Python interpreter in your virtual environment, instead of the “global” Python installed on your machine.
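If you’re ever unsure whether the environment is active, you can ask Python itself. Here’s a minimal sketch of one way to check; the helper name in_virtual_environment is made up for this example, and it assumes the environment was created by virtualenv or by Python 3’s built-in venv machinery.

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter appears to be running inside a virtual environment."""
    # Older versions of virtualenv set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # On Python 3.3+, sys.base_prefix points at the "global" installation,
    # while sys.prefix points at the environment (if one is active).
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return base_prefix != sys.prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("In a virtual environment:", in_virtual_environment())
```

Run it once before and once after activating the environment; the interpreter path and the answer should both change.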
When you’re done working in the virtual environment, you can either
$ deactivate
… or simply close the terminal window.
Using pip in a virtual environment
Once you’ve created a virtual environment with virtualenv, you can start installing Python libraries of your choice with pip. Here’s the workflow:
(1) Make a directory for your project. This is where your source files will go.
(2) Create a virtual environment inside that directory, like so:
$ virtualenv venv
(3) Activate the virtual environment, like so:
$ source venv/bin/activate
(4) Do your work in this directory. If you need to install a library, use pip’s install command, like so:
$ pip install your_library
… replacing your_library with the library you want to install. (Again, you can search the Python Package Index for libraries you might want to install.)
(5) When you’re done working on that project, deactivate the virtual environment, like so:
$ deactivate
… or close the terminal window.
The next time you want to work on that project, use cd to switch to the directory containing the virtual environment. Use source venv/bin/activate to activate the environment. You’re ready to go!
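If you’re on Python 3.3 or newer, step (2) can also be done from Python with the standard library’s venv module, with no need to install virtualenv first. The sketch below is just an illustration of that alternative, not a replacement for the workflow above; the function name create_environment is hypothetical, and it works in a throwaway directory so it won’t touch a real project.

```python
import os
import tempfile
import venv


def create_environment(project_dir, name="venv", with_pip=False):
    """Create a virtual environment inside project_dir and return its path."""
    env_dir = os.path.join(project_dir, name)
    # with_pip=False keeps this sketch quick; pass with_pip=True to have
    # pip bootstrapped into the new environment as well.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


if __name__ == "__main__":
    # A temporary directory stands in for your project directory.
    with tempfile.TemporaryDirectory() as project_dir:
        env_dir = create_environment(project_dir)
        print(sorted(os.listdir(env_dir)))
```

You still activate the resulting environment from the terminal, exactly as in step (3).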
NOTE: If you use git, make sure to add “venv” to your .gitignore file! Otherwise, your commits will contain the entire contents of all of the libraries you have installed, which is probably not what you want.
Tricks with pip
The great thing about pip is that you can get a list of exactly which libraries you’ve installed in the virtual environment, along with their versions. To save this list to a file, type:
$ pip freeze >requirements.txt
The resulting file (requirements.txt) is called a pip “requirements file,” and is a standard way of communicating the dependencies for your project (i.e., the list of libraries it needs to have installed in order to work as expected). Many tools and services (such as Heroku) recognize and make use of requirements files.
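To make the format concrete: pip freeze typically writes one name==version line per library. Here’s a minimal sketch that reads lines like that into a dictionary. It assumes simple pinned lines only; real requirements files can contain other forms (options, URLs, environment markers), which this sketch skips or does not handle. The function name and the version numbers in the sample are invented for illustration.

```python
def parse_pinned_requirements(text):
    """Map library names to pinned versions from pip-freeze-style text."""
    pins = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blanks, option lines such as "-e ...", and anything not pinned with "==".
        if not line or line.startswith("-") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


# Sample content; these version numbers are made up for illustration.
sample = """\
requests==2.0.0
twython==3.1.2
# a comment line
"""

print(parse_pinned_requirements(sample))
```

Knowing the exact versions is what saves you in the thesis scenario above: you never have to remember which version of Library X you were using.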
You can also uninstall a library like so:
$ pip uninstall your_library
Again, if you’re using pip in a virtual environment, uninstalling a library will only affect that environment.
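If you want to double-check that an install or uninstall took effect in the active environment, you can ask the interpreter whether it can find the module. This is a rough sketch with a hypothetical helper name, is_importable. Keep in mind that the name you import is not always the same as the name you give to pip, and that this only handles top-level module names.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the current interpreter can find a top-level module."""
    # find_spec searches this interpreter's module search path
    # without actually importing the module.
    return importlib.util.find_spec(module_name) is not None


print(is_importable("json"))  # json is part of the standard library
print(is_importable("a_module_name_that_should_not_exist"))
```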