An introduction to IPython

Stéfan van der Walt

CTPUG, 8 February 2013

What is IPython?

  • Powerful interactive shells (terminal and Qt).
  • A browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media.
  • Support for interactive data visualization and use of GUI toolkits.
  • Flexible, embeddable interpreters to load into your own projects.
  • Easy-to-use, high-performance tools for parallel computing.

What else is IPython?

  • Full list here: https://github.com/ipython/ipython/wiki/Projects-using-IPython
  • Build your own: http://andrew.gibiansky.com/blog/ipython/ipython-kernels/

Who develops IPython?

  • Team 1: UC Berkeley, under Fernando Pérez
  • Team 2: Cal Poly, San Luis Obispo, under Brian Granger
  • Team 3: The internet

Who funds IPython development?

In December 2012, IPython was awarded a $1.15 million grant from the Alfred P. Sloan Foundation that will fund the core team for the 2013-2014 period.

In the summer of 2013, Microsoft made a $100,000 donation to support all aspects of the IPython project.

Learning Resources

  • http://ipython.org/documentation.html
  • http://www.packtpub.com/learning-ipython-for-interactive-computing-and-data-visualization/book

Getting It

  • https://store.continuum.io/cshop/anaconda/
  • https://www.enthought.com/products/epd/

sudo apt-get install ipython-notebook

pip install --user "ipython[notebook]"

Or, if you feel enthusiastic:

sudo apt-get install "python-(numpy|scipy|matplotlib|skimage|sklearn|pandas|sympy)$" ipython

Command prompt demo

The IPython Notebook

nbviewer.ipython.org

Interactive hosting: wakari.io, cloud.sagemath.com

IPython Notebook Demo

  • File format
  • Modal navigation
  • Inline plotting
  • LaTeX
  • Widgets
  • Export (e.g. slides)

Audio & Video

In [1]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

f = 300
T = 2
rate = 8192

t = np.linspace(0, T, T * rate)
s = np.sin(t * 2 * np.pi * f)

plt.plot(t[:100], s[:100]);
In [2]:
from IPython.display import Audio
Audio(data=s, rate=8192)
Out[2]:
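`Audio` can also play a file passed through its `filename` argument. As a sketch, assuming a 16-bit mono WAV file is acceptable, the same tone can be written to disk with the standard-library `wave` module; the file name `tone.wav` is a hypothetical choice. The samples use `np.arange` rather than `np.linspace` so the spacing is exactly `1/rate`:

```python
import wave

import numpy as np

f = 300      # tone frequency in Hz
T = 2        # duration in seconds
rate = 8192  # samples per second

# Sample the sine wave at exactly `rate` samples per second.
t = np.arange(T * rate) / float(rate)
s = np.sin(2 * np.pi * f * t)

# Scale floats in [-1, 1] to little-endian 16-bit signed integers, as WAV expects.
pcm = (s * 32767).astype('<i2')

with wave.open('tone.wav', 'wb') as out:
    out.setnchannels(1)     # mono
    out.setsampwidth(2)     # 2 bytes = 16 bits per sample
    out.setframerate(rate)
    out.writeframes(pcm.tobytes())
```

In the notebook, `Audio(filename='tone.wav')` would then play the file.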
In [3]:
from IPython.display import YouTubeVideo

YouTubeVideo('dhRUe-gz690')
Out[3]:

Other rich objects

In [4]:
x = np.linspace(0, 1, 100)
y = x[:, np.newaxis]

plt.imshow(np.sin(x**3 + np.sqrt(y)), interpolation='nearest', cmap='jet');

# (By the way, never use the jet colormap.)
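The image relies on NumPy broadcasting: `x` has shape `(100,)` and `y = x[:, np.newaxis]` has shape `(100, 1)`, so `x**3 + np.sqrt(y)` expands to a `(100, 100)` grid whose entry `[i, j]` is `x[j]**3 + sqrt(x[i])`. A small check, where the indices `i` and `j` are arbitrary:

```python
import numpy as np

x = np.linspace(0, 1, 100)
y = x[:, np.newaxis]            # column vector, shape (100, 1)

z = np.sin(x**3 + np.sqrt(y))   # broadcasts to shape (100, 100)

# The row index follows y, the column index follows x.
i, j = 10, 70
expected = np.sin(x[j]**3 + np.sqrt(x[i]))
print(z.shape, np.isclose(z[i, j], expected))
```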
In [5]:
%%file data.csv
Date,Open,High,Low,Close,Volume,Adj Close
2012-06-01,569.16,590.00,548.50,584.00,14077000,581.50
2012-05-01,584.90,596.76,522.18,577.73,18827900,575.26
2012-04-02,601.83,644.00,555.00,583.98,28759100,581.48
2012-03-01,548.17,621.45,516.22,599.55,26486000,596.99
2012-02-01,458.41,547.61,453.98,542.44,22001000,540.12
2012-01-03,409.40,458.24,409.00,456.48,12949100,454.53
Overwriting data.csv

In [6]:
import pandas
df = pandas.read_csv('data.csv')
df
Out[6]:
         Date    Open    High     Low   Close    Volume  Adj Close
0  2012-06-01  569.16  590.00  548.50  584.00  14077000     581.50
1  2012-05-01  584.90  596.76  522.18  577.73  18827900     575.26
2  2012-04-02  601.83  644.00  555.00  583.98  28759100     581.48
3  2012-03-01  548.17  621.45  516.22  599.55  26486000     596.99
4  2012-02-01  458.41  547.61  453.98  542.44  22001000     540.12
5  2012-01-03  409.40  458.24  409.00  456.48  12949100     454.53
In [7]:
df[df.Date < '2012-03-01']
Out[7]:
         Date    Open    High     Low   Close    Volume  Adj Close
4  2012-02-01  458.41  547.61  453.98  542.44  22001000     540.12
5  2012-01-03  409.40  458.24  409.00  456.48  12949100     454.53
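The filter in `In [7]` works because ISO-formatted date strings sort lexicographically in date order. A sketch of a more explicit alternative, assuming a pandas version that supports `parse_dates`, converts the column to real timestamps first; the CSV text is inlined with `io.StringIO` so the example does not depend on `data.csv`:

```python
import io

import pandas as pd

csv = """Date,Open,High,Low,Close,Volume,Adj Close
2012-06-01,569.16,590.00,548.50,584.00,14077000,581.50
2012-05-01,584.90,596.76,522.18,577.73,18827900,575.26
2012-04-02,601.83,644.00,555.00,583.98,28759100,581.48
2012-03-01,548.17,621.45,516.22,599.55,26486000,596.99
2012-02-01,458.41,547.61,453.98,542.44,22001000,540.12
2012-01-03,409.40,458.24,409.00,456.48,12949100,454.53
"""

# Parse the Date column into datetime64 values instead of plain strings.
df = pd.read_csv(io.StringIO(csv), parse_dates=['Date'])

# Compare against a Timestamp rather than relying on string ordering.
early = df[df.Date < pd.Timestamp('2012-03-01')]
print(early)
```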

Build your own

In [8]:
import urllib2
import BeautifulSoup as bs


class CTPUG(object):
    def __init__(self):
        data = urllib2.urlopen('https://ctpug.org.za/wiki/OtherActivities').read()
        html = bs.BeautifulSoup(data)
        # Iterate over the <li> elements only; iterating the <ul> itself would
        # also yield the whitespace strings between the list items.
        list_items = html.find(id='content').find('ul').findAll('li')
        links = [li.find('a') for li in list_items]
        self.games = [(game.text, game.get('href')) for game in links
                      if game is not None]

    def _repr_html_(self):
        # IPython calls this method to render the object as HTML.
        out = '<h2>CTPUG Games Central</h2>'
        out += '<ul>'
        for (game, url) in self.games:
            out += '<li><a href="%s">%s</a></li>' % (url, game)
        out += '</ul>'

        return out

CTPUG()
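IPython calls `_repr_html_` whenever an object is the value of the last expression in a cell, so any class can opt into rich display. A network-free sketch with a hypothetical `GameList` class and placeholder entries; it uses Python 3's `html.escape` so that names containing `&` or `<` do not break the markup:

```python
import html


class GameList(object):
    """Hypothetical example: render (name, url) pairs as an HTML list."""

    def __init__(self, title, games):
        self.title = title
        self.games = games  # list of (name, url) tuples

    def _repr_html_(self):
        items = ''.join(
            '<li><a href="%s">%s</a></li>'
            % (html.escape(url, quote=True), html.escape(name))
            for name, url in self.games)
        return '<h2>%s</h2><ul>%s</ul>' % (html.escape(self.title), items)


games = GameList('Example games', [('Pong & friends', 'http://example.com/pong')])
rendered = games._repr_html_()
```

Evaluating `games` as the last line of a notebook cell would display the rendered list.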

Parallel processing

IPython supports many different styles of parallelism including:

  • Single program, multiple data (SPMD) parallelism.
  • Multiple program, multiple data (MPMD) parallelism.
  • Message passing using MPI.
  • Task farming.
  • Data parallel.
  • Combinations of these approaches.
  • Custom user-defined approaches.

Most importantly, IPython enables all types of parallel applications to be developed, executed, debugged and monitored interactively.

Concepts: engine, controller

The two primary models for interacting with engines are:

  • A Direct interface, where engines are addressed explicitly.
  • A LoadBalanced interface, where the scheduler is trusted with assigning work to appropriate engines.

From http://ipython.org/ipython-doc/dev/parallel/parallel_intro.html
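To make the two models concrete without a running cluster, here is a standard-library analogy, not IPython's API: four single-threaded executors stand in for explicitly addressed engines, while one shared pool stands in for the load-balanced scheduler. It assumes Python 3.6+ for `thread_name_prefix`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def whoami(x):
    # Report which worker thread handled this task.
    return x, threading.current_thread().name


# "Direct" style: we choose the engine for every task ourselves.
engines = [ThreadPoolExecutor(max_workers=1, thread_name_prefix='engine%d' % k)
           for k in range(4)]
direct_futures = [engines[x % 4].submit(whoami, x) for x in range(8)]
direct = [fut.result() for fut in direct_futures]
for engine in engines:
    engine.shutdown()

# "Load-balanced" style: one queue; the pool decides which worker runs each task.
with ThreadPoolExecutor(max_workers=4, thread_name_prefix='worker') as pool:
    balanced = list(pool.map(whoami, range(8)))
```

In the direct case the assignment is fixed by `x % 4`; in the balanced case any idle worker may pick up any task.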

In [13]:
%%script bash --bg

ipcluster start -n 4
Starting job # 2 in a separate thread.

In [15]:
from IPython.parallel import Client
rc = Client()
In [16]:
rc.ids
Out[16]:
[0, 1, 2, 3]

Choice: load-balanced scheduler or direct view?

In [17]:
dview = rc[:] # use all engines
dview
Out[17]:
<DirectView [0, 1, 2, 3]>

Apply vs map

In [18]:
def hostname():
    import socket
    return socket.gethostname()
In [19]:
dview.apply_sync(hostname)
Out[19]:
['shinobi', 'shinobi', 'shinobi', 'shinobi']
In [20]:
dview.map_sync(lambda x: x**2, range(16))
Out[20]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]
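`apply_sync` calls the function once on every engine (hence four host names), whereas `map_sync` splits the input sequence across the engines. A rough standard-library analogy with a thread pool, offered as a sketch rather than IPython's implementation; unlike a `DirectView`, a pool does not guarantee that the four `submit` calls land on four different workers:

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def hostname():
    return socket.gethostname()


n_workers = 4
with ThreadPoolExecutor(max_workers=n_workers) as pool:
    # apply-like: the same zero-argument call, once per worker
    futures = [pool.submit(hostname) for _ in range(n_workers)]
    applied = [f.result() for f in futures]
    # map-like: one call per element of the input sequence
    mapped = list(pool.map(lambda x: x**2, range(16)))
```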

Asynchronous execution

In [21]:
with dview.sync_imports():
    import time
    
# or import inside the function
importing time on engine(s)

In [22]:
def wait(t):
    tic = time.time()
    time.sleep(t)
    return (t, time.time() - tic)

results = dview.map_async(wait, np.random.random(10) * 5)
results
Out[22]:
<AsyncMapResult: wait>
In [23]:
results.ready()
Out[23]:
False
In [24]:
results.get()
Out[24]:
[(0.52673117926659907, 0.527306079864502),
 (4.3635856117499907, 4.3654868602752686),
 (3.5556426688539657, 3.5570261478424072),
 (2.4988755516125023, 2.499159097671509),
 (3.8373981749231452, 3.841087818145752),
 (3.287314158513361, 3.290397882461548),
 (2.8595795712850425, 2.8625378608703613),
 (0.32649847009072308, 0.32690882682800293),
 (1.24421457789256, 1.2452821731567383),
 (1.5049725249444452, 1.5050890445709229)]
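The standard library offers a close analogue of `map_async`, `ready` and `get`: `multiprocessing.pool.ThreadPool.map_async` returns an `AsyncResult` with `ready()` and `get()` methods. A sketch with shorter delays (at most 0.5 s, an arbitrary choice):

```python
import random
import time
from multiprocessing.pool import ThreadPool


def wait(t):
    tic = time.time()
    time.sleep(t)
    return (t, time.time() - tic)


pool = ThreadPool(4)
delays = [random.random() * 0.5 for _ in range(10)]

results = pool.map_async(wait, delays)  # returns immediately
print(results.ready())                  # usually False: tasks are still sleeping
values = results.get()                  # blocks until every task has finished
pool.close()
pool.join()
```

As with IPython's `AsyncMapResult`, the values come back in input order regardless of which task finished first.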

Define functions for parallel execution

In [25]:
@dview.remote(block=True)
def getpid():
    import os
    return os.getpid()
In [26]:
getpid()
Out[26]:
[10907, 10908, 10911, 10914]
In [27]:
import numpy as np
A = np.random.random((64, 48))

# parallel -> only for element-wise operations
#          -> automatically breaks up input into chunks
@dview.parallel(block=True)
def pmul(A, B):
    return A*B
In [28]:
(pmul(A, A) == A * A).all()
Out[28]:
True
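The comments in `In [27]` explain why `@dview.parallel` suits only element-wise work: the input is cut into chunks that are processed independently. A hypothetical `chunked` decorator, not IPython's implementation, splits arrays along the first axis with `np.array_split`, maps over a thread pool and concatenates the pieces, which shows both the good and the bad case:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunked(n_chunks):
    """Hypothetical stand-in for @dview.parallel: split, map, concatenate."""
    def decorator(func):
        def wrapper(*arrays):
            # Split every argument the same way along axis 0.
            pieces = [np.array_split(a, n_chunks) for a in arrays]
            with ThreadPoolExecutor(max_workers=n_chunks) as pool:
                results = list(pool.map(func, *pieces))
            return np.concatenate(results)
        return wrapper
    return decorator


@chunked(4)
def pmul(A, B):
    return A * B         # element-wise: chunking does not change the answer


@chunked(4)
def demean(A):
    return A - A.mean()  # not element-wise: each chunk subtracts its own mean


A = np.random.random((64, 48))
```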
In [29]:
dview.scatter('x', range(16))
Out[29]:
<AsyncResult: scatter>
In [30]:
%%px
[i**2 for i in x]
Out[0:1]: [0, 1, 4, 9]
Out[1:1]: [16, 25, 36, 49]
Out[2:1]: [64, 81, 100, 121]
Out[3:1]: [144, 169, 196, 225]
In [31]:
%%px
y = [i**2 for i in x]
In [32]:
y = dview.gather('y')
y
Out[32]:
<AsyncMapResult: finished>
In [33]:
y.get()
Out[33]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]
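The `%%px` output shows that `scatter` gave each engine a contiguous block of four elements, and `gather` concatenates the blocks back in engine order. A hypothetical pure-Python `scatter`/`gather` pair that reproduces this partitioning for the example above (IPython's own splitting may differ when the length does not divide evenly):

```python
def scatter(seq, n):
    """Hypothetical: split seq into n contiguous chunks whose sizes differ by at most 1."""
    seq = list(seq)
    q, r = divmod(len(seq), n)
    chunks, start = [], 0
    for k in range(n):
        size = q + (1 if k < r else 0)
        chunks.append(seq[start:start + size])
        start += size
    return chunks


def gather(chunks):
    """Hypothetical: concatenate per-engine chunks in engine order."""
    return [item for chunk in chunks for item in chunk]


parts = scatter(range(16), 4)
squared = [[i**2 for i in part] for part in parts]  # what each engine computes
print(gather(squared))
```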

  • EC2 cluster: http://star.mit.edu/cluster/docs/latest/plugins/ipython.html
  • Documentation: http://ipython.org/ipython-doc/dev/parallel/parallel_multiengine.html

Widgets: merged today

In [34]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

from IPython.html.widgets import interact, interactive, fixed
from IPython.display import display
from IPython.html.widgets import FloatSliderWidget, HTMLWidget
In [35]:
import skimage
from skimage import data, filter, io, img_as_float
from skimage import exposure

i = img_as_float(data.coffee())

plt.imshow(i);
In [39]:
def edit_image(image, sigma=0.1, r=1.0, g=1.0, b=1.0, gamma=1,
               interpolation={'Nearest Neighbor': 'nearest',
                              'Linear': 'bilinear',
                              'Cubic': 'bicubic'},
               zoomed=False):
    new_image = image
    
    if zoomed:
        new_image = new_image[55:75, 55:75]
    
    new_image = filter.gaussian_filter(new_image, sigma=sigma, multichannel=True)
    
    new_image[:,:,0] = r*new_image[:,:,0]
    new_image[:,:,1] = g*new_image[:,:,1]
    new_image[:,:,2] = b*new_image[:,:,2]
    
    new_image = np.clip(new_image, 0, 1)
    
    new_image = exposure.adjust_gamma(new_image, gamma)
    
    plt.imshow(new_image, interpolation=interpolation)
    plt.show()
    
    return new_image
    
r = FloatSliderWidget(min=0, max=2, step=0.1, value=1)
g = FloatSliderWidget(min=0, max=2, step=0.1, value=1)
b = FloatSliderWidget(min=0, max=2, step=0.1, value=1)

w = interactive(edit_image, image=fixed(i),
                sigma=(0, 2, 0.1),
                r=r, g=g, b=b, gamma=(0.0, 1.0, 0.1))

w.children = [HTMLWidget(value='<h3>Exploring image adjustment</h3>')] + list(w.children)

display(w)
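`interactive` chooses a widget from each "abbreviation": a `(min, max, step)` tuple such as `sigma=(0, 2, 0.1)` becomes a float slider, a dict such as `interpolation` becomes a dropdown, and a bool such as `zoomed=False` becomes a checkbox. The hypothetical function below is a simplified sketch of that rule covering only the cases used in this talk, not IPython's actual code:

```python
def widget_kind(abbrev):
    """Hypothetical, simplified: guess which widget interact() would build."""
    if isinstance(abbrev, bool):  # test bool before numbers: bool subclasses int
        return 'checkbox'
    if isinstance(abbrev, str):
        return 'text'
    if isinstance(abbrev, (dict, list)):
        return 'dropdown'
    if isinstance(abbrev, tuple) and all(isinstance(v, (int, float)) for v in abbrev):
        # Any float in the tuple makes it a float slider.
        if any(isinstance(v, float) for v in abbrev):
            return 'float slider'
        return 'int slider'
    raise ValueError('no widget for %r' % (abbrev,))


for abbrev in [(0, 2, 0.1), (0, 200), False, {'Linear': 'bilinear'}]:
    print(abbrev, '->', widget_kind(abbrev))
```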
In [40]:
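def fourier(X, n):
    N = len(X)

    # Recover the original signal from its spectrum instead of using the global x.
    x = np.fft.ifft(X).real
    plt.plot(x, 'b')

    # Zero the bins around N//2 (the highest frequencies),
    # leaving roughly n low-frequency coefficients.
    zeros = (N - n) // 2

    X = X.copy()
    X[N//2 - zeros:N//2 + zeros] = 0

    x_ = np.abs(np.fft.ifft(X))

    plt.plot(x_, 'r')
    plt.show()
    return x_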

x = np.zeros(1000)
x[300:600] = 1

X = np.fft.fft(x)
w = interactive(fourier, X=fixed(X), n=(0, 200))
display(w)
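Zeroing the middle of the FFT removes the highest frequencies, because `np.fft.fft` stores bins in the order zero, positive, then negative frequencies, so the largest magnitudes sit around index `N//2`. The slice keeps one more negative than positive frequency, so the inverse transform is slightly complex, which is presumably why `fourier` plots its magnitude. A non-interactive sketch with a hypothetical `lowpass` helper, using the same slicing, checks both extremes and shows the RMS error shrinking as more coefficients are kept:

```python
import numpy as np


def lowpass(X, n):
    """Hypothetical helper: zero the middle (high-frequency) FFT bins, keeping about n."""
    N = len(X)
    zeros = (N - n) // 2
    X = X.copy()
    X[N//2 - zeros:N//2 + zeros] = 0
    return np.fft.ifft(X)  # complex, because the kept bins are not quite symmetric


x = np.zeros(1000)
x[300:600] = 1
X = np.fft.fft(x)


def rms_error(n):
    return np.sqrt(np.mean(np.abs(lowpass(X, n) - x) ** 2))


errors = {n: rms_error(n) for n in (10, 50, 200, 1000)}
print(errors)
```

Because the kept bins for a smaller `n` are a subset of those for a larger `n`, each reconstruction is an orthogonal projection onto a larger space, so the error cannot grow as `n` increases.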