From: "Sebastian Haase" <seb.haase AT (nospam) gmail . com> [use @ for AT, remove spaces and the (nospam) part]
Date: 2007-03-10
Priithon
1. Introduction
1.1. Motivation
Priithon is my attempt to make (certain) people's lifes easier. The lab I work in has thousands of line of fortran and C code meant to make image analysis accessible to biologists and microscopists. The collection of all those programs is called Priism/IVE (Imaging and Visualization Environment). Ten plus years ago that was a state of the art platform running on (expensive - cheap compared to main-frames !) SGI workstations. When I joined the lab I quickly realized that big parts of IVE are about providing a graphical user interface, parsing command-line options and file handling. Each program contains often extended sections dealing with that. The "core" algorithm implementing the actual image analysis is hidden in between.
I decided to start over.
Python is a modern programming language. It's relatively fast, 1 interactive 2, objective-oriented, extensible3, clean(intuitively readable) and easy-to-learn 4. That's why it is already the choice of many in the scientific community including space science telescope institute (Hubble) and LLNL. Python provides very powerful libraries for building graphical user interfaces. Lots and lots of functions for data handling 5 are provided by many labs and the scientific-python-community around the world. The main complaints about using Python in science might be that there is no consistent documentation and that it requires to much prior system-administrative knowledge to get started.
I have put together all the needed components for image analysis. Just extract the zipped-archive into a directory, and start the "master-script" program. This is currently available for Windows (any:98/NT/2000/XP), Linux(Debian/RedHat/...) and OS-X (10.2?/10.3/10.4?).
Priithon is an integrated image analysis and algorithm development platform. I will use the term Priithon to meaning three different aspects of this:
The entire collection of all the provided modules
The PyShell graphical text interface (including drag-and-drop functionality)
A package of many Useful modules that I wrote; including the OpenGL based image viewer and many computation shortcuts that allow quick interactive development
1.2. Design
The focus of Priithon lies more in algorithm development than in having a simple "mouse-click-clack" user interface.6 As the interactive program (shell) Priithon uses PyShell. 7 The feature I like most is the help popup windows that provide a list of all available commands, their function arguments and a short description of the selected function. This feature minimizes the need for remembering names and exact spellings. All functions are sorted into one letter modules. The actual letter used could be considered as rather arbitrary; mainly this is needed to trigger PyShell's help mechanism (which only kicks in when typing a dot. But it also makes clear that the respective function is not a build-in (standard) python function and where to look for documentation. There is already a plethora of (standard/buildin and scientific-community provided) python libraries for data (memory) handling, file handling, (graph-) plotting, linear-algebra and much more. My own code can be listed as follows:
a fast OpenGL based image viewer that ...
a very fast (because it uses STScI's memory mapping module) MRC/Priism file handling module 8
a module for using fftw (using short function names I like)
collection of "ND test images" ("hard" disk or square, gaussian bell shape, gaussian noise, poisson noise, ...)
collection of "useful" (one-liner type) functions (min-max-mean-stddev, phase(of complex valued images)
framework for easy C/C++/fortran extensions writing (uses SWIG with my typemaps)
1.3. Start
Get Python version 2.4:
Linux will probably have it
Mac or Windows: go to [www.python.org/download] and follow the link
Get Priithon as a zipped-archive:
Linux: PrLinN_2007......tbz or PrLinN64_2007......tbz for i386 or 64-bit-Linux/-Python
Mac OSX: PrMacNPPC_2007.....tbz or PrMacNInt_2007.....tbz for PowerPC or Intel
Windows: PrWinN_2007......zip
Extract it - All is contained inside one directory:
Linux: PrLinN (or PrLinN64) - put this directory were ever you want - e.g. your home directoy
Mac: PrMacNPPC or PrMacNInt - put this directory were ever you want - e.g. your home directoy
Windows: PrWinN must be put into C:\ (i.e. C:\PrWinN)
The master-script is called priithon_script or priithonN.bat on Windows
create a link to the script to a convenient place:
Windows: drag and drop e.g. to Desktop
Mac and Linux: use symbolic links (e.g. ln -s priithon_script ~/bin/priithon)
start priithon by typing priithon or double-click on the priithon-script file
1.4. Demo 1: synthetic "star/bead images"
Simulation of 2d images of resolution limited point sources with two types of noise sources:
1 >>> a = F.gaussianArr((256,256), sigma=3, peakVal=100, orig=0, wrap=1) # shape of a perfect bead
2 >>> b = F.poissonArr((256,256), mean=.001) # random bead positions
3 >>> c = 100 + F.convolve(a,b)
4 >>> d = c + F.noiseArr((256,256), stddev=1)
5 >>> e = F.poissonize(d)
6 >>> Y.view(c) # noise free
7 >>> Y.view(e) # with "readout" noise and quantum shot noise
1.5. Demo 2: image file analysis
Image analysis of data saved in any file format
1 Drag image-file into PyShell window (jpg,bmp,tiff, ... or fits or MRC/Priism format)
2 Select: 'view'
3 >>> a = Y.vd(-1) # get data from latest viewer
4 >>> U.mmms(a) # show min,max,mean,stddev of whole (nd)data-set
5
6 # set viewer into mode so that each left-mouse-click shows a line profile
7 # (averaged over deltaY=15 pixel) in graph-plot window
8 >>> Y.vLeftClickHorizProfile(-1, 15, '-+')
9 # click into the image viewer window !
2. Python Tutorial
There are already many many Python tutorial and lot's of books. The documentation of Numerical Python is also quite extensive (many hundred pages). But I have not seen a tutorial that explains both in a satisfatory manner to just get going. This is my take on a quick tutorial that focuses on new comers.
2.1. Python
The name comes from "Monty Python's Flying Circus" - not the snake. It's meant to be "short, unique, and slightly mysterious". Guido van Rossum, started it all about 1990, then at http://www.cwi.nl/~guido The Release 1.0.0 is dated 26 January 1994. Python is an interpreted, interactive, object-oriented programming language.
2.2. Things to memorize
There are a few guidelines one has to realize when using Python:
A "variable" is really "just a name" for something !! Every assignment to a variable, like a=5 or a=F.fft(a) throws the old meaning (value) of a away 9 and reassigns a to whatever is the result of whats on the rights side of =-sign. But note: a[:] = F.fft(a) is NOT an assignment: there is more than "just the variable name" left of the =-sign. In other words: is a was a big array, a[:]= 2 would overwrite a's values (each 'pixel' would be set to 2, whereas a=2 would destroy that array and a would from then on refer to the number 2.
Indentation matters: This must be the strangest "feature" of python: Instead of using curly brackets (like in C/C++/Java) or BEGIN/END (like in PASCAL) or do/done (still others) Python infers the "bracketing" from the way a code-section is indented. This feature seems very strange at first 10, but it is supposed to make everyones code look more alike; [ also you don't have to decide if those opening brackets go at the end of the last section or in a separate line between the code sections. Think of it this way: every time you save two lines of code ] Please make up your mind if you want to use tabs or spaces to indend; there are pros and cons for each choice -- but stick with it and don't mix!!11
Everything is case-sensitive
Every variable lives inside its modules
Importing a module makes its variables12 visible - all code inside the module's defining file gets executed only the first time it gets imported (by anything - since python started)13
2.3. Now let's start
Three greater-than symbols (>>>) make the interactive prompt; type something, press return and you get the output on the next line (mostly the value of the typed expression) and another prompt on the next line. Often the typed line does not have an output or value (its value is None), then you are awarded with another prompt immediately. 14 The number-sign (#) introduces a comment up to the end of its line. If a line requires a next line - like an if-statement - pressing return automatically gives you the "three dot" (...) continuation prompt: enter now the other lines; an empty line ends the continuation and finishes the command. (If you forgot a line and need to insert a new line in the middle of a ...-section, press "Control-Return")
Python has a special mechanism for dealing with errors: "exception handling" (just like Java and (newer) C++ ). The way it works is that whenever an fault-situation occurs, the respective function just raises ("throws" in C++/Java's lingo) an exception which can be detected inside the same function or the "calling function" 15. By default this results in an error message including a (somewhat) meaningful traceback telling you where the error occurred and which function called the failing function:
1 >>> def f():
2 ... a = 4/0
3 ...
4 >>> f() # "uncaught" error
5 Traceback (most recent call last):
6 File "<input>", line 1, in #
7 File "<input>", line 2, in f ZeroDivisionError: integer division or modulo by zero
The following shows how one would have a "controlled" response to an (anticipated) error situation. Most people will get by without ever using this !
1 >>> try: # now we catch it and handle it in a "controlled" way
2 ... f()
3 ... except Exception, e:
4 ... print "*** there is a problem:", e
5 ... *** there is a problem: integer division or modulo by zero
6 >>>
Standard Python knows of a few (quite interesting) data types: numbers, strings, lists, tuples and dictionaries 16 Numbers are either integer (of unlimited size - really!), floating point (double precision)17 or complex (j is the imaginary unit: e.g. 1j or 10-.1j). (More differentiated number types come with "Numerical Python", see later) Python has these common operators: +,-,/(division), *(multiplication), **(power), %(modulo). 18 ;the bit-wise operators |,&;, 'and', 'or', ....)
Each such operator comes also with a "special assignment" (seb: better/correct word!!) operator: +=,-=,/=,*=,**=,%= ; these get very useful with Numerical Python, because they are inplace operations and do not require allocation of a new array (and garbage-collecting the old one). Python understands "chained" comparisons, like a == b == c or 0 <= a < 1. 19
Strings are either between single or double quotes: 'cat' or "cat" are exactly the same. Additionally you can use: '''cat''' or """cat""" are still exactly identical to those before, but the triple single/double quotes can go over multiple lines; They are often used for documentation strings (e.g. at the beginning of a functions); also they are a convenient way of effectively commenting out large sections of code. Strings can be concatenated with +, or appended-to with +=; 'cat'*2 equals 'catcat' If a string is preceded by r then the backlash loses it's special meaning: r"C:\new folder\myData" is what you want; without the r the '\n' would have been a newline (like in C). Best of all is the %-operator for strings: "%d) %8.2f" %(4, 2.1) equals "4) 2.10" (think printf in C/C++). This is how to get nicely formatted print-lines; and it works wherever a string works, not only for print.
Lists are "sequences" of things; anything can be mixed into the same sequence: integers, floats, strings, modules, ...!
Important note: indices are used differntly then you might expect:
the first element has index 0 (like in C/C++)
the last element has index length - 1 (like in C/C++)
slices do NOT include the the last element specified: a=[0,1,2,3]; a[1:3] gives [1,2] ( this might seems strange at first, but it's going to safe you many -1 and +1 fixes in your code )
1 >>> a = [1,3.1, 4j, "hello"] # list
2 >>> a[0] # get first element - index 0 !!
3 1
4 >>> a[1:3] # get '''slice''' of elements 1 and 2
5 # when slicing, the last one (3) is NOT included
6 [3.1, 4j]
7 >>> a[-1] # negative indices count from the end
8 'hello'
9 >>> len(a)
10 4
11 >>> a + ['qqqq'] # appends to list
12 [1, 3.1, 4j, 'hello', 'qqqq']
13 >>> a += [1,2,3] # appends to list
14 >>> # a += 4 # error: you meant: a += [4]
range(...) returns a list of integers; (float-lists can be gotten with N.arange(...) in Numerical Python.)
1 >>> range(5) # returns a list, start at 0, up to (not including !) 5, increment 1
2 [0,1,2,3,4]
3 >>> range(4,8) # start at 4, up to < 8, in steps of 1
4 [4,5,6,7]
5 >>> range(4,13,3)
6 >>> range(12,6,-3) # return list: start with 12, go in steps of -3, stay larger than 6
7 [12, 9]
8 >>> range(12,5,-3)
9 [12, 9, 6]
Now comes something pretty cool: "List Comprehensions", you might never need them, but they are just so cool ... see for your self:
1 >>> [i**2 for i in range(3)] # list comprehension: [/any-expression/ for VAR in LIST]
2 [0, 1, 4]
3 >>> [i**2 for i in range(1,5) if i**3!=27] #list comprehension with if-clause
4 [1, 4, 16]
2.4. loops and functions
1 >>> for i in range(10):
2 ... print i,i**2
3 ...
4 >>> a = 10
5 >>> while a>5:
6 ... print a
7 ... a -= 1
8 ...
9 < prints result here >
10 >>> def f(a=10): # argument with default value
11 ... if a==0:
12 ... return 1
13 ... return a*f(a-1)
14 ...
15 >>> del a # delete variable name a; if no one else has a reference to a, a get garbage collected
16
17 >>> import string # if you want to use other modules/packages
2.5. we don't need this (yet)
Dictionaries are like telephone books: You can find anything in them.
1 >>> a = {} # empty dictionary
2 >>> a['boss'] = 1000000
3 >>> a[1234] = "string"
4 >>> a
5 {1234: -1, 'boss': 1000000}
6 >>> len(a) 3
7 >>> a.keys()
8 [1234, 'boss']
9 >>> a.values()
10 [-1, 1000000]
11 >>> a.items() # returns a list of tuples: (key,value) pairs
12 [(1234, -1), ('boss', 1000000)]
Tuples are very similar to lists: the one big difference is that they are unchangeable (immutable). This allows using them as keys in dictinaries. Together lists and tuple are referred to as sequences.
1 >>> b = (1,3.1, 4j, "hello") # tuple
2 >>> len(b)
3 4
4 >>> b + [3,4]
5 Traceback (most recent call last):
6 File "<input>", line 1, in #
7 TypeError: can only concatenate tuple (not "list") to tuple
8 >>> a = {} # empty dictionary
9 >>> a[(4,5)] = [6,7]
10 >>> a['boss'] = 1000000
11 >>> a[1234] = "string"
12 >>> a.items() # returns a list of tuples: (key,value) pairs
13 [((4, 5), (6, 7)), (1234, -1), ('boss', 1000000)]
14
15 >>> b += (3,4) # works - NO error: BUT new b get's created
Generic file I/O is supported (if you realy need it ! But see the Priithon-"useful" sections first !)
1 f = open("c:/lines.txt")
2 for line in f.readlines():
3 print "%4d " % len(line), line, # line already contains a 'newline'
4 f.close()
Python is a real object-orieted language - so here are your classes. (But you might never need to create a class in your life.)
1 class myClass:
2 def __init__(self, a=0):
3 self.a = a
4 def getVal(self):
5 return self.a
6 def setVal(self, a):
7 self.a = a
8 def __str__(self): # define "string"-representation
9 return "<myClass with val = %s" % self.a
2.6. last Python notes
Python knows a "special value" called None. It is both different from 0 or the empty string "". If is very useful in cases where 0 or the empty string represent valid values.
There is a special operator testing the identity of two things as opposed to their equality: the is-operator. This one is to be used to check if a variable is None. 20 (Example: if a is None: print "using default...")
2.7. debugger
If something goes wrong and you get a traceback, you might be interested in the python debugger. He allows you (among other things) to go in ("post mortem" !) and check on variables and their values right before the exception occurs
1
2 >>> U.debug() # short for: U.pdb.pm()
3 (pdb) l # 'l' lists some lines before and after where the error occured
4 (pdb) up # 'up' goes one level up into the calling function
5 (pdb) l # 'l' lists some lines before and after - now in the calling function
6 (pdb) p varname # print value of a variable
7 (pdb) help
8 ...
9 ...
10 ...
11 (pdb) quit
12
13 >>>
(SEB TODO more excercise!)
3. Tutorial: Numerical Python
When I started with Priithon in 2001/2002 the (supposed to be) current version of numerical python was numarray. But it was never accepted by everyone in the cummunity (mostly because small array/matrix handling was not optimized enough). Now (2007, starting in 2005) the two camps are reunited into numpy. Numpy is now the official name (of the project and of the actual python module). Before it was used for the original Numeric package as a nickname (keep this is mind when you google old web pages)
3.1. some history
First some confusion regarding the current state (Jun 2005)
Numerical Python (often called "numpy" or Numeric) grew out of a "Python special interest group" (SIG) formed in Aug 1995. Jim Hugunin wrote most of the code and documentation while a graduate student at MIT. Numerical Python 1.1 was released Mar 1998 mostly written by Paul F. Dubois (LLNL).
In Mar 2002 STScI announced numarray 0.3, a complete reimplementation focused on efficient and fast handling of large (astronomical) data sets. 21 Among the key features is e.g. "Ability to reference byteswapped data or non-aligned data (as might be found in record arrays) without producing new temporary arrays.")
Sadly Numeric and numarray are not efficiently convertable into each other22; even though (python) code written for one likely runs with the other without any change needed.
The good news is that starting about Jan 2005 Oliphant Travis ( http://www.ece.byu.edu/faculty/oliphant/) is trying to bring everybody back into the same boat: including Guido van Rossum himself to make (at least the basic array data type) numerical python go into the standard python library. First the name Numeric3 was used; then scipy_core; now it's officially called numpy
http://scipy.org is a good place to start looking for numerical python related documentation.
Also there is an (electronic) book for sale at http://www.tramy.us/guidetoscipy.html. Its title is Guide to NumPy and it has about 300 pages describing all(!) details about both the Python and the C (C-API) side of numpy.
http://numeric.scipy.org decribes how some of the C parts of numpy (including header files) can go into standart python. This would allow other projects like wxPython, PIL (python imaging library) or PyOpenGL to optimize and standardize memory (array) handling
As of 2007 priithon imports numpy as N (the old numarray imported as na is gone)
3.2. Quick start
(we use numpy; priithon imports numpy as N)
Numpy provides a efficient data type for multi-dimensional numerical data array; we will call them always arrays even though vector would sometimes sound nicer (especially for the mathematically trained ear).
Arrays are not extensible in size. 23 In Numerical Python the convention for n>1 dimensional arrays is to use the axis order as z,y,x. That is for 2d arrays (matrices) the first index references to the row and the second to the column.
Indices start at \0\ and end "one early" (like in C/C++: a[5] has elements from a[0] to a[4]; a[5] gives an error message (no segmentation fault )
1 # create an single precision float array with 256 by 256 elements
2 >>> a = N.zeros((256,256), N.float32)
3 # 3 by 256 by 256 elements unsigned short (16bit) integer, all elements set to one.
4 >>> b = N.ones((3,256,256), N.uint16)
5 # create 1d-array: start with 1 increment .1, up to (not including!) 10
6 >>> c = N.arange(1,10,.1)
7 # N.arange: type defaults to double precision or int32(if that's enough)
8 >>> c.dtype
9 float64
10 >>> c.shape
11 (90)
3.3. more exercising
1 >>> a = N.array([[1,2], [3,4], [5,6]]) # create array from sequence (list or tuple)
2 >>> a
3 [[1 2]
4 [3 4]
5 [5 6]]
6 >>> b = N.arange(1,7); b.shape=(3,2); b
7 [[1 2]
8 [3 4]
9 [5 6]]
10 >>> a[0] # values for row(y) 0
11 [1 2]
12 >>> a[0,:] # all following (non specified) indices default to ':' anyway !!
13 [1 2]
14 >>> a[:,1] # values column(x) 1
15 [2 4 6]
16 >>> a.shape
17 (3, 2)
18 >>> a.rank # number of dimensions of a ( equals len(a.shape) ) 2
19 >>> a.dtype
20 int32
21 >>> b.shape= (2,3) # you can assign a new shape IF the total number of elements matches
22 >>> b
23 [[1 2 3]
24 [4 5 6]]
25 >>> a.sum()
26 21
27 >>> a.stddev()
28 1.87082869339
29
30 >>> a.diagonal()
31 [1 4]
32 >>> a.trace()
33 5
34 >>> N.where(a>2,1,4)
35 [[4 4]
36 [1 1]
37 [1 1]]
38 >>> a.flat # python uses C order (not Fortran)
39 [1 2 3 4 5 6]
40 >>> a.dtype
41 int32
42 >>> N.average(a.flat)
43 3.5
44 >>> a.transpose()
45 >>> a
46 [[1 3 5]
47 [2 4 6]]
48 >>> a[:,0] = 77
49 >>> a
50 [[77 3 5]
51 [77 4 6]]
52 >>> a[1] = 0
53 >>> a
54 [[77 3 5]
55 [ 0 0 0]]
56 >>> N.arange(10)[::3] # strides !
57 [0 3 6 9]
58 >>> N.arange(0,10,3)
59 [0 3 6 9]
60
61
62 >>> N.identity(3)
63 [[1 0 0]
64 [0 1 0]
65 [0 0 1]]
66
67 >>> N.pi
68 3.14159265359
69 >>> N.e
70 2.71828182846
4. Priithon Tutorial
Now the tutorial for my image analysis package (a collection of several module that I wrote)
4.1. images
U.loadImg(filename) loads a tiff/jpg/bmp/gif/... image and returns it as an array. RGB images are returned as a stack of 2d-images (shape = (3,height,width)) 24
U.loadFits(filename, slot=0) loads a fits file and returns the data contained in the given "slot" as array
Mrc.bindFile(filename, mode='r') returns a ("memory mapped") array that is "bound" to the MRC/Priithm file on disk. This allows instantaneous access to very large files 25 Changing values in that array gets written to the original file on disk - but only if the "binding" is done in "write mode" - not the default! (in read-only-mode an exception is raised instead).
Likely you won't need any of those three commands for a while: You can just drag (using the mouse) the file from the explorer(windows) or konqueror (Linus/KDE) or alike (Mac,Linux/Gnome,...) into the priithon PyShell window or any viewer window and a new viewer window opens showing the data of that file. 26 The data array of a viewer window can be accessed with Y.vd(viewerID). To open a new viewer window showing the data of an n(>1) dimensional array call Y.view(array).
For reading text files containing 2d (seb: or 3d ??) "tables" of space (or comma ?) separated values use U.readArray(filename). The corresponding "saving" functions are: U.saveImg(), 27 U.saveFits() and Mrc.save().
To quickly browse through many files on the disk call Y.listFilesViewer(): a window lists all files in the current directory, a single-click on a filename shows the data (reusing a newly opened viewer window).
As a mnemonic:
all commands in the Y module deal with graphic display (in one way or the other)
all commands starting with Y.v... have to do with the image viewer:
Y.view()
Y.vd(id): gets the data array of a viewer
Y.vLeftClickTriangleMeasure(viewerID): measures circle diameter and center pos for each three mouse clicks into the viewer window number viewerID
... and many more ...
All commands starting with Y.plot... have to do with 1d graph plotting:
Y.ploty(array) plots a graph where array contains the y-values (the x-values are assumed to be [0,1,2,3,...]). Array can contain multiple rows (or columns) of such y-values; then multiple graphs are plotted in colors ordered as red, green, blue, black, cyan, magenta.
Y.plotxy(): if the x-value are to be specified;
if given a single array the first row(or column) is to specify the x-values the following specify (multiple rows/columns of) y-values;
call Y.plotxy(xVals, yVals) where xVals gives one row(or column) of x-values and yVals works the same as in Y.ploty().
Y.plothold(on=1): add more graphs to a plot-figure by setting into "hold"-mode with.
Y.plotFigure(): open a new plot-figure window, or to re-activate an old one.
Y.plotProfileVert() and
Y.plotProfileHoriz() plot line-pofiles along a given axis and position, possibly averaging over a "band" of pixel more than 1 pixels wide.
These functions can be called with as argument either a viewerID (integer number) or a two-dimensional array.
Y.vLeftClickHorizProfile()
Y.vLeftClickVertProfileHoriz() set a viewer window into a mode such that clicking into it will plot a line pofile (either vertical or horizontal), possibly averaging over a "band" of pixel more than 1 pixels wide.
Y.inspect() opens a special browser window allowing you to inspect all variables, functions, modules; and viewer their source code !
FN() and DIR() Use FN() whenever you need to specify a filename but prefer using a graphical dialog window instead 28 Use DIR() for directory names. 29.
Instead you can actually drag file or directories from your systems file explorer into the shell window and there path appears 30
Y.saveSession(filename) saves all text of the shell window in to a file. [[FootNote( the default filename (None) means that a file dialog window opens (default name: "_pySession_<date><time>.py") ]]
Mac OSX has a bug in it's OpenGL library (concerning textures with floating point values) since version 10.3. Call Y._bugOSX1036(1) to activate my workaround code for that.
4.2. Coordinate System
this paragraph is from PIL: ( we do it differently !!!! todo fixme)
The Python Imaging Library uses a Cartesian pixel coordinate system, with (0,0) in the upper left corner. Note that the coordinates refer to the implied pixel corners; the centre of a pixel addressed as (0, 0) actually lies at (0.5, 0.5). Coordinates are usually passed to the library as 2-tuples (x, y). Rectangles are represented as 4-tuples, with the upper left corner given first. For example, a rectangle covering all of an 800x600 pixel image is written as (0, 0, 800, 600).
4.3. "Useful" stuff
The purpose of the U-module is to provide "useful" "one-liner kind-of" functions.
First, if you have a dataset and want to see all of it's min,max,mean and standard-deviation values, call U.mmms( array ).
If your eyes start hurting because you look at many arrays (their element value) and don't care at all about all those decimals and e-numbers try calling U.naSetArrayPrintMode(precision=4, suppress_small=1) 31
U.deriv1D(arr) returns the gradient (derivative) of a one dimensional array (by finite differencing)
For (1D?) curve fitting we have:
U.fitLine()
U.fitPoly() (polynomial function: y=a+bx+cx**2+dx**3+...)
U.fitDecay() (fit single exponential or sum of many exponentials with offset; each exponential is given by it's time=0 value and it's half-time; the offset is to be given as the first parameter; total number of parameters needs to be odd: 1 offset, two for each exponential)
fitGaussian1D() (parameters: it's sigma value and it's peak value (seb: add x-center value as 3rd parameter !!))
fitAny() (you can use any python function that takes a data array and a parameter tuple as model to be fit).
Each of these functions take either just an array of y-values to be fitted or additionally an array of corresponding x-values; if the x-values are not given [0,1,2,3,...] is automatically used.
To calculate a histogram of any (real values) data array you can use either U.histogram() or U.histogramYX().32 Please notice that if you specify too many bins then some might stay at zero count.
Suppose you have a time series of 2d images (array), to plot the mean intensity as a function of time use
Y.plot( U.mean2d(array) ).
There is also
U.min2d()
U.max2d()
U.mm2d()
U.mmm2d()
U.mmms2d(). The last three functions return an array with a first axis contain all or parts of min,max,mean,stddev. All the above functions work for any-dimensional arrays (rank must be >= 2; probably should be > 2) and do the min/max/mean/stddev calculation over each(!) plane/section of the last two dimensions; e.g. for 3d z-y-x stacks for each z value over the y-x sections, for 4d w-z-y-x stacks for each w-z pair over the corresponding y-x section.
U.topPercentile(array, percentile=3) returns the value of the given percentile (counted from the highest values down; with percentile=50 it returns the median)
If the values of a data set are assumed to be gaussian distributed U.FWHM(arr) returns the Full-Width-Half-Max value of that distributions.33
U.phase(complexArr) returns a real valued array with same shape as complexArr containing the complex phase-values of complexArr (it uses numpy's N.arctan2(a.imag, a.real))
U.signal2noise(arr)
U.writeDataSets()
U.thrsh(arr, min=0)
noiseSigma(arr, backgroundMean=None) (check this with Erik Hom)
U.asFloat32(arr) - just a little less typing than N.asarray(arr,N.float32)
4.3.1. "This is really part of numpy"
- U.nd is a "module shortcut" to scipy.nd_image
This module is written by Peter J. Verveer; it contains various functions for multi-dimension image processing. Those include operations for smoothing filter, first and second derivative filter, image segmentation, segmented image measuring functions and much more.
4.3.1.1. wrong: was true when Priithon was using numarray
- U.la is a "module shortcut" to numarray.linear_algebra
This module provides some commonly used linear algebra routines; it uses (if properly installed) very machine optimized implementation provided by system libraries (lapack/blas/atlas). 34 35
( from the numarray docs: )
- U.la.linear_least_squares( a, b, rcond=1e-10)
This function returns the least-squares solution of an overdetermined system of linear equations. An optional third argument indicates the cutoff for the range of singular values (defaults to ). There are four return values: the least-squares solution itself, the sum of the squared residuals (i.e. the quantity minimized by the solution), the rank of the matrix a, and the singular values of a in descending order.
- U.la.solve_linear_equations( a, b)
This function solves a system of linear equations with a square non-singular matrix a and a right-hand-side vector b. Several right-hand-side vectors can be treated simultaneously by making b a two-dimensional array (i.e. a sequence of vectors). The function inverse(a) calculates the inverse of the square non-singular matrix a by calling solve_linear_equations(a, b) with a suitable b.
- U.la.singular_value_decomposition( a, full_matrices=0)
This function returns three arrays V, S, and WT whose matrix product is the original matrix a. V and WT are unitary matrices (rank-2 arrays), whereas S is the vector (rank-1 array) of diagonal elements of the singular-value matrix. This function is mainly used to check whether (and in what way) a matrix is ill-conditioned.
- U.ra is a "module shortcut" to numarray.random_array
This provides a multitude of different random number generators/distributions: like uniform, normal(or gaussian), poisson and more. They are implemented using "ranlib", which is a good quality C implementation of a random-number generator. All functions in U.ra return float64 numbers (randint() returns int32)
U.ra.random(shape=[]) returns a single random number between 0.0 and 1.0 (never returning either 0 or 1) or a whole array of them if you specify a shape (different from []).
U.ra.randint(min,max, shape=[]) does the same returning always integers >= min and <max.
U.ra.permutation(n) returns a random permutation of N.arange(n).
U.ra.normal(mean,stddev,shape=[]) returns normal (gaussian) distributed values.
- U.mlab is a "module shortcut" to numarray.mlab
This provides some functions you might know from matlab which are still missing in the normal na module.
(seb: check: "Masked arrays" are arrays that may have missing or invalid entries)
(seb: check if "numarray.convolve" is useful besides nd_image)
4.4. Priithon: F-module
"F" as in "FFT" or "fields" (think fancy electric or magnetic or scalar fields) 36
F is a collection of two different kinds of functions. Both have inherently to do with either generation or operating on ("big, whole") n-dimensional arrays.
- First there is a set of fft (fast fourier transform) functions
F.fft() transforms a given array and returns (of fills in a specified output array) a complex array of the same shape.
F.ifft() does the inverse operation: it takes a complex array and but returns (or fills in) still a complex valued array of same shape. Your starting array could have also been complex valued and the fact that it might have been real will only be reflected in the output of ifft having small imaginary components (numerical rounding error/machine precision maybe about 1e-6 for float32). If your start array is real valued you might want to use
F.rfft(): This should be twice the speed and use (about) half the memory. Note that the output array is complex valued with different shape: the last axis has half plus one length of the input array (shape=(nz,ny,1+nx/2)). (Y.view() recognized this special shape (if the input shape was squared) and automatically reorders the display so that you see a squared image with origin in the center! Refer to o key in Y.view())
F.irfft takes those complex ("half shaped") arrays and returns the inverse transform as real valued having the original shape. All those functions do the full fft over all dimensions. If you have a stack of 2d images and want the fft done section-wise use the corresponding fft2d, ifft2d, rfft2d and irfft2d functions. There are also functions for 1d. (seb: check 3d)
these function use the fftw library (version 2)
Secondly the F modules contains a collection of test or base arrays (of 1, 2 or 3 dimensions). These are:
F.coneArr()
F.cosSqDiscArr()
F.discArr()
F.gaussianArr()
F.mexhatArr()
F.noiseArr()
F.poissonArr()
F.ringArr()
F.squareArr()
F.zzernikeArr()
Each such function takes a shape argument. If not specified it defaults to F.defshape ( = (256,256) ).
All other arguments all have reasonable default values so that calling any of the above functions give you something nice to look at.
Many functions have arguments for a radius (disc, cone, cosSqDisc, gaussian, maxhat) or two radii (ring) or parameters like mean and/or standard deviation (noise, poisson).
Most of the functions take an orig (origin) argument (a coordinate tuple); it defaults to None which is to mean "center" of the total volume (or area in 2d; or length in 1d); use a scalar as orig and it gets expanded to the full rank, so that 0 means e.g. (0,0) for a 2d array.
F.zzernikeArr() returns a zernike base-function (2d) of given zernike order. 37 The crop argument is either 1 ( yes, do crop at radius ) or 0 (no cropping, extend the polynomials to fill the full (squared) array; this might not be useful!(still one might be curious to take a look))
The functions F.maxNormRadialArr(), F.radialArr(), F.radialPhiArr() are generic work horse functions used to implement the functions above.
F.zeroArrU()
zeroArrF()
zeroArrD()
zeroArrS()
zeroArrI()
zeroArrC()
These function are meant to be (slightly) easier to type than N.zeros(): The capital letter at the end specifies the array type: U - uint16 F - float32 D - float64 S - int16 I - int32 C - complex64
(The choice of types comes from image analysis of large data sets acquired by CCD cameras as 16 unsigned integer pixel values)
And the shape can be given by the sequence of arguments: E.g. use
1 >>> a = F.zeroArrF(256,256) # instead of: N.zeros((256,256), N.float32)
2 >>> b = F.zeroArrC( a.shape ) # a shape-tuple argument also works!
Additionally the F modules contains other functions:
F.poissonize(arr) takes an array and returns a version with added "quantumn noise/shotnoise" (type N.uint16). That is, each pixel in the output array is taken from a random poisson distribution with the input pixel value giving the distribution standard deviation. (That is: the std.dev. is different for each pixel! High values get high noise, low values get low noise - BUT relative noise goes down the higher the input value ("photon count"))
seb TODO:
lowPassGaussFilter2d , convolve, copyPadded, shift
4.5. Scripts
Even thought the main focus of Priithon is it's convenient interactive PyShell interface you would want to write script-files sooner or later. The simplest reason for wanting to do this might be just to preseve a sequence of commands that you found to work well. Then you can "mold" it into a new function and later execute the whole sequence by typing one line.
To do this you start your favorite editor and create a file with it's name ending on ".py". If you then start the file with the first line being
1 from Priithon.all import *
you can from then on use all the familiar module names like in the interactive mode: e.g. N.arange or F.zeroArrF(512,512). To use those new functions you type (assuming you file is called "my1.py" and is in the current directory)
1 import my1
2 my1.myFunction()
You can see that your new file acts just like any other module in Python !
Instead of having to be in the directory where you files are you can set a enviroment variable in your system. I have for example on my Linux system this line in my .bashrc file:
export PYTHONPATH=/home/haase/myPy
Then I put all my little script files in to the myPy folder in my home-directory and Priithon always finds them. (If you have more then one such directory separate them by colons (like: PYTHONPATH=dir1:dir2:dir3)
Now, when you edit a file you need to type
1 reload(my1)
This is NOT a prefect way to do this, because:
It doesn't reload modules that are imported for my1
If you were to (accentially) delete a global name (like a function or a variale) from you file the old definition still "sticks" around
reload only overwrites and adds - it does NOT delete things inside the module
IF you reload multiple modules that also import each other the ORDER MATTERS in which you call the reloads
5. Design
5.1. PyShell
5.2. Swig
5.3. OpenGL
5.4. SciPy: plot
5.5. ScientifiPython
5.6. MatplotLib (pylab)
5.7. MayaVi / VTK / UCSF-Chimera
6. Reference
6.1. N: numerical python modules
todo:
6.2. Priithon
6.2.1. U: "useful" module
6.2.2. Y: wxPython/OpenGL based display module [[FN(),DIR()]]
Y.viewers: list of all viewer windows [(class Priithon.splitND) - None if window got closed]
Y.view(): opens a new (2d image + slider for each addition dimension) viewer window
...function arguments...
A text line above the image shows the current mouse pointer (pixel) position and the corresponding pixel value; also the zoom factor is shown if it is not 1. If the data has more than two dimensions a slider is provided left of the text display; one for each extra dimension, and the same order as the appear in /arr.shape/
key commands:
o cycle through 4 /origin/ modes:
(0,0) at left bottom
(0,0) at left top
(0,0) at center ( for fft images )
(0,0) at center & "double width" ( for rfft images)
c cycle through many /color map/ modes:
gray scale
gray scale /logarithmic/
rainbow red to blue
/circular/ rainbow - good for phase images, where -PI is the same as +PI
"heat like" black body
fast cycling rainbow with 0 black - good to see iso-conturs and gradients
a show /amplitudes/ of complex data set
p show /phases/ of complex data set
f open new viewer showing the ("half-shaped",real-) fft(2d) of the image
F open new viewer showing the inverse fft(2d) of the current image (should be a "half-shaped" fft image)
v open new viewer showing "maximum intensity projection" (/"vomit"/) along z-axis
g cycle through: no /grid/, one pixel grid, ten pixel grid
x open new viewer showing x-z side on view
y open new viewer showing y-z side on view
0 reset zoom to one pixel per data point and move image to left bottom in viewer frame
9 center image in viewer frame d "double" zoom - zoom in
h "half" zoom - zoom out
<page up/down> zoom in/out l toggle histogram with and with out /logarithmic/ /y/-axis
<Home>
arrow keys left,right - walk through /z/ axis (or what ever the higher dimensions are for)
arrow keys up, down - walk through /t/ axis (or what ever the higher dimensions are for)
arrow keys with <shift> OR <control> - shift image by quarter of it's size in the respective direction
Mouse interaction in image part:
press middle mouse button to drag image
press middle mouse button with <shift> or <Ctrl> key to zoom (move mouse up/down)
or use mouse wheel to zoom in/out
right mouse button gives "context menu"
Mouse interaction in histogram part:
press middle mouse button to move histogram graph:
left/right; up/down zooms in/out
also you can use the mouse wheel to zoom (seb: CHECK on Win98 wheel still zooms image)
press left mouse button to change image "scaling" (meaning: it brightness and contrast):
click close to left (red) bracket to change min (black) value,
click close to right red bracket to modify the max (white) value
or click in the center to move them both together.
right mouse button gives "context menu"
V.vd(): return data array of whats displayed in a viewer
Y.varrange()
Y.vLeftClickHorizProfile()
Y.vLeftClickVertProfile()
Y.vLeftClickZProfile()
Y.ploty()
Y.plotxy()
Y.plotDatapoints()
Y.plothold()
Y.plotFigure()
Y.plotRaise()
Y.plotsave()
6.2.3. F: "think (physical) fields" and fft module
6.2.4. Mrc: MRC/Priism file module
The Mrc module provides file I/O functions to read and write Mrc/Priism files. These are used in the Biology and Microscopy community.
The module uses a technology called memory mapping: so after opening a file you get a (special) numpy that is still "connected" to that file; any changes to the array will eventually get written to the file if it was opened as writable (NOT the default!!). To remind you of this "connected-ness" instead of "load" the function is called Mrc.bindFile().
The numpy object returned by those functions all have a special attribute named Mrc; you can access Mrc-specific parts of your data through this attribute: Example:
1 >>> a = Mrc.bindFile("myData.mrc")
2 >>> a.Mrc.info()
3 <some summary of the Mrc-header is printed here>
4 >>> a.Mrc.hdr.title[0]
5 <returns the first (of the 10 possible) titles stored in the Mrc-header>
Mrc.bindFile(): "loads" (rather binds, see above) an Mrc-file to a memmapped array
Mrc.load(): loads(!) an Mrc-file to regular (non-memmapped) array
Mrc.save(): saves an array into a Mrc-file -- you can use the evalHdr argument to set header information (e.g. save(...evalHdr='''hdr.d=(.1,.1,.5)''')
NOTE: Mrc.bindArr() and Mrc.bindNew() are not available anymore since the switch to numpy -- use Mrc.save()
6.2.5. gone: P: obsolete C++ based MRC/Priism file I/O
6.2.6. S: C++ (fortran) based functions (make array from known memory address, think e.g. hardware driver lib for CCD cam)
the S module is mostly to be used via samenamed funtion in U. E.g. U.findMax(). These functions ensure that the array is (if needed copied into a) well-behaved array (i.e. contiguous, native byt-order, ...) as required by C/fortran functions.
Therefore the S module is nort even loaded into the top-level (__main__) module, but is loaded into U (i.e. U.S)
The (maybe!?) only exceptions to the don't use S rule is creating a buffer object from a memory pointer, to be used as buffer in N.ndarray(buffer=...) are
U.S.my_PyBuffer_FromReadWriteMemory()
U.S.my_PyBuffer_FromMemory()
I dno't remember the status / usefulness of
U.S.my_PyErr_SetInterrupt()
6.2.7. Wvl: see wavelet paper
THIS IS *NOT* PART OF PRIITHON !!
This is being given out on special request to UCSF for academic use only
J Microsc. 2005 Aug;219(Pt 2):43-9.
A novel 3D wavelet-based filter for visualizing features in noisy biological data
Moss WC, Haase S, Lyle JM, Agard DA, Sedat JW.
Lawrence Livermore National Laboratory, CA 94550, USA. wmoss@llnl.gov
Summary We have developed a three-dimensional (3D) wavelet-based filter for visualizing structural features in volumetric data. The only variable parameter is a characteristic linear size of the feature of interest. The filtered output contains only those regions that are correlated with the characteristic size, thus de-noising the image. We demonstrate the use of the filter by applying it to 3D data from a variety of electron microscopy samples, including low-contrast vitreous ice cryogenic preparations, as well as 3D optical microscopy specimens.
Exercise
1 >>> import Wvl
2 >>> b1 = F.discArr((512, 512), radius=10, orig=0, valIn=1, valOut=0, wrap=1)
3 >>> b2 = F.discArr((512, 512), radius=20, orig=0, valIn=1, valOut=0, wrap=1)
4 >>> bp1 = N.where(F.poissonArr((512,512), 10)>25, 1,0)
5 >>> c0 =F.convolve(bp1, b1)
6 >>> bp2 = N.where(F.poissonArr((512,512), 10)>25, 1,0)
7 >>> c0+=F.convolve(bp2, b2)
8 >>> c0+=100
9 >>> c = c0 + F.noiseArr((512,512), 2) ## SNR: 1/2=(peak-bgk)/noiseStdDev
10
11 >>> s=512
12 >>> s*2**.5
13 724.077343935
14 >>> s*2**.5 + 3*30 # size * sqrt(2) + 3*wvlOrder
15 814.077343935
16 >>> cc = F.zeroArrF(800,800) # little less ;-)
17 >>> F.copyPadded(c,cc, 100) # bkg is 100
18 >>> ww = F.zeroArrF(800,800)
19
20 >>> Wvl.twoD(cc, ww, 40, nAngles=6, verboseLvl=1)
21 begin angle 0
22 begin angle 15.0
23 begin angle 30.0
24 begin angle 45.0
25 begin angle 60.0
26 begin angle 75.0
27
28 >>> w40 = F.zeroArrF(512,512)
29 >>> F.copyPadded(ww,w40)
30 >>> Y.view(w40)
31
32 >>> Wvl.twoD(cc, ww, 20, nAngles=6, verboseLvl=1)
33 begin angle 0
34 begin angle 15.0
35 begin angle 30.0
36 begin angle 45.0
37 begin angle 60.0
38 begin angle 75.0
39 >>> w20 = F.zeroArrF(512,512)
40 >>> F.copyPadded(ww,w20)
41 >>> Y.view(w20)
- 1
the automatically compiled code runs e.g. loops much faster than Matlab
- 2
of course you can write non interactive scripts also
- 3
One can add new functions written in C/C++ easily (see later)
- 4
because it's interactive
- 5
this might be the big difference where scripting languages like Perl and Ruby cannot compete
- 6
They tend to provide easy access to a relively small set of functions while blocking any access to more complex operations which were not anticipated.
- 7
The very popular shell IPython should also work
- 8
many in optical and electron microscopy use this format
- 9
This is done by a garbage collector
- 10
especially annoying if your editor messes with those "white spaces" ...
- 11
I use tabs so that everyone can set their tab-width to what they like best to look at
- 12
all names: variables,functions,classes,other modules
- 13
To force a reload of a module call the reload(''modulename'') - but there are some issues with reloading ... seb: more
- 14
Multiple instructions can be put in one line with a semicolon (and any number of spaces) in between them - don't use this too much !
- 15
or the function that called the function that called the "faulty" function and so on
- 16
boolean is now also a data type (since python2.2) and file objects and sets(since Python2.3) and ??
- 17
all scalars are always double precision; single precision is provided by Numerical Python for arrays (note: it supports rank-0 arrays which are almost practically scalars)
- 18
the complete list includes: ==, !=, <,>,<>,<=,>=,<=> (returns -1,0,1 for less,equal or greater relationship between the two of operands
- 19
finally! this has always been a so ugly since the old days of BASIC when you had to take those constructs apart and write things like 0<=a and a<1
- 20
don't use ==, since this raises an exception if the two types are not comparable
- 21
Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, and Phil Hodge at the Space Telescope Science Institute. Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details.
- 22
Conversion/Mixing Numeric and numarray might actually be quite efficient now, as of 2005
- 23
The resize function actually creates a new array and throws the old one away
- 24
The formats that are supported are those supported by PIL
- 25
Very large files here means smaller than about 1GB on 32bit computers
- 26
Internally one of the three described functions is called
- 27
U.saveImg8() forces 8 bit, e.g. in case of tiff which could save 16bit values which isn't supported by many viewers
- 28
it calls a dialog function, that open a window and returns the filename as string
- 29
Both FN() and DIR() are part of the Y-module (because it has to do with graphic user interface), but they are the only two functions in Priithon available "outside" any module
- 30
don't worry about the r in front the string, it means it's a raw-string (see python tutorial section) to allow backslashes (mostly under windows) to be part of the file path name
- 31
It's closely related to numarray/numpy's little appraised "pretty-print" arrayprint module
- 32
U.histogramYX() returns a tuple of two array: the bin-counts and their corresponding x-values.
- 33
that is the array's standard deviation times 2.35482004503
- 34
seb: On some platforms, precompiled optimized versions of the LAPACK and BLAS libraries are preinstalled on the operating system, and the setup procedure needs to be modified to force the lapack_lite module to be linked against those rather than the builtin replacement functions.
- 35
Edit Packages/LinearAlgebra2/setup.py and edit the variables sourcelist, lapack_dirs, and lapack_libs. In sourcelist you should remove all sourcefiles besides lapack_litemodule.c. In lapack_libs you have to specified all libraries needed on your system to successfully link against LAPACK. Often this includes 'lapack' and 'blas'. If you need to specify any non-standard directories to find these libraries, you can specify them in lapack_dirs. Note: A frequent request is that somehow the maintainers of Numerical Python invent a procedure which will automatically find and use the best available versions of these libraries. We welcome any patches that provide the functionality in a simple, platform independent, and reliable way. The scipy project has done some work to provide such functionality, but is probably not mature enough for use by numarray yet.
- 36
Also the fact that array in German is "Feld" might help to remember this letter.
- 37
the extra z is just to ensure that the much more often used zeroArr functions come first in the PyShell popup menu