Python basics Mon Feb 13 10:04:49 EST 2012 This is a short summary of a small part of Python; it shows mostly the common things and a few that I have trouble remembering. I am not a Python expert; caveat lector. It is very easy to experiment with Python; just type the command and start entering code. As long as you maintain consistent indenting and add one extra empty line after a block, it will execute on the fly. It is also easy to get help: run Python and type help at the prompt. Program structure: ============================================ A program usually has to import a number of modules; these are among the more common: import sys, string, re, os, fileinput Variables are typeless, but Python keeps track of what you've stored in one and it won't let you away with silly constructs, like comparing numbers to strings or adding them. You will have to use conversion functions like string.atof() (which require that the string be a number, not just have a numeric prefix). Variables are by initializing them, or by a global declaration within a function to state that a variable is external; by default variables are local to their functions. Variables must be initialized; test whether a variable has a value with if v != None: String constants are quoted with '...' or "..." and backslashes are interpreted. Raw strings are written r'...' or r"..." and backslashes are not interpreted, so they are good for things like regular expressions. Triple quoted strings ''' and """ can include bare newlines and are often used for documentation. Lists take the place of arrays for casual use. An empty list is defined with v = [] and a non-empty list with v = [ 'val1', 'val2' ] and elements are accessed with v[index] where indices run from 0 to len(v)-1. Add new elements at the end with v.append(val). There are a lot of other useful list operators, and you can use slices on lists as well to pick up a range of subscripts. Tuples are in effect immutable lists, but defined with () instead of []. Functions can return tuples; tuples can also be used in assignments, like (r, c) = get_row_col() Dictionaries == hash tables. dict = {} # empty dict[anything] = whatever if dict.has_key(anything): This creates a dictionary of 4 elements by spelling out the values: kw = { "if": 1, "for": 2, "while": 3, "else": 4 } Relational operators are as in C; comparisons seem to ignore the type info so carefully maintained for arithmetic. Control flow is unusual at least for the C tradition. Grouping is indicated by indentation only, and control flow constructs all use ":", as in if whatever: ... elif whatever: ... else: ... while expression: ... for i in range(min,max): for i in dict.keys(): # or whatever ... Indentation must be completely consistent throughout a file. Two or three spaces seems conventional. Within a loop, break, as in C continue, as in C Anywhere: pass is in effect a null statement, like bare ; in C Functions definitions give name and argument list: def name(arg1, arg2, arg3=defaultval, arg4=defaultval): statements of function def div(a, b): ''' computes quotient & remainder. b had better be > 0''' q = a / b r = a % b return (q, r) # returns a tuple Arguments can include named arguments and there are mechanisms for variable length lists and arrays of arguments. When calling a function, named arguments can appear (or not appear) in any order. Variables are local but you can read a global variable. Local names hide global ones without warning; watch out. def foo(args): global v1 # permits changing v1 Exceptions: try: whatever except: recover from any error Classes are defined with "class": class Stack: def __init__(self): # constructor self.stack = [] # local var def push(self, obj): self.stack.append(obj) def pop(self): return self.stack.pop() # array.pop def len(self): return len(self.stack) stk = Stack() stk.push("foo") if stk.len() != 1: print "error" if stk.pop() != "foo": print "error" del stk Member functions are always defined with "self" but that doesn't appear in the call. Within a class function function, you must use self.member to access the member value; undecorated names are just local variables. Class members are visible; there's not much hiding. Commandline arguments: ======================================= echo, brute force: import sys for i in range(1, len(sys.argv)): if i < len(sys.argv): print sys.argv[i], # comma suppresses newline else: print sys.argv[i] Input and output: ==================================== Simple open file and read it: import fileinput, sys def main(): lines = [] try: f = open("foo") # open(file, 'w+') to create & write except: print 'error' return line = f.readline() while (line != ""): lines.append(line[:-1]) line = f.readline() print lines f.close() main() Call function count() on each input line, from stdin or a list of files: wc = {} # empty dictionary def count(f): global wc # do something to wc def main(): if len(sys.argv) == 1: count(sys.stdin) else: for i in range(1,len(sys.argv)): f = open(sys.argv[i]) count(f) f.close() for i in wc: print "%s %s" % (wc[i], i) main() The print statement prints a list of expressions followed by a newline; if the list ends with ",", no newline is printed (hack). The % operator joins a format string and a list to produce a formatted string; this provides the same effect as printf, as in print "%d %s" % (wc[i], i) The function raw_input([prompt]) prints an optional prompt string and returns the user's response; the function getpass() gets a password without echoing it: import getpass pw = getpass.getpass() Associative arrays: =============================================== The classic "add up name-value pairs" example: import sys, string val = {} # empty dictionary def count(f): global val line = f.readline() while (line != ""): #line = line.strip() (n, v) = line.strip().split() if val.has_key(n): val[n] += string.atof(v) else: val[n] = string.atof(v) line = f.readline() def main(): if len(sys.argv) == 1: count(sys.stdin) else: for i in range(1,len(sys.argv)): f = open(sys.argv[i]) count(f) f.close() for i in val.keys(): print "%s\t%g" % (i, val[i]) main() Compute word frequency, but just by reading stdin into a giant string: wd = {} buf = sys.stdin.read() wordlist = string.split(buf) for word in wordlist: if wd.has_key(word): wd[word] = wd[word] + 1 else: wd[word] = 1 for k, v in wd.iteritems(): print k, v String manipulation: ==================================================== Concatenate strings with + s = 'abc" + "def" Use "..." % (list of exprs) for more complicated string construction. The string class has lots of functions, and some things are taken care of by slicing. If s is a string, s[0] is the first character s[1:] is all but the first character s[:-1] is all but the last character s[1:-1] drops first and last characters Slicing works for lists as well. Regular expressions: ==================================================== import re r'...' is a quoted string that doesn't need an extra level of backslashes: r_int = r'(\d+)' r_num = r'(\d+\.\d*|\.\d+|\d+)' s = re.sub(r',(\d\d\d)', r'\1', s) # 12,345 -> 12345 s = re.sub(r',(\d\d|\d)', r'.\1', s) # 12,34 -> 12.34 Fine print in manual: alternation goes left to right (not in parallel!!) and stops when it has a match. One must write carefully to get longest match. By default, re.sub replaces *all* instances; there's a fourth count argument to set a limit. Substrings matched parts are saved for later use in $1, $2, ... s/(\S+)\s+(\S+)/\2 \1/ swaps first two words Qualifiers can be put inside the re; they include g for global and i for case-insensitive XXX i think Shorthands \d = digit, \D = non-digit \w = "word" character, i.e., [a-zA-Z0-9_], \W = non-word char \s = whitespace char, \S = non-whitespace \b = word boundary, \B = non-boundary Gotchas and features: =================================================== This list is far from complete: indentation for grouping; always need ":" for conditionals no implicit conversions in arithmetic expressions though seem to be for string comparisons tup = (...) to define a tuple; list = [...] for a list, dict = {...} for a dictionary but access elements with [...] for all of these elif, not else if no do-while no ++, no --, no ?: function arguments passed call by reference (!) need global declaration to access non-local vars in functions if v != None: needed to test for uninitialized variable all lines in a single string: buf = sys.stdin.read() reads all input lines but does not read them one at a time for i in dict is DIFFERENT from for i in dict.keys() regular expressions are not leftmost longest re.match is anchored, re.sub replaces all by default Other random things added over time: ================================== How to make a #! line for python that's works on systems with python in different locations? According to the python FAQ (http://www.python.org/infogami-faq/library/ how-do-i-make-a-python-script-executable-on-unix/), #! /usr/bin/env python How to make a module and use it: in A.py: import sys class Ac: def p(self): print "this is Ac" in B.py: import sys print sys.path import A a = A.Ac() a.p()