Python standard library
The Python standard library includes about 220 modules and 69 built-in
functions (Python 3.9). Each module is designed for a specific
purpose and includes several functions (methods), and most
users will never use many of them (run help('modules')
to see the list of modules). The built-in functions, on the other hand, are
few, and it is not hard to learn and apply most of them.
In this tutorial we will learn more about Python built-in functions and some useful modules for general Python users.
Built-in functions and keywords
In general we can categorize built-in functions into:
- Mathematical: abs, divmod, max, min, pow, round, sum
- Logical/test: all, any, isinstance, issubclass
- Structural: bool, bytes, bytearray, complex, dict, float, frozenset, int, list, set, str, type, tuple
- Applicator: exec, eval, filter, len, map, reversed, sorted, slice, zip
- Iteration: enumerate, iter, next, range
- In/out: input, open, print
- Character converter: ascii, bin, chr, format, hex, oct, ord, repr
- Variable’s scope/location: dir, globals, id, locals, vars
- Objects: callable, delattr, getattr, hasattr, setattr
- Other: breakpoint, classmethod, compile, memoryview, property, staticmethod, super, …
Note that calling a function requires parentheses, for instance
abs(-7) or range(3). Some built-in functions return objects that carry
their own methods; for instance, the file object returned by open
(which is defined in the io module) has read, write, seek, close,
and more methods (see the examples below).
The most common operators are:
- Arithmetic: +, -, *, /, **, //, %, @
- Indexing: []
- Sequence operator (slicing): :
- Assignment: =, +=, -=, *=, /=, **=, //=, %=, @=
- Ordering and comparison: <, >, <=, >=, ==, !=
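As a small illustration of the operators listed above, the sketch below combines floor division, modulo, augmented assignment, and a chained comparison. The numbers are arbitrary and chosen only for the example.

```python
# Floor division and modulo together give the same pair as divmod()
q, r = 17 // 5, 17 % 5
print(q, r)
## 3 2

# Augmented assignment updates a name in place
total = 10
total += 5      # 15
total **= 2     # 225
total //= 4     # 56
print(total)
## 56

# Comparisons can be chained
in_range = 1 < q <= 3
print(in_range)
## True
```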
The following identifiers are reserved words, or keywords, of the language and cannot be used as ordinary identifiers:
False await else import pass
None break except in raise
True class finally is return
and continue for lambda try
as def from nonlocal while
assert del global not with
async elif if or yield
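The standard library's keyword module can check names against this list programmatically. A minimal sketch follows; the candidate names are made up for illustration.

```python
import keyword

# Hypothetical candidate names to check against the reserved words
candidates = ['lambda', 'class', 'total', 'print']

# keyword.iskeyword() returns True only for reserved words
reserved = [name for name in candidates if keyword.iskeyword(name)]
print(reserved)
## ['lambda', 'class']
```

Note that print is a built-in function, not a keyword, so it can be reassigned (although doing so is rarely a good idea).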
Next are some examples of the above functions:
divmod(6,4)
## (1, 2) # (Quotient,Remainder)
(6 // 4, 6 % 4)
## (1, 2)
all([i < 5 for i in [1,2,3]])
## True
all([i < 5 for i in [1,2,10]])
## False
[isinstance(i,str) for i in ['a','b',3]]
## [True, True, False]
a = [1,3,5]
b = [2,4,6]
list(map(pow, a, [2]*len(a))) # or
list(map(lambda x: x**2, a)) # or
[x**2 for x in a]
## [1, 9, 25]
list(filter(lambda x: x < 5, a)) # or
[x for x in a if x < 5]
## [1, 3]
list(zip(a,b)) # or
list(map(lambda x,y: (x,y), a, b))
## [(1, 2), (3, 4), (5, 6)]
[x for x in enumerate(b)]
## [(0, 2), (1, 4), (2, 6)]
[i for i,j in enumerate(b)]
## [0, 1, 2]
[j for i,j in enumerate(b)]
## [2, 4, 6]
x = 4
eval('2 * x**2 + 6')
## 38
exec('z = 40')
z
## 40
# locals() and globals() return a dictionary containing the current scope's local/global variables
# For example, the following adds x_0 = 0, x_1 = 1 and x_2 = 2 to the namespace
# (this works at module level, where locals() is the same dictionary as globals();
# inside a function, changes to locals() are not guaranteed to take effect):
for i in range(3):
    locals()['x_%s' % i] = i # or exec('x_%s = %s' % (i,i))
x_0,x_1,x_2
## (0, 1, 2)
hex(id(x)) # in CPython, this is the address of the object x in memory
## '0x1048d7f10'
with open('new_file.txt', 'w') as fw:
    fw.write('The first line\nThe second line\n')
my_file = open('new_file.txt', 'r')
## read the opened file from the beginning to the end
my_file.read()
## 'The first line\nThe second line\n'
## since we read the file, the cursor is at the end
my_file.read()
## ''
my_file.seek(0) ## move the cursor to the beginning
my_file.read()
## 'The first line\nThe second line\n'
my_file.seek(0) ## move the cursor to the beginning
## read line by line
my_file.readline()
## 'The first line\n'
my_file.readline()
## 'The second line\n'
my_file.seek(0) ## move the cursor to the beginning
## read all lines as a list
my_file.readlines()
## ['The first line\n', 'The second line\n']
my_file.seek(0) ## move the cursor to the beginning
for line in my_file:
    print(line, end = '')
## The first line
## The second line
## we should close the file
my_file.close()
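A few built-ins from the category list (sorted, reversed, any, hasattr, getattr) can be sketched in the same spirit. The sample list is made up for illustration.

```python
words = ['pear', 'fig', 'banana']   # sample data, made up for illustration

# sorted() returns a new list; key= sorts by a derived value
by_length = sorted(words, key=len)
print(by_length)
## ['fig', 'pear', 'banana']

# reversed() returns an iterator, so wrap it in list() to see the items
backwards = list(reversed(words))
print(backwards)
## ['banana', 'fig', 'pear']

# any() is True if at least one element is truthy
has_long_word = any(len(w) > 5 for w in words)
print(has_long_word)
## True

# hasattr() tests for an attribute; getattr() looks it up by name
fig_count = getattr(words, 'count')('fig') if hasattr(words, 'count') else None
print(fig_count)
## 1
```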
Library
Python includes a very extensive standard library that offers a wide range of facilities. We can categorize the modules covered below as follows:
- Numeric and mathematical: cmath, decimal, math, random, statistics
- File formats: csv, json
- Generic operating system services: argparse, ctypes, os, time
- System-specific parameters and functions: sys
- File and directory access: glob, shutil
- Data persistence: dbm, pickle, sqlite3
- Functional programming: functools, itertools, operator
- Text processing services: re, readline, string
- Data types: collections, datetime
- Software packaging and distribution: venv
- Launching parallel tasks: concurrent.futures
The following are some applications of the above modules.
OS
The os module (miscellaneous operating system interfaces)
provides a portable way of using operating-system-dependent
functionality.
import os
# Run OS commands
os.system("""
echo $(date) $(hostname) > date_hname.txt
""")
os.popen("""
echo $(date) $(hostname)
""").read().strip()
## 'Fri Feb 21 18:19:11 CST 2020 UserHost.local'
os.popen("""
echo $(date) $(hostname)
echo $HOME
""").readlines()
## ['Fri Feb 21 18:19:11 CST 2020 UserHost.local\n', '/home/user\n']
os.popen("""
echo $(date) $(hostname)
echo $HOME
""").read().strip().split('\n')
## ['Fri Feb 21 18:19:11 CST 2020 UserHost.local', '/home/user']
# Make directory and navigation
os.getcwd()
## '/home/user'
os.mkdir('./new_directory')
os.chdir('./new_directory')
os.getcwd()
## '/home/user/new_directory'
# Rename (mv) and remove
os.chdir('../') # move one directory up
os.rename('./new_directory', './renamed_dir')
os.removedirs('./renamed_dir')
os.remove('<file_name>')
# Env variables
os.getenv('HOME')
## '/home/user'
os.environ ## a mapping of all environment variables (NAME: value)
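The directory commands above change the working directory of the running session. As a hedged variation, the sketch below does similar work inside a temporary directory and builds paths with os.path.join, so nothing is left behind. The directory and file names are hypothetical, and tempfile is an extra standard-library module not used in the original examples.

```python
import os
import tempfile

# Work inside a throwaway directory that is removed automatically
with tempfile.TemporaryDirectory() as base:
    nested = os.path.join(base, 'new_directory', 'sub')   # hypothetical names
    os.makedirs(nested, exist_ok=True)    # also creates intermediate directories

    file_path = os.path.join(nested, 'notes.txt')
    with open(file_path, 'w') as fw:
        fw.write('hello\n')

    listing = os.listdir(nested)
    existed = os.path.exists(file_path)

print(listing, existed)
## ['notes.txt'] True
print(os.path.exists(base))   # the temporary directory is gone after the with block
## False
```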
Glob
The glob module (Unix-style pathname pattern expansion)
finds all the pathnames matching a specified pattern according to the
rules used by the Unix shell, although results are returned in
arbitrary order.
import glob
glob.glob('*.py')
## ['file1.py', 'file2.py']
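Because the order of results is arbitrary, it can help to sort them before use. The following sketch creates a few empty files in a temporary directory (the file names are made up) and lists only the Python files:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Create a few empty files (hypothetical names)
    for name in ['file2.py', 'file1.py', 'notes.txt']:
        open(os.path.join(base, name), 'w').close()

    # glob returns paths in arbitrary order, so sort them for stable output
    matches = sorted(glob.glob(os.path.join(base, '*.py')))
    names = [os.path.basename(p) for p in matches]

print(names)
## ['file1.py', 'file2.py']
```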
Sys
The sys module (system-specific parameters and functions)
gives a script access to command-line arguments and standard input. Let’s create the following
Python script called test-sys.py:
import sys
filename = sys.argv[0]
usr = sys.argv[1]
host = sys.stdin
print('File name is "%s"' % filename)
print('User "%s" is in %s' % (usr,host.read()))
Now, open a Unix shell and run:
hostname | python3 test-sys.py buzz
## File name is "test-sys.py"
## User "buzz" is in hostname.local
Here argv[0] is the file name, argv[1] is the
user name (buzz), and the host name comes from the shell pipe as standard
input (stdin). The sys module has many more functions, which you can
find in the Python docs.
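To try the same logic without a shell, the argument list and the input stream can be passed in explicitly. This is only a sketch: the describe function is a hypothetical helper, and io.StringIO stands in for the piped standard input.

```python
import io
import sys

def describe(argv, stdin):
    """Build the same message as test-sys.py from explicit inputs."""
    filename, usr = argv[0], argv[1]
    host = stdin.read().strip()
    return 'File name is "%s"\nUser "%s" is in %s' % (filename, usr, host)

# Simulated command line and piped input (values are illustrative)
message = describe(['test-sys.py', 'buzz'], io.StringIO('hostname.local\n'))
print(message)
## File name is "test-sys.py"
## User "buzz" is in hostname.local

# In a real script the call would be: describe(sys.argv, sys.stdin)
```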
Argparse
The argparse module (parser for command-line options, arguments and
sub-commands) makes it easy to write user-friendly
command-line interfaces. The program defines what arguments it
requires, and argparse figures out how to parse those
out of sys.argv. Let's create a script
(reverse-file.py) that reads a text file and prints its lines in
reverse order (bottom to top):
import argparse
import sys
parser = argparse.ArgumentParser(description = 'Read a file in reverse')
parser.add_argument('filename', help = 'the file to read')
parser.add_argument('-v', '--version', action = 'version', version = '%(prog)s 1.0', help = 'show program version and exit')
parser.add_argument('-l', '--limit', type = int, help = 'the number of lines to read')
args = parser.parse_args()
try:
    f = open(args.filename)
except FileNotFoundError as err:
    print("Error:", err)
    sys.exit(2)
else:
    # reuse the file opened above so that it is closed exactly once
    with f:
        lines = f.readlines()
    lines.reverse()
    if args.limit:
        lines = lines[:args.limit]
    for line in lines:
        print(line.strip())
Now we can use the script to read files in reverse. We can use the
-h or -v options to see the help and version:
python3 reverse-file.py -h
usage: reverse-file.py [-h] [-v] [-l LIMIT] filename

Read a file in reverse

positional arguments:
  filename              the file to read

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program version and exit
  -l LIMIT, --limit LIMIT
                        the number of lines to read

To read the last 2 lines of a file and print them in reverse order, we can run:
python3 reverse-file.py -l 2 new_file.txt
## The second line
## The first line
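A parser can also be exercised without a command line by passing a list of strings to parse_args. The sketch below rebuilds a reduced version of the parser above (without the version option); the prog name is set explicitly only so that messages do not depend on how the snippet is run.

```python
import argparse

parser = argparse.ArgumentParser(prog='reverse-file.py',
                                 description='Read a file in reverse')
parser.add_argument('filename', help='the file to read')
parser.add_argument('-l', '--limit', type=int, help='the number of lines to read')

# Passing a list to parse_args() bypasses sys.argv
args_limited = parser.parse_args(['-l', '2', 'new_file.txt'])
print(args_limited.filename, args_limited.limit)
## new_file.txt 2

# Options that are not given default to None
args_default = parser.parse_args(['new_file.txt'])
print(args_default.limit)
## None
```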
JSON
The json module (JSON encoder and decoder) reads and writes
JSON files.
import json
# Read
with open('./input.json', 'r') as jsf:
    input_json = json.load(jsf)
# Write
list_dict = [{'title': 'Monty Python and the Holy Grail', 'year': [1975, 'March 14']}]
with open('output.json', 'w') as jso:
    json.dump(list_dict, jso)
# Serialize to a JSON formatted str
js = json.dumps(list_dict)
Note that we can also read a file named on the command line (sys.argv)
by using with open(sys.argv[1], 'r') as jsf. We can also
use standard input (sys.stdin) to read the files.
For instance, the following script, jread.py, reads
output.json (from the above example) and prints the
titles:
import json
import sys
data = json.load(sys.stdin)
for i in data:
    print(i['title'])
Now, open a Unix Shell and run:
cat output.json | python jread.py
## Monty Python and the Holy Grail
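For output meant to be read by people, json.dumps accepts formatting options, and json.loads parses a string back into Python objects. A minimal round-trip sketch using the same sample data:

```python
import json

list_dict = [{'title': 'Monty Python and the Holy Grail', 'year': [1975, 'March 14']}]

# indent= and sort_keys= make the output easier to read and to diff
pretty = json.dumps(list_dict, indent=2, sort_keys=True)
print(pretty)

# json.loads() parses a JSON string back into Python objects
restored = json.loads(pretty)
print(restored == list_dict)
## True
```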
Pickle
The pickle module (Python object serialization) implements
binary protocols for serializing and de-serializing a Python object
structure. You might prefer JSON to pickle for many reasons, such as
security and human readability. Learn more in the pickle documentation.
import pickle
# Read
with open('input.pickle','rb') as pkl:
    input_pkl = pickle.load(pkl)
# Write
dict_data = {'title': 'Monty Python and the Holy Grail', 'year': [1975, 'March 14']}
with open('output.pickle', 'wb') as pkl:
    pickle.dump(dict_data, pkl)
# Serialize to a pickle formatted bytes object
pkl = pickle.dumps(dict_data)
print(pkl)
## b'\x80\x04\x95H\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x05title\x94\x8c\x1fMonty Python and the Holy Grail\x94\x8c\x04year\x94]\x94(M\xb7\x07\x8c\x08March 14\x94eu.'
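One practical difference from JSON is that pickle preserves Python-specific types such as tuples and sets. The sketch below round-trips a dictionary through pickle.dumps and pickle.loads; the 'tags' entry is added here only for illustration.

```python
import pickle

# Sample data with a tuple and a set, which JSON cannot represent directly
dict_data = {'title': 'Monty Python and the Holy Grail',
             'year': (1975, 'March 14'),
             'tags': {'comedy'}}

blob = pickle.dumps(dict_data)      # bytes
restored = pickle.loads(blob)       # only unpickle data from a trusted source

print(restored == dict_data)
## True
print(type(restored['year']).__name__, type(restored['tags']).__name__)
## tuple set
```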
CSV
The csv module (CSV file reading and writing) reads and
writes CSV files.
import csv
# Read to list
with open('./input.csv', 'r') as fl:
    csv_list = list(csv.reader(fl))
print(csv_list)
## [['a', 'b', 'c', 'd'], ['22', 'yes', '5', '0'], ['34', 'no', '7', '8']]
# Write from list (newline='' lets the csv module control line endings)
with open('output1.csv', 'w', newline='') as nfl:
    csv_writer = csv.writer(nfl)
    csv_writer.writerows(csv_list)
# Read to dict
with open('./input.csv', 'r') as fl:
    csv_dict = list(csv.DictReader(fl))
print(csv_dict)
## [{'a': '22', 'b': 'yes', 'c': '5', 'd': '0'}, {'a': '34', 'b': 'no', 'c': '7', 'd': '8'}]
# Write from dict
with open('output2.csv', 'w', newline='') as nfl:
    csv_fl = csv.DictWriter(nfl, fieldnames = csv_dict[0].keys())
    csv_fl.writeheader()
    csv_fl.writerows(csv_dict)
We can also read CSV files from standard input
(sys.stdin) or from a path given as a command-line argument
(sys.argv). For instance, the following script,
csvread.py, reads output1.csv (from the
above example) through sys.stdin:
import csv
import sys
data = list(csv.reader(sys.stdin))
for row in data:
    print(row)
Now, open a Unix Shell and run:
cat output1.csv | python csvread.py
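The csv readers return every field as a string, so numeric columns need an explicit conversion. The sketch below uses io.StringIO in place of real files (an assumption made to keep the example self-contained) with the same sample rows as above.

```python
import csv
import io

# Same sample rows as in the example above, held in memory
text = 'a,b,c,d\n22,yes,5,0\n34,no,7,8\n'

# Convert the numeric column explicitly before summing
rows = list(csv.DictReader(io.StringIO(text)))
total_c = sum(int(row['c']) for row in rows)
print(total_c)
## 12

# Write to an in-memory buffer instead of a file
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows([['a', 'c'], ['22', '5']])
written = buffer.getvalue()
print(written)
## a,c
## 22,5
```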
SQLite3
The sqlite3 module (interface for SQLite databases) provides a SQL
interface compliant with the DB-API 2.0 specification (PEP 249). The example is from the Python
documentation.
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
# Create table
c.execute("""CREATE TABLE stocks
    (date text, trans text, symbol text, qty real, price real)""")
# Insert a row of data
c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
# Save (commit) the changes
conn.commit()
# We can also close the connection if we are done with it
# just be sure any changes have been committed or they will be lost
conn.close()
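When values come from variables, "?" placeholders are generally preferable to building the SQL string by hand. The sketch below uses an in-memory database so that no file is created; the second row is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # in-memory database, discarded on close
c = conn.cursor()
c.execute("CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)")

# "?" placeholders let sqlite3 handle quoting of the values
rows = [('2006-01-05', 'BUY', 'RHAT', 100, 35.14),
        ('2006-03-28', 'BUY', 'IBM', 1000, 45.0)]   # second row is illustrative
c.executemany("INSERT INTO stocks VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Query the data back, again with a placeholder for the filter value
c.execute("SELECT symbol FROM stocks WHERE price > ? ORDER BY symbol", (30,))
symbols = c.fetchall()
print(symbols)
## [('IBM',), ('RHAT',)]
conn.close()
```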
Collections
The collections module (container datatypes) implements
specialized container datatypes providing alternatives to Python’s
general-purpose built-in containers dict, list, set, and tuple.
import collections
# Example 1
dict_list = collections.defaultdict(list)
for i in list(range(2))*3:
    dict_list[i].append(1)
dict(dict_list)
## {0: [1, 1, 1], 1: [1, 1, 1]}
for k in dict_list:
    dict_list[k] = sum(dict_list[k])
dict(dict_list)
## {0: 3, 1: 3}
# Example 2
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(list)
for k,v in s:
    d[k].append(v)
sorted(d.items())
## [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
dict(sorted(d.items()))
## {'blue': [2, 4], 'red': [1], 'yellow': [1, 3]}
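Besides defaultdict, the module offers other containers. The sketch below shows three of them (Counter, namedtuple, and deque) on made-up sample data.

```python
import collections

# Counter tallies hashable items
colors = ['yellow', 'blue', 'yellow', 'blue', 'red', 'yellow']
counts = collections.Counter(colors)
print(counts.most_common(2))
## [('yellow', 3), ('blue', 2)]

# namedtuple creates a lightweight record type with named fields
Movie = collections.namedtuple('Movie', ['title', 'year'])
movie = Movie('Monty Python and the Holy Grail', 1975)
print(movie.title, movie.year)
## Monty Python and the Holy Grail 1975

# deque with maxlen keeps only the most recent items
recent = collections.deque(maxlen=3)
for i in range(5):
    recent.append(i)
print(list(recent))
## [2, 3, 4]
```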
Time
The time module (time access and conversions) provides
various time-related functions.
import time
local_time = time.localtime() # a time tuple expressing local time
local_time
## time.struct_time(tm_year=2021, tm_mon=5, tm_mday=6, tm_hour=22, tm_min=3, tm_sec=32, tm_wday=3, tm_yday=126, tm_isdst=1)
local_time.tm_year
## 2021
time.strftime("%X", local_time) # convert a time tuple to a string according to a format specification
## '22:03:32'
time.strftime("%Y-%m-%d %H:%M:%S") # the default tuple is localtime()
## '2021-05-06 22:10:03'
# return the current time in seconds since the Epoch
time.time()
## 1620356510.8557692
# convert a time tuple in local time to seconds since the Epoch
time.mktime(local_time)
## 1620356612.0
local_time2 = time.localtime()
difference = time.mktime(local_time2) - time.mktime(local_time) # time difference in sec
difference
## 452.0
# convert seconds since the Epoch to a time tuple
time.gmtime(difference)
## time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=7, tm_sec=32, tm_wday=3, tm_yday=1, tm_isdst=0)
time.strptime("30 Nov 20", "%d %b %y") # parse a string to a time tuple according to a format specification
## time.struct_time(tm_year=2020, tm_mon=11, tm_mday=30, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=335, tm_isdst=-1)
Commonly used time format codes:
- %Y: Year with century as a decimal number.
- %m: Month as a decimal number [01,12].
- %d: Day of the month as a decimal number [01,31].
- %H: Hour (24-hour clock) as a decimal number [00,23].
- %M: Minute as a decimal number [00,59].
- %S: Second as a decimal number [00,61].
- %z: Time zone offset from UTC.
- %a: Locale’s abbreviated weekday name.
- %A: Locale’s full weekday name.
- %b: Locale’s abbreviated month name.
- %B: Locale’s full month name.
- %c: Locale’s appropriate date and time representation.
- %x: Locale’s appropriate date representation.
- %X: Locale’s appropriate time representation.
- %p: Locale’s equivalent of either AM or PM.
- %I: Hour (12-hour clock) as a decimal number [01,12].
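The format codes work in both directions: strptime parses a string and strftime formats a time tuple. The sketch below uses only numeric codes so that the result does not depend on the locale, and adds time.perf_counter (not covered above) as one way to time a piece of code.

```python
import time

# Parse a timestamp with numeric codes, then write it in a different layout
parsed = time.strptime('2020-11-30 08:15:00', '%Y-%m-%d %H:%M:%S')
formatted = time.strftime('%d/%m/%Y %I:%M', parsed)
print(formatted)
## 30/11/2020 08:15

# perf_counter() is a high-resolution clock suited to measuring elapsed time
start = time.perf_counter()
total = sum(range(1_000_000))
elapsed = time.perf_counter() - start
print(total)
## 499999500000
```

The value of elapsed varies from run to run, so it is not printed here.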
Itertools
The itertools module (functions creating iterators for efficient
looping) implements a number of iterator
building blocks.
import itertools
# Combinatoric iterators
list(itertools.combinations('ABC', 2))
## [('A', 'B'), ('A', 'C'), ('B', 'C')]
list(itertools.combinations_with_replacement('ABC', 2))
## [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
list(itertools.permutations('ABC', 2))
## [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
list(itertools.product('ABC', repeat = 2))
## [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'B'), ('C', 'C')]
a = [1,3]
b = [2,4]
list(zip(a,b))
## [(1, 2), (3, 4)]
list(itertools.permutations(a+b,2))
## [(1, 3), (1, 2), (1, 4), (3, 1), (3, 2), (3, 4), (2, 1), (2, 3), (2, 4), (4, 1), (4, 3), (4, 2)]
# Accumulate
mylist = [1,3,5,7,9,11,13]
list(itertools.accumulate(mylist))
## [1, 4, 9, 16, 25, 36, 49]
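A few more building blocks from the same module, sketched on made-up data: chain joins iterables, islice slices any iterator (including an infinite one), and groupby groups consecutive items that share a key.

```python
import itertools

# chain() joins several iterables into one
chained = list(itertools.chain([1, 3], [2, 4]))
print(chained)
## [1, 3, 2, 4]

# islice() takes a slice of any iterator, here the infinite count()
first_evens = list(itertools.islice(itertools.count(start=0, step=2), 5))
print(first_evens)
## [0, 2, 4, 6, 8]

# groupby() only groups consecutive items, so sort by the key first
words = sorted(['cherry', 'apple', 'banana', 'avocado', 'blueberry'])
groups = {k: list(g) for k, g in itertools.groupby(words, key=lambda w: w[0])}
print(groups)
## {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```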
RE
With the re module (regular expression operations), we
specify the rules for the set of possible strings that we want to
match; this set might contain English sentences, or e-mail addresses,
or TeX commands, or anything you like (from the Python
HOWTOs). The patterns below are written as raw strings (r'...') so that backslashes reach the regex engine unchanged.
import re
re.findall(r'begin', 'begin with this example for beginning') # Find all 'begin's in text
## ['begin', 'begin']
re.findall(r'^begin', 'begin with this example for beginning') # Find 'begin' only at the beginning (^) of text
## ['begin']
re.findall(r'begin$', 'this is another begin') # Search only at the end ($) of text
## ['begin']
re.findall(r'.*begin', 'this is another begin') # Search 'begin' and everything before (.*)
## ['this is another begin']
re.findall(r'goo?al','goal vs goooooal') # Zero or one (?) extra 'o' character
## ['goal']
re.findall(r'goo*al','goal vs goooooal') # Zero or more (*) extra 'o' characters
## ['goal', 'goooooal']
re.findall(r'goo+al','goal vs goooooal') # One or more (+) extra 'o' characters
## ['goooooal']
re.findall(r'\d', 'today is Oct 10') # Find digits (\d)
## ['1', '0']
re.findall(r'\d{2}', 'today is Oct 10') # Find two digits (\d{2})
## ['10']
re.findall(r'\w', 'today is Oct 10') # Find any word character (\w)
## ['t', 'o', 'd', 'a', 'y', 'i', 's', 'O', 'c', 't', '1', '0']
re.findall(r'\w+', 'yesterday was October 9') # Find any word with one or more (+) characters
## ['yesterday', 'was', 'October', '9']
re.findall(r'\w{4}\w*', 'yesterday was October 9') # Find any word with 4 characters or more
## ['yesterday', 'October']
re.findall(r'[A-Z]..', 'yesterday was October 9') # Find a capital letter ([A-Z]) and the two characters after it (..)
## ['Oct']
re.findall(r'([A-Z][a-z]* \d+)', 'yesterday was October 9') # Find a capital letter ([A-Z]) followed by zero or more small letters ([a-z]*), a space ( ) and one or more digits (\d+)
## ['October 9']
re.findall(r'(?<=; )[\w ]*', 'I want everything after; this part is important') # After (\w = [a-zA-Z0-9_])
## ['this part is important']
re.findall(r'[\w ]+(?=;)', 'I want everything before; this part is not important') # Before
## ['I want everything before']
pattern = re.compile(r'[\w ]*(?=;)') # We can keep the pattern in this way
mm = re.match(pattern, 'I want everything before; this part is not important') # Before
mm.group(0)
## 'I want everything before'
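Beyond findall and match, the module can also extract named parts of a match, substitute text, and split on a pattern. A short sketch on a made-up sentence:

```python
import re

text = 'yesterday was October 9 and today is October 10'

# Named groups give readable access to parts of the first match
m = re.search(r'(?P<month>[A-Z][a-z]+) (?P<day>\d+)', text)
print(m.group('month'), m.group('day'))
## October 9

# re.sub() replaces every match of the pattern
masked = re.sub(r'\d+', '#', text)
print(masked)
## yesterday was October # and today is October #

# re.split() splits the string wherever the pattern matches
parts = re.split(r'\s+and\s+', text)
print(parts)
## ['yesterday was October 9', 'today is October 10']
```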
Other packages
Besides the standard library, we can add numerous packages to the Python toolkit. The following are some of the best-known Python packages that can strengthen your toolbox.
- SymPy: a library for symbolic mathematics
- Matplotlib: a comprehensive 2D plotting library
- IPython/Jupyter: an enhanced interactive console
- Pandas: a tool for working with tabular data (DataFrame), such as data stored in spreadsheets or databases
- NumPy: the base N-dimensional array package, with useful linear algebra routines
- SciPy: a Python-based ecosystem of open-source software for mathematics, science, and engineering
- Numba: an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code
- Cython: a language for writing C extensions for Python that aims to be as easy as Python itself
- mpi4py: Python bindings for the Message Passing Interface (MPI) standard
- SciKit-Learn: a library for machine learning
- TensorFlow: a library for fast numerical computing and deep learning
- Dask: a library that natively scales Python and provides advanced parallelism for analytics
- Redis: a Python interface to the Redis key-value store
- PySpark: the Python API for Apache Spark
Review here to learn about virtual environments and package manager systems for Python.