Python data structures
Python is an easy-to-learn, powerful programming language. It has efficient high-level data structures and a simple approach to object-oriented programming. To learn Python, we first need to learn how Python handles data. In this article we will learn about data types and data structures in Python 3 through several examples. You can find more about Python programming at:
- Programming with Python
- Plotting and Programming in Python
- Analysis pipelines with Python
- The Python Tutorial
- Functional programming in Python
Python object types
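The most common built-in types in Python are:
- Strings: str, e.g. 'a'
- Numbers: int, float, complex, e.g. 123, 0.777, 8j
- Booleans: bool, e.g. True
- None: NoneType, e.g. None
- Lists: list, e.g. [1,2]
- Tuples: tuple, e.g. (1,2)
- Sets: set, e.g. {1,2}
- Dictionaries: dict, e.g. {key:value}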
for element in ['a', True, None, 123, 0.777, 8j, [1,2], (1,2), {1,2}, {'key':1}]:
    print(type(element))
## <class 'str'>
## <class 'bool'>
## <class 'NoneType'>
## <class 'int'>
## <class 'float'>
## <class 'complex'>
## <class 'list'>
## <class 'tuple'>
## <class 'set'>
## <class 'dict'>
To learn more about the built-in types, review the Python documentation on standard types.
Characteristics
Strings, tuples and lists can be concatenated with + and repeated with *:
'some text ' + 'MORE TEXT'
## some text MORE TEXT
'repetition ' * 3
## repetition repetition repetition
# Lists
[1,3,'five'] + [7,9]
## [1, 3, 'five', 7, 9]
[1,3,'five'] * 2
## [1, 3, 'five', 1, 3, 'five']
# Tuples
(2,4,'six') + (8,10)
## (2, 4, 'six', 8, 10)
(2,4,'six') * 2
## (2, 4, 'six', 2, 4, 'six')
Lists, dictionaries and sets are mutable. A mutable object can change its value while remaining the same object (it keeps the same id()):
a = [1,2,3]
id(a)
## 4504081480
a += [4,5]
a
## [1, 2, 3, 4, 5]
id(a)
## 4504081480
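Numbers, strings and tuples are immutable: an immutable object has a fixed value, so an operation such as += creates a new object and binds the name to it (note the different id()):
a = 2
id(a)
## 4499865296
a += 3
a
## 5
id(a)
## 4499865392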
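The difference between mutable and immutable objects also shows up when objects are passed to functions. In this sketch (extend_list and add_three are hypothetical names chosen for illustration), += on a list argument changes the caller's list in place, while += on an int argument only rebinds the function's local name:

```python
def extend_list(values):
    # values refers to the caller's list object, so += extends it in place
    values += [4, 5]


def add_three(number):
    # ints are immutable: += builds a new int and rebinds only the local name
    number += 3


nums = [1, 2, 3]
extend_list(nums)
print(nums)  # [1, 2, 3, 4, 5]

n = 2
add_three(n)
print(n)     # 2, the caller's variable is unchanged
```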
Strings, tuples, lists and dictionaries are subscriptable: their items can be accessed with square brackets, by position for sequences and by key for dictionaries:
sw = 'Python'
sw[0]
## 'P'
sw[0:2]
## 'Py'
sw[::2]
## 'Pto'
names = ['turtle','polar bear','elephant','penguin']
names[:2]
## ['turtle', 'polar bear']
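In general, the numbers inside the indexing square brackets take one of the following forms:
- [index]
- [begin:end]
- [begin:end:step]

In a slice, begin is included and end is excluded, so [0:2] selects the elements at indices 0 and 1.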
Negative numbers count from the end of the sequence, and a negative step walks through it in reverse order:
sw[-1] # last element
## 'n'
sw[::-1] # step of -1: the whole sequence in reverse order
## 'nohtyP'
names[:-2] # everything before the last two elements
## ['turtle', 'polar bear']
names[::-1]
## ['penguin', 'elephant', 'polar bear', 'turtle']
An empty begin or end defaults to the start or end of the sequence, so [:] and [::] select everything:
sw[:]
## 'Python'
sw[::]
## 'Python'
names[:]
## ['turtle', 'polar bear', 'elephant', 'penguin']
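Because a slice [begin:end] includes begin but stops just before end, it holds end - begin elements. The short sketch below checks this rule on the sw string from above (redefined so the block runs on its own):

```python
sw = 'Python'

# end is exclusive: sw[0:2] covers indices 0 and 1
print(sw[0:2])                  # Py
print(len(sw[1:4]) == 4 - 1)    # True

# splitting at any index i and joining the two halves gives back the original
for i in range(len(sw) + 1):
    assert sw[:i] + sw[i:] == sw

# slice bounds past the end are clipped instead of raising IndexError
print(sw[4:100])                # on
```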
The following table summarizes these properties:
Type | Concatenate | Subscriptable | Mutable |
---|---|---|---|
Number | No | No | No |
String | Yes | Yes | No |
Tuple | Yes | Yes | No |
List | Yes | Yes | Yes |
Dict | No | Yes | Yes |
Set | No | No | Yes |
Conversion
We can use the following built-in functions to convert objects to other types:
- str(): create a string
- int(): create an integer
- float(): create a float
- complex(): create a complex number
- bool(): create a boolean
- list(): create a list
- tuple(): create a tuple
- set(): create a set
- dict(): create a dictionary
str(5) + ' five'
## 5 five
int('5') + 5
## 10
float('5') + 5
## 10.0
list((2,4,6))
## [2, 4, 6]
list({'a':1,'b':2})
## ['a', 'b']
list({'a':1,'b':2}.values())
## [1, 2]
tuple([1,3,5])
## (1, 3, 5)
set([1,3,5,1,3,5])
## {1, 3, 5}
dict([('a',1),('b',2),('c',3)])
## {'a': 1, 'b': 2, 'c': 3}
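A few conversions behave differently from what their names might suggest. This sketch shows that int() only parses integer strings, that int() truncates floats toward zero, and that bool() tests for emptiness rather than reading the text (the variable names are illustrative):

```python
# int() parses integer strings only; a string like '5.0' raises ValueError
try:
    int('5.0')
    failed = False
except ValueError:
    failed = True
print(failed)                       # True

# converting through float() first works, and int() truncates toward zero
whole = int(float('5.0'))
print(whole)                        # 5
print(int(7.9), int(-7.9))          # 7 -7

# bool() checks for emptiness or zero, not the meaning of the text
print(bool('False'))                # True, because the string is not empty
print(bool(''), bool(0), bool([]))  # False False False
```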
Data structures
There are five major data structures in Python, each with a constructor function and a literal syntax:
- Strings: str(), ' '
- Lists: list(), []
- Tuples: tuple(), ()
- Sets: set(), {1,2}
- Dictionaries: dict(), {key:value}
Strings
Strings are one way to store data in Python. As we discussed, they are immutable, subscriptable objects that can be concatenated. In Python, anything inside single or double quotes (' ' or " ") is a string. Strings have many methods; the following are some of the main ones:
- str.capitalize(): capitalize the first character and lowercase the rest
- str.title(): titlecase, i.e. capitalize the first letter of each word
- str.lower(): lowercase
- str.upper(): uppercase
- str.find(x): index of the first occurrence of x, or -1 if x is not found
- str.index(x): same as .find(x) when x is in the string, but raises ValueError when it is not
- str.count(x): count the non-overlapping occurrences of x
- str.replace(x,y): replace occurrences of x with y
- str.split(x): split a string into a list of strings at the separator x (if x is omitted, split on whitespace)
- str.join(x): join a list of strings x into one string, using str as the separator; the opposite of .split()
- str.startswith(x): True if the string starts with x
- str.endswith(x): True if the string ends with x
- str.strip(): remove whitespace from the beginning and end
- str.center(width, fillchar): center the string in a field of the given width, padded with fillchar; see the example below
For example:
name = 'python'
name.startswith('p')
## True
name.capitalize()
## 'Python'
name.upper()
## 'PYTHON'
name.index('p')
## 0
name2 = name.replace('n', 'n3').replace('p', ',p')
name2
## ',python3'
nm = name + name2
nm
## 'python,python3'
nm_split = nm.split(',')
nm_split
## ['python', 'python3']
','.join(nm_split)
## 'python,python3'
'new york'.title()
## 'New York'
' This is a test '.center(30,'=')
## '======= This is a test ======='
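The example above does not exercise every method in the list. This sketch (the text variable is illustrative) covers a few more, including the difference between find(), which returns -1 for a missing substring, and index(), which raises ValueError, and split() with no argument, which splits on runs of whitespace:

```python
text = '  learning python is fun  '

clean = text.strip()                # drop leading and trailing whitespace
print(clean)                        # learning python is fun
print(clean.find('python'))         # 9
print(clean.find('java'))           # -1: find() signals "not found" with -1

try:
    clean.index('java')             # index() raises instead of returning -1
except ValueError:
    print('not found')

print(clean.count('n'))             # 4
print(clean.endswith('fun'))        # True
print(clean.split())                # ['learning', 'python', 'is', 'fun']
```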
Lists
A list is a sequence of objects enclosed in square brackets ([]). Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.
ls = [1,3,5,7]
ls[0] = 100 # Lists are mutable
ls
## [100, 3, 5, 7]
# Lists can hold any type of item
example = [1,True,None,['word',123],'test',(0,1),{'name id': 7}]
# Indexing
ls[1:3]
## [3, 5]
example[3][1]
## 123
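Here are the main list methods:
- list.append(x): append x as a single item
- list.extend(x) or +=: extend the list with the items of x
- list.insert(i,x): insert x at index i
- list.remove(x): remove the first occurrence of x
- list.pop(i): remove and return the item at index i (similar to del list[i], which removes the item without returning it)
- list.pop(): remove and return the last item
- list.sort(): sort the list in place
- list.reverse(): reverse the order in place
- list.count(x): count how many times x occurs
- list.index(x): find the index of the first occurrence of x
- list.copy(): return a (shallow) copy of the list
- list.clear(): remove all items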
a = [1,4,5]
a += [2,3]
a
## [1, 4, 5, 2, 3]
a.append([6,7])
a
## [1, 4, 5, 2, 3, [6, 7]]
a.remove([6,7])
a
## [1, 4, 5, 2, 3]
a.extend([6,7])
a
## [1, 4, 5, 2, 3, 6, 7]
a.sort()
a
## [1, 2, 3, 4, 5, 6, 7]
a.reverse()
a
## [7, 6, 5, 4, 3, 2, 1]
a.count(5)
## 1
a.index(5)
## 2
a.clear()
a
## []
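The example above skips insert(), pop() and del, and always removes a value that occurs once. This sketch (the lists b and c are illustrative) shows those methods and that remove() deletes only the first match:

```python
b = [10, 20, 30]

b.insert(1, 15)        # insert 15 before index 1
print(b)               # [10, 15, 20, 30]

last = b.pop()         # remove and return the last item
print(last, b)         # 30 [10, 15, 20]

first = b.pop(0)       # remove and return the item at index 0
print(first, b)        # 10 [15, 20]

del b[0]               # remove by index without returning the item
print(b)               # [20]

c = [1, 2, 1]
c.remove(1)            # removes only the first matching value
print(c)               # [2, 1]
```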
Be careful when assigning one list to another variable: the assignment does not copy the list, so changing one variable can unintentionally change the other. See the example below:
list1 = [1, 2, 3]
list2 = list1
list2 += [4, 5]
list1
## [1, 2, 3, 4, 5]
list2
## [1, 2, 3, 4, 5]
id(list1)
## 139724935851336
id(list2)
## 139724935851336
As we can see, changing list2 also changes list1, because list2 = list1 makes both names refer to the same list object; they have the same ID as well.
To work on list2 without changing list1, copy the list with .copy() or the slice [:]. For instance:
list1 = [1,2,3]
list2 = list1.copy() # or list2 = list1[:]
list2 += [4,5]
list1
## [1, 2, 3]
list2
## [1, 2, 3, 4, 5]
id(list1)
## 139724935851400
id(list2)
## 139724935851208
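Both .copy() and [:] make a shallow copy: the outer list is new, but any nested objects inside it are still shared. For nested lists, the standard library function copy.deepcopy() copies the inner lists too, as this sketch (with an illustrative grid list) shows:

```python
import copy

grid = [[1, 2], [3, 4]]
shallow = grid.copy()          # new outer list, same inner lists
deep = copy.deepcopy(grid)     # new outer list and new inner lists

shallow[0].append(99)          # mutates the inner list shared with grid
print(grid)                    # [[1, 2, 99], [3, 4]]
print(deep)                    # [[1, 2], [3, 4]]
print(shallow[0] is grid[0])   # True
print(deep[0] is grid[0])      # False
```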
Iterating through lists:
[x**2 for x in range(5)]
## [0, 1, 4, 9, 16]
[x**2 for x in range(5) if x**2 < 10]
## [0, 1, 4, 9]
nested = []
p = [1,2,3]
for i in p:
    nested.append([(x,i) for x in p])
nested
## [[(1, 1), (2, 1), (3, 1)],
## [(1, 2), (2, 2), (3, 2)],
## [(1, 3), (2, 3), (3, 3)]]
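The same nested list can be built with a single nested list comprehension: the outer comprehension runs over i and the inner one over x, matching the for loop above. Writing both for clauses in one comprehension instead produces a flat list, with the clauses read left to right like nested loops:

```python
p = [1, 2, 3]

nested = [[(x, i) for x in p] for i in p]
print(nested)
# [[(1, 1), (2, 1), (3, 1)], [(1, 2), (2, 2), (3, 2)], [(1, 3), (2, 3), (3, 3)]]

flat = [(x, i) for i in p for x in p]
print(flat[:3])   # [(1, 1), (2, 1), (3, 1)]
```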
Tuples
A tuple consists of a number of values separated by commas. Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking or indexing. Tuples may be input with or without surrounding parentheses.
tp = 1399,'hello',1400 # Parentheses are optional
type(tp)
## <class 'tuple'>
tp
## (1399, 'hello', 1400)
tp[0]
## 1399
tp[0] = 1390 # Tuples are immutable
## Traceback (most recent call last):
## File "<stdin>", line 1, in <module>
## TypeError: 'tuple' object does not support item assignment
singleton = 'hello', # A one-element tuple needs a trailing comma
singleton
## ('hello',)
Since tuples are immutable, there are only two methods:
- tuple.count(x): count how many times x occurs
- tuple.index(x): find the index of the first occurrence of x
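A short sketch of both tuple methods, plus unpacking, the other main way to access tuple elements (this tp has an extra 'hello' added for illustration):

```python
tp = (1399, 'hello', 1400, 'hello')
print(tp.count('hello'))    # 2
print(tp.index('hello'))    # 1, the first occurrence

# unpacking assigns each element to its own name
year, greeting, next_year, _ = tp
print(year, next_year)      # 1399 1400

# a starred name collects the remaining items into a list
first, *rest = tp
print(rest)                 # ['hello', 1400, 'hello']
```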
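With tuples we can update several variables at the same time. Assume we have two variables, a = 10 and b = 20, and we want a to become b (20) and b to become a + b (30), both computed from the original values. Assigning one variable at a time gives the wrong result, because b is computed from the already-updated a:
a = 10
b = 20
a = b
b = a + b
(a,b)
## (20, 40) # we wanted (20, 30)
# Using tuple unpacking
a = 10
b = 20
a, b = (b, a + b) # or a, b = b, a + b
(a,b)
## (20, 30)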
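The a, b = b, a + b update is exactly the step of the Fibonacci sequence, because the right-hand side is evaluated in full before either name is rebound. A minimal sketch (the function name fibonacci is an illustrative choice):

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers."""
    result = []
    a, b = 0, 1
    for _ in range(n):
        result.append(a)
        a, b = b, a + b   # both new values are computed from the old a and b
    return result


print(fibonacci(8))   # [0, 1, 1, 2, 3, 5, 8, 13]
```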
Dictionaries
Dictionaries (also called dicts) are a key data structure consisting of a set of keys and values. Unlike sequences (e.g. lists and tuples), which are indexed by a range of numbers, dictionaries are indexed by unique, immutable keys. Values, on the other hand, can be of any type (mutable or immutable) and can be duplicated. The main operations on a dictionary are storing a value with some key and extracting the value for a given key.
example = {}
type(example)
## <class 'dict'>
example['first key'] = 'value'
example[2] = 'two'
example['third key'] = 3
example
## {'first key': 'value', 2: 'two', 'third key': 3}
example['first key']
## 'value'
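Here are some dictionary methods:
- dict.update(x): update/add items from x (another dictionary or an iterable of key/value pairs)
- dict.popitem(): remove and return the last inserted (key, value) pair
- dict.pop(k): remove the item with key k and return its value
- dict.keys(): return the keys
- dict.values(): return the values
- dict.items(): return the (key, value) pairs
- dict.get(k): return the value for key k (None if k is missing)
- dict.copy(): return a (shallow) copy of the dictionary
- dict.clear(): remove all items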
For example:
state = {'new york': 'NY', 'missouri': 'MS', 'california': 'CA'}
list(state.items())
## [('new york', 'NY'), ('missouri', 'MS'), ('california', 'CA')]
list(state.keys())
## ['new york', 'missouri', 'california']
list(state.values())
## ['NY', 'MS', 'CA']
state.get('missouri')
## 'MS'
state.update({'missouri': 'MO', 'Texas': 'TX' })
state
## {'new york': 'NY', 'missouri': 'MO', 'california': 'CA', 'Texas': 'TX'}
state.get('missouri')
## 'MO'
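The remaining methods in the list can be sketched on a small illustrative dictionary. Note that get() accepts an optional default, and that popitem() removes the most recently inserted pair (insertion order is guaranteed from Python 3.7 on):

```python
capitals = {'Italy': 'Rome', 'France': 'Paris', 'Spain': 'Madrid'}

print(capitals.get('Japan'))             # None: missing key, no error
print(capitals.get('Japan', 'unknown'))  # unknown: the default value

removed = capitals.pop('France')         # remove by key and return the value
print(removed)                           # Paris

last = capitals.popitem()                # remove the most recently inserted pair
print(last)                              # ('Spain', 'Madrid')

backup = capitals.copy()
capitals.clear()
print(capitals, backup)                  # {} {'Italy': 'Rome'}
```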
Iterating through dicts:
# Example 1: counting and sorting
color = ['blue', 'red', 'white', 'green', 'green', 'red', 'red', 'white', 'red', 'green']
col = {x:color.count(x) for x in color}
col
## {'blue': 1, 'red': 4, 'white': 2, 'green': 3}
c = {k:col[k] for k in sorted(col,key = col.get,reverse = True)}
c
## {'red': 4, 'green': 3, 'white': 2, 'blue': 1}
# Example 2: make dict keys uppercase
{x.upper():y for x,y in c.items()}
## {'RED': 4, 'GREEN': 3, 'WHITE': 2, 'BLUE': 1}
# Example 3: filtering dicts by value
d = {'a': 10, 'b': 12, 'c': 20}
{x:y for x,y in d.items() if y >= 12}
## {'b': 12, 'c': 20}
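Example 1 calls color.count(x) once for every element, so it rescans the list repeatedly. The standard library's collections.Counter counts in a single pass, and its most_common() method returns the counts sorted from highest to lowest, which covers the sorting step as well:

```python
from collections import Counter

color = ['blue', 'red', 'white', 'green', 'green', 'red', 'red', 'white', 'red', 'green']
col = Counter(color)             # a dict subclass mapping item -> count
print(col['red'])                # 4
print(col.most_common())         # [('red', 4), ('green', 3), ('white', 2), ('blue', 1)]
print(dict(col.most_common()))   # {'red': 4, 'green': 3, 'white': 2, 'blue': 1}
```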
For another example, let’s consider a list of dictionaries containing the production rates of two products, id 23 and id 35, in the years 2005 and 2010:
production = [{'id':23,'year':2005,'rate':2305},{'id':35,'year':2005,'rate':3505},{'id':23,'year':2010,'rate':2310},{'id':35,'year':2010,'rate':3510}]
We can make a dictionary of production rates for each id_year combination as follows:
annual_rates = {}
for i in production:
    annual_rates['%s_%s' % (i['id'],i['year'])] = i['rate']
annual_rates
## {'23_2005': 2305, '35_2005': 3505, '23_2010': 2310, '35_2010': 3510}
To make a dictionary that maps each id to the list of its production rates over the years, we can use the collections module. collections.defaultdict(list) makes a dictionary whose missing keys start as empty lists, so we can append to them directly:
import collections
annual_rates = collections.defaultdict(list)
for i in production:
    annual_rates[i['id']].append(i['rate'])
dict(annual_rates)
## {23: [2305, 2310], 35: [3505, 3510]}
Now let’s find the total production over the years:
annual_rates_total = {}
for i in annual_rates:
    annual_rates_total[i] = sum(annual_rates[i])
annual_rates_total
## {23: 4615, 35: 7015}
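The explicit loop over the keys can also be written as a dictionary comprehension over items(). This sketch rebuilds the production data and the defaultdict from above so it runs on its own:

```python
import collections

production = [{'id': 23, 'year': 2005, 'rate': 2305}, {'id': 35, 'year': 2005, 'rate': 3505},
              {'id': 23, 'year': 2010, 'rate': 2310}, {'id': 35, 'year': 2010, 'rate': 3510}]

# group the rates by product id
annual_rates = collections.defaultdict(list)
for i in production:
    annual_rates[i['id']].append(i['rate'])

# one comprehension replaces the loop that sums each list
annual_rates_total = {k: sum(v) for k, v in annual_rates.items()}
print(annual_rates_total)    # {23: 4615, 35: 7015}
```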
Also, the dict() constructor accepts any iterable of (key, value) tuples, such as a list:
L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
dict(L)
## {'Italy': 'Rome', 'France': 'Paris', 'US': 'Washington DC'}
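When keys and values live in two separate lists, the built-in zip() produces the (key, value) tuples that dict() needs, and a generator expression works too. The list and variable names here are illustrative:

```python
countries = ['Italy', 'France', 'US']
capitals = ['Rome', 'Paris', 'Washington DC']

# zip() yields (country, capital) tuples, which dict() turns into key/value pairs
capital_of = dict(zip(countries, capitals))
print(capital_of)  # {'Italy': 'Rome', 'France': 'Paris', 'US': 'Washington DC'}

# a generator expression of pairs is also accepted
lengths = dict((c, len(c)) for c in countries)
print(lengths)     # {'Italy': 5, 'France': 6, 'US': 2}
```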
Sets
A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference.
st = {12,13,13,15,12,15}
st
## {12, 13, 15}
st[1]
## Traceback (most recent call last):
## File "<stdin>", line 1, in <module>
## TypeError: 'set' object does not support indexing
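The two basic uses named above, membership testing and removing duplicates, look like this in practice. A set does not keep the original order; dict.fromkeys() is one common way to remove duplicates while keeping first-seen order, since dictionaries preserve insertion order in Python 3.7 and later. The visits list is illustrative:

```python
visits = ['rome', 'paris', 'rome', 'oslo', 'paris']

unique = set(visits)                  # duplicates removed, order not guaranteed
print(sorted(unique))                 # ['oslo', 'paris', 'rome']
print('oslo' in unique)               # True: membership test on a set
print('lima' in unique)               # False

ordered_unique = list(dict.fromkeys(visits))   # keeps first-seen order
print(ordered_unique)                 # ['rome', 'paris', 'oslo']
```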
Note: to create an empty set, use set(), because {} creates an empty dictionary in Python:
for element in [[], (), {}]:
    print(type(element))
## <class 'list'>
## <class 'tuple'>
## <class 'dict'>
Sets have several methods, including set operations such as:
- set.add(x): add member x to the set
- set.update(x): add all members of a set/list x to the set
- set.remove(x): remove member x from the set (raises KeyError if x is missing)
- set.pop(): remove and return an arbitrary member (sets are unordered, so there is no "first" member)
- set.union(x): union of the set and a set/list x
- set.intersection(x): intersection of the set and a set/list x
- set.difference(x): members of the set that are not in the set/list x
s = {1,2,3}
t = {3,4,5}
s.union(t)
## {1, 2, 3, 4, 5}
s.intersection(t)
## {3}
s.difference(t)
## {1, 2}
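Each of these methods also has an operator form, and the symmetric difference mentioned at the start of this section (members in exactly one of the two sets) has both as well. One practical difference: the method forms accept any iterable, while the operators require both operands to be sets:

```python
s = {1, 2, 3}
t = {3, 4, 5}

print(s | t == s.union(t))                  # True, {1, 2, 3, 4, 5}
print(s & t == s.intersection(t))           # True, {3}
print(s - t == s.difference(t))             # True, {1, 2}
print(s ^ t == s.symmetric_difference(t))   # True, {1, 2, 4, 5}

# the method form accepts a list; s | [6, 7] would raise TypeError
print(s.union([6, 7]) == {1, 2, 3, 6, 7})   # True
```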
Iterating through sets:
{x**2 for x in [1,2,2,1,3]}
## {1, 4, 9}