Class 1: Python Refresher, Data Structures, Numpy

Class 1: Python Refresher, Data Structures, Numpy#

Goal of today’s class:

Make sure we’re all on the same page with Python
- Review the basics, discuss types, operations, data structures
Demonstrate best practices for writing functions, both in notebook environments and as scripts
Start discussing the basics of data summarization + communication (matplotlib)

Acknowledgement: Much of the material in this lesson is based on a previous course offered by Matteo Chinazzi and Qian Zhang.

Come in. Sit down. Open Teams.
Make sure your notebook from last class is saved.
Run python3 git_fixer2.py
Github:
- git add … (add the file that you changed, aka the _MODIFIED one)
- git commit -m “your changes”
- git push origin main

Python#

In this class, we’ll be coding in Python. There are several reasons for this, but probably most importantly it’s the one we know best (and the most commonly-used language in Network Science!). Python is a high-level general-purpose language that is object-oriented, has dynamic/strong typing, interpreted, and interactive. It’s also quite easy to learn and has the potential to create legible and ultimately reproducible code.

Object-oriented: object-oriented programming (OOP) is a type of software design where the coder defines the type of a data structure and types of operations (functions) that can be applied to the data structure. Everything in Python is an object, and almost everything has “attributes” and “methods” — also, everything is an object in the sense that it can be assigned to a variable or passed as an argument to a function.
Dynamic typing means that runtime objects (values) have a type, as opposed to static typing where variables have a type.
Strong typing means that the type of a value doesn’t suddenly change. A string containing only digits doesn’t magically become a number, as may happen in other languages (e.g. Perl). Every change of type requires an explicit conversion.
Interpreted: you don’t need to compile your code, as is the case in other languages.
Interactive: you can write and execute parts of your program while the program itself is already running.

Main Features#

(source: https://www.python.org/)

Uses an elegant syntax, making the programs you write easier to read.
Is an easy-to-use language that makes it simple to get your program working. This makes Python ideal for prototype development and other ad-hoc programming tasks, without compromising maintainability.
Comes with a large standard library that supports many common programming tasks such as connecting to web servers, searching text with regular expressions, reading and modifying files.
Python’s interactive mode makes it easy to test short snippets of code. There’s also a bundled development environment called IDLE.

Is easily extended by adding new modules implemented in a compiled language such as C or C++.
Can also be embedded into an application to provide a programmable interface.
Runs anywhere, including Mac OS X, Windows, Linux, and Unix.
Is free software in two senses. It doesn’t cost anything to download or use Python, or to include it in your application. Python can also be freely modified and re-distributed, because while the language is copyrighted it’s available under an open source license.
A variety of basic data types are available: numbers (floating point, complex, and unlimited-length long integers), strings (both ASCII and Unicode), lists, and dictionaries.

Python supports object-oriented programming with classes and multiple inheritance.
Code can be grouped into modules and packages.
The language supports raising and catching exceptions, resulting in cleaner error handling.
Data types are strongly and dynamically typed. Mixing incompatible types (e.g. attempting to add a string and a number) causes an exception to be raised, so errors are caught sooner.
Python contains advanced programming features such as generators and list comprehensions.
Python’s automatic memory management frees you from having to manually allocate and free memory in your code.

Additional Facts#

Python was conceived in the late 1980s, and its implementation was started in December 1989 by Guido van Rossum, the National Research Institute for Mathematics and Computer Science in the Netherlands (CWI) and had originally focused on users as physicists and engineers.
Python was designed from another existing language at the time, called ABC.
The name Python was taken by Guido van Rossum from british TV program Monty Python Flying Circus and there are many references to the show in language documentation.
The goals of the Python project were summarized by Tim Peters in a text called Zen of Python :

Fun…#

import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

# import antigravity

Python Today#

Source: https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/

Additional Tools#

Jupyter Notebook (https://jupyter.org/) [already installed with Anaconda]
VS Code (https://code.visualstudio.com/) or any other text editor you might prefer such as Vim, Sublime, etc..

Commonly-used Software Libraries#

(most of them are already included in Anaconda)

numpy (https://numpy.org/)
scipy (https://www.scipy.org/)
matplotlib (https://matplotlib.org/)
seaborn (https://seaborn.pydata.org/)
pandas (https://pandas.pydata.org/)
beautifulsoup (https://www.crummy.com/software/BeautifulSoup/)
selenium (https://selenium-python.readthedocs.io/)

Additional Resources#

The Unix Shell (https://swcarpentry.github.io/shell-novice/)
Version Control with Git (https://swcarpentry.github.io/git-novice/)
Databases and SQL (https://swcarpentry.github.io/sql-novice-survey/)
A gallery of interesting Jupyter Notebooks (jupyter/jupyter)

‘Sup, world?#

Part 1: Very Basics#

print("Hello, World!")

Hello, World!

print("Hello", "World!")

Hello World!

range?

# print "hello, world!"

Part 2: Variables#

Variables in the Python interpreter are created by assignment.
Variable names must start with letter or underscore (_) and be followed by letters, digits or underscores (_).
Uppercase and lowercase letters are considered different (i.e., Python is case-sensitive).

a = 2
b = 4
x = "hello"

# Printing
print("a + b =",a + b)
print("a - b =",a - b)
print("a / b = %.2f"%(a/b))
print("a / b = {:f}".format(a/b))
print(f'a^b = {(a**b):d}')

print('\n')

print(x + x + x)
print(x * a)

print(x + x + str(a))
print(x + x + str(a**b))

a + b = 6
a - b = -2
a / b = 0.50
a / b = 0.500000
a^b = 16


hellohellohello
hellohello
hellohello2
hellohello16

whos

Variable   Type      Data/Info
------------------------------
a          int       2
b          int       4
this       module    <module 'this' from '/opt<...>/lib/python3.11/this.py'>
x          str       hello

del x

whos

Variable   Type      Data/Info
------------------------------
a          int       2
b          int       4
this       module    <module 'this' from '/opt<...>/lib/python3.11/this.py'>

print(type(a))
print(type(b))

b = a * 1.5
print(type(b))

x = "hello"
print(type(x))

<class 'int'>
<class 'int'>
<class 'float'>
<class 'str'>

Scope of Variables#

Not all variables are accessible from all parts of our program, and not all variables exist for the same amount of time. Where a variable is accessible and how long it exists depend on how it is defined. We call the part of a program where a variable is accessible its scope, and the duration for which the variable exists its lifetime.

A variable that is defined in the main body of a file is called a global variable. It will be visible throughout the file, and also inside any file that imports that file. Global variables can have unintended consequences because of their wide-ranging effects, which is why we should be careful about using them.

A variable that is defined inside a function is local to that function. It is accessible from the point at which it is defined until the end of the function, and exists for as long as the function is executing. The parameter names in the function definition behave like local variables, but they contain the values that we pass into the function when we call it. When we use the assignment operator (=) inside a function, its default behaviour is to create a new local variable, unless a variable with the same name is already defined in the local scope.

(source: https://python-textbok.readthedocs.io/en/1.0/Variables_and_Scope.html)

Part 3: Types in Python#

Text Type: str
Numeric Types: int, float, complex
Sequence Types: list, tuple, range
Mapping Type: dict
Set Types: set, frozenset
Boolean Type: bool
Binary Types: bytes, bytearray, memoryview
None Type: NoneType

We’ll be working most with:

A) Booleans are either True or False (case sensitive)
B) Numbers can be integers (e.g 1 or 2), floats (1.1 and 1.2), fractions (1/2 and 2/3), or even complex numbers.
C) Lists are ordered sequences of values, defined by its elements [...].
D) Strings are sequences of Unicode characters (e.g. an html document, 'eioajfwo', etc.).
E) Tuples are ordered, immutable sequences of values (e.g. (0,1)).
F) Sets are unordered bags of values, defined using squiggly braces {}.
G) Dictionaries are unordered bags of key-value pairs (also defined with the squiggs, but requires {key:value}).
H) Bytes and byte arrays, e.g. a jpeg image file.

(A) Booleans#

Booleans are either true or false and in Python we can assign “boolean values” to variables using two constants: “True” and “False”.

True

True

False

False

True + True

True * 3

(B) Numbers#

# integers
print(1)
    
# floats
print(3.14)

# scientific notation
print(3.14e2)

# fractions
from fractions import Fraction
print(Fraction('1/2'))
print(Fraction('1/2') + 0.5)

print(Fraction('1/2') + Fraction('1/4'))
print(Fraction(1,2) + Fraction(1,4))


# complex numbers
c = 1 + 5j
type(c)

print(c.real)
print(c.imag)
print(c.conjugate())

1
3.14
314.0
1/2
1.0
3/4
3/4
1.0
5.0
(1-5j)

# cast from float to int
int(3.7)

# cast from int to float
float(3)

3.0

print(c)

(1+5j)

c.real?

Common Operations with Numbers:#

Sum (+)
Difference (-)
Multiplication (*)
Floating Point Division (/)
Integer Division (//): the result is truncated to the next lower integer, even when applied to real numbers, but in this case the result type is real too.
Module (%): returns the remainder of the division.
Power (** ): can be used to calculate the root, through fractional exponents (eg 100 ** 0.5).
Positive (+)
Negative (-)

Logical Operations:#

Less than (<)
Greater than (>)
Less than or equal to (<=)
Greater than or equal to (>=)
Equal to (==)
Not equal to (!=)

from IPython.display import display, Math

# sum
print("1 + 3 =",1+3)

# difference
display(Math(r'2-5 = {}'.format(2-5)))

# multiplication
print(5*4,'\n')

# floating point division
print(5/3)

# integer division
print(5//3,'\n')

# notice:
print(-5/3)
print(-5//3,'\n')

# modulo (returns the remainder of a division)
print(5%3,'\n')

# exponentiation
display(Math(r'2^4 = {}'.format(2**4)))

1 + 3 = 4

\[\displaystyle 2-5 = -3\]

20 

1.6666666666666667
1 

-1.6666666666666667
-2 

2 

\[\displaystyle 2^4 = 16\]

Numbers & Booleans#

Zero values evaluate to False and non-zero values evaluate to True.

x = 0
if x:
    print("do something")

(C) Lists#

Lists are collections of heterogeneous objects, which can be of any type, including other lists. They are mutable and can be changed at any time. Lists can be sliced in the same way that the strings, but as the lists are mutable, it is possible to make assignments to the list items.

my_list = ['item 1', 'item 2', [4,5], 5 ,6]
my_list

['item 1', 'item 2', [4, 5], 5, 6]

my_list = [0,1,2,3,4,5,6,7,8]

# indexing
print(my_list[0])

# slicing
print(my_list[:4])
print(my_list[4:])

print(my_list[1:])
print(my_list[:2])

print(my_list[::1])
print(my_list[::2])
print(my_list[::3])

print(my_list[1:7:2])

print(my_list[-1])
print(my_list[-2])

print(my_list[::-1])

## notice that the return value from slicing is a new list

0
[0, 1, 2, 3]
[4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 6, 7, 8]
[0, 1]
[0, 1, 2, 3, 4, 5, 6, 7, 8]
[0, 2, 4, 6, 8]
[0, 3, 6]
[1, 3, 5]
8
7
[8, 7, 6, 5, 4, 3, 2, 1, 0]

my_list = my_list + [10,11]
my_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11]

my_list.extend([12,13])
my_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13]

my_list.insert(7,'dog')
my_list

[0, 1, 2, 3, 4, 5, 6, 'dog', 7, 8, 10, 11, 12, 13]

# search for values in a list
'dog' in my_list

True

my_list.count('dog')

my_list.index('dog')

# remove items from a list

my_list.remove('dog')
my_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13]

my_list.pop()

my_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12]

my_list.pop(-3)

my_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12]

# sum the elements of an iterable (e.g. a list)
print(sum(my_list))

# find the maximum value of the elements of an iterable (e.g. a list)
print(max(my_list))

# find the minimum value of the elements of an iterable (e.g. a list)
print(min(my_list))

# get the length of an object (e.g. a list)
print(len(my_list))

my_list = ['random item']*10
my_list

['random item',
 'random item',
 'random item',
 'random item',
 'random item',
 'random item',
 'random item',
 'random item',
 'random item',
 'random item']

len(my_list)

if my_list:
    print("do something")

do something

Lists and Booleans#

An empty list is False, any list with at least one item is true regardless of the actual value of the items.

(D) Strings#

Strings are immutable: you can not add, remove, or change any character in a string. To perform these operations, Python needs to create a new string.
In Python 3, all strings are sequences of Unicode characters and can be encoded either as unicode objects or as bytes object.
Unicode is an international encoding standard for use with different languages and scripts, by which each letter, digit, or symbol is assigned a unique numeric value that applies across different platforms and programs.

j = '伯恩茅斯'
# length of a string
print("len(j) =",len(j))

print()
# access characters
for c in j:
    print(c)

len(j) = 4

伯
恩
茅
斯

j[2]

'茅'

x = j[2]
x

'茅'

j[2] = 'c'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[44], line 1
----> 1 j[2] = 'c'

TypeError: 'str' object does not support item assignment

# formatting a string
s = '  heLlo, wOrld!  '

s.upper()

s.lower()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-45-32dd64412b48> in <module>
----> 1 s.lower()

NameError: name 's' is not defined

s.title()

Question: What does the following do?

# how to split a string
s.split()

s.split(',')

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-46-2be51098b857> in <module>
----> 1 s.split(',')

NameError: name 's' is not defined

s.strip()

s.rstrip()

s.lstrip()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-47-e02c2e109a92> in <module>
----> 1 s.lstrip()

NameError: name 's' is not defined

String vs Bytes Objects#

ss = '伯恩茅斯'
ss

# encode a string
ss.encode()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-48-5ba7a394e7d4> in <module>
      1 # encode a string
----> 2 ss.encode()

NameError: name 'ss' is not defined

# encode + decode
ss.encode().decode()

# string lenght vs bytes length
print(len(ss.encode()))
print(len(ss))

UTF-8 encoding table and Unicode characters

https://www.utf8-chartable.de/

Note: Python 3 assumes that your source code (i.e., your .py file) is encoded in utf-8.

# from bytes to string
(b'\x7e').decode()

'~'

è = 10

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-51-094e3afb2fe8> in <module>
----> 1 e

NameError: name 'e' is not defined

(E) Tuples#

Similar to lists, but immutable: its not possible to append, delete or make assignments to the items. Syntax:

my_tuple = (a, b, ..., z)

The parentheses are optional when defining a tuple.

Feature: a tuple with only one element is represented as:

t1 = (1,)

The tuple elements can be referenced the same way as the elements of a list:

first_element = tuple[0]

Lists can be converted into tuples:

my_tuple = tuple(my_list)

And tuples can be converted into lists:

my_list = list(my_tuple)

While tuple can contain mutable elements, these elements can not undergo assignment, as this would change the reference to the object.

Example (using the interactive mode):

>>> t = ([1, 2], 4)
>>> t[0].append(3)
>>> t
([1, 2, 3], 4)

>>> t[0] = [1, 2, 3]
Traceback (most recent call last):
  File "<input>", line 1, in ?
TypeError: object does not support item assignment

t = ( [1, 2], 4 )
type(t)

tuple

t[0] = 10

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-88963aa635fa> in <module>
----> 1 t[0] = 10

TypeError: 'tuple' object does not support item assignment

t[0].append('new_value')

([1, 2], 4)

['a']

['a']

('a')

'a'

('a',)

('a',)

Useful Tricks

In Python, you can use a tuple to assign multiple values at once:

(x,y) = (3,7)
(x,y) = (y,x)
print(x,y)

7 3

So what are tuples good for?

Tuples are faster than lists. If you’re defining a constant set of values and all you’re ever going to do with it is iterate through it, use a tuple instead of a list.
It makes your code safer if you “write-protect” data that doesn’t need to be changed. Using a tuple instead of a list is like having an implied assert statement that shows this data is constant, and that special thought (and a specific function) is required to override that.
Some tuples can be used as dictionary keys (specifically, tuples that contain immutable values like strings, numbers, and other tuples). Lists can never be used as dictionary keys, because lists are not immutable.

Tuples and Booleans#

In a boolean context, an empty tuple is false.
Any tuple with at least one item is true. The value of the items is irrelevant.
Note that to create a tuple of one item, you need a comma after the value. Without the comma, Python just assumes you have an extra pair of parentheses, which is harmless, but it doesn’t create a tuple.

(F) Sets#

A set is an unordered “bag” of unique values. A single set can contain values of any immutable datatype. Once you have two sets, you can do standard set operations like union, intersection, and set difference.

# create a set
my_set = {1}
my_set

{1}

my_set = set([1,1,2,3,4,5])
my_set

{1, 2, 3, 4, 5}

type(my_set)

set

# create an empty set
empty_set = set()
empty_set

set()

my_set.add(6)
my_set

{1, 2, 3, 4, 5, 6}

my_set.add(3)
my_set

{1, 2, 3, 4, 5, 6}

# add elements to a set
my_set.add(10)
print(my_set)

my_set.update({2,4,6})
print(my_set)

my_set.update([3,7,9])
print(my_set)

my_set.update({21,41,61},{0,1,2})
print(my_set)

{1, 2, 3, 4, 5, 6, 10}
{1, 2, 3, 4, 5, 6, 10}
{1, 2, 3, 4, 5, 6, 7, 9, 10}
{0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 41, 21, 61}

# remove elements from a set
my_set.remove(3)
print(my_set)

my_set.discard(7)
print(my_set)

my_set.discard(3)
print(my_set)

my_set.remove(3)
print(my_set)

{0, 1, 2, 4, 5, 6, 7, 9, 10, 41, 21, 61}
{0, 1, 2, 4, 5, 6, 9, 10, 41, 21, 61}
{0, 1, 2, 4, 5, 6, 9, 10, 41, 21, 61}

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-66-f77ba5d01082> in <module>
      9 print(my_set)
     10 
---> 11 my_set.remove(3)
     12 print(my_set)

KeyError: 3

set_a = {0,1,2,3}
set_b = {3,4,5,6}

set_a.union(set_b)

{0, 1, 2, 3, 4, 5, 6}

set_a.intersection(set_b)

{3}

set_a.difference(set_b)

{0, 1, 2}

set_a.symmetric_difference(set_b)

{0, 1, 2, 4, 5, 6}

set_a - set_b

{0, 1, 2}

set_a & set_b

{3}

set_a | set_b

{0, 1, 2, 3, 4, 5, 6}

Sets and Booleans#

In a boolean context, an empty set is false.
Any set with at least one item is true. The value of the items is irrelevant.

(G) Dictionaries#

A dictionary is an unordered set of key-value pairs.
When you add a key to a dictionary, you must also add a value for that key (you can always change the value later).
Python dictionaries are optimized for retrieving the value when you know the key, but not the other way around.
Keys must be an immutable type, usually strings, but can also be tuples or numeric types.
Values can be either mutable or immutable.

# create dictionary

my_dict = { 'monday': 0, 'tuesday': 1 , 'wed': 'another day', 1:'tuesday' }
my_dict

{'monday': 0, 'tuesday': 1, 'wed': 'another day', 1: 'tuesday'}

my_dict['monday']

my_dict['tuesday']

my_dict['sda']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-78-7f98da8a8c6d> in <module>
----> 1 my_dict['sda']

KeyError: 'sda'

# remove elements
del my_dict['tuesday']
my_dict

{'monday': 0, 'wed': 'another day', 1: 'tuesday'}

my_dict.keys()

dict_keys(['monday', 'wed', 1])

my_dict.values()

dict_values([0, 'another day', 'tuesday'])

my_dict.items()

dict_items([('monday', 0), ('wed', 'another day'), (1, 'tuesday')])

# updated dict_keys
my_dict['monday'] = 100
my_dict

{'monday': 100, 'wed': 'another day', 1: 'tuesday'}

Dictionaries and Booleans#

In a boolean context, an empty dictionary is false.
Any dictionary with at least one key-value pair is true.

if my_dict:
    print('it is true')
    
empty_dict = dict([])
if empty_dict:
    print('it is true')

it is true

Part 4: Operations, Loops, etc.#

Python’s for statement iterates over the items of any iterable sequence (e.g. a list, a tuple, a set, or a string), in the order that they appear in the sequence.

my_list = [1,2,'a','b',[3,4]]
for item in my_list:
    print(item)

1
2
a
b
[3, 4]

my_set = set([1,2,3,4,4,4])
for item in my_set:
    print(item)

The `range()` function#

If you do need to iterate over a sequence of numbers, the built-in function range() comes in handy. It generates arithmetic progressions:

for i in range(4):
    print(i)

for i in range(5):
    print(i)

# Reversed order
for i in reversed(range(6)):
    print(i)
print('---')
for item in reversed(my_list):
    print(item)

5
4
3
2
1
0
---
[3, 4]
b
a
2
1

new_list = []
for i in range(len(my_list)):
    new_list.append(my_list[-1-i])
    
new_list

[[3, 4], 'b', 'a', 2, 1]

my_list[::-1]

[[3, 4], 'b', 'a', 2, 1]

simple_list = list(range(10))
simple_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

simple_list[0:3]

[0, 1, 2]

my_list[-1:-len(my_list)-1:-1]

[[3, 4], 'b', 'a', 2, 1]

for item in my_list[::-1]:
    print(item)

[3, 4]
b
a
2
1

Sorted order#

# random list of numbers
from random import randint
numbers = []
for __ in range(100):
    numbers.append(randint(0,1000))

randint?

Signature: randint(a, b)
Docstring:
Return random integer in range [a, b], including both end points.
        
File:      /usr/local/anaconda3/envs/covid/lib/python3.8/random.py
Type:      method

for item in sorted(numbers):
    print(item, end = ',')

6,11,14,14,39,42,55,86,98,118,131,136,158,168,173,198,221,221,226,246,252,255,283,302,303,304,304,308,308,320,330,342,344,346,348,350,360,372,391,447,454,463,467,473,476,483,486,489,493,509,518,522,534,551,558,579,601,622,627,634,654,660,660,662,682,684,686,693,693,695,697,712,715,727,727,735,739,777,782,804,813,818,821,824,831,839,882,901,903,903,935,939,950,953,963,968,972,973,989,995,

Enumerate function#

If you do need to iterate over a sequence of elements keeping track of the index/position of the element:

new_list = []
c = 0
for item in my_list:
    print(c, item)
    new_list.append((c,item))
    c += 1
    
new_list

1
2
a
b
[3, 4]

[(0, 1), (1, 2), (2, 'a'), (3, 'b'), (4, [3, 4])]

for x in new_list:
    c = x[0]
    item = x[1]
    print(c,item)

1
2
a
b
[3, 4]

x,y = (10,20)
print(x, y)

10 20

for c, item in new_list:
    print(c,item)

1
2
a
b
[3, 4]

strange_list = [(1,['a','b']),(2,['c','d'])]

strange_list[0]

(1, ['a', 'b'])

for i, (x,y) in strange_list:
    print(i,x,y)

1 a b
2 c d

for c, item in enumerate(my_list):
    print(c,item)

1
2
a
b
[3, 4]

Zip function#

We can use “zip” to loop over two or more sequences at the same time.

zip creates an iterator that aggregates elements from each of the iterables:

list_a = ['a','b','c']
list_b = [1,2,3]
list_c = ['x','y','z']

merged_list = [('a',1,'x'), ('b',2,'y'), ('c',3,'z')]

for i,j,k in zip(list_a,list_b,list_c):
    print(i,j,k)

a 1 x
b 2 y
c 3 z

zip(list_a,list_b,list_c)

<zip at 0x7fc42002e840>

list(zip(list_a,list_b,list_c))

[('a', 1, 'x'), ('b', 2, 'y'), ('c', 3, 'z')]

sources_nodes = [1,2,4,6]
target_nodes = [5,1,5,9]
weights = [3.0,2.0,1.3,3.2]

edge_list = list(zip(sources_nodes,target_nodes))
edge_list

[(1, 5), (2, 1), (4, 5), (6, 9)]

graph = {}
for source_node, target_node, weight in zip(sources_nodes,target_nodes,weights):
    graph[(source_node,target_node)] = weight

graph

{(1, 5): 3.0, (2, 1): 2.0, (4, 5): 1.3, (6, 9): 3.2}

print(edge_list,'\n')
print(list(zip(*edge_list)))

[(1, 5), (2, 1), (4, 5), (6, 9)] 

[(1, 2, 4, 6), (5, 1, 5, 9)]

ss, tt = zip(*edge_list)

sources = [0,0,0,0,1,1,2,3,3,3]
targets = [1,2,3,4,4,5,6,7,8,9]
el = list(zip(sources,targets))

# what will this do?
set(list(zip(*el))[0]+list(zip(*el))[1])

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

More on loops#

For loop#

for i in range(4):
    print(i)

While loop#

limit = 4
z = 0

while True:
    if z < limit:
        z += 1

    else:
        break

limit = 4
z = 0

while z < limit:
    print(z)
    z += 1

Break, continue, and pass statements#

The `break` statement#

Similar to other languages (e.g. C), this statement “breaks” out of the innermost enclosing for or while loop.

empty_list = []
for number in range(10):
    empty_list.append(number)
    if len(empty_list) > 5:
        break

print(empty_list)

[0, 1, 2, 3, 4, 5]

partial_sum = 0
i = 0
while True:
    partial_sum += numbers[i]
    i += 1
    if partial_sum > 3000:
        break

print(partial_sum)

partial_sum = 0
i = 0
while partial_sum<=3000:
    partial_sum += numbers[i]
    i += 1

print(partial_sum)

partial_sum = 0
while True:
    for i, number in enumerate(numbers):
        partial_sum += number
        if partial_sum > 3000:
            break
    break

print(partial_sum)

The `continue` statement#

Also borrowed from C, this statement “continues” with the next iteration of the loop.

empty_list = []
for n in range(10):
    if n == 5:
        continue
    empty_list.append(n)

print(empty_list)

[0, 1, 2, 3, 4, 6, 7, 8, 9]

empty_list = []
for n in range(10):
    if n != 5:
        empty_list.append(n)

print(empty_list)

[0, 1, 2, 3, 4, 6, 7, 8, 9]

The pass statement#

pass does nothing. It can be used when a statement is required syntactically but the program requires no action.

# e.g. this line will yield a SyntaxError
for i in range(10):
    

  File "<ipython-input-126-39d65b6213c2>", line 2
    
    ^
SyntaxError: unexpected EOF while parsing

for i in range(10):
    pass

def some_function():
    # this should do X
    pass

Part 5: Functions#

In Python, functions:

Can return objects or not.
Can (should) have Doc Strings.
Accepts optional parameters (with defaults). If no parameter is passed, it will be equal to the default defined in the function.
Accepts parameters to be passed by name. In this case, the order in which the parameters were passed does not matter.
Have their own namespace (local scope), and therefore may obscure definitions of global scope.
Can have their properties changed (usually by decorators).

def hello():
    print("Hello, World!")
    return

hello

<function __main__.hello()>

hello()

Hello, World!

def hello(data=None):
    """
    This is a function that takes anything as `data` and returns
    a string depending on the input.

    Parameters:
    data (type): Description of the first parameter.

    Returns:
    out (str): String with the answer.
    """

    out = ''
    if data:
        out += "Hello, World!"

    else:
        out += 'Set "data" to something!'


    return out

print(hello())
print()
print(hello(x))

Set "data" to something!

Hello, World!

Making Your Functions Readable#

When you write a function, you want to be able to look at it a year from when you wrote it and still know what’s going on. There are several ways you can do this; one really helpful thing to use is a docstring.

A docstring is a long string that usually shows up right after you declare a function (so after def blah(inputs):). It should look like this, though there’s some flexibility with respect to what you do and don’t include:

def blah(inputs):
    """
    Imperative statement explaining what the function does.
    
    Inputs:
    input1 (type): describes the first parameter
    input2 (type): describes the second parameter
    ...
    
    Outputs:
    output1 (type): describes the first output
    output2 (type): describes the second output
    ...
    """
    z = some_code(inputs)
    y = do_something(z)
    # comments should be used to clarify things that look strange 
    # you can also use them to label chunks of code that perform a particular operation together.
    
    return output1, output2

You can also indicate the types of your inputs and outputs in the function declaration, and you can set default values for the keyword arguments. Keyword arguments are optional inputs (if there is no input, the keyword argument will be set to its default value) and must appear after all required inputs are declared.

def blah1(my_number: float, my_string: str, my_bool: bool = False) -> str:
    """
    Do a string manipulation on the inputs or return a default phrase.
    
    Inputs:
    my_number (float): a number that is a float
    my_string (string): a string of any length
    my_bool (boolean): default False; indicates whether the default phrase will be returned
    
    Outputs:
    a string that is either "it was true all along!" 
    or my_number and my_string concatenated into one string.
    """
    if my_bool:
        return "it was true all along!"
    else:
        return str(my_number) + ' ' + my_string
    
print(blah1(0.5, '0.5', True))
print(blah1(0.5, 'potatoes'))

it was true all along!
0.5 potatoes

Scope#

Recall from the very beginning of this lesson, you should also be careful about scope when writing a function. When you’re writing a function, you will have variables that are local—they only exist inside the function, and they don’t exist outside it. Other variables that you declare outside the function will not be known about in the world of the function itself. Try out this block of code, then uncomment the line that’s commented out. What happens, and why?

# Your Turn!

NUM_POTATOES = 5
print('we have this many potatoes: ', NUM_POTATOES)


def potato_peeler(words:str) -> str:
    """
    Return a string plus the phrase "but we have 4 POTATOES".
    
    Inputs:
    words (str): any string
    
    Outputs:
    (str): words plus "but we have 4 POTATOES"
    
    """
    # print('now we have this many potatoes: ', NUM_POTATOES)
    NUM_POTATOES = 4
    print('but now we have this many potatoes: ', NUM_POTATOES)
    return words + ' but we have ' + str(NUM_POTATOES) + ' POTATOES'


print(potato_peeler('my dog is barking'))
print('finally we have this many potatoes: ', NUM_POTATOES)

we have this many potatoes:  5
but now we have this many potatoes:  4
my dog is barking but we have 4 POTATOES
finally we have this many potatoes:  5

Part 6: NumPy#

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. NumPy is licensed under the BSD license, enabling reuse with few restrictions.

Source: https://numpy.org/

NumPy for MATLAB users: https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html

Citation: Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). https://doi.org/10.1038/s41586-020-2649-2

Basic array operations#

import numpy as np

arr = np.array([0,1,2,3])
print("type of arr is:",type(arr))
arr

type of arr is: <class 'numpy.ndarray'>

array([0, 1, 2, 3])

Numpy array: Memory-efficient container that provides fast numerical operations.

L = list(range(1000))
%timeit [i**2 for i in L]

206 µs ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

a = np.arange(1000)
%timeit a ** 2

626 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

# array dimension and shape
a.shape

(1000,)

a.ndim

# 2D array
arr2d = np.array([[0,1,2],[3,4,5]])
arr2d

array([[0, 1, 2],
       [3, 4, 5]])

arr2d.ndim

arr2d.shape

(2, 3)

# m-D arrays
a = np.arange(5)
b = np.array([a,a])
c = np.array([b,b,b])

print(a.shape)
print(b.shape)
print(c.shape)

(5,)
(2, 5)
(3, 2, 5)

# built-in functions to create simple arrays
# evenly spaced array (similar to range in standard Python)
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.arange(1,10,0.2)

array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2, 3.4,
       3.6, 3.8, 4. , 4.2, 4.4, 4.6, 4.8, 5. , 5.2, 5.4, 5.6, 5.8, 6. ,
       6.2, 6.4, 6.6, 6.8, 7. , 7.2, 7.4, 7.6, 7.8, 8. , 8.2, 8.4, 8.6,
       8.8, 9. , 9.2, 9.4, 9.6, 9.8])

# linear space 
np.linspace(0,1,101)

array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
       0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
       0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 , 0.31, 0.32,
       0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43,
       0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 , 0.51, 0.52, 0.53, 0.54,
       0.55, 0.56, 0.57, 0.58, 0.59, 0.6 , 0.61, 0.62, 0.63, 0.64, 0.65,
       0.66, 0.67, 0.68, 0.69, 0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76,
       0.77, 0.78, 0.79, 0.8 , 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87,
       0.88, 0.89, 0.9 , 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98,
       0.99, 1.  ])

np.linspace(0,1,100, endpoint = False)

array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ,
       0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 , 0.21,
       0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 , 0.31, 0.32,
       0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 , 0.41, 0.42, 0.43,
       0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 , 0.51, 0.52, 0.53, 0.54,
       0.55, 0.56, 0.57, 0.58, 0.59, 0.6 , 0.61, 0.62, 0.63, 0.64, 0.65,
       0.66, 0.67, 0.68, 0.69, 0.7 , 0.71, 0.72, 0.73, 0.74, 0.75, 0.76,
       0.77, 0.78, 0.79, 0.8 , 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87,
       0.88, 0.89, 0.9 , 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98,
       0.99])

# log space
np.logspace(0,10,6, base=2)

array([1.000e+00, 4.000e+00, 1.600e+01, 6.400e+01, 2.560e+02, 1.024e+03])

np.diff(np.log10(np.logspace(0, 10, 6, base = 10)))

array([2., 2., 2., 2., 2.])

np.logspace(0, 10, 6, base = 10)

array([1.e+00, 1.e+02, 1.e+04, 1.e+06, 1.e+08, 1.e+10])

np.log10(np.logspace(0, 10, 6, base = 10))

array([ 0.,  2.,  4.,  6.,  8., 10.])

np.diff(np.log10(np.logspace(0, 10, 6, base = 10)))

array([2., 2., 2., 2., 2.])

# Common arrays
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

np.ones((5,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

np.zeros((4,4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

np.eye(5,5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

NumPy datatypes (examples)#

np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

np.eye(5, dtype = int)

array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1]])

np.eye(5, dtype = np.unicode_)

array([['1', '', '', '', ''],
       ['', '1', '', '', ''],
       ['', '', '1', '', ''],
       ['', '', '', '1', ''],
       ['', '', '', '', '1']], dtype='<U1')

Slicing and Indexing#

The items of an array can be accessed and assigned to the same way as other Python sequences (e.g. lists):

a = np.arange(100)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

a[1]

a[1:10]

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

a[::-1]

array([99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83,
       82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66,
       65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49,
       48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32,
       31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15,
       14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0])

Illustrations of numpy array slicing and indexing

new_arr = np.arange(36).reshape((6,6))
new_arr

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

# slicing example
new_arr[4:,4:]

array([[28, 29],
       [34, 35]])

# boolean mask
new_arr[new_arr>10] 

array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
       28, 29, 30, 31, 32, 33, 34, 35])

new_arr[[1,2],[3,4]]

array([ 9, 16])

# indexing with an array
a = np.arange(100)+10
b = np.arange(5,15)

a[b]

array([15, 16, 17, 18, 19, 20, 21, 22, 23, 24])

# notes on array copy
a = np.arange(10)
b = a[::2]
print(a)
print(b)

[0 1 2 3 4 5 6 7 8 9]
[0 2 4 6 8]

np.may_share_memory(a, b)

True

b[1] = -3

array([ 0,  1, -3,  3,  4,  5,  6,  7,  8,  9])

a = np.arange(10)
b = a[::2].copy()
print(a)
print(b)

[0 1 2 3 4 5 6 7 8 9]
[0 2 4 6 8]

np.may_share_memory(a, b)

False

NumPy random functions#

(Source: https://numpy.org/doc/stable/reference/random/index.html)

list_to_choose_from = [1,2,3,4,5]
print(np.random.choice(list_to_choose_from))

list_to_choose_from = [1,2,3,4,5,200,300,23,22]
for _ in range(5):
    print(np.random.choice(list_to_choose_from))

list_to_choose_from = [1,2,3,4,5,200,300,23,22,0]
probs = [0.91]+[0.01]*(len(list_to_choose_from)-1)
for _ in range(5):
    print(np.random.choice(list_to_choose_from, p=probs))

list_to_choose_from = [1,2,3,4,5]
print("replace=False:",np.random.choice(list_to_choose_from, size=5, replace=False))

list_to_choose_from = [1,2,3,4,5]
print("replace=True:",np.random.choice(list_to_choose_from, size=5, replace=True))

replace=False: [4 3 1 2 5]
replace=True: [1 5 1 1 5]

np.random.randint(0,2,size=100)

array([1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1,
       0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1,
       0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0])

np.random.randint(0,200,size=(5,5))

array([[ 60, 160, 182, 103,  28],
       [ 66,  25,  70, 191,  57],
       [160, 157,  36, 194,  49],
       [ 89,  46,  27, 112, 129],
       [148, 136, 129, 162,  98]])

np.random.normal(0,1,size=20)

array([ 1.38538459, -0.22499655,  0.44265991,  1.84486781,  0.91635511,
       -1.49724389, -1.56061478,  0.40359774, -0.22923305,  1.26643928,
        0.6307696 , -0.30890966, -0.39230262,  0.06575317, -0.10915068,
        1.01247965, -0.56251802, -0.8125841 ,  1.9370336 ,  1.75435461])

probs = [0.9]+[0.01]*10

np.random.multinomial(100, pvals=probs)

array([90,  0,  1,  0,  2,  1,  0,  1,  2,  2,  1])

np.random.multinomial(100, pvals=probs, size=5)

array([[92,  0,  0,  1,  1,  0,  0,  4,  1,  0,  1],
       [91,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [86,  0,  0,  0,  3,  0,  0,  6,  1,  2,  2],
       [90,  2,  1,  1,  0,  2,  2,  0,  1,  0,  1],
       [92,  2,  3,  1,  0,  0,  1,  0,  0,  1,  0]])

A = np.random.randint(1,6,size=(3,3))
A

array([[2, 2, 3],
       [1, 1, 5],
       [1, 4, 5]])

NumPy Arithmetics, etc.#

np.sum(A)

A.sum(axis = 0)

array([ 4,  7, 13])

A.sum(axis = 1)

array([ 7,  7, 10])

A.T

array([[2, 1, 1],
       [2, 1, 4],
       [3, 5, 5]])

A.transpose()

array([[2, 1, 1],
       [2, 1, 4],
       [3, 5, 5]])

A.reshape(9)

array([2, 2, 3, 1, 1, 5, 1, 4, 5])

A.reshape(9).reshape((3,3))

array([[2, 2, 3],
       [1, 1, 5],
       [1, 4, 5]])

A.reshape(9).reshape((3,3)).reshape(-1)

array([2, 2, 3, 1, 1, 5, 1, 4, 5])

A.flatten()

array([2, 2, 3, 1, 1, 5, 1, 4, 5])

np.log(A)

array([[0.69314718, 0.69314718, 1.09861229],
       [0.        , 0.        , 1.60943791],
       [0.        , 1.38629436, 1.60943791]])

np.log10(A)

array([[0.30103   , 0.30103   , 0.47712125],
       [0.        , 0.        , 0.69897   ],
       [0.        , 0.60205999, 0.69897   ]])

np.sqrt(A)

array([[1.41421356, 1.41421356, 1.73205081],
       [1.        , 1.        , 2.23606798],
       [1.        , 2.        , 2.23606798]])

np.power(A,2)

array([[ 4,  4,  9],
       [ 1,  1, 25],
       [ 1, 16, 25]])

array([[2, 2, 3],
       [1, 1, 5],
       [1, 4, 5]])

A*A

array([[ 4,  4,  9],
       [ 1,  1, 25],
       [ 1, 16, 25]])

A@A

array([[ 9, 18, 31],
       [ 8, 23, 33],
       [11, 26, 48]])

A.dot(A)

array([[ 9, 18, 31],
       [ 8, 23, 33],
       [11, 26, 48]])

np.dot(A,A)

array([[ 9, 18, 31],
       [ 8, 23, 33],
       [11, 26, 48]])

np.vstack([A,A])

array([[2, 2, 3],
       [1, 1, 5],
       [1, 4, 5],
       [2, 2, 3],
       [1, 1, 5],
       [1, 4, 5]])

np.hstack([A,A])

array([[2, 2, 3, 2, 2, 3],
       [1, 1, 5, 1, 1, 5],
       [1, 4, 5, 1, 4, 5]])

array([[2, 2, 3],
       [1, 1, 5],
       [1, 4, 5]])

np.diag(A)

array([2, 1, 5])

np.diag([1,2,3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

np.unique(A)

array([1, 2, 3, 4, 5])

Part 7: Matplotlib#

import matplotlib.pyplot as plt

npoints = 1000
data = np.random.normal(0,1,npoints)

plt.hist(data);

_images/588defe7b6ebcbe420bacc107a44ae11511461f0266c554a5d9f6f33c0e35c87.png

plt.hist(data,bins=20,density=True);

_images/d272d6ed9c80dd87ee5e01b9a4cf4ac27789615e6ed9665cac121a6f35ecee81.png

plt.hist(data,bins=20,lw=1,ec='k',fc='.8',density=True);

_images/c0fb3b5713da0931ccd4afe805050fc528765078451d21587cb49c273c31362d.png

Matplotlib Tip 1:#

Problem: “I never remember which matplotlib functions have “set_” in front of them and which dont… also how do you make log-scaled axes? Also ughhh.”

Solution: Commit to making everything a subplot.

w = 4
h = 3
fig, ax = plt.subplots(1, 1, figsize=(w,h), dpi=150)


plt.show()

_images/d811ea04262df37985511dd6d18e9185b9659c38e368a63723bd6e815f85e94b.png

w = 6.0
h = 4.0
fig, ax = plt.subplots(1, 1, figsize=(w,h), dpi=150)

# note the "ax" instead of the "plt"!
ax.hist(data,bins=20,lw=1,ec='k',fc='.8',density=True)

plt.show()

_images/ef91c650d6644b694a614c3674660bdca2e58953835048a9d1b3ea3c22d25119.png

w = 6.0
h = 4.0
fig, ax = plt.subplots(1, 1, figsize=(w,h), dpi=150)

data_label = "legend if you'd like"
ax.hist(data,bins=20,lw=1,ec='k',fc='.8',density=True,label=data_label)

ax.legend(loc=2,fontsize='small')
ax.set_title('Histogram of 1,000 randomly sampled values',fontsize=14)
ax.set_xlabel('Always label your axes')
ax.set_ylabel('Alwasy label your axes')

## uncomment below to get the grid's default placement to be behind the bars
# from matplotlib import rc
# rc('axes', axisbelow=True)
ax.grid()

plt.show()

_images/1313780a95c92a4c038e07583b8e3df9f7a2710b494bc390f8730f7b95e14086.png

Matplotlib Tip 2:#

Think about future you. You might be sharing these figs with colleagues. In that case, a simple png is nice. Same with slides. If you’re putting them into your manuscript, 97% of the time, you’re gonna wanna use pdfs.

I usually have a directory structure that looks something like:

code/
- this_notebook.ipynb
- other_dot_py_files.py
data/
figs/
- pngs/
  - figure1.png
  - etc…
- pdfs/
  - figure1.pdf
  - etc…

# uncomment below to get the grid's default placement to be behind the bars
from matplotlib import rc
rc('axes', axisbelow=True)

# uncomment below if you want the output images to have a white (as opposed to transparent) background
rc('axes', fc='w')
rc('figure', fc='w')
rc('savefig', fc='w')

w = 6.0
h = 4.0
fig, ax = plt.subplots(1, 1, figsize=(w,h), dpi=150)

data_label = "legend if you'd like"
ax.hist(data,bins=20,lw=1,ec='k',fc='.8',density=True,label=data_label)

ax.legend(loc=2,fontsize='small')
ax.set_title('Histogram of 1,000 randomly sampled values',fontsize=14)
ax.set_xlabel('Always label your axes')
ax.set_ylabel('Alwasy label your axes')

ax.grid()

plt.savefig('images/pngs/class_01_figure1.png', dpi=425, bbox_inches='tight')
plt.savefig('images/pdfs/class_01_figure1.pdf', bbox_inches='tight')

plt.show()

_images/edb44c9bedb8a6fc12f3a899139a7ebf755f688cc60e9b0752a1024166d444c8.png

Matplotlib Tip 3#

Different SIZED subplOTS

w = 4
h = 3
nrows = 3
ncols = 4
fig, ax = plt.subplots(nrows, ncols, figsize=(w*ncols,h*nrows), dpi=150,
                       gridspec_kw={"width_ratios": [0.5, 2.0, 1.0, 0.5], 
                                    "height_ratios":[0.6, 1.0, 0.5]})

plt.show()

_images/a6ae96e28e20c671086bc830be1fb3c7abc7bf4b9375d54207e18c237b79460a.png

w = 4
h = 3
nrows = np.random.randint(6)+2
ncols = np.random.randint(6)+2
fig, ax = plt.subplots(nrows, ncols, figsize=(w*ncols,h*nrows), dpi=150,
                       gridspec_kw={"width_ratios": np.random.uniform(0.1, 1.5, ncols),
                                    "height_ratios":np.random.uniform(0.1, 1.5, nrows)})

plt.show()

_images/56f7a308b1bddf0c6ac76152409a395fa5c6ac36a1a5e03f89769d4e27691d82.png

Matplotlib Tip 4#

Spacing! Between! __ - _ _ - _ Your! _ _ _ Subplots

w = 4
h = 3
nrows = np.random.randint(6)+2
ncols = np.random.randint(6)+2
fig, ax = plt.subplots(nrows, ncols, figsize=(w*ncols,h*nrows), dpi=150,
                       gridspec_kw={"width_ratios": np.random.uniform(0.1, 1.5, ncols),
                                    "height_ratios":np.random.uniform(0.1, 1.5, nrows)})

plt.subplots_adjust(wspace=0.50, hspace=0.05)
# not that much "heightspace" between subplots, kinda a lot of "widthspace"

plt.show()

_images/434fcf3ba873e1c950e1b58d4371a96622e54169a1e5bb2f6bfff02e4f6e5334.png

Matplotlib Tip 5:#

Text! Place it great!

There are two ways / ideaologies about how to place text on a subplot.

My text is being placed in a specific spot on the subplot and all I need to know is coordinates between 0 and 1
My text is data-relevant and as such I need my xpos and ypos to have the same domain as my data

w = 4.5
h = 3
ncols = 2
nrows = 1
fig, ax = plt.subplots(nrows, ncols, figsize=(w*ncols,h*nrows), dpi=150)

actualtext = 'ax.transAxes makes the xlim and ylim [0,1]\nmaking positioning text easier'

############ ax[0]
xpos0 = 0.5
ypos0 = 0.5
ax[0].text(xpos0, ypos0, actualtext, transform=ax[0].transAxes,
        ha='center', va='top', color='.3', fontsize='small', fontweight='bold')
ax[0].vlines(xpos0, 0, 1, linestyle=':', linewidth=1, color='crimson', label='ha="center"')
ax[0].hlines(ypos0, 0, 1, linestyle=':', linewidth=1, color='orange', label='va="top"')
ax[0].legend(fontsize='small', loc=2)

############ ax[1]
xpos1 = 0.9
ypos1 = 0.2
ax[1].text(xpos1, ypos1, actualtext, transform=ax[1].transAxes,
        ha='right', va='bottom', color='.3', fontsize='small')
ax[1].vlines(xpos1, 0, 1, linestyle=':', linewidth=1, color='crimson', label='ha="right"')
ax[1].hlines(ypos1, 0, 1, linestyle=':', linewidth=1, color='orange', label='va="bottom"')
ax[1].legend(fontsize='small', loc=2)

plt.savefig('images/pngs/class_01_figure2.png', dpi=425, bbox_inches='tight')
plt.savefig('images/pdfs/class_01_figure2.pdf', bbox_inches='tight')
plt.show()

_images/f521188c0fbc379f03bab599990505efbb6d111b0337d87f93fedfd155824878.png

Matplotlib Tip 6:#

itertools is your friend and likely your best friend. Check this out for getting the list of your axis tuples

import itertools as it
w = 4
h = 3
nrows = 3
ncols = 4
fig, ax = plt.subplots(nrows, ncols, figsize=(w*ncols,h*nrows), dpi=150,
                       gridspec_kw={"width_ratios": [0.5, 2.0, 1.0, 0.5], 
                                    "height_ratios":[0.1, 1.0, 0.5]})
plt.subplots_adjust(wspace=0.25, hspace=0.25)

####*****####
tups = list(it.product(range(nrows), range(ncols)))
print("These are all your axis indices:",tups)
####*****####

for i, tup in enumerate(tups):
    if tup != (1,1):
        ax[tup].scatter(np.random.uniform(0,1,100),np.random.uniform(0,1,100), lw=0)
    else:
        ax[tup].scatter(np.random.uniform(0,1,1000),np.random.uniform(0,1,1000),
                        c=np.linspace(0,1,1000), cmap='rainbow', lw=0)

plt.savefig('images/pngs/class_01_figure3.png', dpi=425, bbox_inches='tight')
plt.savefig('images/pdfs/class_01_figure3.pdf', bbox_inches='tight')
plt.show()

These are all your axis indices: [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3)]

_images/964e6d195e005d5a0938609fef79f6813cfb94f8ce5987058928b5ab3e1b34b3.png

Matplotlib Tip 7:#

do something to all the subplots.

Highly encouraged to make that something be adding grids

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

w = 4
h = 3
nrows = 3
ncols = 4
fig, ax = plt.subplots(nrows, ncols, figsize=(w*ncols,h*nrows), dpi=150,
                       gridspec_kw={"width_ratios": [1.0, 2.0, 1.0, 1.0], 
                                    "height_ratios":[0.3, 1.0, 0.5]})
plt.subplots_adjust(wspace=0.05, hspace=0.25)


tups = list(it.product(range(nrows), range(ncols)))

for i, tup in enumerate(tups):
    if tup != (1,1):
        ax[tup].scatter(np.random.uniform(0,1,100),np.random.uniform(0,1,100), lw=0)
    else:
        ax[tup].scatter(np.random.uniform(0,1,500),np.random.uniform(0,1,500),
                        c=np.linspace(0,1,500), cmap='rainbow', lw=0, marker="$\u266B$", s=100)
        ax[tup].scatter(np.random.uniform(0,1,500),np.random.uniform(0,1,500),
                        c='.2', lw=0, marker="$\$$", s=100)
    if tup == (2,3):
        ax[tup].scatter(np.random.uniform(0,1,100),np.random.uniform(0,1,100),
                        lw=2, c='None', ec='.2', s=50)
    if tup == (1,3):
        ax[tup].scatter(np.random.uniform(0,1,100),np.random.uniform(0,1,100),
                        lw=2, c='w', ec='.8', s=50)


xlabs = ['Boston', 'New York', 'Cleveland', 'Nowhere', 'The Land B4 Time', 'Tucson',
         'Texax', 'Yucky Lake', 'Great Big Nope', 'Town!!', 'Lake Of The Ozark (s2e2)', 'Yawnville']

plt.suptitle(r'Definitive Ranking of Cities $\mathbf{(dataviz=passion)}$', y=0.95, fontsize=30, color='limegreen')
for ai, a in enumerate(fig.axes):
    a.grid(linewidth=2, alpha=0.3, color='0.7')
    a.set_xlabel(xlabs[ai], fontsize='large')
    a.set_xlim(0,1)
    a.set_ylim(0,1)
    a.set_xticklabels(["" for i in a.get_xticks()])
    a.set_yticklabels(["" for i in a.get_yticks()])
    
    
plt.savefig('images/pngs/class_01_figure4.png', dpi=425, bbox_inches='tight')
plt.savefig('images/pdfs/class_01_figure4.pdf', bbox_inches='tight')
plt.show()

_images/4458bf31d48466fd9af2edb64af4e57b50c747f9791ed0c334b0a086a31b2df8.png

Matplotlib Tip 8:#

Writing a paper? How about one baseline fontsize?

also sneaking a “set_xticks([])” into this one

fs = 12

w = 4
h = 3
nrows = 3
ncols = 4
fig, ax = plt.subplots(nrows, ncols, figsize=(w*ncols,h*nrows), dpi=150,
                       gridspec_kw={"width_ratios": [1.0, 2.0, 1.0, 1.0], 
                                    "height_ratios":[0.3, 1.0, 0.5]})
plt.subplots_adjust(wspace=0.0, hspace=0.0)


tups = list(it.product(range(nrows), range(ncols)))

for ai, tup in enumerate(tups):
    fs_i = fs * (2*ai/len(tups)) + np.random.uniform(-2,2)
    ax[tup].text(0.1, 0.5, 'fs = %.2f'%fs_i, fontsize=fs_i, color='.4', ha='left')
    ax[tup].set_xticks([])
    ax[tup].set_yticks([])


plt.savefig('images/pngs/class_01_figure5.png', dpi=425, bbox_inches='tight')
plt.savefig('images/pdfs/class_01_figure5.pdf', bbox_inches='tight')
plt.show()

_images/aecff4ff8509a72e34f25ddfdf5f30efaa0c2cce8f66ed9c44addc27a580c6c8.png

Matplotlib Tip 9:#

Here’s a small one, how about putting percentages instead of fractions?

Also sneaking a sharey=True into this tip

from matplotlib import ticker

raw_data1 = np.random.beta(0.8, 2, 10000)
raw_data1[raw_data1<=0] = 0
raw_data2 = np.random.normal(0.35, 0.1, 10000)
raw_data2[raw_data2<=0] = 0

fs = 12

w = 4
h = 3
nrows = 1
ncols = 2
fig, ax = plt.subplots(nrows, ncols, figsize=(w*ncols,h*nrows), dpi=100, sharey=True)

ax[0].hist(raw_data1, bins=40, ec='.2', color='w')
yticks = ax[0].get_yticks()

ax[0].vlines(np.mean(raw_data1), 0, yticks[-1]*1.01, color='orange',
          label='mean = %.1f%%'%(np.mean(raw_data1)*100), linestyle=':', lw=2)

ax[1].hist(raw_data2, bins=40, ec='.2', color='w')
ax[1].vlines(np.mean(raw_data2), 0, yticks[-1]*1.01, color='orange',
          label='mean = %.1f%%'%(np.mean(raw_data2)*100), linestyle=':', lw=2)

ax[0].set_ylabel('Number of pizzas')

for a in fig.axes:
    a.xaxis.set_major_formatter(ticker.PercentFormatter(xmax=1.0))
    a.set_xlim(-0.01,1.01)
    a.set_ylim(yticks[0], yticks[-1]*1.01)
    a.legend()
    a.grid(linewidth=1.5, color='.7', alpha=0.3)
    a.set_xlabel('Percent.... pizza!!!')


plt.savefig('images/pngs/class_01_figure6.png', dpi=425, bbox_inches='tight')
plt.savefig('images/pdfs/class_01_figure6.pdf', bbox_inches='tight')
plt.show()

_images/22f2e706c8f01de8f334388d92fdb52ab104d954cab8797291e64aa0b196811a.png

Matplotlib Tip 10:#

(worried whisper…)

…It’s scatterplot week….

(pronounced like a Great British Bakeoff contestant)

from scipy.stats import gaussian_kde
from scipy.stats import linregress

# so let's play with some fake data
raw_data_x = np.array(sorted(np.random.normal(0.5,0.25,20000)))
raw_data_y = np.linspace(0.01,0.99,20000)+np.random.normal(0.0,0.125,20000)
slope, intercept, rvalue, pvalue, stderr = linregress(raw_data_x, raw_data_y)
xslope = [min(raw_data_x), max(raw_data_x)]
yslope = [xslope[i]*slope+intercept for i in range(len(xslope))]

w = 4
h = 4
nrows = 2
ncols = 2
fig, ax = plt.subplots(nrows, ncols, figsize=(w*ncols,h*nrows), dpi=100)
plt.subplots_adjust(hspace=0,wspace=0)

tups = list(it.product(range(nrows), range(ncols)))

########## ax[(0,0)]
ax[tups[0]].scatter(raw_data_x[:10000], raw_data_y[:10000], ec='w',
           lw=0.1, s=np.random.normal(12,2.5,10000), c='.2')
ax[tups[0]].scatter(raw_data_x[10000:], raw_data_y[10000:], ec='w',
           lw=0, s=10, c=raw_data_x[::2], cmap='viridis', alpha=0.9)

########## ax[(0,1)]
xy = np.vstack([raw_data_x, raw_data_y])
z = gaussian_kde(xy)(xy)
ax[tups[1]].scatter(raw_data_x, raw_data_y, lw=0, s=5, c=z, cmap='magma', alpha=0.9, marker='s')

########## ax[(1,0)]
cc = plt.cm.Blues(np.random.uniform(0.3,1,len(raw_data_x)))
ax[tups[2]].scatter(raw_data_x, raw_data_y, lw=np.random.beta(0.2, 2.0, len(raw_data_x)),
                    s=np.random.uniform(2, 30, len(raw_data_x)), fc='None', ec=cc, marker='o')

########## ax[(1,1)]
inds = np.random.choice(range(len(raw_data_x)), size=int(0.01*len(raw_data_x)), replace=False)
ax[tups[3]].scatter(raw_data_x, raw_data_y, lw=0, s=40, c='dodgerblue',
                    marker='X', alpha=0.35, label='normal')
ax[tups[3]].scatter(raw_data_x[inds], raw_data_y[inds], lw=1, s=40, c='None',
                    marker='X', ec='.2', label='special')


for a in fig.axes:
    a.set_ylim(-0.5,1.5)
    a.set_xlim(-0.5,1.5)
    a.plot([-2,2],[-2,2],color='.3',linestyle=':',label=r'$y=x$')
    a.plot(xslope, yslope, color='.6', linewidth=2, label=r'$r^2 = %.3f$'%(rvalue**2))
    a.grid(linewidth=1.25, color='.7', alpha=0.3)
    a.set_xticklabels(["" for i in a.get_xticks()])
    a.set_yticklabels(["" for i in a.get_xticks()])
    a.tick_params(axis='both',which='both',length=0)
    a.legend(loc=2, fontsize='small')

plt.savefig('images/pngs/class_01_figure7.png', dpi=425, bbox_inches='tight')
plt.savefig('images/pdfs/class_01_figure7.pdf', bbox_inches='tight')
plt.show()

_images/f36ca1e00f2c395e73038568d6db84bddb40f7946291ee3860fadf540a0c26b5.png

Matplotlib Tip 11:#

Spanning subplots for dramatic multi-panel figures

import matplotlib.pyplot as plt
import numpy as np
import matplotlib
from matplotlib import gridspec

# Set figure params
nrows = 3 # number of rows in the plot
ncols = 6 # number of columns
cols = plt.cm.viridis(np.linspace(0.3,0.85,ncols))
fs = np.round(np.linspace(0,1,ncols),2)


# First define the figure environment
plt.figure(dpi=150, figsize=(8,7))

# Next, define the "GridSpec"
gs = gridspec.GridSpec(nrows,
                       ncols,
                       width_ratios=[1]*ncols,
                       height_ratios=[1.25, 1, 2])

# Now, plot the data
for i in range(ncols):
    f = fs[i]
    
    ##### TOP ROW ######
    ax_row1_i = plt.subplot(gs[i])
    xvs = np.random.normal(0,1,100)
    yvs = np.random.normal(0,1,100)
    ax_row1_i.scatter(xvs,yvs,color=cols[i])
    
    ax_row1_i.set_xticks([])
    ax_row1_i.set_yticks([])
    ####################
    
    
    ##### SECOND ROW ######
    ax_row2_i = plt.subplot(gs[i+ncols])
    ax_row2_i.plot(np.cumsum(xvs),
                   np.cumsum(yvs),color=cols[i])
    
    if i!=ncols-1:
        ax_row2_i.arrow(0.8, 1.115, 0.4, 0, head_width=0.07, head_length=0.08, width=0.008,
                  fc=cols[i], ec=cols[i],transform=ax_row2_i.transAxes,clip_on=False)
        
    if i > 0:
        ax_row2_i.set_yticks([])

    else:
        ax_row1_i.text(-0.60, 1.05,'A',
                 fontsize=14,
                 transform=ax_row1_i.transAxes,
                 fontweight='semibold',
                 va='center',
                 ha='left',clip_on=False)
        ax_row2_i.text(-0.60, 1.25,'B',
                 fontsize=14,
                 transform=ax_row2_i.transAxes,
                 fontweight='semibold',
                 va='center',
                 ha='left',clip_on=False)

    ax_row2_i.spines['top'].set_visible(False)
    ax_row2_i.spines['right'].set_visible(False)
    ax_row2_i.set_xticks([])
    ax_row2_i.set_title('$f = %.1f$'%f, fontsize=9,y=1)
    #######################
    

######## BOTTOM ROW #########
axi = plt.subplot(gs[nrows-1,:]) # this makes the subplot span axes

xvals = np.linspace(0,1)
axi.plot(xvals,1-(np.cumsum(xvals)/sum(xvals))[::-1])

axi.annotate("",
             xy=(0.23, 0.72),
             xytext=(0.28, 0.52),
             transform=axi.transAxes,
             arrowprops=dict(shrink=0.1, ec='#808080',fc='w',lw=1.2))
axi.text(0.30, 0.725,
         'other text',
         fontsize=10,
         transform=axi.transAxes,
         color='#525252',
         va='bottom',ha='right')

axi.annotate("",
             xy=(0.35, 0.25),
             xytext=(0.30, 0.45),
             transform=axi.transAxes,
             arrowprops=dict(shrink=0.1, ec='#808080',fc='w',lw=1.2))
axi.text(0.325, 0.2375,
         'some text',
         fontsize=10,
         transform=axi.transAxes,
         color='#525252',
         va='top',ha='left')

axi.set_xlim(0,1)
axi.set_ylim(0,1)

axi.text(-0.09, 1.01,'C',
         fontsize=14,
         transform=axi.transAxes,
         fontweight='semibold',
         va='center',
         ha='left',clip_on=False)

#############################



plt.suptitle('Big, Fancy Figure',y=0.95,fontweight='semibold',fontsize='x-large')
plt.subplots_adjust(wspace=0.15, hspace=0.35)

plt.savefig('images/pngs/class_01_figure8.png', dpi=425, bbox_inches='tight')
plt.savefig('images/pdfs/class_01_figure8.pdf', bbox_inches='tight')
plt.show()

_images/2fd936249278caa6cb5a91ed1a02b493a47053175866f3d11cd8a1b0dbb00ddc.png

Can you think of any more great tips for making visualizations in matplotilb? Send them our way!

Matplotlib Cheatsheet#

Next time…#

Networks and networkx! class_02_networkx1.ipynb

References and further resources:#

Class Webpages
- Jupyter Book: https://asmithh.github.io/network-science-data-book/intro.html
- Github: asmithh/network-science-data-book
- Syllabus and course details: https://brennanklein.com/phys7332-fall24
Data Visualization with matplotlib: https://matplotlib.org/stable/plot_types/index.html
Intro Python & Numpy: https://numpy.org/numpy-tutorials/
Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). https://doi.org/10.1038/s41586-020-2649-2