In the digital age, data is the essence of scientific discovery and engineering innovation. From the tiniest subatomic particles to the vast expanse of the cosmos, the ability to represent and manipulate data effectively is crucial. But how do we translate the complexities of the physical world into a form that a computer can understand and process?
Imagine you are an engineer working on a project to design a new material with unique properties. To achieve this, you need to model the material’s behavior under various conditions. This requires handling a vast array of data points, from atomic structures to stress-strain curves. Or consider a scientist analyzing the genomic sequences of thousands of organisms to understand evolutionary patterns. The sheer volume and variety of data involved can be overwhelming.
In Python, the key to managing this data lies in its diverse and powerful data types and structures. These tools allow us to represent everything from simple numerical values to complex relationships and patterns, making it possible to perform sophisticated analyses and simulations. Python’s flexibility and readability make it an ideal language for engineers and scientists who need to focus on solving problems rather than wrestling with syntax.
Python data types refer to the different kinds of data that can be used in a Python program, such as numbers, strings, and boolean values. Python data structures, on the other hand, are objects that can hold multiple values, such as lists, tuples, and dictionaries.
Historically, the concept of data types and data structures has been important in computer science because it helps programmers organize and manipulate data in an efficient and meaningful way. By using the appropriate data type or data structure for a given task, programmers can optimize their code and improve its performance.
Here are some of the most common Python data types and data structures:
Data Types
Numbers: There are several different types of numbers in Python, including integers, floats, and complex numbers. These can be used for calculations and other mathematical operations.
Strings: Strings are sequences of characters, and are used to represent text in Python. They can be manipulated using a variety of built-in string methods.
Booleans: Booleans are true/false values that are used for logical operations in Python.
NoneType: NoneType is a special type in Python that represents the absence of a value.
Data Structures
Lists: Lists are ordered collections of values, and can contain elements of different types. They can be modified by adding or removing elements.
Tuples: Tuples are similar to lists, but are immutable, meaning they cannot be modified once they are created.
Sets: Sets are unordered collections of unique elements. They can be used for tasks such as removing duplicates or testing membership.
Dictionaries: Dictionaries are collections of key-value pairs, and can be used to store and retrieve data using a key.
Understanding data types and data structures is an essential part of writing effective Python code. By using the appropriate data type or data structure for a given task, programmers can write more efficient and effective programs.
2.1 Data and Variables
Data is the main ingredient in programming. In general, a software program (also called code) processes data, and this data must first be input into the computer. In a high-level programming language like Python, the data is given a name, and this name is linked to the memory location of the data. Names may contain only alphanumeric characters (letters and numbers) and underscores, and the first character must be a letter or an underscore. Spaces within a variable name are not permitted, and variable names are case-sensitive (e.g., a and A are considered different variables).
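The naming rules can also be checked programmatically. The following is a minimal sketch that uses the built-in string method isidentifier() and the standard keyword module; the helper name is_valid_variable_name is hypothetical and chosen only for illustration. Note that isidentifier() also accepts non-ASCII letters, so it is slightly more permissive than the rule stated above.

```python
import keyword


def is_valid_variable_name(name):
    """Return True if `name` can be used as a Python variable name."""
    # isidentifier() checks the character rules; reserved words such as
    # `for` pass that check but still cannot be used as variable names.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["n_specimens", "_tmp", "2nd_run", "n specimens", "for"]:
    print(candidate, "->", is_valid_variable_name(candidate))
```

Only the first two candidates are accepted: the third starts with a digit, the fourth contains a space, and the last one is a reserved keyword.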
In the following example, assume we have 10 experimental specimens. Let us assign this data the name n_specimens.
n_specimens = 10
n_specimens
10
Now let us check that the name n_specimens indeed points to the data 10:
print(n_specimens)
10
If we receive information that the data has changed (e.g., miscommunication, failed specimens), we do not have to create a new name. If the number of specimens has increased by 5, we can update this information as follows:
n_specimens = n_specimens + 5
We can check that the value has been updated:
n_specimens
15
Because the data (i.e., the information) that the name n_specimens represents can vary, the name is called a variable. Henceforth, we say that a variable "contains" data. You can remove a variable from the notebook using the del statement: typing del n_specimens clears the variable n_specimens from the workspace. To remove all the variables in the notebook, you can use the magic command %reset. Magic commands are provided by IPython, on which Jupyter is built; they are not part of the Python language itself and are not available in a plain Python interpreter.
Note
The mathematical equation x=x+1 has no solution for any value of x. In programming, if we initialize the value of x to be 1, then the statement makes perfect sense. It means, “Add x and 1, which is 2, then assign that value to the variable x”. Note that this operation overwrites the previous value stored in x.
The Jupyter Notebook has its data list to store all the variables in the notebook. As a result of the previous example, you will see the variable ‘n_specimens’ in this data list. You can view a list of all the variables in the notebook using the magic command %whos.
%whos
Variable Type Data/Info
----------------------------------------------------
LogNorm type <class 'matplotlib.colors.LogNorm'>
ax Axes Axes(0.125,0.11;0.62x0.77)
cm module <module 'matplotlib.cm' f<...>ckages/matplotlib/cm.py'>
cmap LinearSegmentedColormap <matplotlib.colors.Linear<...>ap object at 0x11192dd10>
fig Figure Figure(1056x480)
i int 201
n_specimens int 15
norm LogNorm <matplotlib.colors.LogNorm object at 0x127ed9610>
np module <module 'numpy' from '/Us<...>kages/numpy/__init__.py'>
ojs_define function <function ojs_define at 0x117100040>
plt module <module 'matplotlib.pyplo<...>es/matplotlib/pyplot.py'>
prices_per_gb ndarray 203: 203 elems, type `float64`, 1624 bytes
sm ScalarMappable <matplotlib.cm.ScalarMapp<...>le object at 0x15b174690>
years ndarray 203: 203 elems, type `float64`, 1624 bytes
Caution!
You can overwrite variables or functions that have been stored in Python.
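A small sketch of what this caution means in practice. The function name area is hypothetical; the same thing happens with built-in names such as print or sum if they are used as variable names.

```python
def area(width, height):
    """Area of a rectangle."""
    return width * height


first_result = area(2, 3)      # works: `area` refers to the function
print(first_result)

area = 6                       # the name `area` now refers to an integer
print(callable(area))          # the function is no longer reachable by name

try:
    area(2, 3)
except TypeError as err:
    print("Calling area failed:", err)
```

Python raises no warning at the assignment itself; the problem only surfaces later, when the overwritten name is used as a function.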
2.2 Data Types and Data Structures
From a computational point of view, it is efficient to store information about the type of data together with the variable. This information is required because the data that a variable contains can vary, so the variable must be able to accommodate the possible variations. For example, in many languages an integer can be stored in less memory than a real number. Hence, including this information is essential for storing and processing data. In Python, the data type is determined automatically when the variable is created. The basic data types are boolean, int, float, string, list, tuple, dictionary, and set.
Let us create an integer and a real number.
an_integer = 1
a_real_number = 1.0
%whos
Variable Type Data/Info
----------------------------------------------------
LogNorm type <class 'matplotlib.colors.LogNorm'>
a_real_number float 1.0
an_integer int 1
ax Axes Axes(0.125,0.11;0.62x0.77)
cm module <module 'matplotlib.cm' f<...>ckages/matplotlib/cm.py'>
cmap LinearSegmentedColormap <matplotlib.colors.Linear<...>ap object at 0x11192dd10>
fig Figure Figure(1056x480)
i int 201
n_specimens int 15
norm LogNorm <matplotlib.colors.LogNorm object at 0x127ed9610>
np module <module 'numpy' from '/Us<...>kages/numpy/__init__.py'>
ojs_define function <function ojs_define at 0x117100040>
plt module <module 'matplotlib.pyplo<...>es/matplotlib/pyplot.py'>
prices_per_gb ndarray 203: 203 elems, type `float64`, 1624 bytes
sm ScalarMappable <matplotlib.cm.ScalarMapp<...>le object at 0x15b174690>
years ndarray 203: 203 elems, type `float64`, 1624 bytes
As you can see, the type of a_real_number is float. Float is a type of data that can include decimal values. Given a variable, the type can be extracted using the function type().
type(an_integer)
int
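As a short illustration, the type attached to a name follows the data it currently refers to. The variable names below are hypothetical; type() and isinstance() are built-in functions.

```python
measurement = 10
first_type = type(measurement)       # int

measurement = measurement / 4        # true division always yields a float
second_type = type(measurement)      # float

measurement = "not available"        # the same name can later hold text
third_type = type(measurement)       # str

print(first_type, second_type, third_type)

# isinstance() is the usual way to test for a type in a condition.
# bool is a subclass of int, so True also counts as an int.
print(isinstance(2.5, float), isinstance(True, int))
```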
2.2.1 Numeric
There are three numeric data types: integers, floating point numbers, and complex numbers. Information about the precision and internal representation of floating point numbers for the machine on which your program is running is available in sys.float_info. Complex numbers have a real and imaginary part, which are each a floating point number. To extract these parts from a complex number z, use z.real and z.imag.
To Summarize:
int - holds signed integers.
float - holds floating-point numbers, accurate to about 15 significant decimal digits.
complex - holds complex numbers (with float representations of the real and imaginary part).
The constructors int(), float(), and complex() can be used to produce numbers of a specific type.
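A minimal sketch of the constructors and of the attributes mentioned above. The variable names are illustrative; the printed values of sys.float_info assume the usual 64-bit double-precision floats.

```python
import sys

count = int("42")          # text -> int
ratio = float(3)           # int -> float
truncated = int(2.9)       # float -> int, truncates toward zero
z = complex(1.5, -2)       # real part 1.5, imaginary part -2.0

print(count, ratio, truncated)
print(z.real, z.imag)      # both parts are stored as floats

# Machine-dependent properties of the float type
print(sys.float_info.epsilon)   # gap between 1.0 and the next larger float
print(sys.float_info.dig)       # decimal digits that are reliably preserved
```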
Because the machine represents numbers with a finite resolution, not every value can be represented. The field epsilon in sys.float_info is the machine epsilon, i.e., the gap between 1.0 and the next larger representable floating point number (about 2.22e-16 for double precision). This means that a difference \(\delta = a - b < \epsilon / 2\) between two numbers close to 1 cannot be represented: adding such a \(\delta\) to 1 returns 1 again.
We can test this using the following code:
1==1
True
1*10**-16+1==1
True
0.7*2.220446049250313*10**-16+1==1
False
All numeric types support the following operations (operation: result):
x + y: sum of x and y
x - y: difference of x and y
x * y: product of x and y
x / y: quotient of x and y
x // y: floored quotient of x and y
x % y: remainder of x / y
-x: x negated
+x: x unchanged
abs(x): absolute value or magnitude of x
int(x): x converted to integer
float(x): x converted to floating point
complex(re, im): a complex number with real part re and imaginary part im; im defaults to zero
c.conjugate(): conjugate of the complex number c
divmod(x, y): the pair (x // y, x % y)
pow(x, y): x to the power y
x ** y: x to the power y
Please note that the complex type does not support the % and // operations; the same holds for divmod(), int(), and float().
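The operations can be tried out with a few illustrative values (the numbers below are arbitrary and not taken from the text):

```python
x, y = 17, 5

print(x / y)            # true division: 3.4
print(x // y)           # floored quotient: 3
print(x % y)            # remainder: 2
print(divmod(x, y))     # (3, 2), the pair (x // y, x % y)
print(-17 // 5)         # -4, because // rounds toward negative infinity
print(pow(2, 10), 2 ** 10)

c = 3 + 4j
print(abs(c))           # magnitude: 5.0
print(c.conjugate())    # (3-4j)

# Floor division is not defined for complex numbers
try:
    c // 2
except TypeError as err:
    print("TypeError:", err)
```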
2.2.2 Boolean
One of the key components of programming is testing the truth value of certain operations. The data type used to represent this is the bool data type. A boolean variable can be either True or False. Numeric data types can be converted to the boolean data type using the bool() function. Every numerical value except 0 converts to True.
start_experiment = True  # initializing a boolean variable
specimen_indicator = bool(n_specimens)  # converting numeric to boolean type
specimen_indicator
True
print(specimen_indicator)
True
bool(0)
False
Caution
The keywords True and False must start with an upper-case letter. Using a lowercase true raises a NameError.
Boolean arithmetic is the arithmetic of true and false logic. A boolean or logical value can be either True or False.
The boolean operators in Python (or, and, not), together with the comparison operators == and != that are often applied to boolean values, are given below:
or
and
not
== (equivalent)
!= (not equivalent)
Given two boolean variables a and b, the following results are obtained for the aforementioned boolean logic (T = True, F = False):
a   b   not a   not b   a == b   a != b   a or b   a and b
------------------------------------------------------------
T   F   F       T       F        T        T        F
F   T   T       F       F        T        T        F
T   T   F       F       T        F        T        T
F   F   T       T       T        F        F        F
Try it out: Test the boolean arithmetic for yourself.
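One possible way to do this is to let Python generate the truth table itself. The sketch below uses itertools.product from the standard library to enumerate all combinations of a and b; the names header and rows are illustrative.

```python
from itertools import product

header = ("a", "b", "not a", "not b", "a == b", "a != b", "a or b", "a and b")

rows = []
for a, b in product([True, False], repeat=2):
    # One row per combination of a and b, in the same column order as header
    rows.append((a, b, not a, not b, a == b, a != b, a or b, a and b))

print(" | ".join(header))
for row in rows:
    print(" | ".join("T" if value else "F" for value in row))
```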
2.2.3 Strings
The string data type is important for representing textual data, especially when processing files. Strings are sequences of characters, which can be letters, numbers, symbols, and spaces, and they can be almost any length. Strings are immutable, which means that you cannot change their characters in place. To create a string in Python, you can use single quotes ('...'), double quotes ("...") or triple quotes ("""...""").
filename = 'database_xmcv_21.csv'
print(filename)
database_xmcv_21.csv
type(filename)
str
Indexing and slicing are among the most frequently used operations on strings. They are used to extract and manipulate string data.
filename[0] # first symbol
'd'
filename[1] # second symbol
'a'
filename[-2] # second-to-last symbol (filename[-1] is the last one)
's'
filename[0:16:3] # [start:stop:step]
'dasxv1'
filename[::-1] # reverse text
'vsc.12_vcmx_esabatad'
filename.center(100)
' database_xmcv_21.csv '
The most common string methods are listed below (method: description):
capitalize(): Converts the first character to upper case and the rest to lower case
casefold(): Converts the string to lower case, more aggressively than lower(), for caseless comparison
center(): Returns the string centered in a field of the specified width
count(): Returns the number of times a specified value occurs in the string
encode(): Returns an encoded version of the string
endswith(): Returns True if the string ends with the specified value
expandtabs(): Replaces tab characters with spaces, using the specified tab size
find(): Searches the string for a specified value and returns the position where it was found, or -1 if it was not found
format(): Formats specified values in a string
format_map(): Formats specified values in a string, taking them from a mapping such as a dictionary
index(): Like find(), but raises a ValueError if the value is not found
isalnum(): Returns True if all characters in the string are alphanumeric
isalpha(): Returns True if all characters in the string are in the alphabet
isascii(): Returns True if all characters in the string are ASCII characters
isdecimal(): Returns True if all characters in the string are decimals
isdigit(): Returns True if all characters in the string are digits
isidentifier(): Returns True if the string is a valid identifier
islower(): Returns True if all characters in the string are lower case
isnumeric(): Returns True if all characters in the string are numeric
isprintable(): Returns True if all characters in the string are printable
isspace(): Returns True if all characters in the string are whitespace
istitle(): Returns True if the string follows the rules of a title
isupper(): Returns True if all characters in the string are upper case
join(): Joins the elements of an iterable into one string, using this string as the separator
ljust(): Returns a left-justified version of the string
lower(): Converts a string into lower case
lstrip(): Returns a copy of the string with leading whitespace removed
maketrans(): Returns a translation table to be used with translate()
partition(): Splits the string at the first occurrence of a separator and returns a tuple of three parts
replace(): Returns a string where a specified value is replaced with another specified value
rfind(): Searches the string for a specified value and returns the last position where it was found, or -1 if it was not found
rindex(): Like rfind(), but raises a ValueError if the value is not found
rjust(): Returns a right-justified version of the string
rpartition(): Splits the string at the last occurrence of a separator and returns a tuple of three parts
rsplit(): Splits the string at the specified separator, starting from the right, and returns a list
rstrip(): Returns a copy of the string with trailing whitespace removed
split(): Splits the string at the specified separator and returns a list
splitlines(): Splits the string at line breaks and returns a list
startswith(): Returns True if the string starts with the specified value
strip(): Returns a copy of the string with leading and trailing whitespace removed
swapcase(): Swaps cases; lower case becomes upper case and vice versa
title(): Converts the first character of each word to upper case
translate(): Returns a string in which characters are replaced according to a translation table
upper(): Converts a string into upper case
zfill(): Pads the string on the left with zeros until it reaches the specified width
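A few of these methods applied to the filename used earlier in this section. This is only a sketch of a typical use (taking a file name apart); the variable names other than filename are illustrative.

```python
filename = "database_xmcv_21.csv"

stem, extension = filename.rsplit(".", 1)    # split once, from the right
parts = stem.split("_")                      # ['database', 'xmcv', '21']
run_number = int(parts[-1])                  # text '21' -> integer 21

renamed = filename.replace("xmcv", "test")   # returns a new string

print(stem, extension)
print(parts, run_number)
print(renamed)
print(filename.endswith(".csv"), filename.startswith("data"))
print("_".join(parts).upper())
print(str(run_number).zfill(4))              # '0021'

# Strings are immutable: the original is unchanged by all of the above
print(filename)
```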
2.2.4 List, Tuples
A list is an ordered collection of items. Lists are mutable, which means that their elements can be added, removed, or modified. To create a list in Python, you can use square brackets [] and separate the elements by commas. The contents of the list can be of any of the aforementioned elementary data types.
data_discussed_so_far[2:4] # [start:stop], where stop is the index of the final item + 1
[True, 'database_xcmm_45.csv']
The colon operator denotes start : end + 1. data_discussed_so_far[2:4] returns the elements at index 2 and index 3 (the third and fourth items, since counting starts at 0), but not the element at index 4, even though 4 is used in the slice.
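The slicing rule can be illustrated with a small list. The list readings below is hypothetical and only mixes a few elementary data types for demonstration; it is not the list used in the example above.

```python
readings = [10, 1.0, True, "run_01.csv", 3 + 4j]

middle = readings[2:4]       # items at index 2 and 3; index 4 is excluded
print(middle)
print(len(middle))           # stop - start = 4 - 2 = 2 items

print(readings[:2])          # omitted start means "from the beginning"
print(readings[3:])          # omitted stop means "to the end"
print(readings[-1])          # negative indices count from the end

readings[0] = 15             # lists are mutable: items can be replaced
print(readings)
```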
A tuple is similar to a list, but it is immutable. Once a tuple is created, you cannot change its elements. To create a tuple in Python, you can use parentheses () and separate the elements by commas.
my_numbers = (1, 2, 3, 4)
type(my_numbers)
tuple
my_numbers[3]
4
List methods (method: description):
append(): Adds an element at the end of the list
clear(): Removes all the elements from the list
copy(): Returns a (shallow) copy of the list
count(): Returns the number of elements with the specified value
extend(): Adds the elements of a list (or any iterable) to the end of the current list
index(): Returns the index of the first element with the specified value
insert(): Adds an element at the specified position
pop(): Removes and returns the element at the specified position (the last element by default)
remove(): Removes the first item with the specified value
reverse(): Reverses the order of the list in place
sort(): Sorts the list in place
Tuple methods (method: description):
count(): Returns the number of times a specified value occurs in a tuple
index(): Searches the tuple for a specified value and returns the position where it was found
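A brief sketch of some of these methods in use. The list specimens is illustrative; the tuple my_numbers is the one created above. The last part shows what immutability means in practice.

```python
specimens = [4, 2, 9]
specimens.append(7)            # [4, 2, 9, 7]
specimens.extend([1, 2])       # [4, 2, 9, 7, 1, 2]
specimens.sort()               # sorts in place: [1, 2, 2, 4, 7, 9]
largest = specimens.pop()      # removes and returns the last item, 9
specimens.remove(2)            # removes only the first 2: [1, 2, 4, 7]
print(specimens, largest)
print(specimens.count(2), specimens.index(4))

my_numbers = (1, 2, 3, 4)
print(my_numbers.count(3), my_numbers.index(4))

# Tuples are immutable: item assignment is not allowed
try:
    my_numbers[0] = 10
except TypeError as err:
    print("TypeError:", err)
```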
2.2.5 Dictionary
Dictionaries, as the name suggests, are similar to the classical notion of a dictionary: given a "key" piece of information, such as a word, the detailed meaning of this word can be obtained. Similarly, we can organise data in the form of key:value pairs. A dictionary in Python is a collection of key-value pairs. Each key in a dictionary must be unique, and the values can be of any data type. To create a dictionary in Python, you can use curly braces {}, separate each key from its value with a colon (:), and separate the pairs with commas.
Note
In Python, lists and tuples are organized and accessed based on position. Dictionaries in Python are organized and accessed using keys and values. The location of a pair of keys and values stored in a Python dictionary is irrelevant.
Keys can be a string, a number, or even a tuple (but not a list). In contrast to a list, where data is indexed by integer values denoting their position in the list, a dictionary is indexed using its keys. Dictionaries can be used to organize data and avoid errors while indexing and extracting information.
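A minimal sketch of creating and indexing a dictionary. The dictionary specimen_record and all of its values are hypothetical and serve only to illustrate the syntax.

```python
specimen_record = {
    "name": "steel_A",                 # string key, string value
    "density": 7850,                   # illustrative value in kg/m^3
    "strain": [0.0, 0.001, 0.002],     # a value can itself be a list
    (0, 1): "tuple keys are allowed",
}

print(specimen_record["density"])      # look up a value by its key
print(specimen_record["strain"][-1])   # last element of the stored list
print(specimen_record[(0, 1)])

specimen_record["tested"] = True       # assigning to a new key adds a pair
print(len(specimen_record))

# A list cannot be used as a key because it is mutable (unhashable)
try:
    specimen_record[[0, 1]] = "not allowed"
except TypeError as err:
    print("TypeError:", err)
```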
material_properties['viscosity'][-1] # extract the last element of the value stored under the key 'viscosity'
4
type(material_properties)
dict
Dictionary methods (method: description):
clear(): Removes all the elements from the dictionary
copy(): Returns a (shallow) copy of the dictionary
fromkeys(): Returns a dictionary with the specified keys and value
get(): Returns the value of the specified key, or a default value (None unless specified) if the key does not exist
items(): Returns a view containing a tuple for each key-value pair
keys(): Returns a view of the dictionary's keys
pop(): Removes the element with the specified key and returns its value
popitem(): Removes and returns the last inserted key-value pair
setdefault(): Returns the value of the specified key. If the key does not exist, inserts the key with the specified value
update(): Updates the dictionary with the specified key-value pairs
values(): Returns a view of all the values in the dictionary
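Some of these methods in use, as a short sketch. The dictionary settings and its contents are illustrative.

```python
settings = {"temperature": 20.0, "humidity": 65}

# get() avoids a KeyError when a key may be missing
print(settings.get("pressure"))          # None
print(settings.get("pressure", 1013))    # fallback value

settings.update({"humidity": 70, "is_raining": False})
removed = settings.pop("is_raining")     # removes the pair, returns False

# setdefault() inserts the key only if it is not present yet
pressure = settings.setdefault("pressure", 1013)

for key, value in settings.items():
    print(key, "=", value)

print(list(settings.keys()))
print(list(settings.values()))
```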
2.2.6 Set
A set is an unordered collection of unique elements. Sets are mutable, which means that you can add or remove elements from them. To create a set in Python, you can write comma-separated data in curly braces {} or use the set() function. Note that empty curly braces {} create an empty dictionary; an empty set is created with set(). Set-theoretical operations like union, intersection, and difference can be applied to this data type.
my_integers = {1, 2, 3, 4}
type(my_integers)
set
The following methods are defined for sets (method: description):
add(): Adds an element to the set
clear(): Removes all the elements from the set
copy(): Returns a copy of the set
difference(): Returns a set containing the difference between two or more sets
difference_update(): Removes the items in this set that are also included in another, specified set
discard(): Removes the specified item if it is present
intersection(): Returns a set that is the intersection of two or more sets
intersection_update(): Removes the items in this set that are not present in the other, specified set(s)
isdisjoint(): Returns True if the two sets have no elements in common
issubset(): Returns True if another set contains this set
issuperset(): Returns True if this set contains another set
pop(): Removes and returns an arbitrary element from the set
remove(): Removes the specified element; raises a KeyError if it is not present
symmetric_difference(): Returns a set with the symmetric difference of two sets
symmetric_difference_update(): Updates this set with the symmetric difference of itself and another set
union(): Returns a set containing the union of sets
update(): Updates the set with another set, or any other iterable
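The set-theoretical operations can be sketched as follows. The sets batch_a and batch_b are illustrative; sorted() is used only to print the elements in a predictable order, since sets themselves are unordered.

```python
batch_a = {1, 2, 3, 4}
batch_b = {3, 4, 5}

union = batch_a.union(batch_b)                    # {1, 2, 3, 4, 5}
common = batch_a.intersection(batch_b)            # {3, 4}
only_a = batch_a.difference(batch_b)              # {1, 2}
either = batch_a.symmetric_difference(batch_b)    # {1, 2, 5}

print(sorted(union), sorted(common), sorted(only_a), sorted(either))
print({3, 4}.issubset(batch_a), batch_a.isdisjoint({7, 8}))

# Removing duplicates from a list and testing membership
unique = set([3, 1, 3, 2, 1])
print(sorted(unique), 2 in unique)

# Empty braces create a dictionary, not a set
print(type({}), type(set()))
```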
2.3 Input and Output
In Python, you can get input from the user through the console using the input() function. This function takes a string argument, which is used as a prompt to ask the user for input.
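A minimal sketch of reading a number from the user. input() always returns a string, so the text has to be converted explicitly. The function name ask_specimen_count and its read parameter are hypothetical; the parameter only exists so that the example can be run without a keyboard by passing a stand-in for input().

```python
def ask_specimen_count(read=input):
    """Prompt for the number of specimens and return it as an int."""
    answer = read("How many specimens were tested? ")
    # input() returns text; strip whitespace and convert to an integer
    return int(answer.strip())


# Simulated user who types " 12 " and presses Enter
n_tested = ask_specimen_count(read=lambda prompt: " 12 ")
print(n_tested, type(n_tested))
```

In a notebook or console, calling ask_specimen_count() without arguments shows the prompt and waits for the user to type a value.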
2.4 Saving data
The pickle module in Python is used for serializing and deserializing Python objects. Serialization is the process of converting an object into a byte stream, and deserialization is the process of converting a byte stream back into an object.
Here are some basic steps for using the pickle module in Python:
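The basic steps are: import the module, open a file in binary write mode and call pickle.dump() to save an object, then open the file in binary read mode and call pickle.load() to restore it. The following sketch illustrates these steps; the object data and the file name are illustrative, and a temporary directory is used so that no existing file is overwritten.

```python
import os
import pickle
import tempfile

# Step 1: an object to save (any picklable Python object works)
data = {"n_specimens": 15, "readings": [1.0, 2.5], "tags": ("a", "b")}
path = os.path.join(tempfile.mkdtemp(), "data.pkl")

# Step 2: serialize the object into a file opened in binary write mode
with open(path, "wb") as my_file:
    pickle.dump(data, my_file)

# Step 3: deserialize it again from a file opened in binary read mode
with open(path, "rb") as my_file:
    restored = pickle.load(my_file)

print(restored == data)   # same content
print(restored is data)   # but a new, independent object
```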
Note that the pickle module can be used to serialize and deserialize any Python object, including lists, tuples, sets, and custom objects. However, it is important to note that the pickle module can be unsafe if you are unpickling data from an untrusted source, as it can execute arbitrary code.
The second argument in the open() function determines the mode in which the file will be opened. In particular, it specifies whether the file should be opened for reading or writing, and whether it should be opened in text mode or binary mode. Here’s a summary of the different modes that can be specified:
r (default): Open the file for reading in text mode.
w: Open the file for writing in text mode. If the file already exists, it will be truncated (i.e., emptied).
x: Open the file for exclusive creation in text mode. If the file already exists, the operation will fail.
a: Open the file for writing in text mode. If the file already exists, new data will be appended to it.
b: Open the file in binary mode, regardless of whether it is being opened for reading or writing. This mode should be used for non-text files, such as images, audio, or serialized data.
t: Open the file in text mode, regardless of whether it is being opened for reading or writing. This is the default mode.
+: Open the file for updating (i.e., both reading and writing).
Combining these modes allows you to specify more complex options. For example, if you want to open a binary file for reading and writing, you would use 'rb+' or 'wb+' (note that 'wb+' truncates an existing file first, whereas 'rb+' keeps its contents). Similarly, if you want to open a text file for appending, you would use 'at'.
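The effect of the most common modes can be sketched with a small text file. The file name log.txt is illustrative, and a temporary directory is used so that no existing file is touched.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")

with open(path, "w") as f:          # create (or truncate) and write text
    f.write("first line\n")

with open(path, "a") as f:          # append to the existing content
    f.write("second line\n")

with open(path, "r") as f:          # read the file back as text
    text = f.read()
print(text)

with open(path, "rb") as f:         # binary mode returns bytes, not str
    raw = f.read()
print(type(text), type(raw))

try:
    open(path, "x")                 # exclusive creation fails: file exists
except FileExistsError as err:
    print("FileExistsError:", err)
```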
In the context of the pickle module, you would typically use ‘wb’ to open a file for writing in binary mode, and ‘rb’ to open a file for reading in binary mode. This is because pickle serializes Python objects into a binary format, which is not compatible with text mode. The reason why it’s called serialization and not just copying is because serialization involves more than just making a copy of an object. When we serialize an object, we’re actually taking all of its data (like its properties, values, and attributes) and converting it into a format that can be easily saved or transmitted.
This format is typically a sequence of bytes, which is a series of numbers that represent the object’s data in binary code. This sequence of bytes is what gets saved or transmitted, rather than the original object itself. So serialization is more than just copying an object - it’s actually a process of transforming an object’s data into a format that can be saved or transmitted. And deserialization is the process of transforming that saved or transmitted data back into an object with all of its original properties and attributes.
2.5 Help
Applying the function help() to any Python object retrieves the documentation of that object.
help(print)
Help on built-in function print in module builtins:
print(*args, sep=' ', end='\n', file=None, flush=False)
Prints the values to a stream, or to sys.stdout by default.
sep
string inserted between values, default a space.
end
string appended after the last value, default a newline.
file
a file-like object (stream); defaults to the current sys.stdout.
flush
whether to forcibly flush the stream.
2.6 Worked Example 1: Climate Change Indicators
Let’s consider a scenario related to climate change, focusing on data points that are commonly discussed in environmental studies: average global temperature change (float), carbon dioxide (CO2) emissions (int, measured in gigatons), and a boolean indicating whether renewable energy usage is increasing year-over-year.
From these inputs, we aim to derive properties that can provide insights into the progress towards mitigating climate change. These could include:
Temperature Anomaly Risk (boolean): Indicates if the change in average global temperature is beyond a certain threshold, suggesting a high risk to global climates.
Emissions Trend (boolean): Determines if CO2 emissions are within a target reduction path.
# Input data
average_temperature_change = 1.2  # Example change in degrees Celsius
co2_emissions = 38  # CO2 emissions in gigatons for the current year
renewable_energy_increasing = True  # Whether the use of renewable energy is increasing

# Constants for calculations
temperature_risk_threshold = 1.5  # Degrees Celsius
target_co2_emissions = 35  # Target emissions in gigatons

# Deriving additional properties
# Temperature Anomaly Risk (true if average_temperature_change exceeds threshold)
temperature_anomaly_risk = average_temperature_change > temperature_risk_threshold

# Emissions Trend (true if emissions are below or equal to the target)
emissions_trend = co2_emissions <= target_co2_emissions

# Display the derived properties
print(f"Temperature Anomaly Risk: {temperature_anomaly_risk}")
print(f"Emissions Trend: {emissions_trend}")
Temperature Anomaly Risk: False
Emissions Trend: False
This example synthesizes basic climate-related data into actionable insights. The Temperature Anomaly Risk signals when immediate action is needed to curb global warming. The Emissions Trend offers a quick check against set targets for CO2 emissions, indicating whether more aggressive measures are needed.
Such calculations, though simplified here, are vital in real-world policy-making and environmental science, helping to monitor and guide efforts against climate change.
2.7 Worked Example 2: Creating an Environmental Monitoring Database
Consider an environmental monitoring scenario. Using sensor data, we can derive additional properties that provide more nuanced insights into the environment. These insights can be critical for different applications, such as agricultural planning, public health advisories, and even automated home systems. Assuming the user provides the temperature, the humidity, and whether it is raining, a recommendation system can be set up that also logs the data.
import datetime as dt
import pickle

environmental_data = []  # list that serves as a simple database of data points
properties = {}

# Given inputs, converted from text as a user would provide them
date = dt.datetime.now()
properties['date and time'] = (date.strftime("%m/%d/%Y, %H:%M:%S"), date.weekday())
properties['temperature'] = float('33')
properties['humidity'] = int('65')
# bool() of any non-empty string is True (even bool('False')), so compare the text instead
raining_text = 'True'
properties['is_raining'] = raining_text.strip().lower() == 'true'

# Existing derived properties
properties['suitable_for_plant_growth'] = 18 < properties['temperature'] < 25 and properties['humidity'] > 50 and not properties['is_raining']
properties['risk_of_equipment_damage'] = properties['humidity'] > 75
properties['additional_watering_needed'] = not properties['is_raining'] and properties['humidity'] < 65

# Frost Warning (assuming frost risk if temperature is close to 0°C and not raining)
properties['frost_warning'] = properties['temperature'] < 5 and not properties['is_raining']
properties['inspection_needed'] = properties['risk_of_equipment_damage'] or properties['frost_warning']

# Add the newly derived properties
environmental_data.append(properties)

# Save the data point in a database
with open('environmental_state.dbs', 'wb') as my_file:
    pickle.dump(environmental_data, my_file)
Import the database and view it:
with open('environmental_state.dbs', 'rb') as my_file:
    my_data_that_is_imported = pickle.load(my_file)
print(my_data_that_is_imported)
What is the difference between a list and a tuple in Python?
What is the difference between an int and a float in Python?
What is the purpose of type conversion in Python? Give an example.
How do you declare a variable in Python, and what are the rules for naming variables?
We have the average temperature for each day of one week. Which data structure is the most suitable to represent this data?
2.8.2 Coding
Create a program that prompts the user to enter a sentence. Your program should convert the sentence to a list of words and then sort the list in alphabetical order. Finally, print the sorted list to the console.
Get the largest number from a list of integers.
Create a program that prompts the user to enter a string. Your program should then check if the string is a palindrome (i.e., it reads the same forward and backward). If the string is a palindrome, print “Palindrome” to the console. Otherwise, print “Not a palindrome” to the console.