

Gregor von Laszewski (

1 - Python Modules


Often you may need functionality that is not present in Python’s standard library. In this case, you have two options:

  • implement the features yourself
  • use a third-party library that has the desired features.

Often you can find a previous implementation of what you need. Since this is a common situation, there is a service supporting it: the Python Package Index (or PyPI for short).

Our task here is to install the autopep8 tool from PyPI. This will allow us to illustrate the use of virtual environments using venv, and installing and uninstalling PyPI packages using pip.

Updating Pip

You should have the newest version of pip installed for your version of Python. Assuming your Python interpreter is invoked as python and you use venv, you can update pip with

pip install -U pip

without interfering with a potential system-wide installed version of pip that may be needed by the system default version of Python. See the section about venv for more details.
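The isolation that venv provides can also be set up from Python itself. The following is a minimal sketch, assuming the standard-library venv module is available; the helper name create_env and the directory name ENV3 are illustrative and not part of the original text.

```python
import tempfile
import venv
from pathlib import Path


def create_env(path):
    """Create a virtual environment at `path` and return its config file path."""
    # with_pip=False keeps this sketch fast and offline; on the command line,
    # `python -m venv <path>` installs pip into the environment by default.
    venv.create(path, with_pip=False)
    return Path(path) / "pyvenv.cfg"


# Demonstrate in a throwaway directory so nothing on the system is touched.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "ENV3")
    created = cfg.exists()
    print("environment created:", created)
```

Packages installed with pip while such an environment is active end up inside the environment directory rather than in the system-wide Python installation.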

Using pip to Install Packages

Let us now look at another important tool for Python development: the Python Package Index, or PyPI for short. PyPI provides a large set of third-party Python packages.

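To install a package from PyPI, use the pip command. We can search PyPI for packages (note that PyPI has since disabled the server-side API that pip search relies on, so on current installations you may need to use the search on the PyPI website instead):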

$ pip search --trusted-host autopep8 pylint

It appears that the top two results are what we want, thus install them:

$ pip install --trusted-host autopep8 pylint

This will cause pip to download the packages from PyPI, extract them, check their dependencies and install those as needed, then install the requested packages.
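To confirm that an installation succeeded, you can query the installed version from Python. This is a small sketch using the standard-library importlib.metadata module (Python 3.8 or newer); the helper name installed_version is our own choice.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report on the two packages installed in the text.
for name in ("autopep8", "pylint"):
    print(name, "->", installed_version(name) or "not installed")
```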



Install guizero with the following command:

sudo pip install guizero

For a comprehensive tutorial on guizero, click here.


You can install Kivy on macOS as follows:

brew install pkg-config sdl2 sdl2_image sdl2_ttf sdl2_mixer gstreamer
pip install -U Cython
pip install kivy
pip install pygame

A hello world program for kivy is included in the cloudmesh.robot repository.

To run the program, please download it or execute it in cloudmesh.robot as follows:

cd cloudmesh.robot/projects/kivy

To create stand-alone packages with kivy, please see:


Formatting and Checking Python Code

First, get the bad code:

$ wget --no-check-certificate -O

Examine the code:

$ emacs

As you can see, this is very dense and hard to read. Cleaning it up by hand would be a time-consuming and error-prone process. Luckily, this is a common problem so there exist a couple of packages to help in this situation.

Using autopep8

We can now run the bad code through autopep8 to fix formatting problems:

$ autopep8 >

Let us look at the result. This is considerably better than before. It is easy to tell what the example1 and example2 functions are doing.

It is a good idea to develop a habit of using autopep8 in your Python development workflow. For instance: use autopep8 to check a file, and if it passes, make any changes in place using the -i flag:

$ autopep8    # check the output to see if it passes
$ autopep8 -i # update in place

If you use PyCharm you can use a similar function by pressing Inspect Code.

Writing Python 3 Compatible Code

To write python 2 and 3 compatible code we recommend that you take a look at:

Using Python on FutureSystems

This is only important if you use FutureSystems resources.

To use Python you must log in to your FutureSystems account. Then at the shell prompt execute the following command:

$ module load python

This will make the python and virtualenv commands available to you.

The details of what the module load command does are described in the future lesson modules.



The Python Package Index (PyPI) is a large repository of software for the Python programming language. The nice thing about PyPI is that many packages can be installed with the program pip.

To do so you have to locate the <package_name>, for example with the search function on PyPI, and say on the command line:

$ pip install <package_name>

where package_name is the string name of the package. An example would be the package called cloudmesh_client which you can install with:

$ pip install cloudmesh_client

If all goes well the package will be installed.

Alternative Installations

The basic installation of python is provided by However, others claim to have alternative environments that allow you to install python. This includes

Typically they include not only the python compiler but also several useful packages. It is fine to use such environments for the class, but it should be noted that in both cases not every python library may be available for install in the given environment. For example, if you need to use cloudmesh client, it may not be available as conda or Canopy package. This is also the case for many other cloud-related and useful python libraries. Hence, we do recommend that if you are new to python to use the distribution from, and use pip and virtualenv.

Additionally, some Python versions have platform-specific libraries or dependencies, for example Cocoa libraries, .NET, or other frameworks. For the assignments and the projects, such platform-dependent libraries are not to be used.

If however, you can write a platform-independent code that works on Linux, macOS, and Windows while using the version but develop it with any of the other tools that are just fine. However, it is up to you to guarantee that this independence is maintained and implemented. You do have to write requirements.txt files that will install the necessary python libraries in a platform-independent fashion. The homework assignment PRG1 has even a requirement to do so.

In order to provide platform independence we have given in the class a minimal python version that we have tested with hundreds of students: If you use any other version, that is your decision. Additionally, some students not only use but have used iPython which is fine too. However, this class is not only about python, but also about how to have your code run on any platform. The homework is designed so that you can identify a setup that works for you.

However, we have concerns if you, for example, wanted to use chameleon cloud, which we require you to access with cloudmesh. cloudmesh is not available as conda, canopy, or other framework packages. Cloudmesh client is available from PyPI, which is standard and should be supported by the frameworks. We have not tested cloudmesh on any other python version than which is the open-source community standard. None of the other versions are standard.

In fact, we had students over the summer using canopy on their machines and they got confused as they now had multiple python versions and did not know how to switch between them and activate the correct version. Certainly, if you know how to do that, then feel free to use canopy, and if you want to use canopy all this is up to you. However, the homework and project require you to make your program portable to If you know how to do that even if you use canopy, anaconda, or any other python version that is fine. Graders will test your programs on a installation and not canopy, anaconda, ironpython while using virtualenv. It is obvious why. If you do not know that answer you may want to think about that every time they test a program they need to do a new virtualenv and run vanilla python in it. If we were to run two installs in the same system, this will not work as we do not know if one student will cause a side effect for another. Thus we as instructors do not just have to look at your code but code of hundreds of students with different setups. This is a non-scalable solution as every time we test out code from a student we would have to wipe out the OS, install it new, install a new version of whatever python you have elected, become familiar with that version, and so on and on. This is the reason why the open-source community is using We follow best practices. Using other versions is not a community best practice, but may work for an individual.

We have however in regards to using other python versions additional bonus projects such as

  • deploy, run, and document cloudmesh on ironpython
  • deploy, run, and document cloudmesh on anaconda, develop a script to generate a conda package from GitHub
  • deploy, run, and document cloudmesh on canopy, develop a script to generate a conda package from GitHub
  • other documentation that would be useful


If you are unfamiliar with programming in Python, we also refer you to some of the numerous online resources. You may wish to start with Learn Python or the book Learn Python the Hard Way. Other options include Tutorials Point or Code Academy, and the Python wiki page contains a long list of references for learning as well. Additional resources include:

A very long list of useful information is also available from

This list may be useful as it also contains links to data visualization and manipulation libraries, and AI tools and libraries. Please note that for this class you can reuse such libraries if not otherwise stated.

Jupyter Notebook Tutorials

A Short Introduction to Jupyter Notebooks and NumPy To view the notebook, open this link in a background tab and copy and paste the following link in the URL input area Then hit Go.



Write a python program called that accepts an integer n from the command line. Pass this integer to a function called iterate.

The iterate function should then iterate from 1 to n. If the i-th number is a multiple of three, print multiple of 3, if a multiple of 5 print multiple of 5, if a multiple of both print multiple of 3 and 5, else print the value.


  1. Create a pyenv or virtualenv ~/ENV
  2. Modify your ~/.bashrc shell file to activate your environment upon login.
  3. Install the docopt python package using pip
  4. Write a program that uses docopt to define a command line program. Hint: modify the iterate program.
  5. Demonstrate the program works.

2 - Data Management

Gregor von Laszewski (

When dealing with big data, we may be dealing with data not in one format but in many different formats. It is important that you master such formats and seamlessly integrate them in your analysis. Thus we provide some simple examples of which data formats exist and how to use them.



Python pickle allows you to save data in a Python-native format into a file that can later be read in by other programs. However, the data format may not be portable among different Python versions, thus the format is often not suitable for storing information. Instead we recommend using either json or yaml for standard data.

import pickle

flavor = {
    "small": 100,
    "medium": 1000,
    "large": 10000
}

# the with statement closes the file once the data is written
with open("data.p", "wb") as f:
    pickle.dump(flavor, f)

To read it back in use

flavor = pickle.load( open( "data.p", "rb" ) )
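The two steps can be combined into a round trip. The sketch below writes to a temporary directory so that it leaves no files behind; the use of tempfile is our addition for illustration only.

```python
import pickle
import tempfile
from pathlib import Path

flavor = {"small": 100, "medium": 1000, "large": 10000}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.p"

    # The with statement closes the file even if dump() raises.
    with open(path, "wb") as f:
        pickle.dump(flavor, f)

    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object is an equal, independent copy of the original.
print(restored == flavor)
```

Keep in mind that loading a pickle can execute arbitrary code, so only unpickle files that come from a source you trust.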

Text Files

To read text files into a variable called content you can use

content = open('filename.txt', 'r').read()

You can also use the following code while using the convenient with statement

with open('filename.txt','r') as file:
    content = file.read()

To split up the lines of the file into an array you can do

with open('filename.txt','r') as file:
    lines = file.read().splitlines()

This can also be done with the built-in readlines function

lines = open('filename.txt','r').readlines()

In case the file is too big you will want to read the file line by line:

with open('filename.txt','r') as file:
    # iterating over the file object yields one line at a time
    for line in file:
        print(line)

CSV Files

Often data is contained in comma separated values (CSV) within a file. To read such files you can use the csv package.

import csv
# newline='' lets the csv module handle line endings itself (Python 3)
with open('data.csv', newline='') as f:
    contents = csv.reader(f)
    for row in contents:
        print(row)

Using pandas you can read them as follows.

import pandas as pd
df = pd.read_csv("example.csv")

There are many other modules and libraries that include CSV read functions. In case you need to split a single line by comma, you may also use the split function. However, remember that it will split at every comma, including those contained in quotes. So this method, although it looks convenient at first, has limitations.
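The limitation of split can be seen on a single line that contains a quoted comma. The sample line below is made up for illustration.

```python
import csv
import io

line = 'Prius,"Toyota, Japan",120'

# Naive splitting breaks the quoted field into two pieces.
naive = line.split(",")

# csv.reader understands the quoting rules and keeps the field intact.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The naive version yields four fields with stray quote characters, while the csv module returns the three intended fields.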

Excel Spreadsheets

Pandas contains a method to read Excel files:

import pandas as pd
filename = 'data.xlsx'
data = pd.ExcelFile(filename)
df = data.parse('Sheet1')


YAML is a very important format as it allows you to easily structure data in hierarchical fields. It is frequently used to coordinate programs, serving as the specification for configuration files but also for data files. To read in a yaml file the following code can be used

import yaml
with open('data.yaml', 'r') as f:
    # safe_load only builds plain Python objects such as dicts and lists
    content = yaml.safe_load(f)

The nice part is that this code can also be used to verify if a file is valid yaml. To write data out we can use

with open('data.yml', 'w') as f:
    yaml.dump(data, f, default_flow_style=False)

The flow style set to false formats the data in a nice readable fashion with indentations.


import json
with open('strings.json') as f:
    content = json.load(f)
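Writing JSON works symmetrically with json.dump. The following round trip is a small sketch; the file name flavor.json and the temporary directory are our own choices for the illustration.

```python
import json
import tempfile
from pathlib import Path

flavor = {"small": 100, "medium": 1000, "large": 10000}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "flavor.json"

    # indent=2 produces a human-readable, indented file.
    with open(path, "w") as f:
        json.dump(flavor, f, indent=2)

    with open(path) as f:
        content = json.load(f)

print(content["medium"])
```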


XML format is extensively used to transport data across the web. It has a hierarchical data format, and can be represented in the form of a tree.

Sample XML data looks like:

        <item name="item-1"></item>
        <item name="item-2"></item>
        <item name="item-3"></item>

Python provides the ElementTree XML API to parse and create XML data.

Importing XML data from a file:

import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()

Reading XML data from a string directly:

root = ET.fromstring(data_as_string)

Iterating over child nodes in a root:

for child in root:
    print(child.tag, child.attrib)

Modifying XML data using ElementTree:

  • Modifying text within a tag of an element using .text method:

    tag.text = new_data
  • Adding/modifying an attribute using .set() method:

    tag.set('key', 'value')
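The operations above can be combined in one short sketch. The enclosing data and items elements in the sample string are assumed for the illustration, since the sample shown earlier lists only the item elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the <data> and <items> wrappers are assumed.
data_as_string = """
<data>
    <items>
        <item name="item-1"></item>
        <item name="item-2"></item>
    </items>
</data>
"""

root = ET.fromstring(data_as_string)

# iter() walks the whole tree, not only the direct children of root.
for item in root.iter("item"):
    item.text = item.get("name").upper()   # modify the text of the element
    item.set("checked", "yes")             # add a new attribute

names = [item.text for item in root.iter("item")]
print(names)
print(ET.tostring(root, encoding="unicode"))
```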

Other Python modules used for parsing XML data include


To read RDF files you will need to install RDFlib with

$ pip install rdflib

This will then allow you to read RDF files

from rdflib.graph import Graph
g = Graph()
g.parse("filename.rdf", format="format")
for entry in g:
    print(entry)

Good examples on using RDF are provided on the RDFlib Web page at

From the Web page we showcase also how to directly process RDF data from the Web

import rdflib

for s, p, o in g:
    print(s, p, o)


The Portable Document Format (PDF) has been made available by Adobe Inc. royalty free. This has enabled PDF to become a format adopted worldwide, which was also standardized in 2008 (ISO/IEC 32000-1:2008). A lot of research is published in papers, making PDF one of the de facto standards for publishing. However, PDF is difficult to parse and is focused on high-quality output instead of data representation. Nevertheless, tools to manipulate PDF exist:

PDFMiner allows the simple translation of PDF into text that can then be further mined. The manual page helps to demonstrate some examples parses pdf documents and identifies some structural elements that can then be further processed.

If you know about other tools, let us know.


A very powerful library to parse HTML Web pages is provided with

More details about it are provided in the documentation page

Beautiful Soup is a python library to parse, process and edit HTML documents.

To install Beautiful Soup, use pip command as follows:

$ pip install beautifulsoup4

In order to process HTML documents, a parser is required. Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers like the lxml parser which is commonly used [@www-beautifulsoup].

The following command can be used to install the lxml parser:

$ pip install lxml

To begin with, we import the package and instantiate an object as follows for a html document html_handle:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_handle, 'lxml')

Now, we will discuss a few functions, attributes and methods of Beautiful Soup.

prettify function

The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each HTML/XML tag and string. It is analogous to the pprint() function. The object created above can be viewed by printing the prettified version of the document as follows:

print(soup.prettify())

tag Object

A tag object refers to tags in the HTML document. It is possible to go down to the inner levels of the DOM tree. To access a tag div under the tag body, it can be done as follows:

body_div = soup.body.div

The attrs attribute of the tag object returns a dictionary of all the defined attributes of the HTML tag as keys.

has_attr() method

To check if a tag object has a specific attribute, has_attr() method can be used.

if body_div.has_attr('p'):
    print('The value of \'p\' attribute is:', body_div['p'])

tag object attributes

  • name - This attribute returns the name of the tag selected.
  • attrs - This attribute returns a dictionary of all the defined attributes of the HTML tag as keys.
  • contents - This attribute returns a list of contents enclosed within the HTML tag
  • string - This attribute returns the text enclosed within the HTML tag. It returns None if there are multiple children
  • strings - This overcomes the limitation of string and returns a generator of all strings enclosed within the given tag

The following code showcases the usage of the attributes discussed above:

body_tag = soup.body

print('Name of the tag:', body_tag.name)

attrs = body_tag.attrs
print('The attributes defined for body tag are:', attrs)

print('The contents of \'body\' tag are:\n', body_tag.contents)

print('The string value enclosed in \'body\' tag is:', body_tag.string)

for s in body_tag.strings:
    print(repr(s))

Searching the Tree

  • find() function takes a filter expression as argument and returns the first match found
  • find_all() function returns a list of all the matching elements

search_elem = soup.find('a')

search_elems = soup.find_all("a", class_="sample")

  • select() function can be used to search the tree using CSS selectors

# Select `a` tag with class `sample`
a_tag_elems = soup.select('a.sample')




Often we need to protect the information stored in a file. This is achieved with encryption. There are many methods of supporting encryption, and even if a file is encrypted it may be a target of attacks. Thus it is not only important to encrypt data that you do not want others to see but also to make sure that the system on which the data is hosted is secure. This is especially important for big data, which can have a large effect if it gets into the wrong hands.

To illustrate one non-trivial type of encryption, we have chosen to demonstrate how to encrypt a file with an ssh key. In case you have openssl installed on your system, this can be achieved as follows.

    #! /bin/sh

    # Step 1. Creating a file with data
    echo "Big Data is the future." > file.txt

    # Step 2. Create the pem
    openssl rsa -in ~/.ssh/id_rsa -pubout  > ~/.ssh/

    # Step 3. look at the pem file to illustrate how it looks like (optional)
    cat ~/.ssh/

    # Step 4. encrypt the file into secret.txt
    openssl rsautl -encrypt -pubin -inkey ~/.ssh/ -in file.txt -out secret.txt

    # Step 5. decrypt the file and print the contents to stdout
    openssl rsautl -decrypt -inkey ~/.ssh/id_rsa -in secret.txt

Most important here are Step 4, which encrypts the file, and Step 5, which decrypts the file. Using the Python os module it is straightforward to implement this. However, we provide in cloudmesh a convenient class that makes the use in Python very simple.

from cloudmesh.common.ssh.encrypt import EncryptFile

e = EncryptFile('file.txt', 'secret.txt')

In our class we initialize it with the locations of the file that is to be encrypted and decrypted. To initiate that action just call the methods encrypt and decrypt.
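Steps 4 and 5 of the shell script can also be driven from plain Python. The following is only a sketch under stated assumptions: it uses the standard-library subprocess module rather than os, the function names are hypothetical, and the file name public.pem stands in for the pem file created in Step 2. The commands are assembled but not executed here.

```python
import subprocess


def encrypt_command(public_pem, source, destination):
    """Build the argument list for Step 4 (encrypt with the public key)."""
    return ["openssl", "rsautl", "-encrypt", "-pubin",
            "-inkey", public_pem, "-in", source, "-out", destination]


def decrypt_command(private_key, source):
    """Build the argument list for Step 5 (decrypt to stdout)."""
    return ["openssl", "rsautl", "-decrypt",
            "-inkey", private_key, "-in", source]


def run(command):
    """Execute a command and return its stdout; raises if openssl fails."""
    return subprocess.run(command, check=True, capture_output=True).stdout


# Only assemble and display the command; call run(cmd) to execute it.
cmd = encrypt_command("public.pem", "file.txt", "secret.txt")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.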

Database Access





Test the shell script to replicate how this example works


Test the cloudmesh encryption class


What other encryption methods exist? Can you provide an example and contribute to the section?


What is the issue with encryption that makes it challenging for Big Data?


Given a test dataset with many text files, how long will it take to encrypt and decrypt them on various machines? Write a benchmark that you test. Develop this benchmark as a group, and test the time it takes to execute it on a variety of platforms.

3 - Plotting with matplotlib

Gregor von Laszewski (

A brief overview of plotting with matplotlib along with examples is provided. First, matplotlib must be installed, which can be accomplished with pip install as follows:

$ pip install matplotlib

We will start by plotting a simple line graph using built-in NumPy functions for sine and cosine. The first step is to import the proper libraries, shown next.

import numpy as np
import matplotlib.pyplot as plt

Next, we will define the values for the x-axis. We do this with the linspace function in NumPy. The first two parameters are the starting and ending points; these must be scalars. The third parameter is optional and defines the number of samples to be generated between the starting and ending points; this value must be an integer. Additional parameters for the linspace utility can be found here:

x = np.linspace(-np.pi, np.pi, 16)
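A quick check of what linspace produces can help to avoid off-by-one surprises. This sketch also uses the endpoint parameter, one of the additional parameters of np.linspace, which is not used elsewhere in this section.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 16)

# Both endpoints are included, so 16 samples give 15 equal intervals.
step = x[1] - x[0]
print(len(x), x[0], x[-1])
print(np.isclose(step, 2 * np.pi / 15))

# endpoint=False drops the final point and divides the range by 16 instead.
y = np.linspace(-np.pi, np.pi, 16, endpoint=False)
print(np.isclose(y[1] - y[0], 2 * np.pi / 16))
```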

Now we will use the sine and cosine functions to generate y values. For this we will use the values of x as the argument of both our sine and cosine functions, i.e. $\cos(x)$.

cos = np.cos(x)
sin = np.sin(x)

You can display the values of the three parameters we have defined by typing them in a python shell.

array([-3.14159265, -2.72271363, -2.30383461, -1.88495559, -1.46607657,
       -1.04719755, -0.62831853, -0.20943951,  0.20943951,  0.62831853,
        1.04719755,  1.46607657,  1.88495559,  2.30383461,  2.72271363,
        3.14159265])

Having defined x and y values, we can generate a line plot. Since we imported matplotlib.pyplot as plt, we simply use plt.plot.

plt.plot(x, cos)

We can display the plot using plt.show(), which will pop up a figure displaying the plot defined.

Additionally, we can add the sine line to our line graph by entering the following.

plt.plot(x, sin)

Invoking plt.show() now will show a figure with both sine and cosine lines displayed. Now that we have a figure generated, it would be useful to label the x and y-axis and provide a title. This is done by the following three commands:

plt.xlabel("X - label (units)")
plt.ylabel("Y - label (units)")
plt.title("A clever Title for your Figure")

Along with axis labels and a title, another useful figure feature may be a legend. To create a legend you must first designate a label for the line; this label will be what shows up in the legend. The label is defined in the initial plt.plot(x,y) call. Next is an example.

plt.plot(x,cos, label="cosine")

Then in order to display the legend, the following command is issued:

plt.legend(loc='upper right')

The location is specified by using upper or lower and left or right. Naturally, all these commands can be combined and put in a file with the .py extension and run from the command line.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 16)
cos = np.cos(x)
sin = np.sin(x)
plt.plot(x,cos, label="cosine")
plt.plot(x,sin, label="sine")

plt.xlabel("X - label (units)")
plt.ylabel("Y - label (units)")
plt.title("A clever Title for your Figure")

plt.legend(loc='upper right')
plt.show()


An example of a bar chart is presented next using data from the fast cars table (T:fast-cars).

import matplotlib.pyplot as plt

x = ['Toyota Prius',
     'Tesla Roadster',
     'Bugatti Veyron',
     'Honda Civic',
     'Lamborghini Aventador']
horse_power = [120, 288, 1200, 158, 695]

x_pos = [i for i, _ in enumerate(x)]

plt.bar(x_pos, horse_power, color='green')
plt.xlabel("Car Model")
plt.ylabel("Horse Power (Hp)")
plt.title("Horse Power for Selected Cars")

plt.xticks(x_pos, x)
plt.show()

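You can customize plots further by using plt.style.use() in Python 3. If you provide the following command inside a Python command shell you will see a list of available styles.

print(plt.style.available)

An example of using a predefined style is shown next. Note that in recent matplotlib releases this style may be listed under the name 'seaborn-v0_8' instead.

plt.style.use('seaborn')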

Up to this point, we have only showcased how to display figures through Python output. However, web browsers are a popular way to display figures. One example is Bokeh. The following lines can be entered in a Python shell and the figure is output to a browser.

from bokeh.io import show
from bokeh.plotting import figure

x_values = [1, 2, 3, 4, 5]
y_values = [6, 7, 2, 3, 6]

p = figure()
# the glyph call was lost in the original listing; scatter markers are an assumption
p.scatter(x=x_values, y=y_values)
show(p)

4 - DocOpts

Gregor von Laszewski (

When we want to design command line arguments for Python programs we have many options. However, as our approach is to create documentation first, docopt also provides a good approach for Python. The code for it is located at

It can be installed with

$ pip install docopt

Sample programs are located at

A sample program using docopt for our purposes looks as follows

"""Cloudmesh VM management

Usage:
  cm-go vm start NAME [--cloud=CLOUD]
  cm-go vm stop NAME [--cloud=CLOUD]
  cm-go set --cloud=CLOUD
  cm-go -h | --help
  cm-go --version

Options:
  -h --help     Show this screen.
  --version     Show version.
  --cloud=CLOUD  The name of the cloud.
  --moored      Moored (anchored) mine.
  --drifting    Drifting mine.

Arguments:
  NAME     The name of the VM
"""
from docopt import docopt

if __name__ == '__main__':
    # docopt parses sys.argv against the Usage and Options sections above
    arguments = docopt(__doc__, version='1.0.0rc2')
    print(arguments)

Another good feature of using docopts is that we can use the same verbal description in other programming languages as showcased in this book.

5 - OpenCV

Gregor von Laszewski (

Learning Objectives

  • Provide some simple calculations so we can test cloud services.
  • Showcase some elementary OpenCV functions
  • Show an environmental image analysis application using Secchi disks

OpenCV (Open Source Computer Vision Library) is a library of thousands of algorithms for various applications in computer vision and machine learning. It has C++, C, Python, Java, and MATLAB interfaces and supports Windows, Linux, Android, and Mac OS. In this section, we will explain the basic features of this library, including the implementation of a simple example.


OpenCV has many functions for image and video processing. The pipeline starts with reading the images, low-level operations on pixel values, preprocessing e.g. denoising, and then multiple steps of higher-level operations which vary depending on the application. OpenCV covers the whole pipeline, especially providing a large set of library functions for high-level operations. A simpler library for image processing in Python is Scipy’s multi-dimensional image processing package (scipy.ndimage).


OpenCV for Python can be installed on Linux in multiple ways, namely PyPI (Python Package Index), the Linux package manager (apt-get for Ubuntu), the Conda package manager, and also building from source. We recommend using PyPI. Here is the command that you need to run:

$ pip install opencv-python

This was tested on Ubuntu 16.04 with a fresh Python 3.6 virtual environment. In order to test, import the module in Python command line:

import cv2

If it does not raise an error, it is installed correctly. Otherwise, try to solve the error.

For installation on Windows, see:

Note that building from source can take a long time and may not be feasible for deploying to limited platforms such as Raspberry Pi.

A Simple Example

In this example, an image is loaded. A simple processing is performed, and the result is written to a new image.

Loading an image

%matplotlib inline
import cv2

img = cv2.imread('images/opencv/4.2.01.tiff')

The image was downloaded from USC standard database:

Displaying the image

The image is stored in a NumPy array. Each pixel is represented by 3 values. Note that cv2.imread returns the channels in (B,G,R) order rather than (R,G,B). This gives you access to manipulate the image at the level of single pixels.
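The array layout can be explored without an image file. The sketch below uses a small synthetic array as a stand-in for the result of cv2.imread; it relies only on NumPy.

```python
import numpy as np

# Synthetic 2x2 image standing in for the array returned by cv2.imread:
# the shape is (rows, columns, channels) with dtype uint8.
img = np.zeros((2, 2, 3), dtype=np.uint8)

# Set the top-left pixel to pure blue in OpenCV's B, G, R channel order.
img[0, 0] = (255, 0, 0)

# Reversing the last axis converts B, G, R to R, G, B, the order
# that Matplotlib's imshow expects.
rgb = img[:, :, ::-1]

print(img.shape)
print(img[0, 0])
print(rgb[0, 0])
```

If the colors of a photo look wrong when displayed with Matplotlib, a missing channel reversal of this kind is a likely cause.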

You can display the image using imshow function:


or you can use Matplotlib. If you have not installed Matplotlib before, install it using:

$ pip install matplotlib

Now you can use:

import matplotlib.pyplot as plt

which results in Figure 1

Figure 1: Image display


Scaling and Rotation

Scaling (resizing) the image relative to different axes

res = cv2.resize(img,

which results in Figure 2

Figure 2: Scaling and rotation


Rotation of the image by an angle of t degrees

rows,cols,_ = img.shape
t = 45
M = cv2.getRotationMatrix2D((cols/2,rows/2),t,1)
dst = cv2.warpAffine(img,M,(cols,rows))


which results in Figure 3

Figure 3: image
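To see what the matrix M contains, the following sketch rebuilds a 2x3 rotation matrix with NumPy. It assumes the formula given in the OpenCV documentation for getRotationMatrix2D; the function name rotation_matrix and the sample center are our own.

```python
import numpy as np


def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 affine matrix for a rotation about `center`, given as (x, y)."""
    cx, cy = center
    alpha = scale * np.cos(np.radians(angle_deg))
    beta = scale * np.sin(np.radians(angle_deg))
    # The third column shifts the image so that `center` stays in place.
    return np.array([
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])


M = rotation_matrix((50, 40), 45)

# The rotation center is a fixed point of the transform.
center = M @ np.array([50, 40, 1])
print(center)
```

cv2.warpAffine then applies such a matrix to every pixel coordinate of the image.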



img2 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(img2, cmap='gray')

which results in Figure 4

Figure 4: Gray scaling

Image Thresholding

ret,thresh = cv2.threshold(img2,127,255,cv2.THRESH_BINARY)
plt.subplot(1,2,1), plt.imshow(img2, cmap='gray')
plt.subplot(1,2,2), plt.imshow(thresh, cmap='gray')

which results in Figure 5

Figure 5: Image Thresholding
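The rule behind cv2.THRESH_BINARY is simple: pixels above the threshold become the maximum value and all others become zero. A NumPy sketch of the same rule follows; the function name threshold_binary and the sample values are made up for illustration.

```python
import numpy as np


def threshold_binary(gray, thresh, maxval=255):
    """Pixels strictly above `thresh` become `maxval`; all others become 0."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)


gray = np.array([[0, 127, 128],
                 [200, 50, 255]], dtype=np.uint8)

binary = threshold_binary(gray, 127)
print(binary)
```

Note that a pixel exactly at the threshold (127 here) is set to zero, since the comparison is strict.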


Edge Detection

Edge detection using the Canny edge detection algorithm.

edges = cv2.Canny(img2,100,200)

plt.subplot(121),plt.imshow(img2,cmap = 'gray')
plt.subplot(122),plt.imshow(edges,cmap = 'gray')

which results in Figure 6

Figure 6: Edge detection


Additional Features

OpenCV has implementations of many machine learning techniques, such as KMeans and Support Vector Machines, that can be put to use with only a few lines of code. It also has functions especially for video analysis, feature detection, object recognition, and many more. You can find out more about them on their website.

OpenCV was initially developed for C++ and still has a focus on that language, but it is nevertheless one of the most valuable image processing libraries in Python.

6 - Secchi Disk

Gregor von Laszewski (

We are developing an autonomous robot boat that you can help develop within this class. The robot boat measures turbidity, or water clarity. Traditionally this has been done with a Secchi disk. The use of the Secchi disk is as follows:

  1. Lower the Secchi disk into the water.
  2. Measure the point when you can no longer see it.
  3. Record the depth at various levels and plot in a geographical 3D map.

One of the things we can do is take a video of the measurement instead of having a human record it. Then we can analyze the video automatically to see how deep the disk was lowered. This is a classical image analysis problem. You are encouraged to identify algorithms that can determine the depth. The simplest seems to be to compute a histogram at a variety of depth steps and measure when the histogram no longer changes significantly. The depth of that image will be the measurement we look for.
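The histogram idea can be sketched with NumPy alone. Everything in this example is an assumption made for illustration: the frames are synthetic, the function names are hypothetical, and the tolerance value would need to be tuned on real recordings.

```python
import numpy as np


def gray_histogram(image, bins=32):
    """Normalized intensity histogram of a grayscale image (values 0..255)."""
    counts, _ = np.histogram(image, bins=bins, range=(0, 256))
    return counts / counts.sum()


def first_stable_frame(frames, tolerance=0.05):
    """Index of the first frame whose histogram barely differs from the
    previous one, or None if every consecutive pair differs noticeably."""
    previous = gray_histogram(frames[0])
    for index in range(1, len(frames)):
        current = gray_histogram(frames[index])
        # The L1 distance between normalized histograms lies in [0, 2].
        if np.abs(current - previous).sum() < tolerance:
            return index
        previous = current
    return None


# Synthetic frames: a bright disk on dark water that fades with each step
# and has vanished in the last two frames.
frames = []
for brightness in (250, 180, 110, 40, 40):
    frame = np.full((64, 64), 40, dtype=np.uint8)
    frame[16:48, 16:48] = brightness
    frames.append(frame)

stable = first_stable_frame(frames)
print(stable)
```

With the depth recorded for each frame, the returned index can be mapped to the depth at which the disk is no longer distinguishable from the water.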

Thus if we analyze the images we need to look at the image and identify the numbers on the measuring tape, as well as the visibility of the disk.

To show what such a disk looks like, we refer to the image showcasing different Secchi disks. For our purpose the black-white contrast Secchi disk works well. See Figure 1.

Figure 1: Secchi disk types. A marine style on the left and the freshwater version on the right (source: Wikipedia).

More information about Secchi Disk can be found at:

We have included next a couple of examples using some useful OpenCV methods. Surprisingly, edge detection, which first comes to mind for identifying whether we can still see the disk, seems too complicated to use for analysis. At this time we believe the histogram will be sufficient.

Please inspect our examples.

Setup for OSX

First, let us set up the OpenCV environment for OSX. Naturally, you will have to update the versions based on your version of Python. When we tried the install of OpenCV on macOS, the setup was slightly more complex than for other packages. This may have changed by now, and if you have improved instructions, please let us know. However, we do not want to install it via Anaconda, for the obvious reason that Anaconda installs too many other things.

import os, sys
from os.path import expanduser
home = expanduser("~")
sys.path.append(home + '/.pyenv/versions/OPENCV/lib/python3.6/site-packages/')
import cv2
! pip install numpy > tmp.log
! pip install matplotlib >> tmp.log
%matplotlib inline

Step 1: Record the video

Record the video on the robot

We have done this for you and will provide you with images and videos if you are interested in analyzing them. See Figure 2

Step 2: Analyse the images from the Video

For now, we just selected 4 images from the video

import cv2
import matplotlib.pyplot as plt

img1 = cv2.imread('secchi/secchi1.png')
img2 = cv2.imread('secchi/secchi2.png')
img3 = cv2.imread('secchi/secchi3.png')
img4 = cv2.imread('secchi/secchi4.png')

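figures = []
fig = plt.figure(figsize=(18, 16))
# reconstructed: a 4x3 grid with one row of three panels per image
for i in range(1,13):
  figures.append(fig.add_subplot(4,3,i))
count = 0
for img in [img1,img2,img3,img4]:
  # first column: the image itself
  figures[count].imshow(img)

  # second column: one histogram line per color channel
  color = ('b','g','r')
  for i,col in enumerate(color):
      histr = cv2.calcHist([img],[i],None,[256],[0,256])
      figures[count+1].plot(histr,color = col)

  # third column: histogram over all pixel values
  figures[count+2].hist(img.ravel(), bins=256, range=(0, 256))

  count += 3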

print("First column = image of Secchi disk")
print("Second column = histogram of colors in image")
print("Third column = histogram of all values")

Figure 2: Histogram


Image Thresholding

See Figure 3, Figure 4, Figure 5, Figure 6

def threshold(img):
  ret,thresh = cv2.threshold(img,150,255,cv2.THRESH_BINARY)
  plt.subplot(1,2,1), plt.imshow(img, cmap='gray')
  plt.subplot(1,2,2), plt.imshow(thresh, cmap='gray')


Figure 3: Threshold 1, threshold(img1)

Figure 4: Threshold 2, threshold(img2)

Figure 5: Threshold 3, threshold(img3)

Figure 6: Threshold 4, threshold(img4)

Edge Detection

See Figure 7, Figure 8, Figure 9, Figure 10, Figure 11. Edge detection using the Canny edge detection algorithm.

def find_edge(img):
  edges = cv2.Canny(img,50,200)
  plt.subplot(121),plt.imshow(img,cmap = 'gray')
  plt.subplot(122),plt.imshow(edges,cmap = 'gray')


Figure 7: Edge Detection 1, find_edge(img1)

Figure 8: Edge Detection 2, find_edge(img2)

Figure 9: Edge Detection 3, find_edge(img3)

Figure 10: Edge Detection 4, find_edge(img4)

Black and white

bw1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
plt.imshow(bw1, cmap='gray')

Figure 11: Black and white conversion