Tuesday, September 5, 2017

Machine Learning Python: Wavelet Spectrum examples


Another test on my MacBook Pro:  The wavelet transform example in Megies et al (2011).

You may need to install the Machine Learning Python package.  The command

conda install mlpy

did not work for me, so I search the internet for a method to install mlpy

The mlpy package requires the GNU Scientific Library (GSL), which was not installed on my laptop, and may not be installed on yours either.  This package can be found on the web.  My method was taken from the book "Practical Data Analysis Cookbook" by Tomasz Drabas.  First, direct your web browser to

http://gnu.mirro.iweb.com/gsl

and download the latest version of GSL, typically called gsl-latest.tar-gz. Currently the latest is V2.4. Move the downloaded file into your Applications directory (on a Mac) -- I dont know the equivalent place on a Windows machine.  Then unpack this gzipped tarfile by double-clicking on it ot else use the command

gzip -c gsl-latest.tar-gz | tar -xv

The unpacked file will produce a folder with the GSL software in it.  In your terminal-app go to this folder and prepare and install GSL

./configure 

make

make install

Hopefully this compilation and installation proceeds without error.  Now download a compressed tarfile with the lastest version of mlpy from this website into your anaconda dirctory, which on my laptop is /Users/jjpark/anaconda.  The current latest version is V3.5.

http://sourceforge.net/projects/mlpy/files

Double-click the tarfile to extract its contents into a subdirectory, or else

gzip -c mlpy-.tar.gz | tar -xv

or, if already uncompressed,

tar -xvf mlpy-.tar

This action produces a subdirectory mlpy-.  cd to this directory in your terminal app and type

python setup.py install

and mlpy will be installed if all goes well.  Python modules that one obtains from sourceforge and other websites typically include a setup.py scriptfile to instruct python how to set the module up for later use.

I discovered coding bugs in the mlpy module while trying to execute the coding examples.  The continuous wavelet transform relies on a script file ./mlpy-3.5.0/mlpy/wavelet/continuous.py that had a mistake in its Line 162:

161     J = floor(dj**-1 * log2((N * dt) / s0))
162     s = empty(J + 1)
   
The python function "floor" from the numpy module returns a floating-point number, but the python function "empty" expects an integer to define how long an array to generate. This combination of mismatched variable types leads to an error message:

'numpy.float64' object cannot be interpreted as an integer

Fixing the code to read, e.g.,

     J = floor(dj**-1 * log2((N * dt) / s0))
     JJ = int(J)
     s = empty(JJ + 1)

*should* solve the problem, but it doesn't.  Why? Because the original continuous.py is loaded into the python libraries during the install-operation, and changing the code after installation doesnt change the installed code.  You must make changes to the script continuous.py in the mlpy-3.5.0 subdirectory, and then re-install the module by executing

python setup.py install

while the terminal app is "in" the ./mlpy-3.5.0 directory.  Note that python will tell you that the execution error resides in a copy of continuous.py that lies deep within a subdirectory tree.  This is NOT the copy that needs to be debugged.  You must debug the copy of continuous.py that sits close within the subdirectory anaconda/mlpy-3.5.0/mlpy/wavelet, and then re-install the mlpy module.

Here is a simple example of a wavelet transform of a randomly-generated array of numbers

import numpy as np
import matplotlib.pyplot as plt
import mlpy.wavelet as wave
x = np.random.sample(512)
scales = wave.autoscales(N=x.shape[0], dt=1, dj=0.025, wf='morlet', p=2)
X = wave.cwt(x=x, dt=1, scales=scales, wf='dog', p=2)
fig = plt.figure(1)
ax1 = plt.subplot(2,1,1)
p1 = ax1.plot(x)
ax1.autoscale_view(tight=True)
ax2 = plt.subplot(2,1,2)
p2 = ax2.imshow(np.abs(X), interpolation='nearest')
plt.show()


Here is the wavelet transform of the data series we downloaded in the first ObsPy exercise

import numpy as np, matplotlib.pyplot as plt
import mlpy 
import mlpy.wavelet as wave
from obspy.core import UTCDateTime
from obspy.clients.arclink import Client
from obspy.signal.filter import lowpass
from obspy.signal.invsim import corn_freq_2_paz, simulate_seismometer
client = Client("webdc.eu", port=18001)
t = UTCDateTime("2009-08-24 00:20:03")
st = client.get_waveforms("BW", "RJOB", "", "EHZ", t, t+30)
omega0 = 2.0
trace = st[0].data
npts = len(trace)
dt = st[0].stats.delta
dj = 0.05
wf = 'morlet'
p=omega0
scl = mlpy.wavelet.autoscales(npts, dt, dj , wf, p) 
spec = mlpy.wavelet.cwt(trace, dt, scl , wf='morlet', p=omega0) 
freq = (omega0 + np.sqrt(2.0 + omega0**2)) / (4*np.pi * scl[1:])
t = np.arange(st[0].stats.npts) / st[0].stats.sampling_rate 
plt.imshow(np.abs(spec), extent=(t[0], t[-1], freq[-1], freq[0])) 
plt.xlabel('Time [s]') 
plt.ylabel('Frequency [Hz]') 
plt.show()

Try these scripts to be sure that your installation of Python and ObsPy is working properly.








Saturday, September 2, 2017

Methods for using Python and ObsPy: The first examples

Methods for using Python and ObsPy: The first examples

You will need to open a terminal window.  I personally use the XQuartz App in MacOS, which allows me X-window graphics for my own applications, but a standard Terminal window should be OK.  Python handles its own graphics.

start with switching to the BASH shell, then invoking Python

bash
python

you get a new prompt with Python, meaning that you have entered the world of Python syntax.  Fear not.  To conclude your Python session, you can CTRL-D or enter the command "exit()" to leave.

It is also possible to invoke the fancier IPython package in a similar way:

bash
ipython

try some simple commands that are easy to verify

q=2
q==2
q==3
q
q+1
b=2*q
print(b)

Then some syntax tricks: multiple commands on a line separated by semicolons

a=1;b=2;c=3
a
b
c==3

command split into two lines with a backslash, but only at an operation or equal sign

xval=3.14
xval
yval=\
3.14
yval
zval= yval +\
xval
zval

You must import modules in order to use them

import numpy

You can import modules and abbreviate their names

import numpy as np

Python syntax requires that you attach any function to the library it comes from.  A simple command like

d=sqrt(2)

will fail.  The square root function sqrt is in numpy, and to use it you need to specify where the function comes from

d=numpy.sqrt(2)
d

This is why abbreviations of Python modules are useful

ee=np.sqrt(3)
ee

dont want 16-digit precision?  Use the round command to limit the answer to 4 or 6 digits:

round(ee,4)

round(numpy.sqrt(2),6)

You can import all the functions in numpy if you like, and never need to use the above syntax. This could cause confusion if two functions in two separate modules have the same name.

import single functions:

from numpy import sqrt
sqrt(6)

to import all numpy functions:

from numpy import *

In IPython, it is possible to clear the memory of the python package, which erases the values of all the variables, with the command

%reset

In regular Python the %reset command wont work.

Unlike high-level languages like Fortran and C, the variables in Python are assigned their variable-type by context. The commands

aval = 1.0
bval = "BigNose"

will create aval as a floating-point variable, and bval as a string variable

L= [1,2,3]

creates an integer list of length 3. Try

L.reverse()
L
L.pop(2)
L

A list can have dissimilar objects, e.g.,

LL = [1,"D",3]

A tuple is a list that behaves more like an array in Fortran or C, with objects that are all the same.

M=(1,2,3)

Integer tuples are used to define arrays of numbers

amat = numpy.zeros((3,5))

sets amat to be a 3 by 5 matrix of zeros, with 3 rows and 5 columns

Python does loops differently from what you are accustomed to.  The loops are declared with the word "for" and a colon. The loops are not closed by a statement, but rather by a change in indentation.  Try

b,c,d=3,4,5
b
c
d

for a in np.arange(c,d,0.1):
   x = (-b + np.sqrt(b**2 + 4*c*a))/(2.0*a)
   print(a,b,c,x)

note that the function arange(x1,x2,x3) steps in equal increments of x3 from x1 to x2

The loop executes differently if the last line is not indented.  The indentation defines the loop.  Nested loops are indicated by deeper indentations. To print the results at the end of the loop rather than within it, add a blank line to close the loop.

for a in np.arange(c,d,0.1):
   x = (-b + np.sqrt(b**2 + 4*c*a))/(2.0*a)

print(a,b,c,x)

Loops and branching commands "while" and "if" and "else" follow similar syntax.  Python requires a colon to tell it where the loop-definition ends and the loop-operations begin

Vector operations can be created in place of loops.  The code

b,c,d=3,4,5
a = np.arange(c,d,0.1)
x = (-b + np.sqrt(b**2 + 4*c*a))/(2.0*a)
print(a)
print(x)
print(a,x)

generates a vector a to hold the output of the arange function.  Placing the vector a into the equation line that follows vectorizes the equation, so that the variable x is also a vector.  The print commands will print vectors as well as scalar variables.

you will want to write scriptfiles with common command sequences, e.g., importing the usual libraries.  A scriptfile in Python conventionally uses the suffix ".py"

A scriptfile program.py can be executed by

python program.py

but the Python package exits after executing the script.  The interactive Python version IPython can execute scriptfiles within the program using the %run command and exit within the Python environment.  You can also generate scriptfiles within IPython with the %edit command. Try

echo import numpy as np >! startup.py
echo import matplotlib.pyplot as plt >> startup.py
ipython
%run startup.py

These lines of code generate a 2-line startup script, start IPython, and execute the script.  At conclusion you will still be operating in the IPython environment. If you then execute

%edit startup.py

the IPython terminal becomes the window for the Unix vi editor.  Honestly, you might prefer to create your scriptfiles in a fancier editor, because vi is old and klunky.

In scriptfiles you can add comments, which is VERY useful.  Lines that start with a hashtag # are comments.  If a line contains """ - three doublequotes, then it declares the next group of lines to be comments, until another line with """ is found. NOTE: the MacOS TextEdit app does not create double-quotes that Python can read properly.  I use an X-window editor named "nedit" that can be downloaded from the fink archive - but you may wish to develop your own kluge for this problem.  The nedit editor is almost as old as the vi editor.

Beyreuther et al (2010) has an ObsPy coding example that reads seismic data within one time interval from one component at one station, accessed via the webdc data-aggregator.  The code in the paper must be modified to work with the 2017 implementation of ObsPy, because the syntax and function names have been updated over time.  Here is the example that generates Figure 1, and that works for my MacBook Pro.

# Figure 1 from Beyreuther etal (2010)
# commands are updated for 2017 versions of Python and ObsPy
from obspy.core import UTCDateTime
from obspy.clients.arclink import Client
# old command from Beyreuther et al (2010), function libraries and names have changed
# from obspy.signal import cornFreq2Paz, seisSim, lowpass
from obspy.signal.filter import lowpass
from obspy.signal.invsim import corn_freq_2_paz, simulate_seismometer
import numpy as np, matplotlib.pyplot as plt
#1 Retrieve Data via Arclink -- note that the "host" variable is suppressed
client = Client("webdc.eu", port=18001)
# note that ObsPy will search data archives for time intervals based on UTC time
t = UTCDateTime("2009-08-24 00:20:03")
# The data will be plotted raw and bandpassed to simulate a 1-Hz seismic sensor
one_hertz = corn_freq_2_paz(1.0) # 1Hz instrument
st = client.get_waveforms("BW", "RJOB", "", "EHZ", t, t+30)
#  PAZ means "pole and zeroes" which is one way to define a digital filter
#  zeroes in filter response arise from moving-average filters
#  poles in the filter response arise from auto-regressive filters
paz = client.get_paz("BW", "RJOB", "", "EHZ", t, t+30)
#2 Correct for frequency response of the instrument
# note that st is the stream object that holds the data, but its information is
# partitioned into data and metadata
# the data are expressed in instrument counts
# the metadata are the station parameters, such as location, instrument type and
# instrument response
res = simulate_seismometer(st[0].data.astype("float32"), st[0].stats.sampling_rate, paz, one_hertz)
# Correct for overall sensitivity, nm/s
# this syntax *= means multiply res by the number that follows
# the operator += would add a number to res
res *= 1e9 / paz["sensitivity"]
#3 Apply lowpass at 10Hz
res = lowpass(res, 10, df=st[0].stats.sampling_rate, corners=4)
#4 Plot the seismograms
# sec is the time ordinate, a vector with length equal to the data
sec = np.arange(len(res))/st[0].stats.sampling_rate
plt.subplot(211)
plt.plot(sec,st[0].data, "k")
plt.title("%s %s" % ("RJOB",t))
plt.ylabel("STS-2")
plt.subplot(212)
plt.plot(sec,res, "k")
plt.xlabel("Time [s]")
plt.ylabel("1Hz CornerFrequency")
plt.show()

Give this script a try, maybe by feeding it line by line into python or ipython on your computer.  See if you get the comparison plot that matches Figure 1 of Beyreuther et al (2010)








Friday, September 1, 2017

Useful Websites for ObsPy reference

Useful Websites for ObsPy reference — this is a living document that will be updated as new interesting and useful websites emerge during the semester

The homebase Wiki for ObsPy


There are books for new Python users.  I am using 

A Student’s Guide to Python for Physical Modeling Paperback – September 22, 2015
by Jesse M. Kinder, Philip Nelson  — the paperback version is roughly $20, and it might be on sale at the Yale Bookstore — there will be other Python texts at the Bookstore if Kinder/Nelson is not carried. 

Python has its own homebase website


And there is a cheatsheet for Python commands 


The recommended installation for ObsPy is achieved with the Anaconda “Data Science Platform”


Anaconda advertises itself as a platform for Python, but it also can be used with the R-language and supports the Jupyter notebook environment.  R is an open-source software package used for statistics and typical data analysis.  Jupyter notebooks are a standardized way to set of programming scripts for Python, R and other command-interpreter software packages, in ways that allow you to document your calculations more thoroughly.

Many of the open-source software packages, such as Python and R, can be used in overlapping contexts.  Nearly all of them can be forced to replicate the clear, but laborious, operation flow of an old-style FORTRAN program if you really really want to write all the code.  However, the motivation for using a high-level package is to reduce a complicated, but standardized, workflow into one or a few lines of code. Because the coding is highly compressed, knowing the syntax becomes crucial.

Python is implemented with a swarm of open-source computational modules that sit on the Web, waiting for your script to pull them down and use them.  The most important modules are

SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering


NumPy is the fundamental package for scientific computing with Python.


Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats


Installing Python within Anaconda gives you access to all these packages.  The websites are useful for reference, such as tutorials and example scripts.

The sources for seismic data can be found in several places.  The original ObsPy paper refers to the general WebDC address for seismic data archives from many sources, including the USA-based IRIS data archive.  Maintained by the Deutsches GeoForschungsZentrum (GFZ) at Potsdam, Germany.


The “central repository” of ObsPy applications, listed in Beyreuther et al (2010) is a dead link.  Alternatively, the ObsPy website points to software packages for ObsPy developers at a “conda-smithy repository for obspy”


Any “conda-smithy” websites are beyond the scope of this course!

The paper Megies et al (2011) points to a number of Python and ObsPy websites for reference. 

The Python Standard Library of functions, documented.  If you need a Python function  that handles or replicates a computational process from your past life, look here. 


The IPython package provides a rich architecture for interactive computing using GUI (graphical user interfaces)


Note that the IPython developers split off part of their project to a new “language-agnostic” scripting tool called Jupyter


From the website: Jupyter and the future of IPython
IPython is a growing project, with increasingly language-agnostic components. IPython 3.x was the last monolithic release of IPython, containing the notebook server, qtconsole, etc. As of IPython 4.0, the language-agnostic parts of the project: the notebook format, message protocol, qtconsole, notebook web application, etc. have moved to new projects under the name Jupyter. IPython itself is focused on interactive Python, part of which is providing a Python kernel for Jupyter.

The advertised ease of documentation within Jupyter is a strong motivation for learning it.  We hope to have chances to do so once we are comfortable with Python and ObsPy

The Python Package Index is a repository of software for the Python programming language. There are currently 115962 packages here. 
To contact the PyPI admins, please use the Support or Bug reports links.


Hopefully we wont need to plunge into these user-contributed libraries, aside from those that are in common use by ObsPy practitioners.  A wavelet transform is demonstrated by Megies et al (2011), with software from Machine Learning Python Library


There are some examples of ObsPy processing at 


but most files date from May 2013, so caveat emptor.

We will be working through various examples of ObsPy packages, such as the phase-picking software ObsPyck, as the weeks scroll by.



















 
Link