Friday, June 1, 2018

Preparing IRIS data files for analysis by R, Take 1

Preparing IRIS data files for analysis by R: converting 2-column ascii to CSV files.  Rdseed and the IRIS DMS produce data files in 2-column ASCII format with *.ascii filename suffixes.  These files start with a one-line header, followed by one line per sample giving a UTC time and a datum in instrument counts:

TIMESERIES CU_BCIP_00_LHZ_M, 180001 samples,  1.0 sps, 2016-11-13T10:00:00.000000, TSPAIR, FLOAT, Counts
2016-11-13T10:00:00.000000             3130
2016-11-13T10:00:01.000000             1921


This command keeps only the ascii files larger than a minimum size of NNNN bytes, to avoid data-window fragments.  In the ls -l output, field 5 is the file size and field 9 is the filename:

ls -l *ascii | awk ' $5>NNNN {print $9}' >! infiles
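The same size filter can be sketched without parsing ls -l output, which breaks on filenames containing spaces.  This is only an alternative under assumptions: it uses POSIX sh rather than tcsh, it uses find instead of ls, and the two sample files and the min_bytes value are made up for the demonstration.  Note that find also descends into subdirectories, which ls -l *ascii does not.

```shell
# Sketch: list *ascii files larger than min_bytes using find (POSIX sh).
# The sample files and the threshold below are hypothetical.
tmp=$(mktemp -d)
printf '%s' "0123456789" > "$tmp/big.ascii"     # 10 bytes, kept
printf '%s' "012"        > "$tmp/small.ascii"   # 3 bytes, dropped as a fragment

min_bytes=5
# -size +Nc matches files strictly larger than N bytes; sed strips the leading "./"
infiles=$(cd "$tmp" && find . -name '*ascii' -size +"${min_bytes}"c | sed 's|^\./||')
echo "$infiles"
```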

Prepare a shell command file for using awk to strip out the data column and write a CSV file.  First construct the CSV filenames from the key attributes in the IRIS filename: Station.Location.Channel.csv

# construct CSV filenames from long IRIS filenames

sed 's/\./ /g' infiles >! blob
awk '{print  $1 "."  $2 "."  $3 "."  $4 "."  $5 "."  $6 "."  $7 "."  $8 "."  $9 "."  $10 "."  $11 " "  $7 "."  $8 "."  $9 ".csv" }' blob  >! infiles1 

-> head infiles1
2016.318.10.00.00.0000_CU.ANWB.00.LHZ.M.ascii ANWB.00.LHZ.csv
2016.318.10.00.00.0000_CU.BBGH.00.LHZ.M.ascii BBGH.00.LHZ.csv
2016.318.10.00.00.0000_CU.BCIP.00.LHZ.M.ascii BCIP.00.LHZ.csv
2016.318.10.00.00.0000_CU.GRGR.00.LHZ.M.ascii GRGR.00.LHZ.csv
2016.318.10.00.00.0000_CU.GRTK.00.LHZ.M.ascii GRTK.00.LHZ.csv
2016.318.10.00.00.0000_CU.GTBY.00.LHZ.M.ascii GTBY.00.LHZ.csv
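
The sed and awk steps above can probably be collapsed into one awk pass by splitting on dots directly.  A minimal sketch, assuming POSIX sh and the 11-field filename layout shown in the listing; iris_csv_name is a hypothetical helper name, not part of any IRIS tool.

```shell
# Sketch: build "ascii-name csv-name" pairs in a single awk pass (POSIX sh).
# -F. splits each filename on dots, so $7, $8, $9 are station, location, channel.
iris_csv_name() {
    awk -F. '{ print $0, $7 "." $8 "." $9 ".csv" }'
}

# Example with one filename from the listing above
pairs=$(echo "2016.318.10.00.00.0000_CU.BCIP.00.LHZ.M.ascii" | iris_csv_name)
echo "$pairs"
```

With a real file list this would be run as iris_csv_name < infiles, redirected to infiles1.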

Write the shell script that runs awk once per filename, for LHZ files. 

# because I can't figure out how to add a single quote to the awk output, the script
# for generating CSV data files from LHZ ascii data requires a file edit
awk ' {print "awk NR>1 {print NR-1 \", \" $2} " $1 " >! " $2 "; wc "$2 } ' infiles1 >! blob
te blob &
#save as saveblob
source saveblob

Note that NR>1 skips the header line.  The time ordinate is the record number minus one (NR-1), so the first sample is numbered 1; for 1-sps LHZ data this index counts seconds.

-> head blob
awk NR>1 {print NR-1 ,  $2} 2016.318.10.00.00.0000_CU.ANWB.00.LHZ.M.ascii >! ANWB.00.LHZ.csv; wc ANWB.00.LHZ.csv
awk NR>1 {print NR-1 ,  $2} 2016.318.10.00.00.0000_CU.BBGH.00.LHZ.M.ascii >! BBGH.00.LHZ.csv; wc BBGH.00.LHZ.csv
awk NR>1 {print NR-1 ,  $2} 2016.318.10.00.00.0000_CU.BCIP.00.LHZ.M.ascii >! BCIP.00.LHZ.csv; wc BCIP.00.LHZ.csv

Note that we edited blob --> saveblob by hand, adding the quote marks that the awk program needs.

-> head saveblob
awk 'NR>1 {print NR-1", " $2}' 2016.318.10.00.00.0000_CU.ANWB.00.LHZ.M.ascii >! ANWB.00.LHZ.csv; wc ANWB.00.LHZ.csv
awk 'NR>1 {print NR-1", " $2}' 2016.318.10.00.00.0000_CU.BBGH.00.LHZ.M.ascii >! BBGH.00.LHZ.csv; wc BBGH.00.LHZ.csv
awk 'NR>1 {print NR-1", " $2}' 2016.318.10.00.00.0000_CU.BCIP.00.LHZ.M.ascii >! BCIP.00.LHZ.csv; wc BCIP.00.LHZ.csv
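
One way to avoid the hand edit is to let awk print the single quotes itself with the octal escape \047.  A minimal sketch, assuming POSIX sh, so the generated line uses ">" where the tcsh version above uses ">!"; the input pair is taken from the infiles1 listing.

```shell
# Sketch: emit the fully quoted awk command, with no manual edit afterwards.
# \047 is the octal escape for a single quote inside an awk string;
# \" produces the double quotes around the ", " separator.
line=$(echo "2016.318.10.00.00.0000_CU.ANWB.00.LHZ.M.ascii ANWB.00.LHZ.csv" |
    awk '{ print "awk \047NR>1 {print NR-1\", \" $2}\047 " $1 " > " $2 "; wc " $2 }')
echo "$line"
```

Run against the whole of infiles1, this should produce a file that can be sourced directly in place of saveblob.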

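Write the shell script that runs awk once per filename, for VHZ files.  Note that the time ordinate is converted from seconds to ksec: each time step is 0.01 ksec = 10 seconds.  The file list is rebuilt here with a larger size threshold; infiles1 is assumed to have been regenerated from the new infiles with the same sed and awk steps as above.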

ls -l *ascii | awk ' $5>3700000 {print $9}' >! infiles
# because I can't figure out how to add a single quote to the awk output, the script
# for generating CSV data files from VHZ ascii data requires a file edit
awk ' {print "awk NR>1 {print (NR-1)/100. \", \" $2} " $1 " >! " $2 "; wc "$2 } ' infiles1 >! blob
te blob &
#save as saveblob
source saveblob
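
The LHZ and VHZ conversions differ only in the divisor applied to the record index, so both could be handled by one small function instead of a generated script.  A minimal sketch, assuming POSIX sh; to_csv and its scale argument are hypothetical names, and the sample file reuses the header and data lines shown at the top of this post.

```shell
# Sketch (POSIX sh): convert one 2-column ascii file to CSV.
# scale = 1 gives the LHZ index in seconds; scale = 100 gives the VHZ index in ksec.
to_csv() {
    scale=$1; infile=$2; outfile=$3
    # NR>1 skips the header line; (NR-1)/scale is the time ordinate
    awk -v scale="$scale" 'NR>1 { print (NR-1)/scale ", " $2 }' "$infile" > "$outfile"
}

# Demonstration on a tiny sample file
tmp=$(mktemp -d)
cat > "$tmp/sample.ascii" <<'EOF'
TIMESERIES CU_BCIP_00_LHZ_M, 180001 samples,  1.0 sps, 2016-11-13T10:00:00.000000, TSPAIR, FLOAT, Counts
2016-11-13T10:00:00.000000             3130
2016-11-13T10:00:01.000000             1921
EOF
to_csv 1   "$tmp/sample.ascii" "$tmp/lhz.csv"
to_csv 100 "$tmp/sample.ascii" "$tmp/vhz.csv"
cat "$tmp/lhz.csv" "$tmp/vhz.csv"
```

To process a whole list, the pairs in infiles1 could be fed through a loop such as: while read -r src dst; do to_csv 100 "$src" "$dst"; done < infiles1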
