Geotechnical News •   March 2018
          
        
        
          
            
              39
            
          
        
        
          
            GEOTECHNICAL INSTRUMENTATION NEWS
          
        
        
          purchased programs, specialty-written
        
        
          code, or open-source solutions.
        
        
          ETL processes
        
        
          
            extract
          
        
        
          the data from
        
        
          the datalogger,
        
        
          
            transform
          
        
        
          it into a for-
        
        
          mat that the storage system can input,
        
        
          then
        
        
          
            load
          
        
        
          the data into the specific
        
        
          file (database, text file) for long term
        
        
          storage, post-processing, or graph-
        
        
          ing. Depending on the ETL needs,
        
        
          commercially available programs that
        
        
          perform these tasks can add $1,000
        
        
          to over $10,000 to the budget of a
        
        
          job. Configuration and programming
        
        
          of these programs requires labor, in
        
        
          addition to the cost. The type of ETL
        
        
          process will influence power-usage
        
        
          requirements, datalogger design, data
        
        
          storage design, and telemetry design.
        
        
          Manual data downloads / uploads, an
        
        
          admittedly low-tech variety of ETL,
        
        
          are outside the scope of this discus-
        
        
          sion.
        
        
          The following questions should be
        
        
          answered when designing an ETL
        
        
          system:
        
        
          1. How much data will the telemetry
        
        
          system handle?
        
        
          a. Are there increased costs for
        
        
          additional data transfer?
        
        
          2. Is there a need for near-real time
        
        
          data from the system?
        
        
          a. If not, what frequency of
        
        
          readings and downloads are
        
        
          required?
        
        
          3. What are the power requirements
        
        
          from the datalogger components?
        
        
          a. Will more frequent downloads
        
        
          deplete the battery?
        
        
          b. Will more frequent readings or
        
        
          continuous readings deplete the
        
        
          battery?
        
        
          4. How should the data “look” once it
        
        
          has been transformed?
        
        
          a. What format does the data need
        
        
          to be in?
        
        
          5. How is the data loaded into
        
        
          whatever storage system that is
        
        
          established?
        
        
          a. Is data appended to a text file?
        
        
          b. Is data loaded into an existing
        
        
          or new database?
        
        
          6. What are the server storage space /
        
        
          processing power needs?
        
        
          a. Are more frequent readings
        
        
          going to fill the storage or require
        
        
          more processing time?
        
        
          7. How frequently is the data being
        
        
          examined?
        
        
          8. Are alarms established based on
        
        
          this data?
        
        
          Answering these questions will
        
        
          prompt iterative reviews of the data
        
        
          transfer design. For example, a need
        
        
          for additional download frequency
        
        
          may change the plan for data telem-
        
        
          etry, or a need for more frequent read-
        
        
          ings may prompt the installation of
        
        
          additional solar panels to meet power
        
        
          requirements.
        
        
          
            Common methods of ETL
          
        
        
          
            
              Commercially available programs
            
          
        
        
          ETL is most commonly setup with
        
        
          commercially available programs,
        
        
          usually written by a vendor. Some
        
        
          examples include; LoggerNet from
        
        
          Campbell Scientific, Cloud and
        
        
          Enterprise from Sensemetrics, and
        
        
          DEX from dataTaker. These programs
        
        
          typically take care of the Extract and
        
        
          Transform part of the ETL process.
        
        
          They can be scheduled to commu-
        
        
          nicate with the datalogger,
        
        
          
            extract
          
        
        
          the data, and
        
        
          
            transform
          
        
        
          the data to a
        
        
          format of your choice. Most of these
        
        
          programs can
        
        
          
            load
          
        
        
          the data into some
        
        
          storage format, whether it is a text
        
        
          file or proprietary database. These
        
        
          programs cannot load data into an
        
        
          internally developed database, as
        
        
          they would not “know” the database
        
        
          setup. These programs need to run on
        
        
          a computer, virtual machine, or cloud
        
        
          service.
        
        
          Some advantages of using commer-
        
        
          cially available programs are:
        
        
          1. They typically take care of the
        
        
          connections to the datalogger with
        
        
          relative ease.
        
        
          2. They can handle difficult com-
        
        
          munications settings and net-
        
        
          works, including configurations to
        
        
          download data at a specific time or
        
        
          repeat downloads if the downloads
        
        
          were unsuccessful.
        
        
          3. Typically, they have some (but not
        
        
          full) functionality to control the
        
        
          format of the data.
        
        
          Some disadvantages of using commer-
        
        
          cially available programs are:
        
        
          1. You will need to configure your
        
        
          database uploading function to
        
        
          process the data as formatted by
        
        
          the program.
        
        
          2. Only the manufacturer provides
        
        
          updates and support, as needed.
        
        
          
            
              Purpose-written code
            
          
        
        
          Code written specifically for the
        
        
          application is another commonly
        
        
          available ETL process. This purpose-
        
        
          written code can be more agile and
        
        
          flexible than a vendor program, and
        
        
          can automate any or all the required
        
        
          ETL processes. For example, when
        
        
          using a commercially available pro-
        
        
          gram like Loggernet to connect to the
        
        
          datalogger and save the data into a text
        
        
          file, a piece of code could be writ-
        
        
          ten to upload this data into a specific
        
        
          database. In this example, Loggernet
        
        
          would be performing the Extraction
        
        
          and Transforming parts of ETL, and
        
        
          the piece of code would be performing
        
        
          the Loading part. With more control of
        
        
          the process, the data can be saved in
        
        
          the format best suited for the project
        
        
          or application.
        
        
          Advantages of using purpose-written
        
        
          code are:
        
        
          1. Control of the format of the data.
        
        
          2. Capability to automate the neces-
        
        
          sary data transfer steps, including
        
        
          loading into a database.
        
        
          3. Not paying for functions of a sys-
        
        
          tem that aren’t used.
        
        
          4. Not beholden to any costly forced
        
        
          updates or lack of support for an
        
        
          older product.
        
        
          Disadvantages of using purpose-writ-
        
        
          ten code are:
        
        
          1. More time possibly spent in man-
        
        
          hours to develop the code than the
        
        
          cost of a commercially available
        
        
          project.