Transition to AWS for Website Management

12 May 2024


Overview

As of 2023, the HPWREN website (http://hpwren.ucsd.edu) was the locally maintained culmination of many years' efforts by a very small support staff. This article presents an overview of its transition to a more automated, AWS-based cloud service (which today is in its initial prototype form and remains a work in progress).

Key take-aways from this article include:

Legacy HPWREN image workflow and website system challenges

  • Keeping the image workflow scripts coordinated and operational
  • Expensive storage infrastructure (capital and operational costs)
  • Limited website redundancy
  • Supporting a mix of aging servers and VM infrastructure

Limited operational improvements to the legacy system over time, with attempts to

  • Keep servers operational and current
  • Provide additional and more scalable storage
  • Maintain availability of decades of archival image data and videos
  • Automate the process of adding or modifying active cameras for image acquisition

Anticipated AWS longer term benefits

  • Adds significant automation
  • Deploys Git systems for data/development management and revision control
  • Supports automated CI/CD update mechanisms
  • Provides and maintains a Content Delivery Network
  • Separates static and dynamic web pages
  • Hosts on widely distributed and Highly Available storage systems
  • Allows for automated near term and long term tiered archival system
  • Reduces staffing support requirements

Introduction

This article describes the recent (and ongoing) transition of the HPWREN camera image and website data management infrastructure from traditional in-house servers and processes to AWS cloud-based services. It provides an overview of the previous legacy data flows, the recent (Phase I) transition to AWS, and anticipated future (Phase II) AWS enhancements. In short, it summarizes how we got here, where we are currently, and where we are going. See the Appendices for diagrams illustrating the legacy as well as the emerging AWS-based workflows.


Background

In support of its Public Safety and Research and Education missions, HPWREN currently collects camera images from many hundreds of cameras throughout Southern California at a rate of (at least) one image per minute, and has done so since around 2001, starting with a handful of cameras in San Diego. We also build, publish, and store videos from the newest images every three hours. Access to this imagery is available through https://www.hpwren.ucsd.edu/cameras

For over two decades, images have been fetched, processed, and stored by HPWREN in-house servers and provided to the public through our website. The system was established and maintained by a small handful of people, and it remains similarly supported by a sparse staff today. Given the growth to many hundreds of cameras across many remote Southern California sites, staff support is stretched to its limit. Additionally, the growth in cameras over time has required multiple independent processes to collect and process camera images, and a multitude of servers, with ever-increasing storage expansion requirements, to house, publish, and archive them. Until recently, the effort to integrate a new camera into our workflow and [re]build the multitude of website files required the manual editing of dozens of files and scripts across many servers. In 2023 we built a set of comma-separated text files and control scripts to help automate the creation of many of those files.

In short, at the start of our third decade of operations, we were maintaining dozens of legacy servers (and VMs), with the prospect of needing at least a half dozen more for further required storage expansion. By this time we had reached the limits of storage expansion on our existing servers, had started backing up storage to shelved disks, had initiated manual cloud backups of older data, had begun efforts to decimate past image data to reduce storage requirements, and were on the verge of adopting a self-managed distributed storage system in the form of a CEPH cluster (the cluster itself depending on half a dozen or so new servers). The feasibility of establishing and managing the CEPH cluster, along with all the rest of our legacy systems, came into question.

For the previous few years, HPWREN had been investigating cloud-based alternatives, and in particular the costs and benefits of AWS S3-based storage. S3, EdgeFS, and CEPH storage experiments had also taken place in conjunction with our participation in the Pacific Research Platform (https://nationalresearchplatform.org/media/pacific-research-platform-video). We also used AlertCEPH (a storage system maintained by https://alertcalifornia.org/) in early 2022 and 2023. In 2023, at the invitation of our UCSD campus partner, the Qualcomm Institute (https://qi.ucsd.edu), HPWREN participated in an AWS proof of concept, working with Xpertec Solutions, which ultimately led to the current Phase I AWS project. This Phase I project recently transitioned to production on 4/1/24 with the new website at https://www.hpwren.ucsd.edu .


HPWREN legacy camera image workflow and website … (aka the nuts and bolts)

The somewhat oversimplified, but general, HPWREN camera workflow prior to AWS adoption can be summarized as follows (a hypothetical sketch of these steps appears after the list):

  1. Multiple servers run cron-driven processes on alternating minutes to fetch images from many hundreds of cameras
  2. As these images arrive, other servers reformat them into different resolutions; add timestamp and location identification (both in metadata and visible on the edge of the image); add watermarking; duplicate them for immediate web publication; store copies locally; and store copies (without the watermarking) on other archival servers for long-term storage.
  3. Every 3 hours, other servers collect the most recent 3 hours of per-minute images and create a video from them, for every active camera. Those videos are stored on the archival servers as well.
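
For illustration only, the sketch below approximates these three steps as a single cron-driven shell script. The script name, camera identifier, URL pattern, directory layout, and the use of curl, ImageMagick, and ffmpeg are all assumptions made for the example; they are not the actual HPWREN tooling.

    #!/bin/sh
    # fetch_and_process.sh -- illustrative sketch of the per-minute workflow.
    # A hypothetical crontab entry on a fetching server might run it on
    # alternating minutes, e.g.:
    #   1-59/2 * * * *  /usr/local/bin/fetch_and_process.sh example-n-cam
    # Usage: fetch_and_process.sh <camera-id>

    CAM="$1"
    STAMP=$(date -u +%Y%m%d%H%M%S)
    INCOMING="/data/incoming/$CAM"
    WEB="/data/web/$CAM"
    ARCHIVE="/data/archive/$CAM"
    mkdir -p "$INCOMING" "$WEB" "$ARCHIVE"

    # 1. Fetch the latest image from the camera (URL pattern is hypothetical).
    curl -s -o "$INCOMING/$STAMP.jpg" "http://$CAM.example.net/image.jpg"

    # 2. Create a resized, labeled, watermarked copy for immediate web
    #    publication (ImageMagick is used here purely for illustration).
    convert "$INCOMING/$STAMP.jpg" -resize 1600x \
            -gravity SouthEast -pointsize 18 -fill white \
            -annotate +10+10 "$CAM $STAMP UTC  HPWREN" \
            "$WEB/latest.jpg"

    # Keep an unwatermarked copy for long-term archival storage.
    cp "$INCOMING/$STAMP.jpg" "$ARCHIVE/$STAMP.jpg"

    # 3. A separate 3-hourly cron job could then build a video from the most
    #    recent images (ffmpeg shown purely for illustration):
    #   ffmpeg -framerate 30 -pattern_type glob -i "/data/archive/$CAM/*.jpg" \
    #          -c:v libx264 -pix_fmt yuv420p "/data/web/$CAM/last3hours.mp4"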


To add a new site's cameras to the system prior to AWS adoption, the following actions took place (these actions were applied to up to 12 camera imagers for each new site)*:

  1. Setup directory structure on receiving servers
  2. Add crontab entries on fetching servers for new cameras
  3. Setup symbolic links for new cams on receiving servers
  4. Update various web files on multiple servers
  5. Update multiple informational files documenting all cameras
  6. Run multiple scripts to update archival servers directory and link structure
  7. Run multiple scripts to update websites on archival and backup webservers
  8. Edit multiple cgi support and search scripts across multiple servers to add new cameras
  9. Update multiple decimation scripts on multiple servers to recognize new cameras
  10. Update multiple monitoring and logging scripts to access new cameras

  * Some of the initial steps above were automated in 2023, driven by newly created scripts referencing updated CSV and config files that maintain all needed camera information and status. A hypothetical sketch of this CSV-driven approach follows.
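
As a rough idea of what that automation looks like, the sketch below reads a camera CSV and emits the corresponding directories, symbolic links, and crontab entries. The CSV columns, file names, paths, and camera names are hypothetical and do not reflect the actual HPWREN control scripts.

    #!/bin/sh
    # Illustrative sketch only: the CSV columns, paths, camera names, and
    # script names below are hypothetical, not the actual HPWREN scripts.
    #
    # cameras.csv is assumed to look something like:
    #   camera_id,site,direction,status
    #   newsite-n-cam,New Site,north,active
    #   newsite-e-cam,New Site,east,active

    CSV="cameras.csv"
    CRON_FRAGMENT="new_camera_crontab.txt"
    : > "$CRON_FRAGMENT"

    # Read the CSV (skipping the header) and emit directories, symbolic links,
    # and crontab lines for every active camera.
    tail -n +2 "$CSV" | while IFS=, read -r cam site dir status; do
        [ "$status" = "active" ] || continue
        mkdir -p "/data/incoming/$cam" "/data/web/$cam" "/data/archive/$cam"
        ln -sfn "/data/archive/$cam" "/data/web/$cam/archive"
        echo "* * * * * /usr/local/bin/fetch_and_process.sh $cam" >> "$CRON_FRAGMENT"
    done

    echo "Review $CRON_FRAGMENT before merging it into a fetching server's crontab."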


AWS Proof of Concept

The initial (Phase I) goals of the AWS proof of concept project included one or more of the following:

  • Provide cloud based storage for HPWREN data, differentiating between new data and archival data access requirements
  • Provide cloud based website hosting
  • Provide tools to automate the updating of the general website (static pages)
  • Provide tools to automate the additions of new cameras to the website (dynamic pages)


A collection of AWS-based tools was deployed initially to replicate the HPWREN website and camera image workflows, in order to develop a new AWS-hosted website and archival data access mechanism. These goals have generally been accomplished (with the release to production on 4/1/24), though various implementation idiosyncrasies, identified as HPWREN staff use those tools, are still being worked through.


At the completion of Phase I, the general processes for website changes and camera additions were as follows:


To make general website changes (a hypothetical command sketch follows the list):

  1. Update hpwren-site git repository from upstream (e.g. fetch current website files)
  2. Update website files as needed
  3. Review changes
  4. Commit changes
  5. Push changes to AWS CodeCommit
  6. Start AWS build
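
The commands below are a minimal sketch of steps 1 through 6. The hpwren-site repository name comes from the list above, but the remote, branch, and CodeBuild project names are assumptions, and triggering the build via the AWS CLI is shown only as one plausible mechanism.

    # Hypothetical walk-through of a general website change (remote, branch,
    # and build project names are assumptions for illustration).

    # 1. Update the local hpwren-site repository from upstream.
    cd hpwren-site
    git pull origin main

    # 2-4. Edit website files, review the changes, and commit them.
    $EDITOR index.html
    git diff
    git add index.html
    git commit -m "Update index page"

    # 5. Push the changes to the AWS CodeCommit remote.
    git push origin main

    # 6. Start the AWS build (assumes a CodeBuild project named
    #    hpwren-site-build).
    aws codebuild start-build --project-name hpwren-site-build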

To add new cameras to the website (a hypothetical sketch follows the list):
  1. Update generateCameraHtml git repository from upstream (fetch current website)
  2. Update CSV files containing camera data
  3. Copy results to local hpwren-site
  4. Update generateCameraHtml git repository
  5. Update git repository hpwren-site with above local changes
  6. Deploy code on AWS
  7. Back up all CSV files to the S3 hpwren-sensor-data-website bucket
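
The snippet below sketches steps 2, 3, and 7. The CSV file name, generation script, and local paths are hypothetical; the hpwren-sensor-data-website bucket name is taken from step 7, and aws s3 sync is shown only as one plausible way to perform the backup.

    # Hypothetical sketch of steps 2, 3, and 7 (file and script names are
    # illustrative only).

    # 2. Update the CSV files containing camera data.
    cd generateCameraHtml
    $EDITOR cameras.csv

    # 3. Regenerate the camera pages and copy the results into the local
    #    hpwren-site working tree (the generator name and output path are
    #    assumptions).
    ./generateCameraHtml.sh
    cp -r output/cameras/ ../hpwren-site/cameras/

    # 7. Back up all CSV files to the S3 hpwren-sensor-data-website bucket.
    aws s3 sync . s3://hpwren-sensor-data-website/csv-backups/ \
        --exclude "*" --include "*.csv"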


Phase II activities are not yet fully defined, but will likely include:
  • Website organizational cleanup to relocate website imagery onto the CDN server and thus speed up website rebuilds and code deployment
  • Reassessment of leftover legacy website sections not yet transferred to AWS
  • The handling of weather data as well as camera images
  • Automated access to archival data (stored in the Glacier service tier) for users (see the sketch after this list)
  • Tools for searching for specific image data
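
For context on the archival data item above, objects stored in the Glacier tier must be explicitly restored before they can be downloaded; the sketch below shows one way such a request could be scripted today. The bucket name, object key, and retrieval tier are assumptions, not part of any defined Phase II design.

    # Hypothetical example of restoring a Glacier-tier archival image so it
    # can be downloaded (bucket, key, and tier choices are illustrative only).
    aws s3api restore-object \
        --bucket hpwren-archive \
        --key cameras/example-cam/20220101/000000.jpg \
        --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}}'

    # Once the restore completes (hours, for the Bulk tier), the object can be
    # fetched normally:
    aws s3 cp s3://hpwren-archive/cameras/example-cam/20220101/000000.jpg .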


Appendix A - Legacy HPWREN camera image workflow


Around 2015

Around 2020

Around 2022


Appendix B - AWS based HPWREN camera image workflow


Overall 2024 architecture

Updating the Website