by Stephen Brockwell
It’s not clear that commercial relational database management system (RDBMS) technology has adapted to the massive parallelism offered by new computing environments: inexpensive solid-state drives, memory, and processors. We started working on spatial data in Oracle, for example, in 1990, over twenty-five years ago. At that time, customers were trying to get away from piles of CAD files representing their assets.
We had a Midwestern gas distribution customer with literally a mile of DWG files documenting their network assets. Not an easy thing to search. And what about all the inspection results and locations for all those pipes over time? Millions of records become the norm as soon as you manage not just the assets but the location of inspection results on those assets over time. The temporal aspect of the state of utility networks starts to stress old-fashioned architectures.
Another customer, a relatively small electric utility in Ontario, Canada, was an early adopter of smart metering technology. We deployed a MultiSpeak (www.multispeak.org) interface to collect the real-time meter information and locate each event. One use of this data is a web map of the current state of the network, but there are other uses, of course: correlating outages with equipment types, finding irregular patterns in consumption, and looking for hotspots in the network. Even for a small utility, the data collected runs to millions of records. A relational database may not be the best, or most affordable, way to store and analyze that much information.
By pursuing a big-data approach to geospatial data based on the Cassandra platform (cassandra.apache.org), we can improve on the typical relational database architecture. In the coming months, we’ll be giving customers trial access to cloud-based services for synchronizing commercial RDBMS and CAD information with Cassandra, along with analysis services that deliver incredibly fast retrieval and analysis of information, built on the massively parallel capabilities of this architecture.
Some of the use cases we’re planning to support in the early releases include:
1. Seamless point cloud services for utilities and local governments. Customers will be able to store hundreds of tiled LiDAR scans and query just the data they need rather than manually assembling specific files for a particular project.
2. Historical analysis of real-time meter data, with thematic services that map summary information in seconds.
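The second use case above amounts to grouping meter readings by map cell and summarizing each cell, which is exactly the kind of aggregation a massively parallel store can fan out. As a toy illustration in Python (the grid-cell scheme, the field names, and the sample readings are invented for this sketch):

```python
from collections import defaultdict
from statistics import mean

def tile_key(lat, lon, cell_deg=0.01):
    """Snap a coordinate to an integer grid cell roughly 1 km across
    at mid-latitudes (cell size is a hypothetical choice)."""
    return (int(lat // cell_deg), int(lon // cell_deg))

def thematic_summary(readings, cell_deg=0.01):
    """readings: iterable of (lat, lon, kwh) tuples.
    Returns per-cell count, mean, and max consumption, the kind of
    summary a thematic map layer would be rendered from."""
    cells = defaultdict(list)
    for lat, lon, kwh in readings:
        cells[tile_key(lat, lon, cell_deg)].append(kwh)
    return {
        cell: {"count": len(vals), "mean_kwh": mean(vals), "max_kwh": max(vals)}
        for cell, vals in cells.items()
    }
```

In a real deployment the per-cell aggregation would run close to the data across many nodes; the point here is only that a thematic layer reduces to a group-by over spatial keys, which parallelizes naturally.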
We would love to hear from you. What kinds of questions do you have difficulty answering with your current file-based or relational asset database? Do you have good LiDAR data that you use on projects today but struggle to manage because of the massive storage all those files require?