Practical Applications of Very Cool Oracle Database Custom Aggregate SQL Functions
by: Stephen Brockwell
The Brockwell Telecom Module has an elegant, simple model for telecom, fiber and copper equipment locations. In general terms, we have Structures (poles, towers, sites) at which Enclosures (splice cans, comm cabinets, vaults) contain Equipment (muxes, radios, and so on). Cables run between Structures and terminate at Equipment; this organization tends to make information easy to find. From a GIS perspective, the only geographic data is the Structure—the locations and their visualization is obtained on-the-fly from the Equipment located at a site or on a pole. These clean, simple maps are easy to navigate at various scales—and easy to label with customer-specific information from this Structure >
Enclosure > Equipment hierarchy.
So, essentially, all the important Equipment is always located at a Structure—a pole, vault, tower, site, or building. In previous versions, we made buildings Enclosures to “keep-it-simple”, and for many customers, this was natural—the sites were the primary means of locating equipment for trouble call or asset management purposes, and most buildings were located at these sites.
But we’ve recently embarked on a major project with a large North American utility to integrate their telecom assets with a leading asset management system, and this has required a substantial re-organization of the data. Buildings must be independent locations, fully addressable—and, for this customer, many sites have multiple buildings, sometimes with different addresses (for instance, the operations center and the primary equipment hub may be a few hundred feet apart).
So we’ve decided to change the model to treat buildings as Structures. Because the Equipment itself is hierarchical (sites have buildings, buildings have rooms, rooms have comm-cabinets and so on) we must maintain very careful bookkeeping of those relationships.
The Geometric Problem
But the more interesting challenge that I want to share with you here is the geometric one. Here are the possible situations we’ve encountered, in general terms:
- All buildings must have new locations based on the site or structure that contains them.
- Multiple site locations must be consolidated and new building locations must be created for the previous sites.
- All sites that currently have no buildings must have new buildings created (this is a consistency requirement for asset management).
- A site and a building cannot share the exact same location.
Sites with multiple buildings must appear to be near the “center” of the building locations
From an Oracle perspective, we need to:
- Offset the geometry of any site that has only one building.
- Average the geometry for sites that have more than one building.
Less or More?
Over the years, I’ve learned an important lesson about the Oracle database: Work with it, not against it. Too many application developers write 3GL programs to circumvent the way SQL works by nature; rather than make things simple, they make things more complex and less maintainable. Oracle is one of the most extensible databases on the planet. There is almost always a way to do something more simply with far less code—often using a single SQL statement. This is much better than a 3GL loop that is often more difficult to debug and maintain.
This is not to say that a thousand-line query with nested views, multiple hierarchical subqueries and so on is always a better solution. It is essential to find the right balance between maximizing performance, simplicity, and testability, and minimizing cost, complexity, and maintainability.
Let’s look at an approach that achieves these goals.
The SQL group by clause allows you to consolidate records into summaries. Often, a business requires only simple groups. For multiple telecom sites with the same asset management Location ID, for example, you might want the total number of equipment:
Sample output might look something like this:
Nested Enclosures and Hierarchical Queries
I have a few observations for anyone learning or trying to master SQL. Because enclosures are hierarchical, it’s important to recursively visit all of them. Most terminating Equipment is in a vault, in a rack, in a shelf, and, depending on your model, in a chassis. This is the purpose of the CONNECT BY clause, which is unique to Oracle’s flavor of SQL.
For observant followers of standard SQL, almost every CONNECT BY query can be replaced with a nested query using recursive CTE (common table expressions). The following query is somewhat simplified, but it gives you an idea of the fundamentals:
With the CTE approach, you split the query into two parts: the anchor members (the root of the tree), and the recursive members (the leaves) which result from querying the results of the view. If recursive queries are necessary for your database work—in either SQL Server, Oracle or any other database—get in touch with us.
Aggregating String Data
In case you didn’t know, the LISTAGG function will concatenate a list of text with a separator—so if you want to preserve multiple location details, remarks, or notes for the aggregated site, you can do that as indicated below in a slightly simplified version of the query we used to consolidate the sites:
The “WITHIN” keyword resembles the “OVER” keyword familiar to users of analytical functions. This is used to order the textual list.
For remarks, for example, you might see a result like this, ‘Site A (Main Site), Site B (radio tower), Survey date 7/08/1998’—the consolidated results for three different locations that now fall within a single site.
You may be aware that Oracle has some geometric aggregation functions (have a look at this Oracle link). Some of these would come very close to solving the problem of the average location for our consolidated sites.
For example, we could use the SDO_AGGR_MBR function to get the minimum bounding rectangle for all the buildings and previous sites that are to be merged. The center of that rectangle would be very close to the average location. The SDO_AGGR_CENTROID function would be even better—it creates a single point, and that point would be the geometric average. Unfortunately, it doesn’t adjust the offset of the site if that site has only one location associated to it.
For this specific case, I decided to use Oracle’s ability to have user-defined aggregate (group by) functions.
Implementing a custom aggregate function involves the implementation of a database object type that supports specific interfaces necessary to support aggregation.
- ODCIAggregateInitialize()– used to set the initial state for each group (for example, reset a counter to zero).
- ODCIAggregateIterate() – used to increment the group result for each row within the group.
- ODCIAggregateMerge() – used when parallelizing to merge the results of two or more parallel group operations.
- ODCIAggregateTerminate()– used to return the final result for the group.
I won’t provide all the details here, but suffice to say,
- ODCIAggregateInitialize() creates a new empty geometry for the group.
- ODCIAggregateIterate() keeps track of the average location for the group.
- ODCIAggregateMerge() averages the results from any parallel sets.
- ODCIAggregateTerminate()returns the result—and here’s where we use a nice trick.
What is happening here is simple, but effective—and it shows the power of these functions. All this code does is return the average if there was more than one point. However, if there was only one point for a site, it nudges it north 50’. Any rule here could apply; in fact, our final customer rule is somewhat more complicated. Still, it’s amazing how a programmatic exercise in a loop can be transformed into something as simple as this:
where the averagepoint custom aggregate function returns the optimized location of the site. A lot of benefit for less than 20 lines of code across the four interfaces.
I didn’t talk about performance here, but if you have database performance issues, we can help with that as well. We’ve tuned electric databases with millions of customers and hundreds of millions of features. We’ve tuned telecom databases with thousands of wire centers. We’ve tuned graphic display performance, query performance, and long-transaction performance, whether it was with ArcGIS SDE versioning, Oracle Workspace Manager, or Autodesk Industry Model Jobs. Performance tuning is essential to the implementation of the above processes to be most effective.
Frank Misurec and I have been working with database GIS since 1990. Brandon Amick is now a five-year veteran of some of our largest and most complex database migrations. The Brockwell team has tuned, partitioned, migrated, and customized Oracle and SQL Server databases in almost every continent for some of the largest utility and public-sector organizations on the planet. If you’re migrating to Oracle 12c, or from SQL Server to Oracle, or from files or proprietary formats to an ESRI geodatabase, or even if you just want to know how to get more value from the incredible reporting and analytical capabilities of your asset databases, reach out to us at firstname.lastname@example.org.