by Katerina Guseva
Before pursuing a “Big Data” approach to geospatial data, and to storing point clouds in particular, you first have to answer the question of whether geospatial data actually is big data. Many industries and organizations throw the term “Big Data” around, but data of this kind can be categorized fairly easily.
“Big Data” is usually described in terms of 4 V’s:
- Volume refers to the amount of data stored. A point cloud of a city or an entire region can contain trillions of points. This information is usually stored in files such as LAS, RCS or RCP, which are slow to load. Once the volume reaches a certain level, additional support is needed to query, view and use the data without the excessive loading times that plain LAS files would incur.
- Velocity refers to the speed at which new data is inserted or queried. This is the major stumbling block for most RDBMSs: speed decreases drastically as volume increases. So when we create and use multiple point clouds for multiple areas, the processing time and the wait for results make it hard to use the data efficiently.
- Variety refers to the different types of data, including unstructured data. We have been investigating the Cassandra platform and have found that it offers an incredible opportunity to store point clouds that come from different sources and carry different information. The variety in geospatial data can become an efficiency nightmare if not managed properly.
- Veracity refers to the messiness or trustworthiness of the data. It’s no secret that data trustworthiness is extremely important in any industry. As the volume of data increases, so do the chances of data being lost or corrupted. We have a relatively small municipal client who has more than 50 GB of LIDAR files to load in order to build the city map. All the objects on the map are subject to inspection, so every point in this massive point cloud is extremely important. When using the Cassandra platform, however, it is the responsibility of the application manager to guarantee data trustworthiness, as there are no constraints to support it.
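The veracity point above, that the application has to guarantee trustworthiness itself when the store enforces no constraints, can be illustrated with a small check that runs before any point is written. This is only a minimal sketch under stated assumptions: the `Bounds` class, the `point_errors` and `split_valid` helpers, and the dictionary layout of a point (`x`, `y`, `z` keys plus optional extras) are all hypothetical names chosen for illustration, not part of Cassandra or of any LIDAR library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Hypothetical bounding box of the survey area, in the dataset's own units."""
    min_x: float
    max_x: float
    min_y: float
    max_y: float
    min_z: float
    max_z: float


def point_errors(point, bounds):
    """Return a list of problems found in one point record (empty if it looks sane)."""
    errors = []
    for axis in ("x", "y", "z"):
        value = point.get(axis)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            errors.append(f"{axis}: missing or not a finite number")
            continue
        low = getattr(bounds, f"min_{axis}")
        high = getattr(bounds, f"max_{axis}")
        if not low <= value <= high:
            errors.append(f"{axis}: {value} outside [{low}, {high}]")
    return errors


def split_valid(points, bounds):
    """Separate records that pass the checks from those that must be reviewed."""
    valid, rejected = [], []
    for point in points:
        errors = point_errors(point, bounds)
        if errors:
            rejected.append((point, errors))
        else:
            valid.append(point)
    return valid, rejected
```

In such a setup only the `valid` list would be handed on to the storage layer, while `rejected` keeps each bad record together with the reason, so that no point silently disappears. Extra attributes that differ between sources (intensity, classification and so on) pass through untouched, which matches the variety point above.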
Considering those aspects, we can conclude that geospatial data exhibits at least 3 of the 4 V’s that describe big data. Thus, yes, geospatial data is Big Data!
Contact us about your “Big Data” geospatial issues… We would love to hear some “real-world” experiences with managing this type of data!