Sunday, 2 June 2013

Geocluster: Server-side clustering for mapping in Drupal based on Geohash

Vienna University of Technology
Drupal, Thesis, Study, Open Source, Open Data, Research, Mapping

This thesis investigates the possibility of creating a server-side clustering solution for mapping in Drupal based on Geohash. Maps visualize data in an intuitive way, but the performance and readability of digital mapping applications decrease when they display large amounts of data. Client-side clustering uses JavaScript to group overlapping items. Server-side clustering is needed when the number of items slows down processing and creates network bottlenecks. The main goals are to implement real-time, server-side clustering for up to 1,000,000 items within 1 second and to visualize the clusters on an interactive map. Clustering is the task of grouping unlabeled data in an automated way. Algorithms from cluster analysis are investigated in order to design an algorithm for server-side clustering on maps. The proposed algorithm uses Geohash to create a hierarchical spatial index that supports the clustering process. Geohash is a latitude/longitude geocode system based on the Morton order: coordinates are encoded as string identifiers with a hierarchical spatial structure. Using a Geohash-based index significantly reduces the time complexity of the real-time clustering process. Three implementations of the clustering algorithm are realized in the Geocluster module for Drupal, the free and open source content management system and framework. The first implementation, based on PHP (Drupal's scripting language), does not scale well. A second, MySQL-based implementation was tested to scale up to 100,000 items within one second. Finally, clustering with Apache Solr scales beyond 1,000,000 items and satisfies the main research goal of the thesis. In addition to performance considerations, techniques for visualizing clusters on a map are researched and evaluated in an exploratory analysis. Both map types and cluster visualization techniques are presented.
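The idea behind a Geohash-based index can be illustrated with a minimal Python sketch. It assumes the standard Geohash base-32 alphabet and encodes a coordinate by interleaving longitude and latitude bits, longitude first (the Morton/Z-order construction). It then groups points whose Geohashes share a common prefix, so that each cell of the Geohash grid becomes one cluster with a count and a centroid. The names `geohash_encode`, `cluster_by_prefix` and `Cluster` are hypothetical and are not part of the Geocluster module, which is written in PHP and delegates work to MySQL or Apache Solr. The sketch shows only a simple prefix-based grouping step, not the full algorithm from the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass

# Standard Geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    """Encode a coordinate as a Geohash string of `precision` characters.

    Bits alternate between longitude (even positions) and latitude
    (odd positions), i.e. a Morton / Z-order interleaving; every
    5 bits are mapped to one base-32 character.
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value = 0    # bits accumulated for the current character
    n_bits = 0   # number of bits currently held in `value`
    even = True  # True -> next bit refines longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value = value << 1
            rng[1] = mid  # keep the lower half of the interval
        even = not even
        n_bits += 1
        if n_bits == 5:
            chars.append(BASE32[value])
            value, n_bits = 0, 0
    return "".join(chars)


@dataclass
class Cluster:
    prefix: str  # shared Geohash prefix, i.e. the grid cell
    count: int   # number of points in the cell
    lat: float   # centroid latitude
    lon: float   # centroid longitude


def cluster_by_prefix(points, prefix_len):
    """Group (lat, lon) points whose Geohashes share `prefix_len` characters.

    A shorter prefix means larger cells, which suits lower zoom levels.
    """
    sums = defaultdict(lambda: [0, 0.0, 0.0])  # prefix -> [count, sum_lat, sum_lon]
    for lat, lon in points:
        acc = sums[geohash_encode(lat, lon, prefix_len)]
        acc[0] += 1
        acc[1] += lat
        acc[2] += lon
    return sorted(
        (Cluster(p, n, s_lat / n, s_lon / n) for p, (n, s_lat, s_lon) in sums.items()),
        key=lambda c: c.prefix,
    )


if __name__ == "__main__":
    pts = [(42.60, -5.60), (42.61, -5.59), (48.21, 16.37)]
    for c in cluster_by_prefix(pts, 3):
        print(c)
```

Because a Geohash prefix identifies a rectangular cell, grouping by prefix needs only a single pass over the points plus a hash lookup per point, with no pairwise distance computations. In a database or search index, the same grouping could be expressed as an aggregation over a stored Geohash column truncated to the prefix length. That translation is an assumption here, not a description of the module's queries.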
The evaluation classifies the stated techniques for cluster visualization on maps and provides a foundation for evaluating the visual aspects of the Geocluster implementation.
