Neo4j Spatial
From Neo4j Wiki
The Neo4j database has no built-in support for spatial data. However, it is in principle not difficult to model spatial data in a graph database, and several users of Neo4j have already done so in various domain-specific ways. There have been discussions about developing a set of extensions or utilities for Neo4j to provide specific support for spatial data. Ideally, the approach taken by Neo4j should be one that is of use to the majority of potential users, and feedback on these ideas is very welcome. In this project we will discuss the various elements of spatial support, the types of solutions and approaches that have been proposed, and the proposed schedule of activities to achieve this goal. We are also open to Collaboration on Spatial Projects, and are offering to mentor Google Summer of Code projects for Neo4j Spatial.
Elements of a spatial database
The most basic requirements of a spatial database would be:
| Requirement | Description |
|---|---|
| Storage | Provide the ability to store spatial objects in a way that facilitates the other spatial requirements |
| Search | Facilitate indexing of spatial data for optimal search performance (both time and memory) as well as performance of spatial operations |
| Operations | Provide a set of common spatial operations in the Neo4j API to aid application developers in making use of Neo4j as a spatial database |
| I/O | Provide support for importing and exporting spatial data using a number of popular spatial standards |
We will start simple, taking inspiration from some specific use cases and from elements of published standards, like OpenGIS, as well as from other open source GIS projects, like those tracked by OSGeo. Since Neo4j is a Java library, the GeoTools project is highly relevant, as is the open source spatial database PostGIS.
Suggested approaches
There are many things to consider here, but one of the most critical would be the question of how to store the spatial objects. There are two main groups of data:
For the early work in this project we will focus on feature data, primarily because the perceived benefits of using a graph database for such data are a little more apparent.
The current suggestions for storing feature data are:
- In-Node
  - Storing all data associated with a single feature object within a single node. This can be done even within a single property, using either or both of the well-known text and well-known binary formats (WKT/WKB), or as a set of properties describing the various elements of the geometry.
  - The Oracle and PostGIS spatial databases use approaches like this, because it works well in tabular databases. There is a lot of public information available about both of these solutions and, in the case of PostGIS, a lot of open source code is available for review.
  - The indexing and spatial operations performed on this data may be less transparent to application developers and users, which has pros and cons.
- Sub-graphs
  - Storing the features as a sub-graph with a structure that matches the feature data structure itself.
  - This approach makes much better use of the graph database itself, makes the feature storage transparent to developers and users, and leads to 'graph-based' solutions to the spatial operations. It would also make it easier for application developers to extend the spatial support themselves.
  - The storage requirements and complexity of the database are much higher than in the In-Node approach (many more primitives are involved), probably affecting scalability and performance.
The differences between the two approaches are likely to lead to very different performance characteristics of the database, so it is suggested that both be prototyped and benchmarked in various scenarios. It might turn out that both have their place, and if possible both might be supported in the final solution, possibly behind a unified API. It is also possible that the Neo4j community will come up with ways of working with spatial data that go well beyond the common scenarios from relational databases, and beyond the expectations of the original developers of the spatial support. We would not like to make decisions that might impede that kind of innovation.
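As a rough illustration of the two storage approaches, the following sketch stores the same line feature both ways in a toy in-memory graph. It deliberately does not use the Neo4j API: the `Node`, `Relationship` and `Graph` classes, the property names (`wkt`, `type`, `x`, `y`) and the relationship types (`FIRST_VERTEX`, `NEXT_VERTEX`) are all assumptions made for this example, not part of any agreed design.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a toy in-memory graph, NOT the Neo4j API. */
public class FeatureStorageSketch {

    /** Hypothetical stand-in for a graph node: a bag of properties. */
    static final class Node {
        final Map<String, Object> properties = new LinkedHashMap<>();
    }

    /** Hypothetical stand-in for a typed, directed relationship. */
    static final class Relationship {
        final Node from;
        final Node to;
        final String type;

        Relationship(Node from, Node to, String type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    /** Toy graph that only records the nodes and relationships created. */
    static final class Graph {
        final List<Node> nodes = new ArrayList<>();
        final List<Relationship> relationships = new ArrayList<>();

        Node createNode() {
            Node node = new Node();
            nodes.add(node);
            return node;
        }

        void relate(Node from, Node to, String type) {
            relationships.add(new Relationship(from, to, type));
        }

        /** Counts nodes plus relationships (properties are not counted here). */
        int primitiveCount() {
            return nodes.size() + relationships.size();
        }
    }

    /** Encodes a line as well-known text, e.g. "LINESTRING (0.0 0.0, 1.0 1.0)". */
    static String toWkt(double[][] coords) {
        StringBuilder sb = new StringBuilder("LINESTRING (");
        for (int i = 0; i < coords.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(coords[i][0]).append(' ').append(coords[i][1]);
        }
        return sb.append(')').toString();
    }

    /** In-Node: the whole geometry lives in one property of one node. */
    static Node storeInNode(Graph graph, double[][] coords) {
        Node feature = graph.createNode();
        feature.properties.put("wkt", toWkt(coords));
        return feature;
    }

    /** Sub-graph: one feature node plus a chain of one node per vertex. */
    static Node storeAsSubGraph(Graph graph, double[][] coords) {
        Node feature = graph.createNode();
        feature.properties.put("type", "LineString");
        Node previous = feature;
        for (int i = 0; i < coords.length; i++) {
            Node vertex = graph.createNode();
            vertex.properties.put("x", coords[i][0]);
            vertex.properties.put("y", coords[i][1]);
            graph.relate(previous, vertex, i == 0 ? "FIRST_VERTEX" : "NEXT_VERTEX");
            previous = vertex;
        }
        return feature;
    }

    public static void main(String[] args) {
        double[][] road = { {0, 0}, {1, 1}, {2, 1} };

        Graph inNodeGraph = new Graph();
        Node feature = storeInNode(inNodeGraph, road);
        System.out.println(feature.properties.get("wkt"));
        System.out.println("in-node primitives:   " + inNodeGraph.primitiveCount());

        Graph subGraph = new Graph();
        storeAsSubGraph(subGraph, road);
        System.out.println("sub-graph primitives: " + subGraph.primitiveCount());
    }
}
```

For the three-vertex line in `main`, the in-node layout creates a single node, while the sub-graph layout creates four nodes and three relationships. That growth in primitives is the cost of the sub-graph approach; the benefit is that each vertex becomes reachable by ordinary graph traversal instead of by parsing a text property.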
Proposed schedule
At the highest level, the project will proceed in three phases:
- Setup - Planning, discussions with the community, and discussions with identified early adopters and potential customers.
- Prototype - Develop and benchmark a minimum viable subset of the required features, sufficient to be released to the community, be of use to many, and enable feedback on the approaches taken.
- Release - Based on the response to the first public release (the minimum viable product), plan, develop and release a more complete system covering the most requested features. Ideally this would conform to published standards like OpenGIS, but it might require a new standard focusing on graph databases.
Further details are available on the Neo4j Spatial Project Plan.

