Online Backup

From Neo4j Wiki

This page will teach you how to use the Neo4j online backup component.


Online backup basics

The online backup utility can be used to synchronize a destination neo4j database from a source neo4j database. The source database is a running EmbeddedGraphDatabase instance, which can continue to run as usual during the backup.

The destination is either a running EmbeddedGraphDatabase or a filesystem location containing a neo4j database. In both cases the destination database has to start out as a file-level copy of the original (source) datastore.

All completed transactions for all included data sources will be copied to the backup. Transactions that are still open don't affect the backup and are not included in it.
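To picture what "completed transactions are copied" means, here is a toy model in plain Java. It is not the component's code, and every name in it (LogShippingModel, SourceLog, Destination) is hypothetical: the source appends each committed transaction to its log with an increasing id, and each backup run copies the entries the destination hasn't applied yet. An open transaction never reaches the log, so a backup can't pick it up.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not the neo4j-online-backup API) of log-based sync.
class LogShippingModel {
    // One committed transaction as it would appear in a logical log.
    record TxEntry(long txId, String change) {}

    // Source side: only committed transactions are ever appended here.
    static class SourceLog {
        private final List<TxEntry> entries = new ArrayList<>();
        private long nextTxId = 1;

        void commit(String change) {
            entries.add(new TxEntry(nextTxId++, change));
        }

        // All entries newer than the given transaction id, in commit order.
        List<TxEntry> entriesAfter(long txId) {
            List<TxEntry> result = new ArrayList<>();
            for (TxEntry e : entries) {
                if (e.txId() > txId) {
                    result.add(e);
                }
            }
            return result;
        }
    }

    // Destination side: remembers the last transaction it has applied.
    static class Destination {
        final List<String> applied = new ArrayList<>();
        long lastAppliedTxId = 0;

        // One "backup run": pull and apply everything newer than lastAppliedTxId.
        int syncFrom(SourceLog source) {
            List<TxEntry> pending = source.entriesAfter(lastAppliedTxId);
            for (TxEntry e : pending) {
                applied.add(e.change());
                lastAppliedTxId = e.txId();
            }
            return pending.size();
        }
    }
}
```

In the model, running syncFrom twice in a row copies nothing the second time, since the destination already has every committed entry.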

The component information is located at: http://components.neo4j.org/neo4j-online-backup/

Adding online-backup as a Maven dependency is done like this (the example uses version 0.5; check the repository linked below for other releases):

    <dependency>
    	<groupId>org.neo4j</groupId>
    	<artifactId>neo4j-online-backup</artifactId>
    	<version>0.5</version>
    </dependency>

If you want to download the component as a jar file, it's found here: http://m2.neo4j.org/org/neo4j/neo4j-online-backup/

Database configuration

The backup relies on using the logical logs, so the original (source) database has to be configured to keep the logs:

        EmbeddedGraphDatabase graphDb = new EmbeddedGraphDatabase( STORE_LOCATION_DIR );
        XaDataSourceManager xaDsMgr = graphDb.getConfig().getTxModule().getXaDataSourceManager();
        XaDataSource dataSource = xaDsMgr.getXaDataSource( "nioneodb" );
        dataSource.keepLogicalLogs( true );
Note: All data sources included in a backup have to be set to keep their logical logs.

There are also settings for auto-rotating the logs. These are the corresponding methods, using the default values for the settings:

        dataSource.setAutoRotate( true );
        dataSource.setLogicalLogTargetSize( 10 * 1024 * 1024 ); // 10 MB
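Roughly speaking, auto-rotation is a size check: once the current logical log reaches the target size, it is closed and a new log file is started. The sketch below only illustrates that rule; LogRotationPolicy and shouldRotate are made-up names, and the exact threshold comparison Neo4j uses is an assumption.

```java
// Hypothetical illustration of size-based log rotation; not the Neo4j implementation.
class LogRotationPolicy {
    // Same value as the default passed to setLogicalLogTargetSize above: 10 MB.
    static final long DEFAULT_TARGET_SIZE = 10L * 1024 * 1024;

    private final boolean autoRotate;
    private final long targetSize;

    LogRotationPolicy(boolean autoRotate, long targetSize) {
        this.autoRotate = autoRotate;
        this.targetSize = targetSize;
    }

    // Rotate only if auto-rotation is on and the log has reached the target size.
    boolean shouldRotate(long currentLogSizeBytes) {
        return autoRotate && currentLogSizeBytes >= targetSize;
    }
}
```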

How to perform backup

Note: The very first backup has to be performed by shutting down the neo4j database and copying its files to the backup location. All subsequent backups can then be performed online using the online backup utility to keep the backup in sync with the live database.
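Any tool that copies the whole store directory will do for that first copy (cp -r, rsync and so on). If you'd rather do it from Java, here is a minimal sketch using java.nio.file; StoreCopier is a hypothetical name, and the code assumes the database has already been shut down. Existing files at the backup location are overwritten.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Copies a (shut down) neo4j store directory, subdirectories included, to the backup location.
class StoreCopier {
    static void copyStore(Path sourceDir, Path backupDir) throws IOException {
        // Files.walk visits a directory before its contents, so parents are created first.
        try (Stream<Path> paths = Files.walk(sourceDir)) {
            for (Path src : (Iterable<Path>) paths::iterator) {
                Path dest = backupDir.resolve(sourceDir.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING,
                            StandardCopyOption.COPY_ATTRIBUTES);
                }
            }
        }
    }
}
```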

The backup method can differ in two ways:

  1. destination is a running EmbeddedGraphDatabase instance vs. only the location of a neo4j database is given
  2. there is just a single data source (e.g. neo4j) vs. multiple data sources (e.g. neo4j + lucene)

We will walk you through the different alternatives below.

Single data source; backup to filesystem location

        EmbeddedGraphDatabase graphDb = getTheGraphDbFromApp();
        String location = "/var/backup/neo4j-db";
        Backup backup = new Neo4jBackup( graphDb, location );
        backup.doBackup();

That's it.

Note: If there is a problem writing to the file system location, Backup.doBackup() will throw an IOException.
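A simple way to deal with that exception is to catch it around each backup run and report it. The sketch below assumes doBackup() is declared to throw IOException, as the note says, so a method reference such as backup::doBackup fits the hypothetical BackupTask interface; BackupRunner is a made-up name.

```java
import java.io.IOException;
import java.io.PrintStream;

// Hypothetical helper: runs one backup and reports I/O failures instead of propagating them.
class BackupRunner {
    // Stand-in for Backup.doBackup(); e.g. pass backup::doBackup here.
    @FunctionalInterface
    interface BackupTask {
        void run() throws IOException;
    }

    // Returns true if the backup run completed, false if it failed with an IOException.
    static boolean runOnce(BackupTask task, PrintStream err) {
        try {
            task.run();
            return true;
        } catch (IOException e) {
            // For example a full disk or a missing write permission on the backup location.
            err.println("Backup failed: " + e.getMessage());
            return false;
        }
    }
}
```

For example, BackupRunner.runOnce( backup::doBackup, System.err ) returns false if the run failed, so the caller can decide whether to alert someone.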

Single data source; backup to running backup database

        EmbeddedGraphDatabase graphDb = getTheGraphDbFromApp();
        String location = "/var/backup/neo4j-db";
        EmbeddedGraphDatabase backupGraphDb = new EmbeddedGraphDatabase( location );
        Backup backup = new Neo4jBackup( graphDb, backupGraphDb );
        backup.doBackup();
        backupGraphDb.shutdown();

Not much to say here. Pass both databases to the Neo4jBackup constructor, and shut down the backup database when the backup is done.

Multiple data sources; backup to filesystem location

For now, this variation assumes that you're running a neo4j service together with a lucene index service (from the index component).

The Neo4jBackup constructor is in this case given a list of data source names. The names in the example are the typical names when running neo4j + lucene.

        EmbeddedGraphDatabase graphDb = getTheGraphDbFromApp(); // assume lucene is hooked into this instance
        String location = "/var/backup/neo4j-db";
        Backup backup = new Neo4jBackup( graphDb, location,
            new ArrayList<String>( Arrays.asList( "nioneodb", "lucene" ) ) );
        backup.doBackup();

Multiple data sources; backup to running data sources

In this case, the neo4j source and destination instances are used to look up the data sources named in the list.

(TODO: missing info: how to wrap your data source to be used together with neo4j)

        EmbeddedGraphDatabase graphDb = getTheGraphDbFromApp(); // assume lucene is hooked into this instance
        String location = "/var/backup/neo4j-db";
        EmbeddedGraphDatabase backupGraphDb = new EmbeddedGraphDatabase( location );
        // hook lucene into the backup instance too, so its "lucene" data source can be looked up
        IndexService backupIndexService = new LuceneIndexService( backupGraphDb );
        Backup backup = new Neo4jBackup( graphDb, backupGraphDb,
            new ArrayList<String>( Arrays.asList( "nioneodb", "lucene" ) ) );
        backup.doBackup();
        // shut down the index service before the database it is attached to
        backupIndexService.shutdown();
        backupGraphDb.shutdown();

Manually transferring and applying logical logs

If you have a running Neo4j graph database that is set up to keep its logical logs, you can manually copy or move rotated logical logs off the server and have a client apply them to a destination database. As before, the destination has to start out as a copy of the source database; from there you can apply new logs incrementally whenever you like.

This is done by starting a new JVM that runs the org.neo4j.onlinebackup.ApplyNewLogs main class, passing it the path to the destination database into which you have copied/moved the logical logs from the source database (keeping the directory structure of the source database). It then applies those logs to the destination database. Example (assuming you have a running source database in /var/db and a destination database, originally copied from the source database at some point, in /var/backup-db):

        mv /var/db/*log.v* /var/backup-db/
        java -cp $CLASSPATH_INCLUDING_ONLINE_BACKUP_AND_ITS_DEPENDENCIES \
            org.neo4j.onlinebackup.ApplyNewLogs /var/backup-db
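If you script the transfer yourself, keep in mind that logical logs have to be applied in version order, and the version suffix must be compared as a number: a plain string sort puts log.v10 before log.v2. The sketch below assumes file names end in log.v followed by a numeric version (nioneo_logical.log.v7 is used as an example name) and that versions increase by one per rotation. LogVersions is a hypothetical helper, e.g. for checking that no log was left behind before running ApplyNewLogs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: orders rotated logical log files by their numeric version.
class LogVersions {
    // Assumes names end in "log.v<N>" with a purely numeric N, e.g. "nioneo_logical.log.v7".
    private static final Pattern VERSION = Pattern.compile("log\\.v(\\d+)$");

    static long versionOf(String fileName) {
        Matcher m = VERSION.matcher(fileName);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a rotated log: " + fileName);
        }
        return Long.parseLong(m.group(1));
    }

    // Sorts numerically, so v2 comes before v10.
    static List<String> sortByVersion(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort(Comparator.comparingLong(LogVersions::versionOf));
        return sorted;
    }

    // True if the versions form an unbroken sequence (assumes +1 per rotation).
    static boolean isContiguous(List<String> fileNames) {
        List<String> sorted = sortByVersion(fileNames);
        for (int i = 1; i < sorted.size(); i++) {
            if (versionOf(sorted.get(i)) != versionOf(sorted.get(i - 1)) + 1) {
                return false;
            }
        }
        return true;
    }
}
```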

If you're also using LuceneIndexService/LuceneFulltextIndexService, you'll have to move/copy their logs as well. The script can be extended like this:

        mv /var/db/*log.v* /var/backup-db/
        mv /var/db/lucene/*log.v* /var/backup-db/lucene/
        mv /var/db/lucene-fulltext/*log.v* /var/backup-db/lucene-fulltext/
        java -cp $CLASSPATH_INCLUDING_ONLINE_BACKUP_AND_INDEX_AND_THEIR_DEPENDENCIES \
            org.neo4j.onlinebackup.ApplyNewLogs /var/backup-db
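The same transfer can be done from Java. This minimal sketch moves every file matching *log.v* from a list of subdirectories of the source store into the same subdirectories of the destination, just like the mv commands above. LogShipper is a hypothetical name and the subdirectory list is up to you; use "" for the store directory itself.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java counterpart of the mv commands above.
class LogShipper {
    // Moves files matching "*log.v*" from each subdirectory of sourceDb to the same
    // subdirectory of backupDb and returns how many files were moved.
    static int moveRotatedLogs(Path sourceDb, Path backupDb, List<String> subDirs)
            throws IOException {
        int moved = 0;
        for (String sub : subDirs) {
            Path from = sourceDb.resolve(sub);
            Path to = backupDb.resolve(sub);
            if (!Files.isDirectory(from)) {
                continue; // e.g. no fulltext index in this store
            }
            Files.createDirectories(to);
            // Collect matches first so the directory isn't modified while it is being listed.
            List<Path> logs = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(from, "*log.v*")) {
                for (Path p : stream) {
                    if (Files.isRegularFile(p)) {
                        logs.add(p);
                    }
                }
            }
            for (Path log : logs) {
                Files.move(log, to.resolve(log.getFileName()), StandardCopyOption.REPLACE_EXISTING);
                moved++;
            }
        }
        return moved;
    }
}
```

For the setup above that would be moveRotatedLogs( Paths.get( "/var/db" ), Paths.get( "/var/backup-db" ), List.of( "", "lucene", "lucene-fulltext" ) ), followed by running ApplyNewLogs as before.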

Backup logs

By default, backup logs are sent to standard error (usually the console). If you want, you can also enable logging to a file (off by default) with the following method call:

        backup.enableFileLogger();

The log file will be named backup.log and created or appended to in the current working directory.

Changed your mind? Then go:

        backup.disableFileLogger();

There are three different log levels to choose from:

        backup.setLogLevelNormal(); // default, few lines of output
        backup.setLogLevelDebug();  // detailed output
        backup.setLogLevelOff();    // no output at all

This setting affects both console and file log output.
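One level setting governing both outputs is the same idea as a single logger level sitting in front of several handlers. The sketch below is only an analogy built on java.util.logging, not the backup component's code, and the mapping of normal/debug/off to INFO/FINE/OFF is an assumption made for illustration. The ConsoleHandler writes to standard error, like the backup's default output, and the second handler plays the role of the file log.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Analogy only: one logger level filters records before any attached handler sees them.
class BackupLogLevels {
    // Hypothetical mapping of the three documented levels onto java.util.logging levels.
    enum BackupLevel {
        NORMAL(Level.INFO), DEBUG(Level.FINE), OFF(Level.OFF);

        final Level julLevel;

        BackupLevel(Level julLevel) {
            this.julLevel = julLevel;
        }
    }

    // Attaches a console (stderr) handler and a second, file-like handler to one logger.
    static Logger createLogger(String name, Handler fileLikeHandler) {
        Logger logger = Logger.getLogger(name);
        logger.setUseParentHandlers(false);
        ConsoleHandler console = new ConsoleHandler();
        console.setLevel(Level.ALL);          // let the logger level do the filtering
        fileLikeHandler.setLevel(Level.ALL);
        logger.addHandler(console);
        logger.addHandler(fileLikeHandler);
        return logger;
    }

    // Changing the logger level affects both handlers at once.
    static void setLevel(Logger logger, BackupLevel level) {
        logger.setLevel(level.julLevel);
    }
}
```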

Summary

In summary, this is what you have to do:
  • Preparation:
    • shut down the database and copy all of its files to the backup location
    • configure the database to keep its logical logs from then on
  • Performing backup:
    • instantiate Neo4jBackup according to your scenario (location/running, single/multiple data sources)
    • optionally configure file output and log level of the backup log
    • off you go, doBackup()!