Schema configuration

The schema configuration tells Prospecter which fields are available for searching, and how and where to store queries.

The structure looks like this:

{
    "fields": {
        ...field definitions...
    },
    "persistence": {
        ...persistence configuration...
    }
}

Fields

Each field you want to support in your documents is an entry in the fields dictionary. The field name is the key in the dictionary and the value is another dictionary.

"fieldName": {
    ...field configuration...
}

type

Specifies the type of the field. Available types are:

Type	Description
DateTime	Range index for date time information. Exact matches have to match to the millisecond.
Double	Range index for double precision floating point values.
FullText	Full text index for text fields.
GeoDistance	Queries specify coordinates and a radius; documents contain coordinates and match if they lie within the radius of the query coordinates. (At the moment a bounding box is used, so points NW, NE, SE or SW of the query coordinates can be further away than the radius.)
Integer	Range index for integers (32-bit signed).
Long	Range index for long integers (64-bit signed).
String	Index for string literals. For categories or similar fields.
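
As a rough illustration of the types above, a fields section for a hypothetical product catalogue might look like the sketch below. The field names (title, category, rating, views, publishedAt) are made up for this example; only the type values come from the table.

```json
{
    "fields": {
        "title": {
            "type": "FullText"
        },
        "category": {
            "type": "String"
        },
        "rating": {
            "type": "Double"
        },
        "views": {
            "type": "Long"
        },
        "publishedAt": {
            "type": "DateTime"
        }
    }
}
```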

options

At the moment only FullText and DateTime fields support options.

You can set a different Analyzer by specifying analyzer in the options. The value has to be a string naming a class that implements the de.danielbasedow.prospecter.core.analysis.Analyzer interface. The default is de.danielbasedow.prospecter.core.analysis.LuceneStandardAnalyzer.

With the above analyzers you can further configure the stop word list that is used by specifying stopwords. The following settings are available for stopwords:

Setting	Description
none	Don’t use any stopwords. This is the default; it is also used if the value you specify is not recognized.
predefined	Uses a predefined list of english stopwords. The list is part of Lucene.
["word1", "word2", …]	Custom stop word list.
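
For example, a FullText field with a custom stop word list could be configured roughly as follows. The field name and the words in the list are placeholders chosen for illustration.

```json
"description": {
    "type": "FullText",
    "options": {
        "stopwords": ["the", "a", "an"]
    }
}
```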

If you implement your own Analyzer, your make() method will receive the options object during startup.

bloomfilter

For FullText fields that will contain many unique terms, you can configure a Bloom filter, which makes token mapping faster. The Bloom filter needs two values:

"options": {
    ...
    "bloomfilter": {
        "expectedNumberOfElements": 200000,
        "falsePositiveProbability": 0.001
    }
    ...
}

expectedNumberOfElements is an estimate of how many unique terms will be indexed. This can be hard to estimate in advance. falsePositiveProbability is the probability of a false positive. Bloom filters cannot determine with 100% certainty that an element is present; they can only say for certain that an element is not present. The higher the probability of false positives, the more unnecessary lookups will occur.

If expectedNumberOfElements is reached and more elements are added, the actual false positive probability will go up. Memory usage of the Bloom filter increases with more expected elements and/or a lower false positive probability.
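
To get a feel for the memory cost, the textbook sizing formulas for a Bloom filter can be applied to the example values above. This assumes that Prospecter's implementation sizes its filter close to the theoretical optimum, which is not confirmed here, so treat the result as an order-of-magnitude estimate only. With n expected elements and false positive probability p, the optimal number of bits m and number of hash functions k are:

```latex
m = -\frac{n \ln p}{(\ln 2)^2}, \qquad k = \frac{m}{n} \ln 2

n = 200000,\; p = 0.001:\quad
m \approx \frac{200000 \times 6.908}{0.4805} \approx 2.9 \times 10^{6}\ \text{bits} \approx 360\ \text{kB},
\qquad k \approx 10
```

Under the same assumption, lowering p to 0.0001 raises the estimate to about 3.8 million bits (roughly 480 kB), which shows how a stricter false positive probability trades memory for fewer unnecessary lookups.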

format

For DateTime fields you can specify the format in which dates are represented. The default is ISO 8601, but you can specify any pattern string that can be interpreted by Java’s SimpleDateFormat.
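
A DateTime field with a custom format might be sketched like this. The field name is a placeholder, and placing format inside options is an assumption based on the options section above. The pattern uses standard SimpleDateFormat letters (yyyy year, MM month, dd day, HH hour 0-23, mm minute, ss second), so a value such as 2014-03-01 13:45:00 would fit it.

```json
"createdAt": {
    "type": "DateTime",
    "options": {
        "format": "yyyy-MM-dd HH:mm:ss"
    }
}
```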

Persistence

The persistence settings tell Prospecter where to store your queries. At the moment MapDB is the only available backend. The alternative is to use no backend, in which case queries will not be available after restarting Prospecter; this may still be what you’re looking for. To disable persistence, leave out the whole persistence section from your configuration.
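
A configuration without persistence therefore contains only the fields section. The following is a minimal sketch with a single placeholder field:

```json
{
    "fields": {
        "message": {
            "type": "FullText"
        }
    }
}
```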

file

Name of the file the backend stores queries in. Note that MapDB creates three files on disk; file is used as their common prefix.

Example

{
    "fields": {
        "textField": {
            "type": "FullText",
            "options": {
                "analyzer": "de.danielbasedow.prospecter.core.analysis.LuceneStandardAnalyzer",
                "stopwords": "predefined"
            }
        },
        "price": {
            "type": "Integer"
        },
        "location": {
            "type": "GeoDistance"
        }
    },
    "persistence": {
        "file": "queries.mapdb"
    }
}