Miscellaneous

There are a few more bits and pieces that are good to know and may be necessary for a particular node setup.

Setting the deployment URL

The NodeSoftware tries to determine automatically the URL under which it is accessed and uses it to fill in the URL information in /tap/capabilities, among other things. However, this does not always work (e.g. if you deploy behind a proxy), so there is a manual override: set DEPLOY_URL in settings.py, ending with /tap/, like this:

DEPLOY_URL = 'http://your.server/some/path/tap/'
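If you want to catch a malformed DEPLOY_URL early, a small check along the following lines may help. This is only a sketch: capabilities_url() is a hypothetical helper, not part of the NodeSoftware, and it merely assumes that the capabilities document sits directly below the /tap/ path, as described above.

```python
# Hypothetical sanity check for DEPLOY_URL (not part of the NodeSoftware).
from urllib.parse import urljoin, urlparse


def capabilities_url(deploy_url):
    """Return the capabilities URL derived from DEPLOY_URL, or raise ValueError."""
    parsed = urlparse(deploy_url)
    if parsed.scheme not in ('http', 'https') or not parsed.netloc:
        raise ValueError('DEPLOY_URL must be an absolute http(s) URL')
    # The trailing slash matters: without it, urljoin() would replace 'tap'.
    if not deploy_url.endswith('/tap/'):
        raise ValueError("DEPLOY_URL must end with '/tap/'")
    return urljoin(deploy_url, 'capabilities')


print(capabilities_url('http://your.server/some/path/tap/'))
# prints http://your.server/some/path/tap/capabilities
```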

Filling the IDs

As you know, XSAMS is a hierarchical structure where certain parts reference other parts. For example, each (molecular or atomic) state has an ID, which can be used by a radiative transition to point to its initial and final states. Similarly, all species, bibliographic sources etc. have an ID that other parts use to point to them.

Here is a list of the most important Returnable names for IDs:

  • AtomSpeciesID uniquely identifies an atomic species. Different isotopes and ions are considered different species.
  • AtomStateID is the ID for the states within an atomic species.
  • CrossSectionID identifies radiative cross sections.
  • EnvironmentID identifies environments.
  • FunctionID identifies functions.
  • MethodID is for the defined methods.
  • MoleculeSpeciesID identifies molecular species. As for atoms, different isotopologues are considered to be separate species.
  • MoleculeStateID is the ID for the states within a molecular species.
  • ParticleSpeciesID identifies particles.
  • SolidSpeciesID identifies solids.
  • SourceID identifies the bibliographical sources and is used in many places of the schema to connect data to its origin.

NodeID is “special” in the sense that it is not formally part of the schema. The XML generator uses it to make all the other IDs unique within VAMDC. Say, for example, that you (in dictionaries.py) set your NodeID to “xyz” and fill the SourceID with numbers from your database. Then the XML output will look something like <Source sourceID="Bxyz-1"> for your first source. This means that the generator takes care of adding the prefix “B” as mandated for sourceIDs by the schema, plus it inserts the NodeID to prevent clashes with IDs from other VAMDC nodes.
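The composition of the final ID can be pictured as in the following sketch. make_xsams_id() is a hypothetical name used only for illustration; the real formatting happens inside the NodeSoftware's XML generator, and only the “B” prefix for sources is taken from the example above (other parts of the schema mandate their own prefixes).

```python
# Illustration only: how prefix, NodeID and the Returnable's value combine.
def make_xsams_id(prefix, node_id, local_id):
    """Return e.g. 'Bxyz-1' for prefix 'B', NodeID 'xyz' and local ID 1."""
    return '%s%s-%s' % (prefix, node_id, local_id)


NODE_ID = 'xyz'  # as set in dictionaries.py in the example above
print('<Source sourceID="%s">' % make_xsams_id('B', NODE_ID, 1))
# prints <Source sourceID="Bxyz-1">
```

Because the NodeID is part of every ID, two nodes that both number their sources from 1 still produce distinct IDs.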

IDs are mandatory, which means that you have to fill the ID Returnables from the list above for every part of the schema that you use.

Ideally, the node’s database layout roughly matches the XSAMS structure, which means for example that you have separate tables for the atoms/molecules and their states. The linking indexes between the tables (usually integers) are then directly suited to be used as the IDs above, because the generator formats them as described.

In order to do this, it is good to be aware of the following Djangoism: consider the example data model from here, and suppose that s is an instance of the State model. Then s.energy gives the value of the energy column in the database, as you expect. s.species, however, is not the key value of the corresponding species, as it would be for a non-ForeignKey field, but the actual instance of the species model, because Django tries to be smart and convenient. We could use s.species.id to get the key value, but this would be slow since we would unnecessarily traverse into the species table to get it. The better way is to use s.species_id, which Django provides automatically: for any ForeignKey field xyz there is an attribute xyz_id that holds the key value instead of the linked object.
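The cost difference can be imitated in plain Python without a database. The sketch below is not Django code: FakeSpeciesTable and the property on State are hypothetical stand-ins that count simulated queries, mimicking how Django fetches the related object on first access and caches it, while the *_id attribute is already stored on the row.

```python
# Plain-Python imitation (not Django) of s.species versus s.species_id.
class Species:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class FakeSpeciesTable:
    """Stands in for the species table and counts simulated queries."""
    def __init__(self, rows):
        self.rows = rows
        self.lookups = 0

    def get(self, pk):
        self.lookups += 1
        return Species(**self.rows[pk])


class State:
    def __init__(self, table, energy, species_id):
        self._table = table
        self._species_cache = None
        self.energy = energy
        self.species_id = species_id  # raw key value, like Django's xyz_id

    @property
    def species(self):
        # Fetch the related object on first access, then reuse the cached one.
        if self._species_cache is None:
            self._species_cache = self._table.get(self.species_id)
        return self._species_cache


table = FakeSpeciesTable({26: {'id': 26, 'name': 'Fe'}})
s = State(table, energy=0.0, species_id=26)
print(s.species_id, table.lookups)  # 26 0: no query needed
print(s.species.id, table.lookups)  # 26 1: one extra query
```

For a single object one extra query hardly matters, but a Returnable is evaluated for every object in the output, so the extra lookups add up.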

Using a custom model method for filling a Returnable

Sometimes it is necessary to do something with your data before returning them, and then it is not possible to use a field name directly on the right-hand side of the Returnable. Remember, however, that the string there simply gets evaluated and that your models can have not only fields but also custom methods. Therefore the easiest solution is to write a small method in your model class that returns what you want, and then call this method through the Returnable.

For example, assume that for some reason you have two energies for your states and want both of them returned in the Returnable AtomStateEnergy, which can handle vectors as input. Then, in your models.py, you do:

from django.db.models import Model, FloatField

class State(Model):
    energy1 = FloatField()
    energy2 = FloatField()

    def bothenergies(self):
        # both values end up in AtomStateEnergy as a vector
        return [self.energy1, self.energy2]

And correspondingly in your RETURNABLES in dictionaries.py:

RETURNABLES = {
    # ... your other Returnables ...
    'AtomStateEnergy':'AtomState.bothenergies()',
    }
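To see why a method call works on the right-hand side, here is a simplified imitation of the evaluation step. It uses a plain class instead of a Django model, and evaluate_returnable() is a hypothetical helper; the generator's actual code differs in detail, but the principle is the same: the string is evaluated with the loop variable (here AtomState) in scope.

```python
# Simplified imitation of how a Returnable string is evaluated.
class State:
    def __init__(self, energy1, energy2):
        self.energy1 = energy1
        self.energy2 = energy2

    def bothenergies(self):
        return [self.energy1, self.energy2]


RETURNABLES = {'AtomStateEnergy': 'AtomState.bothenergies()'}


def evaluate_returnable(key, loop_var_name, obj):
    # Evaluate the right-hand side with the current object bound to its name.
    return eval(RETURNABLES[key], {}, {loop_var_name: obj})


print(evaluate_returnable('AtomStateEnergy', 'AtomState', State(0.0, 1.5)))
# prints [0.0, 1.5]
```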

Note

Use this sparingly, since the method is called for every object in the output, which adds some overhead. For simple calculations like unit conversions, it is usually better to do them once and for all in the database instead of for every query.

Handling the Requestables better

The XML generator is aware of the Requestables and only returns the parts of the schema that are wanted. Therefore, in principle, nodes need not care about this. However, there are two issues that can interfere:

  • If a node imposes volume limitations, this can lead to incorrect results. For example, in a transition database, when a client asks for “SELECT SPECIES” without any restriction, a node’s query function usually determines the species from a set of transitions. If that set gets truncated to the volume limit, only the species of the first few transitions in the database are returned.
  • Again taking “SELECT SPECIES” as an example, this can lead to performance issues if a node’s query strategy is to impose the restrictions on the most numerous model first, since the query then amounts to selecting everything and afterwards throwing away everything except the species information.

The solution is to make the query function aware of the Requestables. These are attached to the object sql that the query function receives as input. For example, one can test whether the setup of atomic states is needed like this:

needAtomStates = not sql.requestables or 'atomstates' in sql.requestables

and then use the boolean variable needAtomStates to skip parts of the QuerySet building. This test first checks whether there are any Requestables at all (if there are none, “ALL” is the default) and then whether ‘atomstates’ is one of them.
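If you need this test for several Requestables, it can be wrapped in a small helper. The sketch below is hypothetical: needs() is not part of vamdctap, and FakeSQL only stands in for the real sql object, of which just the requestables attribute is used. Since the keywords in sql.requestables are lower-case, the helper lower-cases the keyword it is given.

```python
# Hypothetical helper generalising the needAtomStates test.
class FakeSQL:
    """Stand-in for the sql object; only .requestables is modelled."""
    def __init__(self, requestables=None):
        self.requestables = requestables or []  # empty means "ALL"


def needs(sql, keyword):
    """True if nothing specific was requested or if keyword was requested."""
    return not sql.requestables or keyword.lower() in sql.requestables


sql = FakeSQL(['species'])
needAtomStates = needs(sql, 'atomstates')
print(needAtomStates)  # prints False: skip building the AtomState QuerySet
```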

Note

The query parser tries to be smart and adds the Requestables that are implied by others. For example, it adds ‘atomstates’ and ‘moleculestates’ when the client asks for ‘states’. Therefore it is enough to test for the most explicit one in the query functions.

Note

The keywords in sql.requestables are all lower-case!

Inserting custom XML into the generator

Situations can arise in which it is easier for a node to create a piece of XML itself than to fill the Returnables and let the generator handle it. This is allowed: every time the generator loops over objects, it checks whether the loop variable, e.g. AtomState, has an attribute called XML. If so, it returns AtomState.XML() instead of trying to extract the values from the Returnables for the current block of XSAMS. Note that .XML() is called, which means that it needs to be coded as a method in your model, not as a plain attribute.
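The check described above can be imitated as follows. This is a simplified sketch with plain classes instead of Django models; render_state() is hypothetical and its fallback branch only stands in for the Returnable-based output of the real generator.

```python
# Simplified imitation of the generator's check for a custom XML() method.
class PlainState:
    energy = 1.0


class CustomState:
    def XML(self):
        # hand-written XSAMS fragment for this state (placeholder content)
        return '<!-- custom XSAMS for this state -->'


def render_state(state):
    if hasattr(state, 'XML'):
        return state.XML()  # called, so XML must be a method
    # stand-in for the normal, Returnable-based output
    return '<Energy>%s</Energy>' % state.energy


print(render_state(PlainState()))   # prints <Energy>1.0</Energy>
print(render_state(CustomState()))  # prints <!-- custom XSAMS for this state -->
```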

Quick debugging and testing

Sometimes it is necessary to go manually through the steps that happen when a query comes in, in order to find out where something goes wrong. A good tool for this is an interactive Python session, which you start from within your node directory with:

./manage.py shell

From within the Python shell, you can run:

# import the relevant part of the NodeSoftware
from vamdctap import views as V
# import your queryfunction
from node import queryfunc as Q
# set up a query
foo = {'LANG':'VSS2','FORMAT':'XSAMS',
    'QUERY':'select all where radtranswavelength < 1000 and radtranswavelength > 900'}
# run the parser
foo = V.TAPQUERY(foo)
# check basic validity
print(foo.isvalid)
# ...
# look at the parsed where clause
print(foo.where)
# put it into your query function and see what happens
Q.setupResults(foo)

You can also manually run the first step of the query function:

from vamdctap import sqlparse as S
q = S.sql2Q(foo)
print(q)

Unit conversions for Restrictables

In dictionaries.py, it is possible to apply a function to the values that come with the Restrictables in the WHERE clause of a query:

from vamdctap.unitconv import *

RESTRICTABLES = {
    'RadTransWavelength':'wave',
    'RadTransWavenumber':('wave',invcm2Angstr),
    # ...
}

Here we give a two-tuple as the right-hand side of the Restrictable RadTransWavenumber, where the first element is the name of the model field (as usual) and the second is the function that is to be applied to the value.
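As a sketch of what such a conversion function does, here is a hypothetical invcm_to_angstrom(). It is not the code of invcm2Angstr in vamdctap/unitconv.py, whose exact behaviour (e.g. for lists of values) may differ; it only uses the fact that 1 cm = 1e8 Å, so a wavenumber in cm⁻¹ becomes a wavelength in Å via 1e8 divided by the wavenumber.

```python
# Hypothetical conversion function of the kind used in the two-tuple above.
def invcm_to_angstrom(value):
    """Convert a wavenumber in cm^-1 to a wavelength in Angstrom.

    Values from the WHERE clause arrive as strings, hence float().
    """
    return 1e8 / float(value)


RESTRICTABLES = {
    'RadTransWavelength': 'wave',
    'RadTransWavenumber': ('wave', invcm_to_angstrom),
}

print(invcm_to_angstrom('5000'))  # prints 20000.0
```

Keep in mind that an inverting conversion like this one reverses the ordering (a larger wavenumber means a shorter wavelength), so it is worth checking how the comparison operator is treated for such Restrictables.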

Note

The second part of the tuple needs to be the function itself, not its name as a string. This allows you to write custom functions in the same file, just above where you use them.

Note

The common functions for unit conversion reside in vamdctap/unitconv.py. This set is far from complete and you are welcome to ask for additions that you need.

Treating a Restrictable as a special case

Perhaps a unit conversion (see above) is not enough to handle a Restrictable, e.g. because you do not have the quantity in your database but know it anyway. Suppose a database has information on one atom only, say iron. For the output, one would simply hardcode the information on iron in the Returnables as constant strings. For the query, on the other hand, you would like to support AtomSymbol but have no field in your database to check against; after all, it would be wasteful to have a database column that has the same value everywhere.

Custom restrictable function

One way of handling this is to use a custom function as the value of the Restrictable in dictionaries.py:

'AtomSymbol':checkIron,

where checkIron would be a function, e.g. defined in the same file (before referencing it, of course) as:

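from django.db.models import Q, F

def checkIron(restrictable, operator, value):
    # values from the WHERE clause keep their quotes, e.g. "'Fe'"
    value = value.strip('\'"')
    if value == 'Fe' and operator in ('=', '=='):
        return Q(pk=F('pk'))   # always true: every row matches
    else:
        return ~Q(pk=F('pk'))  # always false: no row matches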

Note

Q(pk=F('pk')) is a restriction that is always true and should be fast. The operator ~ negates it.

Note

This approach (and the alternative below) does not cover all possible query cases, for example the operators LIKE or IN. In practice, a few more lines of code will therefore be needed to handle a Restrictable manually.

Note

If this topic is relevant for you, please also have a look into vamdctap/unitconv.py where there are some examples.

Note

For the easy example of comparing to a constant string, there is a ready-made solution: one can use 'SomeRestrictable':test_constant(['Fe','U']), where the function test_constant takes a single string or a list of strings that the value will be compared to.
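The idea behind such a factory can be sketched without Django. make_constant_check() below is hypothetical and is not the implementation of test_constant: the real function returns Django Q objects, whereas this sketch returns True (the restriction is always satisfied) or False (never satisfied). Like checkIron, it only handles equality and refuses other operators explicitly instead of guessing.

```python
# Hypothetical, Django-free sketch of a test_constant-like factory.
def make_constant_check(constants):
    if isinstance(constants, str):
        constants = [constants]
    allowed = set(constants)

    def check(restrictable, operator, value):
        value = value.strip('\'"')  # values keep their quotes from the query
        if operator in ('=', '=='):
            return value in allowed
        raise NotImplementedError('operator %r not handled' % operator)

    return check


checkAtomSymbol = make_constant_check(['Fe', 'U'])
print(checkAtomSymbol('AtomSymbol', '=', "'Fe'"))  # prints True
print(checkAtomSymbol('AtomSymbol', '=', "'Mg'"))  # prints False
```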

Manipulating the query

Another solution is to manipulate the set of restrictions by hand instead of letting sql2Q() handle it automatically. sql2Q() is a shorthand function that performs the following steps in sequence:

  1. Use splitWhere(sql.where) to split the WHERE statement in two:
      • a structure that represents the logical structure of the query;
      • a dictionary with numbers as keys and lists as values, each of which contains the Restrictable, the operator and the argument(s).
     For example, the query SELECT ALL WHERE RadTransWavelength > 3000 and RadTransWavelength < 3100 and (AtomSymbol = 'Fe' OR AtomSymbol = 'Mg') would return the two variables as
      • ['r0', 'and', 'r1', 'and', '(', 'r2', 'or', 'r3', ')']
      • {'1': [u'RadTransWavelength', '<', u'3100'], '0': [u'RadTransWavelength', '>', u'3000'], '3': [u'AtomSymbol', '=', u"'Mg'"], '2': [u'AtomSymbol', '=', u"'Fe'"]}
  2. Go through the Restrictables and apply the unit conversion functions that were specified with the mechanism above.
  3. Use the information in dictionaries.py to rewrite the restrictions in terms of the native field names, in the form of Django Q-objects.
  4. Merge the individual restrictions together with their logical connections again and evaluate the whole shebang.
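The merging step can be illustrated with the example above. The sketch below is hypothetical (merge_with_logic() is not the NodeSoftware's mergeQwithLogic()) and uses plain booleans instead of Q-objects to keep it Django-free: each token 'rN' is replaced by the result for key 'N', and the only things passed to eval() are parentheses, and/or/not and True/False.

```python
# Hypothetical, boolean-only illustration of merging results with the logic.
def merge_with_logic(results, logic):
    expr = []
    for token in logic:
        low = token.lower()
        if token in ('(', ')'):
            expr.append(token)
        elif low in ('and', 'or', 'not'):
            expr.append(low)
        elif low.startswith('r') and low[1:].isdigit():
            expr.append(repr(bool(results[low[1:]])))  # 'r2' -> results['2']
        else:
            raise ValueError('unexpected token %r' % token)
    return eval(' '.join(expr), {'__builtins__': {}}, {})


logic = ['r0', 'and', 'r1', 'and', '(', 'r2', 'or', 'r3', ')']
# e.g. a row at 3050 Angstrom belonging to Mg:
print(merge_with_logic({'0': True, '1': True, '2': False, '3': True}, logic))
# prints True
```

With Q-objects instead of booleans, the same structure is combined with & and | rather than and and or.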

So, in summary, the call q=sql2Q(sql) at the start of the query function can be replaced by:

# the helpers that sql2Q() uses, from the same module
from vamdctap.sqlparse import *

logic,restrictions,count = splitWhere(sql.where)
q_dict = {}
for i,restriction in restrictions.items():
    restriction = applyRestrictFu(restriction)
    q_dict[i] = restriction2Q(restriction)
q = mergeQwithLogic(q_dict, logic)

Now, depending on what you want to do, you can manipulate this process at any intermediate step. To continue the example with iron only, we could insert the following at the start of the loop over the restrictions:

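from django.db.models import Q, F

# at the start of the loop body, before applyRestrictFu():
if restriction[0].lower() == 'atomsymbol':
    value = restriction[2].strip('\'"')  # the value keeps its quotes, e.g. u"'Fe'"
    if restriction[1] in ('=', '==') and value == 'Fe':
        q_dict[i] = Q(pk=F('pk'))   # always true
    else:
        q_dict[i] = ~Q(pk=F('pk'))  # always false
    continue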

How to skip the XSAMS generator and return a custom format

Currently, only queries with FORMAT=XSAMS are officially supported. Since some nodes wanted to be able to return other formats (that are only useful for their community, for example to include binary data like an image of a molecule), there is a mechanism to do this.

Whenever FORMAT is something other than XSAMS, the NodeSoftware checks whether there is a function called returnResults() in the node’s queryfunc.py. If so, it hands over full responsibility for assembling the output to this function.

Note

This means that you have to return an HttpResponse object from it and know a little more about Django views. In addition, you are on your own to assemble your custom data format.
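As an illustration of assembling a custom format, here is a hypothetical helper that turns rows into CSV text. The row layout (wavelength, log gf) is an assumption for the example, and the arguments that returnResults() receives are not shown here; inside it, you would wrap the resulting string in a Django HttpResponse, e.g. HttpResponse(text, content_type='text/csv').

```python
# Hypothetical helper for a custom CSV output (not part of the NodeSoftware).
import csv
import io


def transitions_to_csv(rows):
    """rows: iterable of (wavelength, loggf) tuples, an assumed layout."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['wavelength', 'loggf'])
    writer.writerows(rows)
    return buf.getvalue()


print(transitions_to_csv([(5000.1, -1.2)]), end='')
# prints:
# wavelength,loggf
# 5000.1,-1.2
```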

Making more use of Django

Django offers a plethora of features that we do not use for a bare VAMDC node but that might be useful for adding custom functionality. For example, you could:

  • Use the included admin-interface to browse and manipulate the content of your database.
  • Add a custom query form that is suited specifically for the most common use case of your data.
  • Add a web-browsable view of your data.

For more information on all this, have a look at Django’s excellent documentation at https://docs.djangoproject.com/

For extending your node beyond the VAMDC-TAP interface, you would normally add a second app to your node directory, besides the existing one called node. Then you simply tell your urls.py to serve the new app at a certain URL.