.. _newnode:

Step by step guide to a new VAMDC node
======================================

Let's have a look at the structural diagram from the :ref:`intro` once
more:

.. image:: nodelayout.png
   :width: 700 px
   :alt: Structural layout of a VAMDC node

If you have followed the instructions of the page on :ref:`prereq`, you
are done with the yellow box in the figure. This page first tells you
how to configure and write the few bits of code that your node needs
before running (blue box), and then how to deploy the node and make it
run as shown in the violet box.

It goes like this:

* Get the NodeSoftware and make a copy of the example node.
* Auto-create a new settings file and put your database connection there.
* Either

  * Write your data model and let Django create the database from it.
    Then use the import tool to put your data there.
  * Let Django write the model from an existing database that you
    already have.

* Assign names from the VAMDC dictionary to your data to make them
  globally understandable.
* Start your node and test it.

But let's take it step by step:

The main directory of your node
---------------------------------

Let's give the directory which holds your copy of the NodeSoftware a
name and call it `$VAMDCROOT`. (It is called `NodeSoftware` by default
and exists wherever you downloaded and extracted it, unless you moved
it elsewhere and/or renamed it, which is no problem to do.) Let's also
assume the name of the dataset is `YourDBname`.

Inside `$VAMDCROOT` you find several subdirectories. For setting up a
new node, you only need to care about the one called `nodes/`, which
already contains the files for several nodes, plus the example node. The
first thing to do is to make a copy of the ExampleNode::

    $ export VAMDCROOT=/your/path/to/NodeSoftware/
    $ # (the previous line is for Bash-like shells, for C-Shell use `setenv` instead of `export`)
    $ cd $VAMDCROOT/nodes/
    $ cp -a ExampleNode YourDBname
    $ cd YourDBname/

.. note::
    In the following you always work within this newly created
    directory for your node. You should not need to touch any files or
    run commands outside it.

Inside your node directory
---------------------------------

The first thing to do inside your node directory is to run::

    $ ./manage.py

This will generate a new file ``settings.py`` for you. This file is
where you override the default settings which reside in
``settings_default.py`` (which you should not edit!). There are only a
few configuration items that you need to fill in:

* The information on how to connect to your database.
* A name and email address for the node administrator(s).
* Example queries that make sense with your data.
* Optionally you can set the location of the log-file and override
  other options by copying from ``settings_default.py``.

The structure for filling in this information is already inside the
newly created file. You can leave the default values for now, if you do
not yet know what to fill in.

There are only three more files that you will need to care about in the
following:

* ``node/models.py`` is where you put the data model,
* ``node/dictionaries.py`` is where you put the dictionaries and
* ``node/queryfunc.py`` is where you write the query function,

all of which will be explained in detail in the following.

.. _thedatamodel:

The data model and the database
---------------------------------

By *data model* we mean the piece of Python code that tells Django the
layout of the database, including the relations between the tables. By
*database* we mean the actual relational database that is to hold the
data. (See also :ref:`concepts`).

There are two basic scenarios to come up with these two ingredients.
Either the data are already in a relational database, or you want to
create one.

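Both scenarios rely on the database connection that you put into
``settings.py`` earlier. As a minimal sketch, assuming a SQLite file for
local testing, the relevant entries could look like the following. The
keys follow Django's standard ``DATABASES`` and ``ADMINS`` settings; the
file path, name and email address are placeholders, and the template
generated for your node may arrange these items differently.

```python
# Sketch of the database and administrator entries in settings.py.
# All values below are placeholders, not defaults of the NodeSoftware.

DATABASES = {
    'default': {
        # Django database backend; sqlite3 needs no separate server.
        'ENGINE': 'django.db.backends.sqlite3',
        # For SQLite this is the path to the database file.
        'NAME': '/your/path/to/YourDBname.db',
        # The remaining keys stay empty for SQLite; fill them in
        # for server-based engines such as MySQL or PostgreSQL.
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    }
}

# One (name, email) pair per node administrator.
ADMINS = (
    ('Your Name', 'you@example.org'),
)
```

For a server-based engine only ``ENGINE`` and the connection keys change;
the rest of the node code does not need to know which engine is used.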

Case 1: Existing database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to deploy the VAMDC node software on top of an existing
relational database, the *data model* for Django can be automatically
generated by running::

    $ ./manage.py inspectdb > node/models.py

This will look into the database that you told Django about in
``settings.py`` above and create a Python class for each table in the
database, with attributes that correspond to the table columns. An
example may look like this::

    from django.db.models import *

    class Species(Model):
        id = IntegerField(primary_key=True)
        name = CharField(max_length=30)
        ion = IntegerField()
        mass = DecimalField(max_digits=7, decimal_places=2)
        class Meta:
            db_table = u'species'

There is one important thing to do with these model definitions, apart
from checking that the columns were detected correctly: The columns that
act as a pointer to another table need to be replaced by `ForeignKeys`,
thereby telling the framework how the tables relate to each other. This
is best illustrated in an example. Suppose you have a second model, in
addition to the one above, that was auto-detected as follows::

    class State(Model):
        id = IntegerField(primary_key=True)
        species = IntegerField()
        energy = DecimalField(max_digits=17, decimal_places=4)
        ...

Now suppose you know that the field called `species` is actually a
reference to the species table. You would then change the class `State`
as such::

    class State(Model):
        id = IntegerField(primary_key=True)
        species = ForeignKey(Species)
        energy = DecimalField(max_digits=17, decimal_places=4)
        ...

.. note::
    You will probably have to re-order the classes inside the file
    ``models.py``. The class that is referred to needs to be defined
    before the one that refers to it. In the example, `Species` must be
    above `State`.


Let's add a third model::

    class Transition(Model):
        id = IntegerField(primary_key=True)
        species = ForeignKey(Species)
        upper_state = ForeignKey(State, related_name='transup')
        lower_state = ForeignKey(State, related_name='translo')
        wavelength = FloatField()

The important thing here is the `related_name`. Whenever you want to
define more than one `ForeignKey` to the *same* model, you need to set
this to an arbitrary name. This is because Django will automatically set
up the reverse key for you and needs to give it a unique name. The
reverse key in this example could be used to get all the Transitions
that have a given State as upper or lower state. More on this at
:ref:`relatedname`.

Once you have finished your model, you should test it. Continuing the
example above you could do something like::

    $ ./manage.py shell
    >>> from node.models import *
    >>> allspecies = Species.objects.all()
    >>> allspecies.count() # the number of species is returned
    >>> somestates = State.objects.filter(species__name='He')
    >>> for state in somestates: print state.energy
    >>> sometransitions = Transition.objects.filter(wavelength__lt=500)
    >>> atransition = sometransitions[5]
    >>> # the reverse key `transup` is itself a manager, so call all() on it directly
    >>> othertransitions = atransition.upper_state.transup.all()
    >>> othertransitions.count() # the number of transitions with the same upper state

Detailed information on how to use your models to run queries can be
found in Django's own excellent documentation:
http://docs.djangoproject.com/en/1.3/topics/db/queries/

Case 2: Create a new database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this case we assume that the data are in ascii tables of arbitrary
layout. The steps now are as follows:

#. Write the data model in your ``node/models.py``.
#. Create an empty database with corresponding user and password.
#. Tell the node software where to find this database.
#. Let the node software create the tables.
#. Use the import tool to fill the database with the data.


First of all, you need to think about how the data should be structured.
Data conversion (units, structure etc.) can and should be done while
importing the data, since this saves work and execution time later.
Since the data will need to be represented in the common XSAMS format,
it is recommended to adopt a layout with separate tables for species,
states, processes (radiative, collisions etc.) and references.

Deviating data models are certainly possible, but will involve some more
work on the query function (see below). In any case, when writing the
models, do not think so much about how your data is structured now, but
about how you want it to be structured in the database.

Writing your data models is best learned from example. Have a look at
the example from Case 1 above and at the file
``$VAMDCROOT/nodes/vald/node/models.py`` inside the NodeSoftware to see
what the model for VALD looks like. Keep in mind the following points:

* As mentioned, a `class` in the model becomes a `table` in the database
  and the fields/members of the class correspond to the table columns.
* Each class should have one member with `primary_key=True`. If not, one
  called `id` will be implicitly created for you.
* How you name your classes and fields is up to you. Sensible names will
  make it easier to write the dictionaries below.
* Use the appropriate field type for each bit of data, e.g.
  BooleanField, CharField, PositiveSmallIntegerField, FloatField. There
  is also a DecimalField that allows you to specify arbitrary precision,
  which will also be used in later ascii-representations of the data.
* Use `ForeignKey()` to another class's primary key to connect your
  tables.
* The full list of possible fields can be found at
  http://docs.djangoproject.com/en/1.3/ref/models/fields/.
* If you know that a field will be empty sometimes, add `null=True` to
  the field definition inside the brackets ().
* For fields that are frequent selection criteria (like wavelength for a
  transition database), you can add `db_index=True` to the field to
  speed up searches along this column (at the expense of some disk space
  and computation time at database creation).
* If you do not define a table name for your model with the Meta class,
  as in the first example above, the table in the database will be named
  like the model, but lowercase and with a prefix `node_`.

Once you have a first draft of your data model, you test it by running
(inside your node directory)::

    $ ./manage.py sqlall node

This will (if you have no error in the models) print the SQL statements
that Django will use to create the database, using the connection
information in ``settings.py``. If you do not know SQL, you can ignore
the output and move straight on to creating the database::

    $ ./manage.py syncdb

Now you have a fresh empty database. You can test it with the same
commands as mentioned at the end of Case 1 above, replacing "Species"
and "State" by your own model names.

.. note::
    There is no harm in deleting the database and re-creating it after
    improving your models. After all, the database is still empty at
    this stage and `syncdb` will always create it for you from the
    models, even if you change your database engine in ``settings.py``.
    The command for re-creating the tables in the database (deleting
    all data!) is ``./manage.py reset node``.

.. note::
    If you use MySQL as your database engine, we recommend its internal
    storage engine InnoDB over the standard MyISAM. You can set this in
    your settings.py by adding
    `'OPTIONS': {"init_command": "SET storage_engine=INNODB"}` to your
    database setup. We also recommend using UTF8 as the default in your
    MySQL configuration, or creating your database with
    `CREATE DATABASE CHARACTER SET utf8;`

How you fill your database with information from ascii-files is
explained in the next chapter: :ref:`importing`.
You can do this now and return here later, or continue with the steps
below first.

Using the XML generator
-----------------------------------

Before we go on to the remaining two ingredients, the *query function*
and the *dictionaries*, we need to understand how they play together in
the XML generator. As you remember from :ref:`xsamsconcepts`, the goal
is to run queries on your models and pass on the output to the generator
so that it can loop over them to fill the hierarchical XSAMS structure.

In order to make this work, we need to name the variables that you pass
into the generator (as explained below) and the loop variables that you
use in the Returnables. For example, continuing on the model above:
Assume you have made a selection of your Transition model; you pass this
on under the name *RadTrans*; the generator loops over it, calling each
Transition inside its loop *RadTran* (note the singular!). `RadTran` is
now a single instance of your Transition model and has the wavelength as
`RadTran.wavelength`, since we called the field this way above. The
entry in the RETURNABLES would therefore look like
`'RadTransWavelength':'RadTran.wavelength'` - where the first part is
the keyword from the VAMDC dictionary (for which the generator knows
where in the schema it should end up) and the second part tells it how
to get the value from the results of your query function.

Do not fret if this sounded complicated, it will become clear in the
examples below. Just read the previous paragraph again after that.

Here is a table that lists the variable names that you can pass into the
generator and the loop variables that you use in the Returnables. The
one is simply the plural of the other.

===================== ============= ================================================= ===================
Passed into generator Loop variable Object looped over                                Inner loop variable
===================== ============= ================================================= ===================
Atoms                 Atom
..                                  Atom.States                                       AtomState
..                                  Atom.Components                                   Component
..                                  Atom.Component.SuperShells                        AtomSuperShell
..                                  Atom.Component.Shells                             AtomShell
..                                  Atom.Component.ShellPairs                         AtomShellPair
Molecules             Molecule
..                                  Molecule.States                                   MoleculeState
..                                  Molecule.State.Parameters                         Parameter
..                                  Molecule.State.Parameter.Vector                   VectorValueOA
..                                  Molecule.NormalModes                              NormalMode
..                                  Molecule.State.Expansions                         Expansion
..                                  Molecule.State.Expansion.Coefficients             Coefficient
Solids                Solid
..                                  Solid.Layers                                      Layer
..                                  Solid.Layer.Components                            Component
Particles             Particle
RadTrans              RadTran
..                                  RadTran.Shiftings                                 Shifting
..                                  RadTran.Shifting.ShiftingParams                   ShiftingParam
..                                  RadTran.Shifting.ShiftingParam.Fits               Fit
..                                  RadTran.Shifting.ShiftingParam.Fit.Parameters     Parameter
RadCross              RadCros
..                                  RadCros.BandModes                                 BandMode
CollTrans             CollTran
..                                  CollTran.Reactants                                Reactant
..                                  CollTran.IntermediateStates                       IntermediateState
..                                  CollTran.Products                                 Product
..                                  CollTran.DataSets                                 DataSet
..                                  CollTran.DataSet.FitData                          FitData
..                                  CollTran.DataSet.FitData.Arguments                Argument
..                                  CollTran.DataSet.FitData.Parameters               Parameter
..                                  CollTran.DataSet.TabData                          TabData
NonRadTrans           NonRadTran
Environments          Environment
..                                  Environment.Species                               EnvSpecies
Sources               Source
Methods               Method
Functions             Function
..                                  Function.Parameters                               Parameter
===================== ============= ================================================= ===================

The third and fourth columns are for an inner loop. So for example the
generator loops over all `Atoms`, calling each atom instance `Atom`. To
extract all states that are part of this particular Atom, the generator
will assume that there is an iterable `States` defined on each `Atom`
over which it will iterate.
So it will loop over `Atom.States`, calling each state `AtomState` in
the inner loop, like this::

    for Atom in Atoms:
        [...]
        for AtomState in Atom.States:
            [...]

It is up to you to make sure that `Atom.States` is defined if you want
to output state information. This is covered in the next section.

.. _queryfu:

The query routine
-----------------------------------

Now that we have a working database and data model and know in principle
how the generator works, we simply need to tell the framework how to run
a query and pass the output to the generator. This is done in a single
function called `setupResults()` which must be written in the file
``node/queryfunc.py`` in your node directory. It works like this:

* `setupResults()` is called from elsewhere and you need not run it
  yourself.
* `setupResults()` gets an object as input, called `sql`. This is a
  parsed version of the query that comes in. It holds the WHERE-part as
  `sql.where` and so on.
* We now need to run this query on the data model in order to get
  so-called `QuerySets`, which are basically unevaluated queries that
  are then passed on to the XML generator which takes care of the rest.
* If you want to enforce limits on how much data can be returned in one
  query, this can be done here as well.
* You should also calculate some statistics on how much information a
  query returns and return it as header information.

In a concrete example of an atomic transition database, it looks like
this:

.. literalinclude:: queryfunc.py
   :linenos:

Explanations on what happens here:

* Lines 1-4: We import some helper functions from the sqlparser and the
  dictionaries and models that reside in the same directory as
  ``queryfunc.py``.
* Line 6: Set the limit of transitions for use below.
* Line 7: Begin the function `setupResults`. Do not change this line.
* Line 9: This uses the helper function where2q() to convert the
  information in `sql.where` to QueryObjects that match your model,
  using the RESTRICTABLES (see below). The result from where2q() is a
  string that needs to be executed with eval().
* Line 10: We simply pass these QueryObjects to the Transition model's
  filter function. This returns a QuerySet, an unevaluated version of
  the query, which we assign to the variable `transs`. We also order it
  by wavelength.
* Line 11: We use the count() method on the QuerySet to get the number
  of transitions which we later pass into the header.
* Lines 13-17: We check if the number is larger than our limit and
  shorten the QuerySet if necessary. This is done by getting the
  wavelength at the limit and making a new QuerySet that has the new
  upper wavelength limit as an additional restriction. We also prepare a
  string with the percentage for the headers.
* Lines 19-29: Here comes the tricky part. For the selected transitions,
  we now need to create the corresponding atoms/species, since they go
  into different parts of the generator, see the table above. Not only
  that, each atom should have attached its list of states that are upper
  or lower states for the selected transitions - there is an inner loop
  over `Atom.States` in the generator, remember? In detail:

  * Line 19: We pull a single column out of the Transitions model, the
    key that links to the Species model. We put that into a `set()` to
    throw out duplicates.
  * Line 20: We use this set to query for all our Species.
  * Line 21: We count them and save the result for later.
  * Line 22: We make a new variable for the number of states which we
    will increase in the coming loop.
  * Line 23: Start a loop over our selected `species`.
  * Line 24: Make a sub-selection on our previously selected
    transitions, now only selecting the ones that belong to the current
    species.
  * Lines 25-26: As for the species IDs before, we now pull the keys to
    the upper and lower states out of our Transition model.
  * Line 27: We concatenate the two lists of IDs and put them in a
    `set()` to get rid of duplicates. `sids` is now a collection of the
    IDs of all the states within the current species that are used in
    the selected transitions.
  * Line 28: Use this collection to make the query on the State model.
    And, **most importantly**, attach it to the current species object.
    This way we have constructed the nested structure for the
    generator.
  * Line 29: For the statistics, we now increase the state count by the
    number for the current species.

* Lines 31-36: Put the statistics into a key-value structure where the
  keys are the header names as defined by the VAMDC-TAP standard and the
  values are the strings/numbers that we calculated above.
* Lines 39-41: Return the QuerySets and the headers, again as key-value
  pairs. The keys are the names from the first column of the table
  above, so that the generator recognizes them and loops over them at
  the right place.

.. note::
    As you might have noticed, all restrictions are passed to the
    Transitions model in the above example. This does not mean that we
    cannot put constraints on e.g. the species here. We simply use the
    model's ForeignKey in that case in the RESTRICTABLES. An entry there
    could e.g. be `'AtomIonCharge':'species__ion'` which will use the
    `ion` field of the species model. Depending on your database
    layout, it might not be possible to pass all restrictions to a
    single model. Then you need to write a more advanced query than the
    shortcuts in Lines 9-10.

.. note::
    We are well aware that adapting the above example to your data is a
    non-trivial task unless you know Python and Django reasonably well.
    There is a more complete example in
    ``ExampleNode/node/queryfunc.py`` and you can also have a look at
    the other nodes' ``queryfunc.py`` which are included in the
    NodeSoftware. And, of course, we are willing to assist you in this
    step, so feel free to contact us about this.

More comprehensive information on how to run queries within Django can
be found at http://docs.djangoproject.com/en/1.3/topics/db/queries/.

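The nesting step described for lines 19-29 can be tried out without a
database. The following is only a sketch of the idea using plain Python
objects in place of Django models and QuerySets; the classes, the helper
`attach_states()` and all numbers are made up for illustration and are
not part of the NodeSoftware.

```python
from collections import namedtuple

# Hypothetical stand-ins for rows of the Transition and State tables.
Transition = namedtuple('Transition',
                        'species_id upper_state_id lower_state_id wavelength')
State = namedtuple('State', 'id species_id energy')


class Species(object):
    """Plain object, so that a ``States`` attribute can be attached later."""
    def __init__(self, id, name):
        self.id = id
        self.name = name


def attach_states(species_list, states, transitions):
    """Attach to each species the states used by the given transitions.

    Returns the total number of attached states (for the statistics).
    """
    nstates = 0
    for spec in species_list:
        # Only the transitions that belong to the current species.
        subset = [t for t in transitions if t.species_id == spec.id]
        up = [t.upper_state_id for t in subset]
        lo = [t.lower_state_id for t in subset]
        # A set removes the IDs that occur more than once.
        sids = set(up + lo)
        # The generator expects the inner iterable under this name.
        spec.States = [s for s in states if s.id in sids]
        nstates += len(spec.States)
    return nstates


# Made-up example data: state 13 is not used by any selected transition.
all_species = [Species(1, 'He'), Species(2, 'Fe')]
all_states = [State(10, 1, 0.0), State(11, 1, 1.0), State(12, 1, 2.0),
              State(13, 1, 3.0), State(20, 2, 0.0), State(21, 2, 1.0)]
selected = [Transition(1, 11, 10, 100.0), Transition(1, 12, 10, 200.0),
            Transition(2, 21, 20, 300.0)]

# Species that occur in the selected transitions, without duplicates.
species_ids = set(t.species_id for t in selected)
species = [s for s in all_species if s.id in species_ids]
nstates = attach_states(species, all_states, selected)
```

In a real node the list comprehensions would be replaced by calls to
`filter()` and `values_list()` on your models, so that the database does
the selection; the attribute name `States` is the part that has to match
the table of loop variables.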

The dictionaries
----------------------------------

As the last important step before the new node works, we need to define
how the data relates to the VAMDC *dictionary*. If you have not done so
yet, please read :ref:`conceptdict` before continuing.

What needs to be put into the file ``node/dictionaries.py`` is the
definition of two variables that map the individual fields of the data
model to the names from the dictionary, like this::

    RESTRICTABLES = {
        'AtomSymbol':'species__name',
        'AtomIonCharge':'species__ion',
        'RadTransWavelength':'wavelength',
    }

    RETURNABLES = {
        'NodeID':'YourNodeName', # constant strings work
        'AtomIonCharge':'Atom.ion',
        'AtomSymbol':'Atom.name',
        'AtomStateEnergy':'AtomState.energy',
        'RadTransWavelength':'RadTran.wavelength',
    }

.. note::
    Note for example the use of the names `Atom` and `AtomState` on the
    right-hand side of the dictionary definition. These are examples of
    the "loop variables" mentioned in the table above and act as
    shortcuts to the nested data you are storing.

About the RESTRICTABLES
~~~~~~~~~~~~~~~~~~~~~~~~

As we have learned from writing the query function above, we can use the
RESTRICTABLES to match the VAMDC dictionary names to places in our data
model. The key in each key-value pair is a name from the VAMDC
dictionary and the values are the field names of the model class that
you want to query primarily (Transition, in the example above, line 10).

The RESTRICTABLES example given above fits our query function, so we
know that the "main" model is Transition. Now if a query like
"AtomIonCharge > 1" comes along, this can be translated into
`Transition.objects.filter(species__ion__gt=1)` without further ado,
which is exactly what `where2q()` does. Note that we here used a
ForeignKey to the Species model; the values in the RESTRICTABLES need to
be written from the perspective of the queried model.

.. note::
    Even if you choose not to use the RESTRICTABLES in your
    setupResults() and treat the incoming queries manually, you are
    still encouraged to fill the keys (with the values being empty),
    because they are automatically provided to the VAMDC registry so
    that external services can figure out which names make sense to
    query at this node.

About the RETURNABLES
~~~~~~~~~~~~~~~~~~~~~~~~

Just as the RESTRICTABLES take care of translating from global names to
your custom data model when the query comes in, the RETURNABLES do the
opposite on the way back, i.e. when the data reply is sent by the
generator, as we have already seen above.

Again the keys of the key-value pairs are the global names from the
VAMDC dictionary. The values now are their corresponding places in the
QuerySets that are constructed in setupResults() above. This means that
the XML generator will loop over the QuerySet, getting each element, and
try to evaluate the expression that you put in the RETURNABLES.

Continuing our example from above, the State model has a field called
`energy`, so each state object in the inner loop will have that value
accessible at `AtomState.energy`. Note that the first part before the
dot is not the name of your model, but the *loop variable* inside the
generator as it is listed in the second (or fourth, in the case of an
inner loop) column of the table above.

There is only one keyword that you **must** fill, all the others depend
on your data. The obligatory one is `NodeID` which you should set to a
short string that is unique to your node. It will be used in the
internal reference keys of an XSAMS document. By including the NodeID,
we make these keys globally unique within VAMDC, which will facilitate
the merging of data that come from different nodes.

http://dictionary.vamdc.org/returnables/ is where you can browse all the
available keywords.

.. note::
    Again, at least the keys of the RETURNABLES should be filled (even
    if you use your own generator for the XML output) because this
    allows the registry to know what kind of data your node holds
    before querying it.

Testing the node
------------------------------

Now you should have everything in place to run your node. If you still
need to fill your database with the import tool, now is the time to do
so according to :ref:`importing`.

Django comes with a built-in server for testing. You can start it with::

    $ ./manage.py runserver

This will use port 8000 on your local machine, which means that you
should be able to browse to http://127.0.0.1:8000/tap/availability and
hopefully see a positive status message.

You should also be able to run queries by accessing URLs like::

    http://127.0.0.1:8000/tap/sync?LANG=VSS1&FORMAT=XSAMS&QUERY=SELECT ALL WHERE AtomIonCharge > 1

replacing the last part by whatever restriction makes sense for your
data set.

.. note::
    The URL has to be URL-encoded when testing from a script or
    similar. Web browsers usually do that for you. To also see the
    statistics headers, you can use `wget -S -O output.xml ""`.

You should run several different test queries against your node, using
all the Restrictables that you defined. Make sure that the output values
match your expectations.

There is a very convenient piece of software called **TAPvalidator**
(see http://www.vamdc.org/software) which can be used to query a node,
browse the output and check that it is valid with respect to the XSAMS
standard.

Once your node does what it should do with the test server, you can
start thinking about :ref:`deploying it `.
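
When you test from a script rather than from a web browser, the query
has to be URL-encoded first, as mentioned in the note above. A small
sketch of how this could be done with Python's standard library follows;
the helper name `build_query_url()` is made up for this example, and the
base URL is the one of the test server started with `runserver`.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Address of the local Django test server, as used in the text above.
BASE = 'http://127.0.0.1:8000/tap/sync'


def build_query_url(query, base=BASE):
    """Return the URL-encoded address for one synchronous test query."""
    # A list of pairs (instead of a dict) keeps the parameter order fixed.
    params = [('LANG', 'VSS1'), ('FORMAT', 'XSAMS'), ('QUERY', query)]
    return base + '?' + urlencode(params)


url = build_query_url('SELECT ALL WHERE AtomIonCharge > 1')
```

The resulting string can be handed to `wget` or to any HTTP library.
Spaces are encoded as `+` and the `>` sign as `%3E`, which is what a
browser would do behind the scenes.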