Reconciliation Help
Using the Reconciliation Service
The recommended way to use the Reconciliation Service is with OpenRefine. This tool, previously called Google Refine, can query a Web Service — a website that returns information in a form the computer can interpret — and record the results, whether that’s an exact match, a close match, a list of possible matches, or no match at all.
Software overview and installation
Watch the three introductory videos — these instructions assume some familiarity with OpenRefine. There’s also written documentation.
- General introduction, editing messy data
- Transforming semi-structured data into properly structured data
- Calling a web service to supplement the dataset, reconciliation
Users at Kew: OpenRefine has been installed on the network. Go to X:\apps\OpenRefine\ and double-click OpenRefine.bat. A black window should pop up; any technical error messages appear there. When you have finished with OpenRefine, click this window and press Ctrl+C to close the program properly.
Users elsewhere: OpenRefine needs Java to run. If your computer supports it, choose 64-bit Java, which allows working on larger datasets that consume more memory. A version of OpenRefine including the Kew extension is available via GitHub (recommended). Alternatively, download OpenRefine from the download page. Choose the development version, currently 2.6-beta1. This does not include the Kew extension, so the functionality to extend data using The Plant List will not be available.
Data preparation
The services are easiest to use if the whole name (or value to be reconciled) is in a single column, like Quercus alba L. or Quercus alba f. latiloba Sarg. Better results can sometimes be obtained with a column for each necessary part (e.g. generic epithet, species epithet, publication title, etc.). You can use OpenRefine to do this (see the videos) or any other program.
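If you prefer to split names outside OpenRefine, the idea can be sketched in Python as below. This is a rough heuristic under stated assumptions, not the parsing used by the Kew services: the function name `split_name`, the output keys and the list of rank markers are all illustrative, and real names (hybrids, multi-word author strings, unusual ranks) will need more care.

```python
# Rank markers that may separate a species epithet from an infraspecific one.
# This list is an illustrative assumption, not taken from the Kew services.
RANK_MARKERS = {"subsp.", "var.", "subvar.", "f.", "subf."}


def split_name(full_name):
    """Split a full scientific name into rough parts using whitespace heuristics."""
    tokens = full_name.split()
    parts = {"genus": "", "species": "", "rank": "", "infraspecies": "", "author": ""}
    if not tokens:
        return parts
    parts["genus"] = tokens[0]
    rest = tokens[1:]
    # A species epithet is assumed to start with a lower-case letter.
    if rest and rest[0][0].islower():
        parts["species"] = rest[0]
        rest = rest[1:]
    # An infraspecific epithet is assumed to follow a known rank marker.
    if len(rest) >= 2 and rest[0] in RANK_MARKERS:
        parts["rank"] = rest[0]
        parts["infraspecies"] = rest[1]
        rest = rest[2:]
    # Whatever remains is treated as the author string.
    parts["author"] = " ".join(rest)
    return parts


for name in ["Quercus alba L.", "Quercus alba f. latiloba Sarg."]:
    print(split_name(name))
```

Each key of the returned dictionary could then become its own column before reconciling.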
Optionally, use facets to limit which names you wish to match — for example, to select particular ranks to match. If you have a lot of names (over 1000) you could star 10 or so names and facet on them, for a trial run.
Find the configuration you want to use from the list here. Note the two endpoints: the OpenRefine reconciliation service, and the JSON web service. These instructions will assume you have a list of plant names and wish to reconcile them against the IPNI Name reconciliation service.
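For readers curious about what OpenRefine sends to a reconciliation endpoint, here is a minimal Python sketch of a batch request. It assumes the service follows the standard OpenRefine reconciliation protocol (a JSON-encoded `queries` parameter, with optional `properties` given as `pid`/`v` pairs) and that the IPNI Name endpoint URL quoted in these instructions is still valid. The helper names are hypothetical, and the property ids `epithet_1` and `epithet_2` are assumed here to carry genus and species. The sketch only builds the URL; it does not contact the service.

```python
import json
from urllib.parse import urlencode

# Endpoint quoted in these instructions; it may have moved since they were written.
SERVICE_URL = "http://data1.kew.org/reconciliation/reconcile/IpniName"


def build_query(name, properties=None):
    """Build one reconciliation query, optionally with extra column values."""
    query = {"query": name}
    if properties:
        # Each property pairs a service-defined id (pid) with a value (v).
        query["properties"] = [
            {"pid": pid, "v": value} for pid, value in properties.items()
        ]
    return query


def build_request_url(queries, service_url=SERVICE_URL):
    """Encode a batch of queries, keyed q0, q1, ..., into a GET URL."""
    batch = {"q%d" % i: q for i, q in enumerate(queries)}
    return service_url + "?" + urlencode({"queries": json.dumps(batch)})


url = build_request_url([
    build_query("Quercus alba L."),
    # Assumption: epithet_1 is the genus and epithet_2 the species epithet.
    build_query("Quercus alba", {"epithet_1": "Quercus", "epithet_2": "alba"}),
])
print(url)
```

Batching several names into one request, as OpenRefine does, keeps the number of calls to the service low.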
Querying the Reconciliation Service
- If you have whole entities (e.g. full scientific names) in a single column, choose that column
- Otherwise, choose a column unique to each record, like an identifier.
- Click the column menu, and choose Reconcile → Start reconciling…
- If this is the first time you’ve reconciled against a particular service, you may need to click Add Standard Service. (Some services are already included.) Enter the URL from the Reconciliation Service website, for example [http://data1.kew.org/reconciliation/reconcile/IpniName](http://data1.kew.org/reconciliation/reconcile/IpniName), and click OK.
- Select the service from the list on the left. After a moment, the dialog is filled in with options.
- If you have columns for genus, species, etc., fill in the text boxes for Also use relevant details from other columns. The values to fill in come from those listed on the website describing the service (in this case, epithet_1, epithet_2, etc., listed here).
- Click Start Reconciling
- Results appear after a while. Where there’s a single possibility it will have been automatically selected. Otherwise, you can select the match using the tick boxes. It’s likely you will receive multiple results where IPNI has duplicate names. We hope to hide the duplicates from IPNI in the near future.
- If matching hasn’t worked you can also click Search for match and adjust the query.
- To get the identifiers: click the column’s menu and choose Add column based on this column…. Then use the expression cell.recon.match.id. To get the name, use cell.recon.match.name instead. These are GREL expressions; see the GREL Functions Documentation for more information.
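The matching outcomes described above can be illustrated outside OpenRefine with a small Python sketch. It assumes the standard reconciliation response shape (a `result` list of candidates, each with `id`, `name`, `score` and `match`), and the rule in `pick_match` is a deliberate simplification rather than OpenRefine's actual auto-matching logic. The response data and ids are placeholders, not real IPNI records.

```python
def pick_match(candidates):
    """Return the single flagged candidate, or None if a person must choose."""
    flagged = [c for c in candidates if c.get("match")]
    return flagged[0] if len(flagged) == 1 else None


# Hypothetical response: the ids below are placeholders, not real IPNI ids.
sample_response = {
    # One confident candidate: selected automatically.
    "q0": {"result": [
        {"id": "example-1", "name": "Quercus alba L.", "score": 100, "match": True},
    ]},
    # Duplicate names with no clear winner: left for manual selection.
    "q1": {"result": [
        {"id": "example-2", "name": "Quercus alba L.", "score": 80, "match": False},
        {"id": "example-3", "name": "Quercus alba L.", "score": 80, "match": False},
    ]},
    # No candidates at all.
    "q2": {"result": []},
}

for key in sorted(sample_response):
    match = pick_match(sample_response[key]["result"])
    if match is None:
        print(key, "needs manual review")
    else:
        # These correspond to cell.recon.match.id and cell.recon.match.name.
        print(key, match["id"], match["name"])
```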
Extending data using the Metaweb Query Language service
Data that has been (partially!) reconciled against IPNI and presented through an MQL service can be added to your data. At present, only some data from The Plant List is available in this way.
- Click a reconciled column heading and choose Edit column → Add column using MQL.
- Select a service from the list on the left, or add a new one.
- This shows a list of available properties — choose one or more properties from this list and click OK.
- The data values are retrieved and added as extra columns. If they are entities themselves (for example, more plant names) then it’s possible to run further MQL queries from those columns.
Using the results
You can then export the results into standard formats, including CSV, using the Export menu.
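Once exported, a CSV file can be checked with any tool. As a minimal sketch, the Python below counts how many rows received an identifier. The column name `ipni_id` is hypothetical (it depends on what you called the column created from cell.recon.match.id), and the sample data is a placeholder rather than real output.

```python
import csv
import io


def count_matches(csv_text, id_column="ipni_id"):
    """Count rows with and without a value in the identifier column."""
    matched = unmatched = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if (row.get(id_column) or "").strip():
            matched += 1
        else:
            unmatched += 1
    return matched, unmatched


# Placeholder export: one reconciled row and one row left unmatched.
sample_export = "name,ipni_id\nQuercus alba L.,example-1\nQuercus sp.,\n"
print(count_matches(sample_export))
```

For a real file, read its contents with `open(path, newline="", encoding="utf-8").read()` and pass the text to `count_matches`.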
Troubleshooting
This section will be completed as we discover problems, so please let us know! Allocating more memory may help; refer to the OpenRefine documentation on this.
Advanced data preparation / manipulation
It’s possible to use some of the transformers that are behind these reconciliation services to prepare your data. For example, you may wish to extract a year out of a field containing a whole reference. See the String Transformers project for how to do this.
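To give a feel for this kind of transformation, here is a minimal Python sketch that pulls a plausible publication year out of a reference string. It is a stand-in using a regular expression, not the String Transformers code itself; the function name, the accepted year range (1500 to 2099) and the sample reference are assumptions made for illustration.

```python
import re

# Four-digit years from 1500 to 2099, not embedded in a longer run of digits.
YEAR_PATTERN = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")


def extract_year(reference):
    """Return the first plausible year in the reference, or None if absent."""
    found = YEAR_PATTERN.search(reference)
    return int(found.group(1)) if found else None


# Illustrative reference string: volume and page numbers are skipped.
print(extract_year("Sp. Pl. 2: 996. 1753"))
```

The lookbehind and lookahead stop page ranges or long identifiers from being mistaken for years, at the cost of ignoring anything outside the assumed range.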
Source code
The source code is available on Kew’s GitHub page.