The EUBIROD Network

1st Annual EUBIROD Meeting

Dasman Center for Research and Treatment of Diabetes, Kuwait City

Kuwait City, Kuwait, 2nd-4th May 2009

Technical Details on the Training Session

The session opened with an introductory talk by Dr. Carinci, Technical Coordinator of the EUBIROD project, followed by a presentation on the various IT components by Dr. Baglioni, EUBIROD project manager, software engineer and main developer of the BIRO Box.

As a first step, Dr. Baglioni introduced the technical aspects of the Box by recalling the main features of the system, which had also been presented at the BIRO Academy. Video and slides of the relevant Academy presentation can be found here.

Trainees were allocated to different tables in two separate rooms (see venue here). Most participants used their own notebooks; the five partners who had not brought a laptop were provided by the Dasman Centre with desktop PCs on which the Coordinating Centre had installed all the required software.

A supporting team was appointed to solve technical problems. It included Dr. Baglioni, Dr. Carinci and L. Rossi from the Perugia Coordinating Centre, and Dr. Awaraji and Dr. Trehan from the Dasman Centre.

The training session was designed around four phases:

setup of all software components
database load operations
local statistical analysis
global statistical analysis

Setup

In the initial setup phase, Dr. Baglioni introduced a list of mandatory prerequisites that had to be satisfied in order to install the BIRO system. These steps mainly consisted of downloading and installing third-party software packages and setting environment variables for the operating system. The complete list of instructions can be found here.
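A prerequisite check of this kind can be automated. The following is a minimal sketch, not part of the BIRO software, that reports which required commands and environment variables are missing on a machine. The names used in the example call (`java`, `psql`, `JAVA_HOME`) are assumptions chosen for illustration; the authoritative list is the one given in the setup instructions.

```python
import os
import shutil


def missing_prerequisites(commands, env_vars, environ=None, which=shutil.which):
    """Return the required commands and environment variables that are absent."""
    environ = os.environ if environ is None else environ
    # A command counts as missing when it cannot be found on the PATH.
    missing_commands = [c for c in commands if which(c) is None]
    # A variable counts as missing when it is unset or empty.
    missing_vars = [v for v in env_vars if not environ.get(v)]
    return missing_commands, missing_vars


if __name__ == "__main__":
    # Hypothetical prerequisite list, for illustration only.
    cmds, variables = missing_prerequisites(["java", "psql"], ["JAVA_HOME"])
    print("Missing commands:", cmds)
    print("Missing variables:", variables)
```

Running such a script on each participant's machine ahead of a session would show in advance which installations are incomplete.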

This step proved to be the most difficult part of the whole session, as reflected by the responses to an evaluation questionnaire that the Coordinating Centre sent to all participants afterwards.

Many problems were encountered during setup. They varied widely, which made them challenging to solve: many occurred only on specific PCs, most likely because of the heterogeneous operating systems and hardware in use.

User permissions also caused substantial problems: some notebooks were the property of partner organizations whose policies did not grant administrative privileges to end users. Although the Coordinating Centre had explicitly requested that these privileges be granted in advance, some users came to the meeting unprepared.

Even when administrative privileges were granted, computers running Windows Vista behaved unpredictably, which made the entire process difficult, particularly when setting environment variables.

Some packages available online from third parties (in particular, MiKTeX and Postgres) did not install properly on specific machines, for no apparent reason.

As a result, the supporting team was not always able to find quick solutions to basic installation problems, which left participants uncertain about what could be achieved in the session.

Indeed, these difficulties show the following:

It is not possible to foresee all problems without applying the software directly in real-life conditions. A shared information system such as BIRO must be compatible with many different environments. Although frustrating for partners, the experiment must be regarded as very useful for software developers, as it may trigger new solutions that simplify complex operations and reduce installation bugs.

Open source software can be very convenient, but it requires downloading and installing many different packages and libraries from third parties. These solutions must be tested on different systems before being adopted. A trade-off must be found between user needs (all operating systems, old and new machines, etc.) and developer needs (upgrading software). As a result, it might be necessary to set stringent minimum requirements for machines running BIRO.

The Internet connection may slow down considerably when many users download very large files from the same repositories. A cable connection should be preferred over wireless.

PCs running MS Windows may be easier to use, but as more flexibility is demanded they can show unexpected problems that are difficult to overcome even for expert users. In this respect, Vista is perhaps the worst case to handle. Linux machines, on the other hand, may be less widely used, but they are much easier to operate, and open source software is native to them. This means that when problems occur, a team of experts such as the supporting team may be able to resolve them almost immediately.

To use time more efficiently, it would be advisable to carry out the setup remotely, with a hotline to assist users, well in advance of the meeting.

The supporting team should be distributed more evenly across users so that it can intervene faster as problems occur. Its members must also coordinate better when preparing the session, by jointly defining the materials.

The last required step, i.e. installing the BIRO Box, did not create any problems once all setup prerequisites had been satisfied. For partners who were unable to complete the first part, the Coordinating Centre provided a desktop computer with a fully functional BIRO system.

Database load operations

The phase of database load operations was split into two steps:

exporting data to the BIRO XML format
loading data into the Postgres database

The aim of this phase was to export any data table from a diabetes register in a format that could be read by the BIRO system and loaded into a BIRO-standardized Postgres database.

Both steps can be bypassed by a centre that has the capacity to translate local datasets (i.e. all tables required by BIRO) directly into the BIRO Postgres database. This is normally the case for large centres with substantial expertise in IT and data management.

On the other hand, for those relying entirely on the BIRO system, both steps are required.

The objective of the first step was to produce a zip file, whose size depends on the size of the source dataset, containing XML files in a consistent BIRO format for all patients in the register. The register may itself be only a subset of larger databases maintained by the contributing unit.
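The shape of such an export can be illustrated with a minimal sketch that writes one XML file per patient into a zip archive. This is not the BIRO exporter: the element names, field names and file naming scheme below are hypothetical, and the actual BIRO XML format is the one defined by the project specifications.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def export_patients(patients, target):
    """Write one XML file per patient into a zip archive; return the file names."""
    names = []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for patient in patients:
            root = ET.Element("patient")  # hypothetical element name
            for field, value in patient.items():
                ET.SubElement(root, field).text = str(value)
            name = f"patient_{patient['patient_id']}.xml"  # hypothetical naming
            archive.writestr(name, ET.tostring(root, encoding="unicode"))
            names.append(name)
    return names


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [
        {"patient_id": 1, "type_of_diabetes": 1},
        {"patient_id": 2, "type_of_diabetes": 2},
    ]
    buffer = io.BytesIO()  # in-memory target; a file path works as well
    print(export_patients(demo, buffer))
```

Because the archive holds one file per patient, its size grows with the number of patients in the source dataset.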

Such an option is important to give the system flexibility: if a small centre does not have the capacity to run a Postgres database (which in many instances may be very likely), the operator can produce the export using the stand-alone BIRO Box and rely on another centre (e.g. a “Regional” or “National BIRO coordinator”) to upload BIRO exports and manage the Postgres database. The coordinating centre will then be able to load all exports into its own database. Notably, such a case falls outside the scope of the BIRO system and will require separate agreements, as the coordinating entity will exchange and manage individual data from associated centres rather than aggregated data.

Diabetes registers are usually based on heterogeneous data entry systems, each with its own coding. The choice made by the BIRO project is to leave this situation unaltered and to deal with the resulting complexity, so as to avoid any additional burden for contributing units.

To export local data into the BIRO format, it is therefore necessary to “map” the local coding to the BIRO coding. Although each unit could do this autonomously by referring to the specifications provided by the Common Dataset and Data Dictionary, the BIRO development team designed a custom application to perform this operation, which has been fully integrated in the BIRO Box and was used at the training session.
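The idea of mapping can be illustrated with a minimal sketch that renames local fields and translates local codes through lookup tables. This is not the BIRO mapping tool: the local field names, the target field names and all codes below are hypothetical, and the real target coding is the one defined in the Common Dataset and Data Dictionary.

```python
def map_record(record, field_map, value_maps):
    """Translate one local record into target field names and codes."""
    mapped = {}
    for local_field, target_field in field_map.items():
        value = record.get(local_field)
        codes = value_maps.get(target_field)
        if codes is not None:
            # Unknown local codes become None instead of being guessed.
            value = codes.get(value)
        mapped[target_field] = value
    return mapped


# Hypothetical local register layout and codes, for illustration only.
FIELD_MAP = {"diab_type": "type_of_diabetes", "dx_date": "date_of_diagnosis"}
VALUE_MAPS = {"type_of_diabetes": {"T1": 1, "T2": 2}}

if __name__ == "__main__":
    local = {"diab_type": "T2", "dx_date": "2001-05-14"}
    print(map_record(local, FIELD_MAP, VALUE_MAPS))
```

Keeping the mapping in plain lookup tables leaves the local register untouched: only the tables change from one contributing unit to the next.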

In most cases, the tool worked well. However, we found that the original databases contain so many different conditions that it is practically impossible to cover all possible mappings. Therefore, some minimal requirements must be met to optimise the process.

Although a BIRO database has very few mandatory items, some local registers did not provide all of them. Some apparently straightforward items (e.g. type of diabetes, date of diagnosis, etc.) proved less obvious in practice. To solve this problem and proceed with the analysis, in some cases it was necessary to change the original data, either by adding a dummy field and populating the database, or by eliminating errors in dates, etc.

Our experience shows that the original databases need to be improved. There is a need for more training at local centres and for upgraded local database software, including quality checks that are evidently not present at the moment. For instance, Excel spreadsheets frequently hide improperly formatted values, which cause consistency errors once exported.

The most convenient situation is the one in which registers are started using the BIRO format, as in the case of Cyprus, or revised to comply with it.

Overall, the BIRO mapping tool, connected to the Java-based “Adaptor” to export data, proved to be quite a robust application, as it prevented most inconsistent fields from being translated into the standardized BIRO export.

Nevertheless, it still needs to be improved by adding detailed log messages that indicate the exact position of inconsistent data. Experience shows that when errors are found, either the process breaks abruptly (meaning that no export is produced) or it results in XML files with missing data or fields.
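The kind of log message suggested here can be sketched as follows. This is a minimal illustration, not the Adaptor's code: it scans rows for missing mandatory values and malformed dates and reports the row number and field name of each problem instead of stopping at the first one. The field names and the date format are assumptions.

```python
from datetime import datetime


def find_inconsistencies(rows, mandatory, date_fields, date_format="%Y-%m-%d"):
    """Return one message per problem, naming the row number and the field."""
    messages = []
    for number, row in enumerate(rows, start=1):
        for field in mandatory:
            if row.get(field) in (None, ""):
                messages.append(f"row {number}, field '{field}': missing mandatory value")
        for field in date_fields:
            value = row.get(field)
            if value in (None, ""):
                continue  # absence is reported by the mandatory check, if applicable
            try:
                datetime.strptime(value, date_format)
            except (ValueError, TypeError):
                messages.append(f"row {number}, field '{field}': invalid date {value!r}")
    return messages


if __name__ == "__main__":
    # Made-up rows with hypothetical field names, for illustration only.
    rows = [
        {"type_of_diabetes": "1", "date_of_diagnosis": "2001-05-14"},
        {"type_of_diabetes": "", "date_of_diagnosis": "31/02/2001"},
    ]
    for message in find_inconsistencies(rows, ["type_of_diabetes"], ["date_of_diagnosis"]):
        print(message)
```

Collecting all messages before deciding whether to abort lets the operator correct the source data in one pass rather than through repeated failed exports.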

Some of these inconsistencies then resurfaced in the second step, i.e. loading data into the Postgres database. When important data or columns were missing, the relevant tables were not created in the database, causing problems for the statistical engine.

At the end of this phase, with substantial help and data tweaking from the supporting team, almost all partners succeeded in loading the database, either with their own data or by falling back on the test data provided by the Coordinating Centre to complete the training session.

Local Statistical Analysis

The phase of local statistical analysis consisted of running the local statistical engine to produce the BIRO reports.

In BIRO, the statistical engine connects directly to the Postgres database to load target fields and produce the report of diabetes indicators.

At this stage, errors that had not been discovered in the previous phase became evident, since the engine stops when data, particularly mandatory fields, have not been properly translated into the Postgres database. This triggered a trial-and-error phase in which the supporting team had to check all fields carefully and, where necessary, correct the original data and repeat the steps that had previously failed.

The source code of the statistical engine was also improved to take into account unexpected situations and several bugs were corrected.

These steps had to be repeated as partners gradually became able to produce their own reports. In the end, many partners succeeded in delivering correct reports, first among them Germany, Austria, Malta, Poland, Croatia and Kuwait. However, running the local statistical engine took much longer than initially expected.

Global Statistical Analysis

Because the local analysis took much longer than expected, the phase of global statistical analysis was cancelled and moved to the next technical meeting.

The program for this phase included sending the statistical objects to the central engine (BIRO server software) and producing the overall BIRO report. Although this would still have been possible, partners agreed that a “wrap up” session would be more useful to understand the contents of the reports and plan improvements. Participants agreed to take the case of Germany as a basis for discussion and examined all outputs in detail, commenting on all indicators.

Conclusions

In conclusion, trainees were very satisfied with, and enthusiastic about, the results produced by the BIRO system. Partners agreed that the system had vastly improved since its first release. The practical outcome is visible, and it can be reproduced in very different conditions.

Partners also examined other reports.

In particular, the case of Kuwait was found very interesting, for two main reasons.

Firstly, it shows that a database from a non-European country, with no prior experience of the approach, can successfully contribute to EUBIROD, producing a data export and a standard BIRO report.

Secondly, it provides an important message from a clinical perspective, being the only case of a pediatric database in the training session. The case shows that a profound revision is required to apply BIRO to the pediatric population. Indeed, the current report seems inadequate, including many tables with sparse cells, stratified by age groups that are not relevant. Furthermore, many indicators on diabetes complications are meaningless in this context. Partners agreed that a connection with other EU projects dedicated to pediatric diabetes is required.

The training session provided positive results for participants and important indications for further improvement. Partners left agreeing that more materials should be produced to ensure online, everyday use of the software.

An advanced educational session is scheduled for 2010 at the 2nd EUBIROD Annual Residential Meeting.