In the first two posts of this series I introduced dark matter physics and the EDELWEISS experiment. In this final post I will show the CouchApps we built to provide feedback to researchers and engineers, and describe our data analysis process management system.
As discussed in the previous post, in the EDELWEISS experiment we store slow-control data (temperatures, pressures, voltages, status bits, etc.) at fixed time intervals from our various sub-systems (the cryogenics, radon detector, and muon detector systems). These data give us snapshots of the experimental conditions over time, which are useful for finding periods of specific and/or stable conditions. We also use these data to provide real-time feedback to detector operators and engineers. To provide that feedback, we built the following CouchApps: two cryogenics monitoring apps (1 and 2), a muon-veto high-voltage monitor, a muon-veto detector-position monitor, and a radon detector monitor.
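To make this concrete, a slow-control reading could be stored as a small JSON document along these lines (the field names here are illustrative, not the experiment's actual schema):

```json
{
  "_id": "cryo-2013-11-21T10:00:00Z",
  "type": "slowcontrol",
  "subsystem": "cryogenics",
  "timestamp": "2013-11-21T10:00:00Z",
  "measures": {
    "T_bath_mK": 18.2,
    "P_still_mbar": 0.45,
    "valve_status": 1
  }
}
```

One document per reading keeps writes simple and lets the views described below do the aggregation.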
One important thing to point out is that while the set of slow-control measurements is quite stable, it can change over time as our system is modified (in fact, we added a new measurement just last week). The flexible schema allowed by CouchDB/Cloudant accommodates these changes without breaking any of the downstream monitoring applications or the methods we use to find stable conditions. Secondary indexes, built as MapReduce views, provide a consistent interface to all of the downstream applications. This flexible schema was one of the reasons we chose Cloudant.
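As a sketch of how a view can insulate downstream apps from schema changes (the `type`, `measures`, and `timestamp` fields are hypothetical, not our actual document layout):

```javascript
// Hypothetical CouchDB map function. It emits one row per measurement,
// keyed by [measure name, timestamp], so a new measure added to the docs
// simply shows up as a new key range -- downstream readers never break.
// (CouchDB stores this as an anonymous function in a design document;
// it is named here only to make the sketch runnable.)
function map(doc) {
  if (doc.type === "slowcontrol" && doc.measures) {
    for (var name in doc.measures) {
      emit([name, doc.timestamp], doc.measures[name]);
    }
  }
}
```

Querying the view with a key range like `["T_bath_mK", t0]` to `["T_bath_mK", t1]` then gives one measure's time series regardless of what else the documents contain.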
In addition to the slow-control data, the metadata for each physics ROOT file is stored on Cloudant. (Reminder: the ROOT files contain the real physics data that are analyzed with digital signal processing and statistical analysis tools.) The metadata docs hold the conditions set by the data acquisition computers and the initial location of the physics ROOT file on the local disks. For our physics analysis, we must move the data files from the data acquisition computers to a batch processing system, analyze the digitized waveforms, perform calibrations, and finally make cuts and selections of data under well-understood conditions in order to search for dark matter interactions. We track the actions performed on the ROOT data files in the metadata docs as they progress through our analysis chain, and we use a MapReduce view to list the ROOT files that are ready for each step in the process. In this way, the database and metadata docs act as a messaging tool between the multiple data processing programs and the physics data files that need attention. Continuously running scripts listen to a filtered _changes feed from Cloudant and call the appropriate processing scripts. A CouchApp displays the metadata docs so that members of the collaboration can visually monitor this chain of events. (More technical details are published here.)
Illustration of the interaction between the metadata database on Cloudant and our data processing scripts. Python scripts listen on a filtered _changes feed for new documents appearing in the database that require some type of action. These actions include moving the data to the batch processing farm in Lyon, performing signal processing on the digitized waveforms, and performing calibrations. In addition to process tracking, these metadata docs hold environmental conditions such as initial temperature, radioactive source location, and detector voltages.
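The listening pattern is simple enough to sketch in a few lines of Python. This is not our real processing code: the database URL is the same placeholder used in the tutorial below, and the filter name and `process()` handler are assumptions.

```python
# Sketch of a continuous _changes listener. DB URL, filter name, and the
# process() callback are placeholders, not the actual EDELWEISS scripts.
import json
import requests

DB = "https://<username>:<password>@<username>.cloudant.com/<dbname>"

def parse_change(line):
    """Decode one line of a continuous _changes feed.

    Returns the document (when include_docs=true) or None for the blank
    heartbeat lines CouchDB sends to keep the connection alive.
    """
    if not line:
        return None
    change = json.loads(line)
    return change.get("doc")

def listen(process, since=0):
    # The continuous feed holds the HTTP connection open and streams one
    # JSON object per line as matching documents change.
    resp = requests.get(
        DB + "/_changes",
        params={
            "feed": "continuous",
            "filter": "edwexample/new_files",  # hypothetical filter name
            "include_docs": "true",
            "since": since,
        },
        stream=True,
    )
    for line in resp.iter_lines():
        doc = parse_change(line)
        if doc is not None:
            process(doc)
```

Each processing step updates a status field on the document it handled, which in turn wakes the listener for the next step in the chain.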
Create a database
curl -X PUT https://<username>:<password>@<username>.cloudant.com/<dbname>
git clone https://github.com/gadamc/edwexample
cd edwexample
pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install requests
First, edit edw.ini with your credentials and database name (do not modify viewname). Then upload the _design document.
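If you are curious what a _design document looks like, here is a minimal, hypothetical one. The actual design document in the edwexample repo defines its own views and filters; this only shows the general shape of what gets uploaded.

```python
# A minimal, hypothetical CouchDB design document -- not the one shipped
# in the edwexample repo.
import json

design = {
    "_id": "_design/edwexample",
    "views": {
        "by_status": {
            "map": "function (doc) { if (doc.status) emit(doc.status, null); }"
        }
    },
    "filters": {
        "new_files": "function (doc, req) { return doc.status === 'new'; }"
    },
}

# Uploading it is a single PUT of this JSON to
# https://.../<dbname>/_design/edwexample
payload = json.dumps(design)
```

The views and filters are stored as strings of JavaScript; Cloudant compiles and runs them server-side.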
Open two more terminals and run a listener script in each of them.
cd edwexample && source venv/bin/activate
python listen_newfiles.py

cd edwexample && source venv/bin/activate
python listen_analysis1.py
In the original terminal add more data. Repeat as desired.
Every time you add a new document, you should see the listener scripts respond, update the document, and upload it back to the database. You can use the Cloudant account interface to look directly at single documents and see the changes, or you can use curl to GET a document directly.
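The GET mirrors the PUT used earlier to create the database; with the same placeholders, fetching a single document looks like:

```shell
curl https://<username>:<password>@<username>.cloudant.com/<dbname>/<docid>
```

The response is the document's current JSON, including the status fields the listeners have written.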
The final piece of the puzzle was to build a tool based on the metadata docs to find the physics ROOT files that are interesting for a particular analysis. In the 2009-2010 dataset we had five separate data acquisition machines creating new data files each hour. We took data for about a year, which produced a large number of data files under many different conditions. Sometimes our temperatures weren't stable or were set at non-standard values, we set non-standard voltages on the detectors, or we purposefully introduced radioactive sources into the environment for calibrations. For the next data set (starting in 2014) we will have about 10 times more data. Each researcher in our collaboration needs to find the data files that are interesting to their particular analysis in a relatively painless way. To facilitate this, we created a few MapReduce views that index the temperature, radioactive source condition, and voltages found in the metadata docs. By sorting on these view results (in combination with a list function), researchers in EDELWEISS can find the files they need. Now, if you're already familiar with Cloudant, you may be scratching your head about this approach, and you very well should be. Since we wrote those MapReduce views, Cloudant has introduced Lucene-based search, which is clearly the better tool for this particular job: it is significantly more robust, easier to build with, and produces better results than the view/list functions we put together.
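For flavor, a Cloudant Search index over such metadata could look roughly like this. The field names are hypothetical; `index()` is Cloudant's built-in for defining search fields.

```javascript
// Hypothetical Cloudant Search index function. Each index() call adds a
// searchable field, so a query such as
//   q=temperature:[0.017 TO 0.020] AND source:none
// replaces the hand-rolled view/list machinery described above.
// (Named here only to make the sketch runnable; Cloudant stores it as an
// anonymous function in a design document.)
function indexMetadata(doc) {
  if (doc.type === "metadata") {
    index("temperature", doc.temperature, { store: true });
    index("source", doc.source);
    index("voltage", doc.voltage);
  }
}
```

Range queries and boolean combinations come for free from Lucene's query syntax, which is exactly what condition-based file selection needs.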
Simply put, Cloudant's database-as-a-service helped me and all EDELWEISS members be better physicists and engineers. Centralizing the slow-control data and managing the analysis chain with Cloudant as the backend relieves many of the stresses of the final physics data analysis (the searches for dark matter). It also demonstrates the power of a hosted service over managing the servers ourselves: using Cloudant has saved us many hours per month that we would otherwise spend learning and running a database system, which we, as physicists, are not particularly interested in doing. Additionally, the Cloudant engineers were always available and responsive on the #cloudant IRC channel when we had questions - another part of the service. All of us at EDELWEISS sincerely thank Cloudant for their support of our experiment.