2016 and Beyond: Advanced tips for Developers and users David J. Just

Slides:



Advertisements
Similar presentations
JQuery MessageBoard. Lets use jQuery and AJAX in combination with a database to update and retrieve information without refreshing the page. Here we will.
Advertisements

LiNC Developer Meetup Welcome!. Our job is to make your life easier APIs Tools and workflow Documentation Stay in touch: developers.lithium.com Join the.
Module R2 CS450. Next Week R1 is due next Friday ▫Bring manuals in a binder - make sure to have a cover page with group number, module, and date. You.
XML Configuration Access Protocol (XCAP) Jonathan Rosenberg dynamicsoft.
Links and Comments.
Heapsort. 2 Why study Heapsort? It is a well-known, traditional sorting algorithm you will be expected to know Heapsort is always O(n log n) Quicksort.
We Need Smart XML Processing HTML has ultra-complex semantics XML has no semantics Something must bridge the gap  A program ?  A clear set of data semantics?
Python and Web Programming
Working with cursors in Python GISDE Python Workshop Qiao Li.
CHP - 9 File Structures. INTRODUCTION In some of the previous chapters, we have discussed representations of and operations on data structures. These.
Lecture 6 of Advanced Databases XML Schema, Querying & Transformation Instructor: Mr.Ahmed Al Astal.
Data File Access API : Under the Hood Simon Horwith CTO Etrilogy Ltd.
Multi-Part Requests/ Parent & Child Service Items.
INTRODUCTION TO JAVASCRIPT AND DOM Internet Engineering Spring 2012.
Heapsort CSC Why study Heapsort? It is a well-known, traditional sorting algorithm you will be expected to know Heapsort is always O(n log n)
Priority Queues and Binary Heaps Chapter Trees Some animals are more equal than others A queue is a FIFO data structure the first element.
Selenium and Selenium on Rails. Agenda  Overview of Selenium Simple Selenium Tests Selenium IDE  Overview of Selenium on Rails  Problems with Selenium.
CIT 590 Intro to Programming Files etc. Announcements From HW5 onwards (HW5, HW6,…) You can work alone. You can pick your own partner. You can also stick.
Tutorial 3 Adding and Formatting Text with CSS Styles.
ICM – API Server & Forms Gary Ratcliffe.
Heapsort. What is a “heap”? Definitions of heap: 1.A large area of memory from which the programmer can allocate blocks as needed, and deallocate them.
This was written with the assumption that workbooks would be added. Even if these are not introduced until later, the same basic ideas apply Hopefully.
Advanced Taverna Aleksandra Pawlik University of Manchester materials by Katy Wolstencroft, Aleksandra Pawlik, Alan Williams
Lawson Mid-America User Group Spring 2016 Meeting.
Emdeon Office Batch Management Services This document provides detailed information on Batch Import Services and other Batch features.
What is GroupWise? A tool for communication in our organization A system to send and receive A way to increase productivity A method to get documents.
Product Training Program
JavaScript, Sixth Edition
CHP - 9 File Structures.
Self Healing and Dynamic Construction Framework:
Loops BIS1523 – Lecture 10.
Full Page Watermarking
Development Changes in Dynamics 365 for Finance and Operations
File System Structure How do I organize a disk into a file system?
Arrays and files BIS1523 – Lecture 15.
CSE373: Data Structures & Algorithms Lecture 7: AVL Trees
Hashing Exercises.
Heap Sort Example Qamar Abbas.
Using Excel with Google Maps
Un</br>able’s MySecretSecrets
CSE373: Data Structures & Algorithms Lecture 7: AVL Trees
While Loops BIS1523 – Lecture 12.
Recommended Budget Reductions
New Functionality in ARIN Online
Dr. David Matuszek Heapsort Dr. David Matuszek
5 Tips for Upgrading Reports to v 6.3
GDSS – Digital Signature
Lesson Objectives Aims
Our Wireless Enterprise Help Desk (WEHD) is here to assist you 24 hours a day, 7 days a week, 365 days a year. We also have our own private number.
ARRAYS 1 GCSE COMPUTER SCIENCE.
Heaps Chapter 11 has several programming projects, including a project that uses heaps. This presentation shows you what a heap is, and demonstrates.
Electronics II Physics 3620 / 6620
JavaScript & jQuery AJAX.
Heapsort.
Building ASP.NET Applications
Tonga Institute of Higher Education
REST APIs Maxwell Furman Department of MIS Fox School of Business
Data Structures Unsorted Arrays
Data Structures & Algorithms
How to debug a website using IE F12 tools
Amortized Analysis and Heaps Intro
Python and XML Styling and other issues XML
Heapsort.
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
Introduction to AJAX and JSON
Heapsort.
Consuming Web Services with 2E Generated Objects
Heapsort.
CO 303 Algorithm Analysis and Design
Presentation transcript:

2016 and Beyond: Advanced tips for Developers and users David J. Just PyXNAT 2016 and Beyond: Advanced tips for Developers and users David J. Just

Introduction David Just Bachelors of CS from Winona State University 14 years of developer experience. Mayo Aging and Dementia Research Lab ( ADIR ) Four production XNAT servers. ~30,000-40,000 Scans Current maintainer of pyxnat PyXNAT website: https://github.com/pyxnat/pyxnat http://pythonhosted.org/pyxnat/

Agenda What is pyXNAT pyXNAT 1.0.0.0 pyXNAT 1.1 Data Ingestion Tips Reporting Tips

What is pyXNAT pyXNAT is a python interface to XNAT’s REST API Simplifys writing data interactions with xnat by giving you a local programmable interface to XNAT’s data. Supplies methods of searching, and file interaction with XNAT. Getting started: http://pythonhosted.org/pyxnat/

Streamed Upload / Download Better error messages PyXNAT 1.0.0.0 Released September 2015 NO MORE CACHING!!! Streamed Upload / Download Better error messages put/get/post/delete/head wrappers Bug in proxy support. Use old 0.9.5.4 instead. The biggest change for pyxnat 1.0.0.0 was the move from using httplib2 to “requests” for handling all http communication. But this change is should be transparent to the users. For users of pyxnat, the biggest change is probably the removal of all of the caching functionality. For most of us, it seems that pyxnat’s caching was of limited use at best, and actually caused problems at worst. So the concensus on the forum was to remove it. pyXNAT still looks like it supports caching, but it doesn’t. All of the cache parameter references were left in to facilitate backwards compatibility Additionally I made the changes necessary so that all file uploads and downloads were streamed. What this means is that large files will no longer cause pyxnat use exorbitant amounts of memory. This feature is completely automatic so if you upgrade to 1.0.0.0 you will see the perormance improvment right away. I also tried* to make the error messages a little more useful. Pyxnat was swallowing what little data xnat gave us on errors, I’ve tried to make it so that what little error information xnat gives you, pyxnat shows you. Lastly, because pyxnat is now using requests library, I felt it was a good idea to expose the http: put/get/post/delete/head methods from requests into pyxnat.interface. This allows you to use pyxnat to run custom url strings without having to manage the cookies yourself. An example later in this presentation uses “put” to show how it works.

All dependent on my “free” time. Proxy support fix is my priority ONE. PyXNAT 1.1 All dependent on my “free” time. Proxy support fix is my priority ONE. Move from CSV/JSON to XML + XPATH where possible inside of pyxnat. More code cleanup of unused legacy code: cache/httputil Documentation Documentation Documentation Help is ALWAYS welcome PyXNAT is not my day job. And my day job takes up 99% of my time. So changes and enhancements from me to pyxnat are greatly affected by “are they affecting me” priority. I do recognize that the move to requests has somehow broken proxy support, and as soon as I can get some free time I will get into it. If anybody is using proxy now and has the ability to assist it would be greatly appreciated. The other things I’d like to do are moving from using CSV->JSON kludges to XML+XPATH for column/attribute access. And to clean up unused code.

Data Ingestion File Upload to “deep” paths Something that isn’t very well documented is that you can technically upload files to “deep” paths within a resource. Each resource is NOT limited to single folder depth, however this has some limitations. It’s possible to create deep folders in two ways: Upload each file separately using a relative path as the file’s id. (Shown on this slide) Create a zip file that contains the path and use resource.put_zip() with extract=True (shown later.) However, when working with deep path files, it’s impossible to retrieve them as a group. Each file must be retrieved individually.

Data Ingestion What file manager looks like after the previous slide.

Data Ingestion Zip Upload This shows, that even when uploading as a zip. There is not (currently) a way in pyxnat to retrieve a folder from a deep path upload. Each file must be retrieved individually using the full path to the file.

Data Ingestion Zip Upload Showing what an un-zipped zip upload looks like in XNAT’s file manager.

Data Ingestion Use attrs.mset when possible instead of attrs.set When updating data (attributes) it’s best to use Object.attrs.mset() if possible over Object.attrs.set(). The reason is two fold: Mset is much faster than setting multiple values using set. And the speed increase is greater with an increase in the number of parameters. XNAT’s history logging will add a row of history for each mset, or each set call. So lots of sets result in a much larger history table to represent just one “event” However, there are some caveats to mset: Limited length: pyxnat turns the dictionary into a big URL string, and URL strings are limited so it is possible to blow past the limit and some attributes would not be set. Silent failure: XNAT does not return a failure if one of many attributes does not exist or could not be set. I believe this is a bug in XNAT. Problem one is on my list of things to fix in pyxnat because the fix is relatively simple and you can use it now: Use JSON as form data as we’ll see on the next slide.

Data Ingestion JSON can be used to set multiple parameters at a time: Using JSON to load data into xnat avoids the URL length problem of mset, but does not avoid the silent failure problem. The trick to this method is to ALWAYS include the “xsiType”: “xnat:dataType” field. Without it XNAT will either error out, or worse, use the default data type to partially create the object and then error (sometimes silently). xnat.put requires pyxnat 1.0.0.0

Data Ingestion JSON can be used to set multiple parameters at a time: Using JSON to load data into xnat avoids the URL length problem of mset, but does not avoid the silent failure problem. The trick to this method is to ALWAYS include the “xsiType”: “xnat:dataType” field. Without it XNAT will either error out, or worse, use the default data type to partially create the object and then error (sometimes silently). xnat.put requires pyxnat 1.0.0.0

Data Ingestion JSON can even be used to reference child elements JSON object updates, allow you to update any of the unbounded children within a data element. There are two methods of doing this: 1)The first method is to use the indexes into the sub array. However using this method you cannot add elements at a random position. To add a new child, you must know how long the old array was and increment by 1. 2) The second method, using id’s as keys as shown above, ends up being friendlier and allows you to access elements at any position in the child objects. This method does NOT require you to know the length of the old array and will append to the end when a new ID (or label) is referenced. xnat.put requires pyxnat 1.0.0.0

Data Ingestion JSON can even be used to reference child elements JSON object updates, allow you to update any of the unbounded children within a data element. There are two methods of doing this: 1)The first method is to use the indexes into the sub array. However using this method you cannot add elements at a random position. To add a new child, you must know how long the old array was and increment by 1. 2) The second method, using id’s as keys as shown above, ends up being friendlier and allows you to access elements at any position in the child objects. This method does NOT require you to know the length of the old array and will append to the end when a new ID (or label) is referenced. xnat.put requires pyxnat 1.0.0.0

Data Ingestion Dynamic Fields If you don’t have the ability to create/modify data types during build, XNAT’s support for custom fields can be used, but they have some gotchas that you need to be aware of. This slide shows a subject object’s XML with two custom fields added. Notice that the extra lines above (before) each <!--hidden_fields…> tag which is then followed by the value. Those are one of the reason things get slightly complicated.

Data Ingestion Dynamic Fields If you don’t have the ability to create/modify data types during build, XNAT’s support for custom fields can be used, but they have some gotchas that you need to be aware of. This slide shows a subject object’s XML with two custom fields added. Notice that the extra lines above (before) each <!--hidden_fields…> tag which is then followed by the value. Those are one of the reason things get slightly complicated.

Data Ingestion Dynamic Fields If you don’t have the ability to create/modify data types during build, XNAT’s support for custom fields can be used, but they have some gotchas that you need to be aware of. This slide shows a subject object’s XML with two custom fields added. Notice that the extra lines above (before) each <!--hidden_fields…> tag which is then followed by the value. Those are one of the reason things get slightly complicated.

Data Ingestion Dynamic Fields Once you have more than one custom field, pyxnat default attrs.get method cannot be used as it returns all the custom fields, not just the value for the one you want. In this case, XML and Xpath queries can be more useful for getting the value, but at a cost of a more complex attribute string: All fields must include their namespace attribute When comparing use [] to denote query values. @ means attribute. text() gives you ALL text nodes of the element. Full code {{{ xnat = pyxnat.Interface('https://central.xnat.org','nosetests','nose2016') nosetests = xnat.select.project('nosetests') subject = nosetests.subject('pyxnat2016') if subject.exists(): subject.delete() time.sleep(5) subject.create() print "Setting and getting first attribute:" subject.attrs.set("xnat:subjectData/fields/field[name='customfield1']/field", 'testValue1') print "Value:" , subject.attrs.get("xnat:subjectData/fields/field[name='customfield1']/field") print "Setting and getting second attribute:" subject.attrs.set("xnat:subjectData/fields/field[name='customfield2']/field", 'testValue2') print "Value:" , subject.attrs.get("xnat:subjectData/fields/field[name='customfield2']/field") print ''' Using: subject.xpath("/xnat:Subject/xnat:fields/xnat:field[@name='customfield2']/text()[2]")[0] xpath explination: * /xnat:Subject/xnat:fields/xnat:field = Find me all elements of type xnat:field that are children of xnat:Subject/xnat:fields * [@name='customfield2'] = of all xnat:field elements, give me only those with an attribute 'name' equal to 'customfield2' * /text() = give me all text nodes of the found elements. * [2] = we know that each xnat:field has a blank text element before the main data element. So grab the 2nd element of the list. ''' print "Value:" , subject.xpath("/xnat:Subject/xnat:fields/xnat:field[@name='customfield2']/text()[2]")[0] }}}

Data Ingestion Dynamic Fields Once you have more than one custom field, pyxnat default attrs.get method cannot be used as it returns all the custom fields, not just the value for the one you want. In this case, XML and Xpath queries can be more useful for getting the value, but at a cost of a more complex attribute string: All fields must include their namespace attribute When comparing use [] to denote query values. @ means attribute. text() gives you ALL text nodes of the element. Full code {{{ xnat = pyxnat.Interface('https://central.xnat.org','nosetests','nose2016') nosetests = xnat.select.project('nosetests') subject = nosetests.subject('pyxnat2016') if subject.exists(): subject.delete() time.sleep(5) subject.create() print "Setting and getting first attribute:" subject.attrs.set("xnat:subjectData/fields/field[name='customfield1']/field", 'testValue1') print "Value:" , subject.attrs.get("xnat:subjectData/fields/field[name='customfield1']/field") print "Setting and getting second attribute:" subject.attrs.set("xnat:subjectData/fields/field[name='customfield2']/field", 'testValue2') print "Value:" , subject.attrs.get("xnat:subjectData/fields/field[name='customfield2']/field") print ''' Using: subject.xpath("/xnat:Subject/xnat:fields/xnat:field[@name='customfield2']/text()[2]")[0] xpath explination: * /xnat:Subject/xnat:fields/xnat:field = Find me all elements of type xnat:field that are children of xnat:Subject/xnat:fields * [@name='customfield2'] = of all xnat:field elements, give me only those with an attribute 'name' equal to 'customfield2' * /text() = give me all text nodes of the found elements. * [2] = we know that each xnat:field has a blank text element before the main data element. So grab the 2nd element of the list. ''' print "Value:" , subject.xpath("/xnat:Subject/xnat:fields/xnat:field[@name='customfield2']/text()[2]")[0] }}}

Data Ingestion Dynamic Fields Once you have more than one custom field, pyxnat default attrs.get method cannot be used as it returns all the custom fields, not just the value for the one you want. In this case, XML and Xpath queries can be more useful for getting the value, but at a cost of a more complex attribute string: All fields must include their namespace attribute When comparing use [] to denote query values. @ means attribute. text() gives you ALL text nodes of the element. Full code {{{ xnat = pyxnat.Interface('https://central.xnat.org','nosetests','nose2016') nosetests = xnat.select.project('nosetests') subject = nosetests.subject('pyxnat2016') if subject.exists(): subject.delete() time.sleep(5) subject.create() print "Setting and getting first attribute:" subject.attrs.set("xnat:subjectData/fields/field[name='customfield1']/field", 'testValue1') print "Value:" , subject.attrs.get("xnat:subjectData/fields/field[name='customfield1']/field") print "Setting and getting second attribute:" subject.attrs.set("xnat:subjectData/fields/field[name='customfield2']/field", 'testValue2') print "Value:" , subject.attrs.get("xnat:subjectData/fields/field[name='customfield2']/field") print ''' Using: subject.xpath("/xnat:Subject/xnat:fields/xnat:field[@name='customfield2']/text()[2]")[0] xpath explination: * /xnat:Subject/xnat:fields/xnat:field = Find me all elements of type xnat:field that are children of xnat:Subject/xnat:fields * [@name='customfield2'] = of all xnat:field elements, give me only those with an attribute 'name' equal to 'customfield2' * /text() = give me all text nodes of the found elements. * [2] = we know that each xnat:field has a blank text element before the main data element. So grab the 2nd element of the list. ''' print "Value:" , subject.xpath("/xnat:Subject/xnat:fields/xnat:field[@name='customfield2']/text()[2]")[0] }}}

Data Ingestion Dynamic Fields Once you have more than one custom field, pyxnat default attrs.get method cannot be used as it returns all the custom fields, not just the value for the one you want. In this case, XML and Xpath queries can be more useful for getting the value, but at a cost of a more complex attribute string: All fields must include their namespace attribute When comparing use [] to denote query values. @ means attribute. text() gives you ALL text nodes of the element. Full code {{{ xnat = pyxnat.Interface('https://central.xnat.org','nosetests','nose2016') nosetests = xnat.select.project('nosetests') subject = nosetests.subject('pyxnat2016') if subject.exists(): subject.delete() time.sleep(5) subject.create() print "Setting and getting first attribute:" subject.attrs.set("xnat:subjectData/fields/field[name='customfield1']/field", 'testValue1') print "Value:" , subject.attrs.get("xnat:subjectData/fields/field[name='customfield1']/field") print "Setting and getting second attribute:" subject.attrs.set("xnat:subjectData/fields/field[name='customfield2']/field", 'testValue2') print "Value:" , subject.attrs.get("xnat:subjectData/fields/field[name='customfield2']/field") print ''' Using: subject.xpath("/xnat:Subject/xnat:fields/xnat:field[@name='customfield2']/text()[2]")[0] xpath explination: * /xnat:Subject/xnat:fields/xnat:field = Find me all elements of type xnat:field that are children of xnat:Subject/xnat:fields * [@name='customfield2'] = of all xnat:field elements, give me only those with an attribute 'name' equal to 'customfield2' * /text() = give me all text nodes of the found elements. * [2] = we know that each xnat:field has a blank text element before the main data element. So grab the 2nd element of the list. ''' print "Value:" , subject.xpath("/xnat:Subject/xnat:fields/xnat:field[@name='customfield2']/text()[2]")[0] }}}

xnat.select(“xnat:mrSessionData”).where(constraints) Reporting No such thing as “one good way”. Each report might need a different technique. xnat.select(“xnat:mrSessionData”).where(constraints) 513 ms for project nosetests on central xnat.select(‘/projects/nosetests/subjects/*/experiments’) 4799 ms after including check for data type. When generating reports from xnat, there simply is no one solution. Depending on the data you want: is it subject data, study data, or scan data, the technique used is going to change. For example, when reporting data from scans I have found that the fastest way is using xnat.select(‘xnat:dataType’).where(constraints) and then for each experiment I use experiment.get() to retrieve it’s xml, load that into an xml DOM and use xpath to retrieve scan values. {{{ print "="*80, '\n' with timeit_context('select(/projects/nosetests/subjects/*/experiments/*'): print "xnat.select('/projects/nosetests/subjects/*/experiments/*')" results = xnat.select('/projects/nosetests/subjects/*/experiments') experiments=[e for e in results if e.datatype() == 'xnat:mrSessionData'] print len(experiments) with timeit_context('select(xnat:mrSessionData).with(constraints)'): print "xnat.select('xnat:mrSessionData').where(contraints)" contraints = [('xnat:mrSessionData/project', '=', 'nosetests')] results = xnat.select('xnat:mrSessionData').where(contraints) experiments=[e for e in results] }}}

Reporting XML and XPath Using the built in pyxnat xpath: values = subject.xpath("/xnat:Subject/@label") Getting your own dom tree more efficient if you need to retrieve multiple values: root = etree.fromstring(subject.get()) values = root.xpath("/xnat:Subject/@label", namespaces=root.nsmap) Most data types in XNAT can be retrieved in an XML format. Using this, plus Xpath queries is probably the fastest way to get values and attributes out of an object. Eventually I would like to move pyXNAT to using xml and xpath internally because it’s much faster and more reliable than it’s current method of “csv” -> json for retrieving values.

Reporting XPath examples Find all scans in study: scans = experiment.xpath("//xnat:scans/xnat:scan") With each scan object get values using xpath: scan.xpath(‘xnat:series_description’)[0] scan.xpath(‘@ID’)[0] When querying using Xpath the results are always an array. This will need to be taken into account when getting data with them. When retrieving data from scans, it’s best to get the experiment, and use xpath to retrieve the scans from within the experiment’s xml. From there, you can iterate over the scans and use further xpath queries to get the values. But remember to NOT add ‘/’ to the front of the query because that will make xpath search from tree root instead of scan root. Attributes can be retrieved using Xpath by specifying ‘@’ in front of the attribute name.

Thank you for listening pyXNAT 2016 and beyond