Betsy Rolland -- UW MLIS Candidate

Wrapper #1: InterProWrapper.java

This wrapper queries the InterPro protein database using any of the fields shown in the search screen. Though the wrapper was written to be robust and to allow searching on any of the field names, the Bio-Mediator team decided that its use should be limited to querying by InterPro ID (Accession Number) only. The reason is that each field query returns differently structured XML: a query by protein ID, for example, returns different XML than a query by sequence ID. Because the XML result fields must be known in advance in order to map them to the mediated schema, that sort of variation cannot be supported.

Because the calls to the database are done programmatically, the code completely bypasses the GUI, passing all arguments via the URL. For example, this wrapper creates a URL string like this:

http://srs.ebi.ac.uk/cgi-bin/wgetz?-noSession+[INTERPRO:(IPR010405)]+-view+InterProPrintXML+-mime+text/xml

This URL queries on ID number IPR010405 and tells the website to return the information as plain XML. The wrapper then receives that information and presents it in a form that the metawrapper can consume.
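
As a minimal sketch of how such a URL string could be assembled from an InterPro ID (this is not the actual InterProWrapper code; the class name, method name, and the ID check of "IPR" followed by six digits are assumptions made for illustration):

```java
final class InterProUrlSketch {
    // Base address and query options copied from the example URL in the text.
    private static final String BASE = "http://srs.ebi.ac.uk/cgi-bin/wgetz";

    /** Builds the query URL for a single InterPro accession such as "IPR010405". */
    static String buildUrl(String interProId) {
        // Assumed validation rule: "IPR" followed by six digits.
        if (interProId == null || !interProId.matches("IPR\\d{6}")) {
            throw new IllegalArgumentException("Not an InterPro ID: " + interProId);
        }
        return BASE + "?-noSession+[INTERPRO:(" + interProId + ")]"
                + "+-view+InterProPrintXML+-mime+text/xml";
    }

    public static void main(String[] args) {
        // Prints the same URL string shown in the example above.
        System.out.println(buildUrl("IPR010405"));
    }
}
```

Rejecting anything that does not look like an accession number keeps the sketch in line with the decision to query by InterPro ID only.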

One of the interesting things about this wrapper is that InterPro returns results with attributes in its XML. It was the first data source to do so, so I needed to write another Java class that implemented a SAX parser to parse out the attributes. XML that came back from the data source looking like this:

was then converted to look like this:

Much of the information returned as attributes needed to be mapped to the mediated schema, so each attribute needed to be represented as its own element.
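
The attribute-to-element conversion can be sketched with a small SAX handler. This is an illustration only, not the XMLAttributeParser class itself: the class name, the choice to emit each attribute as a child element named after the attribute, and the sample element names in the usage note are all assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class AttributeToElementSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(qName).append('>');
        // Assumed convention: each attribute becomes a child element of the same name.
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            out.append('<').append(name).append('>')
               .append(escape(atts.getValue(i)))
               .append("</").append(name).append('>');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text content is copied through unchanged (re-escaped for XML).
        out.append(escape(new String(ch, start, length)));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Parses the given XML and returns it with attributes rewritten as elements. */
    static String convert(String xml) {
        AttributeToElementSketch handler = new AttributeToElementSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.out.toString();
    }
}
```

With a made-up input such as `<interpro id="IPR010405"><name>WAP</name></interpro>`, `convert` returns `<interpro><id>IPR010405</id><name>WAP</name></interpro>`, so the ID is available as an element that can be mapped like any other field.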

Source Code:

InterProWrapper Java class
XMLAttributeParser Java class (implements SAX parser)

Wrapper #2: InterProSequenceWrapper.java

This wrapper also queries the InterPro protein database but does so by submitting a complete or partial protein (amino acid) sequence such as this one:

MRCSISLVLGLLALEVALARNLQEHVFNSVQSMCSDDSFSEDTECINCQTNEECAQNDMCCPSSCGRSCKTPVNIEVQKAGRCPWNPIQMIAAGPCPKDNPCSIDSDCSGTMKCCKNGCIMSCMDPEPKSPTVISFQ

This is the amino acid sequence for "whey acidic protein, core region." This query also returns XML with attributes, so its results also need to be run through the SAX parser. The challenge with this wrapper was that the initial query displays a "please wait for your results" page with a unique job number while the query is being run. That intermediate page needed to be parsed first to get the job number, and then a new URL containing that job number needed to be created. The wrapper also needed to be able to refresh the page every 4 seconds until the results were displayed.
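
The two-step flow (read the job number from the wait page, then re-request the results page until it is ready) might look roughly like the sketch below. It is not the InterProSequenceWrapper code: the `job=` marker in the wait page, the "body starts with `<?xml`" readiness check, and all class and method names are hypothetical, since the real page format is not shown here. Page fetching is passed in as a function so the sketch does not commit to a particular HTTP client.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JobPollingSketch {
    // Hypothetical marker: assumes the wait page contains text like "job=iprscan-42".
    private static final Pattern JOB_ID = Pattern.compile("job=([A-Za-z0-9_-]+)");

    /** Extracts the job number from the intermediate "please wait" page. */
    static String extractJobId(String waitPageHtml) {
        Matcher m = JOB_ID.matcher(waitPageHtml);
        if (!m.find()) {
            throw new IllegalStateException("No job number found on wait page");
        }
        return m.group(1);
    }

    /**
     * Re-fetches resultUrl until the body looks like finished XML,
     * sleeping intervalMillis between attempts and giving up after maxAttempts.
     */
    static String pollForResult(String resultUrl, Function<String, String> fetch,
                                long intervalMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String body = fetch.apply(resultUrl);
            // Assumed readiness check: finished results start with an XML declaration.
            if (body != null && body.trim().startsWith("<?xml")) {
                return body;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for results", e);
                }
            }
        }
        throw new IllegalStateException("No results after " + maxAttempts + " attempts");
    }
}
```

A call such as `pollForResult(url, fetcher, 4000L, 30)` would correspond to the 4-second refresh described above; the cap of 30 attempts is an added safeguard so a job that never finishes cannot keep the wrapper waiting forever.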

Source Code:

InterProSequenceWrapper Java class

brolland *at* u.washington.edu