What follows is a general report on the presentations and the Q&A time that followed them.
CADC services all have ReST API functionalities and use authentication in a common way. Credential delegation is used to move among the services, and authorization is managed through ownership or group membership (role) assessment. The authentication protocols accepted are: anonymous access, user/password BasicAA, X.509 and cookies/tokens. Authentication grants the user a larger set of resources and capabilities with respect to anonymous access. Brian's slides report a view of the number of accesses in one month, subdivided by authentication solution. On the client side, the CADC python libraries use user/password or .netrc-stored credentials or proxy certificates; the CADC astroquery package uses the cookie/token solution. Proxy certificates are provided to the user at registration using an internal CADC CA, but users can also provide their own certificates if they have one. On the server side, the services are 100% Java. Identities are also managed on the server side, and multiple identities for the same user are merged and linked to a unique identity/credential set (the user asks for the merging). Merging identities is helpful in providing full access to all the resources pertaining to a user. Authorization is handled through roles represented by groups; there are four predefined roles for the groups: owner, read, read-write and admin. Identities, including attributes, are stored in an LDAP database. The main userID and a proxy certificate are created at the user's registration. The future evolution of the CADC infrastructure sees moving the authentication to a cloud solution (OpenStack Keystone) with replication of the LDAP information content; user access to virtual machines is foreseen, and OAuth2/OpenID Connect will be added to the system.

From the Q&A time it emerged that offering multiple authentication methods over publicly accessible data is meant to provide access to different kinds of clients with different means of authentication. Cookie manipulation (a security concern) is not a problem either, because the cookie only carries the identity or distinguished name to use with the A&A services. As to whether credential translation services could be a solution: credential delegation solves this problem at CADC when cookies or tokens are not in place. In principle a user with a valid certificate doesn't need a CADC account to call CADC services, credential delegation filling in the various calls after the initial access. The distinction between OAuth (the protocol) and OpenID Connect (the token service built on it) was also discussed and clarified.
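As a rough sketch of the client-side options above, using the generic `requests` library (the base URL, endpoint paths and cookie name below are placeholders, not actual CADC values):

```python
import requests

BASE = "https://example.cadc.org/tap"  # placeholder, not a real CADC URL

# Anonymous access: no credentials attached.
requests.get(f"{BASE}/capabilities")

# BasicAA: explicit user/password (requests also falls back to ~/.netrc
# credentials automatically when no auth argument is given).
requests.get(f"{BASE}/tables", auth=("user", "password"))

# X.509: a proxy certificate (PEM holding certificate and key), such as
# the one delivered by the internal CADC CA at registration.
requests.get(f"{BASE}/tables", cert="/path/to/proxy.pem")

# Cookie/token: the approach used by the CADC astroquery package; the
# cookie only carries the identity/distinguished name for the A&A services.
session = requests.Session()
session.cookies.set("SSO_COOKIE", "<opaque-token>")  # placeholder cookie name
session.get(f"{BASE}/tables")
```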
The talk reports an (IVOA) Registry view on VOSI capabilities, focusing also on its connection with the way resources requiring authentication should be reported and described there. The talk is connected to the TAP-1.1 authenticated endpoints issue, for which it tries to provide a “DALIInterface” exit strategy from the current situation. The talk reports the relationships among:
Resource : Capability : Interface : URL
Interface is the place where authentication is dealt with; URL is the level where, e.g., a mirror solution should be in place. While currently (for resources in the registry) the relationships among the above are of the 1:1:1:1 type, which makes the current issues and transitions easier, the scenario now seems to be changing to an N:M:Q:P set of relations. The talk investigated the /capabilities and other endpoints currently available, including /examples and /sync-/async. The /capabilities endpoint is 1:1 to the service, providing the list of capabilities for that service; however, that endpoint is used to expose a lot of other material too. The /tables endpoint instead varies with Interface, which is fine since authentication is handled there. It is not clear where /availability belongs, but probably the best choice is to have it at the URL level, where mirroring solutions should also go. The /examples endpoint is directly related to a protocol, so, if a Capability depends on a standard Interface, then /examples varies with Capability. Finally, the URL should be single per service, to let people share base URLs properly and safely. In conclusion, the talk explains why the current VOSI specification for Capabilities is wrong, also considering the fact that it works on top of a multi-level VOResource that makes it a complicated model. Despite this, the TAP specification actually represents a good way to do the work, so the idea is to try not to break that specification (i.e. in the current minor revision). One problem is that TAPRegExt ab/used the ParamHTTP interface type, which is wrong since the TAP base URL is not one fitting a parametric query (having sibling path endpoints in the first place); moreover, VOResource has never been changed according to the path followed by real-world implementations. The proposal coming out of the talk consists mainly in the introduction of a DALIInterface sanctioning the TAP Interface, the alternative being an abuse of the “title” attribute to collect bundles of endpoints pertaining to each other (this is what TOPCAT currently implements, based on the current WD). So the plea is not to break TAP and to work on the model instead. Of course some changes to the current specifications are needed if this exit strategy is adopted: VODataService needs a revision to introduce the DALIInterface; VOSI and DALI also need revisions, claimed to be minor; and DataLink and SIAP-2.0 require a revision too, to handle the registration solution change. Of course TAPRegExt needs to change and use DALIInterface. Q&A time clarified the impact on DataLink and SIAP-2.0 and the unclear view on /availability, which however also depends on its usage from the operations point of view. It was also considered that the solution may need some investigation for VOSpace and SODA. In the discussion that followed, a question also arose on whether or not to show something that cannot be accessed without authenticating first.
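To make the Capability : Interface : URL relations concrete, here is a minimal sketch of how a client might walk a VOSI /capabilities document and pick the access URL matching a wanted securityMethod (element and attribute names follow VOSI/VOResource; the namespace-agnostic matching is a simplification):

```python
import xml.etree.ElementTree as ET

def pick_interface(capabilities_xml, standard_id, security_method=None):
    """Return the accessURL of the first interface matching the requested
    capability standardID and (optionally) securityMethod.  Namespace
    prefixes are ignored to keep the sketch schema-version independent."""
    def local(tag):
        return tag.rsplit('}', 1)[-1]

    root = ET.fromstring(capabilities_xml)
    for cap in root:
        if local(cap.tag) != 'capability' or cap.get('standardID') != standard_id:
            continue
        for iface in cap:
            if local(iface.tag) != 'interface':
                continue
            methods = [sm.get('standardID')
                       for sm in iface if local(sm.tag) == 'securityMethod']
            # No securityMethod element means anonymous access.
            if security_method is None and methods:
                continue
            if security_method is not None and security_method not in methods:
                continue
            for child in iface:
                if local(child.tag) == 'accessURL':
                    return child.text
    return None
```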
Overview of the active services at CADC and how they use VOSI /capabilities. In short, there is a 1:1 mapping of {resourceID:standardID:securityMethod} to URL. The presentation continued by describing the various requirements and usage scenarios that make A&A such a complex task to handle. There followed an overview of the three prototype solutions put in place in the last year or so to fulfill the TAP authenticated endpoints requirements:
Number 3 is what is currently implemented at CADC (being the one defined before the College Park interop) and meant a 1:1 mapping of {resourceID:standardID:interfaceType:securityMethod} to URL, thus adding one element to the schematic “registry” search for getting to the service. Given that solution 3 was ruled out in College Park, the implementation fell back to solution 2. So the question is: where to go now? Usually service users ask: what is the URL to {file}? What is the URL to {service}? This clearly suggests trying to use one URL only and accessing it whatever authentication method you may use. This solution actually works, but it excludes having a one-for-all URL that can handle the BasicAA protocol, because that protocol uses a different mechanism than the others; even hacking an httpd for an unsolicited user/password authentication doesn't seem a palatable solution to this. Discussion followed about the need for a unique interface, the answer being that a registry search should lead to a unique URL. Leaving BasicAA out is however potentially problematic for data providers (though it may depend on where it is applied, e.g. at file retrieval time). Using a proxy IdP wouldn't solve the issue either, it being a challenge/response scenario. Extra metadata would anyway be needed on the registry side to cope with providers alongside the protocol used for the authentication itself. The single-interface solution requires an erratum to DALI to solve the “siblings to /capabilities” rule for finding the various endpoints out of a base URL.
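The solution-3 mapping can be pictured as a plain four-key lookup; a sketch with invented resource and URL values (the securityMethod identifiers are the ones defined by the IVOA SSO specification):

```python
# Solution 3: one URL per {resourceID, standardID, interfaceType, securityMethod}.
# All resource identifiers and URLs below are invented placeholders.
endpoints = {
    ("ivo://example/tap", "ivo://ivoa.net/std/TAP", "vs:ParamHTTP",
     None):
        "https://example.org/tap",           # anonymous access
    ("ivo://example/tap", "ivo://ivoa.net/std/TAP", "vs:ParamHTTP",
     "ivo://ivoa.net/sso#BasicAA"):
        "https://example.org/auth-tap",      # user/password
    ("ivo://example/tap", "ivo://ivoa.net/std/TAP", "vs:ParamHTTP",
     "ivo://ivoa.net/sso#tls-with-certificate"):
        "https://example.org/cert-tap",      # X.509 certificate
}

def resolve(resource_id, standard_id, interface_type, security_method=None):
    """One extra lookup key with respect to solution 2 ({resourceID:
    standardID:securityMethod}) -- the element ruled out in College Park."""
    return endpoints.get((resource_id, standard_id, interface_type, security_method))
```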
The talk presented some general principles on designing a tool while taking the A&A scenario into account. A tool needs to retrieve a bundle of endpoints to attach to, and needs to be able to find a URL to start from directly. One issue is how to prompt the user and let them make the right choice, e.g. of the IdP (or domain) alongside the securityMethod protocol attached to the interface. Another is that [G]UI tools cannot rely on the same WAYF (where-are-you-from) solutions that can be used with browser-based applications. One important principle stated is: “Don’t make the (unauthenticating) many suffer to support the needs of the (authenticating) few”. The subsequent discussion touched on the way a service can report the IdP/domain to the user, the usage of translation services, and whether Kerberos or ssh private/public keys are considered in the current scenario. The risk of security phishing through application interfaces is not to be discarded, even if it is felt there is little that can be done about it besides teaching the users.
A demo session led to the description of the RAP (Remote Authentication Portal) solution that has been prototyped for the SKA Regional Center pilot prototype. The service includes an identity-merging solution and supports a set of protocols and IdP authentication mechanisms. More information can be found in the document attached to the schedule. RAP wouldn't work outside a browser (though the optional ECP extension to Shibboleth might). The metadata internally used by RAP may be useful for identifying the metadata needed to fill in the registry.
ESO archives maintain both a browser-based science portal and full programmatic access to their resources, the latter based on VO standard technologies. The base architecture relies on 2 TAP services plus SSAP and DataLink solutions. An ongoing project for the ESO archives involves privacy policies at various levels: private data downloads, privileged access to data and metadata (so-called “special” data), and user authentication to allow a persistent TAP_UPLOAD solution. For the “special” data, authentication seems fine, but authorization looks more complicated, especially if one takes subquery mechanisms into account. Discussion on this last topic considered either row-level grants handled directly at the RDBMS level or complete query parsing and rewriting. Dedicated “view” tables do not seem a feasible solution considering the number of rows involved, even though the mapping to the standardised ivo.obscore and *.TAP_SCHEMA could have helped in this case (relaxing the constraint on the first one?). Another point raised by the presentation is about persistency for tables in the TAP_UPLOAD mechanism. The idea is not totally new (even if not dealt with in the current specification): CADC prototyped a modified /tables endpoint, making it a PUT-able HTTP endpoint, and ESAC also has a solution for it (maybe some libraries directly), given that persistent upload is in place for users of their TAP+ implementation. Finally, a question was asked about the expected deadline for the TAP-1.1 authenticated endpoints solution needed to solve the ESO use cases. The answer is that from the specification point of view it shouldn't take longer than Paris; the task is more difficult on the client side, where TOPCAT currently allows BasicAA and TLS-with-certificate. ESO is using pyvo and some small scripts; maybe it should think of an ESO astroquery package.
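From the client side, a persistent upload against CADC's prototyped PUT-able /tables endpoint might look like the following sketch (the service URL, table naming and content-type handling are assumptions based on the description above, not a published interface):

```python
import requests

TAP_BASE = "https://example.org/tap"   # placeholder service
TABLE = "mytable"                      # hypothetical user-table name

# Create a persistent user table by PUTting its content to /tables/<name>;
# credentials are required since the table belongs to the user.
with open("mytable.votable.xml", "rb") as f:
    resp = requests.put(
        f"{TAP_BASE}/tables/{TABLE}",
        data=f,
        headers={"Content-Type": "application/x-votable+xml"},
        cert="/path/to/proxy.pem",     # or any other supported auth method
    )
resp.raise_for_status()

# The table can then be referenced in later queries without re-uploading,
# unlike the per-query TAP_UPLOAD mechanism.
```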
An overview of a test prototype architecture for an SKA Regional Center was presented, embedding the solution provided by the H2020 AENEAS project and taking into account the GMS IVOA WD. SKA RCs will be a combination of data storage and computation facilities. In the prototype, the main means to get to the user's data is a proposal_id, the one used in planning observations. The use case for the prototype was shown, including how the interface solution from the GMS draft would fit into it. Starting from the proposal_id, the user is authenticated, checked for authorization on the data s/he's requesting and redirected (by email) to a (virtual) server machine for which user and password are provided and to which the data are attached. One of the requirements for the prototype has been the ability to access datasets whose complexity lies in having the data resources needed by a user located over geographically distributed sites (i.e. more than one Regional Center). The foreseen evolution of this prototype is to really standardise the access to A&A services and to try to show it works in a distributed solution having the IA2 (Trieste) and IRA (Bologna) locations for the actual datasets. Q&A time discussed how this prototype fits the (foreseen) AENEAS distributed environment scenario and requirements. The solution of composable containerization with a graphical interface through a web browser was also reported, even if the reported case (CADC for ALMA) is considered a prototype.
STOA is a web application to help build services out of workflows. The workflows are described using the Common Workflow Language (CWL). The requirements include keeping track of the provenance of the workflow and letting results be available for further re-use. Each workflow execution is preserved as a row in a DB, including its status, and the worktables are connected in a relational style. STOA manages 3 roles for worktable access: owner, collaborator and reader. The owner is the computation initiator, a reader simply reads the input/output, and collaborators can also flag and comment on the jobs. Worktables are searchable through VO Cone Search or downloadable in block. Future work will deal with authentication, needed to release part of the content as public, and with the description of STOA as a VO resource.
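Since worktables are exposed through VO Cone Search, querying one reduces to a standard SCS call; a sketch with a placeholder endpoint (RA, DEC and SR are the standard Cone Search parameters, in decimal degrees):

```python
import requests

# Standard Simple Cone Search parameters: position (RA, DEC) and radius (SR).
resp = requests.get(
    "https://example.org/stoa/scs",   # placeholder worktable endpoint
    params={"RA": 180.0, "DEC": -30.0, "SR": 0.5},
)
resp.raise_for_status()
votable = resp.text  # results come back as a VOTable
```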
Overview of the Yabi workflow management web tool at IA2, with a demo of its usage and a description of the initial goal (the GAPS project) of setting up the workflow system for scientific users. Authorization in the workflow environment has to consider access to data, computational resources and the tools to operate them. Yabi does this using its backend, credential and backend-credential architecture. A backend is something that runs a tool, provides file access and allows file filtering, and it can use various protocols to provide these resources. A credential is the (encrypted) way Yabi uses to attach user credential keys or other means. The backend credential is the link between the two. The resources (backends) available come in terms of atomic tools, grouped in toolsets available to groups of users, or in toolgroups collecting tools by specific usage. It is foreseen to describe the workflows in CWL, possibly to share pipelines among geographically distributed Yabi instances. Work still to be done includes accessing remote, possibly geographically distributed, data resources and finding how to run Yabi or user-provided tools on external public data.
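The backend / credential / backend-credential triad can be pictured with a few dataclasses (a sketch: names and fields are illustrative, not Yabi's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Backend:
    """Something that runs tools and provides (filtered) file access."""
    name: str
    protocol: str        # e.g. ssh, sftp, ...
    uri: str

@dataclass
class Credential:
    """User credential (key, password, ...) held encrypted by Yabi."""
    user: str
    secret: bytes        # stored encrypted

@dataclass
class BackendCredential:
    """The link telling Yabi which credential unlocks which backend."""
    backend: Backend
    credential: Credential
    home_dir: str
```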
Overview of the core features of the EGI service for AAI and the Check-in service. It allows multiple credentials/protocols for the user, and linking among them, and implements the AARC blueprint. Authorization (entitlement) information is a formatted URN based on the AARC guidelines: urn:mace:egi.eu:group:<group>[:<subgroup>*][:role=<role>]#<group-authority>. It will be integrated in the EOSC-Hub AAI solutions. Discussion touched on the topic of programmatic access using this service and on which communities, if any, have looked into this, in order to contact them and understand the way they use it. A question was also raised with respect to anonymous access to the resources: this is not prevented, but accounting is nonetheless considered to have an important role. This being an EU-based effort, global connection to it depends on the funding/governance among the possible partners; the EOSC governance itself is not yet in place.
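A sketch of parsing the entitlement URN quoted above into its components (the example value at the bottom is invented):

```python
import re

# Pattern for the AARC-style entitlement URN quoted above:
# urn:mace:egi.eu:group:<group>[:<subgroup>*][:role=<role>]#<group-authority>
ENTITLEMENT = re.compile(
    r"^urn:mace:egi\.eu:group:"
    r"(?P<group>[^:#]+)"
    r"(?P<subgroups>(?::(?!role=)[^:#]+)*)"
    r"(?::role=(?P<role>[^:#]+))?"
    r"#(?P<authority>.+)$"
)

def parse_entitlement(urn):
    m = ENTITLEMENT.match(urn)
    if not m:
        raise ValueError(f"not an egi.eu group entitlement: {urn}")
    subgroups = [s for s in m.group("subgroups").split(":") if s]
    return m.group("group"), subgroups, m.group("role"), m.group("authority")

# e.g. parse_entitlement("urn:mace:egi.eu:group:vo.example.org:admins:role=member#aai.egi.eu")
# -> ("vo.example.org", ["admins"], "member", "aai.egi.eu")
```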
Report of the first day's open discussion. This mainly touched on the following topics:
The discussion started from the question: what are the advantages and drawbacks of having data behind authentication? In terms of tooling and services, CADC does not apply a distinction between the environments for public and private data; this is the reason that led to having VO-standard-based tooling behind all of this. However, the IVOA has generally been seen as the “public” data framework, and one concern with having authenticated services/resources is that it could lead to no release of data to the public without authenticating first. This is perfectly expressed by the statement (see Mark Taylor's presentation) that the “many unauthenticated shouldn't suffer for the few authenticated ones”.
The discussion continued by investigating how one should label data resources for filtering based on authentication, whether that means allowing metadata to be in place and visible even if the actual data are private, what to do with tablesets, and at what step authentication should step in. Along these lines it was reported that DOI metadata must be public, while the actual data behind them can be private. As for the length of the proprietary period, it clearly depends on the case: it is usually shorter for telescope observations and longer for project-related data, and, in multi-messenger scenarios, data segregation may be used to limit collaboration and avoid potential competition while reaching the publication goal.
The next topic dealt with the implications of using commercial clouds for scientific research. One point was about the costs of this solution, not always easy to evaluate, in particular when comparing against institutional services whose costs may be hidden or difficult to quantify. Another was about how many risks to interoperability this will lead to if commercial interfaces take the lead over the community-developed standard ones. What is called “vendor lock-in” can happen at any step, like making it unfeasible to retrieve resources that were easy and cheap to load into the cloud itself. One thing to check for sure is how much open standards are in place and guaranteed within commercial clouds (e.g. Amazon uses a PostgreSQL-based DBaaS).
As the last topic of this section, the attendees focused on trying to identify what metadata would be needed in the registry to allow proper interoperable client/server authentication; usually authenticators are human-readable interfaces. The discussion fell back to the solution of merging identities and credential systems (e.g. certificates) and to the idea of credential translation services or proxy solutions. A proxy solution for interoperability would mean having an organization committed to running a centralized access service to allow authentication.
The second day morning discussion was devoted to two specific issues:
The discussion on the TAP-1.1 specification, directly related to its authenticated endpoints issue, started from the presentations of the morning of day 1. Reviewing mainly what was said there by the various stakeholders, it turned out that the agreeable solution would be to stay with the ParamHTTP interface solution for TAP-1.1 and leave the transition to a better standard interface to the TAPRegExt recommendation revision (introducing the DALIInterface specification, still to be written). The solution will anyway require an erratum to VOResource-1.1 to allow multiple securityMethod(s), and it will have some fallout on the RegTAP-1.1 specification. A solution to allow anonymous authentication to be listed alongside other methods (or left unspecified) was not found; it may need a change in SSO or some simple statements on best practices in the recommendations involved.
This discussion was meant to recover comments and decisions about the wording to be used in ADQL-2.1 for the REGION function in connection with an xtype=“region” in DALI (to fit into the DALI-1.2 revision). The discussion saw a review of the DALI-1.1-Next page that lists the basic requirements for new xtypes in DALI. Discussion focused mainly on whether polymorphic and complex/disjoint shapes should (or should not) be allowed to be described with the help of an xtype, given that primitive types and arrays cannot identify them unambiguously. The ADQL-related region (x)type falls into both the polymorphic and complex categories, allowing unions and intersections of analytic polygons. The discussion touched on many points and on many connections to other VO standards and, in particular, on the real need for complex shapes when, for many discovery use cases, a tessellation solution may be the quicker choice. The discussion was not really conclusive on all the points, and mainly saw an agreement on having the region, shape and multiinterval xtypes in DALI-1.2 and on allowing a smooth path in this respect in ADQL-2.1.
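For context, the DALI primitive geometry xtypes serialize as flat arrays of floating-point values, which is exactly why a polymorphic shape cannot be identified unambiguously from the array alone; a minimal decoding sketch for the existing “polygon” xtype:

```python
def parse_dali_polygon(value):
    """Decode a DALI 'polygon' value: a space-separated list of
    longitude/latitude pairs in degrees, e.g. "10 4 11 4 11 5"."""
    nums = [float(x) for x in value.split()]
    if len(nums) < 6 or len(nums) % 2:
        raise ValueError("a polygon needs at least 3 (lon, lat) vertices")
    return list(zip(nums[0::2], nums[1::2]))

# parse_dali_polygon("10 4 11 4 11 5") -> [(10.0, 4.0), (11.0, 4.0), (11.0, 5.0)]
# A union or intersection of such polygons has no such fixed-length layout,
# hence the proposed polymorphic 'region'/'shape' xtypes.
```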
The afternoon of the second day saw discussion on:
The discussion in session 6 started with a panel overview on credential delegation, with questions like: how can users use their credentials to run jobs, how to hide from users the complexity of moving their credentials around (like in the CADC proxy certificate solution), what do users need to know about their certificates and the CAs releasing them, and what impact this has on their ability to use the certificates afterwards. It was again stated that a solution like the CADC one is what users might find most comfortable for having their research lives simplified; however, the proxy certificates from CADC are not usable/interoperable outside the CADC environment. As an alternative to certificates, OAuth tokens are largely gaining ground in authentication and authorization. They too can be re-used, but this is considered unsafe because the longer they live, the higher the chance that control over the token is lost. Thus, setting the scope of the token is a mandatory requirement (up to the level of single-scoped tokens). From the point of view of interoperability/delegation, more work needs to be done on OAuth tokens and their delegation mechanism before having a clear view; there are however astrophysical research infrastructures investigating them (like LSST and STScI). On the general topic of credential delegation, one concern is also about the delegation of credentials outside the institutional scope. The topic of programmatic access with federated or delegated authentication was raised again, with a mention of the Shibboleth-ECP solution, which is, however, optional for SAML IdPs. Credential linking/merging on the provider side is considered by most of the attendees the only way to be able to attach authorization properly to the resource-role relationship.
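The “single scoped token” idea amounts to requesting a token whose scope is as narrow as the job at hand; with a standard OAuth2 token endpoint (RFC 6749 client-credentials grant; the URLs, client credentials and scope name below are placeholders) that is simply:

```python
import requests

# Standard OAuth2 client-credentials request (RFC 6749, section 4.4);
# endpoint, client credentials and scope value are placeholders.
resp = requests.post(
    "https://auth.example.org/oauth/token",
    data={
        "grant_type": "client_credentials",
        "scope": "read:job-results",   # narrow, single-purpose scope
    },
    auth=("client_id", "client_secret"),
)
resp.raise_for_status()
token = resp.json()["access_token"]

# Short-lived, narrowly-scoped tokens limit the damage if control of a
# delegated token is lost -- the concern raised in the discussion.
requests.get("https://service.example.org/job/123/results",
             headers={"Authorization": f"Bearer {token}"})
```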
The final part of the meeting hosted a splinter session about starting a revision of the IVOA DataLink Recommendation.
Pat Dowler (DataLink spec editor), Alberto Micol, Mark Taylor and François Bonnarel participated in this splinter.
FB proposed a revision of the spec by presenting a preliminary new draft. The content and rationale of the changes had already been presented in an IVOA note (Bonnarel: Recent DAL Protocols Feedback) and in a presentation at the College Park interop meeting (Bonnarel: Towards DataLink-1.1). The proposed changes to the text were discussed during this Trieste splinter, and a new version was disseminated to the authors, the splinter participants and the DAL chair/vice-chair on February 5th.