Trust and Internet Identity Meeting Europe
2013 - 2020: Workshops and Unconference

TIIME 2015 Session 21: Privacy by design in federated identity management

Conveners: Rainer Hörbe, Walter Hötzendorfer

Abstract: FIM, while solving important problems of remote entity authentication, introduces new privacy risks, such as new possibilities of linking private data sets and new opportunities for user profiling. We will discuss privacy by design requirements, transpose them into specific architectural requirements and evaluate a number of FIM models that have been proposed to mitigate these risks.

Tags: Privacy by Design

Notes

The paper presented is

Rainer Hörbe/Walter Hötzendorfer: Privacy by Design in Federated Identity Management

DOI 10.1109/SPW.2015.24

http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7163221&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7163221

Slides: https://kantarainitiative.org/confluence/x/3wRlB

What are the risks in federated identity management?

Easy to resolve

1. Linkability by introduction of common identifiers: two service providers should not be able to know that they are dealing with the same user. The worst thing to do in terms of linkability is to introduce common identifiers.

2. Impersonation by identity/credential providers or because of weaknesses in the SSO mechanism. Here a central instance can also observe the behaviour of the user: an identity provider can see which relying parties the user is interacting with. We should find ways to overcome that.
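
One common way to reduce the linkability risk in item 1 is for the identity provider to release a different, pairwise identifier to each service provider. The following is a minimal sketch of that idea, not the method from the paper; the function name, the secret, and the entity IDs are hypothetical.

```python
import hashlib
import hmac

def pairwise_id(secret: bytes, user_id: str, sp_entity_id: str) -> str:
    """Derive a per-SP pseudonym: stable for one SP, different across SPs."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding,
    # assuming neither input contains a NUL character.
    message = f"{user_id}\x00{sp_entity_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

secret = b"idp-local-secret"  # hypothetical; known only to the IdP
id_at_sp_a = pairwise_id(secret, "alice", "https://sp-a.example.org")
id_at_sp_b = pairwise_id(secret, "alice", "https://sp-b.example.org")
```

Two service providers comparing these values cannot tell that they belong to the same user. This does nothing about the observability risk in item 2, since the identity provider still sees every login.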

Non-FIM Privacy Risks:

Observability:

  • Device fingerprinting
  • IP address


Linkability

What are the incentives? I would come to the conclusion that we should think about things like privacy in the systems we are building.

I would like to show you what we found out about what privacy and privacy by design mean, particularly in the field of identity management.

From general provision reduced to requirements for identity systems.

Privacy risks related to FIM: linkability and observability are the two general problems. Linkability means that two SPs should not be able to know that they are dealing with the same user. The worst thing to do is to introduce common identifiers.

Motivation and Scope

-          FIM Projects featuring cross-sector federation (e.g. smart cities, citizen eIDs, B2B across supply chains)

-          How to handle the increased privacy risk


Privacy by design

-          The principle of including privacy in the system development life cycle from the beginning

-          What does this mean in practice?

-          Difficulty: Bridging the gap between abstract principles and tangible requirements for a specific system

-          There is no default methodology but rather the need to act as a privacy engineer during the system design and implementation process

Privacy by Design: The code is Law principle

-          Preclude that the system can be used in a privacy-infringing way

  1. by architecture
  2. by the Software design
  3. by other technical means (about 50% of this is data minimization)

-         Particularly important in data protection and privacy because illegitimate use of data usually happens behind closed doors

Approach to Elicit Requirements

PP - Privacy Principles

PDR - Privacy by Design requirements

BR - Business requirements

AR - Architectural requirements

Lex (legal source) -> PP -> PDR -> AR <- FIM models, BR

Privacy Principles --> Privacy by Design rules

Next: PbD rules --> Architectural requirements

Privacy by design rules --> Architectural requirements

Existing Implementation --> Architectural requirements

Business Requirements --> Architectural requirements

We limited our scope to the WebSSO use case

The scope is limited to a single sign-on use case. We could look at several things, but the main one is linkability.

The main difficulty is to bridge the gap between legal principles and tangible requirements.

Involvement in system design and implementation is crucial.

There is no direct mapping between identifiers and the Data Protection Directive, and there is no established methodology.

A very important part of it is the "code is law" principle, after Lessig’s 1999 book "Code and Other Laws of Cyberspace". In data protection we mostly don’t learn about the illegitimate use of data, and the data subject is usually never aware of misuse of her data. That’s why it is very important to preclude, in the first place, that the system can be used in a privacy-infringing way, through the architecture and the design. Misuse in data protection is very easy and will mostly remain unknown to the data subject.

This means data minimization, and now this was made more concrete.

We started at the top and the bottom at the same time. At the top we focused on European law, and at the bottom we have different models which solve different problems.

We joined both processes to come up with 8 architectural requirements.

Feedback:

Did you ever have the situation where you had the cool idea in your architecture but were completely disconnected by the lawyer?

Nicholas:

I had that problem, but it was mostly fighting with the laws on crypto, and I still haven’t figured it out. We are bypassing it and getting strong crypto.

Richard:

It’s an interesting thought. One of the things that the regulators enforce is that the servers themselves don’t have USB ports. Otherwise they could be compromised, for example by someone inserting viruses via the USB ports. Because the servers don’t have USB ports, the area that has to be defended is smaller.

It’s in the fingerprinting end of the spectrum but who is the certification authority, I don’t understand that.

Walter:

Device fingerprinting means that if you open a website with your browser, the operator can recognize you. A great example is the Electronic Frontier Foundation, whose test told me that my browser is completely different from the rest.

Rainer:

There is, for example, an API in browsers to monitor your battery status, which could be used for fingerprinting. The W3C Privacy Interest Group (PING) has a very elaborate document on that.
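
To make the fingerprinting idea concrete, here is a rough sketch of how an operator could turn observable browser attributes into a recognizable identifier. It is an illustration only: the attribute names and values are invented, and real fingerprinting scripts are more elaborate.

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Hash a canonical encoding of browser attributes into a short identifier."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attribute values, for illustration only.
browser = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/Vienna",
    "fonts": ["DejaVu Sans", "Liberation Serif"],
    "battery_level": 0.83,
}
visit_1 = fingerprint(browser)
# Same attributes reported in a different order give the same fingerprint.
visit_2 = fingerprint(dict(reversed(list(browser.items()))))
# A browser differing in one attribute gives a different fingerprint.
other = fingerprint({**browser, "screen": "1366x768"})
```

No cookie or login is involved: the more unusual the combination of attributes, the more reliably the browser is recognized. A volatile value such as the battery level would change an exact hash like this one, so it is more useful for short-term correlation than as a stable identifier.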

Comparing PbD Models:

What did we actually do?

There is the list of 5 steps, and it roughly fits our approach, but we name the steps differently. The main difference is that privacy by design means including privacy in the design from the beginning, so that you can see how privacy is affected as the design develops.

Rainer:

There are also the business requirements. A business case is a federation for loyalty systems needing a central clearing service. Therefore they need some linkability.

Tom:

I think it’s interesting to examine how to prevent them, or to demonstrate how to always be successful.

Walter:

No, of course not. We can’t demonstrate that so that would be a correct approach. You have to do the privacy assessment in the end.

Just briefly, the steps you saw in the picture: at the top there is a table of 8 common privacy principles which can be derived from legislation.

  • From that we worked out what it means for the domain of identity management and transposed, or deduced, from the top table the 5 principles in the bottom table, which are specific to the identity management domain.
  • The bottom table then moves to the top for what we did next. The provider still needs to talk about the particular user.
  • We came up with 8 requirements that can be implemented in identity management systems.

 

Models: (Rainer)

  • 1. Organizational model: we need technical controls
  • 2. Attribute-based credentials: there are some issues, but this is the actual technology that we used
  • 3. Canadian model, the Late Binding/Federated Credentials: this is not privacy preserving at all but compliant; you won’t be sued if you don’t release the attributes
  • 4. Constrained Logging Proxy: WAYF and SURFnet operate a similar thing; you throw away or hide away the logs, so you have no problems with the proxy (unless there is a man in the middle owning the proxy)
  • 7. Blind Proxy – the SP can’t be identified by the IDP


There are a number of other models, but those are the most important ones

  • All of them are focused on identifiers, but most use cases have identifying attributes.


Question for the group:

Which model was used in your group?

The pairwise email address thing is an interesting thing, it was a part of the 500+ chain.

We were basically talking about targeted IDs that are still uniquely identifying: having an identifier that’s not an email address, and not having facilities where people put email addresses in a file or a folder.

Rainer: We don’t do targeted identifiers, because we are releasing the attributes anyway. We don’t do privacy here because we don’t have privacy there, and so on.
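
Rainer's point can be illustrated with a small sketch: if an identifying attribute such as an email address is released alongside a targeted identifier, two service providers can still join their records on that attribute. The record layout and values below are hypothetical.

```python
def link_records(records_a, records_b, key):
    """Return pairs of records from two SPs that share the same value for `key`."""
    index = {r[key]: r for r in records_a if key in r}
    return [(index[r[key]], r) for r in records_b if r.get(key) in index]

# Each SP received a different targeted identifier but the same mail attribute.
sp_a = [{"targeted_id": "a1f3", "mail": "alice@example.org"}]
sp_b = [{"targeted_id": "9c7e", "mail": "alice@example.org"}]

linked_by_id = link_records(sp_a, sp_b, "targeted_id")  # no match
linked_by_mail = link_records(sp_a, sp_b, "mail")       # one match
```

The targeted identifiers do their job, but the join on `mail` succeeds anyway, which is why targeted identifiers only help when attribute release is minimized as well.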

Nick: There is a conflicting use case that we mentioned, there are people who specifically don’t want privacy so when we do privacy by design they have to be satisfied as well.

For example, when they are acting as researchers and want to be identified rather than act pseudonymously.

We are doing clinical trials involving human beings. When someone interacts with our system they are doing so on the record, because they are working with human subjects. How do we make sure to minimize the data that we collect? I see potential for many conflicts; that’s where our scientists are coming from, and we need to authenticate them strongly against the requirements that they are operating under themselves.

Tom: A range of use cases must be made available. If you build it in by design you can never take it back, so you must consider the design upfront.

Because there never is one that meets all purposes, we had discussions about these lines, about where to start. By default all these principles were observed. You can start out with the (floor), but what the design must do is to allow the range of these use cases. If you build it by design you cannot take it back.

Walter: There is a solution to that, the system that we built must provide space to any changes.

Even if the user used a pseudonym, we have to find a way to figure out who the user was.

Rainer: Most of the solutions that talk about privacy by design are based on pseudonymous identities.

We have pseudonymous identifiers in Austrian DP law, called indirectly identified data, but the typical reaction from lawyers is: that is still private data! Therefore there is no incentive from the legal side to reduce the risk by using pseudonyms. So we as engineers come up with solutions that are not being taken up by authorities.

Tom: Something that is unique to academics is provenance: when you publish or find something, you need to understand who wrote it so that you can trust it. In academic work that’s very important. That is a big reason, not just compliance, why they want to be identified. Provenance is crucial for science building on itself over a period of time.

Walter: We have so many requirements, and it would be so much easier to do that. I don’t see as a problem to tell people what are the requirements.

Rainer: If you think about how to improve privacy by design: are there any business requirements for that?

Nick: There are things that we can’t do in the context of identity federation, things that are very difficult to do without relationships, so I can’t access an IMR someplace else via the context federation. It pairs up with the proxy methods you are mentioning. It would be difficult to run them because of fiscal constraints.

Richard: Seems to me that if you have that restriction that you can have a pointer that would direct you to a server that can supply you with the contract, we have it all in place we just don’t use it.