CFP'93 - Computer Matching and Digital Identity
by Roger Clarke
Australian National University
(c) Xamax Consultancy Pty Ltd, 1992-93
The digital persona is the model of the individual established through the collection, storage and analysis of data about that person. Dataveillance techniques operating on the digital persona provide an economically efficient means of exercising control over the behaviour of individuals and societies.
This paper briefly describes the contributions which computer matching is making to the construction of the digital persona, particularly through alternative approaches to establishing the digital identity. Risks inherent in dataveillance of the digital persona are outlined, at both the technical and the social levels.
Dataveillance is the systematic use of personal data systems in the investigation or monitoring of the actions or communications of one or more persons (Clarke 1988, p.499). Two kinds need to be distinguished: personal dataveillance, in which an identified person is monitored, generally for a specific reason; and mass dataveillance, which is of groups of people, generally to identify individuals of interest. Dataveillance differs from conventional surveillance in several respects. One of particular importance is that it involves monitoring not of the individual, but of the individual's data shadow, alter ego, or as it is referred to in this session, the 'digital individual'.
This author prefers the term 'digital persona'. In Jungian psychology, the anima is the inner personality, turned towards the unconscious, and the persona is the public personality that is presented to the world. The only persona that Jung knew was that based on appearance and behaviour. With the increased data-intensity of the second half of the twentieth century, Jung's persona has been supplemented, and to some extent even replaced, by the summation of the data available about an individual.
The digital persona is a construct. In its primary sense, a construct is a complex notion building on multiple concepts to produce a semantically rich cluster of ideas. It is also valuable to invoke the sense of the word in 'cyberpunk' science fiction, in which a construct is "a hardwired ROM cassette replicating a ... man's [sic] skills, obsessions, knee-jerk responses" (Gibson, 1984, p.97). More prosaically, a digital persona is a model of an individual's public personality based on data, and intended for use as a proxy for the individual. The technique is rather threatening, reminiscent as it is of the voodoo technique of sticking pins in an (iconic) model or doll.
The author has recently completed a five-year research project on computer matching. The purpose of this paper is to examine the contributions which computer matching is making to the construction of the digital persona, in particular the digital identity. Implications are drawn.
Computer matching is any computer-supported process in which personal data records relating to many people are compared in order to identify cases of interest. Since it became economically feasible in the early 1970s, the technique has been progressively developed, and is now widely used, particularly in government administration in the United States, Canada, Australia and New Zealand. A description and analysis are provided in Clarke (1993, pp.24-41). See Exhibit 1 for an overview of the process.
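The core of such a process can be sketched in a few lines: records from two personal data systems are compared, and any person appearing in both is flagged as a case of interest. The following is a minimal illustration only; the files, field names and identifier format are all invented for the purpose.

```python
# Hypothetical sketch of a basic computer-matching run: two agencies'
# files are compared on a shared identifier, and anyone who appears in
# both (e.g. drawing benefits while on a government payroll) is flagged
# as a "hit" for further investigation.

benefits_file = [
    {"ssn": "123-45-6789", "name": "J. Citizen", "benefit": 400},
    {"ssn": "987-65-4321", "name": "M. Resident", "benefit": 250},
]
payroll_file = [
    {"ssn": "123-45-6789", "name": "J. Citizen", "salary": 30000},
    {"ssn": "555-00-1111", "name": "A. Employee", "salary": 28000},
]

def match(file_a, file_b, key):
    """Return pairs of records present in both files, joined on the key."""
    index = {rec[key]: rec for rec in file_b}
    return [(rec, index[rec[key]]) for rec in file_a if rec[key] in index]

hits = match(benefits_file, payroll_file, "ssn")
for benefit_rec, payroll_rec in hits:
    print(benefit_rec["ssn"], "appears in both files")
```

In a real programme the comparison would of course run over millions of records and produce a hit-list for human follow-up, but the logic is essentially this join.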
Computer matching contributes greatly to the establishment and maintenance of the digital persona, because it brings together data which would otherwise be isolated. It has the capacity to assist in the detection of error, abuse and fraud in large-scale systems (Clarke 1993, pp.41-46). It may, in the process, jeopardise the information privacy of everyone whose data is involved, and even significantly alter the balance of power between consumers and corporations, and citizens and the State.
The Digital Identity
A digital persona is of limited use unless new data can be associated with that already held. To be useful for social control, it must also be able to be related to a specific, locatable human being. Digital identity is the means whereby data is associated with a digital persona.
Organisations which pursue relationships with individuals can generally establish an identifier for use on their master files and on transactions with or relating to the individual. The challenges are greater when multiple organisations are involved. There are three approaches whereby a digital identity can be constructed from multiple sources:
- a common identifier;
- multiple identifiers, correlated; and
- multi-attributive matching.
A Common Identifier
Virtually all computer matching undertaken by agencies of the U.S. Federal Government appears to be based on the Social Security Number or SSN (Clarke 1993, pp.9-19). In Canada the Social Insurance Number (SIN) plays a similar role. This is because the number is widely available, and its integrity is regarded by the agencies as adequate. Some agencies, in order to address acknowledged quality problems, use additional data-items to confirm matches.
In European countries it has been the practice for many years for a single identifier to be used for a limited range of purposes, in particular taxation, social security, health insurance and national superannuation. In Australia, an originally single-purpose identifier (the Tax File Number) has recently been appropriated to serve as a social security identifier as well (Clarke 1992).
There are alternatives to a government-assigned number. Physiologically-based identifiers (sometimes referred to as 'positive' identifiers) have the advantage of being more readily relatable to the person concerned. Many forms have been proposed, including thumbprint, fingerprints, voiceprints, retinal prints and earlobe capillary patterns. There is also the possibility of a non-natural identifier being imposed on people, such as the brands and implanted chips already used on animals, the collars on pets, and the anklets on prisoners on day-release schemes.
Multiple Identifiers, Correlated
Where a common identifier is not available, two or more organisations can establish cross-references between or among their separate identifiers. This can be achieved in two main ways:
- the application of sophisticated computer matching techniques; and/or
- the supply by each individual of their identifiers under one scheme to the operator of one or more other schemes. This may be mandated under law, or sanctioned under law (i.e. not prohibited) and required under contract (which if applied consistently by all operators in an industry such as credit or insurance is tantamount to mandating).
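However the cross-references are obtained, the result is in effect a translation table between identification schemes. The sketch below illustrates the idea with invented identifier formats and records; it is not drawn from any actual scheme.

```python
# Hypothetical sketch: two organisations keep separate identifiers for
# the same individuals. A cross-reference table (built by matching, or by
# requiring individuals to supply both numbers) lets records held under
# one scheme be linked to records held under the other.

tax_records     = {"TFN-001": {"name": "J. Citizen", "tax_paid": 5000}}
welfare_records = {"WRN-9":   {"name": "J. Citizen", "benefit": 400}}

# The cross-reference itself: tax-scheme id -> welfare-scheme id.
cross_ref = {"TFN-001": "WRN-9"}

def linked_record(tax_id):
    """Follow the cross-reference from a tax identifier to welfare data."""
    welfare_id = cross_ref.get(tax_id)
    return welfare_records.get(welfare_id) if welfare_id else None

print(linked_record("TFN-001"))  # welfare record for the same person
```

Once such a table exists, the two schemes function, for matching purposes, as if they shared a single identifier.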
Multi-Attributive Matching
An alternative approach is to construct a matching algorithm based on pairs of similarly-defined fields. Typically these involve names, date-of-birth, some component(s) of address and any available identifiers (such as drivers' licence numbers). Data collection, validation, storage and transmission practices are such that dependence on exact equality of the contents of such fields is impracticable. Instead, the data generally needs to be re-formatted and massaged ('scrubbed'), and/or algorithms concocted to identify similarity. In addition to government programmes, the private sector uses the technique, for example in the credit reporting and direct marketing industries.
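The scrubbing and similarity steps can be sketched as follows. This is a deliberately crude illustration using Python's standard-library string comparison, with invented records and an arbitrary threshold; production systems use far more elaborate normalisation and scoring.

```python
# Hypothetical sketch of multi-attributive matching: no common identifier
# exists, so records are compared on several "scrubbed" fields (name,
# date of birth, postcode), with string similarity tolerating minor
# variations in formatting and content.
from difflib import SequenceMatcher

def scrub(s):
    """Crude normalisation: lower-case, keep only letters and digits."""
    return "".join(ch for ch in s.lower() if ch.isalnum())

def similarity(a, b):
    """Similarity ratio between two scrubbed strings, in [0, 1]."""
    return SequenceMatcher(None, scrub(a), scrub(b)).ratio()

def is_probable_match(rec_a, rec_b, threshold=0.85):
    """Average field similarity compared against an arbitrary threshold."""
    fields = ["name", "dob", "postcode"]
    score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
    return score >= threshold

# Same person recorded differently in two systems, plus an unrelated one.
rec1 = {"name": "Jon A. Citizen", "dob": "1960-07-01", "postcode": "2600"}
rec2 = {"name": "CITIZEN, Jon",   "dob": "1960-07-01", "postcode": "2600"}
rec3 = {"name": "Mary Smith",     "dob": "1955-01-02", "postcode": "3000"}
```

The threshold embodies the trade-off the paper discusses under matching quality: set it high and true matches are missed; set it low and unrelated individuals are falsely paired.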
Considerable progress has been made in supporting technologies for multi-attributive matching, including higher-speed processors, storage, vector- and array-processing, associative processing (such as CAFS) and software development tools expressly designed to enable multi-attributive matching (such as INDEPOL) (Clarke 1993, pp.39-40).
Flaws in the Digital Persona
From a technical perspective, there are substantial weaknesses in the digital persona arising from computer matching. For manifold instances of data and process quality problems, see Neumann (19xx). An analysis in Clarke (1993, pp.52-61) distinguishes the following areas in which problems arise:
- data sources;
- data meaning;
- data quality;
- data sensitivity and privileges;
- matching quality;
- context; and
- oppressive use of the results.
Social Risks in the Concept of the Digital Persona
Data surveillance's broader social impacts can be categorised as follows (Clarke 1988, p.505):
- Personal Dataveillance
- low data quality decisions
- lack of subject knowledge of, and consent to, data flows
- denial of redemption
- Mass Dataveillance
- dangers to the individual
- acontextual data merger
- complexity and incomprehensibility of data
- witch hunts
- ex-ante discrimination and guilt prediction
- selective advertising
- inversion of the onus of proof
- covert operations
- unknown accusations and accusers
- denial of due process
- dangers to society
- prevailing climate of suspicion
- adversarial relationships
- focus of law enforcement on easily detectable and provable offences
- inequitable application of the law
- decreased respect for the law and law enforcers
- reduction in the meaningfulness of individual actions
- reduction in self-reliance and self-determination
- stultification of originality
- increased tendency to opt out of the official level of society
- weakening of society's moral fibre and cohesion
- destabilisation of the strategic balance of power
- repressive potential for a totalitarian government
Clearly, many of these concerns are diffuse. On the other hand, there is a critical economic difference between conventional forms of surveillance and dataveillance. Physical surveillance is expensive because it requires the application of considerable resources. With a few exceptions (such as East Germany under the Stasi, Romania, and China during its more extreme phases), this expense has been sufficient to restrict the use of surveillance. Admittedly the selection criteria used by surveillance agencies have not always accorded with what the citizenry might have preferred, but at least its extent was limited. The effect was that in most countries the abuses affected particular individuals who had attracted the attention of the State, but were not so pervasive that artistic and political freedoms were widely constrained.
Dataveillance changes all that. Dataveillance is relatively cheap, and getting cheaper all the time, thanks to progress in information technology. The economic limitations are overcome, and the digital persona can be monitored with thoroughness and frequency, and surveillance extended to whole populations. To date, a number of particular populations have attracted the bulk of the attention, because the State already possessed substantial data-holdings about them. These are social welfare recipients and employees of the State. Now that the techniques have been refined, they are being pressed into more general usage, in the private as well as the public sector.
The primary focus of the government matching programmes which are being implemented is 'evildoers'. This is not intended in a sarcastic or cynical sense - the media releases do indeed play on the heartstrings, but the fact is that publicly known matching programmes have been mostly aimed at classes of individual who are abusing a government programme, and thereby denying more needy individuals the benefits of a limited pool of resources. Nonetheless, these programmes have a 'chilling effect' on the population they monitor. Moreover, they have educated many employees in techniques which are capable of much more general application.
The emergence of the digital persona is an inevitable outcome of the data-intensity of contemporary administrative practices (Rule 1974, 1980). The physical persona is progressively being replaced by the digital persona as the basis for social control by governments, and for consumer businesses undertaken by corporations. Even from the strictly social control and business efficiency perspectives, substantial flaws exist in this approach. In addition, major risks to individuals and society arise.
If information technology continues unfettered, then use of the digital persona will inevitably result in impacts on individuals which are inequitable and oppressive, and in impacts on society which are repressive. Our legal systems have been highly permissive of the development of inequitable, oppressive and repressive information technologies. Focussed research is needed to assess the extent to which regulation will be sufficient to prevent and/or cope with these threats. If the risks are manageable, then effective lobbying of legislatures will be necessary to ensure appropriate regulatory measures and mechanisms are imposed. If the risks are not manageable, then information technologists will be left contemplating a genie and an empty bottle.
References
Clarke R.A. (1988), 'Information Technology and Dataveillance' Commun. ACM 31,5 (May 1988) 498-512
Clarke R.A. (1992), 'The Resistible Rise of the Australian National Personal Data System' Software L. J. 5,1 (January 1992)
Clarke R.A. (1993), 'Computer Matching by Government Agencies: A Normative Regulatory Framework' Forthcoming Comp. Surv. (110 pp.)
FACFI (1976), 'The Criminal Use of False Identification' U.S. Federal Advisory Committee on False Identification, Washington DC, 1976
Gibson W. (1984), 'Neuromancer' Grafton/Collins, London, 1984
Laudon K.C. (1986), 'Data Quality and Due Process in Large Interorganisational Record Systems' Commun. ACM 29,1 (Jan 1986) 4-11
Neumann P. (19xx), 'RISKS Forum', Software Engineering Notes and netnews, since 19xx
Rule J.B. (1974), 'Private Lives and Public Surveillance: Social Control in the Computer Age' Schocken Books, 1974
Rule J.B., McAdam D., Stearns L. & Uglow D. (1980), 'The Politics of Privacy' New American Library, 1980