Library Juice 5:37 - December 19, 2002


Special Issue - Total Information Awareness

This issue is based on postings to Declan McCullagh's "Politech"
list, archived at http://www.politechbot.com


Contents:

  1. What's so bad about Total Information Awareness? by Ben Brunk
  2. Replies to "What's so bad about Total Information Awareness?"
  3. Data miner replies to Politech, says TIA can ID terrorists
  4. More on data mining, TIA, and how to ID terrorists
  5. EFF's Brad Templeton and Norm Singleton on TIA's true threat
  6. Links out on Total Information Awareness and privacy in general


Quote for the week:

"How often, or on what system, the Thought Police plugged in on any
individual wire was guesswork. It was even conceivable that they watched
everybody all the time. But at any rate they could plug in your wire
whenever they wanted to. You had to live - did live, from habit that became
instinct - in the assumption that every sound you made was overheard, and,
except in darkness, every movement scrutinized." - George Orwell


Homepage of the week: Karen Coyle
http://www.kcoyle.net/

________________________________________________________________________


1. What's so bad about Total Information Awareness? by Ben Brunk

Date: Mon, 09 Dec 2002 23:57:16 -0500
From: Declan McCullagh <declan[at]well.com>
To: politech[at]politechbot.com
Reply to: declan[at]well.com

---

Date: Mon, 09 Dec 2002 22:34:13 -0500
From: Ben Brunk <brunkb[at]ils.unc.edu>
To: declan[at]well.com
Subject: Debunking TIA

Declan,

I'm in the middle of writing a dissertation relating to online privacy, but
I have been completely sidetracked by the recent discussion over the Total
Information Awareness program authorized by the Homeland Security bill that
just passed into law. All I've seen so far are a lot of reactionary
editorials written by people who haven't put an ounce of effort into
analyzing the proposed system. They seem infatuated with the TIA logo, its
slogan, and Poindexter. I have read, with avid fascination, all the dire
predictions and scary stories about a new Big Brother system spearheaded by
a felon who managed to avoid accountability. What I have yet to see is a
rational analysis of the idea itself from someone who knows something about
computers, databases and statistics. I hope to fill in that gap as best I
can, though I'm sure there are experts out there with even better
background in the appropriate research fields.

From what I have been able to find out about the TIA program, it is
supposed to be a massive computerized dragnet that culls information from
dozens of different sources and is intended to locate potential terrorists
so that government agents can scrutinize them more closely. This system
will draw data from sources such as credit reports, bank records, airline
reservation systems, police records, gun purchase records, and many others.

Many of these sources of information are private databases owned and
maintained by the corporations that rely on them. Even if they were all
implemented in say, Oracle, it would be difficult to match up records to
any reliable degree. Who knows if the John Poindexter in one database is
the same as Jon Pointdexter in another? The social security number, which
is apparently the holy grail of database keys, is not necessarily going to
help since many of these companies did not collect it or use it as a key.
Name and address might make a good cross referencing key, but people move
all the time, and I get three catalogs from a company that I purchased
items from three times -- even their internal database is not sophisticated
enough to detect slight differences in spacing or my apartment number using
a '#' instead of 'apt' or 'apartment'. This is just inside one
organization; we're not even trying to connect any dots yet. It will be
easier to match records kept by the government, especially if they include
SSNs and fingerprints. However, errors in government databases are well
documented (although not readily admitted to). Those systems contain large
numbers of errors, and even when errors are located and fixed, they have a
nasty tendency of recurring when data is shared or re-shared. If you fix
an error in your Experian credit report but not in TRW's, the Experian
error will often reappear. Many people play this sort of "whack a mole"
game for years.
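
As a rough illustration of the matching problem described above, here is a
minimal Python sketch. The cleanup rules, function names, and sample records
are assumptions invented for the example; they are not taken from any real
system.

```python
import re
from difflib import SequenceMatcher

# Hypothetical unit markers; a real address cleaner would need far more rules.
UNIT_WORDS = re.compile(r"\b(apartment|apt|unit)\b\.?|#")

def normalize_address(addr: str) -> str:
    """Lowercase, fold unit markers to 'apt', drop punctuation, collapse spaces."""
    addr = addr.lower()
    addr = UNIT_WORDS.sub(" apt ", addr)
    addr = re.sub(r"[^a-z0-9 ]", " ", addr)
    return " ".join(addr.split())

def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Three spellings of one (made-up) address collapse to the same string.
for raw in ("12 Main St Apt. 4", "12 Main St #4", "12 Main St, Apartment 4"):
    print(normalize_address(raw))

# The two name spellings from the text score as similar, not identical.
print(name_similarity("John Poindexter", "Jon Pointdexter"))
```

Even with the address variants folded together, the two names only come out
"similar", and any cutoff loose enough to accept them will also accept some
pairs of genuinely different people.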

Another matter that no journalist has touched on, and the one I think is
the biggest nail in TIA's coffin, is that database errors are several
orders of magnitude more numerous than the terrorists in the world. All
databases contain errors. Data culled from multiple,
heterogeneous sources is going to have lots of errors. I don't have
current estimates on the average expected error rate in a database, but
let's suppose it is 5%. That means that in any given database, 95% of the
data is right and 5% of it is junk. Garbage in, garbage out. The errors
include misspellings, flipped bits, transposed numbers, and transaction
entries that never took place or were unintentionally duplicated or
omitted. Five
percent isn't a big deal until you look at it on the scale of what TIA is
proposing. There are approximately 300 million people in the United
States. Those 300 million people are very busy consumers, and their paper
trail is enormous. There are trillions of transaction records, log
entries, and records that TIA would have to amass, standardize, and then
examine. Even if the government buys all the necessary computing power and
the very best staff, the government can't do anything about randomness. The
5% expected error rate is the monkey wrench in the works. 5% of 300
million is 15,000,000. Multiply that number by however many data points
will be looked at. Say 500 data points for each person. Now we are
looking at 300 million times 500, or 150,000,000,000 data points. 5% of
that number leaves us with 7,500,000,000. Seven and one half billion data
points if they want to look at every American. Worse, this is not a
one-time scan. For any hope of success, they would have to look
longitudinally. That is, every year, month, day, hour, whatever. Some
indications of terrorism are very subtle: People who plan terror don't
just run out and buy their entire list of bomb making ingredients in one
day and then book a flight. Terrorists are slow and methodical. They plan
over months and years. So what we're looking at here is 7.5 billion data
points examined day in and day out for years and years. With a 5% error
rate, the number of false positives is outrageous, no matter what analysis
technique is used (and any analysis technique will have its own error rate).
There is not enough manpower in the entire federal government to possibly
track down every lead generated, even if much of that work is automated.
With each passing day, homeland security will drown a little more in a
hopeless pile of randomly generated false leads that grow even on weekends
and holidays.
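
The arithmetic in this paragraph is easy to re-run. A minimal sketch in
Python, using only the illustrative figures from the text (300 million
people, 500 data points each, a supposed 5% error rate); the variable names
are just labels for the example.

```python
POPULATION = 300_000_000        # approximate US population used in the text
DATA_POINTS_PER_PERSON = 500    # the text's illustrative figure
ERROR_RATE_PERCENT = 5          # the text's supposed error rate

total_points = POPULATION * DATA_POINTS_PER_PERSON
bad_points = total_points * ERROR_RATE_PERCENT // 100
bad_points_per_person = DATA_POINTS_PER_PERSON * ERROR_RATE_PERCENT // 100

print(f"total data points:     {total_points:,}")
print(f"erroneous data points: {bad_points:,}")
print(f"erroneous per person:  {bad_points_per_person}")
```

Put another way, under these assumptions the average person's file would
carry about 25 bad data points in every snapshot.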

Let's suppose there are 1,000 terrorists hiding out in the USA, waiting to
strike, which I personally think is a greatly exaggerated number. We know
from the actions taken on 9/11 that these people are fairly cunning. They
know how to hide from the system and how to hide in plain sight. They pay
in cash, or they buy what they need by proxy, and they don't act any
different than anyone else. Like the millions of illegal immigrants in the
US, terrorist operatives are good at using social networks to "fly below
the radar" and subvert the system. One thousand people is a lot, but 1,000
out of 300 million is 3.33 * 10^-6, or .000333%. In other words, TIA would
be looking for a minuscule fraction of 1% of the population in their
database, the exact people who are going out of their way to escape
detection. With an error rate of even 1%, detecting such a tiny fraction
would be impossible. You would not be able to separate the signal from the
noise, no matter what techniques were used. Pollsters run into this
problem every election season when the 'margin of error' rises to a level
greater than the projected differential between the candidates. 3% margin
of error in a race where the candidates differ by 1% is "too close to
call." The same problem exists for scanning all airport baggage, but that
is fodder for another day. The only way TIA would work is if some high
percentage of Americans were terrorists -- 20%, 50%, whatever. Only then
could there be enough comparison data in both sets to draw testable
conclusions from and be assured that those conclusions were not just random
error phenomena.
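
The base-rate problem in this paragraph can be made concrete with Bayes'
rule. The sketch below is only an illustration: it reads the "1% error
rate" as a 1% false-alarm rate and assumes, generously, that every real
target is flagged. Both readings are assumptions, as are the names used.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Share of flagged people who are real targets (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

POPULATION = 300_000_000
TARGETS = 1_000               # the text's deliberately generous guess
FALSE_POSITIVE_RATE = 0.01    # assumption: "1% error" read as false-alarm rate
SENSITIVITY = 1.0             # assumption: every real target gets flagged

prevalence = TARGETS / POPULATION
ppv = positive_predictive_value(prevalence, SENSITIVITY, FALSE_POSITIVE_RATE)
innocents_flagged = round((POPULATION - TARGETS) * FALSE_POSITIVE_RATE)

print(f"prevalence: {prevalence:.2e}")
print(f"innocent people flagged: {innocents_flagged:,}")
print(f"chance a flagged person is a target: about 1 in {round(1 / ppv):,}")
```

Under these assumptions roughly three million innocent people are flagged
alongside the 1,000 targets, so only about one flag in 3,000 points at a
real target.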

Let's look at this on a much smaller scale: Suppose the system worked well
enough each day to render a list of 10,000 people, one (1) of which is an
actual terrorist (unbelievably good odds for the government). The
government has a .01% probability of successfully picking the terrorist
each day (using this system alone). Could the FBI/CIA/NSA/whatever even
investigate 10,000 people with other techniques carefully enough each day
to locate the one terrorist? Could they do it in a month or a year? I
suppose the government could err on the side of caution and detain large
numbers of people, place them in custody, and hold them indefinitely
without due process until certain that they weren't terrorists. But that
action presents nightmarish logistical and humanitarian prospects. The US
prison population is bursting at the seams with an all time high of two
million. There would have to be enormous concentration camps for the
millions of suspected terrorists who would be detained until their
innocence is proven. That begs the question: Is it even possible to prove
you are innocent in the current legal climate? The Red Scare (and the more
recent FBI watch lists) has already taught us the folly of black lists and
unsubstantiated accusations.

Lastly, data mining as a useful technique has been thoroughly debunked. It
never lived up to its promises. This is why you don't hear much about data
mining in the CS and IS literature these days; what of it that is left has
morphed into the more esoteric "knowledge management" or KD. Like AI, it
turned out to be quite a bit more difficult to do than expected and has
been largely abandoned. Had anyone in the government actually bothered to
read any of the literature, they would already know this.

All in all, I can't see how TIA will do anything except harm innocent
people and create new jobs for bureaucrats. Any numerate person who spends
five minutes thinking about what is proposed will come to the same
conclusion. If our system is going to become this arbitrary, there are
going to be an awful lot of lives ruined in this country. I fail to see
how the TIA approach could do anything positive for the war on terror or
for America in general. It will eat up resources better spent on more
proven and acceptable approaches. In fact, such a data-driven approach
might actually be more successful if it simply took a random sampling of
the population each day.

My hope is that this editorial will awaken those who are even more skilled
in computer science, statistics, game theory, etc. and that they find the
courage to speak up so we can put the brakes on the wasteful and
destructive blind alley called TIA.

Benjamin Brunk

-------------------------------------------------------------------------
POLITECH -- Declan McCullagh's politics and technology mailing list
You may redistribute this message freely if you include this notice.
To subscribe to Politech: http://www.politechbot.com/info/subscribe.html
This message is archived at http://www.politechbot.com/
Declan McCullagh's photographs are at http://www.mccullagh.org/
-------------------------------------------------------------------------
Like Politech? Make a donation here: http://www.politechbot.com/donate/
Recent CNET News.com articles: http://news.search.com/search?q=declan
-------------------------------------------------------------------------
________________________________________________________________________


2. Replies to "What's so bad about Total Information Awareness?"

Date: Thu, 12 Dec 2002 23:07:22 -0500
From: Declan McCullagh <declan[at]well.com>
To: politech[at]politechbot.com
Reply to: declan[at]well.com

Other Politech messages:
http://www.politechbot.com/cgi-bin/politech.cgi?name=poindexter

---

To: declan[at]well.com
Subject: Re: FC: What's so bad about Total Information Awareness? by Ben Brunk
From: "Thomas A Giovanetti" <tomg[at]ipi.org>
Message-ID: <OF2821FA80.E4B93BD6-ON86256C8B.001EE99E[at]org>
Date: Mon, 9 Dec 2002 23:48:35 -0600

If Ben is so bright a researcher, he should know better than to make such a
glaring error in the first sentence of his post.

TIA is NOT authorized in the Homeland bill. It was authorized as a DOD
(Dept. of Defense) appropriation.

In fact, the Homeland bill contains an explicit provision to ban anything
like the TIA from ever being implemented.

And that's good.

Now we need to get TIA cancelled from the DOD budget.
_______
Tom Giovanetti
President
Institute for Policy Innovation (IPI)
http://www.ipi.org

---

Date: Tue, 10 Dec 2002 16:53:01 +1100
From: Nathan Cochrane <ncochrane[at]theage.fairfax.com.au>
Reply-To: ncochrane[at]theage.fairfax.com.au
Organization: The Age newspaper
To: declan[at]well.com
Subject: Re: FC: What's so bad about Total Information Awareness? by Ben Brunk

Hi Declan

Much of what Ben writes has merit. To paraphrase:

  1. Private companies use different, often incompatible technologies.
  2. There are usually several instances of the same person in different
    databases held in the same company.
  3. There is no easy way to capture an individual's virtual identity across
    multiple databases short of mandating use of a national ID card and number
    at every transaction.
  4. Lives could be ruined by poor use of information.

The drive by groups such as OASIS, Microsoft, IBM and Sun to deliver
eXtensible Markup Language (XML)-- a single rail gauge for online
information sharing -- makes linking systems easier, although it will be
several years before this really takes hold because taxonomies still have
to be ratified, software coded, systems migrated etc. But already law
enforcement is looking at this area closely as a way to easily find the
people and legal documents it is looking for.

Legal XML lawful intercept technical committee announcement
http://lists.oasis-open.org/archives/tc-announce/200211/msg00007.html
"If emergency or exigent conditions exist ... judicial issuance of an
authorizing instrument (warrant) can usually be altered by the LEA (law
enforcement agency) using another instrument coupled with a posteriori
judicial or administrative action."

MORE:
Legally speaking, it's a brief transition
http://www.theage.com.au/articles/2002/12/09/1039379779706.html

Just because an investigator can't be 100 per cent certain a particular
identity is the one s/he is looking for, doesn't mean they can't be more or
less sure, at least so far as to continue an investigation. This raises a
bigger question, how much information will be winnowed out, and what
processes will exist to maintain privacy during this phase? By their very
existence, these sorts of fishing expeditions are harmful to a free society.

Governments around the world already have national id card and number
systems. In Australia it is the tax file number for individuals and BAS
number for business. You can't transact without using these numbers, all of
which is fed into government systems accessible by LEA here and in the US.
In the US there is a drive to do the same thing with drivers' licenses. A
single unique key is not necessary when you have a range of keys that can,
in unison, provide a high level of confidence.
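
The point about several weak keys acting in unison can be put in numbers. A
minimal sketch in Python, with made-up coincidence rates and an independence
assumption that real data would not fully satisfy (surnames and postcodes,
for instance, are correlated); the field names are hypothetical.

```python
from math import prod

# Hypothetical chance that two *different* people agree on a field by accident.
COINCIDENCE = {
    "surname": 0.01,
    "birth_date": 1 / 20_000,
    "postcode": 0.001,
}

def chance_of_coincidental_match(agreeing_fields):
    """Probability that unrelated records agree on every listed field,
    assuming the fields are independent (an assumption for illustration)."""
    return prod(COINCIDENCE[field] for field in agreeing_fields)

print(chance_of_coincidental_match(["surname"]))
print(chance_of_coincidental_match(["surname", "birth_date", "postcode"]))
```

Each field alone is a poor identifier, but the chance of an accidental
agreement on all of them shrinks multiplicatively, which is why no single
unique key is needed for a confident match.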

DARPA is moving ahead with its plans to fund TIA. A few hours ago I spoke
with a member of the executive management team of supercomputer maker Cray
Inc. Cray is one of five companies each receiving $US3 million to fund a
feasibility study into developing a petaflop computer. Big applications for
this sort of computer are to track in real time the movements of people,
understand how biochemical agents spread in populations during
bioterrorism attacks, break complex crypto and trawl through signal streams
using semantic forests to find patterns.

Semantic forests article by Suelette Dreyfus
http://www.underground-book.com/articles/CyberWireDispatch-1999-11-30-Semantic-Forests.php3

And just because a failed implementation would destroy an innocent's life
is no reason for a government not to do it. The authorities would see that
an arrested, imprisoned or executed innocent is a small price to pay for
continued national security and the lives of millions, or the interests of
a select few.

---

Date: Tue, 10 Dec 2002 10:42:28 -0800 (PST)
From: Ben Polen <benpolen[at]yahoo.com>
Subject: Re: FC: What's so bad about Total Information Awareness? by Ben Brunk
To: declan[at]well.com

Declan,

Reading Ben's post reminded me of Terry Gilliam's movie
"Brazil." In the noir sci-fi flick, the government is
basically running its own version of TIA and a bug in the
systems (literally, a fly interferes with the operation of
a typewriter) leads to the arrest and prosecution of the
wrong man. It's quite an amazing movie overall, but the dire
warnings about a surveillance society (and a powerful state
supporting it) are even more important now, in our USA
Patriot/TIA/Homeland Security world. The director's cut of
"Brazil" is worth another viewing for all Politechnicals.
Seriously, don't even bother with the edited version.

-Ben

PS feel free to post this if you do a follow up to Brunk's

---

Date: Mon, 09 Dec 2002 21:29:50 -0800
To: declan[at]well.com, politech[at]politechbot.com
From: Lizard <lizard[at]mrlizard.com>
Subject: Re: FC: What's so bad about Total Information Awareness? by Ben Brunk

At 08:57 PM 12/9/2002, Declan McCullagh wrote:

>---
>
>Date: Mon, 09 Dec 2002 22:34:13 -0500
>From: Ben Brunk <brunkb[at]ils.unc.edu>
>To: declan[at]well.com
>Subject: Debunking TIA
>
>Declan,
>
>
>
>Many of these sources of information are private databases owned and
>maintained by the corporations that rely on them. Even if they were all
>implemented in say, Oracle, it would be difficult to match up records to
>any reliable degree. Who knows if the John Poindexter in one database is
>the same as Jon Pointdexter in another?

Bingo.

Ever see 'Brazil'?

Tuttle, Buttle, what's the difference?

The thing is, no one is going to do a rational analysis and say "This can't
work." If they do, they'll be ignored. Government isn't about doing things
that work. Government is about looking like you're doing something. Simply
honestly saying "There is nothing you can do to stop a determined madman
from killing innocent people. Period. That's the price you pay for freedom.
When people say that the tree of liberty must be watered in blood, they
don't just mean the blood of those who volunteered for the job. A free
society is one in which there is danger. Deal with it, or move to North
Korea." will not get you re-elected. Promising false safety, out-and-out
lying, will get you re-elected, by a wide margin.

Nothing can stop the 'Homeland Security' juggernaut, because of the nature
of politics. We'll just have to wait for the next revolution.

---

Date: Tue, 10 Dec 2002 00:10:05 -0800 (PST)
From: Marc Hedlund
To: Declan McCullagh <declan[at]well.com>
Subject: Re: FC: What's so bad about Total Information Awareness? by Ben
Brunk

Declan,

The criticism I would make of Total Information Awareness (TIA) and
the Department of Homeland Security (DHS) in general is that they are
aggressively centralized solutions to an aggressively decentralized
problem. I would feel better about our government's efforts to fight
terrorism if I heard much more discussion of decentralized solutions,
and an economic and organizational plan that blended centralized and
decentralized approaches to the problems of terrorism.

The vast majority of discussion around government response to 9/11 has
framed the question as, "How can we change the Federal government to
prevent terrorist attacks?" The DHS is a Federal entity composed of
existing Federal entities. Its efforts, and likewise the Pentagon's
TIA proposal, have (in public discussion at least) been described as
aiming to ensure information is shared between sources, analyzed at a
single desk, and acted upon by a central enforcement agency. In other
words, these efforts aim to centralize information about potential
terrorist acts.

Certainly these are approaches worth using. The INS sending Mohammed
Atta a letter to his Florida address months after 9/11 can only
provoke a wish for a better head on the shoulders of our national
bureaucracy. But do we really believe that terrorists -- who
presumably have heard about the DHS -- will act in the future in any
way that would trigger DHS or TIA attention?

We know these terrorists are determined and willing to spend enormous
time and resources preparing a plan. Terrorist groups, we're told,
plant "sleeper cells" in our country years before an intended attack,
and these cells work strenuously to avoid detection or contact with
other cells. Assume that we go ahead with a TIA-type program, or even
just the DHS as planned, and that we are now able to monitor and
correlate border entries, large cash transfers, anomalous airline
ticket purchases, and whatever other data might alert a central
authority of terror plans. Does this really prevent terrorism? Do we
believe that no terrorist could ever enter the country without
creating a record, bring gold or drugs or something else to convert to
cash on the black market, buy a round-trip ticket rather than a
one-way ticket, and so forth? It seems obvious that even if
centralized data collection, analysis, and response help the problem,
they certainly do not solve the problem. A determined attacker -- as
the 9/11 attackers certainly were -- will do what it takes to avoid
TIA triggers.

Furthermore, is it really the best thing for the country for the FBI,
the CIA, and now the DHS to focus so intently on preventing terrorism
from Washington? I was taken aback to read in the November 21st New
York Times that

...the [FBI]'s commitment to nonterrorism cases that were once
staples of the bureau dropped significantly in the months after the
Sept. 11 attacks. The number of agents working narcotics cases
dropped 45 percent, bank fraud cases dropped 31 percent and bank
robbery investigations dropped 25 percent, according to the Justice
Department figures, even though the number of reported crimes in
some cases went up.

I can only wonder what has happened to the CIA in parallel. The FBI
existed for good reason prior to 9/11 -- fought serious and difficult
crimes prior to 9/11 -- and yet it is now being criticized roundly for
not dropping its earlier priorities more quickly and completely.
(Senator Charles Grassley of Iowa was quoted in the same article as
saying, "Old habits die hard at the FBI.") We are debilitating the
prevention of crimes that not only still occur, but are increasing.
Who will take up fighting these crimes if not the FBI? Probably state
and local law enforcement.

Let's look at that for a moment. Prior to the Millennium celebrations,
a truck filled with bomb-making equipment was stopped at a ferry
crossing in Port Angeles, Washington, and this probably prevented a
serious attack. While the person who stopped the truck was a Federal
employee (a Customs Inspector), the reason for the stop was not a
centralized database nor an alert from a centralized agency. Instead,
the driver was stopped because he seemed suspicious. An individual
acted on a hunch, investigated, and stopped an attack. We should
learn from this, and we're not.

Rather than centralizing, another approach to fighting terrorism would
be to concentrate resources on training local law enforcement officers
how to better spot and combat terrorism; that is, how to be more like
the Port Angeles Customs Inspector. Rather than sucking all possible
data sources into the Pentagon or the DHS, we could distribute
knowledge to the local -- far more numerous -- law enforcement
resources who are far more likely to be able to prevent terrorism. How
do you interview someone seeking admission to the country, or to a
sports arena? What are the signs of lying that may be visible in
facial expressions or demeanor? What set of purchases might signal an
attempt to build a bomb? What are the little details a
carefully-trained eye might be able to piece into detection of a
terrorist? This is what I mean by a decentralized approach. Move the
effort to the more massive, more distributed, more intuitive body of
law enforcement coming into daily contact with the same terrorist
cells trying so hard to look normal. If sleeper cells lie dormant for
years, local police will very likely encounter at least one member of
the cell in that time. Don't we want those police officers to know
what questions to ask that might detect the cell?

We could be taking this approach, but we're not. We could be
improving the ability of local law enforcement to detect terrorism --
but instead we're degrading that ability, since we're shifting the
FBI's traditional crime-fighting work onto local resources. The one
method that has actually prevented a terrorist attack on US soil is
not being used, and is instead being inhibited. We are focusing on
centralizing intelligence and resources when instead -- or at least in
addition -- we should be decomposing, distributing, decentralizing.

I'm not suggesting, obviously, that the Federal government has no
role, nor a minimal role. Watch lists and signals intelligence and
data warehousing almost certainly are key tools for fighting
terrorism. But before we go too far in creating (or trying to create)
a grand unified database of all electronic transactions, maybe we
should think first about whether this is a problem best solved by
brute force data analysis, or a smart cop on the street.

Marc Hedlund
e: marc at precipice dot org

---

From: "Carrick Mundell" <carrick[at]multispatial.com>
To: <declan[at]well.com>
Subject: RE: What's so bad about Total Information Awareness? by Ben Brunk
Date: Tue, 10 Dec 2002 08:45:10 -0800

Declan,

Ben Brunk really spells it out. If the probability of finding a terrorist
using TIA is practically nil, then the system must be going to be used for
other purposes, namely, domestic spying. By increasing the size of the
target (e.g. libertarians, liberals, privacy hawks, greens, pro choicers,
Democratic Party donors, persons-we-hate, and, oh yeah, terrorists) maybe
TIA will prove more useful. What's so bad about Total Information
Awareness? Everything.

-Carrick Mundell

---

Subject: RE: What's so bad about Total Information Awareness? by Ben Brunk
Date: Tue, 10 Dec 2002 08:59:19 -0800
From: "Ron Schweiger" <Schweig[at]SRCSoftware.com>
To: <declan[at]well.com>
Content-Transfer-Encoding: 8bit

Benjamin is missing one little point that TIA will be widely successful
at, which is monitoring ordinary Americans. With a 5% error rate they
will know exactly what 95% of every American is doing at any given time!

Ron

---

Date: Tue, 10 Dec 2002 19:09:45 -0800 (PST)
From: Sascha Goldsmith <saish[at]yahoo.com>
Subject: Re: FC: What's so bad about Total Information Awareness? by Ben Brunk
To: "Christopher A. Petro" <petro[at]christopherpetro.com>,
    Declan McCullagh <declan[at]well.com>

"I am SHOCKED, shocked to find gambling in this establishment"

"Sir, your winnings..."

- Casablanca

CP!!!

I thought you were the leading "privacy/individual rights/get 'yer
publicly-funded mitts off my data" individual I knew!!!

That having been said, I agree with a lot of what you had said, with a few
caveats. (God, this sounds a lot like our discussions at work, n'est-ce pas?)

First, I think you are right. There is plenty of low-hanging fruit. (I
cannot help but wonder if your current vocation makes you more entreated to
security collection than your last, but that is only a supposition). The
point is: you are right.

However, as a drug-loving, freedom-loving, felony-avoiding individual, I
cringe. We have no privacy. Live it, but don't love it. And for God's
sake, don't encourage it. This leads me to my caveat.

I fully and totally, without reservation, back the establishment of a
British-like MI5 organization in this country. They have statutory
limitations. They have a charter, a mission. And they do it well. It
took dozens of IRA bombings to lead to its inception, but the institution
has adapted and learned and works. We can leverage their decades of
experience, and coupled with our similar traditions, the experiment should
work.

Here is why: call me a nut, call me a cashew. But I fully believe that
the FBI has been, is, and will always be unsuited for intelligence. The
duties of prosecution and espionage have significantly different
attributes. Let's not dilute the FBI so it does both poorly.

With a newly funded department, focused on a singular mission, their powers
to use the data (i.e. pool it with the DEA, IRS, FBI, etc.) will be limited
by statute. However, their ability to pool information on terrorists (how
that is decided is a tricky issue, but at least you have a separation)
should be FULLY exercised in the manner your email eloquently
describes. Pool databases, tap into corporate records, share information
with the DEA, IRS, FBI, CIA, NSA, DIA and any other TLA they need to.

All I want, as a libertarian, is a "separation of powers". In the most
gracious nod to the founders I can muster, let's separate in a statutory,
congressional and judicial way, the powers afforded to the aforementioned
entities and the newly created US-MI5. (Hell, if we could get James Bond,
I would sleep BETTER at night!)

So, in general, I agree with you. But with the abject failure of
aforementioned institutions to respect their jurisdictions, their hoarding
of information from other agencies, not to mention the abject failure to stop
9/11, let's start from scratch. Let's protect ourselves with an agency
that is ONLY dedicated to that purpose. I'm not talking about Homeland
Security. I'm talking about the tech of the NSA, the guile of the CIA, the
resources of the DIA, and a whole lot more nefarious to boot. (Let THEM
fear the Hellfire missiles from the Predators or the idea of being tapped,
not me).

Getting off my soapbox,

Saish

________________________________________________________________________


3. Data miner replies to Politech, says TIA can ID terrorists

Date: Tue, 10 Dec 2002 15:04:40 -0500
From: Declan McCullagh <declan[at]well.com>
To: politech[at]politechbot.com
Cc: Abraunberg[at]currentanalysis.com, editor[at]kdnuggets.com
Reply to: declan[at]well.com

Previous Politech message:
"What's so bad about Total Information Awareness? by Ben Brunk"
http://www.politechbot.com/p-04234.html

Gregory Piatetsky-Shapiro's bio is here:
http://www.kdnuggets.com/gps.html

---

From: "Braunberg, Andrew" <Abraunberg[at]currentanalysis.com>
To: "'Declan McCullagh'" <declan[at]well.com>
Cc: "'KDnuggets Editor'" <editor[at]kdnuggets.com>
Subject: FW: Can TIA work or how can you separate bad coins from good ones?
By repetition
Date: Tue, 10 Dec 2002 14:52:45 -0500

Declan,

I follow the data mining industry fairly closely and am also a reader
of Politech. I passed on Ben's TIA concerns to Gregory Piatetsky-Shapiro, a
well known expert in the industry. His response follows. He is happy to
have you post it to Politech if you desire.

Best,

Andrew Braunberg
Senior Analyst,
Data Warehousing
Current Analysis

abraunberg[at]currentanalysis.com
912/236-6912

"Never express yourself more clearly than you think."
--Niels Bohr

-----Original Message-----
From: KDnuggets Editor [mailto:editor[at]kdnuggets.com]
Sent: Tuesday, December 10, 2002 12:16 PM
To: Braunberg, Andrew
Cc: editor[at]kdnuggets.com; Farhad Manjoo
Subject: Can TIA work or how can you separate bad coins from good ones? By
repetition

Andrew,

thanks for the note.

There are serious questions about whether TIA can work and how much privacy
it will erode.

There is no doubt that TIA will produce some false positives.

However, the statistical analysis given by Ben Brunk is very naive and
shows a lack of understanding of how the system might work. Data mining as a
useful technique has not been debunked -- all large companies are using it
every day.

The whole idea of finding patterns is that with enough history and
repetition the suspicious patterns will stand out, despite noise and errors.

Imagine that you have a thousand coins and that two of them are crooked
(i.e. the probability of heads is not 1/2 but 1/4).

If you flip each coin once, you cannot determine which ones are crooked.

If you flip each coin twenty times, crooked coins will have about 3-7
heads, but a few dozen "false positive" honest coins will also have about 7
heads.

If you flip each coin a thousand times, crooked coins will have about 230 -
270 heads, while honest coins will have 480 to 520 heads. So with a rule --

number of heads < 350

you will catch all crooked coins and no honest coins.

How many times you will need to flip each coin to find at least one crooked
coin?

That depends on the level of certainty you want, the number of crooked coins,
and how crooked each coin is.
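
[The coin example above can be checked with exact binomial arithmetic. The
sketch below is an illustration only and is not part of the original
message: it assumes 998 honest coins (heads 1/2) and 2 crooked coins (heads
1/4), and a rule of the form "flag any coin with fewer than THRESHOLD
heads." The function names, and the threshold of 8 used for the
twenty-flip case, are made up for the example.]

```python
from fractions import Fraction
from math import comb

HONEST_COINS = 998   # assumption: 1000 coins minus the 2 crooked ones
CROOKED_COINS = 2

def prob_heads_below(n_flips, p_heads, threshold):
    """Exact P(number of heads < threshold) for a coin with P(heads) = p_heads."""
    p = Fraction(p_heads)
    q = 1 - p
    upper = min(threshold, n_flips + 1)  # at most n_flips heads are possible
    total = sum(comb(n_flips, k) * p**k * q**(n_flips - k) for k in range(upper))
    return float(total)

def rule_outcome(n_flips, threshold):
    """Return (expected crooked coins caught, expected honest coins flagged)."""
    caught = CROOKED_COINS * prob_heads_below(n_flips, Fraction(1, 4), threshold)
    flagged = HONEST_COINS * prob_heads_below(n_flips, Fraction(1, 2), threshold)
    return caught, flagged

# Twenty flips with a hypothetical rule "heads < 8", then a thousand flips
# with the rule "heads < 350" from the message.
for flips, threshold in [(20, 8), (1000, 350)]:
    caught, flagged = rule_outcome(flips, threshold)
    print(f"{flips} flips, heads < {threshold}: "
          f"crooked caught ~ {caught:.3f}, honest flagged ~ {flagged:.3g}")
```

[With few flips the rule flags far more honest coins than crooked ones;
with many flips the two groups separate almost completely, which is the
point of the example. Real behavioral data is unlikely to be as clean as
independent coin flips, so this says nothing about how well the separation
would carry over.]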

Applying this to terrorists, if there is ONE terrorist that does ONE thing
that is a LITTLE suspicious, no system can find it.

However, if there are MANY terrorists that do MANY things that are STRONGLY
suspicious, the system will find them.

How many are needed? We don't know, but that is one of the questions that
TIA wants to investigate.

Of course the real system will be using much more complex reasoning than
what I presented above.

Gregory Piatetsky-Shapiro

President, KDnuggets

The Source of Expertise in

Data Mining and Knowledge Discovery

At 08:30 AM 12/10/2002 -0500, you wrote:

>>>>

Gregory,

Good morning. I thought you would find the attached analysis of TIA
interesting. It comes from the Politech mailing list which is moderated by
Declan McCullagh, who was until recently Washington bureau chief of Wired
and is now at Cnet. The list is very well respected in the civil liberties
community. I thought you might have some unique insight or might wish to
pass the discussion on to your wider readership.

Best Regards,

Andrew Braunberg

________________________________________________________________________top


4. More on data mining, TIA, and how to ID terrorists

Date: Thu, 12 Dec 2002 23:07:35 -0500
From: Declan McCullagh <declan[at]well.com>
To: politech[at]politechbot.com
Reply to: declan[at]well.com

Other Politech messages:
http://www.politechbot.com/cgi-bin/politech.cgi?name=poindexter

---

From: "Werberg, Sam" <swerberg[at]findsvp.com>
To: "'declan[at]well.com'" <declan[at]well.com>
Cc: Abraunberg[at]currentanalysis.com, editor[at]kdnuggets.com
Subject: RE: Data miner replies to Politech, says TIA can ID terrorists
Date: Tue, 10 Dec 2002 15:34:33 -0500

Having followed the data mining space, I agree that it has its usefulness,
but the question that Gregory hints at but leaves unanswered is critical:

"How many times you will need to flip each coin to find at least one crooked
coin?"

In other words, how many law-abiding citizens will need to be "flipped", or
have their lives turned over, in order to find the terrorist?  How many is
too many?

Yes, the purpose of the TIA experiment is to find the answer to this, but
considering that the person in charge of it (Poindexter) was "convicted of
conspiracy, lying to Congress, defrauding the government, and destroying
evidence", can we trust that any conclusions he provides will be valid or
factual?

Sam Werberg

---

Date: Tue, 10 Dec 2002 14:10:05 -0700
From: "John W. Durham" <johnwdurham[at]acm.org>
Subject: Re: FC: Data miner replies to Politech, says TIA can ID terrorists
To: declan[at]well.com

Of course, the problem with Mr. Piatetsky-Shapiro's argument is that
there's no real assurance that the government will actually employ data
mining or will, instead, simply sift the "data" for whatever seems
expedient or interesting at the time of the search. Since "intelligence"
work is usually done under some time pressure, I find it hard to believe
that rigorous methodology will be used.

Remember, too, that we now have a government controlled by people who
argued successfully against the foundations of statistics in the court
cases relating to the use of sampling by the Census Bureau.

Not, of course, that politicians would ever have access to the TIA
program.

---

Please anonymously post my reply to Gregory Piatetsky-Shapiro.

xxx

Yes, this is an oversimplification!

Let's start with the fact that there are 280 million people, not a
thousand. Two out of a thousand works out to 560,000 bad guys. They say
we are only looking for 3000 or so, but how would they KNOW? And what are
these bits of information anyway? They are subjective things like cash
withdrawals and credit card purchases, not Boolean bits, and you want a
million transactions for 280 million people? Oh, and you need to do all
this in real time if you want to catch "pre-crime." Oh, you imply they can
catch every one? Ha!

It seems that the FBI et al had enough information so that a lowly agent
had the foresight to predict that a plane might be used to crash into a
building. Is there going to be any bureaucrat control in the TIA, or will
it be run like the Gestapo?

In this group there was a report that face recognition cameras couldn't
accurately spot the employees at a test airport using willing test
subjects. What happens when most of the evidence is analog? What happens
when the "bad guys" never committed any crime? If pre-perp prays to Allah 3
times a day for 50 years, would he get 55,000 bad guy demerits? If he reads
Arabian Nights instead of Mein Kampf, does he get a Big Brother knock on the
door?

How can the government not sift through -everyone's- personal data in
real time to find the alleged bad guy, who has never done anything bad, and
how do you KNOW he/they were going to do something anyway? If someone has
too many matches is he an arsonist, too many guns a revolutionary? What
programmers are going to write this insightful program to predict crime
that hasn't happened unless they know the bad guy's crime already? No, you
are going to spy on farmers who buy too much fertilizer and people who
decide to turn in their credit cards and withdraw an unusual amount of cash
because they are not going to participate in the sacking of the Bill of
Rights.

I am afraid if this continues we will have a default tyranny, one that is
capable of doing bad things even if it hasn't done them yet. Does that sound
familiar? Thomas Jefferson said it's our DUTY to overthrow tyranny. But what
if we can't, because every bit of communication data is linked and
scrutinized so WE THE PEOPLE would be completely unable to form peaceful
protests out of fear, or effective rebellion because of the sheer power of
the state's spy network?

I think you data miners are dangerous. Does that get me put on the watch
list? If you successfully create this universal spy network, the result
will be the creation of a totalitarian ant pile. In the meantime, you will
be creating the terrorists you are looking for, only once upon a time we
called them patriots.

---

Date: Tue, 10 Dec 2002 15:33:49 -0500
From: jeff <jeff[at]newcitynet.com>
Reply-To: jeff[at]newcity.com
Organization: newcity
To: declan[at]well.com
Subject: Re: FC: Data miner replies to Politech, says TIA can ID terrorists

I am happy to read the reasoned responses below by Ben Brunk and Gregory
Piatetsky-Shapiro.

There are 3 issues outside of their scope:

  1. TIA used for partisan purposes, such as investigating adulterous
    politicians of only one party, or to intimidate and track small
    environmental organisations who are opposing an oil well project owned
    by a close political ally of an authority.
  2. A cash-only lifestyle... if they're smart, terrorists are
    stockpiling stuff now... thus rendering a false sense of security and a
    Maginot-line mentality among authorities.
  3. Industrial intelligence gathering... patterns may emerge from sifting
    through data that might benefit company X to know about the
    import/export/financial records of foreign supplier Y so that company X
    can push harder for a price reduction, etc.... this sort of info might
    be tempting for authorities to supply to politically connected
    businesses in return for lots of support.

-Jeff C
chicago

---

To: declan[at]well.com
Cc: brunkb[at]ils.unc.edu
Date: Mon, 9 Dec 2002 22:44:41 +0000
Subject: Re: FC: What's so bad about Total Information Awareness? by Ben Brunk

What Mr. Brunk doesn't make explicit, but
probably knows, is that TIA is simply an excuse
to get their hands on every database in the world.
The people who control the operation will simply
look up what they want about whoever they have
predetermined is suspect, according to their
political agenda.

--GJ

---

From: Amos Satterlee <asatterlee[at]inta.org>
To: "'declan[at]well.com'" <declan[at]well.com>
Subject: RE: Data miner replies to Politech, says TIA can ID terrorists
Date: Tue, 10 Dec 2002 17:18:38 -0500

Declan:

Fascinating point/counterpoint. I think, however, that Brunk's central
concern is not addressed by Piatetsky-Shapiro, to wit:

  1. Intelligent analysis of business behavior or buying patterns occurs
    within reasonably definable contexts. TIA can't limit itself to a
    particular context because the terrorist then acts outside the context.
  2. Piatetsky-Shapiro is correct that increasing the number of trials reduces
    the incidence of false positives. Brunk's point is that the terrorist acts in
    such a way as to try to reduce the number of trials of a particular pattern of
    behavior.
  3. If there is no context other than Everything, Brunk's point is that the
    time, effort, manpower, and invasiveness needed to analyze the data that
    intends to be unpatterned is so high as to be unacceptable and the
    unintended consequences are shocking.

Amos

---

Date: Tue, 10 Dec 2002 17:42:30 -0500
To: declan[at]well.com
From: Stephen Cobb <scobb[at]cobb.com>
Subject: Re: FC: What's so bad about Total Information Awareness? by
   Ben Brunk
In-Reply-To: <5.1.1.6.0.20021209235549.0161eec0[at]mail.well.com>

At 12/9/2002 11:57 PM -0500, you wrote:
>Date: Mon, 09 Dec 2002 22:34:13 -0500
>From: Ben Brunk <brunkb[at]ils.unc.edu>
>To: declan[at]well.com
>Subject: Debunking TIA
>
>My hope is that this editorial will awaken those who are even more skilled
>in computer science, statistics, game theory, etc. and that they find the
>courage to speak up so we can put the brakes on the wasteful and
>destructive blind alley called TIA.
>
>Benjamin Brunk

Back when Safire wrote his column on TIA my comment was "The reality is
that no government agency could possibly link more than two databases in
under ten years for just $200 million." (11/14/02)

Maybe I should have said it louder, with more credentials :-)

Stephen

---

Date: Tue, 10 Dec 2002 19:45:49 -0500
From: "J.D. Abolins" <jda-ir[at]njcc.com>
Subject: Re: FC: What's so bad about Total Information Awareness? by
   Ben Brunk
In-reply-to: <5.1.1.6.0.20021209235549.0161eec0[at]mail.well.com>
To: declan[at]well.com

On Mon, 2002-12-09 at 23:57, Declan McCullagh wrote:
 > Date: Mon, 09 Dec 2002 22:34:13 -0500
 > From: Ben Brunk <brunkb[at]ils.unc.edu>
 > To: declan[at]well.com
 > Subject: Debunking TIA
 >
[...]
 > All in all, I can't see how TIA will do anything except harm innocent
 > people and create new jobs for bureaucrats.  Any numerate person who spends
 > five minutes thinking about what is proposed will come to the same
 > conclusion.  If our system is going to become this arbitrary, there are
 > going to be an awful lot of lives ruined in this country.

Exactly so!

If TIA works in linking patterns to something useful, the errors will be bad
news. If TIA doesn't work for any useful anti-terrorism purpose, it will
be bad.

Whether or not a system like TIA works is not as critical as the belief
by its users that it works.

Recently, I got a good laugh from a Wall Street Journal article, "My
TiVo Thinks I'm Gay, How to Set It Straight." The article described the
grossly mistaken customer preferences generated by TiVo and Amazon.com
routines. (Amazon.com's customer suggestions pegged a gay man as a
"pregnant gay man" after he bought some books on pregnancy to give to a
friend.) What I'm not laughing about is the prospect that TIA thinks I'm
a Middle Eastern terrorist if I bought some couscous and browsed some
Middle Eastern Web sites. <grin> Then I'd have to buy some canned hams
(which I wouldn't eat but give to a soup kitchen) to adjust the profile.
But it is likely that somebody with TIA development would try to catch
TIA hacking attempts and peg erratic profile changes as suspicious.
<grin and groan>

In real life, however, I believe that the likely government response to the
occasional rousting of innocent people, perhaps with some rough
treatment, is "They should thank us for being so diligent in following
up possible terrorism leads instead of grumbling about the SWAT fellow's
boot on the back of their necks."

I hope to come back to Mr. Brunk's good comments about the
complexities and difficulties of mining data in a later email. Even
though there are many problems, the perception (a shaky one) is still
being sold as a great business tool.

J.D. Abolins

---

Date: Tue, 10 Dec 2002 22:33:11 -0500
To: declan[at]well.com
From: Stephen Cobb <scobb[at]cobb.com>
Subject: Re: FC: Data miner replies to Politech, says TIA can ID
   terrorists
In-Reply-To: <5.1.1.6.0.20021210150041.021d9048[at]mail.well.com>

I'm sorry but this is a laughable defense. Personally, I see a lot of
technical problems with TIA long before you get to the statistics, but if
TiVo is any indication of the technology involved, we are in deep trouble...

"If TiVo Thinks You Are Gay, Here's How to Set It Straight
What You Buy Affects Recommendations
On Amazon.com, Too; Why the Cartoons?
By JEFFREY ZASLOW
Staff Reporter of THE WALL STREET JOURNAL
http://online.wsj.com/article/0,,SB1038261936872356908,00.html
Basil Iwanyk is not a neo-Nazi. Lukas Karlsson isn't a shadowy stalker.
David S. Cohen is not Korean.

But all of them live with a machine that seems intent on giving them
such labels. It's their TiVo, the digital videorecorder that records
some programs it just assumes its owner will like, based on shows the
viewer has chosen to record."

Stephen

________________________________________________________________________top


5. EFF's Brad Templeton and Norm Singleton on TIA's true threat

Date: Thu, 12 Dec 2002 21:02:41 -0500
From: Declan McCullagh <declan[at]well.com>
To: politech[at]politechbot.com
Reply to: declan[at]well.com

Other Politech messages:
http://www.politechbot.com/cgi-bin/politech.cgi?name=poindexter

Previous Politech message:
http://www.politechbot.com/p-04234.html

---

From: "Singleton, Norman" <Norman.Singleton[at]mail.house.gov>
To: declan[at]well.com
Subject: RE: What's so bad about Total Information Awareness? by Ben Brunk
Date: Tue, 10 Dec 2002 08:52:16 -0500

good analysis except he misses how this system will be used to further the
use of the SSN as a uniform identifier and further the war on cash, begun in
the freedom-loving Reagan administration as part of the war on drugs, in
order to more efficiently identify terrorists.

Norman Kirk Singleton
Legislative Director
Congressman Ron Paul
US House of Representatives
202-225-2831 (ph)

---

Date: Mon, 9 Dec 2002 23:58:24 -0800
From: Brad Templeton <brad[at]templetons.com>
To: Declan McCullagh <declan[at]well.com>
Cc: Ben Brunk <brunkb[at]ils.unc.edu>
Subject: Re: FC: What's so bad about Total Information Awareness? by Ben Brunk
Organization: http://www.templetons.com/brad

Why is TIA scary? First off, let me say that I wonder how real it is.
It's so scary that I can't quite figure out why its proposal is public.
The amount of money being asked for would normally be easily placed in
any agency's classified "black budget." I wonder if it's a diversion.

But if it's not a diversion, it surely is something to worry about.
The fact that corporations can't track us that well for their
credit card offers is no reason to feel safe. Indeed, that's what
Admiral Poindexter claims he will develop, a better system, with AI
techniques, that can do a better job. The cost of sending out a
magazine offer to the wrong John Smith is insignificant. The cost of
arresting the wrong John Smith until he can be cleared is immense.

The system need not be super accurate. All it will do is identify
the "suspicious" activity it finds and sort the identified leads by
some score. If agents have time to look at 200 leads, then the top
200 scoring leads will be investigated. They hope the terrorist
is in there -- if not, they are wasting the time of those agents.
In the meantime, 199, and probably 200, innocents are subjects of
investigations.
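
[A rough way to see why a lead list fills up with innocents is a base-rate
calculation. The sketch below is an illustration only and is not part of
the original message. The population of 280 million and the figure of
"3000 or so" targets are the ones quoted by the anonymous poster earlier in
this issue; the hit rate and false-alarm rate are made-up assumptions, as
is the function name. It also assumes the 200 leads are effectively a
random draw from everyone the system flags, i.e. that the score does not
rank real targets above flagged innocents.]

```python
def lead_list_summary(population, targets, hit_rate, false_alarm_rate, leads_checked):
    """Estimate how many investigated leads are real targets versus innocents.

    hit_rate: assumed fraction of real targets that the system flags.
    false_alarm_rate: assumed fraction of innocents that the system flags.
    """
    innocents = population - targets
    flagged_targets = targets * hit_rate
    flagged_innocents = innocents * false_alarm_rate
    flagged_total = flagged_targets + flagged_innocents
    # Share of flagged people who are real targets (0 if nobody is flagged).
    precision = flagged_targets / flagged_total if flagged_total else 0.0
    checked = min(leads_checked, flagged_total)
    return {
        "flagged_total": flagged_total,
        "precision": precision,
        "targets_in_leads": checked * precision,
        "innocents_in_leads": checked * (1 - precision),
    }

# Hypothetical rates: 99% of targets flagged, 0.1% of innocents flagged.
summary = lead_list_summary(
    population=280_000_000,
    targets=3_000,
    hit_rate=0.99,
    false_alarm_rate=0.001,
    leads_checked=200,
)
for name, value in summary.items():
    print(f"{name}: {value:,.3f}")
```

[Even with rates this generous to the system, flagged innocents outnumber
flagged targets by roughly a hundred to one, because innocents so heavily
outnumber targets to begin with. A score that ranked real targets higher
would improve the top of the list; how much is exactly what is in dispute.]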

And everybody's under surveillance. That has a deep cost, which I
detail in my short essay at www.templetons.com/brad/watched.html.
When we become afraid our every activity is watched, we are less free.

I agree, by the way, that terrorists will work to bypass such systems,
to not be in the top 200 leads. They may or may not succeed. However,
this does not appear to deter the intel agencies from wanting to try.
They feel their ass is in the fire if they do nothing. Our asses are in the
fire when they do too much.

> All in all, I can't see how TIA will do anything except harm innocent
> people and create new jobs for bureaucrats. Any numerate person who spends

This should not make you feel less afraid of TIA. Indeed, it's exactly
what to be afraid of.

________________________________________________________________________top


6. Links out on Total Information Awareness and privacy in general

-----

New Yorker
December 12, 2002
COMMENT
TOO MUCH INFORMATION
Issue of 2002-12-09
http://www.newyorker.com/talk/content/?021209ta_talk_hertzberg

-----

We'll All Be Under Surveillance
Computers Will Say What We Are
by Nat Hentoff
Village Voice
http://www.villagevoice.com/issues/0250/hentoff.php

-----

The Death of Operation TIPS
Our country's spying corps gets nixed
By Nat Hentoff
Village Voice
http://villagevoice.com/alertrd.php3?article=40587

-----

A speech by Poindexter on TIA
http://www.fas.org/irp/agency/dod/poindexter.html

-----

July 26, 2002
The Societal Costs of Surveillance
By MICHELE KAYAL, Honolulu
New York Times
http://www.nytimes.com/2002/07/26/opinion/26KAYA.html

-----

Popular Science article on a day in the privacy-life of a Chicagoite
http://www.popsci.com/popsci/science/article/0,12543,260388-1,00.html

-----

Make Sure You Are Privacy Literate
Karen Coyle
Library Journal, 10/1/02
http://libraryjournal.reviewsnews.com/index.asp?layout=article&articleid=CA245045

________________________________________________________________________top


L I B R A R Y J U I C E
-
| http://libr.org/Juice/
|
| Library Juice is supported by a voluntary subscription
| fee of $10 per year, variable based on ability and
| desire to pay. You may send a check payable in US funds
| to Rory Litwin, at 1821 'O' St. Apt. 9, Sacramento, CA 95814,
| or, alternatively, you may use PayPal, by going to:
| https://www.paypal.com/xclick/business=rlitwin%40earthlink.net
|
| To subscribe, email majordomo[at]libr.org with the message
| "subscribe juice".
|
| To unsubscribe, email majordomo[at]libr.org with the message
| "unsubscribe juice".
|
| Other majordomo commands are available in the help file,
| which you can get by emailing majordomo[at]libr.org with the
| message "help".
|
| Original material and added value in Library Juice
| are dedicated to the public domain and may be copied
| freely with appropriate attribution; beyond that the
| publisher makes no guarantees. Library Juice is a
| free weekly publication edited and published by
| Rory Litwin. Original senders are credited wherever
| possible; opinions are theirs. If you are the author
| of some email in Library Juice which you want removed
| from the web, please write to me and I will remove it.
|
| Your comments and suggestions are welcome.
|
| Rory[at]libr.org