Reputation
Roger Dingledine, Michael J. Freedman, David Molnar, David Parkes, Paul Syverson
Section 1:
Introduction
Reputation
is a tool to predict behavior based on past
actions and characteristics. We use
reputation regularly in our daily lives ---
reputations of individuals, as when we
choose a physician; groups, as when we
decide that individuals above a certain age
can be trusted to purchase alcohol; and
collective entities, as when we decide
whether Ford is a company that will sell us
good cars.
Reputation
is based on linkability. When we can link
actions to an identity, and link actions by
that identity to other actions by that same
identity, then we can begin to make
predictions about the identity's future
actions. We distinguish between the problem
of verification (establishing identity and
tying it to a real-world entity) and the
problem of reputation (judging the behavior
of a given identity). This chapter deals
mainly with the latter, but we also tackle
some of the problems with verification that
impact reputation.
The concept
of reputation, and the associated themes of
building and assessing reputations, are not
in themselves new ideas. Reputation, often
signaled via a brand such as the New York
Times, is an important concept in
environments in which there is asymmetric
information. Newspapers are a good example
-- a reader must pay for the good before
evaluating the good, and the seller and
publisher likely have better information
about the quality of the product. Without the ability to brand a product, the newspaper market would likely spiral towards Akerlof's
"Market for Lemons", with high
quality products unable to differentiate
themselves from low quality products and
only low quality products surviving in the
long-term. Reputation provides what Axelrod
refers to as "the shadow of the
future", with participants in a system
considering the effect of current
misbehavior on future interactions, even if
future interactions are with different
people. Reputation systems (systems that
attempt to attach reputation to an identity
and make an ongoing assessment of that
reputation) promise to "unsqueeze the
bitter lemon" (Resnick et al. 2000) by
automating word-of-mouth reputation for
electronic networks.
While
reputation itself is not new, we now have
the ability to provide electronic
intermediaries, or reputation systems, that
efficiently collect, aggregate, and
disseminate reputation information. The
ability to automate traditional
word-of-mouth reputation usefully
complements the new opportunities provided
by electronic markets such as eBay.
Well-functioning reputation systems enable
efficient electronic markets, allowing
geographically dispersed and pseudonymous
buyers and sellers to trade goods, often in
one-time transactions and with payments made
before the physical delivery of the goods.
Reputation systems are used on news and
commentary sites such as Slashdot to help
readers to filter and prioritize information
from multiple, often unmoderated, sources.
Dellarocas (2002) notes that the appeal of
electronic reputation systems is that they
can facilitate cooperation without a need
for costly enforcement; the cheap collection
and dissemination of feedback information
can replace the roles of litigation and
enforcement in preventing the abuse of
information asymmetries.
We are all
familiar with reputations in the physical
world, but using the same approaches online
introduces some pitfalls. Many of our social
norms were established when we were
connected to a relatively small number of people, but the Internet connects us to millions of people. Because our preexisting notions of how to build trust do not apply as well to this more dynamic world, traditional reputation approaches allow some new attacks. Still, the Internet provides us
with a staggering amount of raw data about
preferences, behavior patterns, and other
details, if only we could rely on it. To
build a good reputation system, we must find
ways to use these data without naively
believing in their quality. In particular,
online reputation systems must consider
several issues:
- How
broad is the context we're trying to
build a reputation for? Are we trying to
predict how well a given person will
perform at a very specific task (e.g.,
delivering a letter to the post office),
or are we trying to predict his
performance at a broad range of tasks
such as delivering letters, running a
company, and babysitting our kids?
- What's
at stake? Are we talking about
predicting whether he'll put a stamp on
his envelope, or whether he'll betray
our government's secrets?
- Can we
tie people to their online identities,
or is it possible for people to create
many online identities that are not
obviously related to each other?
- Assuming
we've solved the issue of tying
identities to real people, how can we
measure reputation?
- Finally,
what algorithms can let us make accurate
predictions based on these reputation
values?
Unfortunately,
the field of reputation research is still
young, and there are no clear technological
solutions (now or on the horizon) robust
enough for most government transactions. In
this chapter we lay out the problems in
reputation theory and discuss the lessons we
can learn from existing models. Section 2
presents some fielded reputation systems.
Section 3 generalizes some of the issues and presents some current roadblocks in reputation systems, and Section 4
presents some more roadblocks for the
specific context of pseudonymous systems.
In Section 5, we make some simplifying assumptions to
develop the notion of an "economy of
reputation." Reputation can be earned,
spent, and traded --- after all, people
listen to somebody who has a good reputation
when he recommends somebody else.
Furthermore, reputation entails costs: if
you make it hard to provide reputation
information, people won't bother, whereas if
you reward them in some way they'll give
more information. Reputation systems
motivate people in ways similar to monetary
economies: as a simple example, just as the
fear of losing money can inspire someone to
take preventative measures, so can a fear of
loss of reputation. And finally, people can
try to game the system by planting false
reputation, or buying and selling
reputation.
Section 6
finishes with some conclusions and further
thoughts.
Section 2:
Fielded reputation systems
Who will
moderate the moderators: Slashdot
The news site Slashdot.org is very popular and attracts a particular kind of "Slashdot reader" -- lots of them.
Each and every Slashdot reader is capable of
posting comments on Slashdot news stories,
and sometimes it seems like each and every
one actually does. Reputations based on this
interaction can help a user figure out which
of the many comments are worth reading.
To help
readers wade through the resulting mass of
comments, Slashdot has a moderation system
for postings. Certain users of the system
are picked to become moderators. Moderators
can assign scores to postings and posters.
These scores are then aggregated and can be
used to tweak a user's view of the posts
depending on a user's preferences. For
example, a user can request to see no posts
rated lower than 2.
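To make the mechanics concrete, here is a minimal sketch in Python of how a Slashdot-style system might aggregate moderator votes and filter by a reader's threshold. The data model, starting score, and field names are our own illustration, not Slashdot's actual implementation.

    # A minimal sketch of threshold filtering in a Slashdot-style moderation
    # system. The data model and starting score are assumptions, not
    # Slashdot's actual implementation.

    posts = [
        {"author": "alice", "text": "Insightful analysis ...", "moderations": [+1, +1, +1]},
        {"author": "bob",   "text": "First post!",             "moderations": [-1, -1]},
        {"author": "carol", "text": "Useful link ...",         "moderations": [+1]},
    ]

    def score(post, base=1):
        """Aggregate moderator votes into a single score, starting from a base value."""
        return base + sum(post["moderations"])

    def visible_posts(posts, threshold=2):
        """Return only the posts a reader with this score threshold would see."""
        return [p for p in posts if score(p) >= threshold]

    for post in visible_posts(posts, threshold=2):
        print(score(post), post["author"], post["text"])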
The
Slashdot moderation system is one existing
example of a partially automated reputation
system. Ratings are entered by hand, using
trusted human moderators, but then these
ratings are collected, aggregated, and
displayed in an automatic fashion.
Although
moderation on Slashdot serves the needs of
many of its readers, there are many
complaints that a posting was rated too high
or too low. This is probably the best that can be done without trying to maintain reputations for the moderators themselves.
Reputations
worth real money: eBay
The eBay
feedback system is another example of a
reputation service in practice. Because
buyers and sellers on eBay usually have
never met each other, neither has much
reason to believe that the other will follow
through on their part of the deal. They need
to decide whether or not to trust each
other.
To help
them make the decision, eBay collects
feedback about each eBay participant in the
form of ratings and comments. After a trade,
eBay users are encouraged to post feedback
about the trade and rate their trading
partner. Good trades, in which the buyer and
seller promptly exchange money for item,
yield good feedback for both parties. Bad
trades, in which one party fails to come
through, yield bad feedback which goes into
the eBay record. All this feedback can be
seen by someone considering a trade.
The idea is
that such information will lead to more good
trades and fewer bad trades --- which
translates directly into more and better
business for eBay. This isn't always the
case in practice, but it is often enough to
give eBay a reputation of its own as the
preeminent web auction site.
Codifying
reputation on a wide scale: The PGP web of
trust
One of the
first applications to handle reputations in
an automated fashion on a genuinely large
scale was the "web of trust"
system introduced in Phil Zimmermann's
Pretty Good Privacy (PGP). This was also the
first program to bring public key
cryptography to the masses (see the Crypto
chapter for more details on public key
crypto).
With public
key cryptography comes the key certification
problem, a type of reputation issue.
Reputations are necessary because there is
no way to tell from the key alone which
public key belongs to which person.
For
example, suppose Alice would like people to
be able to send encrypted messages to her.
She creates a key and posts it with the name
"Alice." Unbeknownst to her, Carol
has also made up a key with the name
"Alice" and posted it in the same
place. When Bob wants to send a message to
Alice, which key does he choose? This
happens in real life; as an extreme example,
the name "Bill Gates" is currently
associated with dozens of different PGP keys
available from popular PGP key servers.
So the key
certification problem in PGP (and other
public key services) consists of verifying
that a particular public key really does
belong to the entity to whom it
"should" belong. PGP uses a system
called a web of trust to combat this
problem. Alice's key may have one or more
certifications that say "Such and such
person believes that this key belongs to
Alice." These certifications exist
because Alice knows these people personally;
they have established to their satisfaction
that Alice really does own this key. Thus
Alice's key builds up a reputation as being
the right key to use when talking to Alice.
Carol's fake "Alice" key has no
such certifications, because it was made up
on the spot.
When Bob
looks at the key, his copy of PGP can assign
it a trust level based on how many of the
certifications are made by people he knows.
The higher the trust level, the more
confidence Bob can have in using the key.
But because there's a limit to how many people Alice and Bob can know, in reality Bob's software will look for broader connections, such as a "certification chain" from Bob to Alice's key that is less than, say, 4 hops long, or the number of independent paths through the web that reach the key in at most 4 hops.
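As a rough illustration of this "broader connections" idea, the following Python sketch searches a toy certification graph for a short chain from Bob to a target key. The graph and the four-hop cutoff are invented for exposition; PGP's actual trust computation is more involved.

    # A minimal sketch of searching the certification graph for a short
    # chain from Bob to a target key. The graph below is invented for
    # illustration; it is not PGP's data format.

    from collections import deque

    # certifications[x] = set of keys that x has certified
    certifications = {
        "bob":   {"carol", "dan"},
        "carol": {"erin"},
        "dan":   {"erin", "alice"},
        "erin":  {"alice"},
    }

    def chain_length(start, target, max_hops=4):
        """Return the length of the shortest certification chain from start
        to target, or None if no chain exists within max_hops."""
        queue = deque([(start, 0)])
        seen = {start}
        while queue:
            node, hops = queue.popleft()
            if node == target:
                return hops
            if hops == max_hops:
                continue
            for nxt in certifications.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
        return None

    print(chain_length("bob", "alice"))  # 2 hops via dan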
There are
still a number of tricky issues that make
the PGP web of trust concept hard to use
securely: for example, what exactly did Bob
mean when he certified Charlie's key, and
does Charlie mean the same thing when he
certifies David's key? But the key point to
remember here is that the web of trust
depends on reputation to extend trust to new
parties.
Attack-resistant
trust metrics: Google
In the 1990s, web search engines downloaded and catalogued millions of web pages. Engines responded to queries based on simple metrics, such as whether a page included the search phrase and how many times it included the phrase.
Over time,
people began "attacking" these
search engines to steer visitors to their
respective pages. People created pages with
many common key words: if a page contained
enough attractive words, search engines
would be more likely to return it to users
searching for that phrase. The attack was
improved by creating many copies of a given
page, thus increasing the likelihood that it
would be returned.
In response
to these attacks, and in hopes of developing
more accurate search engines, researchers
began evaluating alternative trust metrics
for search engines. In 1998, Stanford researchers opened a new search engine, called Google, that uses reputation technology similar to IBM's "Clever" project. Google's fundamental innovation is to determine a page's usefulness or page rank based on which pages link to it.
Google's
trust algorithm can be visualized as
follows: Begin on the front page of a very
popular website (e.g., Yahoo, CNN, Slashdot,
etc.). Now randomly follow links from the
front page for a few minutes. The page you
end up on (it may still be somewhere within
the original website, or you may be
somewhere else on the WWW) receives a small
amount of reputation. Repeat this operation
millions of times. Over time, pages which
are commonly linked to will emerge as having
higher reputation because they are
"more connected" in the web. Pages
on the fringes of the WWW will have lower
reputation because few other pages point at
them. Sites which are linked from many
different reputable websites will be ranked
higher, as will sites linked from extremely
popular websites (e.g. a direct link from
the main Amazon page to an offsite page).
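The following Python sketch mimics this random-walk picture on a made-up link graph: walks start at a popular seed page, follow random links for a few steps, and credit the page where each walk ends. It is only a toy model of the visualization above, not Google's actual algorithm.

    # A toy version of the random-walk picture described above: repeatedly
    # start at a popular "seed" page, follow random links for a few steps,
    # and credit the page where the walk ends. The link graph and parameters
    # are made up for illustration.

    import random
    from collections import Counter

    links = {
        "portal":  ["news", "shop", "blog"],
        "news":    ["story1", "story2", "portal"],
        "shop":    ["product", "portal"],
        "blog":    ["story1"],
        "story1":  ["news"],
        "story2":  ["news"],
        "product": ["shop"],
        "fringe":  [],          # nothing links here, so it earns no reputation
    }

    def random_walk_reputation(seed, walks=100_000, steps=5):
        credit = Counter()
        for _ in range(walks):
            page = seed
            for _ in range(steps):
                outgoing = links.get(page)
                if not outgoing:
                    break
                page = random.choice(outgoing)
            credit[page] += 1   # the page the walk ends on gets a small amount of reputation
        return credit

    for page, score in random_walk_reputation("portal").most_common():
        print(page, score)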
In
actuality, Google's page rank computation is
more efficient than the random walk approach
outlined above, and Google's actual
algorithm is complicated by its ability to
consider factors such as whether the words
on a page are relevant to the words in the
query. Though complex, Google's reputation-based scheme yields far better results for search queries than other search engines.
Attack-resistant
trust metrics: Advogato
Raph
Levien's "Advogato" trust metric
applies this same approach to the PGP web of
trust problem. One problem with the basic
web of trust algorithm is that trust
decisions that Alice might make, such as,
"If he's within 4 hops of me then I
trust him," leave her open to attacks.
Imagine David manages to get enough
certifications that his key is within 3 hops
of Alice. Then, if he creates a new identity
(key) and certifies it with his
"David" key, Alice will trust the
new identity. In fact, he can create as many
new identities as he wants, causing Alice to
trust more and more new "people."
This attack is similar to the search engine
attack above, where a website author creates
many different web pages to increase the
chance of drawing users.
Advogato's
trust metric aims to withstand this attack.
Its operation is similar to Google's above:
start with a few trusted people in the web
of trust (called seeds), and walk out
randomly along the web of trust. People
frequently walked to are considered
reputable. In the scenario above, David's
key would be a trust bottleneck,
meaning that the total amount of trust that
David's alter egos can gain is limited by
the number of certifications that David can
trick out of honest users. That is, the
amount of trust that David can pass on to
his new nodes is bounded by the amount of
trust that is passed to David himself.
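A small simulation can illustrate the bottleneck argument. In the toy certification graph below (our own invention, not Advogato's real network-flow computation), walks that begin at a trusted seed can only reach David's fake identities by passing through David, so the sybils' combined credit is bounded by the credit that reaches David himself.

    # A small simulation of the bottleneck argument. The graph is invented;
    # Advogato's real metric is a network-flow computation, not this
    # random-walk simulation.

    import random
    from collections import Counter

    cert_graph = {
        "seed":   ["alice", "bob"],
        "alice":  ["bob", "carol"],
        "bob":    ["carol", "david"],             # David tricked one certification out of Bob
        "carol":  ["alice"],
        "david":  ["sybil1", "sybil2", "sybil3"],  # David certifies his own fake keys
        "sybil1": [], "sybil2": [], "sybil3": [],
    }

    def walk_credit(seed="seed", walks=50_000, steps=4):
        visits = Counter()
        for _ in range(walks):
            node = seed
            for _ in range(steps):
                nxt = cert_graph.get(node)
                if not nxt:
                    break
                node = random.choice(nxt)
                visits[node] += 1
        return visits

    v = walk_credit()
    sybil_total = v["sybil1"] + v["sybil2"] + v["sybil3"]
    print("credit reaching david:", v["david"])
    print("combined credit of his sybils:", sybil_total)  # cannot exceed the credit reaching david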
One of the
interesting features of these trust metrics
is that the seed doesn't always have to be
the same. In theory, Google could let each
user specify a website or websites which
should be the "starting point" for
reputation calculations; similarly Advogato
could let a user ask "If I'm the
seed, then which nodes count as
reputable?"
Section 3:
General issues with reputation systems
Attacks
and adversaries
Two
questions determine how we evaluate the
security of reputation systems. First,
what's the context for the reputation system
-- what's at stake? Second, who are the
adversaries? The capabilities of potential
adversaries and the extent to which they can
damage or influence the system dictate how
much energy should be spent on security.
In
mid-2000, a group of people engaged in eBay
auctions and behaved well. As a result,
their trust ratings went up. Once their
trust ratings were sufficiently high to
engage in high-value deals, the group
suddenly "turned evil and cashed
out." That is, they used their
reputations to start auctions for
high-priced items, received payment for
those items, and then disappeared, leaving
dozens of eBay users holding the bag.
If a
corporation planning a $50 million
transaction bases its decisions on a website
that computes and publishes reputations for
companies, it could be worth many millions
of dollars to influence the system so that a
particular vendor is chosen. Indeed, there
are quite a few potential adversaries for a
company tracking online reputations for
large businesses. A dishonest vendor might
want to forge or use bribes to create good
feedback to raise his resulting reputation.
In addition to wanting good reputations,
vendors might like their competitors'
reputations to appear low. Exchanges --
online marketplaces that try to bring
together vendors to make transactions more
convenient -- would like their vendors'
reputations to appear higher than those of
vendors that do business on other exchanges.
Vendors with low reputations -- or those
with an interest in people being kept in the
dark -- would like reputations to appear
unusably random. Dishonest users might like
the reputations of the vendors that they use
to be inaccurate, so that their competitors
will have inaccurate information.
Perhaps the
simplest attack that can be made against a
reputation system is called shilling.
This term is often used to refer to
submitting fake bids in an auction, but it
can be considered in a broader context of
submitting fake or misleading ratings. In
particular, a person might submit positive
ratings for one of her friends (positive
shilling) or negative ratings for her
competition (negative shilling). Either of
these ideas introduces more subtle attacks,
such as negatively rating a friend or
positively rating a competitor to try to
trick others into believing that competitors
are trying to cheat.
Shilling is
a very straightforward attack, but many
systems are vulnerable to it. A very simple
example is the AOL Instant Messenger system.
You can click to claim that a given user is
abusing the system. Since there is no
support for detecting multiple comments from
the same person, a series of repeated
negative votes will exceed the threshold
required to kick the user off the system for
bad behavior, effectively denying him
service. Even in a more sophisticated system
that detects multiple comments by the same
person, an attacker could mount the same
attack by assuming many different
identities.
Vulnerabilities
from overly simple scoring systems are not
limited to "toy" systems like
Instant Messenger. Indeed, eBay's reputation
system, used naively, suffers from a similar
problem. If the reputation score for an individual were simply the number of good ratings minus the number of bad ratings, then a vendor who has performed dozens of transactions, and cheats on only 1 out of every 4 customers, would have a steadily rising reputation, whereas a vendor who is completely honest but has only done 10 transactions would be displayed as less
reputable. A vendor could make a good profit
(and build a strong reputation!) by being
honest for several small transactions and
then being dishonest for a single large
transaction. Fortunately, this attack does
not work against eBay -- ratings from users
include a textual description of what
happened, and in practice even a few vocally
unhappy customers mean that a vendor's
reputation is ruined.
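To make the naive arithmetic concrete, the toy comparison below uses invented numbers to show how a high-volume cheater outscores a low-volume honest vendor under a plain "good minus bad" tally.

    # Why "good minus bad" is a poor score: a high-volume vendor who cheats
    # 1 customer in 4 still racks up a higher net score than a low-volume
    # vendor who is perfectly honest. Numbers are invented for illustration.

    def naive_score(good, bad):
        return good - bad

    # 100 transactions, cheating on 1 out of every 4 customers
    cheater = naive_score(good=75, bad=25)   # net +50, and still rising

    # 10 transactions, completely honest
    honest = naive_score(good=10, bad=0)     # net +10

    print(cheater, honest)  # the cheater looks five times "more reputable"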
More
issues to consider
In some situations, such as verifying voting age, a fine-grained reputation measurement is not necessary -- simply indicating whether an individual appears to meet the requirement is good enough.
Also, in a
lot of domains, it is very difficult to
collect enough information to provide a
comprehensive view of each entity's
behavior. It might be difficult to collect
information about entities because the
volume of transactions is very low, as we
see today in large online business markets.
But there's
a deeper issue than just whether there are
transactions, or whether these transactions
are trackable. More generally: does there
exist some sort of proof (a receipt or other
evidence) that the rater and ratee have
actually interacted?
Being able
to prove the existence of transactions
reduces problems on a wide variety of
fronts. For instance, it makes it more
difficult to forge large numbers of entities
or transactions. Such verification would
reduce the potential we described in the
previous section for attacks on AOL Instant
Messenger. Similarly, eBay users currently
are able to directly purchase a high
reputation by giving eBay a cut of a dozen
false transactions which they claim to have
performed with their friends. With
transaction verification, they would be
required to go through the extra step of
actually shipping goods back and forth.
Proof of
transaction provides the basis for
Amazon.com's simple referral system,
"Customers who bought this book also
bought..." It is hard to imagine that
someone would spend money on a book just to
affect this system. It happens, however. For
instance, a publishing company was able to
identify the several dozen bookstores across
America that are used as sample points for
the New York Times bestseller list; they
purchased thousands of copies of their
author's book at these bookstores,
skyrocketing the score of that book in the
charts. [David D. Kirkpatrick, "Book
Agent's Buying Fuels Concern on Influencing
Best-Seller Lists," New York Times
Abstracts, 08/23/2000, Section C, p. 1, col.
2, c. 2000, New York Times Company.]
In some
domains, it is to most raters' perceived
advantage that everyone agree with the
rater. This is how chain letters, Amway, and
Ponzi schemes get their shills: they
establish a system in which customers are
motivated to recruit other customers.
Similarly, if a vendor offered to discount
past purchases if enough future customers
buy the same product, it would be hard to
get honest ratings for that vendor. This
applies to all kinds of investments; once
you own an investment, it is not in your
interest to rate it negatively so long as it
holds any value at all.
Transferability
of reputations
Another important issue in reputation is transferability. In some contexts,
transferability is inherent. Group
reputation is only useful if it is based on
behavior of members of a group and can be
assigned to most members of that group, as
in legal drinking age determination. Even
here there can be many factors, for example
there may be good evidence that a reputation
is strongly associated with a group, but the
reputation might only apply to some small
percentage of the group. Alternatively, a
reputation might only accurately reflect
individuals while in the group, which turns
out to be highly dynamic. These are some of
the difficulties faced in combining
reputation inputs in order to meaningfully
answer a reputation question.
Of course
in the context of a single national
identifier, much of this collapses. A simple
reputation economy seems easier to manage,
hence stronger than one involving multiple
identities, pseudonyms, or full anonymity.
However, this simplicity is not without
drawbacks and other complexities.
Individuals may currently have multiple legitimate online identities: one for each distinct merchant, others for communicating with friends, for their occupations, and for interacting with different parts of various governments and jurisdictions.
Combining all of these identities creates a
number of problems. Aside from the obvious
privacy concerns, it inherently increases
vulnerability to identity theft. In the face
of a single national identifier, identity
theft becomes both a larger threat from more
widespread exposure and a more devastating
threat because more is at stake in a single
identity. And, because we would need to
bootstrap a single national identifier from
the current system of many identities per
person, we still need to deal with the issue
of transferability of reputation from one
identity to another. Thus we must consider
multiple identities if only temporarily.
Ubiquitous
anonymity is not a simple answer either.
Currently, there are no reputation systems
that work anonymously (much as cash
functions anonymously in economic systems)
and remain safe. Theoretically it might be possible to develop such a reputation system; this is currently an active area of research. The security, key management, and credential systems that would be necessary to make this feasible are no more in place for this purpose as yet than for any other.
Digital
credentials are one means of storing and
managing reputation. That is, credentials
can be used to sum up reputation. We are
already familiar with such usage in the
physical world: reputation as a longstanding
resident, as gainfully employed, as a
responsible driver are often established by
means of credentials. Digital credentials
are in principle no different. In practice
there are many issues, some of which we have
already mentioned. The management of these
credentials is outside the scope of this
work. Here we consider how to assess and
formulate what a credential attests to.
Credentials figure in only to the extent
that they are used in determining and
managing reputation. Reputation systems are
more readily thought of as those systems
that attempt to attach reputation to an
identity and make an ongoing assessment of
that reputation.
With
reputation, besides the possibility of
transferring reputation between multiple
identities of the same individual we must
consider the possibility of transferring a
single reputation between multiple
individuals. This may result in a reputation
that correctly attaches to no one person.
For example, consider roles such as
CTO for a company, or an agent in charge of
handling a given contract for the company.
These identities must be able to be
transferred smoothly to another person ---
or shared. Whether or not this is acceptable
would depend on what the reputation is
intended to do. Of course even deciding when
two individuals are the same, or more
generally, when two actions are by the same
individual can be quite difficult online. A
good reputation system should either provide
adequate counters to the above identity
attacks such as pseudospoofing, or provide
useful information when such answers cannot
be adequately given.
Section 4:
Reputation systems in a pseudonymous
environment
Here we
focus on the additional problem of operating
from some middle point in the spectrum of
identification. Instead of a single global
permanent identifier, individuals may expose
some pseudonym identifier to the system, to
which sets of attributes or credentials are
bound. So, the basic problem in this scenario changes: given a set of statements not linked directly to real people, how can we believe them, i.e., trust that they are both valid and complete?
Of course,
one obvious solution to this problem is the
introduction of centralized, trusted
authorities to authenticate individuals when
they register a pseudonym with the system.
Cryptographic techniques known as blinding
may be used to prevent the authority from
linking entities to pseudonyms while still
ensuring that an entity can only present one
pseudonym to some system environment at any
given time. Such a task is certainly not
free or fool-proof (consider Verisign's
weakness if their root key is compromised),
and such security often raises an
individual's barrier to entry.
In this
section we describe some limitations and
open problems of reputation systems in a
pseudonym-based setting. In particular,
whether systems are centralized or
distributed (peer-to-peer) has a lot of
impact on how easy it is to handle
reputations online.
Verifying
the uniqueness of identities
One
fundamental limitation of reputation systems
in online settings is the difficulty of
ensuring that each entity only presents a
single identity to the system.
Classic
authentication considers whether a presented
identity corresponds to the expected entity,
but not whether the authenticated identities
are distinct from one another. In a virtual
setting, a single person may create and
control multiple distinct online identities.
In some situations, an honest person
certainly should be able to have multiple
pseudonyms that are used for different
aspects of the person's online life
(consider credit cards, library cards,
student badges, hospital IDs, etc). However,
within the context of a specific reputation
environment, a dishonest person may
establish multiple, seemingly distinct,
pseudonyms that all secretly collaborate
with each other to attack the system.
Therefore,
each entity should be represented with equal
weight, rather than being able to inflate
his importance. To do this, we must bound
the number of pseudonyms that an adversary
may control (see also the above discussion
about Google and Advogato). In many cases,
e.g., within a specific category of
reputation, we seek to establish a
one-to-one correspondence between entity and
pseudonym. This "pseudospoofing"
problem is the hardest issue facing the
security of reputation systems in
large-scale, decentralized environments.
One
technique in a distributed setting to bound
the number of pseudonyms that an adversary
controls involves requiring "proofs of
work" for participating.
To make
reputation attestations, users must perform
time-consuming operations. For example,
reviews at online communities (which go
toward the reputation of products or
services) generally take the form of written
descriptions of users' experiences, as
opposed to merely assigning some numeric
score. To prevent the bulk submission of
scores, one may take a more computational
approach by requiring users' machines to
solve computationally-difficult problems to
create pseudonyms or make statements to the
system. Other online systems may attempt to
establish that users are actually humans
rather than automated scripts, such as the
graphical "reverse Turing tests"
employed by Yahoo Mail, where the user must
read a distorted word from the screen, or
describe what word characterizes a given
picture (see www.captcha.net).
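The sketch below shows one way such a computational requirement might look: a hashcash-style proof of work in which the client must find a nonce whose hash has a required number of leading zero bits, while verification costs a single hash. The difficulty setting and message format are arbitrary choices for illustration.

    # A minimal hashcash-style "proof of work" sketch: before the system
    # accepts a new pseudonym or statement, the client must find a nonce
    # whose hash falls below a target. The difficulty parameter and message
    # format are arbitrary choices for illustration.

    import hashlib
    from itertools import count

    def solve(challenge: str, difficulty_bits: int = 18) -> int:
        """Find a nonce such that sha256(challenge:nonce) has at least
        difficulty_bits leading zero bits. Cost grows exponentially with difficulty."""
        target = 1 << (256 - difficulty_bits)
        for nonce in count():
            digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce

    def verify(challenge: str, nonce: int, difficulty_bits: int = 18) -> bool:
        """Verification is cheap: a single hash."""
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

    nonce = solve("create-pseudonym:alice")
    print(verify("create-pseudonym:alice", nonce))  # True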
However,
while these approaches may bound the number
of pseudonyms or attestations, they
certainly cannot ensure the one-to-one
correspondence that we want: the
participants in large-scale systems are
simply too heterogeneous. Individuals are
willing to commit different amounts of their time, and computers run at greatly differing speeds and bandwidths.
Validating
statements
Validation
means making sure that the statement was
actually made by the person to whom it is
attributed. In a system that prevents
pseudospoofing by authenticating pseudonyms
as distinct, the practice of validating
statements is easy. For example, in web
portals that require users to login -- i.e.,
a centralized storage and authentication
facility -- posted statements are implicitly
valid and bound to pseudonyms. In more
diffuse peer-to-peer environments, digital
signatures can similarly guarantee the
authenticity of statements -- although the
problem of verifying large sets of
signatures may require more complicated
system engineering.
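As a minimal sketch of this approach, the Python fragment below binds a rating statement to a pseudonym with an Ed25519 signature, using the third-party cryptography package; the statement format and field names are invented for illustration.

    # A minimal sketch of binding a statement to a pseudonym with a digital
    # signature, using the third-party `cryptography` package
    # (pip install cryptography). The statement format is invented.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # The pseudonym *is* the public key (or a hash of it).
    private_key = Ed25519PrivateKey.generate()
    pseudonym = private_key.public_key()

    statement = b"rating: seller=widgets-r-us score=+1 txn=12345"
    signature = private_key.sign(statement)

    # Anyone holding the pseudonym's public key can check that the statement
    # was made by that pseudonym and has not been altered.
    try:
        pseudonym.verify(signature, statement)
        print("statement is authentic")
    except InvalidSignature:
        print("statement was forged or altered")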
One
implication of shilling attacks is that a
user cannot trust attested statements carte
blanche. Colluding identities can
cooperatively raise each others'
reputations, or degrade the reputations of
their competition. Therefore, a reputation
system must consider the credibility of the
identities making statements. The PGP Web of
Trust and Advogato's Trust Metric are both
systems that explore the automated
transitivity of trust.
Ensuring
reports are complete
When
centralized, trusted storage or aggregation
entities are assured, users posting
statements under their respective pseudonyms
know that their rating will be included in
their target's report or aggregated totals.
For example, eBay ensures that they expose
all the feedback on a given user.
Furthermore, they can limit feedback to
those identities performing a transaction,
i.e., the winner of an auction, to limit
shilling attacks.
A
distributed system, on the other hand, has a
difficult time ensuring that reports or
rating aggregations are complete.
Participants must broadcast their statements
to all users. However, sending individual
messages to every participant does not
scale, and Internet multicast is not a
viable option in its current deployment.
There are cryptographic techniques to ensure
secure broadcast, but they are too expensive
both in terms of computation and bandwidth.
Bootstrapping
the system and users' initial reactions
Lastly,
reputation-based trust must have some method
to bootstrap operation. After a system
starts but before sufficient ratings have
been collected, how do you make decisions?
In noncommercial domains, it may be fine to
list some entities and declare no knowledge
or preference. In others, it may be more
reasonable to list only entities for which a
sufficiently certain score is known. Initial
ratings could be collected by user surveys
or polls relating only to known out-of-band
information, such as if the entity exists in
some alternate real-world domain. As the
user continues to participate, the system
can collect more feedback based on
transactions, interactions, etc.
Bootstrapping
is a different issue in a centralized
authentication system than in a
decentralized system. In a decentralized
system, each user develops his own picture
of the universe: he builds his trust of each
entity based on his own evidence of past
performance and on referrals from other
trusted parties. Every new user effectively
joins the system "at the
beginning," and the process of building
a profile for new users is an ongoing
process throughout the entire lifetime of
the system. In a centralized environment, on
the other hand, ratings are accumulated
across many different transactions and over
long periods of time. New users trust the
centralized repository to provide
information about times and transactions
that happened before the user joined the
system.
Section 5:
Economics of Reputation
Despite the
many theoretical problems and pitfalls with
online reputation systems in a pseudonymous
context, we can make significant progress
with reputation system design from
simplified models.
This section explores such simplified
reputation systems in an economic context.
From an
economic perspective, the first question is
whether a reputation system promotes good
behavior by participants, for example
truthful reporting of quality by sellers in
an electronic market or good-faith comments
by participants in an electronic discussion
forum. Some other interesting challenges in
the design of well-functioning reputation
systems include: (a) eliciting the right
amount of feedback at the right time; (b)
eliciting truthful feedback; and (c)
preventing strategic manipulation by
participants. Recent years have seen some interesting theoretical analysis that addresses different methods of tackling each of these challenges. Although the analysis is often performed for quite stylized models, e.g. with simplified assumptions about what agents can do or how decisions are made, we can nevertheless draw some useful conclusions. In particular, social and economic incentives influence design criteria in ways that a purely technological discussion of reputation systems and pseudonymous identities doesn't consider.
A number of
studies have considered eBay-style
reputation systems, in which ratings are
positive, negative or neutral, and
information is provided in aggregate form (a
tally of the amount of feedback in each
category over the past 6 months). Simple
stylized models allow us to consider the
incentives for a seller to misreport the
quality of a good, or a buyer to provide
feedback based on differences between
reported quality and actual quality.
A basic
question, addressed by Dellarocas (2001), is
whether eBay-style reputation mechanisms can
be well-functioning (stable), such
that sellers adopt a consistent strategy of
(mis)reporting quality levels of goods for
sale, and truthful reporting, such
that the assessment of quality by buyers is
equal to the true quality of a good.
Assuming truthful feedback, in which buyers
provide negative feedback if and only if the
actual quality is sufficiently less than the
reported quality, the analysis demonstrates
that binary reputation mechanisms can be
well-functioning and provide incentives for
sellers to resist the temptations of
"misrepresentation and sloth"
(Miller et al. 2002). Recently, Dellarocas
(2002) has considered the role of the type
of information that is disseminated by
reputation systems based on feedback
information, and demonstrated that it is
sufficient to provide eBay-style aggregate
information instead of a full feedback
history. The model is again stylized, with
cooperation by buyers in providing truthful
and complete feedback.
One common
theme is the interaction between reputation
systems and concepts of pseudonymity, which
are important in environments in which fluid
identities are either necessary for reasons
of privacy and free discourse, or
unavoidable for technological or economic
reasons. Friedman and Resnick (2001)
consider the social cost of cheap
pseudonyms, and note that participants that
can costlessly adopt a new pseudonym should
rationally do so whenever their reputation
falls below that of a newcomer to a system.
In this sense, pseudonyms incur
a social cost because all else being equal
it would be more efficient to trust than
distrust strangers in a world in which most
people are trustworthy. As one solution,
Friedman and Resnick propose that
participants can choose to adopt "once-in-a-lifetime" (1L) pseudonyms for
particular arenas, such as product classes
on eBay or discussion threads on Slashdot.
These 1L pseudonyms allow a participant to
commit to maintaining a single identity and
can be implemented with a trusted
intermediary and the standard encryption
technique of blind signatures.
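The toy example below illustrates the blinding idea with textbook RSA and deliberately tiny numbers: the intermediary signs a value it cannot read, and the user unblinds the result into a certificate on the pseudonym. It is exposition only, not a secure implementation and not Friedman and Resnick's exact construction.

    # A toy illustration of the blinding idea behind 1L pseudonyms: the
    # intermediary signs a blinded value, so it can certify "this is this
    # person's one pseudonym for this arena" without learning the pseudonym
    # itself. Textbook RSA with tiny hand-picked numbers, for exposition
    # only. Requires Python 3.8+ for pow(x, -1, n).

    import math
    import random

    # Intermediary's RSA key (tiny numbers chosen for the example).
    p, q = 61, 53
    n = p * q                             # public modulus
    e = 17                                # public exponent
    d = pow(e, -1, (p - 1) * (q - 1))     # private exponent

    # User's pseudonym, encoded as an integer mod n.
    pseudonym = 1234

    # 1. User blinds the pseudonym with a random factor r.
    while True:
        r = random.randrange(2, n)
        if math.gcd(r, n) == 1:
            break
    blinded = (pseudonym * pow(r, e, n)) % n

    # 2. Intermediary signs the blinded value (learning nothing about `pseudonym`).
    blind_signature = pow(blinded, d, n)

    # 3. User removes the blinding factor to obtain a signature on the pseudonym.
    signature = (blind_signature * pow(r, -1, n)) % n

    # 4. Anyone can verify the signature against the intermediary's public key.
    assert pow(signature, e, n) == pseudonym
    print("pseudonym certified without being seen by the intermediary")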
The problem of providing incentives for reputation systems to effectively elicit the right amount of feedback, at the right time and with the right level of detail, can be considered within the standard framework of social-choice theory, which seeks to implement good system-wide outcomes in systems with self-interested participants.
Avery et al. (1999) consider the problem of
eliciting the right amount of feedback about
a product of a fixed quality, such as a new
restaurant or a new Broadway show. Noting
that the value of information in early
feedback is higher than in late feedback
because it can improve the decisions of more
individuals, payment schemes are proposed to
provide incentives for a socially-optimal
quantity and sequencing of evaluations.
Early evaluators are paid to
provide information and later evaluators pay
to balance the budget, to mitigate the
tendency for an underprovision of
evaluations and internalize the system-wide
value-of-information provided by early
feedback. This "market for
evaluations" continues to assume full
and honest evaluations.
Recently,
Miller et al. (2002) suggest the use of proper
scoring rules to elicit honest
and truthful feedback from participants in a
reputation system. The basic idea is to make
payments based on how well feedback predicts
the feedback from later participants. Thus,
in addition to collecting, aggregating, and
distributing information, the reputation
system also provides rewards and imposes
penalties based on the feedback provided by
participants. In the setting of an
electronic market, feedback information
provided about a particular seller by a
buyer is used by the intermediary to infer posterior
distributional information about the future
feedback on that seller. The buyer then receives a payment that is computed, according to a proper scoring rule, so that her expected payment is maximized when she provides truthful feedback and maximizes the predictive accuracy of the posterior distribution; honest reporting is then a Nash equilibrium. Extensions are also proposed to induce the right level of effort in providing feedback information, and to compensate for costly reporting of information. One problem that remains is that there are many Nash equilibria of the proposed payment scheme, with collusive reports and "herding" style behavior (Bikhchandani et al. 1992) among the other possible outcomes.
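A small numerical sketch shows why a proper scoring rule elicits honesty: below, a buyer reports a probability that the next buyer will leave positive feedback and is paid by the logarithmic scoring rule once that feedback arrives, and the expected payment peaks at the buyer's true belief. The numbers and the particular rule are our own illustration, not the exact mechanism of Miller et al. (2002).

    # A minimal sketch of the scoring-rule idea: the buyer reports a
    # probability that the *next* buyer will leave positive feedback and is
    # paid with the logarithmic scoring rule once that feedback arrives.
    # Because the rule is proper, the report that maximizes expected payment
    # is the buyer's honest belief. Numbers are invented.

    import math

    def log_score_payment(reported_prob: float, next_feedback_positive: bool) -> float:
        """Payment under the logarithmic scoring rule (plus a constant so
        payments in this toy example are usually positive)."""
        p = reported_prob if next_feedback_positive else 1.0 - reported_prob
        return 1.0 + math.log(p)

    def expected_payment(reported_prob: float, true_prob: float) -> float:
        return (true_prob * log_score_payment(reported_prob, True)
                + (1 - true_prob) * log_score_payment(reported_prob, False))

    true_belief = 0.8   # buyer privately believes the seller is good 80% of the time
    for report in (0.5, 0.8, 0.99):
        print(report, round(expected_payment(report, true_belief), 4))
    # Expected payment peaks at report == true_belief, so honest reporting is optimal.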
In summary,
useful reputation mechanisms cannot and
should not be designed without regard to
the economics of the problem, but rather
through a careful consideration of both the
computational and incentive aspects of
well-functioning reputation systems. Design
considerations include: the type of feedback
information that is collected (e.g. binary,
continuous, etc.); the form of information
feedback that is distributed (e.g.
aggregated, complete history, time-windowed,
etc.); the role of the reputation system in
providing incentives for the provision of
honest and accurate feedback information,
for example via payment schemes; the
interaction between pseudonymity and the
economics of reputation; and the robustness
of a system against strategic manipulation
by participants.
Section 6:
Brief conclusion
Reputation
is a complex but powerful tool to predict an
entity's behavior online. Effective
reputation systems must use algorithms
suited to the situation ("how much is
at stake? who wants to attack the
system?"), and they must also consider
social and economic incentives.
We've
gained experience from a variety of fielded
systems, ranging from auction sites like
eBay that track reputation of their users
for being good at paying on time or
delivering the claimed goods, to web search
engines designed to determine the most
suitable web page for a query while
simultaneously preventing people from easily
influencing which page is returned. While
the field of reputation system research is
still quite young, the idea of making
predictions based on observed behavior is
intuitive and flexible. Reputation must play
a role in any identity management solution.
References
P. Resnick, R. Zeckhauser, E. Friedman, and K. Kuwabara. Reputation Systems: Facilitating Trust in Internet Transactions. Communications of the ACM, 40(3): 56-58, 2000.
C. Dellarocas, "Analyzing the Economic Efficiency of eBay-like Online
Reputation Reporting Mechanisms", in Proc. 3rd ACM Conf. on Electronic
Commerce (EC'01), 2001.
C. Dellarocas, Efficiency and Robustness of Mediated Online Feedback
Mechanisms: The Case of eBay. SITE'02. (working paper).
E. Friedman and P. Resnick. The Social Cost of Cheap Pseudonyms. Journal of Economics and Management Strategy, 10(2): 173-199, 2001.
G. A. Akerlof. The Market for Lemons: Quality Uncertainty and the Market Mechanism. Quarterly Journal of Economics, 84: 488-500, 1970.
C. Avery, P. Resnick, and R. Zeckhauser. The Market for Evaluations. American Economic Review, 89(3): 564-584, 1999.
N. Miller, P. Resnick, and R. Zeckhauser. Eliciting Honest Feedback in Electronic Markets. SITE'02, August 2002.
R. Axelrod. The Evolution of Cooperation. New York: Basic Books, 1984.
S. Bikhchandani, D. Hirshleifer, and I. Welch. A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades. Journal of Political Economy, 100(5): 992-1026, 1992.