MapReduce and Primary-Backup Key-Value Service

I finished Assignment 2 of the distributed systems course today, so this is a good time to summarize Assignment 1 (MapReduce) and Assignment 2 (Primary-Backup Key-Value Service).

Assignment 1: MapReduce

Due: Sunday Sep 22, 11:59:59pm

Introduction

In this assignment you’ll build a MapReduce library as a way to learn the Go
programming language and as a way to learn about fault tolerance in
distributed systems. In the first part you will write a simple
MapReduce program. In the second part you will write a Master that
hands out jobs to workers, and handles failures of workers. The
interface to the library and the approach to fault tolerance is
similar to the one described in the original
MapReduce paper.

Collaboration Policy

Please refer to Assignment 0.

Software

You'll implement this assignment (and all the assignments) in Go 1.2 or later.
The Go web site contains lots of tutorial information which you may want to
look at. We supply you with a non-distributed MapReduce implementation, and a
partial implementation of a distributed version (just the boring bits).

It’s your responsibility to install Go in your development
environment. We recommend using your distribution’s package manager.

On OS X with homebrew:

$ brew install go

On Ubuntu/Debian:

$ sudo apt-get install golang

On Arch:

$ sudo pacman -S go

Getting started

There is an input file kjv12.txt in src/main, which was
downloaded from here.
Compile the initial software we provide you with and run it with the downloaded input
file:

$ export GOPATH=$HOME/4113
$ cd ~/4113/src/main
$ go run wc.go master kjv12.txt sequential
# command-line-arguments
./wc.go:11: missing return at end of function
./wc.go:15: missing return at end of function

The compiler produces two errors, because the implementation of the
Map and Reduce functions is incomplete.

Part I: Word count

Modify Map and Reduce so that wc.go reports the
number of occurrences of each word in alphabetical order.

$ go run wc.go master kjv12.txt sequential
Split kjv12.txt
Split read 4834757
DoMap: read split mrtmp.kjv12.txt-0 966954
DoMap: read split mrtmp.kjv12.txt-1 966953
DoMap: read split mrtmp.kjv12.txt-2 966951
DoMap: read split mrtmp.kjv12.txt-3 966955
DoMap: read split mrtmp.kjv12.txt-4 966944
DoReduce: read mrtmp.kjv12.txt-0-0
DoReduce: read mrtmp.kjv12.txt-1-0
DoReduce: read mrtmp.kjv12.txt-2-0
DoReduce: read mrtmp.kjv12.txt-3-0
DoReduce: read mrtmp.kjv12.txt-4-0
DoReduce: read mrtmp.kjv12.txt-0-1
DoReduce: read mrtmp.kjv12.txt-1-1
DoReduce: read mrtmp.kjv12.txt-2-1
DoReduce: read mrtmp.kjv12.txt-3-1
DoReduce: read mrtmp.kjv12.txt-4-1
DoReduce: read mrtmp.kjv12.txt-0-2
DoReduce: read mrtmp.kjv12.txt-1-2
DoReduce: read mrtmp.kjv12.txt-2-2
DoReduce: read mrtmp.kjv12.txt-3-2
DoReduce: read mrtmp.kjv12.txt-4-2
Merge phase
Merge: read mrtmp.kjv12.txt-res-0
Merge: read mrtmp.kjv12.txt-res-1
Merge: read mrtmp.kjv12.txt-res-2

The output will be in the file “mrtmp.kjv12.txt”. Your implementation is
correct if the following command produces the following top 10 words:

$ sort -n -k2 mrtmp.kjv12.txt | tail -10
unto: 8940
he: 9666
shall: 9760
in: 12334
that: 12577
And: 12846
to: 13384
of: 34434
and: 38850
the: 62075

To make testing easy for you, run:

$ ./test-wc.sh

and it will report if your solution is correct or not.

Before you start coding, reread Section 2 of the MapReduce paper
and our code for MapReduce, which is in mapreduce.go in
package mapreduce. In particular, you want to read the code of the
function RunSingle and the functions it calls. This will help you
understand what MapReduce does and learn Go by example.

Once you understand this code, implement Map and Reduce in
wc.go.

Hint: you can use
strings.FieldsFunc
to split a string into components.

Hint: for the purposes of this exercise, you can consider a word to be
any contiguous sequence of letters, as determined by
unicode.IsLetter.
A good read on what strings are in Go is the Go Blog on strings.

Hint: the strconv package (http://golang.org/pkg/strconv/) is handy to
convert strings to integers, etc.
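
The exact signatures of Map and Reduce come from the provided mapreduce package, so the sketch below deliberately avoids them; it is a minimal, self-contained illustration of the core logic the hints point at: split on non-letters with strings.FieldsFunc and unicode.IsLetter, then sum the per-word counts with strconv.

package main

import (
    "fmt"
    "strconv"
    "strings"
    "unicode"
)

// "Map" step: split the text into words and emit a {word, "1"} pair per word.
func mapWords(text string) [][2]string {
    words := strings.FieldsFunc(text, func(r rune) bool {
        return !unicode.IsLetter(r)
    })
    var pairs [][2]string
    for _, w := range words {
        pairs = append(pairs, [2]string{w, "1"})
    }
    return pairs
}

// "Reduce" step: add up the emitted counts for a single word.
func reduceCounts(counts []string) string {
    total := 0
    for _, c := range counts {
        n, _ := strconv.Atoi(c)
        total += n
    }
    return strconv.Itoa(total)
}

func main() {
    pairs := mapWords("the quick brown fox jumps over the lazy dog the")
    grouped := map[string][]string{}
    for _, p := range pairs {
        grouped[p[0]] = append(grouped[p[0]], p[1])
    }
    fmt.Println("the:", reduceCounts(grouped["the"])) // prints "the: 3"
}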

You can remove the output file and all intermediate files with:

$ rm mrtmp.*

When you've finished Part I, let us know by pushing a tag to GitHub:
$ git commit -am "[you fill me in]"
$ git tag -a -m "i finished assignment 1 part 1" a1p1
$ git push origin master
$ git push origin a1p1

Part II: Distributing MapReduce jobs

In this part you will design and implement a master that distributes
jobs to a set of workers. We give you the code for the RPC messages
(see common.go in the mapreduce package) and the code
for a worker (see worker.go in the mapreduce package).

Your job is to complete master.go in the mapreduce
package. In particular, the RunMaster() function in
master.go should return only when all of the map and reduce tasks
have been executed. This function will be invoked from the Run()
function in mapreduce.go.

The code in mapreduce.go already implements the
MapReduce.Register RPC function for you, and passes the new
worker’s information to mr.registerChannel. You should process
new worker registrations by reading from this channel.

Information about the MapReduce job is in the MapReduce struct,
defined in mapreduce.go. Modify the MapReduce struct to
keep track of any additional state (e.g., the set of available workers),
and initialize this additional state in the InitMapReduce()
function. The master does not need to know which Map or Reduce functions
are being used for the job; the workers will take care of executing the
right code for Map or Reduce.

In Part II, you don’t have to worry about the failures of the workers. You are
done with Part II when your implementation passes the first test set in
test_test.go in the mapreduce package.

test_test.go uses Go’s unit testing. From now on all exercises
(including subsequent assignments) will use it, but you can always run the actual
programs from the main directory. You run unit tests in a package
directory as follows:

$ go test

The master should send RPCs to the workers in parallel so that the workers
can work on jobs concurrently. You will find the go statement useful
for this purpose, as is the Go RPC documentation.

The master may have to wait for a worker to finish before it can hand out
more jobs. You may find channels useful for letting the threads that are
waiting for an RPC reply notify the master once the reply arrives. Channels are explained in the
document on Concurrency in Go.
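
A minimal sketch of that pattern: launch one goroutine per job and have each goroutine signal completion over a channel that the master drains before moving on. Here doJob is only a stand-in for the RPC the master would actually send to a worker (the real argument and reply types are in common.go).

package main

import "fmt"

// doJob stands in for sending one job to a worker over RPC.
func doJob(job int) {
    fmt.Println("finished job", job)
}

func main() {
    const nJobs = 5
    done := make(chan int)
    for i := 0; i < nJobs; i++ {
        go func(job int) {
            doJob(job)
            done <- job // report completion back to the master
        }(i)
    }
    // The master hands out all jobs in parallel, then waits for every one.
    for i := 0; i < nJobs; i++ {
        <-done
    }
    fmt.Println("all jobs done")
}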

We’ve given you code that sends RPCs via “UNIX-domain sockets”.
This means that RPCs only work between processes on the same machine.
It would be easy to convert the code to use TCP/IP-based
RPC instead, so that it would communicate between machines;
you’d have to change the first argument to calls to Listen() and Dial() to
“tcp” instead of “unix”, and the second argument to a port number
like “:5100”. You will need a shared distributed file system.

The easiest way to track down bugs is to insert log.Printf()
statements, collect the output in a file with go test > out, and
then think about whether the output matches your
understanding of how your code should behave. The last step is most important.

Please let us know that you’ve gotten this far in the assignment, by
pushing a tag to github.

$ git commit -am "[you fill me in]"
$ git tag -a -m "i finished assignment 1 part 2" a1p2
$ git push origin master
$ git push origin a1p2

Part III: Handling worker failures

In this part you will make the master handle worker failures. In
MapReduce handling failures of workers is relatively straightforward,
because the workers don’t have persistent state. If the worker fails,
any RPCs that the master issued to that worker will fail (e.g., due to
a timeout). Thus, if the master’s RPC to the worker fails, the master
should re-assign the job given to the failed worker to another worker.

An RPC failure doesn’t necessarily mean that the worker failed; the worker
may just be unreachable but still computing. Thus, it may happen that two
workers receive the same job and compute it. However, because jobs are
idempotent, it doesn’t matter if the same job is computed twice - both times it
will generate the same output. So, you don’t have to do anything special for this
case. (Our tests never fail workers in the middle of a job, so you don't even have
to worry about several workers writing to the same output file.)

You don’t have to handle failures of the master; we will assume it
won’t fail. Making the master fault tolerant is more difficult because
it keeps persistent state that must be replicated in order to make the master
fault tolerant. Keeping replicated state consistent in the presence of
failures is challenging. Many of the later assignments are devoted to this
challenge.

Your implementation must pass the two remaining test cases in
test_test.go. The first tests the failure of one worker; the
second tests the handling of many worker failures. Periodically,
the test cases start new workers that the master can use to make
progress, but these workers fail after handling a few jobs.

Handin procedure

You hand in your assignment exactly as you’ve been letting us know
your progress:

$ git commit -am "[you fill me in]"
$ git tag -a -m "i finished assignment 1" a1handin
$ git push origin master
$ git push origin a1handin

You should verify that you are able to see your final commit and your
a1handin tag on the Github page in your repository for this
assignment.

You will receive full credit if your software passes the
test_test.go tests when we run your software on our machines.
We will use the timestamp of your last a1handin tag for the
purpose of calculating late days, and we will only grade that version of the
code. (We’ll also know if you backdate the tag, don’t do that.)

Questions

Please post questions on Piazza.

Assignment 1: This assignment reproduces the well-known MapReduce approach. The idea behind MapReduce is not complicated. As the name suggests, the two essential steps are map and reduce; in the distributed setting there are also a partition step and a merge step. That is: partition the input into several pieces, run map and reduce on each piece (work that different workers can do in parallel), and finally merge the results.
Take the most common example, word count: first partition the text into pieces, then run map on each piece. Map turns every word in the text into a {word : 1} pair; reduce then adds up the values for each key to get that piece's result, and the per-piece results are merged into the final output. MapReduce itself is not a particularly deep technique, but its strength is that it fits distributed systems well, which is why the assignment implements the master + worker model.

My approach in the assignment: run the map phase first. The master spawns as many goroutines as there are map tasks (one per partition). These goroutines do nothing but wait for an idle worker; as soon as one is available they claim it, run the map task on it, and release the worker when the task finishes, so I use a workChannel to push and pop the currently idle workers. When a goroutine's map task completes, it sends a completion signal to the master (through a workDoneChannel); once the master sees that all spawned goroutines have finished, it starts the reduce phase, which follows the same pattern (see the sketch below).
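
A rough, runnable sketch of that scheme, using the channel names mentioned above. In the real master each task is an RPC to the chosen worker, and a failed RPC would put the task back to be retried (Part III) instead of marking it done; that error handling is omitted here.

package main

import "fmt"

// runPhase spawns one goroutine per task. Each goroutine claims an idle
// worker from workChannel, "runs" the task (an RPC in the real master),
// releases the worker, and reports completion on workDoneChannel. The
// master waits for every task before starting the next phase.
func runPhase(phase string, nTasks int, workChannel chan string, workDoneChannel chan int) {
    for task := 0; task < nTasks; task++ {
        go func(task int) {
            worker := <-workChannel // wait for an idle worker
            fmt.Printf("%s task %d ran on %s\n", phase, task, worker)
            workChannel <- worker // release the worker for the next task
            workDoneChannel <- task
        }(task)
    }
    for i := 0; i < nTasks; i++ {
        <-workDoneChannel
    }
}

func main() {
    // In the real master, workers show up on mr.registerChannel; here we
    // simply preload two of them.
    workChannel := make(chan string, 2)
    workChannel <- "worker-0"
    workChannel <- "worker-1"
    workDoneChannel := make(chan int)

    runPhase("map", 5, workChannel, workDoneChannel)
    runPhase("reduce", 3, workChannel, workDoneChannel) // same pattern
}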


Assignment 2: Primary/Backup Key/Value Service

Part A Due: Saturday Oct 5, 11:59:59pm

Part B Due: Saturday Oct 12, 11:59:59pm

Introduction

In the MapReduce assignment handling failures is relatively easy
because the workers don’t maintain state. The master does maintain
state, but you didn’t have to make the master fault tolerant. This
assignment is a first step towards making stateful servers fault tolerant.

Road map for Assignment 2-4

In the next 3 assignments you will build several key/value
services. The service supports three RPCs: Put(key, value),
PutHash(key, value), and Get(key). The service maintains a simple
database of key/value pairs. Put() updates the value for a particular
key in the database. PutHash chains all values for a key together,
which is useful for testing purposes; PutHash stores hash(old
value of the key in the database, new supplied value) into the database and
returns the old value. Get() fetches the current value for a key.
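
To make the PutHash semantics concrete, here is a small self-contained sketch. The actual hash function and the way the old and new values are combined (simple concatenation below) are defined by the skeleton code, so treat both as placeholders.

package main

import (
    "fmt"
    "hash/fnv"
    "strconv"
)

// hash is a placeholder for the hash function supplied with the skeleton.
func hash(s string) uint32 {
    h := fnv.New32a()
    h.Write([]byte(s))
    return h.Sum32()
}

// putHash stores hash(old value, new value) under the key and returns the
// old value, as described above.
func putHash(db map[string]string, key, value string) string {
    prev := db[key]
    db[key] = strconv.Itoa(int(hash(prev + value)))
    return prev
}

func main() {
    db := map[string]string{}
    fmt.Println(putHash(db, "k", "a")) // "" -- no previous value
    fmt.Println(putHash(db, "k", "b")) // the value stored by the first call
    fmt.Println(db["k"])
}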

These 3 assignments differ in the degree of fault tolerance,
performance, and scalability they provide for the key/value service:

In all three assignments you will have to do substantial design. We
give you a sketch of the overall design (and code for the boring
pieces), but you will have to flesh it out and nail down a complete
protocol. The test cases test failure scenarios to see if your
protocol is correct for that scenario. It is likely that some of the
test cases will point out a flaw in your design and protocol, and you
may have to redesign your implementation and protocol. Think
carefully before you start coding so that you can avoid many
iterations. We don’t give you a description of the test cases (other
than the Go code); in the real world, you would have to come up with
them yourself.

Overview of Assignment 2

In this assignment you’ll make a key/value service fault-tolerant
using a form of primary/backup replication. In order to ensure that
all parties (clients and servers) agree on which server is the
primary, and which is the backup, we’ll introduce a kind of master
server, called the viewservice. The viewservice monitors whether each
available server is alive or dead. If the current primary or backup
becomes dead, the viewservice selects a server to replace it. A client
checks with the viewservice to find the current primary. The servers
cooperate with the viewservice to ensure that at most one primary is
active at a time.

Your key/value service will allow replacement of failed servers. If
the primary fails, the viewservice will promote the backup to be
primary. If the backup fails, or is promoted, and there is an idle
server available, the viewservice will cause it to be the backup.
The primary will send its complete database to the new backup,
and then send subsequent Puts to the backup to ensure that the
backup’s key/value database remains identical to the primary’s.

It turns out that the primary must send Gets as well as Puts to the backup
(if there is one), and must wait for the backup to reply before
responding to the client. This helps prevent two servers from acting
as primary (a “split brain”). An example: S1 is the primary and S2 is
the backup. The viewservice decides (incorrectly) that S1 is dead,
and promotes S2 to be primary. If a client thinks that S1 is still the
primary and sends it an operation, S1 will forward the operation to
S2, and S2 will reply with an error indicating that it is no longer
the backup (assuming S2 obtained the new view from the viewservice).
S1 can then return an error to the client indicating that S1 might no
longer be the primary (reasoning that, since S2 rejected the
operation, a new view must have been formed); the client can then ask
the viewservice for the correct primary (S2) and send it the
operation.

A failed key/value server may restart, but it will do so without a
copy of the replicated data (i.e. the keys and values). That is, your
key/value server will keep the data in memory, not on disk. One
consequence of keeping data only in memory is that if there’s no
backup, and the primary fails, and then restarts, it cannot then act
as primary.

Only RPC may be used for interaction between clients and servers,
between different servers, and between different clients. For example,
different instances of your server are not allowed to share Go
variables or files.

The design outlined in the assignment has some fault-tolerance and
performance limitations:

We will address these limitations in later assignments by using better
designs and protocols. This assignment will help you understand what
the tricky issues are so that you can come up with better designs and protocols.
Also, parts of this assignment’s design (e.g., a separate view
service) are uncommon in practice.

The primary/backup scheme in this assignment is not based on any
published protocol. In fact, this assignment doesn’t specify a
complete protocol; you must flesh out the protocol. The protocol has
similarities with Flat Datacenter Storage (the viewservice is like
FDS’s metadata server, and the primary/backup servers are like FDS’s
tractservers), though FDS pays far more attention to performance.
It’s also a bit like a MongoDB replica set (though MongoDB selects the
leader with a Paxos-like election). For a detailed description of a
(different) primary-backup-like protocol, see Chain Replication.
Chain Replication has higher performance than this assignment’s
design, though it assumes that the viewservice never declares a
server dead when it is merely partitioned. See Harp and Viewstamped
Replication for a detailed treatment of high-performance
primary/backup and reconstruction of system state after various kinds
of failures.

Collaboration Policy

Please refer to Assignment 0.

Software

Do a git pull to get the latest assignment software. We supply you
with new skeleton code and new tests in src/viewservice and
src/pbservice.

$ cd ~/4113
$ git pull
...
$ cd src/viewservice
$ go test
2012/12/28 14:51:47 method Kill has wrong number of ins: 1
First primary: --- FAIL: Test1 (1.02 seconds)
test_test.go:13: wanted primary /var/tmp/viewserver-35913-1, got
FAIL
exit status 1
FAIL _/afs/athena.mit.edu/user/r/t/rtm/4113/src/viewservice 1.041s

Ignore the method Kill error message now and in the future.
Our test code fails because viewservice/server.go has empty
RPC handlers.

You can run your code as stand-alone programs using the source in
main/viewd.go,
main/pbd.go, and
main/pbc.go.
See the comments in pbc.go.

Part A: The Viewservice

First you’ll implement a viewservice and make sure it passes our tests; in
Part B you’ll build the key/value service. Your viewservice won’t itself be
replicated, so it will be relatively straightforward. Part B is much harder than
part A, because the K/V service is replicated and you have to flesh out the
replication protocol.

The viewservice goes through a sequence of numbered
views, each with a primary and (if possible) a backup.
A view consists of a view number and the identity (network port name) of
the view’s primary and backup servers.

The primary in a view must always be either the primary
or the backup of the previous view. This helps ensure
that the key/value service’s state is preserved.
An exception: when the viewservice first starts, it should
accept any server at all as the first primary.
The backup in a view can be any server (other than the primary),
or can be altogether missing if no server is available
(represented by an empty string, “”).

Each key/value server should send a Ping RPC once per
PingInterval
(see viewservice/common.go).
The viewservice replies to the Ping with a description of the current
view. A Ping lets the viewservice know that the key/value
server is alive; informs the key/value server of the current
view; and informs the viewservice of the most recent view
that the key/value server knows about.
If the viewservice doesn’t receive a Ping from a server
for DeadPings PingIntervals, the
viewservice should consider the server to be dead.
When a server re-starts after a crash, it should send
one or more Pings with an argument of zero to inform
the viewservice that it has crashed (of course, duplicate
Ping(0) calls will be interpreted as repeated
crashes).
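
The description above corresponds to a handful of types and constants in viewservice/common.go. The sketch below is my reading of what they roughly look like; the real file is authoritative, and the constant values here are only illustrative.

package viewservice

import "time"

// A View is one numbered configuration: the network names of the primary
// and backup, where "" means there is no backup.
type View struct {
    Viewnum uint
    Primary string
    Backup  string
}

// How often servers ping, and how many missed pings before a server is
// considered dead (values illustrative; use the ones in the real file).
const PingInterval = 100 * time.Millisecond
const DeadPings = 5

// A Ping tells the viewservice who the sender is and the most recent view
// it knows about; the reply carries the viewservice's current view.
type PingArgs struct {
    Me      string
    Viewnum uint
}

type PingReply struct {
    View View
}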

The viewservice proceeds to a new view when it hasn't
received a Ping from the primary or backup for DeadPings
PingIntervals, when the primary or backup has crashed and
restarted, or when there is no backup and there's an idle server
(a server that’s been Pinging but is
neither the primary nor the backup).
But the viewservice must not change views (i.e., return
a different view to callers) until
the primary from the current view acknowledges
that it is operating in the current view (by sending
a Ping with the current view number). If the viewservice has not yet
received an acknowledgment for the current view from the primary of
the current view, the viewservice should not change views even if it
thinks that the primary or backup has died.

The acknowledgment rule prevents the viewservice from getting more than one
view ahead of the key/value servers. If the viewservice could get arbitrarily
far ahead, then it would need a more complex design in which it kept a history
of views, allowed key/value servers to ask about old views, and
garbage-collected information about old views when appropriate. The downside of
the acknowledgement rule is that if the primary fails before it acknowledges the
view in which it is primary, then the viewservice cannot change views, spins
forever, and cannot make forward progress.

An example sequence of view changes:

The above example is overspecified; for example, when the view server
gets Ping(1) from S1 for the first time, it is also OK for it
to return view 1, as long as it eventually switches to view 2 (which
includes S2).

We provide you with a complete client.go and
appropriate RPC definitions in common.go.
Your job is to supply the needed code in server.go.
When you’re done, you should pass all the viewservice
tests:

$ cd ~/4113/src/viewservice
$ go test
Test: First primary ...
... Passed
Test: First backup ...
... Passed
Test: Backup takes over if primary fails ...
... Passed
Test: Restarted server becomes backup ...
... Passed
Test: Idle third server becomes backup if primary fails ...
... Passed
Test: Restarted primary treated as dead ...
... Passed
Test: Viewserver waits for primary to ack view ...
... Passed
Test: Uninitialized server can't become primary ...
... Passed
PASS
ok viewservice 7.457s

The above output omits some benign Go RPC errors.

Hint: you’ll want to add field(s) to ViewServer in
server.go in order to keep track of the most recent
time at which the viewservice has heard a Ping from each
server. Perhaps a map from server names to
time.Time. You can find the current time with time.Now().

Hint: add field(s) to ViewServer to keep track of the
current view.

Hint: you’ll need to keep track of whether the primary for the
current view has acknowledged it (in PingArgs.Viewnum).

Hint: your viewservice needs to make periodic decisions, for
example to promote the backup if the viewservice has missed DeadPings
pings from the primary. Add this code to the tick()
function, which is called once per PingInterval.
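
Putting these hints together, here is a sketch of the extra state and of the liveness check in tick(). The field names and the dead() helper are mine, the actual view-change logic is elided, and it assumes the View/PingInterval/DeadPings definitions from viewservice/common.go.

package viewservice

import (
    "sync"
    "time"
)

type ViewServer struct {
    mu       sync.Mutex
    view     View                 // the current view
    acked    bool                 // has this view's primary acked it yet?
    lastPing map[string]time.Time // last time each server pinged us
}

// tick is called once per PingInterval and decides whether to change views.
func (vs *ViewServer) tick() {
    vs.mu.Lock()
    defer vs.mu.Unlock()

    dead := func(server string) bool {
        return server != "" &&
            time.Since(vs.lastPing[server]) > DeadPings*PingInterval
    }

    if !vs.acked {
        return // never move past a view the primary has not acknowledged
    }
    if dead(vs.view.Primary) {
        // promote the backup, pick a new backup, bump Viewnum, clear acked ...
    } else if dead(vs.view.Backup) {
        // replace the backup with an idle server, bump Viewnum, clear acked ...
    }
}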

Hint: there may be more than two servers sending Pings. The
extra ones (beyond primary and backup) are volunteering
to be backup if needed.

Hint: the viewservice needs a way to detect that a primary
or backup has failed and re-started. For example, the primary
may crash and quickly restart without missing sending a
single Ping.

Hint: study the test cases before you start programming. If you fail a
test, you may have to look at the test code in test_test.go to figure
out what the failure scenario is.

The easiest way to track down bugs is to insert log.Printf()
statements, collect the output in a file with go test > out, and
then think about whether the output matches your
understanding of how your code should behave. The last step is most important.

Remember that the Go RPC server framework starts a new thread for each
received RPC request. Thus if multiple RPCs arrive at the same time
(from multiple clients), there may be multiple threads running
concurrently in the server.

The tests kill a server by setting its dead flag. You must
make sure that your server terminates correctly when that flag is set; otherwise
you may fail to complete the test cases.

Part B: The primary/backup key/value service

Your key/value service should continue operating correctly as long as
there has never been a time at which no server was alive. It should
also operate correctly with partitions: a server that suffers from
temporary network failure without crashing, or can talk to some
computers but not others. If your service is operating with just one
server, it should be able to incorporate a recovered or idle server
(as backup), so that it can then tolerate another server failure.

Correct operation means that calls to Clerk.Get(k) return the latest
value set by a successful call to Clerk.Put(k,v) or
Clerk.PutHash(k,v), or an empty string if the key has never been
Put()'ed. All operations should provide at-most-once semantics.

You should assume that the viewservice never halts or crashes.

Your clients and servers may only communicate using RPC, and both
clients and servers must
send RPCs with the call() function in client.go.

It’s crucial that only one primary be active at any given time. You
should have a clear story worked out for why that’s the case for your
design. A danger: suppose in some view S1 is the primary; the viewservice changes
views so that S2 is the primary; but S1 hasn’t yet heard about the new
view and thinks it is still primary. Then some clients might talk to
S1, and others talk to S2, and not see each other’s Put()s.

A server that isn’t the active primary should either not respond to
clients, or respond with an error: it should set GetReply.Err or
PutReply.Err to something other than OK.

Clerk.Get(), Clerk.Put(), and Clerk.PutHash() should only return when they
have completed the operation. That is, a Put should keep trying until it has
updated the key/value database, and Clerk.Get() should keep trying until it has
retrieved the current value for the key (if any). Your server must filter out
the duplicate RPCs that these client re-tries will generate to ensure
at-most-once semantics for operations. You can assume that each clerk has only
one outstanding Put or Get. Think carefully about what the commit point is for
a Put.
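
Here is a self-contained sketch of that client-side retry behavior. sendPut only simulates the RPC; the real Clerk would ask the viewservice for the current primary and send the request with the call() helper in client.go, reusing the same request ID on every retry so the server can filter duplicates.

package main

import (
    "errors"
    "fmt"
    "time"
)

var attempts = 0

// sendPut stands in for an RPC to the current primary; it fails on the
// first two attempts to illustrate the retry behavior.
func sendPut(key, value string, id int64) error {
    attempts++
    if attempts < 3 {
        return errors.New("no reply, or the server is not the primary")
    }
    return nil
}

// clerkPut keeps retrying (with the same request ID) until a primary
// acknowledges the update.
func clerkPut(key, value string, id int64) {
    for {
        if err := sendPut(key, value, id); err == nil {
            return
        }
        time.Sleep(10 * time.Millisecond) // back off, then try again
    }
}

func main() {
    clerkPut("k", "v", 12345) // the real Clerk would pick the ID with nrand()
    fmt.Println("Put acknowledged after", attempts, "attempts")
}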

A server should not talk to the viewservice for every Put/Get
it receives, since that would put the viewservice on the critical path
for performance and fault-tolerance. Instead servers should
Ping the viewservice periodically
(in pbservice/server.go‘s tick())
to learn about new views.

Part of your one-primary-at-a-time strategy should rely on the
viewservice only promoting the backup from view i
to be primary in view i+1. If the old primary from
view i tries to handle a client request, it will
forward it to its backup. If that backup hasn’t heard about
view i+1, then it’s not acting as primary yet, so
no harm done. If the backup has heard about view i+1
and is acting as primary, it knows enough to reject the old
primary’s forwarded client requests.

You’ll need to ensure that the backup sees every update to the
key/value database, by a combination of the primary initializing it with
the complete key/value database and forwarding subsequent
client Puts.

The skeleton code for the key/value servers is in src/pbservice.
It uses your viewservice, so you’ll have to set up
your GOPATH as follows:

$ export GOPATH=$HOME/4113
$ cd ~/4113/src/pbservice
$ go test -i
$ go test
Single primary, no backup: --- FAIL: TestBasicFail (2.00 seconds)
test_test.go:50: first primary never formed view
--- FAIL: TestFailPut (5.55 seconds)
test_test.go:165: wrong primary or backup
Concurrent Put()s to the same key: --- FAIL: TestConcurrentSame (8.51 seconds)
...
Partition an old primary: --- FAIL: TestPartition (3.52 seconds)
test_test.go:354: wrong primary or backup
...

Here’s a recommended plan of attack:

You’re done if you can pass all the pbservice tests:

$ cd ~/4113/src/pbservice
$ go test
Test: Single primary, no backup ...
... Passed
Test: Add a backup ...
... Passed
Test: Primary failure ...
... Passed
Test: Kill last server, new one should not be active ...
... Passed
Test: at-most-once Put; unreliable ...
... Passed
Test: Put() immediately after backup failure ...
... Passed
Test: Put() immediately after primary failure ...
... Passed
Test: Concurrent Put()s to the same key ...
... Passed
Test: Concurrent Put()s to the same key; unreliable ...
... Passed
Test: Repeated failures/restarts ...
... Put/Gets done ...
... Passed
Test: Repeated failures/restarts; unreliable ...
... Put/Gets done ...
... Passed
Test: Old primary does not serve Gets ...
... Passed
Test: Partitioned old primary does not complete Gets ...
... Passed
PASS
ok pbservice 113.352s

You’ll see some “method Kill has wrong number of ins” complaints
and lots of “rpc: client protocol error” and “rpc: writing response”
complaints; ignore them.

Hint: you’ll probably need to create new RPCs to forward client
requests from primary to backup, since the backup should reject
a direct client request but should accept a forwarded request.

Hint: you’ll probably need to create new RPCs to handle the transfer
of the complete key/value database from the primary to a new backup.
You can send the whole database in one RPC (for example,
include a map[string]string in the RPC arguments).
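
For example, the extra RPCs might carry arguments like the ones below. Every name here is hypothetical rather than part of the provided skeleton; the point is that the forwarded request carries the client's request ID (so the backup can filter duplicates too), and that a new backup receives the duplicate-filter state along with the database.

package pbservice

// Hypothetical RPC from the primary to the backup for one client Put.
// The backup's handler rejects it if, according to the view the backup
// knows, it is no longer the backup for this primary.
type ForwardPutArgs struct {
    Key   string
    Value string
    ID    int64 // the client's request ID
    View  uint  // the view the primary believes it is operating in
}

type ForwardPutReply struct {
    Err string // "OK", or an error if the receiver is not acting as backup
}

// Hypothetical RPC for initializing a brand-new backup: the whole database
// plus the duplicate-filter state are sent together.
type TransferArgs struct {
    Database map[string]string
    Seen     map[int64]string // request ID -> reply previously returned
}

type TransferReply struct {
    Err string
}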

Hint: the state to filter duplicates must be replicated along with the key/value
state.

Hint: the tester arranges for RPC replies to be lost in tests whose
description includes “unreliable”. This will cause RPCs to be executed
by the receiver, but since the sender sees no reply, it cannot
tell whether the server executed the RPC.

You may find you want to generate numbers that have
a high probability of being unique. Try this:

import "crypto/rand"
import "math/big"

func nrand() int64 {
    max := big.NewInt(int64(1) << 62)
    bigx, _ := rand.Int(rand.Reader, max)
    x := bigx.Int64()
    return x
}

The tests kill a server by setting its dead flag. You must
make sure that your server terminates correctly when that flag is set; otherwise
you may fail to complete the test cases.

Hint: even if your viewserver passed all the tests in Part A, it
may still have bugs that cause failures in Part B.

Hint: study the test cases before you start programming.

Handin procedure

You hand in your assignment as before.

For Part A:

$ git commit -am "[you fill me in]"
$ git tag -a -m "i finished assignment 2a" a2ahandin
$ git push origin master
$ git push origin a2ahandin

For Part B:

$ git commit -am "[you fill me in]"
$ git tag -a -m "i finished assignment 2b" a2bhandin
$ git push origin master
$ git push origin a2bhandin

You should verify that you are able to see your final commit and tags
on the Github page of your repository for this assignment.

You will receive full credit if your software passes the
test_test.go tests when we run your software on our machines.
We will use the timestamp of your last handin tag for the purpose
of calculating late days and we will only grade that version of the code.
(We’ll also know if you backdate the tag, don’t do that.)

Questions

Please post questions on Piazza.

Assignment 2: The word count in Assignment 1 is stateless, so handling faults was simple. Most real-world systems, however, are stateful, and the next three assignments tackle that problem in increasing depth.

In Assignment 2 we implement a key/value database. It is not stateless, so we need replication. We use a primary + backup pair: the primary is the server currently holding and serving the data, and the backup is a standby that keeps exactly the same data as the primary. The question then becomes: how does a client know which server is the primary and which is the backup? Clients cannot just send data to whichever of the two servers they like; all data traffic must go through the primary, and the backup only takes over when the primary dies. The approach in this assignment is to assume a viewserver that never dies and constantly monitors the primary and the backup. Before sending data, a client first asks the viewserver for the address of the current primary and then talks to that server.

Part A: Part A is relatively simple; it is mainly about implementing the viewserver, which we assume never dies. It stores the view number plus the identities of the primary and backup, while the primary and backup ping it at a fixed interval with the view number they currently know. There are two cases: a healthy server's pings keep arriving, and a dead server's pings stop. When the viewserver sees the primary report back the same view number, it knows the primary has ACKed the view. If the primary has ACKed the current view and the viewserver then goes too long without hearing from the primary or backup, it considers them dead, takes the corresponding action, and advances the view number. The assignment explains why the view number may only change after the primary's ACK.

Part B: Part B is considerably more complex than Part A, mainly because here the primary and backup can die at any time, whereas Part A's viewserver is assumed never to die. The key to Part B is keeping the primary's and backup's data identical at all times. We need a mechanism that continuously shares the primary's data with the backup so that, if the primary dies, the data stored on the backup is correct. In my solution I used two RPCs: one sends the primary's entire database to a newly started backup as soon as the primary learns about it; the other forwards every client Put received by the primary to the backup. The forward must go to the backup first, and only after the backup confirms that it accepted the update does the primary apply it; otherwise, if the backup failed to accept the update but the primary applied it anyway, the system would diverge.

Another difficulty is that the system can experience network faults, i.e., communication between the primary and the backup may be unreliable. The solution is an at-most-once mechanism: every client Put and Get request carries a unique ID, and the client keeps retrying a request until it gets a result. Once the primary and backup have fully executed a Put, they record the result in a {uniqueID : result} map; the next time the client sends the same ID, we return the recorded result directly and do nothing else.

Here I hit a serious bug. Two clients concurrently send put1(1, 1) and put2(1, 2). put1 reaches the primary and is forwarded to the backup; the backup executes it and replies, but the reply is lost, so the primary's response to the client fails (at this point the backup holds {1, 1}). Before the client retries put1, put2 arrives and completes normally (now both the primary and the backup hold {1, 2}). Then client 1 resends put1 with its unique ID. The backup has already recorded that ID, so it does not update; but the primary has no record of the ID and updates to {1, 1}. Now the primary and the backup disagree. See the test case TestConcurrentSameUnreliable.
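
A self-contained sketch of the duplicate filter described above: each request carries a unique ID, the result of a completed Put is remembered, and a retry with the same ID returns the remembered result instead of executing again. In the real service this map has to stay identical on the primary and the backup, which is exactly what the bug above violates.

package main

import "fmt"

type server struct {
    db   map[string]string
    seen map[int64]string // request ID -> value previously returned to the client
}

// put applies an update at most once per request ID.
func (s *server) put(id int64, key, value string) string {
    if prev, dup := s.seen[id]; dup {
        return prev // duplicate retry: return the remembered result, change nothing
    }
    prev := s.db[key]
    s.db[key] = value
    s.seen[id] = prev
    return prev
}

func main() {
    s := &server{db: map[string]string{}, seen: map[int64]string{}}
    s.put(1, "k", "a")
    s.put(2, "k", "b")
    s.put(1, "k", "a") // a client retry: filtered out, db["k"] stays "b"
    fmt.Println(s.db["k"], s.seen)
}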
这里就遇到了一个很严重的bug:对于多个client同时发来的请求put1(1, 1)和put2(1, 2),put1进入Primary后传给Backup,Backup执行成功返回Primary,但这时返回丢失,Primary返回给client失败(此时Backup{1, 1}),在client再次请求put1之前,put2进来了并且一切正常(此时Primary{1, 2},Backup{1, 2}),之后client1再次发送有uniqueID的请求put1,由于Backup此前已经存有了这个UniqueID,那么不会更新结果,但Primary没有这个UniqueID,会更新为{1, 1},这时就导致了Primary和Backup的不一样。详见testcase:TestConcurrentSameUnreliable。