MIT6.5840-2025 Lab1: MapReduce

Lab1 URL: https://pdos.csail.mit.edu/6.824/labs/lab-mr.html

Task

Your job is to implement a distributed MapReduce, consisting of two programs, the coordinator and the worker. There will be just one coordinator process, and one or more worker processes executing in parallel. In a real system the workers would run on a bunch of different machines, but for this lab you’ll run them all on a single machine. The workers will talk to the coordinator via RPC. Each worker process will, in a loop, ask the coordinator for a task, read the task’s input from one or more files, execute the task, write the task’s output to one or more files, and again ask the coordinator for a new task. The coordinator should notice if a worker hasn’t completed its task in a reasonable amount of time (for this lab, use ten seconds), and give the same task to a different worker.
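The worker loop described above can be sketched roughly as follows. This is only an illustration under assumptions: the `taskSource` interface, the `Exit` task kind, and the in-memory `queue` are hypothetical stand-ins for the real RPC calls to the coordinator, and none of them come from the lab skeleton.

```go
package main

import "fmt"

// TaskKind, Task, and the Exit kind are hypothetical names for this sketch.
type TaskKind int

const (
	MapTask TaskKind = iota
	ReduceTask
	Exit // assumed signal that no work remains
)

type Task struct {
	Kind TaskKind
	ID   int
}

// taskSource stands in for "ask the coordinator via RPC".
type taskSource interface {
	Next() Task // request a task
	Done(Task)  // report completion
}

// workerLoop asks for a task, runs it, reports it, and repeats until Exit.
// It returns the number of tasks executed.
func workerLoop(src taskSource, run func(Task)) int {
	n := 0
	for {
		t := src.Next()
		if t.Kind == Exit {
			return n
		}
		run(t) // read input files, execute, write output files
		src.Done(t)
		n++
	}
}

// queue is an in-memory taskSource used only to exercise the loop.
type queue struct {
	pending  []Task
	finished []int
}

func (q *queue) Next() Task {
	if len(q.pending) == 0 {
		return Task{Kind: Exit}
	}
	t := q.pending[0]
	q.pending = q.pending[1:]
	return t
}

func (q *queue) Done(t Task) { q.finished = append(q.finished, t.ID) }

func main() {
	q := &queue{pending: []Task{{Kind: MapTask, ID: 0}, {Kind: ReduceTask, ID: 1}}}
	n := workerLoop(q, func(t Task) { fmt.Println("running task", t.ID) })
	fmt.Println("tasks executed:", n)
}
```

In an actual solution, `Next` and `Done` would presumably be RPC calls, and `run` would dispatch to the map or reduce function.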

Architecture Diagram

Design Details

How should crashed workers be handled?

The best you can do is have the coordinator wait for some amount of time, and then give up and re-issue the task to a different worker. For this lab, have the coordinator wait for ten seconds; after that the coordinator should assume the worker has died (of course, it might not have).
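One way to express the ten-second rule is to record when each task was handed out and to treat an in-progress task as assignable again once that timestamp is too old. The sketch below assumes a hypothetical, stripped-down `Coordinator` with `assign` and `complete` helpers. These names are illustrative, not the lab skeleton's API, and time is passed in explicitly so the behavior can be checked without sleeping.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type taskState int

const (
	idle taskState = iota
	inProgress
	completed
)

type task struct {
	state     taskState
	startedAt time.Time
}

// Coordinator is a hypothetical stand-in used only for this sketch.
type Coordinator struct {
	mu      sync.Mutex
	tasks   []task
	timeout time.Duration
}

// assign returns the index of the first task that is idle, or that has been
// in progress for at least the timeout. It returns -1 if none qualifies.
func (c *Coordinator) assign(now time.Time) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	for i := range c.tasks {
		t := &c.tasks[i]
		expired := t.state == inProgress && now.Sub(t.startedAt) >= c.timeout
		if t.state == idle || expired {
			t.state = inProgress
			t.startedAt = now
			return i
		}
	}
	return -1
}

// complete marks a task as finished so it is never re-issued.
func (c *Coordinator) complete(id int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tasks[id].state = completed
}

// simulate walks through a small scenario with two tasks.
func simulate() []int {
	c := &Coordinator{tasks: make([]task, 2), timeout: 10 * time.Second}
	start := time.Now()
	var got []int
	got = append(got, c.assign(start))                     // 0: first idle task
	got = append(got, c.assign(start))                     // 1: second idle task
	got = append(got, c.assign(start.Add(5*time.Second)))  // -1: nothing timed out yet
	c.complete(1)                                          // worker for task 1 finished
	got = append(got, c.assign(start.Add(11*time.Second))) // 0: task 0 is re-issued
	return got
}

func main() {
	fmt.Println(simulate()) // prints [0 1 -1 0]
}
```

Because the original worker may only be slow rather than dead, the same task can end up running twice, so task output should be written in a way that tolerates duplicate completions.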

What should be registered for Go RPC?

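Type registration in Go's RPC system follows a specific design philosophy. In essence, the RPC framework expects you to register service endpoints (objects that expose callable methods), not the plain data carriers passed between them. It is like a restaurant: what goes on the roster is the waiter who provides the service, not the ingredients handed around the kitchen.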

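The best practices that follow from this:

  1. Register only objects that provide a service (such as Coordinator). Their methods must match the RPC method signature rules: an exported method name, two arguments with the second a pointer, and an error return value.
  2. Keep data transfer structs simple. They are message carriers, not service providers. The gob encoder serializes them automatically, much as a courier already knows how to pack a standard-size parcel.
  3. A type registration error is like a "road closed" sign. When you see one, check:
    • whether a data transfer struct was registered as a service endpoint by mistake
    • whether the method signatures meet the RPC format requirements
    • whether both sides of the connection use the same type definitions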

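This design reflects Go's preference for the explicit over the implicit: developers must clearly separate service providers from data transfer objects. In a postal system, likewise, the post office (service provider) and the letter (data carrier) play different roles.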
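A minimal sketch of the distinction, using the standard `net/rpc` package: the `Coordinator` value is registered, while the argument and reply structs are never registered. `TaskArgs`, `TaskReply`, `AssignTask`, and the reply values are hypothetical names chosen for illustration. The sketch also uses an in-memory `net.Pipe` connection to stay self-contained, which is an assumption of this example and not how the lab wires its sockets.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// TaskArgs and TaskReply are hypothetical message types: plain data,
// exported fields, never passed to Register.
type TaskArgs struct {
	WorkerID int
}

type TaskReply struct {
	TaskID   int
	Filename string
}

// Coordinator is the service endpoint, the only thing that gets registered.
type Coordinator struct{}

// AssignTask has the shape net/rpc requires: exported method, two arguments,
// the second a pointer, and an error return value.
func (c *Coordinator) AssignTask(args *TaskArgs, reply *TaskReply) error {
	reply.TaskID = args.WorkerID + 100 // arbitrary illustrative value
	reply.Filename = "pg-example.txt"  // hypothetical input file name
	return nil
}

// callOnce registers the service, serves one in-memory connection, and
// performs a single call from the client side.
func callOnce(workerID int) (TaskReply, error) {
	server := rpc.NewServer()
	if err := server.Register(&Coordinator{}); err != nil {
		return TaskReply{}, err
	}
	clientConn, serverConn := net.Pipe()
	go server.ServeConn(serverConn)

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var reply TaskReply
	err := client.Call("Coordinator.AssignTask", &TaskArgs{WorkerID: workerID}, &reply)
	return reply, err
}

// registerDataType shows the mistake: a struct with no RPC methods is
// rejected by Register with an error.
func registerDataType() error {
	return rpc.NewServer().Register(&TaskArgs{})
}

func main() {
	reply, err := callOnce(7)
	fmt.Println(reply.TaskID, reply.Filename, err)
	fmt.Println("registering a data struct fails:", registerDataType() != nil)
}
```

The service name used in `Call` is derived from the registered type's name, which is why the client addresses the method as `Coordinator.AssignTask`.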

What can and cannot be sent over Go RPC?

Not transmittable: function types and closures. The RPC system carries data only; it cannot carry code.

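Go RPC uses gob encoding by default, which supports:

  • basic types (int, string, and so on)
  • structs (exported fields only)
  • arrays, slices, and maps
  • nested structs (only the exported fields at each level are sent)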
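These rules can be checked directly with `encoding/gob`, without any RPC involved. In the sketch below, `Message`, `roundTrip`, and `encodeFunc` are hypothetical names for illustration. It round-trips a struct through gob to show that exported fields survive and the unexported one is dropped, and it shows that encoding a function value at the top level returns an error.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Message is a hypothetical payload mixing supported field kinds.
type Message struct {
	ID      int
	Files   []string
	Counts  map[string]int
	private string // unexported: gob skips it silently
}

// roundTrip encodes a Message with gob and decodes it into a fresh value.
func roundTrip(in Message) (Message, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return Message{}, err
	}
	var out Message
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

// encodeFunc tries to encode a function value at the top level,
// which gob refuses.
func encodeFunc() error {
	var buf bytes.Buffer
	return gob.NewEncoder(&buf).Encode(func() {})
}

func main() {
	in := Message{
		ID:      1,
		Files:   []string{"a.txt", "b.txt"},
		Counts:  map[string]int{"the": 2},
		private: "secret",
	}
	out, err := roundTrip(in)
	fmt.Println(out.ID, out.Files, out.Counts, err)
	fmt.Printf("private after round trip: %q\n", out.private)
	fmt.Println("encoding a func fails:", encodeFunc() != nil)
}
```

One detail to keep in mind: when a func or chan appears as a struct field and not at the top level, gob treats it like an unexported field and ignores it without reporting an error, so such a field arrives as its zero value on the receiving side.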
