MIT6.5840-2025 Lab1: MapReduce

Exisfar
Task
Your job is to implement a distributed MapReduce, consisting of two programs, the coordinator and the worker. There will be just one coordinator process, and one or more worker processes executing in parallel. In a real system the workers would run on a bunch of different machines, but for this lab you’ll run them all on a single machine. The workers will talk to the coordinator via RPC. Each worker process will, in a loop, ask the coordinator for a task, read the task’s input from one or more files, execute the task, write the task’s output to one or more files, and again ask the coordinator for a new task. The coordinator should notice if a worker hasn’t completed its task in a reasonable amount of time (for this lab, use ten seconds), and give the same task to a different worker.
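The worker loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `TaskKind`, `Task`, `WaitTask`, `ExitTask`, and `runWorker` are hypothetical and not part of the lab skeleton, and the RPC call to the coordinator is replaced by a plain function value so the example runs on its own.

```go
package main

import (
	"fmt"
	"time"
)

// TaskKind is a hypothetical task classification, not taken from the lab code.
type TaskKind int

const (
	MapTask TaskKind = iota
	ReduceTask
	WaitTask // assumption: nothing is available right now, ask again shortly
	ExitTask // assumption: the whole job is finished, the worker may stop
)

// Task is a hypothetical description of one unit of work.
type Task struct {
	Kind TaskKind
	ID   int
}

// runWorker repeatedly asks for a task via next and executes it via run.
// In the real lab, next would be an RPC call to the coordinator.
// It returns the number of tasks executed.
func runWorker(next func() Task, run func(Task)) int {
	executed := 0
	for {
		t := next()
		switch t.Kind {
		case ExitTask:
			return executed
		case WaitTask:
			time.Sleep(time.Millisecond)
		default:
			run(t)
			executed++
		}
	}
}

// demo feeds the worker a fixed queue of tasks and records what it ran.
func demo() []string {
	queue := []Task{{MapTask, 0}, {WaitTask, 0}, {ReduceTask, 0}, {ExitTask, 0}}
	var trace []string
	next := func() Task {
		t := queue[0]
		queue = queue[1:]
		return t
	}
	run := func(t Task) {
		kind := "map"
		if t.Kind == ReduceTask {
			kind = "reduce"
		}
		trace = append(trace, fmt.Sprintf("%s-%d", kind, t.ID))
	}
	runWorker(next, run)
	return trace
}

func main() {
	fmt.Println(demo()) // [map-0 reduce-0]
}
```

Passing `next` as a function keeps the loop independent of the transport, which also makes it easy to test without starting a coordinator.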
Architecture Diagram
Design Details
How should crashed workers be handled?
The best you can do is have the coordinator wait for some amount of time, and then give up and re-issue the task to a different worker. For this lab, have the coordinator wait for ten seconds; after that the coordinator should assume the worker has died (of course, it might not have).
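One way to express this rule is to record when each task was handed out and treat an in-progress task as available again once ten seconds have passed. The sketch below is a minimal illustration, not the lab's reference solution: `tracker`, `assign`, and `complete` are hypothetical names, and the current time is passed in explicitly so the behaviour can be checked without actually waiting.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type taskState int

const (
	idle taskState = iota
	inProgress
	completed
)

type task struct {
	state     taskState
	startedAt time.Time
}

// tracker is a hypothetical, stripped-down piece of coordinator state.
type tracker struct {
	mu      sync.Mutex
	tasks   []task
	timeout time.Duration
}

func newTracker(n int, timeout time.Duration) *tracker {
	return &tracker{tasks: make([]task, n), timeout: timeout}
}

// assign returns the index of a task to hand out, or -1 if none is available.
// An in-progress task whose deadline has passed is re-issued.
func (t *tracker) assign(now time.Time) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	for i := range t.tasks {
		tk := &t.tasks[i]
		expired := tk.state == inProgress && now.Sub(tk.startedAt) >= t.timeout
		if tk.state == idle || expired {
			tk.state = inProgress
			tk.startedAt = now
			return i
		}
	}
	return -1
}

// complete marks a task as finished so it is never handed out again.
func (t *tracker) complete(i int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.tasks[i].state = completed
}

// demo simulates one task and three requests at different points in time.
func demo() []int {
	tr := newTracker(1, 10*time.Second)
	start := time.Now()
	first := tr.assign(start)                       // first worker gets task 0
	second := tr.assign(start.Add(5 * time.Second)) // still within the deadline
	third := tr.assign(start.Add(11 * time.Second)) // deadline passed: re-issued
	tr.complete(third)
	fourth := tr.assign(start.Add(30 * time.Second)) // completed: nothing left
	return []int{first, second, third, fourth}
}

func main() {
	fmt.Println(demo()) // [0 -1 0 -1]
}
```

Because the original worker "might not have" died, a late result for a re-issued task can still arrive; this sketch simply ignores that case, and a fuller design would need to decide how to treat duplicate completions.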
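What should be registered for Go RPC?
Type registration in Go's RPC system follows a specific design. The framework expects `rpc.Register` to receive a service endpoint (an object with callable methods), not a plain struct that merely carries data. It is like a restaurant: you register the waiter who provides the service, not the ingredients passed around the kitchen.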
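In practice this means:
- Register only objects that actually provide a service (such as the Coordinator). Their methods must follow the net/rpc conventions: the method is exported, takes two arguments of exported (or builtin) types with the second one a pointer, and returns `error`.
- Keep data-transfer structs simple: they are message carriers, not service providers. The gob encoder serializes them automatically, much as a courier already knows how to pack a standard-size parcel.
- When you hit a type-registration error, treat it as a "road closed" sign and re-check:
  - whether a data-transfer struct was registered as if it were a service endpoint
  - whether the method signatures meet the format RPC requires
  - whether both sides of the connection use the same type definitions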
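This design reflects Go's preference for the explicit over the implicit: developers must clearly separate service providers from data-transfer objects, just as a postal system distinguishes the post office (the service provider) from the letters (the data).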
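The distinction between service objects and message structs can be illustrated with a small sketch. The names `TaskArgs`, `TaskReply`, and `RequestTask` are hypothetical, and an in-memory `net.Pipe` connection is assumed here purely to keep the example self-contained, in place of whatever listener the real coordinator uses. Only the `Coordinator` is registered; the argument and reply structs are just passed to `Call`.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// TaskArgs and TaskReply are plain message structs (hypothetical names).
// They are never passed to Register.
type TaskArgs struct {
	WorkerID int
}

type TaskReply struct {
	TaskType string
	Filename string
}

// Coordinator is the service object; only this type is registered.
type Coordinator struct {
	files []string
}

// RequestTask has the shape net/rpc expects: an exported method with two
// arguments (the second a pointer) and a single error return value.
func (c *Coordinator) RequestTask(args *TaskArgs, reply *TaskReply) error {
	reply.TaskType = "map"
	reply.Filename = c.files[args.WorkerID%len(c.files)]
	return nil
}

// callOnce registers the service, connects over an in-memory pipe,
// and performs a single RPC.
func callOnce(workerID int) (TaskReply, error) {
	server := rpc.NewServer()
	svc := &Coordinator{files: []string{"pg-a.txt", "pg-b.txt"}}
	if err := server.Register(svc); err != nil {
		return TaskReply{}, err
	}
	clientConn, serverConn := net.Pipe()
	go server.ServeConn(serverConn)

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var reply TaskReply
	err := client.Call("Coordinator.RequestTask", &TaskArgs{WorkerID: workerID}, &reply)
	return reply, err
}

// registerDataStruct tries to register a message struct as a service.
// It has no suitable methods, so Register returns an error.
func registerDataStruct() error {
	return rpc.NewServer().Register(&TaskArgs{})
}

func main() {
	reply, err := callOnce(1)
	fmt.Println(reply.TaskType, reply.Filename, err) // map pg-b.txt <nil>
	fmt.Println(registerDataStruct() != nil)         // true
}
```

Note that `Register` also writes the failure to the standard logger, so the second call prints a log line to stderr in addition to returning the error.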
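What can and cannot be sent over Go RPC?
Function types and closures cannot be sent. The RPC system transfers data only, not code.
Go RPC uses gob encoding by default, which supports:
- Basic types (int, string, etc.)
- Structs (only exported fields are encoded)
- Arrays, slices, and maps
- Nested structures (as long as the fields at every level are exported)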
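These rules can be checked directly against `encoding/gob`, without any RPC involved. The sketch below uses made-up types (`Message`, `Inner`) to round-trip a value through an encoder and decoder: exported fields, maps, slices, and nested structs survive, the unexported field is silently dropped, and encoding a bare function value fails.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Inner and Message are hypothetical types used only for this illustration.
type Inner struct {
	Offsets []int
}

type Message struct {
	Name   string
	Counts map[string]int
	Inner  Inner
	secret string // unexported: gob skips this field without reporting an error
}

// roundTrip encodes a Message with gob and decodes it into a fresh value.
func roundTrip(in Message) (Message, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return Message{}, err
	}
	var out Message
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

// encodeFunc tries to encode a function value at the top level,
// which gob does not support.
func encodeFunc() error {
	var buf bytes.Buffer
	return gob.NewEncoder(&buf).Encode(func() {})
}

func main() {
	in := Message{
		Name:   "wc",
		Counts: map[string]int{"a": 2},
		Inner:  Inner{Offsets: []int{1, 2}},
		secret: "hidden",
	}
	out, err := roundTrip(in)
	fmt.Println(out.Name, out.Counts["a"], out.Inner.Offsets, out.secret == "", err)
	// wc 2 [1 2] true <nil>
	fmt.Println(encodeFunc() != nil) // true
}
```

The silent drop is the case most likely to cause confusion in the lab: a reply struct with lowercase field names arrives on the other side with zero values and no error.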
Reference
- Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107–113. https://doi.org/10.1145/1327452.1327492