1st ChinaSys Workshop

  • Date: November 23, 2011
  • Location: Huangxuan Hotel (皇轩酒店), Shenzhen, China
  • Organizing Chair: Wenguang Chen (Department of Computer Science, Tsinghua University)
Workshop Program

8:20 – 8:30 Opening by Wenguang Chen

8:30 – 9:50 Who’s who session

  • Chair: ZZ
  • Fudan/SJTU: Haibo Chen
  • HUST: Xiaofei Liao
  • ICT: Yinhe Han
  • MSRA: Lidong Zhou
  • PKU: Xiaolin Wang
  • Tsinghua: Wenguang Chen
  • USTC: Yu Zhang

Each participating institute gives a 10-15 minute introduction to the group's research interests and members, in particular the people attending the workshop.

  • 09:50 – 10:10 Tea Break 1
  • 10:10 – 12:10 Work in Progress Session (Chair: Wenguang Chen)
  • 10:10 – 10:40 Scalable Deterministic Replay in a Parallel Full-system Emulator, Yufei Chen (陈宇飞), Fudan
  • 10:40 – 11:10 Improve GPU Virtualization with QoS Feedback for Cloud Gaming, Miao Yu (于淼), Fudan
  • 11:10 – 11:40 Adaptive Runtime systems for CPU/GPU, Xiaofei Liao, HUST
  • 11:40 – 12:10 Low Power Architecture, Yinhe Han, ICT
  • 12:10 – 14:00 Lunch
  • 14:00 – 15:30 Work in Progress Session (Chair: Yinhe Han)
  • 14:00 – 14:30 Static Analysis and Optimization for Data-Parallel Programs, Hucheng Zhou, MSRA
  • 14:30 – 15:00 Large-scale Fault-tolerant Stream Processing in the Cloud, Zhengping Qian, MSRA
  • 15:00 – 15:30 A cloud database interface, Wentao Han, Tsinghua
  • 15:30 – 16:00 Centralized Run Queue based Fair Scheduling on Composable Multicore Architectures, Tao Sun, USTC
  • 16:00 – 16:30 Tea Break
  • 16:30 – 17:30 Free Discussion (Chair: Haibo Chen)
Attendee List
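  • Fudan/SJTU: Faculty: Binyu Zang (臧斌宇), Haibo Chen (陈海波); students: Miao Yu (于淼; advisor: Zhengwei Qi, 戚正伟), Yufei Chen (陈宇飞)
  • HUST: Faculty: Jianjun Han (韩建军); students: Xuepeng Fan (范学鹏), Liang Zhu (朱亮), Chencheng Ye (叶晨成)
  • ICT: Faculty: Yinhe Han (韩银和), Wei Huo (霍玮), Guihai Yan (鄢贵海); students: Jun Ma (马君), Faqiang Sun (孙发强), Xueliang Li (李雪亮)
  • MSRA: Zhengping Qian (钱正平), Hucheng Zhou (周虎成), Chuntao Hong (洪春涛), Zheng Zhang (张峥), Lidong Zhou (周礼栋), Lintao Zhang (张霖涛)
  • PKU: Faculty: Xiaolin Wang (汪小林)
  • Tsinghua: Faculty: Wenguang Chen (陈文光), Yu Chen (陈渝), Kang Chen (陈康), Yuan Dong (董渊); student: Wentao Han (韩文弢)
  • USTC: Faculty: Junmin Wu (吴俊敏), Yu Zhang (张昱); PhD students: Tao Sun (孙涛; advisor: Hong An, 安虹), Junchang Wang (王俊昌; advisor: Bei Hua, 华蓓), Kai Zhang (张凯; advisor: Bei Hua, 华蓓)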
Panel Discussion Summary
Credit: Yufei Chen (scribe)

Lidong Zhou

Dependability, Redefined

Traditional approach

- replication and fault tolerance
- checkpoint and recovery
- testing and bug finding
- monitoring, diagnosis and repair

Emergent misbehavior

- sufficiently many machines
- predictability and stability

Replication at finer granularity

Assumptions and issues for replication

- black box and coarse granularity
- detection time
- different failure types
- replicated state machine
- unnecessary serialization

Language-based security

- widely used in bug finding
- compiler-assisted checkpointing and replication
- ds data-parallel computing

Yinhe Han


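Basic reliability assurance

For chips: testing, verification, and design for debuggability

Memory system reliability assurance

- bus data-transfer protocols, chip design
- memory checkpointing and fault-recovery protocols

Fault management

- data-center fault models and propagation analysis, fault injection
- node status monitoring

System-level collection and management of reliability information

- processors, memory
- form of deliverable: standalone software or a provided interface

Collection and management of single-machine fault information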

Xiaolin Wang


Conflict between security and perf?

constrained and unconstrained

not requiring security all the time

- transitional locking?
- lock with attar.
- let user control

Wenguang Chen


From lang and compiler

locks

- checkpoint on various platforms
- not sure whether it's correct

MPI

- No fault tolerance
- One error, all error

programming model

- provide sth.?
- checkpoint

arch. provide

- Garth Gibson: HPC needs checkpointing, but the amount of data is huge.
- Needs arch. support.

Yu Zhang


Verification

- Existing parallel applications
- Certified kernel

Compiler

- Certifying compiler
- Proof checker
- Subset C certified compiler (ongoing work)

Deterministic execution

- difference from existing deterministic execution
- specific input, same result
- select one possible scheduling
- may generate a scheduling that differs from the existing model
- programmer should be able to infer the order

- problem on shared memory (Bryan Ford)
- memory consistency model

Jianjun Han


real-time needs reliability

high temperature hurts reliability

- task migration to reduce load on hot components
- guarantee real-time requirements

ZZ

Algorithm dependable flexibility

- the algorithm itself is not sensitive to errors
- e.g. Monte Carlo
- some can be relaxed, others can't
- machine learning algorithms
- the OS is over-designed for these algorithms
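
The Monte Carlo remark can be illustrated with a small sketch. It is not from the discussion itself: the function name `estimate_pi`, the `fault_rate` parameter and the fault model (a sample's outcome is silently flipped with a small probability) are all assumptions made for illustration. The point is only that a handful of corrupted samples shifts the estimate by far less than the sampling noise might suggest.

```python
import math
import random


def estimate_pi(n_samples, fault_rate=0.0, seed=0):
    """Estimate pi by sampling points in the unit square.

    fault_rate is a hypothetical fault model: with this probability
    the inside/outside result of a sample is silently flipped.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        inside = x * x + y * y <= 1.0
        # Inject a simulated soft error into this sample's outcome.
        if rng.random() < fault_rate:
            inside = not inside
        hits += inside
    return 4.0 * hits / n_samples


if __name__ == "__main__":
    clean = estimate_pi(200_000)
    faulty = estimate_pi(200_000, fault_rate=0.001)
    print("error without faults:", abs(clean - math.pi))
    print("error with faults:   ", abs(faulty - math.pi))
```

Both runs use the same seed, so they see the same sample points and differ only in the flipped outcomes.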

Part of the program is verifiable

- verify a subspace of a program
- exec in this space is safe
- what happens when the exec goes outside this space

MPI global checkpoint

- but we can change the viewpoint

Energy

- analyze power according to program exec (Haibo)
- power model, emulator (not accurate, 1 time error – Yinhe)

Free discussion


Does 100 W of power generate 100 W of heat?

- computation consumes energy
- radio
- but this is rather small

What's the standard of verifiable? (Yinhe Han)

- How to verify the correctness of Google's search results
- Integrate economic factors (Wenguang)

Benchmarks now care about what the user can perceive

We can do reliability in different layers

- Is redundancy necessary?
- worse: conflict

Acknowledgement

Microsoft Research Asia sponsored the dinners and conference room.

archives/chinasys-1.txt · Last modified: 2015/09/03 00:53 by root
