MesosCon NA 2016 has ended
Wednesday, June 1 • 11:00am - 11:50am
Building Highly Available Mesos Frameworks - Neil Conway, Mesosphere

Production-quality Mesos frameworks must be able to continue managing tasks despite unreliable networks and faulty computers. Mesos provides tools to help developers implement fault-tolerant task management, but putting these tools together effectively remains something of a black art. This talk will offer practical guidance to current and prospective framework developers to help them understand how Mesos deals with failures and the tools it provides to enable fault-tolerant frameworks. Mesos operators will also benefit from a discussion of exactly how Mesos behaves during network partitions and other failure scenarios.

This talk will cover the following specific topics:
* fault tolerance in Mesos itself: how Mesos masters and agents behave in the face of process crashes and network partitions
* the tools that Mesos provides to help framework authors write reliable systems (e.g., task state reconciliation, the state abstraction, and the MasterDetector interface)
* the lifecycle of a Mesos task
* a collection of recommendations for how framework developers should build highly available framework schedulers and executors

Speakers
Neil Conway

Distributed Systems Engineer, Mesosphere
Neil Conway is a Distributed Systems Engineer at Mesosphere, where he works on the Apache Mesos project. Before Mesosphere, Neil built automated trading systems at a quantitative hedge fund, completed a PhD in distributed systems at UC Berkeley, was a principal engineer at a stream...


Wednesday June 1, 2016 11:00am - 11:50am PDT
Chasm Creek