Self-Healing Systems


Viktor Farcic


@vfarcic

TechnologyConversations.com

CloudBees.com


Continuous Deployment

  • Provision
  • Test
  • Build
  • Test
  • Deploy
  • Test
  • Enable
  • Test

Continuous Deployment

The Second Half

  • Monitor
  • React to problems
  • Prevent problems
  • Automated?

What Are We Trying to Accomplish?

Perfection?

  • Applications that never fail?
  • Services that can handle any load?
  • Commits without bugs?
  • Hardware that never breaks?

What Are We Trying to Accomplish?

There is no such thing as perfection

  • Applications fail!
  • Services can NOT handle unlimited load!
  • Commits contain undetected bugs!
  • Hardware breaks!

What Are We Trying to Accomplish?

Self-Replicating System

Life

  • System over individuals
  • Small and incremental evolutionary improvements
  • Reproduction
  • Self-healing

What Are We Trying to Accomplish?

Resilient System

Cells

  • Restoring to the state of equilibrium
  • Monitoring and adjusting processes
  • Reproducing
  • Healing

What Are We Trying to Accomplish?

Body Equivalent?

  • Datacenter
  • Orchestrator
  • (micro)Services
  • Containers
  • Scaling
  • Self-Healing?

Self-Healing System

  • Discover problems
  • Restore itself to the desired state
  • Make decisions
  • Adapt to changed conditions
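
The four properties above can be sketched as a reconciliation loop. The loop compares the desired state with what is actually running and derives corrective actions. This is a minimal illustration under stated assumptions: `ClusterState`, `reconcile` and `apply_actions` are hypothetical names, and the state is reduced to instance counts per service.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    # Hypothetical model of the actual state: service name -> running instances.
    running: dict = field(default_factory=dict)


def reconcile(desired: dict, actual: ClusterState) -> list:
    """Discover differences between desired and actual state and decide
    which actions would restore the desired state."""
    actions = []
    for service, want in desired.items():
        have = actual.running.get(service, 0)
        if have < want:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))
    # Instances of services that are no longer desired are stopped.
    for service, have in actual.running.items():
        if service not in desired and have > 0:
            actions.append(("stop", service, have))
    return actions


def apply_actions(actions: list, actual: ClusterState) -> None:
    """Simulate executing the actions (a real system would call the orchestrator)."""
    for verb, service, count in actions:
        delta = count if verb == "start" else -count
        actual.running[service] = actual.running.get(service, 0) + delta


if __name__ == "__main__":
    state = ClusterState(running={"api": 1, "db": 1, "legacy": 2})
    desired = {"api": 3, "db": 1}
    actions = reconcile(desired, state)
    print(actions)  # [('start', 'api', 2), ('stop', 'legacy', 2)]
    apply_actions(actions, state)
    print(reconcile(desired, state))  # []
```

The healing comes from running `reconcile` repeatedly, for example on every monitoring event. Once the actual state matches the desired one, it returns no actions.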

Self-Healing System

Levels

  • Application level
  • System level
  • Hardware level

Self-Healing System

Types

  • Reactive healing
  • Preventive healing

Prerequisites

Prerequisite #1


Ability to deploy to any server inside a cluster that meets hardware requirements


  • Pets or cattle?
  • No SSH
  • Service discovery
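
Service discovery is what makes it possible to deploy to any suitable server. Consumers look instances up by name instead of relying on fixed, hand-configured hosts. The sketch below is a minimal in-memory version. `ServiceRegistry` is a hypothetical name, and a real cluster would keep this data in a distributed store rather than a Python dict.

```python
from collections import defaultdict


class ServiceRegistry:
    """Hypothetical in-memory registry: service name -> set of instance addresses."""

    def __init__(self):
        self._instances = defaultdict(set)

    def register(self, service: str, address: str) -> None:
        # Called when an instance starts, wherever it was scheduled.
        self._instances[service].add(address)

    def deregister(self, service: str, address: str) -> None:
        # Called when an instance stops or fails its checks.
        self._instances[service].discard(address)

    def lookup(self, service: str) -> list:
        # Consumers resolve addresses at call time instead of hard-coding hosts.
        return sorted(self._instances.get(service, set()))


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("api", "10.0.0.7:8080")
    registry.register("api", "10.0.0.5:8080")
    print(registry.lookup("api"))  # ['10.0.0.5:8080', '10.0.0.7:8080']
```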

Prerequisite #2


Ability to deploy without downtime and to automatically scale instances up and down


  • Blue-green deployment
  • Rolling updates
  • Relative scaling
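
Two of the techniques above reduce to small calculations. The sketch below makes two assumptions. First, relative scaling is taken to mean changing the instance count by a delta such as `+2` or `-1` instead of setting an absolute number. Second, a rolling update is taken to replace instances in batches so that the rest keep serving traffic. The function names and the minimum-instance safeguard are hypothetical.

```python
def relative_scale(current: int, change: str, minimum: int = 1) -> int:
    """Return the new instance count for a change such as '+2', '-1' or '5'.

    A leading sign makes the change relative to the current count; without
    one the value is treated as an absolute target. The result never drops
    below `minimum`, so a service is not scaled to zero by accident.
    """
    if change.startswith(("+", "-")):
        target = current + int(change)
    else:
        target = int(change)
    return max(minimum, target)


def rolling_batches(instances: list, batch_size: int) -> list:
    """Split instances into batches that are updated one batch at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [instances[i:i + batch_size] for i in range(0, len(instances), batch_size)]


if __name__ == "__main__":
    print(relative_scale(3, "+2"))  # 5
    print(relative_scale(3, "-5"))  # 1 (clamped to the minimum)
    print(rolling_batches(["a", "b", "c", "d", "e"], 2))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Relative values suit automated reactions. A monitoring alert can request `+1` without knowing how many instances are currently running.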

Prerequisite #3


Ability to monitor hardware and dynamically adjust cluster capacity (elasticity)


  • React
  • Prevent
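
Below is a threshold-based sketch of elasticity. The thresholds (80% and 30% average utilization) and the node limits are assumptions chosen for illustration. One way to read "React" and "Prevent" here: adding a node before utilization reaches 100% prevents saturation, while responding to a cluster that is already overloaded is a reaction.

```python
def desired_nodes(nodes: int, utilization: float,
                  scale_up_at: float = 0.8, scale_down_at: float = 0.3,
                  min_nodes: int = 1, max_nodes: int = 20) -> int:
    """Return the node count the cluster should move to.

    `utilization` is the average hardware utilization across nodes (0.0 to 1.0).
    The gap between the two thresholds avoids flapping between sizes.
    """
    if utilization >= scale_up_at:
        nodes += 1
    elif utilization <= scale_down_at:
        nodes -= 1
    # Keep the cluster within its configured bounds.
    return max(min_nodes, min(max_nodes, nodes))


if __name__ == "__main__":
    print(desired_nodes(4, 0.9))  # 5
    print(desired_nodes(4, 0.5))  # 4
    print(desired_nodes(4, 0.2))  # 3
```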

Prerequisite #4


Ability to monitor services in (near) real-time and execute reactive actions


  • Service checks
  • Redeployment
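
Reactive healing at the service level can be sketched as follows: run a check against every instance and redeploy the ones that fail. `check` and `redeploy` are hypothetical hooks. In practice they would call a health endpoint and the cluster orchestrator. The sketch also requires several consecutive failures (three here, an assumption) before redeploying, so a single transient error does not trigger a redeployment.

```python
from typing import Callable, Dict


def heal(instances: Dict[str, str],
         failures: Dict[str, int],
         check: Callable[[str], bool],
         redeploy: Callable[[str], str],
         threshold: int = 3) -> Dict[str, str]:
    """Run one round of service checks and return the (possibly updated) instances.

    `instances` maps instance IDs to addresses; `failures` counts consecutive
    failed checks per instance and is updated in place.
    """
    result = {}
    for instance_id, address in instances.items():
        if check(address):
            failures[instance_id] = 0
            result[instance_id] = address
            continue
        failures[instance_id] = failures.get(instance_id, 0) + 1
        if failures[instance_id] >= threshold:
            # Replace the unhealthy instance and reset its failure count.
            result[instance_id] = redeploy(instance_id)
            failures[instance_id] = 0
        else:
            result[instance_id] = address
    return result


if __name__ == "__main__":
    healthy = {"10.0.0.1"}
    instances = {"api-1": "10.0.0.1", "api-2": "10.0.0.2"}
    failures = {}
    for _ in range(3):
        instances = heal(instances, failures,
                         check=lambda addr: addr in healthy,
                         redeploy=lambda iid: "10.0.0.9")
    print(instances)  # {'api-1': '10.0.0.1', 'api-2': '10.0.0.9'}
```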

Prerequisite #5


Ability to predict the future and execute proactive actions


  • Scheduled scaling
  • Scaling based on historical actions
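
Proactive scaling can combine a fixed schedule with a prediction drawn from recorded history. This sketch rests on three assumptions. The schedule format is an assumption. Using past replica counts from the same weekday and hour as the prediction window is an assumption. Taking the larger of the two values is an assumption too.

```python
import math
from datetime import datetime
from statistics import mean
from typing import List, Tuple


def scheduled_replicas(schedule: List[Tuple[int, int, int]], when: datetime, default: int) -> int:
    """`schedule` holds (start_hour, end_hour, replicas); end_hour is exclusive."""
    for start, end, replicas in schedule:
        if start <= when.hour < end:
            return replicas
    return default


def predicted_replicas(history: List[Tuple[datetime, int]], when: datetime, default: int) -> int:
    """Average the replica counts recorded at the same weekday and hour, rounded up."""
    same_slot = [count for ts, count in history
                 if ts.weekday() == when.weekday() and ts.hour == when.hour]
    if not same_slot:
        return default
    return math.ceil(mean(same_slot))


def proactive_replicas(schedule, history, when: datetime, default: int) -> int:
    # Prefer the larger value so neither source under-provisions the service.
    return max(scheduled_replicas(schedule, when, default),
               predicted_replicas(history, when, default))


if __name__ == "__main__":
    monday_9 = datetime(2024, 1, 1, 9)  # 1 January 2024 was a Monday
    history = [(datetime(2023, 12, 25, 9), 4),   # previous Monday, 09:00
               (datetime(2023, 12, 18, 9), 7),   # Monday two weeks earlier
               (datetime(2023, 12, 19, 9), 12)]  # a Tuesday, ignored
    schedule = [(8, 18, 5)]  # business hours
    print(predicted_replicas(history, monday_9, default=2))  # 6
    print(proactive_replicas(schedule, history, monday_9, default=2))  # 6
```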

Tools

  • Cluster orchestrator
  • Real-time Registry
  • Historical Registry
  • Monitoring
  • General orchestrator

Tools

Cluster Orchestrator Tools

Tools

Real-time Registry


Tools

Historical Registry

Tools

Monitoring



Tools

General Orchestrator
