2016-12-11

Testing CoreOS on homelab servers - part 1




I've been working on some scripts to build and deploy cloud-config files in my homelab.

core01: 192.168.1.61
core02: 192.168.1.62
core03: 192.168.1.63
core04: 192.168.1.64



Create and deploy cluster configuration


From my Mac, I build and deploy a new cloud-config with:

$ ./create.and.deploy.sh  


For each core server in the homelab, the scripts build and deploy a configuration file for my CoreOS test cluster.

E.g. for server core01:

#cloud-config
#version: 20161210_203041
hostname: "core01"
ssh_authorized_keys:
  - ssh-rsa ...
coreos:
  etcd2:
    # Static cluster
    name: core01
    advertise-client-urls: http://192.168.1.61:2379
    initial-advertise-peer-urls: http://192.168.1.61:2380
    initial-cluster: "core01=http://192.168.1.61:2380,core02=http://192.168.1.62:2380,core03=http://192.168.1.63:2380,core04=http://192.168.1.64:2380"
    initial-cluster-state: new
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://0.0.0.0:2380,http://0.0.0.0:7001
  fleet:
    public-ip: $public_ipv4
    metadata: "role=services"
  flannel:
    interface: $public_ipv4
  update:
      reboot-strategy: "etcd-lock"
  units:
    - name: 00-eth0.network
      runtime: true
      content: |
        [Match]
        Name=eno1

        [Network]
        DNS=192.168.1.1
        Address=192.168.1.61/24
        Gateway=192.168.1.1
        Domains=home.lab

    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: flanneld.service
      drop-ins:
        - name: 50-network-config.conf
          content: |
            [Service]
            ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'
      command: start
    - name: docker-tcp.socket
      command: start
      enable: true
      content: |
        [Unit]
        Description=Docker Socket for the API
        [Socket]
        ListenStream=2375
        Service=docker.service
        BindIPv6Only=both
        [Install]
        WantedBy=sockets.target

write_files:
  - path: "/etc/motd"
    permissions: "0644"
    owner: "root"
    content: |
      --- My CoreOS Cluster (core01) ---
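
The create.and.deploy.sh script itself is not shown here, so the following is only a minimal sketch of how the per-node etcd2 values in the file above could be generated. The NODES list and the function names build_initial_cluster and etcd2_settings are hypothetical; the host names, addresses and ports are the ones from the example.

```shell
#!/bin/bash
# Hypothetical helper, not the actual create.and.deploy.sh.
# Node names and addresses as used in the cloud-config above.
NODES="core01=192.168.1.61 core02=192.168.1.62 core03=192.168.1.63 core04=192.168.1.64"

# Build the etcd2 initial-cluster value: name=http://ip:2380, comma separated.
build_initial_cluster() {
  local out="" pair name ip
  for pair in $NODES; do
    name="${pair%%=*}"
    ip="${pair#*=}"
    out="${out:+$out,}${name}=http://${ip}:2380"
  done
  printf '%s\n' "$out"
}

# Print the etcd2 settings that differ from node to node.
etcd2_settings() {
  local name="$1" ip="$2"
  printf '    name: %s\n' "$name"
  printf '    advertise-client-urls: http://%s:2379\n' "$ip"
  printf '    initial-advertise-peer-urls: http://%s:2380\n' "$ip"
  printf '    initial-cluster: "%s"\n' "$(build_initial_cluster)"
}

etcd2_settings core01 192.168.1.61
```

Keeping the node list in one place means every generated file gets the same initial-cluster value, which a static etcd2 cluster needs.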



Checking cluster status



core@core01 ~ $ etcdctl member list
4374c5ef9f2370d6: name=core03 peerURLs=http://192.168.1.63:2380 clientURLs=http://192.168.1.63:2379 isLeader=true
45337feea7d7a60f: name=core01 peerURLs=http://192.168.1.61:2380 clientURLs=http://192.168.1.61:2379 isLeader=false
6688d9448380b482: name=core02 peerURLs=http://192.168.1.62:2380 clientURLs=http://192.168.1.62:2379 isLeader=false
c9a76f89ee66e035: name=core04 peerURLs=http://192.168.1.64:2380 clientURLs=http://192.168.1.64:2379 isLeader=false


core@core01 ~ $ fleetctl list-machines
MACHINE        IP        METADATA
2d0e73b6...    192.168.1.64    role=services
497f6384...    192.168.1.61    role=services
9f8f9d8a...    192.168.1.62    role=services
c6d410a0...    192.168.1.63    role=services
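
To check the same thing from a script, the member list can be parsed. This is a small sketch that reads "etcdctl member list" output on stdin; the function names leader_name and member_count are made up for illustration, and the sample text is the output shown above.

```shell
#!/bin/bash
# Print the name of the member that reports isLeader=true.
leader_name() {
  awk '/isLeader=true/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^name=/) { sub(/^name=/, "", $i); print $i }
  }'
}

# Count the members in the listing.
member_count() {
  grep -c 'peerURLs='
}

# Sample input copied from the output above; on a node this would be
# members=$(etcdctl member list)
members='4374c5ef9f2370d6: name=core03 peerURLs=http://192.168.1.63:2380 clientURLs=http://192.168.1.63:2379 isLeader=true
45337feea7d7a60f: name=core01 peerURLs=http://192.168.1.61:2380 clientURLs=http://192.168.1.61:2379 isLeader=false
6688d9448380b482: name=core02 peerURLs=http://192.168.1.62:2380 clientURLs=http://192.168.1.62:2379 isLeader=false
c9a76f89ee66e035: name=core04 peerURLs=http://192.168.1.64:2380 clientURLs=http://192.168.1.64:2379 isLeader=false'

echo "leader:  $(printf '%s\n' "$members" | leader_name)"
echo "members: $(printf '%s\n' "$members" | member_count)"
```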



Launching containers and testing failover


Container configuration


vi myapp.service

--- snip begin ---
[Unit]
Description=MyApp
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill busybox1
ExecStartPre=-/usr/bin/docker rm busybox1
ExecStartPre=/usr/bin/docker pull busybox
ExecStart=/usr/bin/docker run --name busybox1 busybox /bin/sh -c "trap 'exit 0' INT TERM; while true; do echo Hello World; sleep 1; done"
ExecStop=/usr/bin/docker stop busybox1


--- snip end ---
 



vi apache@.service

--- snip begin ---
[Unit]
Description=My Apache Frontend
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill apache1
ExecStartPre=-/usr/bin/docker rm apache1
ExecStartPre=/usr/bin/docker pull coreos/apache
ExecStart=/usr/bin/docker run --rm --name apache1 -p 80:80 coreos/apache /usr/sbin/apache2ctl -D FOREGROUND
ExecStop=/usr/bin/docker stop apache1

[X-Fleet]
Conflicts=apache@*.service

--- snip end ---

Launch the containers


core@core01 ~ $ fleetctl start myapp.service
core@core01 ~ $ fleetctl start apache@1
core@core01 ~ $ fleetctl start apache@2


core@core01 ~ $ fleetctl list-units
UNIT            MACHINE                ACTIVE    SUB
apache@1.service    2d0e73b6.../192.168.1.64    active    running
apache@2.service    c6d410a0.../192.168.1.63    active    running
myapp.service        2d0e73b6.../192.168.1.64    active    running





 

Testing failover



core@core04 ~ $ sudo reboot

core@core01 ~ $ fleetctl list-units
UNIT            MACHINE                ACTIVE    SUB
apache@1.service    497f6384.../192.168.1.61    active    running
apache@2.service    c6d410a0.../192.168.1.63    active    running
myapp.service        9f8f9d8a.../192.168.1.62    active    running
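
The two listings show that apache@1 and myapp moved away from core04. A rough sketch of how that comparison could be automated is below. It assumes the "fleetctl list-units" column layout shown above; the function name moved_units is hypothetical.

```shell
#!/bin/bash
# Compare two saved "fleetctl list-units" listings and print the units
# whose host IP changed between them.
moved_units() {
  printf '%s\n===\n%s\n' "$1" "$2" | awk '
    $0 == "===" { second = 1; next }     # separator between the two listings
    $1 == "UNIT" || NF < 4 { next }      # skip header and blank lines
    { split($2, m, "/") }                # MACHINE column is id/ip, m[2] is the IP
    !second { before[$1] = m[2]; next }
    ($1 in before) && before[$1] != m[2] { print $1 ": " before[$1] " -> " m[2] }
  '
}

# Sample data copied from the two listings above; on a node this would be
# before=$(fleetctl list-units) and, after the reboot, after=$(fleetctl list-units)
before='UNIT            MACHINE                ACTIVE    SUB
apache@1.service    2d0e73b6.../192.168.1.64    active    running
apache@2.service    c6d410a0.../192.168.1.63    active    running
myapp.service        2d0e73b6.../192.168.1.64    active    running'

after='UNIT            MACHINE                ACTIVE    SUB
apache@1.service    497f6384.../192.168.1.61    active    running
apache@2.service    c6d410a0.../192.168.1.63    active    running
myapp.service        9f8f9d8a.../192.168.1.62    active    running'

moved_units "$before" "$after"
```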

 

2016-12-03

Setting up CoreOS on homelab servers

Setting up CoreOS on homelab servers with static IPs
 
Work Notes







core01: 192.168.1.61
core02: 192.168.1.62
core03: 192.168.1.63
core04: 192.168.1.64


For each node (core01, core02, core03, core04):
core@coreXX ~ $ sudo vi /var/lib/coreos-install/user_data
core@coreXX ~ $ sudo rm -rf /var/lib/etcd2/*; sudo rm -f /etc/systemd/system/etcd*
core@coreXX ~ $ sudo reboot


FILE: /var/lib/coreos-install/user_data  (core01)
--- snip begin ---

#cloud-config
hostname: core01
ssh_authorized_keys:
  - ssh-rsa ...
coreos:
  etcd2:
    # Static cluster
    name: core01
    advertise-client-urls: http://192.168.1.61:2379
    initial-advertise-peer-urls: http://192.168.1.61:2380
    initial-cluster: "core01=http://192.168.1.61:2380,core02=http://192.168.1.62:2380,core03=http://192.168.1.63:2380"
    initial-cluster-state: new
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://0.0.0.0:2380,http://0.0.0.0:7001
  fleet:
    public-ip: $public_ipv4
    metadata: "role=services"
  flannel:
    interface: $public_ipv4
  update:
      reboot-strategy: "etcd-lock"
  units:
    - name: 00-eth0.network
      runtime: true
      content: |
        [Match]
        Name=eno1

        [Network]
        DNS=192.168.1.1
        Address=192.168.1.61/24
        Gateway=192.168.1.1

    # To use etcd2, comment out the above service and uncomment these
    # Note: this requires a release that contains etcd2
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: flanneld.service
      drop-ins:
        - name: 50-network-config.conf
          content: |
            [Service]
            ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'
      command: start
    - name: docker-tcp.socket
      command: start
      enable: true
      content: |
        [Unit]
        Description=Docker Socket for the API
        [Socket]
        ListenStream=2375
        Service=docker.service
        BindIPv6Only=both
        [Install]
        WantedBy=sockets.target

write_files:
  - path: "/etc/motd"
    permissions: "0644"
    owner: "root"
    content: |
      --- My CoreOS Cluster ---


--- snip end ---


core@core01 ~ $ etcdctl cluster-health
member 4374c5ef9f2370d6 is healthy: got healthy result from http://192.168.1.63:2379
member 45337feea7d7a60f is healthy: got healthy result from http://192.168.1.61:2379
member 6688d9448380b482 is healthy: got healthy result from http://192.168.1.62:2379


core@core02 ~ $ etcdctl member list
4374c5ef9f2370d6: name=core03 peerURLs=http://192.168.1.63:2380 clientURLs=http://192.168.1.63:2379 isLeader=true
45337feea7d7a60f: name=core01 peerURLs=http://192.168.1.61:2380 clientURLs=http://192.168.1.61:2379 isLeader=false
6688d9448380b482: name=core02 peerURLs=http://192.168.1.62:2380 clientURLs=http://192.168.1.62:2379 isLeader=false


core@core02 ~ $ fleetctl list-machines
MACHINE   IP    METADATA
497f6384... 192.168.1.61  role=services
9f8f9d8a... 192.168.1.62  role=services
c6d410a0... 192.168.1.63  role=services
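
An etcd cluster of N members needs a majority, floor(N/2) + 1, to keep working. So this three-node cluster tolerates one failed member, and so does a four-node one. The sketch below counts the healthy members in "etcdctl cluster-health" output and compares that with the majority. The function names quorum and healthy_count are made up for illustration, and the sample text is the output shown above.

```shell
#!/bin/bash
# Majority needed for a cluster of N members: floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Count members reported healthy in "etcdctl cluster-health" output on stdin.
healthy_count() {
  grep -c '^member .* is healthy'
}

# Sample input copied from the output above; on a node this would be
# health=$(etcdctl cluster-health)
health='member 4374c5ef9f2370d6 is healthy: got healthy result from http://192.168.1.63:2379
member 45337feea7d7a60f is healthy: got healthy result from http://192.168.1.61:2379
member 6688d9448380b482 is healthy: got healthy result from http://192.168.1.62:2379'

healthy=$(printf '%s\n' "$health" | healthy_count)
needed=$(quorum 3)

if [ "$healthy" -ge "$needed" ]; then
  echo "quorum ok: $healthy healthy, $needed needed"
else
  echo "quorum lost: $healthy healthy, $needed needed"
fi
```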

2016-09-04

Ubuntu 16.04 boot failure

One of my homelab servers (ts01) running Ubuntu 16.04 refused to boot today.

It's running kernel 4.4.0-36-generic.

Looking at the console I see this


Multiple "/scripts/local-block ... done" lines. The attached USB keyboard did not work, so I rebooted and chose another kernel at the GRUB boot menu:

  4.4.0-34-generic instead of the default  4.4.0-36-generic which fails to boot.

Kernel  4.4.0-34-generic boots ok.

I tried to re-install the packages for the kernel, but still got the same failure.

So I compared the kernel files with those on another, identical homelab server (ts02).

This server works:
root@ts02:/boot/grub# ls -l /boot/
total 87640
-rw-r--r-- 1 root root 32061904 Aug 29 21:55 initrd.img-4.4.0-36-generic


This server fails during boot.
root@ts01:/boot/grub# ls -l /boot/
total 87555
-rw-r--r-- 1 root root 31977897 Sep  4 20:42 initrd.img-4.4.0-36-generic

So I copied the initrd.img-4.4.0-36-generic from ts02 to ts01. This fixed the problem and ts01 booted OK.
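
The listings above only compare file sizes. A checksum comparison is a slightly stronger check, sketched below under the assumption that sha256sum is available (it is part of coreutils on Ubuntu). The function names file_sha256 and compare_files are hypothetical. Note that an initrd is built locally, so two healthy servers can have different images; a mismatch is a hint, not proof of corruption.

```shell
#!/bin/bash
# Print the SHA-256 of a file (first field of the sha256sum output).
file_sha256() {
  sha256sum "$1" | awk '{ print $1 }'
}

# Report whether two local files have the same content.
compare_files() {
  if [ "$(file_sha256 "$1")" = "$(file_sha256 "$2")" ]; then
    echo "identical"
  else
    echo "different"
  fi
}

# Demo with two throwaway files. For the case in this post, the second
# argument would be a copy of ts02's initrd.img-4.4.0-36-generic fetched
# to ts01 first (the copy location is up to you).
tmpdir=$(mktemp -d)
printf 'good image\n' > "$tmpdir/initrd.ts02"
printf 'bad image\n'  > "$tmpdir/initrd.ts01"
compare_files "$tmpdir/initrd.ts01" "$tmpdir/initrd.ts02"
rm -rf "$tmpdir"
```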



2016-08-23

DC/OS installation on Ubuntu 16.04

Work In Progress ...

My work notes so far trying to install DC/OS 1.8.1 on Ubuntu 16.04

Source: https://dcos.io/docs/1.8/administration/installing/custom/advanced/

My DC/OS homelab servers: 192.168.1.71, 192.168.1.72, 192.168.1.73
My bootstrap server: 192.168.1.73



WORKDIR: /sw/dcos

FILE: genconf/config.yaml (bootstrap server)
---
bootstrap_url: http://192.168.1.73:8877
cluster_name: 'soulfunk'
exhibitor_storage_backend: static
ip_detect_filename: /genconf/ip-detect
master_list:
- 192.168.1.73
- 192.168.1.72
- 192.168.1.71
resolvers:
- 8.8.4.4
- 8.8.8.8


FILE: genconf/ip-detect (bootstrap server)
#!/usr/bin/env bash
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
echo $(ip addr show | grep inet | grep 192.168.1 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -1)
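
To see what the ip-detect pipeline does, here is a sketch of the same extraction run against canned input instead of the live "ip addr show" output. The function name first_lan_ip and the sample text are made up for illustration. The sketch matches the fixed string "inet 192.168.1." rather than the pattern 192.168.1, because the looser pattern would also match an address such as 192.168.10.5.

```shell
#!/bin/bash
# Print the first IPv4 address on stdin whose "inet" line starts with the
# given network prefix (fixed string, so the dots are literal).
first_lan_ip() {
  grep -F "inet $1" | grep -Eo '[0-9]{1,3}(\.[0-9]{1,3}){3}' | head -1
}

# Illustrative sample only, shaped like "ip addr show" output.
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
    inet 127.0.0.1/8 scope host lo
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.1.73/24 brd 192.168.1.255 scope global eno1
    inet6 fe80::1/64 scope link'

printf '%s\n' "$sample" | first_lan_ip 192.168.1.
```

The address comes before the broadcast address on the inet line, so head -1 picks the host address.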



FILE: get_bootstrap_install.sh (bootstrap server)
#!/bin/bash
curl -O https://downloads.dcos.io/dcos/EarlyAccess/dcos_generate_config.sh


FILE: create_build_file.sh (bootstrap server)
#!/bin/bash
bash dcos_generate_config.sh


FILE: run_nginx.sh (bootstrap server)
#!/bin/bash
docker run -d -p 8877:80 -v $PWD/genconf/serve:/usr/share/nginx/html:ro nginx

FILE: symlink_binaries.sh (all servers)
#!/bin/bash
cd /usr/bin || exit 1
FILES="ln mkdir tar"
for f in $FILES ; do
  test -e /usr/bin/$f || ln -s /bin/$f $f
done

FILE: get_dcos_installer_from_bootstrap_server.sh (all servers)
#!/bin/bash
curl -O http://192.168.1.73:8877/dcos_install.sh


1) On bootstrap server, download installer and create build file:


# cd /sw/dcos
# mkdir genconf
# <create genconf/config.yaml> 
# <create genconf/ip-detect>  
# ./get_bootstrap_install.sh
# ./create_build_file.sh


2) On bootstrap server, start nginx:
# ./run_nginx.sh

3) On each server:
# ./get_dcos_installer_from_bootstrap_server.sh
# ./symlink_binaries.sh
# bash dcos_install.sh master
# bash dcos_install.sh slave



Monitor Exhibitor

http://192.168.1.73:8181/exhibitor/v1/ui/index.html



DC/OS Web Interface:

http://192.168.1.73/





2016-08-13

Upgrading from Ubuntu 14.04 to 16.04

Upgraded from Ubuntu 14.04 to 16.04 on one of my homelab test-servers today.

After the upgrade, running 'apt-get update' resulted in this error:

# apt-get update
apt-get: relocation error: /usr/lib/x86_64-linux-gnu/libapt-pkg.so.5.0: symbol _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareERKS4_, version GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference




To fix this, I tried to manually install libstdc++6:

Download:

wget http://security.ubuntu.com/ubuntu/pool/main/g/gcc-5/libstdc++6_5.4.0-6ubuntu1~16.04.2_amd64.deb


Install:

# dpkg -i libstdc++6_5.4.0-6ubuntu1~16.04.2_amd64.deb

But this failed with:

dpkg: warning: downgrading libstdc++6:amd64 from 6.1.1-3ubuntu11~14.04.1 to 5.4.0-6ubuntu1~16.04.2
dpkg: regarding libstdc++6_5.4.0-6ubuntu1~16.04.2_amd64.deb containing libstdc++6:amd64:
 libstdc++6:amd64 breaks libboost-date-time1.55.0
  libboost-date-time1.55.0:amd64 (version 1.55.0-1) is present and installed.


dpkg: error processing archive libstdc++6_5.4.0-6ubuntu1~16.04.2_amd64.deb (--install):
 installing libstdc++6:amd64 would break libboost-date-time1.55.0:amd64, and
 deconfiguration is not permitted (--auto-deconfigure might help)
Errors were encountered while processing:
 libstdc++6_5.4.0-6ubuntu1~16.04.2_amd64.deb



So I ran this a few times to remove the libboost libraries:

# for p in $(dpkg -l |grep libboost|awk '{print $2}'); do dpkg --purge "$p" ; done

To do a dry run (i.e. check what would be removed without actually removing anything), add an echo in front of dpkg:

# for p in $(dpkg -l |grep libboost|awk '{print $2}'); do echo dpkg --purge "$p" ; done
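
The same loop can be written with the dry run as the default, so nothing is removed unless asked for. This is only a sketch: the function names installed_matching and purge_all and the DRY_RUN variable are hypothetical, and the sample "dpkg -l" text is illustrative (package names and versions are taken from the messages above). It matches on the package name column only, which is stricter than the grep above.

```shell
#!/bin/bash
# Print package names starting with the given prefix, from "dpkg -l" output on stdin.
installed_matching() {
  awk -v prefix="$1" 'index($2, prefix) == 1 { print $2 }'
}

# Purge each package named on stdin. With DRY_RUN=1 (the default here)
# only print the command that would be run.
purge_all() {
  while read -r pkg; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "dpkg --purge $pkg"
    else
      dpkg --purge "$pkg"
    fi
  done
}

# Illustrative sample; on the server this would be: dpkg -l | installed_matching libboost
sample='Desired=Unknown/Install/Remove/Purge/Hold
||/ Name                             Version                  Architecture Description
+++-================================-========================-============-===========
ii  libboost-date-time1.55.0:amd64   1.55.0-1                 amd64        (description elided)
ii  libstdc++6:amd64                 6.1.1-3ubuntu11~14.04.1  amd64        (description elided)'

printf '%s\n' "$sample" | installed_matching libboost | purge_all
```

Setting DRY_RUN=0 makes purge_all call dpkg for real, so it is worth reading the dry-run output first.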



And then I could finally run:

# dpkg -i libstdc++6_5.4.0-6ubuntu1~16.04.2_amd64.deb
dpkg: warning: downgrading libstdc++6:amd64 from 6.1.1-3ubuntu11~14.04.1 to 5.4.0-6ubuntu1~16.04.2
(Reading database ... 147174 files and directories currently installed.)
Preparing to unpack libstdc++6_5.4.0-6ubuntu1~16.04.2_amd64.deb ...
Unpacking libstdc++6:amd64 (5.4.0-6ubuntu1~16.04.2) over (6.1.1-3ubuntu11~14.04.1) ...
Setting up libstdc++6:amd64 (5.4.0-6ubuntu1~16.04.2) ...
Processing triggers for libc-bin (2.23-0ubuntu3) ...


And apt-get was working again.



2016-04-02

GitLab CE web-interface "broken" after upgrade on Ubuntu 14.04

I've installed GitLab CE in my homelab as detailed here.

After an apt upgrade, the web interface was not loading the JavaScript and CSS assets:



The fix that worked for me:

mygitlab$ sudo gitlab-rake cache:clear 
mygitlab$ sudo gitlab-ctl restart






2016-03-15

PC power-on problems

I suddenly got this problem with my Komplett Gamer PC, after it had worked for 18 months.


Update: 2016.08.28 - I've had to replace two faulty Corsair AX1200i PSUs in the past 5 months.