Building Proxmox 4.3 2 Node Cluster Part 2

The more I use Proxmox the more I like it.  I upgraded to 4.4 yesterday and I really like the at-a-glance data center interface, but I will go into that later.  I am now up to 5 VMs running on the servers and they are all on NAS shared storage.  I have also written a few scripts that help me monitor the cluster and VMs using Nagios.

I am now up to 7 VMs but had something weird happen.  The night after I upgraded the servers, one of the boxes just shut off.  There was an MCE (machine check) error in the log files, but I didn't have mcelog installed so I couldn't get the exact details of what happened.  Nonetheless, several new packages showed up that day in the Proxmox repo and I upgraded to them.  I have not had a problem since.  The upgrade process was pretty straightforward.  You can power down your VMs, but I migrated the VMs to the other node during the upgrade.  Below are the upgrade steps I took, in order, from a root terminal:

  1.  apt-get update
  2.  pveupdate
  3.  apt-get upgrade
  4.  pveupgrade
  5.  pveversion -v      Just to see if your packages upgraded.

This took about 20 minutes for both systems.  I forgot the pveupgrade on one of the systems and was wondering why some of the packages didn't upgrade, so don't forget to run it.  It will save you time troubleshooting a problem you caused yourself.
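
Since I kept the VMs running by migrating them off each node before upgrading it, it is worth noting the migration can also be done from the CLI instead of the web GUI.  A minimal sketch, where the VM ID 101 and the target node name pve2 are placeholders:

# live migrate a running VM to another node (run as root on the current node)
qm migrate 101 pve2 --online

The --online switch live migrates a running VM; without it the VM has to be shut down first.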

The Nagios scripts I put together are pretty basic bash scripts that run a command and use if statements to compare the output.  Then they exit with a code of 0, 1, 2, or 3.  I didn't want to put NRPE on my servers, so I used ssh to run the checks on the servers.  That meant setting up a one-way passwordless ssh login.  This is easy to do, and as long as you set up the user so it can only sudo-run these commands, it is relatively secure.  Below are the steps to set up the checks.

  1. Create a user named nagios on the Proxmox servers
  2. As the nagios user on each Proxmox server, create an .ssh directory using mkdir -p .ssh
  3. On the Proxmox servers, run ssh-keygen -t rsa as nagios and save the output in the .ssh directory
  4. Log into the Nagios machine and copy the ssh key from the .ssh/id_rsa.pub file in the nagios user's home directory
  5. Create an authorized_keys file on each Proxmox server in /home/nagios/.ssh/ and paste the key from the Nagios server into it
  6. You should now be able to ssh from the Nagios server to the Proxmox servers as the nagios user without a password

There are two caveats.  One, I used PuTTY from a Windows machine while doing this, so cut and paste was simple.  Two, it assumes that you already have an ssh key set up for the nagios user on the Nagios server.  If not, you will have to run steps 2 and 3 on the Nagios server as well.  If this is clear as mud, this website has more help:  http://www.linuxproblem.org/art_9.html
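
Putting the steps together, the whole key setup boils down to something like the sketch below.  The host name pve1 is a placeholder, and ssh-copy-id simply automates steps 4 and 5 if you would rather not cut and paste the key by hand:

# on the Nagios server, as the nagios user (skip if a key already exists)
ssh-keygen -t rsa

# push the public key to a Proxmox node (pve1 is a placeholder hostname)
ssh-copy-id nagios@pve1

# test it - this should log in without asking for a password
ssh nagios@pve1 uptime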

Once this was done I created 6 scripts to check different things on the servers: the standard CPU load, running processes, number of running VMs, drive space, NAS mount, etc.  They are not perfect and will be revised as I find problems, but for the most part they work and are not too bad for about 2 hours of putting them together.  I will post these files at the end of the post.  They are copied to the nagios user's home directory on each Proxmox server.
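
To give an idea of the structure, below is a minimal sketch of one of these checks, a NAS mount test.  It is not the exact script from this post, and the mount point /mnt/pve/nas is an assumption (Proxmox mounts NFS storage under /mnt/pve/<storage id>), so adjust it to your storage name:

#!/bin/bash
# check_nas_mount.sh - rough sketch of a Nagios-style mount check
MOUNTPOINT="/mnt/pve/nas"   # placeholder, use your storage id here

if mount | grep -q "$MOUNTPOINT"; then
    echo "OK - NAS storage is mounted at $MOUNTPOINT"
    exit 0
else
    echo "CRITICAL - NAS storage is not mounted"
    exit 2
fi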

I then set up the commands in the Nagios commands.cfg and the checks in my linux.cfg for my Linux servers.  This part can be tricky and time consuming if you have problems.  Just remember that nagios -v /usr/local/nagios/etc/nagios.cfg is your friend for checking whether Nagios will start after the changes.  It will also tell you where the problem is.  When there are errors Nagios will not start.  The location of the config file will vary based on your install.
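
As a rough sketch of what those entries look like (the command name, host name, script name, and the generic-service template are placeholders, not the exact entries from my config):

# commands.cfg - run a check script on the Proxmox node over ssh
define command {
    command_name    check_proxmox_ssh
    command_line    /usr/bin/ssh -o BatchMode=yes nagios@$HOSTADDRESS$ "/home/nagios/$ARG1$"
}

# linux.cfg - one service definition per check script
define service {
    use                     generic-service
    host_name               proxmox1
    service_description     NAS Mount
    check_command           check_proxmox_ssh!check_nas_mount.sh
}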

Once I had this all up and running I decided to build another machine to set up a full HA cluster.  I had another system that was used as a game server and really was not used very much.  It was an HP 8200 Elite Ultra-slim desktop.  It has an i7 CPU and 16 GB of memory.  The only drawback with this one is that it doesn't have any USB 3 ports.  I still needed to use a USB NIC for the second connection, so I swapped them: the on-board NIC handles storage and the USB NIC handles normal traffic.  The only time I noticed any difference was during VM builds; using the USB NIC for storage made installs take longer, which was my primary reason for the swap.  Also, once I got the machine up and working I couldn't get the VMs to turn on or migrate to this machine.  Well, it turned out I hadn't enabled virtualization support in the BIOS.  It took me about half an hour of troubleshooting before I found it, so don't be a bonehead like me and configure your systems correctly.

One of the best things about this cluster is the fact that it only takes up a 2 ft x 2 ft square on my shelf.  I am now up to 9 VMs and have migrated them all around the cluster without problems.  I know this is not a full scale commercial setup, nor is the system completely redundant, but it is an ultra-low-power cluster that can handle a pretty good workload.  It also didn't cost a great deal of money.  If I went out and bought everything today I would pay around $1500.  You wouldn't be able to touch a commercial server for that price.  Anyhow, as I finish the build I am going to add the ability to monitor the UPS the machines are hooked to and to auto power the VMs up and down.  Once I figure out the HA part of this I will write one more post describing how I did it and how well it works.

Build Proxmox 4.3 Two Node Cluster

I was looking for a good alternative to the ESXi hypervisor.  I also wanted some of the high-end features that I had paid for with my VMware setup.  I also wanted to use some mini PCs to save on space and maybe even a little electricity.  I had played with Proxmox VE 2.0 a while back, and I read an article in Linux Journal that featured Proxmox VE, so I thought I would look into using it.  After a bunch of reading and video watching I decided to give it a shot.  I will be using this for web hosting and a few servers that I use inside my own network.  I currently run 14 VMs and will be downsizing to what I really need.

Proxmox VE uses KVM and container-based virtualization.  It has an easy to use web interface that is loaded with features: automated backups, high availability, live migration, built-in monitoring, and support for almost any hardware.  So let's start with my hardware.  I used two HP 30-300 mini PCs and a QNAP TS-231+ NAS.  Below are the PC specs:

HP 30-300 Mini PC

  • Processor:  Intel Core i3-4025U
  • Memory:  16 GB DDR3-1600
  • NICs:  Realtek RTL8151 (on-board) and j5create JUE130 USB 3.0
  • Hard Drive:  AxionII 256 GB SSD
  • 4 USB 3.0 ports

So you may be asking why I used USB NICs.  I have used the JUE130 NIC on a mini PC I turned into a firewall for over a year with no problems, and USB 3 runs at 5 Gbps, so it can handle the speed of a gigabit NIC.

I downloaded Proxmox VE 4.3 and installed it on the two PCs.  The first thing I will say is to have the names planned before install.  Once you install, changing the name is not worth the amount of work, and doing so caused my web interface to stop working, so I just reinstalled and had no problems.  It is recommended that you use a minimum of 3 machines to create a cluster, but I only have the two, so that is what I am going with.  I used an NFS share on the QNAP NAS for the shared storage, but you can use DRBD to create shared storage if you choose not to use a NAS.  I wanted fully separate storage in case I lost a drive.  My NAS is RAID 1 as well.

Creating a cluster is pretty simple.  If you don't have a DNS server on your network, you will want to add the nodes to each other's hosts files.  To create a cluster, simply run the following commands once the nodes are on the network (a worked example follows the list).

  1. On the primary node, or whichever node you choose first:
    • pvecm create <cluster name>
  2. On the second node:
    • pvecm add <IP of first node>
  3. To check the status of the cluster:
    • pvecm status
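
For example, assuming a cluster name of pvecluster and a first node at 192.168.1.10 (both placeholders):

# on the first node
pvecm create pvecluster

# on the second node, pointing at the first node's IP
pvecm add 192.168.1.10

# on either node, confirm both nodes are listed and quorum is OK
pvecm status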

I ended up rebuilding my cluster a few times due to quorum problems.  Quorum is how the cluster lets each node know the others are still there.  It would work for a few minutes and then one of the nodes would drop off.  The fix was a multicast snooping setting on the bridge, and there is no guarantee you will hit this problem.  The value also resets on reboot, so I added a cron job that changes it at boot and haven't had another problem; my cluster has now been running for 2 weeks without losing quorum.  The cron syntax is below.

@reboot echo 0 >/sys/class/net/vmbrx/bridge/multicast_snooping
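
The line goes in root's crontab (crontab -e), and vmbrx stands for whatever your bridge is called; vmbr0 is the usual default.  To check the current value before and after (0 means snooping is off):

cat /sys/class/net/vmbr0/bridge/multicast_snooping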

I then created an NFS share on my QNAP and added it in the Datacenter tab of the web interface.  It installs on both nodes and you are ready to create VMs.  I didn't use any container VMs, just KVM based ones.  As a side note, I could only get live migration to work with the QEMU (qcow2) disk format.  I already have a bunch of pre-made vmdk images, but those required a shutdown to migrate.  There are a ton of settings to build the VMs with.  The only thing I saw that was important for shared network storage was to set the VM disk to no caching.

I built 4 VMs to test the system: a Windows 2012 server, a MineOS (Minecraft server), an Ubuntu 16.04 (Mumble server), and an Ubuntu 16.04 (DNS server).  I built all of these in the qcow2 format except the Mumble server, and that is how I found the migration problem.  It is also possible that I did something wrong and that is why it will not live migrate.  Anyhow, I built the Minecraft server to really test load and migration.  So I got it built and had a few people log in and play while I migrated it to the other node.  It worked great and we didn't even see any lag.  Also, there are over 100 different templates that can be downloaded to build VMs on your nodes, which may save you some time building out your VMs.
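
VMs can also be created from the CLI with qm.  A minimal sketch, where the VM ID, name, and the storage ID nas are placeholders for whatever you set up, with a 32 GB disk in qcow2 format and caching turned off as described above:

# create a small test VM on the shared NFS storage (run as root)
qm create 101 --name test-vm --memory 2048 --cores 2 \
    --net0 virtio,bridge=vmbr0 \
    --virtio0 nas:32,format=qcow2,cache=none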

[Screenshots: the Datacenter storage view and the list of downloadable templates]

Below is also a good list of CLI commands that can be used to check status or to write scripts to do your bidding.  There is a small example script after the list.

  1. Proxmox commands
    • pveversion -v    check the version of each package (can also be seen in the web GUI)
    • pveupdate        update the package database
    • pveupgrade       upgrade the Proxmox VE install
    • pvesh            open the interactive Proxmox API shell
    • pveperf          check Proxmox node performance
    • pvecm            cluster management, with a lot of switches
    • pvesm            work with Proxmox storage
  2. KVM VM command
    • qm               work with KVM VMs directly, has a lot of switches
  3. Container VM commands
    • pvectl           work with container VMs directly, has a lot of switches
    • vzctl            work with container VMs directly, has a lot of switches
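
For example, here is a minimal sketch of a Nagios-style script built on pvecm (not one of the exact scripts from this post) that goes critical if the cluster loses quorum.  Depending on your setup it may need to run as root or through sudo:

#!/bin/bash
# check_quorum.sh - rough sketch: alert when the cluster is not quorate
if pvecm status | grep -q "Quorate:.*Yes"; then
    echo "OK - cluster is quorate"
    exit 0
else
    echo "CRITICAL - cluster has lost quorum"
    exit 2
fi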

As I have been writing this, a new version has come out with some web GUI changes, so I am going to write another post that goes over some other features and will include the upgrade process.  But I will say I am happy with this product.  It is free and has a lot of potential with tons of high-end features.  So feel free to ask questions and I will try to help.

Ubuntu 16.04.1 NIC problems/changes

I upgraded one of my servers last night to Ubuntu Server 16.04.1 and it went as smoothly as I would expect.  Until I rebooted, and my network monitors started saying they couldn't contact the server.  I thought that was normal since I had just rebooted for the upgrade.  After looking through the configs for about 10 minutes to confirm everything was still there, the monitor was still critical.  So I ran ifconfig and only got the loopback.  I opened the interfaces file and everything was as I had configured it.  This machine is a virtual machine, so I checked its VM config as well and it was fine.  To make a long story short, Ubuntu renamed the NIC to ens160 and would not activate it.  I checked the udev rules and there was nothing in the 70-persistent-net rules file, and after messing around for about 30 minutes I turned to my friend Google for solutions.  The best solution I found, and it worked right away, was to modify an entry in the GRUB config.  Here are the commands to run.

To see if the NIC is still there run ifconfig -a

Then in /etc/default/grub, edit the following line so that it includes net.ifnames=0 biosdevname=0 (keep the quotes):

GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"

and as root run:

update-grub

Reboot your system and your NICs should be back to eth0, eth1, etc.  You do need to make sure the OS still sees the NIC before editing the GRUB config file.  I did reinstall the drivers before I started down this path.

However, if you are using DHCP you may never really notice that this has happened.  I use static addressing and wanted the name back to eth0.  I also did a fresh install for a web server I am working on, and this time it named the NIC ens33.  I had no problem there because it pulled its first IP from DHCP and everything worked fine; I only noticed when I started changing the NIC config.  So mostly this is something to look out for when you are doing an upgrade and the network just stops working.  I liked not having to change my interfaces file, so this turns out to be good information.  Good luck, and feel free to leave questions.
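
For reference, this is the kind of static config in /etc/network/interfaces that the GRUB fix let me keep pointing at eth0.  The addresses below are placeholders, not the ones from my server:

# /etc/network/interfaces - static config for the renamed-back eth0
auto eth0
iface eth0 inet static
    address 192.168.1.50
    netmask 255.255.255.0
    gateway 192.168.1.1
    dns-nameservers 192.168.1.1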
