20110910

ClusterXL and FIBMGR down


[ClusterXL marks FIB as “problem”]
I was doing some work the other day on an R75 H/A cluster to enable advanced routing and ran into some problems with ClusterXL. “My” problem was trying this during business hours without scheduling a maintenance window, writing up change control, getting approval, etc., etc.

Cowboy boots on, yeeharrrrr....

So the environment is distributed management + a couple of UTM-1 1070s in H/A running R75.10. When you build the UTMs, the install just lays down SecurePlatform. To get advanced routing, you need to enable the “pro” version. Conveniently, Check Point have a command for that. So on the standby member, I run the following command and reboot:

# pro enable

When I'm working on clusters, I usually run the 'cphaprob state' command to keep an eye on the cluster state. Handy for installs, controlled fail overs, etc. So on the other member, I'm running:

# watch cphaprob state
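
If you'd rather not stare at 'watch', the same idea can be scripted. This is only a rough sketch: wait_for_output is a helper name I made up, and it assumes the word "Standby" only shows up in the 'cphaprob state' output once a member really is in standby.

```shell
#!/bin/sh
# wait_for_output (hypothetical helper): re-run a command until its output
# contains PATTERN (case-insensitive), or give up after TRIES attempts.
# Usage: wait_for_output PATTERN TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_for_output() {
    pattern=$1
    tries=$2
    delay=$3
    shift 3
    i=0
    while [ "$i" -lt "$tries" ]; do
        # The pipeline's exit status is grep's: 0 when the pattern was seen.
        if "$@" 2>/dev/null | grep -qi -- "$pattern"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

Something like 'wait_for_output standby 60 5 cphaprob state || echo "still no standby member"' would then poll every 5 seconds for about 5 minutes.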

After a while, I realised the secondary unit wasn't going standby and would stay down regardless of how long I looked at it. Ran a 'cphaprob list' on it and found that FIB was being reported as down. Nice. Spent a bit of time looking through logs, scraping SecureKnowledge, etc. Even put FIBMGR into debug mode... great help that was. Logs showed it was trying to contact the active unit, and since advanced routing wasn't enabled on that member yet, it was failing.

Now, in a maintenance window, I probably just would have 'pro enabled' the active unit and rebooted it. The standby member should have gone active, connection table would have been lost but whatever. Given it was business hours, I had to find another way. So...

Started by trying to get the FIB to report a good status by doing a:

# cphaprob -d FIB -s OK report

This did work, but the process sets it back to 'problem' fairly quickly. Doing a 'cphaprob list' shows the status and the time last reported. Trying to set the timer with -t didn't work. This did:

# cpwd_admin list
{get PID of FIBMGR}
# cpwd_admin stop -name FIBMGRD -path "$ADVRDIR/bin/fibmgrd" -command "kill -TERM {FIBMGR PID}"
# cphaprob -d FIB -s OK report
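
The "get PID" step can be scripted instead of read off the screen. A minimal sketch, assuming the 'cpwd_admin list' output has the application name in the first column and the PID in the second, and that the process is listed as FIBMGRD (check both on your own box first). pid_of_app is a made-up helper name.

```shell
#!/bin/sh
# pid_of_app (hypothetical helper): read `cpwd_admin list` style output on
# stdin and print the PID of the named application.
# ASSUMPTION: column 1 is the application name, column 2 is the PID.
pid_of_app() {
    # Print column 2 of the first row whose first column matches exactly.
    awk -v app="$1" '$1 == app { print $2; exit }'
}
```

Usage would be along the lines of FIB_PID=$(cpwd_admin list | pid_of_app FIBMGRD), with $FIB_PID then dropped into the kill -TERM part of the stop command. If the name doesn't match anything, the variable comes back empty, so test for that before killing anything.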

The secondary firewall should go standby. Fail over, pro enable on the other member, reboot. When it's back to standby, fail over again and reboot the member that had the FIBMGR terminated...

Happy days...
exit 0

20110523

BT5 Running on a Nexus One

Quick post to capture the changes I made to the bootbt script to get the ARM version of BT5 running on my Nexus One.

I started by following John Strand’s post on PaulDotCom. After downloading the original ARM version from Back|Track Linux, I realised I also needed the XDA version to fit it on the SD card. Essentially I did everything John did (except uploading busybox) but had some problems with the loopback mount of the bt5 image. Here is a diff of the original bootbt & my modified one.
# diff bootbt.org bootbt
5c5
< mount -o remount,rw /dev/block/mmcblk0p5 /system
---
> mount -o remount,rw /dev/block/mtdblock3 /system
13c13
< if [ -b /dev/loop2 ]; then
---
> if [ -b /dev/loop7 ]; then
16c16
<  busybox mknod /dev/loop2 b 7 0
---
>  busybox mknod /dev/loop7 b 7 0
18c18,20
< mount -o loop,noatime -t ext2 $kit/bt5.img $mnt
---
> #mount -o loop,noatime -t ext2 $kit/bt5.img $mnt
> losetup /dev/block/loop7 $kit/bt5.img
> mount -o noatime -t ext2 /dev/block/loop7 $mnt
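
The main change in the diff is splitting 'mount -o loop' into an explicit losetup followed by a plain mount. Here is the same two-step idea as a sketch with some error checking bolted on. The run and attach_and_mount helpers and the DRY_RUN variable are my own inventions (not part of bootbt), and it assumes the losetup on the phone supports -d for detaching.

```shell
#!/bin/sh
# run (hypothetical helper): execute a command, or only print it when
# DRY_RUN=1, so the steps can be previewed without root.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# attach_and_mount (hypothetical helper): attach an image file to a loop
# device, then mount that device as ext2.
# Usage: attach_and_mount IMAGE LOOPDEV MOUNTPOINT
attach_and_mount() {
    image=$1
    loopdev=$2
    target=$3
    if ! run losetup "$loopdev" "$image"; then
        echo "losetup failed for $image" >&2
        return 1
    fi
    if ! run mount -o noatime -t ext2 "$loopdev" "$target"; then
        # ASSUMPTION: losetup -d is available to undo the attach.
        echo "mount failed, detaching $loopdev" >&2
        run losetup -d "$loopdev"
        return 1
    fi
}
```

Called as 'attach_and_mount $kit/bt5.img /dev/block/loop7 $mnt', it does the same thing as the two replacement lines in the diff, just with a bail-out if either step fails.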
Random:
 Phone: HTC Nexus One
 Mod: CyanogenMod-7.0.0-RC4-N1
 SDCard: 4GB
 VNC Client: AndroidVNC v0.5.0



# sweet....
exit 0


Ref: Complete bootbt script.
perm=$(id|cut -b 5)

if [ "$perm" != "0" ];then echo "This Script Needs Root! Type : su";exit;fi

mount -o remount,rw /dev/block/mtdblock3 /system
export kit=/sdcard/BT5
export bin=/system/bin
export mnt=/data/local/mnt
mkdir -p $mnt
export PATH=$bin:/usr/bin:/usr/local/bin:/usr/sbin:/bin:/usr/local/sbin:/usr/games:$PATH
export TERM=linux
export HOME=/root
if [ -b /dev/loop7 ]; then
 echo "Loop device exists"
else
 busybox mknod /dev/loop7 b 7 0
fi
#mount -o loop,noatime -t ext2 $kit/bt5.img $mnt
losetup /dev/block/loop7 $kit/bt5.img
mount -o noatime -t ext2 /dev/block/loop7 $mnt
mount -t devpts devpts $mnt/dev/pts
mount -t proc proc $mnt/proc
mount -t sysfs sysfs $mnt/sys
busybox sysctl -w net.ipv4.ip_forward=1
echo "nameserver 8.8.8.8" > $mnt/etc/resolv.conf
echo "127.0.0.1 localhost bt5" > $mnt/etc/hosts
busybox chroot $mnt /bin/bash

echo "Shutting down BackTrack ARM For Xoom... Xoom... smeg Xoom..."
umount $mnt/dev/pts
umount $mnt/proc 
umount $mnt/sys 
umount $mnt