Japan Stopover

On my way to Germany, I stayed five nights in Japan, doing a whirlwind tour of Tokyo, Nagoya, Kyoto and Osaka. Apart from the blood, sweat (and almost tears) of lugging 27 kg of gear through Tokyo train stations, Japan was a really cool place. The people are amazingly friendly and generous, and the whole place has a really good nightlife. You wouldn’t have to twist my arm very hard to convince me to return.

Upgrading IOS remotely

I recently had to upgrade a bunch of Cisco routers to an up-to-date IOS release. These routers were scattered up and down the country, and I don’t have much to do with the servers sitting behind them, so I needed to do a remote upgrade over the Internet.

Now, TFTP is pretty hit and miss at upgrading remotely – and not particularly fast either. Given that TFTP runs over an unreliable transport protocol (UDP), I tend to only use it on LANs, or for truly “trivial” things like backing up configs (and SCP is more secure for that). Since the routers were running an older IOS that didn’t support HTTP, I decided to have a crack at using FTP. What a drama…

Firstly, you need to realise that by default, the FTP client in IOS tries to use passive mode. The server I was hosting the new IOS images from was behind a firewall that was only configured for active FTP (ie, only ports 20 and 21 open). So when the router tried a passive FTP download of the new image, the firewall denied the data connection to the randomly-chosen port that the server had offered.
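For reference, the FTP client settings on the router side look roughly like this. It is only a sketch: the username and password are placeholders, and since passive mode is already the default, the last command is shown purely to make the behaviour explicit.

```
! Placeholder credentials for the FTP server hosting the images
ip ftp username <ftp-user>
ip ftp password <ftp-password>
! Passive mode is the IOS default; "no ip ftp passive" switches the client to active mode
ip ftp passive
```

Active mode is not a free fix either, since the server then has to open the data connection back to the router, which tends to fail when the router sits behind NAT or a firewall of its own.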

Cisco “ip inspect” to the rescue. I added a stateful FTP inspection rule on the firewall (Cisco also) like so:

ip inspect name fw-in ftp
interface Dialer0
 ip inspect fw-in in

Now the firewall would do a stateful inspection of the FTP control connection, and allow the subsequent passive-mode data connection on its randomly-chosen port.

That got a little further, but now the connection was stalling, even though vsftpd was showing a successful login and the start of a transfer. After searching for a bit, I came across some references to Cisco routers and FTP ABOR(t) commands causing problems with ProFTPD. I read through the vsftpd config on the FTP server and discovered an option for asynchronous aborts (async_abor_enable).


I suspect the need for this arises from the fact that, when upgrading IOS, the router always checks to make sure it can actually read the file you specify, before it offers to wipe the flash. So, in this case, the router was starting an FTP transfer, then aborting it, then wiping the flash, then trying to start the transfer again. Once I had enabled that option, the transfer seemed to work. I say “seemed to work”, because I actually only got this to work on one router, and by this time it was about 2:30am. I was rapidly coming to the conclusion that the FTP client is a bit borked in older IOS releases.

So in the end I had to resort to upgrading a few routers via TFTP. Hopefully they are now running recent enough IOS that the FTP is a bit more reliable, or even better, supports HTTP (which is much more likely to succeed, since it carries control and data in a single connection).

It seems that the “ip inspect” feature of IOS is one of the most misunderstood commands of all, since I only ever see it being used in the outbound direction. Apart from using it to inspect outbound TCP sessions, and do away with the need for a rather insecure “permit tcp any any established” in an access-list, I don’t see a lot of point in inspecting outbound traffic. A few tricky protocols need a bit of assistance here and there, such as instant messaging and P2P protocols, to allow return traffic to establish an unrelated connection inbound. But the most use I see for it, is handling those tricky inbound connections such as when you’re hosting FTP, so that you don’t have to leave gaping holes in your inbound access-list.
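As a rough sketch of that last case, the inbound access-list only needs to admit the FTP control connection, and the inspection rule opens the data connection on demand. The list name “outside-in” is made up for illustration, and a real list would carry more entries than this.

```
ip inspect name fw-in ftp
!
ip access-list extended outside-in
 remark Only the FTP control port is opened statically
 permit tcp any any eq ftp
 deny ip any any
!
interface Dialer0
 ip access-group outside-in in
 ip inspect fw-in in
```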

I also found that HTTP works a metric shitload faster if you don’t inspect it in the outbound direction. Even Cisco don’t recommend enabling HTTP inspection, unless you want to do Java blocking.


CIT exam

Second time around, passed with flying colours. It’s amazing the difference it makes when you have up to date study material. So that’s all four exams done, but it appears I might not yet be a CCNP. Since my CCNA expired a couple of years ago, I think I will have to re-sit the CCNA exam just to renew it so that it counts towards my CCNP.

It’s a pretty stupid rule, IMHO, since, if you can pass all the CCNP exams, you are obviously well beyond CCNA level. It seems like it’s just another way for Cisco to make money. I’ll be in big trouble if I ever let my CCNP expire, because that would mean sitting all four (or maybe five, including CCNA) exams again!

Of course, CCIE does not have any prerequisites – not even a current CCNP or CCNA. You’d be pretty brave to attempt CCIE without at least several years’ experience and/or having at least gained CCNP once. But since I plan to tackle CCIE next, I doubt I’ll worry too much about keeping my CCNP up to date. The way I see it, CCIE trumps all previous qualifications anyway.

Cisco 857 router

I’ve finally replaced my trusty old D-Link DSL500, which I’ve had for about four years, with a Cisco 857. What can I say about these routers… well…

My 857 router arrived with SDM Express, but not SDM, installed on the flash drive. While SDM Express is an improvement over the old Cisco Router Web Setup (CRWS), one of the reasons I bought the 857 was to see whether SDM is as good for routers as ASDM is for PIX. So I set up the router using SDM Express, and had a look at the lovely mess of a config it generated. It would probably have sufficed for a non-technical user, but being three-quarters of the way to a CCNP, I don’t think I qualify as that anymore.

First up is the ATM0.1 sub-interface that SDM creates. Ok, this probably is a good way to do it, since, even when configuring a single DLCI with frame-relay, I’ve got into the habit of using a sub-interface. But in NZ, I think we’re far less likely to have the option of multiple ATM PVCs on ADSL than we are of having multiple DLCIs on frame-relay.

The 857 (in fact, all the 850 and 870 series routers) has a built-in four port fast ethernet switch. While this shows up as four individual interfaces in the config, and you can manually set some options per interface (layer 2 options only, I suspect), on the 857 it does not function as a VLAN-capable switch, unlike the one in the 870 series routers.

So, now for some of the gotchas. If you plan to run a server behind a router like this (and this probably would affect any Cisco ADSL router), and you only have the one public IP assigned to the Dialer interface, there are two ways you can go about it. If you run a large number of public services on that server, you may be tempted to do something like:

ip nat inside source static <server-ip> interface Dialer0
ip nat inside source list 1 interface Dialer0 overload
access-list 1 permit any

Of course, you should apply an access-list inbound on the Dialer0 interface, so you don’t completely expose that server. Cisco IOS is smart enough that you can have other hosts on your internal network NAT outbound. You can even specify individual inbound port-NAT entries, such as:

ip nat inside source static tcp <client-ip> 4662 interface Dialer0 4662

for a P2P eMule client, and the port NAT will take precedence over the whole IP NAT for the server.

Where this comes unstuck however, is if you want to terminate an IPSEC tunnel on your router. Remember, we’ve only got one public IP on our Dialer0 interface. Unfortunately, IOS is not smart enough to figure out that it should locally process incoming ESP and ISAKMP traffic – and instead forwards it to the server that you specified. So, faced with this situation myself, I have had to create individual port NAT entries for all the services I host on my server. Fortunately, IOS no longer seems to suffer a bug I encountered years ago, where UDP DNS packets didn’t NAT properly. Since DNS normally uses UDP (TCP is only needed when a response doesn’t fit in 512 bytes, or for zone transfers), this bug used to make it impossible to host a DNS server behind a router like this.
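To illustrate, per-service entries take the shape below. The inside address 192.168.1.10 and the choice of SMTP, DNS and HTTP are assumptions made up for the example.

```
! One static port NAT entry per hosted service (example address and ports)
ip nat inside source static tcp 192.168.1.10 25 interface Dialer0 25
ip nat inside source static tcp 192.168.1.10 53 interface Dialer0 53
ip nat inside source static udp 192.168.1.10 53 interface Dialer0 53
ip nat inside source static tcp 192.168.1.10 80 interface Dialer0 80
! No whole-IP static entry, so ESP and ISAKMP (UDP 500) stay with the router itself
ip nat inside source list 1 interface Dialer0 overload
```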

The next gotcha I came across is the “ip inspect” command having a fit when confronted with out-of-sequence packets. When running an IPSEC tunnel to a NetScreen 25, I found that certain protocols that were in my “ip inspect” list were stalling. Debug revealed that large numbers of packets were being dropped, due to being out-of-sequence. After some research, I learned that Cisco’s IOS-based firewall inspection (“ip inspect”) really doesn’t like having to deal with fragments. I suspect this is the reason for the relatively new IOS command “ip virtual-reassembly”, which attempts to reassemble fragmented packets before “ip inspect” checks them. I suspect my problem was that I was getting a lot of fragments over the VPN, due to incorrect TCP-MSS settings, and the smaller fragments were arriving before the larger fragments – hence “ip inspect” considered them out-of-sequence. Debugging “ip virtual-reassembly” revealed “invalid parameters” – which I could find no further information on.

It seemed the best course of action would be to eliminate the fragmentation to begin with. After spending several hours unsuccessfully experimenting with MTU and TCP-MSS settings, the solution finally came down to setting one parameter on the far-end NetScreen – “set flow path-mtu”. Once this was enabled, everything worked fine. Presumably, PMTU discovery figured out it needed to decrease the TCP-MSS to account for the ESP encapsulation overhead. This turned out to be a preferable solution to manually clamping the TCP-MSS for all traffic.
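For comparison, reassembly and manual clamping are configured along these lines. The LAN interface name Vlan1 and the MSS value are assumptions for illustration; the right value depends on the link MTU and the ESP overhead of the transforms in use.

```
interface Dialer0
 ! Reassemble fragments before the firewall inspects them
 ip virtual-reassembly
!
interface Vlan1
 ! Rewrite the MSS option in TCP SYNs crossing this interface (example value only)
 ip tcp adjust-mss 1360
```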

Getting back to SDM, I installed the full SDM on my router via TFTP (since the actual SDM installer just hung repeatedly, despite following Cisco’s instructions for retro-fitting existing routers with SDM). SDM is certainly more feature rich than SDM Express, but I don’t rate it quite as highly as ASDM for PIX. I ended up doing the bulk of my config by hand, from CLI, and using SDM just as a monitoring front end. It does have an audit tool however, which can be a nice security check of your config. It mostly suggests turning off services like pad and finger. Hopefully someday soon, these will be off by default anyway.
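In CLI terms, the audit’s suggestions boil down to commands like the ones below. Only pad and finger come from the audit described above; the remaining lines are common hardening commands, included on the assumption that a typical audit would flag them too.

```
no service pad
no ip finger
! Assumed extras, not taken from the audit output
no service tcp-small-servers
no service udp-small-servers
no ip bootp server
no ip source-route
```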

A few complaints about SDM – setting the timezone for your router is kind of weird. It called my timezone “Napier”, which is in NZ and in the same timezone as Auckland, but I’ve never seen the timezone referred to like that before. Officially, our timezone should be NZST/NZDT or Pacific/Auckland. SDM also configured absolute dates for daylight saving start/end. This is not correct – DST start/end is determined by a particular week in October and March respectively, so the dates change every year.
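Recurring rules can be entered by hand instead. This sketch assumes DST starts on the first Sunday of October and ends on the third Sunday of March; the exact weeks and changeover times are assumptions and should be checked against the current official rules before use.

```
clock timezone NZST 12
! week day month hh:mm for the start, then the same for the end
clock summer-time NZDT recurring 1 Sun Oct 2:00 3 Sun Mar 3:00
```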

Configuring the IPSEC tunnel initially in SDM was a lesson in Cisco etiquette. It had some default IPSEC proposals that it wouldn’t let me delete, so I had to add my preferred proposals as secondary options. Afterwards, I tweaked the crypto map by hand in the CLI.
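The hand-tweaked end result has roughly this shape. Everything specific here is an assumption for illustration: the transform set and map names, the peer address (taken from the documentation range), the subnets in access-list 110 and the choice of AES with SHA.

```
! Traffic to be protected by the tunnel (example subnets)
access-list 110 permit ip 192.168.1.0 0.0.0.255 192.168.2.0 0.0.0.255
!
crypto ipsec transform-set ESP-AES-SHA esp-aes 256 esp-sha-hmac
!
crypto map VPN-MAP 10 ipsec-isakmp
 set peer 203.0.113.1
 set transform-set ESP-AES-SHA
 match address 110
!
interface Dialer0
 crypto map VPN-MAP
```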

Don’t rely on SDM to get the ordering of access-list entries right. For ease of editing, I no longer use numbered access-lists, except for simple one or two-liners. Instead I use the “ip access-list extended <name>” format, which makes it easy to remove individual entries. You can also easily insert entries by specifying a sequence number, a bit like a BASIC program listing. Lastly, be careful when closing the SDM window, because it closes all your browser windows!
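A short sketch of that editing style, using a made-up list name and made-up entries:

```
ip access-list extended outside-in
 permit tcp any any eq 22
 permit tcp any any eq smtp
 deny ip any any
!
! Entries are numbered 10, 20, 30 by default, so 15 lands between the first two
ip access-list extended outside-in
 15 permit udp any any eq isakmp
 ! Remove a single entry by its sequence number
 no 20
```

Running “show ip access-lists outside-in” afterwards lists each entry with its sequence number.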

A couple of things to beware of with the 857 (as opposed to the 877). The 857 is the successor to the SOHO 97, not the 827 or 837 as one might think. As such, it is not particularly grunty, and if you run a lot of sessions or IPSEC tunnels (maximum of 5), you might find the CPU getting quite bogged down. The 857 does not support IPv6, which is surprising, since an 827/837 can, with the right IOS image. It also does not support class-based queuing, which can be a problem if you wanted to reserve bandwidth for, and prioritise VoIP traffic. I haven’t yet found a way to run the router’s SSH on a non-standard port, since the vty complains if you try to assign it to a different rotary group.

So, while the 857 is successfully doing firewalling, NAT and IPSEC for me now, I’m sorta wishing I’d spent the extra money and bought an 877.

Catalyst Express(?!) 500 switches

On the weekend I helped a friend set up some Cisco kit involving Catalyst Express 500 switches, Aironet 1310 wireless bridges, and various Cisco IP phones. What should have been a relatively simple job ended up taking about 10 hours.

My friend had already configured the 2800 series router, with Cisco Call Manager Express. Phones attached to a local Cat 500 worked fine – upon booting up, they configured their VLAN, got a DHCP lease, and registered to CCME. Next came the Aironets. Having never configured VLAN trunking on a wireless link, this was new ground for me. At first we had each VLAN (native VLAN 1, and voice VLAN 200) using its own SSID. It wasn’t until later that we realised we could run both over the same SSID.

Where the problems arose, was on the remote Cat 500 (ie, on the other side of the wireless bridge). The phones connected to that switch appeared to time out, trying to get a DHCP lease. Since we had previously plugged them in on the local Cat 500, they eventually fell back to their previous DHCP lease, and were actually able to make calls. This wouldn’t help us for plugging new phones in though. Since my friend had done the initial configuration of the Cat 500’s, and wasn’t sure at the time whether the Aironets were supposed to run as bridges, access points or both, he had set the Smart Port type to “Access Point” on the Cat 500 switchports that the Aironets were plugged into.

When I looked at the interface stats on each of the Aironets, neither of them reported any broadcasts for the radio interface. I thought this was odd, since a DHCP request is a broadcast. I even tried debugging IP packets on the Aironet, while a phone booted up. Nope, don’t see any DHCP requests. I started to get the feeling the switches were blocking broadcasts somehow, but at that point still assumed my friend had set up the switch correctly. Also, without a CLI on the Catalyst 500’s we couldn’t debug packets or VLAN encapsulation on the suspect port. In between a few drinks of vodka and Coke, some pizza, and a few rounds of the game “Burnout”, we chased our tails for several hours, eventually giving up and sleeping on it.

The next morning, I revisited my theory of the Cat 500 being at fault. We upgraded the IOS of all the switches, and the Aironet bridges. Still no luck. I went back into the Cat 500 web config (did I mention these things have no CLI?), and examined the switchport config. Neither my friend nor I were exactly sure what the various Smart Port roles did, in terms of spanning tree, VLAN access/trunking, and QoS. We could only guess, since the documentation doesn’t really make it very clear either, and without a CLI, we couldn’t even verify the config against our own knowledge of Cisco IOS. Then I said, “Hey, we have a VLAN trunk going across the radio, these Aironet ports are gonna have to be in trunk mode too. What does the ‘Switch’ Smart Port role do?”

And then it worked.

Yep, a “Switch” Smart Port allows VLAN trunking. Apparently an “Access Point” Smart Port does too, otherwise we wouldn’t have been able to make calls earlier (remember the Skinny & RTP traffic would have been in VLAN 200). For some reason that I’m still scratching my head about, though, it was not passing broadcasts (even from the native VLAN, when we tried to get a notebook attached to the remote Cat 500 to pick up a DHCP lease). The only hint given by the documentation, is that an “Access Point” Smart Port allows up to 30 connected users on the access point. Ok, so it’s enforcing a MAC address limit of 31 on that switch port (30 users plus the access point itself). Why not just say that… it still didn’t explain not forwarding broadcasts though. How is a client associated to an access point supposed to obtain an IP address?
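For comparison, on a Catalyst that does have a CLI, the port settings we were trying to reach through Smart Port roles would look something like this. The interface numbers are made up, and I’m assuming a 2950-class switch; I can’t say whether the Catalyst Express 500 generates exactly this behind the scenes.

```
! Port facing the Aironet bridge: 802.1Q trunk carrying both VLANs
interface FastEthernet0/1
 switchport mode trunk
 switchport trunk native vlan 1
 switchport trunk allowed vlan 1,200
!
! Port facing an IP phone: data on VLAN 1, voice on VLAN 200
interface FastEthernet0/2
 switchport mode access
 switchport access vlan 1
 switchport voice vlan 200
 spanning-tree portfast
```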

So that was all good, and we proceeded to set up the encryption on the wireless link. That was Brainfuck #2. Over the years, Cisco have added so many extensions to IEEE and IETF standards, that setting up Aironets is like eating alphabet soup. It was only due to my previous training and experience with Proxim wireless gear that I was able to correctly choose the ciphers that are more commonly known as WPA2. After another treasure hunt through the config to find where to set the pre-shared key for WPA, we had encryption working.
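In CLI form, the pieces we were hunting for amount to something like the following. Treat it as a sketch from memory rather than the config we ended up with: the SSID and key are placeholders, the exact syntax varies between Aironet IOS releases, and with VLANs on the radio the cipher is set per VLAN (“encryption vlan <id> mode ciphers aes-ccm”) instead.

```
dot11 ssid <bridge-ssid>
 authentication open
 ! WPA key management; paired with the AES-CCMP cipher below this is what most vendors call WPA2
 authentication key-management wpa
 wpa-psk ascii <pre-shared-key>
!
interface Dot11Radio0
 encryption mode ciphers aes-ccm
 ssid <bridge-ssid>
```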

In the meantime however, the Catalyst 500’s had disabled the ports that the Aironets were connected to. According to the Cat 500 event log, it was because it had detected one way communication on the port. Err… what? Ok, sure, the wireless link was down while we were setting up the encryption, but disabling a port because of that could get real annoying, especially if the link got knocked out by weather. We re-enabled the ports (a tedious process of going into the Cat 500 web GUI, disabling the port, applying that, re-enabling the port, applying that again).

Finally, it seemed everything was working. Looking back on it after some thought, I suspect the Catalyst 500’s are getting upset with the “Switch” Smart Ports when they don’t hear any spanning tree BPDU’s on them. STP was enabled for both VLANs on the Catalyst 500’s, we told it we had a switch connected to those ports, and it recognised the active ethernet link status. With the wireless link down however, the switches wouldn’t have seen any BPDU’s arrive on that port for several minutes (we are not running STP on the Aironets). That would be long enough for spanning tree to get upset – while on the other hand, the ethernet link status was still ok. The switch might have thought there was a faulty TX at the other end of the cable. Since these Catalyst 500 switches are seriously dumbed down, your guess is as good as mine.

Either that, or it was secretly using UDLD (unidirectional link detection), a feature often used by “real” Catalyst switches to detect faulty transmit circuitry on GBIC interfaces. Except that UDLD should only be applied on fibre interfaces (ie, where one fibre of the pair could be damaged). Copper interfaces shouldn’t need UDLD.

So, since we are not running STP on the Aironet bridges, my only guess is that enabling it might keep the Catalyst 500’s happy in the event of a wireless link failure. The Cat 500 will still see BPDU’s from its directly-attached Aironet… Enabling spanning tree on the Aironets could be Brainfuck #3 though, since there is no way (that I could find) to set the bridge priority on a Catalyst 500. In an environment such as the one I was helping set up, there were no other “real” Catalysts that we could have set as the spanning tree root bridge. So the Catalyst 500 (or Aironet) with the lowest MAC address would be elected as the root bridge. It would be seriously uncool if that switch just happened to be the one at the far end of the wireless link.

So what did I learn from that experience? I won’t be using a Catalyst 500 anytime soon, and even if I do, only as an access layer switch in conjunction with a more fully featured Catalyst (at least a 2950).
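With a Catalyst like that in the picture, pinning the root bridge is a one-liner per VLAN. The priority value is just an example; the lowest priority wins the election before MAC addresses are compared.

```
! Values are multiples of 4096; the default is 32768
spanning-tree vlan 1 priority 4096
spanning-tree vlan 200 priority 4096
```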

BCMSN exam

I passed my Cisco BCMSN exam with flying colours, despite a really bad feeling I had about the exam before clicking the final “submit” button. Cisco have changed the format of the exam, and while it certainly looks more modern, I couldn’t really say whether it’s an improvement or not.

I’ve decided to study for the BSCI exam next, which should hopefully be a little more of the topics I’m used to – routing, IP subnetting, etc. Finding time to study has been difficult lately, with work starting to get a little busier. My estimate of two weeks of study per exam might need a little tweaking, or at least consideration of a couple of weeks between each study run.