109.3 Basic Network Troubleshooting
Candidates should be able to troubleshoot networking issues on client hosts.
Key Knowledge Areas
- Manually and automatically configure network interfaces and routing tables to include adding, starting, stopping, restarting, deleting or reconfiguring network interfaces.
- Change, view, or configure the routing table and correct an improperly set default route manually.
- Debug problems associated with the network configuration.
The figure summarises various points at which networking can fail, and may serve as a basis for fault-finding.
Figure 109.3-1: Potential Points of Failure in a Network
Can Linux find your network card?
The network interface card (NIC) must be supported by the kernel. Topic 109.2 provides some suggestions to help you determine if the kernel has detected your network card and loaded an appropriate driver.
Does it have an IP address assigned?
Use the command ifconfig to determine the network settings of your card:
Example:
    # ifconfig eth0
    eth0      Link encap:Ethernet  HWaddr 00:0C:29:76:13:68
              inet addr:192.168.81.130  Bcast:192.168.81.255  Mask:255.255.255.0
              inet6 addr: fe80::20c:29ff:fe76:1368/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:6240 errors:0 dropped:0 overruns:0 frame:0
              TX packets:8402 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:475800 (464.6 KiB)  TX bytes:1392930 (1.3 MiB)
              Base address:0x2000 Memory:d8920000-d8940000
In particular, verify that the IP address and netmask are correct.
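One way to sanity-check an address and netmask is to compute the network address they imply and compare it with that of a neighbouring host. The following is a minimal sketch in POSIX shell arithmetic. The helper names network_of and same_subnet are hypothetical (they are not standard utilities), and the code assumes plain decimal dotted-quad IPv4 input without leading zeros.

```shell
#!/bin/sh
# network_of ADDRESS NETMASK
# Hypothetical helper: prints the network address obtained by ANDing each
# octet of ADDRESS with the matching octet of NETMASK.
# The function body is a subshell, so the IFS change stays local.
network_of() (
    IFS=.            # split the dotted quads on "."
    set -- $1 $2     # $1-$4: address octets, $5-$8: netmask octets
    printf '%s\n' "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

# same_subnet ADDRESS_A ADDRESS_B NETMASK
# Hypothetical helper: succeeds (exit status 0) when both addresses fall in
# the same network under NETMASK.
same_subnet() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}

# Example:
#   network_of 192.168.81.130 255.255.255.0
#   same_subnet 192.168.81.130 192.168.81.129 255.255.255.0 && echo "same network"
```

For the interface shown above, network_of 192.168.81.130 255.255.255.0 prints 192.168.81.0. If a host you expect to reach directly yields a different network address, the address or the netmask is probably wrong.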
Can you ping other machines on the network?
The ping command can be used to test reachability of a machine by sending an ICMP echo request packet and awaiting a reply. A simple test is to ping another machine on the same network. In the example below the -c flag is used to limit the number of pings. By default, ping will continue indefinitely at 1 second intervals.
    # ping -c 4 192.168.81.129
    PING 192.168.81.129 (192.168.81.129) 56(84) bytes of data.
    64 bytes from 192.168.81.129: icmp_seq=1 ttl=64 time=0.822 ms
    64 bytes from 192.168.81.129: icmp_seq=2 ttl=64 time=1.15 ms
    64 bytes from 192.168.81.129: icmp_seq=3 ttl=64 time=0.812 ms
    64 bytes from 192.168.81.129: icmp_seq=4 ttl=64 time=0.745 ms

    4 packets transmitted, 4 received, 0% packet loss, time 3000ms
    rtt min/avg/max/mdev = 0.745/0.884/1.157/0.160 ms
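When checking several hosts it can help to reduce the ping output to the packet-loss figure alone. The sketch below assumes the summary line has the form shown in the example above ("... 0% packet loss ..."). The function name packet_loss is a hypothetical helper introduced here for illustration.

```shell
#!/bin/sh
# packet_loss
# Hypothetical helper: reads ping output on standard input and prints the
# packet-loss percentage taken from the summary line, without the "%" sign.
# Assumes the summary contains the text "<number>% packet loss".
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example (needs network access, so it is left commented out):
#   ping -c 4 192.168.81.129 | packet_loss
```

A result of 0 means every echo request was answered. A result of 100 suggests the host is down, unreachable, or filtering ICMP.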
Does DNS name resolution work?
If you can connect to a machine by specifying its IP address but not by specifying its name you should suspect that name resolution is not working. First, verify that the correct DNS servers are specified in /etc/resolv.conf. See topic 109.4 for more detail on DNS client-side configuration.
Verify that you can ping your DNS servers by IP address.
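The two checks above can be combined by extracting the configured name servers from /etc/resolv.conf and pinging each in turn. This is a minimal sketch; list_nameservers is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a copy of the file.

```shell
#!/bin/sh
# list_nameservers [FILE]
# Hypothetical helper: prints the address from every "nameserver" line in
# FILE (default: /etc/resolv.conf). Comment and "search" lines are ignored.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Example (needs network access, so it is left commented out):
#   for ns in $(list_nameservers); do
#       ping -c 2 "$ns"
#   done
```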
Use the dig utility to manually test DNS lookups:
    # dig www.lpi.org
    ; <<>> DiG 9.3.4-P1 <<>> www.lpi.org
    ;; global options: printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24846
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

    ;www.lpi.org.                   IN      A

    www.lpi.org.            5       IN      A       24.215.7.162

    ;; SERVER: 192.168.81.2#53(192.168.81.2)
    ;; WHEN: Thu Sep 16 15:09:10 2010
    ;; MSG SIZE rcvd: 45
Verify that an answer (an A record) was received, and that it came from the DNS server you expect (192.168.81.2 in the example above).
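Checking which server answered can be scripted by pulling the address out of the ";; SERVER:" line. The sketch below assumes dig prints that line in the form shown in the example above. The function names dig_server and answered_by are hypothetical helpers, not part of dig.

```shell
#!/bin/sh
# dig_server
# Hypothetical helper: reads dig output on standard input and prints the
# address of the server that answered, taken from the ";; SERVER:" line.
dig_server() {
    sed -n 's/^;; SERVER: \([^#]*\)#.*/\1/p'
}

# answered_by EXPECTED_ADDRESS
# Hypothetical helper: succeeds when the dig output on standard input was
# answered by EXPECTED_ADDRESS.
answered_by() {
    [ "$(dig_server)" = "$1" ]
}

# Example (needs network access, so it is left commented out):
#   dig www.lpi.org | answered_by 192.168.81.2 && echo "answered by the expected server"
```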
The nslookup command can also be used but gives slightly less information:
    Server:         192.168.81.2
    Address:        192.168.81.2#53

    Name:   www.lpi.org
    Address: 24.215.7.162
Are your servers listening?
If other machines cannot connect to your services, verify that the services are running; for example:
    # service sshd status
    sshd (pid 5851 5849 5220) is running...
Or you can look for them in the output from ps:
    root      5220     1  0 08:41 ?        00:00:00 /usr/sbin/sshd
    root     11440 11415  0 15:25 pts/2    00:00:00 grep sshd
Verify that the service is listening on the expected port. The command netstat can be used to examine active ports; for example:
    $ netstat -ant
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address               Foreign Address             State
    tcp        0      0 192.168.122.1:53            0.0.0.0:*                   LISTEN
    tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN
    tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN
    tcp        0      0 0.0.0.0:23                  0.0.0.0:*                   LISTEN
    tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN
    tcp        0      0 192.168.81.1:60864          192.168.81.130:22           ESTABLISHED
    tcp        0      0 192.168.1.78:43011          174.129.193.12:443          ESTABLISHED
    tcp6       0      0 :::22                       :::*                        LISTEN
    tcp6       0      0 ::1:631                     :::*                        LISTEN
Here, the first three lines of output represent:
- A local DNS server listening on an internal network 192.168.122.0
- A secure shell server listening on port 22
- A CUPS print server listening on port 631 (but only on the loopback address)
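On a busy machine the netstat listing can be long, so it is often convenient to filter it down to the listening sockets. The sketch below assumes the six-column netstat -ant layout shown above. The function names listening_endpoints and port_is_listening are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# listening_endpoints
# Hypothetical helper: reads `netstat -ant` output on standard input and
# prints the local address:port of every TCP socket in the LISTEN state.
listening_endpoints() {
    awk '$1 ~ /^tcp/ && $NF == "LISTEN" { print $4 }'
}

# port_is_listening PORT
# Hypothetical helper: succeeds when the netstat output on standard input
# contains a TCP socket listening on PORT (on any local address).
port_is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Example:
#   netstat -ant | listening_endpoints
#   netstat -ant | port_is_listening 22 && echo "something is listening on port 22"
```

Note that a service bound only to 127.0.0.1 or ::1 counts as listening here, yet it cannot be reached from other machines, so check the address part as well as the port.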
The command netstat -i will show network interfaces. It provides similar information to running ifconfig with no arguments:
    $ netstat -i
    Kernel Interface table
    Iface     MTU Met  RX-OK RX-ERR RX-DRP RX-OVR  TX-OK TX-ERR TX-DRP TX-OVR Flg
    eth0     1500   0      0      0      0      0      0      0      0      0 BMU
    lo      16436   0     60      0      0      0     60      0      0      0 LRU
    virbr0   1500   0      0      0      0      0     34      0      0      0 BMRU
    vmnet1   1500   0      0      0      0      0     34      0      0      0 BMRU
    vmnet8   1500   0   8858      0      0      0   6676      0      0      0 BMRU
    wlan0    1500   0  66642      0      0      0  20490      0      0      0 BMRU
In the output above, you see a wired network interface (eth0), a wireless interface (wlan0), the loopback interface (lo) and three interfaces to support virtualization (virbr0, vmnet1 and vmnet8).
Other useful options for the netstat command include:
-r, --route: Show the routing table (similar to route -n)
-t, --tcp: Show TCP endpoints
-u, --udp: Show UDP endpoints
-a, --all: Show both listening and connected endpoints. By default only connected endpoints are shown
-n, --numeric: Show numeric values instead of trying to determine symbolic names for hosts or ports
You can also examine active TCP and UDP endpoints with the command lsof -i:
    COMMAND     PID USER   FD TYPE DEVICE SIZE NODE NAME
    portmap    4977 rpc    3u IPv4   9744       UDP *:sunrpc
    portmap    4977 rpc    4u IPv4   9745       TCP *:sunrpc (LISTEN)
    rpc.statd  5006 root   3u IPv4   9807       UDP *:945
    rpc.statd  5006 root   6u IPv4   9794       UDP *:942
    rpc.statd  5006 root   7u IPv4   9814       TCP *:948 (LISTEN)
    hpiod      5200 root   0u IPv4  10272       TCP m1530-rhel.example.com:2208 (LISTEN)
    python     5205 root   4u IPv4  10302       TCP m1530-rhel.example.com:2207 (LISTEN)
    sshd       5220 root   3u IPv6  10327       TCP *:ssh (LISTEN)
    cupsd      5231 root   3u IPv4  10362       TCP m1530-rhel.example.com:ipp (LISTEN)
    cupsd      5231 root   5u IPv4  10365       UDP *:ipp
    sendmail   5251 root   4u IPv4  10440       TCP m1530-rhel.example.com:smtp (LISTEN)
    avahi-dae  5370 avahi 13u IPv4  10721       UDP *:mdns
    avahi-dae  5370 avahi 14u IPv6  10722       UDP *:mdns
    avahi-dae  5370 avahi 15u IPv4  10723       UDP *:filenet-rpc
    avahi-dae  5370 avahi 16u IPv6  10724       UDP *:filenet-nch
    dhclient  11325 root   5u IPv4  17296       UDP *:bootpc
As you can see, the lsof output is more informative – it tells you the name and the PID of the process that is using the endpoint.
Is your firewall blocking access?
If other machines are unable to connect to your services, check that your firewall is not blocking access. A quick way to make this check is to briefly disable the firewall and repeat the test. However, do not forget to re-enable it immediately afterwards. From the command line you can flush the firewall rules with the command:
|
Other diagnostic tools
The command traceroute can be used to trace the path that a packet takes to a specific destination. For example:
    traceroute to www.lpi.org (24.215.7.162), 30 hops max, 40 byte packets
     1  192.168.81.2 (192.168.81.2)  0.326 ms  0.207 ms  0.212 ms
     2  BThomehub.home (192.168.1.254)  81.405 ms  78.430 ms  77.940 ms
     3  217.47.111.122 (217.47.111.122)  20.152 ms  21.611 ms  23.679 ms
     4  217.47.111.161 (217.47.111.161)  23.903 ms  26.248 ms  27.741 ms
     5  213.1.69.38 (213.1.69.38)  29.059 ms  31.459 ms  33.284 ms
     6  213.120.180.197 (213.120.180.197)  33.977 ms  36.439 ms  38.711 ms
     7  213.120.179.26 (213.120.179.26)  41.249 ms  25.029 ms  25.639 ms
     8  213.120.179.178 (213.120.179.178)  25.474 ms  26.261 ms  25.895 ms
    ... lines deleted ...
    21  clark.lpi.org (24.215.7.162)  149.154 ms  123.495 ms  123.828 ms
The times shown in the output above are the round-trip times of the probes to each gateway. (Each gateway is probed three times.)
The tracepath utility gives similar information with a slightly different format:
     1:  192.168.81.130 (192.168.81.130)              0.161ms pmtu 1500
     1:  192.168.81.2 (192.168.81.2)                  0.379ms
     2:  BThomehub.home (192.168.1.254)      asymm  1  98.572ms
     3:  217.47.111.122 (217.47.111.122)     asymm  1  54.991ms
     4:  217.47.111.161 (217.47.111.161)     asymm  1  49.023ms
     5:  213.1.69.38 (213.1.69.38)           asymm  1  48.824ms
     6:  213.120.180.197 (213.120.180.197)   asymm  1  48.795ms
     7:  213.120.179.26 (213.120.179.26)     asymm  1  48.786ms
     8:  213.120.179.178 (213.120.179.178)   asymm  1  48.361ms
    ... lines deleted ...
    21:  clark.lpi.org (24.215.7.162)        asymm  1 278.213ms reached
Usually it is only the first couple of hops in this route that are on your own network. Anything beyond that is likely outside of your administrative control.
The following is a partial list of the files, terms and utilities used:
- ifconfig
- ifup
- ifdown
- route
- host
- hostname
- dig
- netstat
- ping
- traceroute