How to Find Out Whether TCP Payloads Are Identical
Problem Statement:
Vendor tech support was suggesting that a firewall was subtly
modifying TCP data payloads.
Solution:
I took a packet capture from both hosts involved in the conversation and
started thinking about how to see if the data sent by the server was the same
as the data received by the client. I couldn't just compare the capture files
themselves, because elements like timestamps, TTLs, and IP checksums would be
different.
After a bunch of fiddling around, I came up with the idea of using tshark to extract the TCP
payloads for each stream in the capture file and hash the results. If the hashes matched, the TCP payloads were
being transferred unmodified. Here are the shell commands to do this:
tshark -r server.pcap -T fields -e tcp.stream | sort -u | sed 's/\r//' | xargs -i tshark -r server.pcap -q -z follow,tcp,raw,{} | md5sum
2cfe2dbb5f6220f29ff8aff82f7f68f5 *-
You then run exactly the same commands on the "client.pcap" file and
compare the resulting hashes. Let's break this down a bit more:
tshark -r server.pcap -T fields -e tcp.stream
This invokes tshark to read the "server.pcap" file and output the TCP
stream index of each packet. The result is just a long series of integers:
0
0
1
2
1
etc.
The next command, sort -u, reduces that list to the set of unique (hence the
"-u") stream indexes. In other words, it removes duplicates from the previous
list. The "-u" option is part of POSIX, but if your system's sort lacks it,
you can use "| sort | uniq" instead.
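To see the deduplication step in isolation, here is a small sketch that feeds a made-up list of stream indexes (not taken from any real capture) through both forms. The -n flag is my addition so that the indexes sort numerically; the original pipeline does not use it.

```shell
# Hypothetical stream indexes, one per packet, as tshark would print them.
indexes=$(printf '%s\n' 0 0 1 2 1 10 2)

# Form used in the article: sort and deduplicate in one step.
unique_a=$(printf '%s\n' "$indexes" | sort -n -u)

# Fallback form: sort first, then drop adjacent duplicates with uniq.
unique_b=$(printf '%s\n' "$indexes" | sort -n | uniq)

# Both variables now hold the same list of unique indexes.
printf '%s\n' "$unique_a"
```

Either form should leave one line per stream, which is all the later steps need.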
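Next, sed 's/\r//' removes the carriage return (\r) from the end of each
stream index. These show up when tshark writes Windows-style (CRLF) line
endings. If you don't strip it, the stray character becomes part of the
argument passed to the next command and you'll get an error.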
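As a rough illustration of that cleanup step, the sketch below simulates one CRLF-terminated line and strips the carriage return. It uses tr -d '\r' rather than sed, which is my substitution: the \r escape may not be understood by every sed implementation, while tr handles it portably. The helper name strip_cr is hypothetical.

```shell
# strip_cr: delete every carriage return from standard input.
strip_cr() {
    tr -d '\r'
}

# Simulate a single stream index with a Windows-style CRLF line ending.
cleaned=$(printf '7\r\n' | strip_cr)

# The value is now a bare integer, safe to pass on as an argument.
printf 'stream index: [%s]\n' "$cleaned"
```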
The next one's a bit of a doozy: xargs -i takes each stream index (remember,
these are just integers) and executes the tshark -r server.pcap -q -z
follow,tcp,raw,{} command once for each stream index, substituting the
input stream index for the {} characters. (GNU xargs marks -i as deprecated;
-I {} is the equivalent that other xargs implementations also accept.)
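If the substitution is hard to picture, this sketch shows which commands xargs would build. It puts echo in front of tshark so nothing is actually executed, uses -I {} in place of -i, and feeds in three made-up stream indexes.

```shell
# Hypothetical stream indexes; echo stands in for running tshark,
# so this only prints the command lines that xargs constructs.
commands=$(printf '%s\n' 0 1 2 |
    xargs -I {} echo tshark -r server.pcap -q -z follow,tcp,raw,{})

# One command line per input index, with {} replaced by that index.
printf '%s\n' "$commands"
```

Dropping the echo turns this back into the real step from the pipeline.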
The tshark -r server.pcap -q -z follow,tcp,raw,{} command itself reads the
capture file again, once per stream index, running the familiar "Follow TCP
Stream in Raw Format" command from Wireshark on the TCP stream index that
replaces the {} characters. If you're rusty on Wireshark, "Follow TCP Stream"
just dumps the TCP payload data in one of a variety of formats, such as "raw"
or ASCII. If you've never used this option in Wireshark, make sure you try it
today!
The final command, md5sum, computes an MD5 hash of everything the preceding
commands printed.
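The comparison only works because identical input always hashes to the same value and even a tiny change produces a different one. Here is a minimal sketch of that property with made-up strings standing in for payload data; hash_stdin is a hypothetical helper that keeps only the hash field of md5sum's output.

```shell
# hash_stdin: print only the MD5 hash of standard input.
# md5sum prints "<hash>  -" or "<hash> *-"; cut keeps the first field.
hash_stdin() {
    md5sum | cut -d ' ' -f 1
}

# Made-up payloads: two identical, one with two letters swapped.
a=$(printf 'same payload' | hash_stdin)
b=$(printf 'same payload' | hash_stdin)
c=$(printf 'same paylaod' | hash_stdin)

if [ "$a" = "$b" ]; then echo "a and b: identical"; fi
if [ "$a" != "$c" ]; then echo "a and c: different"; fi
```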
To summarize, we've done this: taken a file, extracted all the raw TCP data
payloads from its packets (without headers), and hashed the data with MD5. If
we do this on two files and the hashes are the same, we know they contain
exactly the same TCP data (barring the infinitesimally small probability of an
MD5 hash collision).
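To avoid retyping the pipeline for each file, the steps can be wrapped in a pair of shell functions. This is only a sketch under a few assumptions: the names payload_hash and compare_captures are hypothetical, a while loop replaces xargs (my variation, so that the stream index is passed as a quoted argument), tr replaces sed for stripping carriage returns, and both captures are assumed to number their TCP streams the same way.

```shell
# payload_hash FILE: print one MD5 hash covering every TCP stream in FILE.
payload_hash() {
    capture=$1
    tshark -r "$capture" -T fields -e tcp.stream |
        tr -d '\r' |
        sort -n -u |
        while read -r stream; do
            # Skip blank lines (packets with no TCP stream index).
            [ -n "$stream" ] || continue
            # Dump the payload of this stream; stdin is detached so the
            # command cannot swallow the remaining stream indexes.
            tshark -r "$capture" -q -z "follow,tcp,raw,$stream" </dev/null
        done |
        md5sum | cut -d ' ' -f 1
}

# compare_captures A B: report whether both captures hash identically.
compare_captures() {
    if [ "$(payload_hash "$1")" = "$(payload_hash "$2")" ]; then
        echo "TCP payloads match"
    else
        echo "TCP payloads differ"
        return 1
    fi
}

# Usage (requires tshark and the two capture files):
#   compare_captures server.pcap client.pcap
```

The function returns a non-zero status on a mismatch, so it can be used directly in a script's if statement.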
In my case, both capture files produced the same hash, proving that the
firewall was (for once) playing nice.