Analyzing Security Monitoring Data

Blue Team
Lesson Introduction

Security information derives from network packet captures, traffic monitoring, and logs from security appliances and network application services. A monitoring tool is software that collects this data from hosts and network appliances for analysis and is the basis for an alerting system. These tools can be used to identify ongoing cybersecurity attacks and perform forensic analysis of past attacks. As a cybersecurity analyst, you must be able to analyze these data sources to identify threats and implement appropriate configuration changes in response. You should also be able to analyze email messages to detect malicious links and attachments, and to verify security properties, such as digital signatures and Sender Policy Framework (SPF) records.

Lesson Objectives

In this lesson you will:

  • Analyze network monitoring output.
  • Analyze appliance monitoring output.
  • Analyze endpoint monitoring output.
  • Analyze email monitoring output.

OBJECTIVES COVERED

Given a scenario, analyze data as part of security-monitoring activities.

Given a scenario, utilize basic digital forensics techniques.

Network-related indicators of compromise (IoCs) derive from packet capture and traffic flow data, plus logs and alerts from security and network appliances. Detecting indicators from everything that can be observed about a network requires automated tools and visualization software, and even then, the information reported takes skill and experience to interpret appropriately. As a CySA+ professional, you must be able to analyze packet data and traffic-monitoring statistics to identify indicators of abnormal activity.


NETWORK FORENSICS ANALYSIS TOOLS

Protocol and packet security monitoring depends on forensics tools to capture and decode the frames of data. Network traffic can be captured from a host or from a network segment. Using a host means that only traffic directed at that host is captured. Capturing from a network segment can be performed by a switched port analyzer (SPAN) port (or mirror port). This means that a network switch is configured to copy frames passing over designated source ports to a destination port, which the packet sniffer is connected to. Sniffing can also be performed over a network cable segment by using a test access port (TAP). This means that a device is inserted in the cabling to copy frames passing over it. There are passive and active (powered) versions.
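
For example, on a Cisco IOS switch, a SPAN session copying traffic from one port to a monitoring port might be configured as follows (a sketch only; interface names are placeholders, and the exact syntax varies by platform and OS version):

    monitor session 1 source interface GigabitEthernet0/1 both
    monitor session 1 destination interface GigabitEthernet0/24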

Typically, sniffers are placed inside a firewall or close to a server of particular importance. The idea is usually to identify malicious traffic that has managed to get past the firewall. A single sniffer can generate an exceptionally large amount of data, so you cannot just put multiple sensors everywhere in the network without provisioning the resources to manage them properly. Depending on network size and resources, one or just a few sensors will be deployed to monitor key assets or network paths.

The two tools most often used to perform network analysis are tcpdump and Wireshark. These can be used to operate a sniffer to perform live packet capture, or to analyze a PCAP file of saved network data.


TCPDUMP

tcpdump is a command-line packet capture utility for Linux, though a version of the program called windump is available for certain earlier versions of Windows (winpcap.org/windump). The basic syntax of the command is tcpdump -i eth, where eth is the interface to listen on (you can substitute with the keyword any to listen on all interfaces of a multihomed host). The utility will then display captured packets until halted manually (Ctrl+C).

The operation of the basic command can be modified by switches. Some of the most important of these are:

Switch          Usage

-n              Show addresses in numeric format (don't resolve host names).

-nn             Show addresses and ports in numeric format.

-e              Include the data link (Ethernet) header.

-v, -vv, -vvv   Increase the verbosity of output, to show more IP header fields, such as TTL.

-X              Capture the packet payload in hex and ASCII. Use -XX to include the data link header too.

-s Bytes        By default, tcpdump captures the first 96 bytes of the data payload. To capture the full payload, set the snap length to zero with -s 0.

-w File         Write the output to a file. Packet capture files are normally identified with a .pcap extension.

-r File         Display the contents of a packet capture file.
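
Combining several of these switches, a capture on interface eth0 that records full packets to a file without name resolution might look like this (interface and file names are examples):

    tcpdump -i eth0 -nn -s 0 -w server.pcap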

There are numerous filter options, which can be combined using logical and (&&), or (||), not (!), and groups (parentheses). Some basic filter keywords include:

Keyword         Usage

host            Capture source and destination traffic from the specified IP or host name.

src / dst       Capture only source or destination traffic from the specified IP.

net             Capture traffic from the specified subnet (use CIDR notation).

port            Filter to the specified port (or range of ports, such as 21-1024). You can also use src port or dst port.

proto           Filter to a protocol, such as ip, ip6, arp, tcp, udp, or icmp.
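
As an illustration, the following command (with example address and ports) captures only web traffic to or from a single host; quoting the expression protects the special characters from the shell:

    tcpdump -i eth0 -nn "host 10.1.0.131 && (port 80 || port 443)"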

** Refer to tcpdump.org for the full help and usage examples. **


WIRESHARK

Wireshark (wireshark.org) is an open-source graphical packet capture utility, with installer packages for most operating systems. Once you have chosen the interfaces to listen on, the output is displayed in a three-pane view, with the top pane showing each frame, the middle pane showing the fields from the currently selected frame, and the bottom pane showing the raw data from the frame in hex and ASCII. Wireshark is capable of parsing (interpreting) the headers and payloads of hundreds of network protocols.

You can apply a capture filter or filter the output using the same expression syntax as tcpdump (though the expression can be built via the GUI tools too). You can save the output to a .pcap file or load a file for analysis. Wireshark supports very powerful display filters (wiki.wireshark.org/DisplayFilters) that can be applied to a live capture or to a capture file. You can also adjust the coloring rules (wiki.wireshark.org/ColoringRules), which control the row shading and font color for each frame.
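
For example, the following display filter expressions (with example values) isolate traffic to or from one address with the TCP reset flag set, DNS queries whose name contains a given string, and HTTP POST requests, respectively:

    ip.addr == 10.1.0.131 && tcp.flags.reset == 1
    dns.qry.name contains "example"
    http.request.method == "POST"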

Another useful option is to use the Follow TCP Stream context command to reconstruct the packet contents for a TCP session.

** There is also a command-line version of the program, called tshark (wireshark.org/docs/man-pages/tshark.html). **
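
As an illustration, tshark can read a saved capture, apply a display filter, and print selected fields, which is useful for scripting (the file name is an example):

    tshark -r capture.pcap -Y "http.request" -T fields -e ip.src -e http.host -e http.request.uri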

** The PCAP file format has some limitations, which has led to the development of PCAP Next Generation (PCAPNG). Wireshark now uses PCAPNG by default, and tcpdump can process files in the new format too (cloudshark.io/articles/5-reasons-to-move-to-pcapng). **


PACKET ANALYSIS

Packet analysis refers to deep-down frame-by-frame scrutiny of captured frames using a tool such as Wireshark. You can use packet analysis to detect whether packets passing over a standard port have been manipulated in some nonstandard way, to work as a beaconing mechanism for a command and control (C&C) server, for instance. You can inspect protocol payloads to try to identify data exfiltration attempts or attempts to contact suspicious domains and URLs.

One use case for packet analysis is to identify and extract binary file data being sent over the network. A network file-carving tool, such as NetworkMiner (netresec.com/?page=networkminer), can reconstruct the correct byte order (in case packets were transmitted out of sequence), strip the protocol information, and save the resulting data to a file. In the case of Windows binaries, the tool will also usually be able to identify the file type. In the case of malware, you will be interested in locating Windows PE (executable) files. The file-carving tool must be able to support the network protocol used: HTTP, FTP, or SMB, for instance. Many network-based intrusion detection systems, notably Suricata (suricata.readthedocs.io/en/suricata-4.1.2/file-extraction/file-extraction.html) and Zeek/Bro (docs.zeek.org/en/stable/frameworks/file-analysis.html), can also perform file extraction.

You should note that there are opportunities to obfuscate the presence of binaries within network traffic. An adversary may encode the binary differently for reconstruction offline, or strip a small part of it, such as the byte header identifying the file type. They may also make changes to the protocol headers to try to frustrate extraction of the file by common tools.

In the following example, a packet capture has been loaded into NetworkMiner for analysis. The program detects two executable files being transferred between hosts on a local network over SMB.

Using NetworkMiner to identify malicious executables being transferred over SMB. (Screenshot: NetworkMiner netresec.com/?page=networkminer)

The files can also be extracted using Wireshark (File > Export Objects > SMB). Once exported, you could put the files in a sandbox for analysis.

Using Wireshark to identify malicious executables being transferred over SMB. (Screenshot: Wireshark wireshark.org)
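
The equivalent export can be scripted with tshark, and hashing the extracted files gives you values to check against reputation services or record alongside the sandbox results (a sketch; paths and file names are examples):

    tshark -r capture.pcap --export-objects smb,./extracted
    sha256sum ./extracted/*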

Slice & Dice PCAP files in Wireshark

Check out:

  • Statistics > I/O Graph
  • Statistics > Flow Graph

https://malware-traffic-analysis.net/


PROTOCOL ANALYSIS

Packet analysis means looking in detail at the header fields and payloads of selected frames within a packet capture. Protocol analysis means using statistical tools to analyze a sequence of packets, or packet trace. Analyzing statistics for the whole packet trace is the best way of identifying individual frames for closer packet analysis. The contents and metadata of protocol sessions can reveal insights when packet content might not be available, such as when the packet contents are encrypted. For example, a brief exchange of small payloads with consistent pauses between each packet might suggest an interactive session between two hosts, whereas sustained streams of large packets might suggest a file transfer.

An unidentified or unauthorized protocol is a strong indicator of an intrusion, but you should also be alert to changes in relative usage of protocols. When performing statistical analysis, you need a baseline for comparison so that you can identify anomalous values. This is best done with visualization tools, to give you a graph of protocol usage. A given network will often use a stable mix of standard protocols. If you notice that DNS traffic (for instance) is much higher than normal, that might be cause for investigation. As another example, you might notice multigigabyte transfers over HTTP at an unusual time of day and decide to investigate the cause.

In the following example, an analyst looks at statistics for protocol usage during a packet capture in Wireshark (Statistics > Protocol Hierarchy).

The Protocol Hierarchy Statistics report shows which protocols generated the most packets or consumed the most bandwidth. (Screenshot: Wireshark wireshark.org)
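
The same report can be generated at the command line with tshark's statistics options (the file name is an example):

    tshark -r capture.pcap -q -z io,phs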

Is the volume of ARP traffic unusual? Browsing the packet capture for ARP packets reveals a big block of scanning activity:

An ARP sweep in progress. (Screenshot: Wireshark wireshark.org)

In the following example, an analyst captures traffic for five minutes. As a first step, the analyst looks at a summary of the capture using Wireshark’s Expert Info feature (Analyze > Expert Info).

Viewing a traffic summary report in Wireshark. (Screenshot: Wireshark wireshark.org)

The screenshot of the packet capture shows a high number of chats. This could just mean a busy server, but note also the high number of resets, and compare the activity to a summary of more “normal” traffic (captured over a similar five-minute duration):

Summary of traffic in “baseline” conditions. (Screenshot: Wireshark wireshark.org)

Also note the high numbers of errors and warnings compared to the baseline report. Next, the analyst chooses to view a graph of traffic flows (in Wireshark, select Statistics > Flow Graph). This view shows that traffic from 10.1.0.131 is highly one-sided, compared to more normal traffic from 10.1.0.128.

Graphing traffic flow—a “normal” exchange is shown at the top between 10.1.0.1 and 10.1.0.128, but the traffic generated by 10.1.0.131 is typical of half-open scanning. (Screenshot: Wireshark wireshark.org)

Having identified a suspect IP address, the analyst applies a filter to the traffic capture and adjusts the time field to show elapsed time from the previous packet.

Viewing a suspected port scan in Wireshark. (Screenshot: Wireshark wireshark.org)

The screenshot of the packet capture shows that there are tens of connection attempts to different ports within milliseconds of one another, indicative of port scanning activity.
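
A display filter along these lines would isolate the half-open (SYN without ACK) connection attempts from the suspect address identified above:

    ip.src == 10.1.0.131 && tcp.flags.syn == 1 && tcp.flags.ack == 0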


FLOW ANALYSIS

Packet capture generates a large volume of data very quickly. Full packet capture (FPC) and retrospective network analysis (RNA) allow complete recall and analysis of network traffic over a given period, but the capture and storage resources required to implement such a system are massive. Many companies do not have the resources to run captures all the time. A flow collector is a means of recording metadata and statistics about network traffic rather than recording each frame. Network traffic and flow data may come from a wide variety of sources (or probes), such as switches, routers, firewalls, web proxies, and so forth. Data from probes is stored in a database and can be queried by client tools to produce reports and graphs. Flow analysis tools can provide features such as:

  • Highlighting of trends and patterns in traffic generated by particular applications, hosts, and ports.
  • Alerting based on detection of anomalies, flow analysis patterns, and custom triggers that you can define.
  • Visualization tools that enable you to quickly create a map of network connections and interpret patterns of traffic and flow data.
  • Identification of traffic patterns revealing rogue user behavior, malware in transit, tunneling, applications exceeding their allocated bandwidth, and so forth.

NetFlow

NetFlow is a Cisco-developed means of reporting network flow information to a structured database. NetFlow has been redeveloped as the IP Flow Information Export (IPFIX) IETF standard (tools.ietf.org/html/rfc7011). A particular traffic flow can be defined by packets sharing the same characteristics, referred to as keys, such as IP source and destination addresses and protocol type. A selection of keys is called a flow label, while traffic matching a flow label is called a flow record. NetFlow provides the following useful information about packets that traverse NetFlow-enabled devices:

  • The networking protocol interface used
  • The version and type of IP used
  • The source and destination IP addresses
  • The source and destination User Datagram Protocol (UDP)/Transmission Control Protocol (TCP) port
  • The IP’s type of service (ToS) used

You can use a variety of NetFlow monitoring tools to capture data for point-in-time analysis and to diagnose any security or operational issues the network is experiencing. There are plenty of commercial NetFlow suites, plus products offering similar functionality to NetFlow. The SiLK suite (tools.netsa.cert.org/silk) and nfdump/nfsen (nfsen.sourceforge.net) are examples of open-source implementations. Another popular tool is Argus (openargus.org). This uses a different data format to NetFlow, but the client tools can read and translate NetFlow data.
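
As a sketch of this kind of reporting, the nfdump client can summarize stored flow records, here listing the top ten source IPs by flow count (the records directory is a placeholder):

    nfdump -R /var/cache/nfdump -s srcip/flows -n 10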

Zeek (Bro)

NetFlow reports metadata about network traffic rather than capturing actual traffic, and the data is often sampled. This means that it is not an accurate forensic record of network activity. Packet capture provides a complete record but includes a huge amount of data that is not relevant to security intelligence. A tool such as Zeek Network Monitor (zeek.org), formerly called Bro, occupies the space in between. It operates as a passive network monitor, reading packets from a network tap or mirror port in the same way as a sniffer. Unlike a sniffer, Zeek is configured to log only data of potential interest, reducing storage and processing requirements. It also performs normalization on the data, storing it as tab-delimited or JavaScript Object Notation (JSON)-formatted text files. This configuration is achieved using a scripting language, which can also be used to customize data collection and alert settings.
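
A minimal offline workflow, assuming a saved capture file, is to run Zeek against the PCAP and then use zeek-cut to extract selected columns from the resulting conn.log:

    zeek -r capture.pcap
    zeek-cut id.orig_h id.resp_h id.resp_p service < conn.log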

Multi Router Traffic Grapher (MRTG)

The Multi Router Traffic Grapher (MRTG) creates graphs showing traffic flows through the network interfaces of routers and switches by polling the appliances using the Simple Network Management Protocol (SNMP). This can provide a visual clue if a network link is experiencing higher than normal traffic flow. MRTG (oss.oetiker.ch/mrtg/index.en.html) is open-source software that must be compiled for the target UNIX or Linux system from the source code. It can also be used under Windows from within a Perl interpreter. Once the program is installed, you configure the list of SNMP-enabled IP or Ethernet interfaces that it will monitor.
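
The underlying polling mechanism can be tested manually with an SNMP query; for example, using net-snmp's snmpwalk to read an appliance's inbound octet counters (the address and community string are placeholders):

    snmpwalk -v 2c -c public 192.0.2.1 IF-MIB::ifInOctets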


IP ADDRESS AND DNS ANALYSIS

One of the principal areas of interest when analyzing traffic for signs of compromise is access requests to external hosts. Many intrusions rely on a C&C server to download additional attack tools and to exfiltrate data. It is also an area where threat intelligence is extremely valuable because you can correlate the IP addresses, domains, and URLs you see in local network traffic with reputation tracking whitelists and blacklists via a SIEM.

IP Address and Domain Name System (DNS) Analysis

Historically, malware would be configured to contact a C&C server using a static IP address, a range of IP addresses, or a DNS name hard-coded into the malware. This type of beaconing is not highly effective because the malicious addresses can be identified quite easily, blocked from use, and the malware located and destroyed. Where this type of attack is still used, it can be identified by correlating the destination address information from packet traces with a threat intelligence feed of known-bad IP addresses and domains.

There are many providers of reputation risk intelligence and IP/URL blacklists. Some examples additional to the CTI sources we have listed already include BrightCloud (brightcloud.com), MX Toolbox (mxtoolbox.com/blacklists.aspx), urlvoid.com, and ipvoid.com.

Configuring a URL blacklist in the SquidGuard proxy running on pfSense. (Screenshot: Netgate pfSense netgate.com/solutions/pfsense)

Domain Generation Algorithm Analysis

To avoid using hard-coded IP ranges, malware has switched to domains that are dynamically generated using an algorithm, usually referred to as a domain generation algorithm (DGA). These work in a comparable way to time-based one-time passwords, with the advantage that the attacker only needs to generate a range of matching values. In outline, DGA works as follows:

  1. The attacker sets up one or more dynamic DNS (DDNS) services, typically using fraudulent credentials and payment methods or using bulletproof hosting, where the service provider does not act against illicit activity and usage.
  2. The malware code implements a DGA to create a list of new domain names. The DGA uses a base value (seed) plus some time- or counter-based element. This element could be the actual date, but most attackers will try to come up with something that is more difficult to reverse engineer and block, to frustrate the development of detection rules. The output may be a cryptic string of characters, or it may use word lists to create domains that do not raise suspicion. These domain name parts are combined with one or more top level domains (TLD), or possibly used as a subdomain of some public domain that allows user-generated parts. The public domain label might imitate a legitimate domain in some way (as a cousin domain).
  3. A parallel DGA, coded with the same seed and generation method, is used to create name records on the DDNS service. This will produce a range of possible DNS names, at least some of which should match those generated by the malware code.
  4. When the malware needs to initiate a C&C connection, it tries a selection of the domains it has created.
  5. The C&C server will be able to communicate a new seed, or possibly a completely different DGA periodically to frustrate attempts to analyze and block the algorithm.

DGA can be combined with techniques to continually change the IP address that a domain name resolves to. This continually changing architecture is referred to as a fast flux network (Digging Deeper—an In-depth Analysis of a Fast Flux Network).

Some DGAs produce long DNS labels, which are relatively easy to identify through expression matching filters. For shorter labels, statistical analysis software can be used to identify suspicious labels by checking factors such as the consonant-to-vowel ratio. Unfortunately, many legitimate providers use computer-generated labels rather than human-readable ones, making this method prone to false positives. Another indicator for DGA is a high rate of NXDOMAIN errors returned to the client or logged by the local DNS resolver.
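
One way to check for a high NXDOMAIN rate in a capture of DNS traffic is to filter responses on response code 3 (NXDOMAIN) and count the queried names; a sketch using tshark (the file name is an example):

    tshark -r dns.pcap -Y "dns.flags.rcode == 3" -T fields -e dns.qry.name | sort | uniq -c | sort -rn | head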

A mitigation approach is to use a secure recursive DNS resolver. Clients send recursive queries to a resolver, which works through the DNS hierarchy to resolve unknown domains. On most networks, a client workstation will use a stub resolver to route requests via a forwarder, which might then use an ISP's recursive resolver or a third-party secure DNS service that can monitor closely for use of DGA (https://www.akamai.com/blog/security/tackling-dga-based-malware-detection-in-dns-traffic). This can be combined with a feed that identifies suspect DNS hosting providers. Only DNS traffic to the authorized resolver should be allowed to pass the firewall.
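
As a sketch, on a Linux-based firewall the "authorized resolver only" policy might look like the following, where 10.0.0.53 is a placeholder for the approved resolver (equivalent rules would be needed for TCP port 53):

    iptables -A FORWARD -p udp --dport 53 -d 10.0.0.53 -j ACCEPT
    iptables -A FORWARD -p udp --dport 53 -j DROP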

** A presentation by researchers at OpenDNS (resources.sei.cmu.edu/library/asset-view.cfm?assetid=450345) provides useful background information on global DNS traffic analysis. **

** As blacklists are hard to keep up-to-date and are less likely to catch techniques such as DGA, an alternative approach is to filter out everything that can be regarded as “normal” traffic by using a top sites list, such as https://trends.netcraft.com/topsites or docs.umbrella.com/investigate-api/docs/top-million-domains. Traffic to any domains outside of the top sites list can then be prioritized for investigation. Malware attacks may make use of legitimate domains, however, so this method cannot be relied upon exclusively. **


UNIFORM RESOURCE LOCATOR (URL) ANALYSIS

As well as pointing to the host or service location on the Internet (by domain name or IP address), a URL can encode some action or data to submit to the server host. This is a common vector for malicious activity. URL analysis is performed to identify whether a link is already flagged on an existing reputation list, and if not, to identify what malicious script or activity might be coded within it. There are various tools you can use to identify malicious behaviors by processing the URL within a sandbox. Some of the features of these tools include:

  • Resolving percent encoding.
  • Assessing what sort of redirection the URL might perform.
  • Showing source code for any scripts called by the URL without executing them.

HTTP Methods

As part of URL analysis, it is important to understand how HTTP operates. An HTTP session starts with a client (a user-agent, such as a web browser) making a request to an HTTP server over a TCP connection. This TCP connection can be reused for multiple requests, or a client can start new TCP connections for different requests. A request typically comprises a method, a resource (such as a URL path), version number, headers, and body. The principal method is GET, used to retrieve a resource. Other methods include:

  • POST—Send data to the server for processing by the requested resource.
  • PUT—Create or replace the resource. DELETE can be used to remove the resource.
  • HEAD—Retrieve the headers for a resource only (not the body).

Data can be submitted to a server either by using a POST or PUT method and the HTTP headers and body, or by encoding the data within the URL used to access the resource. Data submitted via a URL is delimited by the ? character, which follows the resource path. Query parameters are usually formatted as one or more name=value pairs, with ampersands delimiting each pair. A URL can also include a fragment or anchor ID, delimited by #. The fragment is not processed by the web server. An anchor ID is intended to refer to a section of a page but can be misused to inject JavaScript.
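
Breaking down a hypothetical URL illustrates these delimiters:

    https://store.example.com/search?q=widgets&page=2#results

Here, https is the scheme, store.example.com is the host, /search is the path, q=widgets&page=2 is the query string (two name=value pairs, introduced by ? and delimited by &), and results is the fragment, which is not processed by the web server.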

HTTP Response Codes

The server response comprises the version number and a status code and message, plus optional headers, and message body. An HTTP response code is the header value returned by a server when a client requests a URL. Statistical analysis of response codes can be used to detect abnormal traffic patterns. Response codes are categorized as follows:

  • 200—This indicates a successful GET or POST request (OK).
  • 201—This indicates where a PUT request has succeeded in creating a resource.
  • 3xx—Codes in this range indicate a redirect, where the server tells the client to use a different path to access the resource.
  • 4xx—Codes in this range indicate an error in the client request, such as requesting a non-existent resource (404), not supplying authentication credentials (401), or requesting a resource without sufficient permissions (403). Code 400 indicates a request that the server could not parse.
  • 5xx—These codes indicate a server-side issue, such as a general error (500) or overloading causing service unavailability (503). If the server is acting as a proxy, messages such as 502 (bad gateway) and 504 (gateway timeout) indicate an issue with the upstream server.
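
When investigating a URL, you can retrieve just the status code from the command line; for example, using curl (the URL is a placeholder):

    curl -s -o /dev/null -w "%{http_code}\n" https://example.com/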

Percent Encoding

A URL can contain only unreserved and reserved characters from the ASCII set. Unreserved characters are:

a-z A-Z 0-9 - . _ ~

Reserved ASCII characters are used as delimiters within the URL syntax and should only be used unencoded for those purposes. The reserved characters are:

: / ? # [ ] @ ! $ & ' ( ) * + , ; =

There are also unsafe characters, which cannot be used in a URL. Control characters, such as null string termination, carriage return, line feed, end of file, and tab, are unsafe. Other unsafe characters are space and the following:

\ < > { }

Percent encoding allows a user-agent to submit any safe or unsafe character (or binary data) to the server within the URL. Its legitimate uses are to encode reserved characters within the URL when they are not part of the URL syntax and to submit Unicode characters. Percent encoding can be misused to obfuscate the nature of a URL (encoding unreserved characters), to submit malicious input as a script or binary, or to perform directory traversal. Percent encoding can exploit weaknesses in the way the server application performs decoding. Consequently, URLs that make unexpected or extensive use of percent encoding should be treated carefully. You can use a resource such as W3 Schools (w3schools.com/tags/ref_urlencode.asp) for a complete list of character codes, but it is helpful to know some of the characters most widely used in exploits.

Character       Percent Encoding

null            %00

space           %20

+               %2B

%               %25

/               %2F

\               %5C

.               %2E

?               %3F

"               %22

'               %27

<               %3C

>               %3E

An adversary may use double or triple encoding to subvert faulty input handling. Double encoding means that the percent sign is itself encoded.
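
As a worked example, %25 is the encoding of the percent sign itself, so a double-encoded directory traversal sequence decodes in two passes:

    %252E%252E%252F decodes to %2E%2E%2F, which decodes to ../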


** Analyze the URLs used during web application attacks. **


Front of Flashcard 1 of 3

What is the effect of running ‘tcpdump -i eth0 -w server.pcap’?

Back of Flashcard 1 of 3

Write the output of the packet capture running on network interface eth0 to the ‘server.pcap’ file.


Front of Flashcard 2 of 3

You need to log Internet endpoints and bandwidth consumption between clients and servers on a local network, but do not have the resources to capture and store all network packets. What technology or technologies could you use instead?

Back of Flashcard 2 of 3

You could use a NetFlow/Argus collector or Simple Network Management Protocol (SNMP) collector. Another option is a sniffer such as Zeek/Bro that records traffic statistics and content selectively.


Front of Flashcard 3 of 3

You are analyzing DNS logs for malicious traffic and come across two types of anomalous entry. The first type is for a single label with apparently random characters, in the form: vbhyofcyae, wcfmozjycv, rtbsaubliq.

The other type is of the following form, but with different TLDs: nahekhrdizaiupfm.info, tlaawnpkfcqorxuo.cn, uwguhvpzqlzcmiug.org.

Which is more likely to be an indicator for DGA?

Back of Flashcard 3 of 3

The second type is more likely to be a domain generation algorithm. A query for a single label with no top level domain (TLD) will not resolve over the Internet and so the first type cannot be used for C&C. The first type is typical of a local client testing DNS. The Chrome browser performs this test to see how the local ISP handles NXDOMAIN errors, for instance.
