{"id":1956,"date":"2016-01-30T00:31:03","date_gmt":"2016-01-30T08:31:03","guid":{"rendered":"https:\/\/www.privateinternetaccess.com\/blog\/?p=1956"},"modified":"2021-10-26T01:33:58","modified_gmt":"2021-10-26T08:33:58","slug":"linux-networking-stack-from-the-ground-up-part-4","status":"publish","type":"post","link":"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/","title":{"rendered":"Linux networking stack from the ground up, part 4"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-1\/\">part 1<\/a> | <a href=\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-2\/\">part 2<\/a> | <a href=\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-3\/\">part 3<\/a> | <a href=\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/\">part 4<\/a> | <a href=\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4-2\/\">part 5<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Overview<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This post picks up where part 3 left off. It begins by describing Receive Packet Steering (RPS), what it is and how to configure it, and then examines how the network stack handles packets depending on RPS settings, covering the packet backlog queue, the start of the IP protocol layer, and netfilter.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Receive Packet Steering<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We saw that device drivers register NAPI poll instances. Each NAPI poller instance is executed in the context of a kernel thread called a softirq, of which there is one per CPU. 
The hardware interrupt handler wakes up (schedules) the kernel thread for the CPU on which the handler itself runs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Thus, a single CPU processes the hardware interrupt and polls from the networking layer to process the incoming data.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Some NICs support multiple queues at the hardware level. This means that incoming packets can be DMA\u2019d to separate receive rings, each with its own hardware interrupt that is delivered to indicate that data is available. Each of these hardware interrupts schedules a NAPI poll instance to run on the associated CPU.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This allows multiple CPUs to process hardware interrupts and poll from the networking layer.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Receive Packet Steering (RPS) is a software implementation of what hardware multi-queue NICs provide. It allows multiple CPUs to process incoming packets even if the NIC only supports a single receive queue in hardware.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">RPS works by generating a hash for each incoming packet to determine which CPU should process it. The packet is then enqueued to that CPU\u2019s receive backlog to be processed. An inter-processor interrupt (IPI) is delivered to the CPU owning the backlog. 
This helps to kick-start backlog processing by the remote CPU if it is not currently processing packets.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><code>netif_receive_skb<\/code> will either continue sending network data up the networking stack, or hand it over to RPS for processing on a different CPU.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Configuring RPS<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For RPS to work, it must be enabled in the kernel configuration (it is on Ubuntu for Linux kernel 3.13.0), and a bit mask describing which CPUs should process packets for a given interface and rx queue must be set.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The bit masks to modify are found in <code>\/sys\/class\/net\/DEVICE_NAME\/queues\/QUEUE\/rps_cpus<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So, for eth0 and receive queue 0, you would write a hexadecimal number to <code>\/sys\/class\/net\/eth0\/queues\/rx-0\/rps_cpus<\/code> indicating which CPUs should process packets from eth0\u2019s receive queue 0.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Back to <code>netif_receive_skb<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>netif_receive_skb<\/code><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">As a reminder, the <code>netif_receive_skb<\/code> function is called from <code>napi_skb_finish<\/code> in the softirq context from the NAPI poller registered by the device driver.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><code>netif_receive_skb<\/code> will either attempt to use RPS (as described above) or continue sending the data up the network stack.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let\u2019s first examine the second path: sending the data up the stack if RPS is disabled.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>netif_receive_skb<\/code> without RPS<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><code>netif_receive_skb<\/code> calls <code>__netif_receive_skb<\/code> which does some bookkeeping prior to calling 
<code>__netif_receive_skb_core<\/code> to move the data up the network stack toward the protocol layers.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>__netif_receive_skb_core<\/code><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This function passes the skb up to the protocol layer in this piece of code (<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/core\/dev.c#L3612-L3622\">net\/core\/dev.c:3628<\/a>):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">type = skb-&gt;protocol;\nlist_for_each_entry_rcu(ptype, &amp;ptype_base[ntohs(type) &amp; PTYPE_HASH_MASK], list) {\n  if (ptype-&gt;type == type &amp;&amp;\n     (ptype-&gt;dev == null_or_dev || ptype-&gt;dev == skb-&gt;dev || ptype-&gt;dev == orig_dev)) {\n    if (pt_prev)\n      ret = deliver_skb(skb, pt_prev, orig_dev);\n    pt_prev = ptype;\n  }\n}<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">We will examine precisely how this code delivers data to the protocol layer below, but first, let\u2019s see what happens when RPS is enabled.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>netif_receive_skb<\/code> with RPS<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">If RPS is enabled, <code>netif_receive_skb<\/code> computes which CPU\u2019s backlog the data should be queued to. 
It does this by using the function <code>get_rps_cpu<\/code> (defined at <a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/core\/dev.c#L2980\">net\/core\/dev.c:2980<\/a>):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">int cpu = get_rps_cpu(skb-&gt;dev, skb, &amp;rflow);\n\nif (cpu &gt;= 0) {\n  ret = enqueue_to_backlog(skb, cpu, &amp;rflow-&gt;last_qtail);\n  rcu_read_unlock();\n  return ret;\n}<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><code>enqueue_to_backlog<\/code><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This function begins by getting a pointer to the remote CPU\u2019s <code>softnet_data<\/code> structure, which contains the NAPI structure for that CPU\u2019s backlog.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Next, the queue length of the <code>input_pkt_queue<\/code> of the remote CPU is checked:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">qlen = skb_queue_len(&amp;sd-&gt;input_pkt_queue);\nif (qlen &lt;= netdev_max_backlog &amp;&amp; !skb_flow_limit(skb, qlen)) {\n  if (skb_queue_len(&amp;sd-&gt;input_pkt_queue)) {<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The queue length is first compared to <code>netdev_max_backlog<\/code>. If the queue length is larger than <code>netdev_max_backlog<\/code>, the data is dropped and the drop is counted against the remote CPU.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You can reduce drops caused by a full backlog by increasing <code>netdev_max_backlog<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sysctl -w net.core.netdev_max_backlog=3000<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">If the queue length isn\u2019t too large, the code next checks if the flow limit has been reached. By default, flow limits are disabled. 
In order to enable flow limits, you must specify a bitmap (similar to RPS\u2019 bitmap) in <code>\/proc\/sys\/net\/core\/flow_limit_cpu_bitmap<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Once you enable flow limits per CPU, you can also adjust the size of the flow limit hash table by modifying the sysctl <code>net.core.flow_limit_table_len<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You can read more about flow limits in the <a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/Documentation\/networking\/scaling.txt\">Documentation\/networking\/scaling.txt<\/a> file.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Assuming that the flow limit has not been reached, <code>enqueue_to_backlog<\/code> then checks if the backlog queue has data queued to it already.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If so, the data is queued:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">if (skb_queue_len(&amp;sd-&gt;input_pkt_queue)) {\nenqueue:\n  __skb_queue_tail(&amp;sd-&gt;input_pkt_queue, skb);\n  input_queue_tail_incr_save(sd, qtail);\n  rps_unlock(sd);\n  local_irq_restore(flags);\n  return NET_RX_SUCCESS;\n}<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">If the queue is empty, first the NAPI poller for the backlog queue is started:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\/* Schedule NAPI for backlog device                                       \n * We can use non atomic operation since we own the queue lock            \n *\/                                                                       \nif (!__test_and_set_bit(NAPI_STATE_SCHED, &amp;sd-&gt;backlog.state)) {          \n  if (!rps_ipi_queued(sd))                                          \n    ____napi_schedule(sd, &amp;sd-&gt;backlog);                      \n}                                                                         \ngoto enqueue;<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>goto<\/code> at the bottom brings execution back up to the block of code above, 
queuing the data to the backlog.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">backlog queue NAPI poller<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The per-CPU backlog queue plugs into NAPI the same way a device driver does. A poll function is provided that is used to process packets from the softirq context.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This NAPI struct is provided during initialization of the networking system. From <code>net_dev_init<\/code> in <a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/core\/dev.c#L6952-L6955\">net\/core\/dev.c:6952<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sd-&gt;backlog.poll = process_backlog;\nsd-&gt;backlog.weight = weight_p;\nsd-&gt;backlog.gro_list = NULL;\nsd-&gt;backlog.gro_count = 0;<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The backlog NAPI structure differs from the device driver NAPI structure in that the <code>weight<\/code> parameter is adjustable. The drivers hardcode their values (most hardcode to 64, as seen in e1000e).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To adjust the backlog\u2019s NAPI poller weight, modify <code>\/proc\/sys\/net\/core\/dev_weight<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The poll function for the backlog is called <code>process_backlog<\/code> and, similar to e1000e\u2019s function <code>e1000e_poll<\/code>, is called from the softirq context.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>process_backlog<\/code><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>process_backlog<\/code> function (<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/core\/dev.c#L4097\">net\/core\/dev.c:4097<\/a>) is a loop which runs until its weight (specified in <code>\/proc\/sys\/net\/core\/dev_weight<\/code>) has been consumed or no more data remains on the backlog.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Each piece of data on the backlog queue is removed from the backlog queue and passed on to <code>__netif_receive_skb<\/code>. 
As explained earlier in the no-RPS case, data passed to this function eventually reaches the protocol layers after some bookkeeping.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Similarly to device driver NAPI implementations, the <code>process_backlog<\/code> code disables its poller if the total weight will not be used. The poller is restarted with the call to <code>____napi_schedule<\/code> from <code>enqueue_to_backlog<\/code> as described above.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The function returns the amount of work done, which <code>net_rx_action<\/code> (described earlier in this series) subtracts from the budget (adjustable via the <code>net.core.netdev_budget<\/code> sysctl).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>__netif_receive_skb_core<\/code> delivers data to protocol layers<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>__netif_receive_skb_core<\/code> function delivers data to protocol layers. It does this by obtaining the protocol field from the <code>skb<\/code> and iterating across a list of <code>deliver<\/code> functions registered for that protocol type.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This happens in this piece of code (as seen above):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">type = skb-&gt;protocol;\nlist_for_each_entry_rcu(ptype, &amp;ptype_base[ntohs(type) &amp; PTYPE_HASH_MASK], list) {\n  if (ptype-&gt;type == type &amp;&amp;\n       (ptype-&gt;dev == null_or_dev || ptype-&gt;dev == skb-&gt;dev ||\n        ptype-&gt;dev == orig_dev)) {\n    if (pt_prev)\n      ret = deliver_skb(skb, pt_prev, orig_dev);\n    pt_prev = ptype;\n  }\n}<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>ptype_base<\/code> identifier is defined at <a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/core\/dev.c#L146\">net\/core\/dev.c:146<\/a> as a 
hash table of lists:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Each protocol layer adds a <code>struct packet_type<\/code> to a list at a given slot in the hash table.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The slot in the hash table is computed by <code>ptype_head<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">static inline struct list_head *ptype_head(const struct packet_type *pt)\n{\n  if (pt-&gt;type == htons(ETH_P_ALL))\n    return &amp;ptype_all;\n  else\n    return &amp;ptype_base[ntohs(pt-&gt;type) &amp; PTYPE_HASH_MASK];\n}<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The protocol layers call <code>dev_add_pack<\/code> to add themselves to the list.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">IP protocol layer<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The IP protocol layer plugs itself into the <code>ptype_base<\/code> hash table so that data will be delivered to it from the lower layers.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This happens in the function <code>inet_init<\/code> from <a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/ipv4\/af_inet.c#L1788\">net\/ipv4\/af_inet.c:1815<\/a><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">dev_add_pack(&amp;ip_packet_type);<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This registers the IP packet type structure defined as:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">static struct packet_type ip_packet_type __read_mostly = {                                \n  .type = cpu_to_be16(ETH_P_IP),                                                    \n  .func = ip_rcv,\n};<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><code>__netif_receive_skb_core<\/code> calls <code>deliver_skb<\/code> (as seen in the above section). 
This function is defined at <a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/core\/dev.c#L1706-L1708\">net\/core\/dev.c:1712<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">static inline int deliver_skb(struct sk_buff *skb,\n                              struct packet_type *pt_prev,\n                              struct net_device *orig_dev)\n{\n  if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))\n    return -ENOMEM;\n  atomic_inc(&amp;skb-&gt;users);\n  return pt_prev-&gt;func(skb, skb-&gt;dev, pt_prev, orig_dev);\n}<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">In the case of the IP protocol, the <code>ip_rcv<\/code> function is called.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>ip_rcv<\/code><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>ip_rcv<\/code> function is pretty straightforward at a high level. It performs several integrity checks to ensure the data is valid, and it bumps statistics counters as well.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><code>ip_rcv<\/code> ends by passing the packet to <code>ip_rcv_finish<\/code> by way of netfilter. This is done so that any iptables rules that should be matched at the IP protocol layer can take a look at the packet before it continues on (<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/ipv4\/ip_input.c#L453\">net\/ipv4\/ip_input.c:453<\/a>):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL, ip_rcv_finish);<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">netfilter<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><code>NF_HOOK<\/code> is a thin wrapper around <code>NF_HOOK_THRESH<\/code>. The <code>NF_HOOK_THRESH<\/code> function is simple enough. 
It calls down to <code>nf_hook_thresh<\/code> and on success, calls <code>okfn<\/code> which in our case is <code>ip_rcv_finish<\/code> (<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/include\/linux\/netfilter.h#L157-L175\">include\/linux\/netfilter.h:175<\/a>):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">static inline int\nNF_HOOK_THRESH(uint8_t pf, unsigned int hook, struct sk_buff *skb,                        \n               struct net_device *in, struct net_device *out,                             \n               int (*okfn)(struct sk_buff *), int thresh)                                 \n{       \n        int ret = nf_hook_thresh(pf, hook, skb, in, out, okfn, thresh);                   \n        if (ret == 1) \n                ret = okfn(skb);                                                          \n        return ret;                                                                       \n}<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>nf_hook_thresh<\/code> function continues down approaching iptables. It begins by determining if there are any netfilter hooks for the netfilter protocol family and netfilter chain passed in.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In our example above, the protocol family is <code>NFPROTO_IPV4<\/code> and chain type is <code>NF_INET_PRE_ROUTING<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\/**\n *      nf_hook_thresh - call a netfilter hook\n *      \n *      Returns 1 if the hook has allowed the packet to pass.  The function\n *      okfn must be invoked by the caller in this case.  
Any other return\n *      value indicates the packet has been consumed by the hook.\n *\/\nstatic inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,\n                                 struct sk_buff *skb,\n                                 struct net_device *indev,\n                                 struct net_device *outdev,\n                                 int (*okfn)(struct sk_buff *), int thresh)\n{\n  if (nf_hooks_active(pf, hook))\n    return nf_hook_slow(pf, hook, skb, indev, outdev, okfn, thresh);\n  return 1;\n}<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This function calls <code>nf_hooks_active<\/code>, which, in the variant shown here, checks whether the hook list in the <code>nf_hooks<\/code> table for the given protocol family and hook is empty (<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/include\/linux\/netfilter.h#L105-L112\">include\/linux\/netfilter.h:114<\/a>):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">static inline bool nf_hooks_active(u_int8_t pf, unsigned int hook)\n{\n  return !list_empty(&amp;nf_hooks[pf][hook]);\n}<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">And if there is a hook present, <code>nf_hook_slow<\/code> is called to go deeper into iptables.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>nf_hook_slow<\/code><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><code>nf_hook_slow<\/code> iterates through the list of hooks in the <code>nf_hooks<\/code> table for the protocol type and chain type by calling <code>nf_iterate<\/code> for each entry in the hook list.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><code>nf_iterate<\/code> in turn calls the hook function associated with an entry on the hook list and returns a \u201cverdict\u201d about the packet.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>iptables<\/code> \u2026 tables<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><code>iptables<\/code> registers hook functions for each of the packet matching tables: 
filter, nat, mangle, raw, and security.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In our example, we\u2019re interested in <code>NF_INET_PRE_ROUTING<\/code> chains which are found in the <code>nat<\/code> table.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Sure enough, the struct with the hook function pointer which is registered with netfilter is found in <a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/ipv4\/netfilter\/iptable_nat.c#L251-L259\">net\/ipv4\/netfilter\/iptable_nat.c:251<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">static struct nf_hook_ops nf_nat_ipv4_ops[] __read_mostly = {\n  \/* Before packet filtering, change destination *\/\n  {\n    .hook           = nf_nat_ipv4_in,\n    .owner          = THIS_MODULE,\n    .pf             = NFPROTO_IPV4,\n    .hooknum        = NF_INET_PRE_ROUTING,\n    .priority       = NF_IP_PRI_NAT_DST,\n  },<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">which is registered in <code>iptable_nat_init<\/code> (<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/ipv4\/netfilter\/iptable_nat.c#L316\">net\/ipv4\/netfilter\/iptable_nat.c:316<\/a>):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">err = nf_register_hooks(nf_nat_ipv4_ops, ARRAY_SIZE(nf_nat_ipv4_ops));\nif (err &lt; 0)\n  goto err2;<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">In our example above from the IP protocol layer, packets will be passed down to the <code>nf_nat_ipv4_in<\/code> to descend further into iptables via the <code>nf_hook_slow<\/code> function described in the previous section.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><code>nf_nat_ipv4_in<\/code><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><code>nf_nat_ipv4_in<\/code> passes the packet on to <code>nf_nat_ipv4_fn<\/code> which starts by obtaining the conntrack information for the packet:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">struct nf_conn *ct;\nenum ip_conntrack_info ctinfo;\n\n\/* slightly abbreviated code sample *\/\n\nct = 
nf_ct_get(skb, &amp;ctinfo);<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">If the packet being examined is a packet for a new connection, the function <code>nf_nat_rule_find<\/code> is called (<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v3.13\/net\/ipv4\/netfilter\/iptable_nat.c#L117-L135\">net\/ipv4\/netfilter\/iptable_nat.c:117<\/a>):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">case IP_CT_NEW:\n  \/* Seen it before?  This can happen for loopback, retrans,\n   * or local packets.\n   *\/\n  if (!nf_nat_initialized(ct, maniptype)) {\n    unsigned int ret;\n\n    ret = nf_nat_rule_find(skb, ops-&gt;hooknum, in, out, ct);\n    if (ret != NF_ACCEPT)\n      return ret;<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">And, finally, <code>nf_nat_rule_find<\/code> calls <code>ipt_do_table<\/code> which enters the iptables subsystem. This is as far as we will go into the netfilter and iptables systems, as they are complex enough to warrant their own multi-page documents.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The return value from the <code>ipt_do_table<\/code> function will either:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>not be <code>NF_ACCEPT<\/code>, in which case it is returned immediately, OR<\/li><li>will be <code>NF_ACCEPT<\/code> causing <code>nf_nat_ipv4_fn<\/code> to call <code>nf_nat_packet<\/code> to do packet manipulation and return either <code>NF_ACCEPT<\/code> or <code>NF_DROP<\/code>.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Unwinding the return value<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In either case of the return value for <code>ipt_do_table<\/code>, the final value of <code>nf_nat_ipv4_fn<\/code> is returned backward through all the functions described above until <code>NF_HOOK_THRESH<\/code>:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><code>nf_nat_ipv4_fn<\/code>\u2019s return value is returned back to <code>nf_nat_ipv4_in<\/code><\/li><li>which returns back to <code>nf_iterate<\/code><\/li><li>which returns 
back to <code>nf_hook_slow<\/code><\/li><li>which returns back to <code>nf_hook_thresh<\/code><\/li><li>which returns back to <code>NF_HOOK_THRESH<\/code><\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><code>NF_HOOK_THRESH<\/code> checks the return value and if it is <code>NF_ACCEPT<\/code> (1), it calls the function pointed to by <code>okfn<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In our example, the <code>okfn<\/code> is <code>ip_rcv_finish<\/code> which will do some processing and pass the packet up to the next protocol layer.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>part 1 | part 2 | part 3 | part 4 | part 5 Overview This post will pick up where part 3 left off beginning by describing Receive Packet Steering (RPS), what it is and how to configure it, followed by an examination of the network stack describing how packets are dealt with based &hellip; <a href=\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Linux networking stack from the ground up, part 4&#8221;<\/span><\/a><\/p>\n","protected":false},"author":9,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_stopmodifiedupdate":false,"_modified_date":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-1956","post","type-post","status-publish","format-standard","hentry","category-news"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v26.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Linux networking stack from the ground up, part 4<\/title>\n<meta name=\"description\" content=\"part 1 | part 2 | part 3 | part 4 | part 5 Overview This post will pick up where part 3 left off beginning by describing Receive Packet Steering (RPS),\" \/>\n<meta name=\"robots\" content=\"index, follow, 
max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Linux networking stack from the ground up, part 4\" \/>\n<meta property=\"og:description\" content=\"part 1 | part 2 | part 3 | part 4 | part 5 Overview This post will pick up where part 3 left off beginning by describing Receive Packet Steering (RPS),\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/\" \/>\n<meta property=\"og:site_name\" content=\"PIA\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/privateinternetaccess\/\" \/>\n<meta property=\"article:published_time\" content=\"2016-01-30T08:31:03+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-10-26T08:33:58+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.privateinternetaccess.com\/blog\/wp-content\/uploads\/2018\/07\/ogimage.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"PIA Research\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@buyvpnservice\" \/>\n<meta name=\"twitter:site\" content=\"@buyvpnservice\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"PIA Research\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/\"},\"author\":{\"name\":\"PIA Research\",\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/#\/schema\/person\/867d81a36eafaf83e91f6528aca0ba29\"},\"headline\":\"Linux networking stack from the ground up, part 4\",\"datePublished\":\"2016-01-30T08:31:03+00:00\",\"dateModified\":\"2021-10-26T08:33:58+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/\"},\"wordCount\":1651,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/#organization\"},\"articleSection\":[\"General Privacy News\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/\",\"url\":\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/\",\"name\":\"Linux networking stack from the ground up, part 4\",\"isPartOf\":{\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/#website\"},\"datePublished\":\"2016-01-30T08:31:03+00:00\",\"dateModified\":\"2021-10-26T08:33:58+00:00\",\"description\":\"part 1 | part 2 | part 3 | part 4 | part 5 Overview This post will pick up where part 3 left off beginning by describing Receive Packet Steering 
(RPS),\",\"breadcrumb\":{\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.privateinternetaccess.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Linux networking stack from the ground up, part 4\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/#website\",\"url\":\"https:\/\/www.privateinternetaccess.com\/blog\/\",\"name\":\"PIA\",\"description\":\"Online privacy news from around the world.\",\"publisher\":{\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.privateinternetaccess.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/#organization\",\"name\":\"Private Internet 
Access\",\"url\":\"https:\/\/www.privateinternetaccess.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.privateinternetaccess.com\/blog\/wp-content\/uploads\/2018\/07\/pialogowhitekglogo.png\",\"contentUrl\":\"https:\/\/www.privateinternetaccess.com\/blog\/wp-content\/uploads\/2018\/07\/pialogowhitekglogo.png\",\"width\":1200,\"height\":1200,\"caption\":\"Private Internet Access\"},\"image\":{\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/privateinternetaccess\/\",\"https:\/\/x.com\/buyvpnservice\",\"https:\/\/www.instagram.com\/piavpn\/\",\"https:\/\/www.youtube.com\/channel\/UClyJZ47Rizb1xnwuKXDI0_w\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/#\/schema\/person\/867d81a36eafaf83e91f6528aca0ba29\",\"name\":\"PIA Research\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.privateinternetaccess.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/717a02042be28ce22f2ae82923cdaa82551205b8768e3cd4a8fce14988ed0ccd?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/717a02042be28ce22f2ae82923cdaa82551205b8768e3cd4a8fce14988ed0ccd?s=96&d=mm&r=g\",\"caption\":\"PIA Research\"},\"sameAs\":[\"https:\/\/www.privateinternetaccess.com\"],\"url\":\"https:\/\/www.privateinternetaccess.com\/blog\/author\/piaresearch\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Linux networking stack from the ground up, part 4","description":"part 1 | part 2 | part 3 | part 4 | part 5 Overview This post will pick up where part 3 left off beginning by describing Receive Packet Steering (RPS),","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/","og_locale":"en_US","og_type":"article","og_title":"Linux networking stack from the ground up, part 4","og_description":"part 1 | part 2 | part 3 | part 4 | part 5 Overview This post will pick up where part 3 left off beginning by describing Receive Packet Steering (RPS),","og_url":"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/","og_site_name":"PIA","article_publisher":"https:\/\/www.facebook.com\/privateinternetaccess\/","article_published_time":"2016-01-30T08:31:03+00:00","article_modified_time":"2021-10-26T08:33:58+00:00","og_image":[{"width":1200,"height":630,"url":"https:\/\/www.privateinternetaccess.com\/blog\/wp-content\/uploads\/2018\/07\/ogimage.png","type":"image\/png"}],"author":"PIA Research","twitter_card":"summary_large_image","twitter_creator":"@buyvpnservice","twitter_site":"@buyvpnservice","twitter_misc":{"Written by":"PIA Research","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/#article","isPartOf":{"@id":"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/"},"author":{"name":"PIA Research","@id":"https:\/\/www.privateinternetaccess.com\/blog\/#\/schema\/person\/867d81a36eafaf83e91f6528aca0ba29"},"headline":"Linux networking stack from the ground up, part 4","datePublished":"2016-01-30T08:31:03+00:00","dateModified":"2021-10-26T08:33:58+00:00","mainEntityOfPage":{"@id":"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/"},"wordCount":1651,"commentCount":1,"publisher":{"@id":"https:\/\/www.privateinternetaccess.com\/blog\/#organization"},"articleSection":["General Privacy News"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/","url":"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/","name":"Linux networking stack from the ground up, part 4","isPartOf":{"@id":"https:\/\/www.privateinternetaccess.com\/blog\/#website"},"datePublished":"2016-01-30T08:31:03+00:00","dateModified":"2021-10-26T08:33:58+00:00","description":"part 1 | part 2 | part 3 | part 4 | part 5 Overview This post will pick up where part 3 left off beginning by describing Receive Packet Steering 
(RPS),","breadcrumb":{"@id":"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.privateinternetaccess.com\/blog\/linux-networking-stack-from-the-ground-up-part-4\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.privateinternetaccess.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Linux networking stack from the ground up, part 4"}]},{"@type":"WebSite","@id":"https:\/\/www.privateinternetaccess.com\/blog\/#website","url":"https:\/\/www.privateinternetaccess.com\/blog\/","name":"PIA","description":"Online privacy news from around the world.","publisher":{"@id":"https:\/\/www.privateinternetaccess.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.privateinternetaccess.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.privateinternetaccess.com\/blog\/#organization","name":"Private Internet Access","url":"https:\/\/www.privateinternetaccess.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.privateinternetaccess.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.privateinternetaccess.com\/blog\/wp-content\/uploads\/2018\/07\/pialogowhitekglogo.png","contentUrl":"https:\/\/www.privateinternetaccess.com\/blog\/wp-content\/uploads\/2018\/07\/pialogowhitekglogo.png","width":1200,"height":1200,"caption":"Private Internet 
Access"},"image":{"@id":"https:\/\/www.privateinternetaccess.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/privateinternetaccess\/","https:\/\/x.com\/buyvpnservice","https:\/\/www.instagram.com\/piavpn\/","https:\/\/www.youtube.com\/channel\/UClyJZ47Rizb1xnwuKXDI0_w"]},{"@type":"Person","@id":"https:\/\/www.privateinternetaccess.com\/blog\/#\/schema\/person\/867d81a36eafaf83e91f6528aca0ba29","name":"PIA Research","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.privateinternetaccess.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/717a02042be28ce22f2ae82923cdaa82551205b8768e3cd4a8fce14988ed0ccd?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/717a02042be28ce22f2ae82923cdaa82551205b8768e3cd4a8fce14988ed0ccd?s=96&d=mm&r=g","caption":"PIA Research"},"sameAs":["https:\/\/www.privateinternetaccess.com"],"url":"https:\/\/www.privateinternetaccess.com\/blog\/author\/piaresearch\/"}]}},"_links":{"self":[{"href":"https:\/\/www.privateinternetaccess.com\/blog\/wp-json\/wp\/v2\/posts\/1956","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.privateinternetaccess.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.privateinternetaccess.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.privateinternetaccess.com\/blog\/wp-json\/wp\/v2\/users\/9"}],"replies":[{"embeddable":true,"href":"https:\/\/www.privateinternetaccess.com\/blog\/wp-json\/wp\/v2\/comments?post=1956"}],"version-history":[{"count":9,"href":"https:\/\/www.privateinternetaccess.com\/blog\/wp-json\/wp\/v2\/posts\/1956\/revisions"}],"predecessor-version":[{"id":18782,"href":"https:\/\/www.privateinternetaccess.com\/blog\/wp-json\/wp\/v2\/posts\/1956\/revisions\/18782"}],"wp:attachment":[{"href":"https:\/\/www.privateinternetaccess.com\/blog\/wp-json\/wp\/v2\/media?parent=1956"}],"wp:term":[{"taxonomy":"category","embeddable":true,
"href":"https:\/\/www.privateinternetaccess.com\/blog\/wp-json\/wp\/v2\/categories?post=1956"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.privateinternetaccess.com\/blog\/wp-json\/wp\/v2\/tags?post=1956"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}