commit 3c464c73693b6d118c129d7f07dbc1ea26aa6d8a Author: Greg Kroah-Hartman Date: Wed Apr 29 10:30:26 2015 +0200 Linux 3.19.6 commit b23d104f5577300cadd6351469cdaa3fb98b2a74 Author: Jann Horn Date: Sun Apr 19 02:48:39 2015 +0200 fs: take i_mutex during prepare_binprm for set[ug]id executables commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream. This prevents a race between chown() and execve(), where chowning a setuid-user binary to root would momentarily make the binary setuid root. This patch was mostly written by Linus Torvalds. Signed-off-by: Jann Horn Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 9fe9c37cec84924db46258b7c237d20038fe9c8d Author: Troy Tan Date: Tue Feb 3 11:15:17 2015 -0600 rtlwifi: rtl8192ee: Fix handling of new style descriptors commit d0311314d00298f83aa5450a1d4a92889e7cc2ea upstream. The hardware and firmware for the RTL8192EE utilize a FIFO list of descriptors. There were some problems with the initial implementation. The worst of these failed to detect that the FIFO was becoming full, which led to the device needing to be power cycled. As this condition is not relevant to most of the devices supported by rtlwifi, a callback routine was added to detect this situation. This patch implements the necessary changes in the pci handler, and the linkage into the appropriate rtl8192ee routine. Signed-off-by: Troy Tan Signed-off-by: Larry Finger Cc: Stable [V3.18] Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman commit 014275c88433fdab9e9acf2087250b421bcd5210 Author: Naoya Horiguchi Date: Wed Feb 11 15:25:22 2015 -0800 mm/hugetlb: take page table lock in follow_huge_pmd() commit e66f17ff71772b209eed39de35aaa99ba819c93d upstream. We have a race condition between move_pages() and freeing hugepages, where move_pages() calls follow_page(FOLL_GET) for hugepages internally and tries to get its refcount without preventing concurrent freeing. This race crashes the kernel, so this patch fixes it by moving FOLL_GET code for hugepages into follow_huge_pmd() with taking the page table lock. This patch intentionally removes page==NULL check after pte_page. This is justified because pte_page() never returns NULL for any architectures or configurations. This patch changes the behavior of follow_huge_pmd() for tail pages and then tail pages can be pinned/returned. So the caller must be changed to properly handle the returned tail pages. We could have a choice to add the similar locking to follow_huge_(addr|pud) for consistency, but it's not necessary because currently these functions don't support FOLL_GET flag, so let's leave it for future development. Here is the reproducer: $ cat movepages.c #include #include #include #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 #define PS 0x1000 int main(int argc, char *argv[]) { int i; int nr_hp = strtol(argv[1], NULL, 0); int nr_p = nr_hp * HPS / PS; int ret; void **addrs; int *status; int *nodes; pid_t pid; pid = strtol(argv[2], NULL, 0); addrs = malloc(sizeof(char *) * nr_p + 1); status = malloc(sizeof(char *) * nr_p + 1); nodes = malloc(sizeof(char *) * nr_p + 1); while (1) { for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 1; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 0; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); } return 0; } $ cat hugepage.c #include #include #include #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 int main(int argc, char *argv[]) { int nr_hp = strtol(argv[1], NULL, 0); char *p; while (1) { p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); if (p != (void *)ADDR_INPUT) { perror("mmap"); break; } memset(p, 0, nr_hp * HPS); munmap(p, nr_hp * HPS); } } $ sysctl vm.nr_hugepages=40 $ ./hugepage 10 & $ ./movepages 10 $(pgrep -f hugepage) [n-horiguchi@ah.jp.nec.com: resolve conflict to apply to v3.19.1] Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()") Signed-off-by: Naoya Horiguchi Reported-by: Hugh Dickins Cc: James Hogan Cc: David Rientjes Cc: Mel Gorman Cc: Johannes Weiner Cc: Michal Hocko Cc: Rik van Riel Cc: Andrea Arcangeli Cc: Luiz Capitulino Cc: Nishanth Aravamudan Cc: Lee Schermerhorn Cc: Steve Capper Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit a15d51461445280f1a3cabcd4962b99c9ee6c32c Author: Naoya Horiguchi Date: Wed Feb 11 15:25:15 2015 -0800 mm/hugetlb: reduce arch dependent code around follow_huge_* commit 61f77eda9bbf0d2e922197ed2dcf88638a639ce5 upstream. Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove the m. The basic idea is to put the default implementation for these functions in mm/hugetlb.c as weak symbols (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement arch-specific code only when the arch needs it. For follow_huge_addr(), only powerpc and ia64 have their own implementation, and in all other architectures this function just returns ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as default. As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to always return 0 in your architecture (like in ia64 or sparc,) it's never called (the callsite is optimized away) no matter how implemented it is. So in such architectures, we don't need arch-specific implementation. In some architecture (like mips, s390 and tile,) their current arch-specific follow_huge_(pmd|pud)() are effectively identical with the common code, so this patch lets these architecture use the common code. One exception is metag, where pmd_huge() could return non-zero but it expects follow_huge_pmd() to always return NULL. This means that we need arch-specific implementation which returns NULL. This behavior looks strange to me (because non-zero pmd_huge() implies that the architecture supports PMD-based hugepage, so follow_huge_pmd() can/should return some relevant value,) but that's beyond this cleanup patch, so let's keep it. Justification of non-trivial changes: - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note that pmd_huge() has the same check and always returns 0 for !MACHINE_HAS_HPAGE.) - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common code. This patch forces these archs use PMD_MASK, but it's OK because they are identical in both archs. In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20. In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but PTE_ORDER is always 0, so these are identical. [n-horiguchi@ah.jp.nec.com: resolve conflict to apply to v3.19.1] Signed-off-by: Naoya Horiguchi Acked-by: Hugh Dickins Cc: James Hogan Cc: David Rientjes Cc: Mel Gorman Cc: Johannes Weiner Cc: Michal Hocko Cc: Rik van Riel Cc: Andrea Arcangeli Cc: Luiz Capitulino Cc: Nishanth Aravamudan Cc: Lee Schermerhorn Cc: Steve Capper Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit b52d86967f6a2cab714dc8424cf7a0c1c559a958 Author: Ian Abbott Date: Fri Feb 27 16:04:42 2015 +0000 staging: comedi: adv_pci1710: fix AI INSN_READ for non-zero channel commit abe46b8932dd9a6dfc3698e3eb121809b7b9ed28 upstream. Reading of analog input channels by the `INSN_READ` comedi instruction is broken for all except channel 0. `pci171x_ai_insn_read()` calls `pci171x_ai_read_sample()` with the wrong value for the third parameter. It is supposed to be the current index in a channel list (which is always of length 1 in this case, so the index should be 0), but instead it is passing the actual channel number. `pci171x_ai_read_sample()` checks the channel number encoded in the raw sample value read from the hardware matches the channel number stored in the specified index of the previously set up channel list and returns `-ENODATA` if it doesn't match. Since the index should always be 0 in this case, the match will fail unless the channel number is also 0. Fix it by passing 0 as the channel index. Note that when the bug first appeared, it was `pci171x_ai_dropout()` that was called with the wrong parameter value. `pci171x_ai_dropout()` got replaced with `pci171x_ai_read_sample()` in commit 7fd2dae2500d ("staging: comedi: adv_pci1710: introduce pci171x_ai_read_sample()"). Fixes: 16c7eb6047bb ("staging: comedi: adv_pci1710: always enable PCI171x_PARANOIDCHECK code") Signed-off-by: Ian Abbott Signed-off-by: Greg Kroah-Hartman commit 1b4137224d2c2a5201606d8f1acabd3d28668499 Author: Radim Krčmář Date: Tue Mar 17 14:02:32 2015 +0100 KVM: nVMX: mask unrestricted_guest if disabled on L0 commit 0790ec172de1bd2e23f1dbd4925426b6cc3c1b72 upstream. If EPT was enabled, unrestricted_guest was allowed in L1 regardless of L0. L1 triple faulted when running L2 guest that required emulation. Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)' in L0's dmesg: WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] () Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when the host doesn't have it enabled. Fixes: 78051e3b7e35 ("KVM: nVMX: Disable unrestricted mode if ept=0") Cc: stable@vger.kernel.org Tested-By: Kashyap Chamarthy Signed-off-by: Radim Krčmář Signed-off-by: Marcelo Tosatti Signed-off-by: Greg Kroah-Hartman commit edc7cc6c53c6aecbf493eabd7b8cf37f3eef6ea7 Author: Jun'ichi Nomura \\\\(NEC\\\\) Date: Thu Feb 12 01:26:24 2015 +0000 tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one() [ Upstream commit d0af71a3573f1217b140c60b66f1a9b335fb058b ] tg3_init_one() calls tg3_halt() without tp->lock despite its assumption and causes deadlock. If lockdep is enabled, a warning like this shows up before the stall: [ BUG: bad unlock balance detected! ] 3.19.0test #3 Tainted: G E ------------------------------------- insmod/369 is trying to release lock (&(&tp->lock)->rlock) at: [] tg3_chip_reset+0x14d/0x780 [tg3] but there are no more locks to release! tg3_init_one() doesn't call tg3_halt() under normal situation but during kexec kdump I hit this problem. Fixes: 932f19de ("tg3: Release tp->lock before invoking synchronize_irq()") Signed-off-by: Jun'ichi Nomura Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f1cb2a0f6b8cdc07fa540d53a00e93bac7118640 Author: Ben Hutchings Date: Wed Mar 25 21:41:33 2015 +0100 usbnet: Fix tx_bytes statistic running backward in cdc_ncm [ Upstream commit 7a1e890e2168e33fb62d84528e996b8b4b478fea ] cdc_ncm disagrees with usbnet about how much framing overhead should be counted in the tx_bytes statistics, and tries 'fix' this by decrementing tx_bytes on the transmit path. But statistics must never be decremented except due to roll-over; this will thoroughly confuse user-space. Also, tx_bytes is only incremented by usbnet in the completion path. Fix this by requiring drivers that set FLAG_MULTI_FRAME to set a tx_bytes delta along with the tx_packets count. Fixes: beeecd42c3b4 ("net: cdc_ncm/cdc_mbim: adding NCM protocol statistics") Signed-off-by: Ben Hutchings Signed-off-by: Bjørn Mork Signed-off-by: Greg Kroah-Hartman commit 3d206780d1e55e2b4660fa57e907d4fc0705e3cd Author: Ben Hutchings Date: Thu Feb 26 19:34:37 2015 +0000 usbnet: Fix tx_packets stat for FLAG_MULTI_FRAME drivers [ Upstream commit 1e9e39f4a29857a396ac7b669d109f697f66695e ] Currently the usbnet core does not update the tx_packets statistic for drivers with FLAG_MULTI_PACKET and there is no hook in the TX completion path where they could do this. cdc_ncm and dependent drivers are bumping tx_packets stat on the transmit path while asix and sr9800 aren't updating it at all. Add a packet count in struct skb_data so these drivers can fill it in, initialise it to 1 for other drivers, and add the packet count to the tx_packets statistic on completion. Signed-off-by: Ben Hutchings Tested-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 174fbb303fdd8f42486ae7efa2ce428708c28e06 Author: Jesse Gross Date: Thu Apr 9 11:19:14 2015 -0700 udptunnels: Call handle_offloads after inserting vlan tag. [ Upstream commit b736a623bd099cdf5521ca9bd03559f3bc7fa31c ] handle_offloads() calls skb_reset_inner_headers() to store the layer pointers to the encapsulated packet. However, we currently push the vlag tag (if there is one) onto the packet afterwards. This changes the MAC header for the encapsulated packet but it is not reflected in skb->inner_mac_header, which breaks GSO and drivers which attempt to use this for encapsulation offloads. Fixes: 1eaa8178 ("vxlan: Add tx-vlan offload support.") Signed-off-by: Jesse Gross Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d385d003dc72f09d83b05030d27a53c0b2475e78 Author: Herbert Xu Date: Thu Apr 16 09:03:27 2015 +0800 skbuff: Do not scrub skb mark within the same name space [ Upstream commit 213dd74aee765d4e5f3f4b9607fef0cf97faa2af ] On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote: > Le 15/04/2015 15:57, Herbert Xu a écrit : > >On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote: > [snip] > >Subject: skbuff: Do not scrub skb mark within the same name space > > > >The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 ("tunnels: > Maybe add a Fixes tag? > Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path") > > >harmonize cleanup done on skb on rx path") broke anyone trying to > >use netfilter marking across IPv4 tunnels. While most of the > >fields that are cleared by skb_scrub_packet don't matter, the > >netfilter mark must be preserved. > > > >This patch rearranges skb_scurb_packet to preserve the mark field. > nit: s/scurb/scrub > > Else it's fine for me. Sure. PS I used the wrong email for James the first time around. So let me repeat the question here. Should secmark be preserved or cleared across tunnels within the same name space? In fact, do our security models even support name spaces? ---8<--- The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 ("tunnels: harmonize cleanup done on skb on rx path") broke anyone trying to use netfilter marking across IPv4 tunnels. While most of the fields that are cleared by skb_scrub_packet don't matter, the netfilter mark must be preserved. This patch rearranges skb_scrub_packet to preserve the mark field. Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path") Signed-off-by: Herbert Xu Acked-by: Thomas Graf Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5fe5245de99b6842fe4b22cc98fae6c2e95eebad Author: Herbert Xu Date: Thu Apr 16 16:12:53 2015 +0800 Revert "net: Reset secmark when scrubbing packet" [ Upstream commit 4c0ee414e877b899f7fc80aafb98d9425c02797f ] This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602 because the secmark must be preserved even when a packet crosses namespace boundaries. The reason is that security labels apply to the system as a whole and is not per-namespace. Signed-off-by: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e366600224bfc2beda6ab595443f84e3dbbae673 Author: Alexei Starovoitov Date: Tue Apr 14 15:57:13 2015 -0700 bpf: fix verifier memory corruption [ Upstream commit c3de6317d748e23b9e46ba36e10483728d00d144 ] Due to missing bounds check the DAG pass of the BPF verifier can corrupt the memory which can cause random crashes during program loading: [8.449451] BUG: unable to handle kernel paging request at ffffffffffffffff [8.451293] IP: [] kmem_cache_alloc_trace+0x8d/0x2f0 [8.452329] Oops: 0000 [#1] SMP [8.452329] Call Trace: [8.452329] [] bpf_check+0x852/0x2000 [8.452329] [] bpf_prog_load+0x1e4/0x310 [8.452329] [] ? might_fault+0x5f/0xb0 [8.452329] [] SyS_bpf+0x806/0xa30 Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier") Signed-off-by: Alexei Starovoitov Acked-by: Hannes Frederic Sowa Acked-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 63f49b7f8b7af83bddf174bac04e48baf3704206 Author: Eric Dumazet Date: Tue Apr 14 18:45:00 2015 -0700 bnx2x: Fix busy_poll vs netpoll [ Upstream commit 074975d0374333f656c48487aa046a21a9b9d7a1 ] Commit 9a2620c877454 ("bnx2x: prevent WARN during driver unload") switched the napi/busy_lock locking mechanism from spin_lock() into spin_lock_bh(), breaking inter-operability with netconsole, as netpoll disables interrupts prior to calling our napi mechanism. This switches the driver into using atomic assignments instead of the spinlock mechanisms previously employed. Based on initial patch from Yuval Mintz & Ariel Elior I basically added softirq starvation avoidance, and mixture of atomic operations, plain writes and barriers. Note this slightly reduces the overhead for this driver when no busy_poll sockets are in use. Fixes: 9a2620c877454 ("bnx2x: prevent WARN during driver unload") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1b6c8d50c051e5cd96701c0c8067024a1046b75b Author: Eric Dumazet Date: Thu Apr 9 13:31:56 2015 -0700 tcp: tcp_make_synack() should clear skb->tstamp [ Upstream commit b50edd7812852d989f2ef09dcfc729690f54a42d ] I noticed tcpdump was giving funky timestamps for locally generated SYNACK messages on loopback interface. 11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S 945476042:945476042(0) win 43690 20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S 3160535375:3160535375(0) ack 945476043 win 43690 This is because we need to clear skb->tstamp before entering lower stack, otherwise net_timestamp_check() does not set skb->tstamp. Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3a0cf55b0986b3bb0b41a825811e5303e0714fb0 Author: Jack Morgenstein Date: Sun Apr 5 17:50:48 2015 +0300 net/mlx4_core: Fix error message deprecation for ConnectX-2 cards [ Upstream commit fde913e25496761a4e2a4c81230c913aba6289a2 ] Commit 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") did the deprecation only for port 1 of the card. Need to deprecate for port 2 as well. Fixes: 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") Signed-off-by: Jack Morgenstein Signed-off-by: Amir Vadai Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3fe207e4637a2e792c46a08666aa722f77d7f8f7 Author: hannes@stressinduktion.org Date: Wed Apr 1 17:07:44 2015 +0200 ipv6: protect skb->sk accesses from recursive dereference inside the stack [ Upstream commit f60e5990d9c1424af9dbca60a23ba2a1c7c1ce90 ] We should not consult skb->sk for output decisions in xmit recursion levels > 0 in the stack. Otherwise local socket settings could influence the result of e.g. tunnel encapsulation process. ipv6 does not conform with this in three places: 1) ip6_fragment: we do consult ipv6_npinfo for frag_size 2) sk_mc_loop in ipv6 uses skb->sk and checks if we should loop the packet back to the local socket 3) ip6_skb_dst_mtu could query the settings from the user socket and force a wrong MTU Furthermore: In sk_mc_loop we could potentially land in WARN_ON(1) if we use a PF_PACKET socket ontop of an IPv6-backed vxlan device. Reuse xmit_recursion as we are currently only interested in protecting tunnel devices. Cc: Jiri Pirko Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 84212e36d28eafcc936165417b8ac8684cfa1bd6 Author: Neal Cardwell Date: Wed Apr 1 20:26:46 2015 -0400 tcp: fix FRTO undo on cumulative ACK of SACKed range [ Upstream commit 666b805150efd62f05810ff0db08f44a2370c937 ] On processing cumulative ACKs, the FRTO code was not checking the SACKed bit, meaning that there could be a spurious FRTO undo on a cumulative ACK of a previously SACKed skb. The FRTO code should only consider a cumulative ACK to indicate that an original/unretransmitted skb is newly ACKed if the skb was not yet SACKed. The effect of the spurious FRTO undo would typically be to make the connection think that all previously-sent packets were in flight when they really weren't, leading to a stall and an RTO. Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng Fixes: e33099f96d99c ("tcp: implement RFC5682 F-RTO") Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5ce5201615c1e37bda0a8fd57b19e6f6cb2b708d Author: Jonathan Davies Date: Tue Mar 31 11:05:15 2015 +0100 xen-netfront: transmit fully GSO-sized packets [ Upstream commit 0c36820e2ab7d943ab1188230fdf2149826d33c0 ] xen-netfront limits transmitted skbs to be at most 44 segments in size. However, GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448 bytes each. This slight reduction in the size of packets means a slight loss in efficiency. Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER, where XEN_NETIF_MAX_TX_SIZE is 65535 bytes. The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s 6c09fa09d) in determining when to split an skb into two is sk->sk_gso_max_size - 1 - MAX_TCP_HEADER. So the maximum permitted size of an skb is calculated to be (XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER. Intuitively, this looks like the wrong formula -- we don't need two TCP headers. Instead, there is no need to deviate from the default gso_max_size of 65536 as this already accommodates the size of the header. Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments of 1448 bytes each), as observed via tcpdump. This patch makes netfront send skbs of up to 65160 bytes (45 segments of 1448 bytes each). Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as it relates to the size of the whole packet, including the header. Fixes: 9ecd1a75d977 ("xen-netfront: reduce gso_max_size to account for max TCP header") Signed-off-by: Jonathan Davies Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 69ed02241c04c6a0515720d079ff49e420d4b2f8 Author: Thomas Graf Date: Mon Mar 30 13:57:41 2015 +0200 openvswitch: Return vport module ref before destruction [ Upstream commit fa2d8ff4e3522b4e05f590575d3eb8087f3a8cdc ] Return module reference before invoking the respective vport ->destroy() function. This is needed as ovs_vport_del() is not invoked inside an RCU read side critical section so the kfree can occur immediately before returning to ovs_vport_del(). Returning the module reference before ->destroy() is safe because the module unregistration is blocked on ovs_lock which we hold while destroying the datapath. Fixes: 62b9c8d0372d ("ovs: Turn vports with dependencies into separate modules") Reported-by: Pravin Shelar Signed-off-by: Thomas Graf Acked-by: Pravin B Shelar Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 28cc484cd75f43bc758838c34e7fd48e495cbff6 Author: Anton Nayshtut Date: Sun Mar 29 14:20:25 2015 +0300 bonding: Bonding Overriding Configuration logic restored. [ Upstream commit f5e2dc5d7fe78fe4d8748d217338f4f7b6a5d7ea ] Before commit 3900f29021f0bc7fe9815aa32f1a993b7dfdd402 ("bonding: slight optimizztion for bond_slave_override()") the override logic was to send packets with non-zero queue_id through the slave with corresponding queue_id, under two conditions only - if the slave can transmit and it's up. The above mentioned commit changed this logic by introducing an additional condition - whether the bond is active (indirectly, using the slave_can_tx and later - bond_is_active_slave), that prevents the user from implementing more complex policies according to the Documentation/networking/bonding.txt. Signed-off-by: Anton Nayshtut Signed-off-by: Alexey Bogoslavsky Signed-off-by: Andy Gospodarek Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1d069b5a314de8f2c9e598ca0794e358da84f223 Author: Alexey Kodanev Date: Fri Mar 27 12:24:22 2015 +0300 net: tcp6: fix double call of tcp_v6_fill_cb() [ Upstream commit 4ad19de8774e2a7b075b3e8ea48db85adcf33fa6 ] tcp_v6_fill_cb() will be called twice if socket's state changes from TCP_TIME_WAIT to TCP_LISTEN. That can result in control buffer data corruption because in the second tcp_v6_fill_cb() call it's not copying IP6CB(skb) anymore, but 'seq', 'end_seq', etc., so we can get weird and unpredictable results. Performance loss of up to 1200% has been observed in LTP/vxlan03 test. This can be fixed by copying inet6_skb_parm to the beginning of 'cb' only if xfrm6_policy_check() and tcp_v6_fill_cb() are going to be called again. Fixes: 2dc49d1680b53 ("tcp6: don't move IP6CB before xfrm6_policy_check()") Signed-off-by: Alexey Kodanev Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6e24551ff94701e62c13c982eca0bc3a682817ad Author: Alex Gartrell Date: Thu Dec 25 23:22:49 2014 -0800 tun: return proper error code from tun_do_read [ Upstream commit 957f094f221f81e457133b1f4c4d95ffa49ff731 ] Instead of -1 with EAGAIN, read on a O_NONBLOCK tun fd will return 0. This fixes this by properly returning the error code from __skb_recv_datagram. Signed-off-by: Alex Gartrell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 553ecf745589a74d4195e6019a026c15b5048c58 Author: D.S. Ljungmark Date: Wed Mar 25 09:28:15 2015 +0100 ipv6: Don't reduce hop limit for an interface [ Upstream commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a ] A local route may have a lower hop_limit set than global routes do. RFC 3756, Section 4.2.7, "Parameter Spoofing" > 1. The attacker includes a Current Hop Limit of one or another small > number which the attacker knows will cause legitimate packets to > be dropped before they reach their destination. > As an example, one possible approach to mitigate this threat is to > ignore very small hop limits. The nodes could implement a > configurable minimum hop limit, and ignore attempts to set it below > said limit. Signed-off-by: D.S. Ljungmark Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b2137fcc996c7c5c0aad37287444346c894e0c25 Author: Ido Shamay Date: Tue Mar 24 15:18:38 2015 +0200 net/mlx4_en: Call register_netdevice in the proper location [ Upstream commit e5eda89d97ec256ba14e7e861387cc0468259c18 ] Netdevice registration should be performed a the end of the driver initialization flow. If we don't do that, after calling register_netdevice, device callbacks may be issued by higher layers of the stack before final configuration of the device is done. For example (VXLAN configuration race), mlx4_SET_PORT_VXLAN was issued after the register_netdev command. System network scripts may configure the interface (UP) right after the registration, which also attach unicast VXLAN steering rule, before mlx4_SET_PORT_VXLAN was called, causing the firmware to fail the rule attachment. Fixes: 837052d0ccc5 ("net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling") Signed-off-by: Ido Shamay Signed-off-by: Or Gerlitz Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5fd2b0c00bb4d7152f743046472b6eb238e156f9 Author: Simon Horman Date: Tue Mar 24 09:31:40 2015 +0900 rocker: handle non-bridge master change [ Upstream commit a6e95cc718c8916a13f1e1e9d33cacbc5db56c0f ] Master change notifications may occur other than when joining or leaving a bridge, for example when being added to or removed from a bond or Open vSwitch. Previously in those cases rocker_port_bridge_leave() was called which results in a null-pointer dereference as rocker_port->bridge_dev is NULL because there is no bridge device. This patch makes provision for doing nothing in such cases. Fixes: 6c7079450071f ("rocker: implement L2 bridge offloading") Acked-by: Jiri Pirko Acked-by: Scott Feldman Signed-off-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c3d7204d3d87c57294e639f27edf4454110d0426 Author: Michal Kubeček Date: Mon Mar 23 15:14:00 2015 +0100 tcp: prevent fetching dst twice in early demux code [ Upstream commit d0c294c53a771ae7e84506dfbd8c18c30f078735 ] On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux() struct dst_entry *dst = sk->sk_rx_dst; if (dst) dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie); to code reading sk->sk_rx_dst twice, once for the test and once for the argument of ip6_dst_check() (dst_check() is inline). This allows ip6_dst_check() to be called with null first argument, causing a crash. Protect sk->sk_rx_dst access by READ_ONCE() both in IPv4 and IPv6 TCP early demux code. Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.") Fixes: c7109986db3c ("ipv6: Early TCP socket demux") Signed-off-by: Michal Kubecek Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman