Home Home > GIT Browse
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-07-27Linux 4.12.4v4.12.4Greg Kroah-Hartman
2017-07-27sched/cputime: Don't use smp_processor_id() in preemptible contextWanpeng Li
commit 0e4097c3354e2f5a5ad8affd9dc7f7f7d00bb6b9 upstream. Recent kernels trigger this warning: BUG: using smp_processor_id() in preemptible [00000000] code: 99-trinity/181 caller is debug_smp_processor_id+0x17/0x19 CPU: 0 PID: 181 Comm: 99-trinity Not tainted 4.12.0-01059-g2a42eb9 #1 Call Trace: dump_stack+0x82/0xb8 check_preemption_disabled() debug_smp_processor_id() vtime_delta() task_cputime() thread_group_cputime() thread_group_cputime_adjusted() wait_consider_task() do_wait() SYSC_wait4() do_syscall_64() entry_SYSCALL64_slow_path() As Frederic pointed out: | Although those sched_clock_cpu() things seem to only matter when the | sched_clock() is unstable. And that stability is a condition for nohz_full | to work anyway. So probably sched_clock() alone would be enough. This patch fixes it by replacing sched_clock_cpu() with sched_clock() to avoid calling smp_processor_id() in a preemptible context. Reported-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1499586028-7402-1-git-send-email-wanpeng.li@hotmail.com [ Prettified the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27alarmtimer: don't rate limit one-shot timersGreg Hackmann
Commit ff86bf0c65f1 ("alarmtimer: Rate limit periodic intervals") sets a minimum bound on the alarm timer interval. This minimum bound shouldn't be applied if the interval is 0. Otherwise, one-shot timers will be converted into periodic ones. This patch is specific to 4.11.y and 4.12.y. Older -stable trees have a slightly different patch, and 4.13-rc2 isn't impacted due to a later refactoring. Fixes: ff86bf0c65f1 ("alarmtimer: Rate limit periodic intervals") Reported-by: Ben Fennema <fennema@google.com> Signed-off-by: Greg Hackmann <ghackmann@google.com> Cc: John Stultz <john.stultz@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27smp/hotplug: Replace BUG_ON and react usefulThomas Gleixner
commit dea1d0f5f1284e3defee4b8484d9fc230686cd42 upstream. The move of the unpark functions to the control thread moved the BUG_ON() there as well. While it made some sense in the idle thread of the upcoming CPU, it's bogus to crash the control thread on the already online CPU, especially as the function has a return value and the callsite is prepared to handle an error return. Replace it with a WARN_ON_ONCE() and return a proper error code. Fixes: 9cd4f1a4e7a8 ("smp/hotplug: Move unparking of percpu threads to the control CPU") Rightfully-ranted-at-by: Linux Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27smp/hotplug: Move unparking of percpu threads to the control CPUThomas Gleixner
commit 9cd4f1a4e7a858849e889a081a99adff83e08e4c upstream. Vikram reported the following backtrace: BUG: scheduling while atomic: swapper/7/0/0x00000002 CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680 schedule schedule_hrtimeout_range_clock schedule_hrtimeout wait_task_inactive __kthread_bind_mask __kthread_bind __kthread_unpark kthread_unpark cpuhp_online_idle cpu_startup_entry secondary_start_kernel He analyzed correctly that a parked cpu hotplug thread of an offlined CPU was still on the runqueue when the CPU came back online and tried to unpark it. This causes the thread which invoked kthread_unpark() to call wait_task_inactive() and subsequently schedule() with preemption disabled. His proposed workaround was to "make sure" that a parked thread has scheduled out when the CPU goes offline, so the situation cannot happen. But that's still wrong because the root cause is not the fact that the percpu thread is still on the runqueue and neither that preemption is disabled, which could be simply solved by enabling preemption before calling kthread_unpark(). The real issue is that the calling thread is the idle task of the upcoming CPU, which is not supposed to call anything which might sleep. The moron, who wrote that code, missed completely that kthread_unpark() might end up in schedule(). The solution is simpler than expected. The thread which controls the hotplug operation is waiting for the CPU to call complete() on the hotplug state completion. So the idle task of the upcoming CPU can set its state to CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the control task on a different CPU, which then can safely do the unpark and kick the now unparked hotplug thread of the upcoming CPU to complete the bringup to the final target state. Control CPU AP bringup_cpu(); __cpu_up() ------------> bringup_ap(); bringup_wait_for_ap() wait_for_completion(); cpuhp_online_idle(); <------------ complete(); unpark(AP->stopper); unpark(AP->hotplugthread); while(1) do_idle(); kick(AP->hotplugthread); wait_for_completion(); hotplug_thread() run_online_callbacks(); complete(); Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully up") Reported-by: Vikram Mulukutla <markivx@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Sewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707042218020.2131@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27drm/i915: reintroduce VLV/CHV PFI programming power domain workaroundGabriel Krisman Bertazi
commit 9c75b185274b7766fe69c2e73607c1ed780b284b upstream. There are still cases on these platforms where an attempt is made to configure the CDCLK while the power domain is off, like when coming back from a suspend. So the workaround below is still needed. This effectively reverts commit 63ff30442519 ("drm/i915: Nuke the VLV/CHV PFI programming power domain workaround"). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101517 Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170628210605.4994-1-krisman@collabora.co.uk Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (cherry picked from commit 886015a0ad43c7fc034b23ea4614ba39162f9ddd) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27drm/i915: Hold RPM wakelock while initializing OA buffersagar.a.kamble@intel.com
commit 04941829b0049d2446c7042ab9686dd057d809a6 upstream. OA buffer initialization involves access to HW registers to set the OA base, head and tail. Ensure device is awake while setting these. With this, all oa.ops are covered under RPM and forcewake wakelock. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1498585181-23048-1-git-send-email-sagar.a.kamble@intel.com Fixes: d79651522e89c ("drm/i915: Enable i915 perf stream for Haswell OA unit") (cherry picked from commit 987f8c444aa2c33d98e7030d0c5f0a5325cc84ea) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27drm/i915/fbdev: Check for existence of ifbdev->vma before operationsChris Wilson
commit 7581d5ca2bb269cfc2ce2d0cb489aac513167f6b upstream. Commit fabef825626d ("drm/i915: Drop struct_mutex around frontbuffer flushes") adds a dependency to ifbdev->vma when flushing the framebufer, but the checks are only against the existence of the ifbdev->fb and not against ifbdev->vma. This leaves a window of opportunity where we may try to operate on the fbdev prior to it being probed (thanks to asynchronous booting). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101534 Fixes: fabef825626d ("drm/i915: Drop struct_mutex around frontbuffer flushes") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170622160211.783-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (cherry picked from commit 15727ed0d944ce1dec8b9e1082dd3df29a0fdf44) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27tracing: Fix kmemleak in instance_rmdirChunyu Hu
commit db9108e054700c96322b0f0028546aa4e643cf0b upstream. Hit the kmemleak when executing instance_rmdir, it forgot releasing mem of tracing_cpumask. With this fix, the warn does not appear any more. unreferenced object 0xffff93a8dfaa7c18 (size 8): comm "mkdir", pid 1436, jiffies 4294763622 (age 9134.308s) hex dump (first 8 bytes): ff ff ff ff ff ff ff ff ........ backtrace: [<ffffffff88b6567a>] kmemleak_alloc+0x4a/0xa0 [<ffffffff8861ea41>] __kmalloc_node+0xf1/0x280 [<ffffffff88b505d3>] alloc_cpumask_var_node+0x23/0x30 [<ffffffff88b5060e>] alloc_cpumask_var+0xe/0x10 [<ffffffff88571ab0>] instance_mkdir+0x90/0x240 [<ffffffff886e5100>] tracefs_syscall_mkdir+0x40/0x70 [<ffffffff886565c9>] vfs_mkdir+0x109/0x1b0 [<ffffffff8865b1d0>] SyS_mkdir+0xd0/0x100 [<ffffffff88403857>] do_syscall_64+0x67/0x150 [<ffffffff88b710e7>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Link: http://lkml.kernel.org/r/1500546969-12594-1-git-send-email-chuhu@redhat.com Fixes: ccfe9e42e451 ("tracing: Make tracing_cpumask available for all instances") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if ↵Sudeep Holla
present commit 975e83cfb8dc16e7a2fdc58188c77c0c605876c2 upstream. If the genpd->attach_dev or genpd->power_on fails, genpd_dev_pm_attach may return -EPROBE_DEFER initially. However genpd_alloc_dev_data sets the PM domain for the device unconditionally. When subsequent attempts are made to call genpd_dev_pm_attach, it may return -EEXISTS checking dev->pm_domain without re-attempting to call attach_dev or power_on. platform_drv_probe then attempts to call drv->probe as the return value -EEXIST != -EPROBE_DEFER, which may end up in a situation where the device is accessed without it's power domain switched on. Fixes: f104e1e5ef57 (PM / Domains: Re-order initialization of generic_pm_domain_data) Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27drm/imx: parallel-display: Accept drm_of_find_panel_or_bridge failurePhilipp Zabel
commit 799ee2970485dc206c3bf347d6e6827c04d5e4f9 upstream. The parallel panel driver should continue to work without having an endpoint linking to an panel in DT for backwards compatibility. With the recent switch to drm_of_find_panel_or_bridge, an absent panel results in a failure with -ENODEV error return code. To restore the old behaviour, ignore the -ENODEV return code. Reported-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Fixes: ebc944613567 ("drm: convert drivers to use drm_of_find_panel_or_bridge") Tested-by: Chris Healy <cphealy@gmail.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27device-dax: fix sysfs duplicate warningsDan Williams
commit bbb3be170ac2891526ad07b18af7db226879a8e7 upstream. Fix warnings of the form... WARNING: CPU: 10 PID: 4983 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80 sysfs: cannot create duplicate filename '/class/dax/dax12.0' Call Trace: dump_stack+0x63/0x86 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5a/0x80 ? kernfs_path_from_node+0x4f/0x60 sysfs_warn_dup+0x62/0x80 sysfs_do_create_link_sd.isra.2+0x97/0xb0 sysfs_create_link+0x25/0x40 device_add+0x266/0x630 devm_create_dax_dev+0x2cf/0x340 [dax] dax_pmem_probe+0x1f5/0x26e [dax_pmem] nvdimm_bus_probe+0x71/0x120 ...by reusing the namespace id for the device-dax instance name. Now that we have decided that there will never by more than one device-dax instance per libnvdimm-namespace parent device [1], we can directly reuse the namepace ids. There are some possible follow-on cleanups, but those are saved for a later patch to simplify the -stable backport. [1]: https://lists.01.org/pipermail/linux-nvdimm/2016-December/008266.html Fixes: 98a29c39dc68 ("libnvdimm, namespace: allow creation of multiple pmem...") Cc: Jeff Moyer <jmoyer@redhat.com> Reported-by: Dariusz Dokupil <dariusz.dokupil@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27reiserfs: Don't clear SGID when inheriting ACLsJan Kara
commit 6883cd7f68245e43e91e5ee583b7550abf14523f upstream. When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by moving posix_acl_update_mode() out of __reiserfs_set_acl() into reiserfs_set_acl(). That way the function will not be called when inheriting ACLs which is what we want as it prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 073931017b49d9458aa351605b43a7e34598caef CC: reiserfs-devel@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27spmi: Include OF based modalias in device ueventBjorn Andersson
commit d50daa2af2618dab6d21634e65a5fbcf4ae437d6 upstream. Include the OF-based modalias in the uevent sent when registering SPMI devices, so that user space has a chance to autoload the kernel module for the device. Tested-by: Rob Clark <robdclark@gmail.com> Reported-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27cpufreq: intel_pstate: Correct the busy calculation for KNLSrinivas Pandruvada
commit 6e34e1f23d780978da65968327cbba6d7013a73f upstream. The busy percent calculated for the Knights Landing (KNL) platform is 1024 times smaller than the correct busy value. This causes performance to get stuck at the lowest ratio. The scaling algorithm used for KNL is performance-based, but it still looks at the CPU load to set the scaled busy factor to 0 when the load is less than 1 percent. In this case, since the computed load is 1024x smaller than it should be, the scaled busy factor will always be 0, irrespective of CPU business. This needs a fix similar to the turbostat one in commit b2b34dfe4d9a (tools/power turbostat: KNL workaround for %Busy and Avg_MHz). For this reason, add one more callback to processor-specific callbacks to specify an MPERF multiplier represented by a number of bit positions to shift the value of that register to the left to copmensate for its rate difference with respect to the TSC. This shift value is used during CPU busy calculations. Fixes: ffb810563c (intel_pstate: Avoid getting stuck in high P-states when idle) Reported-and-tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27vmbus: re-enable channel taskletStephen Hemminger
commit 6463a4571ceefc43908df4b016d8d5d8b8e85357 upstream. This problem shows up in 4.11 when netvsc driver is removed and reloaded. The problem is that the channel is closed during module removal and the tasklet for processing responses is disabled. When module is reloaded the channel is reopened but the tasklet is marked as disabled. The fix is to re-enable tasklet at the end of close which gets it back to the initial state. The issue is less urgent in 4.12 since network driver now uses NAPI and not the tasklet; and other VMBUS devices are rarely unloaded/reloaded. Fixes: dad72a1d2844 ("vmbus: remove hv_event_tasklet_disable/enable") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27acpi/nfit: Fix memory corruption/Unregister mce decoder on failurePrarit Bhargava
commit 7e700d2c59e5853c9126642976b4f5768f64c9b3 upstream. nfit_init() calls nfit_mce_register() on module load. When the module load fails the nfit mce decoder is not unregistered. The module's memory is freed leaving the decoder chain referencing junk. This will cause panics as future registrations will reference the free'd memory. Unregister the nfit mce decoder on module init failure. [v2]: register and then unregister mce handler to avoid losing mce events [v3]: also cleanup nfit workqueue Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error") Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com> Cc: Linda Knippers <linda.knippers@hpe.com> Cc: lszubowi@redhat.com Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27kernel/fork.c: virtually mapped stacks: do not disable interruptsChristoph Lameter
commit 112166f88cf83dd11486cf1818672d42b540865b upstream. The reason to disable interrupts seems to be to avoid switching to a different processor while handling per cpu data using individual loads and stores. If we use per cpu RMV primitives we will not have to disable interrupts. Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705171055130.5898@east.gentwo.org Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27writeback: rework wb_[dec|inc]_stat family of functionsNikolay Borisov
commit 3e8f399da490e6ac20a3cfd6aa404c9aa961a9a2 upstream. Currently the writeback statistics code uses a percpu counters to hold various statistics. Furthermore we have 2 families of functions - those which disable local irq and those which doesn't and whose names begin with double underscore. However, they both end up calling __add_wb_stats which in turn calls percpu_counter_add_batch which is already irq-safe. Exploiting this fact allows to eliminated the __wb_* functions since they don't add any further protection than we already have. Furthermore, refactor the wb_* function to call __add_wb_stat directly without the irq-disabling dance. This will likely result in better runtime of code which deals with modifying the stat counters. While at it also document why percpu_counter_add_batch is in fact preempt and irq-safe since at least 3 people got confused. Link: http://lkml.kernel.org/r/1498029937-27293-1-git-send-email-nborisov@suse.com Signed-off-by: Nikolay Borisov <nborisov@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Josef Bacik <jbacik@fb.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batchNikolay Borisov
commit 104b4e5139fe384431ac11c3b8a6cf4a529edf4a upstream. Currently, percpu_counter_add is a wrapper around __percpu_counter_add which is preempt safe due to explicit calls to preempt_disable. Given how __ prefix is used in percpu related interfaces, the naming unfortunately creates the false sense that __percpu_counter_add is less safe than percpu_counter_add. In terms of context-safety, they're equivalent. The only difference is that the __ version takes a batch parameter. Make this a bit more explicit by just renaming __percpu_counter_add to percpu_counter_add_batch. This patch doesn't cause any functional changes. tj: Minor updates to patch description for clarity. Cosmetic indentation updates. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: David Sterba <dsterba@suse.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@fb.com> Cc: linux-mm@kvack.org Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27sched/fair: Fix load_balance() affinity redo pathJeffrey Hugo
commit 65a4433aebe36c8c6abeb69b99ef00274b971c6c upstream. If load_balance() fails to migrate any tasks because all tasks were affined, load_balance() removes the source CPU from consideration and attempts to redo and balance among the new subset of CPUs. There is a bug in this code path where the algorithm considers all active CPUs in the system (minus the source that was just masked out). This is not valid for two reasons: some active CPUs may not be in the current scheduling domain and one of the active CPUs is dst_cpu. These CPUs should not be considered, as we cannot pull load from them. Instead of failing out of load_balance(), we may end up redoing the search with no valid CPUs and incorrectly concluding the domain is balanced. Additionally, if the group_imbalance flag was just set, it may also be incorrectly unset, thus the flag will not be seen by other CPUs in future load_balance() runs as that algorithm intends. Fix the check by removing CPUs not in the current domain and the dst_cpu from considertation, thus limiting the evaluation to valid remaining CPUs from which load might be migrated. Co-authored-by: Austin Christ <austinwc@codeaurora.org> Co-authored-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Austin Christ <austinwc@codeaurora.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Timur Tabi <timur@codeaurora.org> Link: http://lkml.kernel.org/r/1496863138-11322-2-git-send-email-jhugo@codeaurora.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27sched/cputime: Accumulate vtime on top of nsec clocksourceWanpeng Li
commit 2a42eb9594a1480b4ead9e036e06ee1290e5fa6d upstream. Currently the cputime source used by vtime is jiffies. When we cross a context boundary and jiffies have changed since the last snapshot, the pending cputime is accounted to the switching out context. This system works ok if the ticks are not aligned across CPUs. If they instead are aligned (ie: all fire at the same time) and the CPUs run in userspace, the jiffies change is only observed on tick exit and therefore the user cputime is accounted as system cputime. This is because the CPU that maintains timekeeping fires its tick at the same time as the others. It updates jiffies in the middle of the tick and the other CPUs see that update on IRQ exit: CPU 0 (timekeeper) CPU 1 ------------------- ------------- jiffies = N ... run in userspace for a jiffy tick entry tick entry (sees jiffies = N) set jiffies = N + 1 tick exit tick exit (sees jiffies = N + 1) account 1 jiffy as stime Fix this with using a nanosec clock source instead of jiffies. The cputime is then accumulated and flushed everytime the pending delta reaches a jiffy in order to mitigate the accounting overhead. [ fweisbec: changelog, rebase on struct vtime, field renames, add delta on cputime readers, keep idle vtime as-is (low overhead accounting), harmonize clock sources. ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Luiz Capitulino <lcapitulino@redhat.com> Tested-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-6-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27sched/cputime: Move the vtime task fields to their own structFrederic Weisbecker
commit bac5b6b6b11560f323e71d0ebac4061cfe5f56c0 upstream. We are about to add vtime accumulation fields to the task struct. Let's avoid more bloatification and gather vtime information to their own struct. Tested-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-5-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27sched/cputime: Rename vtime fieldsFrederic Weisbecker
commit 60a9ce57e7c5ac1df3a39fb941022bbfa40c0862 upstream. The current "snapshot" based naming on vtime fields suggests we record some past event but that's a low level picture of their actual purpose which comes out blurry. The real point of these fields is to run a basic state machine that tracks down cputime entry while switching between contexts. So lets reflect that with more meaningful names. Tested-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-4-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27sched/cputime: Always set tsk->vtime_snap_whence after accounting vtimeFrederic Weisbecker
commit 9fa57cf5a5c4aed1e45879b335fe433048709327 upstream. Even though it doesn't have functional consequences, setting the task's new context state after we actually accounted the pending vtime from the old context state makes more sense from a review perspective. vtime_user_exit() is the only function that doesn't follow that rule and that can bug the reviewer for a little while until he realizes there is no reason for this special case. Tested-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27vtime, sched/cputime: Remove vtime_account_user()Frederic Weisbecker
commit 1c3eda01a79b8e9237d91c52c5a75b20983f47c6 upstream. It's an unnecessary function between vtime_user_exit() and account_user_time(). Tested-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27hfsplus: Don't clear SGID when inheriting ACLsJan Kara
commit 84969465ddc4f8aeb3b993123b571aa01c5f2683 upstream. When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by creating __hfsplus_set_posix_acl() function that does not call posix_acl_update_mode() and use it when inheriting ACLs. That prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 073931017b49d9458aa351605b43a7e34598caef Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] arrayBart Van Assche
commit 99975cd4fda52974a767aa44fe0b1a8f74950d9d upstream. ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger than what fits into a single MR. .map_mr_sg() must not attempt to map more SG-list elements than what fits into a single MR. Hence make sure that mlx5_ib_sg_to_klms() does not write outside the MR klms[] array. Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Israel Rukshin <israelr@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27drm/i915: Make DP-MST connector info workMaarten Lankhorst
commit 50740024bc393b608f7e391ac35e70f33938dd24 upstream. Commit 9a148a96fc3a ("drm/i915/debugfs: add dp mst info") adds support for DP-MST to intel_connector_info, but forgot to remove the early return for DP-MST. Remove it, and print out MST connectors directly. Fixes: 9a148a96fc3a ("drm/i915/debugfs: add dp mst info") Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Cc: Libin Yang <libin.yang@intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170626083349.24389-1-maarten.lankhorst@linux.intel.com Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> (cherry picked from commit 77d1f615c78a73a04254fa2bff07ee9fa27145d9) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27drm/mst: Avoid processing partially received up/down message transactionsImre Deak
commit 636c4c3e762b62aa93632c645ca65879285b16e3 upstream. Currently we may process up/down message transactions containing uninitialized data. This can happen if there was an error during the reception of any message in the transaction, but we happened to receive the last message correctly with the end-of-message flag set. To avoid this abort the reception of the transaction when the first error is detected, rejecting any messages until a message with the start-of-message flag is received (which will start a new transaction). This is also what the DP 1.4 spec 2.11.8.2 calls for in this case. In addtion this also prevents receiving bogus transactions without the first message with the the start-of-message flag set. v2: - unchanged v3: - git add the part that actually skips messages after an error in drm_dp_sideband_msg_build() Cc: Dave Airlie <airlied@redhat.com> Cc: Lyude <lyude@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Lyude <lyude@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170719134632.13366-1-imre.deak@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()Imre Deak
commit 7f8b3987da54cb4d41ad2545cd4d7958b9a36bdf upstream. In case of an unknown broadcast message is sent mstb will remain unset, so check for this. Cc: Dave Airlie <airlied@redhat.com> Cc: Lyude <lyude@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Lyude <lyude@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170719114330.26540-3-imre.deak@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27drm/mst: Fix error handling during MST sideband message receptionImre Deak
commit 448421b5e93b9177c5698f0cf6f5e72d2995eeca upstream. Handle any error due to partial reads, timeouts etc. to avoid parsing uninitialized data subsequently. Also bail out if the parsing itself fails. Cc: Dave Airlie <airlied@redhat.com> Cc: Lyude <lyude@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Lyude <lyude@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170719114330.26540-2-imre.deak@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27RDMA/core: Initialize port_num in qp_attrIsmail, Mustafa
commit a62ab66b13a0f9bcb17b7b761f6670941ed5cd62 upstream. Initialize the port_num for iWARP in rdma_init_qp_attr. Fixes: 5ecce4c9b17b("Check port number supplied by user verbs cmds") Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27RDMA/uverbs: Fix the check for port numberIsmail, Mustafa
commit 5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3 upstream. The port number is only valid if IB_QP_PORT is set in the mask. So only check port number if it is valid to prevent modify_qp from failing due to an invalid port number. Fixes: 5ecce4c9b17b("Check port number supplied by user verbs cmds") Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27ceph: fix race in concurrent readdirYan, Zheng
commit 84583cfb973c4313955c6231cc9cb3772d280b15 upstream. For a large directory, program needs to issue multiple readdir syscalls to get all dentries. When there are multiple programs read the directory concurrently. Following sequence of events can happen. - program calls readdir with pos = 2. ceph sends readdir request to mds. The reply contains N1 entries. ceph adds these N1 entries to readdir cache. - program calls readdir with pos = N1+2. The readdir is satisfied by the readdir cache, N2 entries are returned. (Other program calls readdir in the middle, which fills the cache) - program calls readdir with pos = N1+N2+2. ceph sends readdir request to mds. The reply contains N3 entries and it reaches directory end. ceph adds these N3 entries to the readdir cache and marks directory complete. The second readdir call does not update fi->readdir_cache_idx. ceph add the last N3 entries to wrong places. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27staging: lustre: ko2iblnd: check copy_from_iter/copy_to_iter return codeArnd Bergmann
commit 566e1ce22e04426fa52328b2adcdf1df49acd98e upstream. We now get a helpful warning for code that calls copy_{from,to}_iter without checking the return value, introduced by commit aa28de275a24 ("iov_iter/hardening: move object size checks to inlined part"). drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c: In function 'kiblnd_send': drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1643:2: error: ignoring return value of 'copy_from_iter', declared with attribute warn_unused_result [-Werror=unused-result] drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c: In function 'kiblnd_recv': drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1744:3: error: ignoring return value of 'copy_to_iter', declared with attribute warn_unused_result [-Werror=unused-result] In case we get short copies here, we may get incorrect behavior. I've added failure handling for both rx and tx now, returning -EFAULT as expected. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27staging: sm750fb: avoid conflicting vesafbTeddy Wang
commit 740c433ec35187b45abe08bb6c45a321a791be8e upstream. If vesafb is enabled in the config then /dev/fb0 is created by vesa and this sm750 driver gets fb1, fb2. But we need to be fb0 and fb1 to effectively work with xorg. So if it has been alloted fb1, then try to remove the other fb0. In the previous send, why #ifdef is used was asked. https://lkml.org/lkml/2017/6/25/57 Answered at: https://lkml.org/lkml/2017/6/25/69 Also pasting here for reference. 'Did a quick research into "why". The patch d8801e4df91e ("x86/PCI: Set IORESOURCE_ROM_SHADOW only for the default VGA device") has started setting IORESOURCE_ROM_SHADOW in flags for a default VGA device and that is being done only for x86. And so, we will need that #ifdef to check IORESOURCE_ROM_SHADOW as that needs to be checked only for a x86 and not for other arch.' Signed-off-by: Teddy Wang <teddy.wang@siliconmotion.com> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27staging: comedi: ni_mio_common: fix AO timer off-by-one regressionIan Abbott
commit 15d5193104a457d5151840247e3bce561c42e3e9 upstream. As reported by Éric Piel on the Comedi mailing list (see <https://groups.google.com/forum/#!topic/comedi_list/ueZiR7vTLOU/discussion>), the analog output asynchronous commands are running too fast with a period 50 ns shorter than it should be. This affects all boards with AO command support that are supported by the "ni_pcimio", "ni_atmio", and "ni_mio_cs" drivers. This is a regression bug introduced by commit 080e6795cba3 ("staging: comedi: ni_mio_common: Cleans up/clarifies ni_ao_cmd"), specifically, this line in `ni_ao_cmd_set_update()`: /* following line: N-1 per STC */ ni_stc_writel(dev, trigvar - 1, NISTC_AO_UI_LOADA_REG); The `trigvar` variable value comes from a call to `ni_ns_to_timer()` which converts a timer period in nanoseconds to a hardware divisor value. The function already reduces the divisor by 1 as required by the hardware, so the above line should not reduce it further by 1. Fix it by replacing `trigvar` by `trigvar - 1` in the above line, and remove the misleading comment. Reported-by: Éric Piel <piel@delmic.com> Fixes: 080e6795cba3 ("staging: comedi: ni_mio_common: Cleans up/clarifies ni_ao_cmd") Cc: Éric Piel <piel@delmic.com> Cc: Spencer E. Olson <olsonse@umich.edu> Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27staging: rtl8188eu: add TL-WN722N v2 supportMichael Gugino
commit 5a1d4c5dd4eb2f1f8a9b30e61762f3b3b564df70 upstream. Add support for USB Device TP-Link TL-WN722N v2. VendorID: 0x2357, ProductID: 0x010c Signed-off-by: Michael Gugino <michael.gugino.2@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27Revert "perf/core: Drop kernel samples even though :u is specified"Ingo Molnar
commit 6a8a75f3235724c5941a33e287b2f98966ad14c5 upstream. This reverts commit cc1582c231ea041fbc68861dfaf957eaf902b829. This commit introduced a regression that broke rr-project, which uses sampling events to receive a signal on overflow (but does not care about the contents of the sample). These signals are critical to the correct operation of rr. There's been some back and forth about how to fix it - but to not keep applications in limbo queue up a revert. Reported-by: Kyle Huey <me@kylehuey.com> Acked-by: Kyle Huey <me@kylehuey.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20170628105600.GC5981@leverpostej Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27perf/core: Fix scheduling regression of pinned groupsAlexander Shishkin
commit 3bda69c1c3993a2bddbae01397d12bfef6054011 upstream. Vince Weaver reported: > I was tracking down some regressions in my perf_event_test testsuite. > Some of the tests broke in the 4.11-rc1 timeframe. > > I've bisected one of them, this report is about > tests/overflow/simul_oneshot_group_overflow > This test creates an event group containing two sampling events, set > to overflow to a signal handler (which disables and then refreshes the > event). > > On a good kernel you get the following: > Event perf::instructions with period 1000000 > Event perf::instructions with period 2000000 > fd 3 overflows: 946 (perf::instructions/1000000) > fd 4 overflows: 473 (perf::instructions/2000000) > Ending counts: > Count 0: 946379875 > Count 1: 946365218 > > With the broken kernels you get: > Event perf::instructions with period 1000000 > Event perf::instructions with period 2000000 > fd 3 overflows: 938 (perf::instructions/1000000) > fd 4 overflows: 318 (perf::instructions/2000000) > Ending counts: > Count 0: 946373080 > Count 1: 653373058 The root cause of the bug is that the following commit: 487f05e18a ("perf/core: Optimize event rescheduling on active contexts") erronously assumed that event's 'pinned' setting determines whether the event belongs to a pinned group or not, but in fact, it's the group leader's pinned state that matters. This was discovered by Vince in the test case described above, where two instruction counters are grouped, the group leader is pinned, but the other event is not; in the regressed case the counters were off by 33% (the difference between events' periods), but should be the same within the error margin. Fix the problem by looking at the group leader's pinning. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 487f05e18a ("perf/core: Optimize event rescheduling on active contexts") Link: http://lkml.kernel.org/r/87lgnmvw7h.fsf@ashishki-desk.ger.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its ↵Jin Yao
target commit 80f62589fa52f530cffc50e78c0b5a2ae572d61e upstream. When the jump instruction is displayed at the row 0 in annotate view, the arrow is broken. An example: 16.86 │ ┌──je 82 0.01 │ movsd (%rsp),%xmm0 │ movsd 0x8(%rsp),%xmm4 │ movsd 0x8(%rsp),%xmm1 │ movsd (%rsp),%xmm3 │ divsd %xmm4,%xmm0 │ divsd %xmm3,%xmm1 │ movsd (%rsp),%xmm2 │ addsd %xmm1,%xmm0 │ addsd %xmm2,%xmm0 │ movsd %xmm0,(%rsp) │82: sub $0x1,%ebx 83.03 │ ↑ jne 38 │ add $0x10,%rsp │ xor %eax,%eax │ pop %rbx │ ← retq The patch increments the row number before checking with 0. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 944e1abed9e1 ("perf ui browser: Add method to draw up/down arrow line") Link: http://lkml.kernel.org/r/1496901704-30275-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_doneNicholas Bellinger
commit fce50a2fa4e9c6e103915c351b6d4a98661341d6 upstream. This patch fixes a NULL pointer dereference in isert_login_recv_done() of isert_conn->cm_id due to isert_cma_handler() -> isert_connect_error() resetting isert_conn->cm_id = NULL during a failed login attempt. As per Sagi, we will always see the completion of all recv wrs posted on the qp (given that we assigned a ->done handler), this is a FLUSH error completion, we just don't get to verify that because we deref NULL before. The issue here, was the assumption that dereferencing the connection cm_id is always safe, which is not true since: commit 4a579da2586bd3b79b025947ea24ede2bbfede62 Author: Sagi Grimberg <sagig@mellanox.com> Date: Sun Mar 29 15:52:04 2015 +0300 iser-target: Fix possible deadlock in RDMA_CM connection error As I see it, we have a direct reference to the isert_device from isert_conn which is the one-liner fix that we actually need like we do in isert_rdma_read_done() and isert_rdma_write_done(). Reported-by: Andrea Righi <righi.andrea@gmail.com> Tested-by: Andrea Righi <righi.andrea@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesceJiang Yi
commit 1d6ef276594a781686058802996e09c8550fd767 upstream. This patch addresses a COMPARE_AND_WRITE se_device->caw_sem leak, that would be triggered during normal se_cmd shutdown or abort via __transport_wait_for_tasks(). This would occur because target_complete_cmd() would catch this early and do complete_all(&cmd->t_transport_stop_comp), but since target_complete_ok_work() or target_complete_failure_work() are never called to invoke se_cmd->transport_complete_callback(), the COMPARE_AND_WRITE specific callbacks never release caw_sem. To address this special case, go ahead and release caw_sem directly from target_complete_cmd(). (Remove '&& success' from check, to release caw_sem regardless of scsi_status - nab) Signed-off-by: Jiang Yi <jiangyilism@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27udf: Fix deadlock between writeback and udf_setsize()Jan Kara
commit f2e95355891153f66d4156bf3a142c6489cd78c6 upstream. udf_setsize() called truncate_setsize() with i_data_sem held. Thus truncate_pagecache() called from truncate_setsize() could lock a page under i_data_sem which can deadlock as page lock ranks below i_data_sem - e. g. writeback can hold page lock and try to acquire i_data_sem to map a block. Fix the problem by moving truncate_setsize() calls from under i_data_sem. It is safe for us to change i_size without holding i_data_sem as all the places that depend on i_size being stable already hold inode_lock. Fixes: 7e49b6f2480cb9a9e7322a91592e56a5c85361f5 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27udf: Fix races with i_size changes during readpageJan Kara
commit 9795e0e8ac0d6a3ee092f1b555b284b57feef99e upstream. __udf_adinicb_readpage() uses i_size several times. When truncate changes i_size while the function is running, it can observe several different values and thus e.g. expose uninitialized parts of page to userspace. Also use i_size_read() in the function since it does not hold inode_lock. Since i_size is guaranteed to be small, this cannot really cause any issues even on 32-bit archs but let's be careful. Fixes: 9c2fc0de1a6e638fe58c354a463f544f42a90a09 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27NFS: only invalidate dentrys that are clearly invalid.NeilBrown
commit cc89684c9a265828ce061037f1f79f4a68ccd3f7 upstream. Since commit bafc9b754f75 ("vfs: More precise tests in d_invalidate") in v3.18, a return of '0' from ->d_revalidate() will cause the dentry to be invalidated even if it has filesystems mounted on or it or on a descendant. The mounted filesystem is unmounted. This means we need to be careful not to return 0 unless the directory referred to truly is invalid. So -ESTALE or -ENOENT should invalidate the directory. Other errors such a -EPERM or -ERESTARTSYS should be returned from ->d_revalidate() so they are propagated to the caller. A particular problem can be demonstrated by: 1/ mount an NFS filesystem using NFSv3 on /mnt 2/ mount any other filesystem on /mnt/foo 3/ ls /mnt/foo 4/ turn off network, or otherwise make the server unable to respond 5/ ls /mnt/foo & 6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack 7/ kill -9 $! # this results in -ERESTARTSYS being returned 8/ observe that /mnt/foo has been unmounted. This patch changes nfs_lookup_revalidate() to only treat -ESTALE from nfs_lookup_verify_inode() and -ESTALE or -ENOENT from ->lookup() as indicating an invalid inode. Other errors are returned. Also nfs_check_inode_attributes() is changed to return -ESTALE rather than -EIO. This is consistent with the error returned in similar circumstances from nfs_update_inode(). As this bug allows any user to unmount a filesystem mounted on an NFS filesystem, this fix is suitable for stable kernels. Fixes: bafc9b754f75 ("vfs: More precise tests in d_invalidate") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27PNFS fix EACCESS on commit to DS handlingOlga Kornievskaia
commit a0bc01e0f1fa39702b5244b3bac699bea0d4f413 upstream. Commit fabbbee0eb0f "PNFS fix fallback to MDS if got error on commit to DS" moved the pnfs_set_lo_fail() to unhandled errors which was not correct and lead to a kernel oops on umount. Instead, fix the original EACCESS on commit to DS error by getting the new layout and re-doing the IO. Fixes: fabbbee0eb0f ("PNFS fix fallback to MDS if got error on commit to DS") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27NFS: Fix initialization of nfs_page_array->npagesBenjamin Coddington
commit 2eb3aea7d9c43325a12df312adfc7fb25bbd636b upstream. Commit 8ef9b0b9e1c0 open-coded nfs_pgarray_set(), and left out the initialization of the nfs_page_array's npages. This mistake didn't show up until testing with block layouts, and there shows that all pNFS reads return -EIO. Fixes: 8ef9b0b9e1c0 ("NFS: move nfs_pgarray_set() to open code") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27net/sunrpc/xprt_sock: fix regression in connection error reporting.NeilBrown
commit 3ffbc1d65583394be12801655781dd2b079ce169 upstream. Commit 3d4762639dd3 ("tcp: remove poll() flakes when receiving RST") in v4.12 changed the order in which ->sk_state_change() and ->sk_error_report() are called when a socket is shut down - sk_state_change() is now called first. This causes xs_tcp_state_change() -> xs_sock_mark_closed() -> xprt_disconnect_done() to wake all pending tasked with -EAGAIN. When the ->sk_error_report() callback arrives, it is too late to pass the error on, and it is lost. As easy way to demonstrate the problem caused is to try to start rpc.nfsd while rcpbind isn't running. nfsd will attempt a tcp connection to rpcbind. A ECONNREFUSED error is returned, but sunrpc code loses the error and keeps retrying. If it saw the ECONNREFUSED, it would abort. To fix this, handle the sk->sk_err in the TCP_CLOSE branch of xs_tcp_state_change(). Fixes: 3d4762639dd3 ("tcp: remove poll() flakes when receiving RST") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>