commit 3cae0a01d0a60dca3d3aa089c6a19d52243c75b9
Author: Benjamin Kaduk
Date:   Mon Dec 4 18:14:22 2017 -0600

    Update NEWS for rx security fix

    Change-Id: I30282ac8f51a7b16dd851fdbd41464f8fdafc279

commit eae2575dc738bd69bb6a0a84f87f02f5cf2b4eb9
Author: Benjamin Kaduk
Date:   Mon Dec 4 17:20:57 2017 -0600

    OPENAFS-SA-2017-001: rx: Sanity-check received MTU and twind values

    Rather than blindly trusting the values received in the
    (unauthenticated) ack packet trailer, apply some minimal sanity
    checks to received values.

    natMTU and regular MTU values are subject to Rx minimum/maximum
    packet sizes, and the transmit window cannot drop below one without
    risk of deadlock.

    The maxDgramPackets value that can also be present in the trailer
    already has sufficient sanity checking.

    Extremely low MTU values (less than 28 == RX_HEADER_SIZE) can cause
    us to set a negative "maximum usable data" size that gets used as an
    (unsigned) packet length for subsequent allocation and computation,
    triggering an assertion when the connection is used to transmit data.

    FIXES 134450

    (cherry picked from commit 894555f93a2571146cb9ca07140eb98c7a424b01)

    Change-Id: I98e2a65d1aa291a73e8cfed9c9eaac71c6af00dc

commit 352fbc811162fcdaa39cb7834475f40ba72fad11
Author: Benjamin Kaduk
Date:   Wed Nov 8 07:11:45 2017 -0600

    Make OpenAFS 1.8.0pre3

    Update the version strings for the third 1.8.0 prerelease.

    Change-Id: I25a4eee4de04e57ffcf9055f69ae9a3d683b8d64
    Reviewed-on: https://gerrit.openafs.org/12765
    Reviewed-by: Stephan Wiesand
    Tested-by: BuildBot
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk

commit 1efc44f397efb647a55347cb71e7d41c050f6c3c
Author: Benjamin Kaduk
Date:   Mon Nov 6 21:30:04 2017 -0600

    Update NEWS for 1.8.0pre3

    Change-Id: I38110825cbe8b5c4ca18d86e4542374ae26f6fd4
    Reviewed-on: https://gerrit.openafs.org/12764
    Reviewed-by: Stephan Wiesand
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk
    Reviewed-by: Michael Meffie

commit e2c47cae56ba0d804af119fb158a9fe77fa3a15e
Author: Benjamin Kaduk
Date:   Mon Nov 27 22:17:28 2017 -0600

    afs: Fix bounds check in PNewCell

    Reported by the opensuse buildbot:

    CC [M] /home/buildbot/opensuse-tumbleweed-i386-builder/build/src/libafs/MODLOAD-4.13.12-1-default-MP/rx_packet.o
    /home/buildbot/opensuse-tumbleweed-i386-builder/build/src/afs/afs_pioctl.c: In function ‘PNewCell’:
    /home/buildbot/opensuse-tumbleweed-i386-builder/build/src/afs/afs_pioctl.c:3075:55: error: ‘*’ in boolean context, suggest ‘&&’ instead [-Werror=int-in-bool-context]
      if ((afs_pd_remaining(ain) < AFS_MAXCELLHOSTS +3) * sizeof(afs_int32))
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~

    The bug was introduced in commit 718f85a8b6.

    Reviewed-on: https://gerrit.openafs.org/12782
    Tested-by: BuildBot
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit 4fa0ee620cfb9991ca9748b5ee116cc8e1e6c505)

    Change-Id: I0963403846a62dddf2d13ce3c03d772a6d869119
    Reviewed-on: https://gerrit.openafs.org/12784
    Reviewed-by: Michael Meffie
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 6e611c56c5b910e329c20c3a20ed2ba5755b0461
Author: Benjamin Kaduk
Date:   Mon Nov 27 22:07:53 2017 -0600

    rx: fix call refcount leak in error case

    The recent event handling normalization in commit
    304d758983b499dc568d6ca57b6e92df24b69de8 had event handlers switch
    to dropping their reference on the associated connection/call just
    before return.
    An early return case was missed in the conversion, leading to a
    refcount leak in an error case.

    Reviewed-on: https://gerrit.openafs.org/12781
    Tested-by: BuildBot
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit 66b74e78ba5fea6a8236dcd3b8b46e1dfa6a0ac7)

    Change-Id: I532c49b2ef6ec95dd26a99c02e12ea53348f9690
    Reviewed-on: https://gerrit.openafs.org/12783
    Reviewed-by: Michael Meffie
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit ad11867973dc2481ee4897540a6d9279ebf36c42
Author: Marcio Barbosa
Date:   Thu Nov 16 17:24:03 2017 -0500

    afs: fix kernel_write / kernel_read arguments

    The order / content of the arguments passed to kernel_write and
    kernel_read are not right. As a result, the kernel will panic if one
    of the functions in question is called.

    [kaduk@mit.edu: include configure check for multiple kernel_read()
    variants, per linux commits bdd1d2d3d251c65b74ac4493e08db18971c09240
    and e13ec939e96b13e664bb6cee361cc976a0ee621a]

    FIXES 134440

    Reviewed-on: https://gerrit.openafs.org/12769
    Tested-by: BuildBot
    Tested-by: Marcio Brito Barbosa
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit 3ce55426ee6912b78460465bcaa1428333ad1fbc)

    Change-Id: I28f04f7625a471c37f98515d5186f80082bf6a43
    Reviewed-on: https://gerrit.openafs.org/12780
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 42993b3a33d53a6e16337d2ebe15539d0febdef1
Author: Michael Meffie
Date:   Mon Nov 6 17:37:46 2017 -0500

    tests: fix out of bounds access in the rx-event test

    Use the NUMEVENTS symbol which defines the array size instead of an
    incorrect hard coded number when checking if a second event can be
    added to be fired at the same time. This fixes a potential out of
    bounds access of the event test array.

    Also update the comment, which mentions the incorrect number of
    events in the test.

    Reviewed-on: https://gerrit.openafs.org/12762
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot
    (cherry picked from commit 50a3eb7b7ee94bffaadc98429bd404164e89ec7f)

    Change-Id: I7a975e7498c1c7416a800c9294c97ee4de4fd57a
    Reviewed-on: https://gerrit.openafs.org/12779
    Tested-by: BuildBot
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk

commit 6c635a66b54bcfa2920ac532905758346c89c772
Author: Benjamin Kaduk
Date:   Thu Nov 16 04:49:49 2017 -0600

    Sprinkle rx_GetConnection() for concision

    Instead of inlining the body (taking the lock, incrementing the
    refcount, and dropping the lock), use the convenience function
    designed for this purpose.

    Reviewed-on: https://gerrit.openafs.org/12772
    Reviewed-by: Mark Vitale
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit 2ae84bf053fe66b73a2c77b5d71305bae2c17587)

    Change-Id: I60794d877a76fbb7c8ba59207e710a20641cc8f1
    Reviewed-on: https://gerrit.openafs.org/12778
    Tested-by: BuildBot
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk

commit 667617b8702e797e34cc957ef200a803030ee901
Author: Benjamin Kaduk
Date:   Thu Nov 16 04:48:02 2017 -0600

    rx: fix mutex leak in error case

    Reported by Mark Vitale

    Reviewed-on: https://gerrit.openafs.org/12771
    Reviewed-by: Mark Vitale
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit 01bcfd3e14f6ee1faa4b8ce5a7932de37d585fd3)

    Change-Id: I4384d6813a5cfb053e6991eb3c157fa59ecfa11b
    Reviewed-on: https://gerrit.openafs.org/12777
    Tested-by: BuildBot
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk

commit 4220eadae01d09b2c54e16c689b1a58e558db19c
Author: Benjamin Kaduk
Date:   Tue Oct 31 19:49:09 2017 -0500

    Add event-related mutex assertions

    In utility functions that access fields of type struct rxevent *,
    assert that the appropriate lock is held for the access in question.

    These assertions are only compiled in when built with
    -DOPR_DEBUG_LOCKS, which can be enabled by --debug-locks at
    configure time.

    Reviewed-on: https://gerrit.openafs.org/12757
    Reviewed-by: Mark Vitale
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit a7a3108e602c83176c5578c9f28b6312f71aba78)

    Change-Id: I147a2e475feffb1b75a08ac5b08614bd6d8f46a5
    Reviewed-on: https://gerrit.openafs.org/12776
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk

commit 8ce3b5e253d980ebab34c3928720d1097b1ba342
Author: Benjamin Kaduk
Date:   Sat Oct 7 22:42:38 2017 -0500

    Standardize rx_event usage

    Go over all consumers of the rx event framework and normalize its
    usage according to the following principles:

    rxevent_Post() is used to create an event, and it returns an event
    handle (with a reference on the event structure) that can be used to
    cancel the event before its timeout fires. (There is also an
    additional reference on the event held by the global event tree.)
    In all(*) usage within the tree, that event handle is stored within
    either an rx_connection or an rx_call. Reads/writes to the member
    variable that holds the event handle require either the
    conn_data_lock or call lock, respectively -- that means that in most
    cases, callers of rxevent_Post() and rxevent_Cancel() will be
    holding one of those aforementioned locks. The event handlers
    themselves will need to modify the call/connection object according
    to the nature of the event, which requires holding those same locks,
    and also a guarantee that the call/connection is still a live object
    and has not been deallocated! Whether or not rxevent_Cancel()
    succeeds in cancelling the event before it fires, whenever passed a
    non-NULL event structure it will NULL out the supplied pointer and
    drop a reference on the event structure. This is the correct
    behavior, since the caller has asked to cancel the event and has no
    further use for the event handle or its reference on the event
    structure. The caller of rxevent_Cancel() must check its return
    value to know whether or not the event was cancelled before its
    handler was able to run.

    The interaction between the call/connection lock and the lock
    protecting the red/black tree of pending events opens up a somewhat
    problematic race window. Because the application thread is expected
    to hold the call/connection lock around rxevent_Cancel() (to protect
    the write to the field in the call/connection structure that holds
    an event handle), and rxevent_Cancel() must take the lock protecting
    the red/black tree of events, this establishes a lock order with the
    call/connection lock taken before the eventTree lock. This is in
    conflict with the event handler thread, which must take the
    eventTree lock first, in order to select an event to run (and thus
    know what additional lock would need to be taken, by virtue of what
    handler function is to be run). The conflict is easy to resolve in
    the standard way, by having a local pointer to the event that is
    obtained while the event is removed from the red/black tree under
    the eventTree lock, and then the eventTree lock can be dropped and
    the event run based on the local variable referring to it.

    The race window occurs when the caller of rxevent_Cancel() holds the
    call/connection lock, and rxevent_Cancel() obtains the eventTree
    lock just after the event handler thread drops it in order to run
    the event. The event handler function begins to execute, and
    immediately blocks trying to obtain the call/connection lock. Now
    that rxevent_Cancel() has the eventTree lock it can proceed to
    search the tree, fail to find the indicated event in the tree, clear
    out the event pointer from the call/connection data structure, drop
    its caller's reference to the event structure, and return failure
    (the event was not cancelled). Only then does the caller of
    rxevent_Cancel() drop the call/connection lock and allow the event
    handler to make progress.

    This race is not necessarily problematic if appropriate care is
    taken, but in the previous code such was not the case.
    In particular, it is a common idiom for the firing event to call
    rxevent_Put() on itself, to release the handle stored in the
    call/connection that could have been used to cancel the event before
    it fired. Failing to do so would result in a memory leak of event
    structures; however, rxevent_Put() does not check for a NULL
    argument, so a segfault (NULL dereference) was observed in the test
    suite when the race occurred and the event handler tried to
    rxevent_Put() the reference that had already been released by the
    unsuccessful rxevent_Cancel() call. Upon inspection, many (but not
    all) of the uses in rx.c were susceptible to a similar race
    condition and crash.

    The test suite also papers over a related issue in that the event
    handler in the test suite always knows that the data structure
    containing the event handle will remain live, since it is a global
    array that is allocated for the entire scope of the test. In rx.c,
    events are associated with calls and connections that have a finite
    lifetime, so we need to take care to ensure that the call/connection
    pointer stored in the event remains valid for the duration of the
    event's lifecycle. In particular, even an attempt to take the
    call/connection lock to check whether the corresponding event field
    is NULL is fraught with risk, as it could crash if the lock (and
    containing call/connection) has already been destroyed! There are
    several potential ways to ensure the liveness of the associated
    call/connection while the event handler runs, most notably to take
    care in the call/connection destruction path to ensure that all
    associated events are either successfully cancelled or run to
    completion before tearing down the call/connection structure, and to
    give the pending event its own reference on the associated
    call/connection. Here, we opt for the latter, acknowledging that
    this may result in the event handler thread doing the full
    call/connection teardown and delay the firing of subsequent events.
    This is deemed acceptable, as pending events are for intentionally
    delayed tasks, and some extra delay is probably acceptable. (The
    various keepalive events and the challenge event could delay the
    user experience and/or security properties if significantly delayed,
    but I do not believe that this change admits completely unbounded
    delay in the event handler thread, so the practical risk seems
    minimal.)

    Accordingly, this commit attempts to ensure that:

    * Each event holds a formal reference on its associated
      call/connection.
    * The appropriate lock is held for all accesses to event pointers in
      call/connection structures.
    * Each event handler (after taking the appropriate lock) checks
      whether it raced with rxevent_Cancel() and only drops the
      call/connection's reference to the event if the race did not
      occur.
    * Each event handler drops its reference to the associated
      call/connection *after* doing any actions that might access/modify
      the call/connection.
    * The per-event reference on the associated call/connection is
      dropped by the thread that removes the event from the red/black
      tree. That is, the event handler function if the event runs, or by
      the caller of rxevent_Cancel() when the cancellation succeeds.
    * No non-NULL event handles remain in a call/connection being
      destroyed, which would indicate a refcounting error.

    (*) There is an additional event used in practice, to reap old
    connections, but it is effectively a background task that
    reschedules itself periodically, with no handle to the event
    retained so as to be able to cancel it. As such, it is unaffected by
    the concerns raised here.

    While here, standardize on the rx_GetConnection() function for
    incrementing the reference count on a connection object, instead of
    inlining the corresponding mutex lock/unlock and variable access.

    In contrast to what was done on master, for the 1.8 branch we do not
    force-enable refcount checking.
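
    The reference-counting rules listed in this message can be sketched
    with a small single-threaded model. This is only an illustration
    under stated assumptions: every toy_* name is hypothetical and is
    not part of the rx API, locking is reduced to comments, and the
    still_in_tree flag stands in for the red/black tree lookup that the
    real rxevent_Cancel() performs. It also assumes the event thread
    drops the tree's reference after the handler returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real rx structures. */
struct toy_event { int refcount; };
struct toy_call { int refcount; struct toy_event *timer; };

static void toy_event_put(struct toy_event *ev)
{
    assert(ev != NULL && ev->refcount > 0);
    ev->refcount--;
}

static void toy_call_put(struct toy_call *call)
{
    assert(call->refcount > 0);
    call->refcount--;
}

/* Post: one event reference for the tree, one for the handle stored in
 * the call, plus one call reference owned by the pending event. */
static void toy_post(struct toy_call *call, struct toy_event *ev)
{
    ev->refcount = 2;
    call->refcount++;
    call->timer = ev;
}

/* Cancel: always clears the handle and drops the handle's reference.
 * Returns 1 only if the event was still in the tree, in which case the
 * canceller also drops the tree's reference and the event's call ref. */
static int toy_cancel(struct toy_call *call, int still_in_tree)
{
    struct toy_event *ev = call->timer;

    if (ev == NULL)
        return 0;
    call->timer = NULL;
    toy_event_put(ev);
    if (!still_in_tree)
        return 0;               /* lost the race; the handler will run */
    toy_event_put(ev);          /* the tree's reference */
    toy_call_put(call);         /* the event's reference on the call */
    return 1;
}

/* Handler: only releases the handle's reference if cancel did not
 * already do so, and drops the call reference last. */
static void toy_handler(struct toy_call *call, struct toy_event *ev)
{
    /* The call lock would be taken here. */
    if (call->timer == ev) {
        call->timer = NULL;
        toy_event_put(ev);
    }
    /* ... event-specific work on the call would go here ... */
    toy_call_put(call);
}

/* Event thread: the event has already been removed from the tree. */
static void toy_fire(struct toy_call *call, struct toy_event *ev)
{
    toy_handler(call, ev);
    toy_event_put(ev);          /* the tree's reference */
}
```

    In this model, a failed toy_cancel() followed by toy_fire() leaves
    the event with zero references and the call with only its original
    one, which is the outcome the rules above are meant to guarantee.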

    Reviewed-on: https://gerrit.openafs.org/12756
    Reviewed-by: Mark Vitale
    Reviewed-by: Michael Meffie
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit 304d758983b499dc568d6ca57b6e92df24b69de8)

    Change-Id: I68e6cc162a148b6ebbabe037a7bc3cccd648423c
    Reviewed-on: https://gerrit.openafs.org/12775
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot

commit 6db2c0a111336a24199c0acf4e02635c97f4ff2b
Author: Benjamin Kaduk
Date:   Wed Oct 4 23:03:44 2017 -0500

    Adjust rx-event test to exercise cancel/fire race

    We currently do not properly handle the case where a thread runs
    rxevent_Cancel() in parallel with the event-handler thread
    attempting to fire that event, but the test suite only picked up on
    this issue in a handful of the Debian automated builds (somewhat
    less-resourced ones, perhaps).

    Modify the event scheduling algorithm in the test so as to create a
    larger chunk of events scheduled to fire "right away" and thereby
    exercise the race condition more often when we proceed to cancel a
    quarter of events "right away".

    Reviewed-on: https://gerrit.openafs.org/12755
    Tested-by: BuildBot
    Reviewed-by: Mark Vitale
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit bdb509fb1d8e0fdca05dffecdbcbf60a95ea502e)

    Change-Id: I27cebed3c2c3daff10b8d3f5f6f949e667791a72
    Reviewed-on: https://gerrit.openafs.org/12774
    Tested-by: BuildBot
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk

commit 527ead6fdd8acd241db04cf4c43677248b59d164
Author: Michael Laß
Date:   Thu Nov 2 21:16:49 2017 +0100

    gtx: link against libtinfo if termlib is separated

    If ncurses is built with "./configure --with-termlib=tinfo", gtx
    fails to link because of an undefined reference to the LINES symbol
    which is then provided by libtinfo.so and not libncurses.so.

    If ncurses is present, additionally check whether LINES is provided
    by ncurses or tinfo and set $LIB_curses accordingly.

    This change is based on a patch provided by Bastian Beischer.

    FIXES 134420

    Reviewed-on: https://gerrit.openafs.org/12760
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit 311f1d28a2f626350b33ad432e674055b62511bd)

    Change-Id: I2f69fe51bbefeeb2a17145a88aa9c891644f2f61
    Reviewed-on: https://gerrit.openafs.org/12763
    Tested-by: BuildBot
    Reviewed-by: Michael Laß
    Reviewed-by: Benjamin Kaduk

commit d93f80622370f50d7bce5c5b00cd062f15ee9eba
Author: Damien Diederen
Date:   Mon Sep 18 12:18:39 2017 +0200

    Linux: Use kernel_read/kernel_write when __vfs variants are unavailable

    We hide the uses of set_fs/get_fs behind a macro, as those functions
    are likely to soon become unavailable:

    > Christoph Hellwig suggested removing all calls outside of the core
    > filesystem and architecture code; Andy Lutomirski went one step
    > further and said they should all go.

    https://lwn.net/Articles/722267/

    Reviewed-on: https://gerrit.openafs.org/12729
    Tested-by: BuildBot
    Reviewed-by: Mark Vitale
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit 5ee516b3789d3545f3d78fb3aba2480308359945)

    Change-Id: I28a7126bf6ab048f8d949f190e557a3fa44f3f46
    Reviewed-on: https://gerrit.openafs.org/12737
    Tested-by: BuildBot
    Reviewed-by: Michael Meffie
    Reviewed-by: Mark Vitale
    Reviewed-by: Benjamin Kaduk

commit c42a25d28fbcc76fdcac2b5f29704f8f1b353b45
Author: Damien Diederen
Date:   Mon Sep 18 11:59:40 2017 +0200

    Linux: Test for __vfs_write rather than __vfs_read

    The following commit:

        commit eb031849d52e61d24ba54e9d27553189ff328174
        Author: Christoph Hellwig
        Date:   Fri Sep 1 17:39:23 2017 +0200

            fs: unexport __vfs_read/__vfs_write

    unexports both __vfs_read and __vfs_write, but keeps the former in
    fs.h--as it is still being used by another part of the tree.

    This situation results in a false positive in our Autoconf check,
    which does not see the export statements, and ends up marking the
    corresponding API as available. That, in turn, causes some code
    which assumes symmetry with __vfs_write to fail to compile.

    Switch to testing for __vfs_write, which correctly marks the API as
    unavailable.

    Reviewed-on: https://gerrit.openafs.org/12728
    Tested-by: BuildBot
    Reviewed-by: Benjamin Kaduk
    (cherry picked from commit 929e77a886fc9853ee292ba1aa52a920c454e94b)

    Change-Id: I03e3c8222360a6b04b45b45a8f56b5df054f6783
    Reviewed-on: https://gerrit.openafs.org/12736
    Tested-by: BuildBot
    Reviewed-by: Mark Vitale
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk

commit bc384b7d5e4818c567a64fe4a935f021d936444f
Author: Benjamin Kaduk
Date:   Mon Oct 16 16:53:22 2017 -0500

    Correct m4 conditionals in curses.m4

    AS_IF does not invoke the test(1) shell builtin for us, so we must
    take care to consistently use it ourselves.

    While here, sprinkle some missing double-quotes around variable
    expansions in AS_IF statements in this file.

    Submitted by Bastian Beischer.

    FIXES 134414

    Change-Id: Iccfe311011f17de6317cf64abdc58b0812b81b8c
    Reviewed-on: https://gerrit.openafs.org/12738
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk
    Tested-by: Benjamin Kaduk
    (cherry picked from commit e0c5ada214596d5adb6798682d5e280cc99f447c)
    Reviewed-on: https://gerrit.openafs.org/12739

commit 688b3570867cda3035ec6bcd9c7538cf651f38f6
Author: Anders Kaseorg
Date:   Fri Sep 1 23:37:07 2017 -0400

    vol: Fix two buffers being one char too short

    Fixes these warnings:

    namei_ops.c: In function 'namei_copy_on_write':
    namei_ops.c:1328:31: warning: 'snprintf' output may be truncated before the last format character [-Wformat-truncation=]
      snprintf(path, sizeof(path), "%s-tmp", name.n_path);
                                   ^~~~~~~~
    namei_ops.c:1328:2: note: 'snprintf' output between 5 and 260 bytes into a destination of size 259
      snprintf(path, sizeof(path), "%s-tmp", name.n_path);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    vol_split.c: In function 'split_volume':
    vol_split.c:576:22: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
         sprintf(symlink, "#%s", V_name(newvol));
                          ^~~~~
    vol_split.c:576:5: note: 'sprintf'
    output between 2 and 33 bytes into a destination of size 32
         sprintf(symlink, "#%s", V_name(newvol));
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Reviewed-on: https://gerrit.openafs.org/12722
    Reviewed-by: Benjamin Kaduk
    Tested-by: BuildBot
    (cherry picked from commit 0a9a6b57ce6e1c97fcc651c8cb74e66fc8422a1e)

    Change-Id: Ia60439aed7925b786a0213d96a7afb413579e01f
    Reviewed-on: https://gerrit.openafs.org/12723
    Tested-by: BuildBot
    Reviewed-by: Michael Meffie
    Reviewed-by: Benjamin Kaduk
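
The off-by-one reported in the "vol: Fix two buffers being one char too short" warnings above comes from forgetting that the destination must hold the formatted text plus a terminating NUL. A minimal sketch of how such truncation can be detected at run time follows; make_tmp_path() is a hypothetical helper written for this illustration and does not appear in the OpenAFS tree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the OpenAFS sources): build "<name>-tmp"
 * in dst. Returns 0 on success and -1 if the result did not fit.
 * snprintf() returns the length the full output would have had, so a
 * return value >= dstlen means the output was truncated.
 */
static int make_tmp_path(char *dst, size_t dstlen, const char *name)
{
    int n;

    assert(dst != NULL && dstlen > 0);
    n = snprintf(dst, dstlen, "%s-tmp", name);
    if (n < 0 || (size_t)n >= dstlen)
        return -1;
    return 0;
}
```

With a four-character name the output "abcd-tmp" is 8 characters long, so an 8-byte buffer is one byte too short and a 9-byte buffer is exactly large enough, which mirrors the 259 versus 260 byte case in the warning.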