ui-smoke phase 3: screenshots, writable config mirrors, homing settle#4136
ui-smoke phase 3: screenshots, writable config mirrors, homing settle#4136grandixximo wants to merge 19 commits into
Conversation
|
For now the screenshots of the UI are uploaded, and it is up to the reviewers to check, if wanted we could save in source a screenshot to verify against that nothing changed, but then it will be a bit of a bother when anything is supposed to actually change, and for UI with multiple pages, we may want to verify with more than one screenshot at probably more than one resolution? that could take a while, or we can just call it done here, verify UI works with uploaded screenshot is already pretty good, |
fb9f031 to
186c3a3
Compare
|
Added the known-good image comparison. On a clean run each GUI's confirm.png is compared (ImageMagick) against a committed reference.png, and both the shot and a highlighted diff.png are uploaded as CI artifacts. It never fails a test: font/freetype versions differ across distros, so we just record the diff to gather experience first. Even same-machine the 3D preview drifts a few hundred pixels, a handy sanity check. Regenerate locally on a built RIP tree with tests/ui-smoke/_lib/make-references.sh (sets UI_SMOKE_UPDATE_REFERENCE=1). The four baselines were captured on Debian trixie; other distros will differ, which is what we want to observe. Also rebased onto master. @BsAtHome This should fit the shape we discussed in the meeting and by email. |
186c3a3 to
3523187
Compare
…pass Capture a PNG of the GUI before teardown and upload it as a CI artifact. On a failure the picture shows the cause (a GUI hung on a blocking modal leaves no core for crashdump.sh and no Python traceback, only a 60s timeout). On a clean Phase 2 run it is a confirmation shot (confirm.png) of the GUI in its post-movement idle state, so a reviewer can eyeball the final DRO / toolpath. Phase 1 passes stay silent (no program ran). _lib/screenshot.sh handles two capture paths: - GTK GUIs (axis, touchy, gmoccapy) render to the X server, grabbed with ImageMagick import (falling back to xwd|convert locally). There is no window manager under xvfb-run, so a GUI's maximize() is a no-op and it renders at its natural size; the Xvfb screen is sized 1920x1080 in launch-env.sh so the whole window is captured instead of clipped. - qtdragon runs the offscreen Qt platform (xcb segfaults qtvcp on Ubuntu 24.04), which never draws to X, so an X grab is black. Instead the qtdragon shim installs a SIGUSR1 handler that grabs its top-level window to UI_SMOKE_QT_SHOT; the launcher signals it and collects the file. The grab is a no-op with a logged reason when there is no display, no grabber, or no GUI to signal, so it can never turn a pass into a fail. screenshot.png, confirm.png and ui-smoke-qt.png are gitignored; imagemagick is declared under !nocheck; ci.yml uploads ui-smoke-screenshots-* on every run.
gmoccapy writes its preference file next to the config (config dir + <MACHINE>.pref, no PREFERENCE_FILE_PATH in gmoccapy.ini). The workspace is read-only for the runtime user on CI, so that write raises PermissionError partway through __init__, before the MDIHistory widget's _hal_init runs. gmoccapy pops an error dialog and limps on half built: the interp-idle handler then hits a widget with no .stat and throws a second dialog. The smoke test still passed (NML drove behind the dialogs), so the breakage was invisible until the Phase 3 confirm shot showed it. Mirror the config to a writable tmp dir (the fix qtdragon already uses) so the pref write succeeds. With the config writable the first-run "Important change(s)" modal then appears; it runs a nested gtk loop that never gets dismissed headless, sitting over the UI and swallowing the SIGTERM in the quit test. Seed hide_startup_messsage in the mirrored pref to suppress it, the same as a user ticking "Don't show this again". gmoccapy smoke and gmoccapy-quit now come up clean.
The driver requested AUTO the instant homing completed per NML stat, but gmoccapy only enables AUTO once it has processed the all-homed signal in its own event loop (and re-asserts MANUAL itself on that signal). The early request was rejected, bounced back to MANUAL with an "It is not possible to change to Auto Mode" warning that lingered in the confirm shot; ensure_mode retried and won, so the run still passed. A short settle after homing lets the GUI catch up, so AUTO is accepted first try.
On a clean run-program run we already grab confirm.png of the GUI in its post-movement idle state. This adds a committed per-GUI known-good reference.png and, on each run, an ImageMagick comparison that writes a highlighted diff.png and logs the differing-pixel count. The comparison NEVER fails a test: freetype/font versions differ across distros, so some drift is expected (the same machine already shows a few hundred AE pixels in the 3D toolpath preview). The diff is only recorded, not used to gate the test. compare.sh is sourced like screenshot.sh (logged no-op when it cannot run). make-references.sh regenerates the references locally via UI_SMOKE_UPDATE_REFERENCE=1. confirm.png/diff.png are gitignored and uploaded as CI artifacts; reference.png is committed. References for axis, gmoccapy, touchy and qtdragon were captured on Debian trixie.
…-get The ui-smoke crash dump helper used to apt-get install gdb on demand when a core landed and gdb was missing, which is awkward inside a test run (hdiethelm, PR LinuxCNC#4054 review). Add gdb to the build-essential block in .github/scripts/install-deps.sh so CI has it before any test starts, and drop the in-test apt-get from crashdump.sh: the helper now just checks for gdb and falls back to a 'gdb unavailable' note otherwise. A local developer who wants the native backtrace installs gdb the same way they installed any other build dep.
…or -u The -u option already skips test directories whose control file declares Restrictions: sudo, but a test still running under -u had no way to know its parent was run with -u and could not condition its own root-only side effects on it (hdiethelm, PR LinuxCNC#4054 review). Export RUNTESTS_NOSUDO=1 alongside the existing internal NOSUDO=true so child shells can branch on it. The ui-smoke crash dump helper uses this to skip the global kernel.core_pattern sysctl when -u is in effect, even when the suite is launched from a root shell.
Boots touchy on a 1024x768 screen and fails if its window is larger, guarding against a GUI growing off the display (touchy has no scrolling and has done this). window-fit.sh compares the window geometry to the root window via xwininfo; launch.sh runs it when UI_SMOKE_FIT_CLASS is set and folds a failure into the screenshot path. Needs x11-utils. Requires the touchy window-bounding fix to pass (red on current master).
6f15641 to
e423dea
Compare
Two runs of the axis ui-smoke in the same container produced 21665 differing pixels at 5 % fuzz because every antialiased glyph edge shifted subpixel between runs (axis has the most text of the four GUIs). The metric is intentionally informational, but a five-figure AE buries any real drift in noise. 40 % fuzz absorbs font and icon edge jitter and the AE drops to 77 pixels on the same pair, which is the genuine difference floor (subpixel rasterisation in the GL preview). Above 40 % the AE stays at 77. A widget that moves or a label that changes is still flagged loudly.
e423dea to
bf89855
Compare
LinuxCNC#4170 floors the touchy window height at 500 px so all notebook tabs are usable on first launch. Refresh the committed confirm reference so the ui-smoke comparison matches the new patched layout instead of the old ~270 px window. The reference is the exact image CI produces (taken from the ui-smoke-screenshots artifact), so it carries CI's font stack and the comparison starts from a zero baseline; a reference rendered on a developer machine has slightly different text metrics and shifts the whole layout a few pixels, ghosting every label in the diff. Also drop the fit screen size from 1024x768 to 1024x600: 1024x600 is the 7" touchscreen panel touchy actually targets, and the new 500 px floor still leaves room (touchy opens 984x500 inside 1024x600 so the window-fit assertion passes).
Replace the confirm references with the exact images CI produces (from the ui-smoke-screenshots artifact). The previous baselines were rendered on a developer machine and differed from CI by a few hundred pixels of font-edge jitter; adopting CI's own output makes the diff start from a zero baseline so a real regression stands out instead of hiding under the per-distro rendering noise floor.
bf89855 to
c4de595
Compare
|
Fedora: Have yet to test on Gentoo, still battling networking issues. |
|
I don't really understand the diff images. They have a gray tint and reveal the actual image. I expected them to be nearly or completely black. A (biased) difference is not interesting for which direction it goes. The absolute difference would be more telling. Essentially: for each pixel and color |
|
Anything that differs is bright red, I have not looked at the latest ones yet, if they are mostly greyish means no changes |
|
Tomorrow I will do some builds, and have different screenshots so we can pick a winner, at least now it seems to be working alright, the final look it's just a matter of taste 😉 |
| # (the suite must run unattended), and skip under runtests -u so a | ||
| # user opting out of root tweaks is respected even when invoking the | ||
| # suite from a root shell. Non-root falls back to a cwd "core". | ||
| if [ "$(id -u)" = 0 ] && [ "${RUNTESTS_NOSUDO:-0}" != "1" ]; then |
There was a problem hiding this comment.
Hmm, this makes no sense, checking if we are root and then if RUNTESTS_NOSUDO not doing it.
Here, you can only be root if you use sudo runtests right? However, sudo runtests will always fail.
I would just check for "${RUNTESTS_NOSUDO:-0}" != "1" and then do sudo sysctl
Did you ever test if you get a crashdump if there is a segfault?
|
I missed one, you have to add gdb here to: Suggestion: The same for the image tests, break a gui on purpose and see what happens. Having tests pass is good but having them fail when there is an issue is even better... ;-) Otherwise, the concept looks really good! Sadly I don't have the time right now to do a proper review. |
gdb was only added to .github/scripts/install-deps.sh, which runs on the Ubuntu rip-and-test jobs. The package-arch jobs build and test in debian containers and install build deps via apt-get build-dep, so they never got gdb and the ui-smoke crash dump path had none when a GUI cored there (hdiethelm, PR LinuxCNC#4136 review). Move gdb to debian/control.top.in under <!nocheck>, next to the other test-only deps (xvfb, x11-utils, imagemagick). build-dep installs it for both the rip jobs and the package-arch jobs, so the explicit entry in install-deps.sh is now redundant and dropped.
…race Testing the crash path (segfault injected into a GUI) surfaced two things in the crash dump helper. First, it globbed /tmp/core* and picked up a stale, root-owned core from an unrelated run, so gdb printed only 'Permission denied'. Second, the non-root case (CI, and local runtests -u) never produces a core at all: we will not sudo to point kernel.core_pattern at a writable dir, so nothing lands. Restrict the core search to a core the kernel wrote into our own fresh CORE_DIR, or a relative 'core' in the cwd that postdates arming, and require it to be readable. When there is no such core, say so and point at the Python faulthandler traceback in linuxcnc.err, which names the crash site and is the reliable signal in every environment. The native backtrace stays a best-effort extra for the root case. Verified: an injected GUI segfault now fails the test in ~20s (no hang), logs the Python traceback, and prints a clear 'no readable core dump' note instead of a misleading permission error.
|
@BsAtHome this is gmoccapy current method the icon and time differ axis current method the preview differs, the fonts I fuzzed a bit cause they were always coming up as positive all red with my reference picture, this is so when running locally you don't get all red
this is axis with absdiff with slightly different fonts, and the plot
this is with added fuzz on the absdiff method, only see plot difference
Both methods express the same thing in a different way, now that you see them next to each other, do you have a preference? |
76f47e8 to
ed7f125
Compare
|
@hdiethelm Moved gdb as suggested. The crashdump did work initially and was tested, but I did not run it again after recent permissions changes, my bad, I did testing again results are as follows Failure path: I injected a real segfault into a GUI and pushed it through CI to watch the actual runners. Good news: it fails fast (the test goes red in seconds, no hang), the Python faulthandler traceback in linuxcnc.err names the crash frame, the watchdog catches it with UI_SMOKE_FAIL: linuxcnc exited before the driver finished, and the failure screenshots still upload. Result line was 291 tests run, 290 successful, 1 failed. Your freeze was most likely the rtapi_app path: when the RT loader dies, linuxcnc hangs bringing HAL up rather than the GUI process exiting, so the launcher-PID watch never fires and only the outer timeout ends it, slowly. The GUI-crash path, which is what the crash dump is for, behaves. Testing it again also exposed and fixed two crash-dump bugs: it could pick up a stale root-owned core from /tmp (gdb then just printed Permission denied), and a non-root run produces no core at all since we will not sudo to set core_pattern. Fixed: only use a readable core from this run, otherwise say so and lean on the Python traceback. The native gdb backtrace stays a best-effort extra for when the suite runs as root. One note from the CI run: because the runners execute the suite non-root, the native backtrace path correctly no-ops and prints no readable core dump; see the Python traceback. The Python traceback is the crash signal CI actually delivers, and it does. |
|
Actually, I like both image diffs (without fuzz) when viewed together. Do we need to choose or can we have them both? |
|
Shouldn't be a problem to have both, can even have multiples fuzz levels, the text rendering I think goes away around 20% fuzz with no fuzz the text will always be almost all red, so it will be harder to spot if any labels were changed |
|
Instead of having them as separate diffs, I could collage everything in a single huge image, with tiles, where each tite is titled sensibly, like original and build next to each other on top, then more tiles for the "a-b diff" and the "red marks the diff" at 0 and 20% fuzz, 6 tiles, 2x3, I think that should give a proper overview, or am I overdoing it? |
|
Combining them in one image will make the lighter parts hide the darker parts (eye accommodation problem). Therefore, having them separate is best. The small text rendering differences are not bright red as long as they are just rendering differences. A real text change would make that much more visible. I think it be good to see small rendering changes because it says something about stability. Therefore, no fuzz would be fine. If we are to look at more differences, then we probably should look at differences between distributions. However, this is not a ui-smoke test problem. This would more be like a "let me see what I broke yesterday" matrix comparison. You'd need to collect images from all distros/versions and then be able to do a local RIP comparison. A different project ;-) |
Write two confirm-vs-reference diff images instead of one, both at 0%
fuzz so nothing is hidden:
diff.png red highlight over a faded reference (ImageMagick compare)
diff-abs.png absolute per-channel |b - a|, black where equal and
bright where changed
The absolute difference is unbiased (direction does not matter) and
preserves magnitude, which the red-on-faded-original loses. Both are
uploaded as CI artifacts (gcc and clang) and gitignored locally. The
AE pixel count stays informational; the comparison never gates a test.
Replace the axis and gmoccapy reference.png with the confirm shots from a clean CI run (run 27657962145 on this branch tip), so the committed known-good baseline is the CI ground truth everyone compares against rather than a developer-machine capture. The references are not a pass/fail gate; pinning them to CI just makes the recorded diffs meaningful across distros.
0fdde41 to
380e419
Compare
The offscreen grab could catch qtdragon in different startup states
between runs: the 3D preview at different zooms and the gcode view at
different scroll positions. Two distinct causes:
- The preview computes its zoom once, at file-load, from the live
widget size (glnav.py set_view_p reads winfo_width/height) and
never re-fits on resize (qt5_graphics.py resizeGL only resets the
GL viewport). Under offscreen the file load can run before the
layout settles, freezing a stale zoom from a transient size.
- The gcode editor resets to the top on load, then scrolls to the
running line on line-changed; which one the grab catches is timing.
Before grabbing, normalize at the now-final widget size: re-issue the
view fit (set_current_view) so the zoom is recomputed against the fixed
layout, and pin the gcode scrollbar to the top. This removes the large
preview-zoom and scroll swings (previously tens of thousands of
differing pixels). reference.png is captured from CI with this in place.
380e419 to
31c9888
Compare
Now gdb is installed with apt-get build-dep right? This just means everybody that builds linuxcnc will have gdb installed. But gdb is anyway needed to run the test, so fine in my opinion. An alternative would be to add it to the two places in the CI. If anyone has an issue with this, I can send you a patch to install it CI only.
Possibly I was not patient enough.
You can use sudo in CI, it has nopasswd set. Changing: |
crashdump_arm gated setting kernel.core_pattern on "id -u = 0", but the suite never runs as root (linuxcnc refuses to start as root), so the pattern was never pointed at our writable dir and the native gdb backtrace never ran, not even on CI where it could. Drop the root check and set core_pattern via sudo, which CI has passwordless; still skipped under runtests -u so a user opting out of root tweaks is respected. Verified on a real host with passwordless sudo: a GUI segfault now writes a core into our dir and the crashdump prints the gdb backtrace (the crash frame, ffi_call, g_main_loop_run, gtk_main) plus all-threads. In a docker container /proc/sys is read-only so the arm no-ops there and the run falls back to the Python traceback, as before. Suggested by hdiethelm.
|
You're right, the I tested the full path on a real host with passwordless sudo: a GUI segfault now writes a core into our dir and the crashdump prints the gdb backtrace (the crash frame, ffi_call, g_main_loop_run, gtk_main) plus all-threads. So CI should now produce a native backtrace instead of the "no readable core dump" fallback. (In a docker container One open question for you: |
Under CI load a GUI can still be initializing when the confirm shot is grabbed: axis has been caught with its Manual Control widgets not yet built and the program not loaded, and separately with the Max Velocity slider still syncing from the [DISPLAY] limit (300) down to the [TRAJ] limit (240). Replace the fixed pre-grab sleep with screenshot_grab_settled, which re-grabs the root window until two consecutive frames match within a threshold (absorbing 3D-preview anti-aliasing and a ticking clock) or a timeout hits, then keeps the last frame. Offscreen qtdragon normalizes itself in the grab shim, so it still grabs once. Verified in the CI container: axis and gmoccapy settle in normal time and produce a fully-built shot.
The An other option: Add a new option in scripts/runtests.in that enables crashdumps like Looking at your crashdump script: Instead of having it exclusive for your UI tests: Why not enable it globally for all tests in |
With the settle grab in place, axis reliably captures its post-clamp state: the Max Velocity slider shows 240 in/min (the [TRAJ] limit) rather than the 300 transient from [DISPLAY] that the old grab sometimes caught. Both the gcc and clang CI jobs now agree on 240, so adopt that confirm (run 27734628992) as the reference.
Arming kernel.core_pattern via sudo on every run is not friendly to someone running the suite outside a VM, since it leaves that global setting changed afterwards. Gate the whole crash dump behind a new runtests -d (ENABLE_CRASHDUMPS=1), off by default, and pass -d in the three CI invocations. Without -d, crashdump_arm/report no-op: no sudo, no core_pattern change. The Python faulthandler traceback is unaffected and remains the always-present crash signal. Per hdiethelm review.
|
Good call, done. Crash dumps are now opt-in behind On enabling it globally for every test in |







Builds on the merged phase 2 (#4054). This phase makes the ui-smoke runs more diagnosable and adds writable-config handling so the GUIs start cleanly headless.
What's here:
hide_startup_messsageso the "Important change(s)" modal stays hidden (the same effect as ticking "Don't show this again"). This is the headless-start fix discussed in gmoccapy: modal startup "Important change(s)" dialog blocks headless / non-interactive startup #4072.Scope note: the touchy window-fit smoke test is intentionally held back on a separate branch, since it depends on the touchy fitting changes in #4131 which are not merged yet. I will send it as a follow-up once #4131 lands.