think

2024-07-08 21:17:12 +01:00 · 2024-07-08 21:17:12 +01:00 · c75452549b
commit c75452549b
parent 2663f58807
1 changed files with 212 additions and 0 deletions
--- a/THOUGHTS.txt
+++ b/THOUGHTS.txt
@@ -5161,3 +5161,215 @@ for s in $(s6-rc-db -d all-dependencies $service); do
    esac
  done
 done
+
+Sun Jun 16 23:13:53 BST 2024
+
+what we are trying to do is set up an l2tp by hostname
+
+1) this means looking up the hostname in the dns
+2) this means having a route to the dns server
+3) this means parsing the space-separated list of dns servers
+  provided by dhcp
+
+we could write the servers each into their own file, but that
+helps less than you'd think unless we give those files predictable
+names
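a minimal sketch of the "predictable names" idea, assuming the dhcp client
hands us the servers as one space-separated string. the function name
write_dns_servers and the dns-0, dns-1, ... file naming are made up for
illustration, not something that exists in the tree

```shell
# Hypothetical helper: split a space-separated list of dns servers
# into files dns-0, dns-1, ... under the directory given as $1.
write_dns_servers() {
  dir=$1
  servers=$2
  mkdir -p "$dir" || return 1
  # drop entries left over from a previous lease
  rm -f "$dir"/dns-*
  i=0
  # $servers is unquoted on purpose: word splitting separates the addresses
  for server in $servers; do
    printf '%s\n' "$server" > "$dir/dns-$i"
    i=$((i + 1))
  done
}
```

a consumer that only needs one server can then read dns-0 without
having to parse the list itself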
+
+Thu Jun 20 10:16:52 BST 2024
+
+now we have l2tp-over-wwan, we need to do the failover mechanism
+
+- can't have both l2tp and pppoe running at once (at least for aaisp)
+  because the same creds are used for both, and starting l2tp will
+  cause them to route all traffic to the l2tp instead of the FTTx
+
+- we could have the wwan stick permanently configured and ready to go,
+  as long as we're not actively using it unless the main connection is
+  b0rked
+
+- can we have the same odhcp stuff running and point it to either?
+  maybe renaming the wan interface would be an easy-ish way to do this
+
+we need some kind of health check on the main connection that will
+bring up the backup if e.g. packet loss over x%. Or is lcp echo good
+enough here? for multipath to the same backhaul, if some weird routing
+cockup makes google unavailable from the main connection it will most
+likely also be unavailable from the backup, so lcp echo is arguably better
+
+
+on a side note, use of shell functions to get the output from another
+service is a bit icky
+
+Fri Jun 21 21:05:21 BST 2024
+
+We can have a controller with two controlled services, which runs the
+second one when the first one isn't working.
+
+how do we connect the dependent services (dhcp pd, defaultroute, anything
+else dependent on wan) to the correct upstream?
+
+we can't use bundles because bundles just flatten to atomic services, there's
+no either/or there
+
+controller
+  - main service
+  - backup service
+  - proxy service
+
+The proxy service is running when one of the main or backup services is
+up.  It provides all the outputs of whichever backend service is active
+
+https://skarnet.org/software/s6/s6-svwait.html
+
+proxy could use "s6-svwait -U -o main backup" to wait for one of the two
+backend services, provided that both are longruns
+
+so in the controller we start main-service, and if/when that fails start
+backup-service. we run proxy-service if any of the backend services is
+running, and use its outputs to indicate which.
+
+the proxy could just symlink to the backing service outputs directory,
+or it could copy and translate if the main and backup services have
+different outputs, so that it presents a common interface. I'm not
+sure proxy is the best name but I haven't thought of a better.
+
+we can do a manual switch back to main-service by restarting the
+controller.  we could do an automatic switch by adding logic to the
+controller to make it restart itself.
+
+perhaps the controller has an output that indicates which backend is
+active, then the proxy just needs to look at that to figure which one to
+use.
+
+while true; do
+  if s6-rc -u change $primary; then # will wait until succeeded, or exit 1 if timeout
+    ln -sf $primary outputs/active
+    s6-rc -u change $proxy
+  elif s6-rc -u change $secondary;  then
+    ln -sf $secondary outputs/active
+    s6-rc -u change $proxy
+  else
+    rm -f outputs/active
+    s6-rc -d change $proxy
+  fi
+  # wait for the backend to die (down cleanup will
+  # remove outputs directory)
+  while test -d outputs/active/.outputs; do
+    inotifywait outputs/active/.outputs
+  done
+  rm -f outputs/active
+  s6-rc -d change $proxy
+done
+
+this script will, when the primary dies, attempt to start the primary
+again: only if it doesn't come up does it start the secondary
+
+if the primary comes up and then goes down later, we'll start it
+again - which isn't what we want. When the primary dies, we
+want to try the secondary next
+
+backends="primary secondary tertiary etc"
+# trailing space so that the last name can be split off as well
+rest="$backends "
+while true ; do
+  first="${rest%% *}"
+  rest="${rest#* }"
+  if test -n "$first"; then
+    if s6-rc -u change $first; then
+      ln -sf $first outputs/active
+      s6-rc -u change $proxy
+
+      while test -d outputs/active/.outputs; do
+	inotifywait outputs/active/.outputs
+      done
+    fi
+    rm -f outputs/active
+    s6-rc -d change $proxy
+  else
+    rest="$backends "
+  fi
+done
+
+in this version when the secondary dies then we try the third backend
+(round-robin). are there circumstances where we'd rather retry the primary?
+Presumably there are circumstances where we would _not_ rather
+retry the primary, otherwise why are we even providing a tertiary?
+If we could answer that question then we'd know.
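to make the choice concrete, here is a sketch that isolates just the
"which backend next?" decision. next_backend is a hypothetical helper,
not part of s6-rc or of the controller script: it implements the
round-robin order, whereas a "retry the primary" policy would always
print the first entry of the list instead

```shell
# next_backend LIST CURRENT
# Print the entry that follows CURRENT in the space-separated LIST,
# wrapping round to the first entry after the last one (or when CURRENT
# is empty or not in the list).
next_backend() {
  list=$1
  current=$2
  first=
  found=
  for b in $list; do
    if [ -z "$first" ]; then
      first=$b
    fi
    if [ -n "$found" ]; then
      printf '%s\n' "$b"
      return 0
    fi
    if [ "$b" = "$current" ]; then
      found=1
    fi
  done
  printf '%s\n' "$first"
}
```

e.g. next_backend "primary secondary tertiary" secondary gives
tertiary, and asking for the one after tertiary wraps round to primary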
+
+
+Mon Jun 24 21:22:34 BST 2024
+
+the controller needs to know the names of its backends, which is ugly
+if they're computed names because we can't define the services themselves
+first without their references to the controller
+
+mutual recursion ... maybe it's time to understand how this fixpoint
+thing works
+
+Wed Jun 26 22:16:25 BST 2024
+
+s6 will restart the pppoe service when it dies, and keep doing this
+indefinitely - unless the ./finish script returns 125. Note that this
+is only true for longruns, but it's not as though oneshots can die
+anyway as there's no process to fail.
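a sketch of how a ./finish script might use that, under the assumption
that we want "restart after a clean exit, give up after a failure".
that policy and the name finish_status are invented here for
illustration. what s6 itself provides is that ./finish is called with
the exit code of ./run as $1 (256 if it was killed by a signal) and the
signal number as $2, and that exiting 125 stops the restarts

```shell
# Hypothetical policy: map the exit code of ./run to the exit code
# that ./finish should return.
finish_status() {
  run_exit=$1
  case $run_exit in
    0) echo 0 ;;    # clean exit: let s6-supervise restart as usual
    *) echo 125 ;;  # failure or killed by signal: ask for no restart
  esac
}
# in an actual ./finish the last line would be something like
#   exit "$(finish_status "$1")"
```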
+
+Sat Jun 29 21:43:10 BST 2024
+
+> s6-supervise says it restarts the supervised process when it exits
+  "unless told not to"; however s6-rc talks about "failed
+  transitions": if a s6-rc service doesn't signal readiness before
+  timeout-up expires, it is stopped and won't be restarted.  I *think*
+  the behaviour I am observing is that ./run may be invoked several
+  times if it dies without ever signalling readiness, and then it's
+  killed when the timeout is exceeded
+
+
+... so ... that's OK, probably. pppoe will stop running after n
+lcp-echoes time out
+
+----
+
+inotifywait apparently requires c++ and libgcc and transitively the
+kitchen sink, which is a bit silly as we have linotify in lua. So
+we should replace the failover scripty thing with a lua program
+
+(table.concat rdepends ", ")
+
+
+Fri Jul  5 21:21:18 BST 2024
+
+
+1970-01-01 00:01:00.797696621 wan-switcher      blocks (        modem-modeswitch, modem-atz, wan.link.pppoe, 194.4.172.12.l2tp, wan-proxy     )       rdepends (      194.4.172.12.l2tp       )       start ( 194.4.172.12.l2tp       )
+
+
+why is it starting l2tp when it should depend on having a route to the
+l2tp server
+
+Sat Jul  6 14:24:26 BST 2024
+
+The logic for up-tree is not correct, as it assumes that the
+requested service is itself ready to start (so excludes it from
+the blocked list). If the requested service is dependent on
+some other block, it should not be started.
+
+[ I am confused. Isn't this what happens already? ]
+
+
+@40000000000000441b51b24c wan-switcher  blocks (        modem-atz, modem-modeswitch, 194.4.172.12.l2tp, wan.link.pppoe, wan-proxy     )       rdepends (      194.4.172.12.l2tp       )       start ( 194.4.172.12.l2tp       )
+
+
+# s6-rc-db all-dependencies  194.4.172.12.l2tp
+route-05029a9e8e2c-ee8d76f34e9c
+hostname
+modem-atz
+modem-modeswitch
+wwan0.link
+check-lns-address
+resolve-l2tp-server
+controlled
+route-07d8f171cb5a-ee8d76f34e9c
+wwan0.link.dhcpc
+wwan0.link.dhcpc-log
+194.4.172.12.l2tp-log
+194.4.172.12.l2tp
+s6rc-fdholder
+s6rc-oneshot-runner