Testnet Improvement Plan of Actions

Unrestricted Public Thread

Hi All,

I'm starting this thread to begin discussions around areas where we can improve processes on the testnet. I've invited an initial batch of people to comment, but it isn't intended to be exclusive, so if there are people who could add value, please let me know and I'll invite them as well (you may be able to as well, since this is a public thread).

I think the first area to tackle is new-release testing, with today being a great example. In the future I'll of course run the load test much earlier, but the question of regressions still remains. Today's test identified a compatibility issue with a specific API/library that didn't necessarily impact factomd itself. This matters for two reasons. First, because it isn't necessarily a bug in factomd, and because the issue surfaced so close to the desired mainnet release date, it's tough for the core devs to justify delaying the release. We therefore need to identify these issues as early as possible so that it's easier to say yes to the fixes. Second, this was just one library; there may be others with compatibility issues, so we should also test the other heavily used ones. We need to ensure testing happens as early as possible and in the same way every time. I'd like some feedback here:

1. It is pretty easy to run the load test, so does automating it really get us anything?
2. Are there other libraries out there that could/should be tested as part of our process?
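On the automation question, even a simple harness can make the load test repeatable "the same way every time." Here is a minimal sketch; the `submit_entry` hook is hypothetical and would wrap whatever client call actually submits an entry to the testnet:

```python
import time

def run_load_test(submit_entry, eps=5, duration_s=10):
    """Submit entries at a fixed rate (entries per second) for a fixed
    duration, and report how many submissions were accepted vs rejected.
    `submit_entry` is a hypothetical callable returning True on success."""
    accepted = rejected = 0
    interval = 1.0 / eps
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        if submit_entry():
            accepted += 1
        else:
            rejected += 1
        time.sleep(interval)
    return accepted, rejected
```

The value of scripting it isn't the keystrokes saved on one run; it's that the rate, duration, and pass/fail criteria become fixed parameters that every release is measured against identically.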

I've also got a testnet new-release checklist put together by @Tor Paulsen that lays out a good process for new releases. I'd like to keep using this as a living document and add to it, to ensure our process for testnet updates goes smoothly. I'd also like to see what we can automate, such as a script that could encompass load testing and forcing elections (if possible). I'll leave it here for now; I'd appreciate your feedback, thanks!
Hi @Nate Miller, as mentioned yesterday during the guide meeting, I will pick up the CI solution for testing several Factom clients against different versions of factomd. The idea is to continuously test these clients against the testnet and against new releases. We have seen some sporadic, temporary strange behavior on the network in the past as well. By running some of these tests almost continuously, we hope to battle-test the clients, a small aspect of the network, and parts of factomd. Obviously this isn't the only thing we should be doing, but it would have caught the regression early, for instance.
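The client-versus-factomd idea above is essentially a compatibility matrix. A sketch of the reporting side, assuming a hypothetical `run_suite(client, version)` hook that would spin up factomd at the given version, point the client at it, and return whether its test suite passed:

```python
def run_compatibility_matrix(clients, node_versions, run_suite):
    """Run every client test suite against every factomd version.
    Returns a dict mapping (client, version) -> passed (bool)."""
    results = {}
    for version in node_versions:
        for client in clients:
            results[(client, version)] = run_suite(client, version)
    return results

def regressions(results):
    """List the (client, version) pairs that failed -- the candidates
    to flag before a mainnet release goes out."""
    return sorted(pair for pair, ok in results.items() if not ok)
```

A CI server would rerun this matrix on a schedule and on every release candidate, so a library-compatibility regression like today's surfaces as a failing cell rather than a surprise on release day.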
One little thing: on the Chockablock home page, the lower graph shows both the block times (which should be 600s) and elections (if an election happens, the vertical bar turns red). I think that's good information to monitor when assessing a new release: verify there are no unexpected elections and that blocks are regular (with and without load).
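Eyeballing that graph could also be turned into an automated check. A minimal sketch, assuming the block durations and election heights have already been pulled off the dashboard or node (the input format here is an assumption, not Chockablock's actual API):

```python
def check_block_health(block_times, elections, target_s=600, tolerance_s=30):
    """Flag blocks whose duration drifts from the 600s target, and any
    block at which an election fired.

    block_times: list of block durations in seconds, indexed by height.
    elections:   set of heights at which an election occurred.
    Returns a list of (height, description) problems; empty means healthy.
    """
    problems = []
    for height, duration in enumerate(block_times):
        if abs(duration - target_s) > tolerance_s:
            problems.append((height, f"block time {duration}s off target"))
        if height in elections:
            problems.append((height, "election occurred"))
    return problems
```

Run against the window covering a load test, an empty result would back up the "no unexpected elections, blocks are regular" assessment with data rather than a glance at the graph.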
My biggest push would be for automated testing. I like tests that run automatically every day, because if anything changes in the software, at worst you have 24 hours before the tests pick it up. It may seem silly to continuously rerun passing tests when you already know they passed and the software hasn't changed, but this too can catch race conditions or weird timing bugs. What happens when you run the tests during elections? I've caught weird bugs because, over time, I noticed a specific test failing in the low single digits of percent. Eventually I tracked it to a non-thread-safe operation that was low-probability but did happen. I had to loop the test 100-1000 times to confirm the failure, but doing so inevitably catches things. If all client-side tests could be automated against the testnet, that would be great, because factomd programmers won't necessarily have access to these tests. At any rate, I feel this is the first direction to push and get set up.
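The loop-the-test technique above is easy to generalize. A sketch (the flaky counter is a contrived stand-in for the kind of non-thread-safe operation described, and may or may not drop updates on any given run):

```python
import threading

def flaky_increment(n_threads=8, per_thread=1000):
    """A deliberately non-thread-safe counter: several threads increment
    without a lock, so the final count only *sometimes* comes up short.
    A single run usually passes; many runs eventually expose the race."""
    count = 0
    def work():
        nonlocal count
        for _ in range(per_thread):
            count += 1  # unprotected read-modify-write
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count

def loop_until_failure(test, expected, max_runs=200):
    """Rerun a test up to max_runs times; return the run number of the
    first failure, or None if it never failed."""
    for run in range(1, max_runs + 1):
        if test() != expected:
            return run
    return None
```

For Go code like factomd itself, `go test -count=N -race` achieves the same effect with the race detector switched on.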
I like this @Michael Lam. I think the next step is to assemble the tests we'd like to run. @Niels Klomp mentioned his CI server, which sounds like it runs continuously. However, there are certainly other client-side apps/libraries/packages etc. that we should probably integrate as well. I'll need some help brainstorming and assembling the testing code, but once we've got it, I'm willing to host it at no cost.
What you guys are coming up with is exactly what I am setting up, and exactly for these reasons. Normally you would make sure tests run during development and before release, but we have seen strange behavior on the network before that some of these tests might have captured, or at least it would be interesting to see how the clients behave.

We are already hosting the infra for it at no cost, btw 🙂