Post #27083

Signature not verified

This entry might be using an old signature, or it was signed by a key that does not exist on the server.

The entry content as it exists in the database. This should be verified against the blockchain entry.
Maxing out a Factom network
[QUOTE="Matt York, post: 27064, member: 129"]
1. Factomd seems to be IO bound, maxing out a network will never max out the CPU of a single node.
Not necessarily. Golang channels are kinda tricky and due to the cooperative scheduling, I found that some channels are just not being worked off "correctly", meaning channel A will drain completely even though channel B has a much larger pile waiting. In practice, this resulted in some of the non-blocking channels overflowing when they shouldn't.

That's the reason why in factomd some of the channels have a capacity of 5,000 or more, it gives them enough of a buffer to pile up. For p2p2, I ended up inserting a manual release of the goroutine after it pulls a parcel from the parcel channel: [URL][/URL] And it's now using a channel capacity of just 50 without dropping parcels. ([url=]More details with playground examples in this issue[/url])

The reason the CPU doesn't get maxed out could be due to issues like that, where CPU threads are idling because specific channels aren't being filled fast enough or block because they're not drained. You can check this by seeing if some cores are maxed out while others are idling. If all cores have an equal load, it's probably not that.

In the long run, bandwidth will be the limiting factor due to gossip's inefficiency.

[QUOTE="Matt York, post: 27064, member: 129"]
2. On mainnet - we don't see the expected # of duplicate messages (from the point of view of a single node).
Essentially this means the gossip protocol performs differently on a larger network.

I wrote about this a lot in my early blogs, such as: [URL][/URL] and [URL][/URL].

There is also the theoretical data of waste messages I collected at [URL][/URL]. The TL;DR is that with enough message bounces, you should get around the same # of duplicates as the -broadcast setting. (16 for mainnet) However, since I gathered that data, Clay improved the smart filter in factomd and now it no longer tries to send messages to peers that have sent you that specific message already.

The big goal of p2p2 was to re-order the network structure of the nodes and be able to get away with a much lower fanout setting while still keeping a high reach. That has the potential of reducing the number of messages by ~50% and combined with a much lower encoding overhead, should significantly increase the (theoretical) bandwidth utilization.

I've been collecting a couple of mainnet traffic captures (receiving application messages only), here's some data that maybe helps:

Duration: 68h 44m 56s
Messages per second: ~19.2
EPS: ~1.9
TPS: ~3.8
raw bandwidth: ~.037 Megabit/s
Size: 1,133,187,694 bytes (~1.05GiB)

Total # of Messages: 4,750,722
Duplicates: 3,987,292 (~83.9% or ~5 of 6)

However, a good chunk of those are "MissingMsg" which are not broadcast messages. If we filter out non-broadcast messages, we get: 3,987,292 dupes / 4,580,405 total (~%87) or 7 out of 8 duplicates.


Judging from the really low bandwidth, we should be able to get much higher EPS in a test environment. In a simple scaleup, 5 Mbit down/up should get us ~180 eps (there's a 30% overhead from encoding v9 messages). In your test setup, you have a 2Gbit connection, which should be plenty for that.

At some point I'd like to get a full message dump (both send and receive, and not just app messages) for proper analysis. It would also be great to test the p2p network independently of the factomd code, which would test the theory of whether I/O is the limiting factor or not.
This is the raw content, without BBCode parsing.