Factom 2.0: Breaking the Blockchain Barrier to Performance

The Factom Protocol is demonstrating pretty solid stability. At the same time, we are really hurting for resources, and we are competing with protocols that have done some pretty good marketing.

This is the time to do a clean rebuild of the Factom Protocol. We have the opportunity to build on work done by TFA and Factom Inc. for the DOE, on the work done in the WAX build, and on the experience of other successful projects. We can do so without losing touch with the basic design principles in Factom that have proven successful.

Work is still needed to flesh out the details, but the proposal is at a point where everyone can review the work so far. It consists of three parts:

  • A presentation to give the highlights of Factom 2.0
  • A white paper for Factom 2.0
  • Prototype code for the Validator / Accumulator architecture

The presentation is fairly approachable, while the white paper has sections that are deeply technical. The ValAcc prototype application can be built and evaluated by developers, and it implements the Stateful Merkle Tree algorithms described in the white paper.
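For readers who want a feel for what the ValAcc prototype's Stateful Merkle Tree code does, here is a minimal, hypothetical Go sketch of the general technique (incremental leaf insertion keeping one pending hash per tree level); the MerkleState type and AddLeaf method are illustrative names, not taken from the prototype:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// MerkleState keeps at most one pending subtree root per level, so appending
// a leaf touches only O(log n) state instead of rebuilding the whole tree.
type MerkleState struct {
	pending [][]byte // pending[i] is an unpaired subtree root at height i, or nil
	count   int      // number of leaves added so far
}

// AddLeaf hashes the entry and folds it into the state, carrying a combined
// hash upward whenever a level already holds an unpaired subtree root.
func (m *MerkleState) AddLeaf(entry []byte) {
	h := sha256.Sum256(entry)
	carry := h[:]
	for i := 0; ; i++ {
		if i == len(m.pending) {
			m.pending = append(m.pending, carry)
			break
		}
		if m.pending[i] == nil {
			m.pending[i] = carry
			break
		}
		pair := sha256.Sum256(append(m.pending[i], carry...))
		carry = pair[:]
		m.pending[i] = nil
	}
	m.count++
}

func main() {
	var m MerkleState
	for i := 0; i < 5; i++ {
		m.AddLeaf([]byte(fmt.Sprintf("entry-%d", i)))
	}
	fmt.Printf("leaves=%d, levels=%d\n", m.count, len(m.pending))
}
```

The point of keeping state this way is that a validator can stream entries into the tree continuously and take a root at any block boundary without revisiting earlier leaves.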

And not to bury the lead, the proposal suggests we can deploy Factom 2.0 in three phases, where the last two phases are transparent to users:

Phase I
Leaders would be refactored into the validator / accumulator architecture, with Go channels used to implement the communication between the various components (a minimal sketch of this pipeline follows after Phase III). Factomd would be a single-server deployment of authority nodes and followers.

Estimated Performance: 1000 - 5000 tps

Phase II
Authority Node deployment would change to require a server per VM. Each VM amounts to a shard of the network, with performance limited only by its own load (the ChainID routing in the sketch after Phase III illustrates the idea).

Estimated Performance: 28,000 - 144,000 tps

Phase III
Authority Set deployment would add an additional sharding layer within each VM to allow each VM to leverage multiple servers.

Estimated Performance: 25,000,000 - 130,000,000 tps
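To make Phase I (and the per-VM sharding of Phase II) more concrete, here is a rough, hypothetical Go sketch of a channel-based validator / accumulator pipeline with entries routed to a shard by hashing their ChainID. The Entry type, the routing rule, and the placeholder validation check are assumptions made for illustration, not the actual factomd refactor:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// Entry is a simplified stand-in for a commit or reveal submitted to the network.
type Entry struct {
	ChainID string
	Data    []byte
}

// route picks a VM shard for an entry by hashing its ChainID, so each shard
// only ever sees the chains assigned to it (the Phase II idea).
func route(e Entry, shards int) int {
	h := sha256.Sum256([]byte(e.ChainID))
	return int(h[0]) % shards
}

// validator checks entries and forwards valid ones to its shard's accumulator.
func validator(in <-chan Entry, out chan<- Entry) {
	defer close(out)
	for e := range in {
		if len(e.Data) > 0 { // placeholder validation rule
			out <- e
		}
	}
}

// accumulator collects validated entries; in the real design it would fold
// them into a Stateful Merkle Tree and emit a block digest.
func accumulator(in <-chan Entry, wg *sync.WaitGroup, id int) {
	defer wg.Done()
	count := 0
	for range in {
		count++
	}
	fmt.Printf("shard %d accumulated %d entries\n", id, count)
}

func main() {
	const shards = 2
	inputs := make([]chan Entry, shards)
	var wg sync.WaitGroup
	for i := 0; i < shards; i++ {
		inputs[i] = make(chan Entry)
		valid := make(chan Entry)
		wg.Add(1)
		go validator(inputs[i], valid)
		go accumulator(valid, &wg, i)
	}

	entries := []Entry{
		{ChainID: "chain-a", Data: []byte("hello")},
		{ChainID: "chain-b", Data: []byte("world")},
		{ChainID: "chain-c", Data: nil}, // dropped by the validator
	}
	for _, e := range entries {
		inputs[route(e, shards)] <- e
	}
	for i := 0; i < shards; i++ {
		close(inputs[i])
	}
	wg.Wait()
}
```

In Phase I all of this would run in a single process; Phases II and III would move the shards onto separate servers without changing the shape of the pipeline.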

@Core Committee @Exchange Working Group

I would notify the Governance Working Group, but the @Governance Working Group tag doesn't actually do the lookup properly... @WB

ADDED: Factom 2.0 Design
 
I welcome this proposal and feel cautiously enthusiastic about the substantial improvements that it describes.

My concerns lie in the gulf between the white paper and its ultimate delivery; it looks like a massive undertaking that will require extensive resources. How do you propose we execute?
 
Actually, it vastly simplifies the problem.

I believe Factom 2.0 can be rapidly developed, tested, and deployed.
 
Much of the code exists (gossip network, fct transactions, entry commits, entry reveals)

Much of the code goes away (elections, save state, database abstractions, the state object, interfaces, the current directory block construction, dbstates, minute handling)

Some code would be rewritten (holding, identities, authority set management, FCT to EC conversions, network simulator, unit tests, API implementation, VM leader/follower, replay detection, balance tracking, missing message handling).

And we have new code (directory block leader/follower, Merkle tree construction, message routing).

Not a definitive list, but pretty close. Most of this code isn't tough; most of it is easily tested and lends itself to unit testing. Most of the rewrites are much simpler than the current code.

In the past we found ourselves limited by fixes required to the existing Factomd, and a desire to avoid a rewrite. But I believe a rewrite is easier, better, and less risky. We could have Phase I done pretty rapidly (less than a year, maybe less than 6 months). We would need to do a thorough design against some known resources to set a timeline.
 
Thank you for sharing this technical proposal. It's pretty dense, so I haven't been through it completely yet. But from a high-level perspective, I don't see how this project is necessary or doable given our current situation:
  1. "The Factom Protocol is demonstrating pretty solid stability. At the same time, we are really hurting for resources, and we are competing with protocols that have done some pretty good marketing." I cannot comprehend how that statement leads to the conclusion "This is the time to do a clean rebuild of the Factom Protocol." I don't see any logical link between the two ¯\_(ツ)_/¯.
  2. Current average usage is 0.1 TPS; even if our max is currently low, there is quite a lot of room (100x at least) before hitting any issue. I don't see more TPS anywhere close to the top of the list of Factom's current problems. If your thinking is that more TPS is also a good marketing gimmick, I am afraid we are years late to the game.
  3. That project sounds more like a cool way to keep a few developers entertained than an attempt to address our ecosystem's real problems. I have no doubt the developers will have a lot of fun developing it. But given our resource constraints, is that a good idea?
  4. Unfortunately, at this point you have eroded quite a lot of your credibility when it comes to executing a technical vision. I remember vividly the "sharding by end of year" stated publicly in 2018 (today we are nowhere even close to that). A huge factor in my opinion on this proposal will be how and by whom it will be executed. At this point I won't support such a big project without knowing who would be the tech lead on it and the devs assigned to it. It can be day and night depending on who executes.
My post doesn't make any judgement on the technical proposal itself (yet), but the concerns above are far more important than the technical details at this point.
 
  1. "The Factom Protocol is demonstrating pretty solid stability. At the same time, we are really hurting for resources, and we are competing with protocols that have done some pretty good marketing." I cannot comprehend how that statement leads to the conclusion: "This is the time to do a clean rebuild of the Factom Protocol.". I don't see any logical link between the two ¯\(ツ)/¯ .
  2. Current average usage is 0.1TPS, even if our max is currently low, there is quite a lot (100x at least) of room before hitting any issue. I don't see more TSP anywhere close to be in the top list of Factom current problems. If your thinking is that more TPS is also a good marketing gimmick, I am afraid we are years late to the game.
  3. That projects sounds more like a cool project to keep a few developers entertained than trying to address our ecosystem real problems. I have no doubt the developers will have a lot of fun developing it. But given our resource constraints, is that a good idea?
  4. Unfortunately at this point you have eroded quite a lot your believability when it comes to execute a technical vision. I remember vividly the "sharding by end of year" stated publicly in 2018 (today we are nowhere even close to that). A huge factor in my opinion on this proposal will be how and who willexecute it. At this point I won't support such a big project without knowing who would be the tech lead on it and the devs assigned to it. It can be day and night depending who executes.
100% agree with everything @Luciap said. He hit the nail on the head.
 
1. The best time to plant a tree was twenty years ago; the second best time is now. If we are going to succeed, we can't just say 60 TPS is good enough.
2. Actually, we are not years late to the game. The game for the next couple of years with government contracts and supply chain is digital identity. But these use cases require more TPS than we can promise. If we don't have it, we had better have it on the roadmap.
3. It depends on what you mean by "to keep a few developers entertained". We don't have the capacity at 60 TPS to take on more public protocols. Yes, I'd say that data integrity and provenance are more useful than smart contracts, but they also need more capacity.
4. I understand that many have lost faith in me; I'd have to be pretty deaf not to hear that message. There are times people just have to do the work because nobody is willing to believe. It may come to that. I would also say that in 2018 I thought we could just focus on the improvements, and we chose to try to do them within the current codebase. I will suggest that trying to work within the current codebase makes forward progress too hard. Too many basic things must be addressed (removing interfaces, reorganizing the code, simplifying consensus).

Part of this architecture is that we don't have stalls anymore. Part is that it gets rid of Docker swarms. And part is that it allows a much better-defined path to decentralizing onboarding and offboarding.

You might think that we are good enough, but one of the big lessons from the DOE project is that real world applications for IoT and supply chain want to see this kind of capacity. For applications that are distributed, they'd like to do this on a public blockchain. And we are not there.
 
Other than "I haven't been through it completely yet. But...", 60 tps is good enough, it is too late to market high tps capacity, we have better things for developers to do than move the code base forward, and, "We don't believe in you Paul", what exactly did @Luciap say that "hit the nail on the head?"
 
OK, enough with the passive-aggressiveness.

Paul B. stated a concern and asked a question:

" A huge factor in my opinion on this proposal will be how and who willexecute it. At this point I won't support such a big project without knowing who would be the tech lead on it and the devs assigned to it. It can be day and night depending who executes. "

How is this going to go, Paul S.?
 
I'm not sure what you are asking. I'm certainly committed to the success of the protocol, regardless. We are not at the point where I can define a development team.
 
1. The best time to plant a tree was twenty years ago; the second best time is now. If we are going to succeed, we can't just say 60 TPS is good enough.
The problem is that this is pretty much true for all aspects of the protocol right now: marketing, exchanges, governance, and tokenomics... and I believe all four are more critical at this point than the tech. I really wish our biggest problem was tech, so I could help make an impact. Having a scalability issue is a pretty nice problem to have, rather than the other way around (a good idea/tech that nobody uses).
 
If you are a developer, doing nothing doesn't exactly help with marketing, exchanges, governance, and tokenomics either.

When I'm talking to an enterprise customer, they care about capacity. 60 TPS is nice but not enough. What we did for the DOE gives us capacity for a private or hybrid public/private solution. DIDs are best on the public network, so showing that public Factom scales is still critical.

And there is a thread about a tokenomics idea I posted too.
 
In the past we found ourselves limited by fixes required to the existing Factomd, and a desire to avoid a rewrite. But I believe a rewrite is easier, better, and less risky. We could have Phase I done pretty rapidly (less than a year, maybe less than 6 months). We would need to do a thorough design against some known resources to set a timeline.
I just want to inject a little caution regarding rewrites, particularly for the benefit of our non-devs. They are very often a costly mistake that consumes much more time and many more resources than anyone initially anticipated. You throw away working code that has had lots of real-world testing and lots of bug fixes in favour of code that does not yet exist.

This article regularly does the rounds on Hacker News and provides a bit of context on my caution. I strongly encourage everyone to read it: https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/

I like the technical content of the proposal and despite the risks involved with a rewrite I am not necessarily opposed to that either. However, I am deeply uneasy about the belief that it can be done rapidly, especially if we plan to write clean, maintainable and safe code. Good code is hard and these are not trivial changes. A rewrite is a significant undertaking and will require a correspondingly significant commitment of resources.

If we choose to go down this route on the basis that we can deploy it rapidly, then be prepared for me to quote this post at regular intervals (and particularly during grant rounds) should that basis prove to be false.
 
2. Actually, we are not years late to the game. The game for the next couple of years with government contracts and supply chain is digital identity. But these use cases require more TPS than we can promise. If we don't have it, we had better have it on the roadmap.
You might think that we are good enough, but one of the big lessons from the DOE project is that real world applications for IoT and supply chain want to see this kind of capacity. For applications that are distributed, they'd like to do this on a public blockchain. And we are not there.
It would be beneficial if a fuller write-up of the learnings could be done, then. It is very difficult for the community to navigate with this information asymmetry. What is the timeline for such a demand, how many parties indicate such a need, how certain is it, etc.

I think everyone understands that not everything can be shared - and we do have a bad history with predictions missing targets. However, all of this talk we have now about changing the tokenomics in various ways and radically changing the architecture of the protocol would be well served by having more information about the outlook of various parties to underpin decisions.
 
I believe we have many unknowns, not limits on what people can share.

These threads and documents are intended to put as much detail as possible in front of the community. The presentation was included to give a higher-level view. The threads let us break down the issues and concerns.
 
The danger of rewrites is real, but it isn't a universal rule. I've done more rewrites than most developers, and I can speak to that if anyone cares. Here is what I learned.

A rewrite is justified only if you can eliminate whole classes of problems and/or gain significant marketable utility and features.

But at all costs, resist second-system syndrome.

This proposal is very much the opposite of second-system syndrome. It centers on eliminating per-message dynamic synchronization because of the overwhelming complexity it creates in consensus, validation, error recovery, and application integration.

Dynamic synchronization seemed so compelling: leaders immediately acknowledging everything. But this feature has only delivered complexity, without applications like wallets reaping any benefit from it.

While I do get that many bug fixes exist in the code that doesn't come over, that can be balanced by removing the possibility of whole classes of bugs.

With ten-second blocks and block digests, we eliminate the "acknowledged but not in a block" state. In fact, we replace acknowledgments with block digests. We eliminate synchronization between all the leaders on minutes. We eliminate special synchronization between blocks. We eliminate negotiations about exactly where a leader failed when it abruptly goes offline.

Unit testing is so limited now, due to these inter-leader interactions, that we have an entire infrastructure built to simulate a network just to do unit testing. That is a feature I'd want to keep (being able to simulate a network on your desktop is powerful), but network simulation becomes unnecessary for unit testing here because digests become the only interaction between leaders. A digest is just a set of data that can be used to drive unit tests. Timing issues are eliminated, because the interaction between leaders is now limited to the block digests each leader produces at the end of the block.
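To illustrate what digest-driven unit tests could look like, here is a hypothetical table-driven Go test; the BlockDigest type and applyDigest function are stand-ins invented for this sketch, not names from the current or proposed code:

```go
package consensus

import "testing"

// BlockDigest is a hypothetical stand-in for the end-of-block summary a
// leader publishes; in this model it is the only inter-leader message.
type BlockDigest struct {
	Height  uint32
	Entries int
}

// applyDigest is a placeholder for follower-side digest processing.
func applyDigest(state map[uint32]int, d BlockDigest) {
	state[d.Height] = d.Entries
}

// TestApplyDigest drives the follower logic purely from digest data, with no
// simulated network and no timing dependencies.
func TestApplyDigest(t *testing.T) {
	cases := []struct {
		name   string
		digest BlockDigest
	}{
		{"empty block", BlockDigest{Height: 1, Entries: 0}},
		{"busy block", BlockDigest{Height: 2, Entries: 500}},
	}
	state := map[uint32]int{}
	for _, c := range cases {
		applyDigest(state, c.digest)
		if state[c.digest.Height] != c.digest.Entries {
			t.Errorf("%s: got %d entries at height %d", c.name, state[c.digest.Height], c.digest.Height)
		}
	}
}
```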

Quite a bit of code comes over. We are not talking about rewriting the networking, or the FCT or EC transactions. The Merkle tree building will be rewritten, but that's already done.

The APIs will need a good bit of work, but not a redesign; mostly simplification, because the data they need is all in the database. Without a digest, a node only has transactions; after processing a digest, the data is in the database.

We have piles of code that just becomes unnecessary without a massive in-flight state across all the leaders that the APIs have to take into account.
 
Thanks, @PaulSnow. I recognise the possible advantages of a rewrite, particularly where major functionality and design changes are being introduced. After all, I have just rewritten nearly the entire backend of factoshi.io, which I believe was the correct decision.

However, unlike the factoshi.io backend, factomd is large and complicated software even if you account for the simplifications inherent in this design. I want everyone to fully appreciate and understand the gravity, implications and risks involved with abandoning large swathes of the current codebase - which has only recently become relatively stable - in favour of something that does not yet exist. With that in mind, I do still like the technical contents of the proposal and I would, in theory, like to see it implemented.

You have invested a lot of time into the technical side of this proposal. In my view, the next step should be to invest a similar effort into a robust plan of action that is able to demonstrate how it can be implemented.
 
May I suggest that a business case be put on top of the implementation plan to demonstrate the commercial value of a rewrite?
That's covered to some extent by the discussion on the Department of Energy SBIR work. This architecture was driven by the effort and results from building a system to meet the performance required for a general purpose IoT deployment.

Supply chain applications need thousands to hundreds of thousands of digital identities (DIDs) each. And each DID has to create verifiable credentials (many would be on chain), and applications would be creating presentations (again, many would be on chain).

If we were to implement part of a replacement for Social Security numbers in the US, we would have to be able to create hundreds of millions of DIDs (as per one of the SVIP proposals that one of our partners applied for; it would be a market for us in any event if the project progressed).

Data applications have huge transaction needs, possibly more so than payment rails.

To date, when pressed about very high transaction volumes, we have reasonably pointed out that most of that is offline and only signed states are needed on the public blockchain. However, DIDs don't work that way. If organizations that are not in closely bound partnerships (like supply chain relationships) are interacting, then the DIDs really need to be on a public platform, even if the Verifiable Credentials and Presentations are maintained privately.
 
Thanks, @PaulSnow. I recognise the possible advantages of a rewrite, particularly where major functionality and design changes are being introduced. After all, I have just rewritten nearly the entire backend of factoshi.io, which I believe was the correct decision.

However, unlike the factoshi.io backend, factomd is large and complicated software even if you account for the simplifications inherent in this design.
That is certainly a reasonable opinion given the current factomd implementation. However, I would assert that as much as 80% of the complexity in the current code is in synchronizing on minutes and at the end of blocks.

So the cornerstone of this proposal is to remove a major functional and design feature that has not delivered. We are getting rid of Dynamic Message Synchronization (which was supposed to give us an "immediate" confirmation, but didn't, due to the risk of stalls) in favor of 10-second blocks, which deliver much improved confirmation times by removing a feature from factomd.

The new code eliminates dbstates, process lists, elections, tracking of temporary and permanent balances, save states, minute counting, state resets, all the interfaces, dynamic missing message protocols, and block height ambiguity.

All these areas have been the source of very hard-to-find-and-fix bugs, performance problems, and complex, hard-to-understand code. In fact, they remain blockers on known bugs we have today that could stall the network if a few Authority Nodes cared to do so. We just don't have a good way to fix them currently.

Factom 2.0 seeks to replace that complexity with relatively standard blockchain-style transaction collection, block building, block distribution, block validation, and block selection. The only difference between Factom and Bitcoin in this model is that we have more than one chain and more than one validator. The code for the major chains and Validators already exists and does not require significant modification. We already validate Factoid transactions, Entry Credit transactions, and Entries going into User Chains.
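As a rough illustration of how simple block production becomes in this model, here is a minimal Go sketch of a ten-second block-cutting loop; the channel wiring and names are assumptions made for illustration, not the proposed implementation:

```go
package main

import (
	"fmt"
	"time"
)

// buildBlocks collects pending transactions and cuts a block every ten
// seconds, replacing per-message acknowledgments with end-of-block output
// from which a digest would be built.
func buildBlocks(pending <-chan string, blocks chan<- []string, stop <-chan struct{}) {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	var batch []string
	for {
		select {
		case tx := <-pending:
			batch = append(batch, tx) // collect transactions as they arrive
		case <-ticker.C:
			blocks <- batch // cut the block at the ten-second boundary
			batch = nil
		case <-stop:
			return
		}
	}
}

func main() {
	pending := make(chan string, 16)
	blocks := make(chan []string)
	stop := make(chan struct{})

	go buildBlocks(pending, blocks, stop)
	pending <- "fct-tx-1"
	pending <- "entry-commit-2"

	// Blocks here until the first ten-second block is cut.
	fmt.Println("block cut with", len(<-blocks), "transactions")
	close(stop)
}
```

There is no per-message negotiation in the loop: everything a follower or an API needs arrives either as a transaction or as the block cut at the boundary.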

I want everyone to fully appreciate and understand the gravity, implications and risks involved with abandoning large swathes of the current codebase - which has only recently become relatively stable - in favour of something that does not yet exist. With that in mind, I do still like the technical contents of the proposal and I would, in theory, like to see it implemented.

You have invested a lot of time into the technical side of this proposal. In my view, the next step should be to invest a similar effort into a robust plan of action that is able to demonstrate how it can be implemented.
I completely agree.
 
That is some valuable information, but it is still a far cry from a proper business case. Sure, there is possibly large potential for additional usage. But how big is the probable gain, and when can it be reaped? What kind of developments would need to happen in addition to the rewrite in order for the gain to materialize?
The benefits of increased stability could be difficult to quantify, but should of course still be considered.

I am not against the idea of a rewrite at all. I just think it would do us good to be thorough about qualifying the commercial upside of doing such a big change. If the community has to spend a lot of resources on a rewrite, I think it is a fair ask.
 
Both the DOE project and the SVIP point to the business use case. Real, hard business opportunities are much more difficult to talk about because they involve companies we are talking to and the negotiations around them; thus I'm giving you some generalities. But as business becomes real, we can talk about that more.

Also keep in mind that on an open-source project, it is much easier for me to talk about the tech and the development I can actually work on than to talk generally about what the business opportunities are for the wider ecosystem. But it really is the wider ecosystem's opportunities that matter.
 