furion's new toy: A full RPC steemd node for SteemData

in #steem, 3 years ago (edited)

I've finally set up my own full-RPC steemd node on a 6-core Xeon server with 256GB of ECC DDR4 RAM, in a datacenter near SteemData (connected over a private 1 Gbit network with sub-1 ms latency).


Running a full node adds to my maintenance workload, but it no longer seems avoidable. I hope that this new deployment, with @gtg's node as a backup, will improve the speed and reliability of SteemData's services.
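A client can treat the private node as its primary endpoint and fall back to a public node such as @gtg's when the primary is unreachable. The sketch below shows one way to do that over steemd's JSON-RPC interface. It assumes the `call` method with `[api, method, params]` parameters, the form steemd accepted at the time; both URLs are hypothetical placeholders, not SteemData's actual configuration.

```python
import json
import urllib.request
from typing import Any, Callable, Sequence

# Hypothetical endpoints: the private node is assumed to listen on port 8090
# (as published by the docker run command later in this post); the backup
# URL is a placeholder, not a real node address.
PRIMARY = "http://localhost:8090"
BACKUP = "https://backup-node.example"

Transport = Callable[[str, bytes], bytes]


def http_transport(url: str, body: bytes, timeout: float = 5.0) -> bytes:
    """POST a JSON-RPC body and return the raw response bytes."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def call_with_failover(
    urls: Sequence[str],
    api: str,
    method: str,
    params: list,
    transport: Transport = http_transport,
) -> Any:
    """Try each node in order and return the first successful result."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "call",
        "params": [api, method, params],
    }
    body = json.dumps(payload).encode()
    errors = []
    for url in urls:
        try:
            reply = json.loads(transport(url, body))
        except (OSError, ValueError) as exc:  # network failure or invalid JSON
            errors.append((url, exc))
            continue
        if "error" in reply:  # the node answered, but with an RPC error
            errors.append((url, reply["error"]))
            continue
        return reply["result"]
    raise RuntimeError(f"all nodes failed: {errors}")
```

For example, `call_with_failover([PRIMARY, BACKUP], "database_api", "get_dynamic_global_properties", [])` would return the global properties from whichever node answers first. Injecting `transport` keeps the retry logic testable without a network.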

Public Steemd Nodes

Steemit's official nodes have been rock-solid over the past month and have served well as the backbone for many of my services. I have also used @gtg's nodes extensively, since they are hosted in the EU.

[Screenshot, 2017-09-26: list of public steemd RPC nodes]

I'm really happy to see the community running more full steemd RPC nodes; however, I haven't had a chance to test them extensively yet.

Why a Private Node?

I currently run three databases as a service and try to keep steemd's internal state in sync with the main SteemData database. I'm also syncing the new databases (hive, sbds) from scratch. All in all, I'm performing millions of requests to steemd instances every day.

Unfortunately, SteemData's servers are located in Germany, which adds considerable network latency when talking to most of the public nodes I tested. The per-request latency, together with limits on available throughput, meant that the database indexers could not catch up with the blockchain head.
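A back-of-the-envelope calculation shows why per-request latency matters. If an indexer issues requests one after another, each round trip caps its throughput at 1000 / latency_ms requests per second, while "millions of requests daily" means at least ~11.6 requests per second on average. The latencies below are illustrative values, not measurements of any particular node.

```python
SECONDS_PER_DAY = 86_400


def max_sequential_rate(latency_ms: float) -> float:
    """Upper bound on requests/second when each request waits for the previous one."""
    return 1000.0 / latency_ms


def daily_capacity(latency_ms: float) -> float:
    """Requests per day that a single sequential worker can complete."""
    return max_sequential_rate(latency_ms) * SECONDS_PER_DAY


def required_rate(requests_per_day: float) -> float:
    """Average requests/second needed to sustain a daily request volume."""
    return requests_per_day / SECONDS_PER_DAY


need = required_rate(1_000_000)  # lower bound for "millions of requests daily"
for latency in (1, 50, 100):     # ms; 50 and 100 are illustrative, not measured
    cap = max_sequential_rate(latency)
    print(f"{latency:>4} ms -> {cap:7.1f} req/s, keeps up: {cap >= need}")
```

Under these assumptions a single sequential worker keeps up at 1 ms and 50 ms but falls short at 100 ms (10 req/s), and that is before it has to work through any backlog of blocks.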

Why 256GB of RAM?

It is possible to run RPC nodes on hardware with lower specs, but my use case unfortunately requires a fully specced-out setup.

Reducing memory usage by selectively enabling features
It is possible to run RPC nodes with lower requirements. For one, not every app needs all the plugins. For example, an app like Busy or DTube doesn't need the market history plugin.
Secondly, it's possible to blacklist certain operations from being indexed by the account history plugin, which can also drastically reduce memory usage.
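To illustrate the first point, the sketch below derives a slimmer `enable-plugin` / `public-api` pair from the full lists used in the config later in this post. The plugin-to-API pairing is an assumption inferred from the names in that config, not taken from steemd documentation, and operation blacklisting is not covered.

```python
# Full lists copied from the config in this post. The plugin -> API pairing
# is an assumption based on naming, not on steemd documentation.
FULL_APIS = [
    "database_api", "login_api", "account_by_key_api", "network_broadcast_api",
    "tag_api", "follow_api", "market_history_api", "raw_block_api",
]
PLUGIN_APIS = {
    "witness": [],
    "account_history": [],
    "account_by_key": ["account_by_key_api"],
    "tags": ["tag_api"],
    "follow": ["follow_api"],
    "market_history": ["market_history_api"],
    "raw_block": ["raw_block_api"],
}


def trimmed_config(drop: set) -> list:
    """Return public-api and enable-plugin lines without the dropped plugins."""
    unknown = set(drop) - set(PLUGIN_APIS)
    if unknown:
        raise ValueError(f"unknown plugins: {sorted(unknown)}")
    plugins = [p for p in PLUGIN_APIS if p not in drop]
    dropped_apis = {api for p in drop for api in PLUGIN_APIS[p]}
    apis = [a for a in FULL_APIS if a not in dropped_apis]
    return [
        "public-api = " + " ".join(apis),
        "enable-plugin = " + " ".join(plugins),
    ]
```

For a Busy- or DTube-style app, `trimmed_config({"market_history"})` drops both `market_history` and `market_history_api` and leaves the other entries in their original order.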

The point of SteemData is to process and store all the available information, so these optimizations do not apply.

Using an SSD instead
Without high throughput and low latency requirements, it's possible to keep the shared memory file on an SSD. By doing so, a full RPC node could be hosted on a server with as little as 16GB of RAM.

SteemData makes a lot of arbitrary requests, and to keep state synchronization near real-time, throughput and latency are crucial. That is why I need the entire state mapped into RAM, and the node hosted in the same datacenter as the rest of the SteemData servers. This setup is overkill during normal operation, but very much needed when syncing from scratch.


I've made a custom Docker image based on Steemit's (Dockerfile, run-steemd.sh).

I've assigned a 200GB 'ramdisk' to the shared memory file, using ramfs
with the following fstab entry:

ramfs /dev/shm ramfs defaults,noexec,nosuid,size=210GB 0 0
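One caveat: unlike tmpfs, ramfs does not enforce a size limit, so `size=210GB` in the entry above is not a hard cap, and the shared memory file can keep growing until physical RAM runs out. A quick sanity check of the numbers from this post (200G shared file, 210GB mount, 256GB RAM) might look like the sketch below, which assumes K/M/G/T are binary units.

```python
import re

# Binary multipliers; treating "G" and "GB" as GiB is an assumption.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Parse sizes like '200G' or '210GB' into bytes."""
    m = re.fullmatch(r"\s*(\d+)\s*([KMGT]?)B?\s*", text.upper())
    if not m:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def check_layout(shared_file: str, mount: str, ram: str) -> list:
    """Return warnings if the shared file does not fit the mount or physical RAM."""
    f, mnt, mem = parse_size(shared_file), parse_size(mount), parse_size(ram)
    warnings = []
    if f > mnt:
        warnings.append("shared-file-size exceeds the /dev/shm mount size")
    if mnt >= mem:
        warnings.append("mount size leaves no RAM for the OS and steemd itself")
    return warnings


print(check_layout("200G", "210GB", "256GB"))  # values taken from this post
```

With the values from this post the check returns no warnings, leaving roughly 46GB of RAM outside the mount for the OS and the steemd process.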

I've adopted @gtg's awesome full node config as a base and tweaked it a bit.

rpc-endpoint =
p2p-max-connections = 200
public-api = database_api login_api account_by_key_api network_broadcast_api tag_api follow_api market_history_api raw_block_api
enable-plugin = witness account_history account_by_key tags follow market_history raw_block

enable-stale-production = false
required-participation = false
shared-file-size = 200G
shared-file-dir = /shm/steem

seed-node =         # @krnel (CA)
seed-node = anyx.co:2001                # @anyx (CA)
seed-node = gtg.steem.house:2001        # @gtg (PL)
seed-node = seed.jesta.us:2001          # @jesta (US)
seed-node =        # @liondani (SWISS)
seed-node = seed.riversteem.com:2001    # @riverhead (NL)
seed-node = seed.steemd.com:34191       # @roadscape (US)
seed-node = seed.steemnodes.com:2001    # @wackou (NL)
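Because this file has no `[section]` header and repeats keys such as `seed-node`, Python's `configparser` would reject it as written. A small hypothetical helper that reads it into a key-to-values mapping could look like this; treating everything after `#` as a comment is an assumption that holds for the config above.

```python
from collections import defaultdict


def parse_steemd_config(text: str) -> dict[str, list[str]]:
    """Parse 'key = value' lines, keeping every value of repeated keys.

    Text after '#' is dropped as a comment, and keys whose value is empty
    (e.g. a seed-node line with no host) are skipped.
    """
    options: dict[str, list[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value:
            options[key].append(value)
    return dict(options)
```

For example, `parse_steemd_config(text)["seed-node"]` yields the list of seed hosts, and `cfg["enable-plugin"][0].split()` the enabled plugins.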

Lastly, I run everything in Docker.

docker run -v /home/steem_rpc_data:/witness_node_data_dir \
           -v /root/fullnode.config.ini:/witness_node_data_dir/config.ini \
           -v /dev/shm:/shm \
           -p 8090:8090 -d \


This setup is fairly new (in production for less than a day), but the results are already promising. Syncing is more than 100x faster than with remote nodes, and I haven't run into any throughput limitations yet. As long as the node doesn't crash, things should be golden.
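To tell whether the node is keeping up (or has stopped), one can compare the head block time reported by `get_dynamic_global_properties` with the current time. This sketch assumes the node answers JSON-RPC on `localhost:8090` (the port published by the docker command above) and that the returned `time` field is UTC without a timezone suffix; the 30-second threshold (ten 3-second blocks) is an arbitrary choice.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional


def head_lag_seconds(head_time: str, now: Optional[datetime] = None) -> float:
    """Seconds between the head block time (e.g. '2017-09-26T13:02:00') and now, in UTC."""
    head = datetime.strptime(head_time, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - head).total_seconds()


def fetch_global_props(url: str = "http://localhost:8090", timeout: float = 5.0) -> dict:
    """Fetch dynamic global properties via the (assumed) 'call' JSON-RPC form."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "call",
               "params": ["database_api", "get_dynamic_global_properties", []]}
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["result"]


def is_synced(props: dict, max_lag_s: float = 30.0,
              now: Optional[datetime] = None) -> bool:
    """True if the node's head block is at most max_lag_s seconds old."""
    return head_lag_seconds(props["time"], now) <= max_lag_s
```

Against a running node, `props = fetch_global_props()` followed by `is_synced(props)` gives a simple liveness check that a cron job or monitoring script could run.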


...6 core Xeon server with 256GB of ECC DDR4 RAM

Are you still working on Viewly @furion?

Yes, I am.

Recently, I've been taking some time to fix and improve my Steem services and witness infrastructure.

I've been fortunate enough to have a small team helping me with Viewly, so things are still progressing.

Very pleased to hear all of that. ^

One could get away with a lot less, but I want it to be as fast as possible.

Maybe it went into spam. Feel free to join us on Telegram.

Great idea. I'd like to see more, hopefully with a dumbed-down explanation of what some of the components are. The scientific terms make it hard to follow, but from what I understood, I'm impressed!

I have a 12-core Xeon workstation with 38GB of ECC RAM (a Dell T5500). Can I run a node from Ireland?

I am a quality control chemist, but I love programming. I've been in bed with computers since childhood. It's just awful that I turned away from my dreams, but now I'm coming back to them. Some of these great projects excite my innermost zeal to pursue these God-given talents. I've already made inquiries about IT education, and hopefully I will start in the next few months. Looking forward to learning from ya. And hey @furion, thank ya for sharing this great INSPIRATION!

Thanks for keeping the engine room running. It's a witness vote well spent.

Furion, please give us a 101 class on this. I don't even know what you mean by RPC, nodes and all those technical things.

Hi @furion. Do you have any plans to make tutorials about accessing the Steemit database, or something like that? Or could you point us to where we can learn all this stuff? I'm very interested in getting into this; I've started learning web development and am keen to know which languages I should learn to access the database.

For those who might think that such a private RPC node doesn't serve the network: of course it does. It reduces the load on other public nodes, leaving those resources to those who can't run their own private nodes.
SteemData is open and free to use, but unfortunately it's not a high-availability (HA) setup.
I do have ~200 active connections on MongoDB alone, from various users and apps.

The database has been rock solid so far, and there is a fairly easy scalability path:

add more RAM to the database server for higher performance
create a replica cluster for HA & backup purposes
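From an application's point of view, the replica-cluster step mostly means connecting with a MongoDB URI that names every member, so the driver can fail over on its own. The host names and the `rs0` replica set name below are hypothetical; `replicaSet` and `readPreference` are standard connection-string options.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts: list, database: str, replica_set: str,
                    read_preference: str = "secondaryPreferred") -> str:
    """Build a MongoDB connection string listing every replica set member."""
    if not hosts:
        raise ValueError("at least one host is required")
    query = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/{database}?{query}"
```

The resulting string can be passed to a driver such as pymongo's `MongoClient`; `secondaryPreferred` lets read-heavy users be served by secondaries while the primary handles writes.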
The weak points in SteemData right now are:

node stability/availability
bugs in my code

But still, impressive results, as the system has been stable while supporting so many applications. I particularly like SteemData as it helps me navigate my own feed in case I miss anything, and helps me monitor active users... Perhaps we can get a more dedicated team on the nodes, as they go down sometimes, and this outage has been lengthy; I am not too sure of the overall effect it has on the platform... Apart from that, fantastic work...

Thanks for sharing! A link to your post was included in the Steem.center wiki article about SteemData. Thanks and good luck again!

Thanks for sharing this. Also the list of public nodes is much appreciated as I could not find it until now.

To add to that: I also have https://steemd.pevo.science in my list of nodes.

That's great news - thanks! I've been checking the sync status of the Hive DB each day, hopefully it might finish sooner because of this.

Probably today :)

Awesome - now the best web API for steemit will be back in sync! I can't wait to play with this :)

Wow, superior hardware for your node, furion, congratulations! Are you running openSUSE, or another distro?

Another node! Thanks for putting in the effort and sacrifice to make the Steem blockchain run smoother with all the increasingly heavy traffic we're experiencing.

I think nodes are way underrated relative to their importance to the network. Nodes relay all the blockchain traffic, and no matter how many blocks the witnesses sign, it all means nothing if there is no node to relay the data.

Appreciate the work you do for the benefit of everyone!

