#4 - Indexer: posts.py
What I am learning about Hivemind's design
posts.py handles "critical/core post operations and data" in Hivemind. When blocks are scanned, the operations that are related to posts/comments are parsed by this module and the appropriate actions are taken within the database to maintain state.
This excludes data such as body content, title, raw JSON data and votes. These are handled by
cached_posts.py, which I will write about next.
Links to Python scripts referred to in this post
These scripts are from the
master branch on Hivemind's GitHub repository.
In blocks.py (the module responsible for processing raw blocks), methods in
posts.py are triggered by a couple of specific conditions. Here's a snapshot of the code block in
blocks.py that handles this:
    # post ops
    elif op_type == 'comment_operation':
        Posts.comment_op(op, date)
    elif op_type == 'delete_comment_operation':
        Posts.delete_op(op)
    elif op_type == 'vote_operation':
        if not is_initial_sync:
            CachedPost.vote(op['author'], op['permlink'], None, op['voter'])
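To make the routing easier to poke at, here is a minimal sketch that wraps the same three conditions in a function. The `dispatch()` wrapper, the `calls` list and the stub classes are my own stand-ins for illustration; the real `Posts` and `CachedPost` classes write to the database rather than to a list.

```python
# Hypothetical stubs: they only record which handler was reached.
calls = []


class Posts:
    @classmethod
    def comment_op(cls, op, date):
        calls.append(('comment_op', op['permlink']))

    @classmethod
    def delete_op(cls, op):
        calls.append(('delete_op', op['permlink']))


class CachedPost:
    @classmethod
    def vote(cls, author, permlink, pid, voter):
        calls.append(('vote', permlink, voter))


def dispatch(op_type, op, date, is_initial_sync):
    """Route one operation using the same conditions as the blocks.py excerpt."""
    if op_type == 'comment_operation':
        Posts.comment_op(op, date)
    elif op_type == 'delete_comment_operation':
        Posts.delete_op(op)
    elif op_type == 'vote_operation':
        # Votes are skipped entirely while the initial sync is running.
        if not is_initial_sync:
            CachedPost.vote(op['author'], op['permlink'], None, op['voter'])


op = {'author': 'alice', 'permlink': 'hello', 'voter': 'bob'}
dispatch('vote_operation', op, '2019-01-01T00:00:00', is_initial_sync=True)
dispatch('vote_operation', op, '2019-01-01T00:00:03', is_initial_sync=False)
dispatch('comment_operation', op, '2019-01-01T00:00:06', is_initial_sync=False)
print(calls)  # [('vote', 'hello', 'bob'), ('comment_op', 'hello')]
```

Only the second vote shows up in `calls`, because the first one arrived while `is_initial_sync` was true.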
The first condition identifies a comment operation and calls the
comment_op() method in our
posts.py module, which then decides whether it is a new post, an edit, or an undelete.
Here's a snapshot of what the code looks like:
    def comment_op(cls, op, block_date):
        """Register new/edited/undeleted posts; insert into feed cache."""
        pid = cls.get_id(op['author'], op['permlink'])
        if not pid:
            # post does not exist, go ahead and process it.
            cls.insert(op, block_date)
        elif not cls.is_pid_deleted(pid):
            # post exists, not deleted, thus an edit. ignore.
            cls.update(op, block_date, pid)
        else:
            # post exists but was deleted. time to reinstate.
            cls.undelete(op, block_date, pid)
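To watch the three branches fire in order, here is a toy version that swaps the database for in-memory structures. `ToyPosts`, its `_ids` dict, `_deleted` set and `log` list are all hypothetical and exist only for this sketch; only the branching in `comment_op()` mirrors the snapshot above.

```python
class ToyPosts:
    """In-memory stand-in for Posts (assumption: the real class queries the DB)."""
    _ids = {}         # (author, permlink) -> post id, starting at 1
    _deleted = set()  # ids currently flagged as deleted
    log = []          # which handler ran, in order

    @classmethod
    def get_id(cls, author, permlink):
        return cls._ids.get((author, permlink))

    @classmethod
    def is_pid_deleted(cls, pid):
        return pid in cls._deleted

    @classmethod
    def insert(cls, op, block_date):
        cls._ids[(op['author'], op['permlink'])] = len(cls._ids) + 1
        cls.log.append('insert')

    @classmethod
    def update(cls, op, block_date, pid):
        cls.log.append('update')

    @classmethod
    def undelete(cls, op, block_date, pid):
        cls._deleted.discard(pid)
        cls.log.append('undelete')

    @classmethod
    def delete_op(cls, op):
        cls._deleted.add(cls.get_id(op['author'], op['permlink']))
        cls.log.append('delete')

    @classmethod
    def comment_op(cls, op, block_date):
        pid = cls.get_id(op['author'], op['permlink'])
        if not pid:
            cls.insert(op, block_date)         # never seen before
        elif not cls.is_pid_deleted(pid):
            cls.update(op, block_date, pid)    # seen and still live: an edit
        else:
            cls.undelete(op, block_date, pid)  # seen but deleted: reinstate


op = {'author': 'alice', 'permlink': 'hello'}
ToyPosts.comment_op(op, '2019-01-01T00:00:00')
ToyPosts.comment_op(op, '2019-01-01T00:00:03')
ToyPosts.delete_op(op)
ToyPosts.comment_op(op, '2019-01-01T00:00:06')
print(ToyPosts.log)  # ['insert', 'update', 'delete', 'undelete']
```

The same author/permlink pair produces a different outcome each time, depending only on whether an id exists and whether it is flagged as deleted.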
The comment_op() method handles new posts, edits and undelete operations.
Each scenario is described below.
If the above method ascertains that it's a new post operation, the
insert() method is called. It makes an entry into the database for the post.
It also checks whether the post has a
parent_author and, if so, updates the parent's child count in cached posts. If the post is not a comment, data is also inserted into
hive_feed_cache.
hive_feed_cache is another cache that maintains the state of feeds (blogs and reblogs), offering efficient queries, in the same way that
cached_posts makes efficient post querying possible. I will write about that in a later post.
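The insert path described above can be sketched with an in-memory SQLite database. This is not Hivemind's schema or code: the `posts`, `cached` and `feed_cache` tables, their columns and this `insert()` function are assumptions made up for the example, and it assumes the parent post has already been registered.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author TEXT NOT NULL,
        permlink TEXT NOT NULL,
        parent_id INTEGER,
        depth INTEGER NOT NULL,
        created_at TEXT NOT NULL,
        UNIQUE (author, permlink)
    );
    CREATE TABLE cached (post_id INTEGER PRIMARY KEY, children INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE feed_cache (post_id INTEGER, account TEXT, created_at TEXT);
""")


def insert(op, block_date):
    """Register a new post, then bump the parent's child count or add a feed entry."""
    parent_id, depth = None, 0
    if op['parent_author']:
        # A comment: look up the parent (assumed to exist already).
        row = db.execute(
            "SELECT id, depth FROM posts WHERE author = ? AND permlink = ?",
            (op['parent_author'], op['parent_permlink'])).fetchone()
        parent_id, depth = row[0], row[1] + 1
    cur = db.execute(
        "INSERT INTO posts (author, permlink, parent_id, depth, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (op['author'], op['permlink'], parent_id, depth, block_date))
    pid = cur.lastrowid
    db.execute("INSERT INTO cached (post_id) VALUES (?)", (pid,))
    if parent_id is not None:
        db.execute("UPDATE cached SET children = children + 1 WHERE post_id = ?",
                   (parent_id,))
    else:
        # Top-level posts are the only ones that go into the feed cache.
        db.execute("INSERT INTO feed_cache VALUES (?, ?, ?)",
                   (pid, op['author'], block_date))
    return pid


root = insert({'author': 'alice', 'permlink': 'hello',
               'parent_author': '', 'parent_permlink': 'blog'}, '2019-01-01T00:00:00')
reply = insert({'author': 'bob', 'permlink': 're-hello',
                'parent_author': 'alice', 'parent_permlink': 'hello'}, '2019-01-01T00:00:03')
```

After these two calls the root post has one child and one feed entry, while the reply has neither.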
When a post is to be updated, the
update() method is called and data is passed to
cached_posts.py, which changes the post's data from the old to the new, to reflect the new state.
When a post undelete op is detected, it triggers the
undelete() method and the following happens within it:
- it sets the is_delete flag to 0
- it rebuilds the post
- undeletes from
- inserts the post into the
Going back to that code block I shared above from
blocks.py, the second condition triggers
Posts.delete_op(), where a comment (be it top level post or actual comment) is marked as deleted. It is also removed from
The last condition is not really connected to this module in particular, but I thought I would address it here, in brief. This records
vote operations for a comment and it triggers the
vote() method in
What have I learned?
The posts.py module will be a good place to plug in code that handles
ad_posts for the Native Ads system. Options include a new DB table that holds ad core data (post IDs, moderation status, scheduling, etc.) and then leveraging
cached_posts to retrieve full details about an ad's data/content (from JSON content, for example).
feed_cache will be irrelevant for Native Ads, because the posts will not be displayed on UIs.
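To make the "new DB table for ad core data" option a bit more concrete, here is a rough sketch using an in-memory SQLite database. Nothing here is final or taken from Hivemind: the column names, the status values and the `register_ad()` and `approved_ads()` helpers are all hypothetical, and the `posts` table is a stub standing in for the real posts table.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript("""
    -- Stub for the core posts table (hypothetical columns).
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, permlink TEXT);
    -- Hypothetical ad core data: post id, moderation status, scheduling.
    CREATE TABLE ad_posts (
        post_id INTEGER PRIMARY KEY REFERENCES posts (id),
        status TEXT NOT NULL DEFAULT 'pending',
        start_time TEXT NOT NULL,
        end_time TEXT NOT NULL
    );
""")


def register_ad(post_id, start_time, end_time):
    """Record a post as an ad; it starts out awaiting moderation."""
    db.execute("INSERT INTO ad_posts (post_id, start_time, end_time) VALUES (?, ?, ?)",
               (post_id, start_time, end_time))


def approved_ads(now):
    """Return ids of approved ads whose schedule covers `now` (ISO 8601 strings)."""
    rows = db.execute(
        "SELECT post_id FROM ad_posts "
        "WHERE status = 'approved' AND start_time <= ? AND end_time > ? "
        "ORDER BY post_id", (now, now)).fetchall()
    return [r[0] for r in rows]


db.execute("INSERT INTO posts VALUES (1, 'alice', 'my-ad')")
db.execute("INSERT INTO posts VALUES (2, 'bob', 'other-ad')")
register_ad(1, '2019-06-01T00:00:00', '2019-06-08T00:00:00')
register_ad(2, '2019-06-01T00:00:00', '2019-06-08T00:00:00')
db.execute("UPDATE ad_posts SET status = 'approved' WHERE post_id = 1")
print(approved_ads('2019-06-03T12:00:00'))  # [1]
```

The table only holds ids and ad-specific state, so the ad's actual content would still be looked up through cached_posts, as described above.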
Posts in this series
I am currently working on a new feature called Native Ads that may be added to Hivemind Communities in a future update.
For an overview of the Native Ads feature and how it will work, read this doc.
If you would like to take a look at the code, check out my fork of Hivemind on GitHub.