How many GB is your PDF collection? Have you considered sharing it more widely?
I know about Sci-Hub, Anna's Archive, etc., but I'm not so interested in a giant landfill containing all papers ever written. I'm much more interested in a curated collection of useful papers.
The root directory of the archive is 142 GB. It's mostly PDFs, though not exclusively. It includes many things that were never online and some things that were once online but no longer are.
For copyright reasons I can not share the entire thing as-is. I have plans to share most notes in there and bibliographic data for most directories. Doing so would be a major project in itself as this was never designed for that. I have some information I would prefer to keep private in there that's going to have to be filtered out, and I would prefer to clean some of it up to be in a more "presentable" state.
As for how useful you'd find it, I think that depends entirely on the overlap between my interests and yours.
> As for how useful you'd find it, I think that depends entirely on the overlap between my interests and yours.
If that specialized-bibs repo is any indication, there seems to be reasonable overlap.
> For copyright reasons I can not share the entire thing as-is.
Of course. But if you'd like to store a non-encrypted backup copy on my system, I would be happy to offer my data storage services free of charge.
Alternatively: I'm training an LLM and it's transformative fair use.
My email is in my profile.
> I have some information I would prefer to keep private in there that's going to have to be filtered out, and I would prefer to clean some of it up to be in a more "presentable" state.
Totally understandable. If you ever get it into an acceptable state, please shoot me an email and I'll be happy to help out logistically.
That’s an impressive and thoughtfully structured system, especially at that scale. The use of symlinks and a separate version-controlled notes repository makes a lot of sense for long-term archival.
I’m curious — when working with such a large collection, how do you typically rediscover material or connect related ideas across different parts of the hierarchy? Do you rely primarily on directory structure, full-text search, or your notes as the main index?
And as you move toward merging the PDFs and notes into a unified system, do you see the notes becoming the central navigation layer, or will the directory structure remain primary?
How are stories ranked?
The basic algorithm divides points by a power of the time since a story was submitted. Comments in threads are ranked the same way.
Other factors affecting rank include user flags, anti-abuse software, software which demotes overheated discussions, account or site weighting, and moderator action.
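To make the "points divided by a power of time" idea concrete, here is a minimal sketch. The exponent of 1.8 and the 2-hour offset are assumptions based on commonly cited values, not something the answer above states, and the function and field names are hypothetical. None of the other factors (flags, anti-abuse software, moderator action) are modeled.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    points: int
    age_hours: float


def rank_score(points: int, age_hours: float,
               gravity: float = 1.8, offset_hours: float = 2.0) -> float:
    """Divide points by a power of the time since submission.

    gravity and offset_hours are assumed values for illustration only;
    the offset simply keeps brand-new stories from dividing by zero.
    """
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")
    return points / (age_hours + offset_hours) ** gravity


def ranked(stories: list[Story]) -> list[Story]:
    """Return stories ordered from highest to lowest score."""
    return sorted(stories,
                  key=lambda s: rank_score(s.points, s.age_hours),
                  reverse=True)


front_page = ranked([
    Story("day-old hit", points=200, age_hours=24.0),
    Story("fresh post", points=50, age_hours=1.0),
])
```

With these assumed parameters the fresh post outranks the day-old hit despite having a quarter of the points, because the denominator grows faster than linearly with age.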
---
Personally, I appreciate that the rankings are done at a site level, and there isn’t a bunch of tracking and manipulation to give me a personal “feed” to drive engagement. A lot of comments on this site complain about those practices on other sites. I don’t think it’s welcome here.
(PS please feel free to disagree and post other ones I don’t know about below!)
I read a lot of backend and architecture articles and often struggle to revisit them later.
I’m curious how others handle this.
Do you: Use Notion or Obsidian? Bookmark everything? Keep markdown notes? Rely on memory?
What has worked long-term for you?
What hasn’t?
https://academia.stackexchange.com/a/173314/31143
https://www.reddit.com/r/datacurator/comments/p75xlu/how_i_o...
I don't read everything I have from start to finish. A lot of this is for future reference.
Since that StackExchange post, I'm now up to about 36.6K PDF files in 4.4K directories, with 14.5K symlinks so I can put files in multiple directories.
I also have a separate version-controlled repo with notes on a bunch of subjects. I'm planning to eventually merge my PDF hierarchy and the notes to have a unified system. It's going to have to be done in stages.
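For anyone wondering how the symlink approach works in practice, a rough sketch follows. The directory names and the `file_under` helper are hypothetical, made up for illustration, and not taken from the actual archive. It assumes a filesystem that supports symlinks (Linux or macOS; Windows may need extra privileges).

```python
import os
import tempfile
from pathlib import Path


def file_under(primary: Path, extra_dir: Path) -> Path:
    """Make `primary` also appear in `extra_dir` via a relative symlink."""
    extra_dir.mkdir(parents=True, exist_ok=True)
    link = extra_dir / primary.name
    # A relative target keeps the link valid if the whole tree is moved.
    target = os.path.relpath(primary, start=extra_dir)
    link.symlink_to(target)
    return link


def demo() -> bool:
    """Store one file once, then expose it under a second topic directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        paper = root / "fluid-dynamics" / "paper.pdf"
        paper.parent.mkdir(parents=True)
        paper.write_bytes(b"%PDF-1.4 placeholder")
        link = file_under(paper, root / "statistics")
        # The link is a symlink, and reading it yields the original bytes.
        return link.is_symlink() and link.read_bytes() == paper.read_bytes()
```

The file exists on disk only once, so a paper that belongs to several topics costs no extra space, and editing or replacing the original updates every location at the same time.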
You might be interested in this project of mine: https://github.com/btrettel/specialized-bibs
I think dang once mentioned trying to have random stories bubble up to give them visibility and encourage variety, and people hated it.