even more blogdex co-citation
June 3, 2003
As I have mentioned before, I have been playing around with blogdex data and co-citation.
Based on Alex’s suggestion, I have now begun to crawl the top 200 stories of the day (instead of just the top 50) to see if it improves results. The daily updates should begin to reflect this starting tomorrow. The first instance of the resulting graph does not look too promising, so I might tweak the layout generation script.
Also: I have solved a problem with neato not terminating by placing an upper bound on the number of iterations for the layout algorithm (by using -Gmaxiter=10000 on the command line).
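For illustration, here is a minimal sketch of how a layout script could pass that iteration cap to neato. The -Gmaxiter=10000 flag is the one mentioned above; the function names, the file names, and the choice of PNG output are assumptions made for the example, not details of the actual script.

```python
import subprocess


def neato_command(dot_path, out_path, max_iter=10000, fmt="png"):
    """Build the neato command line with an upper bound on layout iterations.

    -Gmaxiter sets the graph attribute "maxiter", which caps the number of
    iterations the layout algorithm is allowed to run.
    """
    return [
        "neato",
        f"-Gmaxiter={max_iter}",  # iteration cap, as described in the post
        f"-T{fmt}",               # output format (assumed: PNG)
        "-o", out_path,           # output file (hypothetical name)
        dot_path,                 # input graph in DOT format (hypothetical name)
    ]


def run_neato(dot_path, out_path, max_iter=10000, timeout=None):
    """Run neato; raises CalledProcessError if it exits with an error."""
    subprocess.run(
        neato_command(dot_path, out_path, max_iter),
        check=True,
        timeout=timeout,  # optional wall-clock limit as a second safety net
    )
```

Calling run_neato("cocitation.dot", "cocitation.png") would then render the graph, provided Graphviz is installed and on the PATH.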
update: I have tweaked the graph format so that the nodes are smaller. This improves structure because the layout algorithm does not have to deal with node overlap.
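As a rough sketch of what such a tweak could look like, the snippet below writes a DOT file whose nodes are small, fixed-size, unlabeled circles. The exact attributes and sizes used for the real graph are not given here, so the 0.1 inch node size, the to_dot helper, and the edge list format are all assumptions.

```python
def to_dot(edges, node_size=0.1):
    """Return an undirected DOT graph with small fixed-size nodes.

    edges: iterable of (a, b) pairs of node names (assumed input format).
    node_size: node width and height in inches (assumed value).
    """
    def esc(name):
        # Escape backslashes and double quotes so names are valid DOT strings.
        return str(name).replace("\\", "\\\\").replace('"', '\\"')

    lines = [
        "graph cocitation {",
        # Default node attributes: tiny circles with no label, so the layout
        # has very little node area to keep from overlapping.
        f'  node [shape=circle, fixedsize=true, width={node_size}, '
        f'height={node_size}, label=""];',
    ]
    for a, b in edges:
        lines.append(f'  "{esc(a)}" -- "{esc(b)}";')
    lines.append("}")
    return "\n".join(lines)
```

The returned string can be written to a file and handed to neato as usual.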
Entry filed under: information retrieval.
2 Comments
1.
Henry | September 11, 2003 at 11:11 am
Can you change the permissions on the pyblogdex directory? I’m very curious to see what you did.
2.
Nathan Jacobs | September 11, 2003 at 11:52 am
done.
The HTML files had been generating a lot of traffic from Google (before the access control), so I decided to encode them with gzip to try to reduce the load.
I stopped the automated updates several months ago, but I have since made some changes to the code (I basically implemented SimRank: http://citeseer.nj.nec.com/539641.html). An older copy of the source code (along with a few other projects) is available at http://www.khakipants.org/log/projects/builds/.
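For readers unfamiliar with SimRank, here is a minimal sketch of the naive iterative version of the algorithm. It is not the code from the archive above; the function name, the decay constant of 0.8, the iteration count, and the dictionary input format are assumptions chosen for the example.

```python
def simrank(in_neighbors, decay=0.8, iterations=5):
    """Naive iterative SimRank.

    in_neighbors: dict mapping every node to the list of nodes that link
                  to it (every node must appear as a key, even if its
                  list is empty).
    decay:        the constant C in the SimRank recurrence (assumed 0.8).
    Returns a dict mapping (a, b) pairs to similarity scores.
    """
    nodes = list(in_neighbors)
    # Start with s(a, a) = 1 and s(a, b) = 0 for a != b.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # Nodes with no in-links have similarity 0 to others.
                    new[(a, b)] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors,
                # scaled by the decay constant.
                total = sum(sim[(i, j)] for i in ia for j in ib)
                new[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim
```

In the blogdex setting, two stories would score as similar when the blogs linking to them are themselves similar, which generalizes plain co-citation counts.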