Exilian

Art, Writing, and Learning: The Clerisy Quarter => History, Science, and Interesting Information - The Great Library => Caucasian Prosopography Project => Topic started by: Jubal on October 23, 2017, 10:09:47 PM

Title: Updates & New Devlog
Post by: Jubal on October 23, 2017, 10:09:47 PM
So, as some of you may know, I'm now a PhD student at the University of Vienna, which rather changes the game for the CPP. I'm planning to leave the old testing version up on Exilian, but the newer versions of it are ultimately likely to be set up in a rather better protected environment by the university, and are using rather different technologies - Neo4j (https://neo4j.com/) rather than SQL for the database-end, though still PHP at the front.

Nonetheless, I want to put a devlog here still to record what I'm doing with the project - I want to keep a public-facing record of where things are going, partly to keep myself on track and partly because I think having some chance for folk to publicly engage with my work is good from an academic transparency perspective. As such, here's the new devlog thread! I'm happy to post screencaps of what I'm doing or provide people with test-drive versions of the software, but I'm not going to be running a public version in the near future thanks to the cost & difficulty of setting up a server running neo4j etc.



Today's work:
Title: Re: Updates & New Devlog
Post by: Jubal on October 24, 2017, 10:18:10 PM
Today:


Tomorrow's job should be adding the "event" node and some basic features for editing it, though I need to also get on and do the "add new person" feature too.
Title: Re: Updates & New Devlog
Post by: Glaurung on October 25, 2017, 01:06:23 AM
Good to hear this is under way. If there's a way for me to do user testing, I'd be happy to put some time into it.
Title: Re: Updates & New Devlog
Post by: Jubal on October 25, 2017, 11:46:30 PM
Thank you! It'll be a while before it's ready for that, I think - my current plan is to add event nodes, finish the biography nodes, add the location nodes, and then input all the data from the English version of the Chronicle of Giorgi Lasha and his Time (selected because it's the shortest chronicle covering the period) - at which point I then have a tiny dataset I can manipulate and test out some more interesting functionality with, which I guess will be the stage where extra testing might help.
Title: Re: Updates & New Devlog
Post by: Jubal on November 02, 2017, 04:57:34 PM
Finally, after some hiccups, the bibliography "add new books" system is nearly sorted (that is, the system that creates links between people's pages and the books that are referenced in their mini-bibliography). I've been wrestling with the last major hiccup for the past hour, which turned out to be a feature of Neo4j not having implicit integer to string conversion whereas PHP does, which managed to confuse my code a little. I currently don't have a feature for removing books, but that's a fairly low concern at this point since I'm unlikely to need it often. The whole system will also be needed for events: I was considering trying to integrate events better into people's pages, but I think it may be simpler for events to stay pretty firmly as their own separate thing/node type with their own pages, and just have tags/links between people and the events at which they were present.
Title: Re: Updates & New Devlog
Post by: Glaurung on November 02, 2017, 06:23:01 PM
The whole system will also be needed for events: I was considering trying to integrate events better into people's pages, but I think it may be simpler for events to stay pretty firmly as their own separate thing/node type with their own pages, and just have tags/links between people and the events at which they were present.
My feeling is that events should almost certainly be a separate data entity (node?) from people: one event can have multiple people involved (or possibly none at all); each person is likely to participate in multiple events. In terms of UI, a person's page might have a header section with biographical details, and then a list of events, while an event page could include a list of the people present.
Title: Re: Updates & New Devlog
Post by: Jubal on November 02, 2017, 07:01:05 PM
Yes - they've always been going to be a separate node type, with a separate edit page (with links to people and location objects). The question was more how well their UI could link into the person pages, and how that interacted with the reference links, which are one of the more headache-inducing parts of the whole system; one can't link from a link, only from a node, which sometimes makes referencing "why I've put this link here" tricky, and also loading in all the references from an event alongside the ones for the person and then sorting them out into some sort of coherent system and re-ordering them to get a proper ref & bibliography section is a headache I've decided is best avoided.
Title: Re: Updates & New Devlog
Post by: Glaurung on November 02, 2017, 11:43:58 PM
one can't link from a link, only from a node, which sometimes makes referencing "why I've put this link here" tricky
Perhaps make the links into nodes of their own (obviously of a new "relationship" or "link" type) so that you can attach relevant data to them, and then connect these "link" nodes to the relevant "ordinary" nodes? I don't know what your database system allows or encourages; I'm used to traditional relational (a.k.a. SQL) databases. But if you're wanting to record, not just that person A was at event B, but that we know this because of reference C, then it seems to me that the "link" between A and B has acquired node-like characteristics.
Title: Re: Updates & New Devlog
Post by: Jubal on November 03, 2017, 09:14:42 AM
I could do that, I just feel like it might make the whole system less intuitive and more cumbersome to work with from my perspective. I can attach properties to links, so I suspect it might be easier (and perhaps more transparent for the user) if I add some of this information as prose notes in the properties. The only technical thing I can think of where the concept of link-nodes might help would be that it would make it easier to switch certain whole sources on and off as to their impact on the database - but in general I'm not sure that's a useful enough feature for the workload at this stage, even if it might be generally fun to include.
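To make the two options concrete, here is a minimal sketch in Python (not the project's PHP or Cypher). The `Link` class, the relationship names and the node labels are all made up for illustration; they are not taken from the CPP itself.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A directed link between two nodes, with optional properties (hypothetical model)."""
    start: str
    end: str
    kind: str
    properties: dict = field(default_factory=dict)


# Option 1: a single link, with the justification held as a prose note property.
with_property = [
    Link("person:A", "event:B", "PRESENT_AT",
         {"note": "Named among those present in reference C"}),
]

# Option 2: the link becomes a node of its own ("attendance:1"),
# so it can itself be connected to the reference that supports it.
with_link_node = [
    Link("person:A", "attendance:1", "SUBJECT_OF"),
    Link("attendance:1", "event:B", "AT_EVENT"),
    Link("attendance:1", "reference:C", "SUPPORTED_BY"),
]


def supported_by(links, reference):
    """Return the link-nodes that cite the given reference."""
    return [link.start for link in links
            if link.kind == "SUPPORTED_BY" and link.end == reference]


print(len(with_property), len(with_link_node))
print(supported_by(with_link_node, "reference:C"))
```

The second form is what would make it possible to switch a whole source on or off, at the cost of three links where the first form needs one.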
Title: Re: Updates & New Devlog
Post by: Glaurung on November 03, 2017, 10:04:12 AM
I could do that, I just feel like it might make the whole system less intuitive and more cumbersome to work with from my perspective. I can attach properties to links, so I suspect it might be easier (and perhaps more transparent for the user) if I add some of this information as prose notes in the properties.
Ah, OK. I assumed from what you said earlier that a link was purely "A relates to B" and couldn't hold any more information, but link properties sound useful.
Title: Re: Updates & New Devlog
Post by: Jubal on November 04, 2017, 11:23:48 PM
Yes, link properties can be set and used to look things up - which I guess is more how things might work in an SQL database in the first place. They'll probably need numerical keys to fit onto references somehow, which will be a headache to say the least, but probably a smaller one than having to have linking nodes for every link I want to make...

I think the next thing to do is press on and get event nodes set up, anyway. Once I've got some data to play with it might help.
Title: Re: Updates & New Devlog
Post by: Jubal on November 06, 2017, 11:21:12 AM
Just got the "add person" feature working finally - it now reliably finds the highest unused id from the database, and you can plug in some book links, a name, and a mini-biography. Identity links etc still need to be added afterwards, which is a bit laborious - I might add in a list with some checkboxes for common identities to make it easier to add several linkages in one go.
Title: Re: Updates & New Devlog
Post by: Glaurung on November 06, 2017, 07:07:02 PM
it now reliably finds the highest unused id from the database
Highest used ID (then adds 1), or lowest unused ID, perhaps?

Pedantry aside, well done - it's obviously an important step forwards.
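The two readings can be sketched as follows, in Python rather than the project's PHP. Both function names are hypothetical, and the sketch assumes IDs are positive integers starting at 1.

```python
def next_id(used_ids):
    """Highest used ID plus one (1 if nothing has been used yet)."""
    return max(used_ids, default=0) + 1


def lowest_unused_id(used_ids):
    """Smallest positive ID not yet taken, which fills any gaps."""
    used = set(used_ids)
    candidate = 1
    while candidate in used:
        candidate += 1
    return candidate


existing = [1, 2, 3, 5]        # made-up IDs with a gap at 4
print(next_id(existing))
print(lowest_unused_id(existing))
```

The first approach never reuses an ID, which is usually the safer choice if old IDs might still be referenced somewhere.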
Title: Re: Updates & New Devlog
Post by: Jubal on November 06, 2017, 09:06:35 PM
Aaaand now some bad news, which is that the id system has been storing all the numbers as strings, and I need to rewrite a LOT of code - basically all my cypher queries in the system so far - to treat the IDs as numbers or they won't go into order properly in the database query :( (I could alternatively just sort them in the PHP, but I suspect making them all numbers makes more sense.)
Title: Re: Updates & New Devlog
Post by: Glaurung on November 06, 2017, 11:40:32 PM
Oh dear, that sounds painful - string / number conversions are very rarely fun, or good for system performance :(
Title: Re: Updates & New Devlog
Post by: Jubal on November 07, 2017, 10:01:17 AM
PHP converts between numbers and strings implicitly - cypher/neo4j doesn't. This wasn't a problem until I needed to sort numbers above 10 and found my sorts were going, "1, 10, 11, 2, 3, 4" because text-sort order. I think I've basically fixed it now anyway. This is why I'm only adding records to the database very slowly!
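The ordering problem is easy to reproduce outside the database. A minimal sketch in Python, with made-up IDs, of what happens when numeric IDs are sorted as text and what changes once they are converted first:

```python
# Hypothetical IDs as they might come back when stored as strings.
ids_as_strings = ["1", "2", "3", "4", "10", "11"]

# Strings compare character by character, so "10" sorts before "2".
text_sorted = sorted(ids_as_strings)

# Converting to integers first gives the intended numeric order.
numeric_sorted = sorted(int(i) for i in ids_as_strings)

print(text_sorted)
print(numeric_sorted)
```

On the database side the equivalent is either to store the IDs as integers in the first place or to convert inside the query (Cypher has a `toInteger()` function) before ordering.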
Title: Re: Updates & New Devlog
Post by: Jubal on November 16, 2017, 12:17:16 AM
Update: I now have the basic event pages & nodes set up with editing facilities. The most important aspect of these - the dating system - is still not done, but you can view events and tag people in them. I may leave the date system for a while yet because when I get to doing it I want to work out a good visualisation & editing interface and how the program will "streamline" the data from the rather disparate graph.
Title: Re: Updates & New Devlog
Post by: Jubal on December 06, 2017, 08:36:56 PM
Been making horribly slow progress lately :( I did finally make a small nudge forward today, by getting the "assumption" switch added to identity links. This signals whether an identity is assumed or attested. For example, if someone has an identifiably Georgian name/ancestry and is fighting for the Georgians and has land-holdings in Georgia, the chances are that such a person is an Orthodox Christian. Chroniclers wouldn't have noted such identities down, because they were assumed at the time - and as such we have to make some educated assumptions about them now to paint an accurate picture of the Georgian court. If I've actually got a definite reference for something, it's instead put down as attested. This, once I've amended the search functions to use it, will allow users an easy way to filter out my assumptions from the data-set if they want to.
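A rough sketch of how such a switch could be used for filtering, in Python rather than the project's PHP. The records, field names and the `attested_only` function are invented for illustration.

```python
# Made-up identity links; "assumed" is the switch described above.
identity_links = [
    {"person": "Person A", "identity": "Orthodox Christian", "assumed": True},
    {"person": "Person A", "identity": "Georgian", "assumed": False},
    {"person": "Person B", "identity": "Orthodox Christian", "assumed": False},
]


def attested_only(links):
    """Keep only identity links backed by a definite reference."""
    return [link for link in links if not link["assumed"]]


for link in attested_only(identity_links):
    print(link["person"], "-", link["identity"])
```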
Title: Re: Updates & New Devlog
Post by: Jubal on January 15, 2018, 07:33:18 PM
The "place" nodes now have their pages viewable and editable. The next step for these is to add functionality for defining coordinates (for point-places) or polygons, as well as place nesting, whereby I'll be able to link places up within a tree such that place X should be treated as within place Y (and so on). Whilst at some point in the distant future I may want to do fancier things regarding the relative positions of places, for most database use I think the actual positional location will mainly be used for deploying it onto map readouts, via "event" nodes (which will also have their own attachments to places).
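The nesting idea can be sketched as a simple parent lookup, in Python rather than the project's PHP. The place names and the `is_within` function are placeholders, and the sketch assumes each place has at most one direct parent.

```python
# Hypothetical place tree: each place maps to the place that directly contains it.
parent_of = {
    "Village X": "Province Y",
    "Province Y": "Kingdom Z",
}


def is_within(place, container, parent_of):
    """True if `container` appears anywhere above `place` in the tree."""
    seen = set()                       # guards against accidental cycles in the data
    current = parent_of.get(place)
    while current is not None and current not in seen:
        if current == container:
            return True
        seen.add(current)
        current = parent_of.get(current)
    return False


print(is_within("Village X", "Kingdom Z", parent_of))
print(is_within("Kingdom Z", "Village X", parent_of))
```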

One of the major UI things I need to do for the database in the near future is predictive searching and other things to make the data inputting work easier. Much of the current set relies on creating links between nodes with different id numbers, whereas for the sake of speed I will probably need ways to put links in that rely on more memorable features such as event names.
Title: Re: Updates & New Devlog
Post by: Jubal on January 16, 2018, 10:35:33 PM
Coordinate boxes added, which reject anything that isn't a number, along with a bit more of the editing interface for "nesting" places and connecting places to events (which I'll hopefully have finished by the end of tomorrow evening).

I'm wondering if I should make the coordinate boxes also reject numbers that are clearly outside the Caucasus region, as a check against typos...
Title: Re: Updates & New Devlog
Post by: Glaurung on January 17, 2018, 12:12:26 AM
I'm wondering if I should make the coordinate boxes also reject numbers that are clearly outside the Caucasus region, as a check against typos...
Probably a good idea - maybe pop up an "Are you sure?" message rather than an outright rejection, just in case there are genuine situations where people want to input such data.

With an eye to future use by people studying other areas, you could make it more flexible - if you can have some sort of system set-up data somewhere, design it so that the people setting up the prosopography system input the limits themselves at the start of their project. Then the system checks input data against these numbers, rather than any limits embedded in the code. You could have similar limits for dates too, I guess.
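A sketch of that suggestion, in Python rather than the project's PHP: non-numeric input is rejected outright, while numbers outside a configurable box only trigger a confirmation. The `Bounds` class, the function name and the bounding-box values are all assumptions; the box is a rough illustrative rectangle, not a considered definition of the Caucasus.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    """Project-level limits, set once at set-up rather than embedded in the code."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def check_coordinate(lat_text, lon_text, bounds):
    """Return "reject", "confirm" (ask "Are you sure?") or "ok"."""
    try:
        lat = float(lat_text)
        lon = float(lon_text)
    except ValueError:
        return "reject"                # not a number at all
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "reject"                # not a valid latitude/longitude anywhere
    inside = (bounds.min_lat <= lat <= bounds.max_lat
              and bounds.min_lon <= lon <= bounds.max_lon)
    return "ok" if inside else "confirm"


# Illustrative box only; a real project would choose its own limits.
project_bounds = Bounds(min_lat=38.0, max_lat=45.0, min_lon=37.0, max_lon=51.0)

print(check_coordinate("41.7", "44.8", project_bounds))
print(check_coordinate("51.5", "-0.1", project_bounds))
print(check_coordinate("abc", "44.8", project_bounds))
```

The same pattern would extend to date limits by adding earliest and latest years to the set-up data.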
Title: Re: Updates & New Devlog
Post by: Jubal on January 18, 2018, 02:01:49 PM
I think for the moment I'm focussing on "what am I going to need this to do" - tweaks like that to make something more general-purpose deployable shouldn't be too hard to put in at a later stage if needed, given the structure of the thing. The number of data types really isn't very big, so going in and fiddling with a parameter manually in the code if I need to is perfectly possible.

Anyway, you can now add a location to an event, and they're restricted to one location each. I also did some UI improvements to make buttons and status messages say what they do better (there's been enough copypasting that, for example, the "add item" pages all had "edit" as their bottom button, which has now been fixed.)

I don't currently have any duplication checks for nodes etc, which could be a problem at later stages, but I suspect that duplicates may be best dealt with post-hoc by means of a few specific duplication-search features, rather than trying to pre-emptively set up "smart" systems to warn of them?
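A post-hoc duplicate search could be as simple as grouping nodes by a normalised name and reporting any group with more than one member. A minimal sketch in Python rather than the project's PHP, with invented node IDs and names and a deliberately crude normalisation (case and spacing only):

```python
from collections import defaultdict


def normalise(name):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def find_duplicates(nodes):
    """Group (id, name) pairs by normalised name; keep groups with 2+ members."""
    groups = defaultdict(list)
    for node_id, name in nodes:
        groups[normalise(name)].append(node_id)
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


# Made-up records for illustration.
nodes = [
    (1, "Person A"),
    (2, "person  a"),
    (3, "Person B"),
]

print(find_duplicates(nodes))
```

Spelling variants and transliterations would need something fuzzier than this, which is one argument for reviewing candidate duplicates by hand rather than merging automatically.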
Title: Re: Updates & New Devlog
Post by: Jubal on June 17, 2018, 05:52:03 PM
Argh, haven't updated this for ages. I've now done my first pass over the info from The Chronicle of Giorgi Lasha and The Life of Tamar, and I'm now doing the section on Tamar in the History and Eulogy of Monarchs. This third source is by far the most detailed of the three parts of the KT that document Tamar's reign, and it's definitely presenting some new challenges - unlike other similar databases, I'm "pre-analysing" most of the information, and arguing a case for one or another interpretation where things conflict (whilst also, where possible, allowing users to tweak and prefer a different interpretation), which is necessary to allow the more powerful analytical functions I want the database to ultimately have but is definitely a big block of the work.
Title: Re: Updates & New Devlog
Post by: Jubal on March 20, 2019, 12:11:11 AM
I literally finished the History and Eulogy today, and it still needs some cleaning. It's taken me far longer than I feel it should've done and I'm sure I've not done it as consistently as I'd like; I'm going to need to do some retroactive application of some rules to make things a touch more consistent.

Next stage though is a bit of data cleanup then work in the analytical tools - I want to get the front end useable before I dig into the reign of Giorgi III (the reigns of Tamar plus her father Giorgi is my minimum hoped for scope for the project).