Blog

  • What did you do today?

    Today…

    sitting sleeping showering eating reading thinking imagining dreaming programming eating sniffling breathing playing sighing reading listening watching talking wondering hugging tickling eating plotting planning checking worrying hoping praying

    being

  • A Just Cause

    Fyodor Dostoyevskiy

    I’ve got to read more Dostoevsky! I think you all might enjoy a few of his quotes. I’ve highlighted a few words here and there to make it a better simulation of a motivational poster!

    • “A just cause is not ruined by a few mistakes”
    • “The soul is healed by being with children”
    • “What is hell? I maintain that it is the suffering of being unable to love”
    • “It is not the brains that matter most, but that which guides them—the character, the heart, generous qualities, progressive ideas.”
    • “If you want to be respected by others the great thing is to respect yourself. Only by that, only by self-respect will you compel others to respect you”
    • “Much unhappiness has come into the world because of bewilderment and things left unsaid”
    • “Love the animals: God has given them the rudiments of thought and joy untroubled”
    • “Innovators and men of genius have almost always been regarded as fools at the beginning (and very often at the end) of their careers”
    • “Men reject their prophets and slay them, but they love their martyrs and honor those whom they have slain”
    • “Man has such a predilection for systems and abstract deductions that he is ready to distort the truth intentionally, he is ready to deny the evidence of his senses only to justify his logic”
    • “There are… things which a man is afraid to tell even to himself, and every decent man has a number of such things stored away in his mind”
    • “If there is no immortality, there is no virtue”
    • “Deprived of meaningful work, men and women lose their reason for existence; they go stark, raving mad”

  • Automated Merger of GEDCOM Files

    Have you ever had the misfortune of having to merge two GEDCOM files while doing family history research? I hope not. It’s dehumanizing. Dehumanizing? Yes, because doing a job that should be done 99% by a machine is most definitely de-human-izing.

    Here’s a typical example:
    Billy Bob Jones and Suzy Lee Jones are a brother and sister team working together on their family’s genealogy. The two of them live two or three time zones away from each other, and so they each maintain their own files and coordinate their research by email.

    One day, Billy makes an amazing discovery: he finds the birth date of their great-great-great-great-great grandfather Olaf. What’s more, he discovers that Olaf’s death date was off by 5 years, so he updates that. Billy emails Suzy to tell her the exciting news, and Suzy requests that he send an updated version of the GEDCOM file so she can benefit from his new research. So Billy sends the requested file.

    Suzy imports the file, which only contains Olaf and his ancestors, a total of 200 people (impressive, it’s true). Then she begins to merge all of the now-duplicate individuals, sources, events, …. Five hours later, she finishes, and now wonders if the five hours of tedious merging operations was a fair price to pay for the updated information.

    Here’s what should happen:
    Upon receiving the GEDCOM file from Billy, Suzy imports it into her family history software, which informs her that one individual has been updated, with one fact modified and another added. She tells the program that this is A-OK and she proceeds to make further Amazing Discoveries.

    Here’s what should __really__ happen:
    Upon updating the Olaf information, Billy tells his genealogy program to notify Suzy of all updates made since they last synchronized their records. The next time Suzy opens her genealogy program, it notifies her of the new information provided by Billy, and she approves the merger.


    People have been thinking about this problem for years. The sad reality, however, is that such an advanced merge capability does not seem to be available in any current consumer software (as far as I can tell). Well, why not? Part of the reason is surely that reliably determining the differences between family trees is a complex problem. But what if both versions of the tree are guaranteed to have a common individual? In other words, what if we know that Billy and Suzy’s parents are going to exist in both of their files? What if, in fact, they exist in both files with the same record IDs? What if the family history software allows the two to synchronize their data by specifying the ID of a common root individual?

    If we can depend on constant record IDs, our job is much simpler. So what if Billy imports his current data into a new system, after which Suzy synchronizes with his system, essentially copying over all of his data, with matching record IDs and an agreed-upon common individual? Then, when one of the two makes a change, resynchronizing involves no more than comparing the corresponding ancestors of the common individual and notifying the user of any differences.
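
    To make the idea concrete, here is a rough sketch in Python of what such a comparison might look like. It is not geddiff’s code and does not use libgedcom: the dictionary layout, the function names ancestors and diff_trees, and the sample dates are all made up for illustration. It assumes both files have already been parsed into a map from record ID to facts and parent IDs.

```python
from collections import deque

# Hypothetical in-memory form of a parsed GEDCOM file (not geddiff's or
# libgedcom's actual data model): record ID -> facts and parent record IDs.

def ancestors(tree, root_id):
    """Return the IDs of root_id and every ancestor of it found in tree."""
    seen = set()
    queue = deque([root_id])
    while queue:
        rid = queue.popleft()
        if rid in seen or rid not in tree:
            continue
        seen.add(rid)
        queue.extend(tree[rid]["parents"])
    return seen

def diff_trees(mine, theirs, root_id):
    """List differences between two trees, matching individuals by record ID."""
    my_ids = ancestors(mine, root_id)
    their_ids = ancestors(theirs, root_id)
    changes = []
    for rid in sorted(their_ids - my_ids):
        changes.append(("individual added", rid, None, None, None))
    for rid in sorted(my_ids - their_ids):
        changes.append(("individual missing", rid, None, None, None))
    for rid in sorted(my_ids & their_ids):
        old, new = mine[rid]["facts"], theirs[rid]["facts"]
        for tag in sorted(set(old) | set(new)):
            if tag not in old:
                changes.append(("fact added", rid, tag, None, new[tag]))
            elif tag not in new:
                changes.append(("fact missing", rid, tag, old[tag], None))
            elif old[tag] != new[tag]:
                changes.append(("fact modified", rid, tag, old[tag], new[tag]))
    return changes

# Made-up data: Suzy's copy and Billy's updated copy share the root "@I1@".
suzy = {
    "@I1@": {"facts": {"NAME": "Jones parent"}, "parents": ["@I2@"]},
    "@I2@": {"facts": {"NAME": "Olaf", "DEAT": "1805"}, "parents": []},
}
billy = {
    "@I1@": {"facts": {"NAME": "Jones parent"}, "parents": ["@I2@"]},
    "@I2@": {"facts": {"NAME": "Olaf", "BIRT": "1730", "DEAT": "1810"},
             "parents": []},
}

for change in diff_trees(suzy, billy, "@I1@"):
    print(change)
```

    With this sample data the sketch reports two changes for Olaf’s record, one fact added (the birth date) and one modified (the death date), which is the kind of short summary Suzy would approve instead of merging 200 people by hand. Because individuals are matched purely by record ID, it only works under the constant-ID assumption described above.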

    That’s where my project comes in. “geddiff” is designed to be the program that does that comparison. It’s straightforward (is it not?). It limits its scope so the algorithm stays relatively simple. It will easily integrate with revision control systems such as Subversion. It will be licensed under the LGPL, allowing anybody to link to it or call it externally, while ensuring that the source code itself remains open. It will be based on libgedcom, avoiding unnecessary duplication of effort.

    Is this relevant? Is it doable? Does anybody want to help? Let me know!

    References:
    Geddiff project at Google Code: http://code.google.com/p/geddiff/
    Beyond Project, discussing many of the same ideas: http://www.beyondproject.org/
    BeyondGen, related discussion group: http://groups.google.com/group/beyondgen?lnk=li