A Document to Reference

tarazkp (85) in LeoFinance • 2 years ago

As negative as I am about the impacts that some of the generative AI is going to have on society as it competes against and beats the skillsets of the majority of people, across nearly every skill that we are capable of, there is an inevitability to it. And, it has been heading that way for a very long time already, even though the majority of people don't really pay attention to the way information is handled, and how we as humans interact with it.

For instance, for those of us who work in a typical company, how are you handling your documentation? Typically, it looks something like a SharePoint, Box or Drive repository, using a naming convention and file structures to ensure that a particular document is stored in the right place. Then, there are likely duplicates of that document, for instance, a spreadsheet that gets stored in several locations depending on which stakeholders need it and then, it has also been distributed by email, getting stored locally. On top of this there are edits, so new versions are created, making it near impossible to know which is the latest version. And then on top of this, there are access models for who is able to see what and when. Then there are other repositories, like Teams file storage to contend with, further splitting information.

But, what is interesting is, all of these are stored in a hierarchical structure like a physical filing cabinet, which is logical based on what we have done previously, but doesn't scale when the cost of creating documents has gone from high, to free. The number of documents has exploded in the digital era, even if we aren't printing them into a physical form.

And, it is because of this explosion of documentation that we have developed increasingly clever ways to remove the need to dig through folders. Sure, we probably still have them on our desktops, but considering that the entire internet is essentially stored in folders, how do you search for the football scores? Instead of diving through the news folders using a convention unique to that site, then moving to dig through folders of another convention to see the weekly weather on another site, we have put interfaces over the top so that we get the information we need, in a consumable format, that doesn't require us knowing the source from where it is arriving. We don't know where it is stored, nor do we care, we just want to know if it will rain or not, and if our team won.

What the Large Language Models (LLMs) are doing is essentially scraping through all of those folders to pull out bits of information that they deem relevant and then sticking them together in a comprehensible way, so we can understand it. It is pretty clever, but this isn't actually the solution to the problem of information integrity, as the source matters.

Firstly, a far better way to store documents is in a timeline, which is what a blockchain does with bits of information. The reason this is so valuable is that it provides time accuracy for the content that can be referenced when needed. But, by itself, this is not very useful for an organization, because knowing when something was created to find it is harder than a filing system. So, to be useful, it requires cross-referencing with contextual meaning. For instance, the entry might be tagged as a contract that was written for Customer X. However, there might be multiple versions of that document, as it has moved through drafts and revisions before the final. If every revision is also a "new" document on the timeline, but referencing the one prior, it creates a single lifecycle line for the document, meaning there is only one copy of the document, multiple versions, but like a blockchain transaction, it is always trackable.
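As a rough sketch of that idea, and nothing more than that, here is a tiny append-only timeline in Python where each revision points back at the one before it. The class and field names (`Entry`, `Timeline`, `prev_id`, `context`) are made up for illustration, and hashing the entry to get an ID is my assumption about how it could be done, not a description of any real product.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One revision on the timeline. All field names are hypothetical."""
    entry_id: str            # hash of this entry's contents
    timestamp: int           # when the revision was recorded
    context: dict            # cross-references, e.g. customer and doc type
    body: str                # the revision's content
    prev_id: Optional[str]   # the revision this one replaces, if any


class Timeline:
    """Append-only list of entries, kept in the order they were recorded."""

    def __init__(self) -> None:
        self.entries = []
        self.by_id = {}

    def append(self, timestamp, context, body, prev_id=None) -> Entry:
        if prev_id is not None and prev_id not in self.by_id:
            raise KeyError(f"unknown previous revision: {prev_id}")
        payload = json.dumps([timestamp, context, body, prev_id], sort_keys=True)
        entry_id = hashlib.sha256(payload.encode()).hexdigest()
        entry = Entry(entry_id, timestamp, context, body, prev_id)
        self.entries.append(entry)
        self.by_id[entry_id] = entry
        return entry

    def lifecycle(self, entry_id):
        """Follow prev_id links back to the first draft; return oldest first."""
        chain = []
        current = self.by_id.get(entry_id)
        while current is not None:
            chain.append(current)
            current = self.by_id.get(current.prev_id) if current.prev_id else None
        return list(reversed(chain))


timeline = Timeline()
ctx = {"customer": "Customer X", "type": "contract"}
draft = timeline.append(1, ctx, "Draft terms")
revision = timeline.append(2, ctx, "Revised terms", prev_id=draft.entry_id)
final = timeline.append(3, ctx, "Final terms", prev_id=revision.entry_id)

history = [e.body for e in timeline.lifecycle(final.entry_id)]
print(history)
```

Every revision is its own entry, but following the `prev_id` links from the final version recovers the single lifecycle line for the contract.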

And then, for this to be useful, we need to be able to visualize it as some kind of final document, or visualize a version of that document in time, like a previous draft. What this is doing is essentially creating a snapshot view of the document in the same way a block explorer can show a particular block.
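To make the snapshot idea concrete, here is a minimal Python sketch, assuming each revision is stored with a timestamp. The `revisions` data and the `snapshot` function are hypothetical, just enough to show a "latest" view and an "as it stood at time T" view over the same records.

```python
from typing import Optional

# Hypothetical revision records for one document: (timestamp, label, text).
revisions = [
    (100, "draft 1", "Payment due in 60 days."),
    (200, "draft 2", "Payment due in 45 days."),
    (300, "final", "Payment due in 30 days."),
]


def snapshot(revisions, as_of: Optional[int] = None):
    """Return the newest revision recorded at or before `as_of`.

    With no `as_of`, return the latest revision, i.e. the "final" view.
    """
    visible = [r for r in revisions if as_of is None or r[0] <= as_of]
    if not visible:
        return None  # the document did not exist yet at that time
    return max(visible, key=lambda r: r[0])


latest = snapshot(revisions)             # the current view of the document
earlier = snapshot(revisions, as_of=250)  # the document as it stood at time 250
print(latest[1], "|", earlier[1])
```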

Corporations need blockchains just to manage their information flows.

But, corporations are also going to start changing what they mean by "document" because what the LLMs are able to do is take small slices of a larger document and put them together to create something else. For instance, if it was asked to create a PowerPoint presentation on the financial numbers for the quarter, plus whatever marketing has added into the mix, it would be able to go through Salesforce, scrape reports in SharePoint and take highlights from marketing materials to create a composite document. Once it is tweaked a bit and learns what it needs to do more precisely, it can repeat that report with updated information automatically, rather than having a paid person manually locate and search through the documents, then cut and paste into the PowerPoint.

If your job is creating PowerPoints, start retraining.

The AI tools for finding, extracting and creating views of data are going to get better and better, but the differentiator is going to be the integrity of the information they are using. Right now, the LLMs are largely seen to be used scraping the internet, but where they are likely going to be the most valuable is in closed corporate environments, making sense of all the information that is coming into the various repositories around the company and across organizations. Because the AI doesn't care about the location, a timeline with context is the most sensible way for document and information creation. So, an end user will create something like a contract and save it, without knowing where it is saved, just that it is saved. From there, the next person in the chain doesn't need to know where it is either, they just need to know what they are looking for. And in between, the AI is creating handshakes.

Because information is timelined and contextualized across multiple reference points like who created it, as well as being able to be compared with other similar content, the quality of information goes up, and gets handed to the right person at the right time they need it.

Ever struggled to find a document at work?

Now, for a corporation, immutability isn't something they need (or want), but they do want traceability for as long as they are legally obligated to have a piece of information. So, a pseudo-blockchain suits their purpose for the trackability of the references, not the documents themselves. For instance, it isn't required for the chain to hold all the image data, or videos, it just needs to hold the references, whilst other storage holds the content. This gives them the ability to track all documentation, as well as filter granularly based on an infinite number of search filters to find just what they want, or see it in the way that suits them.
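Here is a small sketch of what "hold the references, not the content" could look like in Python. Everything in it is an assumption for illustration: the `Reference` fields, the `store://` locations, and using a plain year for the retention date. The point is only that the ledger records a fingerprint and a pointer, can be filtered on any tag, and can be pruned once the legal obligation has passed.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Reference:
    """A ledger record that points at content stored elsewhere."""
    content_hash: str   # fingerprint of the content, not the content itself
    location: str       # where the bytes live (hypothetical URI)
    tags: dict          # searchable context
    created: int        # a plain year, for brevity
    retain_until: int   # last year the record must legally be kept


def make_reference(content: bytes, location, tags, created, retention_years):
    digest = hashlib.sha256(content).hexdigest()
    return Reference(digest, location, tags, created, created + retention_years)


def search(ledger, **filters):
    """Return references whose tags match every given filter."""
    return [r for r in ledger
            if all(r.tags.get(k) == v for k, v in filters.items())]


def prune(ledger, current_year):
    """Drop references whose retention period has passed (not immutable)."""
    return [r for r in ledger if r.retain_until >= current_year]


ledger = [
    make_reference(b"<video bytes>", "store://media/launch.mp4",
                   {"dept": "marketing", "kind": "video"}, 2015, 5),
    make_reference(b"<contract bytes>", "store://legal/contract-x.pdf",
                   {"dept": "legal", "kind": "contract"}, 2020, 10),
]

legal_docs = search(ledger, dept="legal")
still_kept = prune(ledger, current_year=2024)
print(len(legal_docs), len(still_kept))
```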

For instance, https://hiveisbeautiful.com/ is a great site that visualizes Hive transactions that look like this.

Embedded into a single transaction are multiple reference points, so depending on what is required, only slices are used. In that bunch there, you can see some upvotes, some claims, some Splinterlands, some Hive-Engine etcetera. Each relevant interface will use a bit of that information for its usecase, whether it is a transfer or a submission of a team into a battle. It is just a document with random bits of information on it.
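A quick Python sketch of that "slices" idea. The operation list below is a simplified, made-up bundle loosely modelled on what a block explorer shows, so the field names and values are illustrative and not the real Hive data format. Each interface just pulls out the operation type and fields it cares about.

```python
# A simplified, hypothetical bundle of operations. Field names and values
# are illustrative only and do not reproduce the actual on-chain format.
operations = [
    {"type": "vote", "voter": "alice", "author": "bob", "weight": 10000},
    {"type": "claim_reward_balance", "account": "alice"},
    {"type": "custom_json", "id": "game_submit_team", "account": "carol"},
    {"type": "transfer", "from": "alice", "to": "bob", "amount": "1.000 HIVE"},
    {"type": "vote", "voter": "carol", "author": "alice", "weight": 5000},
]


def view(operations, op_type, fields):
    """Return only the named fields of the operations an interface needs."""
    return [
        {f: op[f] for f in fields if f in op}
        for op in operations
        if op["type"] == op_type
    ]


# A wallet interface only needs transfers; a curation interface only votes.
wallet_view = view(operations, "transfer", ["from", "to", "amount"])
votes_view = view(operations, "vote", ["voter", "author"])
print(wallet_view)
print(votes_view)
```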

This is the future of business documentation.

A future where information floats about in a type of data soup with tags attached to it, and AI filters that information based on its programmed needs. Because everything is on a timeline, it will be able to increase the relevancy, know which is first and last, and ensure that the most correct information goes into the view given to the user. And, relevancy is far higher when there is some control over what kinds of information are in the system already. Since it is all coming from the same corporation, information trust is higher than if it is coming from random internet sources where there may be no known track record of who created it.

For years, we have already been moving in this direction, which is why the people who have grown up on mobile phones only don't have good folder structure methods and can struggle in corporate environments using them - which is most. And, there is a massive amount of human error in systems that rely on SOPs and naming conventions to ensure document integrity, because people just aren't consistent enough. Automation is the only way, and because it is also the one that will bring the most profits, it is the way it will go.

As simple as ledger logic is, it is going to fundamentally change the way businesses handle their information, because it aligns so well with the automation processes they want to employ. It is designed to be logical and have integrity, which is what is missing on the internet of information at the moment. However, given some time, the internet will start to reorder "itself" into a more logical structure that the AIs are able to better manage and rely on. This means that it will start to weed out low quality information in a process of "no confidence" voting mechanisms, where content that doesn't have strong enough references is omitted from search requests.
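One very simple way to picture a "no confidence" filter, sketched in Python. The trust weights, the threshold and the scoring rule (just adding up the trust of each reference) are all invented for the example; a real system would need something far more careful.

```python
# Hypothetical trust weights for sources; unknown sources count for nothing.
SOURCE_TRUST = {"audited-report": 1.0, "known-author": 0.6, "anonymous-post": 0.1}

items = [
    {"title": "Q3 revenue summary", "references": ["audited-report", "known-author"]},
    {"title": "Rumour about Q3", "references": ["anonymous-post"]},
    {"title": "Unsourced claim", "references": []},
]


def confidence(item):
    """Sum the trust of every reference backing an item."""
    return sum(SOURCE_TRUST.get(ref, 0.0) for ref in item["references"])


def search(items, query, threshold=0.5):
    """Match on title, then omit anything whose references are too weak."""
    matches = [i for i in items if query.lower() in i["title"].lower()]
    return [i["title"] for i in matches if confidence(i) >= threshold]


results = search(items, "q3")
print(results)
```

Both Q3 items match the query, but only the one backed by strong enough references makes it into the results.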

This is a form of web of trust.

@blocktrades has been dabbling with webs of trust for a while now and at scale, it is going to need AI support to really make it useful, because the amount of relationships between individual pieces of information is very high, and then being able to consolidate it into something useful in a timely fashion takes a lot of processing. Humans can't do it, which is why we use heuristics to judge our world in order to think fast, and the AIs will do the same except at a much greater amount of information input, and subsequent usecases and view outputs we will demand of it. But, for an individual company, it doesn't take that much effort, because there is already narrow context and known rules that can be applied to the usecase, laws, industry etc.

Document management isn't something most people think about as they search the internet, or even when they have lost that report they need for their meeting in the morning. It just isn't sexy. But, all the information we consume digitally, is stored in a document of some kind somewhere, whether it be a news story, or a dataset. But, it is fundamental to the way we live our lives and the industry is evolving to run parallel to blockchains, even though they are still far behind and are yet to really understand what they are looking to do. And even as they chase the tech, they still don't see the application for blockchains.

Taraz
[ Gen1: Hive ]

#business #technology #future #supplychain #investing #blockchain #dave #leofinance

nthtv (51) 2 years ago

Easy storage and retrieval of information or documents, I agree, is very ineffective when managed by a file / folder type of means. But, before shooting headlong into blockchain or AI, most use cases can be effectively handled with less exotic tooling, like a document management system (database in front of the information / documents) or a NoSQL front.

Blockchains' strengths lie in being trustless. For information or document retrieval, it may work at small volumes, but not when it reaches any significant level, even for a small business - compared to a Doc management system or key-value store (NoSQL). And it boils down to being ACID compliant within milliseconds, plus quick recovery, which blockchains are not. A whole host of problems arise without data, or documents, being ACID.

AI... better for creation in areas like marketing, while not so good for discrete, repeatable, auditable, provable and legally compliant proof of record.

So, what makes documents, information easy to find or discover? Part of that answer is it has to be fast. So far, I haven't seen an indexing solution that outperforms a database or key-value store. So why not use them? And AI... no. I'm not bashing AI, in fact it has a ton of uses, but for discrete document or info storing and retrieval, especially for financial applications, there are long proven, effective ways that serve us well.

So, why can't people find "stuff"? They simply don't use a tool and insist they should just be able to throw information any damn where they please. Then, some magical, all access bot, should scoop it up and add it to a neural network, train it at $1,000 an hour so you can hire a "prompt artist" to concoct magical phrases to extract a semblance of the data, dripping in a trendy designer layout... uh, yea.. right.

This is the part where you chuff, bow out your chest, cross your arms and screech, "OK, smarty pants... how would YOU do it?" To which one simply replies "Settle down... Francis..." and shows examples:

Any piece of info, receipt, user manual or article, I store in document management. It could be an image, hyperlink, pdf, audio, video, etc and I tag it by group, type and description. One place to store and look for any and all types of info, like this showing some of my crypto notes
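For anyone who wants to picture it, here's a bare-bones sketch of that kind of tagged store in Python. It assumes SQLite purely because it ships with Python; the table, column names and sample rows are made up for illustration and are not the software described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """CREATE TABLE items (
           id INTEGER PRIMARY KEY,
           grp TEXT NOT NULL,          -- e.g. 'crypto', 'tax'
           type TEXT NOT NULL,         -- e.g. 'pdf', 'hyperlink', 'image'
           description TEXT NOT NULL,
           location TEXT NOT NULL      -- path or URL of the stored item
       )"""
)
conn.execute("CREATE INDEX idx_items_grp_type ON items (grp, type)")

rows = [
    ("crypto", "hyperlink", "Hive block explorer", "https://hiveisbeautiful.com/"),
    ("crypto", "pdf", "Notes on key management", "/docs/crypto/keys.pdf"),
    ("tax", "pdf", "Filing receipt", "/docs/tax/receipt.pdf"),
]
# The 'with' block commits all inserts together or rolls them all back.
with conn:
    conn.executemany(
        "INSERT INTO items (grp, type, description, location) VALUES (?, ?, ?, ?)",
        rows,
    )

crypto_notes = conn.execute(
    "SELECT description, location FROM items WHERE grp = ? ORDER BY description",
    ("crypto",),
).fetchall()
print(crypto_notes)
```

The index on group and type is what keeps lookups quick as the table grows, and the transaction around the inserts is the all-or-nothing behaviour that the ACID point is about.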

But that's not the trick - the trick is... you have to be able to handle EVERYTHING, which is not a problem as I also store calendar appointments, stock tickers, receipts, legal docs, tax filings, maintenance schedules, programming notes, health info, home and garden, employment records, retirement rules, computer specs, ...

Here's just a few of those categories that store computer info (helps when I build rigs out)

Just thought to shine light on some old skool software that's hellish performant and strong like bull. Me.. luddite? Nah, I'm adding AI interface to it currently... after all, it's got to be everything...

tarazkp (85) 2 years ago

"most use cases can be effectively handled with less exotic tooling, like a document management system (database in front of the information / documents) or a NoSQL front."

There are plenty of DMSes, none of them do the job well enough at this point.

"Blockchains' strengths lie in being trustless. For information or document retrieval, it may work at small volumes, but not when it reaches any significant level, even for a small business - compared to a Doc management system or key-value store (NoSQL)."

This is why I mentioned "pseudo" blockchain. It doesn't have to be a blockchain, just resemble components of it. No company wants their data immutable.

"So, why can't people find 'stuff'? They simply don't use a tool and insist they should just be able to throw information any damn where they please."

I don't think you understand the complexity of the problem at scale. We are talking about globally distributed organizations, handling hundreds of millions of documents yearly, localized legal models, external collaboration, multiple repositories, complex user access models and then, automation based on the changing information within the documentation. It isn't simple storage and retrieval.

However, what does work is using a database or key value store similar to what you are using (based on your screenshots) as the "blockchain" of references. Then it doesn't matter where the information is actually stored, as they are all linked to the timeline backbone, including the versions, which at an interface level can appear as a single document.

This allows people to throw information wherever they want and as long as it is appropriately labelled at that point (done automatically or user-defined), it will be allowed to join into the stream.

"One place to store and look for any and all types of info, like this showing some of my crypto notes"

"One place" just doesn't work at enterprise level for so many reasons. One of them is of course that human data hygiene is terrible. One of the others is just the practicality across quite different tech stack needs based on department usecases.

nthtv (51) 2 years ago (edited)

Enjoying the solution ideas. And if you don't mind me with additional foods for thought...

References are fine within an entity, as there's control and knowledge over its availability and existence. But linking in any other case, I'm gunna balk.

It's all good and fine when all linked systems are up and behaving, but one broken or unavailable reference can degrade validity quick. Every document or piece of info has to be available 100% of the time, 24x7x365, or else there's that inevitable finger-pointing shootout scene when there's data that needs to be produced and it's either incomplete or late. Other people's / department's / company's systems ALWAYS go down when you need them most - that I learned from trading... nah, it's always been true.
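A toy Python sketch of the failure mode, with made-up stores and references (plain dicts standing in for other departments' systems, no real network calls). It only shows how one unresolvable link is enough to mark a composite view as incomplete.

```python
# Hypothetical remote stores; a missing key stands in for a system that is
# down, or a reference that has been moved or deleted.
stores = {
    "finance": {"q3-report": "Revenue figures..."},
    "marketing": {},  # the brochure was expected here but cannot be found
}

references = [("finance", "q3-report"), ("marketing", "brochure"), ("legal", "nda")]


def resolve(references, stores):
    """Split references into those that resolve and those that are broken."""
    found, broken = [], []
    for store, key in references:
        if key in stores.get(store, {}):
            found.append((store, key))
        else:
            broken.append((store, key))
    return found, broken


found, broken = resolve(references, stores)
complete = not broken  # the composite is only complete if nothing is missing
print(found, broken, complete)
```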

However, federate it into "one place" and you have reproducible results and control over the data, even if transient. 100 million documents.. decent sized, doable.

I prolly sound jaded, but having been called into meetings and seeing Bob blaming remote Jane blaming remote Sally, ad infinitum, gets old fast. I've personally witnessed it in airlines, electrical grid systems, all the way down to daycare, schooling, local govts...

None of the DMSes do the job well enough? At 100 million documents I wouldn't suggest a third party solution - coder up.. way cheaper, faster and functionality out the wazoo.

tarazkp (85) 2 years ago

"However, federate it into 'one place' and you have reproducible results and control over the data, even if transient. 100 million documents.. decent sized, doable."

One place just doesn't work for a global corporation, as there are localized laws that prevent it.

"It's all good and fine when all linked systems are up and behaving, but one broken or unavailable reference can degrade validity quick."

"I prolly sound jaded, but having been called into meetings and seeing Bob blaming remote Jane blaming remote Sally, ad infinitum, gets old fast."

I am sure it does. I have gone into these companies too, and trained them on solutions that do work at scale :)
