Making whitepapers more accessible and searchable

Every day, someone, somewhere publishes a whitepaper on any number of topics: technology, business, finance, government, healthcare; you name it. And just as likely, the few people who take the time to read them wish it were easier to find and search the content they care about without having to plow through a lot of dense and often “dry” reading. Here on the Essentia AI team, we thought there might be an easier way for publishers, aggregators or anyone else who deals with whitepapers to build an easily accessible archive that makes previewing, searching and downloading the right content fast and simple.

To demonstrate, we chose a publicly available source of US government policy whitepapers from the Congressional Research Service.

In its own words, the CRS provides “access to research products” to the US Congress for the “sole purpose of supporting Congress in its legislative, oversight, and representational duties.” With thousands of policy papers available, it is a rich source of information for anyone interested in the issues shaping today’s legislation.

For our test, we downloaded a random selection of 500 documents, totaling over 6,000 pages, from the CRS website. To make this trove of papers searchable, all we had to do was upload the documents to Essentia AI. Within a few minutes they were all processed, and we could begin searching for the information we were interested in.

One obvious topic on everyone’s mind today is inflation and the factors driving it. A simple search for “inflation” returns over 344 content pages across the various documents.

To narrow the results further, we added the search term “supply chain” to see how the current supply chain woes are contributing to inflation.

This filtered the results down to just 21 content pages. After a quick review of the text, we tagged them for future reading.
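
For readers curious about what this kind of narrowing amounts to, here is a minimal Python sketch of page-level search where each added term shrinks the result set. It is only an illustration under our own assumptions, not how Essentia AI is implemented: the `Page` class, the `search_pages` function and the three sample pages are all hypothetical, and the matching is plain case-insensitive substring search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """One content page of a document (hypothetical structure for illustration)."""
    doc_id: str
    page_number: int
    text: str


def search_pages(pages, *terms):
    """Return the pages whose text contains every term, ignoring case."""
    lowered = [term.lower() for term in terms]
    return [p for p in pages if all(t in p.text.lower() for t in lowered)]


# Made-up sample pages, standing in for the processed documents.
pages = [
    Page("doc-a", 1, "Inflation rose as supply chain disruptions persisted."),
    Page("doc-a", 2, "Monetary policy responses to inflation."),
    Page("doc-b", 1, "Infant formula shortages and import rules."),
]

# First pass: a single broad term.
inflation_hits = search_pages(pages, "inflation")

# Second pass: adding a term keeps only pages that mention both.
narrowed_hits = search_pages(pages, "inflation", "supply chain")

print(len(inflation_hits), len(narrowed_hits))
```

Because every term must match, adding a term can only keep the result set the same size or make it smaller, which is the behavior we relied on when going from a broad topic to a short reading list.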

We were able to do this for any number of hot-button topics, including “covid”, “gun control” and “infant formula”. In addition, once we found informative and well-written whitepapers, we could quickly look up the authors’ names and find other papers they have written. The process of searching, previewing, tagging and commenting helped us build an archive organized around our interests and made this information much easier to consume.

A public version of this project has been made available at the following address for anyone to play with.

CRS Public Project

We think publishers could benefit greatly from using Essentia AI to store their library of whitepapers and make it available through public projects. With the tagging and commenting tools, it would be easy to curate the collection and guide users to the right content quickly.

We invite publishers, researchers and any other reader of whitepapers or other public documents to sign up and try it for themselves.