In this article we will connect searching, as performed by classical search engines, web scraping and large language models (LLMs) to creatively answer a difficult question. The idea is simple: we start from a query, in our case "novel blockchain applications". Our aim is to come up with a list of what has been published on the web about this topic in articles, blog posts and other pages. We start by sending this query to DuckDuckGo through the Python package duckduckgo_search and get 10 results. For each of them, we download the content of the page (using an LLM as a web scraper, as described below), collect all the contents and ask the LLM to look for novel ideas. Critically, for each idea we also request a good search query. At this point we iterate: we run the new queries, scrape the resulting pages, collect them together, and prompt the LLM for a summary (we could iterate a few more times, but here we limit ourselves to a single iteration).
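The search-then-ask loop described above can be sketched in a few lines. This is only an illustration of the control flow, not the code used in the rest of the article: the names iterate_search, search and propose_queries are hypothetical, and the two callables stand in for the scraping and LLM steps so that the sketch runs without any network access.

```python
from typing import Callable, List


def iterate_search(
    seed_query: str,
    search: Callable[[str], List[str]],
    propose_queries: Callable[[List[str]], List[str]],
    rounds: int = 1,
) -> List[str]:
    """Alternate between searching and asking for follow-up queries.

    `search` and `propose_queries` are placeholders (assumptions): the first
    maps a query to a list of page contents, the second maps page contents to
    a list of new search queries.
    """
    queries = [seed_query]
    for _ in range(rounds):
        # Scrape every page returned for the current batch of queries.
        pages = [page for q in queries for page in search(q)]
        # Ask for the next batch of queries based on what was found.
        queries = propose_queries(pages)
    # A last search round collects the material for the final summary.
    return [page for q in queries for page in search(q)]


# Toy stand-ins for the search engine and the LLM.
def fake_search(q: str) -> List[str]:
    return [f"page about {q}"]


def fake_propose(pages: List[str]) -> List[str]:
    return [f"follow-up {i}" for i, _ in enumerate(pages)]


final_pages = iterate_search("novel blockchain applications", fake_search, fake_propose)
```

With rounds=1 this corresponds to the single iteration used in this article; a larger value would repeat the search and suggestion steps before the final collection.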
A few utility classes are needed. The first one is Chat, which implements a simple chat without memory. The LLM is provided by https://www.openai.com through their openai Python package. The code is quite basic, with small tools to count the number of tokens and to ease the parsing of JSON responses.
class Chat:
    "Simple wrap around OpenAI's functionalities for no-memory chat"

    def __init__(self, model="gpt-4-turbo"):
        from openai import OpenAI
        self.client = OpenAI()
        self.model = model

    def count(self, message):
        import tiktoken
        encoding = tiktoken.encoding_for_model(self.model)
        num_tokens = len(encoding.encode(message))
        return num_tokens

    def __call__(self, user_message):
        self.chat_completion = self.client.chat.completions.create(
            messages=[
                {
                    "role": "user",
                    "content": user_message,
                }
            ],
            model=self.model,
        )
        assert len(self.chat_completion.choices) == 1
        retval = self.chat_completion.choices[0].message.content
        # If the answer is wrapped in a JSON code fence, parse it
        preamble = '```json'
        epilogue = '```'
        if retval.startswith(preamble) and retval.endswith(epilogue):
            import json
            retval = json.loads(retval[len(preamble):-len(epilogue)])
        return retval
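The fence detection in Chat.__call__ is strict: a reply with a trailing newline after the closing fence, or with invalid JSON inside it, would either be returned as plain text or raise an exception. A slightly more tolerant variant could look like the sketch below; parse_json_reply is a hypothetical helper, not part of the openai package, and falling back to the raw text on a parse error is a design choice assumed here.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_json_reply(reply: str):
    """Return the parsed JSON if `reply` is a fenced json block, else `reply`."""
    text = reply.strip()  # tolerate surrounding whitespace and newlines
    preamble, epilogue = FENCE + "json", FENCE
    long_enough = len(text) >= len(preamble) + len(epilogue)
    if long_enough and text.startswith(preamble) and text.endswith(epilogue):
        body = text[len(preamble):-len(epilogue)]
        try:
            return json.loads(body)
        except json.JSONDecodeError:
            # Assumption: on malformed JSON we hand back the original text
            # and let the caller decide what to do.
            return reply
    return reply


example = FENCE + 'json\n{"suggestions": []}\n' + FENCE + "\n"
parsed = parse_json_reply(example)
```

The caller still has to check whether it received a dictionary or a string, since the model is not guaranteed to follow the requested format.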
The function extract_content is used for web scraping and uses both BeautifulSoup and the LLM: we download the page, extract the text (which will probably contain menus, advertisements, links to other sections, possibly comments from users and other material in which we are not interested) and ask the LLM to extract its main content. In general the results are quite good, especially considering how simple this function is.
def extract_content(url, title, body, chat):
    import requests
    from bs4 import BeautifulSoup
    # A timeout avoids hanging forever on unresponsive servers
    response = requests.get(url, timeout=30)
    # Name the parser explicitly to avoid BeautifulSoup's warning
    soup = BeautifulSoup(response.content, "html.parser")
    content = soup.get_text()
    user_message = f"""
You are a webscraper; your goal is to extract the main content of the webpage \
reported below after <<<>>>. The title of the page is "{title}"; a summary of \
body is {body}.
You are not given the full HTML page but just the extracted text. This text \
may contain non-essential elements like the navigation menus, advertisements, \
links and many other parts that should not be included in the main content: \
only report the title of the page and its most relevant content; do not report anything else. \
Report the relevant text verbatim, do not edit yourself. Use Markdown format for the output.
<<<>>>
{content}
"""
    return chat(user_message)
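The text returned by soup.get_text() usually contains long runs of blank lines and spaces, which cost tokens without adding information. A minimal clean-up step, applied before building the prompt, might look like the following sketch. The helper compact_text and the default max_chars value are assumptions made for illustration; the character cap is only a rough proxy for the model's context limit, and Chat.count could be used for an exact token count instead.

```python
import re


def compact_text(text: str, max_chars: int = 20000) -> str:
    """Collapse whitespace in scraped text and cap its length."""
    # Squeeze runs of spaces and tabs, then trim each line.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    # Drop the lines that are empty after trimming.
    compact = "\n".join(line for line in lines if line)
    # Hard cap (assumption): truncate rather than fail on very long pages.
    return compact[:max_chars]


cleaned = compact_text("  Menu \n\n\n  Hello   world \t \n")
```

Truncation can cut a page in the middle of a sentence, which is acceptable here because the LLM is only asked to report the main content of whatever text it receives.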
The query function is our interface to https://duckduckgo.com and does little more than call the duckduckgo_search package and return the results. A try/except block is used to skip malformed pages or broken links (of which I found a few while testing).
def query(query, chat, max_results=10, verbose=False):
    from duckduckgo_search import DDGS
    results = DDGS().text(query, max_results=max_results)
    contents = []
    for result in results:
        try:
            content = extract_content(result['href'], result['title'], result['body'], chat)
            contents.append(content)
        except Exception as e:
            # Skip pages that cannot be downloaded or parsed
            if verbose:
                print(e)
    return contents
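When several related queries are issued, as in the second iteration further down, the same page may be returned more than once, and each duplicate costs one more LLM call. A possible refinement, not used in this article, is to filter the search results by URL before scraping them. The helper unique_by_href below is hypothetical; it only assumes that each result is a dictionary with an 'href' key, as in the query function above.

```python
def unique_by_href(results):
    """Keep the first result for each URL, preserving the original order."""
    seen = set()
    unique = []
    for result in results:
        href = result.get("href")
        # Skip results without a URL and URLs we have already met.
        if href is None or href in seen:
            continue
        seen.add(href)
        unique.append(result)
    return unique


sample = [
    {"href": "https://example.com/a", "title": "A"},
    {"href": "https://example.com/b", "title": "B"},
    {"href": "https://example.com/a", "title": "A again"},
]
deduplicated = unique_by_href(sample)
```

To deduplicate across queries, the set of seen URLs would have to be shared between calls, for instance by passing it in as an argument.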
And here we are, ready to start our journey.
chat = Chat()
initial_contents = query('novel blockchain applications', chat, 10, False)
The list initial_contents simply holds the scraped content of the pages that DuckDuckGo has returned; we format it as a string and pass it to the LLM.
as_str = lambda contents: '\n\n<<<>>>\n\n'.join(contents)
initial_message = f"""
You are a technology expert; your focus is in particular on blockchain \
technology and applications thereof. Below you have select suggestions for \
applications of the blockchain, each divided by <<<>>>. \
Your task is to come up with novel suggestions on how to use the blockchain. \
Come up with 10 suggestions. For each suggestion, \
prepare the query for a search engine to gather further material. \
Format the output as json; the json should contain the suggestions in a list \
named "suggestions"; each entry will have a "title", a "description" and a \
"search_query" attributes.
<<<>>>
{as_str(initial_contents)}
"""
initial_response = chat(initial_message)
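Since Chat returns a plain string whenever the reply is not wrapped in a JSON code fence, it is worth checking the shape of the response before using it. The sketch below shows one way to do so; get_suggestions is a hypothetical helper, and silently dropping incomplete entries (rather than raising) is an assumption made to keep the pipeline running.

```python
def get_suggestions(response, required=("title", "description", "search_query")):
    """Return the well-formed entries of response['suggestions']."""
    if not isinstance(response, dict):
        # Chat hands back the raw text when no JSON fence was detected.
        raise ValueError("expected parsed JSON, got plain text")
    suggestions = response.get("suggestions")
    if not isinstance(suggestions, list):
        raise ValueError('missing "suggestions" list')
    # Keep only the entries carrying all the required attributes.
    return [
        s for s in suggestions
        if isinstance(s, dict) and all(key in s for key in required)
    ]


sample_response = {
    "suggestions": [
        {"title": "T1", "description": "D1", "search_query": "Q1"},
        {"title": "T2"},  # incomplete entry, dropped
    ]
}
valid = get_suggestions(sample_response)
```

If a ValueError is raised, a simple recovery strategy is to send the same prompt again, since the model's output format can vary from one call to the next.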
The second (and last, for us) iteration involves more web searching and scraping: for each suggestion we run its search query and scrape the top 5 results.
iterated_contents = []
suggestions = initial_response['suggestions']
for suggestion in suggestions:
    title = suggestion['title']
    print(title)
    description = suggestion['description']
    search_query = suggestion['search_query']
    results = query(search_query, chat, 5, verbose=False)
    iterated_contents.append(dict(title=title, description=description, results=results))
iterated_message = f"""
You are a technology expert; your focus is in particular on blockchain \
technology and applications thereof. Below you have select suggestions for \
applications of the blockchain, each divided by <<<>>>. \
Your task is to come up with novel suggestions on how to use the blockchain. \
Summarize the 10 most interesting applications suggested below, then if you can \
suggest new applications as well. \
Format the output as json; the json should contain the suggestions in a list \
named "suggestions"; each entry will have a "title" and a long "description" \
attributes.
<<<>>>
{as_str(sum((s['results'] for s in iterated_contents), []))}
"""
iterated_response = chat(iterated_message)
This is the final result, which is not bad given the simplicity of the code. A few more iterations (and more web searches) would have helped.
text = ''
for suggestion in iterated_response['suggestions']:
    title = suggestion['title']
    description = suggestion['description']
    text += f'# {title}\n{description}\n'
from IPython.display import display, Markdown
display(Markdown(text))
Revolutionizing Identity Management with Blockchain
Blockchain technology is set to transform identity verification processes by decentralizing the storage and management of digital identities, thereby enhancing privacy, security, and user control. The immutable nature of blockchain ensures that identity records cannot be altered or forged, while smart contracts in blockchain platforms can facilitate automatic verification processes without the need for intermediaries. This innovation not only addresses the common challenges of existing systems, such as data breaches and identity theft, but also enhances compliance with KYC and AML regulations, boosts efficiency, and opens up access to services for underserved populations by providing reliable digital identities.
Enhancing Renewable Energy Markets with Blockchain
Blockchain technology can significantly affect renewable energy markets through traceability, efficient energy trading, and improved supply chain management for green hydrogen. By leveraging blockchain, energy trading can be conducted securely and transparently between peers without centralized intermediaries, thereby reducing costs and improving grid stability. Furthermore, blockchain’s capabilities in tracking and certifying the renewable origin of energy sources can bolster the adoption of green certificates and ensure compliance with sustainability standards.
Blockchain in Education: Streamlining Certificate Verification
Blockchain provides a powerful tool for the education sector by enabling the secure, transparent, and immutable storage of academic credentials. By using blockchain, educational institutions can issue verifiable digital certificates that reduce the potential for fraud and streamline the certification verification process for employers. This not only enhances the trust in academic qualifications but also significantly reduces the administrative burdens associated with the verification process, leading to a more dynamic and responsive educational system.
Blockchain for Transparent Charitable Donations
Utilizing blockchain technology in the philanthropic sector can introduce unprecedented levels of transparency and efficiency. Smart contracts on blockchain can automate and enforce the allocation of funds upon meeting certain conditions, ensuring that donations are used as intended by donors. Moreover, every transaction recorded on a blockchain ledger allows donors to track where and how their contributions are being utilized, fostering trust and encouraging more generous giving.
Blockchain Transforming Public Sector Data Management
Governments can implement blockchain to enhance the management of public records by making them more secure, transparent, and accessible. Blockchain’s decentralized nature prevents unauthorized tampering with public records, while its capability for automation streamlines data management processes. This technology not only promotes transparency and efficiency in governmental operations but also significantly reduces the costs associated with maintaining public records and improves public trust in governmental data handling.
Blockchain-Enabled Supply Chain Management
Within supply chains, blockchain can revolutionize how goods are traced, from production to consumption, ensuring the authenticity and provenance of products. Blockchain’s transparency helps in preventing counterfeits and unauthorized substitutions in supply chains, particularly in sectors like pharmaceuticals, luxury goods, and electronics. This not only enhances consumer trust but also strengthens the security and efficiency of supply chains across the globe.
Decentralized Machine Learning with Blockchain
Blockchain can support decentralized machine learning initiatives, such as federated learning projects, where data remains at the source, and insights are shared via a decentralized network. This ensures data privacy and security, enhancing collaborative efforts across various stakeholders without compromising sensitive information.
Real Estate on Blockchain
Blockchain technology can profoundly impact the real estate sector through the tokenization of property, allowing for fractional ownership and improving liquidity in the real estate market. Smart contracts can automate property sales, leasing, and management processes, reducing the need for intermediaries and making transactions more efficient and less susceptible to fraud.
Blockchain in Art Provenance
Blockchain can provide a reliable digital ledger for tracking the provenance and authenticity of artworks, significantly benefiting the art market. By using blockchain to record detailed history and transactions of artworks, stakeholders can ensure authenticity and prevent frauds, making the art market more open and secure.
Blockchain for Digital Content Distribution
In the digital media industry, blockchain can address challenges related to copyright and distribution by allowing creators to retain control and monetization of their content. Blockchain platforms can manage rights and royalties transparently and efficiently, ensuring that creators are compensated adequately for their work and reducing piracy.