This is a Python program that automatically generates an "awesome list" for a given keyword as a markdown file. An "awesome list" is a curated list of resources related to a specific topic. Currently, the resources include GitHub projects, Google Scholar articles, YouTube videos, courses, slides and presentations, software and tools, and podcasts. The awesome list is generated using GPT models; you can choose between different models, such as GPT-3.5 or GPT-4.
Install the dependencies with Poetry:

poetry install

Then add your OpenAI API key to a .env file in the project root:

OPENAI_API_KEY=<your_openai_api_key>
A Google account.
Visit the Google Cloud Console.
If you haven't already, create a new project by clicking on the "Select a project" dropdown at the top-right corner, then click on "NEW PROJECT".
Once your project is created and selected, navigate to the Navigation menu (three horizontal lines at the top-left corner), and then click on "APIs & Services" > "Credentials".
Click on the "Create Credentials" button and select "API key". Once created, your API key will be displayed.
Copy your API key and save it securely. You'll use this key in your application to authenticate your requests.
Go to the Google Custom Search homepage.
Click on "Create a custom search engine".
In the "Sites to search" section, you can specify websites you want to search or choose "Search the entire web" to allow broader search capabilities. However, if you choose "Search the entire web", make sure to toggle "Search only included sites" off under the "Sites to search" section.
Fill in other required fields like the name of your search engine.
Click on the "Create" button at the bottom.
Once your search engine is created, you'll be directed to a setup page. Here, find and copy the "Search engine ID" (also called "cx" in some contexts). You'll use this ID in your application to specify which custom search engine to use for queries.
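If you want to sanity-check the key and engine ID before wiring them into the application, you can query the Custom Search JSON API directly. The snippet below is only an illustrative check (it assumes the requests package is available); the project's own Google scraper may call the API differently:

```python
import requests

# Substitute your own key and engine ID.
GOOGLE_CLOUD_API_KEY = "<google cloud api key>"
CUSTOM_SEARCH_ENGINE_ID = "<custom search engine id>"

# Query the Custom Search JSON API for a sample keyword.
response = requests.get(
    "https://www.googleapis.com/customsearch/v1",
    params={
        "key": GOOGLE_CLOUD_API_KEY,
        "cx": CUSTOM_SEARCH_ENGINE_ID,
        "q": "machine learning",
        "num": 5,  # number of results to return (max 10 per request)
    },
)
response.raise_for_status()

# Print the title and link of each result.
for item in response.json().get("items", []):
    print(item["title"], "->", item["link"])
```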
Finally, add the following environment variables to the .env file:
GOOGLE_CLOUD_API_KEY='<google cloud api key>'
CUSTOM_SEARCH_ENGINE_ID='<custom search engine id>'
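The application reads these values from the environment at runtime. As a minimal sketch, assuming python-dotenv is used to load the .env file (the project's actual loading code may differ):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Load variables from the .env file into the process environment.
load_dotenv()

openai_api_key = os.environ["OPENAI_API_KEY"]
google_cloud_api_key = os.environ["GOOGLE_CLOUD_API_KEY"]
custom_search_engine_id = os.environ["CUSTOM_SEARCH_ENGINE_ID"]
```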
We've provided a Streamlit interface for running this application. To use it:
Run the Streamlit application using Poetry:
poetry run streamlit run streamlit_run.py

Then open http://localhost:8501 in your browser.
You can easily input the necessary parameters (like model type, keyword, and description) through the UI and generate your awesome list!
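For reference, an interface like this needs only a handful of Streamlit calls. The following is a simplified, hypothetical sketch of what streamlit_run.py might look like, not the actual file in this repository; the import path awesome_list_generator is an assumption:

```python
import streamlit as st

# Hypothetical import path; the real module name in this repository may differ.
from awesome_list_generator import AwesomeListGenerator

st.title("Awesome List Generator")

# Collect the generation parameters from the user.
keyword = st.text_input("Keyword", value="Machine Learning")
description = st.text_area("Description")
model = st.selectbox("Model", ["gpt-3.5-turbo-16k", "gpt-4"])

if st.button("Generate awesome list"):
    generator = AwesomeListGenerator(keyword=keyword, description=description, model=model)
    # Writes the markdown file and returns its content for display.
    markdown_content = generator.save_and_return_awesome_list()
    st.markdown(markdown_content)
```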
The main class used in this project is the AwesomeListGenerator. This class accepts the following parameters:

keyword: A string representing the keyword for which the awesome list will be generated.
description: A string providing a description related to the keyword.
model: A string representing the OpenAI model to be used for generating the markdown (default is "gpt-3.5-turbo-16k").
data_extraction_batch_size: An integer representing the number of data items to process in each batch (default is 10). For example, with a batch size of 10, data is fetched from the data sources in batches of 10 (e.g., 10 GitHub projects at a time).
number_of_results: An integer representing the number of results to fetch from each data source (default is 20). For example, fetch 20 GitHub projects, then process them with the LLM in batches of data_extraction_batch_size.

After initializing the class with these parameters, invoke the save_and_return_awesome_list method to generate the markdown file. Here's an example:
# Initialize an instance of the AwesomeListGenerator
generator = AwesomeListGenerator(
    keyword="Your Keyword",
    description="Your Description",
    model="gpt-3.5-turbo-16k",
    data_extraction_batch_size=10,
    number_of_results=20,
)

# Generate and save the markdown
markdown_content = generator.save_and_return_awesome_list()

The program will generate a markdown file in the output directory named after your keyword (e.g., Your_Keyword.md). This file contains the "awesome list" generated by the program.
The AwesomeListGenerator program operates in two main phases: Data Scraping and Data Processing.
In the data scraping phase, the program fetches resources related to your provided keyword from multiple data sources. Currently, the resources include GitHub repositories, Google Scholar articles, YouTube videos, and podcasts. The program utilizes specialized scrapers for each source, each of which is designed to fetch the most relevant and highest quality resources.
For instance, the GitHub scraper fetches repositories that match the keyword, sorted by the number of stars (a common indicator of a repository's relevance and quality). Similarly, the Google Scholar scraper retrieves articles related to the keyword, sorted by citation count.
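As an illustration of the star-sorted fetch, the snippet below queries GitHub's public search API with requests; the project's own scraper may be implemented differently:

```python
import requests

def fetch_github_repositories(keyword: str, number_of_results: int = 20) -> list[dict]:
    """Fetch repositories matching the keyword, sorted by star count (illustrative sketch)."""
    response = requests.get(
        "https://api.github.com/search/repositories",
        params={
            "q": keyword,
            "sort": "stars",
            "order": "desc",
            "per_page": number_of_results,
        },
        headers={"Accept": "application/vnd.github+json"},
    )
    response.raise_for_status()
    return response.json()["items"]

# Example: the 20 most-starred repositories for a keyword.
for repo in fetch_github_repositories("awesome list"):
    print(repo["full_name"], repo["stargazers_count"])
```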
Once the data is scraped, it is passed on to the data processing phase. In this phase, the program uses the selected GPT model to process the fetched resources. The model filters and ranks the resources based on relevance to the keyword, quality of content, and potential usefulness to users. The GPT model also formats the data into a markdown list, adding necessary formatting such as links and brief descriptions.
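Conceptually, each batch of scraped items is sent to the model with instructions to filter, rank, and format them. The sketch below uses the official openai Python client to show the idea; the prompt wording, helper name, and client code here are assumptions and may differ from what this project actually does:

```python
import json

from openai import OpenAI  # assumes the official openai package

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def format_batch_as_markdown(keyword: str, batch: list[dict], model: str = "gpt-3.5-turbo-16k") -> str:
    """Ask the model to filter, rank, and format one batch of resources (illustrative sketch)."""
    prompt = (
        f"You are building an awesome list about '{keyword}'. "
        "Filter out irrelevant or low-quality items, rank the rest by usefulness, "
        "and return a markdown bullet list with a link and a one-line description per item.\n\n"
        f"Items:\n{json.dumps(batch, indent=2)}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```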
Notably, both scraping and processing operations are executed in batches. This batch-wise operation allows the program to support as many results as needed, based on the configured number_of_results and data_extraction_batch_size. This way, you control how much data is handled at a time, ensuring efficient resource usage.
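In practice, batching just means slicing the scraped results into chunks of data_extraction_batch_size and processing each chunk independently. A minimal, self-contained sketch of that pattern (the function name process_in_batches is illustrative, not the project's API):

```python
from typing import Callable

def process_in_batches(
    results: list[dict],
    process_batch: Callable[[list[dict]], str],
    data_extraction_batch_size: int = 10,
) -> list[str]:
    """Split scraped results into fixed-size batches and process each one (illustrative sketch)."""
    sections = []
    for start in range(0, len(results), data_extraction_batch_size):
        batch = results[start:start + data_extraction_batch_size]
        sections.append(process_batch(batch))
    return sections

# Example: 20 results with a batch size of 10 produce two processing calls.
dummy_results = [{"title": f"item {i}"} for i in range(20)]
sections = process_in_batches(dummy_results, process_batch=lambda batch: f"{len(batch)} items processed")
print(len(sections))  # -> 2
```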
We're looking to expand the number of data sources in the future. Here are some ideas we have in mind:
If you're interested in contributing, you can pick one of the above tasks or propose your own ideas. We welcome all kinds of contributions and appreciate your interest in our project!
We love seeing the incredible awesome lists that our community creates. If you've used our tool to generate an awesome list, feel free to let us know, and we will feature your project here!
Did you find this project useful? If it has brought value to you, please give us a ⭐ on GitHub. This gesture not only validates our efforts but also helps this project reach more people and continue development.
Feel free to fork the repository, contribute by submitting pull requests, or open an issue. Your feedback and contributions are always welcome!