Website Name: Blue Cross and Blue Shield of Illinois

This folder has the data for the link: https://www.bcbsil.com/member/policy-forms/machine-readable-file

The index JSON file, which contains the links to the downloadable .json.gz files, is: 2022-10-21_Blue-Cross-and-Blue-Shield-of-Illinois_index.json

---------------------------------------------Details----------------------------------------------------------------------

reporting_entity_type: Health Insurance Issuer
reporting_entity_name: Blue Cross and Blue Shield of Illinois

51467 links were found in 2022-10-21_Blue-Cross-and-Blue-Shield-of-Illinois_index.json

Not all of these links end in .json.gz; some lead to other web pages with nothing to download, so the links are filtered to keep only those from which data can be downloaded.

All the downloadable links are also written to this file: 2022-10-21_Blue-Cross-and-Blue-Shield-of-Illinois_index.txt

Each downloaded file is extracted from .json.gz to .json and then converted to .csv. After conversion, the .json files are deleted to save storage space.

---------------------------------------------------------------------------------------------------------------------------

The .csv files are the actual data. For the code, see either main.py or final_script.ipynb.
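
The link filtering and extraction steps described above could look roughly like the sketch below. It is not taken from main.py or final_script.ipynb. It assumes the index file stores each link under a "location" key, and the function names (collect_locations, is_downloadable, filter_links, gunzip) are hypothetical. The .csv conversion is left out because it depends on how the records are flattened.

```python
import gzip
import json
import shutil
from pathlib import Path
from urllib.parse import urlparse


def collect_locations(node):
    """Recursively yield every string stored under a "location" key.

    Assumption: the index file keeps its links under "location" keys.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "location" and isinstance(value, str):
                yield value
            else:
                yield from collect_locations(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_locations(item)


def is_downloadable(url):
    """True when the URL path (ignoring any query string) ends in .json.gz."""
    return urlparse(url).path.lower().endswith(".json.gz")


def filter_links(index_path, out_path):
    """Write the downloadable links, one per line, and return them as a list."""
    with open(index_path, encoding="utf-8") as fh:
        index = json.load(fh)
    links = [url for url in collect_locations(index) if is_downloadable(url)]
    Path(out_path).write_text("".join(f"{url}\n" for url in links), encoding="utf-8")
    return links


def gunzip(gz_path):
    """Decompress name.json.gz to name.json in the same folder; return the new path."""
    gz_path = Path(gz_path)
    json_path = gz_path.with_suffix("")  # strips only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(json_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams the data instead of loading it all
    return json_path
```

For example, filter_links("2022-10-21_Blue-Cross-and-Blue-Shield-of-Illinois_index.json", "2022-10-21_Blue-Cross-and-Blue-Shield-of-Illinois_index.txt") would produce the list of downloadable links, and gunzip can then be called on each file after it has been downloaded.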