Website Name: Care First

This folder has the data for the link:
https://individual.carefirst.com/individuals-families/mandates-policies/machine-readable-file.page?submit=true&componentID=1645309246851

There were 6 .json files containing links to downloadable files:
1. Table-Of-Content-Carefirst-HMO.json
2. Table-Of-Content-Carefirst-PAR.json
3. Table-Of-Content-Carefirst-PPO.json
4. Table-Of-Content-CFA.json
5. Table-Of-Content-NCAS.json
6. Table-Of-Content-Netlease-FlexLink.json

---------------------------------------------Details----------------------------------------------------------------------
For File_1:
File_1.reporting_entity_type: HEALTH INSURANCE ISSUER
File_1.reporting_entity_name: CareFirst Inc
-->3386 links were found in ./fileData1/Table-Of-Content-Carefirst-HMO.json
Not all of these links end in .json.gz; some lead to other webpages with nothing to download, so the links are filtered down to those from which data can be downloaded.
All the downloadable links are also written to this file: Table-Of-Content-Carefirst-PAR.tx
========================================================================================================================
For File_2:
File_2.reporting_entity_type: HEALTH INSURANCE ISSUER
File_2.reporting_entity_name: CareFirst Inc
These links are expired!!!
========================================================================================================================
For File_3:
File_3.reporting_entity_type: HEALTH INSURANCE ISSUER
File_3.reporting_entity_name: CareFirst Inc
-->4018 links for json.gz were found in Table-Of-Content-Carefirst-PPO.json
Not all of these links end in .json.gz; some lead to other webpages with nothing to download, so the links are filtered down to those from which data can be downloaded.
All the downloadable links are also written to this file: Table-Of-Content-Carefirst-PPO.txt
========================================================================================================================
For File_4:
File_4.reporting_entity_type: HEALTH INSURANCE ISSUER
File_4.reporting_entity_name: CareFirst Inc
-->1047 links for json.gz were found in Table-Of-Content-CFA.json
Not all of these links end in .json.gz; some lead to other webpages with nothing to download, so the links are filtered down to those from which data can be downloaded.
All the downloadable links are also written to this file: Table-Of-Content-CFA.json.txt
========================================================================================================================
For File_5:
File_5.reporting_entity_type: HEALTH INSURANCE ISSUER
File_5.reporting_entity_name: CareFirst Inc
-->318 links for json.gz were found in Table-Of-Content-NCAS.json
Not all of these links end in .json.gz; some lead to other webpages with nothing to download, so the links are filtered down to those from which data can be downloaded.
All the downloadable links are also written to this file: Table-Of-Content-NCAS.json.txt
========================================================================================================================
For File_6:
File_6.reporting_entity_type: HEALTH INSURANCE ISSUER
File_6.reporting_entity_name: CareFirst Inc
-->2940 links for json.gz were found in Table-Of-Content-Netlease-FlexLink.json
Not all of these links end in .json.gz; some lead to other webpages with nothing to download, so the links are filtered down to those from which data can be downloaded.
All the downloadable links are also written to this file: Table-Of-Content-Netlease-FlexLink.txt
-------------------------------------------------------End--------------------------------------------------------------------

All the downloaded files are extracted from .json.gz to .json and then converted into .csv.
After conversion, the .json files are deleted to save storage space.
The .csv files are the actual data!
For the code, check out either main.py or final_script.ipynb.
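
The link filtering and the .json.gz extraction described above can be sketched roughly as follows. This is only an illustration, not the code in main.py: the function names are hypothetical, and it assumes that the download URLs in each Table-Of-Content file are stored as strings under keys named "location". The CSV conversion is left out because the column layout depends on choices made in main.py.

```python
import gzip
import json
import shutil
from pathlib import Path
from urllib.parse import urlparse


def collect_locations(node):
    """Recursively yield every string stored under a "location" key.

    Assumption: download URLs live under keys named "location".
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "location" and isinstance(value, str):
                yield value
            else:
                yield from collect_locations(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_locations(item)


def filter_downloadable(urls):
    """Keep URLs whose path (query string ignored) ends in .json.gz, without duplicates."""
    seen = set()
    kept = []
    for url in urls:
        if urlparse(url).path.endswith(".json.gz") and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


def write_link_list(toc_path, out_path):
    """Read a Table-Of-Content .json file and write its downloadable links, one per line."""
    toc = json.loads(Path(toc_path).read_text(encoding="utf-8"))
    links = filter_downloadable(collect_locations(toc))
    Path(out_path).write_text("\n".join(links) + "\n", encoding="utf-8")
    return links


def gunzip_json(gz_path):
    """Decompress name.json.gz to name.json in the same folder and return the new path."""
    gz_path = Path(gz_path)
    json_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(json_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return json_path
```

For example, write_link_list("Table-Of-Content-Carefirst-PPO.json", "Table-Of-Content-Carefirst-PPO.txt") would produce a link list like the .txt files mentioned above. The check is made on the URL path rather than the whole URL so that a link with a query string after .json.gz is still kept.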