Where are artificial intelligence chatbots like ChatGPT getting their healthcare information? This website, for one, it turns out.
For an April 19 story, The Washington Post analyzed Google's C4 data set, which has been used to train large language models including Google's T5 and Facebook's LLaMA (ChatGPT owner OpenAI does not disclose where its training data comes from). The Post ranked 10 million websites by the number of "tokens," or words and phrases, each contributed to the data set.
Here are the top five science and health sources in the Post's ranking:
1. journals.plos.org
2. frontiersin.org
3. link.springer.com
4. ncbi.nlm.nih.gov
5. nature.com
The data set also includes content from Becker's, including Becker's Hospital Review (ranked No. 3,824), Becker's ASC (No. 18,003), Becker's Spine (No. 23,586) and Becker's Dental (No. 946,137).