Every year, researchers perform thousands of studies that generate high-throughput molecular data. Journal editors and research funders require that these datasets be shared publicly so that others can verify and reuse them. Repositories like Gene Expression Omnibus (GEO) now store hundreds of thousands of high-throughput molecular datasets, which offer researchers exciting opportunities for reuse. However, because of the sheer number of datasets and a lack of formal annotations describing them, it is difficult to find datasets relevant to a particular research topic. This is especially true when researchers seek to gather many datasets on a topic. We created GEOfinder to help improve this process.
GEOfinder assumes that a researcher has already found some datasets relevant to their research topic and wishes to find more. They might have found these datasets by reading journal articles or by using GEO's "Advanced Search" feature. Whereas GEO supports keyword-based searches, GEOfinder uses semantic search based on text embeddings derived from a large language model. We have found that this approach often outperforms or complements GEO's search tool.
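To make the idea of embedding-based semantic search concrete, here is a minimal sketch in Python. It is not GEOfinder's actual implementation: it assumes each series already has an embedding vector (in practice produced by a language model from the series' text), averages the embeddings of the datasets the researcher supplied, and ranks every other series by cosine similarity to that average. The function names `cosine_similarity` and `rank_candidates`, the centroid strategy, and the toy three-dimensional vectors are all hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    seed_ids: List[str],
    embeddings: Dict[str, List[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical ranking: score non-seed series against the mean seed embedding."""
    dim = len(next(iter(embeddings.values())))
    # Average the seed embeddings into a single query vector (the centroid).
    centroid = [0.0] * dim
    for sid in seed_ids:
        for i, value in enumerate(embeddings[sid]):
            centroid[i] += value / len(seed_ids)
    # Score every series the researcher has not already supplied.
    scores = [
        (gse, cosine_similarity(centroid, vec))
        for gse, vec in embeddings.items()
        if gse not in seed_ids
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy embeddings (made up for illustration; real ones have hundreds of dimensions).
embeddings = {
    "GSE1001": [0.9, 0.1, 0.0],
    "GSE1002": [0.8, 0.2, 0.1],
    "GSE2001": [0.85, 0.15, 0.05],
    "GSE3001": [0.0, 0.1, 0.95],
}
results = rank_candidates(["GSE1001", "GSE1002"], embeddings, top_k=2)
for gse, score in results:
    print(gse, round(score, 3))
```

Because the toy vector for GSE2001 points in nearly the same direction as the two seed series, it ranks first, while GSE3001 ranks last even though it shares no keywords with anything; this is the property that lets a semantic approach surface datasets a keyword search can miss. Averaging the seeds is only one option; scoring each candidate by its maximum similarity to any seed is a common alternative when the seeds cover several subtopics.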
To get started, you can manually enter GEO accession IDs below. Alternatively, you can use GEO's search tool to find datasets and import them into GEOfinder. To do so, run a search on the GEO website, then select "Series" under "Entry type" in the left panel. Each series in the search results has a checkbox; select the series that align with your research topic. Then click "Send to," select "File," and click "Create File." Upload the resulting file to GEOfinder using the upload feature (paperclip icon) below.
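If you want to check which series a downloaded GEO file contains before uploading it, the sketch below pulls out the series accession IDs. It is a hypothetical helper, not part of GEOfinder, and it assumes only that each series appears in the file as "GSE" followed by digits; the exact layout of GEO's export is not described here, and the sample text is invented.

```python
import re
from typing import List

# GEO series accessions start with "GSE" followed by digits (e.g., GSE12345).
# The word boundaries stop partial matches such as a directory name "GSE12nnn".
GSE_PATTERN = re.compile(r"\bGSE\d+\b")


def extract_series_ids(export_text: str) -> List[str]:
    """Return unique GSE accession IDs in the order they first appear."""
    seen = set()
    ids = []
    for match in GSE_PATTERN.finditer(export_text):
        accession = match.group(0)
        if accession not in seen:  # skip repeated mentions of the same series
            seen.add(accession)
            ids.append(accession)
    return ids


# Invented sample text; a real export will look different.
sample = (
    "1. Example study title\n"
    "Series\tAccession: GSE12345\n"
    "Download: /geo/series/GSE12nnn/GSE12345/\n"
    "2. Another study title\n"
    "Series\tAccession: GSE67890\n"
)
series_ids = extract_series_ids(sample)
print(series_ids)
```

Deduplicating while preserving order keeps the IDs in the same sequence you selected them in GEO, which makes it easy to compare the list against your search results.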
Let us know if you run into problems or would like to request additional features.