It's hard to believe I first experimented with Diffbot nearly five years ago. You can see that first post up on the Adobe Medium account - Natural Language Processing, Adobe PDF Extract, and Deep PDF Intelligence. Since then I've tested out various APIs and features from them, and I was lucky enough to connect with them recently about a new initiative, a web search API.

There are multiple examples of this out in the wild already, but most just scrape/hack against Google. Google had an API, the Custom Search JSON API (I even covered it back when folks still talked about the JAMStack), but that API is now deprecated and officially turning off January 1, 2027. Diffbot's API (which quietly launched about two weeks ago) runs against their own crawled index.

Why does this matter? Honestly, the docs do a damn good job of explaining why you should care (emphasis mine): "Candidates are retrieved and reranked with a cross encoder model trained to rank factual relevance over popularity, primary sources…