What is the Web Discovery Project?
The Web Discovery Project is a privacy-preserving way to contribute to the growth and independence of Brave Search. If you opt in, you’ll contribute some anonymous data about searches and web page visits made within the Brave Browser (including pages arrived at via some, but not all, other search engines). This data helps build the Brave Search independent index, ensures we show relevant results for your search queries, and supports more relevant experiences with Brave products and services. By “data” we mean search queries, search result clicks, the URLs of pages visited in the browser, time spent on those pages, and some metadata about the pages themselves.
The Web Discovery Project runs in the background, so contributors don’t need to do anything extra. Data contributed cannot be linked back to whoever contributed it, or grouped together, which prevents deanonymization attempts. You can opt out at any time.
Why we built the Web Discovery Project
Providing relevant search results is essential to building a search engine people want to use. It’s how we create a private search engine that still competes with big tech on quality and completeness. To ensure search results are as relevant as possible, Brave needs to understand some key things, including:
- How closely search results match the search keywords (matching to exact words, parts of words, or synonyms)
- How recent searches are for those keywords
- How often a search result is clicked for a given keyword
- How popular search keywords are
- What pages are popular or novel
- Which sites only allow crawling by the Google search bot
Ensuring relevance also means reducing the “noise” from web content that makes a search less relevant. For example, if you search for “Europe weather” and see results relating to European history or European business, you would say the results are less relevant to your query. By learning from the Web Discovery Project, Brave Search can filter out this noise in a privacy-preserving way. Making search more relevant shouldn’t come at the expense of your online privacy.
Context
Most search providers (like Google and Microsoft) collect data about your search behavior, both in the search engine and the browser (like Chrome or Edge). This data includes your queries, what search results you click, the URLs of the pages you visit, time spent on those pages, and metadata (such as page title, content-type, etc.) about the pages themselves. Other, non-independent search engines (like DuckDuckGo) don’t necessarily collect data themselves. But they still rely on this kind of collection via their dependence on other big tech indexes (like Bing). And this data can be, and often is, associated with you personally.
Search providers collect this kind of data to continuously grow their indexes (the lists of billions of web pages they draw from to deliver results) and ensure results are relevant and never stale. This collection isn’t inherently bad. But its shortcomings become apparent when you look at Brave’s alternative way:
- Through the Web Discovery Project, you can contribute anonymous, generalized data.
- The Web Discovery Project is designed so that this data cannot be linked to you. This means there’s no data for Brave to sell to advertisers, or lose to theft or hacking, so our privacy promise is backed by technology rather than words.
- Brave’s Web Discovery Project is opt-in and fully transparent.
The protection of unlinkability
Brave doesn’t follow the sneaky practices of big tech search engines. The Web Discovery Project is opt-in, and the data collected through it has specific safeguards to ensure anonymity. Beyond these safeguards, the Web Discovery Project adheres to the principle of “unlinkability.” This means we do not link data to you, your browser, or your device. Brave Search has no concept of a user or session ID, which prevents record linkability. The Web Discovery Project also includes several safeguards to keep out websites or searches that are specific to you, or that contain personal or sensitive information.
What keywords are being searched most often? What websites do those keywords lead to? How are those websites interacted with? These kinds of directional questions help Brave Search navigate the world of available web pages, and separate signal from noise. And this, in turn, helps us understand the parts of the web worth indexing for users.
If you opt in to the Web Discovery Project, your browser processes the following data on your device and sends it securely to Brave’s servers:
- A fraction of the addresses (URLs) of the web pages visited in the Brave Browser, along with engagement metrics (how much time is spent on the page)
- A fraction of the queries (e.g. “New York weather today”) conducted in some search engines (outside of Brave Search) within the Brave Browser, along with the associated click on a result (if any)
- Metadata of those visited pages (e.g. if the page contains a video, info about page author or owner, page title, etc.), never the content of the page itself.
- For a complete list, check out Brave’s GitHub repo
With this data, Brave can learn (in a private, unlinkable way) things like how many visits to a website (e.g. Wikipedia) lasted longer than 20 seconds, or how many times a given query (e.g. “What is Wikipedia?”) led a user to click through to that website. This calibrates Brave Search to know a website is legitimate, and that users find the content valuable. This, in turn, allows the search engine to understand result relevance, and to serve pages with higher relevance at the top of search results.
This data does not allow Brave to know things like associated queries (e.g. other queries conducted by people who searched “What is Wikipedia?”) or the other websites visited. And it of course tells us nothing that would allow us to link the data to an individual or their device.
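As a rough illustration of the kind of aggregate question described above, the following Python sketch counts visits longer than 20 seconds per site. The record shape (`PageVisit`, `host`, `seconds_on_page`) and the function name are hypothetical and are not Brave's actual message format. The point is only that each record stands alone, with no user, session, or device identifier, so per-site totals can be computed but records cannot be grouped by contributor.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageVisit:
    """Hypothetical standalone record: no user, session, or device ID."""
    host: str
    seconds_on_page: float


def count_long_visits(visits, min_seconds=20.0):
    """Count, per host, the visits that lasted longer than min_seconds."""
    counts = Counter()
    for visit in visits:
        if visit.seconds_on_page > min_seconds:
            counts[visit.host] += 1
    return counts


visits = [
    PageVisit("wikipedia.org", 45.0),
    PageVisit("wikipedia.org", 5.0),
    PageVisit("wikipedia.org", 120.0),
    PageVisit("example.com", 30.0),
]
long_visits = count_long_visits(visits)
```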
By default, all users are opted out of the Web Discovery Project. If you’ve chosen to opt in, you can opt out again at any time. Whatever you choose, opt in or opt out, your experience in Brave or Brave Search will not change.
To opt out, open a new tab in the Brave browser and click Settings. Scroll to “Web Discovery Project” and turn this setting off.
The Web Discovery Project is lightweight and runs only in the background. There should be no noticeable impact on browsing speed, page-rendering speed, or other similar metrics. However, there may be some small (but likely unnoticeable) overhead in the form of extra CPU and bandwidth consumed. Note that the Web Discovery Project only runs on desktop devices, so there is no impact on mobile data plans. If you notice performance issues, please notify us immediately.
All URLs sent must be publicly available—that is, they must have the same content regardless of who is contributing them. This can only be true if the pages are not behind a log-in, individual session, or other authentication. All URLs sent must have been visited by at least 20 different people, which establishes a distributed quorum similar to k-anonymity.
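The quorum rule above can be pictured with a simplified Python sketch. This is not Brave's protocol: the real check has to work without revealing who contributed what, whereas this sketch assumes hypothetical opaque contributor tokens purely to show the counting logic. `QUORUM` and `urls_meeting_quorum` are illustrative names.

```python
from collections import defaultdict

QUORUM = 20  # minimum number of distinct contributors, per the text above


def urls_meeting_quorum(reports, quorum=QUORUM):
    """Return URLs reported by at least `quorum` distinct contributors.

    `reports` is an iterable of (contributor_token, url) pairs. The tokens
    are a hypothetical stand-in used only to count distinct contributors.
    """
    seen = defaultdict(set)
    for token, url in reports:
        seen[url].add(token)  # a set ignores repeat reports from one token
    return {url for url, tokens in seen.items() if len(tokens) >= quorum}


reports = [(f"t{i}", "https://example.com/popular") for i in range(20)]
reports += [(f"t{i}", "https://example.com/rare") for i in range(19)]
reports += [("t0", "https://example.com/rare")] * 5  # repeats do not count
released = urls_meeting_quorum(reports)
```

In this sketch only the first URL is released: the second was reported many times, but by just 19 distinct tokens.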
Additionally, a variety of heuristics are applied to rule out URLs that encode access, i.e. capability URLs (such as shared docs, Dropbox links, invoice links, etc.). By design, none of these URLs are sent. And, even if they somehow were, the record-unlinkability protocol means no one with access to the data could recover other URLs from the same origin, or associate any data with anyone.
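To give a feel for what such a heuristic might look like, here is a minimal Python sketch. It is an assumption for illustration, not one of Brave's actual rules: it flags any URL whose path segments or query values contain a long run of mixed letters and digits, which is how random access tokens often look. The names `TOKEN_RE`, `looks_like_token`, and `may_be_capability_url` are hypothetical.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical heuristic: 16 or more characters drawn from letters, digits,
# '-' and '_' that mix letters and digits resemble a random access token.
TOKEN_RE = re.compile(r"^[A-Za-z0-9_-]{16,}$")


def looks_like_token(segment):
    """Return True if a path segment or query value resembles a token."""
    return (
        bool(TOKEN_RE.match(segment))
        and any(c.isdigit() for c in segment)
        and any(c.isalpha() for c in segment)
    )


def may_be_capability_url(url):
    """Return True if any path segment or query value looks like a token."""
    parts = urlsplit(url)
    path_segments = [s for s in parts.path.split("/") if s]
    query_values = [value for _, value in parse_qsl(parts.query)]
    return any(looks_like_token(s) for s in path_segments + query_values)
```

A check like this will also flag some harmless URLs (for example, a long article slug that contains a year). For a privacy filter that trade-off is usually acceptable, because discarding a public URL costs far less than sending a private one.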
The above protections also apply to search queries. Any query containing what appears to be personal data, such as emails, phone numbers, or hashes, is automatically discarded rather than sent.
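A query filter of this kind could be sketched in Python as follows. The regular expressions and the function name `is_safe_to_send` are assumptions made for illustration. They are deliberately simple and are not the patterns Brave uses, which would need to be broader and more careful.

```python
import re

# Hypothetical patterns, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")   # 9+ digit-like characters
HASH_RE = re.compile(r"\b[0-9a-fA-F]{32,}\b")     # e.g. an MD5 or SHA hex digest


def is_safe_to_send(query):
    """Return False if the query appears to contain personal data."""
    return not any(p.search(query) for p in (EMAIL_RE, PHONE_RE, HASH_RE))
```

For example, `is_safe_to_send("New York weather today")` returns `True`, while a query containing an email address, a phone number, or a 32-character hex string returns `False` and would be dropped on the device.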
- An overview of the Web Discovery Project is available in Brave’s GitHub repo.
- Read the top-level README.
- View the source code.
If you spot a potential problem, please create an issue on the repo, or contact us.