Privacy Protection in Personalized Web Search

1Ritu Jhalani

1Assistant Professor, IIIM, Jaipur

I. Introduction

Personalized web search (PWS) is a promising way to improve the accuracy of web search. However, effective personalization requires collecting and summarizing user information, which has raised serious privacy concerns among many users. In fact, these concerns have become one of the main obstacles to the deployment of personalized search applications, and providing personalization while protecting privacy remains a challenge. In this report, we discuss issues related to privacy protection in personalized search.

Search engines are very important for ordinary users who are looking for information online. A search fails when the engine returns irrelevant results that do not match the user’s real intent. Such irrelevant results are caused by the wide variety of users’ contexts and backgrounds, as well as by ambiguity in the query text. PWS is a general search technique that provides search results tailored to the needs of the individual user. For this purpose, user information is collected and analyzed to infer the intent behind each query. There are two types of PWS:

 a) Click-log based

 b) Profile based

The click-log-based method relies on the pages the user clicked in their query history. It can only handle repeated queries from the same user, which is the main limitation of the method.
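The click-log idea can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the cited work: the class name `ClickLogReranker`, its methods, and the example URLs are all hypothetical. It shows why the approach helps only with repeated queries, since a query that has never been seen before is returned in the engine’s original order.

```python
from collections import Counter, defaultdict


class ClickLogReranker:
    """Hypothetical click-log personalizer for a single user."""

    def __init__(self):
        # Normalized query -> Counter of URLs the user clicked for that query.
        self.clicks = defaultdict(Counter)

    @staticmethod
    def _normalize(query):
        return query.lower().strip()

    def record_click(self, query, url):
        self.clicks[self._normalize(query)][url] += 1

    def rerank(self, query, results):
        counts = self.clicks.get(self._normalize(query))
        if not counts:
            # Unseen query: the click log carries no signal.
            return list(results)
        # sorted() is stable, so ties keep the engine's original order.
        return sorted(results, key=lambda url: -counts[url])
```

Here a previously clicked URL moves to the top only when the same (normalized) query is issued again.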

The profile-based approach is more broadly effective. It improves the search experience by building a sophisticated interest model from the user’s personal and behavioral information, which is collected implicitly from the query history, browsing history, click data, bookmarks, and user documents.
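A profile-based personalizer can be sketched as follows. This is a deliberately simple illustration under stated assumptions: the interest model is a plain term-frequency count, and the function names `build_profile`, `score`, and `rerank` are hypothetical. Real systems use far richer interest models.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_profile(documents):
    """Build a term-frequency interest profile from history texts
    (queries, page titles, bookmarks, ...)."""
    profile = Counter()
    for doc in documents:
        profile.update(tokenize(doc))
    return profile


def score(profile, snippet):
    """Average profile weight of the terms in a result snippet."""
    tokens = tokenize(snippet)
    if not tokens:
        return 0.0
    return sum(profile[t] for t in tokens) / len(tokens)


def rerank(profile, results):
    """results: list of (url, snippet) pairs in engine order."""
    return sorted(results, key=lambda r: -score(profile, r[1]))
```

Unlike the click-log method, this sketch can influence the ranking of a query the user has never issued before, as long as the result snippets share terms with the profile.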

Privacy protection levels in personalized search

Since personal information can reveal details of the user’s private life, such as political orientation, family life, and hobbies, disclosure of this information can cause serious concern for the user.

There is, however, a big difference between user queries and clicked search results. Queries are composed by users and express their information needs, whereas search results are composed by web page publishers. Therefore, a query may contain more personally identifiable information than the search results the user viewed.

Different users have different requirements for privacy protection. Many users do not want to disclose any personal information. Other users may be willing to share some personal information in exchange for better search results or services. Therefore, the privacy protection level needs to be adjustable for different users, so that it meets their different preferences for personalization and privacy protection [1].

Level I: Pseudo-identity

A personalized web search system has Level I privacy protection if the user identity is replaced by a pseudo identity that contains less personally identifiable information than the real identity. Using public databases and IP addresses, user identities can still be mapped to a single user or a small group of users, while geographic information (e.g., cities, states) can be protected.

Level I is the lowest level of privacy protection. Unfortunately, this level is not enough to protect the user’s privacy, because it still allows all descriptions of the user’s information needs to be aggregated under one pseudo identity, which can in turn facilitate identification of the user.
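The weakness of Level I can be made concrete with a small sketch. It assumes, purely for illustration, that the pseudo identity is a keyed hash (HMAC-SHA256) of the real identity; the function names `pseudonym` and `log_query` and the sample identifiers are hypothetical and not taken from [1].

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonym(user_id: str, secret_key: bytes) -> str:
    """Replace a real identity with a stable pseudo identity (assumed scheme)."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def log_query(log, user_id, query, secret_key):
    """Store the query under the pseudo identity only."""
    log[pseudonym(user_id, secret_key)].append(query)


# Every query from one user still ends up in the same bucket.
key = b"server-side-secret"
log = defaultdict(list)
log_query(log, "alice@example.com", "symptoms of diabetes", key)
log_query(log, "alice@example.com", "restaurants near main street", key)
```

The real identity never appears in the log, yet the full query history remains linked to one pseudo identity, which is exactly the aggregation problem described above.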

Level II: Group identity

If a group of users shares a single identity, the PWS system has Level II privacy protection. This level can be implemented by having a group of users send their requests to the search engine under the shared identity, so that the search engine can only build one profile for the whole group, not a profile for each individual user.

This may reduce the effectiveness of personalization, because the information needs of the group are used to approximate the information needs of individual users. Level II provides stronger privacy protection than Level I. A good way to implement Level II privacy protection is to set up a proxy for a group of users, so that all users communicate with the search engine through the proxy.
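The proxy idea can be sketched as follows. The classes `GroupProxy` and `RecordingEngine` are hypothetical stand-ins written for this illustration; a real deployment would forward requests over a network and would also need to hide network-level identifiers.

```python
class RecordingEngine:
    """Stand-in for a search engine that logs the identity it sees."""

    def __init__(self):
        self.seen = []

    def search(self, identity, query):
        self.seen.append((identity, query))
        return [f"result for {query}"]


class GroupProxy:
    """Forwards every member's query under one shared group identity."""

    def __init__(self, group_id, search_fn):
        self.group_id = group_id
        self.search_fn = search_fn  # callable(identity, query) -> list of results

    def search(self, user_id, query):
        # user_id is known to the proxy only; it is never forwarded.
        return self.search_fn(self.group_id, query)


engine = RecordingEngine()
proxy = GroupProxy("group-42", engine.search)
proxy.search("alice", "flu symptoms")
proxy.search("bob", "tennis scores")
```

After these calls the engine’s log contains only the group identity, so any profile it builds describes the group as a whole.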

Level III: No identity

If the user identity is not available to the search engine and the descriptions of the user’s information needs cannot be aggregated on the search engine side, even at the group level, the PWS system has Level III privacy protection.

Because search engines do not have user profiles, personalized web search must be supported on the user’s own computer. Personalized search is achieved by combining general search with localized personalization results.

Level IV: No Personal Information

If the search engine has neither the user identity nor a description of the user’s information needs, the personalized web search system has Level IV privacy protection.

At Level IV, the search engine knows neither who the individual user is nor what the user’s information needs are. Therefore, user privacy is fully protected.

Encryption technology can be used to achieve this level of privacy protection. For example, a search engine can publish its index to a trusted third party; the user sends the query to the third party, which performs the search and returns the results to the user. Another possibility is that the law requires search engines to guarantee that no user information or description of user information needs is stored. In that case, even though the search engine responds directly to user search requests, it retains no record of the user.

Level IV is the highest level of privacy protection for personalized search. However, it comes at a higher cost because of the additional communication and encryption/decryption overhead.

Personalized web search software architecture

Web search applications use a client-server architecture in which a web browser (client) sends a query to a search engine (server). The search engine analyzes the user’s information need, looks up matching documents in its index structure, and returns a ranked list of search results to the user’s web browser. The search engine stores user search logs to support services such as personalization and spam detection. Depending on its policy, the search engine either deletes the logs automatically or keeps them indefinitely.

Based on where personally identifiable information is stored, three software architectures have been proposed and their privacy protection analyzed [1]:

I) Server-side personalization

II) Client personalization

III) Client-server collaboration personalization

For server-side personalization, personally identifiable information is stored on the search engine side. Users create an account or profile that identifies them to the search engine, either by entering information explicitly or by implicitly providing their search history. The advantage of this architecture is that the search engine can use all of its resources in the personalization algorithm. This architecture is used by some general search engines, such as Google Personalized Search. By default, it does not provide even Level I privacy protection. Many users fear that search engines will invade their privacy, which hinders the widespread adoption of this architecture. However, if the search engine replaces user identities with pseudo identities, Level I privacy protection can be achieved. If the user decides to communicate with the search engine through a proxy, Level II privacy protection can be achieved. Because personalization takes place on the search engine side, Level III and Level IV privacy protection cannot be achieved.

For client personalization, personally identifiable information is stored on the user’s personal computer. The client-side personalized search agent can perform query expansion to generate new queries before sending them to the search engine. The agent may also rerank the search results after receiving them from the search engine, so that they match the inferred user preferences. Not only the user’s search behavior but also their contextual activities and personal information can be incorporated into the user profile, which allows richer personalized user models to be built. The two main benefits are that sensitive contextual information is stored and used only on the client, and that the computation and storage needed for personalization are distributed across the clients. The disadvantage is that the personalization algorithm cannot use knowledge that is only available on the server side, such as the page score of a result document. Level II privacy protection is implemented, but if the client communicates with the search engine through an anonymous network, Level III privacy protection can be achieved. Even Level IV privacy protection can be achieved with the cooperation of search engine companies.
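The two operations of the client-side agent, query expansion before the request and reranking after it, can be sketched as below. This is an illustrative sketch only: the profile is assumed to be a simple term-weight `Counter`, and `expand_query`, `personalized_search`, and the `search_fn` callback are hypothetical names.

```python
from collections import Counter


def expand_query(query, profile, max_terms=2):
    """Append the highest-weight profile terms that are not already in the query."""
    query_terms = query.lower().split()
    extra = [t for t, _ in profile.most_common() if t not in query_terms]
    return " ".join(query_terms + extra[:max_terms])


def personalized_search(query, profile, search_fn):
    """search_fn(query) -> list of result snippets from a general search engine."""
    expanded = expand_query(query, profile)
    results = search_fn(expanded)  # only the expanded query leaves the client

    def weight(snippet):
        return sum(profile[w] for w in snippet.lower().split())

    # Rerank locally; the profile itself is never sent to the server.
    return sorted(results, key=lambda s: -weight(s))
```

Note that query expansion trades some privacy for relevance, because the added terms reveal part of the profile to the search engine; reranking alone reveals nothing beyond the original query.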

Client-server collaboration personalization is a compromise between client personalization and server-side personalization. User profiles are stored on the client, but the server is also involved in personalization. At query time, the client extracts context information from the user profile and sends it along with the query to the search engine. The search engine then personalizes the results using the received context. Compared with client personalization, the advantage of this architecture is that it can use the search engine’s internal resources. The main disadvantage is that the condensed contextual information may not be as powerful as the entire user profile. Because the architecture is relatively complex, no personalized search product in this category has appeared so far.
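The division of labor in the collaborative architecture can be sketched as follows. Everything here is an assumption made for illustration: the profile is a term-weight `Counter`, the context is simply its top `k` terms, the `quality` field stands in for server-only knowledge, and `extract_context` and `server_rank` are hypothetical names.

```python
from collections import Counter


def extract_context(profile, query, k=3):
    """Client side: pick the k highest-weight profile terms, not the whole profile."""
    query_terms = query.lower().split()
    candidates = [t for t, _ in profile.most_common() if t not in query_terms]
    return candidates[:k]


def server_rank(query, context, index):
    """Server side: combine the received context with server-only knowledge.

    index: list of dicts with keys "url", "text", and "quality".
    """
    term = query.lower()

    def score(doc):
        words = doc["text"].lower().split()
        overlap = sum(1 for t in context if t in words)
        return overlap + doc["quality"]

    matches = [d for d in index if term in d["text"].lower().split()]
    return [d["url"] for d in sorted(matches, key=score, reverse=True)]
```

The client decides how much of the profile to reveal through `k`; a smaller context protects more of the profile but gives the server less to work with, which mirrors the disadvantage noted above.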

II. Quick review

In PWS (personalized web search), user profiles organized as a hierarchical taxonomy can be generalized online to protect personal privacy without compromising search quality. Researchers are working to improve both search quality and privacy protection through such generalization techniques [2].

One line of work studies privacy protection in PWS applications that model user preferences as hierarchical user profiles. The PWS framework known as UPS (User customizable Privacy-preserving Search) can adaptively generalize profiles for each query while meeting user-specified privacy requirements. The purpose of this runtime generalization is to strike a balance between two predictive metrics: the utility of personalization and the privacy risk of exposing the generalized profile. Two algorithms, GreedyDP and GreedyIL, are used for runtime generalization. The experimental results show that GreedyIL significantly outperforms GreedyDP in terms of efficiency [3].
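The general idea of generalizing a hierarchical profile under a user-specified privacy budget can be illustrated with a toy sketch. This is not GreedyDP or GreedyIL and does not use the metrics defined in [3]; it assumes, for illustration only, that each topic carries a user-assigned sensitivity value, that risk is the sum of the sensitivities of the exposed topics, and that generalization means pruning leaves. All names are hypothetical.

```python
def leaves(profile):
    """Topics that are not the parent of any other exposed topic."""
    parents = {parent for parent, _ in profile.values()}
    return [topic for topic in profile if topic not in parents]


def risk(profile):
    """Assumed risk measure: total sensitivity of the exposed topics."""
    return sum(sensitivity for _, sensitivity in profile.values())


def generalize(profile, max_risk):
    """Greedily prune the most sensitive leaf until risk <= max_risk.

    profile: dict mapping topic -> (parent topic or None, sensitivity).
    """
    exposed = dict(profile)
    while exposed and risk(exposed) > max_risk:
        worst = max(leaves(exposed), key=lambda t: exposed[t][1])
        del exposed[worst]
    return exposed


profile = {
    "root": (None, 0.0),
    "health": ("root", 0.2),
    "hiv": ("health", 0.9),
    "sports": ("root", 0.0),
    "tennis": ("sports", 0.1),
}
public_profile = generalize(profile, max_risk=0.4)
```

With a budget of 0.4 the sketch drops only the most sensitive leaf and keeps its more general parent topic, so the exposed profile is less specific (lower utility) but also less risky, which is the trade-off that runtime generalization is meant to balance.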

The UPS framework is an effective method for retrieving what users want while guaranteeing their privacy [4].

Privacy protection in PWS applications can be achieved by modeling user preferences as a hierarchical user profile, as in the UPS framework, which adaptively generalizes profiles for each query while respecting user-specified privacy requirements. Program fragmentation techniques decompose a program by analyzing its data and control flow [5].

References

  1. X. Shen, B. Tan and C. Zhai, “Privacy Protection in Personalized Search,” ACM SIGIR Forum, vol. 41, no. 1, pp. 4-17, 2007.
  2. V. M. Sharvari and K. Shilpa, “Client Side Privacy Protection Using Personalized Web Search,” in 7th International Conference on Communication, Computing and Virtualization 2016, 2016.
  3. L. Shou, H. Bai, K. Chen and G. Chen, “Supporting Privacy Protection in Personalized Web Search,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 2, pp. 1-15, 2014.
  4. S. U. R. S. K. S. Porna Sai, “Supporting Privacy Protection in Personalized Web Search: A Survey,” Indian Journal of Innovations and Developments, vol. 3, no. 3, pp. 45-49, 2014.
  5. W. Manali and D. Ingle, “Supporting Privacy Protection in Personalized Web Searching and Browsing,” International Journal of Computer Science and Information Technologies, vol. 6, no. 4, pp. 4086-4093, 2015.