Web crawlers

Subclass of:

707 - Data processing: database and file management or data structures

707705000 - DATABASE AND FILE ACCESS

707706000 - Search engines

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number	Description	Number of patent applications / Date published
707710000	Category specific web crawling	159

Document	Title	Date
Entries
20100049761	SEARCH ENGINE METHOD AND SYSTEM UTILIZING MULTIPLE CONTEXTS - A method for context-based searching includes retrieving content over a computer network, segmenting the content into a plurality of cohesive segments, and identifying at least one cohesive segment of the plurality of cohesive segments with at least one context of a plurality of contexts. In the method, the plurality of contexts are resident on one or more computer-readable storage media in a searching system. The method further includes indexing, in the plurality of contexts, the plurality of cohesive segments identified with the plurality of contexts.	02-25-2010
20100070485	Social Analytics System and Method For Analyzing Conversations in Social Media - Conversations in an online content universe are monitored. A social analysis module analyzes individual conversations between publishers in the online content universe. Publishers that influence a conversation are identified.	03-18-2010
20100076954	Representative Document Selection for Sets of Duplicate Documents in a Web Crawler System - Duplicate documents are detected in a web crawler system. Upon receiving a newly crawled document, a set of documents, if any, sharing the same content as the newly crawled document is identified. Information identifying the newly crawled document and the selected set of documents is merged into information identifying a new set of documents. Duplicate documents are included and excluded from the new set of documents based on a query independent metric for each such document. A single representative document for the new set of documents is identified in accordance with a set of predefined conditions.	03-25-2010
20100082595	Data Retrieving and Using Method and Device thereof - A data retrieving and using method includes following steps: linking a designated website to obtain a corresponding webpage data, retrieving a specific key information of the corresponding webpage data by an information integrating module and/or an information integrating application software, and integrating the specific key information as an information demo zone and a hyperlink to a frame of a user interface.	04-01-2010
20100082596	Video-related meta data engine system and method - An engine for use with a display is disclosed. The engine includes a presenter that presents at least two audio/visual works, at least one software application capable of at least one metadata-related interaction with the at least two audio/visual works, communication points over which the audio/visual works are received, and over which at least a portion of the at least one metadata-related interaction occurs, and a hierarchical taxonomy that effects a common metadata reference to each recurrence of a particular object across the audio/visual works, and across each of the at least one metadata-related interaction.	04-01-2010
20100082597	Methods and apparatus for automated true object-based image analysis and retrieval - The present invention is an automated and extensible system for the analysis and retrieval of images based on region-of-interest (ROI) analysis of one or more true objects depicted by an image. The system uses a Regions Of Interest (ROI) database that is a relational or analytical database containing searchable vectors that represent the images stored in a repository. Entries in the ROI database are created by an image locator and ROI classifier that work in tandem to locate images within the repository and extract the relevant information that will be stored in the ROI database. Unlike existing region-of-interest search systems, the ROI classifier analyzes objects in an image to arrive at the actual features of the true object, instead of merely describing the features of the image of that object. Graphical searches are performed by the collaborative workings of an image retrieval module, an image search requestor and an ROI query module. The image search requestor is an abstraction layer that translates user or agent search requests into the language understood by the ROI query.	04-01-2010
20100088308	System and Method for Automated Discovery, Binding, and Integration of Non-Registered Geospatial Web Services - A method and computer system for identifying internet web pages containing documents that comply with a predetermined XML schema. The method includes searching the internet with a search engine for web pages using initial search terms and identifying a first set of HTTP URLs, web crawling at least the first set of HTTP URLs to identify additional HTTP URLs, appending a query to the identified URLs, and evaluating the responses to the query to determine which responses comply with the predetermined XML schema. The XML schema can be a Web Mapping Services schema. The system can store responses that comply with the XML schema in a database of servers, periodically check the database for validity, and convert the map requests for map servers in the database to a GIDB Portal Interface API.	04-08-2010
20100094859	Folksonomy-Enhanced Enterprise-Centric Collaboration and Knowledge Management System - An enterprise search system includes a server coupled to a data repository storing information specific to persons engaged in the enterprise, including enterprise-related activity and data related to such activity, and a search application executing on the server from a digital media accessible to the server. The search application, in response to criteria entered, searches sources within the enterprise and returns results specifically associated with additional information specific to individual ones of the persons engaged in the enterprise.	04-15-2010
20100094860	INDEXING ONLINE ADVERTISEMENTS - In one embodiment, a method for a detection server in communication with each of multiple web pages of multiple websites on multiple web servers, the detection server in communication with an ad indexing server, includes automatically accessing from the detection server a file for rendering the web page from a web server, automatically building an object model of the web page at the detection server using the accessed file, automatically scanning the object model at the detection server for one or more elements that are advertisements, automatically analyzing each scanned advertisement at the detection server to determine one or more attributes of the scanned advertisement, and automatically storing data at the ad indexing server on the determined attributes of the scanned advertisements found at the detection server to facilitate an indexing of advertisements on the web pages of the websites.	04-15-2010
20100106705	SOURCE CODE SEARCH ENGINE - In an embodiment, a method of operating a software search engine is provided. The method includes populating a software code database from one or more sources of source code. The method also includes receiving a search query for a software code search engine (	04-29-2010
20100106706	Method and apparatus for identifying related searches in a database search system - A method of generating a search result list also provides related searches for use by a searcher. Search listings which generate a match with a search request submitted by the searcher are identified in a pay-for-placement database which includes a plurality of search listings. Related search listings contained in a related search database generated from the pay-for-placement database are identified as relevant to the search request. A search result list is returned to the searcher including the identified search listings and one or more of the identified search listings.	04-29-2010
20100114857	USER INTERFACE WITH AVAILABLE MULTIMEDIA CONTENT FROM MULTIPLE MULTIMEDIA WEBSITES - Automatically and repeatedly crawling multiple multimedia websites to identify and collect information about the multimedia content that is available for delivery over the Internet to a client device for playback on a media player operating on the client device. In one embodiment, the method normalizes the collected information by converting the different formats of the collected information into a common format and converting the different nomenclatures of the collected information into a common nomenclature. The method updates an index with the normalized information, and sends a data feed to the client device to populate the user interface on the client device with the normalized information of the index. The user interface allows the user to navigate and select the multimedia content that is available for delivery over the Internet for playback on the media player.	05-06-2010
20100114858	HOST-BASED SEED SELECTION ALGORITHM FOR WEB CRAWLERS - A host-based seed selection process considers factors such as quality, importance and potential yield of hosts in a decision to use a document of a host as a seed. A subset of a plurality of hosts is determined, including some but not all of the plurality of the hosts, according to an indication of importance of the hosts, according to an expected yield of new documents for the hosts, and according to preferences for the markets the hosts belong to. At least one seed is generated for each host of the determined subset of hosts, wherein each generated at least one seed includes an indication of a document in the linked database of documents. The generated seeds are provided to be accessible by a database crawler.	05-06-2010
20100114859	SYSTEM AND METHOD FOR GENERATING AN ONLINE SUMMARY OF A COLLECTION OF DOCUMENTS - An improved system and method for generating an online summary of a collection of documents is provided. A list of documents may be received, and the titles of the list of documents may be obtained. A set of terms that frequently occur in the titles of the documents may be iteratively expanded and overlapping phrases may be merged until there may be no more terms that occur in the titles with a frequency that exceeds a predefined threshold. In an embodiment, an article summarizer operably coupled to a search engine may be provided to generate a summary of a list of references to web pages in search results using titles of the web pages. The summary of the web pages may then be sent with the list of references to the web pages as search results to a client device for display to a user.	05-06-2010
20100114860	APPARATUS FOR SEARCHING INTERNET-BASED INFORMATION - The present invention relates to an Internet-based information search apparatus for searching for information by accessing web resources through the Internet using both a method of representing the identifier of a web resource and the identifier.	05-06-2010
20100114861	SYSTEM AND METHOD FOR INFORMATION ACQUISITION - A system for information acquisition includes a Global Positioning System (GPS) receiver for determining a current location, a search module connected to the GPS receiver, and a configuration module connected to the search module. At least one request item is stored in the configuration module. The search module is configured for requesting and acquiring associated information with the current location according to the request item to and from an information center through the Internet.	05-06-2010
20100114862	METHOD AND APPARATUS FOR GENERATING A RANKED INDEX OF WEB PAGES - After a sample set of web pages (	05-06-2010
20100114863	Search and storage engine having variable indexing for information associations - An apparatus, system and method for an open indexing system, which includes an indexing engine associated with at least one processor and having one or more open inputs for inputting of indexing criteria, at least one computerized search engine for obtaining information across at least one computing network in accordance with the indexing criteria, at least one repository comprising at least one computing memory for storing information obtained via the at least one computerized search engine and corresponded to the indexing criteria, and at least one reporting engine, wherein an output of the reporting engine is manipulable responsive to modification to one or more categorizations dependent on the indexing criteria, and wherein the output is dependent solely on the information in said at least one repository.	05-06-2010
20100114864	METHOD AND SYSTEM FOR SEARCH ENGINE OPTIMIZATION - A method and system are provided for optimizing multiple website pages for search engine presence and positioning. Rule data collections are constructed by a management engine which may be guided by a consultant. Page selection criteria may be associated with the rule data collections. A rule implementing application program applies page editing actions of the rule data collection on appropriate website pages, thus creating optimized website pages. Thus, automatic implementation of certain search engine optimization (SEO) operations on multiple website pages is provided, decreasing the page editing and programming workload of SEO consultants and website programmers.	05-06-2010
20100121835	System and method for improving integrity of internet search - A system and method are provided to receive a search query from a user, typically via a web browser, the Internet, and a web server. A search engine obtains a set of potential search results based on the search query. For each Internet domain or web site mentioned in the search results, a set of data sources is accessed to obtain information concerning the legitimacy of the business associated with the Internet domain or web site. The legitimacy information is used to reorder or to change or to augment the appearance or presentation of the search result for the Internet domain or web site. The processed search results are returned to the user.	05-13-2010
20100125562	SYSTEM AND METHOD FOR GENERATION OF URL BASED CONTEXT QUERIES - A system and method for generation of URL context queries. A request is received over a network from a user for generation of a URL based context query, wherein the request comprises at least one query generation criteria. A multidimensional dataspace having a spatial axis, a temporal axis, a topical axis and a social axis is searched for clusters of related data objects using the query generation criteria, wherein at least one cluster of data objects relating to the query generation criteria is identified. Permissions are checked relating to each data object cluster of related data objects. If the user does not have permission to view the data object, it is removed from the cluster. A URL having a context query comprising at least one context criteria is generated from the properties of the cluster of data objects. The URL having a context query is then transmitted to the end user.	05-20-2010
20100125563	SYSTEM AND METHOD FOR DERIVING INCOME FROM URL BASED CONTEXT QUERIES - A system and method for deriving income from URL based context queries. A URL based user context query is received over a network from a user, wherein the user context comprises at least one user context criteria. A query is formulated based on the context criteria so as to search for user profile data, social network data, spatial data, temporal data, topical data and context query bid data that is available via the network and relates to the context so as to identify entries in a context query bid database that relate to user context criteria. A dynamic webpage is generated having content relating to the query and advertisements associated with the selected bid are inserted into the webpage. The dynamic webpage is transmitted to the user. The advertiser associated with the selected bid is charged a fee when a user interface event relating to the dynamic webpage occurs.	05-20-2010
20100125564	Mobile SiteMaps - A method of analyzing documents or relationships between documents includes receiving a notification of an available metadata document containing information about one or more network-accessible documents, obtaining a document format indicator associated with the metadata document, selecting a document crawler using the document format indicator, and crawling at least some of the network-accessible documents using the selected document crawler.	05-20-2010
20100131487	HTTP CACHE WITH URL REWRITING - URL rewriting is a common technique for allowing users to interact with internet resources using easy to remember and search engine friendly URLs. When URL rewriting involves conditions derived for sources other than the URL, inconsistencies in HTTP kernel cache and HTTP user output cache may arise. Methods and a system for rewriting a URL while preserving cache integrity are disclosed herein. Conditions used by a rule set to rewrite a URL may be determined as cache friendly conditions or cache unfriendly conditions. If cache unfriendly conditions exist, the HTTP kernel cache is disabled and the HTTP user output cache is varied based upon a key. If no cache unfriendly conditions exist, then the HTTP kernel cache is not disabled and the HTTP user output cache is not varied. A rule set is applied to the URL and a URL rewrite is performed to create a rewritten URL.	05-27-2010
20100131488	Digital Images of Web Pages - Embodiments of methods, apparatuses, or systems relating to digital images of web pages are disclosed.	05-27-2010
20100145924	Methods and Devices for Locating Information on a Web Page - Methods and systems for locating information on a web page are presented. A client device, such as a wireless communication device, transmits a request for a web page and a search value. A web proxy receives the request and retrieves the web page from a web server. The web proxy then pre-processes the web page to divide it into web page segments. Alternatively, the request may arrive after the web proxy has pre-processed the web page. The web proxy preferably transmits one or more web page segments containing the search value to the client device, and the client device displays these segments. Furthermore, the client device may transmit the request for the web page and the search value directly to the web server, receive the associated web page from the server, and then locate and display the appropriate web page segment(s).	06-10-2010
20100145925	METHOD AND ARRANGEMENT FOR ENABLING COMMUNICATION WITH A CLIENT DEVICE - A method and arrangement for enabling communication with a client device by making a currently valid communication address of the device publicly available. The client device sends a freely composed connectivity key to a publicly available connectivity server, the connectivity key being searchable by means of web searching using a search engine. The client device also sends connectivity parameters to the connectivity server including at least the communication address, which then becomes publicly available in the connectivity server by web searching of the associated connectivity key. If a new currently valid communication address is obtained for the client device, the connectivity parameters can be updated by sending the new communication address to the connectivity server.	06-10-2010
20100145926	SYSTEM FOR PROVIDING ADVERTISEMENTS AND METHOD THEREOF - The present invention relates to an advertisement providing system and a method thereof. The advertisement providing system provides a web page including at least one of web page information, news information, and blog information to a user terminal, and extracts an advertisement keyword of information selected by the user in correspondence to web page provision. When the information selected by the user is an internal page, a web page corresponding to the selected information is provided to the user terminal, advertisement contents matching the extracted advertisement keyword and the number of steps included in user action information are selected and provided based on advertisement register information including a plurality of advertisement contents matching the advertisement keyword and the number of steps, and user action information is updated based on the user action provided by the user terminal. Accordingly, advertisement efficiency can be substantially increased by continuously providing the advertisement desired by the advertisement provider to the user through the number of steps.	06-10-2010
20100161587	Browser Operation With Sets Of Favorites - Methods, apparatus, and products for browser operation with sets of favorites, the browser supporting tabbed browsing, where the browser operation includes opening, by the browser, a first member of a set of favorites in a new tab and loading, by the browser, a Uniform Resource Locator (‘URL’) for each member of the set into navigation memory for back and forward navigation functions for the new tab.	06-24-2010
20100161588	UNSUPERVISED DETECTION OF WEB PAGES CORRESPONDING TO A SIMILARITY CLASS - A method of detecting web pages belonging to at least one similarity class from a plurality of web pages includes determining clusters of the plurality of web pages based on characteristics of the content of the web pages. For each of the determined clusters, at least one metric is determined indicative of similarity among resource locators associated with the web pages of that cluster. A determination of web pages belonging to the at least one similarity class is based on the determined clusters and the determined similarity metrics.	06-24-2010
20100169300	Ranking Oriented Query Clustering and Applications - Techniques described herein allow for suggesting creation of tools for improving search engine performance. Specifically, these tools focus on producing more relevant search engine results via a URL-based query clustering method. These tools first extract tokens from Uniform Resource Locators associated to search queries. With these tokens, these tools form query clusters of common tokens. The resulting clusters can be used to help understand the similarities in user search queries via URL-based cluster queries to produce more relevant search results.	07-01-2010
20100169301	System and method for aggregating and ranking data from a plurality of web sites - System and method for collecting information from a plurality of related sites, analyzing the information and storing the relevant information in a data base for future use. According to one embodiment of the present invention, the system uses the provided list of sites, whether obtained automatically or separately, queries them and analyzes the result retrieved from each site. The information may also optionally and preferably be ranked.	07-01-2010
20100174700	System and method to generate specific DM content for distribution - A content generating process wherein a broadcaster contacts a moderator and makes a request for specific media content. The moderator processes the request for media content and communicates to a population. Said communication prompts or incentivizes at least one member of a population to provide media content to a content library. Said library content may also be filtered by members of a population. The moderator in some aspects supplies media content to a broadcaster who may broadcast the collected media responsive to the request for specific content.	07-08-2010
20100205168	Thread-Based Incremental Web Forum Crawling - The incremental web forum crawling technique described herein is a web forum crawling technique that employs a thread-wise strategy that takes into account thread-level statistics, for example, the number of replies and the frequency of replies, to estimate the activity trend of each thread. To extract such statistical information, the technique employs a simple yet very robust approach to extract the timestamp of each post in a discussion thread. It also employs a regression model to predict the time of the next post for each thread.	08-12-2010
20100223252	METHOD AND SYSTEM FOR WEB SEARCHING - A method and system for providing personalized search results is disclosed. A computer receives input from a user to navigate to a web site (either directly or as a result of choosing a result from a search result page). The computer navigates to the web site and stores information about the web site in a file. The computer determines web sites associated with a search query of the user as the search query is being entered into a search area of a user interface. The associated web sites are sites that have been previously navigated to by the user. The determining step includes obtaining the web sites associated with the search query from a data structure previously generated from the file. The data structure includes parsed entries of URLs associated with the previously navigated web sites. Based on the determining step, web site links corresponding to the associated web sites are displayed as the search query is being entered.	09-02-2010
20100228718	Evaluation of web pages - Determining a web page evaluation value includes obtaining a plurality of web pages with the same or approximately the same content; determining a plurality of generation times and a plurality of first evaluation values that correspond to respective ones of the plurality of web pages; identifying a web page among the plurality of web pages that has the earliest generation time; and determining a second evaluation value of the identified web page according to the plurality of first evaluation values.	09-09-2010
20100228719	PROCESS AND SYSTEM FOR INCORPORATING AUDIT TRAIL INFORMATION OF A MEDIA ASSET INTO THE ASSET ITSELF - An enhanced metadata structure and associated process is provided which captures and stores metadata gathered about the source and usage of a media asset or file. The source and usage metadata is integrated, such as by encoding within the enhanced media file, as the media asset is transferred and used. The integrated metadata accumulates, as a trail of source information and usage information in the enhanced media asset, and can be extracted upon arrival at a target computer system.	09-09-2010
20100241620	APPARATUS AND METHOD FOR DOCUMENT PROCESSING - A computer implemented document processor comprising: an information capture module configured with an interface to multiple document sources and to capture a plurality of data elements from documents sourced; an interface for receiving objectives and/or criteria; an information processor comprising an assessment module operable to analyse and assess a current state indicated by said data elements; one or more of: a profiles module; a scenario module; and a themes module; a memory; and an output module.	09-23-2010
20100241621	Scheduler for Search Engine Crawler - A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.	09-23-2010
20100250515	TRANSFORMING A DESCRIPTION OF SERVICES FOR WEB SERVICES - One embodiment is a method that receives a description of services desired by a service requestor and then crawls web sites to extract information on services offered by service providers. The extracted information is used to transform the description of services desired by the service requestor into an improved description of services.	09-30-2010
20100250516	METHOD AND APPARATUS FOR WEB CRAWLING - A method and system for retrieving data from a webpage is described herein. A scheduler organizes, or rather orders, a group of webpage identifiers according to some predetermined criteria. Based upon this ordering, a fetcher may be configured to fetch data from webpages identified by the identifiers. To promote efficiency and reduce the latency between when a webpage is updated and when the fetcher retrieves data from the webpage, the scheduler may be configured to reorder the identifiers in such a manner that it causes an identifier that was less relevant, and would not have been sent to the fetcher, to become more relevant. In this way, the method and system may be particularly useful for retrieving data related to webpages that are updated frequently, such as social media webpages, for example.	09-30-2010
20100262592	Web Crawler Scheduler that Utilizes Sitemaps from Websites - Methods and systems for a web crawler scheduler that utilizes sitemaps from websites are described. A web crawler scheduling system receives a notification from a website or web server. In response to the notification, the system accesses one or more sitemap(s) for documents associated with the website or web server. The system schedules crawls of the documents based on information identified from the sitemaps. The system crawls at least a subset of the documents scheduled for crawling.	10-14-2010
20100268701	NAVIGATIONAL RANKING FOR FOCUSED CRAWLING - Systems and methods of navigational ranking for focused crawling are disclosed. In an exemplary embodiment, a method may include using a classifier to distinguish at least one target web page from other web pages on a website. The method may also include modeling the web pages on the website by a directed graph G=(V, E), wherein each web page is represented by a vertex (V), and a link between two web pages is represented by an edge (E). The method may also include assigning each web page (u) in V a weight p(u) based on the classifier to calculate a navigational ranking indicating relevance of a web page.	10-21-2010
20100287155	Software And Method That Enables Selection Of One Of A Plurality Of Online Service Providers - A novel electronic information transport component can be incorporated in a wide range of electronic information products, for example magazine collections, to automate the mass distribution of updates, such as current issues, from a remote server to a wide user base having a diversity of computer stations. Advantages of economy, immediacy and ease of use are provided. Extensions of the invention permit automated electronic catalog shopping with order placement and, optionally, order confirmation. A server-based update distribution service is also provided. In addition, an offline web browser system, with hyperlink redirection capabilities, a novel recorded music product with automated update capabilities and an Internet charging mechanism are provided.	11-11-2010
20100306185	Self Populating Address Book - System, methods and computer program products for creating and maintaining an address book are described. The address book may collect or update its existing contact information from sent or received communications. Contact information associated with the existing contacts also may be collected (or updated based on information received) from outside sources (e.g., external to an application hosting or accessing the address book). The address book may intelligently combine profile data from various sources to enrich the existing records associated with the contacts.	12-02-2010
20100318508	Sitemap Generating Client for Web Crawler - Methods and systems for a sitemap generating client for web crawlers are described. The client accesses one or more sources of document information about the documents available on a website, such as the file system, access logs, or pre-made URL lists. Document information is extracted from the sources and one or more sitemaps are generated based on the extracted document information. A notification is transmitted to a remote computer, informing that the sitemap(s) are available for access and likely have been updated. If the remote computer is associated with a web crawler, the remote computer may access the sitemap(s) and use the sitemaps to schedule a crawl of documents included or available on the website.	12-16-2010
20110029503	APPARATUS AND METHODS FOR MANAGING A SOCIAL MEDIA UNIVERSE - Disclosed are methods and apparatus for managing social media universes. In one embodiment, media content and community members that have been associated with a new concept for creating a new universe are searched on a plurality of media content servers. For each found new concept, an association is retained between the new universe for the new concept and any found media content and community members. When a requesting user requests to view the new universe, a representation of the media content and the community members that are associated with the new universe is displayed for the requesting user.	02-03-2011
20110029504	SEARCHING AND ACCESSING DOCUMENTS ON PRIVATE NETWORKS FOR USE WITH CAPTURES FROM RENDERED DOCUMENTS - A facility for exposing an index of private documents is described. In a private network, the facility (1) identifies electronic versions of documents that are available inside the private network, including a distinguished document; (2) constructs an index covering the identified electronic versions of documents; and (3) exports the constructed index from the private network to an index publication server. At the index publication server, the facility (1) receives the exported index; (2) receives a query via a public network; and (3) uses an index, based upon the received index, to generate a query result for the received query that contains the distinguished document.	02-03-2011
20110035367	Methods And System For Efficient Crawling Of Advertiser Landing Page URLs - A method and apparatus for the efficient storage, retrieval, and processing of landing pages and related metadata for use in a paid search advertising business model is provided. These techniques promote efficient crawling in situations including one landing page associated with multiple sponsored listings belonging to the same or different accounts. One or more landing pages may be crawled, based at least in part on one or more of the landing page URLs represented in a table. In an embodiment, each URL identifier is placed in an active or inactive queue, with only entries in the active queue crawled.	02-10-2011
20110040742	VARIABLE DENSITY QUERY ENGINE - Embodiments of the present invention address deficiencies of the art in respect to search engines and provide a novel and non-obvious method, system and computer program product for a variable density query engine. In an embodiment of the invention, a search engine data processing system can be provided. The system can include a content index, and a variable density search engine coupled to the content index. The variable density search engine can include program code enabled to vary a density of entries in a result set according to a varying size of the result set. In this regard, in one aspect of the embodiment, the density can range from a title for each entry in the result set to a full textual description for each entry in the result set to an audiovisual element for each entry in the result set.	02-17-2011
20110040743	SYSTEM AND METHOD FOR ADDING IDENTITY TO WEB RANK - Embodiments of the present invention provide systems, methods and computer program products for generating search results comprising web documents with associated expert information. One embodiment of a method for generating such search results includes receiving one or more search queries, selecting one of the one or more search queries, determining one or more categories of web documents responsive to the selected search query and crawling a web graph of linked web documents to identify one or more web documents tagged as within the one or more categories responsive to the selected search query. The method further includes generating a result set of the one or more web documents identified as within the one or more categories responsive to the selected search query, ranking the result set and generating a list of ranked search results responsive to the selected search.	02-17-2011
20110047141	SYSTEM AND METHOD FOR INCREASING TRAFFIC TO WEBSITES - The present document describes a system and method for changing volume of traffic to a website. The system comprises: a user database and a promotional server in communication with each other. The user database comprises user accounts. Each account is associated with a user identification and is also for keeping track of benefits. The promotional server is configured to perform the steps of: detecting an access request to the website by a user from a Web browser; consulting a user database to identify the user from the access request; identifying a method of calling the website from a value of a field or a value of a variable in the access request; from at least one of the identity of the user and the method of calling the website, establishing, by consulting a promotions database, a determined amount; increasing a value of the points field associated to the user in the user database by the determined amount; sending instructions to the Web browser to display one of: information indicative of at least one of an increase in, and the possibility to increase, the points field associated to the user in the user database, the points field being representative of a benefit to the user; and information indicative of the possibility for the user to register in the user database.	02-24-2011
20110055194	System and Method for Asynchronous Crawling of Enterprise Applications - Methods and systems for providing searchable data associated with enterprise applications are provided. An asynchronous feed may be generated from data stored in a database and searched by search engine crawlers. The feed may be populated with searchable data based on a searchable object definition that describes the location of searchable data within the database. The feed also may enforce access restrictions set by the enterprise applications to prevent unauthorized access to the searchable data.	03-03-2011
20110066609	Crawling Browser-Accessible Applications - Crawling a browser-accessible application by causing a target application and a bridge application to run concurrently in a browser-controllable player, and iteratively receiving from the bridge application current state information of the target application, storing the state information on a data storage device if the state information is not found on the data storage device, where the state information is stored as a descendant state of an initial state of the target application, and interacting with the target application in accordance with a predefined simulation algorithm, thereby effecting a new state of the target application, until a predefined termination condition is reached.	03-17-2011
20110072000	SYSTEMS AND METHODS FOR PROVIDING ADVANCED SEARCH RESULT PAGE CONTENT - The present invention provides a method and system for generating search results including receiving a search request and accessing a corpus of data relating to web content to determine relevant content. The method and system includes determining at least one semantic object in the search results set and generating an object filter on the basis of the at least one semantic object. The method and system further includes generating a search result output display for the presentation of at least a portion of the search result set and active data links for one or more of the semantic objects and toggling the search result output display to present at least a portion of a subset of the search results set in response to selection of a given active data link, the subset including content having semantic object associated therewith.	03-24-2011
20110072001	SYSTEMS AND METHODS FOR PROVIDING ADVANCED SEARCH RESULT PAGE CONTENT - The present invention provides a method and system for generating search results including receiving a search request including at least one search term and accessing a corpus of data to determine relevant content for inclusion in a search result set on the basis of the search request. The method and system includes determining a plurality of applications associated with the search request and generating a search result output display for the presentation of at least a portion of the search result set and at least a portion of the applications.	03-24-2011
20110082853	SYSTEM AND METHOD FOR EXTRACTING CONTENT FOR SUBMISSION TO A SEARCH ENGINE - A system and a method for automatically submitting Web pages to a search engine, which is preferably used for submitting dynamic Web pages, but may optionally be used for any type of Web page. The present invention features a gateway server for providing these Web pages to the search engine, either directly or optionally through an autonomous software search program. Optionally and more preferably, the gateway server modifies the Web page before serving it to the autonomous software search program and/or search engine.	04-07-2011
20110087646	Method and System for Form-Filling Crawl and Associating Rich Keywords - Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.	04-14-2011
20110087647	SYSTEM AND METHOD FOR PROVIDING WEB SEARCH RESULTS TO A PARTICULAR COMPUTER USER BASED ON THE POPULARITY OF THE SEARCH RESULTS WITH OTHER COMPUTER USERS - A system and method for providing Web search results to a particular computer user based on the popularity of the search results with other computer users is described. One embodiment monitors, using one or more servers, at least one Web service for new actions of sharing of Web content by computer users; identifies, from the new actions of sharing of Web content by computer users, a data item that satisfies predetermined interestingness criteria; parses the data item to obtain at least one Uniform Resource Locator (URL); crawls at least one Web page corresponding to the at least one URL to obtain the content of the at least one Web page; analyzes the content of the at least one Web page; and updates an index based on the content of the at least one Web page, the index being usable in processing a Web search query from a particular user.	04-14-2011
20110087648	SEARCH SPAM ANALYSIS AND DETECTION - Defeating click-through cloaking includes retrieving a search results page to set a browser variable, inserting a link to a page into the search results page and clicking through to the page using the inserted link. Investigating cloaking includes providing script associated with a suspected spam URL, modifying the script to de-obfuscate the script and executing the modified script to reveal cloaking logic associated with the script.	04-14-2011
20110099159	System and Methods for Web Data Transformation Sourcing - A computer-implemented system for web data transformation sourcing is disclosed to include a search module defined to receive a set of original input data types and a set of ultimate output data types. The search module is defined to locate one or more web based sources defined to transform the set of original input data types into the set of ultimate output data types. The search module is further defined to generate a transformation solution that when executed utilizes the one or more located web based sources to transform the set of original input data types into the set of ultimate output data types. The transformation solution is digitally conveyed.	04-28-2011
20110106786	HOSTED SEARCHING OF PRIVATE LOCAL AREA NETWORK INFORMATION WITH SUPPORT FOR ADD-ON APPLICATION - Hosted searching of private LAN information is described. The apparatus includes a LAN crawler to automatically and repeatedly crawl a LAN having multiple devices, using a discovery module to discover the devices, a generic-probing module to attempt to collect the descriptive information according to a first set of probing requirements, and multiple specific-probing plug-ins each of which attempts to collect the descriptive information according to a second set of specific probing requirements. In another embodiment, the apparatus also includes a hosted on-demand search system including a centralized-search server to create and synchronize a private search database. The centralized-search server includes an application interface to receive a request to access the private search database from a third-party add-on application, to provide the accessed information to the third-party add-on application, and to receive from the third-party add-on application an application rendered component to be displayed on the user interface.	05-05-2011
20110106787	HOSTED SEARCHING OF PRIVATE LOCAL AREA NETWORK INFORMATION - Hosted searching of different local area network (LAN) information is described. The apparatus for hosted searching of different private LAN information includes a LAN crawler to automatically and repeatedly crawl a LAN having multiple devices, and a hosted on-demand search system including a set of one or more centralized-search servers to create and synchronize a separate private search database for each of the private LANs based on received reports from different instances of the LAN crawler deployed on the multiple private LANs, at least some of which are operated by different entities.	05-05-2011
20110119247	METHOD AND APPARATUS FOR OBTAINING AND PROVIDING ADDITIONAL INFORMATION ABOUT WEB RESOURCES - A method of obtaining information in a client connected to a web server, the method comprising: requesting the web server for additional information about at least one web resource; and receiving the additional information about the at least one web resource from the web server. A method of providing information in a web server connected to a client, the method comprising: receiving a request for additional information about at least one web resource from the client; and transmitting the additional information about the at least one web resource to the client in response to the request.	05-19-2011
20110125726	SMART ALGORITHM FOR READING FROM CRAWL QUEUE - A smart algorithm for processing transactions from a crawl queue. If the crawler has in memory a predetermined number of URLs for a given host, the crawler reads from the crawl queue URLs from other hosts. As a result the crawler processes multiple hosts concurrently, and thus, uses machine resources more effectively and efficiently to process the URLs. The smart algorithm can further consider other criteria in deciding which URLs to read from the queue. These criteria can include the response time for each repository (host) the crawler processes. Additionally, the crawler can allocate its resources according to content groups (e.g., two pools), one group for faster content delivery and the second group for slower content delivery. Thus, crawler resources can be partitioned or divided across different pools depending on repository response time. Other criteria can be provided and considered as well.	05-26-2011
20110137885	Adaptive Selection of a Search Engine on a Wireless Communication Device - A wireless communication device communicates with a web site over an established communication link. The device includes a controller that automatically detects whether the website provides a search engine for use by a user. If the web site provides a search engine, a controller at the device configures an adaptive, context-sensitive search function of the browser to receive user input, and to perform a search based on the user input. A user interface includes a display to display the results of the search.	06-09-2011
20110137886	Data-Centric Search Engine Architecture - Described is a data-centric web search engine technology/architecture, in which document metadata, including offline-extracted metadata, is used as part of a search indexing and ranking pipeline. A web data management component receives crawled documents and extracts document metadata from the documents. An indexing component uses the document metadata to build an index for the documents. A serving component uses the index and the document metadata to serve content, e.g., search results. Also described is the use of query metadata extracted from queries of a query log for use in the pipeline.	06-09-2011
20110145215	METHOD FOR ANALYZING WEB SPACE DATA - A method for analyzing data from the web that determines the importance that a chosen subject has in society, e.g., subject matter relating to a concert, a scientific discovery, a football match, a person, a corporation, a brand, or a car, and analyzes such data that can represent the entire society better than the known techniques. The method according to the invention can avoid malicious alterations and is able to measure and detect the temporal relations among all the web resources that talk about a particular topic or subject matter.	06-16-2011
20110145216	FILE CHANGE DETECTOR AND TRACKER - A method for detecting changes in a computing environment. In an example embodiment, the method includes observing a file system of the computing environment during a predetermined time interval and providing a signal when a predetermined change to the file system is detected during the predetermined time interval; employing the signal to log a description of detected file system changes; and using a logged description of the file system changes to perform an incremental crawl of the file system. In a more specific embodiment, the predetermined time interval includes an interval of time between crawls of the file system. The predetermined change to the file system includes a change to content of a file included in the file system, a change in user access rights to a file, a change in a location of a file of the file system, a change in a folder of the file system, a deletion of a file or folder in the file system, and so on.	06-16-2011
20110145217	SYSTEMS AND METHODS FOR FACILITATING DATA DISCOVERY - A system for facilitating data discovery on a network, wherein the network has one or more data storage devices. The system may include a crawler program configured to select at least a first set of files and a second set of files, each of the first set of files and the second set of files being stored in at least one of the one or more data storage devices. The system may also include a data fetcher program configured to obtain a copy of the first set of files, the data fetcher program being further configured to resist against obtaining a copy of the second set of files. The system may also include circuit hardware implementing one or more functions of one or more of the crawler program and the data fetcher program.	06-16-2011
20110145218	SEARCH SERVICE ADMINISTRATION WEB SERVICE PROTOCOL - The embodiments described herein generally relate to a method and system for enabling a client to configure and control the crawling function available through a crawl configuration Web service. A client is able to configure and control the crawling function by defining the URL space of the crawl. Such space may be defined by configuring the starting point(s) and other properties of the crawl. The client further configures the crawling function by creating and configuring a content source and/or a crawl rule. Further, a client defines authentication information applicable to the crawl to enable the discovery and retrieval of electronic documents requiring authentication and/or authorization information for access thereof. A protocol governs the format, structure and syntax (using a Web Services Description Language schema) of messages for communicating to and from the Web crawler through an application programming interface on a server hosting the crawler application.	06-16-2011
20110145219	OBJECTIVE AND SUBJECTIVE RANKING OF COMMENTS - A system may receive a request for comments associated with a particular document, identify a comment associated with the particular document, generate an objective score for the comment that is independent of a user associated with the request, identify the user associated with the request, generate a subjective score for the comment based on parameters associated with the identified user, generate a combined score for the comment by combining the objective score and the subjective score, and provide the comment, ranked based on the combined score, to the user for presentation with the particular document.	06-16-2011
20110153588	CREATION OF AD HOC SOCIAL NETWORKS BASED ON ISSUE IDENTIFICATION - An external network is crawled (searched) to identify issues that may exist in the external network. Once an issue is identified, internal networks are crawled to determine one or more experts on the issue. A social network of the one or more experts is created. This way, the experts can quickly address the issue.	06-23-2011
20110153589	DOCUMENT INDEXING BASED ON CATEGORIZATION AND PRIORITIZATION - Disclosed are methods and systems for improving indexing throughput. The methods and systems involve receiving one or more documents for indexing, categorizing the one or more documents based on a document type, a document size and a processing priority, assigning buckets to the categorized one or more documents according to the document type, the document size and the processing priority and scheduling the buckets for processing based on a document type priority, a bucket type and number of threads available to process the buckets.	06-23-2011
20110161309	Method Of Sorting The Result Set Of A Search Engine - A method is disclosed wherein the webpages listed in the result set of a search engine are sorted according to the relevance of the webpages to a list of prioritised search terms. Search terms which are phrases that are delimited by prepositions are considered search terms with high priority. Search terms which are nouns are set to high priority. Search terms which are adjectives, verbs, auxiliary verbs, articles, conjunctions, pronouns and prepositions are set to low priority.	06-30-2011
20110167053	VISUAL AND MULTI-DIMENSIONAL SEARCH - A system that can analyze a multi-dimensional input thereafter establishing a search query based upon extracted features from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.	07-07-2011
20110173176	Automatic Generation of an Interest Network and Tag Filter - Computer system and method automatically generate a social interest network. The social interest network indicates or represents (1) respective relevance between system users and taggers, and (2) respective affinity between users and taggers. A tag-based search engine searches and retrieves tagged contents. The search engine also retrieves semantic information associated with the tagged contents and tagger. Semantic information about the searcher-user is compared to the search retrieved semantic information. A comparator determines respective relevance of taggers to the searcher-user and respective affinity of the searcher-user to the taggers. The social interest network results and enables collaboration between users/taggers and filtering of various search results.	07-14-2011
20110173177	SIGHTFUL CACHE: EFFICIENT INVALIDATION FOR SEARCH ENGINE CACHING - Updated queries are maintained in a cache. A search engine receives a query from a user through a query entry field. The search engine determines search results corresponding to the user query. A new entry mapping the user query to the search results is generated in a cache of results. A web crawler retrieves a new batch of documents for a particular document collection. A search index associated with a search engine is updated to reflect new documents in the document collection. A search engine of queries receives documents from the new batch of documents as inputs. Based on the received documents, the search engine of queries determines which of the queries would have returned the documents as relevant in a search. These queries are determined to be stale and invalidated.	07-14-2011
20110173178	METHOD AND SYSTEM FOR OBTAINING SCRIPT RELATED INFORMATION FOR WEBSITE CRAWLING - A web crawler system has an automatic website crawler and a virtual browser that provides script related information to the website crawler. The virtual browser transforms an HTML document included in a web page of the website into an XML document, and builds a document object model containing document objects in a tree structure based on the XML document. The virtual browser extracts from the DOM scripts that are potentially executable, and executes the extracted scripts using a browser object model provided for the virtual browser containing objects and methods and properties that are used for script execution so as to capture script related information generated by execution of the scripts.	07-14-2011
20110173179	SEARCH ENGINE AND METHOD WITH IMPROVED RELEVANCY, SCOPE, AND TIMELINESS - A search engine and a method achieve timeliness of documents returned in a search result by a relevancy feedback mechanism driven by the frequency in which a URL is returned in recent searches. The relevancy feedback mechanism includes one or more random processes which determine whether or not a cached or indexed web page associated with a URL in the search result should be refreshed. In addition, the random processes also determine whether or not hyperlinks in the cached or indexed web page should be followed to access related web pages. Accesses of web pages resulting from the operations of the random processes are used to update any document index maintained by the search engine. Relevancy scoring functions implemented in look-up tables are also disclosed. A more accurate relevancy scoring function is achieved using a lexicon based on anchortexts of extracted hyperlinks of web documents.	07-14-2011
20110179010	METHOD AND APPARATUS FOR PROVIDING SUPPLEMENTAL VIDEO CONTENT FOR THIRD PARTY WEBSITES - A method, apparatus and article of manufacture for providing supplemental video content for third party websites is disclosed. In one embodiment, coded instructions are transmitted from a content enhancement server to a host server, for incorporation into the webpage source code. The host server is controlled by a first entity and the content enhancement server is controlled by a second entity commercially distinct from the first entity. Keywords are generated by execution of the coded instructions in the webpage received in the client computer from the host server, and the keywords are sent to a content enhancement server, which generates supplemental substantive video content information for transmission to the client.	07-21-2011
20110179011	DATA OBFUSCATION SYSTEM, METHOD, AND COMPUTER IMPLEMENTATION OF DATA OBFUSCATION FOR SECRET DATABASES - A data obfuscation system, method, and computer implementation via software or hardware allows a legitimate user to gain access via a query to data of sufficient granularity to be useful while maintaining the confidentiality of sensitive information about individual records. Output values of a data request are obfuscated in a repeatable manner, via the use of an Obfuscating Function (OF), while maintaining the amount of obfuscation within a range so that the transformed values provide to a user information of a prescribed level of granularity. The data obfuscating system and method is particularly applicable to databases. The data obfuscation engine may be implemented in hardware and/or software within a stand alone or distributed environment.	07-21-2011
20110191321	CONTEXTUAL DISPLAY ADVERTISEMENTS FOR A WEBPAGE - Embodiments of the invention disclose an advertisement or segment of a webpage that displays suggested search queries as selectable links. Suggested queries may be based on content associated with the webpage, or the description of the webpage (such as a URL), or default suggestions. In one example, content of a page is crawled for terms that are mapped to suggested queries. Queries may be represented as textual links or multimedia images embedded in pages accessed over a network, and selection of a query may direct or enhance search engine traffic.	08-04-2011
20110196854	Providing a www access to a web page - A method and a system for providing an Internet access to a web page or a website are disclosed. The files defining the websites are accessed and indexed locally, which allows a publisher or a user of the web site to control the keywords by which the web page or a website can be found on the Internet. The user makes the web page or the website searchable by inputting the index into a search engine available to Internet users. The search engine is adapted to process queries of index input.	08-11-2011
20110202521	Enhanced database search features and methods - The invention teaches systems, methods and devices for searching data storage systems and devices. It is emphasized that this abstract is provided to comply with the rules requiring an abstract that will allow a searcher or other reader to quickly ascertain the subject matter of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.	08-18-2011
20110208714	LARGE SCALE SEARCH BOT DETECTION - A framework may be used for identifying low-rate search bot traffic within query logs by capturing groups of distributed, coordinated search bots. Search log data may be input to a history-based anomaly detection engine to determine if query-click pairs associated with a query are suspicious in view of historical query-click pairs for the query. Users associated with suspicious query-click pairs may be input to a matrix-based bot detection engine to determine correlations between queries submitted by the users. Those users indicating strong correlations may be categorized as bots, whereas those who do not may be categorized as part of flash crowd traffic.	08-25-2011
20110208715	AUTOMATICALLY MINING INTENTS OF A GROUP OF QUERIES - The automatic search intent mining technique described herein pertains to a technique for mining search intent from a group of queries. The automatic search intent mining technique described herein automatically mines search intents from a group of queries. The technique leverages knowledge of query log data in order to determine search intent. The automatic search intent mining technique, in one embodiment, utilizes three kinds of information sources: Web page content, Web page structure and search engine query log data to mine intents for a group of queries. In one embodiment of the technique, the three data sources are used separately to mine candidate search intents for each of the three sources. The candidate search intents extracted from each of the three sources are then integrated to form the final search intents.	08-25-2011
20110213764	Dynamic Search Health Monitoring - A method for monitoring search performance on a server computer includes determining the processing time for a plurality of operations related to a search on the server computer. The determined processing time for each of the plurality of operations is stored in a database. Aggregate processing times are determined for the plurality of operations and the aggregate processing times are stored in the database.	09-01-2011
20110218986	SEARCH ENGINE OPTIMIZATION ECONOMIC PURCHASING METHOD - In a computer workflow environment wherein alternate pages containing various levels having mixed or fixed maxima, minima, and between results pages, various page IP Address Binaries containing geographical location and market audience sizes, and econometric pricing across various levels search engine page states, national pages, wherein including limitations or additions to variables, to optimize profit by allowing computer users access to affordable choices upon performance.	09-08-2011
20110225139	USER ROLE BASED CUSTOMIZABLE SEMANTIC SEARCH - User role based customizable searches, where crawled documents may be evaluated against user roles or attributes during crawl time, are provided. Metadata retrieved from searched documents may also be evaluated against the user roles and/or attributes such that customized search results ranking documents based on their content beyond textual content may be provided.	09-15-2011
20110225140	SYSTEM AND METHOD FOR DETERMINING AUTHORITY RANKING FOR CONTEMPORANEOUS CONTENT - The present invention provides a method and system for weighting contemporaneous content including, in response to a user content request, determining a plurality of contemporaneous content items relating to the user content request, the contemporaneous content items including ultra-fresh content items having been only recently generated. In the method and system, for each of the contemporaneous content items, identifying one or more authors of the content items and determining an expertise level for the one or more authors and determining an expert weighting for each of the content items based on the expertise level for the corresponding one or more authors. The method and system further includes ranking the contemporaneous content items in response to the user content request based on the expert weighting and presenting at least a portion of the contemporaneous content items in response to the user content request.	09-15-2011
20110225141	Distributed Catalog, Data Store, and Indexing - This disclosure relates to a system and method for distributed catalog processing, data caching, and indexing to create an efficient, scalable, secure, high availability, disaster recovery enabled backup and storage system.	09-15-2011
20110231384	EVOLUTIONARY TAGGER - The invention is a process, system, workflow system for data retrieval processes, software, Web Site, service and SaaS (Software as a Service) created to support a data retrieval process from various document types to custom or preset retrieval data structures. The program supports manual, automatic and semiautomatic data retrieval using its internal features or external add-ons. It links data points in the structure to the corresponding data points in the document, stores documents, structures and links between them and outputs results in various formats. Links between a document and a retrieval data structure are established either automatically or manually by the user. After all required links are set, results can be retrieved from the program as an XML (Extensible Markup Language) structure with required data or as a PDF (Portable Document Format) or HTML (Hypertext Markup Language), in MS Office formats and others containing a/the retrieval data structure, the original document or both with links between corresponding data points.	09-22-2011
20110231385	OBJECT ORIENTED DATA AND METADATA BASED SEARCH - An object oriented search mechanism extracts structural metadata and data based on type of document contents and data sources connected to the documents. Relationships between textual and non-textual elements within documents as well as metadata associated with the elements and data sources are utilized to generate a unified object model with the addition of semantic information derived from metadata and taxonomy, which are used to enhance search indexing, ranking of search results, and dynamic adjustment of result rendering user interface with fine tuned relevancy. Additional data from data sources connected to the documents may also be used to unlock hidden data such as data that has been filtered out in an original document.	09-22-2011
20110231386	INDEXING AND SEARCHING EMPLOYING VIRTUAL DOCUMENTS - Relationships between linked and/or embedded documents as well as documents sharing data source(s) are captured and rendered through virtual documents. Virtual documents are created representing linked/embedded documents and data sources associated with a relevant document. Relationships between real and virtual documents are preserved and rendered along with search results providing a user a comprehensive picture of search results.	09-22-2011
20110231387	ENGAGING CONTENT PROVISION - A model is created that, from seed trivia facts, builds a database of pruned and ranked trivia facts and associated trigger terms. Search, email, or other information provider systems are configured to detect usage of the trigger terms and provide relevant trivia facts in response to the usage.	09-22-2011
20110238653	PARSING AND INDEXING DYNAMIC REPORTS - A parsing and indexing mechanism for dynamically generated reports is provided. Upon detection of a dynamically generated report, a data source for the dynamically generated report may be identified based on metadata or other information associated with the report. Crawlable or machine readable metadata and data may be generated using the data source such that data represented in the report and/or other relevant data from the data source can be indexed and searched.	09-29-2011
20110246442	Location Activity Search Engine Computer System - A computer system that includes a computer that couples with a database. The computer includes program code or modules to gather location and activity content from disparate sources, and through text analytics, extract associations from the content and populate the database with the associations between locations and activities. Further modules provide end user interaction through presentation of a search user interface specific to locations and activities. Additional modules provide the capability to search the database, rank the results of the search and present the results to the user.	10-06-2011
20110246443	SUGGESTED CONTENT WITH ATTRIBUTE PARAMETERIZATION - A flexible and extensible architecture allows for secure searching across an enterprise. Such an architecture can provide a simple Internet-like search experience to users searching secure content inside (and outside) the enterprise. The architecture allows for the crawling and searching of a variety of sources across an enterprise, regardless of whether any of these sources conform to a conventional user role model. The architecture further allows for security attributes to be submitted at query time, for example, in order to provide real-time secure access to enterprise resources. The user query also can be transformed to provide for dynamic querying that provides for a more current result list than can be obtained for static queries.	10-06-2011
20110258175	MARKER SEARCH SYSTEM FOR AUGMENTED REALITY SERVICE - A search system, a user device, and a server for AR service are disclosed. The search system includes a search engine configured to search a web content and a marker, in response to an input of a user, a matching unit configured to match the searched web content with the searched marker, and an output unit configured to transmit a document including the searched web content and the searched marker to the user.	10-20-2011
20110258176	Minimizing Visibility of Stale Content in Web Searching Including Revising Web Crawl Intervals of Documents - A method and system is disclosed for associating an appropriate web crawl interval with a document so that the probability of the document's stale content being used by a search engine is below an acceptable level when the search engine crawls the document at its associated web crawl interval. The web crawl interval of a document is determined through an iterative process and updated dynamically by the search engine after every visit to the document by a web crawler. A multi-tier data structure is employed for managing the web crawl order of billions of documents on the Internet. The search engine may move a document from one tier to another if its web crawl interval is changed significantly.	10-20-2011
20110270820	Dynamic Indexing while Authoring and Computerized Search Methods - Disclosed herein is a computer-implemented method of dynamically indexing content at the time of authoring or generating content, comprising: applying an authoring or editing or translating or capturing tool for generating content, associated with an autonomous indexer and sorter application; dynamically parsing, indexing and sorting the content in the background as per a lexicon or attributes; storing the content and the related index in a computer network and updating the index in a search engine manager or master or metadata. The method described further comprising the authoring or editing or translating tool is associated with a spellchecker in the indexer and sorter application, for spellchecking the terms before indexing.	11-03-2011
20110270821	ADDING DOMINANT MEDIA ELEMENTS TO SEARCH RESULTS - A method and system for determining dominance of the media elements of display pages is provided. The dominance system provides a scoring mechanism for scoring the dominance of media elements of display pages based on features of each media element of the display page. To generate the scores for the media elements of the display page, the dominance system first identifies the media elements and then identifies the features of the media elements. The dominance system then scores the identified media elements using the provided scoring mechanism and the identified features.	11-03-2011
20110276561	Representative Document Selection for Sets of Duplicate Documents in a Web Crawler System - Duplicate documents are detected in a web crawler system. Upon receiving a newly crawled document, a set of documents, if any, sharing the same content as the newly crawled document is identified. Information identifying the newly crawled document and the selected set of documents is merged into information identifying a new set of documents. Duplicate documents are included and excluded from the new set of documents based on a query independent metric for each such document. A single representative document for the new set of documents is identified in accordance with a set of predefined conditions.	11-10-2011
20110282858	Hierarchical Content Classification Into Deep Taxonomies - A document may be classified by traversing a hierarchical classification tree and comparing the words in the document to words in documents representing the nodes on the classification tree. The document may be classified by traversing the classification tree and generating a comparison score based on word comparisons. The score may be used to trim the classification tree or to advance to another node on the tree. The score may be based on a scarcity or importance of individual words in the document compared to the scarcity or importance of words in the category. The result may be a set of classifications with scores for those classifications.	11-17-2011
20110282859	IDENTIFYING UNIVERSAL RESOURCE LOCATOR REWRITING RULES - A computer-implemented process for identifying universal resource locator rewriting rules may receive input of universal resource locators of an application, to form received universal resource locators, may represent the received universal resource locators in a specialized graph and may apply analysis algorithms and heuristics to properties of the specialized graph. The computer-implemented process may further identify universal resource locator rewriting patterns using the specialized graph to form detected patterns and may generate rewrite rules corresponding to the detected patterns.	11-17-2011
20110282860	DATA COLLECTION, TRACKING, AND ANALYSIS FOR MULTIPLE MEDIA INCLUDING IMPACT ANALYSIS AND INFLUENCE TRACKING - A system is disclosed for data collection, media analysis, and web tracking. The collected data may include a broad search for a reference database and a narrow search for a comparative database. A contact relationship management database is used to store and distribute profiles for individuals and companies. An RSS feed database may update frequently and provide relevant search results. The system may analyze the collected data and tracking of that data. Analysis may be used to identify relevant data. Profiling of users and businesses may be used for targeting and generating profile data that may include specific information for a user or business. Monitoring and/or tracking may be used for identifying changes in data. The system may provide an analysis of impact of an event/source based on user impressions or web hits in view of a particular event/source. The impact may include a social influence value. In another embodiment, a return on investment (“ROI”) in view of the influence is provided.	11-17-2011
20110289068	PERSONALIZED NAVIGATION USING A SEARCH ENGINE - Personalized navigation for one or more individuals' use of a search engine is provided. Identification of a query submitted to the search engine is performed. If the query is identified to be a personal navigational query, which is a query via which the individuals intend to navigate to a particular site or information object that they have previously viewed, the particular site or information object associated with the query is identified, and results of the search are personalized based on knowledge of the identified site or information object.	11-24-2011
20110295832	Identifying Communities in an Information Network - Techniques for identifying one or more communities in an information network are provided. The techniques include collecting one or more nodes and one or more edges from an information network, performing a random walk on the one or more nodes to produce a sequence of one or more nodes, creating a sequence database from one or more sequences produced via random walk, and mining the sequence database to determine one or more patterns in the network, wherein the one or more patterns identify one or more communities in the information network.	12-01-2011
20110302147	METHODS AND APPARATUS FOR COMPUTING GRAPH SIMILARITY VIA SEQUENCE SIMILARITY - This disclosure describes systems and methods for identifying and correcting anomalies in web graphs. A web graph is transformed into a sequence of tokens via a walk algorithm. The sequence is fingerprinted to form a set of shingles. The shingles are compared to shingles for other web graphs in order to determine similarity between web graphs. Actions are then carried out to remove anomalous web graphs and modify parameters governing web mapping in order to decrease the likelihood of future anomalous web graphs being built.	12-08-2011
20110307467	DISTRIBUTED WEB CRAWLER ARCHITECTURE - A distributed web crawler architecture is provided. An example system comprises a work items monitor, a duplicate request detector, and a callback module. The work items monitor may be configured to detect a first work item from a first web crawler, the work item related to a URL. The duplicate request detector may be configured to determine that a second work item associated with the URL is present in a work queue, the work queue to provide work items to a fetcher, the second work item associated with a second web crawler. The callback module may be configured to create a callback for the first web crawler, the callback indicating that a web page retrieved as a result of processing of the second work item is to be provided to the first web crawler, without queuing the first work item.	12-15-2011
20110307468	SYSTEM AND METHOD FOR IDENTIFYING CONTENT SENSITIVE AUTHORITIES FROM VERY LARGE SCALE NETWORKS - A method and system for identifying nodes with similar content. In one aspect, the method comprises determining a structure of a network of nodes, said structure defined by incoming links and outgoing links between nodes within said network, grouping said nodes within said network into a first set of modules, calculating a first modularity value between each of the modules within the first set, said modularity value indicating a degree of similar content within each module, calculating a topical relevance value for each of the modules, selecting those modules whose topical relevance value exceeds a threshold value and calculating an authority score for the selected modules.	12-15-2011
20110313996	CAMPAIGN TRACKING PLATFORM FOR SOCIAL MEDIA MARKETING - Methods and systems for facilitating a campaign tracking platform for social media marketing are provided. According to one embodiment, a method for collecting click information regarding tracking links is provided. A tracking link is generated corresponding to a target source of content through which a subscriber of the social media campaign tracking platform can share the content with third parties via social media. The tracking link has encoded therein structured metadata indicative of a social media action within which the tracking link is contained. Responsive to receiving a click-through event for the tracking link from a requestor, click information is stored in a consumption database associated with the social media campaign tracking platform and the requestor is redirected to the target source.	12-22-2011
20110313997	SYSTEM AND METHOD FOR PROVIDING A CONSOLIDATED SERVICE FOR A HOMEPAGE - A total homepage service providing system includes an information provider information administration unit configured to register and administrate information of an information appliance of an information provider and information of the information provider; a homepage generation unit configured to automatically generate a homepage which can be displayed on the information appliance of the information provider and an information appliance of an information user, using metadata received from the information appliance of the information provider; a homepage registration and administration unit configured to store a file of the generated homepage, and register and administrate the homepage; and an index generation and administration unit configured to generate one or more homepage indexes for an information search, using keywords extracted and classified from the generated homepage, and administrate the generated homepage indexes.	12-22-2011
20110320426	RICH SITE MAPS - Providing a website map to a user. A method includes gathering information about web pages in a website, including information related to web page relationships, controls, and executable code underlying one or more web pages in the website. A relationship map is created. The relationship map includes representations of relationships between the web pages, the controls and the executable code underlying one or more web pages in the website. The method further includes graphically displaying at least a portion of the relationship map in a graphical user interface at the computing system.	12-29-2011
20110320427	SYSTEM AND METHOD FOR COLLECTING DOCUMENT - Provided is a system and method for collecting a document. The system may include an identification information receiver to receive, from a host of a site, identification information of a document of which an update may occur, a collection request transfer unit to transmit a collection request for the document based on the identification information, an update information collector to receive update information of the document from the host, and a search result provider to provide, to the host, a search result extracted from the update information of the document, in response to the search request being received from the host. The system for collecting the document may reduce load of a web site, and may improve accuracy of the document to be collected.	12-29-2011
20120023089	METHOD TO SEARCH A TASK-BASED WEB INTERACTION - Presented is a method, system and computer readable product to search a task-based web interaction. A task-based web interaction search query is provided to a search engine. The search results are classified into a set of information parameters. The information parameters are compared against a repository containing multiple sets of information parameters. Upon identification of a corresponding set of information parameters, a task-based web interaction associated with the identified set of information parameters is presented.	01-26-2012
20120023090	METHODS AND APPARATUSES FOR PROVIDING INTERNET-BASED PROXY SERVICES - A proxy server receives, from multiple visitors of multiple client devices, a plurality of requests for actions to be performed on identified network resources belonging to a plurality of origin servers. At least some of the origin servers belong to different domains and are owned by different entities. The proxy server and the origin servers are also owned by different entities. The proxy server analyzes each request it receives to determine whether that request poses a threat and whether the visitor belonging to the request poses a threat. The proxy server blocks those requests from visitors that pose a threat or in which the request itself poses a threat. The proxy server transmits the requests that are not a threat and are from a visitor that is not a threat to the appropriate origin server.	01-26-2012
20120023091	System and Method for Enabling Website Owner to Manage Crawl Rate in a Website Indexing System - Web crawlers crawl websites to access documents of the website for purposes of indexing the documents for search engines. The web crawlers crawl a specified website at a crawl rate that is based on multiple factors. One of the factors is a pre-set crawl rate limit. According to certain embodiments, an owner for a specified website is enabled to modify the crawl rate limit for the specified website when one or more pre-set criteria are met.	01-26-2012
20120030187	SYSTEM, METHOD AND APPARATUS FOR TRACKING DIGITAL CONTENT OBJECTS - A system and method for secure document management including tagging and/or remotely tracking documents exchanged between one or more users and a document repository. In some embodiments, the security policies for documents are determined based at least in part on document content, metadata associated with the document, and/or usage history of the document.	02-02-2012
20120036118	Web Crawler Scheduler that Utilizes Sitemaps from Websites - Methods and systems for a web crawler scheduler that utilizes sitemaps from websites are described. A web crawler scheduling system receives a notification from a website or web server. In response to the notification, the system accesses one or more sitemap(s) for documents associated with the website or web server. The system schedules crawls of the documents based on information identified from the sitemaps. The system crawls at least a subset of the documents scheduled for crawling.	02-09-2012
20120041938	OPERATIONALIZING SEARCH ENGINE OPTIMIZATION - A method for managing reference to an entity on a network includes determining shares of voice for an entity and other entities across a plurality of channels with respect to a plurality of search terms. The method also includes correlating shares of voice for the entity and the other entities with respect to the search terms to determine a relative change in share of voice for the entity with respect to the other entities. Thereafter, shares of voice for the entity across the plurality of channels may be correlated to determine relative changes in share of voice for the entity within each of the channels. The relative change in share of voice for the entity with respect to the other entities and the relative changes in share of voice for the entity within each of the channels may then be displayed.	02-16-2012
20120041939	System and Method for Unification of User Identifiers in Web Harvesting - A Web Intelligence system automatically associates different user identifiers that belong to the same user. An analytics system may include a Web crawler that crawls Web-sites of interest, e.g., social media Web-sites. The Web crawler retrieves from the Web-sites data items that were posted by users, who identified themselves on the Web-sites using various user identifiers (e.g., usernames or nicknames). The system may further include a correlation processor that automatically correlates user identifiers that appear in the retrieved data items. The correlation processor may identify different user identifiers that are used by the same user on different Web-sites. Once two or more identifiers have been associated with a given user, the network content and network activity of that user can be jointly analyzed and acted upon.	02-16-2012
20120041940	SYSTEM AND METHOD FOR ADDING IDENTITY TO WEB RANK - Embodiments of the present invention provide systems, methods and computer program products for generating search results comprising web documents with associated expert information. One embodiment of a method for generating such search results includes receiving one or more search queries, selecting one of the one or more search queries, determining one or more categories of web documents responsive to the selected search query and crawling a web graph of linked web documents to identify one or more web documents tagged as within the one or more categories responsive to the selected search query. The method further includes generating a result set of the one or more web documents identified as within the one or more categories responsive to the selected search query, ranking the result set and generating a list of ranked search results responsive to the selected search.	02-16-2012
20120047121	CONTENT SIGNATURE NOTIFICATION - A client application installed on end user computers generates metadata from the content of web pages visited by end users and provides the metadata to a search engine. When an end user visits a web page, the end user's computer downloads and displays the web page to the end user. The client application may simultaneously access the web page content and generate this metadata in the form of a content signature of the web page from the web page content. The client application then provides the content signature to a search engine. The search engine may employ content signatures to identify new web pages to crawl and index. Additionally, the search engine may employ content signatures to identify changes to web pages and determine the crawl frequency of web pages.	02-23-2012
20120047122	SYSTEM, METHOD AND COMPUTER READABLE MEDIUM FOR WEB CRAWLING - In a web crawler, a URL selection module selects URLs for pages to be downloaded. The URL selection module accesses an interaction data store that stores interaction data for web pages, including interaction data that indicates human interactions with the pages. To reduce the effects of link farms, the URL selection module filters the URLs to select only those URLs that have human interaction histories and provides the selected URLs to a download module for web page downloading.	02-23-2012
20120059815	USER ACCESSIBILITY TO RESOURCES ENABLED THROUGH ADAPTIVE TECHNOLOGY - A computer implemented method, system and/or computer program product identify an appropriate resource for a user. A user profile is created for a user. A request, from the user, is received for a requested resource. Based on the user profile, a user-specific scope of the request, which defines a type of resource that is being requested by the user, is established. An identifier of an appropriate resource that meets the user-specific scope of the request is transmitted to the user.	03-08-2012
20120066198	SYSTEM FOR TARGETING ADVERTISING CONTENT TO A PLURALITY OF MOBILE COMMUNICATION FACILITIES - A system for targeting advertising content includes the steps of: (a) receiving respective requests for advertising content corresponding to a plurality of mobile communication facilities operated by a group of users, wherein the plurality includes first and second types of mobile communication facilities with different rendering capabilities; (b) receiving a datum corresponding to the group; (c) selecting from a first and second sponsor respective content based on a relevancy to the datum, wherein each content includes a first and second item requiring respective rendering capabilities; (d) receiving bids from the first and second sponsors; (e) attributing a priority to the content of the first sponsor based upon a determination that a yield associated with the first sponsor is greater than a yield associated with the second sponsor; and (f) transmitting the first and second items of the first sponsor to the first and second types of mobile communication facilities respectively.	03-15-2012
20120066199	SYSTEM FOR TARGETING ADVERTISING CONTENT TO A PLURALITY OF MOBILE COMMUNICATION FACILITIES - A system for targeting advertising content includes the steps of: (a) receiving respective requests for advertising content corresponding to a plurality of mobile communication facilities operated by a group of users, wherein the plurality includes first and second types of mobile communication facilities with different rendering capabilities; (b) receiving a datum corresponding to the group; (c) selecting from a first and second sponsor respective content based on a relevancy to the datum, wherein each content includes a first and second item requiring respective rendering capabilities; (d) receiving bids from the first and second sponsors; (e) attributing a priority to the content of the first sponsor based upon a determination that a yield associated with the first sponsor is greater than a yield associated with the second sponsor; and (f) transmitting the first and second items of the first sponsor to the first and second types of mobile communication facilities respectively.	03-15-2012
20120066200	Metasearch Engine for Ordering Items Returned In Travel Related Search Results Using Multiple Queries on Multiple Unique Hosts - Process and system for metasearching on the Internet performed by a metasearch engine, comprising: receiving an HTTP request from a client device for the metasearch engine to send a plurality of search queries to a plurality of unique hosts that provide access to information to be searched, the HTTP request associated with a plurality of travel related items that may be ordered comprising at least one airline ticket and at least one other type of travel related item; sending the plurality of search queries to the plurality of unique hosts in response to the HTTP request; receiving search results from the plurality of unique hosts; incorporating the received search results into a response; communicating the response from the metasearch engine to the client device; receiving another HTTP request from the client device for placing an order for at least one of the plurality of travel related items; processing the order.	03-15-2012
20120072407	METHOD AND SYSTEM FOR TRIGGERING WEB CRAWLING BASED ON REGISTRY DATA - A method of triggering crawling of a domain includes receiving information related to a domain from a registrar and processing the information related to the domain. The method also includes storing the processed information in a registry zone file and forming a list of registry data based on the processed information. The list of registry data comprises a subset of the registry zone file. The method further includes crawling one or more of the domains in the list of registry data.	03-22-2012
20120072408	METHOD AND SYSTEM OF PRIORITISING OPERATIONS - A method and system for prioritising operations on network objects are provided. The method includes gathering Web 2.0 available relationship data on the relationships between network entities, wherein network entities are network users and network objects. The relationship data for a network entity is analysed and a first relative score is determined based on the relationship data. For a network object, a second relative score is determined which is a dynamic score based on user interactions with the network object and formed using the first relative scores of network entities interacting with the object. The method then prioritizes an operation on a network object using the second relative score.	03-22-2012
20120078874	Search Engine Indexing - Exemplary embodiments include a search engine indexing method, including finding a page on a server that includes keywords, scanning the page for a tag designating a portion of the page from which to index the keywords and in response to a presence of the tag within the page, indexing the portion of the page that is designated by the tag.	03-29-2012
20120078875	WEB BROWSER CONTACTS PLUG-IN - A system for collecting information from visited websites in a web browser. The system includes a web browser plug-in application having a set of user preferences for collecting certain target information from websites. The system monitors the user navigating to one or more selected websites and parses the websites for relevant target information. The system then compares the target information with the verified information stored in the database to generate relevant verified information. The user is alerted to the relevant verified website information and any target information that was not matched to the database. The user is then given the option of storing the relevant verified website information or searching for additional information. The unmatched information could potentially be crowd sourced to find the verified information or saved as verified information on its own. The user can then import the collected information into a workstation application.	03-29-2012
20120078876	Method of Recommending Items to a User Based on User Interest - Although recording of usage data is common in scholarly information services, its exploitation for the creation of value-added services remains limited due to concerns regarding, among others, user privacy, data validity, and the lack of accepted standards for the representation, sharing and aggregation of usage data. A technical, standards-based architecture for sharing usage information is presented. In this architecture, OpenURL-compliant linking servers aggregate usage information of a specific user community as it navigates the distributed information environment that it has access to. This usage information is made OAI-PMH harvestable so that usage information exposed by many linking servers can be aggregated to facilitate the creation of value-added services with a reach beyond that of a single community or a single information service.	03-29-2012
20120089590	User Preference Correlation for Web-Based Selection - A database of user preference information is extracted and compiled from multiple websites by web-crawling robots without cooperation or specific participation by users. Users who interact with a website are frequently required to register and create a login or userID name that uniquely identifies them. Thereafter, when an individual rates an item, it is often recorded and published under their userID name such that other users can see how a specific individual rated the item. Although there is no requirement that a specific user register on different websites utilizing the identical userID, it is extremely common that this practice occurs and the use of identical userIDs on multiple sites is used herein to expand preference analysis beyond a single site. Once the database exists, users can request or be passively offered suggestions that result from preference associations across multiple websites as performed by a preference analysis and suggestion function.	04-12-2012
20120102018	Ranking Model Adaptation for Domain-Specific Search - An adaptation process is described to adapt a ranking model constructed for a broad-based search engine for use with a domain-specific ranking model. An example process identifies a ranking model for use with a broad-based search engine and modifies that ranking model for use with a new (or “target”) domain containing information pertaining to a specific topic.	04-26-2012
20120102019	METHOD AND APPARATUS FOR CRAWLING WEBPAGES - A method and apparatus for crawling webpages are provided. The method and apparatus involve obtaining a root Web address list; obtaining a list of Web addresses linked to the root Web address list; evaluating content of pages of the Web addresses based on the obtained list of Web addresses; adjusting a crawling depth according to the evaluation of the content of the pages of the Web addresses; and crawling webpages according to the adjusted crawling depth.	04-26-2012
20120102020	Generating Search Result Listing with Anchor Text Based Description of Website Corresponding to Search Result - An apparatus and method is described that generates a description of a website using data from secondary sources, meaning sources other than the website itself. Relevant information is identified, within anchor text of links in the content of these secondary sources, and analyzed. Based on this analysis, a description of the website is generated.	04-26-2012
20120109927	ARCHITECTURE FOR DISTRIBUTED, PARALLEL CRAWLING OF INTERACTIVE CLIENT-SERVER APPLICATIONS - In one embodiment, a distributed computing system includes a first worker node configured to execute a first job, a second worker node configured to execute a second job, and a master node including a processor coupled to a memory. The first job indicates a first portion of an interactive client-server application to be crawled. The second job indicates a second portion of an interactive client-server application to be crawled. The second worker node and the first worker node are configured to execute their respective jobs in parallel. The master node is configured to assign the first job to the first worker node, assign the second job to the second worker node, and integrate the results from the first worker node and the second worker node into a record of operation of the application.	05-03-2012
20120109928	SYNCHRONIZATION SCHEME FOR DISTRIBUTED, PARALLEL CRAWLING OF INTERACTIVE CLIENT-SERVER APPLICATIONS - A method for synchronizing a state graph includes generating a partial state graph by executing a crawling task to crawl an interactive client-server application, transmitting the partial state graph from a first electronic device to a second electronic device, and transmitting the partial state graph on a periodic basis. The partial state graph includes one or more new states of the interactive client-server application identified while crawling the interactive client-server application since a previous transmission.	05-03-2012
20120109929	TECHNIQUE FOR EFFICIENT PARTIAL CRAWLING OF INTERACTIVE CLIENT-SERVER APPLICATIONS IN A PARALLEL, DISTRIBUTED ENVIRONMENT - An electronic device includes a memory including a crawling application and a processor coupled to the memory. The processor is configured to execute the crawling application, which causes the processor to receive a job, crawl the interactive client-server application based on the initialization information until a boundary condition is reached, and report the results of crawling. The job contains initialization information indicating a portion of an interactive client-server application to be crawled. Crawling includes programmatically determining possible actions available on a first state of the interactive client-server application, recording the first state, selecting an action, recording the actions not taken, taking the action, reaching a second state, recording the second state, and recording the action taken as a transition between the first state and the second state. Reporting the results of the interactive client-server application includes reporting the first state, second state, the transition, and actions not taken.	05-03-2012
20120109930	TECHNIQUE FOR COORDINATING THE DISTRIBUTED, PARALLEL CRAWLING OF INTERACTIVE CLIENT-SERVER APPLICATIONS - An electronic device includes a memory and a processor coupled to the memory. The memory contains a master state graph. The master state graph includes information regarding the operation of an interactive client-server application. The processor is configured to send a first job to a first worker node, send a second job to a second worker node, receive results of crawling the interactive client-server application, and integrate results of crawling the interactive client-server application into the master state graph. The first job includes crawling instructions for crawling a first portion of an interactive client-server application. The second job includes crawling instructions for crawling a second portion of the interactive client-server application. The first worker node and second worker node crawl the interactive client-server application in parallel.	05-03-2012
20120109931	TECHNIQUE FOR COMPRESSION OF STATE INFORMATION IN THE CRAWLING OF INTERACTIVE CLIENT-SERVER APPLICATIONS - An electronic device includes a memory including a state graph, and a processor coupled to the memory. The state graph includes a plurality of states of an interactive client-server application to be crawled. The plurality of states and transitions result from the crawling of the client-server application. The plurality of states includes an initial state and a second state. The initial state includes one or more initial state nodes. The second state includes one or more second state nodes. The processor is configured to determine the differences between the initial state and the second state and compress the second state with respect to the initial state using the differences, resulting in a compressed state.	05-03-2012
20120109932	RELATED LINKS - Methods and systems for providing related links are disclosed. In one aspect, a method comprises: retrieving textual information associated with a web page upon loading of the web page at a client; extracting a set of keywords from the received textual information; determining one or more keywords of the set of keywords using a keyword repository that maintains a list of keywords and their respective rankings; sending the one or more keywords as a search query to a search engine to obtain a list of search results ordered by their respective rankings; returning a number of search results with the highest rankings to the client for display on the web page.	05-03-2012
20120117052	WEB FORUM CRAWLING USING SKELETAL LINKS - A method and system for identifying informative links of a web site for use in crawling the web site is provided. A forum crawler analyzes sample web pages of a web forum to identify informative links and then crawls the web forum by following links determined to be informative and not following other links. The forum crawler system determines whether links are informative based on whether they are part of the overall structure of the web site or are used to select sequential information that has been split onto multiple web pages.	05-10-2012
20120124027	METADATA DATABASE SYSTEM AND METHOD - Systems, methods and computer readable media for computerized control and management of a metadata database. The metadata database can include event data, standards, survey questions and responses, and event response templates. Event projection can be based on data retrieved from a past events database. Control can include real-time control of subsystems within the complex system and providing reports and visualizations. The visualizations can include profile graphs, bar graphs, dashboards and hyperbolic mapping.	05-17-2012
20120130980	SYSTEM AND METHOD FOR SEARCHING NETWORK-ACCESSIBLE SITES FOR LEAKED SOURCE CODE - A method of detecting leakage of sensitive source code on network-accessible sites is provided. The method includes determining a set of unique identifying elements that identify a sensitive source code module accessed from a source code repository; using a crawler server connected to an external network to automatically search a list of one or more network-accessible sites for text that matches one or more of the unique identifying elements in the set of unique identifying elements, to provide search results; collecting the search results in a memory of the crawler server; determining a relevancy for each of the search results based at least in part on a number of the unique identifying elements that were matched and on a number of search results; sorting the results according to the relevancy; and providing the results to a user, to indicate whether sensitive source code was found on the network-accessible sites.	05-24-2012
20120143844	MULTI-LEVEL COVERAGE FOR CRAWLING SELECTION - Some implementations provide techniques for determining which URLs to select for crawling from a pool of URLs. For example, the selection of URLs for crawling may be made based on maintaining a high coverage of the known URLs and/or high discoverability of the World Wide Web. Some implementations provide a multi-level coverage strategy for crawling selection. Further, some implementations provide techniques for discovering unseen URLs.	06-07-2012
20120158694	Combinators - A method of managing a database system that receives N number of requests from one or more nodes in the database system. The N requests are combined before initiating operations to attend to the requests. The number of operations to attend to the requests is reduced and this reduced number of operations is executed.	06-21-2012
20120166412	SUPER-CLUSTERING FOR EFFICIENT INFORMATION EXTRACTION - A set of clusters associated with a plurality of web pages is received. A first data set and a second data set are generated by applying a first rule and a second rule, respectively, to web pages of a first cluster of the set of clusters. The second rule is substituted for the first rule responsive to having an acceptable extraction accuracy when applied to the first cluster. The extraction accuracy of the second rule is determined by comparing attributes of the second data set to attributes of the first data set.	06-28-2012
20120166413	Automatic Generation of Tasks For Search Engine Optimization - A method and a device for search engine optimization, that receives an identifier that identifies a domain, one or more keywords for analysis relative to a search engine, and search engine usage data, for each received keyword, gathering search engine results data, for at least one received keyword, mapping the at least one keyword to at least one web page within the identified domain, said mapping based on at least one of said search engine usage data and said search engine results data, and for at least one of the received keywords, generating at least one instruction to modify a web page element in a web page to which the at least one received keyword is mapped.	06-28-2012
20120166414	SYSTEMS AND METHODS FOR RELEVANCE SCORING - Systems and methods for relevance scoring are provided. Traditional scoring models use word frequency and placement to determine relevance. In contrast to these models, embodiments of the present invention provide cluster-based relevance scoring and tagging. Some embodiments use various cluster mappings and vector space models to generate relevance scores. In addition, the cluster mappings can be updated over time to reflect a change in topic clustering.	06-28-2012
20120173506	System And Method For Harvesting Electronically Stored Content By Custodian - A system and method for harvesting electronically stored content by custodian is provided. Content associated with user names for one or more custodians is maintained in a collaboration environment. A custodian list with names of at least a portion of the custodians is received. Access reports each having user names and associated unique identifiers for the custodians with access to the content within a collaboration environment are obtained. One or more of the user names are mapped with at least one of the custodians by comparing the list of custodians to the access reports and by determining a selected user name for the at least one custodian. The content associated with the at least one custodian is identified using the selected user name.	07-05-2012
20120173507	SEARCHING THROUGH CONTENT WHICH IS ACCESSIBLE THROUGH WEB-BASED FORMS - One embodiment of the present invention provides a system that facilitates searching through content which is accessible through web-based forms. During operation, the system receives a query containing keywords. Next, the system analyzes the query to create a structured query. The system then performs a lookup based on the structured query in a database containing entries describing the web-based forms. Next, the system ranks forms returned by the lookup, and uses the rankings and associated database entries to facilitate a search through content which is accessible through the forms.	07-05-2012
20120173508	Methods and Systems for a Semantic Search Engine for Finding, Aggregating and Providing Comments - One of the deficiencies of the existing search engines is that the search engines do not evaluate the trustfulness of comments before the searched comments are returned to end users. In addition, existing search engines overlook the analyzing and aggregating of the comments whose subjects are semantically, hierarchically related. Furthermore, as the use of non-textual comments has become popular nowadays, it is highly desirable that such search engines finding and providing comments have the capability to analyze, evaluate and aggregate both textual and non-textual comments, or heterogeneous comments in other words. The purpose of the invention is to overcome the abovementioned deficiencies of the existing search engines that find and provide comments.	07-05-2012
20120173509	SEARCH ENGINE USING WORLD MAP WITH WHOIS DATABASE SEARCH RESTRICTIONS - A search operation can provide geographically restricted and verified information to a user. A two-step approach is used to perform these searches. The first step is to obtain high relevance search results by searching only in a specific region defined for a search operation. The second step further improves the quality of the search results by performing contact address correlation. If the search server finds a reliable reference address in the search results, then these search results can be presented to the user, whereby search results that are not correlating well with legitimate and registered addresses for the site are removed from the search result lists. Therefore, the region-restricted search does searching in a selected geographical region and only presents legitimate web pages or search results to a user.	07-05-2012
20120179665	HEALTH MONITORING SYSTEM - A health monitoring system can include an intake tracker, an output tracker, a personal monitor, and a recommender. The health monitoring system can track food, exercise, and personal characteristics in order to provide health assessments and recommendations. The intake tracker can include a camera for capturing images of products. A data acquisition system can be used to acquire data about products that are photographed using a variety of methods. The data can be entered into a database with a picture of the product for later look-up. The data acquisition system can include a user-assisted segmentation method of an image to identify the product.	07-12-2012
20120179666	METHODS AND SYSTEMS FOR MONITORING AND TRACKING VIDEOS ON THE INTERNET - Method and system for discovering and identifying a video object. The method includes crawling at least one predetermined website, discovering at least one video link at the predetermined website, processing information associated with a first database for storing one or more video links, and determining whether the discovered video link was already discovered before based on at least information associated with the first database. Additionally, the method includes, if the discovered video link is determined not to have been discovered before, updating the first database based on at least information associated with the discovered video link, downloading at least one video object based on at least information associated with the discovered video link, and processing information associated with the downloaded video object.	07-12-2012
20120179667	SEARCHING THROUGH CONTENT WHICH IS ACCESSIBLE THROUGH WEB-BASED FORMS - One embodiment of the present invention provides a system that facilitates searching through content which is accessible through web-based forms. During operation, the system receives a query containing keywords. Next, the system analyzes the query to create a structured query. The system then performs a lookup based on the structured query in a database containing entries describing the web-based forms. Next, the system ranks forms returned by the lookup, and uses the rankings and associated database entries to facilitate a search through content which is accessible through the forms.	07-12-2012
20120185458	CLUSTERING CROWD-SOURCED DATA TO IDENTIFY EVENT BEACONS - Embodiments for identifying event beacons are provided. Position observations for a beacon are grouped into a plurality of clusters based at least on spatial distance. A location of each cluster is compared to event locations corresponding to events. Based on the comparison, the beacon is associated with the event, and the location of the beacon is set to the location of the event. In some embodiments, location requests are analyzed to identify event beacons, and the event information for the event beacons is used to identify event locations in response to the location requests.	07-19-2012
20120185459	IDENTIFYING UNIVERSAL RESOURCE LOCATOR REWRITING RULES - A computer-implemented process for identifying universal resource locator rewriting rules may receive input of universal resource locators of an application, to form received universal resource locators, may represent the received universal resource locators in a specialized graph and may apply analysis algorithms and heuristics to properties of the specialized graph. The computer-implemented process may further identify universal resource locator rewriting patterns using the specialized graph to form detected patterns and may generate rewrite rules corresponding to the detected patterns.	07-19-2012
20120191691	METHOD FOR ASSESSING AND IMPROVING SEARCH ENGINE VALUE AND SITE LAYOUT BASED ON PASSIVE SNIFFING AND CONTENT MODIFICATION - A method for determining the value of a given page or pages in aggregate to a search engine based on key-word search results and optionally modifying the outbound results to optimize the value and layout of the page or pages. A listening system is inserted within the network for the purpose of listening to traffic both inbound to and outbound from the web server and optionally modifying outbound responses. The device uses an algorithm to decide the relative value of the page as it is traversed. The system also detects web server errors, scanning depth of the search engine and makes recommendations based on the examined traffic and desired results. Human visitors are distinguished from search engines by looking at the HTTP headers and therefore search engine depth and effectiveness in page scanning can be calculated.	07-26-2012
20120191692	SEMANTIC MATCHING BY CONTENT ANALYSIS - A method, apparatus, system, article of manufacture, and computer readable storage medium provide media content. A web page context for a web page is determined and stored in a database. One or more media content files are analyzed to extract information that is stored in the database. The information is compared to the web page context. A matching media content file is determined from the one of the one or more media content files that matches the web page context based on the comparison. The matching media content file is then provided (e.g., to an internet portal web site).	07-26-2012
20120191693	SYSTEMS AND METHODS OF IDENTIFYING AND HANDLING ABUSIVE REQUESTERS - Aspects relate to categorizing requests for online resources as originating from spiders or not. Such resources are associated with respective contacts, and if a non-spider requests a resource, then a contact associated with that resource can be notified. The resources can each comprise a profile associated with a contact. For example, a profile can be a profile comprising information about a person, such as contact information, selected search results, and a pre-defined query that can be used with a given search engine. Personal whitelists or whitelists specific to a particular resource can be used to determine whether or not a given requesting entity should be treated as a spider or not when requesting that resource.	07-26-2012
20120191694	GENERATION OF TOPIC-BASED LANGUAGE MODELS FOR AN APP SEARCH ENGINE - Topic-based language models for an application search engine enable a user to search for an application based on the application's function rather than title. To enable a search based on function, information is gathered and processed, including application names, descriptions and external information. Processing the information includes filtering the information, generating a topic model and supplementing the topic model with additional information.	07-26-2012
20120197860	INTEREST CONTOUR COMPUTATION AND MANAGEMENT BASED UPON USER AUTHORED CONTENT - Embodiments of the present invention provide a method, system and computer program product for interest contour computation and management based upon user generated content and associated meta-data. In an embodiment of the invention, an interest contour computation and management method is provided. The method includes crawling content sources disposed about a computer communications network for authored content created by an end user. The method further includes identifying meta data provided for the authored content and adding the meta data to a user interests profile of the end user. The meta-data further can include extracted text from the content. Of note, the method can further include receiving from the end user a specified time period and limiting the addition of the meta data to meta data applied to the authored content during the specified time period.	08-02-2012
20120197861	INTELLIGENT CONTENT DISCOVERY FOR CONTENT CONSUMERS - Embodiments of the present invention provide a method, system and computer program product for intelligent content discovery for content consumers in the global Internet. In an embodiment of the invention, a method for intelligent content discovery for content consumers includes parsing a list of previously viewed content in a content browser executing in memory of a computer to identify different content sources for the previously viewed content. The method also includes directing crawling of the content sources over a computer communications network to retrieve updated content from the content sources. The method yet further includes filtering the updated content into a subset of updated content according to at least one parameter corresponding to one of an end user profile of an end user and an end user preference of the end user. Finally, the method includes presenting a list of the subset of updated content in the content browser.	08-02-2012
20120203758	OPPORTUNITY IDENTIFICATION FOR SEARCH ENGINE OPTIMIZATION - A method of identifying search engine optimization opportunities is disclosed. The method may include selecting a search engine optimization object associated with an entity and collecting search engine optimization data associated with the search engine optimization object. The method may also include calculating a current value of the search engine optimization object to the entity and estimating a future value of the search engine optimization object to the entity based on the collected search engine optimization data.	08-09-2012
20120215757	WEB CRAWLING USING STATIC ANALYSIS - A crawler including a document retriever configured to retrieve a first computer-based document, a link identifier configured to identify an actual string within the computer-based document as being a hyperlink-type string, and a static analyzer configured to perform static analysis of an operation on a variable within the first computer-based document to identify a possible string value of the variable as being a hyperlink-type string, where any of the strings indicate a location of at least a second computer-based document.	08-23-2012
20120215758	SYSTEM AND METHODS FOR IDENTIFYING COMPROMISED PERSONALLY IDENTIFIABLE INFORMATION ON THE INTERNET - In one embodiment, a method includes generating, by a computer system, a search-engine query from stored identity-theft nomenclature. The method also includes querying, by the computer system, at least one search engine via the search-engine query. Further, the method includes crawling, by the computer system, at least one computer-network resource identified via the querying. In addition, the method includes collecting, by the computer system, identity-theft information from the at least one computer-network resource. Additionally, the method includes processing, by the computer system, the identity-theft information for compromised personally-identifying information (PII).	08-23-2012
20120215759	LOG COLLECTION DATA HARVESTER FOR USE IN A BUILDING AUTOMATION SYSTEM - A building automation system (BAS) comprising a plurality of end devices, at least one communication network, and a server engine comprising a data harvester. The end devices are each associated with at least one of a space, a system, or a subsystem for at least a portion of a building or a campus. The communication network communicatively couples at least a portion of the plurality of end devices to the server engine. In one embodiment, the server engine is adapted to dynamically implement the data harvesting capability to periodically establish communications with, to receive and store data about, end devices and to selectively control the utilization of the communication network in order to prevent overrun or data loss. Methods of handling log collection from end devices in a building automation system (BAS) based upon a distributed schedule provided by a user or a priority scheme are also disclosed.	08-23-2012
20120215760	COLLECTING AND SCORING ONLINE REFERENCES - One example embodiment includes a method for indexing online references of an entity. The method includes identifying one or more channels of the Internet to be searched for references to an entity and identifying one or more signals to be evaluated within each of the one or more channels. The method also includes crawling the Internet for online references to the entity, wherein crawling the Internet comprises searching the one or more channels of the Internet for references to the entity and evaluating the one or more signals. The method further includes constructing a reverse index of the references, wherein the reverse index is based on each channel in which a reference is found and the one or more signals evaluated for the reference.	08-23-2012
20120221545	ISOLATING DESIRED CONTENT, METADATA, OR BOTH FROM SOCIAL MEDIA - Desired content, metadata, or both can be isolated from the full content of social media websites having content-rich pages. Achieving this can include obtaining from the content-rich pages a language-independent representation having a hierarchical structure of nodes and then generating a node representation for each node. Feature vectors for the nodes are generated and a label is assigned to each node representation according to a schema. Assignment can occur by executing a trained classification algorithm on the feature vectors. The schema has schema elements and each schema element corresponds to a label. For each schema element, all node representations having matching labels are gathered and then one node representation is elected from among those with matching labels to be assigned to a schema element field in a template. The template can be applied to extract desired content, metadata, or both according to the schema from all the content-rich pages.	08-30-2012
20120221546	METHOD AND SYSTEM FOR FACILITATING WEB CONTENT AGGREGATION INITIATED BY A CLIENT OR SERVER - A method for facilitating Web content aggregation initiated by a client is disclosed. A Web site aggregation list is created. At least one Web site in the aggregation list is spidered from a user-identified computer. At least one attribute of content of the at least one spidered Web site is merged with at least one attribute of content of another Web site. The merged attributes are displayed to a user.	08-30-2012
20120226678	OPTIMIZATION OF SOCIAL MEDIA ENGAGEMENT - Methods for optimizing social media are disclosed. Such methods may include identifying at least one keyword utilized for at least one webpage, identifying social media correspondence referencing the at least one keyword, analyzing content collected from the social media to determine a frequency of references to the at least one keyword and generating at least one report including information based on the analysis. The report may include recommendations for optimizing social media by, for example, increasing visibility by using high-performing keywords. Systems for performing the methods are also disclosed.	09-06-2012
20120233147	INDEXING AND SEARCHING FEATURES INCLUDING USING REUSABLE INDEX FIELDS - Indexing and searching features are provided including associated system, methods, and other implementations. A computing system of an embodiment is configured to reuse or repurpose physical index fields for different tenants as part of providing efficient and scalable indexing and searching services. A method of one embodiment operates to provide an indexed data structure that includes a number of reusable index fields that are shared and used to index information associated with a plurality of tenants. Other embodiments are included.	09-13-2012
20120246137	VISUAL PROFILES - A method for generating a visual profile is provided. User-specific data is extracted from various data repositories. The data is presented to the user for selection for inclusion in a visual profile. A visual profile is generated using the data selected by the user by manipulating the data in a visual manner and/or generating visual depictions of the data using a database of multimedia content items. Visual profiles may be displayed and/or searched.	09-27-2012
20120246138	SEARCH ENGINE OPTIMIZATION USING PAGE ANCHORS - A web content search request including a search term is received at a searching/indexing device. A web search is performed based upon the search term. A markup language (ML) document returned via the web search including the search term is parsed. A location of the search term within the ML document is identified. A hypertext link to the identified location of the search term within the ML document is configured.	09-27-2012
20120246139	SYSTEM AND METHOD FOR RESUME, YEARBOOK AND REPORT GENERATION BASED ON WEBCRAWLING AND SPECIALIZED DATA COLLECTION - A website system collects specialized data on users and organizations from a web crawler. The website system receives from a user a search string (via a search webpage provided by the website, for example) with a request to create a technology overview and research report with recent updates and research information in the field for a user-specified technology/subject area. It creates a technology overview and research report and presents it. Similarly, it creates user profiles, yearbooks, resumes, etc. based on the specialized data collected from web crawling.	09-27-2012
20120254149	BRAND RESULTS RANKING PROCESS BASED ON DEGREE OF POSITIVE OR NEGATIVE COMMENTS ABOUT BRANDS RELATED TO SEARCH REQUEST TERMS - An automated method is described for displaying brand results in response to a search query. A search query that includes one or more search terms is received at a processor. The one or more search terms are associated with one or more brands. Brand ratings are retrieved for the one or more brands from a memory that stores brand ratings for a plurality of brands. The brand ratings are based upon an analysis of web content items that mention the brands. A web page display of brand results is generated that shows links to one or more web pages of the one or more brands that the one or more search terms are associated with. The links to the one or more web pages of the brands with the most positive brand rating are displayed first.	10-04-2012
20120254150	DYNAMIC ARRANGEMENT OF E-CIRCULARS IN RAIS (RICH ADS IN SEARCH) ADVERTISEMENTS BASED ON REAL TIME AND PAST USER ACTIVITY - A first advertisement and a second advertisement are served responsive to a search query. User interactions with the query results, such as with the first advertisement, are detected and used as inputs to selecting at least a portion of the second advertisement. User actions can include a hover, a selection, a button selection, or the like.	10-04-2012
20120254151	SYSTEM AND METHOD FOR KEYWORD EXTRACTION - A computer-implemented system and method for keyword extraction are disclosed. The system in an example embodiment includes a keyword extraction component to extract relevant keywords from content of a web page, to identify items relevant to the extracted keywords, and to rank the relevant items.	10-04-2012
20120259832	SYSTEM FOR HANDLING A BROKEN UNIFORM RESOURCE LOCATOR - A method and apparatus for receiving a request for a Uniform Resource Locator (URL), determining the URL is broken, retrieving query data from a first database mapping the broken URL to the query data and retrieving one or more substitute URLs from a second database mapping the broken URL to the query data.	10-11-2012
20120259833	CONFIGURABLE WEB CRAWLER - A configurable web crawler allows a user to configure a web crawl by specifying one or more of thread throttling rules, domain restriction rules, and crawling rules. The configurable web crawler crawls the web beginning with a seed uniform resource locator and according to the crawl configuration.	10-11-2012
20120265748	SYSTEMS AND METHODS FOR DETECTING THE STOCKPILING OF DOMAIN NAMES - Systems, methods, and computer program products are provided for detecting the stockpiling of domain names. In one exemplary embodiment, there is provided a method for detecting a status of a domain name. The method may include receiving information related to the domain name from a registrar. The method may include crawling the at least one domain name, wherein the crawling receives first information located on a website associated with the domain name and receives second information related to a registration of the domain name. The method may also include storing the first information and the second information, wherein the crawling is initiated at a first time prior to expiration of the at least one domain name and a second time after expiration of the at least one domain name.	10-18-2012
20120265749	High Precision Internet Local Search - High-precision local search is performed on the Internet. A map image-rendering software provider embeds spatial keys into maps, which are then provided to producers of Internet content such as map providers. For example, a homeowner may post a message on a web bulletin board advertising his house for sale, and including a map showing the location of the house. When a search engine's web crawler encounters a page having a spatial key embedded in an image, the spatial key is indexed with the other content on the page. Because the spatial key identifies a small geographic area, indexing the content with the spatial key allows search queries to be limited by area and still provide useful results. Thus, a user of a search engine searching for “house for sale” in a specific area will be directed to web pages that meet the geographic and content search terms.	10-18-2012
20120271812	METHOD AND SYSTEM FOR PROVIDING USER-CUSTOMIZED CONTENTS - A method for providing user-customized contents, includes: receiving contents order information from a user and constructing a contents order information DB; opening the contents order information DB; receiving contents corresponding to the contents order information from a contents provider; and providing the received contents in a user-customized form.	10-25-2012
20120278302	MULTILINGUAL SEARCH FOR TRANSLITERATED CONTENT - The multilingual search for transliterated content technique described herein enables a user to submit a search query in both a native script and its foreign script (e.g., Roman script) transliteration and return relevant results in both the scripts while taking care of the spelling variations in transliterated forms. The technique crawls the World Wide Web for data in both the native script and foreign script transliterated forms of the data. It uses a transliteration engine to generate native script equivalents of the foreign script transliterated data and disambiguates the data in native script (whenever possible). The unique native script word forms are then used to jointly index the data in both the scripts. If the query is in native script, it is directly searched for in the index, otherwise the transliterated query is first converted into native script form(s) and then searched in the indexed database to retrieve and rank results in both the scripts.	11-01-2012
20120284251	PRIORITIZING CRAWL LISTS USING SOCIAL NETWORKING RANKINGS - Methods, systems, and computer-storage media having computer-usable instructions embodied thereon, for prioritizing crawl lists based on social networking rankings are provided. Various scores are associated with users based on a variety of factors including activity levels with respect to social networking services, activity levels with respect to search engines, and interactions with other users in a social networking environment. The scores are used to compute a ranking for the users and, based on the rankings, a crawl list is prioritized such that content associated with the social networking environment is crawled at an appropriate time.	11-08-2012
20120284252	System and Method For Search Engine Optimization - A method for SEO by optimizing interactions with or through a CDN (content distribution network).	11-08-2012
20120303606	Resource Download Policies Based On User Browsing Statistics - Web crawling policies are generated based on user web browsing statistics. User browsing statistics are aggregated at the granularity of resource identifier patterns (such as URL patterns) that denote groups of resources within a particular domain or website that share syntax at a certain level of granularity. The web crawl policies rank the resource identifier patterns according to their associated aggregated user browsing statistics. A crawl ordering defined by the web crawl policies is used to download and discover new resources within a domain or website.	11-29-2012
20120310912	CRAWL FRESHNESS IN DISASTER DATA CENTER - Content that is stored at a secondary location for a service is crawled before it is placed in operation to assist in maintaining an up to date search index. The content that is crawled at the secondary location includes content that is obtained from the primary location of the service. When a crawler at the secondary location attempts to access content that is stored at the primary location, the crawler is directed to access the corresponding copy of the content that is stored at the secondary location instead of accessing the content at the primary location. The content may be crawled at the secondary location at different times, such as when the information is updated, according to a schedule, and the like.	12-06-2012
20120317089	Scheduler for Search Engine Crawler - A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.	12-13-2012
20120317090	SUPPORT FOR INTERNATIONAL SEARCH TERMS - TRANSLATE AS YOU CRAWL - A search engine server delivers search results to a web browser of a client device communicatively coupled to the search engine server via the Internet. The system identifies new web pages in a source language during crawling, translates them into a plurality of destination languages, creates reverse indexes in respective languages, and stores both reverse indexes and cache web pages in a database. Upon the entry of search strings by a user using a web browser, the search engine server responds by delivering links of web pages in the user-desired language (the language of the search string or a language chosen by the user) as well as web pages translated from a plurality of destination languages, ranked based upon popularity or other means. The search engine server contains a plurality of translators that translate new web pages, links that are obtained during crawling, into a plurality of destination languages.	12-13-2012
20120323881	INTERACTIVE WEB CRAWLER - The claimed subject matter provides a system or method for web crawling hidden files. An exemplary method comprises loading a web page with a browser agent, and executing any dynamic elements hosted on the web page using the browser agent to insert pre-determined values. A list of form controls may be retrieved from the web page using the browser agent, and the controls may be analyzed using a driver component. Form control values may be sent from the driver component to the browser agent, and an event may be submitted to the web page by the browser agent or scripted content may be run to trigger operations on the web page corresponding to the form control values. A URL may be generated for various form control values using a generalizer.	12-20-2012
20120323882	DATA EXTRACTION SYSTEM, TERMINAL APPARATUS, PROGRAM OF THE TERMINAL APPARATUS, SERVER APPARATUS, AND PROGRAM OF THE SERVER APPARATUS - This invention provides a terminal searching for web pages on the web and extracting the prescribed data from the web pages and a server verifying and accumulating the extracted data. The prescribed data can be extracted from the web pages on the web in a manner that the process relating to the data extraction is distributed between the terminal and the server. Therefore, necessary processes up to the data extraction are distributed, and the burden placed on each apparatus can be lessened. Further, new data not formerly found in the web pages can be found and extracted from the web pages that have been updated or newly made.	12-20-2012
20120323883	ONLINE CONTENT COLLECTION - An online content collection system includes a scanning server to scan web sites to retrieve a potential creative uniform resource locator (URL). The scanning and retrieving includes parsing web pages for the web sites, identifying a potential creative URL from the parsed web pages that matches a predetermined criterion for retrieving potential creative URLs, and retrieving the potential creative URL that matches the predetermined criterion. A data storage may be used to store creative URLs. An online content collection server analyzes the retrieved potential creative URL by determining whether the retrieved potential creative URL has been seen before by comparing the retrieved potential creative URL against the creative URLs stored in the data storage, and determining whether the retrieved potential creative URL points to a creative if the retrieved potential creative URL has been seen before.	12-20-2012
20120330922	ANCHOR IMAGE IDENTIFICATION FOR VERTICAL VIDEO SEARCH - Anchor images and information associated therewith are accumulated during a Web crawling operation. One or more rules are applied to the accumulated candidate anchor images to filter out candidate anchor images that are not appropriate for use as the anchor image for a particular target video. The remaining candidate anchor image is then selected as the anchor image for the particular video.	12-27-2012
20130013583	ONLINE VIDEO TRACKING AND IDENTIFYING METHOD AND SYSTEM - A method and system of identifying and tracking online videos comprises the steps of searching and discovering targeted video on the Internet, filtering a manageable number of online videos from a large number of search results for the targeted video, acquiring online video contents through websites, identifying acquired videos by their contents, and generating different tracking reports according to video identification results and other historical records.	01-10-2013
20130024440	METHODS, SYSTEMS, AND COMPUTER-READABLE MEDIA FOR SEMANTICALLY ENRICHING CONTENT AND FOR SEMANTIC NAVIGATION - Methods, systems and computer-readable media enable various techniques related to semantic navigation. One aspect is a technique for displaying semantically derived facets in the search engine interface. Each of the facets comprises faceted search results. Each of the faceted search results is displayed in association with user interface elements for including or excluding the faceted search result as additional search terms to subsequently refine the search query. Another aspect automatically infers new metadata from the content and from existing metadata and then automatically annotates the content with the new metadata to improve recall and navigation. Another aspect identifies semantic annotations by determining semantic connections between the semantic annotations and then dynamically generating a topic page based on the semantic connections.	01-24-2013
20130036107	SYSTEMS AND METHODS FOR TREND DETECTION USING FREQUENCY ANALYSIS - Systems and methods for trend detection using frequency analysis in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, trend detection includes generating a discrete time sequence of word counts for a target word using a trend detection device, performing frequency analysis of the discrete time sequence of word counts to determine contributions of frequency components within different frequency ranges to the discrete time sequence of word counts using the trend detection device, and detecting that the target word is a trending keyword based upon at least the frequency analysis of the discrete time sequence of word counts for the target word using the trend detection device.	02-07-2013
20130041881	OPTIMIZING WEB CRAWLING WITH USER HISTORY - A politeness manager estimates traffic to web sites based on historical log data generated and sent by plug-ins or toolbars on client web browsers. The historical log data details the dates and times at which the web browsers visit different web sites, and is used to understand in which timeframes specific web sites are busy and in which timeframes they are not. Crawl rates for different timeframes for a web site are determined based on the historical log data, and web crawlers are scheduled to crawl the web site according to the crawl rates to minimize the chances that web crawler requests are responsible for the site crashing.	02-14-2013
20130041882	Technology for web site crawling, including action sequences for selecting non-hypertext-link parameters - A web site page has a reference for providing an address for a next page. The web site is crawled by the crawler program, which parses the reference from one of the web pages and sends the reference to an applet running in the browser. The address for the next page is determined by the browser responsive to the reference and is sent to the crawler.	02-14-2013
20130041883	SEARCH ENGINE WITH GEOGRAPHICAL VERIFICATION PROCESSING - A search operation provides geographically restricted and verified information to a user. A first step obtains high relevance search results by searching only in a specific region defined for a search operation. A second step improves the quality of the search results by performing contact address correlation. If the search server finds a reliable reference address in the search results, then these search results can be presented to the user, whereby search results that are not correlating well with legitimate and registered addresses for the site are removed from the search result lists. The region-restricted search searches in a selected geographical region and only presents legitimate web pages or search results to a user. Thus, the region-restricted search operation improves quality and may minimize search time and reduce a huge volume of non-valued Internet traffic, which is likely to impair the overall performance and experience on the Internet.	02-14-2013
20130046747	SYNTHESIZING DIRECTORIES, DOMAINS, AND SUBDOMAINS - SEO for an entire website can change the presence of the website on the internet, and change which webpages of the website rank higher for different internet searches. The SEO optimized website can provide a particular webpage in response to a particular search engine query rather than a generic landing page. SEO can determine a unified website configuration having individual webpages with higher search engine rankings for specific search engine parameters. This can allow for enhanced search engine optimization that directs search engine results to rank selected pages within a website higher than others to provide a more directed search result within the website.	02-21-2013
20130046748	IMAGE SEARCH ENGINE SYSTEM WITH MULTI-MODE RESULTS - An image search engine server having an image search engine that performs image searches based on a search term that is augmented by a built-in thesaurus and/or a dictionary. For a thesaurus-based algorithm, the approach is to send a query back to the user, who can select the image search domain, sub-domain, and other hierarchical search refinements from one or more dropdown menus. The items in the dropdown menus that the user selects during the “query back” are used to augment the search string entered by the user to better refine the image search. If the user-entered search string is a single string of one or more dictionary words, or the dictionary mode is elected, then synonyms for that search string are used to generate the augmented search string for the final context-based search operation. The result is improved image search results.	02-21-2013
20130046749	IMAGE SEARCH INFRASTRUCTURE SUPPORTING USER FEEDBACK - An Internet infrastructure supports searching of images by correlating a search image and/or search string with those of a plurality of images hosted on Internet servers, supports delivery of search result pages to a client device based upon a search string or search image, and may contain images from a plurality of Internet servers. The image search server delivers a search result page containing images upon receiving a search string and/or search image from the web browser. The selection of images in the search result page is based upon: (i) word match, that is, by selecting images, titles of which correspond to the search string; and (ii) image correlation, that is, by selecting images, image characteristics of which correlate to those of the search image. The selection of images in the search result page also occurs on the basis of popularity and may be refined by taking into account user feedback/preferences.	02-21-2013
20130046750	WEB SEARCH WITH VISITED WEB PAGE SEARCH RESULT RESTRICTIONS - An Internet infrastructure supports searching of web links to select search results by processing browser activity information along with one or more of favorite lists and related metadata, user profiles, and trends based on browser activity behavior and favorite behavior. A plurality of web browsers located on client devices are incorporated with a browser activity-monitoring module that tracks the user's Internet usage, processes this information, and sends this information periodically or upon user request to the server to aid in improving search operation results. The search engine server communicatively couples to the plurality of web browsers and supports delivery of search results/web links to the client device based upon a search string, browser activity information, and possibly the favorite lists and related metadata. The gathered browser activity information, favorite lists, and related metadata are stored in one or more server databases that are associated with the search engine server.	02-21-2013
20130046751	Method and Arrangement for Control of Web Resources - A method and arrangement in a server (	02-21-2013
20130054558	UPDATED INFORMATION PROVISIONING - One or more techniques and/or systems are disclosed for providing, in an automated fashion, updated information to a user regarding a topic indicated as being of interest to the user. At a first point in time, a first request for updated information on a first topic indicated as being of interest to a user can be received, such as from a website or service to which the user is connected. First updated information for the first topic can be requested from a data store that comprises updated information on one or more topics indicated as being of interest to the user. If the data store comprises the first updated information, the first updated information can be returned to the sender of the request, at the first point in time, such that the user may be presented with fresh content regarding a topic known to be of interest to the user.	02-28-2013
20130054559	System and Method for Generating a Knowledge Metric Using Qualitative Internet Data - An online marketing research measurement that allows a user to derive and/or monitor knowledge metrics, such as awareness metrics, recommendation metrics, advocacy metrics, etc. about a target subject, such as the user's brands and/or products using existing data on the Internet. Rather than requiring responses solicited from active participants in a survey (as in traditional surveys), unsolicited opinion data residing on the Internet can be gathered and processed for deriving various types of knowledge metrics. A recommendation metric can be derived from opinion data gathered from the Internet, which reflects a measure of recommendation opinions about the target subject. Users may identify the specific brand in which they are interested. After an Internet crawler is sent out to select data, the engine cleans the results of poor quality data, codes the data according to the appropriate constructs or variables, and then scores the sentiment using the system's sentiment engine.	02-28-2013
20130054560	Metadata Database System and Method - Systems, methods and computer readable media for computerized control and management of a metadata database. The metadata database can include event data, standards, survey questions and responses, and event response templates. Event projection can be based on data retrieved from a past events database. Control can include real-time control of subsystems within the complex system and providing reports and visualizations. The visualizations can include profile graphs, bar graphs, dashboards and hyperbolic mapping.	02-28-2013
20130054561	SEARCH ENGINE SUPPORTING MIXED IMAGE & TEXT SEARCH INPUT - Searching of images by correlating a search image with a plurality of images hosted in Internet based servers by an image search server. The image search server supports delivery of search result pages to a client device based upon a search string or search image, and contains images from a plurality of Internet based web hosting servers. The image search server delivers a search result page containing images upon receiving a search string and/or search image from the web browser. The selection of images in the search result page is based upon: (i) word match, that is, by selecting images, titles of which correspond to the search string; and (ii) image correlation, that is, by selecting images, image characteristics of which correlate to those of the search image. The selection of images in the search result page also occurs on the basis of popularity.	02-28-2013
20130060747	WEB SEARCH SYSTEM WITH GROUP INTERACTION SUPPORT - An Internet infrastructure supports searching of web links to select search results by processing browser activity information along with one or more of favorite lists and related metadata, user profiles, and trends based on browser activity behavior and favorite behavior. A plurality of web browsers located on client devices are incorporated with a browser activity-monitoring module that tracks the user's Internet usage, processes this information, and sends this information periodically or upon user request to the server to aid in improving search operation results. The search engine server communicatively couples to the plurality of web browsers and supports delivery of search results/web links to the client device based upon a search string, browser activity information, and possibly the favorite lists and related metadata. The gathered browser activity information, favorite lists, and related metadata are stored in one or more server databases that are associated with the search engine server.	03-07-2013
20130060748	WEB SEARCH WITH MULTI-LANGUAGE SEARCH INPUT TRANSLATION - A search engine server supports delivery of search results using an international search string option by identifying websites that provide support in English as well as the language of the international search string. The international search string is a search string in any of the languages that are listed/supported by the search engine server. The search engine server delivers web links of websites that provide support in both English and the language of the international search string by identifying conjugate English terms, strings or phrases for the international search string that provide exact or approximate equivalent meaning for searching. In addition, the search engine server also provides web links of websites that provide international language support by utilizing a thesaurus in English that provides synonyms for the conjugate English terms. The search engine server also translates websites where there is no support in the language of the search string.	03-07-2013
20130060749	CREATIVE WORK REGISTRY - Third party servers communicatively coupled to a search engine server gather vectors to web content and deliver a report to registered creative work owners by identifying vectors to web content that contain similarities to their copyrighted creative works. The search engine server identifies similarities to the works of the registered owners of the creative works and provides protection by reporting to the registered owners as well as host third party servers, in case of textual, image, audio and video creative works. This service is a value-added service of the search engine server to the registered owners of the creative works on a service charge basis. The search engine server also provides additional services that include reporting to the host third party servers that contain web content having similarities to that of creative works of registered owners and assisting the third party servers to delete the content upon consideration.	03-07-2013
20130060750	System and Method for Displaying Publication Dates for Search Results - A system and method for displaying publication information for search results. User input is received to perform a search of a communications network. Search results are generated in response to the user input. A content date is determined for content included in each of the search results. The search results and the content date associated with each of the search results are displayed.	03-07-2013
20130060751	SYSTEM AND METHOD FOR MANAGING WEB SEARCH INFORMATION IN NAVIGATIONAL HIERARCHY - The current methods and system create a plurality of tree text history entries in the tree text history section within the context of the search terms by associating with a plurality of history data from a plurality of searches using a plurality of search terms in a hierarchical format and allow users to manage the plurality of tree text history entries created in the tree text history section. The plurality of tree text history entries created in the tree text history section include a title and optionally a search or sub-search term. The current methods and system allow managing history data comprising the steps of adding, filtering, modifying, deleting, sorting, pruning, prioritizing, importing, and/or exporting the tree text history entries created in the tree text history section, depending on user preferences.	03-07-2013
20130073536	INDEXING OF URLS WITH FRAGMENTS - A URL inspector may determine a uniform resource locator (URL) which includes an indexable fragment. A URL separator may separate, from the URL which includes the indexable fragment, a base URL occurring prior to the indexable fragment. An indexer may process content of the base URL to obtain processed content thereof. A rendering system may render the processed content together with the URL which includes the indexable fragment to obtain rendered content. A content converter may convert the rendered content into indexable content.	03-21-2013
20130080415	SYSTEM AND METHOD FOR GENERATING NOTIFICATIONS RELATED TO NEW MEDIA - A method of generating notifications related to availability of new media content is provided. The method includes receiving a notification subscription including a request to monitor for new media content. The method also includes detecting new media content based on the subscription, and if a change is detected, determining an access right and transmitting a notification of the new media content. The metadata of the new media content is used in the subscriptions to determine when to generate notifications. Media content may include media articles, media selections, theatrical media releases, live content, or miscellaneous media sources.	03-28-2013
20130086036	Dynamic Search Service - Textual information processed by an application may be used to access data from one or more on-line data sources (e.g., Wikipedia) which may be used to enhance the user experience or to improve user productivity from using the application. One such application may be a search service that accesses such data based on input data provided to the application. For example, the application may parse instant messages sent and received by a user to extract keywords, phrases or links, which are then used to retrieve information from a repository of data obtained from various data sources. In this manner, data related to the subject matters of the user's communication may be readily accessed by the user, if desired, in a convenient manner. To deliver real time performance, the repository of data may be pre-processed (e.g., indexed) to facilitate information retrieval.	04-04-2013
20130091117	Sentiment Analysis From Social Media Content - The sentiment engine includes a sentiment module configured to gather opinions or determine sentiment expressed in documents, a crawling module configured to crawl servers to obtain at least a subset of the documents or opinions from social media websites, a keyword module configured to extract keywords from documents, a filtering module configured to filter keywords and documents, a classification module configured to classify documents, sentences, and/or keywords, a polarity prediction module configured to predict the polarity of a sentiment sentence, and a social media net promoter score (SNPS) configured to calculate a loyalty metric of users from social media websites. The functionality of these modules may be combined with one another or in addition to other modules.	04-11-2013
20130091118	AUDITING OF WEBPAGES - A method of performing an audit of auditable objects within webpages of a website includes identifying an auditable object marker and crawling a portion of a website to identify multiple webpages of the website that each include the auditable object marker. The method may further include configuring an audit rule to determine a property of an auditable object of each of the webpages where the auditable object marker is associated with the auditable object. The method may further include performing an audit of each of the webpages according to the audit rule to determine the property of the auditable object for each of the webpages and grouping the webpages based on the property of the auditable object for each of the webpages.	04-11-2013
20130097148	METHODS AND SYSTEMS FOR MODIFYING SEARCH ENGINE RANKINGS OF WEB PAGES - Systems and methods for applications of orthogonal corpus indexing (OCI) are described. In one aspect, the systems and methods described improve the ranking in a search engine of a web page. The systems and methods process a database using OCI to derive keywords relating to database content. They process a web page to determine a keyword relating to web page content, select content from the database based on the keyword, and add the selected content to the web page to improve its search engine page rank. In another aspect, the systems and methods described generate content for an advertisement. The systems and methods process a database using OCI to derive keywords relating to database content. They receive an ad word related to the advertisement, determine a keyword relating to the received ad word, and select content from the database based on the keyword for addition to the advertisement.	04-18-2013
20130103666	MAPPING UNIFORM RESOURCE LOCATORS OF DIFFERENT INDEXES - A server may identify a first address stored in a first search index; determine one or more first identifiers associated with the first address; identify a second address stored in a second search index; determine one or more second identifiers associated with the second address; map the first address to the second address based on a first identifier, of the one or more first identifiers, and a second identifier, of the one or more second identifiers; and transmit the mapping, of the first address to the second address, to a first server associated with the first search index or to a second server associated with the second search index.	04-25-2013
20130103667	Sentiment and Influence Analysis of Twitter Tweets - The present invention is directed to a system, method, and article of manufacture that employs a sentiment engine for conducting sentiment and influence analysis of various types of messages from the social media hosts or websites to extract opinions on different categories, which include services, products or hotels, and others, collectively referred to as “the keyword product”. The sentiment engine includes a sentiment module configured to gather opinions or determine sentiment expressed in documents, a crawling module configured to crawl servers of social network websites to obtain at least a subset of the documents or opinions from social media websites, a keyword module configured to extract keywords from documents, a filtering module configured to filter keywords and documents, a classification module configured to classify documents, sentences, and/or keywords, a polarity prediction module configured to predict the polarity of a sentiment sentence, a social media net promoter score configured to calculate a loyalty metric of users from social media websites, and a message analysis module configured to conduct analysis of a message from host social media sites, forums, blogs and product/service providers. The message analysis module includes analyzing messages from other host social media sites.	04-25-2013
20130110812	ACCOUNTING FOR AUTHORSHIP IN A WEB LOG SEARCH ENGINE	05-02-2013
20130110813	Routing Query Results	05-02-2013
20130117252	LARGE-SCALE REAL-TIME FETCH SERVICE - System and method for fetching embedded object content as part of a batch crawl. A fetch server receives a request on a request thread to retrieve content for objects embedded in a document, such as a web page. The fetch server attempts to locate the content of the object in cache first and in disk storage next. If the content is not located in the cache the fetch server may switch the request to a worker thread. If the content is not located in the disk storage, the fetch server may schedule a request to retrieve the content of the embedded object through a batch web crawl. Scheduling a request may include determining that a request to crawl the content of the object has already been scheduled or inserting a request into a scheduling queue.	05-09-2013
20130117253	SYSTEMS AND METHODS FOR IDENTIFYING HIERARCHICAL RELATIONSHIPS - Embodiments include a computer-implemented method that includes identifying a candidate parent entity having one or more characteristics indicative of the entity having a parent hierarchical relationship to another entity of an entity set, identifying a candidate child entity set including entities of the entity set that each have one or more characteristics indicative of the entity having a child hierarchical relationship to the candidate parent entity, comparing characteristics of the candidate parent entity to characteristics of an entity of the candidate child entity set to determine whether a hierarchical relationship exists between the candidate parent entity and the entity of the candidate child entity set, determining that a hierarchical relationship exists between the candidate parent entity and the entity of the candidate child entity set, and updating a hierarchical index to reflect the hierarchical relationship between the candidate parent entity and the entity of the candidate child entity set.	05-09-2013
20130117254	Method and System to update user activities from the World Wide Web to subscribed social media web sites after approval - Disclosed is a system and method to search the World Wide Web for the latest user activities and information and update the user activities and information to user subscribed social media websites with the user's approval. The user submits personal information and interests to the present invention. The present invention crawls and formats the information available on the World Wide Web for the provided user information and interests. The user is notified of the formatted information for approval. The user reviews the information and accepts or rejects it. The user can edit the information to change the content. The approved information is updated to user subscribed social media websites.	05-09-2013
20130124497	EXPERIENCE GRAPH - Method and system for organizing and sharing content through experience are described. In one embodiment, content may be organized and shared among users through a specific experience. A method for sharing content in a network may include: collecting contents related to a specific experience from a specific user; generating an experience graph of the specific experience; enabling the specific user to invite other users to join the experience graph; and enabling each user inside the experience graph to share new content into the experience graph.	05-16-2013
20130124498	BROWSER BASED LANGUAGE RECOGNITION SUPPORTING CENTRAL WEB SEARCH TRANSLATION - A web browser agent or plug-in installed into a web browser of a client device provides translation services along with a search engine server. The system accesses a web page in one (local) language and then translates to another (foreign) language and displays the translated content in a web page for the user's viewing. The web browser agent is an add-on software tool or plug-in, provided by the search engine server and installed into the web browser. As a result of installation, a toolbar appears on the top of the web browser's page. This toolbar provides the interface to enable local translation of web pages from a local/web language to a target/foreign language useful to the user. Centralized (cloud computing) translation services by servers of a third party may also be employed. Web pages in any number of languages may be accessed using this operation/structure.	05-16-2013
20130124499	SYSTEM AND METHOD FOR DIRECTING CONTENT TO USERS OF A SOCIAL NETWORKING ENGINE - A system and method for providing a third generation social network. The system provides processes that allow physical objects to be represented as social objects in the social network. A user may then interact with the social objects. These interactions allow the system to collect the content of the interactions of a particular user. The content of the interactions may then be analyzed and used to direct specific content to specific users that may have an interest in the specific content as indicated by the content of the interactions of those users. Furthermore, the system provides a method for associating data with a shape in an image to allow a user and/or groups of users to interact with the image.	05-16-2013
20130132364	CONTEXT DEPENDENT KEYWORD SUGGESTION FOR ADVERTISING - Various technologies described herein pertain to suggesting context dependent keywords for advertising. A set of seed queries can be identified from a context, where the context is a source keyword, a search query, a category, or a landing page. Moreover, the set of seed queries can be inputted to a search engine. A predetermined number of web pages returned by the search engine upon executing the set of seed queries can be retrieved. Candidate keywords can be extracted from the web pages returned by the search engine. Further, keywords can be selected from the candidate keywords based on relevance scores of the candidate keywords.	05-23-2013
20130144858	SCHEDULING RESOURCE CRAWLS - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for scheduling resource crawls. In one aspect, a framework is provided for scheduling resource crawls such that a crawl scheduler determines the health of a document, i.e., whether it can be crawled, the popularity of the document, and the frequency of “interesting,” i.e., substantive, content changes, and based on this information, estimates an appropriate crawl interval for each web resource to improve crawl resource utilization.	06-06-2013
20130144859	DISTRIBUTED GLOBALLY ACCESSIBLE INFORMATION NETWORK IMPLEMENTED WITH A LOCAL INFORMATION NETWORK - A distributed information network is constructed for gathering information from sites distributed across a globally accessible computer network, i.e., the Internet. The distributed information network preferably includes a root server that stores a list of multiple distributed sites each represented by metadata. A network browser delivers an information search request to the root server, which in response develops a profiled information search request. The information provider of each of the distributed sites stores metadata corresponding to information content that is retrievable in response to the profiled information search request for search results derivable from the information content to which the metadata correspond. A profiled information communication link between the root server and each of the multiple distribution sites enables formation of a path for delivery of the search results to a destination site, from a site or sites represented by the metadata of the profiled information search request.	06-06-2013
20130144860	System and Method for Automatically Identifying Classified Websites - Systems, methods, and computer readable storage mediums are provided for automatically identifying a classified website. A website is determined to be a candidate site based on a set of heuristics. From among pages constituting the candidate site one or more pages are determined to be listing page candidates and one or more pages are determined to be detail page candidates. Then a listing page score is determined using a listing page classifier. Similarly, a detail page score is determined using a detail page classifier. The listing page and detail page scores each indicate the likelihood that the pages are part of a classified website. A candidate site score is determined based in part on a combination of the listing page score and the detail page scores. Then when the candidate site score is above a threshold the candidate site is determined to be a classified website.	06-06-2013
20130159277	TARGET BASED INDEXING OF MICRO-BLOG CONTENT - Target based indexing of micro-blog content may include extracting, labeling, and indexing data contained in micro-blog entries. For example, by adapting natural language processing (NLP) technologies to a micro-blog entry, data is extracted in order to create an index. In one embodiment, a search engine may access the index in order to return results of a search query. In another embodiment, a user interface may display micro-blog entries categorically, allowing the user to access micro-blog entries by event, quote, opinion, or other category.	06-20-2013
20130173579	SCENARIO-BASED CRAWLING - An interactive session can be established between a crawling bot and a Web site. The crawling bot can define a session state representing a user state for interacting with one or more Web sites, a set of conditions, and a set of scenarios to be selectively activated based on whether the set of conditions are satisfied. The crawling bot can receive content from the Web site during the interactive session. The crawling bot can parse the content from the Web site and can match the parsed content against a previously defined set of items to determine whether the content matching condition is satisfied. If the content matching condition is satisfied and if the state condition is satisfied, a scenario defined by the crawling bot can be activated; the scenario is not activated if the content matching condition and the state condition are not satisfied.	07-04-2013
20130173580	SCENARIO-BASED CRAWLING - An interactive session can be established between a crawling bot and a Web site. The crawling bot can define a session state representing a user state for interacting with one or more Web sites, a set of conditions, and a set of scenarios to be selectively activated based on whether the set of conditions are satisfied. The crawling bot can receive content from the Web site during the interactive session. The crawling bot can parse the content from the Web site and can match the parsed content against a previously defined set of items to determine whether the content matching condition is satisfied. If the content matching condition is satisfied and if the state condition is satisfied, a scenario defined by the crawling bot can be activated; the scenario is not activated if the content matching condition and the state condition are not satisfied.	07-04-2013
20130173581	SCENARIO-BASED CRAWLING - An interactive session can be established between a crawling bot and a Web site. The crawling bot can define a session state representing a user state for interacting with one or more Web sites, a set of conditions, and a set of scenarios to be selectively activated based on whether the set of conditions are satisfied. The crawling bot can receive content from the Web site during the interactive session. The crawling bot can parse the content from the Web site and can match the parsed content against a previously defined set of items to determine whether the content matching condition is satisfied. If the content matching condition is satisfied and if the state condition is satisfied, a scenario defined by the crawling bot can be activated; the scenario is not activated if the content matching condition and the state condition are not satisfied.	07-04-2013
20130173582	INDEXING SECURE ENTERPRISE DOCUMENTS USING GENERIC REFERENCES - A web crawler indexes documents including information about document contents and metadata including information such as a URL. However, some applications rely on URLs that change frequently or are constructed to include user information so that the content retrieved is customized to the user. An approach is provided for storing generic URLs in an index at crawl time, which are customized for the user at search time. A callback mechanism may be used to dynamically transform the generic URL into a URL that is specific to the user issuing the query and/or includes current information that may change frequently. In this way, when the query or search results are returned to the user, the user receives links that are active and valid for that particular user, directing the user to the appropriate site, application, etc. without requiring continuous updating of a very large index.	07-04-2013
20130191365	Method to search objectively for maximal information - A method is disclosed for extracting maximal information in the output of document searches by key word queries. The method is based on Shannon information theory for objective ranking of the results. The database may be unlinked, such as documents distributed over directories on a PC, or linked, such as the world-wide web. Approximate expressions for the Shannon information are disclosed using the existing word-frequencies in the natural language. The method enables numerical ranking of a list of concordances with footnotes referencing their source documents. Relatively extended concordances may be used for display on computer screens, or relatively short concordances for display on mobile devices.	07-25-2013
20130198161	MONITORING CONTENT REPOSITORIES, IDENTIFYING MISCLASSIFIED CONTENT OBJECTS, AND SUGGESTING RECLASSIFICATION - Provided is a technique for organizing content objects in an enterprise content management system. Auditing of the content objects is performed to identify one or more content objects that are to be re-classified. A content object is selected. A first category associated with the content object is obtained. A relevancy score is obtained for the first category. A list of candidate categories and relevancy scores for each of the candidate categories are obtained. In response to determining that the first category does not correspond to a candidate category or that the relevancy score does not exceed a threshold, the content object is identified as improperly categorized, and the candidate categories that have associated relevancy scores that exceed the threshold are provided in an audit report.	08-01-2013
20130204859	PROVIDING TEMPORAL BIAS FOR SEARCH-RESULT-WEBPAGE ITEMS - A method, system, and medium are provided to temporally bias items included in a search results webpage. The items include navigational links, search results, advertisements, or any other content or items included in a search results webpage. User engagement with the items via the search-engine results page and/or other webpages is tracked. A user-engagement score and an age of the items are determined. A temporal-bias factor is calculated using a decay function that increases in intensity with the age of the items. A rank score is calculated for each item based on the user-engagement score and the temporal-bias factor—the temporal-bias factor decreasing the rank score as a function of the age of the items. The items are ranked based at least in part on the rank score and one or more of the items are chosen for presentation in a search-engine results page.	08-08-2013
20130212084	SYSTEM AND METHOD FOR ADDING IDENTITY TO WEB RANK - Embodiments of the present invention provide systems, methods and computer program products for generating search results comprising web documents with associated expert information. One embodiment of a method for generating such search results includes receiving one or more search queries, selecting one of the one or more search queries, determining one or more categories of web documents responsive to the selected search query and crawling a web graph of linked web documents to identify one or more web documents tagged as within the one or more categories responsive to the selected search query. The method further includes generating a result set of the one or more web documents identified as within the one or more categories responsive to the selected search query, ranking the result set and generating a list of ranked search results responsive to the selected search.	08-15-2013
20130218864	Real Estate Search Engine - Some embodiments provide a method that receives several attributes of a property and a price of the property. For each attribute in the several attributes of the property, the method performs a hedonic analysis to compute a value that correlates a portion of the price of the property to the attribute of the property. The method stores the computed values for later use in a search for the property.	08-22-2013
20130218865	SYSTEMS AND METHODS FOR IDENTIFYING AND ANALYZING INTERNET USERS - This disclosure describes systems, methods, and apparatus for generating reports enhancing an understanding of Internet users based on their generated content and actions taken by others in response to the generated content.	08-22-2013
20130226895	SYSTEM AND METHOD FOR MULTIMEDIA STREAM DATA SEARCHING AND RETRIEVAL - A system can search for data streams. A processor searches for a data stream device or provider on a network. It is determined if the data stream device or provider includes a stored stream index. The stream index is accessed as a reference stream index if the stream index is discovered. Data streams are searched for using the reference stream index.	08-29-2013
20130226896	ALTERNATIVE WEB PAGES SUGGESTION BASED ON LANGUAGE - Many websites publish variants of their web pages based on language and region. However, when a user is directed toward the incorrect web page for the user's language preference, there is not a simple way for the user to select the appropriate localized or region specific version of the web page. According to an embodiment, a language preference from a user may be received. A first language for a first web page may be identified and the first web page may be received by a computing device of the user. A second language for a second web page may be identified. The second web page may comprise an alternate version of the first web page. The first web page or the second web page may be selected according to the language preference of the user and the selected web page may be presented to the user.	08-29-2013
20130226897	Minimizing Visibility of Stale Content in Web Searching Including Revising Web Crawl Intervals of Documents - A method includes comparing a first instance with a second instance of a document in a plurality of documents. The first instance is obtained from a remote location at a specified time before the second instance is obtained from the remote location, and (i) the specified time is determined in accordance with a first crawl interval associated with the document, (ii) each document in the plurality of documents is assigned to a tier in a plurality of tiers, each tier having a distinct associated range of web crawl intervals, and (iii) the first crawl interval is assigned a first tier. The method also includes computing a second crawl interval for the document, which is a function of the document comparison; and determining whether the second crawl interval is in the first tier. When the second crawl interval is not in the first tier, the document is reassigned to another tier.	08-29-2013
20130226898	Web Crawler Scheduler that Utilizes Sitemaps from Websites - Systems and methods for scheduling documents for crawling are disclosed. In some implementations, a method includes obtaining sitemap information for a plurality of websites; and analyzing the sitemap information to identify a website, in the plurality of websites. The website has sitemap information that is at least potentially out of date. The method also includes updating the sitemap information for the identified website by downloading updated sitemap information for the identified website; and scheduling documents for crawling in accordance with the updated sitemap information for the identified website.	08-29-2013
20130226899	METHOD AND SYSTEM FOR TRIGGERING WEB CRAWLING BASED ON REGISTRY DATA - A method of triggering crawling of a domain includes receiving information related to a domain from a registrar and processing the information related to the domain. The method also includes storing the processed information in a registry zone file and forming a list of registry data based on the processed information. The list of registry data comprises a subset of the registry zone file. The method further includes crawling one or more of the domains in the list of registry data.	08-29-2013
20130232132	MANAGING SEARCH-ENGINE-OPTIMIZATION CONTENT IN WEB PAGES - A method for managing the Search Engine Optimization (SEO) content of web pages is disclosed. In one embodiment, such a method includes providing a set of web pages organized in a hierarchical structure. Each web page has an SEO content pattern associated therewith. The method establishes an inheritance scheme for the hierarchical structure such that the SEO content patterns of parent web pages are inherited by children web pages. The method further enables a user to override the inheritance scheme for selected web pages such that the SEO content patterns of the selected web pages override the SEO content patterns of their respective parent web pages. A corresponding apparatus and computer program product are also disclosed.	09-05-2013
20130238589	SYSTEM AND METHOD FOR PROVIDING PLUGGABLE SECURITY IN AN ENTERPRISE CRAWL AND SEARCH FRAMEWORK ENVIRONMENT - Systems and methods for providing an enterprise crawl and search framework, including features such as use with middleware and enterprise application environments, pluggable security, search development tools, user interfaces, and governance. In accordance with an embodiment, the system includes an enterprise crawl and search framework which abstracts an underlying search engine, provides a common set of application programming interfaces for developing search functionalities, and allows the framework to serve as an integration layer between one or more enterprise search engines and one or more enterprise applications. A pluggable security environment which includes one or more enterprise application security APIs, authentication services, security plugin, authorization service, and data service, allows an application developer to add security information to enterprise application data before inserting or creating indexes on the search engine, and deploy the enterprise application and use any policies in its configuration to configure enterprise application domain security, so that at query time, the security environment retrieves security keys of a user performing an enterprise application search, and passes those keys to the search engine, where they are used to filter the query results.	09-12-2013
20130238590	SYSTEM AND METHOD FOR SUPPORTING HETEROGENEOUS SOLUTIONS AND MANAGEMENT WITH AN ENTERPRISE CRAWL AND SEARCH FRAMEWORK - Systems and methods for providing an enterprise crawl and search framework, including features such as use with middleware and enterprise application environments, pluggable security, search development tools, user interfaces, and governance. In accordance with an embodiment, the system includes an enterprise crawl and search framework which abstracts an underlying search engine, provides a common set of application programming interfaces for developing search functionalities, and allows the framework to serve as an integration layer between one or more enterprise search engines and one or more enterprise applications. A computing environment can be used to display an administration interface for use in administering the framework.	09-12-2013
20130238591	SYSTEM AND METHOD FOR PROVIDING A GOVERNANCE MODEL FOR USE WITH AN ENTERPRISE CRAWL AND SEARCH FRAMEWORK ENVIRONMENT - Systems and methods for providing an enterprise crawl and search framework, including features such as use with middleware and enterprise application environments, pluggable security, search development tools, user interfaces, and governance. In accordance with an embodiment, the system includes an enterprise crawl and search framework which abstracts an underlying search engine, provides a common set of application programming interfaces for developing search functionalities, and allows the framework to serve as an integration layer between one or more enterprise search engine and one or more enterprise application. A user interface is provided for use in validating a search box against a target environment as part of implementing search within that environment.	09-12-2013
20130238592	APPLICATION STORE TASTEMAKER RECOMMENDATIONS - An application store tastemaker recommendation service determines experts within a user's social network(s), receives recommendations from the experts, filters and/or ranks mobile application query results based at least in part on the recommendations. Additionally, the service may further determine the experts based on data compiled about previous actions, reviews, comments, etc., of the experts. Further, the service may provide recommendations to the user to aid in selecting mobile applications for purchase, and may provide an avenue for completing such purchases.	09-12-2013
20130254181	Aggregation and Categorization - Disclosed is a computer-implemented method to aggregate products from online stores, the method comprising crawling one or more websites associated with one or more online stores; collecting information pertaining to products of the stores; extracting key data about each product; and classifying the products into one or more categories based on the key data.	09-26-2013
20130262428	Systems for Discovering Sensitive Information on Computer Networks - One embodiment of a system of the present invention for discovering sensitive information on computer network includes means for discovering databases on a computer network, means for defining a pattern for a data discovery, means for discovering qualifying records by matching the pattern with field names and/or record values in the databases, means for sending electronic notification to a database administrator managing the qualifying database, means for receiving a selection choice from the database administrator managing the qualifying database identifying the status for the qualifying records.	10-03-2013
20130268507	USER TASK COMPLETION VIA OPEN MARKET OF ACTIONS AND/OR PROVIDERS - Among other things, one or more techniques and/or systems are provided for facilitating the completion of a user task. That is, user intent (e.g., intentions of a user to perform a user task) may be identified. The user intent may comprise an entity (e.g., a movie entity) and/or an action (e.g., an order movie tickets action) that the user wants to perform on the entity. A provider list may be created based upon one or more providers capable of performing the action on the entity (e.g., a movie application may be capable of performing the order movie tickets action on the movie entity). Providers may be dynamically selected for inclusion within the provider list at run-time. For example, an open market of providers may be maintained (e.g., providers may be added, removed, and/or updated over time), such that providers may be selected from the open market to complete user tasks.	10-10-2013
20130268508	DYNAMIC TABLE FRAMEWORK FOR MANAGING DATA IN A HIGH PERFORMANCE WEB SERVICE - A system and method provides a dynamic table framework for managing data in a high performance web service. An example embodiment includes: receiving a request at a web service; creating a dynamic record from the request; obtaining a runtime corresponding to the dynamic record, the runtime including an associated plurality of symbol values corresponding to the request; choosing a model corresponding to the runtime, the model including a plurality of symbol managers, each of the plurality of symbol managers being associated with the plurality of symbol values, each of the plurality of symbol managers for processing a specific task of the model; executing the model, by use of a data processor, to process the request, the model using at least one of the plurality of symbol managers; and returning results generated by execution of the model.	10-10-2013
20130275406	Building Of A Web Corpus With The Help Of A Reference Web Crawl - Computer-implemented method for building a web corpus (WCD) comprising the steps of: sending by a web crawler (WC) a query to a reference web crawl agent (RWCA), this query containing at least one identifier of a resource; receiving by the web crawler (WC) a response from the reference web crawl agent (RWCA); if this response does not contain the resource identified by the identifier, downloading by the web crawler (WC) the resource from the website (WS) corresponding to the identifier and adding the resource to the web corpus (WCD); and if this response contains the resource identified by the identifier, adding the resource to the web corpus (WCD).	10-17-2013
20130282689	Navigable Website Analysis Engine - An optimization engine allows website publishers and other network document publishers to view and navigate statistics and scoring methodologies of a search engine. Publishers may thus gain a better understanding of how their website or network document is scored and how to optimize those documents to increase a search engine score. The user is thus able to navigate the network from the perspective of a search engine, viewing webpages, websites, and links in the same way a search engine would analyze them. Upon making changes to a website or network document, publishers may further request on-demand re-crawling of their website or network document to view changes in the score. Alerts may also be activated by a user to notify the user when certain conditions are met.	10-24-2013
20130282690	Navigable Website Analysis Engine - An optimization engine allows website publishers and other network document publishers to view and navigate statistics and scoring methodologies of a search engine. Publishers may thus gain a better understanding of how their website or network document is scored and how to optimize those documents to increase a search engine score. The user is thus able to navigate the network from the perspective of a search engine, viewing webpages, websites, and links in the same way a search engine would analyze them. Upon making changes to a website or network document, publishers may further request on-demand re-crawling of their website or network document to view changes in the score. Alerts may also be activated by a user to notify the user when certain conditions are met.	10-24-2013
20130297585	SEARCH TECHNIQUES FOR RICH INTERNET APPLICATIONS - A computing device includes one or more rich internet application (RIA) client engines. Each RIA client engine includes a corresponding private RIA storage area. The computing device also includes a per-RIA public storage area for each RIA. The per-RIA public storage area includes a subset of data items in the private RIA storage area of the corresponding RIA client engine. A search engine of the computing device may search the data items in the one or more per-RIA public storage areas and link to content in the private RIA storage area of the corresponding RIA client engine at a given data item matching a search request.	11-07-2013
20130311440	COMPARISON SEARCH QUERIES - A computer implemented method, system and computer program product for providing search results in response to a search query includes receiving, by a processor, a search query from a user. A processor detects that the search query includes a request for a comparison-mode query and the processor automatically detects terms in the search query indicating that the query includes components. The comparison-mode query is decomposed into respective, individual component queries for the respective components and the query is performed as respective component queries for the respective, individual components. This includes finding an individual result for each respective, individual component from a single, remote website. The user is presented the individual results of the component queries, which includes aligning the results side-by-side and vertically, so that although the results are for respective, individual components, the alignment tends to help the user compare the individual results.	11-21-2013
20130318064	INDIRECT DATA SEARCHING ON THE INTERNET - The present invention includes an Internet analysis process that includes initializing a data set, accessing a search engine to acquire search results, parsing the search results, rather than a native search engine indexable resource, to output a conclusion, and providing an updated data set. The present invention further includes an Internet analysis system that includes a data set initializer to initialize a data set, a search engine to acquire search results, a bot to parse the search results, rather than a native search engine indexable resource, to output a conclusion, and an updated data set.	11-28-2013
20130318065	INDIRECT DATA SEARCHING ON THE INTERNET - The present invention includes an Internet analysis system that includes a data set initializer to initialize a data set, a search engine to acquire search results, and a bot to parse the search results, rather than a native search engine indexable resource, to output a conclusion.	11-28-2013
20130318066	INDIRECT DATA SEARCHING ON THE INTERNET - The present invention includes an Internet analysis process that includes initializing a data set, accessing a search engine to acquire search results, and parsing the search results, rather than a native search engine indexable resource, to output a conclusion.	11-28-2013
20130325840	METHOD AND SYSTEM FOR INTERACTIVE SEARCH RESULT FILTER - For filtering web search results, a method includes extracting metadata attributes and associated attribute values from web search results from a web search engine. The web search results are organized into a results list with web page data grouped as an entry in the results list. The metadata attributes and associated attribute values are extracted from the results list. The method includes presenting the extracted metadata attributes and receiving input from the user indicating one or more selected metadata attributes and a position indication for each selected metadata attribute. Each position indication indicates where in a custom report the attribute values for each selected metadata attribute are to appear. The method includes filtering the received results list based on the selected metadata attributes and displaying the filtered results list to the user in a custom report arranged by the selected position indication for each selected metadata attribute.	12-05-2013
20130332442	DEEP APPLICATION CRAWLING - The deep application crawling technique described herein crawls one or more applications, commonly referred to as “apps”, in order to extract information inside of them. This can involve crawling and extracting static data that are embedded within apps or resource files that are associated with the apps. The technique can also crawl and extract dynamic data that apps download from the Internet or display to the user on demand. This extracted static and/or dynamic data can then be used by another application or an engine to perform various functions. For example, the technique can use the extracted data to provide search results in response to a user query entered into a search engine. Alternately, the extracted static and/or dynamic data can be used by an advertisement engine to select application-specific advertisements. Or the data can be used by a recommendation engine to make recommendations for goods/services.	12-12-2013
20130332443	ADAPTING CONTENT REPOSITORIES FOR CRAWLING AND SERVING - A system for searching files stored in a closed file source that is not accessible via a web crawler obtains file identifiers for files stored in the file source and creates a unique URL for each of the identifiers. Each URL may be based on a file identifier and a domain portion of a URL associated with the system. The system may provide the unique URLs to a search engine. The system may respond to a crawl request from the search engine for a particular URL by converting the URL back into a file identifier, obtaining the contents of the file, creating an HTTP response from the contents of the file, and returning the response to the search engine. The system may respond to a request for a seed URL with a plurality of URLs as links in a single HTTP response.	12-12-2013
20130332444	IDENTIFYING UNVISITED PORTIONS OF VISITED INFORMATION - Identifying unvisited portions of visited information to visit includes receiving information to crawl, wherein the information is representative of one of web based information and non-web based information, computing a locality sensitive hash (LSH) value for the received information, and identifying a most similar information visited thus far. Identifying unvisited portions of visited information further includes determining whether the LSH of the received information is equivalent to most similar information visited thus far and, responsive to a determination that the LSH of the received information is not equivalent to most similar information visited thus far, identifying a visited portion of the received information using information for most similar information visited thus far and crawling only unvisited portions of the received information.	12-12-2013
20130339336	INTERACTIVE WEB CRAWLER - The claimed subject matter provides a system or method for web crawling hidden files. An exemplary method comprises loading a web page with a browser agent, and executing any dynamic elements hosted on the web page using the browser agent to insert pre-determined values. A list of form controls may be retrieved from the web page using the browser agent, and the controls may be analyzed using a driver component. Form control values may be sent from the driver component to the browser agent, and an event may be submitted to the web page by the browser agent or scripted content may be run to trigger operations on the web page corresponding to the form control values. A URL may be generated for various form control values using a generalizer.	12-19-2013
20130346386	TEMPORAL TOPIC EXTRACTION - Methods, computer systems, and computer-storage media for forming a topic graph with at least one temporal element are provided. URL-query pairs are received and a topic graph is formed comprising the URL-query pairs. At least one topic associated with a URL and an importance of each topic is identified. In embodiments, a list of top topics is identified.	12-26-2013
20130346387	IDENTIFYING EQUIVALENT LINKS ON A PAGE - A computer-implemented process for identifying equivalent links on a page, responsive to a determination that the crawler has not visited all required universal resource locators, locates a next URL to be crawled to form a current URL and processes the current URL to identify equivalent URLs. Responsive to a determination that the crawler has not visited the current URL, the process determines whether it is necessary to crawl all identified equivalent URLs and, responsive to a determination that it is necessary to crawl all identified equivalent URLs, adds all equivalent URLs to a list of URLs to be crawled.	12-26-2013
20130346388	SEARCH CAPABILITY ENHANCEMENT IN SERVICE ORIENTED ARCHITECTURE (SOA) SERVICE REGISTRY SYSTEM - A method for searching a web service registry system by use of a search controller. A first search of a service registry program product is performed with a service name received by the search controller from a user. It is determined that the received service name does not have a service description associated with the received service name in the service registry program product. A second search of the service registry program product is coordinated with a candidate service name by use of the search module, wherein the candidate service name is semantically and syntactically interchangeable with the received service name such that the candidate service name identifies the service description associated with the received service name within the service registry program product. The service description is discovered to be associated with the candidate service name within the service registry program product and is subsequently returned to the user.	12-26-2013
20140006373	AUTOMATED SUBJECT ANNOTATOR CREATION USING SUBJECT EXPANSION, ONTOLOGICAL MINING, AND NATURAL LANGUAGE PROCESSING TECHNIQUES	01-02-2014
20140006374	METHOD AND APPARATUS FOR DERIVING AND USING TRUSTFUL APPLICATION METADATA	01-02-2014
20140006375	METHOD AND APPARATUS FOR ROBUST MOBILE APPLICATION FINGERPRINTING	01-02-2014
20140006376	AUTOMATED SUBJECT ANNOTATOR CREATION USING SUBJECT EXPANSION, ONTOLOGICAL MINING, AND NATURAL LANGUAGE PROCESSING TECHNIQUES	01-02-2014
20140012831	TILE CONTENT-BASED IMAGE SEARCH - Images are processed by extracting a number of small, fixed size pixel arrays, here called tiles. The image is thus represented as a collection of small parts in almost cookie cutter fashion. For storage, the tile data are added to a database and indexed for fast recall. Stored images can be rescaled, possibly rotated, and inserted again for more robustness. A sample image for recall is likewise processed, the extracted tiles serving as keys to find their stored counterparts. The original image can thus be recognized from even a small portion of the original image, if the sample offers enough tiles for lookup. The invention includes an image collection module, an image processing module, a storage module, a recall module and an interactive module by which a user can query a sample image or sub-image against the stored information.	01-09-2014
20140040233	ORGANIZING CONTENT - Methods, systems, and computer-readable and executable instructions are provided for organizing content. A method for organizing content can include building a customized content corpus for a user, building a concept graph customized for the user's context based on the customized corpus, and organizing, utilizing multi-view clustering, the content within the corpus based on the concept graph.	02-06-2014
20140040234	SYSTEM AND METHOD FOR TRAIL IDENTIFICATION WITH SEARCH RESULTS - A system and method are disclosed for identifying and generating a potential user trail. The trail may be an anticipated browsing path for a user based on current and/or historical browsing data, including search logs, browsing histories, and other data. The trail may be displayed as a search result summary or with individual search results in response to receiving a search query.	02-06-2014
20140046925	MOBILE SITEMAPS - A method of analyzing documents or relationships between documents includes receiving a notification of an available metadata document containing information about one or more network-accessible documents, obtaining a document format indicator associated with the metadata document, selecting a document crawler using the document format indicator, and crawling at least some of the network-accessible documents using the selected document crawler.	02-13-2014
20140052708	USER CUSTOMIZED DATA PAGE FOR SEARCH ENGINE DATA - A system and method for generating search engine data to be displayed on a display. A processor may send search queries to a search engine and receive result sets in response. Search engine data may be generated for URLs based on the search queries and the result sets. Report data may be displayed on the display based on the search engine data. The report data may include data effective to display a raw data page based on the search engine data. The processor may receive a request message to modify the report data. The request message may include a request to generate a user customized data page including filtered data from the search engine data. The processor may generate modified report data in response to the request message. The modified report data includes data effective to display the raw data page and the user customized data page.	02-20-2014
20140067787	SYSTEM AND METHOD TO IDENTIFY MACHINE-READABLE CODES - A method and a system to identify machine-readable codes using a web crawler are provided. Machine-readable codes include, but are not limited to, Universal Product Codes (UPC), quick response (QR) codes, stock-keeping units (SKUs) and international standard book number (ISBN) codes. A web crawler downloads pages from the World Wide Web. A determination module accesses the downloaded pages and identifies a machine-readable code corresponding to a product description included in the downloaded pages. The machine-readable code is included in a downloaded page of the downloaded pages. The determination module further extracts the product description from the downloaded page. A code database stores a record of the machine-readable code and the product description.	03-06-2014
20140074815	CALENDAR-BASED SEARCH ENGINE - A computer system including a computer-readable memory unit; and a processor coupled to the memory unit. The processor is configured to provide a graphical image representing a search engine interface for display on a screen of the computer system, wherein the search engine interface comprises an arrangement of cells, each cell representative of a calendar unit of time; cause performance of a search, upon selection of a particular cell, wherein said search is based on the unit of time represented by the selected cell; and display the results of the search.	03-13-2014
20140081945	SYNCHRONIZING HTTP REQUESTS WITH RESPECTIVE HTML CONTEXT - Synchronizing requests with a respective context includes, responsive to a determination that there are more pages to explore, performing regular crawling operations for a current page, recording a current page in a list of explored pages and extracting links from the current page. Responsive to a determination that there are more links to extract, a next link to analyze is selected to form a selected link and responsive to a determination that there is a new request associated with the selected link, a new request identifier is created and saved as an entry in a hashmap. Responsive to a determination that there is not a new request associated with selected link, a request associated with the selected link is updated with a new link value when the link value differs.	03-20-2014
20140081946	CRAWLING RICH INTERNET APPLICATIONS - Embodiments relating to a computer-implemented process, an apparatus and a computer program product are provided for crawling rich Internet applications. In one aspect the method includes executing an event in a set of events discovered in a state exploration phase according to a predetermined priority of events in each set of events in the sets of events discovered, wherein the event from a higher priority is exhausted before an event from a lower priority is executed, and determining any transitions. Responsive to a determination that there is at least one transition, any remaining set of events is executed in a transition exploration phase. In addition the method determines the existence of any new states as a result of executing an event in the set of events and returns to the state exploration phase, responsive to a determination that a new state exists.	03-20-2014
20140081947	METHOD AND APPARATUS FOR INTRANET SEARCHING - A method for processing an intranet includes crawling the intranet to identify at least some of the pages in the intranet, and determining, for each identified page, a number of links in a shortest path from a root page to the identified page.	03-20-2014
20140089288	NETWORK CONTENT RATING - A system rates content on a network. A database stores ratings for the content. A rating service creates the ratings for the content. The rating service merges a first rating of the content with a second rating of the content to produce a third rating for the content. A user interface obtains search results from the rating service. When the search results include the content, the user interface displays the rating of the content along with the search results.	03-27-2014
20140089289	SYSTEMS AND METHODS FOR FACILITATING OPEN SOURCE INTELLIGENCE GATHERING - Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.	03-27-2014
20140114946	SEARCH HIT URL MODIFICATION FOR SECURE APPLICATION INTEGRATION - A flexible and extensible architecture allows for secure searching across an enterprise. Such an architecture can provide a simple Internet-like search experience to users searching secure content inside (and outside) the enterprise. The architecture allows for the crawling and searching of a variety of sources across an enterprise, regardless of whether any of these sources conform to a conventional user role model. The architecture further allows for security attributes to be submitted at query time, for example, in order to provide real-time secure access to enterprise resources. The user query also can be transformed to provide for dynamic querying that provides for a more current result list than can be obtained for static queries.	04-24-2014
20140129539	SYSTEM AND METHOD FOR PERSONALIZED SEARCH - Personalization of Internet search is effected through the use of ResultRank and searcher selected profile attributes and searcher selected query context attributes. These attributes are also referred to as hats (worn by the searcher). Searcher privacy is maintained by allowing limited use of a searcher's profile by the search engine. Query language interpretation is improved by capture and use of searcher behavior and hat selection, in past search sessions, without storage of individual profile or context information. ResultRank is maintained and adjusted, on a per hat basis such that future, similarly hatted searchers benefit from these past sessions. An average of ResultRank, across searcher selected hats, is utilized for improved SERP ranking. Recognition of QLP's is improved by use of the hats. Custom support of public and private language community circles is incorporated. The technique is applied to organic as well as sponsored results. Steps are taken to minimize the impact of any attempt to artificially adjust ResultRank.	05-08-2014
20140129540	Modifying a Custom Search Engine for a Web Site Based on Custom Tags - Automatically creating and modifying a search engine for a website. User input may be received specifying an address of a website. A search engine may be automatically created for the website based on the user input. Webpages of the website may specify a plurality of tags specifying custom attributes of the webpages. During creation of the search engine, these custom attributes may be incorporated into the search engine index. Additional user input may be received customizing the search engine for various search engine contexts, e.g., based on the custom attributes of the webpages. Search engine results for the website may be based on various ranking functions, potentially including social impact of webpages of the website.	05-08-2014
20140129541	CONFIGURING WEB CRAWLER TO EXTRACT WEB PAGE INFORMATION - Web crawling configuration includes: obtaining a webpage comprising a plurality of nodes; receiving a user selection of a node in the webpage; presenting a set of web crawling configuration options pertaining to a web crawling action to be performed with respect to the node, the set of web crawling configuration options depending at least in part on a type of an element included in the node and comprising: a first option to perform a first web crawling action in the event that the node includes a first type of the element; and a second option to perform a second web crawling action in the event that the node includes a second type of the element; receiving a user input specifying the web crawling configuration option; and storing the user-specified web crawling configuration option, performing the web crawling action on the node according to the user input, or both.	05-08-2014
20140136508	Computer-Implemented System And Method For Providing Website Navigation Recommendations - A system and method for providing Web site navigation recommendations is provided. A Web page of interest is identified as a destination Web page. A domain of Web pages related to the destination Web page is determined. Information is extracted from each Web page in the domain and a recommendation comprising instructions for navigating to the destination Web page is generated based on the extracted information.	05-15-2014
20140136509	PERSONALIZED SEARCH RESULT RE-RANK BASED ON RELATIONSHIP BOND STRENGTH ALTERATION AMONG DIFFERENT KEYWORDS - A method for searching includes displaying keywords on an electronic display. The keywords are from results of an internet search of search criteria. A keyword of a search result is related to another keyword of the search result with a particular bond strength and the bond strength includes an amount that keywords in a search result are related. The method includes receiving a selection of two or more of the displayed keywords, setting a bond strength between two or more of the selected keywords, and displaying search results with a bond strength of at least the selected bond strength.	05-15-2014
20140143228	TECHNIQUES FOR ASCRIBING SOCIAL ATTRIBUTES TO CONTENT - Techniques for ascribing social attributes to content items and for selecting content to display in a content feed are described. According to various embodiments, one or more content items accessible via a network are accessed, each of the content items having received one or more social activity signals. Thereafter, members of an online social network service that submitted the social activity signals may be identified. Member profile data identifying member profile attributes of the members that submitted the social activity signals may then be accessed. Thereafter, social attribute information may be generated and associated with each of the content items, the social attribute information identifying the member profile attributes of the members that submitted the social activity signals associated with each of the content items.	05-22-2014
20140149379	Search Engine Optimization Technique To Obtain Better Webpage Ranking On Major Search Engines - This system provides a web site with favorable web ranking on a local basis by major search engines. It does this by providing a web page containing the proprietary Question & Answer content section (the content section that this patent covers). The Q&A section contains content about the business, product(s) and location (city, state) and also contains embedded contextual web links that link to similar content within the same business vertical determined by SIC (Standard Industry Code). These outbound links contain primary keywords for which the business wants to obtain optimal ranking and will link to other businesses within the same state but not the same city, so that direct local area competitors are not linked.	05-29-2014
20140149380	METHODS AND APPARATUSES FOR DOCUMENT PROCESSING AT DISTRIBUTED PROCESSING NODES - Briefly, the disclosure describes embodiments of methods or apparatuses for document processing at distributed processing nodes.	05-29-2014
20140149381	SYSTEM FOR DETECTING LINK SPAM, A METHOD, AND AN ASSOCIATED COMPUTER READABLE MEDIUM - A system for determining whether a website is an illegitimate website, the system comprising: a requester module configured to request one or more rules from a host server for a website and to receive a response from the host server in response to a request; an analysis module configured to determine whether a response or lack of a response received by the requester module indicates that the website is an illegitimate website; and a record module configured to store an indication that the website is an illegitimate website, wherein the one or more rules provide one or more instructions to a robot computer program regarding access of the website by the robot computer program.	05-29-2014
20140149382	Technology for Web Site Crawling - A web site page has a reference for providing an address for a next page. The web site is crawled by a crawler program, which parses the reference from one of the web pages and sends the reference to an applet running in a browser. The address for the next page is determined by the browser responsive to the reference and is sent to the crawler. The crawler selects non-hypertext-link parameters from the web page of the web site server by performing a programmed action sequence, including selecting items from lists of the web page in a particular sequence. The crawler sends the applet running in the browser, for the query to the web server for the next page referenced by the one web page, the selected parameters and a context arising from the particular sequence.	05-29-2014
20140156626	EMBEDDED EXTERNALLY HOSTED CONTENT IN SEARCH RESULT PAGE - Architecture that enables user interaction in a search engine results page (SERP) with externally hosted content. A control hosted by an external data source is seamlessly embedded within the SERP, and then functions transparently as if the control were an internally hosted control. The architecture includes the capability to trigger the addition of the control to the SERP, embed the control within the SERP, and enable the control to interact with the SERP. To seamlessly embed the external control, a key is created that uniquely identifies the external control. The key is encoded and injected into the web document index. At query time, the key is detected and causes a link to be rendered within the SERP. When the querying user selects the link, other elements in the SERP are moved aside to make room for the external control.	06-05-2014
20140156627	MAPPING OF TOPIC SUMMARIES TO SEARCH RESULTS - Architecture that facilitates the mapping of multimedia topic summaries from one data source, to web search results from another data source. An algorithmic technique is provided that discriminates (selects) between topic summaries that are wanted and not wanted for presentation to the search engine user based on a predetermined set of characteristics or features. Topic summaries are each pre-associated with a topic identifier. A page identifier is extracted from or created for the webpage that is used to match the topic identifiers to the correct webpage. The page identifier is aligned with the topic identifier of the topic summaries to find matches between topic summaries and webpages. Once alignment is completed, the correct topic identifier is inserted into the internal extended representation (e.g., associated with the content header) of every webpage, which enables the subsequent fetch of the topic summaries for display to users.	06-05-2014
20140156628	SYSTEM AND METHOD FOR DETERMINATION OF CAUSALITY BASED ON BIG DATA ANALYSIS - A method and system for determining causality based on big data analysis are provided. The method comprises extracting a plurality of unstructured data elements from a plurality of unstructured big data sources; generating at least one signature for each of the plurality of unstructured data elements; identifying at least one common pattern within the signatures of the plurality of unstructured data elements; matching the at least one common pattern to at least one hypothesis by comparing at least one signature of the common pattern to at least one hypothesis; and determining the causality of the at least one common pattern based on the at least one hypothesis matching the at least one common pattern.	06-05-2014
20140156629	METHOD FOR MANAGING INFORMATION - A method for managing the exchange of information is provided, wherein the method includes receiving at least one information location identifier, wherein the at least one information location identifier may be associated with at least one information portal and associating with at least one network browser. The method further includes generating an information location identifier template responsive to the at least one information portal and communicating with the at least one information portal to identify resultant information.	06-05-2014
20140164349	DETERMINING CHARACTERISTIC PARAMETERS FOR WEB PAGES - A computer receives a search request, wherein the search request contains one or more parameters that allow a search to be performed. Responsive to the search request, the computer identifies a plurality of web pages connected by a plurality of links. The computer determines the number of links in the longest path that connects at least a portion of the plurality of web pages, wherein the longest path includes a sequence of at least two web pages of the plurality of web pages connected by a link of the plurality of links. The computer determines the number of links included in a web page of the plurality of web pages.	06-12-2014
20140164350	DIRECT PAGE VIEW MEASUREMENT TAG PLACEMENT VERIFICATION - Disclosed herein are strategies for verifying placement of a direct measurement tag useful for measuring Internet traffic of a plurality of users at a website. For example, a method may include receiving web page identification data that is derived from user clickstream data, determining a URL associated with a domain based on the webpage identification information, and providing a measurement code verification web crawler with the URL and the depth to which to explore the domain for verifying measurement code placement with the web crawler.	06-12-2014
20140172820	METHOD FOR SUMMARIZING EVENT-RELATED TEXTS TO ANSWER SEARCH QUERIES - A method and apparatus for receiving training data that comprise a plurality of event-and-time-specific texts that are contextually related to a plurality of events; iteratively processing the training data to generate a modified network model that defines a plurality of states; receiving additional data that comprise a plurality of additional event-and-time-specific texts that are contextually related to a particular event; processing the additional data by applying the modified network model to the additional data to identify, within the plurality of additional event-and-time specific texts, a particular set of texts that belong to a particular state of the plurality of states; identifying, within the particular set of texts, one or more texts that are most representative of all texts in the particular set of texts that belong to the particular state; wherein the method is performed by one or more special-purpose computing devices.	06-19-2014
20140181070	PEOPLE SEARCHES USING IMAGES - Methods, systems, and computer-readable media for resolving a search query for a person using an image of the person are provided. An image index containing web images and links to the web images is created. Identifiers of the web images are mapped to the links to the web images and stored in the image index. A search query for a person is received. Upon recognizing the intent of the search query is to find information about the person, at least one digital image related to the person is selected, and an identifier of the digital image is submitted to the image index. The identifier of the digital image is compared against the identifiers of the stored web images and determined to correspond to an identifier of a web image. A link mapped to the identifier of the web image is read and distributed for presentation to a user.	06-26-2014
20140188837	Application Identification Method, and Data Mining Method, Apparatus, and System - A data mining method, apparatus, and system are provided. The method includes: obtaining to-be-processed data, where the to-be-processed data includes records, and each record includes application information and remote end triplet information; performing clustering processing on records with same remote end triplet information and same application information, and according to the records with the same remote end triplet information and the same application information, calculating a service load amount corresponding to the remote end triplet information and the application information to obtain a clustering result including the remote end triplet information, the application information, and the service load amount; according to the service load amount or a proportion of the service load amount, selecting remote end triplet information and application information that have high reliability from the clustering result; and sending the remote end triplet information and application information that have high reliability to a deep packet inspection (DPI) subsystem.	07-03-2014
20140201185	HYBRID METHOD OF BUILDING TOPIC ONTOLOGIES FOR PUBLISHER AND MARKETER CONTENT AND AD RECOMMENDATIONS - Systems and methods are discussed to automatically create a domain ontology that is a combination of ontologies. Some embodiments include systems and methods for developing a combined ontology for a website that includes extracting collocations for each webpage within the website, creating first and second ontologies from the collocations, and then aggregating the ontologies into a combined ontology. Some embodiments of the invention include unique ways to calculate collocations, to develop a smaller yet meaningful document sample from a large sample, to determine webpages of interest to users interacting with a website, and to determine topics of interest of users interacting with a website. Various other embodiments of the invention are disclosed.	07-17-2014
20140207753	METHOD AND SYSTEM THAT ROUTES REQUESTS FOR ELECTRONIC FILES - A system, method, or computer-readable medium provides a look-up table having information on roots in repositories managed by a repository manager, the roots information in the look-up table being only n-levels deep. A file request is received, including filename and filepath with root. Before checking repositories managed by the repository manager for the requested file, the look-up table is referenced to determine whether the root of the requested file exists on one of the repositories managed by the repository manager. A check of the repository is bypassed when the look-up table does not indicate that the root exists on the repository. The repository is checked for the requested file, when the root is indicated as existing on the repository. The requested file is returned, if actually found on one repository. A “fail” response is returned, if the root is not indicated as existing in the look-up table.	07-24-2014
20140214789	CALCULATING A CONTENT SUBSET - A method for calculating a content subset can include crawling a number of webpages for content, determining a relevance to a particular domain of the content, determining a penalty value for each of the number of webpages, and calculating, utilizing a data tree-based model, a subset of the content to analyze based on the relevance and the penalty value.	07-31-2014
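The routing step described in entry 20140207753 can be illustrated with a minimal Python sketch. Everything named here (`root_of`, `route`, the dictionary layout of the look-up table and repositories, and the default depth of 2) is a hypothetical choice made for illustration; the abstract does not specify data structures.

```python
from pathlib import PurePosixPath


def root_of(filepath: str, levels: int) -> str:
    """Return the first `levels` directory components of a file path."""
    parts = PurePosixPath(filepath).parts
    # Drop the file name and the leading "/" anchor, keep directories only.
    dirs = [p for p in parts[:-1] if p != "/"]
    return "/".join(dirs[:levels])


def route(filepath, lookup, repositories, levels=2):
    """Return the file content, or None as the "fail" response.

    lookup:       root -> list of repository names known to hold that root
    repositories: repository name -> {filepath: content}
    """
    root = root_of(filepath, levels)
    # Repositories not listed for this root are never checked (bypassed).
    for repo_name in lookup.get(root, ()):
        files = repositories[repo_name]
        if filepath in files:
            return files[filepath]
    return None
```

In this sketch a request whose root is missing from the look-up table fails without any repository being opened, which is the saving the abstract describes.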
20140214790	ENHANCING SITELINKS WITH CREATIVE CONTENT - Methods and systems for enhancing online content with creative text relevant to the online content are provided. A plurality of candidate sitelinks is identified in response to a user search for online content. Each sitelink has associated with it a plurality of candidate creatives with which the sitelink may be presented to the user. The creatives are canonicalized to form clusters of candidate creatives. The sitelinks are also canonicalized. The creatives are matched to the candidate canonicalized sitelinks so as to provide enhanced sitelinks having increased relevance to the user search.	07-31-2014
20140222775	SYSTEM FOR CURATION AND PERSONALIZATION OF THIRD PARTY VIDEO PLAYBACK - A system and method for providing an improved video experience are disclosed. The system can comprise one or more algorithms for choosing and providing videos, or other media, to users based on a number of inputs. The system can provide a list of videos from which undesirable (e.g., offensive or low-quality) videos have been removed. The system can also create custom channels based on user preferences. The system can also update user preferences in real-time based on user feedback during use.	08-07-2014
20140222776	Document Reuse in a Search Engine Crawler - Systems and methods are provided for setting a respective reuse flag for a corresponding document in a plurality of documents based on a query-independent score associated with the corresponding document. A document crawling operation is performed on the plurality of documents in accordance with the reuse flag for respective documents in the plurality of documents. This document crawling operation includes reusing a previously downloaded version of a respective document in the plurality of documents instead of downloading a current version of the respective document from a host computer in accordance with a determination that the reuse flag associated with the respective document meets a predefined criterion.	08-07-2014
20140244615	Search and Storage Engine Having Variable Indexing for Information Associations - An apparatus, system and method for an open indexing system, which includes an indexing engine associated with at least one processor and having one or more open inputs for inputting of indexing criteria, at least one computerized search engine for obtaining information across at least one computing network in accordance with the indexing criteria, at least one repository comprising at least one computing memory for storing information obtained via the at least one computerized search engine and corresponded to the indexing criteria, and at least one reporting engine, wherein an output of the reporting engine is manipulable responsive to modification to one or more categorizations dependent on the indexing criteria, and wherein the output is dependent solely on the information in said at least one repository.	08-28-2014
20140250097	SYSTEMS AND METHODS FOR INDEXING AND SEARCHING REPORTING DATA - A data management system for indexing reporting data of a contact center is disclosed. The data management system includes one or more reporting systems configured to store the reporting data. The data management system further includes a crawler configured to collect the reporting data from the one or more reporting systems. The data management system further includes one or more plug-in interfaces configured to enable the crawler to retrieve the reporting data from the one or more reporting systems. The data management system further includes an indexing server configured to index and store the collected contact center reporting data.	09-04-2014
20140250098	SYSTEM AND METHOD FOR INDEXING MOBILE APPLICATIONS - A system and method for indexing applications accessible through a user device are provided. The method includes crawling through a plurality of data sources to detect applications accessible through a user device; for each detected application, generating metadata characterizing the application; analyzing the generated metadata to classify the application to at least one category; and updating an application index to include at least the index application and the respective classified category.	09-04-2014
20140258261	LANGUAGE-ORIENTED FOCUSED CRAWLING USING TRANSLITERATION BASED META-FEATURES - A web page identified by a URL stored in a downloads queue is downloaded, and hyperlinks in the downloaded web page are identified. Each hyperlink is screened by parsing the hyperlink (optionally only the URL of the hyperlink) to identify features comprising character strings, computing for each feature values for one or more meta-features indicative of the hyperlinked web page being in a target language, aggregating the meta-feature values to generate a score for the hyperlink, and adding the URL of the hyperlink to the downloads queue conditional upon the score satisfying a screening criterion. The downloading, identifying, and screening are iteratively repeated to perform web crawling, and an index of web pages in the target language is constructed based on analysis of content of the downloaded web pages. The meta-features may include a transliterated target word meta-feature, a language code meta-feature, a country code meta-feature, or so forth.	09-11-2014
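The hyperlink screening step in entry 20140258261 can be sketched in Python as follows. This is only an illustration under stated assumptions: the word and code sets are made-up sample data for a Hindi-focused crawl, the meta-feature values are simple 0/1 indicators, and the aggregation (a mean over URL tokens) and the threshold are hypothetical, since the abstract does not say how the values are combined.

```python
import re
from collections import deque

# Hypothetical sample data; a real system would derive these elsewhere.
TRANSLITERATED_WORDS = {"samachar", "khabar", "hindi"}
LANGUAGE_CODES = {"hi"}
COUNTRY_CODES = {"in"}


def url_features(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric character strings."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def score_url(url: str) -> float:
    """Aggregate per-feature meta-feature values into one score."""
    features = url_features(url)
    if not features:
        return 0.0
    total = 0.0
    for f in features:
        total += 1.0 if f in TRANSLITERATED_WORDS else 0.0
        total += 1.0 if f in LANGUAGE_CODES else 0.0
        total += 1.0 if f in COUNTRY_CODES else 0.0
    # Assumption: the aggregate is the mean meta-feature mass per token.
    return total / len(features)


def screen(urls, queue, threshold=0.25):
    """Append to the downloads queue only URLs that pass the criterion."""
    for url in urls:
        if score_url(url) >= threshold:
            queue.append(url)
```

A crawl loop would pop a URL from the queue, download the page, extract its hyperlinks, and pass them to `screen`, so that only links likely to lead to target-language pages are downloaded.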
20140258262	Method and Computer Readable Medium for Providing, via Conventional Web Browsing, Browsing Capability for Search Engine Web Crawlers Between Remote/Virtual Windows and From Remote/Virtual Windows to Conventional Hypertext Documents - A method and computer readable medium is described for directing a search engine web crawler's local web browser to refresh the top-level container that is currently displaying the content presented by a remote computer with the new content that a navigational link, within a remote desktop, remote application window, or remote graphical windowing user session, points to. Links can be modified so as to be recognizable by the remote machine as unique from traditional hyperlinks. Upon navigation action on such a link, the client of a remote desktop, remote graphical application window, or remote graphical windowing user session is redirected so that it wholly reloads its computing context with that provided by a destination URL or URI. Such a URL or URI may point to another remote desktop, remote application window, or remote graphical windowing user session.	09-11-2014
20140280009	METHODS AND APPARATUS TO SUPPLEMENT WEB CRAWLING WITH CACHED DATA FROM DISTRIBUTED DEVICES - Methods and apparatus to supplement web crawling with cached data from distributed devices are disclosed. An example method includes accessing a first set of websites cached in a panelist device; comparing the first set of websites to a second set of websites to be analyzed by a crawler; and retrieving with the crawler a first website included in the second set of websites but not included in the first set of websites from a server associated with the first website.	09-18-2014
20140280010	SHARED MEDIA CRAWLER DATABASE METHOD AND SYSTEM - The embodiments relate to transcoding, cataloging, and extracting metadata about files stored in a storage device. In one embodiment, a crawler runs on the storage device and maintains a database that is stored in the volume with the data that has been cataloged by the crawler. The crawler may discover files of any type and extract associated metadata about the files. The crawler can extract metadata about client interaction with various files, such as edits, play counts, etc. The crawler may discover files of any type and extract associated metadata about the files automatically during a scan or at the request of a client. In one embodiment, the crawler may be responsive to file system events that indicate changes to the file system, such as additions, deletions, or other types of changes. In addition, the crawler may synchronize the database with the file system so that they indicate the same state for a particular file. Furthermore, the crawler may provide notifications to various entities regarding the state of a file.	09-18-2014
20140280011	Predicting Site Quality - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting a measure of quality for a site, e.g., a web site. In some implementations, the methods include obtaining baseline site quality scores for multiple previously scored sites; generating a phrase model for multiple sites including the previously scored sites, wherein the phrase model defines a mapping from phrase specific relative frequency measures to phrase specific baseline site quality scores; for a new site that is not one of the previously scored sites, obtaining a relative frequency measure for each of a plurality of phrases in the new site; determining an aggregate site quality score for the new site from the phrase model using the relative frequency measures of phrases in the new site; and determining a predicted site quality score for the new site from the aggregate site quality score.	09-18-2014
20140280012	CREATING RULES FOR USE IN THIRD-PARTY TAG MANAGEMENT SYSTEMS - Methods and systems allow for creating rules for a tag management system. One or more implementations that create rules for a tag management system can include crawling a page of a website. Additionally, one or more implementations identify the configuration of each of the tags implemented within the page. Further, one or more implementations generate one or more rules that enable a tag management system to recreate the configuration of one or more tags implemented within the page. Further still, one or more implementations export the generated one or more rules to a tag management system.	09-18-2014
20140297617	METHOD AND SYSTEM FOR SUPPORTING GEO-AUGMENTATION VIA VIRTUAL TAGGING - A system and method provide for geo-augmentation through virtual tagging. A search infrastructure supports creation, managing and searching geo-coded virtual tags using mobile communication devices. Associated geolocations are added to a geolocation database along with pointers to the stored content. Searching of the geolocation database is performed upon receiving geolocation search input, wherein the infrastructure applies the geolocation based search input to the search database yielding search results delivered from the mobile communications device for presentation to the user.	10-02-2014
20140304249	EXPERT DISCOVERY VIA SEARCH IN SHARED CONTENT - Determining experts based on a search query of a user includes identifying items in a content collection that correspond to the search query, determining authors of the items, and ranking the authors according to relevance to the search query for each of the items for each of the authors. Determining experts based on a search query of a user may also include complementing the query with additional public search results prior to identifying the items. Complementing the query may include using an external data source to search based on the query. The external data source may be selected from the group consisting of Google Search, Yahoo Search, and Microsoft Bing. Determining experts based on a search query of a user may also include presenting the authors to the user in order of ranking. The query may be a natural language query.	10-09-2014
20140310257	SYSTEM AND METHOD FOR INDEXING AND DISPLAYING DOCUMENT TEXT THAT HAS BEEN SUBSEQUENTLY QUOTED - A computerized system and method is presented for analyzing quotations made in a quoting document of text originally found in a source document. The quoting document and source document can be web pages publicly available on the World Wide Web. The present invention analyzes the quoting document for quoted text, searches the source document for that text, and stores the existence of the quotation in association with the source document. When displaying the source document, quoted text is highlighted. A link is provided between items of quoted text and a list of documents that have quoted that text. From this list the full text of a quoting document may be displayed.	10-16-2014
20140324815	SEARCH INFRASTRUCTURE REPRESENTING HOSTING CLIENT DEVICES - A system and method for supporting searching of client device hosted content. A search infrastructure supports creation, managing and searching of client device hosted content. A client device, which hosts content, communicates its client device identification (ID), type and access restrictions to the search infrastructure. In addition, the client device communicates a global network route to the client device content as a pointer for the search engine to provide a search requestor access to both the client device and specified content. Client device information is also provided to a client device registry accessible by the search infrastructure, for example a registry maintained in a cloud based service. Client devices can enter into a client device services agreement with a third party storage system for the purposes of providing a higher probability that their client device hosted content will be available.	10-30-2014
20140324816	EXTENDED WEB SEARCH INFRASTRUCTURE SUPPORTING HOSTING CLIENT DEVICE STATUS - A system and method is provided for internet searching infrastructures and more particularly to hosted client device status supporting the delivery of search results hosted by a client device. A registry table retains client device status information so that when a search result includes specific device hosted content, that client device's status will be known. Client device status includes sleep, offline, predicted period of availability, do-not-disturb (DnD), power availability, or busy along with other status indications.	10-30-2014
20140324817	PREPROCESSING OF CLIENT CONTENT IN SEARCH INFRASTRUCTURE - A system and method is provided to distribute preprocessing of client device content. The client device performs preprocessing or alternatively transfers search accessible content to remote systems for preprocessing such as search system infrastructure, set-top boxes, other client devices, etc. Client device content is preprocessed so as to provide, for example, a preview of images available by providing thumbnails of the images, small excerpts of text or a video preview. Offloading of client device content preprocessing duties reduces web server operational requirements and subsequent power needs. Additionally, preprocessing of searchable content can be distributed across multiple content hosts and search infrastructure elements.	10-30-2014
20140324818	Scheduler for Search Engine Crawler - Systems and methods for scheduling document crawling are provided in which a list of document identifiers is obtained. Each respective document identifier identifies a corresponding document on a network. For each respective document identifier in the list of document identifiers, a content change frequency of the corresponding document is determined and a first score for the document identifier that is a function of the determined content change frequency of the corresponding document is also determined. The first score is compared against a threshold value. The document is scheduled for crawling based on the result of the comparison. The content change frequency for a respective document identifier is determined by comparing information stored for successive downloads of the document corresponding to the document identifier.	10-30-2014
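The scheduling logic in entry 20140324818 can be sketched in Python as below. This is a rough illustration, not the patented method: `DocumentHistory`, the use of stored checksums as the "information stored for successive downloads", the identity score function, and the default threshold are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class DocumentHistory:
    """Checksums stored for successive downloads of one document (hypothetical)."""
    url: str
    checksums: list[str]


def change_frequency(history: DocumentHistory) -> float:
    """Fraction of successive download pairs whose content differed."""
    pairs = list(zip(history.checksums, history.checksums[1:]))
    if not pairs:
        return 0.0
    changed = sum(1 for before, after in pairs if before != after)
    return changed / len(pairs)


def schedule(histories: list[DocumentHistory], threshold: float = 0.5) -> list[str]:
    """Return the URLs whose score meets the threshold.

    Assumption: the score equals the change frequency; the abstract only
    states that the score is a function of it.
    """
    return [h.url for h in histories if change_frequency(h) >= threshold]
```

Under these assumptions, documents that changed in at least half of their observed download pairs are scheduled, while stable documents and documents with a single recorded download are skipped.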
20140337309	Multi-Dimensional Query Expansion Employing Semantics and Usage Statistics - Embodiments relate to systems and methods employing personalized query expansion to suggest measures and dimensions allowing iterative building of consistent queries over a data warehouse. Embodiments may leverage one or more of: semantics defined in multi-dimensional domain models, user profiles defining preferences, and collaborative usage statistics derived from existing repositories of Business Intelligence (BI) documents (e.g. dashboards, reports). Embodiments may utilize a collaborative co-occurrence value derived from profiles of users or social network information of a user.	11-13-2014
20140344241	USER-ENHANCED RANKING OF INFORMATION OBJECTS - A method for user-enhanced ranking of information objects, comprising: generating a graphical user-interface (	11-20-2014
20140344242	SYSTEMS, DEVICES, AND METHODS FOR PROVIDING MULTIDIMENSIONAL SEARCH RESULTS - Embodiments of the disclosure may include systems, methods, and devices for providing multidimensional search results on a plurality of search planes. Such systems, methods, and devices may: (i) receive one or more search terms from one or more user interfaces of the system; (ii) perform a search of one or more informational repositories to obtain a list of search results wherein the informational repositories may include the Internet and one or more databases; (iii) process the list of search results to classify each search result in one of a plurality of categories; (iv) cause a presentation of the search results in a plurality of search planes on the display of the system such that each search plane corresponds to one of the plurality of categories. In addition, the software applications may include a sorting software application that groups the list of search results into one of a plurality of categories.	11-20-2014
20140351235	SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR CRAWLING A WEBSITE BASED ON A SCHEME OF THE WEBSITE - A system, method, and computer program product are provided for crawling a website based on a scheme of the website. In use, a difference between a first content and second content of a website is identified. Additionally, a scheme of the website is identified based on the difference. Furthermore, the website is crawled based on the scheme.	11-27-2014
20140351236	METHOD AND DEVICE FOR WEBSITE SEARCHING ON A WEB BROWSER - A method, apparatus, server and system for website searching in a browser of a mobile terminal is presented. The method includes the steps of: loading preconfigured information for one or more website search engines for generating a website search engine list on a browser search bar; receiving information on which website search engine has been selected from the generated website search engine list; receiving a search keyword input to the browser search bar; sending a search request to the selected website search engine to query the received search keyword; and displaying a search result returned by the selected website search engine upon a successful search.	11-27-2014
20140351237	SYSTEMS AND METHODS FOR CREATING, NAVIGATING, AND SEARCHING INFORMATIONAL WEB NEIGHBORHOODS - Systems and methods for the creation of hierarchical networks of overlapping informational web neighborhoods using percolation crawling. Each neighborhood comprises a set of closely linked pages that share a common set of concepts and intent and purpose. The neighborhoods represent web pages that share a common set of underlying concepts and semantic associations. Each such neighborhood can be semantically tagged.	11-27-2014
20140358887	APPLICATION CONTENT SEARCH MANAGEMENT - A search service accesses application content accessible via one or more enumerated applications. The search service ranks the accessed application content in combination with non-application content to produce a combined ranking. Responsive to a search query, the search service provides one or more search results based on the combined ranking.	12-04-2014
20140358888	Method, System, And Computer Program Product For Monitoring Online Reputations With The Capability Of Creating New Content - The present invention provides the capability to quickly and easily determine the online reputation of a target, and to quickly and easily take steps to improve the online reputation of the target. For example, a method of monitoring and affecting online reputation may comprise gathering information potentially related to an online reputation of a target, filtering the gathered information to eliminate information not related to the target, computing a reputation score for the filtered information based on both positive and negative information related to the target, generating positive information relating to the target, and distributing the generated positive information relating to the target to a plurality of online locations.	12-04-2014
20140365459	Harvesting Addresses - Some embodiments of the invention provide an address harvester that harvests addresses from one or more applications executing on a device. Some embodiments use the harvested addresses to facilitate the operation of one or more applications executing on the device. Alternatively, or conjunctively, some embodiments use the harvested addresses to facilitate the operation of one or more applications executing on another device than the one used for harvesting the addresses. In some embodiments, a prediction system uses the harvested addresses to formulate predictions, which it then provides to the same set of applications from which it harvested the addresses.	12-11-2014
20150032717	REAL TIME IMPLICIT USER MODELING FOR PERSONALIZED SEARCH - A method and apparatus for utilizing user behavior to immediately modify sets of search results so that the most relevant documents are moved to the top. In one embodiment of the invention, behavior data, which can come from virtually any activity, is used to infer the user's intent. The updated inferred implicit user model is then exploited immediately by re-ranking the set of matched documents and advertisements to best reflect the information need of the user. The system updates the user model and immediately re-ranks documents and advertisements at every opportunity in order to constantly provide the most optimal results. In another embodiment, the system determines, based on the similarity of results sets, if the current query belongs in the same information session as one or more previous queries. If so, the current query is expanded with additional keywords in order to improve the targeting of the results.	01-29-2015
20150039584	REAL-TIME SHARED WEB BROWSING AMONG SOCIAL NETWORK CONTACTS - A determination is made that each of at least two social network contacts involved in a social messaging interaction initiate a separate web search associated with the social messaging interaction. A separate set of web search results returned to each of the at least two social network contacts is captured in association with each initiated separate web search. A combined live search results view that includes each captured separate set of web search results is provided to each of the at least two social network contacts. The combined live search results view provides navigation to web content returned to other social network contacts.	02-05-2015
20150066893	SYSTEMS AND METHODS FOR ATTRIBUTING PUBLISHERS FOR REVIEW-WRITING USERS - Methods and systems for tracking end users who submit reviews are provided. In some embodiments, reviews are submitted by end users via a reviewing application that reports review submission to a tracking system. In some embodiments, reviews are reported to the tracking system by review web sites that receive the reviews. In some embodiments, the tracking system uses a web crawler to retrieve review information from review web sites. User click records are used to attribute user acquisition to ad providers, and an amount of a reward granted for acquiring a given user may be altered based on records of reviews submitted by the given user.	03-05-2015
20150066894	Automatically Modifying a Custom Search Engine for a Web Site Based on Administrator Input to Search Results of a Specific Search Query - Automatically creating and modifying a search engine for a website. User input may be received specifying an address of a website. A search engine may be automatically created for the website based on the user input. Webpages of the website may specify a plurality of tags specifying custom attributes of the webpages. During creation of the search engine, these custom attributes may be incorporated into the search engine index. Additional user input may be received customizing the search engine for various search engine contexts, e.g., based on the custom attributes of the webpages. Search engine results for the website may be based on various ranking functions, potentially including social impact of webpages of the website.	03-05-2015
20150066895	SYSTEM AND METHOD FOR AUTOMATIC FACT EXTRACTION FROM IMAGES OF DOMAIN-SPECIFIC DOCUMENTS WITH FURTHER WEB VERIFICATION - Provided are systems and methods for building a domain-specific facts network. A system includes an optical character recognition (OCR) system configured to perform OCR on an image of a domain-specific document. The system also includes an OCR results analysis system configured to analyze the results of OCR of the domain-specific document. The system also includes a fact extraction system configured to extract data from the domain-specific document based on the analysis of the results of the OCR. The system also includes a web fact extraction system configured to extract data from the Internet; wherein the data is related to the data in the domain-specific document. The system also includes a validation system configured to validate data extracted from the domain-specific document and the Internet. The validated data is stored in a domain-specific facts network.	03-05-2015
20150074078	PROVIDING ENHANCED CONNECTION DATA FOR SHARED RESOURCES - Embodiments are directed to establishing a metadata repository that aggregates metadata for a plurality of data sources, inferring data source metadata at a metadata repository and to providing recommendations to data managers based on aggregated inputs. In one scenario, a computer system establishes a reference to one or more data sources, where each data source includes data elements. The computer system receives a data request for specified data elements stored on the data sources and accesses the established references to determine which data source the specified data elements are stored on. The computer system then retrieves at least one of the specified data elements from its determined data source and sends the retrieved data elements to a specified computer system, along with an indication of additional data elements that are relevant to the received data request, and a further indication of how those additional data elements are to be accessed.	03-12-2015
20150081664	DETERMINING AUDIENCE MEMBERS ASSOCIATED WITH A SET OF VIDEOS - Determining a video audience is disclosed, including: identifying a set of videos based at least in part on a received criterion; querying a video database to retrieve engagements associated with each of at least a subset of the set of videos; identifying a set of audience members associated with the engagements associated with each of the at least subset of the set of videos; and querying a user database to gather events associated with each of at least a subset of the set of audience members.	03-19-2015
20150095304	CRAWLING COMPUTER-BASED OBJECTS - Crawling computer-based objects is implemented by identifying a dependency between a first portion of a computer-based object set and a second portion of the computer-based object set, where the second portion is data-dependent on the first portion, and responsive to identifying the dependency, effecting a crawling of the first portion and thereafter a crawling of the second portion.	04-02-2015
20150095305	DETECTING MULTISTEP OPERATIONS WHEN INTERACTING WITH WEB APPLICATIONS - Detecting multistep operations when interacting with web applications is performed by identifying a set of multiple web pages of a web application, where the web pages in the set of multiple web pages are sequentially navigable, identifying a group of multiple web page elements at the same relative location in each of the web pages in the set of multiple web pages, determining that the identified groups of web page elements are similar to each other in accordance with a predefined similarity criterion, identifying an element that is common to each identified group of web page elements, and determining that a characteristic of the element is uniquely varied in each of the identified groups of web page elements.	04-02-2015
20150100563	METHOD FOR RETAINING SEARCH ENGINE OPTIMIZATION IN A TRANSFERRED WEBSITE - Systems and methods for implementing changes to a website without losing the indexing status and accumulated SEO metrics for web pages of the website may include creating a page mapping table that associates old web page URLs with new web page URLs. Old web page URLs may be obtained by crawling the website or by searching the indexing cache of one or more search engines. The old web page URLs are saved as source paths in the table. New web page URLs may be manually associated with the source paths as destination paths in the table, or the destination paths may be automatically obtained. A web server or a reverse proxy server uses the page mapping table to send 301 redirects to devices that request the old web pages. Usage data of the new web page may be collected and analyzed to determine if an automatically identified destination path is correct.	04-09-2015
20150100564	SEARCH QUERY OBFUSCATION VIA BROADENED SUBQUERIES AND RECOMBINING - System, method, and computer program product to perform an operation to obfuscate search queries via broadened subqueries and recombining, by referencing an ontology to identify a set of generalized terms corresponding to at least one term of a received query, generating a plurality of subqueries based on the received query and the set of generalized terms, executing each of the plurality of subqueries to retrieve a result set for each respective subquery, and filtering the result sets using the received query to produce a result set responsive to the received query.	04-09-2015
20150106356	IDENTIFICATION OF DISTRIBUTED USER INTERFACE (DUI) ELEMENTS - Technologies are generally described to develop and implement a searchable knowledge source to identify distributed user interface (DUI) elements. In some examples, a DUI identification system may receive a control record of an application and populate one or more searchable knowledge sources based on an application description retrieved. The application description may include keywords, input elements, and output elements, and the searchable knowledge sources may be generated from control records of a multitude of applications. The DUI identification system may execute a query on the searchable knowledge sources based on the received keywords, input elements, and output elements associated with a target workflow from a requesting client. A query result that includes one or more DUI elements may be provided to the requesting client. The DUI elements may connect the input elements to corresponding output elements and match the keywords associated with the target workflow.	04-16-2015
20150106357	CONFIGURING WEB CRAWLER TO EXTRACT WEB PAGE INFORMATION - Web crawling configuration includes: obtaining a webpage comprising a plurality of nodes; receiving a user selection of a node in the webpage; presenting a set of web crawling configuration options pertaining to a web crawling action to be performed with respect to the node, the set of web crawling configuration options depending at least in part on a type of an element included in the node and comprising: a first option to perform a first web crawling action in the event that the node includes a first type of the element; and a second option to perform a second web crawling action in the event that the node includes a second type of the element; receiving a user input specifying the web crawling configuration option; and storing the user-specified web crawling configuration option, performing the web crawling action on the node according to the user input, or both.	04-16-2015
20150112961	User Submission of Search Related Structured Data - Methods and apparatus related to obtaining search related structured data from a user. A user submitted update instruction may identify at least one URL and provide access to associated user supplied search related structured data. An associated record in a database may be modified by including the user supplied search related structured data in the record. The record is related to the URL and the database may be a structured data database associated with a search engine.	04-23-2015
20150112962	SYSTEM AND METHOD FOR LAUNCHING APPLICATIONS ON A USER DEVICE BASED ON THE USER INTENT - A method and system for launching applications on a user device responsive to a user intent are configured. The method includes receiving at least one environmental variable; analyzing the at least one environmental variable to determine the user intent, wherein the user intent represents a current topic of interest of a user of the user device; matching the determined user intent against an applications index to find at least a category of interest that best matches the determined user intent; selecting an application associated with the matching category of interest; and causing a launch of the selected application on the user device.	04-23-2015
20150120692	METHOD, DEVICE, AND SYSTEM FOR ACQUIRING USER BEHAVIOR - Embodiments of the present invention provide a method, a device, and a system for acquiring a user behavior. In the embodiments of the present invention, an acquired URL request matches a database, and the database stores a URL actively initiated by a user recognized by adopting a web crawler technology. If a URL contained in the URL request matches a corresponding URL actively initiated by a user in the database, it may be determined that the URL request is actively initiated by the user. Therefore, a network forwarding device or a server can rapidly and accurately acquire a behavior that a user actively initiates a URL request so as to further analyze a user behavior.	04-30-2015
20150134634	SEARCH RESULTS BASED ON AN ENVIRONMENT CONTEXT - A computer performs a search and generates a context-aware search result. The computer crawls a plurality of servers to fetch a plurality of knowledge documents, parses the plurality of knowledge documents, and indexes the plurality of parsed knowledge documents in a search index. Parsing can include annotating at least one of the plurality of knowledge documents, and indexing can include building a term index and an annotation index. The computer receives from a requestor a search request including a search term, and requests and receives a context of an asset environment associated with the requestor. The computer determines a context-aware search result based, at least in part, on the search term, on the context, and on information stored in the search index, and transmits the context-aware search result to the requestor.	05-14-2015
20150134635	SEARCH RESULTS BASED ON AN ENVIRONMENT CONTEXT - A computer performs a search and generates a context-aware search result. The computer crawls a plurality of servers to fetch a plurality of knowledge documents, parses the plurality of knowledge documents, and indexes the plurality of parsed knowledge documents in a search index. Parsing can include annotating at least one of the plurality of knowledge documents, and indexing can include building a term index and an annotation index. The computer receives from a requestor a search request including a search term, and requests and receives a context of an asset environment associated with the requestor. The computer determines a context-aware search result based, at least in part, on the search term, on the context, and on information stored in the search index, and transmits the context-aware search result to the requestor.	05-14-2015
20150134636	SYSTEM AND METHOD FOR AGGREGATING AND RANKING DATA FROM A PLURALITY OF WEB SITES - System and method for collecting information from a plurality of related sites, analyzing the information and storing the relevant information in a data base for future use. According to one embodiment of the present invention, the system uses the provided list of sites, whether obtained automatically or separately, queries them and analyzes the result retrieved from each site. The information may also optionally and preferably be ranked.	05-14-2015
20150294020	SYSTEM AND/OR METHOD FOR EVALUATING NETWORK CONTENT - A system and associated methods for use with the system for evaluating network content over a communications network. The system includes: at least one storage unit operable to store and/or maintain a plurality of forum facilities, each of the forum facilities being independently associated with a network location that contains network content; at least one processor operable to execute software that maintains and controls access to the forum facilities for a number of users; and, at least one input/output device operable to provide an interface for the users to operate the software in order to retrieve and view the forum facilities from the storage unit while simultaneously retrieving and viewing the network content from selected network locations via the communications network. The forum facilities include user generated content received from at least one user regarding network content available at selected network locations.	10-15-2015
20150324478	DETECTION METHOD AND SCANNING ENGINE OF WEB PAGES - The present invention discloses a method for detecting web pages and a scanning engine, wherein the method for detecting web pages comprises: crawling the URL or content of a target web site, determining the web page of the web site by a returned result, and accessing the web page; judging whether the accessed web page conforms to at least one of the following rules: a general exception page rule, a custom exception page rule and a custom exception page behavior rule; if so, determining the accessed web page as an exception page. Through the embodiments of the present invention, the effect of accurately judging the exception pages can be realized.	11-12-2015
20150331914	PERSONALIZED ACTIVITY DATA GATHERING BASED ON MULTI-VARIABLE USER INPUT AND MULTI-DIMENSIONAL SCHEMA - A personalized activity data retrieval system and method provides users a platform to search activity data based on multi-variable user input. The present invention provides a search method where the system searches a database to gather activity information based on user interests and user attributes. A customization of search results is applied multi-dimensionally to customize the search result based on user interest and user attributes. As such, the search results are personalized to meet the user's search objective. Searches conducted with the same topic can be returned with different results for different users having varying attributes. Search results are more progressive such that they are more usable and the granularity of the customization increases.	11-19-2015
20150331947	BIDIRECTIONAL HYPERLINK SYNCHRONIZATION FOR MANAGING HYPERTEXTS IN SOCIAL MEDIA AND PUBLIC DATA REPOSITORY - A method for bidirectional hyperlink management of a hypertext associated with an on-line media is provided. The method may include searching the on-line media for at least one keyword associated with the hypertext. The method may also include scanning a website associated with the hypertext based on the search of the at least one keyword. The method may further include locating at least one dead-link uniform resource locator (URL) associated with the scanned website. Additionally, the method may include managing the at least one located dead-link based on a set of pre-defined rules associated with the on-line media.	11-19-2015
20150339378	SYSTEM AND METHOD FOR KEYWORD FILTERING - The present invention discloses a system and a method for filtering keywords. The system comprises: a text acquisition module configured to acquire text content to be filtered; a scanning module configured to scan the text content to be filtered, if the text content to be filtered contains keyword(s), record a position of each keyword in the text content to be filtered and acquire character pitch between keywords in the text content to be filtered according to the position of each keyword in text content to be filtered; and a pitch judgment module configured to judge whether the character pitch exceeds a preset character pitch, if not, filter the keyword(s) in the text content to be filtered. The invention improves identification capability for sensitive information and improves filtering adaptability for sensitive information by obtaining character pitch between keywords in text content to be filtered and judging character pitch.	11-26-2015
20150347429	Managing Searches for Information Associated with a Message - A method for managing information about a product. A processor searches documents for a location of a message of the product using a set of rules that are based on instructions for generating the message. The instructions are in a resource of the product. The processor then adds the location to an index of locations of the message in the documents.	12-03-2015
20150347602	POLICY BASED POPULATION OF GENEALOGICAL ARCHIVE DATA - An approach for managing a family tree archive is provided. The approach includes creating an electronic archive based on a family tree. The approach also includes automatically discovering Internet-based data associated with at least one member of the family tree. The approach additionally includes adding the Internet-based data to the archive. The approach further includes storing the archive at a storage device.	12-03-2015
20150356190	WEB DOCUMENT ENHANCEMENT - A method for enhancing a presentation of a network document by a client terminal with real time social media content. The method comprises analyzing a content in a web document to identify a relation to a first of a plurality of multi participant events documented in an event dataset, each of the plurality of multi participant events is held in a geographical venue which hosts an audience of a plurality of participants, matching a plurality of event indicating tags of each of a plurality of user uploaded media content files with at least one feature of the first multi participant event to identify a group of user uploaded media content files selected from the plurality of user uploaded media content files, and forwarding at least some members of the group to a simultaneous presentation on a browser running on a client terminal and presenting the web document.	12-10-2015
20150379018	COMPUTER-GENERATED SENTIMENT-BASED KNOWLEDGE BASE - A computer-automated method and system of providing a searchable knowledge base with decision-relevant attributes (including some subjective or sentiment-based attributes) for a plurality of individual items within a choice set are described. First, information (including texts) relevant to the plurality of individual items in the choice set is harvested from Internet sources. Next, normalized representations of statements are extracted from excerpts of the harvested texts that pertain to attributes of interest for the choice set, and corresponding scores for the attributes are derived from each of the normalized representations. The scores derived from the various harvested sources are aggregated for each attribute of each item. Finally, the knowledge base of the plurality of individual topics is generated.	12-31-2015
20160012153	CAPTURING RUN-TIME METADATA	01-14-2016
20160026713	Ranking External Content on Online Social Networks - In one embodiment, a social-networking system may access an enhanced search index of an online social network. The enhanced search index may include data from a social graph having a plurality of nodes and a plurality of edges connecting the nodes, where the nodes comprise a plurality of internal nodes corresponding to entities associated with the online social network, and a plurality of external nodes corresponding to objects associated with a third-party system. The social-networking system may then search the enhanced search index in response to a query received from a user to identify objects that substantially match the query. Each identified object may be scored by the social-networking system based at least in part on a connectivity of the corresponding external node to the one or more internal nodes. In response to the query, the social-networking system may send a search-results page referencing objects based on their scores.	01-28-2016
20160034581	DETECTION AND HANDLING OF AGGREGATED ONLINE CONTENT USING DECISION CRITERIA TO COMPARE SIMILAR OR IDENTICAL CONTENT ITEMS - A computer-implemented method is presented herein. The method obtains a first content item from an online source, and then generates a characterizing signature of the first content item. The method continues by finding a previously-saved instance of the characterizing signature and retrieving data associated with a second content item (the second content item is characterized by the characterizing signature). The method continues by analyzing the data associated with the second content item, corresponding data associated with the first content item, and decision criteria. Thereafter, either the first content item or the second content item is identified as an original content item, based on the analyzing. The other content item can be flagged as an aggregated content item.	02-04-2016
20160042070	Web Resource Compatibility With Web Applications - Techniques for web resource compatibility with web applications are described. According to one or more implementations, an indication of a request to navigate a web application to a web resource is received. Based on the request, a compatibility service is queried regarding compatibility status of the web resource with the web application. According to one or more embodiments, if a compatibility issue between the web resource and the web application is identified, a compatibility element is provided to mitigate the compatibility issue. At least some embodiments enable a user to provide feedback regarding presentation of the web resource by the web application with the compatibility element applied. At least some embodiments notify a developer of the web resource about a compatibility issue of the web resource with the web application.	02-11-2016
20160043913	MONITORING SOCIAL MEDIA FOR SPECIFIC ISSUES - Systems and methods of the present invention provide for one or more server computers communicatively coupled to a network and configured to: monitor one or more social media accounts; identify a specific issue common to the social media account(s) (and possibly one or more recommended remedies for the common specific issue); and generate and transmit, to a user of the social media account(s), a report identifying the instance of the common specific issue and, where applicable, the one or more recommended remedies.	02-11-2016
20160055243	WEB CRAWLER FOR ACQUIRING CONTENT - An adaptive web crawling system generates a first utility measurement based on web page snippets associated with individual search result items by crawling from a collection of web page crawling seeds and according to a specific user web crawling criteria. The system generates a second utility measurement based on features extracted from the full webpages downloaded according to the guidance of the first utility measurement results. A web page utility prediction function is introduced to forecast the second utility measurement based on the first utility measurement. The system adapts its priorities for web crawling based on the web page utility prediction function.	02-25-2016
20160055350	SYSTEM AND METHODS FOR IDENTIFYING COMPROMISED PERSONALLY IDENTIFIABLE INFORMATION ON THE INTERNET - In one embodiment, a method includes generating, by a computer system, a search-engine query from stored identity-theft nomenclature. The method also includes querying, by the computer system, at least one search engine via the search-engine query. Further, the method includes crawling, by the computer system, at least one computer-network resource identified via the querying. In addition, the method includes collecting, by the computer system, identity-theft information from the at least one computer-network resource. Additionally, the method includes processing, by the computer system, the identity-theft information for compromised personally-identifying information (PII).	02-25-2016
20160070793	SEARCHING METHOD AND SYSTEM AND NETWORK ROBOTS - The present invention proposes a searching method and system and a network robot. The method comprises: S	03-10-2016
20160070797	METHODS AND SYSTEMS FOR PRIORITIZING A CRAWL - Methods and systems for prioritizing a crawl are described. One aspect of the invention includes a method for identifying a plurality of storage locations each comprising a plurality of articles, ranking the plurality of storage locations based at least in part on events associated with the plurality of articles, and crawling the storage locations based at least in part on the ranking. Another aspect of the invention includes identifying a plurality of storage locations each comprising a plurality of articles, identifying a plurality of types of the plurality of articles, ranking the plurality of storage locations based at least in part on the plurality of types of the plurality of articles; and crawling the storage locations based at least in part on the ranking.	03-10-2016
20160078141	CUSTOMIZED SITE SEARCH DEEP LINKS ON A SERP - Systems, methods, and computer-readable storage media are provided for presenting customized deeplinks on a search engine results page (SERP) to a user via a browser in response to a website name query where the user intends to submit a task-specific query. If selected, the customized deeplink navigates the browser to a webpage of the website that is relevant to the task-specific query. Customized deeplinks are generated by comparing a history associated with the user's browser and the website query database's data. The website query database contains data associated with popular search terms mined from a website server hosting the website. Popular search terms and associated data may be mined from the website's browser log by identifying a search uniform resource locator (URL) pattern from a search form of the website and filtering browser log entries of the browser log that match the identified search URL pattern.	03-17-2016
20160085867	METHOD AND SYSTEM FOR AGGREGATING OPINIONS - A computer implemented method of and system for aggregating opinions corresponding to an organization are disclosed. According to the method, a plurality of opinions from a plurality of data sources may be received using a processor. Each data source of the plurality of data sources may include one or more opinions corresponding to the organization. Subsequently, two or more opinions of the plurality of opinions may be determined, using a processor, to be corresponding to the organization based on presence of one or more identifiers associated with the organization in each of the two or more opinions. Further, the two or more opinions may be presented, using a processor, to a user.	03-24-2016
20160092572	SEMANTIC SEARCHES IN A BUSINESS INTELLIGENCE SYSTEM - A computer-implemented method of executing a user query includes presenting a user interface to allow a user to enter a query, receiving a user-entered textual request through the interface, launching a search service to rewrite the textual request into a search query, sending the search query to a presentation server, receiving an answer to the query, and returning the answer to the user as a graphical representation. A computer-implemented method includes receiving a crawl request from a user, launching a crawl manager to monitor the crawl request and track statistics related to the crawl, starting a crawl task based upon the crawl request, indexing a business intelligence presentation server to create a data index, and storing the data index.	03-31-2016
20160092591	CLUSTERING REPETITIVE STRUCTURE OF ASYNCHRONOUS WEB APPLICATION CONTENT - A processor determines whether a DOM includes a repetitive pattern of a combination, formed by a tag of a leaf node and a tag of a parent node of the leaf node. Determining the repetitive pattern of the combination, the processor identifies a first inner cluster by collapsing multiple instances of the repetitive pattern into a single instance. The processor generates an LSH signature for the single instance of the repetitive pattern. The processor determines an outer cluster, based on grouping one or more inner clusters, as part of a section rooted at a source node of the DOM, in which the source node is a parent node of the one or more inner clusters. Determining that a pair of outer clusters are near repetitive, the processor limits web content exploration to one of the pair of outer clusters.	03-31-2016
20160103913	METHOD AND SYSTEM FOR CALCULATING A DEGREE OF LINKAGE FOR WEBPAGES - A method for calculating linkage for a plurality of webpages (	04-14-2016
20160110455	IDENTIFYING CLIENT STATES - A method for identifying client states receives a set of paths representative of a document object model (DOM) associated with a web page of a rich internet application and, for each path in the set of paths received, extracts a subtree, as subtree X, for a current path. The method traverses all known sub-paths under the current path, deletes corresponding subtrees from subtree X, and reads contents of and determines states of subtree X to form a state X. The state X is added to a set of current states and, responsive to a determination that no more paths exist, the method returns the set of current states of the rich internet application.	04-21-2016
20160110456	INTERACTIVE WEB CRAWLER - The claimed subject matter provides a system or method for web crawling hidden files. An exemplary method comprises loading a web page with a browser agent, and executing any dynamic elements hosted on the web page using the browser agent to insert pre-determined values. A list of form controls may be retrieved from the web page using the browser agent, and the controls may be analyzed using a driver component. Form control values may be sent from the driver component to the browser agent, and an event may be submitted to the web page by the browser agent or scripted content may be run to trigger operations on the web page corresponding to the form control values. A URL may be generated for various form control values using a generalizer.	04-21-2016
20160117398	SYSTEMS AND METHODS FOR EXTRACTING SIMILAR GROUP ELEMENTS - Techniques for extracting similar group elements are described. In one embodiment, a received communication is analyzed for repeating patterns in the elements within the communication. An input may be received via a user interface identifying a particular element of the received communication. A system may then identify a particular position within a repeating pattern that is associated with the particular element. Every element within the communication that is in the same position within the repeating pattern may then be identified, stored, or output in a specified or preselected format. Various embodiments may account for multi-page response communications, various pattern recognition techniques, and automated or user-assisted systems.	04-28-2016
20160125050	SYSTEM AND METHOD FOR GENERATING SEARCH REPORTS - The present disclosure provides a system and method for searching and analyzing information from a database based on a plurality of subject features. The system generates a list of relevant search results in a particular order and provides relevant excerpts from each of the search results in relation to each of the subject features, along with the association that represents the overlap between the searched excerpt and the corresponding subject feature. The system may include a search report generator device that comprises a search module, a feature generation module, an analysis module, a ranking module, and a report generation module.	05-05-2016
20160125081	WEB CRAWLING - Briefly, embodiments disclosed herein may relate to Web crawling, and more particularly may relate to Web crawling for structured content, for example.	05-05-2016
20160125083	INFORMATION SENSORS FOR SENSING WEB DYNAMICS - Disclosed herein are techniques and systems for building “information sensors,” which are programmable “focused crawlers” that periodically discover, extract, analyze and aggregate structured information around a topic from the Web. A platform for building an information sensor allows a user to specify one or more data elements within a data source that the user desires to monitor, and an update frequency at which the data elements are to be extracted. Code may be generated based on the user specifications for creation and submission of the information sensor for storage in a database with metadata containing the code and update frequency. Once created, information sensors are scanned to check if running conditions are met, and if met, they may be executed by retrieving the metadata using a sensor identifier (ID). The code is executed to locate a data source, and periodically extract specified data elements therefrom to output structured time-series data.	05-05-2016
20160140236	KNOWLEDGE DISCOVERY AGENT SYSTEM - A system and method for processing information in unstructured or structured form, comprising a computer running in a distributed network with one or more data agents. Associations of natural language artifacts may be learned from natural language artifacts in unstructured data sources, and semantic and syntactic relationships may be learned in structured data sources, using grouping based on criteria of shared features that are dynamically determined without the use of a priori classifications, by employing conditional probability constraints.	05-19-2016
20160171104	DETECTING MULTISTEP OPERATIONS WHEN INTERACTING WITH WEB APPLICATIONS	06-16-2016
20160171106	WEBPAGE CONTENT STORAGE AND REVIEW	06-16-2016
20160179793	CRAWLING COMPUTER-BASED OBJECTS	06-23-2016
20160179955	Device-Specific Search Results	06-23-2016
20160179960	OPTIMIZING WEB CRAWLING THROUGH WEB PAGE PRUNING	06-23-2016
20160188620	AUTO SUGGESTION IN SEARCH WITH ADDITIONAL PROPERTIES - A computer initializes a configuration specified in an extensible markup language (XML) configuration file. The XML configuration file specifies at least one data source, a dimension to map each item of a plurality of items that include products, product accessories, or product support documents in the at least one data source, and a display priority for each item. Next, the computer reads data from the at least one data source specified in the configuration file. The computer generates an XML dimension hierarchy file for the read data using the configuration file. The XML dimension hierarchy file includes a dimension node for each item. Each dimension node has at least one property attached to each item and at least one synonym that is searchable to index each item. Finally, the computer preprocesses the XML dimension hierarchy file to index the at least one data source.	06-30-2016
20160188716	Crowd-Sourced Crawling - A method includes determining, by a processing device of a user device, whether a set of crawling conditions are met by the user device, and generating, by the processing device, a work request in response to the set of crawling conditions being met by the user device. The method also includes transmitting, by the processing device, the work request to a content acquisition server, and receiving, by the processing device, one or more crawling tasks from the content acquisition server. For each crawling task, the method further includes requesting content from a content server based on information contained in the crawling task, receiving the content from the content server, and transmitting the content to the content acquisition server.	06-30-2016
20160188720	Crowd-Sourced Native Application Crawling - A method for performing crowd-sourced native application crawling is disclosed. The method includes determining a list of installed native applications installed on a user device and determining whether a set of crawling conditions are met. The method includes generating a work request in response to the set of crawling conditions being met by the user device and transmitting the work request to a content acquisition server. The work request includes the list of installed native applications. The method includes receiving a crawling task including an application access mechanism corresponding to a state of a native application. The method includes launching the native application and setting the state of the native application based on the application access mechanism. The native application issues a content request to a content server. The method further includes receiving the content from the content server and transmitting the content to the content acquisition server.	06-30-2016
20160196352	POPULARITY OF CONTENT ITEMS	07-07-2016
20160203222	SEARCH METHOD, SEARCH SYSTEM, AND SEARCH ENGINE	07-14-2016
20160203224	SYSTEM FOR ANALYZING SOCIAL MEDIA DATA AND METHOD OF ANALYZING SOCIAL MEDIA DATA USING THE SAME	07-14-2016
20170235839	CONTENT SOURCE DRIVEN RECOMMENDATION FOR GIVEN CONTEXT OF CONTENT DELIVERY AND DISPLAY SYSTEM	08-17-2017
20190146954	HIERARCHICAL SEEDLISTS FOR APPLICATION DATA	05-16-2019
20190147004	HYBRID TASK ASSIGNMENT FOR WEB CRAWLING	05-16-2019
20220138188	GENERIC SCHEDULING - A system and method for customized scheduling of sources, including breaking down a source of content into at least two categories, including posts and engagements, and gathering content related to a specific source. A scheduler handles scheduling of posts and engagement for a single source and entities that are due to be crawled are sent to a scheduling queue, in which each content type for a source can have its own queue. A process points to the correct scheduler queue in order to request content to be crawled, attaches to the proper queue, processes requests, queries the social network for content, parses the response and sends any new data to be saved to the system.	05-05-2022
20220138271	Method, Device and Computer Program for Collecting Data From Multi-Domain - The present invention relates to a method for collecting data from a multi-domain in a data collection device. The method includes a step A of collecting data from a general web that is accessible through a search engine; a step B of collecting data from a dark web site that is not accessible with a general web browser and is accessible with preset specific software; and a step C of standardizing the collected data in a preset format and generating metadata for the collected data.	05-05-2022
