I’d like to talk about methodological explorations at ISTIC and at other Chinese organizations that conduct research in information science.


First, I’ll present my understanding of information science.

Then I’ll describe some information processing methods and tools used in China.

Third, I’ll discuss information analysis for decision support.

Finally, a few words about knowledge organization and extraction.


Information can be translated into Chinese as either xinxi or qingbao. Xinxi is closer to “message” in connotation, while qingbao is closer to “intelligence”. There has always been fierce debate over the definition of qingbao. Personally, I am very fond of Hsue-Shen Chien’s definition: qingbao is not fixed and dormant documents and materials, but excited and activated knowledge. Hsue-Shen Chien has been called the father of China’s missiles. He was also a master scholar who crossed many disciplines, similar to Herbert A. Simon.


So far, information scientists in China have mainly conducted research on the external features of literature (here literature is broadly defined and may include the Web). I believe this is also the case elsewhere.


For instance, when we do citation analysis, we rely on references; when we do co-word analysis, we look at the keywords of publications; when we analyze co-authorship, we draw the “author” field from the database. References, keywords, authorship: all of these are external features of documents.
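
To make the idea of working only from external features concrete, here is a minimal sketch in Python of co-word and co-authorship counting. The records and the field names ("authors", "keywords") are invented for illustration and do not come from any real database; the sketch only counts how often two keywords, or two authors, appear on the same record.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records; field names and values are assumptions for illustration.
records = [
    {"authors": ["Li", "Wang"], "keywords": ["graphene", "battery"]},
    {"authors": ["Li", "Zhao"], "keywords": ["graphene", "sensor"]},
    {"authors": ["Wang", "Li"], "keywords": ["battery", "graphene"]},
]


def cooccurrence(records, field):
    """Count how often each unordered pair of values appears on the same record."""
    counts = Counter()
    for record in records:
        # Deduplicate and sort so that ("a", "b") and ("b", "a") are the same pair.
        values = sorted(set(record.get(field, [])))
        counts.update(combinations(values, 2))
    return counts


coword = cooccurrence(records, "keywords")    # co-word analysis
coauthor = cooccurrence(records, "authors")   # co-authorship analysis

for pair, n in coword.most_common():
    print(pair, n)
```

The resulting pair counts are the raw material for a co-word or co-authorship network; nothing in the sketch looks inside the full text of the documents.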


Interestingly enough, just by analyzing these external features, we can tell a lot of stories. For instance, we are specialists in neither materials science nor life science. We are only experts in library and information science, but we can point out development trends in materials science and life science through a bibliometric approach, and the domain specialists do listen to us. Another example that I like to cite very often is that XX and Vaughan Liwen identified the competitive relationships among companies just by co-link analysis. If there is no link at all between two business link clusters, then the two companies at the centers of those clusters must be competitors! Co-link, again, is an external feature of a site.


The real challenge for information science, however, is to obtain qingbao, or intelligence, by analysing the internal features of documents, that is to say, through content analysis, text mining, or other approaches.


Next, I’ll give a few examples of real intelligence.

The RAND Corporation told the DOD in the early 1950s that China would send troops to Korea to confront the U.S. military. This was a piece of valuable intelligence. But the DOD thought it was a ridiculous prediction.


The second example: Daqing Oil Field is the first oil field in China, and its location was kept secret. Once, People’s Daily published a photo of Wang Jinxi, the “Iron Man”, who was a national model worker. From the hat he wore in the picture, Japanese intelligence experts judged that Daqing Oil Field must be in the northern part of China. Starting from the information leaked by this photo, the Japanese intelligence staff gradually built a more and more complete picture, and finally they determined the geographical location of the oil field correctly.


The third example is Professor Wang Xuan of Peking University. He was the founder of the Founder Group. In the 1970s, he came to our ISTIC library very often and searched and read a lot of documents. He found that if we call the simplest printing, by typesetting individual letters, the first generation of print technology, starting from Gutenberg, then the second generation is the mechanical image setter, which was in popular use in Japan. The third generation is represented by image setters that use cathode ray tubes. This technology was popular in Europe and America. Prof Wang Xuan decided that he should leapfrog the second and third generations and research and develop the fourth-generation technology, a laser photo-typesetting system for Chinese characters. It was a tremendous success, and he won the State Supreme Science and Technology Award for it. Keep in mind that document search and browsing helped him greatly. Just imagine: if it had been ISTIC information scientists who gave him the clue about choosing a laser photo-typesetting system, then the role and position of information science in Chinese society would have been totally different!


The general process of reaching intelligence is like this. We have some information as input, which is mainly the internal features of the literature to be analyzed. Then, through a screening-identification-selection-judging process, we get intelligence as output. We have a Chinese saying, 去粗取精去伪存真, or “discard the dross and select the essential; eliminate the false and retain the true”. That is what the middle stage of the process should achieve.


So in document analysis, we have both external feature analysis and internal feature analysis, but the analysis of the internal features is always at the core.


Our current challenge and task in information science is to carry content screening and analysis as deep as possible with the help of modern IT, in addition to our past work, which was based mainly on external feature analysis.


During internal feature analysis, we need to draw heavily on knowledge from other disciplines, such as cognitive science, behavioral science, psychology and action research.


Now I turn to the second part. Here I will describe some information processing and analysis tools developed by Chinese information scientists.


Example 1: Prof Wu Bin of Beijing University of Posts and Telecommunications, our research partner, has developed a software tool called LiterMiner. It has the same functions as TDA from Thomson Reuters, but it has the advantage of handling both English and Chinese texts easily and of treating data from different database vendors equally. For instance, it can deal with Web of Science data or Ei (Elsevier) data without barriers.

The next example is a CI system developed by Mr Zhang Dianyao and his team. Before retirement, Mr Zhang was a senior research fellow in the Institute for Aeronautical and Space Information.


This CI system has three functions: providing CI on a daily basis; performing simple qualitative and quantitative analysis; and providing decision support. The system has found many applications. For instance, the Capital Iron and Steel Company uses Mr Zhang’s CI system.


The next example is the Threshold 21 China model (T21 for short), co-developed by the Millennium Institute (MI) and ISTIC. MI was founded by Dr Gerald Barney, who used to be the Special Assistant to President Jimmy Carter. During President Carter’s term, Dr Barney completed the Global 2000 Report, which is a milestone in the history of sustainable development. Based on this experience, he founded MI and devoted the rest of his life to the cause of sustainable development. The T21 China model is used as an efficient tool for scenario building.


This model has many modules, including economy, agriculture, education, environment, energy and so on. You can add new modules or delete old ones as you like.


Basically this is a dynamic model, and each module uses the state-of-the-art model in its respective sector. For instance, the best available population model was adopted in T21.


The model is designed to do what-if analysis. For example, if we keep the investment intensity in environmental protection at the current level, what will water quality and air quality be like in 2020? The simulation curve gives you a straight answer. If the decision maker cannot accept such a horrific environmental quality scenario, then he or she must change the investment portfolio right now.
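
The what-if logic can be illustrated with a very small stock-and-flow sketch in Python. This is not the T21 model and uses none of its equations: the function name, the single "pollution" stock and every number below are invented placeholders with no empirical meaning. The sketch only shows how two investment scenarios produce two different curves that a decision maker can compare.

```python
def simulate(investment, years=10, pollution=100.0,
             emissions=12.0, removal_per_unit=2.0):
    """Toy stock-and-flow run: return the pollution stock year by year.

    Every number here is an invented placeholder, not a T21 parameter.
    """
    trajectory = [pollution]
    for _ in range(years):
        # Next year's stock = stock + inflow (emissions) - outflow (abatement).
        abatement = removal_per_unit * investment
        pollution = max(0.0, pollution + emissions - abatement)
        trajectory.append(pollution)
    return trajectory


baseline = simulate(investment=3.0)  # what if investment stays at the current level?
stronger = simulate(investment=8.0)  # what if investment is raised now?

for year, (b, s) in enumerate(zip(baseline, stronger)):
    print(year, b, s)
```

In this toy setting the baseline curve keeps rising while the higher-investment curve falls, which is the kind of contrast a scenario tool is meant to put in front of the decision maker.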


Now I come to my third part, where I will tell some stories about our information analysis support for decision making.


The first case is the CSTPC database, which is a Chinese version of SCI. Based on statistics from the database, ISTIC organizes a news release meeting in November or early December each year, and this meeting is attended by more than 1,000 people.


Based on CSTPC, we can summarise the general trend in China’s science, and we can do many other things, such as…

Our database can provide useful information for government S&T managers, institutional S&T managers, as well as the editorial boards of S&T journals. They very often contract ISTIC to conduct specific analyses for them.


If you regard our Chinese Journal Citation Report as a single document, then it must be the most highly ranked piece ever produced by ISTIC staff.


This report has been widely used. Its clients include CAS, CAST and CSFC.


My second example of decision support is the so-called S&T Aided Decision System (STADS), designed by Mr WU Guangyin, the chief engineer of Wanfang Data. His basic idea is that there are five questions that interest most decision makers: Who are the most distinguished scientists? What institutions are they affiliated with? In what disciplines are they working? What funding agencies supported their research? Which research fields have become the so-called hot spots? Based on the massive data accumulated by Wanfang Data, STADS can answer such questions quickly, to some extent. The system has been welcomed by various communities. For instance, Jining City of Shandong Province purchased more than a dozen STADS systems and presented them to local high-tech companies.


The next case in decision support is the work of Prof. Zhang Zhixiong of the Document and Information Center of CAS. He is also one of our research partners. In recent years he has done intensive research on how to monitor and evaluate Web information to meet the needs of decision makers.


The objects that need to be monitored are legion. Here are just a few examples.


Prof. Zhang’s monitoring system has many functions. The first function is to identify valuable information in crawled webpages by means of a set of sensitive words.


For instance, here is a piece of news that the system judges to be important: a US federal agency partners with Canada and Russia to build a counterterrorism training center for the Russian Federation Ministry of Defence.
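
The first function, picking out pages by sensitive words, can be sketched roughly as follows in Python. The word list, the URLs, the page texts and the function names are all hypothetical; the talk does not say which words the real system uses or how it matches them, so simple case-insensitive whole-word matching is assumed here.

```python
import re

# Hypothetical watch list; the real system's sensitive words are not given in the talk.
SENSITIVE_WORDS = ["counterterrorism", "renewable energy", "R&D program"]


def matched_words(text, words=SENSITIVE_WORDS):
    """Return the sensitive words that occur in the text as whole words or phrases."""
    hits = []
    for word in words:
        # (?<!\w) and (?!\w) stop a match in the middle of a longer word.
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(word)
    return hits


def flag_pages(pages, min_hits=1):
    """Keep the crawled pages that contain at least min_hits sensitive words."""
    flagged = []
    for url, text in pages.items():
        hits = matched_words(text)
        if len(hits) >= min_hits:
            flagged.append((url, hits))
    return flagged


# Invented pages standing in for one crawl.
pages = {
    "http://example.org/a": "A federal agency will build a Counterterrorism training center.",
    "http://example.org/b": "Local weather report for the weekend.",
}
flagged = flag_pages(pages)
print(flagged)
```

A real monitoring system would presumably weight the words and combine them with other signals; the sketch shows only the basic screening step.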


The second function is to assign the relevant piece of information to an appropriate category.


For instance, this page pools together all news items that are related to important R&D programs or projects.


The third function is to identify Rich Text files after each crawl and then cache the Rich Text files for future use.


This page lists some rich text reports by EU or USA.


The fourth function is to extract key terms and objects from the Web pages.


For example, the system identifies from a report key terms such as “open innovation”, “nutrition agriculture”, and so on.


The fifth function is to cluster the web pages in a web site for easy browsing and exploring.


Here is an example of visualised clustering.


The sixth function is the automatic identification of objects to be monitored.


Here, the system decides that SciDev.net is a site that should be monitored closely because there is a lot of relevant information there.


The seventh function is to identify important topics in a web site.


For instance, in SciDev, the most salient topics are climate change and the developing world.


The last, but not least, function is to identify the hot topics in a given period.


You see that at the end of 2010, the hottest topic was “renewable energy”.
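
Hot-topic identification for a given period can be sketched as a simple count over dated items. The items below are invented to mirror the example on the slide, and the function name and data layout are assumptions; the talk does not describe how the real system ranks topics, so plain frequency counting is assumed.

```python
from collections import Counter
from datetime import date

# Invented items: (publication date, topics extracted from the page).
items = [
    (date(2010, 6, 1), ["climate change"]),
    (date(2010, 11, 3), ["renewable energy", "climate change"]),
    (date(2010, 12, 9), ["renewable energy"]),
    (date(2010, 12, 20), ["open innovation", "renewable energy"]),
]


def hot_topics(items, start, end, top_n=3):
    """Rank topics by the number of items mentioning them between start and end."""
    counts = Counter()
    for day, topics in items:
        if start <= day <= end:
            # Count each topic at most once per item.
            counts.update(set(topics))
    return counts.most_common(top_n)


late_2010 = hot_topics(items, date(2010, 10, 1), date(2010, 12, 31))
print(late_2010[0])
```

Changing the start and end dates moves the window, so the same function can show how the ranking shifts from one period to the next.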


In my last part, I will tell you something about the work of Chinese information scientists in knowledge organization and knowledge extraction.


Prof Qiao Xiaodong and his team have developed a prototype of the so-called S&T Vocabulary Service System. This system is more coarse-grained than an ontological system but more fine-grained than a thesaurus. Considering that the cost of building ontologies is always huge, we think that the S&T Vocabulary Service System is an appropriate technology at the current stage.


Mr Hua Bolin, my PhD student, is making an initial attempt at knowledge extraction based on sentence matching and analysis. This project is funded by NSFC. I hope that he will have a chance to report his progress to you in the future.


I have no time to present you with a complete picture of methodological research in information science in China. I have just cited some examples that impressed me personally, and I hope they give you a glimpse of China’s information science research.


Thank you very much for your attention.